To extend a scraping job in Python beyond a single page, follow these steps:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://example.com/page1"  # Replace with the actual page URL
response = requests.get(url, timeout=10)
response.raise_for_status()  # Stop early on an HTTP error status
content = response.content
soup = BeautifulSoup(content, "html.parser")

# Extract the data using selectors that match the page structure
data = soup.find_all("div", class_="item")
results = []
for item in data:
    # Pull the specific fields out of each item
    title = item.find("h2").text
    description = item.find("p").text
    results.append({"title": title, "description": description})
df = pd.DataFrame(results)
# Repeat the same steps for every page and collect all rows in one list
results = []
for page in range(1, 6):  # Assumes 5 pages are to be scraped
    url = f"https://example.com/page{page}"  # Replace with the actual URL template
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # Stop early on an HTTP error status
    content = response.content
    soup = BeautifulSoup(content, "html.parser")
    data = soup.find_all("div", class_="item")
    for item in data:
        title = item.find("h2").text
        description = item.find("p").text
        results.append({"title": title, "description": description})
df = pd.DataFrame(results)
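With this loop, the scrape covers more than one page. Adjust the URL template, the selectors, and the way the data is stored to fit the structure of the target site and your own requirements.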
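When the number of pages is not known in advance, one possible variation is to keep requesting pages until one comes back with no items. The sketch below is only an illustration under stated assumptions: `scrape_pages`, `fetch_page`, `max_pages`, and `delay_seconds` are hypothetical names that do not appear in the answer above, and the sketch assumes the site returns an empty listing once the page number runs past the last page, which not every site does.

```python
import time
from typing import Callable, Dict, Iterator, List

Record = Dict[str, str]


def scrape_pages(
    fetch_page: Callable[[str], List[Record]],  # hypothetical: fetches one URL, returns its rows
    url_template: str,                          # must contain a "{page}" placeholder
    max_pages: int = 50,                        # assumed safety cap, not from the original answer
    delay_seconds: float = 1.0,                 # assumed pause between requests
) -> Iterator[Record]:
    """Yield rows page by page until a page comes back empty or the cap is hit."""
    for page in range(1, max_pages + 1):
        records = fetch_page(url_template.format(page=page))
        if not records:
            break  # An empty page is treated as the end of the listing.
        yield from records
        time.sleep(delay_seconds)  # Pause before requesting the next page.
```

In practice `fetch_page` would wrap the `requests.get` and `BeautifulSoup` calls shown above and return the list of `{"title": ..., "description": ...}` dictionaries for one page. Passing `list(scrape_pages(fetch_page, "https://example.com/page{page}"))` to `pd.DataFrame` then gives the same kind of table as before. Taking the fetcher as a parameter also lets the loop be tried out offline with a stand-in function.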