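I am using the code below to scrape data from a job site with BeautifulSoup and write it to a csv file. The scraping itself seems to work, because the extracted data looks fine when I print it. However, I cannot get the scraped data written to the csv file. A csv file is created, but each column contains only single letters such as a, b, c instead of the full words describing the title, salary and so on. Can anyone help me?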
import requests
import csv
from bs4 import BeautifulSoup
r=requests.get("https://www.reed.co.uk/jobs/accountancy-jobs")
r.content
soup=BeautifulSoup(r.content)
#print(soup.prettify())
soup.find_all("article")
jobs=soup.find_all("article")
for job in jobs:
    title=job.h3.text
    posterline=job.find("div", attrs={"class":"posted-by"})
    poster=posterline.find("a").text
    postdate=job.find('div',{'class': 'posted-by'}).next_element
    description=job.find("div", attrs={"class":"description"})
    metadata=job.find("div", attrs={"class":"metadata"})
    metadata=job.find("div", attrs={"class":"metadata"})
    salary=metadata.find("li", attrs={"class": "salary"})
    salary=salary.text
    time=metadata.find("li", attrs={"class": "time"})
    datas=(title, salary, time, postdate, poster)
    with open('reeddata.csv', 'w', newline='') as file:
        writer = csv.writer(file)
        headers = ['Title','Salary','Time', 'Postdate','Poster']
        writer.writerow(headers)
        for data in datas:
            writer.writerow(data)

Posted on 2021-02-27 03:03:56
Try the script below to fetch the required content and write it to a csv file accordingly:
import requests
from bs4 import BeautifulSoup
import csv

r = requests.get("https://www.reed.co.uk/jobs/accountancy-jobs")
soup = BeautifulSoup(r.content,"html.parser")

# Open the file once, before the loop, so rows are not overwritten
with open('reeddata.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Title','Salary','Time', 'Postdate','Poster'])
    for job in soup.find_all("article"):
        title = job.find("h3",class_="title").find("a",href=True).get_text(strip=True)
        poster = job.find("div", class_="posted-by").find("a").get_text(strip=True)
        postdate = job.find('div',class_='posted-by').next_element.strip()
        salary = job.find("div",class_="metadata").find("li",class_="salary").get_text(strip=True)
        time = job.find("div",class_="metadata").find("li",class_="time").get_text(strip=True)
        # One list per job becomes one csv row
        writer.writerow([title, salary, time, postdate, poster])

Posted on 2021-02-26 23:39:12
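This is an indentation problem. For every job you find, you open the csv, write that job's details, and close it; then for the next job you overwrite the csv. Try writing the output loop without indentation (outside the scraping loop), and append the values to "datas" instead of redefining it for each job.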
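To illustrate the advice above, here is a minimal sketch that needs no network access. The rows in `scraped` are made-up placeholder values standing in for the scraped fields, and `write_rows` is a hypothetical helper, not part of the original code. It also shows where the single letters probably come from: `writerow` treats its argument as a sequence of fields, so passing one string writes one character per column.

```python
import csv
import io

# Hypothetical sample values standing in for the scraped fields.
scraped = [
    ("Accountant", "30000 per annum", "Permanent", "Posted today by", "Acme Ltd"),
    ("Bookkeeper", "25000 per annum", "Contract", "Posted yesterday by", "Example Co"),
]

def write_rows(file, rows):
    """Write the header once, then one csv row per job (hypothetical helper)."""
    writer = csv.writer(file)
    writer.writerow(["Title", "Salary", "Time", "Postdate", "Poster"])
    writer.writerows(rows)

# Symptom: a plain string is iterated character by character,
# so every letter lands in its own column.
buf = io.StringIO()
csv.writer(buf).writerow("Accountant")
split_line = buf.getvalue()

# Fix: collect one tuple per job in a list inside the loop...
datas = []
for row in scraped:
    datas.append(row)

# ...and write everything once, after the loop has finished.
out = io.StringIO()
write_rows(out, datas)
fixed_output = out.getvalue()
```

With a real file, `write_rows(open('reeddata.csv', 'w', newline=''), datas)` inside a `with` block would play the same role as the in-memory buffer used here.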
https://stackoverflow.com/questions/66388559