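I'm working through the first example in the web scraping tutorial from the book "Automate the Boring Stuff with Python". The project involves typing a search term on the command line and having my computer automatically open a browser with all the top search results in new tabs.
It mentions that I need to locate the
<h3 class="r"> elements, which contain the links to each search result. The r class is used only for search result links.
The problem is that I can't find it anywhere, even using Chrome DevTools. Any help with where it might be would be much appreciated.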
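One way to check which classes the h3 elements actually carry in the HTML the script receives (which may differ from what Chrome DevTools shows) is to tally them. The following is only a minimal sketch using the standard library: `H3ClassCounter`, `count_h3_classes`, and `SAMPLE_HTML` are hypothetical names, and the sample markup is an assumed stand-in for `res.text`, not real Google output.

```python
from collections import Counter
from html.parser import HTMLParser


class H3ClassCounter(HTMLParser):
    """Tally the class attribute values seen on <h3> start tags."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "h3":
            return
        # attrs is a list of (name, value) pairs; the value may be None.
        class_value = dict(attrs).get("class") or ""
        for name in (class_value.split() or ["(no class)"]):
            self.counts[name] += 1


def count_h3_classes(html_text):
    """Return a Counter mapping class name -> number of <h3> tags using it."""
    parser = H3ClassCounter()
    parser.feed(html_text)
    return parser.counts


# Hypothetical stand-in for the downloaded page (res.text in lucky.py).
SAMPLE_HTML = """
<div><h3 class="r"><a href="/url?q=http://example.com/&sa=U">One</a></h3></div>
<div><h3 class="r"><a href="/url?q=http://example.org/&sa=U">Two</a></h3></div>
<div><h3>Unrelated heading</h3></div>
"""

if __name__ == "__main__":
    print(count_h3_classes(SAMPLE_HTML))
```

Passing the real `res.text` instead of `SAMPLE_HTML` would show whether any h3 with class r is present in the response at all.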
Note: for reference, this is the complete program as it appears in the book.
# lucky.py - Opens several Google search results.
import requests, sys, webbrowser, bs4
print('Googling...')  # display text while downloading the Google page
res = requests.get('http://google.com/search?q=' + ' '.join(sys.argv[1:]))
res.raise_for_status()
# Retrieve top search result links.
soup = bs4.BeautifulSoup(res.text, 'html.parser')
linkElems = soup.select('.r a')
# Open a browser tab for each result.
numOpen = min(5, len(linkElems))
for i in range(numOpen):
    webbrowser.open('http://google.com' + linkElems[i].get('href'))

Answered 2016-10-17 12:45:49
This should work for you:
>>> import requests
>>> from lxml import html
>>> r = requests.get("https://www.google.co.uk/search?q=how+to+do+web+scraping&num=10")
>>> source = html.fromstring((r.text).encode('utf-8'))
>>> links = source.xpath('//h3[@class="r"]//a//@href')
>>> for link in links:
...     print link.replace("/url?q=","").split("&sa=")[0]

Output:
http://newcoder.io/scrape/intro/
https://www.analyticsvidhya.com/blog/2015/10/beginner-guide-web-scraping-beautiful-soup-python/
http://docs.python-guide.org/en/latest/scenarios/scrape/
http://webscraper.io/
https://blog.hartleybrody.com/web-scraping/
https://first-web-scraper.readthedocs.io/
https://www.youtube.com/watch%3Fv%3DE7wB__M9fdw
http://www.gregreda.com/2013/03/03/web-scraping-101-with-python/
http://analystcave.com/web-scraping-tutorial/
https://en.wikipedia.org/wiki/Web_scraping

Note: I am using Python 2.7.x. For Python 3.x, you only need to wrap the print argument in parentheses, like this: print(link.replace("/url?q=","").split("&sa=")[0])
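As an alternative to the replace/split chain, the redirect href could be parsed with the standard library in Python 3. This is only a sketch under the assumption that the hrefs have the `/url?q=<target>&sa=...` shape seen above; `clean_google_href` is a hypothetical helper name, and the sample hrefs are illustrative. Unlike the string approach, `parse_qs` also percent-decodes the target, so `watch%3Fv%3D...` comes back as `watch?v=...`.

```python
from urllib.parse import parse_qs, urlsplit


def clean_google_href(href):
    """Return the target URL from a '/url?q=<target>&sa=...' redirect href.

    Hrefs that do not follow that pattern are returned unchanged.
    """
    parts = urlsplit(href)
    if parts.path != "/url":
        return href
    # parse_qs maps each query key to a list of percent-decoded values.
    targets = parse_qs(parts.query).get("q")
    return targets[0] if targets else href


if __name__ == "__main__":
    # Illustrative hrefs in the assumed '/url?q=...' form.
    samples = [
        "/url?q=http://newcoder.io/scrape/intro/&sa=U",
        "/url?q=https://www.youtube.com/watch%3Fv%3DE7wB__M9fdw&sa=U",
    ]
    for href in samples:
        print(clean_google_href(href))
```

With the answer's code, this would replace the expression inside the print call, for example `print(clean_google_href(link))`.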
https://stackoverflow.com/questions/40078536