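If current_url is already in the list, I try to skip it, but I run into an error message.
My goal is to scrape a page, add that page's URL to a text file, and then, when I start scraping again, compare the page about to be scraped against the URLs in the list. When a page is already in the list, I want to skip it.
But this problem cropped up: it cannot compare current_url with the list. This is the line in question: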
if cur_url in visited_urls:
Full code:
# Open the text file
from random import randint
from time import sleep

visited_urls = 'C:/webdrivers/visited_p_urls_test.txt'  # This specifies the location of the text file on the PC
cur_url = driver.current_url
# Go to main test website
driver.get('https://www.google.com')
sleep(randint(1,3))
with open(visited_urls, 'a') as filehandle:  # Open the text file in append mode
    filehandle.write('\n' + cur_url)
# Go to main test website
driver.get('https://adwords.google.com/home/')
sleep(randint(1,3))
with open(visited_urls, 'a') as filehandle:  # Open the text file in append mode
    filehandle.write('\n' + cur_url)
driver.get('https://adwords.google.com/home/tools/')
sleep(randint(1,3))
with open(visited_urls, 'a') as filehandle:  # Open the text file in append mode
    filehandle.write('\n' + cur_url)
if cur_url in visited_urls:
    print 'I CANNOT comment because I already did before'
else:
    print 'I can comment'
with open(visited_urls, 'r') as filehandle:  # Open the text file in read mode
    filecontent = filehandle.readlines()  # readlines reads ALL lines in a text file
    for line in filecontent:
        print(line)
I get this error message:
TypeError: 'str' object is not callable
Posted on 2018-07-15 20:00:52
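You are searching for the string in the file path (C:/webdrivers/visited_p_urls_test.txt) rather than in the file itself. You have to search the file's contents instead: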
if 'blabla' in open('example.txt').read():
    print("true")
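To make the difference concrete, here is a minimal sketch that can be run without Selenium. The temporary file and the URL are made up for illustration; they stand in for the real visited-URLs file.

```python
import os
import tempfile

# Hypothetical stand-in for the visited-URLs text file
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as fh:
    fh.write('\nhttps://www.google.com/')
    path = fh.name

url = 'https://www.google.com/'

# "in" on a str is a substring test, so this only inspects the path text
in_path = url in path

# Reading the file first makes "in" inspect what was actually written
with open(path) as fh:
    in_content = url in fh.read()

os.remove(path)
print(in_path, in_content)
```

The first check looks only at the characters of the path, so it cannot find a URL that was written into the file.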
In your case:
# Open the file, read its content, and only then check whether cur_url already exists in the file
if cur_url in open(visited_urls).read():
    print 'I CANNOT comment because I already did before'
else:
    print 'I can comment'
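One caveat with checking against the whole file text: it is a substring match, so https://adwords.google.com/home/ would count as visited as soon as https://adwords.google.com/home/tools/ has been written. A possible refinement, sketched below, compares whole lines instead. The helper names load_visited and mark_visited and the temporary demo file are my own assumptions, not part of the original code.

```python
import os
import tempfile


def load_visited(path):
    """Return the set of URLs stored one per line in path (empty if the file is missing)."""
    if not os.path.exists(path):
        return set()
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def mark_visited(path, url):
    """Append url to the file on its own line."""
    with open(path, 'a') as fh:
        fh.write(url + '\n')


# Demo with a temporary file standing in for the real visited-URLs file
demo_path = os.path.join(tempfile.mkdtemp(), 'visited.txt')
mark_visited(demo_path, 'https://adwords.google.com/home/tools/')

visited = load_visited(demo_path)
shorter = 'https://adwords.google.com/home/'

with open(demo_path) as fh:
    substring_hit = shorter in fh.read()  # True: it is a prefix of the stored URL
exact_hit = shorter in visited            # False: it was never stored as its own line
print(substring_hit, exact_hit)
```

In the Selenium script this would probably mean reading driver.current_url again after each driver.get call, since a cur_url assigned beforehand keeps its old value, and then testing that fresh value against load_visited(visited_urls) before calling mark_visited.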
https://stackoverflow.com/questions/51351713