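I'm trying to get BeautifulSoup to read this page, but the URL isn't being passed correctly to the get() call.
The URL is https://www.econjobrumors.com/topic/supreme-court-to-%e2%80%9cconsider%e2%80%9d-taking-up-harvard-affirmative-action-case-on-june-10. However, whenever I try to fetch the page for BeautifulSoup, I always get an error saying the URL is incorrect.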
import requests

url = "https://www.econjobrumors.com/topic/supreme-court-to-%e2%80%9cconsider%e2%80%9d-taking-up-harvard-affirmative-action-case-on-june-10"
# verify=False disables TLS certificate verification
response = requests.get(url, verify=False)
print(response.request.url)
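The error is caused by the curly double quotes “ (U+201C) and ” (U+201D). I've been at this for hours and still can't work out how to pass the URL correctly.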
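For reference, here is a small sketch of how the percent-encoded sequences in that URL relate to the two curly quotes. It only uses the standard library's urllib.parse and does not touch the network; the variable names (encoded, decoded, reencoded) are just illustrative and not part of the original code.

```python
from urllib.parse import quote, unquote

# The path fragment from the URL that contains the curly quotes.
encoded = "%e2%80%9cconsider%e2%80%9d"

# Percent-decoding (UTF-8 by default) yields the two quote characters.
decoded = unquote(encoded)
print(decoded)  # “consider”
print([f"U+{ord(c):04X}" for c in (decoded[0], decoded[-1])])  # ['U+201C', 'U+201D']

# Re-encoding produces the same bytes, but with uppercase hex digits.
reencoded = quote(decoded)
print(reencoded)  # %E2%80%9Cconsider%E2%80%9D
print(reencoded.lower() == encoded)  # True
```

So the URL as written is already validly percent-encoded; the lowercase and uppercase hex forms are equivalent encodings of the same characters.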
Posted on 2021-05-27 21:09:04
I changed the double quotes around the URL to single quotes and called requests.get() with allow_redirects=False:
from bs4 import BeautifulSoup
import requests
url = 'https://www.econjobrumors.com/topic/supreme-court-to-%e2%80%9cconsider%e2%80%9d-taking-up-harvard-affirmative-action-case-on-june-10'
r = requests.get(url, allow_redirects=False)
soup = BeautifulSoup(r.content, 'lxml')
print(soup)
This prints the HTML as expected. I've trimmed the output so it fits in this answer:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xml:lang="en-US" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="IE=8" http-equiv="X-UA-Compatible"/>
<ALL THE CONTENT>Too much to paste in the answer</ALL THE CONTENT>
</html>
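Since the code above passes allow_redirects=False, requests returns whatever the server sends first, which could be a redirect response rather than the final page. A minimal sketch for checking that, assuming you hand it the status code and headers from the response (redirect_target and REDIRECT_CODES are hypothetical names of mine, not part of requests):

```python
from urllib.parse import urljoin

# HTTP status codes that indicate a redirect carrying a Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(request_url, status_code, headers):
    """Return the absolute redirect target, or None if this is not a redirect."""
    location = headers.get("Location")
    if status_code in REDIRECT_CODES and location:
        # Location may be relative, so resolve it against the requested URL.
        return urljoin(request_url, location)
    return None
```

With the answer's variables this would be called as redirect_target(url, r.status_code, r.headers). If it returns None, the HTML that was printed is the page itself; otherwise it shows where the server wanted to send the client.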
https://stackoverflow.com/questions/67729768