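I am trying to extract all course information from the Udacity catalog website.
When I try to extract the price from any course page, I get "null months access"
and an empty price value, as shown below: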
import requests
from bs4 import BeautifulSoup

page_req = requests.get('https://www.udacity.com/course/data-analyst-nanodegree--nd002')
page_soup = BeautifulSoup(page_req.content, 'html.parser')
page_soup.find('div', class_='price-cards').find('div', class_='price-card bundle')
<div class="price-card bundle"><div class="flag"><p class="flag__text">10% OFF</p></div><div
class="price-info"><div class="price-info__deal" hidden="">BEST DEAL</div><div class="title h6">null
months access</div><div class="price"><span class="price__payable"><span class="skeleton
skeleton__default"><span style="width:100px"> </span></span></span><span class="price__label"><span
class="current-price"> per month</span></span></div><p class="blurb">Start learning today! Switch to
the monthly price afterwards if more time is needed.</p><div class="enroll-button__container"></div>
</div></div>
So, how can I get the course price?
Note: the price varies from country to country (e.g. USD in the United States, EUR in Italy).
Posted on 2020-07-23 22:16:31
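The easiest way to scrape a modern website is to watch its network traffic. Open the browser developer tools (Ctrl + Shift + I), select the Network tab, tick "Preserve log" and "Disable cache", and then filter to XHR only. Reload the page and observe the network calls.
When I load your URL, the browser calls a pricing API URI. Mimic that call in Python: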
from requests import Session

with Session() as httpx:
    URI = 'https://braavos.udacity.com/api/prices'
    params = dict(item='urn:x-udacity:item:nd-unit:10153',
                  price_sheet='regular',
                  currency='USD',
                  )
    response = httpx.get(url=URI, params=params)
    data = response.json()
    print(type(data))  # dict
    print(data)  # dict and thus you can access data as you would dicts
    # examples
    print(data['results'][0]['payment_plans']['upfront_recurring']['description'])
    # 'one time payment of $1,436 USD, followed by $399 USD every 1 month'
    print(data['results'][0]['payment_plans']['recurring']['description'])
    # '$399 USD every 1 month'
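The nested lookups above raise a KeyError or IndexError if the response ever lacks a plan. A minimal sketch of a more defensive extraction follows. The helper name `plan_descriptions` is hypothetical, and `sample` is a hand-made stand-in containing only the keys and description strings shown in this answer; the real response presumably carries more fields.

```python
from typing import Any, Dict


def plan_descriptions(data: Dict[str, Any]) -> Dict[str, str]:
    """Map each payment plan name to its description, skipping missing pieces."""
    results = data.get("results") or []
    if not results:
        return {}
    plans = results[0].get("payment_plans") or {}
    return {
        name: plan["description"]
        for name, plan in plans.items()
        if isinstance(plan, dict) and "description" in plan
    }


# Hand-made stand-in for the API response (assumption: only the keys used
# in the answer above are reproduced here).
sample = {
    "results": [
        {
            "payment_plans": {
                "upfront_recurring": {
                    "description": "one time payment of $1,436 USD, "
                                   "followed by $399 USD every 1 month"
                },
                "recurring": {"description": "$399 USD every 1 month"},
            }
        }
    ]
}

for name, text in plan_descriptions(sample).items():
    print(f"{name}: {text}")
```

With a live call, `plan_descriptions(response.json())` would take the place of the chained indexing, and an empty dict signals that no plan was returned.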
Posted on 2020-07-23 21:30:26
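Try the script below. I implemented it for https://www.udacity.com/course/data-analyst-nanodegree--nd002 using the API approach, which is one of the best ways to get data from an endpoint. You can see all the API calls in the Network section of the developer tools: press CTRL+SHIFT+I and apply the XHR filter on the Network tab. With requests you can call the API, and it returns a result that has to be parsed as JSON.
Benefits of requesting the API URL:
If you look at the script, it currently extracts the payment plans you want to scrape, recurring and upfront recurring. In the same way you can access anything else in the JSON result, such as promotions, the original price, and so on. I have also made the URL dynamic: you can pass the currency code of any country and it will return the matching result. For example, I passed USD for the United States; for Italy or the rest of Europe, set the currency variable to EUR, in capital letters.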
import json
import requests
from urllib3.exceptions import InsecureRequestWarning

# Silence the warning that verify=False triggers on every request
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

def scrape_prices():
    currency = 'USD'
    url = 'https://braavos.udacity.com/api/prices?item=urn:x-udacity:item:nd-unit:10153&price_sheet=regular&anonymous_id=ae9be6e5-97af-48ee-ab3d-63456a8cb38f&currency=' + currency
    session = requests.Session()
    # verify=False skips TLS certificate verification
    response = session.get(url, verify=False)
    result = json.loads(response.text)
    extracted_payment_plans_recurring = result['results'][0]['payment_plans']['recurring']
    extracted_payment_plan_upfront = result['results'][0]['payment_plans']['upfront_recurring']
    print('-' * 100)
    print('Payment Plans Recurring: ', extracted_payment_plans_recurring)
    print('-' * 100)
    print('Payment Plans Up front Recurring: ', extracted_payment_plan_upfront)
    print('-' * 100)

scrape_prices()
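[Screenshot: result for the recurring payment plan]
[Screenshot: result for the upfront recurring payment plan]
[Screenshot: API URL from the Udacity website]
[Screenshot: JSON price result]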
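Since the price depends on the currency passed in the URL, the query string can also be built from a dict rather than by string concatenation, which avoids escaping mistakes. A minimal sketch follows, assuming the same endpoint and item URN as the script above. The helper name `build_price_url` is hypothetical, and `anonymous_id` is left out because the other answer's call does not send it; whether the API ever requires it is not confirmed here.

```python
from urllib.parse import urlencode

API_URL = "https://braavos.udacity.com/api/prices"
ITEM_URN = "urn:x-udacity:item:nd-unit:10153"  # item URN taken from the script above


def build_price_url(currency: str, item: str = ITEM_URN) -> str:
    """Return the price URL for one currency code.

    The code is upper-cased because the answer says to pass it in capitals.
    """
    params = {
        "item": item,
        "price_sheet": "regular",
        "currency": currency.strip().upper(),
    }
    # urlencode percent-encodes reserved characters such as ':' in the URN
    return f"{API_URL}?{urlencode(params)}"


# Only builds and prints the URLs; no network request is made here.
for code in ("USD", "eur"):
    print(build_price_url(code))
```

Each URL could then be fetched with `requests.get(...)` and parsed as in the script above, one call per currency of interest.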
https://stackoverflow.com/questions/63062062