There are many ways to download all PDF files from a website with Python 3. A common approach is shown below; first install the required packages:
pip install requests
pip install beautifulsoup4
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def get_pdf_links(url):
    # Fetch the page and collect every <a> href that points to a PDF.
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    pdf_links = []
    for link in soup.find_all('a'):
        href = link.get('href')
        if href and href.endswith('.pdf'):
            pdf_links.append(href)
    return pdf_links

def download_pdf(url, save_path):
    # Download a single PDF and write it to disk in binary mode.
    response = requests.get(url)
    with open(save_path, 'wb') as file:
        file.write(response.content)

def download_all_pdf(url, save_directory):
    os.makedirs(save_directory, exist_ok=True)
    pdf_links = get_pdf_links(url)
    for link in pdf_links:
        # urljoin handles both relative and absolute hrefs on the page.
        pdf_url = urljoin(url, link)
        pdf_name = link.split('/')[-1]
        save_path = os.path.join(save_directory, pdf_name)
        download_pdf(pdf_url, save_path)

url = 'https://example.com/pdf/'
save_directory = 'path/to/save/directory'
download_all_pdf(url, save_directory)
The code above collects all PDF links from the specified page and downloads them to the specified local directory.
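For large PDFs or unreliable connections, a streaming variant with a timeout and basic error handling can be more robust. The sketch below is a minimal alternative to download_pdf using the same requests library; the function name, chunk size, and timeout values are illustrative assumptions, not part of the original answer.

import requests

def download_pdf_streaming(url, save_path, chunk_size=8192, timeout=30):
    # Stream the response so large PDFs are not held entirely in memory.
    # chunk_size and timeout are example values, adjust as needed.
    try:
        with requests.get(url, stream=True, timeout=timeout) as response:
            response.raise_for_status()
            with open(save_path, 'wb') as file:
                for chunk in response.iter_content(chunk_size=chunk_size):
                    file.write(chunk)
    except requests.RequestException as exc:
        # Skip files that fail instead of aborting the whole run.
        print(f'Failed to download {url}: {exc}')

You can drop this in as a replacement for download_pdf inside download_all_pdf if the target site serves large files.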