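Hi all, this is my first post here. I would like to know how I can write the image files I scraped from a website into a csv file or, if that is not possible, how I can write the header, description, time info and image into a Word file instead. Here is the code; everything works as it is. I just want to know how to write the images I download to disk into a csv or Word file. Thanks for your help.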
import csv
import requests
from bs4 import BeautifulSoup
site_link = requests.get("websitenamehere").text
soup = BeautifulSoup(site_link,"lxml")
read_file = open("blogger.csv", "w", newline="", encoding="UTF-8")  # newline="" avoids blank rows on Windows
csv_writer = csv.writer(read_file)
csv_writer.writerow(["Header","links","Publish Time"])
counter = 0
for article in soup.find_all("article"):
    # Count the articles
    counter += 1
    print(counter)
    # Article header
    headers = article.find("a")["title"]
    print(headers)
    # Link
    links = article.find("a")["href"]
    print(links)
    # Publish time
    publish_time = article.find("div", class_="mkdf-post-info-date entry-date published updated")
    publish_time = publish_time.a.text.strip()
    print(publish_time)
    # Image link
    images = article.find("img", class_="attachment-full size-full wp-post-image nitro-lazy")["nitro-lazy-src"]
    print(images)
    # Download the article picture to disk
    pic_name = f"{counter}.jpg"
    with open(pic_name, 'wb') as handle:
        response = requests.get(images, stream=True)
        for block in response.iter_content(1024):
            handle.write(block)
    # CSV row
    csv_writer.writerow([headers, links, publish_time])
    print()
read_file.close()

Posted on 2021-02-24 18:19:26
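Basically, you can convert the image to Base64 and write it to the file wherever you need it:
import base64
with open("image.png", "rb") as image_file:
    # Read the raw bytes and encode them as Base64
    encoded_string = base64.b64encode(image_file.read())
print(encoded_string.decode('utf-8'))

Posted on 2021-02-24 18:41:52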
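A csv file should contain only text fields. Even though the csv module does its best to quote fields so that they can hold almost any character, including the delimiter or a newline, it cannot handle the null characters that are likely to appear in an image file.
This means that if you want to store the image bytes in a csv file, you have to encode them. Base64 is a well-known format that is natively supported by the Python Standard Library. So you could change your code to: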
import base64
...
    ###Download Article Pictures
    response = requests.get(images, stream=True)
    image = b''.join(block for block in response.iter_content(1024))  # raw image bytes
    image = base64.b64encode(image).decode('ascii')  # Base64-encoded text string
    ###CSV Rows
    csv_writer.writerow([headers, links, publish_time, image])
Simply put, the image must be decoded before it can be used.
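The decoding step can be sketched as a small round trip. This is only an illustration, assuming the Base64 text sits in the fourth column of each row; the file name `images_demo.csv`, the helpers `write_row_with_image` and `read_images`, and the sample bytes are all hypothetical and not part of the original code.

```python
import base64
import csv
import os
import tempfile


def write_row_with_image(csv_path, header, link, publish_time, image_bytes):
    """Append one row whose last field is the image as Base64 text (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    with open(csv_path, "a", newline="", encoding="UTF-8") as f:
        csv.writer(f).writerow([header, link, publish_time, encoded])


def read_images(csv_path):
    """Yield (header, raw image bytes) for every row in the file (hypothetical helper)."""
    with open(csv_path, newline="", encoding="UTF-8") as f:
        for header, _link, _publish_time, encoded in csv.reader(f):
            yield header, base64.b64decode(encoded)


# Stand-in bytes containing NUL characters, as real image data often does.
sample = b"\x89PNG\r\n\x1a\n\x00\x00fake image data"
demo_path = os.path.join(tempfile.mkdtemp(), "images_demo.csv")
write_row_with_image(demo_path, "Demo post", "https://example.com/post", "2021-02-24", sample)

# Decode every stored image back into its original bytes.
restored = dict(read_images(demo_path))
```

To get a picture file back, the decoded bytes can be written with `open(name, "wb")`. Note that `csv.reader` rejects fields longer than 131072 characters by default, so for realistically sized images the limit probably has to be raised with `csv.field_size_limit` before reading.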
https://stackoverflow.com/questions/66348807