
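Multimodal models (such as OpenAI's CLIP and GPT-4V, and Google's Gemini) can process several kinds of input, including text, images, and audio. Keep the following key points in mind when integrating them:
Scenario: upload an image, get an AI-generated text description, and save the result.
Implementation steps:
Install dependencies. Use the OpenAI Python library. You need to apply for an API key in advance and set it as an environment variable.
pip install openai
Example API call. Convert the local image to Base64 and send it in the request: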
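import base64
import os

from openai import OpenAI

# Initialize the client; the key is read from the OPENAI_API_KEY environment variable
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))


def encode_image_to_base64(image_path):
    # Read the image as raw bytes and return it as a Base64 string
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def describe_image(image_path):
    base64_image = encode_image_to_base64(image_path)
    response = client.chat.completions.create(
        # gpt-4-vision-preview has since been deprecated; if it is unavailable,
        # substitute a current vision-capable model such as gpt-4o
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in detail."},
                    # image_url must be an object with a "url" key (here a Base64 data URL)
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    return response.choices[0].message.content


description = describe_image("example.jpg")
print(description)
The Google Gemini API is called in a similar way, but it uses the google-generativeai library: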
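import google.generativeai as genai
import PIL.Image  # requires the Pillow package

genai.configure(api_key="YOUR_API_KEY")
# gemini-pro-vision may no longer be available; substitute a current multimodal Gemini model if needed
model = genai.GenerativeModel("gemini-pro-vision")
# Load the image and ask a question about it (a bare path string would be sent as plain text)
image = PIL.Image.open("image.jpg")
response = model.generate_content(["What is in this image?", image])
print(response.text)
Error handling. Add a retry mechanism to cope with network fluctuations or API rate limiting: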
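from tenacity import retry, stop_after_attempt, wait_exponential


# Retry up to 3 times, backing off exponentially between attempts
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, max=10))
def safe_api_call(image_path):
    try:
        return describe_image(image_path)
    except Exception as e:
        # Log the failure, then re-raise so that tenacity can retry
        print(f"Error: {e}")
        raise
Performance optimization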
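… aiohttp).
Data privacy. For sensitive data, avoid calling third-party APIs directly; consider deploying a model locally instead (for example, LLaVA or OpenFlamingo).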
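The scenario at the top calls for saving the result, and the performance note survives only as a reference to aiohttp, so the following is a minimal sketch rather than the author's method. It assumes the note was recommending concurrent requests, and it swaps in the standard library's asyncio with worker threads instead of aiohttp. The names describe_many and save_descriptions, the max_concurrency parameter, and the JSON output format are all hypothetical; describe stands for any blocking function that maps an image path to a description.

```python
import asyncio
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple


async def describe_many(
    image_paths: Iterable[str],
    describe: Callable[[str], str],
    max_concurrency: int = 4,  # hypothetical default, tune to your rate limits
) -> Dict[str, str]:
    """Run a blocking describe(path) call for several images concurrently."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(path: str) -> Tuple[str, str]:
        # The semaphore caps how many requests are in flight at once
        async with semaphore:
            # Run the blocking call in a worker thread (Python 3.9+)
            return path, await asyncio.to_thread(describe, path)

    pairs = await asyncio.gather(*(worker(p) for p in image_paths))
    return dict(pairs)


def save_descriptions(descriptions: Dict[str, str], output_path: str) -> None:
    """Write a {image path: description} mapping to a UTF-8 JSON file."""
    Path(output_path).write_text(
        json.dumps(descriptions, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
```

One possible use is to pass a retrying wrapper around the description call as describe, run the batch with asyncio.run(describe_many(paths, describe)), and hand the returned dictionary to save_descriptions. Keeping max_concurrency small reduces the chance of hitting API rate limits.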