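TokenHub Large Model Service Platform: Language Model Invocation Overview

Overview
TokenHub aggregates language models from multiple vendors, including Tencent Hunyuan, DeepSeek, Zhipu GLM, Kimi, MiniMax, and Tongyi Qianwen (Qwen), covering scenarios such as conversational interaction, content creation, code generation, and reasoning and analysis. All models are compatible with both the OpenAI Chat Completions API and the Anthropic Messages API, so you can connect with the OpenAI SDK, the Anthropic SDK, or any compatible client.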
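Protocols supported by each model
| Model name | model parameter value | OpenAI Chat Completions | OpenAI Responses | Anthropic |
| --- | --- | --- | --- | --- |
| Hy3 | hy3 | ✅ | ✅ | ✅ |
| Hy3 preview | hy3-preview | ✅ | ✅ | ✅ |
| Hy-MT2-Pro | hy-mt2-pro | ✅ | ❌ | ✅ |
| Hy-MT2-Plus | hy-mt2-plus | ✅ | ❌ | ✅ |
| Hy-MT2-Lite | hy-mt2-lite | ✅ | ❌ | ✅ |
| Hy-Role-Latest | hunyuan-role-latest | ✅ | ❌ | ✅ |
| Hy-Role | hy-role | ✅ | ❌ | ✅ |
| DeepSeek-V4-Flash (vendor direct) | deepseek-v4-flash-202605 | ✅ | Compatible* | ✅ |
| DeepSeek-V4-Pro (vendor direct) | deepseek-v4-pro-202606 | ✅ | Compatible* | ✅ |
| DeepSeek-V4-Flash | deepseek-v4-flash | ✅ | Compatible* | ✅ |
| DeepSeek-V4-Pro | deepseek-v4-pro | ✅ | Compatible* | ✅ |
| DeepSeek-V3.2 | deepseek-v3.2 | ✅ | ❌ | ✅ |
| GLM-5.2 | glm-5.2 | ✅ | Compatible* | ✅ |
| GLM-5.1 | glm-5.1 | ✅ | Compatible* | ✅ |
| GLM-5V-Turbo | glm-5v-turbo | ✅ | ❌ | ✅ |
| GLM-5-Turbo | glm-5-turbo | ✅ | ❌ | ✅ |
| GLM-5 | glm-5 | ✅ | ❌ | ✅ |
| Kimi K3 | kimi-k3 | ✅ | Compatible* | ✅ |
| Kimi K2.7 Code HighSpeed | kimi-k2.7-code-highspeed | ✅ | Compatible* | ✅ |
| Kimi K2.7 Code | kimi-k2.7-code | ✅ | Compatible* | ✅ |
| Kimi-K2.6 | kimi-k2.6 | ✅ | Compatible* | ✅ |
| Kimi-K2.5 | kimi-k2.5 | ✅ | ❌ | ✅ |
| MiniMax-M3 | minimax-m3 | ✅ | ✅ | ✅ |
| MiniMax-M2.7 | minimax-m2.7 | ✅ | ✅ | ✅ |
| MiniMax-M2.5 | minimax-m2.5 | ✅ | ✅ | ✅ |
| Qwen3.5-Flash | qwen3.5-flash | ✅ | ✅ | ✅ |
| Qwen3.5-Plus | qwen3.5-plus | ✅ | ✅ | ✅ |
Note:
* For models marked Compatible under OpenAI Responses, see the Responses API compatibility mode guide for invocation instructions.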
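Using the OpenAI API
BaseURL
Guangzhou: https://tokenhub.tencentmaas.com/v1
Singapore: https://tokenhub-intl.tencentmaas.com/v1
Request parameters
The table below lists the common request parameters supported by the TokenHub gateway. For complete field definitions and the latest updates, see the official OpenAI API documentation.
Note:
This document is a general invocation guide for language models; the supported parameter ranges may differ between models.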
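| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| model | Yes | String | Service ID. For services created by the platform by default, the service ID is the same as the model name (for example, hy3 or deepseek-v3.2); for the full list, see the "model parameter value" column in the table of protocols supported by each model. For custom services created by users, the service ID has the format ep-xxxxxxxx and can be found on the Online Inference Service page. |
| messages | Yes | Array | Array of chat context messages. For details, see the messages parameter description. |
| stream | No | Boolean | Whether to enable streaming output. Values: true / false. Default: false. |
| stream_options | No | Object | Streaming output options. Commonly {"include_usage": true}, which makes the last chunk carry the usage statistics (only effective when stream=true). |
| temperature | No | Float | Sampling temperature, which controls output randomness. Range: [0.0, 2.0]. Default: 1.0. Higher values produce more random output. Some models have special value constraints; see the model-specific documentation. |
| top_p | No | Float | Nucleus sampling probability threshold. Range: [0.0, 1.0]. Default: 1.0. It is recommended to adjust either this or temperature, not both. |
| max_tokens | No | Integer | Limits the maximum number of output tokens in a single response. For thinking models, reasoning tokens and answer tokens share this budget, so a larger value is recommended. |
| n | No | Integer | Number of candidate replies generated for one request. Default: 1. Note: when n > 1, billing is based on the total token count. |
| stop | No | String or Array of String | Stop sequences for the model output. When the generated text hits any of the specified sequences, the model stops, and the response does not include the stop sequence. Accepts a single string or an array of up to 4 strings. For example, to have the model produce a 10-item list without continuing to an 11th item, set this to ["11."]. |
| seed | No | Integer | Random seed for reproducibility. When multiple requests use the same seed and all other parameters are identical, the model is more likely to return identical or very similar results. |
| frequency_penalty | No | Float | Frequency penalty. Range: [-2.0, 2.0]. Default: 0. Positive values lower the probability of re-selecting tokens that have already appeared frequently, which reduces repetition. |
| presence_penalty | No | Float | Presence penalty. Range: [-2.0, 2.0]. Default: 0. Positive values encourage the model to move to new topics (it considers only whether a token has appeared, not how often). |
| logit_bias | No | Map | Adjusts the probability of specific tokens appearing in the result. Keys are token IDs and values are biases in [-100, 100]; -100 bans the token and 100 forces it. |
| logprobs | No | Boolean | Whether to return the log probabilities of output tokens. Default: false. |
| top_logprobs | No | Integer | Returns the N most probable tokens at each position. Range: [0, 20]. Requires logprobs=true. |
| response_format | No | Object | Output format of the response. Common values: {"type": "text"} for plain text (default); {"type": "json_object"} for JSON mode, which forces valid JSON output; {"type": "json_schema", "json_schema": {...}} for structured output constrained by the given schema. |
| tools | No | Array | List of Function Calling tool definitions. Each tool contains type: "function" and a function object (with name / description / parameters). |
| tool_choice | No | String or Object | Tool calling strategy: "none" disables tool calls; "auto" lets the model decide whether to call a tool (default); "required" forces a call to some tool; {"type": "function", "function": {"name": "xxx"}} forces a call to the named tool. |
| parallel_tool_calls | No | Boolean | Whether multiple tools may be called in parallel within one response. Default: true. Set to false to force serial tool calls, which simplifies debugging. |
| thinking | No | Object | Thinking mode control; the default differs between models. For details, see Deep Thinking. Values: {"type": "enabled"} / {"type": "disabled"}. |
| reasoning_effort | No | String | Reasoning depth control; only takes effect for thinking models, and the default differs between models. For details, see Deep Thinking. Values: low / medium / high. |
| user | No | String | A stable identifier for the end user, useful for auditing and troubleshooting. |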
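The ranges in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any SDK: `build_chat_request` is a hypothetical helper name, and it enforces only the constraints stated in the table (individual models may impose narrower ones).

```python
from typing import Any


def build_chat_request(model: str, messages: list[dict[str, Any]], **params: Any) -> dict[str, Any]:
    """Assemble a Chat Completions request body, checking the documented ranges."""
    if not model or not messages:
        raise ValueError("model and messages are required")

    temperature = params.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be within [0.0, 2.0]")

    top_p = params.get("top_p")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be within [0.0, 1.0]")

    stop = params.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")

    if params.get("top_logprobs") is not None and not params.get("logprobs"):
        raise ValueError("top_logprobs requires logprobs=true")

    if params.get("stream_options") is not None and not params.get("stream"):
        raise ValueError("stream_options only takes effect when stream=true")

    # Drop unset parameters so the gateway applies its own defaults.
    body: dict[str, Any] = {"model": model, "messages": messages}
    body.update({key: value for key, value in params.items() if value is not None})
    return body


request = build_chat_request(
    "deepseek-v3.2",
    [{"role": "user", "content": "Write a 10-item checklist"}],
    temperature=0.7,
    stop=["11."],
    stream=True,
    stream_options={"include_usage": True},
)
print(request)
```

The returned dictionary can be sent as the JSON body of a POST to the chat/completions endpoint, or unpacked into `client.chat.completions.create(**request)` when using the OpenAI SDK.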
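messages parameter description
Each object in the messages array contains the following fields:
| Field | Type | Description |
| --- | --- | --- |
| role | String | Role: system (system prompt), user (user), assistant (assistant), or tool (tool result) |
| content | String | Message text content. |
Message order: [system (optional) → user → assistant → user → ...]. An ordinary conversation must end with a user message; when tool results are returned, the final message is a tool message, as in the Function Calling example.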
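The ordering rule can be verified locally before sending a request. This is a sketch under stated assumptions: `check_message_order` is a hypothetical helper, and it covers only plain system/user/assistant conversations, not tool-calling flows where assistant and tool messages are interleaved.

```python
def check_message_order(messages: list[dict]) -> None:
    """Raise ValueError if the messages break the documented ordering rule."""
    if not messages:
        raise ValueError("messages must not be empty")

    # An optional system message may appear only at the start.
    turns = messages[1:] if messages[0]["role"] == "system" else messages
    if not turns:
        raise ValueError("at least one user message is required")

    # After the optional system message, roles alternate user, assistant, user, ...
    for index, message in enumerate(turns):
        expected = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError(
                f"turn {index} should be {expected!r}, got {message['role']!r}"
            )

    if turns[-1]["role"] != "user":
        raise ValueError("the final message must come from the user")


conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Introduce quantum computing."},
    {"role": "assistant", "content": "Quantum computing processes information using quantum mechanics..."},
    {"role": "user", "content": "How does it differ from classical computing?"},
]
check_message_order(conversation)
print("message order is valid")
```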
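Response parameters
| Parameter | Type | Description |
| --- | --- | --- |
| id | String | Unique request identifier. |
| object | String | Object type; always chat.completion. |
| created | Integer | Creation time (Unix timestamp). |
| model | String | Name of the model actually used. |
| choices | Array | List of candidate results returned for the request. For details, see choices array elements. |
| usage | Object | Token consumption statistics. |
choices array elements
| Field | Type | Description |
| --- | --- | --- |
| index | Integer | Index of the choice. |
| message | Object | Reply message, containing role and content. |
| finish_reason | String | Reason the generation ended: stop (finished normally), length (maximum length reached), or tool_calls (a tool call is required) |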
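usage object
Under the OpenAI protocol, the usage object in the response reports the token consumption of the request in detail; it is also the basis for TokenHub billing. The table below explains each field.
| Field | Type | Description |
| --- | --- | --- |
| prompt_tokens | Integer | Total number of input (prompt) tokens for the request, including tokens that hit the cache. |
| prompt_tokens_details.cached_tokens | Integer | Number of tokens within prompt_tokens that hit the cache. |
| completion_tokens | Integer | Number of output tokens generated by the model. For thinking models, reasoning tokens are also counted here. |
| completion_tokens_details.reasoning_tokens | Integer | Number of output tokens that belong to the thinking (reasoning) process. Already included in completion_tokens; shown as a breakdown only. |
| total_tokens | Integer | Sum of prompt_tokens and completion_tokens. |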
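The relationships between these fields can be made explicit with a small breakdown. The sketch below uses a hypothetical helper, `summarize_usage`, and the usage values from the basic chat sample response in this document; the key names of the returned summary are illustrative only.

```python
def summarize_usage(usage: dict) -> dict:
    """Split a usage object into cached/uncached input and reasoning/answer output."""
    prompt = usage["prompt_tokens"]
    completion = usage["completion_tokens"]
    # The detail objects may be absent or null, so fall back to empty dicts.
    cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
    reasoning = (usage.get("completion_tokens_details") or {}).get("reasoning_tokens", 0)

    if usage["total_tokens"] != prompt + completion:
        raise ValueError("total_tokens should equal prompt_tokens + completion_tokens")

    return {
        "uncached_prompt_tokens": prompt - cached,  # cached tokens are part of prompt_tokens
        "cached_prompt_tokens": cached,
        "reasoning_tokens": reasoning,              # already inside completion_tokens
        "answer_tokens": completion - reasoning,
        "total_tokens": prompt + completion,
    }


summary = summarize_usage({
    "prompt_tokens": 10,
    "completion_tokens": 244,
    "total_tokens": 254,
    "prompt_tokens_details": {"cached_tokens": 0},
    "completion_tokens_details": {"reasoning_tokens": 0},
})
print(summary)
```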
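Sample code
Note:
This document is a general invocation guide for language models. Models differ slightly in areas such as thinking-mode switches, returned reasoning fields, multimodal formats, and special parameter values, so also consult the model-specific documentation:
Hunyuan models: Hunyuan invocation guide
DeepSeek models: DeepSeek invocation guide
GLM models: GLM invocation guide
Kimi models: Kimi invocation guide
MiniMax models: MiniMax invocation guide
Qwen models: Qwen invocation guide
Example: Basic chat
Example: Streaming output
Example: System Prompt
Example: Multi-turn conversation
Example: Function Calling (tool calling)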
Example: Basic chat
Note:
Replace YOUR_API_KEY with the API key you created, and replace model with the ID of the service you want to try.
The same request follows in cURL, Python, Node.js, Java, and Go, in that order.
curl -X POST 'https://tokenhub.tencentmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [
      {"role": "user", "content": "你好，请介绍一下你自己"}
    ]
  }'
from openai import OpenAI
﻿
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub.tencentmaas.com/v1",
)
﻿
response = client.chat.completions.create(
    model="deepseek-v3.2",
    messages=[
        {"role": "user", "content": "你好，请介绍一下你自己"},
    ],
)
print(response.choices[0].message.content)
import OpenAI from 'openai';
﻿
const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub.tencentmaas.com/v1',
});
﻿
const response = await client.chat.completions.create({
  model: 'deepseek-v3.2',
  messages: [
    { role: 'user', content: '你好，请介绍一下你自己' },
  ],
});
console.log(response.choices[0].message.content);
// Call the HTTP endpoint directly with OkHttp over the OpenAI-compatible protocol
import okhttp3.*;
import com.google.gson.Gson;
import java.util.*;
﻿
public class BasicChat {
    public static void main(String[] args) throws Exception {
        Map<String, Object> body = new HashMap<>();
        body.put("model", "deepseek-v3.2");
        body.put("messages", Arrays.asList(
            Map.of("role", "user", "content", "你好，请介绍一下你自己")
        ));
﻿
        RequestBody requestBody = RequestBody.create(
            new Gson().toJson(body),
            MediaType.parse("application/json")
        );
﻿
        Request request = new Request.Builder()
            .url("https://tokenhub.tencentmaas.com/v1/chat/completions")
            .header("Authorization", "Bearer YOUR_API_KEY")
            .post(requestBody)
            .build();
﻿
        try (Response response = new OkHttpClient().newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
package main
﻿
import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)
﻿
func main() {
    body := map[string]interface{}{
        "model": "deepseek-v3.2",
        "messages": []map[string]string{
            {"role": "user", "content": "你好，请介绍一下你自己"},
        },
    }
    payload, _ := json.Marshal(body)
﻿
    req, _ := http.NewRequest("POST",
        "https://tokenhub.tencentmaas.com/v1/chat/completions",
        bytes.NewBuffer(payload))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")
﻿
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
﻿
    data, _ := io.ReadAll(resp.Body)
    fmt.Println(string(data))
}
Sample response:
{
    "id": "5e9c7ae9-e0e4-4ec1-bbd0-22bcfda61e45",
    "object": "chat.completion",
    "model": "deepseek-v3.2",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "你好！很高兴见到你！😊\n\n我是DeepSeek，由深度求索公司创造的AI助手。让我简单介绍一下自己：\n\n**我的特点：**\n- 📚 知识截止到2024年7月，是DeepSeek的最新版本模型\n- 💬 纯文本对话模型，专注于理解和生成文字内容\n- 📁 支持文件上传功能——可以处理图像、txt、pdf、ppt、word、excel等文件，并从中读取文字信息\n- 🌐 支持联网搜索（需要你在Web/App中手动开启）\n- 💾 拥有128K的上下文长度，能记住我们较长的对话内容\n\n**我能帮你做什么：**\n- 回答各种问题，进行深入讨论\n- 协助写作、翻译、分析\n- 处理上传的文档内容\n- 提供学习、工作、生活方面的建议\n\n**重要提醒：**\n- 我完全免费使用，没有任何收费计划\n- 目前不支持语音功能\n- 你可以通过官方应用商店下载App使用\n\n我的回复风格比较热情细腻，希望能给你带来温暖的交流体验！有什么想聊的或需要帮助的，尽管告诉我吧！✨"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 244,
        "total_tokens": 254,
        "prompt_tokens_details": {
            "cached_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}
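Example: Streaming output
Note:
Replace YOUR_API_KEY with the API key you created, and replace model with the ID of the service you want to try.
The same request follows in cURL, Python, Node.js, Java, and Go, in that order.
curl -X POST 'https://tokenhub.tencentmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \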
  -d '{
    "model": "deepseek-v3.2",
    "messages": [
      {"role": "system", "content": "你是一个有帮助的 AI 助手。"},
      {"role": "user", "content": "计算 1+1"}
    ],
    "stream": true
  }'
from openai import OpenAI
﻿
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub.tencentmaas.com/v1",
)
﻿
stream = client.chat.completions.create(
    model="deepseek-v3.2",
    messages=[
        {"role": "system", "content": "你是一个有帮助的 AI 助手。"},
        {"role": "user", "content": "计算 1+1"},
    ],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
import OpenAI from 'openai';
﻿
const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub.tencentmaas.com/v1',
});
﻿
const stream = await client.chat.completions.create({
  model: 'deepseek-v3.2',
  messages: [
    { role: 'system', content: '你是一个有帮助的 AI 助手。' },
    { role: 'user', content: '计算 1+1' },
  ],
  stream: true,
});
﻿
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
// Streaming calls are based on SSE; use OkHttp to receive the response event by event
import okhttp3.*;
import okhttp3.sse.*;
import com.google.gson.Gson;
import java.util.*;
﻿
public class Streaming {
    public static void main(String[] args) {
        Map<String, Object> body = new HashMap<>();
        body.put("model", "deepseek-v3.2");
        body.put("messages", Arrays.asList(
            Map.of("role", "system", "content", "你是一个有帮助的 AI 助手。"),
            Map.of("role", "user", "content", "计算 1+1")
        ));
        body.put("stream", true);
﻿
        Request request = new Request.Builder()
            .url("https://tokenhub.tencentmaas.com/v1/chat/completions")
            .header("Authorization", "Bearer YOUR_API_KEY")
            .post(RequestBody.create(new Gson().toJson(body), MediaType.parse("application/json")))
            .build();
﻿
        EventSources.createFactory(new OkHttpClient()).newEventSource(request,
            new EventSourceListener() {
                @Override public void onEvent(EventSource es, String id, String type, String data) {
                    if (!"[DONE]".equals(data)) System.out.print(data);
                }
            });
    }
}
package main
﻿
import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "strings"
)
﻿
func main() {
    body, _ := json.Marshal(map[string]interface{}{
        "model": "deepseek-v3.2",
        "messages": []map[string]string{
            {"role": "system", "content": "你是一个有帮助的 AI 助手。"},
            {"role": "user", "content": "计算 1+1"},
        },
        "stream": true,
    })
﻿
    req, _ := http.NewRequest("POST",
        "https://tokenhub.tencentmaas.com/v1/chat/completions",
        bytes.NewBuffer(body))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")
﻿
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err) // resp is nil on error, so fail before touching resp.Body
    }
    defer resp.Body.Close()
﻿
    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        line := scanner.Text()
        if strings.HasPrefix(line, "data: ") && line != "data: [DONE]" {
            fmt.Println(strings.TrimPrefix(line, "data: "))
        }
    }
}
The streaming response uses the Server-Sent Events (SSE) format:
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"role":"assistant","content":"1"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"+"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"1"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"="},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
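When the stream is consumed without an SDK, each `data:` line has to be parsed and the `delta.content` fragments joined. The sketch below shows one way to do that offline, using the sample events from this section as input; `collect_stream_text` is a hypothetical helper name, and a real client would feed it lines read from the HTTP response instead of a list.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the delta.content fragments carried by SSE data lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data: "):
            continue  # skip the blank separator lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream marker, not JSON
        chunk = json.loads(payload)
        # A usage-only chunk (stream_options.include_usage) may have no choices.
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


sse_lines = [
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"role":"assistant","content":"1"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"+"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"1"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"="},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"2"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print(collect_stream_text(sse_lines))  # prints 1+1=2
```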
Example: System Prompt
Note:
Replace YOUR_API_KEY with the API key you created, and replace model with the ID of the service you want to try.
The same request follows in cURL, Python, Node.js, Java, and Go, in that order.
curl -X POST 'https://tokenhub.tencentmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [
      {"role": "system", "content": "你是一个专业的英语翻译助手。将用户输入的中文翻译为英文，将英文翻译为中文。只返回翻译结果，不做解释。"},
      {"role": "user", "content": "今天天气真好"}
    ]
  }'
from openai import OpenAI
﻿
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub.tencentmaas.com/v1",
)
﻿
response = client.chat.completions.create(
    model="deepseek-v3.2",
    messages=[
        {"role": "system", "content": "你是一个专业的英语翻译助手。将用户输入的中文翻译为英文，将英文翻译为中文。只返回翻译结果，不做解释。"},
        {"role": "user", "content": "今天天气真好"},
    ],
)
print(response.choices[0].message.content)
import OpenAI from 'openai';
﻿
const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub.tencentmaas.com/v1',
});
﻿
const response = await client.chat.completions.create({
  model: 'deepseek-v3.2',
  messages: [
    { role: 'system', content: '你是一个专业的英语翻译助手。将用户输入的中文翻译为英文，将英文翻译为中文。只返回翻译结果，不做解释。' },
    { role: 'user', content: '今天天气真好' },
  ],
});
console.log(response.choices[0].message.content);
import okhttp3.*;
import com.google.gson.Gson;
import java.util.*;
﻿
public class SystemPrompt {
    public static void main(String[] args) throws Exception {
        Map<String, Object> body = new HashMap<>();
        body.put("model", "deepseek-v3.2");
        body.put("messages", Arrays.asList(
            Map.of("role", "system", "content", "你是一个专业的英语翻译助手。将用户输入的中文翻译为英文，将英文翻译为中文。只返回翻译结果，不做解释。"),
            Map.of("role", "user", "content", "今天天气真好")
        ));
﻿
        Request request = new Request.Builder()
            .url("https://tokenhub.tencentmaas.com/v1/chat/completions")
            .header("Authorization", "Bearer YOUR_API_KEY")
            .post(RequestBody.create(new Gson().toJson(body), MediaType.parse("application/json")))
            .build();
﻿
        try (Response response = new OkHttpClient().newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
package main
﻿
import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)
﻿
func main() {
    body, _ := json.Marshal(map[string]interface{}{
        "model": "deepseek-v3.2",
        "messages": []map[string]string{
            {"role": "system", "content": "你是一个专业的英语翻译助手。将用户输入的中文翻译为英文，将英文翻译为中文。只返回翻译结果，不做解释。"},
            {"role": "user", "content": "今天天气真好"},
        },
    })
﻿
    req, _ := http.NewRequest("POST",
        "https://tokenhub.tencentmaas.com/v1/chat/completions",
        bytes.NewBuffer(body))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")
﻿
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err) // resp is nil on error, so fail before touching resp.Body
    }
    defer resp.Body.Close()
    data, _ := io.ReadAll(resp.Body)
    fmt.Println(string(data))
}
Sample response:
{
    "id": "5d42fea3-413e-42ce-99b2-0d1595dae996",
    "object": "chat.completion",
    "model": "deepseek-v3.2",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The weather is really nice today."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 38,
        "completion_tokens": 7,
        "total_tokens": 45,
        "prompt_tokens_details": {
            "cached_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}
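Example: Multi-turn conversation
Note:
Replace YOUR_API_KEY with the API key you created, and replace model with the ID of the service you want to try.
The same request follows in cURL, Python, Node.js, Java, and Go, in that order.
curl -X POST 'https://tokenhub.tencentmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \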
  -d '{
    "model": "deepseek-v3.2",
    "messages": [
      {"role": "user", "content": "请介绍一下量子计算"},
      {"role": "assistant", "content": "量子计算是一种利用量子力学原理进行信息处理的计算方式..."},
      {"role": "user", "content": "它和传统计算有什么区别？"}
    ]
  }'
from openai import OpenAI
﻿
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub.tencentmaas.com/v1",
)
﻿
response = client.chat.completions.create(
    model="deepseek-v3.2",
    messages=[
        {"role": "user", "content": "请介绍一下量子计算"},
        {"role": "assistant", "content": "量子计算是一种利用量子力学原理进行信息处理的计算方式..."},
        {"role": "user", "content": "它和传统计算有什么区别？"},
    ],
)
print(response.choices[0].message.content)
import OpenAI from 'openai';
﻿
const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub.tencentmaas.com/v1',
});
﻿
const response = await client.chat.completions.create({
  model: 'deepseek-v3.2',
  messages: [
    { role: 'user', content: '请介绍一下量子计算' },
    { role: 'assistant', content: '量子计算是一种利用量子力学原理进行信息处理的计算方式...' },
    { role: 'user', content: '它和传统计算有什么区别？' },
  ],
});
console.log(response.choices[0].message.content);
import okhttp3.*;
import com.google.gson.Gson;
import java.util.*;
﻿
public class MultiTurn {
    public static void main(String[] args) throws Exception {
        Map<String, Object> body = new HashMap<>();
        body.put("model", "deepseek-v3.2");
        body.put("messages", Arrays.asList(
            Map.of("role", "user", "content", "请介绍一下量子计算"),
            Map.of("role", "assistant", "content", "量子计算是一种利用量子力学原理进行信息处理的计算方式..."),
            Map.of("role", "user", "content", "它和传统计算有什么区别？")
        ));
﻿
        Request request = new Request.Builder()
            .url("https://tokenhub.tencentmaas.com/v1/chat/completions")
            .header("Authorization", "Bearer YOUR_API_KEY")
            .post(RequestBody.create(new Gson().toJson(body), MediaType.parse("application/json")))
            .build();
﻿
        try (Response response = new OkHttpClient().newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
package main
﻿
import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)
﻿
func main() {
    body, _ := json.Marshal(map[string]interface{}{
        "model": "deepseek-v3.2",
        "messages": []map[string]string{
            {"role": "user", "content": "请介绍一下量子计算"},
            {"role": "assistant", "content": "量子计算是一种利用量子力学原理进行信息处理的计算方式..."},
            {"role": "user", "content": "它和传统计算有什么区别？"},
        },
    })
﻿
    req, _ := http.NewRequest("POST",
        "https://tokenhub.tencentmaas.com/v1/chat/completions",
        bytes.NewBuffer(body))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")
﻿
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err) // resp is nil on error, so fail before touching resp.Body
    }
    defer resp.Body.Close()
    data, _ := io.ReadAll(resp.Body)
    fmt.Println(string(data))
}
Sample response:
{
    "id": "fda59c08-6a85-4514-bdbf-d77a8d68e018",
    "object": "chat.completion",
    "model": "deepseek-v3.2",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "好的，这是一个非常核心的问题。量子计算和传统计算的根本区别在于它们处理信息的基本单位和工作原理。\\n\\n我们可以用一个非常经典的比喻来开始：\\n\\n*   **传统计算机** 像是在一个巨大的**图书馆**里，一个**图书管理员**（CPU）在一条很长的走廊（总线）上奔跑，一次只能打开一个房间（内存地址），查看一本书（一个比特的数据），然后做出决定。\\n*   **量子计算机** 则像是让**所有图书管理员**（量子比特）同时进入**所有房间**，并在一瞬间阅读**所有书籍的每一种可能的组合**，然后告诉你最终的结果。\\n\\n下面我们从几个关键维度进行详细对比：\\n\\n### 1. 基本信息单位：比特 vs. 量子比特\\n\\n| 特征 | 传统计算（比特） | 量子计算（量子比特） |\\n| :--- | :--- | :--- |\\n| **状态** | **二进制**：只能是 **0** 或 **1**。就像一盏灯，要么开，要么关。非常确定。 | **叠加态**：可以**同时**是0和1，或者说是0和1的任意概率组合。就像一盏同时处于开和关状态的“量子灯”。 |\\n| **表示方式** | 一个明确的、离散的值。 | 一个状态向量，用狄拉克符号表示为：\\\\|ψ⟩ = α\\\\|0⟩ + β\\\\|1⟩，其中α和β是复数，且\\\\|α\\\\|² + \\\\|β\\\\|² = 1。 |\\n| **核心差异** | **确定性**：每个比特在任何时刻都有明确的值。 | **概率性**：测量量子比特时，它会以 \\\\|α\\\\|² 的概率坍缩为0，以 \\\\|β\\\\|² 的概率坍缩为1。 |\\n\\n### 2. 工作原理：逻辑门 vs. 量子特性\\n\\n| 特征 | 传统计算 | 量子计算 |\\n| :--- | :--- | :--- |\\n| **操作方式** | 使用**逻辑门**（如与门、或门、非门）对比特进行运算。一次操作改变一个或一组比特的状态。 | 使用**量子逻辑门**对量子比特进行操作。这些操作是**可逆的**，并能利用叠加态进行**并行计算**。 |\\n| **核心优势** | **串行处理**：任务被分解为一系列步骤，按顺序执行。对于简单、逻辑清晰的任务效率极高。 | **量子并行性**：由于量子比特处于叠加态，一次量子操作可以**同时作用于所有可能的输入**。这是量子加速的根源。 |\\n| **独特现象** | 无 | **量子纠缠**：两个或多个量子比特可以形成一种神秘的关联，无论它们相距多远，对一个量子比特的测量会瞬间决定另一个的状态。这允许量子计算机将不同量子比特的状态紧密联系起来，进行高度协同的计算。 |\\n\\n### 3. 性能与适用领域\\n\\n| 特征 | 传统计算 | 量子计算 |\\n| :--- | :--- | :--- |\\n| **擅长任务** | - **通用计算**：办公软件、网页浏览、游戏<br>- **逻辑控制**：操作系统、应用程序逻辑<br>- **大部分数据处理**：数据库管理、电子表格 | - **特定领域的指数级加速**：<br>  - **密码学**：破解RSA等加密算法（Shor算法）<br>  - **材料模拟**：精确模拟分子和材料的量子性质<br>  - **优化问题**：物流路线规划、金融投资组合优化<br>  - **人工智能**：加速机器学习训练 |\\n| **计算复杂度** | 对于某些复杂问题（如大数分解），传统算法需要**指数级**增长的时间。 | 对于特定问题，量子算法可将复杂度降至**多项式**级别，实现“量子优越性”。 |\\n| **输出结果** | 精确、确定的结果。 | 通常是**概率性**的结果。由于需要测量，我们得到的是一个可能正确的答案，因此算法通常需要多次运行以提高置信度。 |\\n\\n### 4. 
物理实现与挑战\\n\\n| 特征 | 传统计算机 | 量子计算机 |\\n| :--- | :--- | :--- |\\n| **硬件基础** | 基于**晶体管**（半导体），技术成熟，可大规模集成（如CPU有数十亿晶体管）。 | 需要能保持量子态的物理系统，如：超导电路、离子阱、光量子等。技术尚在早期。 |\\n| **主要挑战** | 功耗、散热、晶体管尺寸接近物理极限（摩尔定律放缓）。 | **量子退相干**：量子态极其脆弱，极易受环境（如热、振动）干扰而失去量子特性。需要极低温（接近绝对零度）和高度隔离的环境。 |\\n| **错误纠正** | 错误率极低，纠错相对简单（如奇偶校验）。 | 错误率很高，需要复杂的**量子纠错码**，用多个物理量子比特来编码一个逻辑量子比特，开销巨大。 |\\n\\n### 总结表格\\n\\n| 对比维度 | 传统计算 | 量子计算 |\\n| :--- | :--- | :--- |\\n| **基本单位** | 比特 (0 或 1) | 量子比特 (叠加态：0和1的叠加) |\\n| **操作方式** | 逻辑门（串行） | 量子门（并行） |\\n| **核心原理** | 布尔逻辑 | 叠加、纠缠、干涉 |\\n| **结果输出** | 确定性 | 概率性 |\\n| **擅长领域** | 通用任务、逻辑控制 | 特定复杂问题（如模拟、优化、密码破译） |\\n| **技术成熟度** | 非常成熟，广泛应用 | 早期阶段，主要用于研究和特定计算 |\\n| **与用户关系** | **替代关系**：量子计算机**不是**用来取代你的手机或笔记本电脑的。它更像一个**专用加速器**，用于解决传统计算机在可预见未来内都无法解决的特定难题。未来，我们可能通过云端访问量子计算机，让它处理最复杂的部分，而传统计算机负责日常任务和用户交互。 |\\n\\n简单来说，传统计算机是“精准的快枪手”，而量子计算机是“能同时探索所有可能性的先知”。它们各有千秋，将在未来很长一段时间内协同工作。"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 32,
        "completion_tokens": 1321,
        "total_tokens": 1353,
        "prompt_tokens_details": {
            "cached_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}
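Example: Function Calling (tool calling)
Note:
Replace YOUR_API_KEY with the API key you created, and replace model with the ID of the service you want to try. In thinking mode, tool calling works best when the historical reasoning_content is passed back in every request round; for details, see Interleaved Thinking.
The same request follows in cURL, Python, Node.js, Java, and Go, in that order.
curl -X POST 'https://tokenhub.tencentmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \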
  -d '{
    "model": "deepseek-v3.2",
    "messages": [
      {"role": "user", "content": "北京今天天气怎么样？"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "获取指定城市的天气信息",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string", "description": "城市名称，如：北京"}
            },
            "required": ["city"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
from openai import OpenAI
﻿
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub.tencentmaas.com/v1",
)
﻿
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "获取指定城市的天气信息",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "城市名称，如：北京"}},
            "required": ["city"],
        },
    },
}]
﻿
response = client.chat.completions.create(
    model="deepseek-v3.2",
    messages=[{"role": "user", "content": "北京今天天气怎么样？"}],
    tools=tools,
    tool_choice="auto",
)
print(response.choices[0].message)
import OpenAI from 'openai';
﻿
const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub.tencentmaas.com/v1',
});
﻿
const tools = [{
  type: 'function',
  function: {
    name: 'get_weather',
    description: '获取指定城市的天气信息',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string', description: '城市名称，如：北京' } },
      required: ['city'],
    },
  },
}];
﻿
const response = await client.chat.completions.create({
  model: 'deepseek-v3.2',
  messages: [{ role: 'user', content: '北京今天天气怎么样？' }],
  tools,
  tool_choice: 'auto',
});
console.log(response.choices[0].message);
import okhttp3.*;
import com.google.gson.Gson;
import java.util.*;
﻿
public class FunctionCalling {
    public static void main(String[] args) throws Exception {
        Map<String, Object> tool = Map.of(
            "type", "function",
            "function", Map.of(
                "name", "get_weather",
                "description", "获取指定城市的天气信息",
                "parameters", Map.of(
                    "type", "object",
                    "properties", Map.of("city", Map.of("type", "string", "description", "城市名称，如：北京")),
                    "required", List.of("city")
                )
            )
        );
﻿
        Map<String, Object> body = new HashMap<>();
        body.put("model", "deepseek-v3.2");
        body.put("messages", List.of(Map.of("role", "user", "content", "北京今天天气怎么样？")));
        body.put("tools", List.of(tool));
        body.put("tool_choice", "auto");
﻿
        Request request = new Request.Builder()
            .url("https://tokenhub.tencentmaas.com/v1/chat/completions")
            .header("Authorization", "Bearer YOUR_API_KEY")
            .post(RequestBody.create(new Gson().toJson(body), MediaType.parse("application/json")))
            .build();
﻿
        try (Response response = new OkHttpClient().newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
package main
﻿
import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)
﻿
func main() {
    tool := map[string]interface{}{
        "type": "function",
        "function": map[string]interface{}{
            "name":        "get_weather",
            "description": "获取指定城市的天气信息",
            "parameters": map[string]interface{}{
                "type": "object",
                "properties": map[string]interface{}{
                    "city": map[string]interface{}{"type": "string", "description": "城市名称，如：北京"},
                },
                "required": []string{"city"},
            },
        },
    }
﻿
    body, _ := json.Marshal(map[string]interface{}{
        "model":       "deepseek-v3.2",
        "messages":    []map[string]string{{"role": "user", "content": "北京今天天气怎么样？"}},
        "tools":       []map[string]interface{}{tool},
        "tool_choice": "auto",
    })
﻿
    req, _ := http.NewRequest("POST",
        "https://tokenhub.tencentmaas.com/v1/chat/completions",
        bytes.NewBuffer(body))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")
﻿
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err) // resp is nil on error, so fail before touching resp.Body
    }
    defer resp.Body.Close()
    data, _ := io.ReadAll(resp.Body)
    fmt.Println(string(data))
}
When the model decides to call a tool, it returns:
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"北京\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
Return the tool execution result to the model to continue the conversation:
{
  "model": "deepseek-v3.2",
  "messages": [
    {"role": "user", "content": "北京今天天气怎么样？"},
    {"role": "assistant", "content": null, "tool_calls": [{"id": "call_abc123", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"北京\"}"}}]},
    {"role": "tool", "tool_call_id": "call_abc123", "content": "{\"temperature\": 22, \"weather\": \"晴\", \"humidity\": 45}"}
  ]
}
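The round trip above (receive tool_calls, run the tool, append a tool message) can be sketched as follows. This is an illustration under stated assumptions: `get_weather` here is a hypothetical local stub that returns fixed sample data rather than calling a weather service, and `TOOLS` and `run_tool_calls` are illustrative names, not part of any SDK.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local tool: returns fixed sample data for any city."""
    return {"temperature": 22, "weather": "晴", "humidity": 45}


# Registry mapping the tool names declared in the request to local functions.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute each requested tool call and build the matching tool messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
        output = TOOLS[function["name"]](**arguments)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must echo the id of the call being answered
            "content": json.dumps(output, ensure_ascii=False),
        })
    return results


assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"北京\"}"},
    }],
}
messages = [
    {"role": "user", "content": "北京今天天气怎么样？"},
    assistant_message,
    *run_tool_calls(assistant_message),
]
print(messages[-1])
```

The resulting `messages` list is what the follow-up request carries, so the model can turn the tool output into a final answer.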
Using the Anthropic API
BaseURL
Guangzhou: https://tokenhub.tencentmaas.com
Singapore: https://tokenhub-intl.tencentmaas.com
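Note that the OpenAI base URL ends in /v1 while the Anthropic base URL does not; the cURL examples in this document post to /v1/chat/completions and /v1/messages respectively. The sketch below keeps both conventions in one place. The helper names are hypothetical, and the Singapore endpoint paths are assumed to mirror the Guangzhou ones shown in the examples.

```python
# Hosts taken from the BaseURL lists in this document.
BASE_HOSTS = {
    "guangzhou": "https://tokenhub.tencentmaas.com",
    "singapore": "https://tokenhub-intl.tencentmaas.com",
}

# Paths used by the cURL examples (assumed identical across regions).
ENDPOINT_PATHS = {
    "openai": "/v1/chat/completions",
    "anthropic": "/v1/messages",
}


def sdk_base_url(region: str, protocol: str) -> str:
    """Return the base URL to pass to the OpenAI or Anthropic SDK."""
    host = BASE_HOSTS[region]
    if protocol == "openai":
        return host + "/v1"  # the OpenAI base URL includes the /v1 segment
    if protocol == "anthropic":
        return host          # the Anthropic base URL is the bare host
    raise ValueError(f"unknown protocol: {protocol!r}")


def endpoint_url(region: str, protocol: str) -> str:
    """Return the full URL for direct HTTP calls without an SDK."""
    return BASE_HOSTS[region] + ENDPOINT_PATHS[protocol]


print(sdk_base_url("guangzhou", "openai"))
print(endpoint_url("guangzhou", "anthropic"))
```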
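HTTP Headers
| Field | Support status | Description |
| --- | --- | --- |
| anthropic-beta | Ignored | This header is not processed |
| anthropic-version | Ignored | This header is not processed |
| x-api-key | Fully supported | Used for authentication |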
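Request parameters
The table below lists how the TokenHub gateway supports the Anthropic protocol. For complete field definitions and the latest updates, see the official Anthropic API documentation.
| Field | Support status | Description |
| --- | --- | --- |
| model | Supported | Use a value from the "model parameter value" column of the model list |
| max_tokens | Fully supported | Maximum number of output tokens |
| container | Ignored | This field is not processed |
| mcp_servers | Ignored | This field is not processed |
| metadata | Ignored | This field is not processed |
| service_tier | Ignored | This field is not processed |
| stop_sequences | Fully supported | Stop sequences |
| stream | Fully supported | Streaming response |
| system | Fully supported | System message |
| temperature | Fully supported | Temperature parameter (0.0-2.0) |
| thinking | Ignored | This field is not processed |
| top_k | Ignored | This field is not processed |
| top_p | Fully supported | Top-p sampling |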
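Tool support
tools
| Field | Support status | Description |
| --- | --- | --- |
| name | Fully supported | Tool name |
| input_schema | Fully supported | Input parameter schema |
| description | Fully supported | Tool description |
| cache_control | Ignored | This field is not processed |
tool_choice (string format): fully supported
tool_choice (object format): fully supported
tool_choice.disable_parallel_tool_use: ignored; this field is not processed
tool_choice
| Field | Support status |
| --- | --- |
| none | Fully supported |
| auto | Fully supported |
| any | Fully supported |
| tool | Fully supported |
| disable_parallel_tool_use | Ignored |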
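Message field support
| Field | Variant | Subfield | Support status |
| --- | --- | --- | --- |
| content | string | - | Fully supported |
| content | array, type="text" | text | Fully supported |
| content | array, type="text" | cache_control | Ignored |
| content | array, type="text" | citations | Ignored |
| content | array, type="image" | - | Supported by some models; see each model's invocation guide |
| content | array, type="document" | - | Not supported |
| content | array, type="search_result" | - | Not supported |
| content | array, type="thinking" | - | Ignored |
| content | array, type="redacted_thinking" | - | Not supported |
| content | array, type="tool_use" | id | Fully supported |
| content | array, type="tool_use" | input | Fully supported |
| content | array, type="tool_use" | name | Fully supported |
| content | array, type="tool_use" | cache_control | Ignored |
| content | array, type="tool_result" | tool_use_id | Fully supported |
| content | array, type="tool_result" | content | Fully supported |
| content | array, type="tool_result" | cache_control | Ignored |
| content | array, type="tool_result" | is_error | Ignored |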
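Note:
1. Ignored fields: some Anthropic-specific fields are ignored without raising an error.
2. Parallel tool calls: the disable_parallel_tool_use parameter is ignored.
3. Cache control: all cache_control fields are ignored.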
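Because ignored fields produce no error, a request can silently lose a setting the caller relied on. The sketch below flags such fields on the client side. It is only an illustration that mirrors the tables above: the gateway does its own handling, `split_anthropic_request` is a hypothetical helper, and it checks top-level fields and tool-level cache_control only, not fields nested inside message content.

```python
# Top-level request fields marked "Ignored" in the request parameter table.
IGNORED_TOP_LEVEL = {"container", "mcp_servers", "metadata", "service_tier", "thinking", "top_k"}


def split_anthropic_request(body: dict) -> tuple[dict, list[str]]:
    """Return the fields the gateway acts on, plus the names of ignored ones."""
    effective = {}
    ignored = []
    for key, value in body.items():
        if key in IGNORED_TOP_LEVEL:
            ignored.append(key)
        else:
            effective[key] = value

    # cache_control on tool definitions is also ignored.
    tools = []
    for tool in effective.get("tools", []):
        if "cache_control" in tool:
            ignored.append(f"tools[{tool['name']}].cache_control")
        tools.append({k: v for k, v in tool.items() if k != "cache_control"})
    if tools:
        effective["tools"] = tools

    return effective, sorted(ignored)


effective, ignored = split_anthropic_request({
    "model": "deepseek-v3.2",
    "max_tokens": 1000,
    "top_k": 40,
    "thinking": {"type": "enabled", "budget_tokens": 1024},
    "messages": [{"role": "user", "content": "Hi, how are you?"}],
    "tools": [{
        "name": "get_weather",
        "description": "Get the weather for a city",
        "input_schema": {"type": "object", "properties": {"city": {"type": "string"}}},
        "cache_control": {"type": "ephemeral"},
    }],
})
print(ignored)
```

Logging the returned list during development makes it easier to spot settings, such as thinking or top_k, that will not take effect through this protocol.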
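Sample code
Note:
Replace YOUR_API_KEY with the API key you created, and replace model with the ID of the service you want to try.
The same request follows in cURL, Python, Node.js, Java, and Go, in that order.
curl https://tokenhub.tencentmaas.com/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \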
  -d '{
    "model": "deepseek-v3.2",
    "max_tokens": 1000,
    "stream": true,
    "system": [
      {"type": "text", "text": "You are a helpful assistant."}
    ],
    "messages": [
      {"role": "user", "content": [{"type": "text", "text": "Hi, how are you?"}]}
    ]
  }'
import anthropic
﻿
client = anthropic.Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub.tencentmaas.com",
)
﻿
with client.messages.stream(
    model="deepseek-v3.2",
    max_tokens=1000,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hi, how are you?"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
import Anthropic from '@anthropic-ai/sdk';
﻿
const client = new Anthropic({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub.tencentmaas.com',
});
﻿
const stream = await client.messages.stream({
  model: 'deepseek-v3.2',
  max_tokens: 1000,
  system: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'Hi, how are you?' }],
});
﻿
for await (const event of stream) {
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    process.stdout.write(event.delta.text);
  }
}
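Java
import okhttp3.*;
import okhttp3.sse.*;
import com.google.gson.Gson;
import java.util.*;

public class AnthropicCall {
    public static void main(String[] args) {
        // Build the Messages API request body.
        Map<String, Object> body = new HashMap<>();
        body.put("model", "deepseek-v3.2");
        body.put("max_tokens", 1000);
        body.put("stream", true);
        body.put("system", List.of(Map.of("type", "text", "text", "You are a helpful assistant.")));
        body.put("messages", List.of(Map.of(
            "role", "user",
            "content", List.of(Map.of("type", "text", "text", "Hi, how are you?"))
        )));

        Request request = new Request.Builder()
            .url("https://tokenhub.tencentmaas.com/v1/messages")
            .header("x-api-key", "YOUR_API_KEY")
            .header("Content-Type", "application/json")
            .post(RequestBody.create(new Gson().toJson(body), MediaType.parse("application/json")))
            .build();

        OkHttpClient client = new OkHttpClient();
        EventSources.createFactory(client).newEventSource(request,
            new EventSourceListener() {
                // Print the JSON payload of every server-sent event.
                @Override public void onEvent(EventSource es, String id, String type, String data) {
                    System.out.println(data);
                }
                // Release the client's threads once the stream ends so the JVM can exit.
                @Override public void onClosed(EventSource es) {
                    client.dispatcher().executorService().shutdown();
                }
                // Report transport errors and non-2xx responses instead of failing silently.
                @Override public void onFailure(EventSource es, Throwable t, Response response) {
                    if (t != null) t.printStackTrace();
                    if (response != null) System.err.println("HTTP " + response.code());
                    client.dispatcher().executorService().shutdown();
                }
            });
    }
}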
Go
package main

import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "strings"
)

func main() {
    // Build the Messages API request body.
    body, err := json.Marshal(map[string]interface{}{
        "model":      "deepseek-v3.2",
        "max_tokens": 1000,
        "stream":     true,
        "system": []map[string]string{
            {"type": "text", "text": "You are a helpful assistant."},
        },
        "messages": []map[string]interface{}{
            {
                "role": "user",
                "content": []map[string]string{
                    {"type": "text", "text": "Hi, how are you?"},
                },
            },
        },
    })
    if err != nil {
        log.Fatal(err)
    }

    req, err := http.NewRequest("POST",
        "https://tokenhub.tencentmaas.com/v1/messages",
        bytes.NewBuffer(body))
    if err != nil {
        log.Fatal(err)
    }
    req.Header.Set("x-api-key", "YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")

    // Check the error before touching resp; resp is nil when the request fails.
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    // Print the JSON payload of each SSE "data:" line.
    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        line := scanner.Text()
        if strings.HasPrefix(line, "data: ") {
            fmt.Println(strings.TrimPrefix(line, "data: "))
        }
    }
    if err := scanner.Err(); err != nil {
        log.Println(err)
    }
}
Sample response:
event: content_block_start
data: {"content_block":{"text":"","type":"text"},"index":1,"type":"content_block_start"}

event: content_block_delta
data: {"delta":{"text":"Hey","type":"text_delta"},"index":0,"type":"content_block_delta"}

event: content_block_delta
data: {"delta":{"text":"! I'm doing well, thanks for asking! I'm","type":"text_delta"},"index":0,"type":"content_block_delta"}

event: content_block_delta
data: {"delta":{"text":" here and ready to help with whatever you need.","type":"text_delta"},"index":0,"type":"content_block_delta"}

event: content_block_delta
data: {"delta":{"text":" How are you doing today? Is there something I","type":"text_delta"},"index":0,"type":"content_block_delta"}

event: content_block_delta
data: {"delta":{"text":" can assist you with?","type":"text_delta"},"index":0,"type":"content_block_delta"}

event: content_block_stop
data: {"index":1,"type":"content_block_stop"}

event: message_delta
data: {"delta":{"stop_reason":"end_turn","stop_sequence":null},"type":"message_delta","usage":{"output_tokens":57}}

event: message_stop
data: {"type":"message_stop"}
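When calling the endpoint without an SDK, the client has to reassemble the reply from the `content_block_delta` events itself. The sketch below shows one way to do that over raw SSE lines. The function name `collect_stream_text` is hypothetical, and the input is assumed to be an iterable of already-decoded text lines such as the ones in the sample above.

```python
import json


def collect_stream_text(sse_lines):
    """Concatenate text_delta fragments and capture the final stop_reason."""
    parts = []
    stop_reason = None
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        event = json.loads(line[len("data:"):].strip())
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
        elif event.get("type") == "message_delta":
            stop_reason = event.get("delta", {}).get("stop_reason")
    return "".join(parts), stop_reason


# Two deltas and the closing message_delta, copied from the sample response.
sample = """\
event: content_block_delta
data: {"delta":{"text":"Hey","type":"text_delta"},"index":0,"type":"content_block_delta"}

event: content_block_delta
data: {"delta":{"text":"! I'm doing well, thanks for asking! I'm","type":"text_delta"},"index":0,"type":"content_block_delta"}

event: message_delta
data: {"delta":{"stop_reason":"end_turn","stop_sequence":null},"type":"message_delta","usage":{"output_tokens":57}}
""".splitlines()

text, reason = collect_stream_text(sample)
print(text)
print(reason)
```

The parser keys off the `type` field inside each `data:` payload rather than the `event:` line, so it still works if an `event:` line is missing.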
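Response parameters
The Anthropic Messages API reports the token consumption of each request in the usage object of the response. These figures are also what TokenHub bills against. The table below describes each field.
Field	Type	Description
`input_tokens`	Integer	Number of input (prompt) tokens for this request. Cache hits are excluded and reported separately in `cache_read_input_tokens`.
`output_tokens`	Integer	Number of output tokens generated by the model. For thinking models, tokens spent on the reasoning process (thinking) are counted here as well.
`cache_read_input_tokens`	Integer	Number of input tokens served from the cache. This portion is not included in `input_tokens`.
`cache_creation_input_tokens`	Integer	Number of input tokens written to the cache. TokenHub manages caching automatically, so this field is usually `0` (the `cache_control` field of the Anthropic protocol is ignored).
For a non-streaming call, the response looks like the following, with the usage object at the end: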
{
    "id": "msg_013Zva2CMHLNnXjNJJKqJ2EF",
    "type": "message",
    "role": "assistant",
    "model": "deepseek-v3.2",
    "content": [
        {"type": "text", "text": "Hello! How can I help you today?"}
    ],
    "stop_reason": "end_turn",
    "usage": {
        "input_tokens": 25,
        "output_tokens": 13,
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 0
    }
}
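To turn the usage object into figures that are easier to track, the fields can be combined as in the sketch below. The helper `summarize_usage` is hypothetical. It assumes that the total prompt size is the sum of `input_tokens`, `cache_read_input_tokens`, and `cache_creation_input_tokens`; this page only states that cache hits are excluded from `input_tokens`, so treat the handling of cache writes as an assumption.

```python
def summarize_usage(usage: dict) -> dict:
    """Derive total input size and cache hit ratio from a usage object."""
    fresh = usage.get("input_tokens", 0)
    cached = usage.get("cache_read_input_tokens", 0)
    written = usage.get("cache_creation_input_tokens", 0)
    # Assumption: the three input counters do not overlap.
    total_input = fresh + cached + written
    return {
        "total_input_tokens": total_input,
        "output_tokens": usage.get("output_tokens", 0),
        "cache_hit_ratio": cached / total_input if total_input else 0.0,
    }


# The usage object from the non-streaming sample response.
usage = {
    "input_tokens": 25,
    "output_tokens": 13,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
}
print(summarize_usage(usage))
```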
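Streaming note: when stream is enabled, the usage fields are split across several events. input_tokens is reported in the initial message_start event, and output_tokens is reported in the final message_delta event (the message_delta event appears near the end of the streaming sample response above).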
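A client that needs the full usage figures for a streamed request therefore has to merge the two events. The following is a minimal sketch under stated assumptions: `merge_stream_usage` is a hypothetical helper, the `message_start` payload follows the shape of Anthropic's public streaming format (usage nested under `message`), and its token values are made up for illustration. The `message_delta` payload is taken from the sample response.

```python
def merge_stream_usage(events):
    """Merge usage counters from message_start and message_delta events."""
    usage = {}
    for event in events:
        if event.get("type") == "message_start":
            # Initial counters, including input_tokens.
            usage.update(event.get("message", {}).get("usage", {}))
        elif event.get("type") == "message_delta":
            # Later counters override earlier ones, e.g. the final output_tokens.
            usage.update(event.get("usage", {}))
    return usage


events = [
    # Illustrative message_start event; the values are not from a real response.
    {"type": "message_start", "message": {"usage": {"input_tokens": 25, "output_tokens": 1}}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hey"}},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 57}},
]
print(merge_stream_usage(events))
```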
Connecting a model to Claude Code
Install Claude Code
Run the following command to install or update Anthropic Claude Code:
npm install -g @anthropic-ai/claude-code
Configure environment variables
export ANTHROPIC_BASE_URL=https://tokenhub.tencentmaas.com
export ANTHROPIC_AUTH_TOKEN=${API_KEY}
export API_TIMEOUT_MS=600000
export ANTHROPIC_MODEL=${MODEL_NAME}
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
Note:
API_TIMEOUT_MS is set so that long outputs do not trigger a timeout in the Claude Code client. The value here, 600000 ms, is 10 minutes; adjust it as needed.
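Before launching the client, it can help to confirm that the variables are actually set in the current shell. The sketch below is a hypothetical preflight check, not part of Claude Code. Treating `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, and `ANTHROPIC_MODEL` as the required set is an assumption made for this example.

```python
import os

# Variables this sketch treats as required (an assumption, not an official list).
REQUIRED_VARS = ("ANTHROPIC_BASE_URL", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_MODEL")


def check_claude_env(env):
    """Return the missing variable names and the configured timeout in minutes."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    timeout_ms = int(env.get("API_TIMEOUT_MS") or 0)
    return missing, timeout_ms / 60000


missing, minutes = check_claude_env(os.environ)
if missing:
    print("Missing variables:", ", ".join(missing))
else:
    print("All required variables are set.")
print("Timeout in minutes:", minutes)
```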
Run the claude command
Change to your project directory and run the claude command to get started.
cd my-project
claude
﻿
