First, let's set up a working environment for the demo.
# Create the environment
conda create -n demo python=3.10 -y
# Activate the environment
conda activate demo
# Install torch
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia -y
# Install the other dependencies
pip install transformers==4.38
pip install sentencepiece==0.1.99
pip install einops==0.8.0
pip install protobuf==5.27.2
pip install accelerate==0.33.0
pip install streamlit==1.37.0
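To confirm that the pins above actually took effect, you can run a small check like the following inside the `demo` environment. It is a sketch: it reads installed versions through Python's standard `importlib.metadata` and compares them with zero padding, so `4.38` matches `4.38.0` the same way pip's `==4.38` does, and a local suffix such as `+cu121` is ignored. It assumes the conda-installed PyTorch exposes normal package metadata under the name `torch`. `PINNED`, `same_release`, and `check_pins` are illustrative names, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the conda/pip commands above (PyTorch's distribution name is "torch")
PINNED = {
    "torch": "2.1.2",
    "transformers": "4.38",
    "sentencepiece": "0.1.99",
    "einops": "0.8.0",
    "protobuf": "5.27.2",
    "accelerate": "0.33.0",
    "streamlit": "1.37.0",
}


def same_release(installed, wanted):
    """Compare dotted versions with zero padding, ignoring a local suffix such as '+cu121'."""
    a = installed.split("+")[0].split(".")
    b = wanted.split(".")
    width = max(len(a), len(b))
    a += ["0"] * (width - len(a))
    b += ["0"] * (width - len(b))
    return a == b


def check_pins(pins):
    """Return a list of problems; an empty list means every pinned package matches."""
    problems = []
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        if not same_release(installed, wanted):
            problems.append(f"{name}: installed {installed}, want {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINNED) or ["all pinned packages match"]:
        print(line)
```

The same dictionary can be extended with the `lmdeploy` and `timm` pins used later in this post.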
Next, we create a directory to hold our code, and create a cli_demo.py file in it.
mkdir -p /root/demo
touch /root/demo/cli_demo.py
Then copy the following code into cli_demo.py.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name_or_path = "/root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b"
# The tokenizer does not live on a GPU, so it takes no device_map; trust_remote_code loads InternLM2's own model code
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
# Load the weights in bfloat16 on the first GPU
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map='cuda:0')
model = model.eval()
system_prompt = """You are an AI assistant whose name is InternLM (书生·浦语).
- InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
- InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文.
"""
# The system prompt is supplied as the first (prompt, reply) pair of the chat history
messages = [(system_prompt, '')]
print("=============Welcome to InternLM chatbot, type 'exit' to exit.=============")
while True:
    input_text = input("\nUser >>> ")
    # Trim leading and trailing whitespace
    input_text = input_text.strip()
    if input_text == "exit":
        break
    # stream_chat yields the whole reply generated so far; print only the part not yet shown
    length = 0
    for response, _ in model.stream_chat(tokenizer, input_text, messages):
        if response is not None:
            print(response[length:], flush=True, end="")
            length = len(response)
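The printing loop above relies on `stream_chat` yielding the full reply accumulated so far, which is what the `response[length:]` slicing implies. The sketch below isolates that logic so it can be tried without a GPU. `fake_stream_chat` is a hypothetical stand-in for the model and is not part of transformers or InternLM; `print_stream` is likewise an illustrative name.

```python
def fake_stream_chat(chunks):
    """Hypothetical stand-in for model.stream_chat: yields (cumulative_text, history) pairs."""
    text = ""
    for chunk in chunks:
        text += chunk
        yield text, []


def print_stream(stream, write=print):
    """Print only the part of each cumulative response that has not been printed yet."""
    length = 0
    printed = []
    for response, _ in stream:
        if response is not None:
            delta = response[length:]
            write(delta, end="", flush=True)
            printed.append(delta)
            length = len(response)
    return "".join(printed)


if __name__ == "__main__":
    print_stream(fake_stream_chat(["Hello", ", I am", " InternLM."]))
    print()
```

Tracking only an integer offset keeps each step cheap: the loop never re-prints text, and it works no matter how many characters the model adds per step.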
We can now start the demo with python /root/demo/cli_demo.py.
The result is shown in the figure below:
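In this section, we show how to deploy the InternLM2-Chat-1.8B model with Streamlit.
Run the following commands to clone this tutorial's repository locally so you can run the code that follows.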
cd /root/demo
git clone https://github.com/InternLM/Tutorial.git
Then run the following commands to start a Streamlit service.
cd /root/demo
streamlit run /root/demo/Tutorial/tools/streamlit_demo.py --server.address 127.0.0.1 --server.port 6006
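Next, enter the following command in PowerShell on your local machine to forward the port to your local machine.
ssh -CNg -L 6006:127.0.0.1:6006 root@ssh.intern-ai.org.cn -p <your SSH port>
Then copy your SSH password, paste it into PowerShell, and press Enter to complete the port forwarding. Normally, we will see the following output: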
Once the port forwarding is in place, open http://localhost:6006 in a browser to use the demo.
The result is shown in the figure below:
If you run into OSError: [Errno 28] inotify watch limit reached, don't panic. Wait a while and run the command again.
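The limit in this error is the Linux per-user inotify watch count, which Streamlit's file watcher can exhaust on a shared machine. The sketch below reads the current value from the standard Linux file `/proc/sys/fs/inotify/max_user_watches`. `parse_watch_limit` and `read_watch_limit` are illustrative names. Raising the limit usually requires root on the host, which a shared development machine may not grant. If waiting does not help, Streamlit's `--server.fileWatcherType none` option turns off the file watcher entirely, provided your Streamlit version supports it.

```python
from pathlib import Path

# Standard Linux location of the per-user inotify watch limit
LIMIT_FILE = Path("/proc/sys/fs/inotify/max_user_watches")


def parse_watch_limit(text):
    """Parse the file's contents into an int, or return None if it is not a number."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def read_watch_limit(path=LIMIT_FILE):
    """Return the current limit, or None if the file cannot be read (e.g. not on Linux)."""
    try:
        return parse_watch_limit(Path(path).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    limit = read_watch_limit()
    print("limit not readable on this system" if limit is None else f"max_user_watches = {limit}")
```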
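InternLM-XComposer2 is a vision-language model built on InternLM2 that excels at free-form text-image composition and comprehension. Its main features include:
LMDeploy is a toolkit for compressing, deploying, and serving LLMs, developed by the MMRazor and MMDeploy teams. Its core features are:
LMDeploy already supports deploying the InternLM-XComposer2 series, but note that it supports only the visual chat capability of these models.
Next, we show how to deploy the InternLM-XComposer2-VL-1.8B model with LMDeploy.
First, activate the environment and install LMDeploy and the other dependencies.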
conda activate demo
pip install lmdeploy[all]==0.5.1
pip install timm==1.0.7
Next, use LMDeploy to start a Gradio service for chatting with the InternLM-XComposer2-VL-1.8B model.
lmdeploy serve gradio /share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-1_8b --cache-max-entry-count 0.1
After uploading an image with Upload Image, type an Instruction and press Enter to see the model's output.
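You can also run the same visual chat from Python instead of the Gradio page. The following is a minimal sketch based on LMDeploy's documented `pipeline` API (`pipeline`, `TurbomindEngineConfig`, and `lmdeploy.vl.load_image`). Check these names against the documentation for the version you installed (0.5.1 here). The sketch needs a GPU and the `demo` environment. `describe_image` is an illustrative wrapper, not an LMDeploy function.

```python
def describe_image(model_path, image_path, prompt="Describe this image."):
    """Run one image+text query through LMDeploy's pipeline API and return the reply text."""
    # Imported lazily so this file can be loaded even where lmdeploy is not installed
    from lmdeploy import TurbomindEngineConfig, pipeline
    from lmdeploy.vl import load_image

    # cache_max_entry_count=0.1 mirrors the --cache-max-entry-count 0.1 flag used above
    engine = TurbomindEngineConfig(cache_max_entry_count=0.1)
    pipe = pipeline(model_path, backend_config=engine)
    # load_image accepts a local path or an http(s) URL
    image = load_image(image_path)
    # Vision-language pipelines take a (prompt, image) tuple
    response = pipe((prompt, image))
    return response.text
```

Usage: `describe_image("/share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-1_8b", "/root/demo/example.jpg")`, where the image path is a placeholder. Building the pipeline loads the whole model, so if you plan to ask several questions, create `pipe` once and reuse it.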
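InternVL2 is the new generation of vision-language multimodal large models from Shanghai AI Laboratory, which presents it as the first open-source multimodal model whose overall performance rivals leading closed-source commercial models. The series ranges from hundred-billion-parameter models down to small on-device models, combines general-purpose and domain-specific capabilities, and supports multiple modalities.
LMDeploy also supports deploying the InternVL2 series. Let's use it to deploy the InternVL2-2B model.
The following commands start a Gradio service for the InternVL2-2B model.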
conda activate demo
lmdeploy serve gradio /share/new_models/OpenGVLab/InternVL2-2B --cache-max-entry-count 0.1
Once the port forwarding is in place, open http://localhost:6006 in a browser to use the demo.
After uploading an image with Upload Image, type an Instruction and press Enter to see the model's output.
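If the page at http://localhost:6006 does not load, you can check whether the tunnel is listening by opening a TCP connection from the local machine (the one running `ssh -L`). This sketch uses only the standard `socket` module. `port_is_open` is an illustrative name. Note that `ssh -L` accepts local connections even when nothing is listening on the remote side, so `True` confirms only the tunnel, not the Gradio server. To confirm the server itself, check that it printed `Running on local URL`.

```python
import socket


def port_is_open(host="127.0.0.1", port=6006, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("tunnel is listening" if port_is_open() else "nothing listening on 127.0.0.1:6006")
```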
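Prompt: Based on real history and facts, write a story of about 800 words that carries wisdom and inspires people, ideally emphasizing how it can enlighten them.
The story is decent, but startup and responses are far too slow.
Full response: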
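In ancient times, human society went through a long and arduous process of exploration and evolution, from hunting and gathering to agricultural civilization and then to the Industrial Revolution. The rapid development of technology changed people's ways of life and the structure of society. Yet even as we keep pursuing wisdom and progress, we must stay alert to the negative effects they may bring. There was once a sage called the "Wise Elder," who had rich wisdom and experience and a deep understanding of how human society evolves. At a certain moment in human history, the Wise Elder realized that although technological progress had brought much convenience and advancement, it had also deepened social division and the gap between rich and poor. The Wise Elder decided to put forward a concept called the "River of Wisdom," hoping to use the power of wisdom to guide humanity toward a more balanced and harmonious future. Through education, cultural heritage, and community building, he hoped that everyone could benefit and become a force driving social progress. The Wise Elder proposed the following three stages of the River of Wisdom: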
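The Wise Elder's River of Wisdom was not only the expression of an idea but also a guide to action. Through his own wisdom and actions, he hoped to awaken people's inner drive, so that everyone, guided by wisdom, would work to build a better, fairer, and more prosperous society. In practice, the Wise Elder not only led by example and took an active part in community building, but also spread his ideas in many ways, such as giving lectures, organizing training, and joining community activities. He encouraged people to practice the River of Wisdom in daily life, for example through volunteering, cultural exchange, and environmental action, to promote social harmony. The Wise Elder's River of Wisdom is not only a vision but also a guide to action. It reminds us that while pursuing progress, we must always stay alert to the negative effects that overusing wisdom may bring, and strive for balance so that human society can keep progressing and prospering.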
(demo) root@intern-studio-50001213:~/demo# lmdeploy serve gradio /share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-1_8b --cache-max-entry-count 0.1
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
2024-08-08 22:21:26,969 - lmdeploy - INFO - matching vision model: Xcomposer2VisionModel
Set max length to 4096
Could not locate the ixc_utils.py inside /share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-1_8b.
Could not locate the ixc_utils.py inside /share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-1_8b.
2024-08-08 22:21:44,404 - lmdeploy - INFO - matching type of ModelType.XCOMPOSER2
2024-08-08 22:22:19,666 - lmdeploy - INFO - input backend=turbomind, backend_config=TurbomindEngineConfig(model_name=None, model_format=None, tp=1, session_len=8192, max_batch_size=128, cache_max_entry_count=0.1, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
2024-08-08 22:22:19,667 - lmdeploy - INFO - input chat_template_config=ChatTemplateConfig(model_name=None, system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability='chat', stop_words=None)
2024-08-08 22:22:19,850 - lmdeploy - INFO - updated chat_template_onfig=ChatTemplateConfig(model_name='internlm-xcomposer2', system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability='chat', stop_words=None)
2024-08-08 22:22:19,850 - lmdeploy - INFO - model_source: hf_model
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
Could not locate the ixc_utils.py inside /share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-1_8b.
Could not locate the ixc_utils.py inside /share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-1_8b.
Could not locate the ixc_utils.py inside /share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-1_8b.
Could not locate the ixc_utils.py inside /share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-1_8b.
2024-08-08 22:22:25,044 - lmdeploy - INFO - model_config:
[llama]
model_name = internlm-xcomposer2
model_arch = InternLMXComposer2ForCausalLM
tensor_para_size = 1
head_num = 16
kv_head_num = 8
vocab_size = 92544
num_layer = 24
inter_size = 8192
norm_eps = 1e-05
attn_bias = 0
start_id = 1
end_id = 2
session_len = 8192
weight_type = bf16
rotary_embedding = 128
rope_theta = 1000000.0
size_per_head = 128
group_size = 0
max_batch_size = 128
max_context_token_num = 1
step_length = 1
cache_max_entry_count = 0.1
cache_block_seq_len = 64
cache_chunk_size = -1
enable_prefix_caching = False
num_tokens_per_iter = 8192
max_prefill_iters = 1
extra_tokens_per_iter = 0
use_context_fmha = 1
quant_policy = 0
max_position_embeddings = 32768
rope_scaling_factor = 2.0
use_dynamic_ntk = 1
use_logn_attn = 0
lora_policy = plora
lora_r = 256
lora_scale = 1.0
lora_max_wo_r = 256
lora_rank_pattern =
lora_scale_pattern =
[TM][WARNING] [LlamaTritonModel] `max_context_token_num` = 8192.
2024-08-08 22:22:25,502 - lmdeploy - WARNING - get 411 model params
2024-08-08 22:22:54,298 - lmdeploy - INFO - updated backend_config=TurbomindEngineConfig(model_name=None, model_format=None, tp=1, session_len=8192, max_batch_size=128, cache_max_entry_count=0.1, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
[WARNING] gemm_config.in is not found; using default GEMM algo
[TM][INFO] NCCL group_id = 0
[TM][INFO] [BlockManager] block_size = 6 MB
[TM][INFO] [BlockManager] max_block_count = 48
[TM][INFO] [BlockManager] chunk_size = 48
[TM][WARNING] No enough blocks for `session_len` (8192), `session_len` truncated to 3072.
[TM][INFO] LlamaBatch<T>::Start()
Running on local URL: http://0.0.0.0:6006
Could not create share link. Missing file: /root/.conda/envs/demo/lib/python3.10/site-packages/gradio/frpc_linux_amd64_v0.2.
Please check your internet connection. This can happen if your antivirus software blocks the download of this file. You can install manually by following these steps:
1. Download this file: https://cdn-media.huggingface.co/frpc-gradio-0.2/frpc_linux_amd64
2. Rename the downloaded file to: frpc_linux_amd64_v0.2
3. Move the file to this location: /root/.conda/envs/demo/lib/python3.10/site-packages/gradio
2024-08-08 22:24:58,737 - lmdeploy - INFO - prompt: ('图中有神马?', [<PIL.Image.Image image mode=RGB size=768x1152 at 0x7F950D92A3B0>])
2024-08-08 22:24:58,737 - lmdeploy - WARNING - Can not found event loop in current thread. Create a new event loop.
2024-08-08 22:24:58,737 - lmdeploy - WARNING - auto append <IMAGE_TOKEN> at the beginning, the user can manually insert the token to prompt
2024-08-08 22:24:58,738 - lmdeploy - INFO - start ImageEncoder._forward_loop
2024-08-08 22:24:58,738 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images.
2024-08-08 22:24:58,738 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images.
2024-08-08 22:25:03,816 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 5.079s
2024-08-08 22:25:03,817 - lmdeploy - INFO - ImageEncoder done 1 images, left 0 images.
2024-08-08 22:25:03,819 - lmdeploy - INFO - preprocess cost 5.082s
2024-08-08 22:25:03,819 - lmdeploy - INFO - input_ids: [1, 92543, 9081, 364, 2770, 657, 589, 15358, 17993, 6843, 963, 505, 4576, 11146, 30778, 1234, 20248, 451, 62442, 60752, 60721, 61255, 61104, 4452, 285, 4576, 11146, 30778, 1234, 20248, 451, 62442, 60752, 60721, 61255, 61104, 313, 505, 395, 7445, 17218, 2881, 7659, 1813, 4287, 1762, 560, 505, 8020, 684, 36956, 15358, 31288, 451, 68589, 76659, 71581, 699, 1226, 505, 6342, 442, 517, 11100, 328, 10894, 328, 454, 51978, 756, 285, 4576, 11146, 30778, 1234, 20248, 451, 62442, 60752, 60721, 61255, 61104, 313, 777, 3696, 454, 19187, 19829, 4563, 435, 410, 4287, 12032, 684, 410, 1341, 1893, 569, 6519, 454, 262, 69093, 756, 285, 4576, 11146, 30778, 1234, 20248, 451, 62442, 60752, 60721, 61255, 61104, 313, 505, 13026, 446, 12824, 2613, 454, 27943, 15613, 14644, 13585, 3285, 519, 410, 4054, 2321, 281, 92542, 364, 92543, 1008, 364, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60472, 70635, 60780, 60832, 60504, 92542, 364, 92543, 525, 11353, 364]
2024-08-08 22:25:03,820 - lmdeploy - INFO - Register stream callback for 0
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 0 received.
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 1373, max_q = 1373, max_k = 1373
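# It hangs here and produces no result for a long time...
Prompt: What's in the picture?
It got stuck here and never produced a result.
I retried a few times and the kernel crashed.
Prompt: Describe this picture and count how many people are in it.
The terminal sat for more than three minutes with still no result, and I don't really understand what the log means.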
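Part of the log can be checked with simple arithmetic. Assume TurboMind stores one key tensor and one value tensor per layer for every cached token, and that the cache is bf16 like the weights (`quant_policy = 0`). This is a back-of-the-envelope reconstruction, not taken from LMDeploy's source. Under those assumptions, the `model_config` values (`num_layer = 24`, `kv_head_num = 8`, `size_per_head = 128`, `cache_block_seq_len = 64`) reproduce the `block_size = 6 MB` line, and the 48 blocks allowed under `--cache-max-entry-count 0.1` give the 3072 tokens to which `session_len` was truncated. The function names are illustrative.

```python
def kv_block_bytes(num_layer, kv_head_num, size_per_head, block_seq_len, bytes_per_elem=2):
    """Bytes needed to hold keys and values for one cache block across all layers."""
    # Leading factor 2 = one tensor for keys plus one for values
    return 2 * num_layer * kv_head_num * size_per_head * block_seq_len * bytes_per_elem


def max_cached_tokens(max_block_count, block_seq_len):
    """Upper bound on the number of tokens the KV cache can hold at once."""
    return max_block_count * block_seq_len


# Values copied from the model_config and [BlockManager] lines of the log (bf16 = 2 bytes)
block = kv_block_bytes(num_layer=24, kv_head_num=8, size_per_head=128, block_seq_len=64)
print(block / 2**20, "MiB per block")             # 6.0, matching "block_size = 6 MB"
print(max_cached_tokens(48, 64), "tokens total")  # 3072, matching the truncated session_len
```

With the same numbers, the request logged with `n_tok = 1373` fits within 3072 tokens, so cache capacity alone does not obviously explain the hang. The log itself does not show the cause.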
LLM Bootcamp (大模型实战营) link