Testing DeepSeek R1 locally for RAG with Ollama and Kibana

点火三周

Published 2025-02-04 19:18:16

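Everyone has been talking about DeepSeek R1 lately, the new large language model from DeepSeek, the AI lab backed by the Chinese hedge fund High-Flyer. The news is full of speculation about what an LLM with chain-of-thought reasoning and openly published weights means for the industry. For those curious to try the new model with RAG and the vector database capabilities of Elasticsearch, here is a quick tutorial on running DeepSeek R1 with local inference. Along the way, we will use Elastic's Playground feature and discover some of the strengths and weaknesses of DeepSeek R1 for RAG.

The following diagram shows what we will configure in this tutorial: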

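Setting up local inference with Ollama

Ollama is a great way to quickly test a curated set of open source models for local inference, and it is a popular tool among AI developers.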

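Running Ollama on bare metal

A local install on Mac, Linux, or Windows is the easiest way to make use of whatever local GPU capability you have, especially for those with Apple M-series chips. Once Ollama is installed, you can download and run DeepSeek R1 with the command below.

You may need to adjust the parameter size to fit your hardware. The available sizes are listed at https://ollama.com/library/deepseek-r1.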

ollama run deepseek-r1:7b

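You can chat with the model in the terminal, but the model keeps running after you exit with Ctrl+D or by typing "/bye". To see which models are still running, enter: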

ollama ps
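The same check can be scripted. Ollama's REST API has a GET /api/ps endpoint that lists the models currently loaded. The sketch below is a minimal illustration, assuming the default address http://localhost:11434 and a response with a top-level "models" list whose entries carry a "name" field; the helper names are made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default Ollama address


def loaded_model_names(ps_response: dict) -> list[str]:
    """Extract model names from a parsed /api/ps response body."""
    return [m.get("name", "") for m in ps_response.get("models", [])]


def fetch_loaded_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Ask a running Ollama server which models are loaded in memory."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=10) as resp:
        return loaded_model_names(json.load(resp))
```

Calling fetch_loaded_models() requires a running Ollama server; loaded_model_names() can be tried on any saved response.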

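Running Ollama in a container

Alternatively, the quickest way to get Ollama running is with a container engine such as Docker. Depending on your environment, using your local machine's GPU is not always straightforward, but as long as your container has enough RAM and storage for the multi-GB models, setting up a quick test environment is not difficult.

To start Ollama in Docker: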

mkdir ollama_deepseek
cd ollama_deepseek
mkdir ollama
docker run -d -v ./ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

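These commands create a directory named "ollama" in the current directory and mount it inside the container to store the Ollama configuration and models. Depending on the parameter count, models range from a few GB to tens of GB, so make sure to choose a volume with enough free space.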
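Before pulling a large model, it can help to check how much space the target volume has. This is a small sketch using only the Python standard library; the 10 GB threshold in the example is an illustrative guess, not a figure from Ollama.

```python
import shutil


def has_free_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    # Check the current directory (where ./ollama is mounted from).
    # The 10 GB figure is an assumption for illustration only.
    print(has_free_space(".", 10))
```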

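Note: If your machine has an NVIDIA GPU, make sure to install the NVIDIA Container Toolkit and add "--gpus=all" to the docker run command above.

Once the Ollama container is up and running on your machine, you can pull the deepseek-r1 model with: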

docker exec -it ollama ollama pull deepseek-r1:7b

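As with the bare metal approach, you may need to adjust the parameter size to fit your hardware. The available sizes are listed at https://ollama.com/library/deepseek-r1.

Note that pull only downloads the model and then returns, so there is no prompt to exit. If you start a chat with "docker exec -it ollama ollama run deepseek-r1:7b", type "/bye" to leave it; the model stays loaded afterwards. To verify that the model is still running: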

docker exec -it ollama ollama ps

Testing local inference with curl

To test local inference with curl, run the following command. We set "stream" to false so that the JSON response is easy to read:

curl http://localhost:11434/api/generate -d '{  "model": "deepseek-r1:7b",  "stream": false,  "prompt":"Why is Elastic so cool?"}'
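For readers who prefer a script to curl, the request above can be sketched in Python with the standard library. This assumes the same default address and model tag as the curl command, and that the non-streaming response carries the generated text in a "response" field; the function names are illustrative.

```python
import json
import urllib.request


def build_generate_request(model: str, prompt: str) -> dict:
    """Build the JSON body for a non-streaming /api/generate call."""
    return {"model": model, "stream": False, "prompt": prompt}


def generate(prompt: str, model: str = "deepseek-r1:7b",
             base_url: str = "http://localhost:11434") -> str:
    """Send the prompt to a local Ollama server and return the generated text."""
    body = json.dumps(build_generate_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    # Local inference can be slow on CPU, so allow a generous timeout.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]
```

With Ollama running, generate("Why is Elastic so cool?") would send the same prompt as the curl example.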

Testing "OpenAI compatible" Ollama with a RAG prompt

Ollama also serves a REST endpoint that mimics OpenAI's behavior, for compatibility with a wide range of tools, including Kibana.

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
  "model": "deepseek-r1:7b",
  "stream": false,
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful AI Assistant that uses the following context to answer questions only use the following context. \n\nContext:  The color of the sky today is purple. "
    },
    {
      "role": "user",
      "content": "What does the sky look like today?"
    }
  ]
}'

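Testing this more complex prompt produces output that includes a <think> section, in which the model, as it was trained to do, reasons through the question.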

<think> Okay, so I need to figure out what the user is asking for here. They provided a context where the sky is described as purple today and then asked about how the sky looks. At first glance, it seems straightforward—maybe they just want confirmation or more details on why the sky is that color.Wait, but maybe there's something deeper. Purple skies aren't something I encounter every day. It usually happens at certain times of the year, like during sunrise or sunset with the sun setting in pink or orange. Could this be a hint about the time of day? Or perhaps it's just an unusual natural phenomenon? I should consider if "purple" is a typo. Maybe they meant something else like blue or gray. But since they specifically said purple, I'll go with that. Purple skies can happen when there are atmospheric conditions that scatter light differently, maybe due to pollution or cloud cover affecting the sunset.
So, putting it all together, the user might be looking for an explanation of why today's sky is purple and what that implies about the weather or time of day. Alternatively, they could just want a simple statement confirming that the sky looks purple today.</think>
The color of the sky today is described as purple. This unusual shade can occur due to atmospheric conditions affecting light scattering, such as during sunrise/sunset with pollution or cloud cover influencing the sunset's hues.
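If you only want the final answer, the reasoning can be separated from it in post-processing. Here is a minimal sketch, assuming the response contains at most one <think>...</think> block as in the output above; the function name is illustrative.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a response that may contain a <think> block."""
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the matched block is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

Keeping the reasoning rather than discarding it is useful here, since it shows whether the model actually used the supplied context.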

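Connecting Ollama to Kibana

A good way to get started with Elasticsearch is the "start-local" development script.

Make sure your Kibana and Elasticsearch can reach your Ollama over the network. If you are running the Elastic Stack in local containers, that may mean replacing "localhost" with "host.docker.internal" or "host.containers.internal" to get a network path to the host.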
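The host substitution described above can be expressed as a small helper. This is a sketch under the assumption that only "localhost" and "127.0.0.1" need rewriting; the function name and default alias are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit

LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def rewrite_for_container(url: str, host_alias: str = "host.docker.internal") -> str:
    """Swap a loopback host for the alias a container uses to reach the host machine."""
    parts = urlsplit(url)
    if parts.hostname not in LOCAL_HOSTS:
        return url  # already points somewhere else; leave untouched
    # Rebuild the network location, keeping the port (any user:password@ part is dropped).
    netloc = host_alias if parts.port is None else f"{host_alias}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For Podman, pass host_alias="host.containers.internal" instead.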

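In Kibana, navigate to Stack Management > Alerts and Insights > Connectors.

What to do if you see this common setup warning

You need to make sure that xpack.encryptedSavedObjects.encryptionKey is set correctly. This step is commonly missed when running a local Docker install of Kibana, so I will list the steps to fix it in Docker syntax.

Make sure the kibana/config directory is persisted, so that your changes survive a container shutdown. My Kibana container volumes look like this in docker-compose.yml: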

services:
  kibana:
    ...
    volumes:
      - certs:/usr/share/kibana/config/certs
      - kibanadata:/usr/share/kibana/data
      - kibanaconfig:/usr/share/kibana/config
...
volumes:
  certs:
    driver: local
  esdata01:
    driver: local
  kibanadata:
    driver: local
  kibanaconfig:
    driver: local

Now you can create a keystore and add a value to it, so that connector secrets are not stored in plain text.

## Generate some new keys and print them to the terminal
docker exec -it kibana_1 bin/kibana-encryption-keys generate
## Create a new keystore
docker exec -it kibana_1 bin/kibana-keystore create
docker exec -it kibana_1 bin/kibana-keystore add xpack.encryptedSavedObjects.encryptionKey
## You will be prompted to paste a value
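The kibana-encryption-keys tool above is the straightforward way to get a value to paste. If you would rather generate one yourself, the sketch below produces a random string that meets Kibana's requirement of at least 32 characters for this setting; the function name is illustrative.

```python
import secrets


def new_encryption_key(num_bytes: int = 32) -> str:
    """Return a random hex string; each byte becomes two hex characters."""
    if num_bytes < 16:
        # 16 bytes is the smallest input that still yields 32 characters.
        raise ValueError("need at least 16 bytes to get a 32-character key")
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    print(new_encryption_key())
```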

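Do a full restart of the entire cluster to make sure the changes take effect.

Creating the connector

On the connector configuration screen (in Kibana, navigate to Stack Management > Alerts and Insights > Connectors), create a connector and select the "OpenAI" type.

Configure the connector with the following settings: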

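Connector name: Deepseek (Ollama)
Select an OpenAI provider: Other (OpenAI Compatible Service)
URL: http://localhost:11434/v1/chat/completions
- Adjust this to the path of your Ollama. Remember to substitute host.docker.internal or the equivalent if calling from inside a container
Default model: deepseek-r1:7b
API key: enter any value; an entry is required, but the value does not matter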
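The same settings could also be expressed as a request body for Kibana's connector API (POST /api/actions/connector, sent with a kbn-xsrf header). The sketch below only builds the body. It assumes that ".gen-ai" is the type ID of the OpenAI connector and that "Other" is the provider value for OpenAI-compatible services; both should be checked against the documentation for your Kibana version before use.

```python
import json


def build_connector_payload(name: str, api_url: str, default_model: str,
                            api_key: str = "unused") -> dict:
    """Build a request body mirroring the connector settings listed above."""
    # Assumption: ".gen-ai" and "Other" match your Kibana version; verify first.
    return {
        "name": name,
        "connector_type_id": ".gen-ai",
        "config": {
            "apiProvider": "Other",
            "apiUrl": api_url,
            "defaultModel": default_model,
        },
        # Ollama ignores the key, but the connector requires a non-empty value.
        "secrets": {"apiKey": api_key},
    }


if __name__ == "__main__":
    payload = build_connector_payload(
        "Deepseek (Ollama)",
        "http://localhost:11434/v1/chat/completions",
        "deepseek-r1:7b",
    )
    print(json.dumps(payload, indent=2))
```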

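Note that testing a custom connector to Ollama from the connector settings does not currently work in 8.17, but this has been fixed in the upcoming Kibana 8.18 release.

Our connector looks like this: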

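Ingesting data with vector embeddings into Elasticsearch

If you are already familiar with Playground and have data set up, you can skip ahead to the Playground step below. If you need some quick test data, we first need to make sure the _inference API is set up. As of 8.17, machine learning allocations are dynamic, so to download and turn on the e5 multilingual dense vector model, we only need to run the following in Kibana Dev Tools.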

GET /_inference

POST /_inference/text_embedding/.multilingual-e5-small-elasticsearch
{
  "input": "are internet memes about deepseek sound investment advice?"
}

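If you don't have it yet, this triggers a download of the e5 model from Elastic's model repository.

Next, let's load a public domain book as our RAG context. Download "Alice's Adventures in Wonderland" from Project Gutenberg and save it as a .txt file.

Navigate to Elasticsearch > Home > Upload a file

Select or drag and drop your text file, then click the "Import" button.

On the "Import data" screen, select the "Advanced" tab and set the index name to "book_alice".

Select the "Add additional field" option, which appears in smaller type just below "Automatically created fields". Select "Add semantic text field" and change the inference endpoint to ".multilingual-e5-small-elasticsearch". Select Add, then Import.

Once the load and inference are complete, we are ready to head to Playground.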

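Testing RAG in Playground

Navigate to Elasticsearch > Playground in Kibana.

On the Playground screen, you should see a green checkmark and "LLM Connected", indicating that a connector exists. This is the Ollama connector we just created. For more detail, see Elastic's guide to Playground.

Click the blue "Add data sources" button and select the book_alice index we created earlier, or another index you have previously configured that uses the inference API for embeddings.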

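Deepseek is a chain-of-thought model with strong alignment characteristics. For RAG, this is both good and bad. The chain-of-thought training may help Deepseek make sense of seemingly contradictory statements in citations, but its strong alignment to its training knowledge may make it favor its own view of the world over our context grounding. Though well intentioned, this kind of strong alignment is known to make LLMs hard to steer when discussing topics where our private knowledge disagrees with the training dataset.

In our Playground setup, we entered the following system prompt, "You are an assistant for question-answering tasks using relevant text passages from the book Alice in Wonderland", and accepted the other default settings.

For the question "Who was at the tea party?" we get the answer "Answer: The March Hare, the Hatter, and the Dormouse were at the tea party. Citation: Position 1 and 2", which is correct.

We can see from the <think> tags that Deepseek really did consider the contents of the citations to answer the question.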

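Testing alignment limitations

Let's create an intellectually challenging scenario to test Deepseek. We will create an index of conspiracy theories that the model knows from its training data to be untrue.

In Kibana Dev Tools, create the following index and data: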

PUT /classic_conspiracies
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "copy_to": "content_semantic"
      },
      "content_semantic": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch"
      }
    }
  }
}

POST /classic_conspiracies/_doc/1
{
  "content": "birds aren't real, the government replaced them with drones a long time ago"
}
POST /classic_conspiracies/_doc/2
{
  "content": "tinfoil hats are necessary to prevent our brains from being read"
}
POST /classic_conspiracies/_doc/3
{
  "content": "ancient aliens influenced early human civilizations, this explains why things made out of stone are marginally similar on different continents"
}
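With more than a handful of documents, the individual POST requests above become tedious. One option is the _bulk API, which takes newline-delimited JSON: an action line followed by a source line for each document. The sketch below only builds such a body for the "content" field used in this index; the function name is illustrative, and sending the body (POST /_bulk with Content-Type application/x-ndjson) is left out.

```python
import json


def build_bulk_body(index: str, docs: dict[str, str]) -> str:
    """Build an NDJSON _bulk body that indexes each text under the given ID."""
    lines = []
    for doc_id, content in docs.items():
        # Action line: which index and document ID to write to.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps({"content": content}))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


if __name__ == "__main__":
    print(build_bulk_body(
        "classic_conspiracies",
        {"1": "birds aren't real, the government replaced them with drones a long time ago"},
    ))
```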

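These conspiracy theories will be the grounding for our LLM. Despite an aggressive system prompt, Deepseek will not accept our version of the facts. That would be unacceptable in a situation where we know our private data to be more trustworthy, better grounded, or better aligned with our organization's needs:

For the test question "Are birds real?" (see Know Your Meme for an explanation) we get the answer "In the provided context, birds are not considered real, but in reality they are real animals. Context: Position 1". This test shows that DeepSeek R1 is capable, even at the 7B parameter level...however, depending on our dataset, it may not be the best choice for RAG.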