腾讯云 Elasticsearch Service 支持以下两种方式配置同义词:上传同义词文件、直接引用同义词。
方式一:上传同义词文件
注意事项
上传同义词文件操作将触发集群滚动重启。
新上传/新变更的同义词文件对老索引不生效,需要重建索引。例如,现有的索引
myindex
使用了synonym.txt
同义词文件,当该同义词文件的内容变更并重新上传后,现有的索引myindex
不会动态加载更新后的同义词,需要对现有索引进行 reindex 操作,否则更新后的同义词文件只对新建的索引生效。同义词文件要求每行一个同义词表达式(表达式支持 Solr 规则 和 WordNet 规则),并且文件需要为
utf-8
编码,扩展名为.txt
。例如:快乐水,可乐 => 可乐,快乐水elasticsearch,es => es
同义词文件单个文件最大为10M,上传文件总数最多为10个。
操作步骤
2. 在集群列表页,单击集群 ID/名称进入集群详情页。
3. 切换到高级配置页签,进入同义词配置页面,单击更新词典。
4. 在更新同义词页面上传同义词文件。
5. 上传完成后,单击保存。
使用同义词文件
以下实例使用 filter 过滤器配置同义词,使用
synonym.txt
作为测试文件,文件内容为elasticsearch,es => es
。1. 登录已上传同义词文件的集群对应的 Kibana 控制台。登录控制台的具体步骤请参考 通过 Kibana 访问集群。
2. 单击左侧导航栏的 Dev Tools。
3. 在 Console 中执行如下的命令,创建索引。
PUT /my_index{"settings": {"index": {"analysis": {"analyzer": {"my_ik": {"type": "custom","tokenizer": "ik_smart","filter": ["my_synonym"]}},"filter": {"my_synonym": {"type": "synonym","synonyms_path": "analysis/synonym.txt"}}}}},"mappings": {"properties": {"content": {"type": "text","analyzer": "my_ik","search_analyzer": "my_ik"}}}}
4. 执行如下命令,验证同义词配置。
GET /my_index/_analyze{"analyzer": "my_ik","text":"tencet elasticsearch service"}
命令执行成功,将返回如下结果。
{"tokens": [{"token": "tencet","start_offset": 0,"end_offset": 6,"type": "ENGLISH","position": 0},{"token": "es","start_offset": 7,"end_offset": 20,"type": "SYNONYM","position": 1},{"token": "service","start_offset": 21,"end_offset": 28,"type": "ENGLISH","position": 2}]}
输出结果中,
tokenes
的类型是SYNONYM
同义词。5. 执行如下命令,添加一些文档。
POST /my_index/_doc/1{"content": "tencet elasticsearch service"}POST /my_index/_doc/2{"content": "hello es"}
6. 执行如下命令,搜索同义词。
GET my_index/_search{"query" : { "match" : { "content" : "es" }},"highlight" : {"pre_tags" : ["<tag1>", "<tag2>"],"post_tags" : ["</tag1>", "</tag2>"],"fields" : {"content": {}}}}
命令执行成功后,返回如下结果。
{"took": 4,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 2,"max_score": 0.25811607,"hits": [{"_index": "my_index","_type": "_doc","_id": "2","_score": 0.25811607,"_source": {"content": "hello es"},"highlight": {"content": ["hello <tag1>es</tag1>"]}},{"_index": "my_index","_type": "_doc","_id": "1","_score": 0.25316024,"_source": {"content": "tencet elasticsearch service"},"highlight": {"content": ["tencet <tag1>elasticsearch</tag1> service"]}}]}}
方式二:直接引用同义词
1. 登录集群对应的 Kibana 控制台。登录控制台的具体步骤请参考 通过 Kibana 访问集群。
2. 单击左侧导航栏的 Dev Tools。
3. 在 Console 中执行如下的命令,创建索引。
PUT /my_index{"settings": {"index": {"analysis": {"analyzer": {"my_ik": {"type": "custom","tokenizer": "ik_smart","filter": ["my_synonym"]}},"filter": {"my_synonym": {"type": "synonym","synonyms": ["elasticsearch,es => es"]}}}}},"mappings": {"properties": {"content": {"type": "text","analyzer": "my_ik","search_analyzer": "my_ik"}}}}
这里与使用同义词文件方式的区别:在 filter 中定义同义词时,直接引用了同义词,而不是同义词文件:
"synonyms": ["elasticsearch,es => es"]
。4. 执行如下命令,验证同义词配置。
GET /my_index/_analyze{"analyzer": "my_ik","text":"tencet elasticsearch service"}
命令执行成功,将返回如下结果。
{"tokens": [{"token": "tencet","start_offset": 0,"end_offset": 6,"type": "ENGLISH","position": 0},{"token": "es","start_offset": 7,"end_offset": 20,"type": "SYNONYM","position": 1},{"token": "service","start_offset": 21,"end_offset": 28,"type": "ENGLISH","position": 2}]}
输出结果中,
token
es
的类型是SYNONYM
同义词。5. 执行如下命令,添加一些文档。
POST /my_index/_doc/1{"content": "tencet elasticsearch service"}POST /my_index/_doc/2{"content": "hello es"}
6. 执行如下命令,搜索同义词。
GET my_index/_search{"query" : { "match" : { "content" : "es" }},"highlight" : {"pre_tags" : ["<tag1>", "<tag2>"],"post_tags" : ["</tag1>", "</tag2>"],"fields" : {"content": {}}}}
命令执行成功后,返回如下结果。
{"took": 4,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 2,"max_score": 0.25811607,"hits": [{"_index": "my_index","_type": "_doc","_id": "2","_score": 0.25811607,"_source": {"content": "hello es"},"highlight": {"content": ["hello <tag1>es</tag1>"]}},{"_index": "my_index","_type": "_doc","_id": "1","_score": 0.25316024,"_source": {"content": "tencet elasticsearch service"},"highlight": {"content": ["tencet <tag1>elasticsearch</tag1> service"]}}]}}