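Built on an open source foundation, the Elastic Stack lets you reliably and securely take data from any source, in any format, and search, analyze, and visualize it in real time.
Elasticsearch is a JSON-based distributed search and analytics engine designed for horizontal scalability, high availability, and easy management.
Kibana presents your data as charts and provides an extensible user interface for configuring and managing all aspects of the Elastic Stack.
Logstash is a dynamic data collection pipeline with an extensible plugin ecosystem that works closely with Elasticsearch.
Beats is a platform of lightweight shippers that send data from edge machines to Logstash and Elasticsearch.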
note: Installing ES and Kibana is simple. Java 8 must be installed first; then run the commands below.
# Installed on Ubuntu 16.04. There are many ways to install; here we use the binary tarball
# 1. Download the tarball into a regular user's home directory
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.2.tar.gz
# 2. Extract it
tar -xvf elasticsearch-6.3.2.tar.gz
# 3. Move it to /opt
sudo mv elasticsearch-6.3.2 /opt
# 4. In elasticsearch.yml, set network.host to 0.0.0.0; see the official documentation for the other settings
cd /opt/elasticsearch-6.3.2
vi config/elasticsearch.yml
# 5. Start a single node, then open host:9200 in a browser to see the ES cluster information
bin/elasticsearch
wget https://artifacts.elastic.co/downloads/kibana/kibana-6.3.2-linux-x86_64.tar.gz
shasum -a 512 kibana-6.3.2-linux-x86_64.tar.gz
tar -xzf kibana-6.3.2-linux-x86_64.tar.gz
sudo mv kibana-6.3.2-linux-x86_64 /opt
cd /opt/kibana-6.3.2-linux-x86_64
# In config/kibana.yml, set server.host: 0.0.0.0
# Start Kibana, then open host:5601 to reach the Kibana UI
bin/kibana
An Elasticsearch cluster exposes a RESTful API.
note: from here on we mainly interact with it through Kibana Dev Tools.
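Because the API is plain HTTP with JSON bodies, any HTTP client can talk to the cluster, not only Kibana. The sketch below uses just the Python standard library to build (but not send) two requests. The helper name `build_request` and the address `http://localhost:9200` are assumptions for illustration; adjust them to your own node.

```python
import json
import urllib.request


def build_request(method, path, body=None, base_url="http://localhost:9200"):
    """Build an HTTP request for an Elasticsearch REST endpoint.

    `base_url` is an assumed address; replace it with your own node.
    """
    data = None
    headers = {}
    if body is not None:
        # Elasticsearch 6.x requires an explicit JSON content type.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url + path, data=data, headers=headers, method=method
    )


health = build_request("GET", "/_cat/health?v")
index_doc = build_request("PUT", "/customer/_doc/1", {"name": "John Doe"})

# To actually send a request against a running cluster:
# with urllib.request.urlopen(health) as resp:
#     print(resp.read().decode("utf-8"))
```

The same method and path pairs are what you type into the Kibana Dev Tools console, which fills in the host and headers for you.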
GET /_cat/health?v
# Result
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1534319381 15:49:41  elasticsearch green           3         3    118  59    0    0        0             0                  -                100.0%
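The cluster health (status) has three possible values: green (all primary and replica shards are allocated), yellow (all primary shards are allocated, but at least one replica is not), and red (at least one primary shard is not allocated).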
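If you want to check the status from a script rather than by eye, the `_cat` output is easy to parse because it is a whitespace-separated table. The following is a minimal sketch that works on the sample output above; `parse_cat_table` and the `MEANING` table are names made up for this example.

```python
def parse_cat_table(text):
    """Turn whitespace-separated `_cat` output (with the ?v header) into dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Sample output copied from the health check above.
sample = """\
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1534319381 15:49:41 elasticsearch green 3 3 118 59 0 0 0 0 - 100.0%
"""

# One-line summaries of the three health states.
MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primaries are allocated, some replicas are not",
    "red": "at least one primary shard is not allocated",
}

row = parse_cat_table(sample)[0]
print(row["cluster"], row["status"], "-", MEANING[row["status"]])
```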
GET /_cat/nodes?v
# Result (my ES cluster has three nodes)
ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.100.97.207           30          96  13    0.15    0.08     0.08 mdi       *      master
10.100.97.246           68          96   3    0.00    0.00     0.00 mdi       -      hadoop2
10.100.98.22            15          97   2    0.00    0.02     0.04 mdi       -      hadoop3
GET /_cat/indices?v
# Result
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open logstash-2015.05.20 4BjPjpq6RhOSCNUPMsY0MQ 5 1 4750 0 46.8mb 24.5mb
green open logstash-2015.05.18 mDkUKHSWR0a8UeZlKzts8Q 5 1 4631 0 45.6mb 23.8mb
green open hockey g1omiazvRSOE117w_uy_wA 5 1 11 0 45.3kb 22.6kb
green open .kibana AGdo8im_TxC04ARexUxqxw 1 1 143 10 665.6kb 332.8kb
green open shakespeare 5009bDa7T16f5qTeyOdTlw 5 1 111396 0 43.9mb 22mb
green open logstash-2015.05.19 az4Jen4nT7-J9yRYpZ0A9A 5 1 4624 0 44.7mb 23.1mb
...
# Index a document
PUT /customer/_doc/1?pretty
{
  "name": "John Doe"
}
# Result
{
  "_index": "customer",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 1
}
# Retrieve the document
GET /customer/_doc/1
# Result
{
  "_index": "customer",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "name": "John Doe"
  }
}
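note: customer is the index name, _doc is the type, and 1 is the document _id. In ES 6.x it is recommended to always use _doc as the type, because types will be removed in later versions. If no document id is specified, ES assigns an _id to the document automatically.
GET /_cat/indices?v
The output now includes the customer index.
The underscore-prefixed fields in the responses (_index, _type, _id, _version, and so on) are metadata that describe the document.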
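A document is routed to a primary shard with the following formula, where _routing defaults to the document _id:
shard_num = hash(_routing) % num_primary_shards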
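To make the routing formula concrete, here is a small sketch of it in Python. Note the assumption: `zlib.crc32` is only a stand-in hash so that the example runs anywhere; Elasticsearch uses its own hash function internally, so the shard numbers printed here will not match a real cluster. The function name `pick_shard` is hypothetical.

```python
import zlib


def pick_shard(routing, num_primary_shards):
    """Map a routing value to a primary shard number.

    zlib.crc32 is only a stand-in so the example runs anywhere;
    Elasticsearch uses its own hash function internally.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# By default the routing value is the document _id.
for doc_id in ["1", "2", "3"]:
    print(doc_id, "->", pick_shard(doc_id, 5))
```

Because the result depends on num_primary_shards, changing that number would send the same _id to a different shard, which is why the number of primary shards is fixed when an index is created.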
DELETE customer
# Result
{
  "acknowledged": true
}
GET /_cat/indices?v
# Listing the indices again shows that customer no longer exists; it has been deleted
PUT /customer/_doc/1?pretty
{
  "name": "John Doe"
}
POST /customer/_doc/1/_update
{
  "doc": { "name": "Jane Doe" }
}
POST /customer/_doc/1/_update
{
  "doc": { "name": "Jane Doe", "age": 20 }
}
# The _version value increases with every request
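What a partial update does can be modelled in a few lines: the "doc" part is merged into the stored source, and the version goes up when the merge changes something. This is a simplified in-memory model, not Elasticsearch code; `merge_partial` and `apply_update` are hypothetical names, and the no-change branch reflects the default noop detection as I understand it.

```python
def merge_partial(source, partial):
    """Recursively merge `partial` into a copy of `source`."""
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = value
    return merged


def apply_update(stored, partial):
    """Return the new (version, source) pair after a partial update."""
    version, source = stored
    merged = merge_partial(source, partial)
    if merged == source:
        # Nothing changed, so the version stays the same (a "noop").
        return version, source
    return version + 1, merged


doc = (1, {"name": "John Doe"})
doc = apply_update(doc, {"name": "Jane Doe"})
doc = apply_update(doc, {"name": "Jane Doe", "age": 20})
print(doc)
```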
DELETE /customer/_doc/2
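ES provides the _bulk API for batch operations, which makes indexing, updating, and deleting many documents more efficient.
The _bulk API supports four action types: index, create, delete, and update.
# _bulk tasks:
# 1. index: create the document with id 3 in the customer index
# 2. delete: delete the document with id 3 from the customer index
# 3. create: create the document with id 3 in the customer index
# 4. update: update the document with id 3 in the customer index
POST _bulk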
{"index":{"_index":"customer","_type":"_doc","_id":"3"}}
{"name":"whirly"}
{"delete":{"_index":"customer","_type":"_doc","_id":"3"}}
{"create":{"_index":"customer","_type":"_doc","_id":"3"}}
{"name":"whirly2"}
{"update":{"_index":"customer","_type":"_doc","_id":"3"}}
{"doc":{"name":"whirly3"}}
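The bulk body is newline-delimited JSON: one action line, followed by a source line for every action except delete. If you generate it from a program, a helper like the one below keeps the format straight. This is a sketch; `build_bulk_body` is a made-up name, and it only builds the text without sending it.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples as newline-delimited JSON.

    `source` is None for delete, which has no second line.
    """
    lines = []
    for action, meta, source in operations:
        lines.append(json.dumps({action: meta}))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


meta = {"_index": "customer", "_type": "_doc", "_id": "3"}
body = build_bulk_body([
    ("index", meta, {"name": "whirly"}),
    ("delete", meta, None),
    ("create", meta, {"name": "whirly2"}),
    ("update", meta, {"doc": {"name": "whirly3"}}),
])
print(body, end="")
```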
note:
A simple data set with the following structure:
{ "account_number": 0, "balance": 16623, "firstname": "Bradshaw", "lastname": "Mckenzie", "age": 29, "gender": "F", "address": "244 Columbus Place", "employer": "Euron", "email": "bradshawmckenzie@euron.com", "city": "Hobucken", "state": "CO"}
Import this data set into ES:
# Download
wget https://raw.githubusercontent.com/elastic/elasticsearch/master/docs/src/test/resources/accounts.json
# Import
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"
The command above uses the _bulk API to insert the contents of accounts.json into the bank index, with _doc as the type.
# Contents of accounts.json:
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
...
# After the import, the bank index contains 1000 documents
GET bank/_search
Task: query all documents, sorted by the account_number field in ascending order.
GET /bank/_search?q=*&sort=account_number:asc&pretty
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}
Result
{
  "took": 41,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": null,
    "hits": [
      {
        "_index": "bank",
        "_type": "account",
        "_id": "0",
        "_score": null,
        "_source": {
          "account_number": 0,
          "balance": 16623,
          "firstname": "Bradshaw",
          "lastname": "Mckenzie",
          "age": 29,
          "gender": "F",
          "address": "244 Columbus Place",
          "employer": "Euron",
          "email": "bradshawmckenzie@euron.com",
          "city": "Hobucken",
          "state": "CO"
        },
        "sort": [ 0 ]
      }
      ...
    ]
  }
}
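Meaning of each field in the response:
took: time in milliseconds that Elasticsearch took to run the search
timed_out: whether the search timed out
_shards: how many shards were searched, and how many of them succeeded, were skipped, or failed
hits.total: total number of documents matching the query
hits.hits: the array of actual results (the first 10 documents by default)
hits.sort: the sort key of each result (absent when sorting by score)
_score and max_score: relevance scores, null here because the results are sorted by a field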
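In client code you usually only need a few of these fields. The sketch below pulls them out of a response that has already been decoded from JSON; it assumes the 6.x response shape shown above, where hits.total is a plain number, and `summarize` is a name invented for this example.

```python
def summarize(response):
    """Pull the most commonly used fields out of a search response."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "total": hits["total"],
        "sources": [hit["_source"] for hit in hits["hits"]],
    }


# Trimmed-down copy of the response shown above.
response = {
    "took": 41,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
    "hits": {
        "total": 1000,
        "max_score": None,
        "hits": [
            {
                "_index": "bank",
                "_id": "0",
                "_score": None,
                "_source": {"account_number": 0, "firstname": "Bradshaw"},
                "sort": [0],
            }
        ],
    },
}

summary = summarize(response)
print(summary["total"], summary["sources"][0]["firstname"])
```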
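This is the JSON-based query language that Elasticsearch provides. Let's get familiar with it through a small task.
Task requirements, as expressed by the request below: find accounts whose firstname starts with "R" and whose age is between 20 and 30 inclusive; sort by age in descending order and return the first 10 results; return only the firstname, city, address, email, and balance fields; highlight the matches in firstname; and compute the average balance.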
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match_phrase_prefix": { "firstname": "R" } }
      ],
      "filter": {
        "range": { "age": { "gte": 20, "lte": 30 } }
      }
    }
  },
  "from": 0,
  "size": 10,
  "sort": [
    { "age": { "order": "desc" } }
  ],
  "_source": [ "firstname", "city", "address", "email", "balance" ],
  "highlight": {
    "fields": { "firstname": {} }
  },
  "aggs": {
    "avg_age": {
      "avg": { "field": "balance" }
    }
  }
}
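In this request: query holds the search conditions, from and size control paging, sort sets the ordering, _source limits which fields are returned, highlight marks the matching fragments, and aggs defines the aggregations.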
GET /bank/_search
{
  "query": { "match_all": {} },
  "size": 2
}
GET /bank/_search
{
  "query": {
    "match": { "address": "mill lane" }
  }
}
GET /bank/_search
{
  "query": {
    "match_phrase": { "address": "mill lane" }
  }
}
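note: the difference between match and match_phrase: match returns documents whose address contains the term "mill" or the term "lane", while match_phrase returns only documents whose address contains the exact phrase "mill lane".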
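The difference between the two queries is easier to see on a toy model. The sketch below imitates the matching rules in plain Python: match succeeds if any query term occurs in the field, match_phrase only if the terms occur next to each other in order. The tokenizer is a very rough stand-in for a real analyzer, scoring is ignored, and the sample addresses are made up for illustration.

```python
def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase and split on spaces."""
    return text.lower().split()


def match(field_value, query):
    """True if the field contains at least one of the query terms."""
    terms = set(tokenize(field_value))
    return any(term in terms for term in tokenize(query))


def match_phrase(field_value, query):
    """True if the query terms appear consecutively and in order."""
    tokens, phrase = tokenize(field_value), tokenize(query)
    return any(
        tokens[i:i + len(phrase)] == phrase
        for i in range(len(tokens) - len(phrase) + 1)
    )


# Made-up sample addresses for illustration.
addresses = ["990 Mill Road", "198 Mill Lane", "715 Bristol Lane"]
print([a for a in addresses if match(a, "mill lane")])
print([a for a in addresses if match_phrase(a, "mill lane")])
```

The first list contains all three addresses, because each has "mill" or "lane"; the second contains only "198 Mill Lane".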
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": { "balance": { "gte": 20000, "lte": 30000 } }
      }
    }
  }
}
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": { "field": "state.keyword" },
      "aggs": {
        "average_balance": {
          "avg": { "field": "balance" }
        }
      }
    }
  }
}
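Conceptually, the request above is a group-by on state with an average per group. As a rough local equivalent, here is the same computation in plain Python; the function name `average_balance_by` and the sample rows are made up and not taken from accounts.json.

```python
from collections import defaultdict


def average_balance_by(accounts, field):
    """Group accounts by `field` and return {value: (count, avg balance)}."""
    groups = defaultdict(list)
    for account in accounts:
        groups[account[field]].append(account["balance"])
    return {
        key: (len(balances), sum(balances) / len(balances))
        for key, balances in groups.items()
    }


# Made-up sample rows, not taken from accounts.json.
accounts = [
    {"state": "CO", "balance": 100},
    {"state": "CO", "balance": 300},
    {"state": "IL", "balance": 50},
]
print(average_balance_by(accounts, "state"))
```

One difference to keep in mind: a terms aggregation returns only the top buckets by document count (10 by default), while this sketch returns every group.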
Compute the average balance of men and women separately for three age ranges: 20 to 30, 30 to 40, and 40 to 50.
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          { "from": 20, "to": 30 },
          { "from": 30, "to": 40 },
          { "from": 40, "to": 50 }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": { "field": "gender.keyword" },
          "aggs": {
            "average_balance": {
              "avg": { "field": "balance" }
            }
          }
        }
      }
    }
  }
}
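A detail that is easy to miss in the nested aggregation above: in a range aggregation each bucket includes its from value and excludes its to value, so an account with age 30 lands in the 30 to 40 bucket only. The sketch below reproduces that bucketing and the nested gender grouping locally. The function names, the "20-30" style bucket labels, and the sample rows are all assumptions made for this example.

```python
def age_bucket(age, ranges=((20, 30), (30, 40), (40, 50))):
    """Return the bucket label for `age`, or None if it is outside all ranges.

    As in the range aggregation, `from` is inclusive and `to` is exclusive.
    """
    for low, high in ranges:
        if low <= age < high:
            return f"{low}-{high}"
    return None


def average_balance_by_age_and_gender(accounts):
    """Nest a gender grouping inside an age-range grouping."""
    sums = {}
    for account in accounts:
        bucket = age_bucket(account["age"])
        if bucket is None:
            continue  # outside every range, so it is not counted
        key = (bucket, account["gender"])
        total, count = sums.get(key, (0, 0))
        sums[key] = (total + account["balance"], count + 1)
    return {key: total / count for key, (total, count) in sums.items()}


# Made-up sample rows for illustration.
accounts = [
    {"age": 29, "gender": "F", "balance": 100},
    {"age": 30, "gender": "F", "balance": 200},
    {"age": 39, "gender": "F", "balance": 400},
    {"age": 50, "gender": "M", "balance": 999},
]
result = average_balance_by_age_and_gender(accounts)
print(result)
```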
For more content, visit my personal blog: http://laijianfeng.org