[ES Home Cooking] Collecting Nginx Access Logs

Original article

Author: 张戈 (Zhang Ge)

Last modified: 2017-10-31 09:46:42


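The previous article in this series, "[ES Home Cooking] Collecting Apache Access Logs", covered shipping Apache logs into ES. With that in place, collecting logs from other web servers takes little extra work.

This article walks through collecting Nginx logs with ES.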

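1. Log format

Like Apache, Nginx can be made to write its access log as JSON through a custom log_format, which makes collection much easier. The Apache article already settled on the set of fields to log, so here we only need to swap the Apache log variables for their Nginx equivalents. The configuration is as follows: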

# Use map to obtain the client's real IP, which is much more convenient than in Apache:
# an empty X-Forwarded-For falls back to $remote_addr, otherwise the first address in the header is used
map $http_x_forwarded_for $clientRealIp {
    "" $remote_addr;
    ~^(?P<firstAddr>[0-9\.]+),?.*$ $firstAddr;
}

# Add the JSON log format
log_format access_log_json '{"access_path":"$proxy_add_x_forwarded_for","client_ip":"$clientRealIp","http_host":"$host","@timestamp":"$time_iso8601","method":"$request_method","url":"$request_uri","status":"$status","http_referer":"$http_referer","body_bytes_sent":"$body_bytes_sent","request_time":"$request_time","http_user_agent":"$http_user_agent","total_bytes_sent":"$bytes_sent","server_ip":"$server_addr"}';

# Configure the log inside the server block of the corresponding site:
access_log /data/wwwlogs/$host.log access_log_json;
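
To make the map block and the resulting log line concrete, here is a minimal Python sketch. It mirrors the two map rules with the same regular expression and then parses one line in the access_log_json format. The function name client_real_ip and all sample values are illustrative assumptions, not part of the Nginx configuration.

```python
import json
import re

# Same pattern as the map block: capture the first dotted address in X-Forwarded-For.
FIRST_ADDR = re.compile(r"^(?P<firstAddr>[0-9.]+),?.*$")


def client_real_ip(x_forwarded_for: str, remote_addr: str) -> str:
    """Mimic the map: empty header -> remote address, else first listed address."""
    if x_forwarded_for == "":
        return remote_addr
    match = FIRST_ADDR.match(x_forwarded_for)
    # A map without a "default" entry yields an empty string when nothing matches.
    return match.group("firstAddr") if match else ""


# Hypothetical log line in the access_log_json format (all values are made up).
sample_line = (
    '{"access_path":"203.0.113.7, 10.0.0.1","client_ip":"203.0.113.7",'
    '"http_host":"www.example.com","@timestamp":"2017-10-31T09:46:42+08:00",'
    '"method":"GET","url":"/index.html","status":"200","http_referer":"-",'
    '"body_bytes_sent":"612","request_time":"0.003",'
    '"http_user_agent":"curl/7.29.0","total_bytes_sent":"850","server_ip":"10.0.0.2"}'
)
record = json.loads(sample_line)

print(client_real_ip("203.0.113.7, 10.0.0.1", "10.0.0.1"))
print(sorted(record))
```

Because every value in the log_format is wrapped in double quotes, numeric fields such as status arrive as strings; the sketch leaves them that way.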

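Note: to adjust the logged fields, see the appendix at the end of this article, "Nginx log variables explained".

Tip: the rest of this article uses the same configuration as the Apache article, so readers who have already followed that one can skip ahead.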

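2. Deploying Filebeat

Follow "[ES Home Cooking] Filebeat Installation, Deployment and Configuration Explained" to deploy filebeat on each web server whose logs need collecting, then write the following configuration: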

vim filebeat.yml

############################# input #########################################
filebeat.prospectors:
- input_type: log
  paths:
    - /data/wwwlogs/*.log
  document_type: "web_access_log"

filebeat.spool_size: 1024
filebeat.idle_timeout: "5s"
name: 172.16.x.xxx

############################# Kafka #########################################
output.kafka:
  # initial brokers for reading cluster metadata
  hosts: ["x.x.x.1:9092","x.x.x.2:9092","x.x.x.3:9092"]
  # message topic selection + partitioning
  topic: '%{[type]}'
  flush_interval: 1s
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: gzip
  max_message_bytes: 1000000

############################# Logging #########################################
logging.level: info
logging.to_files: true
logging.to_syslog: false
logging.files:
  path: /data/filebeat/logs
  name: filebeat.log
  keepfiles: 7
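
The topic setting '%{[type]}' is a format string resolved per event, so with document_type set to "web_access_log" every line ends up in the Kafka topic of that name, which is the topic logstash reads from later. The sketch below imitates that lookup in Python. The event dictionary is a simplified, assumed shape of a Filebeat event (real events carry more fields), and resolve_topic is a hypothetical helper, not a Filebeat API.

```python
import re

# Simplified, assumed shape of one Filebeat event; all values are made up.
event = {
    "@timestamp": "2017-10-31T01:46:42.000Z",
    "type": "web_access_log",  # set by document_type
    "input_type": "log",
    "source": "/data/wwwlogs/www.example.com.log",
    "offset": 1024,
    "message": '{"method":"GET","url":"/index.html","status":"200"}',
}

# Matches references of the form %{[field]}.
FIELD_REF = re.compile(r"%\{\[([^\]]+)\]\}")


def resolve_topic(fmt: str, evt: dict) -> str:
    """Replace each %{[field]} reference with the event's top-level field value."""
    return FIELD_REF.sub(lambda m: str(evt[m.group(1)]), fmt)


print(resolve_topic("%{[type]}", event))  # web_access_log
```

Note that the raw Nginx JSON line travels inside the message field as a string, which is why it is parsed again on the logstash side.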

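3. Configuring the template

Before shipping any data, set up the ES template first:

Note: if you have already set up Apache log collection with the same log format, you can skip this step.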

{
  "template": "web_access_log-*",
  "mappings": {
    "log": {
      "properties": {
        "@timestamp": {
          "include_in_all": false,
          "type": "date"
        },
        "server_ip": {
          "index": "not_analyzed",
          "type": "ip"
        },
        "http_host": {
          "index": "not_analyzed",
          "type": "string"
        },
        "client_ip": {
          "index": "not_analyzed",
          "type": "ip"
        },
        "access_path": {
          "type": "string"
        },
        "method": {
          "type": "string"
        },
        "url": {
          "type": "string"
        },
        "http_referer": {
          "type": "string"
        },
        "body_bytes_sent": {
          "index": "not_analyzed",
          "type": "long"
        },
        "total_bytes_sent": {
          "index": "not_analyzed",
          "type": "long"
        },
        "status": {
          "index": "not_analyzed",
          "type": "long"
        },
        "request_time": {
          "index": "not_analyzed",
          "type": "double"
        },
        "http_user_agent": {
          "type": "string"
        }
      }
    }
  }
}

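Note: the meaning of each field is not covered here; see the series article "Introduction to Elasticsearch templates (in preparation)".

Save the template above to a file named web.json, then import it with the following command: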

curl -XPUT http://x.x.x.x:9200/_template/template-web_access_log -d @web.json

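The host is the address and port of ES
_template is the template API endpoint
template-web_access_log is the name we give this template
-d @<template file> sends the template file as the request body, importing it into ES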

On success, ES returns the following:

{
  "acknowledged" : true
}
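
For readers who prefer to script the import, the following is a minimal Python sketch of the same PUT request using only the standard library. The helper names build_template_request and put_template are assumptions made for this example, and the explicit Content-Type header is an addition: Elasticsearch 6.0 and later reject requests without it, and it is harmless on older versions.

```python
import json
import urllib.request


def build_template_request(es_url: str, name: str, template: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT request that the curl command issues."""
    body = json.dumps(template).encode("utf-8")
    return urllib.request.Request(
        url=f"{es_url}/_template/{name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def put_template(es_url: str, name: str, path: str = "web.json") -> bool:
    """Load the template file, send it, and report whether ES acknowledged it."""
    with open(path, encoding="utf-8") as fh:
        template = json.load(fh)
    request = build_template_request(es_url, name, template)
    with urllib.request.urlopen(request) as response:
        return bool(json.load(response).get("acknowledged", False))


# Usage, with the same placeholder address as the curl command:
# put_template("http://x.x.x.x:9200", "template-web_access_log")
```

Separating request construction from sending keeps the URL and body easy to inspect before anything touches the cluster.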

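4. Configuring logstash

With the template imported, configure logstash next.

Note: this is the same configuration as in the previous Apache article; if both read from the same Kafka cluster, a single configuration can be shared.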

vim logstash.conf

input {
    kafka {
        bootstrap_servers => "x.x.x.1:9092,x.x.x.2:9092,x.x.x.3:9092"
        topics => "web_access_log"
        group_id => "logstash"
        codec => json {
            charset => "UTF-8"
        }
        add_field => { "[@metadata][type]" => "web_access_log" }
    }
}

filter {
    if [@metadata][type] == "web_access_log" {
      # Escape the \x byte sequences that Nginx writes for non-ASCII characters; otherwise URLs containing Chinese cause JSON parse errors
      mutate {  
        gsub => ["message", "\\x", "\\\x"]
      }
      # Drop HEAD requests here; add further keywords to exclude as needed
      if ( 'method":"HEAD' in [message] ) {
           drop {}
      }
      json {
            source => "message"
            # Remove the raw line and the Filebeat metadata fields that are not needed in ES
            remove_field => ["message", "[beat][hostname]", "[beat][name]", "@version", "[beat][version]", "input_type", "offset", "tags", "type", "host"]
        }
    }
}

output {
    #stdout{
    #    codec => rubydebug
    #}
    if [@metadata][type] == "web_access_log" {
        elasticsearch {
            hosts => ["x.x.x.x:9200"]
            index => "web_access_log-%{+YYYY.MM.dd}"
            # Stop logstash from managing the template and name the ES template to use
            manage_template => false
            template_name => "template-web_access_log"
        }
    }
}

Because the data we ship is already JSON, no regex matching or other parsing is needed, which keeps things much simpler.
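
To show why the gsub step matters, here is a rough Python equivalent of the filter block for a single raw log line. It is a sketch of the intent described in the comments (escape \x sequences, drop HEAD requests, parse JSON), not a reimplementation of Logstash; the function name process and the sample lines are assumptions.

```python
import json
from typing import Optional


def process(message: str) -> Optional[dict]:
    """Rough Python equivalent of the filter block for one raw log line."""
    # Nginx writes some bytes as \xNN, which is not a valid JSON escape.
    # Doubling the backslash turns it into a literal "\xNN" that JSON accepts.
    message = message.replace("\\x", "\\\\x")
    # Drop HEAD requests before spending time on parsing.
    if 'method":"HEAD' in message:
        return None
    return json.loads(message)


# Hypothetical raw lines; the first URL carries the escaped UTF-8 bytes of "中".
raw_get = r'{"method":"GET","url":"/s?q=\xE4\xB8\xAD","status":"200"}'
raw_head = r'{"method":"HEAD","url":"/","status":"200"}'

try:
    json.loads(raw_get)
except json.JSONDecodeError as exc:
    print("unescaped line fails:", exc.msg)

print(process(raw_get))
print(process(raw_head))
```

After the replacement the URL is stored with the literal text \xE4\xB8\xAD rather than the decoded character, which is the trade-off of this workaround.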

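5. Configuring Kibana

Once logstash is started and shipping data, the index still has to be set up in Kibana:

① Open index management.

② Click to create an index.

③ Enter the index prefix specified in logstash; once the fields are picked up automatically, select the timestamp field and click [Create].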
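
The prefix entered in step ③ has to match the index names that logstash creates from "web_access_log-%{+YYYY.MM.dd}". As a hedged illustration, the sketch below derives that name for one event in Python, assuming the date part is taken from @timestamp in UTC (which is how Logstash formats its date pattern), and checks it against the template pattern. index_for is a hypothetical helper and the timestamp is made up.

```python
from datetime import datetime, timezone
from fnmatch import fnmatch


def index_for(timestamp_iso8601: str, prefix: str = "web_access_log") -> str:
    """Daily index name for an event, using the UTC date of its timestamp."""
    local = datetime.fromisoformat(timestamp_iso8601)
    utc = local.astimezone(timezone.utc)
    return f"{prefix}-{utc:%Y.%m.%d}"


# $time_iso8601 carries the server's UTC offset, for example +08:00.
name = index_for("2017-10-31T07:30:00+08:00")
print(name)                               # web_access_log-2017.10.30
print(fnmatch(name, "web_access_log-*"))  # True
```

An event logged early in the morning local time can therefore land in the previous day's index, which is worth keeping in mind when looking for a specific day's data.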

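Finally, return to the Discover page, where the collected logs are now displayed.

That is all for this article. For more Kibana tips and tricks, see "ES Home Cooking series: Mastering Kibana (in preparation)".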

6. Appendix: Nginx log variables explained

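$args                    # arguments in the request line
$query_string            # same as $args
$arg_NAME                # value of the argument NAME in the request line
$is_args                 # "?" if the request line has arguments, otherwise an empty string
$uri                     # current URI of the request (without arguments, which are in $args); it can differ from the $request_uri sent by the browser because internal redirects or the index directive may change it; it does not include the host name, e.g. "/foo/bar.html"
$document_uri            # same as $uri
$document_root           # document root or alias for the current request
$host                    # in order of precedence: the host name from the request line, the "Host" request header field, or, if neither is available, the name of the server processing the request
$hostname                # host name
$https                   # "on" if SSL is enabled for the connection, otherwise an empty string
$binary_remote_addr      # client address in binary form; always 4 bytes long for IPv4 addresses
$body_bytes_sent         # number of bytes sent to the client, not counting the response header; compatible with the "%B" parameter of Apache's mod_log_config module
$bytes_sent              # number of bytes sent to the client
$connection              # serial number of the TCP connection
$connection_requests     # current number of requests made through the TCP connection
$content_length          # "Content-Length" request header field
$content_type            # "Content-Type" request header field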
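$cookie_name             # value of the cookie called name
$limit_rate              # used to set the response rate limit
$msec                    # current Unix timestamp in seconds, with millisecond resolution
$nginx_version           # nginx version
$pid                     # PID of the worker process
$pipe                    # "p" if the request was pipelined, otherwise "."
$proxy_protocol_addr     # client address taken from the PROXY protocol header; an empty string if the client connects directly
$realpath_root           # real path of the document root or alias for the current request, with all symbolic links resolved
$remote_addr             # client address
$remote_port             # client port
$remote_user             # user name supplied with HTTP Basic authentication
$request                 # full original request line
$request_body            # request body; available in locations where the body is passed on to the next server by proxy_pass, fastcgi_pass, uwsgi_pass or scgi_pass
$request_body_file       # name of the temporary file holding the request body; the file must be removed once processing is finished. To always write the body to a file, enable client_body_in_file_only. When the file name is passed to a backend server, passing the request body itself must be disabled with proxy_pass_request_body off, fastcgi_pass_request_body off, uwsgi_pass_request_body off or scgi_pass_request_body off
$request_completion      # "OK" if the request has completed; empty if the request has not completed or is not the last part of a range request
$request_filename        # file path for the current request, built from the root or alias directive and the request URI
$request_length          # request length (including the request line, request header and request body)
$request_method          # HTTP request method, usually "GET" or "POST"
$request_time            # time spent processing the request, in seconds with millisecond resolution; measured from the first bytes read from the client until the log is written after the last bytes have been sent to the client
$request_uri             # full original request URI including arguments; it cannot be modified (see $uri for the rewritten URI) and does not include the host name, e.g. "/cnphp/test.php?arg=freemouse"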
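$scheme                  # request scheme, "http" or "https"
$server_addr             # address of the server that accepted the request; obtaining it normally requires a system call, which can be avoided by setting the IP address explicitly in the configuration
$server_name             # server name
$server_port             # server port
$server_protocol         # HTTP version of the request, usually "HTTP/1.0" or "HTTP/1.1"
$status                  # HTTP response status code
$time_iso8601            # server time in ISO 8601 format
$time_local              # server time in Common Log Format
$cookie_NAME             # cookie from the request's Cookie header; the prefix "$cookie_" followed by a cookie name gives the value of that cookie
$http_NAME               # any request header field; NAME is the header name in lowercase with dashes replaced by underscores, e.g. $http_accept_language for the "Accept-Language" header
$http_cookie
$http_host               # requested host, i.e. the address typed into the browser (IP or domain name)
$http_referer            # referring URL, recording which page the visitor followed a link from
$http_user_agent         # information about the client's browser or other user agent
$http_x_forwarded_for
$sent_http_NAME          # any response header field; NAME is the header name in lowercase with dashes replaced by underscores, e.g. $sent_http_content_length for the Content-Length response header
$sent_http_cache_control
$sent_http_connection
$sent_http_content_type
$sent_http_keep_alive
$sent_http_last_modified
$sent_http_location
$sent_http_transfer_encoding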
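
Several of the variables above are different views of the same request line. As an illustration only, the Python sketch below splits the sample $request_uri from the table into the parts that $uri, $args, $is_args and $arg_NAME would hold for a request with no rewrites or internal redirects. It ignores the decoding and normalization Nginx applies to $uri, and split_request_uri is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit


def split_request_uri(request_uri: str) -> dict:
    """Approximate $uri, $args, $is_args and $arg_NAME for an unrewritten request."""
    parts = urlsplit(request_uri)
    first_values = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "uri": parts.path,                       # roughly $uri
        "args": parts.query,                     # $args / $query_string
        "is_args": "?" if parts.query else "",   # $is_args
        "arg": first_values,                     # $arg_NAME lookups
    }


info = split_request_uri("/cnphp/test.php?arg=freemouse")
print(info["uri"], info["is_args"], info["args"], info["arg"]["arg"])
```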

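Original content notice: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

To report infringement, contact cloudcommunity@tencent.com to have the content removed.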
