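A few years ago I set up host monitoring with icinga2 and the nagios plugins. Later, to make network devices easier to monitor, I added Centreon to collect SNMP data, and together with a few plugins I wrote myself this covers most of what I need (hosts, network devices, various operating systems, databases, middleware and assorted services).
A few days ago I wanted to add the number of connected OpenVPN users to the monitoring and found that my existing approach did not cover it: the count has to be produced by a script running on the ovpn server, and the icinga2 server then has to use check_snmp to trigger that script through snmpd's extend feature and read back the result.
The steps look simple, but there are a few pitfalls along the way that are worth writing down.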
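Getting the user count on the ovpn server
I did not feel like writing this myself, so I picked up an existing python script. It is a little more complex than necessary; writing a simpler one yourself is perfectly fine.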
A shell script would do just as well, as long as its output follows the usual plugin format.
# cat /etc/snmp/scripts/check_ovpn_users.py
# check_ovpn_users.py - a script for checking the
# amount of OpenVPN users
#
# 2016 By Christian Stankowic
# <info at stankowic hyphen development dot net>
# https://github.com/stdevel
from optparse import OptionParser, OptionGroup
import logging
import re
import sys

LOGGER = logging.getLogger("check_ovpn_users")
log = []
matches = []
state = 0

def set_code(code):
    # remember the worst (highest) state seen so far
    global state
    if code > state:
        state = code

def get_return_str():
    # map the numeric plugin state to its label
    if state == 3: return "UNKNOWN"
    elif state == 2: return "CRITICAL"
    elif state == 1: return "WARNING"
    else: return "OK"

def check_users():
    # every log line that starts with an IPv4 address counts as one client
    for line in log:
        ips = re.findall(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", line)
        if len(ips) > 0:
            matches.append(ips[0])
    LOGGER.debug("Found the following clients: {0}".format(",".join(matches)))
    if len(matches) > options.users_crit:
        set_code(2)
        snip_users = "OpenVPN users CRITICAL ({0})".format(len(matches))
    elif len(matches) > options.users_warn:
        set_code(1)
        snip_users = "OpenVPN users WARNING ({0})".format(len(matches))
    else:
        snip_users = "OpenVPN users OK ({0})".format(len(matches))
    # only emit the " | " separator when performance data was requested
    perfdata = ""
    if options.show_perfdata:
        perfdata = " | 'vpn_users'={0};{1};{2}".format(len(matches), options.users_warn, options.users_crit)
    print("{0}: {1}{2}".format(get_return_str(), snip_users, perfdata))
    sys.exit(state)

def get_log():
    # read the status log; an empty file is reported as UNKNOWN
    global log
    with open(options.log_file, 'r') as my_log:
        log = my_log.read().splitlines()
    if len(log) == 0:
        print("UNKNOWN: Log file seems to be invalid!")
        sys.exit(3)

if __name__ == "__main__":
    desc = '''%prog is used to check the amount of logged in OpenVPN users.
Checkout the GitHub page for updates: https://github.com/stdevel/check_ovpn_users'''
    parser = OptionParser(description=desc, version="%prog version 0.5.0")
    gen_opts = OptionGroup(parser, "Generic options")
    usr_opts = OptionGroup(parser, "User options")
    parser.add_option_group(gen_opts)
    parser.add_option_group(usr_opts)
    gen_opts.add_option("-d", "--debug", dest="debug", default=False, action="store_true", help="enable debugging outputs")
    gen_opts.add_option("-P", "--show-perfdata", dest="show_perfdata", default=False, action="store_true", help="enables performance data (default: no)")
    gen_opts.add_option("-f", "--log-file", dest="log_file", default="/var/run/ovpnserver.log", action="store", help="defines the OpenVPN server log file (default: /var/run/ovpnserver.log)")
    # type="int" so that thresholds given on the command line are compared as numbers
    usr_opts.add_option("-w", "--users-warning", dest="users_warn", default=10, type="int", action="store", metavar="NUMBER", help="defines the user warning threshold (default: 10)")
    usr_opts.add_option("-c", "--users-critical", dest="users_crit", default=20, type="int", action="store", metavar="NUMBER", help="defines the user critical threshold (default: 20)")
    (options, args) = parser.parse_args()
    if options.debug:
        logging.basicConfig(level=logging.DEBUG)
        LOGGER.setLevel(logging.DEBUG)
    else:
        logging.basicConfig()
        LOGGER.setLevel(logging.INFO)
    LOGGER.debug("OPTIONS: {0}".format(options))
    get_log()
    check_users()
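Give it a test run. The important option is "-f", which points at the ovpn status log; "-P" adds performance data; "-w" and "-c" are the warning and critical thresholds, which default to 10 and 20 when omitted.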
# python /etc/snmp/scripts/check_ovpn_users.py -f /var/log/openvpn-status.log -P -w 30 -c 50
OK: OpenVPN users OK (15) | 'vpn_users'=15;30;50
# python /etc/snmp/scripts/check_ovpn_users.py -f /var/log/openvpn-status.log -P
WARNING: OpenVPN users WARNING (15) | 'vpn_users'=15;10;20
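As noted at the start of this section, a simpler script would also do. Here is a minimal sketch of what that might look like, reusing the same heuristic as the script above (a line that starts with an IPv4 address counts as one client). The names count_clients and format_result are my own and not part of the original plugin, and the thresholds are passed in by the caller.

```python
import re

# Same heuristic as the script above: a line that begins with an IPv4
# address is treated as one connected client.
IPV4_AT_START = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}")


def count_clients(lines):
    """Count the lines that begin with an IPv4 address."""
    return sum(1 for line in lines if IPV4_AT_START.match(line))


def format_result(count, warn, crit):
    """Return (exit_code, output_line) in the usual plugin format."""
    if count > crit:
        code, label = 2, "CRITICAL"
    elif count > warn:
        code, label = 1, "WARNING"
    else:
        code, label = 0, "OK"
    line = "{0}: OpenVPN users {0} ({1}) | 'vpn_users'={1};{2};{3}".format(
        label, count, warn, crit)
    return code, line


def check_file(path, warn=10, crit=20):
    """Read a status log and return (exit_code, output_line)."""
    with open(path) as status_log:
        return format_result(count_clients(status_log), warn, crit)
```

A caller would print the returned line and exit with the returned code, for example after check_file("/var/log/openvpn-status.log", 30, 50).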
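Calling the python script from a shell script
snmpd's extend feature simply runs an executable, and the python script as shown here has no shebang line, so I wrap it in a small shell script that calls the interpreter and also fixes the arguments: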
# cat /etc/snmp/scripts/check_ovpn_users.sh
#! /bin/bash
python /etc/snmp/scripts/check_ovpn_users.py -f /var/log/openvpn-status.log -P -w 20 -c 30
Remember to make the shell script executable:
# chmod +x /etc/snmp/scripts/check_ovpn_users.sh
Calling the shell script through snmp
This part is easy. Append one line to the snmp service configuration file /etc/snmp/snmpd.conf:
extend ovpn_users /etc/snmp/scripts/check_ovpn_users.sh
After restarting the snmpd service, the snmp output contains a few additional lines:
# snmpwalk -v 2c -c public 192.168.1.20 NET-SNMP-EXTEND-MIB::nsExtendObjects
NET-SNMP-EXTEND-MIB::nsExtendNumEntries.0 = INTEGER: 3
NET-SNMP-EXTEND-MIB::nsExtendCommand."ovpn_users" = STRING: /etc/snmp/scripts/check_ovpn_users.sh
NET-SNMP-EXTEND-MIB::nsExtendArgs."ovpn_users" = STRING:
NET-SNMP-EXTEND-MIB::nsExtendInput."ovpn_users" = STRING:
NET-SNMP-EXTEND-MIB::nsExtendCacheTime."ovpn_users" = INTEGER: 5
NET-SNMP-EXTEND-MIB::nsExtendExecType."ovpn_users" = INTEGER: exec(1)
NET-SNMP-EXTEND-MIB::nsExtendRunType."ovpn_users" = INTEGER: run-on-read(1)
NET-SNMP-EXTEND-MIB::nsExtendStorage."ovpn_users" = INTEGER: permanent(4)
NET-SNMP-EXTEND-MIB::nsExtendStatus."ovpn_users" = INTEGER: active(1)
NET-SNMP-EXTEND-MIB::nsExtendOutput1Line."ovpn_users" = STRING: OK: OpenVPN users OK (20) | 'vpn_users'=20;20;30
NET-SNMP-EXTEND-MIB::nsExtendOutputFull."ovpn_users" = STRING: OK: OpenVPN users OK (20) | 'vpn_users'=20;20;30
NET-SNMP-EXTEND-MIB::nsExtendOutNumLines."ovpn_users" = INTEGER: 1
NET-SNMP-EXTEND-MIB::nsExtendResult."ovpn_users" = INTEGER: 0
NET-SNMP-EXTEND-MIB::nsExtendOutLine."ovpn_users".1 = STRING: OK: OpenVPN users OK (20) | 'vpn_users'=20;20;30
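The string returned in nsExtendOutLine is an ordinary plugin output line: a status, a message, and after the pipe the performance data in the form 'label'=value;warn;crit. As a rough illustration of how that line breaks down, here is a small parser. It is a simplified sketch that assumes labels without spaces and values without units, which is true for the output above but not for plugin output in general; parse_plugin_output is a name I made up.

```python
def _num(fields, index):
    """Return fields[index] as float, or None if missing or empty."""
    if len(fields) > index and fields[index]:
        return float(fields[index])
    return None


def parse_plugin_output(line):
    """Split 'STATUS: message | perfdata' into its three parts."""
    text, _, perf = line.partition("|")
    status, _, message = text.partition(":")
    perfdata = {}
    for item in perf.split():
        label, _, values = item.partition("=")
        fields = values.split(";")
        perfdata[label.strip("'")] = {
            "value": _num(fields, 0),
            "warn": _num(fields, 1),
            "crit": _num(fields, 2),
        }
    return status.strip(), message.strip(), perfdata


status, message, perfdata = parse_plugin_output(
    "OK: OpenVPN users OK (20) | 'vpn_users'=20;20;30")
print(status, message, perfdata["vpn_users"]["value"])
```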
Next, find the corresponding OID
On the ovpn server, look up the numeric OID of nsExtendOutLine:
# snmptranslate -On NET-SNMP-EXTEND-MIB::nsExtendOutLine
.1.3.6.1.4.1.8072.1.3.2.4.1.2
On the icinga2 server, walk that OID to get the full OID of the entry:
# snmpwalk -v 2c -c public 192.168.1.20 .1.3.6.1.4.1.8072.1.3.2.4.1.2
iso.3.6.1.4.1.8072.1.3.2.4.1.2.10.111.118.112.110.95.117.115.101.114.115.1 = STRING: "OK: OpenVPN users OK (17) | 'vpn_users'=17;20;30"
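The long numeric tail in this OID is not random: it is the length of the extend name (10), followed by the ASCII code of each character in "ovpn_users", followed by the output line number (1). A short sketch that rebuilds it, in case you want to compute the OID for another extend entry without walking the tree; extend_outline_oid is a helper name of my own.

```python
# Numeric OID of NET-SNMP-EXTEND-MIB::nsExtendOutLine, as printed by
# snmptranslate above.
BASE_OUT_LINE = ".1.3.6.1.4.1.8072.1.3.2.4.1.2"


def extend_outline_oid(name, line=1):
    """Build the OID of output line `line` for the extend entry `name`."""
    codes = [str(ord(ch)) for ch in name]  # one sub-identifier per character
    return ".".join([BASE_OUT_LINE, str(len(name))] + codes + [str(line)])


print(extend_outline_oid("ovpn_users"))
```

snmpwalk prints the leading ".1" as "iso", so the result is the same OID as in the walk above.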
With the OID in hand, check_snmp can fetch the check result:
# /usr/lib/nagios/plugins/check_snmp -H 192.168.1.20 -C public --invert-search -R "CRITICAL|WARNING" -o iso.3.6.1.4.1.8072.1.3.2.4.1.2.10.111.118.112.110.95.117.115.101.114.115.1
SNMP OK - OK: OpenVPN users OK (17) | 'vpn_users'=17;20;30 |
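The combination of -R and --invert-search deserves a word. As far as I understand check_snmp, -R alone returns OK when the regex matches the value and CRITICAL otherwise, and --invert-search swaps the two. The following is only a simplified model of that behaviour, not the plugin's actual code, and regex_state is a made-up name.

```python
import re


def regex_state(value, pattern, invert=False):
    """Model of check_snmp -R: 0 (OK) on match, 2 (CRITICAL) otherwise.

    With invert=True the two outcomes are swapped, which is what
    --invert-search is assumed to do here.
    """
    matched = re.search(pattern, value) is not None
    if invert:
        matched = not matched
    return 0 if matched else 2


for text in ("OK: OpenVPN users OK (17)",
             "WARNING: OpenVPN users WARNING (25)",
             "CRITICAL: OpenVPN users CRITICAL (35)"):
    print(text, "->", regex_state(text, "CRITICAL|WARNING", invert=True))
```

One consequence, if this model is right: a WARNING reported by the script also shows up as CRITICAL in icinga2, because the regex check only has two outcomes.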
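Integrating into icinga2
All that is left is to wire this into icinga2.
For testing, I also wrote scripts that report the number of icinga2 and ovpn processes.
Defining the host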
object Host "openvpn" {
  check_command = "hostalive"
  address = "192.168.1.20"
  vars.os = "Linux"
  vars.snmp_community = "public"
  vars.snmp_oid["procIcinga"] = {
    displayName = "icinga2 process count"
    OID = "iso.3.6.1.4.1.8072.1.3.2.4.1.2.12.105.99.105.110.103.97.50.95.112.105.100.115.1"
    vars.snmp_warn = "3"
    vars.snmp_crit = "6"
  }
  vars.snmp_oid["procOvpn"] = {
    displayName = "ovpn process count"
    OID = "iso.3.6.1.4.1.8072.1.3.2.3.1.2.12.111.112.101.110.118.112.110.95.112.105.100.115"
    vars.snmp_warn = "3"
    vars.snmp_crit = "6"
  }
  vars.snmp_oid["vpnUsers"] = {
    displayName = "ovpn connected users"
    vars.snmp_warn = "10"
    vars.snmp_crit = "30"
    OID = "iso.3.6.1.4.1.8072.1.3.2.4.1.2.10.111.118.112.110.95.117.115.101.114.115.1"
  }
  vars.client_endpoint = "master"
  icon_image = "img/icons/centos.png"
  vars.notification["pager"] = {
    users = ["gly1"]
  }
}
Defining the service
apply Service for (OIDs => config in host.vars.snmp_oid) {
  check_command = "snmp"
  vars += config
  ......
  vars.snmpoid = vars.OID
  vars.snmpv3_invert_search = true
  vars.snmpv3_ereg = "CRITICAL|WARNING"
  .....
}
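If the "apply Service for" rule looks opaque, it may help to think of it as a loop over the host's snmp_oid dictionary that creates one service per entry and copies the entry's keys into the service's vars. The following python model is only an analogy under that assumption, not how icinga2 is implemented; it looks only at the displayName and OID keys and ignores the elided parts of the rule.

```python
def apply_services(host_vars):
    """Model of the apply-for rule: one service per snmp_oid entry."""
    services = {}
    for name, config in host_vars.get("snmp_oid", {}).items():
        svc_vars = dict(config)                  # vars += config
        svc_vars["snmpoid"] = svc_vars["OID"]    # vars.snmpoid = vars.OID
        svc_vars["snmpv3_invert_search"] = True
        svc_vars["snmpv3_ereg"] = "CRITICAL|WARNING"
        services[name] = {"check_command": "snmp", "vars": svc_vars}
    return services


# Trimmed-down copy of the host definition above (OIDs shortened here
# purely for readability; these placeholders are not real OIDs).
host_vars = {
    "snmp_oid": {
        "procOvpn": {"displayName": "ovpn process count", "OID": "oid-a"},
        "vpnUsers": {"displayName": "ovpn connected users", "OID": "oid-b"},
    }
}

for name, service in sorted(apply_services(host_vars).items()):
    print(name, service["vars"]["snmpoid"])
```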
Graphing
Finally, the grafana graphs are embedded into icingaweb2. I will not go into the details here; the screenshot shows the result.