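Nagios Integration
Nagios is a system and network monitoring program that checks hosts and services and notifies users when a problem occurs and when it is resolved. It is open-source software released under GPLv2 and is free to obtain and use. Nagios was originally named NetSaint; it was created by Ethan Galstad, who still maintains it.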
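Steps to integrate Nagios with CA
- Create a nagios application in the 睿象云 Cloud Alert (CA) console and obtain its appkey.
- Install the CA agent on the Nagios server.
- Download the CA agent

      wget https://download.aiops.com/ca_agent/nagios/ca_agent-4.1.3.1-linux-x64.tar.gz   # download as the root or nagios user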
- Install the agent

  Note: the commands below assume the default Nagios installation path /usr/local/nagios/. If your Nagios server is installed elsewhere, substitute your own path.

      tar -xzf ca_agent-4.1.3.1-linux-x64.tar.gz
      cp -R ca_agent /usr/local/nagios/libexec/
      cp ca_agent/plugin/nagios-plugin/nagios /usr/local/nagios/libexec/
      chmod +x /usr/local/nagios/libexec/nagios
      cp ca_agent/plugin/nagios-plugin/cloudalert.cfg /usr/local/nagios/etc/objects/
- Modify the configuration

  Edit /usr/local/nagios/etc/objects/cloudalert.cfg and set pager to the appkey of your CA application.

      vi /usr/local/nagios/etc/objects/cloudalert.cfg

      define contact{
          contact_name                    cloudalert                    ; name of this contact
          alias                           ca
          service_notification_period     24x7                          ; service notifications can be sent anytime
          host_notification_period        24x7                          ; host notifications can be sent anytime
          service_notification_options    w,u,c,r,f,s                   ; send notifications for all service states, flapping events, and scheduled downtime events
          host_notification_options       d,u,r,f,s                     ; send notifications for all host states, flapping events, and scheduled downtime events
          service_notification_commands   notify-service-by-cloudalert  ; send service notifications via Cloud Alert
          host_notification_commands      notify-host-by-cloudalert     ; send host notifications via Cloud Alert
          pager                           --                            ; replace -- with the appkey generated when you created the application
      }

      # 'notify-host-by-cloudalert' command definition
      define command{
          command_name    notify-host-by-cloudalert
          command_line    $USER1$/nagios "alarmName:$NOTIFICATIONTYPE$ Host Alert: $HOSTADDRESS$ is $HOSTSTATE$" "alarmContent:$HOSTNAME$/$HOSTADDRESS$ $HOSTOUTPUT$ Date/Time: $SHORTDATETIME$" "entityName:$HOSTADDRESS$" "priority:$HOSTSTATE$" "app:$CONTACTPAGER$" "eventType:$NOTIFICATIONTYPE$"
      }

      # 'notify-service-by-cloudalert' command definition
      define command{
          command_name    notify-service-by-cloudalert
          command_line    $USER1$/nagios "alarmName:$NOTIFICATIONTYPE$ Service Alert: $HOSTADDRESS$/$SERVICEDESC$ is $SERVICESTATE$" "alarmContent:$HOSTALIAS$/$HOSTADDRESS$/$SERVICEDESC$ $SERVICEOUTPUT$ Date/Time: $SHORTDATETIME$" "entityName:$HOSTADDRESS$/$SERVICEDESC$" "priority:$SERVICESTATE$" "app:$CONTACTPAGER$" "eventType:$NOTIFICATIONTYPE$"
      }
  Edit /usr/local/nagios/etc/objects/contacts.cfg and add cloudalert to the default contact group.

      vi /usr/local/nagios/etc/objects/contacts.cfg

      define contactgroup{
          contactgroup_name   admins
          alias               Nagios Administrators
          members             nagiosadmin,cloudalert
      }
  Edit /usr/local/nagios/etc/nagios.cfg and register cloudalert.cfg in it.

      vi /usr/local/nagios/etc/nagios.cfg

      cfg_file=/usr/local/nagios/etc/objects/cloudalert.cfg
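  Optional: to make alert messages easier to read, change the date_format setting in nagios.cfg from us to iso8601, so that the line reads date_format=iso8601.

      vi /usr/local/nagios/etc/nagios.cfg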
- Restart Nagios

  Check that the configuration is valid before restarting.

      /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

  Restart Nagios as root.

      service nagios restart
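The pager, cfg_file, and date_format edits described above can also be applied without opening an editor. The following is a minimal sketch, assuming GNU or BSD sed and the default /usr/local/nagios layout used in this guide. The function names (set_appkey, enable_cloudalert_cfg, use_iso8601_dates) are illustrative and are not part of the CA agent; each sed call leaves a .bak copy of the file it changes.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not shipped with the CA agent.

# Replace the "--" placeholder on the pager line with the real appkey.
set_appkey() {
    local cfg="$1" appkey="$2"
    sed -i.bak -E "s|^([[:space:]]*pager[[:space:]]+)--|\1${appkey}|" "$cfg"
}

# Add the cfg_file entry to nagios.cfg unless that exact line already exists.
enable_cloudalert_cfg() {
    local nagios_cfg="$1" line="cfg_file=$2"
    grep -qxF -- "$line" "$nagios_cfg" || printf '%s\n' "$line" >> "$nagios_cfg"
}

# Switch the timestamp format from us (or any other value) to iso8601.
use_iso8601_dates() {
    local nagios_cfg="$1"
    sed -i.bak -E 's|^date_format=.*|date_format=iso8601|' "$nagios_cfg"
}

# Example (run as root or nagios; APPKEY is a placeholder you must supply):
#   NAGIOS_ETC=/usr/local/nagios/etc
#   set_appkey "$NAGIOS_ETC/objects/cloudalert.cfg" "$APPKEY"
#   enable_cloudalert_cfg "$NAGIOS_ETC/nagios.cfg" "$NAGIOS_ETC/objects/cloudalert.cfg"
#   use_iso8601_dates "$NAGIOS_ETC/nagios.cfg"
```

The contacts.cfg change is left out because the members line depends on the contacts already defined on your server. Running the nagios -v check afterwards confirms that the edited files still parse.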
Verifying the integration
Log in to the Nagios web console and send a notification.
Warning
Make sure notifications_enabled is set to 1 for the service in question.

    define service{
        use                     local-service   ; Name of service template to use
        host_name               localhost
        service_description     Tomcat18080
        check_command           check_http18080
        notifications_enabled   1
    }
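To spot services whose notifications are switched off, the object files can be searched for the directive. This is a rough sketch: the function name is made up for illustration, and it only finds explicit notifications_enabled 0 lines, so a service that inherits the value from a template through use will not be reported.

```shell
#!/usr/bin/env bash
# Print file:line:text for every definition that explicitly disables notifications.
# Returns non-zero when nothing is found (grep's usual convention).
find_disabled_notifications() {
    grep -HnE '^[[:space:]]*notifications_enabled[[:space:]]+0([[:space:];]|$)' "$@"
}

# Example:
#   find_disabled_notifications /usr/local/nagios/etc/objects/*.cfg
```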
Check the agent log. A result of success means the event was delivered. If the event triggers an alert notification, it is also sent through WeChat, the mobile app, SMS, and email.

    tail -f /usr/local/nagios/libexec/ca_agent/log/agent.log

A normal run ends with a success result:

    10-05-2015 15:48:53,056 CST INFO [main] [com.upyoo.agent.NagiosClient@45] start to call alert ...
    10-05-2015 15:48:53,063 CST INFO [main] [com.upyoo.agent.CommandClient@82] alarmName:PROBLEM Service Alert: 127.0.0.1/Tomcat18080 is CRITICAL
    10-05-2015 15:48:53,064 CST INFO [main] [com.upyoo.agent.CommandClient@82] alarmContent:localhost/127.0.0.1/Tomcat18080 connect to address 127.0.0.1 and port 18080: Connection refused Date/Time: 2015-05-10 15:48:52
    10-05-2015 15:48:53,064 CST INFO [main] [com.upyoo.agent.CommandClient@82] entityName:127.0.0.1/Tomcat18080
    10-05-2015 15:48:53,066 CST INFO [main] [com.upyoo.agent.CommandClient@82] priority:CRITICAL
    10-05-2015 15:48:53,066 CST INFO [main] [com.upyoo.agent.CommandClient@82] app:9c4bc722-6677-9fc9-fbdc-003d8977d17e
    10-05-2015 15:48:53,067 CST INFO [main] [com.upyoo.agent.CommandClient@82]
    10-05-2015 15:48:53,068 CST INFO [main] [com.upyoo.agent.CommandClient@82]
    10-05-2015 15:48:53,068 CST INFO [main] [com.upyoo.agent.CommandClient@82]
    10-05-2015 15:48:53,069 CST INFO [main] [com.upyoo.agent.CommandClient@82]
    10-05-2015 15:48:53,105 CST INFO [main] [com.upyoo.agent.CommandClient@58] start to post url:http://api.aiops.com/alert/api/event
    10-05-2015 15:48:53,180 CST INFO [main] [com.upyoo.agent.CommandClient@65] body:{"app":"9c4bc722-6677-9fc9-fbdc-003d8977d17e","alarmContent":"localhost/127.0.0.1/Tomcat18080 connect to address 127.0.0.1 and port 18080: Connection refused Date/Time: 2015-05-10 15:48:52","eventId":"8G8OGOYUCOOLOENYOGGENOOOOONYNOLU","priority":"3","alarmName":"PROBLEM Service Alert: 127.0.0.1/Tomcat18080 is CRITICAL","eventType":"trigger","entityName":"127.0.0.1/Tomcat18080"}
    10-05-2015 15:48:53,775 CST INFO [main] [com.upyoo.agent.CommandClient@68] result:{"result":"success","message":null,"data":"3516","totalCount":0,"code":"200"}
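When the check is scripted rather than watched through tail -f, the most recent result: line is the one that matters. Here is a minimal sketch, assuming the log format shown in the sample above; last_result_ok is a hypothetical helper name, not part of the agent.

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) only if the most recent "result:" entry in the agent log reports success.
last_result_ok() {
    local log="$1"
    grep 'result:{' "$log" | tail -n 1 | grep -q '"result":"success"'
}

# Example:
#   if last_result_ok /usr/local/nagios/libexec/ca_agent/log/agent.log; then
#       echo "last event delivered"
#   else
#       echo "last event failed or no result logged"
#   fi
```

Only the final entry is inspected, so an older success does not mask a later failure.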
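Troubleshooting: alerts not received after integration
If no alert appears in CA after you trigger a new test alert in Nagios, go to 告警 -> 所有告警 (Alerts -> All Alerts) and check whether the alert is listed:
- If it is listed, the alert reached the CA platform. Add a dispatch policy under 配置 -> 分派策略 (Configuration -> Dispatch Policies).
- If it is not listed, the alert did not reach the CA platform. Troubleshoot as follows:

  Open nagios.log, where the CA agent's log entries appear, and confirm that Nagios sent the notification to the cloudalert contact. Then send a test alert with the following command, replacing -- with the appkey generated when you created the application:

      ./nagios app:"--" eventType:trigger eventId:1234 alarmName:"hello"

- If the command reports success, the deployment works. If alerts still do not arrive at this point, the problem lies in the Nagios environment.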
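- If the command reports failure, make the following adjustments and test again:

  1. Set the ownership of the ca_agent directory to nagios:nagios.
  2. Set the ownership of the nagios script, which sits in the same directory as ca_agent, to nagios.
  3. Set the permissions of the bin directory under ca_agent and of the bin directory under jre to 777.
  4. Check the command path in cloudalert.cfg:

         vi /usr/local/nagios/etc/objects/cloudalert.cfg

     Under the comments # 'notify-host-by-cloudalert' command definition and # 'notify-service-by-cloudalert' command definition, find command_line $USER1$/nagios and replace $USER1$/nagios with the absolute path of the nagios script, for example /usr/local/nagios/libexec/nagios.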
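The adjustments listed above can be collected into one script. This is a sketch under assumptions: the helper names are invented here, jre is taken to live directly under ca_agent, and the ownership and mode changes are applied recursively, which the list does not state explicitly. Mode 777 simply follows this guide.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the ca_agent/jre/bin location are assumptions.

# Ownership of ca_agent and of the nagios script, then mode 777 on the two bin directories.
fix_agent_permissions() {
    local libexec="$1" owner="${2:-nagios:nagios}"
    chown -R "$owner" "$libexec/ca_agent" &&
    chown "$owner" "$libexec/nagios" &&
    chmod -R 777 "$libexec/ca_agent/bin" "$libexec/ca_agent/jre/bin"
}

# Replace $USER1$/nagios in cloudalert.cfg with an absolute script path (keeps a .bak copy).
pin_command_path() {
    local cfg="$1" script="$2"
    sed -i.bak 's|\$USER1\$/nagios|'"$script"'|' "$cfg"
}

# Example (as root):
#   fix_agent_permissions /usr/local/nagios/libexec
#   pin_command_path /usr/local/nagios/etc/objects/cloudalert.cfg /usr/local/nagios/libexec/nagios
```

After either change, rerun the ./nagios test command and restart Nagios if cloudalert.cfg was edited.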
Severity mapping between Nagios and CA
睿象云 | Nagios (alerts.status) |
---|---|
Fatal (致命) | down |
Critical (严重) | critical |
Warning (警告) | warning |
Reminder (提醒) | unknown |
Notification (通知) | -- |
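The table can be read as a lookup from Nagios state to 睿象云 severity. A small sketch of that lookup follows. The function name and the English level names are illustrative, and states not listed in the table (for example OK or UP) fall through to a placeholder value. Numeric priority codes are left out because only one of them (CRITICAL posted as priority "3" in the sample log) is visible in this guide.

```shell
#!/usr/bin/env bash
# Map a Nagios host/service state to the severity name from the table above.
# Matching is case-insensitive; "unmapped" is a placeholder chosen for this sketch.
ca_level_for_state() {
    local state
    state=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$state" in
        down)     echo "Fatal" ;;      # 致命
        critical) echo "Critical" ;;   # 严重
        warning)  echo "Warning" ;;    # 警告
        unknown)  echo "Reminder" ;;   # 提醒
        *)        echo "unmapped" ;;   # not listed in the table
    esac
}

# Example:
#   ca_level_for_state "$SERVICESTATE"
```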
睿象云 | Nagios |
---|---|
Event ID (eventId) | alerts.incident_key |
These are the steps for integrating Nagios alerting with Cloud Alert.