Nagios

How to set up a Nagios monitoring system to use on DTC.

This is a page header set up by don@bowenvale.co.nz with notes on how to set up Nagios to monitor your systems.

My trial systems are running Debian Squeeze.


Edit by Florian Heigl:

For a very quick Nagios install, ideally use the Open Monitoring Distribution ("OMD"). It comes with an automatic Nagios installer and various extras that work much better than the alternative methods.

That aside, I had promised on the mailing list to write down my mail / DTC monitoring setup along with the NagVis map, so maybe we can work on that together. I'm sure some things are still missing in my monitoring; we can compare once Don has added his Nagios settings.

First, check that there is about 500 MB free in /opt, or prepare a symlink from /opt to a location with enough space.
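If you'd rather script the free-space check, here is a minimal Python sketch. The 500 MB figure is the one from the note above; the function name `has_free_space` is made up for this example and is not part of OMD.

```python
import shutil

def has_free_space(path, required_mb=500):
    """Return True if the filesystem holding `path` has at least `required_mb` MB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_mb * 1024 * 1024

# Example call (assumes /opt already exists on your box):
#   has_free_space("/opt")
```

If /opt is a symlink, the result refers to the filesystem the link points to, which is what matters for the install.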

wget http://omdistro.org/attachments/download/118/omd-0.50_0.squeeze_amd64.deb
apt-get install gdebi
gdebi omd-0.50_0.squeeze_amd64.deb

omd create dtcmon
omd start dtcmon

If you want to use the check_mk addon, the next step is to install the agent on the DTC box:

apt-get install check-mk-agent check-mk-agent-logwatch

Then set up connectivity (xinetd and firewalling) to port 6556 on the DTC box, unless you're on it anyway. It's also possible to use SSH transport or stunnel; this is hidden in the docs linked further down under something called DATASOURCE_PROGRAM. Actually *anything* is possible. But the agent never processes or accepts any input anyway, so let's not get off track.
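To see whether the agent is reachable on port 6556 before involving check_mk at all, something like the following Python sketch can help. It relies on the agent printing its data in sections headed by `<<<name>>>` lines and then closing the connection. The function names are made up for this example.

```python
import socket

def fetch_agent_output(host, port=6556, timeout=10.0):
    """Connect to the check_mk agent and return everything it sends as text."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:  # the agent closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def section_names(agent_output):
    """Extract section headers of the form <<<name>>> from agent output."""
    names = []
    for line in agent_output.splitlines():
        line = line.strip()
        if line.startswith("<<<") and line.endswith(">>>"):
            names.append(line[3:-3])
    return names
```

Calling `section_names(fetch_agent_output("dtc.domain.com"))` from the Nagios box should return a list of section names. A timeout or "connection refused" usually points at xinetd or the firewall.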

Then, on the Nagios box:

  1. Switch to the user associated with this Nagios "site":
su - dtcmon
  2. Add the DTC server to your check_mk config (which lives in etc/check_mk and its conf.d directory):
echo "all_hosts += [ 'dtc.domain.com|tcp' ]" >> ~/etc/check_mk/conf.d/dtc.mk
  3. Run an inventory of the DTC server to detect all *basic* services:
cmk -I dtc.domain.com # case must match

It should print something like this (some of the checks will not show up for you, since they rely on extra plugins):

apt.upgrades      1 new checks
cpu.loads         1 new checks
cpu.threads       1 new checks
df                1 new checks
diskstat          1 new checks
kernel            3 new checks
kernel.util       1 new checks
lnx_if            1 new checks
logwatch          3 new checks
mem.used          1 new checks
mounts            1 new checks
mysql_check       9 new checks
mysql_status      16 new checks
postfix_mailq     1 new checks
ps.perf           22 new checks
tcp_conn_stats    1 new checks
uptime            1 new checks

At this point you can run a manual check of the DTC box based on what has been found. It doesn't even need a running Nagios.

# cmk -v dtc.domain.com
Check_mk version 2011.09.30
CPU load             OK - 15min Load 1.00 at 2 CPUs
CPU utilization      OK - user: 35.6%, system: 0.4%, wait: 0.2%
Database apachelogs  OK - No errors found
Database dtc         OK - No errors found
Database floh_phtagr OK - No errors found
Database phtagr      OK - No errors found
Database roundcube   OK - No errors found
Database wordpress   OK - No errors found
Counter wrapped, not handled by check, ignoring this check result: diskstat.SUMMARY.read: Counter initialization
Interface eth0       OK - [2] (up) speed unknown
Kernel Context Switches OK - 94/s in last 27 secs
Kernel Major Page Faults OK - 0/s in last 27 secs
Kernel Process Creations OK - 3/s in last 27 secs
LOG /var/log/kern.log OK - no old or new error messages
LOG /var/log/mail.warn WARN - error messages present!
LOG /var/log/messages OK - no old or new error messages
Memory used          OK - 0.85 GB used (0.52 GB RAM + 0.33 GB SWAP, this is 114.0% of 0.75 GB RAM)
Mount options of /   OK - mount options are data=ordered,errors=remount-ro,relatime,rw
MySQL Status Created_tmp_files OK - count 5 (levels at 0/0)
MySQL Status Innodb_buffer_pool_pages_dirty OK - count 0 (levels at 20/40)
MySQL Status Innodb_buffer_pool_pages_free OK - count 1 (levels at 0/0)
MySQL Status Innodb_buffer_pool_pages_total OK - count 512 (levels at 0/0)
MySQL Status Innodb_data_pending_fsyncs OK - count 0 (levels at 50/100)
MySQL Status Innodb_os_log_pending_fsyncs OK - count 0 (levels at 5/10)
MySQL Status Innodb_row_lock_current_waits OK - count 0 (levels at 5/10)
MySQL Status Innodb_row_lock_time_avg OK - count 0 (levels at 0/0)
MySQL Status Innodb_row_lock_time_max OK - count 0 (levels at 0/0)
MySQL Status Open_files OK - count 46 (levels at 100/200)
MySQL Status Qcache_free_memory OK - count 412136 (levels at 0/0)
MySQL Status Qcache_hits OK - count 40614 (levels at 0/0)
MySQL Status Qcache_not_cached OK - count 7225 (levels at 0/0)
MySQL Status Slow_launch_threads OK - count 0 (levels at 5/10)
MySQL Status Slow_queries OK - count 3 (levels at 5/10)
MySQL Status Table_locks_waited OK - count 0 (levels at 100/500)
Number of threads    OK - 181 threads
Postfix Queue        OK - The mailqueue is empty
TCP Connections      OK - ESTABLISHED: 2, CLOSE_WAIT: 10, TIME_WAIT: 3
Uptime               OK - up since Mon Jun 13 17:53:00 2011 (131d 23:56:28)
apt.upgrades         OK - All packages are up to date.
fs_/                 OK - 19.0% used (5.98 of 31.5 GB), (levels at 80.0/90.0%), trend: +21.46MB / 24 hours
proc_Amavis          OK - 7 processes
proc_Apache2         OK - 7 processes
proc_Bacula FD       OK - 1 processes
proc_ClamAV          OK - 1 processes
proc_Courier Authdaemon OK - 7 processes
proc_Courier IMAP    OK - 5 processes
proc_Courier POP     OK - 4 processes
proc_DTC Stats       OK - 1 processes
proc_DkimProxy       OK - 6 processes
proc_MySQL           OK - 1 processes
proc_Postfix Pickup  OK - 1 processes
proc_Postfix Postmaster OK - 1 processes
proc_Postfix Queue Manager OK - 1 processes
proc_Postfix TLS manager OK - 1 processes
proc_SSH             OK - 1 processes
proc_SpamD           OK - 1 processes
proc_bind            OK - 1 processes
proc_fail2ban        OK - 1 processes
proc_syslog          OK - 1 processes
proc_xinetd          OK - 1 processes
OK - Agent version 1.1.11i1, execution time 5.1 sec|execution_time=5.076
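With this many lines it's easy to overlook the one service that isn't OK. Here is a small Python sketch that filters such output. It assumes each service line looks like `<service name> <STATE> - <detail>`, as in the listing above, and that the state names are OK, WARN, CRIT and UNKNOWN. The function name is made up for this example.

```python
import re

# Matches lines such as "CPU load             OK - 15min Load 1.00 at 2 CPUs"
STATUS_LINE = re.compile(
    r"^(?P<service>.+?)\s+(?P<state>OK|WARN|CRIT|UNKNOWN) - (?P<detail>.*)$"
)

def non_ok_services(cmk_output):
    """Return (service, state, detail) tuples for every service that is not OK."""
    problems = []
    for line in cmk_output.splitlines():
        match = STATUS_LINE.match(line)
        if match and match.group("state") != "OK":
            problems.append(
                (match.group("service"), match.group("state"), match.group("detail"))
            )
    return problems
```

Lines that don't fit the pattern, such as the version banner, the "Counter wrapped" notice and the final summary line, are simply skipped.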

If it has detected new services and things look reasonably OK (as above), it's time to generate a new Nagios configuration and apply it with the following command. If things are NOT working, it is almost always a connection issue.

cmk -R 
Generating Nagios configuration...OK
Validating Nagios configuration...OK
Precompiling host checks...OK
Restarting Nagios...OK

After that you should be able to go to http://monitoringbox/dtcmon/ and find a startup banner there. Select one of the Nagios GUIs offered: the standard Nagios GUI (I haven't used that in a year), "thruk", which is a faster version of the old GUI, or Multisite, which is the one that comes with check_mk.

Docs are at: http://mathias-kettner.de/check_mk.html

As an example, a config file (mail_procs.mk) that sets up the Nagios monitoring for DTC's mail services would look like this:

# mailserver
inventory_processes_perf += [
    ( "Courier Authdaemon",     "~.*courier-authlib/authdaemond", ANY_USER, 0, 4, 10, 20),
    ( "Courier IMAP",           "~.*imaplogin /usr/bin/imapd",  ANY_USER, 0, 2, 60, 70),
    ( "Courier POP",            "~.*pop3login /usr/lib/courier/courier/courierpop3d", ANY_USER, 0, 2, 60, 70),
    ( "Amavis",                 "~.*amavisd",                   ANY_USER, 0, 2, 8, 12),
    ( "Postfix Postmaster",     "~.*postfix/master",            ANY_USER, 0, 1, 2, 6),
    ( "Postfix Pickup",         "~.*pickup -l",                 ANY_USER, 0, 1, 2, 6),
    ( "Postfix Queue Manager",  "~.*qmgr -l",                   ANY_USER, 0, 1, 2, 6),
    ( "Postfix Queue show",     "~.*showq -l",                  ANY_USER, 0, 1, 2, 6),
    ( "Postfix TLS manager",    "~.*tlsmgr -l",                 ANY_USER, 0, 1, 2, 6),
    ( "ClamAV",                 "~.*/usr/sbin/clamd",           ANY_USER, 0, 1, 2, 6),
    ( "DkimProxy",              "~.*/usr/sbin/dkimproxy.in",    ANY_USER, 0, 4, 10, 20),
    ( "DkimProxy",              "~.*/usr/sbin/dkimproxy.out",   ANY_USER, 0, 4, 10, 20),
    ( "SpamD",                  "~.*sbin/spamd",                ANY_USER, 0, 1, 60, 70),
]

Replacing ANY_USER with a specific username restricts the check to processes running as that user. The four numbers at the end are the process counts to expect (see cmk -M ps for the details, it's a bit tricky).
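To make the four numbers less tricky, here is a minimal Python sketch of how I read them. It assumes the order is warnmin, okmin, okmax, warnmax; please verify that against cmk -M ps for your version. The function is only an illustration, not check_mk's own code.

```python
def process_count_state(count, warnmin, okmin, okmax, warnmax):
    """Map a process count to a state, assuming (warnmin, okmin, okmax, warnmax) order."""
    if count < warnmin or count > warnmax:
        return "CRIT"   # outside the outer bounds
    if count < okmin or count > okmax:
        return "WARN"   # inside the outer bounds but outside the OK range
    return "OK"

# "Courier Authdaemon" above uses 0, 4, 10, 20:
#   7 processes  -> inside 4..10, so OK
#   15 processes -> above okmax but not above warnmax, so WARN
```

Under this reading, a warnmin of 0 means a completely missing process only yields WARN. The Apache2 entries use a warnmin of 1, so zero processes would be CRIT there.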

The other parts for dtc:

inventory_processes_perf += [
    ( "MySQL", "~.*sbin/mysqld", ANY_USER, 0, 1, 5, 8),
    ( "MySQL", "~.*libexec/mysqld", ANY_USER, 0, 1, 5, 8),
    ( "bind",           "~.*sbin/named", "bind", 0, 1, 2, 4),
    ( "syslog",         "~.*syslogd", ANY_USER, 0, 1, 2, 6),
    ( "syslog",         "~.*syslog-ng", ANY_USER, 0, 1, 5, 10),
    ( "xinetd",         "~.*xinetd", ANY_USER, 0, 1, 5, 10),
    ( "Apache2", "~.*sbin/apache2", ANY_USER, 1, 4, 30, 50),
    ( "Apache2", "~.*sbin/httpd", ANY_USER, 1, 4, 30, 50),
    ( "DTC Stats", "~.*share/dtc/admin/dtc-stats-daemon.php", ANY_USER, 0, 1, 2, 2),
    ( "Nagios" , "~.*bin/nagios" , ANY_USER, 0, 1, 60, 70),
    ( "DTC-Xen Daemon", "~.*usr/sbin/dtc-soap-server", ANY_USER, 0, 1, 1, 2),
]

Here are two screenshots, and I'll leave it there before it all starts to sound like an advertisement. But I promised to write the docs. :>

http://www.wartungsfenster.de/resources/cmk_dtc.JPG
http://www.wartungsfenster.de/resources/cmk_dtc2.JPG

I noticed I broke my NagVis map. I'll add that once it's working again :(

TODOs I thought of:

  • FS alert levels + Trend alerts
  • FS Quota reporting (that's not available atm)
  • Adding the MySQL, apt update check, SMART, lmsensors and Xen plugins
  • Set averaged bandwidth alerts (i.e. 60 min average traffic exceeds 75% of wire rate)
  • logwatch.cfg rules specially for DTC / Mail
  • adding monitoring for the DTC cron job log
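For the averaged bandwidth item in the list above, the arithmetic itself is simple. Here is a Python sketch of just the comparison. It is not an existing check_mk feature or plugin. The function name, the sample list and the units (bits per second) are assumptions made for this example.

```python
def average_exceeds_share(samples_bps, wire_rate_bps, share=0.75):
    """Return True if the mean of the traffic samples exceeds `share` of the wire rate."""
    if not samples_bps:
        return False  # no data, nothing to alert on
    average = sum(samples_bps) / len(samples_bps)
    return average > share * wire_rate_bps

# Example: on a 1 Gbit/s link the 75% threshold is 750 Mbit/s,
# so samples averaging 850 Mbit/s over the window would trigger an alert.
```

The samples would be whatever traffic readings were collected over the 60-minute window.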

Please add any obvious stuff I'm missing, e.g. defining contacts and setting up notifications (I hate that part).

Page last modified on February 10, 2012, at 10:17 PM EST