Sunday, February 14, 2010

NSClient++ external vbs script to check MSSQL jobs

Edit: New fixed version

This script for NSClient++ checks whether there are any failed jobs in MS SQL Server. The database connection data is stored in a UDL file.

External script alias in nsclient:
check_mssql_jobs=cscript.exe //T:30 //NoLogo scripts\check_mssql_jobs.vbs /MaxWarn:0 /MaxCrit:1 /truncate:900

Nagios service definition:
define service{
use generic-service
host_name SEHQFVD03
service_description MSSQL jobs
check_command check_nrpe_check_mssql_jobs
high_flap_threshold 35.0
low_flap_threshold 30.0
notification_options w,u,c,r,f
}


Script:
'########################################################
'# Check MSSQL jobs over UDL TEST v0.0.0.4
'# VIKJON0 2010-02-13
'# VIKJON0 2010-03-02 Only look at latest time job was run
'#########################################################

Option Explicit
Dim strConnection, conn, rs, strSQL
Dim objConn
Dim connStr
Dim foundTXT, returnTXT, perfTXT
Dim errLevel, errLevelTXT
Dim MaxWarn, MaxCrit, numberOfRows, truncate
Dim wshArgs

'--Set working directory
Dim WshShell
Set WshShell = WScript.CreateObject("WScript.Shell")
'WScript.Echo WshShell.CurrentDirectory
WshShell.CurrentDirectory = "C:\Program Files\NSClient++\scripts"

'--Get command line arguments-- /MaxWarn:Y /MaxCrit:Z /truncate:x
Set wshArgs = wscript.arguments

if wshArgs.Named.exists("MaxWarn") then
MaxWarn = cint(wshArgs.named.item("MaxWarn"))
else
MaxWarn = 0
end if

if wshArgs.Named.exists("MaxCrit") then
MaxCrit = cint(wshArgs.named.item("MaxCrit"))
else
MaxCrit = 0
end if
if wshArgs.Named.exists("truncate") then
truncate = cint(wshArgs.named.item("truncate"))
else
truncate = 0
end if

'--Run db query-----------------------
strConnection = "File Name=myUDL.udl; "
Set conn = CreateObject("ADODB.Connection")
conn.Open strConnection

Set rs = CreateObject("ADODB.recordset")

'Note: the query checks all enabled jobs. To check only maintenance jobs, add AND category_id = 3
'------------------------------------------------------

'run_time is an integer (HHMMSS), so pad it to 6 digits to make the date+time string sortable
strSQL = "SELECT name, message, category_id FROM msdb.dbo.sysjobs AS J LEFT OUTER JOIN msdb.dbo.sysjobhistory AS H ON J.job_id = H.job_id " &_
"WHERE enabled = 1 AND run_status != 1 AND step_id = 0 " &_
"AND (cast(run_date as varchar) + RIGHT('000000' + cast(run_time as varchar),6)) = " &_
"(select max(cast(run_date as varchar) + RIGHT('000000' + cast(run_time as varchar),6)) " &_
"FROM msdb.dbo.sysjobhistory AS H2 where H2.job_id = H.job_id AND H2.step_id = 0 " &_
") order by name"

'----------------------------------------------


rs.open strSQL, conn, 3,3
numberOfRows = rs.recordcount

foundTXT = ""
if numberOfRows = 0 then
foundTXT = "check_db OK"
else
rs.MoveFirst
WHILE NOT rs.EOF
foundTXT = foundTXT & rs("name") & "#"
rs.MoveNext
wend
end if


'--Check result and build return data
If truncate > 0 then
foundTXT = left(foundTXT,truncate)
end if

errLevel = 0
errLevelTXT = ""
if not MaxCrit = 0 AND numberOfRows >= MaxCrit then
errLevel = 2
errLevelTXT = ", found errors: " & numberOfRows & " > critical"
else
if not MaxWarn = 0 AND numberOfRows >= MaxWarn then
errLevel = 1
errLevelTXT = ", found errors: " & numberOfRows & " > warning"
end if
end if

perfTXT = "|'found errors'=" & numberOfRows &";" & MaxWarn & ";" & MaxCrit & ";"
returnTXT = foundTXT & errLevelTXT & perfTXT
'msgbox returnTXT

'--Close and exit

rs.Close
Set rs = Nothing
conn.Close
Set conn = Nothing

Wscript.StdOut.WriteLine ReturnTXT
WScript.Quit(errLevel)
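
The threshold logic at the end of the script can be restated in a few lines of Python, which is handy for checking what a given combination of /MaxWarn and /MaxCrit will return. This is only a sketch of the same rules, not part of the plugin, and the function names are made up for the illustration.

```python
def evaluate(failed_jobs, max_warn=0, max_crit=0):
    """Return (exit_code, suffix) using the same rules as the VBS script.

    A threshold of 0 means "disabled", exactly as in the script.
    """
    if max_crit != 0 and failed_jobs >= max_crit:
        return 2, ", found errors: %d > critical" % failed_jobs
    if max_warn != 0 and failed_jobs >= max_warn:
        return 1, ", found errors: %d > warning" % failed_jobs
    return 0, ""


def build_output(found_txt, failed_jobs, max_warn=0, max_crit=0):
    """Build the plugin output line: text, status suffix, then perfdata."""
    code, suffix = evaluate(failed_jobs, max_warn, max_crit)
    perf = "|'found errors'=%d;%d;%d;" % (failed_jobs, max_warn, max_crit)
    return code, found_txt + suffix + perf


if __name__ == "__main__":
    # With the alias above (/MaxWarn:0 /MaxCrit:1) one failed job is critical.
    print(build_output("Backup job#", 1, max_warn=0, max_crit=1))
```

Note that with both thresholds left at 0 the script always exits with 0, no matter how many failed jobs it finds.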

Friday, February 12, 2010

Substitute nagios as sender for notifications email

To replace the sender address in the Nagios email notifications, add this to the end of the email command:
-a "From: mememe@company.com"

($HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$ -a "From: mememe@company.com")

Sunday, February 7, 2010

Monitor MS SQL with check_mssql

Note: I had major problems getting the user authentication to work. As soon as I removed freetds it started to work...

One way is to run queries from Nagios directly to the MSSQL server with this plugin:
http://exchange.nagios.org/directory/Plugins/Databases/SQLServer/check_mssql/details

This is a risk since the user name and password have to be stored on the Nagios server and are also sent over the network. It is perhaps better to create vbs scripts that read the database and run them through nsclient++.

1) Install check_mssql
download the script from http://exchange.nagios.org/directory/Plugins/Databases/SQLServer/check_mssql/details
Copy it to libexec
sudo cp check_mssql /usr/local/nagios/libexec
*Change owner to nagios and make it executable

2) Install php support in ubuntu
sudo apt-get install php5 php5-mysql php5-cli libapache2-mod-php5 php5-sybase
(I don't see why php5-mysql should be needed, but I installed it by mistake and if I remove it the check stops working. I will test without it at the next scratch install.)

3) Test from commandline
/usr/local/nagios/libexec/check_mssql -H 192.168.0.100 -U myuser -P mypwd

x) From the Nagios docs (security)
Hide Sensitive Information With $USERn$ Macros. The CGIs read the main config file and object config file(s), so you don't want to keep any sensitive information (usernames, passwords, etc) in there. If you need to specify a username and/or password in a command definition use a $USERn$ macro to hide it. $USERn$ macros are defined in one or more resource files. The CGIs will not attempt to read the contents of resource files, so you can set more restrictive permissions (600 or 660) on them. See the sample resource.cfg file in the base of the Nagios distribution for an example of how to define $USERn$ macros.
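
To make the idea concrete, here is a small Python sketch of the substitution: the password lives in a resource file as $USER3$ and the command definition only contains the macro name. The resource file content, the choice of $USER3$ and the command line are made-up examples. Nagios does this expansion itself; the code only illustrates it.

```python
import re


def parse_resource(text):
    """Parse lines like '$USER3$=secret' into {'$USER3$': 'secret'}."""
    macros = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition("=")
        if re.fullmatch(r"\$USER\d+\$", name.strip()):
            macros[name.strip()] = value.strip()
    return macros


def expand(command_line, macros):
    """Replace every known $USERn$ macro; unknown macros are left as-is."""
    return re.sub(r"\$USER\d+\$",
                  lambda m: macros.get(m.group(0), m.group(0)),
                  command_line)


# Hypothetical resource.cfg content (keep this file at mode 600 or 660).
RESOURCE = """
# plugin directory
$USER1$=/usr/local/nagios/libexec
$USER3$=mypwd
"""

# Hypothetical command definition: no password in the object config.
COMMAND = "$USER1$/check_mssql -H $HOSTADDRESS$ -U myuser -P $USER3$"

if __name__ == "__main__":
    print(expand(COMMAND, parse_resource(RESOURCE)))
```

The point is that commands.cfg, which the CGIs can read, never contains the password itself.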

Saturday, February 6, 2010

More Nagios service definitions for windows

Monitor disks with NSClient
define command {
command_name check_nrpe_DriveSpace
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckDriveSize -a MinWarn=$ARG1$ MinCrit=$ARG2$ CheckAll $ARG3$
# ARG3 : FilterType=FIXED FilterType=REMOTE
}

define service{
use generic-service
host_name myhost
service_description Disk space
#Warning,Critical,Filter
check_command check_nrpe_DriveSpace!15%!10%!FilterType=FIXED
high_flap_threshold 35.0
low_flap_threshold 30.0
notification_options w,u,c,r,f
}
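
The mapping between the check_command line and the $ARGn$ macros is easy to get wrong, so here is a rough Python sketch of it: the part before the first ! is the command name and each following field becomes $ARG1$, $ARG2$, ... in order. The function is only an illustration of the rule, not how Nagios is implemented.

```python
def expand_check_command(check_command, command_line):
    """Split 'name!a!b!c' and substitute $ARG1$..$ARGn$ in command_line."""
    name, *args = check_command.split("!")
    for i, value in enumerate(args, start=1):
        command_line = command_line.replace("$ARG%d$" % i, value)
    return name, command_line


# Command template with warning, critical and filter as ARG1..ARG3.
TEMPLATE = ("$USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckDriveSize "
            "-a MinWarn=$ARG1$ MinCrit=$ARG2$ CheckAll $ARG3$")

if __name__ == "__main__":
    name, line = expand_check_command(
        "check_nrpe_DriveSpace!15%!10%!FilterType=FIXED", TEMPLATE)
    print(name)
    print(line)
```

The order of the fields after the ! is the only thing that ties a value to a macro, which is why the "#Warning,Critical,Filter" comment in the service definition is worth keeping.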

Check all fixed disks NSClient command line:
CheckDriveSize MinWarn=15% MinCrit=10% CheckAll FilterType=FIXED
Nagios command line:
sudo /usr/local/nagios/libexec/check_nrpe -H 192.168.100 -c CheckDriveSize -a MinWarn=50% MinCrit=25% CheckAll FilterType=FIXED

Monitor services with NSClient

nsclient commandline:
CheckServiceState CheckAll exclude=wampmysqld exclude=ccmsetup exclude=tcsd_win32.exe

define command {
command_name check_nrpe_AutoStartedServices
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckServiceState -a CheckAll $ARG1$
# ARG1 : exclude=ccmsetup exclude=test
}

define service{
use generic-service
host_name SELANSFVD99
service_description Autostarted Services
# ARG1 : exclude=ccmsetup exclude=test
check_command check_nrpe_AutoStartedServices!exclude=SysmonLog
high_flap_threshold 35.0
low_flap_threshold 30.0
notification_options w,u,c,r,f
}

Friday, February 5, 2010

Nagios configuring notifications

0)
mailx must be installed and configured. (see prev post)
The commands also have to be amended:
###
Edit the Nagios email notification commands found in /usr/local/nagios/etc/objects/commands.cfg and change any '/bin/mail' references to '/usr/bin/mail'. Once you do that you'll need to restart Nagios to make the configuration changes live.

sudo /etc/init.d/nagios restart

1)
Contacts and groups are configured in contacts.cfg. By default, email is sent to the admins contact group, whose default member is nagiosadmin. The nagiosadmin email address (yours) is set in the same file.

2) States and flapping
When a problem appears, the service first goes to a "soft state". After 3 checks (3/3) it changes to a "hard state" and the notification is sent.

If the state changes back and forth too often, the service is considered to be "flapping". To change the flapping thresholds, add these lines to the service definition:
high_flap_threshold 35.0
low_flap_threshold 30.0

While the service is flapping, no notifications are sent until it stops flapping.
If we want a notification when flapping starts, add this line to the service definition:
notification_options w,u,c,r,f
(f for flapping)
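
As I understand the Nagios docs, flap detection looks at the last 21 check results (20 possible state changes) and computes a percent state change. The service starts flapping when that value goes above the high threshold and stops when it drops below the low one. Here is a simplified Python sketch of that idea. It ignores the extra weight Nagios gives to newer state changes, so the numbers will not match Nagios exactly, and the function names are mine.

```python
def percent_state_change(states):
    """Unweighted percent state change over a history of check results."""
    if len(states) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(states, states[1:]) if a != b)
    return 100.0 * changes / (len(states) - 1)


def update_flapping(is_flapping, percent, low=30.0, high=35.0):
    """Hysteresis: start above `high`, stop below `low`, else keep state."""
    if not is_flapping and percent > high:
        return True
    if is_flapping and percent < low:
        return False
    return is_flapping


if __name__ == "__main__":
    # 21 results: OK most of the time with a few CRITICAL blips.
    history = ["OK"] * 21
    for i in (3, 7, 11, 15):
        history[i] = "CRITICAL"
    pct = percent_state_change(history)
    print(pct, update_flapping(False, pct))
```

The gap between the two thresholds keeps the flapping status itself from toggling on every check when the value hovers around one limit.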

Monitor windows server with Nagios - Configure nrpe

To do some more advanced application monitoring we need to configure nrpe between nagios and nsclient++.
The first thing we need to monitor is that no files get stuck in import or export folders. (The next problem will be to monitor MS SQL servers; I will probably have to find another plug-in for this.)

Reference:
http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf

1) Config nsclient
uncomment
NRPEListener.dll
allow_arguments=1
use_ssl=1

Allowing arguments is a risk. The alternative, I think, is to create local commands/aliases on the remote machine.

Test in nsclient command line (start nsclient /test)
CheckFile2 path=c:\test pattern=*.txt MaxCrit=1 filter+written=gt:10m
(checks for files older than 10 minutes in c:\test)
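
In other words, the check looks for files that match a pattern and were last written more than 10 minutes ago. A rough Python equivalent of the counting part, just to pin down the logic (this is not how CheckFile2 is implemented, and the function name is made up):

```python
import fnmatch
import os
import time


def count_old_files(path, pattern="*.txt", max_age_seconds=600, now=None):
    """Count files directly in `path` that match `pattern` and were last
    written more than `max_age_seconds` ago (sub folders are not scanned)."""
    now = time.time() if now is None else now
    count = 0
    for entry in os.scandir(path):
        if not entry.is_file():
            continue  # skip directories
        if not fnmatch.fnmatch(entry.name, pattern):
            continue  # skip files that do not match the pattern
        if now - entry.stat().st_mtime > max_age_seconds:
            count += 1
    return count


if __name__ == "__main__":
    # Count .txt files older than 10 minutes in the current directory.
    print(count_old_files("."))
```

How the resulting count is compared with MaxCrit is up to CheckFile2, so that part is left out of the sketch.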

2) Install nrpe plug-in in Nagios(local server)
The NRPE addon consists of two pieces:
– The check_nrpe plugin, which resides on the local monitoring machine
– The NRPE daemon, which runs on the remote Linux/Unix machine

To monitor a windows machine we need the plugin. The daemon will be NSClient on windows. If we want to test nrpe on the local machine we should also install the daemon locally.

a) Install check_nrpe plugin
sudo apt-get install libssl-dev

sudo -s

mkdir ~/downloads
cd ~/downloads

wget http://prdownloads.sourceforge.net/sourceforge/nagios/nrpe-2.12.tar.gz
tar xzf nrpe-2.12.tar.gz
cd nrpe-2.12

./configure
make all
make install-plugin

b) Test
sudo /usr/local/nagios/libexec/check_nrpe -H 192.168.0.100

It is very hard to get a clear answer about ssl, but I am now sure it is enabled by default and controlled by the remote machine's setting. If ssl is enabled in nsclient and you run a command without ssl you get an error:
sudo /usr/local/nagios/libexec/check_nrpe -H 192.168.0.100 -n
(-n => skip ssl)

To make it secure I think it is also necessary to generate a new key or something. (See nrpe README.SSL). I have not tested that yet.

3) Configure Nagios
a) Add some nrpe commands
gedit /usr/local/nagios/etc/objects/commands.cfg

define command {
command_name check_nrpe_CheckOldFiles
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c CheckFile2 -a path=$ARG1$ pattern=$ARG2$ MaxCrit=1 filter+written=gt:$ARG3$ max-dir-depth=0
}
Note: I ran into problems with CheckFile2. When too many files are found I get a "buffer too small" error. My biggest problem was a sub folder with many old files, which I finally fixed by adding the max-dir-depth=0 argument. max-dir-depth sets the number of sub folder levels CheckFile2 will look into. The actual buffer problem I cannot solve for the moment.
Error in the nsclient log:
2010-02-07 18:05:38: error:include\NSCHelper.cpp:241: Inject buffer to small, increase the value of: string_length.
2010-02-07 18:07:43: error:NSClient++.cpp:1101: UNKNOWN: Return buffer to small to handle this command.

define command {
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}


b) Add some nrpe services
gedit /usr/local/nagios/etc/objects/windows.cfg

#Check for old files in c:\test
define service{
use generic-service
host_name myHost
service_description Check old files in test
#path, pattern, age
check_command check_nrpe_CheckOldFiles!c:/test!*!10m
}

#Use the generic check_nrpe command.
#(will not work on windows but will show-up in log on client)
define service{
use generic-service
host_name myHost
service_description CPU load
check_command check_nrpe!check_load
}


c) Check config and restart
sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
sudo /etc/init.d/nagios restart

d) test
nagios command line
/usr/local/nagios/libexec/check_nrpe -H 192.168.169.100

/usr/local/nagios/libexec/check_nrpe -H 192.168.169.100 -c CheckFile2 -a path=c:/test pattern=*.txt MaxCrit=1 filter+written=gt:10m
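
When testing from the command line, the text output is only half of the result. Nagios decides the state from the plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Here is a small Python helper that runs a plugin and shows both. The helper and its name are my own, not part of Nagios or nrpe.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv, timeout=30):
    """Run a plugin and return (state_name, first_line_of_output)."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True,
                          timeout=timeout)
    lines = proc.stdout.splitlines()
    first_line = lines[0] if lines else ""
    # Anything outside 0-3 is treated as UNKNOWN in this sketch.
    return STATES.get(proc.returncode, "UNKNOWN"), first_line


if __name__ == "__main__":
    # Pass the plugin and its arguments on the command line, for example
    # the check_nrpe command shown above.
    if len(sys.argv) > 1:
        print(run_plugin(sys.argv[1:]))
```

Because the arguments are passed as a list and no shell is involved, the * in the pattern reaches check_nrpe unchanged.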

Monitor windows server with Nagios - basic configuration

reference
http://nagios.sourceforge.net/docs/3_0/monitoring-windows.html
http://nsclient.org/nscp/

According to the NSClient wiki, NRPE is the recommended way to monitor windows. However, the preconfigured commands in Nagios use check_nt. I will stick to check_nt for the basic stuff, at least for the moment.


1) Install NSClient on remote windows server
http://nagios.sourceforge.net/docs/3_0/monitoring-windows.html
The latest client has a different installation procedure than the one in the instructions.

a) Run setup.exe. Install all components. Skip configuration.
The firewall exception will give an error, but never mind that.

b) In NSC.ini, uncomment these modules:
FileLogger.dll
CheckSystem.dll
CheckDisk.dll
NSClientListener.dll
CheckHelpers.dll
(and NRPEListener.dll if nrpe should be used)

add nagios server to allowed_hosts=

c) start service

??? More later. (already setup on test machine)

2) Configure Nagios
http://nagios.sourceforge.net/docs/3_0/monitoring-windows.html
a) Enable the windows config file
sudo nano /usr/local/nagios/etc/nagios.cfg

Remove the leading pound (#) sign from the following line in the main configuration file:
#cfg_file=/usr/local/nagios/etc/objects/windows.cfg

b) Define host
sudo gedit /usr/local/nagios/etc/objects/windows.cfg

Find

define host{
use windows-server ;
host_name winserver
alias My Windows Server
address 192.168.1.2
}

Change host_name, alias and address (dns name is ok as address)
(Add new host by copy example...)

c) sudo gedit /usr/local/nagios/etc/objects/windows.cfg
Change the host_name in the sample
define service{
and comment out the ones you do not need
(copy the sample to add a service for a new host)

d) If NSClient is set up to use password:
sudo nano /usr/local/nagios/etc/objects/commands.cfg

Change the definition of the check_nt command to include the "-s PASSWORD" argument (where PASSWORD is the password you specified on the Windows machine) like this:

define command{
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s PASSWORD -v $ARG1$ $ARG2$
}

e) Check config and restart
sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

sudo /etc/init.d/nagios restart

Now you should have basic monitoring of one win server!

Thursday, February 4, 2010

Howto install Nagios in Ubuntu 9.10 Karmic

0) Install OS
Ubuntu server 9.10
Use expert mode and set static ip (no dhcp in server room)
select to install openssh server

sudo apt-get update
sudo apt-get upgrade

Add core gnome desktop:
sudo apt-get install xorg gnome-core

Add nx server:
mkdir downloads
cd downloads

wget http://64.34.161.181/download/3.4.0/Linux/nxclient_3.4.0-5_i386.deb
wget http://64.34.161.181/download/3.4.0/Linux/nxnode_3.4.0-6_i386.deb
wget http://64.34.161.181/download/3.4.0/Linux/FE/nxserver_3.4.0-8_i386.deb

sudo dpkg -i nxclient_3.4.0-5_i386.deb nxnode_3.4.0-6_i386.deb nxserver_3.4.0-8_i386.deb

add browser
sudo apt-get install epiphany-browser epiphany-extensions


Required Packages

sudo apt-get install apache2
sudo apt-get install libapache2-mod-php5
sudo apt-get install build-essential
sudo apt-get install libgd2-xpm-dev

a) Create Account Information

Become the root user.
sudo -s

Create a new nagios user account and give it a password.
useradd -m -s /bin/bash nagios

passwd nagios
/usr/sbin/usermod -G nagios nagios

Create a new nagcmd group for allowing external commands to be submitted through the web interface. Add both the nagios user and the apache user to the group.

/usr/sbin/groupadd nagcmd
/usr/sbin/usermod -a -G nagcmd nagios
/usr/sbin/usermod -a -G nagcmd www-data

b) Download Nagios and the Plugins

Create a directory for storing the downloads.

mkdir ~/downloads
cd ~/downloads

Download the source code tarballs of both Nagios and the Nagios plugins (visit http://www.nagios.org/download/ for links to the latest versions).

wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.0.tar.gz

wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.14.tar.gz

c) Compile and Install Nagios

Extract the Nagios source code tarball.
tar xzf nagios-3.2.0.tar.gz

cd nagios-3.2.0
./configure --with-command-group=nagcmd
make all
make install
make install-init
make install-config
make install-commandmode

d) Customize Configuration


Edit the /usr/local/nagios/etc/objects/contacts.cfg config file and change the email address associated with the nagiosadmin contact definition to the address you'd like to use for receiving alerts.

sudo nano /usr/local/nagios/etc/objects/contacts.cfg

e) Configure the Web Interface
make install-webconf
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
/etc/init.d/apache2 reload

Note: Consider implementing the enhanced CGI security measures described in the Nagios documentation to ensure that your web authentication credentials are not compromised.

f) Compile and Install the Nagios Plugins
cd ~/downloads

tar xzf nagios-plugins-1.4.14.tar.gz
cd nagios-plugins-1.4.14

./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install

g) Start Nagios

Configure Nagios to automatically start when the system boots.
ln -s /etc/init.d/nagios /etc/rcS.d/S99nagios

Verify the sample Nagios configuration files.
sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If there are no errors, start Nagios
sudo /etc/init.d/nagios start

h) Log in to the web interface with the user name (nagiosadmin) and password you specified earlier.
http://localhost/nagios/

i) Other Modifications

If you want to receive email notifications for Nagios alerts, you need to install the mailx (Postfix) package.

sudo apt-get install mailx
Select satellite and enter the relay server (in this case the exchange server)

test in terminal
mail sss@sss.com
(NB: the company server only accepts relay for local addresses)
Enter to end subject
CTRL-D to end body


Edit the Nagios email notification commands found in /usr/local/nagios/etc/objects/commands.cfg and change any '/bin/mail' references to '/usr/bin/mail'. Once you do that you'll need to restart Nagios to make the configuration changes live.

sudo /etc/init.d/nagios restart

See next few posts about configuring Nagios and NSClient