Friday, 17 May 2013

HP iLO check for Nagios

Hello everyone,

After a long break I decided to write something again. This time the topic is a little different from my other articles. I have developed a script for Nagios that sends queries to the HP iLO management interface, checks the hardware components, and raises predefined alerts when a fault is found. The program is written in Perl. Because it talks directly to the iLO port of HP servers, it works independently of the operating system and needs no connection to it. This means you can run the hardware checks even while the server is shut down.

First, a few words about Nagios. Nagios is open source infrastructure monitoring software distributed under the GPL. At its core it works on an agent/server model: it evaluates the information coming from the agents on the monitored devices and raises alerts when necessary. It is quite easy to install, deploy and manage, but I will cover that in a separate article.

The program I wrote essentially works as a Nagios check plugin. Since it is an ordinary program, it can also be run from the command line. The command syntax is as follows.

check_hpilo.pl SERVERNAME ILO_IP_NUMBER USERNAME PASSWORD (DISK|POWER|FAN|TEMP_SENSOR) DEVICEORDER|ALL

SERVERNAME is the name of the HP server being checked. It is only used in alert messages, so it does not have to be the real hostname. ILO_IP_NUMBER is the IP address of the iLO to connect to. USERNAME and PASSWORD are the credentials of a valid user defined on the iLO. You can use one of the standard users, or create a user that has read-only rights. The next field selects the hardware component to check: DISK, POWER, FAN or TEMP_SENSOR. Only one component type can be queried at a time, so a single command cannot check both disks and power supplies. DEVICEORDER lists the devices to query. For example, if the server has 8 disk bays and 4 of them hold disks, specify 1 2 3 4. To check every device, specify ALL. Some example commands and their output are shown below.
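The positional layout described above can be sketched as a small parser. This is only an illustration and not the code the plugin itself uses; `parse_args` and the hash keys are names invented here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of check_hpilo.pl): split the positional
# arguments into the five fixed fields plus the trailing device list.
sub parse_args {
    my @args = @_;
    return if @args < 6;    # five fixed fields + at least one device entry
    my %opt;
    @opt{qw(server ilo_ip user pass type)} = splice(@args, 0, 5);
    return unless $opt{type} =~ /^(?:DISK|POWER|FAN|TEMP_SENSOR)$/;

    # What is left is either the single word ALL or a list of device numbers.
    if (@args == 1 && $args[0] eq 'ALL') {
        $opt{devices} = 'ALL';
    }
    elsif (!grep { !/^[1-9]\d*$/ } @args) {
        $opt{devices} = [@args];
    }
    else {
        return;
    }
    return \%opt;
}

my $opt = parse_args(qw(DBSERVER 192.168.1.1 nagios nagiospass DISK 1 2 3 4));
print "type=$opt->{type} devices=@{ $opt->{devices} }\n";
# prints: type=DISK devices=1 2 3 4
```

Rejecting anything that is neither ALL nor a positive number up front keeps a typo in the device list from reaching the iLO query at all.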

./check_hpilo.pl DBSERVER 192.168.1.1 nagios nagiospass TEMP_SENSOR ALL

 Everything is OK

If there is no problem, the program prints the result above and returns 0.

./check_hpilo.pl DBSERVER 192.168.1.1 nagios nagiospass POWER 1 2 3 4 5 6 7 8 9

Wrong power order for 9. Please check your power number.

If a device number that does not exist is given, the program prints the warning above and returns UNKNOWN.


./check_hpilo.pl DBSERVER 192.168.1.1 nagios nagiospass  DISK 1 2 3 4 5 6

 DBSERVER sunucusundaki 5 6 numarali DISK(ler/lar) uyari vermistir. Lutfen kontrol edin!

If a fault is detected on any of the queried devices, a message like the one above is produced (in English: "Disk(s) number 5 6 on server DBSERVER raised a warning. Please check!") and CRITICAL is returned. Nagios then raises a critical alert and the configured actions are taken.
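The three outcomes above map onto the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). As a minimal sketch of how per-device results can be reduced to a single exit code, the hypothetical `worst_state` helper below keeps the worst value it sees. It is an illustration and not the function used in the script.

```perl
use strict;
use warnings;

# Exit codes defined by the Nagios plugin guidelines.
use constant { OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3 };

# Hypothetical helper: reduce per-device results (0 = ok, 1 = warning,
# 2 = critical) to one plugin state by keeping the worst value seen.
sub worst_state {
    my @results = @_;
    my $worst = OK;
    for my $r (@results) {
        $worst = $r if $r > $worst;
    }
    return $worst;
}

my @names = ('OK', 'WARNING', 'CRITICAL', 'UNKNOWN');
my $state = worst_state(0, 2, 0, 1);
print "state=$names[$state] exit=$state\n";   # prints: state=CRITICAL exit=2
```

Taking the maximum means a warning that appears later in the list can never hide a critical result found earlier.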

The plugin is defined on the Nagios side as follows. First, add the following definition to the commands.cfg file.

define command{
        command_name check_hpilo
        command_line $USER1$/check_hpilo.pl $HOSTNAME$ $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$ $ARG6$ $ARG7$ $ARG8$ $ARG9$ $ARG10$ $ARG11$ $ARG12$ $ARG13$ $ARG14$ $ARG15$ $ARG16$
}

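Then add a definition like the following to the configuration file of the host to be checked. Because the command above passes $HOSTADDRESS$ as ILO_IP_NUMBER, the address of that host object must be the iLO address.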

define service{
        use                             generic-service
        host_name                       DBSRV
        service_description             DBSRV ILO POWER CHECK
        check_command                   check_hpilo!"nagios"!"nagiospass"!"POWER"!"ALL"
        contact_groups                  linux,linux_sms
        service_groups                  hpilo_services
        max_check_attempts              1
        check_interval                  30
        retry_interval                  15
        }
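To see how the two definitions fit together, here is a rough sketch of the substitution Nagios performs: the fields separated by `!` in check_command become $ARG1$, $ARG2$ and so on, and macros without a value expand to nothing. The `expand_command` function and the plugin directory are assumptions made for this illustration; the real expansion happens inside Nagios.

```perl
use strict;
use warnings;

# Hypothetical illustration of how a command_line gets filled in.
sub expand_command {
    my ($command_line, $check_command, %macros) = @_;
    my ($name, @args) = split /!/, $check_command;

    # $ARG1$ .. $ARGn$ come from the "!"-separated fields, in order.
    $macros{ 'ARG' . ($_ + 1) } = $args[$_] for 0 .. $#args;

    # Any macro without a value expands to an empty string.
    $command_line =~ s/\$(\w+)\$/defined $macros{$1} ? $macros{$1} : ''/ge;
    $command_line =~ s/\s+$//;    # tidy blanks left behind by empty ARGn
    return $command_line;
}

my $line = expand_command(
    '$USER1$/check_hpilo.pl $HOSTNAME$ $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$',
    'check_hpilo!nagios!nagiospass!POWER!ALL',
    USER1       => '/usr/local/nagios/libexec',    # assumed plugin directory
    HOSTNAME    => 'DBSRV',
    HOSTADDRESS => '192.168.1.1',
);
print "$line\n";
```

For a device list instead of ALL, each number would be passed as its own field (for example `...!DISK!1!2!3!4`), which is why the command definition reserves macros up to $ARG16$.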




Then restart the Nagios service and the check is ready to use.

Good luck.


Code:


#!/usr/bin/perl

## hardware query via ILO
## written by Cengizhan CANLI

use XML::Simple;
use Sys::Hostname;
#use IO::Socket::SSL qw(debug2);
use IO::Socket::SSL;
use Getopt::Long;
#use HTTP::Request::Common;
#use Term::ReadKey;   

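$return_text="";

# Nagios plugin exit codes
$STATE_OK=0;
$STATE_WARNING=1;
$STATE_CRITICAL=2;
$STATE_UNKNOWN=3;
$STATE_DEPENDENT=4;
# These constants are used further down but are not defined anywhere in this
# listing. The values here are assumptions; adjust them to your environment.
use constant VERSION       => "1.0";   # only appears in the User-Agent header
use constant RETRY_DELAY   => 5;       # seconds to wait before resending
use constant RETRY_TIMEOUT => 180;     # stop retrying after this many seconds
my $debug=1;   # 1 = write a trace to /tmp/SERVERNAME.DEVICETYPE.debug
my $socket;
$sendsize=0; $ln=""; $xml_response_ref=""; @xml_response=();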



my $time = localtime time;

my $localhost = hostname() || 'localhost';



sub print_help
{
        print "The usage is check_hpilo.pl SERVERNAME ILO_IP_NUMBER USERNAME PASSWORD (DISK|POWER|FAN|TEMP_SENSOR) DEVICEORDER|ALL\n";
        print "Example: check_hpilo.pl DBSERVER 192.168.1.2 nagios 12ab34cd DISK 1 2 3 4\n";
        # Exit with UNKNOWN (3) instead of 0 so a usage error is never reported to Nagios as OK.
        exit $STATE_UNKNOWN;
}


sub if_debug()
{
        if ($debug)
        {
                open (DEBUG_FILE, ">>/tmp/$ARGV[0].$ARGV[4].debug");
                print DEBUG_FILE "";
                print DEBUG_FILE "-----------" . $time . "-------------------\n";
                print DEBUG_FILE "-----------DEBUG START-------------------\n";

        }
}





sub generate_xml_str #create xml string to send ILO with SSL
{
        my $xml_str='<RIBCL VERSION="2.21"><LOGIN USER_LOGIN="'. $username.'" ' . ' PASSWORD='.'"'. $password .'"><SERVER_INFO MODE="read"><GET_EMBEDDED_HEALTH/></SERVER_INFO></LOGIN></RIBCL>';
        return $xml_str;

}


sub send_to_client
{
       print $socket $_[1];
}

sub send_or_calculate
{
  $sendsize = 0;
  $sendsize = length($ln);
 
 
  if ($_[0]==1)
  {
        print $socket $ln;
  }    
}

sub read_chunked_reply    # used for iLO 3 and iLO 4 only
{
  my $hide=1;
  my $isSizeOfChunk=1;
  my $chunkSize;
  my $cache = 1;

  $response = "";
  $RIBCLbusy = 0;

  while(1) {
    $ln=<$socket>;

    if (length($ln) == 0)
    {
        last;
    }
    if ($hide)
    {
        # Skip HTTP response headers and "\r\n"s preceding chunked responses
        if (length($ln) <= 2)
        {
            $hide=0;
        }
    }
    else {
        # Process chunked responses
        if ($isSizeOfChunk) {
            chomp($ln);
            $ln =~ s/\r|\n//g;           # clean $ln up
            $chunkSize=hex($ln);
            $isSizeOfChunk=0;
            next;
        }
        if ($chunkSize == 0) {           #End of responses; Empty responses
            last;
        }
        if ($chunkSize == length($ln)) {
            $isSizeOfChunk=1;
            $hide=1;                     #End of chunk; Skip next line
        }
        else {
            if ($chunkSize > length($ln)) {
                $chunkSize -= length($ln);
                #$ln = substr($ln,0,length($ln));
            }
            else {
                $isSizeOfChunk=1;        #Next line is size of next chunk
                $ln = substr($ln,0,$chunkSize);
            }
        }

        #now, print or cache the response
        if ($cache && $ln =~ m/MESSAGE/i) {
            if ($ln =~ m/RIBCL parser is busy/i) {
                $RIBCLbusy = 1;
            }
            else {
                $cache = 0;
            }
        }
        if ($cache) {
            # This isn't really required, but it makes the output look nicer
            $ln =~ s/<\/RIBCL>/<\/RIBCL>\n/g;
        }
        else {
            # This isn't really required, but it makes the output look nicer
            $ln =~ s/<\/RIBCL>/<\/RIBCL>\n/g;
            $response=$response.$ln ;
        }
    }

  }
  if ($socket->error()) {
     print "Error: connection error " . $socket->error() . "\n";
     print DEBUG_FILE "Error: connection error " . $socket->error() . "\n" if $debug;
  }
@xml_response=split(/\n/,$response);
        print DEBUG_FILE "--------------". $time . "-------------------\n" if $debug;
        print DEBUG_FILE "--------------XML FILE -------------------\n" if $debug;
        print DEBUG_FILE $response ."\n"  if $debug;


return(\@xml_response);
}

sub send_xml_query()
{
        my $boundary;
        my ($start_time, $end_time, $xml_str);
        $RIBCLbusy = 0;
        $retry = 0;
        $start_time = 0;
        $end_time = 0;
        $response = "";
        $ln = generate_xml_str;
        send_or_calculate(0);                                    # Calculate $sendsize
        while (!$retry || $RIBCLbusy) {
                if ($retry == 1) { # 1st retry
                        $start_time = time;      # epoch seconds, so the interval stays correct across an hour boundary
                }
                if ($retry > 1) {
                        $end_time = time;
                        if ($end_time-$start_time > RETRY_TIMEOUT) {     # retry up to RETRY_TIMEOUT seconds
                                print DEBUG_FILE "----- Retry timed out. Script sent unsuccessfully.\n" if ($debug);
                                last;
                        }
                }
                if ($retry) {
                        #print "\n----- iLO is busy. Resending the script... (Attempt #$retry)\n" if ($verbose);
                        sleep(RETRY_DELAY);  # delay RETRY_DELAY seconds
                }


    # Send the HTTP header and begin processing the file
                send_to_client(0, "POST /ribcl HTTP/1.1\r\n");
                send_to_client(0, "HOST: $localhost\r\n");           # Mandatory for http 1.1
                send_to_client(0, "User-Agent: locfg-Perl-script/".VERSION."\r\n");
                send_to_client(0, "TE: chunked\r\n");
                send_to_client(0, "Connection: Close\r\n");          # Required
                send_to_client(0, "Content-length: $sendsize\r\n");  # Mandatory for http 1.1
                send_to_client(0, "\r\n");
                send_or_calculate(1);  #Send it to iLO
       
    # Ok, now read the responses from iLO
                $xml_response_ref=read_chunked_reply();
                $retry++;
        } # end while
        make_xml_data($xml_response_ref);
        return;
}


sub open_https_connection()
{
        $ilo_ip .= ":443" unless ($ilo_ip =~ m/:/);
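        # SSL_verify_mode 0x00 disables certificate verification (iLO certificates are often self-signed).
        $socket = IO::Socket::SSL->new(PeerAddr => $ilo_ip, SSL_verify_mode => 0x00);
        if (!$socket)
        {
                print "Cannot connect to iLO at $ilo_ip : $IO::Socket::SSL::SSL_ERROR\n";
                exit $STATE_UNKNOWN;
        }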
        return;
}
sub close_https_connection()
{
        $socket->close();
        return;
}

sub make_xml_data($)
{
        my ($i, $j, $line_count, $switch) = (0, 2, 0, 0);
        my $xml_response_ref=$_[0];
        @xml_response=@$xml_response_ref;
        $line_count = $#xml_response;
        $xml_data[0]='<?xml version="1.0"?>'."\n";
        $xml_data[1]='<RIBCL VERSION="2.22">'."\n";

        while ($i <= $line_count)
        {
                $ln=$xml_response[$i];
                $search_string='<GET_EMBEDDED_HEALTH_DATA>';
                if ($ln =~ /$search_string/)
                {
                        $switch=1;
                }
                if ($switch==1) {
                        $xml_data[$j]=$ln . "\n";
                        $j++;
                }
                $search_string='</GET_EMBEDDED_HEALTH_DATA>';
                if ($ln =~ /$search_string/)
                {
                        $switch=0;
                        $i=$line_count;
                }
                $i++;
        }

        $xml_data[$j]='</RIBCL>'. "\n";
        print DEBUG_FILE "----------------- Edited XML FILE START--------------\n" if ($debug);
        print DEBUG_FILE "------------------- ". $time . " --------------\n" if ($debug);
        print DEBUG_FILE @xml_data if ($debug);
        print DEBUG_FILE "----------------- Edited XML FILE END --------------\n" if ($debug);


        chomp(@xml_data);
        $i=0;
        while ($i <= $#xml_data)
        {
                $xml_string.=$xml_data[$i];
                $i++;
        }

        $xml= new XML::Simple;
        $xml_data = $xml->XMLin($xml_string) or die "XML Data Error";

}

sub get_disk_status($)
{
                my @result_sub;
                my ($disk_number, $disk_number_mod, $backplane_count, $disk_count, $backplane_number, $status, $all_disks);
                my $i = 0;
                my $control_order_ref = $_[0];
                my @control_order_sub = @$control_order_ref;
                open_https_connection();
                send_xml_query();
                close_https_connection();

                ## how many backplanes the server has
                $backplane_count = @{$xml_data->{'GET_EMBEDDED_HEALTH_DATA'}->{DRIVES}->{BACKPLANE}};
                ## how many disk slots each backplane has (taken from the first backplane)
                $disk_count = @{$xml_data->{'GET_EMBEDDED_HEALTH_DATA'}->{DRIVES}->{BACKPLANE}->[0]->{STATUS}};
                if ($backplane_count == 0 || $disk_count == 0)
                {
                        # without this guard the division further down would fail
                        print "No disk data found in the iLO response.\n";
                        exit $STATE_UNKNOWN;
                }
                if ($control_order_sub[0] eq "ALL") # means all devices will be checked
                {
                        $all_disks = $backplane_count * $disk_count ;
                        @control_order_sub = (1 .. $all_disks);
                }
                $i=0;
                foreach $disk_number (@control_order_sub)
                {
                                $backplane_number=int(($disk_number-1)/$disk_count);
                                $disk_number_mod = ($disk_number - 1) % $disk_count;
                                if ($backplane_number < $backplane_count)
                                {
                                        $status = $xml_data->{'GET_EMBEDDED_HEALTH_DATA'}->{DRIVES}->{BACKPLANE}->[$backplane_number]->{STATUS}->[$disk_number_mod]->{VALUE};

                                        print DEBUG_FILE $time . "  ". "$localhost" . " Read Disk Status : " . "$status" ." \n" if $debug;
                                        if ($status eq "Ok")
                                        {
                                                $result_sub[$i] = 0;
                                        }
                                        else
                                        {
                                                $result_sub[$i] = 2;
                                        }
                                        $i++;
                                }
                                else
                                {
                                        print "Wrong disk order for $disk_number. Please check your disk number.\n";
                                        print_help;
                                        exit 1;
                                }
                }
        return(\@result_sub);
}

sub get_power_status($)
{
      
        my ($power_number, $power_count, $status);
        my @result_sub;
        my $i = 0;
        my $control_order_ref = $_[0];
        my @control_order_sub = @$control_order_ref;
        open_https_connection();
        send_xml_query();
        close_https_connection();
                $power_count = @{$xml_data->{'GET_EMBEDDED_HEALTH_DATA'}->{POWER_SUPPLIES}->{SUPPLY}};
                if ($control_order_sub[0] eq "ALL") # means all devices will be checked
                {
                        if ($power_count == 0)
                        {
                                # nothing to expand ALL into; report it instead of indexing element -1
                                print "No power supply data found in the iLO response.\n";
                                exit $STATE_UNKNOWN;
                        }
                        @control_order_sub = (1 .. $power_count);
                }
                $i=0;
                foreach $power_number (@control_order_sub)
                {
                        if ($power_number <= $power_count)
                        {
                                $status=$xml_data->{'GET_EMBEDDED_HEALTH_DATA'}->{POWER_SUPPLIES}->{SUPPLY}->[$power_number-1]->{STATUS}->{VALUE};
                                print DEBUG_FILE $time ."  ". "$localhost" . " Read Power  Status : " . "$status" ." \n" if $debug;
                                if ($status eq "OK")
                                {
                                        $result_sub[$i] = 0;
                                }
                                else
                                {
                                        $result_sub[$i] = 2;
                                }
                                $i++;
                        }
                        else
                        {
                                print "Wrong power order for $power_number. Please check your power number.\n";
                                print_help;
                                exit 1;
                        }
                }

            return(\@result_sub);
}


sub get_fan_status($)
{
        my ($fan_number, $fan_count, $status);
        my @result_sub;
        my $i = 0;
        my $control_order_ref = $_[0];
        my @control_order_sub = @$control_order_ref;
        open_https_connection();
        send_xml_query();
        close_https_connection();
        $fan_count=@{$xml_data->{'GET_EMBEDDED_HEALTH_DATA'}->{FANS}->{FAN}};
        if ($control_order_sub[0] eq "ALL") # means all devices will be checked
        {
                if ($fan_count == 0)
                {
                        # nothing to expand ALL into; report it instead of indexing element -1
                        print "No fan data found in the iLO response.\n";
                        exit $STATE_UNKNOWN;
                }
                @control_order_sub = (1 .. $fan_count);
        }
        $i=0;
                foreach $fan_number (@control_order_sub)
                {
                        if ($fan_number <= $fan_count)
                        {

                                $status=$xml_data->{'GET_EMBEDDED_HEALTH_DATA'}->{FANS}->{FAN}->[$fan_number-1]->{STATUS}->{VALUE};
                                print DEBUG_FILE $time ."  ". "$localhost" . "  Read Fan  Status : " . "$status" ." \n" if $debug;
                                if ($status eq "OK")
                                        {
                                                $result_sub[$i] = 0;
                                        }
                                else
                                        {
                                                $result_sub[$i] = 2;
                                        }
                                $i++;
                        }
                        else
                        {
                                print "Wrong fan order for $fan_number. Please check your fan number.\n";
                                print_help;
                                exit 1;
                        }
                }

            return(\@result_sub);
}

sub get_temperature_status($)
{
      
        my ($temperature_number, $temperature_count, $status, $current, $warning, $critical);
        my @result_sub;
        my $i = 0;
        my $control_order_ref = $_[0];
        my @control_order_sub = @$control_order_ref;
        open_https_connection();
        send_xml_query();
        close_https_connection();

        $temperature_count=@{$xml_data->{'GET_EMBEDDED_HEALTH_DATA'}->{TEMPERATURE}->{TEMP}};
        if ($control_order_sub[0] eq "ALL") # means all devices will be checked
        {
                if ($temperature_count == 0)
                {
                        # nothing to expand ALL into; report it instead of indexing element -1
                        print "No temperature sensor data found in the iLO response.\n";
                        exit $STATE_UNKNOWN;
                }
                @control_order_sub = (1 .. $temperature_count);
        }
        $i=0;
        foreach $temperature_number (@control_order_sub)
        {
                if ($temperature_number <= $temperature_count)
                {
                        $current=$xml_data->{'GET_EMBEDDED_HEALTH_DATA'}->{TEMPERATURE}->{TEMP}->[$temperature_number-1]->{CURRENTREADING}->{VALUE};
                        $warning=$xml_data->{'GET_EMBEDDED_HEALTH_DATA'}->{TEMPERATURE}->{TEMP}->[$temperature_number-1]->{CAUTION}->{VALUE};
                        $critical=$xml_data->{'GET_EMBEDDED_HEALTH_DATA'}->{TEMPERATURE}->{TEMP}->[$temperature_number-1]->{CRITICAL}->{VALUE};

                        print DEBUG_FILE $time ."  " . "$localhost" . "  Read Temperature  Status : " ."Current: " .  "$current" . " Warning: ". " $warning" . "Critical: " . "$critical". " \n" if $debug;


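                        # A threshold may be non-numeric (for example "N/A"); compare only real numbers.
                        if ($critical =~ /^\d+$/ && $current >= $critical)
                        {
                                $result_sub[$i] = 2;
                        }
                        elsif ($warning =~ /^\d+$/ && $current >= $warning)
                        {
                                $result_sub[$i] = 1;
                        }
                        else
                        {
                                $result_sub[$i] = 0;
                        }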
                        $i++;
                        }
                else
                {
                        print "Wrong temperature sensor order for $temperature_number. Please check your temperature sensor number.\n";
                        print_help;
                        exit 1;
                }
        }

  return(\@result_sub);
}

sub create_nagios_alarm($$$$)
{
        my ($device_status_ref, $control_order_ref, $device_name_str, $server_name_str) = @_;
        my (@device_status_sub) = @$device_status_ref;
        my (@control_order_sub) = @$control_order_ref;
        my ($n) = 0;
        my $result_str;
                if ($control_order_sub[0] eq "ALL") # means all devices will be checked
                {
                        # number the devices 1..N, one per status entry
                        @control_order_sub = (1 .. scalar(@device_status_sub));
                }

        $result_str="$server_name_str" . " sunucusundaki " ;
        $EXIT_STATE = $STATE_OK;
        chomp(@device_status_sub);
        while ($n <= $#device_status_sub)
        {
                if ($device_status_sub[$n] ==  1)
                {
                        $result_str= "$result_str" . "$control_order_sub[$n]" . " ";
                        # do not downgrade a CRITICAL found earlier in the list
                        $EXIT_STATE=$STATE_WARNING if ($EXIT_STATE != $STATE_CRITICAL);
                }
                if ($device_status_sub[$n] ==  2)
                {
                        $result_str= "$result_str" . "$control_order_sub[$n]" . " ";
                        $EXIT_STATE=$STATE_CRITICAL;
                }
                $n++;
        }
         if ($EXIT_STATE == $STATE_OK)
        {
                $result_str = "Everything is OK";
        } else
        {

                $result_str = "$result_str" .  "numarali " . "$device_name_str" . "(ler/lar)" . " uyari vermistir. Lutfen kontrol edin!";
        }
        return ($result_str);


}

###Main#####

my $numArgs = $#ARGV;   # index of the last argument (argument count minus one)

if_debug();


#print "Arguments: @ARGV \n";

if (($ARGV[0] eq "-h") || ($ARGV[0] eq "--help"))
{
        print_help;
        exit 2;
}



if ($numArgs < 5)
{
        print "wrong argument number:". ($numArgs + 1) ." \n";
        print_help;
        exit 2 ;
}


$server_name = $ARGV[0];
$ilo_ip = $ARGV[1];
$username=$ARGV[2];
$password=$ARGV[3];
$device_name = $ARGV[4];
$n=0;
foreach $argnum (5 .. $numArgs)
{
        $control_order[$n] = $ARGV[$argnum];
        $n++;
}


if ($device_name eq "DISK")
{
        $result = get_disk_status(\@control_order);
        @result_array=@$result;
        $return=create_nagios_alarm(\@result_array, \@control_order, $device_name, $server_name);
        print" $return \n";
        exit  $EXIT_STATE;
}
elsif ($device_name eq "POWER")
{

        $result = get_power_status(\@control_order);
        @result_array=@$result;
        $return=create_nagios_alarm(\@result_array, \@control_order, $device_name, $server_name);
        print" $return \n";
        exit  $EXIT_STATE;

}
elsif ($device_name eq "FAN")
{

        $result = get_fan_status(\@control_order);
        @result_array=@$result;
        $return=create_nagios_alarm(\@result_array, \@control_order, $device_name, $server_name);
        print" $return \n";
         exit  $EXIT_STATE;

}
elsif ($device_name eq "TEMP_SENSOR")
{

        $result = get_temperature_status(\@control_order);
        @result_array=@$result;
        $return=create_nagios_alarm(\@result_array, \@control_order, $device_name, $server_name);
        print" $return \n";
         exit  $EXIT_STATE;

}
else
{
        print "Wrong device name.\n";
        print_help;
        exit 2;
          
}
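One caveat about reading the parsed data, offered as a sketch rather than as part of the script above: with its default options XML::Simple returns an array reference when an element occurs several times, but a plain hash reference when it occurs only once. Code that dereferences `{SUPPLY}` or `{FAN}` as an array can therefore fail on a server that reports a single element. The hypothetical `as_list` helper below normalizes both shapes; the sample data structures are invented for illustration. XML::Simple's `ForceArray` option is another way to get the same effect.

```perl
use strict;
use warnings;

# Hypothetical helper: always hand back a list, whether the parsed XML gave
# us an array reference (several elements), a single hash or scalar (exactly
# one element) or nothing at all.
sub as_list {
    my ($node) = @_;
    return ()     unless defined $node;
    return @$node if ref $node eq 'ARRAY';
    return ($node);
}

# Invented sample data in the two shapes a <SUPPLY> element can take.
my $two_supplies = { SUPPLY => [ { STATUS => { VALUE => 'OK' } },
                                 { STATUS => { VALUE => 'Failed' } } ] };
my $one_supply   = { SUPPLY => { STATUS => { VALUE => 'OK' } } };

for my $data ($two_supplies, $one_supply, {}) {
    my @supplies = as_list($data->{SUPPLY});
    print scalar(@supplies), " supply element(s)\n";
}
```

Counting the list returned by `as_list` gives 2, 1 and 0 for the three samples, so the same loop can handle every case without a special branch.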

2 comments:

  1. Hello,

    Thanks for sharing. On an HP DL380 G8 with iLO4 I get the following error.

    Modification of non-creatable array value attempted, subscript -1 at ./check_hpilo.pl line 460.

    The commands I sent: TEMP_SENSOR ALL and POWER ALL

  2. Hello,

    If you run the program with debug=1 and send me the .debug file created under /tmp/, I will try to help. Unfortunately I do not have a G8 server at hand right now, so I cannot test it myself.
