Quick fixes for NAGIOS plug-in check_dell_warranty

Erinn Looney-Triggs wrote a nice plug-in for NAGIOS to check the warranty status of your Dell boxes with Dell.  However, some of the systems I support are running a seriously old version of Dell OMSA.  I sent him a quick patch back in May to handle old OMSA versions:

[root@x plugins]# diff -p check_dell_warranty check_dell_warranty_test
*** check_dell_warranty       Thu Apr 15 14:27:43 2010
--- check_dell_warranty_test  Thu May  6 11:45:16 2010
*************** def extract_serial_number_snmp( hostname
*** 314,319 ****
--- 314,320 ----
                                  hostname,
                                  community_string).split('\n')
          for encl_id in snmp_out:
+             if not encl_id: continue
              #Get enclosure type.
              #   1: Internal
              #   2: DellTM PowerVaultTM 200S (PowerVault 201S)
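
A guess at why the guard helps, assuming the SNMP output from older OMSA versions ends with a newline or contains blank lines (the patch itself does not say): splitting on '\n' then leaves empty strings in the list, and the loop has to skip them.  The sample output below is made up for illustration.

```python
# Hypothetical SNMP output: one enclosure ID per line, with a trailing newline.
snmp_out = "1\n2\n".split('\n')
print(snmp_out)  # ['1', '2', '']

enclosure_ids = []
for encl_id in snmp_out:
    if not encl_id:
        continue  # skip the empty string left behind by the trailing newline
    enclosure_ids.append(encl_id)

print(enclosure_ids)  # ['1', '2']
```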

This morning it looks like Dell has a few back-end servers that are serving up an older version of the warranty status table.  I put together a quick patch to handle either warranty status table returned by Dell:

[root@x plugins]# diff -p check_dell_warranty- check_dell_warranty

*** check_dell_warranty-    Thu May  6 11:47:42 2010
--- check_dell_warranty    Sat Jul 17 13:24:29 2010
*************** def get_warranty(serial_numbers):
*** 438,444 ****

 return result_list

! def parse_exit(result_list, short_output=False):
 '''This parses the results from the get_warranty() function and outputs
 the appropriate information.
 '''
--- 437,443 ----

 return result_list

! def parse_exit(result_list, debug, short_output=False):
 '''This parses the results from the get_warranty() function and outputs
 the appropriate information.
 '''
*************** def parse_exit(result_list, short_output
*** 505,515 ****
 warranties = parse_table(match)

 #Remove the header lines.
!             warranties.pop(0)
!             
 for entry in warranties:
!                 (description, provider, start_date, end_date,
!                  days_left) = entry[0:5]

 #Convert the dates to international standard
 start_date = str(i8n_date(start_date))
--- 504,524 ----
 warranties = parse_table(match)

 #Remove the header lines.
!             header = warranties.pop(0)
!             notice_present = (header[2].find("Warranty") >= 0)
!             if debug:
!                 print "header is %s" % (header,)
!                 print "notice_present is ", notice_present
!
 for entry in warranties:
!                 if debug:
!                     print "entry is %s" % (entry,)
!                 if notice_present:
!                     (description, provider, notice, start_date, end_date,
!                      days_left) = entry[0:6]
!                 else:
!                     (description, provider, start_date, end_date,
!                      days_left) = entry[0:5]

 #Convert the dates to international standard
 start_date = str(i8n_date(start_date))
*************** remaining. These values can be adjusted
*** 633,638 ****
--- 642,652 ----
 help=('Number of days under which to return a warning '
 '(Default: %default)'), type='int', metavar='<ARG>' )

+     parser.add_option('-d', '--debug', dest='debug', action="store_true",
+                       default=False,
+                       help=('Print debugging information '
+                       '(Default: %default)'), metavar='<ARG>' )
+     
 (options, args) = parser.parse_args()

 signal.signal(signal.SIGALRM, sigalarm_handler)
*************** remaining. These values can be adjusted
*** 651,654 ****

 signal.alarm(0)

!     parse_exit(RESULT, options.short_output)
--- 665,668 ----

 signal.alarm(0)

!     parse_exit(RESULT, options.debug, options.short_output)

I should really use a name mapping hash and should probably use a better debug logging method, but this was a quick fix for a Saturday morning.
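
For what it's worth, here is a rough sketch of what the name-mapping approach might look like, written for Python 3 rather than the Python 2 of the plug-in.  The function name rows_as_dicts and the column headings in the sample tables are my own assumptions, not what Dell actually returns; the only thing known from the patch is that the other table layout has an extra third column whose heading contains "Warranty".  The standard logging module stands in for the print statements.

```python
import logging

log = logging.getLogger("check_dell_warranty")


def rows_as_dicts(table):
    """Turn a parsed table (header row first) into one dict per data row."""
    header = [name.strip() for name in table[0]]
    log.debug("header is %s", header)
    rows = []
    for entry in table[1:]:
        log.debug("entry is %s", entry)
        # Key each cell by its column heading so an extra column cannot
        # shift the positions of the others.
        rows.append(dict(zip(header, entry)))
    return rows


# Hypothetical sample tables; the headings are assumptions for illustration.
five_columns = [
    ["Description", "Provider", "Start Date", "End Date", "Days Left"],
    ["Next Business Day", "DELL", "7/1/2008", "7/1/2011", "349"],
]
six_columns = [
    ["Description", "Provider", "Warranty Extension Notice",
     "Start Date", "End Date", "Days Left"],
    ["Next Business Day", "DELL", "No", "7/1/2008", "7/1/2011", "349"],
]

for table in (five_columns, six_columns):
    for row in rows_as_dicts(table):
        print(row["Description"], row["End Date"], row["Days Left"])
```

Calling logging.basicConfig(level=logging.DEBUG) when --debug is given would turn the debug lines on without threading a debug argument through parse_exit().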

The United States is not Mexico

The Problem Report

A programmer I support sent me the following last October:

Hey Jim,

I just found a difference between our windows and unix boxes regarding time that is standing in the way of me finishing […] I’ve been working on and wanted your help.

[Description of test code and environment]

This is Test.java taking a milliseconds since the epoch time: 1256975836665 and turning that into a date.  It should give 2009-10-31 00:57:16 but is off by one hour and does if you run the same from a windows box on our network.

I don’t know how to do the same from the command line to see if it’s java specific or machine specific, but was wondering if you could look at those machines and see if perhaps timezone or daylight saving is set incorrectly?

The Investigation

Honestly, it had been a long time since I looked at the time zone configuration on a Linux box.  I had helped build out CNET’s Kickstart infrastructure in 2002 and hadn’t really looked at system settings like time zone since.  Upgrading the zoneinfo database to deal with changes was just an RPM upgrade and never required much thought.  So, I had to refresh my knowledge.  Further, I had inherited all of the systems I support for 42Lines and our clients in mid-January, so I didn’t know how they were set up.

The key file here is /etc/localtime.  What I found surprising was that instead of a symlink to the correct file in /usr/share/zoneinfo/, it was just a file.  This is a problem as the zoneinfo data for a zone is a compiled binary format and not human readable.  So, how to determine what zone it was?  In cases like this I turn to my trusty friend md5sum.

# md5sum /etc/localtime
f3e91959e492f62136812f8f556713a3  /etc/localtime
# find /usr/share/zoneinfo/America -type f | xargs md5sum | grep f3e91959e492f62136812f8f556713a3
f3e91959e492f62136812f8f556713a3  /usr/share/zoneinfo/America/Ensenada
f3e91959e492f62136812f8f556713a3  /usr/share/zoneinfo/America/Tijuana

Okay, so the timezone is set to Tijuana.  Tijuana is in the Pacific Timezone just like San Francisco, right?  Sometimes.
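
The same md5sum lookup can be sketched in Python, in case you want it in a script.  The function names are mine, and unlike the find command above this walks all of /usr/share/zoneinfo rather than just America.

```python
import hashlib
import os


def file_md5(path):
    """Return the hex MD5 digest of a file's contents."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def matching_zones(localtime="/etc/localtime",
                   zoneinfo_dir="/usr/share/zoneinfo"):
    """List zoneinfo files whose contents are identical to localtime."""
    wanted = file_md5(localtime)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(zoneinfo_dir):
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            try:
                if file_md5(candidate) == wanted:
                    matches.append(os.path.relpath(candidate, zoneinfo_dir))
            except OSError:
                continue  # unreadable file or dangling symlink
    return sorted(matches)
```

Calling matching_zones() with no arguments checks the usual paths.  More than one name can come back, as with Ensenada and Tijuana above, because several zone names share the same data.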

I had to coordinate a massive update at CNET in 2007 thanks to Congress changing the DST rules for the United States.  So I suspected that the problem might be due to the new rules starting in 2007.  I wasn’t working with these systems during the last DST transition, so it is possible that they just had an old zoneinfo database that didn’t include the recent rule changes.

I checked the version of tzdata (rpm -qf /usr/share/zoneinfo/America/Tijuana) and the version post-dated the 2007 changes (tzdata-2009u-1).  So, perhaps Mexico didn’t implement the same changes to DST that the United States and Canada did?  Again, the zoneinfo database is binary and somewhat to my surprise the RPMs do not install the source.  So, I hunted down the source and took a look.

It turns out that Mexico has had a pretty turbulent relationship with DST, far more so than the US.  The comments in the zoneinfo source file reflect not only the difficulty in following such things, but the dedication of a few select people to try and translate political intent and law into something logical (a rough task regardless of country).  Here’s what Mexico was using starting in 2002:

Rule    Mexico  2002    max     -       Apr     Sun>=1  2:00    1:00    D
Rule    Mexico  2002    max     -       Oct     lastSun 2:00    0       S

So as soon as the US changed in 2007 to:

Rule    US      2007    max     -       Mar     Sun>=8  2:00    1:00    D
Rule    US      2007    max     -       Nov     Sun>=1  2:00    0       S

anyone in the United States using Tijuana as their zone had the wrong time whenever the two rule sets disagreed: three or four weeks in the spring and one week in the fall.  It turns out that the border cities in Mexico will change to US rules in 2010.
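
To make the two rule sets concrete, here is a small sketch that turns the "Sun>=N" and "lastSun" day specifications from the Rule lines above into calendar dates for 2009.  The helper names are my own, and this only handles the two forms that appear in these four rules.

```python
import calendar
from datetime import date, timedelta


def first_sunday_on_or_after(year, month, day):
    """zoneinfo 'Sun>=N': the first Sunday on or after the given day."""
    start = date(year, month, day)
    # date.weekday() counts Monday as 0, so Sunday is 6 (calendar.SUNDAY).
    return start + timedelta(days=(calendar.SUNDAY - start.weekday()) % 7)


def last_sunday(year, month):
    """zoneinfo 'lastSun': the last Sunday of the month."""
    end = date(year, month, calendar.monthrange(year, month)[1])
    return end - timedelta(days=(end.weekday() - calendar.SUNDAY) % 7)


year = 2009
us_start = first_sunday_on_or_after(year, 3, 8)    # Mar Sun>=8
us_end = first_sunday_on_or_after(year, 11, 1)     # Nov Sun>=1
mx_start = first_sunday_on_or_after(year, 4, 1)    # Apr Sun>=1
mx_end = last_sunday(year, 10)                     # Oct lastSun

print("US DST:    ", us_start, "to", us_end)   # 2009-03-08 to 2009-11-01
print("Mexico DST:", mx_start, "to", mx_end)   # 2009-04-05 to 2009-10-25
```

For 2009 the two zones disagree from March 8 to April 5 and again from October 25 to November 1.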

My response to the programmer:

This is because the TZ on that machine is set to America/Tijuana instead of US/Pacific and the US (but not Mexico) changed the DST rules starting for 2007.  10/30 falls in the window between the old DST ending date (last Sunday in Oct) and the current ending date (first Sun in Nov) for the US.  Thanks, Congress.
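
The programmer asked how to check this outside of Java.  One way, sketched here with Python's zoneinfo module (Python 3.9 or later, which did not exist at the time), is to convert the timestamp from the problem report in both zones.  I use America/Los_Angeles, the zone that US/Pacific is an alias for; the function name is mine.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+, reads the system tz database

MILLIS = 1256975836665  # milliseconds since the epoch, from the problem report


def local_time(millis, zone_name):
    """Format an epoch-milliseconds timestamp as local time in the given zone."""
    moment = datetime.fromtimestamp(millis // 1000, tz=ZoneInfo(zone_name))
    return moment.strftime("%Y-%m-%d %H:%M:%S %Z")


for zone_name in ("America/Tijuana", "America/Los_Angeles"):
    print(zone_name, local_time(MILLIS, zone_name))
```

Assuming the installed tz database still carries the pre-2010 rules for Tijuana, this should print 2009-10-30 23:57:16 PST for Tijuana and 2009-10-31 00:57:16 PDT for Los Angeles, which is the one-hour difference from the report.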

Another programmer chimed in with the witty comment:

Those machines are not allowed to be in Tijuana according to our health care plan.  If their drives crash, they won’t be covered.

The Solution

Ensure your time zone is set correctly.  We changed to US/Pacific.  I also ensured that /etc/localtime was a symlink to the appropriate zoneinfo file to make things a bit more obvious.
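
A quick way to audit other machines for the same thing might look like the sketch below.  The function name is my own; it reports the zone name a symlinked /etc/localtime points at and returns None when the file is a plain copy, which is the case that sent me to md5sum.

```python
import os


def localtime_zone(path="/etc/localtime", zoneinfo_dir="/usr/share/zoneinfo"):
    """Return the zone a localtime symlink points at, or None if it is not a symlink."""
    if not os.path.islink(path):
        return None
    # Resolve only the first link so an alias such as US/Pacific is reported
    # as configured rather than as whatever it may link to in turn.
    target = os.path.join(os.path.dirname(path), os.readlink(path))
    return os.path.relpath(os.path.normpath(target), zoneinfo_dir)
```

Calling localtime_zone() with no arguments checks the usual path.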

ESFW

Despite my intention that this blog be chock full of meaty technical solutions for my legions of readers, I’m going to break my content selection rule on the very first post so I can introduce what is hopefully a new term to the world: ESFW.

In mid-January I left CNET after seven years.  My new job with 42Lines is full-time work from home.  Everyone at 42Lines works from home.  Well, they can work from anywhere; I assume everyone is usually working from home.  For all I know they are in a cafe or on a boat.  One co-worker was signed in one day from an airplane.

This type of company was very difficult to pull off fifteen years ago when I started my first company.  I remember longing for someone to build what we now call VoIP phones so I could cover our support line from home, rather than sitting in the office by myself on a beautiful mid-summer Saturday.  Now with IM, video chat, screen sharing, wiki, JIRA, etc. we can run an organization in a fully distributed fashion.  The key innovation, however, is the outsourced HR company.  When I started my first company, there were consultants who would help you with things like health plans and 401(k)s.  But payroll was the only task you could entirely outsource.  Now 42Lines has a company that handles all employee HR functions, which allows management to focus on hiring the right employees regardless of where they might live.  Want to hire an employee in Idaho and know nothing of that state’s insurance, taxation, and payroll laws?  No problem, it’s just a flat fee to add Idaho to your HR account.

Where was I?  Oh, yes.  People send me links all day and sometimes they are tagged “NSFW”.  I just chuckle at this outdated phrase as now Everything is Safe For Work in my home office.  I do suppose that one day an NSFW tag will save me some embarrassment when I finally get around to working from that hypothetical cafe.