Server Monitoring – Minimize Unnecessary Downtime

Posted May 17, 2011

If you are a true IT geek, you probably don’t have a lot of friends unless you count the servers you manage. That’s not to imply we are social outcasts, but we do get along well with our servers and enjoy their company. I can safely say I have more servers to manage than friends on Facebook. Needless to say, Simple Network Management Protocol (SNMP) has become one of my best friends over the past decade, along with Google and other tools of the trade. SNMP, however, is the one that decides whether I sleep at night, and yet I still consider it a friend (I need to get a life).

The advantage of using SNMP tools to monitor the servers that run your staffing software and other applications, along with your network devices, is that you can track and measure resource utilization (memory, CPU, disk), service availability, and much more.

Taking this a step further, you can see when resource utilization peaks and set thresholds that alert you when conditions are creating a performance bottleneck. One of the simplest forms of monitoring we do is to check disk space and alert ourselves when free space falls below a set threshold. It’s amazing to me how many customer outages called into the support center turn out to be caused by a data disk running out of free space.
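To make the disk space check concrete, here is a minimal sketch in Python. It is not SNMP itself: it reads free space on the local machine with the standard library, where a real monitoring tool would poll the same figure over SNMP. The 20% and 10% thresholds and the function names are assumptions chosen for illustration, not values from our environment.

```python
import shutil

# Assumed thresholds for illustration only; tune them for your own disks.
WARN_FREE_PCT = 20.0   # warn when free space is at or below 20%
CRIT_FREE_PCT = 10.0   # critical when free space is at or below 10%


def classify_free_space(free_bytes, total_bytes,
                        warn_pct=WARN_FREE_PCT, crit_pct=CRIT_FREE_PCT):
    """Return "ok", "warning" or "critical" for a disk's free space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_pct = 100.0 * free_bytes / total_bytes
    if free_pct <= crit_pct:
        return "critical"
    if free_pct <= warn_pct:
        return "warning"
    return "ok"


def check_disk(path):
    """Classify the free space of the disk that holds the given path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return classify_free_space(usage.free, usage.total)


if __name__ == "__main__":
    # Check the disk that holds the current directory.
    print(check_disk("."))
```

Keeping the threshold logic separate from the disk lookup means the same classification could be fed by a value collected over SNMP instead of a local call.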

Now consider how much money these businesses lost to reduced productivity and, ultimately, lost revenue. Most SNMP tools offer several ways to alert the IT team. My personal favorite is a text message to each team member’s cell phone, but we also use the email option for non-critical warnings: conditions that aren’t causing an outage but could if left unresolved. Several notification methods are available, and it is up to the IT team to select what works best for them.
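The split between text messages for critical alerts and email for warnings can be sketched as a small routing table. Everything here is hypothetical: the severity names, the channel names and the `notify` helper are assumptions for illustration, and the senders are plain callables you would replace with whatever your monitoring tool or gateway provides.

```python
# Hypothetical routing table: which channels each severity is sent to.
ROUTES = {
    "critical": ("sms",),    # outage conditions go to every team member's phone
    "warning": ("email",),   # non-critical warnings go to email
    "ok": (),                # nothing to send
}


def route_alert(severity):
    """Return the channels an alert of this severity should be sent to."""
    try:
        return ROUTES[severity]
    except KeyError:
        raise ValueError(f"unknown severity: {severity!r}") from None


def notify(severity, message, senders):
    """Send the message through each routed channel.

    senders maps a channel name to a callable that takes the message.
    Returns the list of channels the message was delivered through.
    """
    delivered = []
    for channel in route_alert(severity):
        senders[channel](message)
        delivered.append(channel)
    return delivered


if __name__ == "__main__":
    # Stand-in senders that just print; real ones would call a gateway.
    senders = {
        "sms": lambda msg: print("SMS:", msg),
        "email": lambda msg: print("EMAIL:", msg),
    }
    notify("critical", "Data disk is out of free space", senders)
    notify("warning", "Data disk is below the warning threshold", senders)
```

A table like this keeps the decision about who gets woken up in one place, so the team can change it without touching the checks themselves.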

Probably the most common cause of an outage is a nightly process that needs to stop and restart critical services, or the Windows server reboot task we have scheduled once a week. We all know that Windows servers won’t operate forever without a reboot, and I believe it is safe to say we have all fallen victim to this inevitable task causing an outage. We typically reboot our servers once a week, and with every reboot comes the chance that the server will not start all of its services for one reason or another (a separate discussion). If a critical service is monitored, you will be notified before users need access, unless of course you reboot your server in the middle of the workday (not recommended). In our case these reboots occur in the wee hours of the morning, which of course is where the lack of sleep comes in, but I personally would rather lose a little sleep than any one customer. We can also review weekly, monthly and annual statistics to proactively plan for growth in our environments to meet our customers’ needs and, of course, keep the CFO happy by properly planning and budgeting our hardware expenses.
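As one illustration of planning for growth from collected statistics, the sketch below fits a straight line to weekly disk usage samples and estimates how many weeks remain before the disk fills. The linear trend is an assumption (real growth is rarely that tidy), and the function name and sample figures are made up for the example.

```python
def weeks_until_full(weekly_used_gb, capacity_gb):
    """Estimate weeks until a disk is full from evenly spaced weekly samples.

    Fits a least-squares line through the samples and extends it to the
    capacity. Returns None when usage is flat or shrinking.
    """
    n = len(weekly_used_gb)
    if n < 2:
        raise ValueError("need at least two weekly samples")
    mean_x = (n - 1) / 2                      # mean of week indexes 0..n-1
    mean_y = sum(weekly_used_gb) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(weekly_used_gb))
    slope = sxy / sxx                         # growth in GB per week
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    full_at_week = (capacity_gb - intercept) / slope
    # Count from the most recent sample; never report a negative number.
    return max(0.0, full_at_week - (n - 1))


if __name__ == "__main__":
    # Made-up samples: usage grows 10 GB a week on a 200 GB disk.
    print(weeks_until_full([100, 110, 120, 130], 200))
```

Even a rough estimate like this gives you a date to bring to the budget conversation before the disk becomes an outage.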

Monitoring your business-critical servers and acting proactively on the collected statistics can save you many stressful, nail-biting days at the office, along with lost productivity and revenue. There are many SNMP tools available, many of which are inexpensive and will do the trick for most environments in our industry. Windows servers and most network devices support SNMP, and if your organization is not using it, you should consider implementing it. A good starting point for your Windows servers is this article on the Microsoft site: http://support.microsoft.com/kb/324263

Posted by John Bill
John has been with Bond, formerly with VCG, since 1997 and is the Director of Information Services. John has more than 25 years of experience in the IT industry. His team consists of personnel responsible for the administration and deployment of Bond’s SaaS/Managed Services, systems level support for enterprise customers, corporate IS infrastructure support, administration and advancement.