August 2010 IT Business Consulting Newsletter

Maintain Zero Downtime with Systems Monitoring!

By Tom K

Over the last several months we’ve discussed putting a number of systems & best practices in place to protect your environment and achieve Zero Downtime. Now we need to keep an eye on them to ensure they are working properly and to optimize their effectiveness. Monitoring is the major component of Zero Downtime that I touched on in last month’s newsletter, "Zero Downtime! It IS Attainable."

In this month’s newsletter I’ll show you how to effectively monitor the important systems we have put in place. In many cases, we can automate the monitoring process so systems send you daily reports & special emails when something goes awry. Proactively monitoring and managing your systems is the key to maintaining Zero Downtime!


What to Watch

You need to keep an eye on all the internal systems your business relies on, and on all the external systems your users and customers use to communicate with you and to research and book your inventory. This includes servers and operating systems, the key applications that run on them, your power environment, your Internet circuits, and your web sites. Specifically, we are concerned with:

  • Server Hardware, especially redundant components: If you are not monitoring your server hardware, you won’t know when a redundant component fails or is about to fail. Once a redundant component fails, you have lost your safety margin, and the next failure can take the server down. That is a high-risk situation that requires immediate attention.
  • Operating System Event Logs: Your servers all maintain very detailed Event Logs in several categories. These logs report the health of the Operating System, advise you of any current issues, and often alert you to impending problems.
  • Free Disk Space: An issue we often see is a renegade application or process (or user) gobbling up unusual amounts of disk space. Unchecked, this can kill a server.
  • Anti-Virus Software Status: You want to check your Anti-Virus software regularly to ensure that all devices are up to date, are being scanned, and are reporting in with no problems. If the AV software encounters a virus it can’t remove, you want to be alerted so it can be cleaned manually.
  • Windows Server Update Services (WSUS): As with AV, you want to check WSUS regularly to ensure that all devices are up to date and are reporting in with no problems. If WSUS reports an update failure, you want to be alerted so it can be addressed manually.
  • Backups: It is imperative to know that all of your deployed backup processes are working properly, every night. This includes local tape or disk backups, Internet backups, and special application backups (e.g., property management databases and financial systems).
  • Uninterruptible Power Supplies: Your UPS units keep logs of the condition of your power and can alert you to any power faults in your environment. You also want to ensure that all UPS units are working properly and that their batteries are charging properly.
  • Internet Circuit Stability and Reliability: As this is a critical business resource, we like to track how reliable your circuits are, so we have data to work with your ISP if an Internet circuit is unstable. More importantly, if you use two Internet circuits (Load Balancing, as recommended and discussed in a previous newsletter, "Improving the Reliability & Speed of Your Business Internet Connection"), you won’t know that one circuit has failed until the second one also fails and you lose Internet access entirely… unless you are monitoring both circuits!
  • Web Site Response and Reliability: Most of my Clients are booking around 70% of their reservations through their web sites’ on-line booking engines. If on-line booking is also a significant factor in your success, we recommend monitoring your web sites for outages. You don’t want to hear that your web site is down from an irate homeowner! We also recommend monitoring for responsiveness: if your web site is up but pages are taking a long time to load, you’ll lose potential guests.
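As one example of automating these checks, the Free Disk Space item can be scripted. The following is a minimal Python sketch, not part of any product discussed here; the 15% threshold and the function names are hypothetical choices for illustration. It relies on the standard library's shutil.disk_usage, which reports total, used, and free bytes for the volume containing a given path.

```python
import shutil

# Hypothetical threshold: flag a volume when less than 15% of it is free.
MIN_FREE_FRACTION = 0.15


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of the volume holding `path` that is free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free / usage.total


def check_volumes(paths, min_free=MIN_FREE_FRACTION):
    """Return a list of (path, free_fraction) for volumes below `min_free`."""
    low = []
    for path in paths:
        frac = free_fraction(path)
        if frac < min_free:
            low.append((path, frac))
    return low


if __name__ == "__main__":
    # Hypothetical volume list; on a Windows server this might be ["C:\\", "D:\\"].
    for path, frac in check_volumes(["."]):
        print(f"WARNING: {path} has only {frac:.1%} free")
```

Run from a scheduled task, a check like this catches a renegade process filling a disk long before the server stops working, and its output could feed the daily status email described below.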

How to Watch

There are four primary methods used to monitor the items discussed above:

1. Configure your systems to send an email alert when the system detects a problem. These generally include server errors, backup errors, and virus events.

2. Configure your systems to send daily email status reports every morning. These generally include overall server status, backup status, and WSUS status.

3. Manually review system logs and consoles in depth, typically weekly. This generally covers server logs, backup logs, hardware details, WSUS details, AV details, UPS logs, and disk space.

4. Use monitoring applications designed to continuously check circuit and web site availability and responsiveness. These applications send an email when a fault occurs, and another when the condition returns to normal.
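The alert-on-fault, alert-on-recovery behavior of method 4 can be sketched in a few lines. This minimal Python example is an illustration under stated assumptions, not a recommended tool: it fetches a page, classifies the site as up, slow, or down, and sends an email only when that state changes. The 5-second "slow" threshold, the 5-minute check interval, and the SMTP host, sender, and recipient passed to monitor() are hypothetical placeholders.

```python
import smtplib
import time
import urllib.request
from email.message import EmailMessage

# Hypothetical threshold: a page that takes longer than this counts as "slow".
SLOW_SECONDS = 5.0


def probe(url, timeout=10.0):
    """Fetch `url` once and return (state, seconds), where state is
    'up', 'slow', or 'down'."""
    start = time.monotonic()
    try:
        # urlopen raises for HTTP 4xx/5xx errors, timeouts, DNS failures,
        # and malformed URLs, so any exception is treated as "down".
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # download the full page so load time is measured
    except Exception:
        return "down", time.monotonic() - start
    elapsed = time.monotonic() - start
    return ("slow" if elapsed > SLOW_SECONDS else "up"), elapsed


def transition(previous, current):
    """Return an alert subject when the state changes, else None.

    This is what makes the monitor email once when a fault starts and
    once when it clears, instead of on every check."""
    if previous == current:
        return None
    if current == "up":
        return "RECOVERED: site is responding normally again"
    return f"FAULT: site is {current}"


def build_alert(subject, url, sender, recipient):
    """Compose (but do not send) the alert email."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"{subject}\nURL: {url}\nChecked at: {time.ctime()}")
    return msg


def monitor(url, sender, recipient, smtp_host, interval=300.0):
    """Check `url` every `interval` seconds (hypothetical default: 5 minutes)
    and email only on state changes. Runs until interrupted."""
    state = "up"  # assume the site starts out healthy
    while True:
        current, _elapsed = probe(url)
        subject = transition(state, current)
        if subject is not None:
            try:
                with smtplib.SMTP(smtp_host) as smtp:
                    smtp.send_message(build_alert(subject, url, sender, recipient))
            except OSError as err:  # smtplib's exceptions subclass OSError
                print(f"Could not send alert: {err}")
        state = current
        time.sleep(interval)
```

For example, monitor("https://www.example.com/", "monitor@example.com", "you@example.com", "mail.example.com") would run until stopped. Alerting only on state changes, rather than on every failed check, keeps a prolonged outage from flooding your inbox while still telling you when service is restored.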


Using this combination of monitoring processes and tools allows you to track the health of your key systems, ensure everything is operating at peak performance, and address potential failures before they cause downtime!

Many of our Clients, however, don’t have the time to perform this monitoring regimen, nor do they have the knowledge to fully understand the information provided through the consoles and logs. We have, therefore, developed an offering specifically designed to perform this service for them, which is described here: Proactive Systems Management.


As always, if you have any questions or comments concerning this article or our Proactive Systems Management offering, I’d be happy to discuss them with you at your convenience. Feel free to contact me at TomK@TomKConsulting.com, or via my cell 443.310.5110.


While Zero Downtime can be attained, we still have to plan for the unlikely event that your data is destroyed or corrupted. So, next month’s newsletter will be dedicated to effective backups. I’ll discuss various internal and external backup systems, methodologies, and strategies that can get your business back in operation should a disaster strike. See "Backup the Company Jewels!"