July 2010 IT Business Consulting Newsletter

Zero Downtime!

It IS attainable!!

By Tom K

This month’s newsletter discusses how to use cost-effective redundancy to achieve Zero Downtime and where to apply rapid-response support contracts, and it introduces proactive monitoring of the ongoing health of your systems. These three concepts, coupled with the strategies I’ve already shown you, have provided Zero Downtime for many of my clients. It can be yours as well!


It Starts With Your Server(s)

Your business servers are absolutely critical to your operations. (If you don’t use servers on site, see Key Ancillary Equipment, below.) In many cases, your business can’t run without your servers. So, we need to do everything possible to keep them running!

A properly designed server will have redundant components wherever possible, so if one component fails the server keeps running. The usual suspects include processors, memory, hard drives, network interfaces, and power supplies. These redundant components will add to the cost of the server, but with the exception of the power supply, they can also improve the performance of your servers while increasing their resiliency.

It is also imperative you use Uninterruptible Power Supplies (UPS) with your servers. These devices will protect your servers from power surges and keep them running during short-term (30-minute) power outages. Most importantly, they will keep your servers from restarting over a simple momentary power glitch.


Monitoring Your Servers

Redundant components give you one “Get Out of Jail” card. If one redundant component fails, the second component holds your server up… until IT fails. So you need to keep a close eye on your server (or assign this task to someone) so you can correct that first failure immediately and keep everything redundant. You also want to keep an eye on the Operating System and your Business Software, watching for errors or other indications that something is not quite right. In most instances, issues that can develop into failures raise their heads long before the actual failure occurs. Watching for these early warning signs is what we call “proactive monitoring”.
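
To make the idea concrete, here is a minimal Python sketch of the check you (or whoever you assign) would be doing. It is an illustration only, not a real monitoring tool: the component names and counts are made up, the names ComponentGroup, redundancy_status, and needs_attention are hypothetical, and it assumes a single healthy unit is enough to keep the server running, which is not true of every component.

```python
from dataclasses import dataclass


@dataclass
class ComponentGroup:
    name: str       # e.g. "power supplies" (illustrative label)
    installed: int  # how many units the server has
    healthy: int    # how many are currently working


def redundancy_status(group: ComponentGroup) -> str:
    """Classify a group of redundant components.

    Assumption: one healthy unit is enough to keep the server up.
    """
    if group.healthy <= 0:
        return "FAILED"      # nothing left; the server is down
    if group.healthy < group.installed:
        return "DEGRADED"    # still running, but the spare is used up
    return "OK"


def needs_attention(groups):
    """Return the names of groups that are no longer fully redundant."""
    return [g.name for g in groups if redundancy_status(g) != "OK"]


if __name__ == "__main__":
    # Made-up example data for a single server.
    server = [
        ComponentGroup("power supplies", installed=2, healthy=2),
        ComponentGroup("hard drives", installed=2, healthy=1),
        ComponentGroup("network interfaces", installed=2, healthy=2),
    ]
    for group in server:
        print(f"{group.name}: {redundancy_status(group)}")
    print("Fix now:", needs_attention(server))
```

The point of the DEGRADED state is the one made above: the server is still up, but that is exactly when you need to act, because the next failure in that group takes you down.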

I will cover this important topic in greater detail in next month’s newsletter (Maintain Zero Downtime with Systems Monitoring!), describing different methods to monitor your systems, many of them automated.


Support Contracts

When a server component fails, it is imperative you receive a replacement component and hardware support to repair the failure as soon as possible, especially if it is a non-redundant component bringing your server down. This is where a solid support contract from your server vendor comes into play.

I recommend a 24x7 contract with a 4 hr replacement clause. This means you get phone support anytime, day or night, including weekends (your company operates on weekends, so your support systems should as well!). And if you do experience a hardware failure, the part will be in your office, with a Tech to replace it, within 4 hrs of the call, even at 7 AM on a Saturday morning. As the commercial says… Priceless!

The less expensive contract is Next Business Day (NBD), which means the part will arrive the next business day. That doesn’t sound bad… unless the component bringing down your server fails after 5 PM on a Thursday. A call placed that late is typically not logged until Friday, so the next business day is Monday! Your server will be down for the whole weekend!!
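
If you want to see how the two contracts compare for a given failure time, the arithmetic can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not any vendor's actual terms: it assumes a 5 PM cutoff after which a call is logged on the following business day, treats Monday through Friday as business days, and ignores holidays. The function names are hypothetical.

```python
from datetime import date, datetime, timedelta

CUTOFF_HOUR = 17  # assumed 5 PM cutoff; real cutoffs vary by vendor


def is_business_day(d: date) -> bool:
    """Monday through Friday; holidays are ignored in this sketch."""
    return d.weekday() < 5


def next_business_day(d: date) -> date:
    """First business day strictly after d."""
    d += timedelta(days=1)
    while not is_business_day(d):
        d += timedelta(days=1)
    return d


def nbd_arrival(failure: datetime) -> date:
    """Date the part arrives under a Next Business Day contract."""
    logged = failure.date()
    # Calls after the cutoff, or on a weekend, are assumed to be
    # logged on the following business day.
    if not is_business_day(logged) or failure.hour >= CUTOFF_HOUR:
        logged = next_business_day(logged)
    return next_business_day(logged)


def four_hour_arrival(failure: datetime) -> datetime:
    """Latest arrival under a 24x7 contract with a 4 hr clause."""
    return failure + timedelta(hours=4)


if __name__ == "__main__":
    failure = datetime(2010, 7, 8, 17, 30)  # a Thursday, 5:30 PM
    print("NBD part arrives:  ", nbd_arrival(failure).strftime("%A %Y-%m-%d"))
    print("4 hr part arrives: ", four_hour_arrival(failure).strftime("%A %Y-%m-%d %H:%M"))
```

For the Thursday 5:30 PM failure in the example, the NBD part lands on Monday, July 12, while the 4 hr contract has it on site by 9:30 PM the same evening.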

Note that in some remote areas, the 24x7x4 hr contract is not available, as the vendor can’t guarantee they can get there in 4 hrs. In this instance, NBD is your only option.


Key Ancillary Equipment

There are a few relatively inexpensive infrastructure components whose failure can bring down your business. While they rarely fail, you might want to consider keeping replacements on the shelf to ensure Zero Downtime. The warranties on these devices are usually NBD (or non-existent), so a failure could bring you down for a weekend or until you could get a replacement shipped in.

The first is an inexpensive Ethernet Switch. If you have a larger environment utilizing two or more switches, this might not be a requirement if you can operate your essential devices with one of your switches gone. If you only have one switch, or would be impacted by the loss of one of your multiple switches, you should buy a spare. An unmanaged 24-port switch costs less than $200, and could be the difference between having working computers (and your Property Management software) over a weekend or not!

The second is your Firewall. If your Property Management data is hosted off site, the Internet is your most precious resource. If you lose your Firewall, you lose all access to your data.

If you house your data on site, losing the Internet will impact many of your staff’s ability to do their jobs, and will kill your web site’s ability to provide real-time inventory and on-line booking. If you have multiple sites that connect to the main site via the Internet, losing your main site Firewall will effectively bring down all your remote sites! You may, therefore, want to consider keeping a pre-configured spare Firewall on the shelf. Cost is dependent on your firewall, typically between $150 and $1200.

It is also imperative you use a UPS with these devices. The UPS will protect them from power surges and will keep them running during short term power outages. There is no point in keeping your servers powered up with a UPS if the switches and Firewall that the servers rely on to communicate with the rest of their world have no power!


Previous Discussions

In previous newsletters I discussed additional strategies that will contribute to Zero Downtime. Centralized management of your Anti Virus solution (Anti Virus protections for Servers, PCs, and email) and Microsoft Updates (Centrally Manage Microsoft Updates Across Your Enterprise) are critical.

Possibly more important than having a spare Firewall on the shelf is utilizing redundant Internet connections (Improving the Reliability & Speed of Your Business Internet Connection). While it is generally rare for a Firewall to fail, it is not at all unusual for an Internet circuit to fail. Redundant Internet circuits will make Internet frustration a thing of the past!
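
To get a feel for why a second circuit helps so much, here is a small back-of-the-envelope calculation in Python. It is a sketch under two assumptions that are mine, not measurements: each circuit is up 99% of the time, and the circuits fail independently of each other (for example, different carriers over different physical paths). If both circuits share a cable or a provider, the real benefit will be smaller. The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(availabilities):
    """Chance that at least one circuit is up.

    Assumes the circuits fail independently of each other.
    """
    all_down = 1.0
    for a in availabilities:
        all_down *= (1.0 - a)  # probability every circuit so far is down
    return 1.0 - all_down


def expected_downtime_hours(availability):
    """Expected hours per year with no working circuit."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    single = 0.99  # assumed figure, for illustration only
    one = combined_availability([single])
    two = combined_availability([single, single])
    print(f"One circuit:  {expected_downtime_hours(one):.1f} hours down per year")
    print(f"Two circuits: {expected_downtime_hours(two):.2f} hours down per year")
```

With the assumed 99% figure, one circuit works out to roughly 87.6 hours of downtime a year, while two independent circuits bring that to under one hour.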


As always, if you have questions or comments concerning this article I’d be happy to discuss them with you at your convenience. Feel free to contact me at TomK@TomKConsulting.com, or via my cell 443.310.5110.


Next month I will provide more detail on the importance of proactively monitoring the health of your systems, how to automate the process, and how to teach your systems to generate email alerts when they need attention. See "Maintain Zero Downtime with Systems Monitoring!"