“We don’t need an external service to monitor our sites. We have people on staff to take care of this.”
Many people think like this, including people who make important decisions in your company. They are both right and wrong: right about 99% of the time and wrong about 1% of the time. Then again, that 1% seems to matter to business owners, as we have come to understand.
Recently, a company hosting many online services experienced massive downtime. It turned out that when 99% of your business is online, 1% of downtime is a lot more than it looks. Centralization versus decentralization of hosting is probably a topic for a whole different post, but for the purposes of our argument, keeping all the eggs in one basket proved extremely harmful.
Just before Christmas, all sites went down for about 6 hours, and 6 good business hours, we might add. There were immediate consequences, and there are still collateral consequences. Six hours of downtime in a single month means ~99.2% uptime. This is bad. No hosting provider would advertise this on its homepage. The initial damage consisted of losing business as well as leaving employees idle for the better part of the day. A lot of crucial administrative tasks failed at that point too.
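For anyone who wants to check the arithmetic behind that ~99.2% figure, here is a minimal sketch. The function name `uptime_percent` and the choice of a 30-day or 31-day month are our own assumptions for illustration; the exact figure depends on how long the billing period is.

```python
def uptime_percent(downtime_hours: float, period_hours: float) -> float:
    """Return uptime as a percentage of the given period."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    return 100.0 * (1.0 - downtime_hours / period_hours)


# Assumption: a month is either 30 or 31 days long.
HOURS_IN_30_DAY_MONTH = 30 * 24  # 720 hours
HOURS_IN_31_DAY_MONTH = 31 * 24  # 744 hours

print(f"{uptime_percent(6, HOURS_IN_30_DAY_MONTH):.2f}%")  # prints 99.17%
print(f"{uptime_percent(6, HOURS_IN_31_DAY_MONTH):.2f}%")  # prints 99.19%
```

Either way, six hours in one month lands at roughly 99.2%, well short of the "three nines" (99.9%) that allows only about 43 minutes of downtime in a 30-day month.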
Everything came crashing down in a matter of hours, leaving more than one business vulnerable. When company owners have multiple businesses to look after and they all fail simultaneously, they don’t experience 6 hours of downtime, but 6 hours of downtime multiplied by the number of businesses they have.
The cause of all this was a backup device that should have kicked in but never did. The problem could easily have been prevented if the performance issues had been detected early. With no proactive alerts, the only way anyone could have seen the problem coming was to spend all day checking how the sites load. That sounds pretty much like what our monitoring services do, and they don’t require work-related benefits or a parking space :)
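To make "checking how sites load" a little more concrete, here is a rough sketch of what an automated check might look like. It is not a description of how any particular monitoring service works. The 2-second threshold and the names `measure_load_time`, `classify` and `check_sites` are all hypothetical, chosen just for this illustration.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable, Optional

# Assumption: anything slower than 2 seconds deserves a warning.
SLOW_THRESHOLD_SECONDS = 2.0


def measure_load_time(url: str, timeout: float = 10.0) -> Optional[float]:
    """Return the seconds taken to fetch url, or None if the request failed."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return None
    return time.monotonic() - start


def classify(load_time: Optional[float],
             threshold: float = SLOW_THRESHOLD_SECONDS) -> str:
    """Turn a measured load time into a simple status label."""
    if load_time is None:
        return "DOWN"
    if load_time > threshold:
        return "SLOW"
    return "OK"


def check_sites(urls: Iterable[str],
                probe: Callable[[str], Optional[float]] = measure_load_time
                ) -> Dict[str, str]:
    """Probe every URL once and report its status."""
    return {url: classify(probe(url)) for url in urls}
```

Running something like `check_sites(["https://example.com"])` every few minutes and alerting on anything other than "OK" is the general idea. A "SLOW" result is the early warning that gives someone a chance to act before a site goes fully "DOWN".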