There have been many high-profile outages lately that have caught people's attention, and these failures are being used as an argument for why critical systems should remain internal rather than be deployed as SaaS or in the Cloud. Some of these outages include Google App Engine's performance issues in early July, Rackspace's loss of their Dallas data center due to a power failure, and the fire in Seattle that took Authorize.Net offline for 12 hours, to name but a few.
What amazes me is how many people point to this as proof that Cloud and/or SaaS is bad and that everything should be kept in house. It's preposterous. That these systems went down with a data center failure (or otherwise) is nothing more than evidence of inadequate system design where High Availability (HA) is concerned. The bottom line is that it takes planning, forethought and good design to make a system highly available, and most systems simply are not designed with that in mind.
The reasons for not making a system highly available are many and include the following:
- Naivete: People don’t believe it could happen to their system and thus choose not to put in the time, effort and cost of making it highly available
- Cost: The bottom line is that it costs a lot of money to make a system HA, and for a lot of firms, particularly smaller businesses or those just starting out, it is simply not a viable option
- Difficulty: It’s bloody hard to make a system HA. It’s one thing to ensure no data loss; it’s quite another to ensure little to no downtime
For most of my career I have built systems for the world’s largest financial companies, including the world’s leading investment banks and stock exchanges. These firms take high availability very seriously as a rule, but even with their resources and decades of experience, systems still go down.
Consider the London Stock Exchange (whose system I did not design), which last year had a very public outage in which it was down for most of a trading day. This was not a SaaS system or one deployed in a Cloud. It was an internal system run by a highly reputable company whose business is built on being reliable and never losing a trade. These exchanges, for the most part, have highly redundant systems and multiple backup data centers, design for High Availability, and run failover tests regularly, yet they still experience downtime from time to time.
The point is, failures happen whether the system is run internally or in the cloud, and whether it’s a SaaS system or one of home-grown legacy design. The objective is to minimize those failures and the downtime associated with them.
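To make the idea of minimizing downtime concrete, here is a minimal sketch of one common HA building block: a client-side failover loop that tries redundant endpoints in order and only fails when every replica is down. The endpoint names and the `fetch` stand-in are purely illustrative assumptions, not anything from the systems discussed above.

```python
# Hypothetical sketch of client-side failover across redundant endpoints.
# All names here (PRIMARY, BACKUPS, fetch) are illustrative assumptions.

PRIMARY = "https://dc-dallas.example.com"
BACKUPS = ["https://dc-chicago.example.com", "https://dc-london.example.com"]


def fetch(endpoint):
    """Stand-in for a real network call; simulates the primary data
    center being unreachable while the backups still answer."""
    if endpoint == PRIMARY:
        raise ConnectionError(f"{endpoint} is unreachable")
    return f"response from {endpoint}"


def fetch_with_failover(endpoints):
    """Try each redundant endpoint in order; raise only if all are down."""
    last_error = None
    for endpoint in endpoints:
        try:
            return fetch(endpoint)
        except ConnectionError as err:
            last_error = err  # record the failure and try the next replica
    raise RuntimeError("all endpoints down") from last_error


result = fetch_with_failover([PRIMARY] + BACKUPS)
print(result)  # the first reachable backup answers
```

Real systems layer much more on top of this (health checks, timeouts, DNS or load-balancer failover, replicated state), but the principle is the same: no single data center failure should take the service down.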
That said, with today’s technologies, some careful planning, and good design, it is possible to build systems that should almost never go down, even in the face of a 9/11-type event, but that’s a topic for another day.
(Guest post by Paul Michaud, Global Executive IT Architect for the Financial Markets sector at IBM. Paul blogs @ Technology Musings)