One of the key parameters in the push to accelerate enterprise cloud adoption is the SLA (Service Level Agreement). It is an important requirement before enterprises can even think of jumping into the cloud. After a slow start, companies are coming out with SLAs for their services, but it is still a messy affair, with different companies offering varying and often ambiguous terms. Recently, the US General Services Administration (GSA), part of the federal government, came up with an RFQ (Request for Quotation) that demands 99.95% uptime per month. Let us try to understand the SLA dynamics in this post and see how the government’s requirement will affect the SLA game.
During the early days of Cloud Ave, in response to concerns about cloud computing, I wrote a post on Questions to ask before trusting a cloud vendor. In that post, I highlighted some of the important questions about data ownership, data portability, security, etc. A few months back, I wrote another post about whose throat to choke if something goes wrong, in which I highlighted the importance of holding cloud vendors responsible for downtime. SLAs can answer these questions (and more) and also give a single point of contact to address any issues that might arise when enterprises move their apps and data to public clouds. In fact, Frank Ohlhorst has written an article on SearchCIO helping CIOs understand SLAs. In addition to the questions I have in my post, he points out some other important issues, which he groups under three categories: data, service continuity and costs. Before enterprises can test the cloudy waters, they need to get the SLAs right. So, in addition to the questions raised in my previous post and Mr. Ohlhorst’s article, CIOs should also consider other issues based on the specific needs of their companies and any regulatory requirements that apply to them.
The most important factor in an SLA is the uptime guarantee offered by the service provider. It can range from 99.9% to 99.95% to 99.999% to even 100%. Well, the 100% uptime guarantee is a stretch because providers cannot prevent things like natural disasters. If anyone promises 100% uptime, it should be taken with a pinch of salt and investigated further. Different providers offer different uptime guarantees, and they also vary in the way the uptime is calculated. It is important to understand these minute details of an SLA before committing to a vendor. In March 2009, John M. Willis wrote a great post comparing the SLAs of three cloud providers: Amazon, Rackspace and 3tera. It is a must-read for anyone interested in the SLA game. As I mentioned before, cloud providers are slowly coming up with SLAs for their services, but there are no uniform uptime guarantees and no standards for calculating these guaranteed uptimes.
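To make these percentages concrete, here is a rough sketch of how much downtime each guarantee level leaves room for in a month. It assumes a 30-day month and that every minute of the month counts towards the calculation; real SLAs often exclude scheduled maintenance or measure over a different period, so treat the function name and the defaults as illustrative only.

```python
HOURS_PER_DAY = 24
MINUTES_PER_HOUR = 60


def allowed_downtime_minutes(uptime_percent: float, days_in_month: int = 30) -> float:
    """Minutes of downtime per month that still satisfy the uptime guarantee.

    Assumption: the whole month counts, with no excluded maintenance windows.
    """
    total_minutes = days_in_month * HOURS_PER_DAY * MINUTES_PER_HOUR
    return total_minutes * (1 - uptime_percent / 100)


# The guarantee levels mentioned in the paragraph above.
for pct in (99.9, 99.95, 99.999, 100.0):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct}% uptime -> {minutes:.2f} minutes of downtime per month")
```

Under these assumptions, 99.9% allows roughly 43 minutes of downtime a month, 99.95% roughly 22 minutes, and 99.999% well under a minute, which is why the exact figure and the way it is calculated both matter.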
The recent RFQ released by the US GSA becomes important at a time when there is no uniform SLA option available to customers. After all, the US government is one of the biggest IT customers, with a budget of more than 70 billion dollars. With the Obama administration emphasizing the use of Cloud Computing for the government’s IT needs, and with bipartisan support for such a move, a big chunk of the IT budget will be spent on Cloud services. If one of the biggest customers of Cloud services demands certain requirements, then market forces will ensure that the needs are met, even if the customer making such a demand is a government.
Along with many other requirements regarding service provisioning, security, etc., one of the main demands in the US GSA RFQ is about the uptime guarantee and how it is calculated. The RFQ categorically states that there should be a 99.95% uptime guarantee and that service availability should be calculated per month. The US GSA RFQ demands an SLA that will include:
- The Contractor shall provide a robust, fault tolerant infrastructure that allows for high availability of 99.95%
- Service Availability (Measured as Total Uptime Hours / Total Hours within the Month) displayed as a percentage of availability up to one tenth of a percent (e.g. 99.95%)
- Within a month of a major outage occurrence resulting in greater than 1-hour of unscheduled downtime, the Contractor shall describe the outage including description of root-cause and fix
- Service provisioning and de-provisioning times (scale up and down) in near real-time
- The Contractor shall provide Helpdesk and Technical support services to include system maintenance windows
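The availability formula in the list above (Total Uptime Hours / Total Hours within the Month) can be sketched in a few lines. This is only an illustration, not anything taken from the RFQ itself: the function names are hypothetical, every outage hour is treated as unscheduled downtime, and the result is rounded to two decimal places to match the "99.95%" example.

```python
import calendar

REQUIRED_AVAILABILITY = 99.95  # percent, as quoted from the RFQ above
MAJOR_OUTAGE_HOURS = 1.0       # outages longer than this need a root-cause report


def monthly_availability(year: int, month: int, outage_hours: list[float]) -> float:
    """Total uptime hours / total hours within the month, as a percentage."""
    days_in_month = calendar.monthrange(year, month)[1]
    total_hours = days_in_month * 24
    uptime_hours = total_hours - sum(outage_hours)
    return uptime_hours / total_hours * 100


def sla_report(year: int, month: int, outage_hours: list[float]) -> dict:
    """Summarize one month against the 99.95% requirement (illustrative only)."""
    availability = monthly_availability(year, month, outage_hours)
    return {
        "availability_percent": round(availability, 2),
        "meets_sla": availability >= REQUIRED_AVAILABILITY,
        # Outages over one hour trigger the root-cause description requirement.
        "outages_needing_report": [h for h in outage_hours if h > MAJOR_OUTAGE_HOURS],
    }


# Hypothetical month: a 15-minute blip and a 90-minute outage in June 2009.
print(sla_report(2009, 6, [0.25, 1.5]))
```

With these made-up numbers, a single 90-minute outage is enough to miss the 99.95% target for the month, and it also crosses the one-hour line that triggers the root-cause report.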
These are game-changing requirements, coming from the government not in the form of regulation but in the form of a demand from a customer with a mouth-watering IT budget. Once providers alter their SLAs to get a share of the government’s IT budget pie, we can expect some uniformity to emerge in how SLAs are offered to other customers. The availability of SLAs that empower customers, instead of protecting providers from any liability, will eventually boost widespread enterprise adoption of cloud computing. In that sense, the US GSA’s demands augur well for the Cloud Computing landscape.