Last week, a lightning strike cut off part of Amazon EC2, confined to a single availability zone, from the outside world. I don’t want to get into the debate over whether it was an outage or not; I want to take up a different kind of debate. Ever since Cloud Computing started gaining traction, the industry has debated whether an instance based setup or a fabric based one is better. I thought I would revisit this debate in the light of the recent Amazon EC2 “it’s not an outage” incident. Let me briefly recap the terminology and then see how the debate shapes up in the aftermath of the “Amazon lightning incident”.
Amazon jumped onto the Cloud Computing bandwagon with the release of raw computing power inside virtual machine containers (EC2). They are like individual servers, but based on virtualization rather than on dedicated physical hardware. It is an outgrowth of the VPS from the previous dedicated server era, with the added advantage of elasticity. This is a simplistic description, but it gives an idea of what an Instance is in Cloud Computing terminology. This approach has its own merits and demerits. While it offers, among other advantages, the ability to port apps without rewriting the code, scaling apps on instances is not as smooth as it is on a fabric. Also, incidents like the recent lightning strike make us wonder whether the instance based approach is well suited to the needs of startups in particular.
The other approach is the fabric based approach. Here, all the physical and virtual infrastructure is abstracted away, and attributes like transparent scaling and fault tolerance/self healing are added, thereby offering developers a uniform fabric to work with. Some of the advantages of a fabric include linear scaling and under the hood fault tolerance. Unlike with instances, we need not worry about individual hardware or virtual machines going down. The most cited disadvantages are the need to rewrite code and the possibility of vendor lock-in. There is a lot of confusion about the definition and characteristics of the Cloud Fabric, but I will not dig into it right now. The above, somewhat simplistic, definition of fabric is good enough for our discussion. Some Cloud fabrics are closely tied to a particular software development platform and/or vendor, while others are more general in nature. For example, Google App Engine and Microsoft Azure fit into the former category. With Google App Engine, developers can only develop apps using Python and, now, Java. Both Google App Engine and Microsoft Azure run only in their vendors’ datacenters. One cannot take them and run them on the Amazon EC2 Cloud or in a private datacenter. However, there are also general purpose Cloud fabrics like Appistry’s CloudIQ Platform, Gigaspaces’ eXtreme Application Platform, Eucalyptus, etc. that are not tied to a particular vendor’s datacenter and can be installed on any public cloud and/or in private datacenters. Moreover, they are generally agnostic when it comes to the software development platforms they support. Some of them are released under proprietary licenses and others under Open Source licenses.
Whether it is a startup or an enterprise, everyone strives for zero downtime. When a startup bets its Cloud deployment on Instance based infrastructure like standalone Amazon instances, GoGrid Servers, or Rackspace Cloud Servers, issues like the recent lightning strike could render its application(s) unavailable. There are ways to fire up instances in other availability zones from backups, but there will still be some downtime. If there is no option for multiple availability zones, the startup will have to wait and watch until the Cloud provider fixes the issue. If, instead, the startup uses a Cloud fabric spanning multiple vendors, such a disaster should have no impact on availability, because the fabric can manage the whole healing process in a completely transparent manner. The same thing can also be achieved in a properly architected instance based setup, but a Cloud fabric makes managing such incidents much more seamless. By selecting a general purpose fabric that supports many software development platforms, it is possible to avoid vendor lock-in and to minimize platform lock-in. Some platform lock-in will always be there in any IT deployment; we cannot avoid it. The moment a particular software platform is selected for developing the apps, we are locking ourselves into that platform. Also, by selecting an Open Source fabric, we can even avoid some of the pitfalls associated with proprietary ones.
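To make the failover idea a little more concrete, here is a minimal sketch of the decision a fabric (or a carefully architected instance setup) has to make when a zone drops off the network: find another copy of the application that is still healthy and send traffic there. Everything in it is an assumption for illustration only. The `Deployment` class, the `pick_healthy` function, and the provider and zone names are hypothetical, and the health check is simulated rather than a call to any real vendor API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Deployment:
    """One running copy of the application (hypothetical, for illustration)."""
    provider: str
    zone: str


def pick_healthy(
    deployments: Iterable[Deployment],
    is_healthy: Callable[[Deployment], bool],
) -> Optional[Deployment]:
    """Return the first deployment that passes its health check, or None."""
    for deployment in deployments:
        if is_healthy(deployment):
            return deployment
    return None


# Assumed layout: a primary zone, a second zone, and a second provider.
DEPLOYMENTS = [
    Deployment("provider-a", "zone-1"),
    Deployment("provider-a", "zone-2"),
    Deployment("provider-b", "zone-1"),
]

# Simulate the incident: the primary zone is unreachable.
unreachable = {("provider-a", "zone-1")}
target = pick_healthy(
    DEPLOYMENTS,
    lambda d: (d.provider, d.zone) not in unreachable,
)
print(target)
```

In an instance based setup, somebody has to write, run, and monitor this logic (plus the data replication behind it) themselves. The argument for a fabric is that this kind of selection happens under the hood, without the application or its operators having to care.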
I am not dismissing the Instance based approach at all. It has its own advantages, but I feel that by selecting a fabric based approach, we can eliminate some of the risks that are unavoidable in the Instance approach. This is definitely not the end of the debate, but I thought it was time to revisit it in the light of recent events. I have put forward my arguments in favor of a fabric based approach in this post, and I would love to hear from the opposite camp. Feel free to pick my arguments apart and offer your own technical arguments on this topic. As we always encourage here at Cloud Ave, you are welcome to post your rebuttal as a guest post.