Yesterday at Defrag, during a panel on Foundational Infrastructure and Enabling Technology, some of the panelists were trying to stretch the concept of cloud to suit their agenda. I thought I would clear the air here on those concepts. It is time for us to accept the widely adopted definitions and take the discussion to the next level. I have put together some of the myths promoted by people with varied business interests. These may not be the exact statements made in the panel, but they are claims I have heard from many sources, including some of the panelists.
1) Myth: Cloud is not built for fault tolerance
Truth: A well-architected cloud should have fault tolerance built in. I do agree that Amazon EC2 instances go down due to issues in their datacenters. However, EC2 need not be the poster boy/girl of clouds. With EC2, it is imperative for app developers to architect their apps using instances running in different availability zones. A single Amazon EC2 instance is not a cloud per se, though I can argue against my own claim. AWS EC2 is called a cloud because it gives you the option to run multiple instances in different availability zones, along with programmatic control over those instances, so you can almost eliminate downtime and put your app on a fault-tolerant infrastructure (a rough sketch of this multi-zone approach follows below). Here is a request to my friends who are over-enthusiastic in offering their own views on Cloud Computing: the concept of Cloud Computing is much deeper than what many in the industry want it to be. I hope everyone accepts the widely adopted definitions and moves forward.
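To make the multi-zone point concrete, here is a minimal sketch in Python using boto3 (the modern AWS SDK, which did not exist back when this panel happened). The AMI ID, region, and zone names are placeholders, not real values; the idea is simply that one identical instance is launched per availability zone so the loss of a single zone does not take the application down.

```python
# Minimal sketch: spread redundant instances across availability zones
# so that the failure of one zone does not take down the application.
# AMI ID, region, and zone names below are placeholders, not real values.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]  # assumed zone names
AMI_ID = "ami-00000000"                              # placeholder AMI


def launch_redundant_fleet():
    """Launch one identical instance in each availability zone."""
    instance_ids = []
    for zone in ZONES:
        resp = ec2.run_instances(
            ImageId=AMI_ID,
            InstanceType="m1.small",
            MinCount=1,
            MaxCount=1,
            Placement={"AvailabilityZone": zone},
        )
        instance_ids.append(resp["Instances"][0]["InstanceId"])
    return instance_ids
```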
2) Myth: Virtualization is an essential part of Cloud Computing
Truth: This is plain wrong. Even though most cloud environments use virtualization, it is not a requirement. Take the case of Google: they built their cloud without virtualization, on racks of low-end commodity x86 machines. I could take 10K commodity machines, throw a MapReduce-like fabric on top of them, and call it a cloud (see the toy sketch below). Just because the majority of cloud environments use virtualization, one cannot claim virtualization to be essential to the very definition of cloud.
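For readers who haven't seen the pattern, here is a toy, single-process illustration of the map/shuffle/reduce idea that such a fabric runs across thousands of commodity machines. It is only a sketch of the programming model, not Google's implementation; note that nothing in it depends on virtualization.

```python
# Toy word-count in the MapReduce style, run in one process for clarity.
from collections import defaultdict


def map_phase(documents):
    """Emit (word, 1) pairs, as each worker would for its shard of input."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)


def shuffle(pairs):
    """Group intermediate values by key (the framework's job in practice)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts for each word, as a reduce worker would for its keys."""
    return {word: sum(counts) for word, counts in grouped.items()}


docs = ["the cloud is not virtualization", "the cloud is a service"]
print(reduce_phase(shuffle(map_phase(docs))))  # e.g. {'the': 2, 'cloud': 2, ...}
```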
3) Myth: Clouds are not useful for HPC
Truth: Even though HPC requires powerful CPUs, it is possible to tap into the cloud for some HPC computations. Some time back, scientists from MIT ran benchmark tests on Amazon EC2 and found it to be a credible candidate for small-scale HPC applications like low-order coupled atmosphere-ocean simulation. We have already covered here at Cloud Ave how Wolfram Research is tapping the power of cloud computing inside Mathematica to do HPC calculations. Platform Computing is tapping into clouds to seamlessly redirect peak workloads from internal HPC infrastructure to external cloud resources on a pay-per-use basis, in order to optimally meet service levels (a simplified sketch of this bursting pattern follows below). Even though I agree that the cloud cannot replace a cluster of high-end machines, I don't accept bold claims, like the one made in yesterday's panel, that HPC cannot be done on clouds.
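To show what "redirecting peak workloads" means in practice, here is a hypothetical sketch of the cloud-bursting decision: jobs stay on the internal cluster while it has free slots, and the overflow goes to pay-per-use cloud instances. The slot count and the two run functions are stand-ins for illustration, not Platform Computing's actual API.

```python
# Hypothetical cloud-bursting scheduler: fill the in-house cluster first,
# then overflow peak demand to pay-per-use cloud capacity.

INTERNAL_SLOTS = 128  # assumed core count of the in-house cluster


def submit(jobs, run_on_cluster, run_on_cloud):
    """Route each job to the internal cluster if it fits, else to the cloud."""
    used = 0
    for job in jobs:
        if used + job["cores"] <= INTERNAL_SLOTS:
            run_on_cluster(job)   # free capacity: keep the job in-house
            used += job["cores"]
        else:
            run_on_cloud(job)     # peak load: burst to the cloud and pay
                                  # only for the hours actually used
```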
Well, there are many more myths going around on the web about Cloud Computing. I will talk about them on another occasion.
Hey Krishnan – thanks for the analysis on the panel yesterday.
I agree that the cloud can be used for small-scale HPC apps, but that’s exactly my point. HPC tends to focus on large-scale computation. At the scale most HPC guys think, a cluster solution (or distributed grid in some cases) makes a ton more sense. If you go to Supercomputing 2009 next week, I would be very surprised if you find anyone interested in using the cloud.
A more general point is that most of the guys using the cloud aren’t really thinking at “web-scale”, which is to say, a data set the size of the Internet. At that scale, you do better by using your own data center (or by using 80legs, but I’ll refrain from the sales pitch 🙂 ).