Skype went down this morning and, as expected, Techmeme and Twitter went crazy with news of the downtime. There was speculation about problems with the centralized infrastructure Skype uses to authenticate users (which was a cause of trouble in one of the past outages), and some were even talking about “The Cloud Fail”. Whether it is a President or the Cloud, there are people who are eager to see it fail because of their political or business interests. Skype has already explained the downtime and, in spite of that explanation, the talk about “the cloud fail” continues to spread. I thought I would do a quick post to highlight the FUD as a public service message :-).
According to the Skype blog, the cause of the failure was not their centralized infrastructure but part of their distributed infrastructure:
Skype isn’t a network like a conventional phone or IM network – instead, it relies on millions of individual connections between computers and phones to keep things up and running. Some of these computers are what we call ‘supernodes’ – they act a bit like phone directories for Skype. If you want to talk to someone, and your Skype app can’t find them immediately (for example, because they’re connecting from a different location or from a different device) your computer or phone will first try to find a supernode to figure out how to reach them.
Under normal circumstances, there are a large number of supernodes available. Unfortunately, today, many of them were taken offline by a problem affecting some versions of Skype. As Skype relies on being able to maintain contact with supernodes, it may appear offline for some of you.
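To make the quoted explanation concrete, here is a toy sketch of a directory lookup that depends on supernodes. It is not Skype's actual protocol: the Supernode class, the build_network and find_user helpers, the max_attempts parameter and the address format are all hypothetical, invented purely for illustration. It only shows why a contact can appear offline once enough supernodes drop out, even though no central server is involved.

```python
import random


class Supernode:
    """Hypothetical peer that keeps a small directory of user -> address."""

    def __init__(self, name, directory):
        self.name = name
        self.online = True
        self.directory = dict(directory)

    def lookup(self, user):
        # An offline supernode cannot answer directory queries at all.
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        return self.directory.get(user)


def build_network(count, directory):
    """Create `count` supernodes that all know the same directory (a simplification)."""
    return [Supernode(f"supernode-{i}", directory) for i in range(count)]


def find_user(user, supernodes, max_attempts=3, rng=random):
    """Ask up to `max_attempts` randomly chosen supernodes where `user` is.

    Returns the address, or None if no reachable supernode knows the user,
    in which case the client would show the contact as offline.
    """
    candidates = list(supernodes)
    rng.shuffle(candidates)
    for node in candidates[:max_attempts]:
        try:
            address = node.lookup(user)
        except ConnectionError:
            continue  # this supernode is down, try the next one
        if address is not None:
            return address
    return None
```

With plenty of supernodes online the lookup succeeds, and it keeps succeeding as long as the client can still reach one working supernode. Once all the supernodes it tries are offline, find_user returns None and the contact looks offline, which matches the symptom Skype describes.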
Clearly, the problem here is not indicative of any trouble that could arise with cloud computing. It is a distributed system failure caused by a software problem, not a failure in a centralized infrastructure. Failures do happen in cloud computing, in the same way they happen in traditional IT. No one is claiming that the cloud, by its very nature, will offer 100% uptime. In fact, commodity cloud providers like Amazon advise you to expect failures and to architect your application for them. When people set out to explore cloud computing services, they don’t do it just because the cloud offers enormous cost savings and elasticity; they weigh those advantages against the associated risks and compare the result with the risk-benefit equation of traditional infrastructure. Any FUD that uses the Skype downtime to play up the risks of public clouds is irresponsible, and it carries an inherent assumption that the people who listen to such FUD are complete idiots who explore public clouds based only on their advantages, without considering the risks. Enoughhhhh.