In my journey through the cloud I often come across great new initiatives. Interestingly, although the cloud is a genuine revolution, terms such as SLA, TCO and ROI remain valid, and new methodologies and techniques are emerging to support them in the cloud.
I recently met Uri Wolloch, the founder of N2W Software. He has over 15 years of software development experience working at various companies in different roles. In the past 10 years, Uri’s professional focus has been on IT infrastructure software and storage, including working as a software architect at IBM Tivoli, where he focused on data protection software in physical and virtual environments. Based on his experience, he founded N2W Software, which delivers the Cloud Protection Manager (CPM) – a comprehensive backup and recovery solution for Amazon EC2. With this solution you can manage your backup policies, including schedule, frequency and data retention.
As part of an email discussion on Cloud and availability, Wolloch first presented what DR means and how it relates to Amazon EC2:
Disaster Recovery is an important consideration when you plan your IT deployment. But what does disaster recovery mean when you use EC2 for your servers? EC2 is built for high availability; each region is divided into availability zones, which have separate infrastructure: communication, power, etc. So, in case of a malfunction or outage in one or some of a region’s zones, operations will continue in other zones. If you have a solution that keeps a hot standby server/replica or you have a backup solution that allows you to recover your servers in other availability zones, you have a pretty powerful solution. That said, there is still the small chance of a region-wide outage. Such an outage may not necessarily happen due to a real man-made or natural disaster; it can also happen due to a technical malfunction or some other reason. If you use EC2 for your critical business operations, you want to have an answer for such a scenario as well.
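To make the zone-versus-region reasoning concrete, here is a minimal Python sketch of how a recovery target might be chosen: first another healthy availability zone in the same region, and only then a zone in a different region. The function name, its parameters and the zone lists are my own assumptions for illustration; this is not how CPM or AWS selects a target.

```python
from typing import Dict, List, Optional, Set, Tuple


def pick_recovery_target(
    home_region: str,
    home_zone: str,
    zones_by_region: Dict[str, List[str]],
    unavailable_zones: Set[str],
) -> Optional[Tuple[str, str]]:
    """Return (region, zone) to recover into, or None if nothing is healthy.

    Hypothetical helper: prefers a different zone in the home region and
    falls back to another region only when the whole home region is down.
    """
    # First choice: another healthy zone in the same region.
    for zone in zones_by_region.get(home_region, []):
        if zone != home_zone and zone not in unavailable_zones:
            return home_region, zone

    # Fallback for a region-wide outage: any healthy zone elsewhere.
    for region, zones in zones_by_region.items():
        if region == home_region:
            continue
        for zone in zones:
            if zone not in unavailable_zones:
                return region, zone

    return None
```

The second loop only matters in the rare region-wide outage Wolloch describes, which is exactly the case where backups have to be usable outside the home region.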
From the discussion it was clear that the guidelines and principles of the traditional world apply equally in the cloud. The only difference I noticed is that DR in the cloud can cost much less: when you don’t use it, you should not pay for it (of course, this depends on the architecture and SLA). Wolloch noted the important difference between high availability and backup, touching on the costs involved:
There is a fundamental difference between high availability of a system and the ability to recover that system from backup. High availability assumes you can’t tolerate any downtime and allows your system to remain operational even in case of an outage… a backup solution will save you a copy of your data so you’ll be able to recover it in case you need to. Its advantage over a high availability solution will be lower costs and the fact that you have a history, which can overcome any unexpected data loss.
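The "history" that a backup gives you is usually bounded by a retention rule. As a rough illustration, the sketch below keeps the newest N backups and marks the rest as expired. The function name and the keep-the-last-N rule are assumptions of mine; the source does not describe how CPM implements retention.

```python
from datetime import datetime
from typing import List, Tuple


def apply_retention(
    backups: List[datetime], keep_last: int
) -> Tuple[List[datetime], List[datetime]]:
    """Split backup timestamps into (kept, expired), newest first.

    Hypothetical policy: keep the `keep_last` most recent backups.
    """
    if keep_last < 0:
        raise ValueError("keep_last must be zero or positive")

    # Sort newest first so the retained history is the most recent one.
    ordered = sorted(backups, reverse=True)
    return ordered[:keep_last], ordered[keep_last:]
```

With daily backups and `keep_last=7`, for example, you could recover any of the last seven days, which is the kind of protection against unexpected data loss that a hot standby alone does not give you.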
The system Wolloch developed, which he named the Cloud Protection Manager (CPM), demonstrates the ease of deployment in the cloud using the AWS APIs. It supports easy configuration of the backup policy and automates restoring the service in case of a failure. Wolloch noted:
… the challenge you have is when recovering instances. Instances are connected to other objects: an image (or AMI) – if you are not recovering your root device from a snapshot, a key pair (if needed), security group/s, kernel and RAM disk (if needed). When recovering locally, CPM will remember the original instance’s configuration and that will be the default for any recovery operation. With CPM and some simple preparations you can, in a click, build the whole new environment from the ground up in a different Amazon region.
Although the baseline principles for backup and high availability remain the same, CPM demonstrates that the cloud has made the administrator’s fantasy real: doing it all while sitting at home, rather than at night in the server room.
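The dependency problem is easy to see in code. AMIs, key pairs, security groups, kernels and RAM disks are all scoped to a single EC2 region, so the original values can serve as defaults for a local recovery but have to be replaced when recovering elsewhere. The following Python sketch models that idea; `InstanceConfig`, `plan_recovery` and every identifier in it are hypothetical and are not CPM's actual data model or API.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# Fields whose values only make sense inside one region.
REGION_SCOPED = ("image_id", "key_name", "security_groups", "kernel_id", "ramdisk_id")


@dataclass(frozen=True)
class InstanceConfig:
    """Hypothetical record of what an instance depends on."""
    region: str
    image_id: Optional[str]
    key_name: Optional[str]
    security_groups: Tuple[str, ...]
    kernel_id: Optional[str] = None
    ramdisk_id: Optional[str] = None


def plan_recovery(
    original: InstanceConfig, target_region: Optional[str] = None, **overrides
) -> InstanceConfig:
    """Build the configuration to launch the recovered instance with."""
    # Local recovery: the original configuration is the default.
    if target_region is None or target_region == original.region:
        return replace(original, **overrides)

    # Cross-region recovery: every region-scoped value that was set on the
    # original must be supplied again for the target region.
    missing = [
        name for name in REGION_SCOPED
        if getattr(original, name) and name not in overrides
    ]
    if missing:
        raise ValueError(f"need values for {target_region}: {', '.join(missing)}")
    return replace(original, region=target_region, **overrides)
```

The "simple preparations" Wolloch mentions correspond to the overrides here: before a cross-region recovery can be a single click, equivalent images, key pairs and security groups have to exist in the target region.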