News of Friday’s problems with the power system at Amazon’s Virginia data center, which took down sites like Netflix and Pinterest, shows that not programming for failover or data center failure is a pretty foolish thing to do, especially when storage in Amazon’s S3 system costs a fairly reasonable amount per gigabyte.
Anyone who does not program for failover takes foolish risks with their computing systems and their company.
Syncing S3 buckets is fairly easy, and there is no reason not to do it. You can cut your own custom script that lists the contents of an S3 bucket and makes sure that, even though the data centers are somewhat isolated, the contents are the same in each bucket. If you need a baseline command set to do this, you can use free or low-cost software like S3CMD from S3 tools. Or you can manually do an object check between S3 buckets using an EC2 instance as a master controller for the buckets. The EC2 instances should be able to talk to each other over their public interfaces without having to get an IP lease from AWS.
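As a rough illustration of the object check, here is a minimal sketch in Python. It assumes you have already pulled a listing of each bucket into a dictionary of object key to checksum (for example from S3CMD output or an SDK); the names `plan_sync` and `SyncPlan`, the file names and the checksums are hypothetical, and the actual copying and deleting is left out.

```python
from typing import Dict, List, NamedTuple


class SyncPlan(NamedTuple):
    to_copy: List[str]    # keys missing from the replica, or with different content
    to_delete: List[str]  # keys that exist only in the replica


def plan_sync(primary: Dict[str, str], replica: Dict[str, str]) -> SyncPlan:
    """Compare two {object key: checksum} listings and decide what must change."""
    to_copy = sorted(
        key for key, checksum in primary.items()
        if replica.get(key) != checksum
    )
    to_delete = sorted(key for key in replica if key not in primary)
    return SyncPlan(to_copy, to_delete)


if __name__ == "__main__":
    # Made-up listings standing in for two buckets in different regions.
    virginia = {"logo.png": "a1", "index.html": "b2", "app.js": "c3"}
    oregon = {"logo.png": "a1", "index.html": "old", "stale.css": "d4"}

    plan = plan_sync(virginia, oregon)
    print("copy:", plan.to_copy)      # copy: ['app.js', 'index.html']
    print("delete:", plan.to_delete)  # delete: ['stale.css']
```

Keeping the comparison separate from the transfer means the master EC2 instance can log or dry-run the plan before it touches either bucket.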
You should also be load balancing between regions, running a simple health check and using CloudWatch so that, once the parameters for an outage are met, the world is notified that one data center is down. Load balancing is necessary if you want to program for failover.
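The "parameters for an outage" idea can be sketched as a small tracker that declares a region down only after several consecutive failed checks. This is an illustration under assumptions, not CloudWatch's API: the class name `RegionHealth`, the threshold of three and the region names are hypothetical, and the probe itself (an HTTP request, a CloudWatch alarm) is left to the caller.

```python
from typing import Dict, List, Optional


class RegionHealth:
    """Track consecutive failed health checks per region."""

    def __init__(self, regions: List[str], failure_threshold: int = 3) -> None:
        self.failure_threshold = failure_threshold
        self.order = list(regions)  # preference order for serving traffic
        self.failures: Dict[str, int] = {region: 0 for region in self.order}

    def record(self, region: str, healthy: bool) -> bool:
        """Record one check; return True if this check just tripped the threshold."""
        if healthy:
            self.failures[region] = 0
            return False
        self.failures[region] += 1
        return self.failures[region] == self.failure_threshold

    def is_down(self, region: str) -> bool:
        return self.failures[region] >= self.failure_threshold

    def active_region(self) -> Optional[str]:
        """First region in preference order that is not considered down."""
        for region in self.order:
            if not self.is_down(region):
                return region
        return None


if __name__ == "__main__":
    health = RegionHealth(["us-east-1", "us-west-2"], failure_threshold=3)
    for _ in range(3):
        if health.record("us-east-1", healthy=False):
            print("notify: us-east-1 is down")
    print("serving from:", health.active_region())  # serving from: us-west-2
```

Requiring several failures in a row keeps a single dropped check from triggering a failover, and `record` returns True only once per outage so the notification is not repeated on every check.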
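Elastic Load Balancing, which comes as part of AWS, is a good way to keep traffic away from unhealthy instances, so that an outage in one Availability Zone does not impact the good working condition of the company that is using AWS. Note that a load balancer works across Availability Zones within one region; balancing between regions needs something in front of it, such as DNS-level failover. On the ELB page, Amazon states: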
Elastic Load Balancing can detect the health of Amazon EC2 instances. When it detects unhealthy load-balanced Amazon EC2 instances, it no longer routes traffic to those Amazon EC2 instances and spreads the load across the remaining healthy Amazon EC2 instances.
Using Elastic Load Balancing, you can distribute incoming traffic across your Amazon EC2 instances in a single Availability Zone or multiple Availability Zones. Elastic Load Balancing automatically scales its request handling capacity in response to incoming application traffic.
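The behaviour Amazon describes can be pictured with a toy round-robin balancer that skips instances marked unhealthy. This is only a sketch of the idea, not how ELB is implemented; `RoundRobinBalancer` and the instance IDs are hypothetical.

```python
from typing import Dict, List, Optional


class RoundRobinBalancer:
    """Hand out instances in turn, skipping any that are marked unhealthy."""

    def __init__(self, instances: List[str]) -> None:
        self.instances = list(instances)
        self.healthy: Dict[str, bool] = {i: True for i in self.instances}
        self._next = 0  # index of the next instance to try

    def mark(self, instance: str, healthy: bool) -> None:
        self.healthy[instance] = healthy

    def route(self) -> Optional[str]:
        """Return the next healthy instance, or None if none are healthy."""
        for _ in range(len(self.instances)):
            candidate = self.instances[self._next]
            self._next = (self._next + 1) % len(self.instances)
            if self.healthy[candidate]:
                return candidate
        return None


if __name__ == "__main__":
    lb = RoundRobinBalancer(["i-a", "i-b", "i-c"])
    print([lb.route() for _ in range(3)])  # ['i-a', 'i-b', 'i-c']
    lb.mark("i-b", healthy=False)
    print([lb.route() for _ in range(4)])  # ['i-a', 'i-c', 'i-a', 'i-c']
```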
You need S3, a master EC2 instance in each region to perform the automation for that region, Elastic Load Balancing, and software like S3CMD to make all this work. You will probably spend the better part of an afternoon setting this up, but failing to account for an outage in any region is simply poor practice, in cloud computing or any other kind of computing. There will always be disasters, regardless of how stable any cloud computing solution seems, and you must architect for failover in every system you design and develop, regardless of the platform.
Netflix, Pinterest and others who were down because they were only in the Virginia data center need to program for failover. If they lost business or customers because of it, they need to go back, reevaluate their AWS dependencies and work out a decent failover process.
(Cross-posted @ Techwag)