I’ve been dealing a lot with Amazon’s AWS platform lately, mostly doing offline data processing with Hadoop. The latest load balancing features have finally opened the door for frontend applications to take advantage of Amazon’s cloud computing platform, making it easier for developers to make applications more cost-efficient and scalable.
Keeping in mind that there are a lot of applications out there that can benefit from moving to the cloud (including my own), I’ve made a list of tasks and considerations for preparing such a move:
Step One: Move Static Content to S3
Things to consider:
- GZIP content. S3 does not support serving GZIPed content natively, so you’ll have to upload both GZIPed and plaintext versions of each file and figure out in code which one to use according to the headers sent by the user’s browser.
The following post describes how it’s done: Using Amazon S3 as a CDN
- Amazon’s S3 service is “eventually consistent” which means that files uploaded to S3 may not be immediately available to read.
- Use separate sub-domains for content.
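To make the GZIP point above a bit more concrete, here is a minimal sketch of picking between the two uploaded copies based on the browser's Accept-Encoding header. The ".gz" suffix for the compressed copy, the function names and the static.example.com host are my own assumptions for illustration, not anything S3 requires. The compressed object would also need to be uploaded with a "Content-Encoding: gzip" header so browsers decode it.

```python
def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding header value lists gzip with q > 0.

    Simplification: wildcard ("*") entries are ignored.
    """
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        quality = 1.0  # no q parameter means full preference
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not acceptable"
        return quality > 0
    return False


def static_url(base_url: str, path: str, accept_encoding: str) -> str:
    """Build the URL of a static file, preferring the hypothetical ".gz" copy."""
    suffix = ".gz" if accepts_gzip(accept_encoding) else ""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}{suffix}"
```

A template helper could then call static_url("http://static.example.com", "css/site.css", request_header_value) when rendering links to stylesheets and scripts.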
Once your content is on S3 you can also use CloudFront, Amazon’s CDN (Content Delivery Network), to serve the files and improve your application’s performance.
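Coming back to the "eventually consistent" caveat in the list above: if your code uploads a file and then needs to read it right away, one option is to poll with a backoff until the file shows up. This is only a sketch. The exists callable is a placeholder for whatever check your S3 client library offers, and the attempt count and delays are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_until_visible(exists: Callable[[], bool],
                       attempts: int = 5,
                       initial_delay: float = 0.2,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `exists` until it returns True, doubling the delay after each miss.

    Returns False if the object is still not visible after `attempts` checks.
    """
    delay = initial_delay
    for attempt in range(attempts):
        if exists():
            return True
        if attempt < attempts - 1:  # no point sleeping after the last check
            sleep(delay)
            delay *= 2
    return False
```

Passing sleep as a parameter keeps the helper easy to test without actually waiting.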
Step Two: Move Web Servers and Backend Servers to EC2
Move your web server code and backend services – database, memcached, etc. – to run on Amazon EC2 instances.
Consider using Amazon’s availability zones to set up servers in different locations. This can help you serve customers in different parts of the world better, while making your infrastructure tolerant to the unlikely event of a datacenter failure at Amazon.
The Web Servers
Moving your web servers to EC2 should be fairly simple. You can set up EC2 images that are configured exactly the same way your current web servers are.
If you require a queuing service as part of your architecture, consider switching to Amazon’s SQS to make administration easier.
Moving your database to EC2 is probably the hardest part of the move to AWS. If you plan on keeping your database (as opposed to migrating to a cloud solution like SimpleDB) you should use EBS (Elastic Block Store) so that your storage persists independently of the life of your EC2 instance.
Backup. Figure out how to take scheduled snapshots of your EBS volumes and store them on S3.
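The scheduling side of this can be sketched without touching the AWS API at all: decide when a new snapshot is due, and which old ones to prune. The 24-hour interval, the keep-7 retention and the function names below are assumptions of mine. The actual snapshot creation and deletion calls depend on the client library you use and are left out.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Snapshot = Tuple[str, datetime]  # (snapshot id, creation time)


def snapshot_due(last: Optional[datetime], now: datetime,
                 interval: timedelta = timedelta(hours=24)) -> bool:
    """True if no snapshot exists yet or the last one is older than `interval`."""
    return last is None or now - last >= interval


def snapshots_to_prune(snapshots: List[Snapshot], keep: int = 7) -> List[str]:
    """Return the ids of everything except the `keep` most recent snapshots."""
    newest_first = sorted(snapshots, key=lambda snap: snap[1], reverse=True)
    return [snap_id for snap_id, _ in newest_first[keep:]]
```

A cron job could call snapshot_due on each run, take a snapshot when it returns True, and then delete whatever snapshots_to_prune returns.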
Consider replication and sharding. If you’re using availability zones you should consider sharding your data. For example, store European accounts’ data in Europe only. You should also consider replication between the different availability zones to keep your site available even when one of the datacenters is unavailable.
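As a rough illustration of the sharding idea, the routing can be as simple as a lookup from an account's home region to the database that owns its data. The region codes, host names and the "us" fallback here are all made up for the example.

```python
from typing import Dict

# Hypothetical mapping from an account's home region to its database shard.
SHARD_BY_REGION: Dict[str, str] = {
    "eu": "db-eu.example.internal",
    "us": "db-us.example.internal",
}
DEFAULT_REGION = "us"  # assumed fallback for regions without their own shard


def shard_for_account(home_region: str) -> str:
    """Return the database host that stores data for accounts in `home_region`."""
    return SHARD_BY_REGION.get(home_region.lower(),
                               SHARD_BY_REGION[DEFAULT_REGION])
```

Keeping this lookup in one place means adding a shard later only touches the mapping, not every query.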
- Running MySQL on EC2 with EBS
- EC2 SQL Server backup strategies and tactics
- Create a MySQL-EBS Database Setup
Step Three: Scale. Take advantage of the cloud services
Now that your application is running entirely on Amazon’s platform, it’s time to take full advantage of the platform and make it scale.
Set up monitoring to keep up with what’s going on in your system. Amazon provides a service called CloudWatch that allows you to monitor your machines and applications.
Based on the monitoring metrics you should start using Amazon’s auto-scaling and load balancing capabilities to be able to consume and release computing resources according to demand.
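Amazon's auto-scaling evaluates rules like this for you, so you would not normally write the loop yourself, but the decision it makes can be sketched in a few lines. The CPU thresholds, the one-instance step and the min/max bounds below are assumptions picked for the example, not recommended values.

```python
def desired_instances(current: int, avg_cpu_percent: float,
                      scale_out_above: float = 70.0,
                      scale_in_below: float = 30.0,
                      minimum: int = 2, maximum: int = 10) -> int:
    """Return the instance count a simple threshold rule would ask for."""
    if avg_cpu_percent > scale_out_above:
        target = current + 1   # under load: add one instance
    elif avg_cpu_percent < scale_in_below:
        target = current - 1   # mostly idle: release one instance
    else:
        target = current       # within the comfortable band: do nothing
    # Never go below the floor or above the ceiling.
    return max(minimum, min(maximum, target))
```

The gap between the two thresholds is deliberate: it keeps the group from flapping between sizes when the load hovers around a single cut-off.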
At this point you should also investigate reducing your dependency on relational databases (RDBMS) as much as possible (as it’s the most complex, and hardest to scale, component in the system) and try to move as much functionality as possible to S3 and SimpleDB.
S3 is suitable for storing large objects while SimpleDB is ideal for small stubs of data.
- Amazon’s load balancer doesn’t support SSL. This can be a showstopper for some applications…
- SimpleDB has a max row size limitation. If your data exceeds that limit you should consider using SimpleDB as a metadata store that references the full data stored on S3.
- Load Balancing, Auto Scaling and CloudWatch resources on the AWS blog.
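The "SimpleDB as a metadata store" idea from the list above could look roughly like this: small values go inline, large ones go to S3 and only a pointer is kept in SimpleDB. The 1024-byte cut-off is my assumption of the per-value limit, so check the current SimpleDB documentation. The attribute names and the "records/" key prefix are made up for the example.

```python
from typing import Dict, Optional, Tuple

MAX_ATTRIBUTE_BYTES = 1024  # assumed SimpleDB per-value limit; verify before use


def split_record(item_name: str, body: str,
                 max_bytes: int = MAX_ATTRIBUTE_BYTES
                 ) -> Tuple[Dict[str, str], Optional[Tuple[str, bytes]]]:
    """Decide what goes into SimpleDB and what (if anything) goes to S3.

    Returns (attributes, upload) where upload is None for small bodies,
    or an (s3_key, data) pair the caller should upload for large ones.
    """
    data = body.encode("utf-8")
    if len(data) <= max_bytes:
        return {"body": body}, None
    s3_key = f"records/{item_name}"  # hypothetical key layout
    # SimpleDB stores every value as a string, so the size is stringified too.
    attributes = {"s3_key": s3_key, "size": str(len(data))}
    return attributes, (s3_key, data)
```

The caller writes the attributes to SimpleDB and, when an upload is returned, puts the data on S3 under the given key first so the pointer never references a missing object.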
(Cross-posted at DeveloperZen)