
I am a strong believer in the utility of Cloud Computing in scientific research. Cloud computing resources, with their inherent advantages, are a boon to the cash-starved scientific community. I have talked about this again and again.
I am not alone in this; there are many others who believe in the use of Cloud Computing in academia.
Yesterday, Amazon took a first step toward creating an environment that helps people in the academic community use Amazon Web Services in their research. They released Public Data Sets on AWS, a collection of public data sets from different organizations and institutes that will come in handy for researchers in various fields. Databases currently available on AWS include Human Genome Data from ENSEMBL, the PubCHEM Library from Indiana University, and various census databases from the US Census Bureau. They will be adding many more in the near future.
These datasets are available on Elastic Block Store (EBS) in the EC2 ecosystem. Anyone who wants to use them can mount these EBS volumes as their own personal EBS volumes and pay only for their EC2 compute usage, the storage of their own data, and the bandwidth for any data transferred to or from external networks. Researchers can then tap these data sets from within their EC2 instances for their computational needs.
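For readers who want to see roughly what "mounting a public data set" could look like in practice, here is a minimal Python sketch. It rests on assumptions not stated above: that the data set is published as an EBS snapshot you can create your own volume from, and that you are using the boto3 library, whose EC2 client provides `create_volume`, `attach_volume` and a `volume_available` waiter. The function name `attach_public_dataset` and the placeholder IDs in the usage comment are hypothetical.

```python
def attach_public_dataset(ec2, snapshot_id, instance_id, availability_zone,
                          device="/dev/sdf"):
    """Create a personal EBS volume from a public snapshot and attach it.

    `ec2` is expected to behave like a boto3 EC2 client (an assumption).
    The volume must be created in the same availability zone as the instance.
    """
    # Create your own volume from the (assumed) public data set snapshot.
    volume = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )
    volume_id = volume["VolumeId"]

    # Wait until the new volume is ready before attaching it.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach the volume to the running EC2 instance as a block device.
    ec2.attach_volume(
        VolumeId=volume_id,
        InstanceId=instance_id,
        Device=device,
    )
    return volume_id


# Hypothetical usage (placeholder IDs, not real resources):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   attach_public_dataset(ec2, "snap-EXAMPLE", "i-EXAMPLE", "us-east-1a")
```

Once attached, the volume still has to be mounted from inside the instance with the usual operating system tools, and the device name the instance sees may differ from the one requested here.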
By doing this, Amazon is helping the research community save money on the storage and bandwidth costs associated with accessing this public data from the EC2 instances they use in their research. When the data in question runs to hundreds of terabytes or petabytes, we are talking about huge cost savings. However, the data stored on AWS servers is useful only if researchers use Amazon EC2 for their computing needs. They cannot tap into these datasets from any other external computing resource like, say, Google App Engine. And even if they could tap into it from external platforms, it wouldn't mean much if these public datasets are already accessible through some kind of API from their original source. In a way, this move is more a marketing ploy by Amazon aimed at scientists than anything of public value.
Still, this is a significant step in bringing the academic community into the world of Cloud Computing. Some researchers are already using Amazon EC2 for their computational needs. Hopefully, these datasets will lure more people to AWS and help the academic community realize the value of Cloud Computing.