Image by Yodel Anecdotal via Flickr
Yahoo has been quite active in Cloud Computing research. They are the
biggest contributors to Hadoop, which is fast becoming a significant enabler
of Cloud Computing, and they have played a significant role in the
development of Pig, a high-level language for Hadoop. Apart from these
development efforts, they have also partnered with Carnegie Mellon University to
use the supercomputing cluster to analyze several million web documents.
They then joined hands with Computational Research Laboratories in India to tap
into their supercomputer and help scientists perform data-intensive computing
research. They partnered with HP and Intel for an open collaboration among
industry, academia and government. They have even reached out to undergraduate
institutions to make Cloud Computing part of their curriculum. Their Cloud
Computing research website can be found here.
Late last week, Yahoo announced an expansion of their partnership with
Carnegie Mellon University by adding three more universities to the group: the
University of California at Berkeley, Cornell University and the University of
Massachusetts at Amherst. Along with
Carnegie Mellon, these universities will tap into Yahoo’s supercomputing
clusters to conduct large-scale systems software research and explore new
applications that analyze Internet-scale data sets, ranging from voting records
to online news sources.
Yahoo’s supercomputing cluster, called M45, has been operational since
2007. It runs Hadoop, an open source distributed file system
and parallel execution environment that enables its users to process massive
amounts of data. The cluster has approximately 4,000 processor cores and 1.5
petabytes of storage. Scientists at Carnegie Mellon University have been
using it for more than a year now. They have run their research over 200 million
documents on the web. Their work, including a performance comparison of the
Hadoop file system with other parallel computing file systems, has resulted in
many academic papers.
According to Shankar Sastry, dean of the College of Engineering at the
University of California, Berkeley, this partnership will help them in many
different fields: processing large-scale data such as voting records, online
news sources and polling data; conducting computationally intensive econometrics
research, which combines economic theory with statistics to analyze and test
large-scale economic relationships; and even wildlife preservation,
biodiversity and the management of renewable sources of energy.
The impact of this announcement is huge, and it will be felt for years to
come. I have always had a special liking for Yahoo because of their role in
supporting Open Source projects. Now, with their role in Hadoop and their
participation in Cloud Computing research, my respect for them has increased
manyfold. Having realized the impact of their research on society, I
really want them to succeed.