I have documented Yahoo Inc.’s Cloud Computing efforts in this space many times already. They are doing some classy work on projects like Hadoop. But what about their own use of Cloud Computing internally? There was not much information about it until recently. Now the company has announced how it is tapping into a cloud-based data store for the storage needs of Yahoo properties. It is called Sherpa.
Sherpa, based on a technology called PNUTS (a research paper on this technology can be found here), is a homegrown technology developed with Yahoo Scale in mind. Toby Negrin, Product Manager of Sherpa, describes Yahoo Scale as follows.
At Yahoo!, our systems have to scale horizontally (we have to handle
tens of thousands of requests per second in a single datacenter) and
geographically (our users are around the globe and their data needs to
be close to them wherever they are). Scaling on these two axes
simultaneously is a problem that very few companies have to deal with.
At the same time, we must meet the latency SLAs required by user-facing
It appears Sherpa is architected to handle these requirements well, using a simple RESTful interface with four basic operations: Get, Set, Delete, and Scan.
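To make the four operations concrete, here is a minimal in-memory sketch in Python. It is not Sherpa's API: the class name `ToyStore`, the method signatures, and the choice to treat Scan as an ordered key-range read are all assumptions made for illustration, and the real service is reached over REST rather than through a local object.

```python
from bisect import bisect_left, insort
from typing import Dict, Iterator, List, Optional, Tuple


class ToyStore:
    """Hypothetical in-memory stand-in for a store with Get, Set, Delete, Scan."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._keys: List[str] = []  # kept sorted so scan can walk a key range

    def set(self, key: str, value: str) -> None:
        """Insert the key or overwrite its current value."""
        if key not in self._data:
            insort(self._keys, key)
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        """Return the value for key, or None if the key is absent."""
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        """Remove key; return True if it existed, False otherwise."""
        if key not in self._data:
            return False
        del self._data[key]
        self._keys.pop(bisect_left(self._keys, key))
        return True

    def scan(self, start: str, end: str) -> Iterator[Tuple[str, str]]:
        """Yield (key, value) pairs with start <= key < end, in key order."""
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            k = self._keys[i]
            yield k, self._data[k]
            i += 1


store = ToyStore()
store.set("user:2", "bob")
store.set("user:1", "alice")
store.set("video:1", "clip")
print(store.get("user:1"))
print(list(store.scan("user:", "user;")))  # every key with the "user:" prefix
```

The small surface is the point: with only four verbs, each call maps naturally onto an HTTP request against a record or a range of records.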
The CAP theorem imposes a “restriction” on all distributed systems. Mr. Negrin offers a simpler explanation of what the theorem implies:
unavoidable trade-offs between consistency (all records are the same in
all replicas), availability (all replicas can accept updates or
inserts), and tolerance of network partitions (the system still
functions when distributed replicas cannot talk to each other).
To work within the limits imposed by the CAP theorem, the Yahoo team has chosen to make the trade-off between consistency and availability. Sherpa gives its users various options for choosing between the two, along with tools to minimize the impact of that choice on the other property.
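A toy model can show what choosing between consistency and availability looks like from the caller's side. The sketch below is only an illustration under stated assumptions: the names `Replica`, `ToyCluster`, and the `consistency` parameter with its `"any"` and `"latest"` values are hypothetical and are not taken from Sherpa's interface. Writes go to one master copy and reach the local copy only when `replicate()` is called, which stands in for asynchronous replication.

```python
class Replica:
    """One copy of the data; `reachable` simulates a network partition."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.reachable = True


class ToyCluster:
    """Hypothetical two-replica cluster with asynchronous replication."""

    def __init__(self):
        self.master = Replica("master")
        self.local = Replica("local")
        self._pending = []  # writes accepted by master, not yet copied to local

    def write(self, key, value):
        if not self.master.reachable:
            raise ConnectionError("master unreachable: write rejected")
        self.master.data[key] = value
        self._pending.append((key, value))

    def replicate(self):
        """Copy pending writes to the local replica, if master can be reached."""
        if not self.master.reachable:
            return
        for key, value in self._pending:
            self.local.data[key] = value
        self._pending.clear()

    def read(self, key, consistency="any"):
        if consistency == "any":
            # Favors availability: always answers, but may return stale data.
            return self.local.data.get(key)
        if consistency == "latest":
            # Favors consistency: fails rather than return a stale value.
            if not self.master.reachable:
                raise ConnectionError("master unreachable: cannot read latest")
            return self.master.data.get(key)
        raise ValueError(f"unknown consistency level: {consistency!r}")


cluster = ToyCluster()
cluster.write("profile", "v1")
cluster.replicate()
cluster.write("profile", "v2")                      # not yet replicated
print(cluster.read("profile", consistency="any"))    # v1 (stale but fast)
print(cluster.read("profile", consistency="latest")) # v2
```

If `cluster.master.reachable` is set to `False`, the `"any"` read still answers with the stale value while the `"latest"` read raises an error, which is the trade-off in miniature.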
Various Yahoo properties have started using this cloud-based storage. There are no user-facing APIs available at this point, but that may change in the future once the company has fully integrated its properties with this new technology.