Over the past few days I have been having some issues with my Twitter account. Beyond the well-known pauses in the service, outages, etc., there are some less-known but more annoying problems with Twitter search. It turns out that many accounts don’t show up in search at all. If you are one of those lucky accounts, no one other than your direct followers can see your tweets and no one can find you or any of your Tweets, which makes the account pretty useless. It has also been a known issue with no fix for over a year, other than to create a new account and tweet with that. My account was one such account, which needless to say was very annoying and cost me 2 days of my time trying to figure out a viable workaround. As a result, Twitter earned the place of honor in today’s blog.
Now in defense of Evan Williams, Biz Stone and the rest of the gang at Twitter, they find themselves in the enviable position of having a hugely successful product on their hands, one which has no doubt outpaced their wildest growth projections over the past few years and thus put stress on their design and everything else. I, on the other hand, have the advantage of 20/20 hindsight, so in this blog we can design Twitter on Steroids from scratch using technology that was not even available when Twitter was conceived. I know the team at Twitter is busting their butts to keep up with their phenomenal growth, and my hat’s off to them for their success.
So for those who have not read my bio, I have been designing and building ultra-high-performance systems for the world’s largest banks and stock exchanges for about 25 years. Just this June a couple of colleagues and I designed and ran a stock exchange prototype system capable of 4.5 million transactions per second with round-trip response times as low as 15 microseconds (yes, that’s microseconds for multiple network hops, I/O, parsing, matching and the whole shebang, everything the NYSE does to tell you that you just bought 100 shares of IBM). We also showed this system could scale linearly for throughput by adding hardware, was fully fault tolerant and could do dynamic load balancing if traffic at the exchange spiked. In this design, I will be leveraging the lessons learned over those 25 years and the technologies used for the system above.
So let’s dive in.
Requirements:
So what does our Twitter on Steroids need to do? Here is my overly simplistic list of requirements (I am only going to deal with the big ones):
Functional Requirements:
The system shall allow users to create accounts.
The system must provide a means for users to submit Tweets.
The system must persist those Tweets.
Users shall be able to follow other users’ Tweets.
The system shall provide a mechanism to search Tweets.
Non-Functional Requirements:
The system shall be highly responsive
The system shall maintain response times even under load
The system shall be highly scalable
The system shall be highly available with 99.999% or better uptime (it’s doable)
Where Do We Start?
First some design principles:
- We will use a componentized SOA design
- The Twitter Web Site will use the same Service API that is exposed publicly
- The System will use a Hot/Hot High Availability Model based on component replication for reliability
- All Service Components will be implemented in a manner that ensures deterministic behaviour. (The easiest way to do that, though not the only way, is to make each component single threaded, which is what we do for most exchange systems; thread context switches are expensive at speed, and multithreading can result in coherency issues, which Twitter seems to be suffering from based on the comments on their support site. A minimal sketch of this pattern follows this list.)
- To the maximum extent possible all I/O, remote Service calls, etc will be asynchronous
- All internal communication will be message based using multicast for efficiency
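As a concrete illustration of the determinism principle above, here is a minimal C sketch of a single-threaded Service Component event loop. The msg_next/msg_publish helpers are stand-ins I have invented for whatever transport delivers the ordered message stream (LLM in this design); they are not a real API, and here msg_next simply reads lines from stdin so the sketch runs on its own.

```c
/*
 * Minimal sketch of a single-threaded Service Component. The messaging
 * functions are hypothetical stand-ins, not a real LLM API.
 */
#include <stdio.h>
#include <string.h>

typedef struct {
    char   payload[512];
    size_t len;
} msg_t;

/* Stand-in for the subscription delivering a totally ordered message stream. */
static int msg_next(msg_t *out)
{
    if (!fgets(out->payload, sizeof out->payload, stdin))
        return -1;                      /* stream closed */
    out->len = strlen(out->payload);
    return 0;
}

/* Stand-in for a multicast publish on a topic. */
static void msg_publish(const char *topic, const msg_t *m)
{
    printf("[%s] %.*s", topic, (int)m->len, m->payload);
}

/* All business state is touched only from this one thread, so given the same
 * ordered input the component always reaches the same state; that is what
 * lets a twinned backup instance take over with no loss or duplication. */
static void apply(const msg_t *m)
{
    msg_publish("tweets.captured", m);  /* e.g. rebroadcast downstream */
}

int main(void)
{
    msg_t m;
    while (msg_next(&m) == 0)           /* one thread, one loop, no locks */
        apply(&m);
    return 0;
}
```

Because the only state-changing code runs on one thread fed by one totally ordered stream, a twinned backup instance consuming the same stream always arrives at the same state, which is what makes the Hot/Hot replication model above workable.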
About the Technology
I don’t normally like to reference specific technologies in my blog but in this case I am going to as there are a couple which provide unique capabilities to implement this system design, and which people are probably not familiar with. Apologies in advance for the product plug. They are as follows:
WebSphere MQ Low Latency Messaging (LLM):
LLM is a unique high-performance messaging product that has some purpose-built capabilities specifically designed for ultra-high-throughput, low-latency, transactional systems. For one thing, it’s the fastest messaging available on the market, capable of throughput in excess of 9 million Tweets per second per connection, and latency application to application across a switch as low as 3 microseconds with InfiniBand networking and about 12 microseconds with 10GbE.
More important than its speed for this type of application, though, are its unique high availability mechanisms. LLM provides a unique mechanism that allows me to deliver messages to a primary and a secondary Service Component at speed, while maintaining total order across all receivers. In addition, it provides unique mechanisms to perform failure detection, failover, state synchronization and component replication, all at speed. In exchange systems, LLM has detected and failed over from a primary system to a backup in as little as 7 milliseconds, with no loss of messages or duplication and no system-level downtime even though a component failed.
DataPower XM70:
This is an appliance that was originally designed for Web Service and Web edge security. This model is specifically enabled to work with the LLM above. It will allow us to expose REST- or SOAP-based services and convert them to messages for internal consumption. The XM70 can also do content-based routing, parsing and transformation for us on the fly at wire speed, taking load off the back-end Service Components.
XIV Storage:
This is a low-cost storage appliance that has great throughput and reliability. I have been able to sustain write speeds with this in excess of 5.5 Gb per second per Intel box writing to it.
For the rest we can use pretty standard commodity stuff. The disk above can also easily be swapped for your preferred flavour; this one just has great price performance.
What Does Twitter on Steroids Look Like?
My version of Twitter on Steroids would look like this (except I didn’t have room on the drawing to add the Account Management Service Components or the Follower Service Components, so just imagine they are in the diagram and follow the same pattern):
Twitter on Steroids
So let’s walk through this diagram.
- Firstly, we are using BIG-IP to load balance across the Web Servers and also across the DataPower appliances. This is pretty standard Web design, no surprises. The BIG-IP could also do this to a remote backup site if configured correctly, where we could twin this setup for failover or load balancing, or we could put the Instance 2’s in the second site. It just depends on the SLAs you are trying to meet. The logical design and coding would not change regardless.
- The Web Servers are making calls through the DataPower to the back-end Service Components just like the external API calls. This ensures consistent behaviour and reduces the need to test and maintain two APIs.
- The DataPower is converting all REST and SOAP payloads into messages on top of LLM
- This is important. The DataPower is multicasting all messages out of the appliance using LLM’s high availability mechanisms. It is also putting those messages on different topics based on the content of the message. I am suggesting partitioning the incoming Tweets based on the first few letters of the Tweeter’s ID. The first 2 letters will do to start, giving us 26 x 26 = 676 topics to work with for load balancing (see the partitioning sketch after this list). We can add more topics for finer partitioning later if need be.
- LLM is delivering the messages throughout the system and also providing all the reliability. It handles NAKs and ACKs automatically, retransmissions, etc., to assure messages get where they need to be without any additional work by the application.
- Tweets are first picked up by the Tweet Capture Service Components. Each partition subscribes to and handles a subset of the topics in order to provide load balancing. It is possible to add an external system which monitors load per topic and dynamically changes the subscriptions to adjust load. Also, by partitioning, we can use multiple databases in parallel, thus eliminating the database as a throughput bottleneck.
- I/O in the Tweet Capture Service Components is asynchronous, providing very fast response times. We can batch-write the tweets for higher throughput, and because we do component replication using LLM, if the primary Instance 1 fails, Instance 2 just takes over where it left off with no loss of messages or duplication.
- The Tweet Capture rebroadcasts (via multicast) all messages to the Tweet Indexing Service Components. These are also twinned for high availability and partitioned for scalability. The indexing component does as the name says: it indexes into the tweets and stores a record in a database. I would recommend an in-memory database be used with a traditional database behind it, with bi-directional synchronization of current data between the two. solidDB/DB2 is one such pair; TimesTen/Oracle is another (though the latter pair is slower). I/O would be batched and asynchronous, again for speed.
- When a search request comes in, it would be routed by the DataPower to the Search Service Components, which would then query the Indexing Service and receive back the matching records for each keyword in the search. A fast parallel algorithm would then be used to handle any “or” or “and” statements in the search (a sketch of one such merge follows this list).
- These results would be returned to the caller via the DataPower box as a response to the original service call.
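As referenced in the partitioning bullet above, here is a minimal C sketch of the topic rule: the first two letters of the Tweeter’s ID select one of 26 x 26 = 676 topics. The “tweets.xx” topic naming and the fallback for IDs that do not start with two letters are my own assumptions for illustration, not part of the original design.

```c
/*
 * Sketch of the topic-partitioning rule: the first two letters of the
 * Tweeter's ID select one of 26 x 26 = 676 topics. Topic names and the
 * non-letter fallback are illustrative assumptions.
 */
#include <ctype.h>
#include <stdio.h>

/* Map a user ID to a topic name such as "tweets.ab". Non-letters fold to
 * 'a' so every ID still lands on exactly one of the 676 topics. */
static void topic_for_user(const char *user_id, char out[16])
{
    char a = 'a', b = 'a';
    if (user_id[0] && isalpha((unsigned char)user_id[0]))
        a = (char)tolower((unsigned char)user_id[0]);
    if (user_id[0] && user_id[1] && isalpha((unsigned char)user_id[1]))
        b = (char)tolower((unsigned char)user_id[1]);
    snprintf(out, 16, "tweets.%c%c", a, b);
}

int main(void)
{
    char topic[16];
    topic_for_user("techmusings", topic);
    printf("%s\n", topic);   /* prints "tweets.te" */
    return 0;
}
```

Each Tweet Capture partition then subscribes to a contiguous range of these topics, and the external load monitor mentioned above can rebalance simply by changing which ranges each partition subscribes to.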
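For the search step, here is a sketch of how the “and”/“or” handling might work, under the assumption (mine, not stated above) that the Indexing Service returns a sorted posting list of tweet IDs for each keyword. An “and” is then a linear merge intersection of the lists; an “or” is the corresponding union.

```c
/*
 * Sketch of combining per-keyword results in the Search Service Component.
 * Assumes each keyword yields a sorted posting list of tweet IDs; this data
 * layout is illustrative, not something specified in the article.
 */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Intersect two sorted posting lists (keyword1 AND keyword2).
 * Runs in O(na + nb); an OR is the same merge keeping every ID once. */
static size_t and_merge(const uint64_t *a, size_t na,
                        const uint64_t *b, size_t nb,
                        uint64_t *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if      (a[i] < b[j]) i++;
        else if (a[i] > b[j]) j++;
        else { out[k++] = a[i]; i++; j++; }
    }
    return k;
}

int main(void)
{
    /* Posting lists for two keywords, already sorted by tweet ID. */
    const uint64_t twitter_ids[] = { 3, 7, 9, 15, 21 };
    const uint64_t outage_ids[]  = { 7, 8, 15, 30 };
    uint64_t hits[4];

    size_t n = and_merge(twitter_ids, 5, outage_ids, 4, hits);
    for (size_t i = 0; i < n; i++)
        printf("tweet %llu\n", (unsigned long long)hits[i]);   /* 7 and 15 */
    return 0;
}
```

Since the posting lists live in the partitioned, in-memory Indexing databases, these merges can run in parallel per partition and the partial results combined, which is where the parallelism mentioned in the walkthrough comes from.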
So how fast would this be and how big could it scale?
Well, this is just a guess based on my experience and without ever having looked at any specific search algorithms that might be used by Twitter. Let’s assume we write everything in C behind the DataPower for speed and stability, and that we use 1GbE for networking, which is the slowest option at about 27 microseconds per hop. All latencies are round trip to and from the DataPower box.
- I think for Tweet Capture we could achieve round-trip latency per tweet of about 50-60 microseconds, with throughput per partition somewhere in the 100,000-200,000 Tweets per second range if using a fast database and some solid-state disk for the database log files, etc. Even higher throughput is possible if a custom binary file system is used (15,000,000+ Tweets per second, which we have done with stock orders of similar message size)
- Similar per-partition performance is possible for the Tweet Indexing as for the Capture
- For Tweet Search it’s a bit tougher to gauge, but I would guess about 100-150 microseconds per search depending on the algorithm used. Throughput should also be well into the hundreds of thousands of searches per second if in-memory databases are used.
- Response times could be reduced by as much as 25 microseconds per network hop by using InfiniBand networking instead of 1GbE
- From a scaling perspective, this should be able to scale linearly by adding hardware almost without limit (only limited by the available network bandwidth)
Now clearly, this is a simplified case and I am sure there are lots of design details we are missing, but I think you get the idea. A bigger, badder Twitter (or any other app for that matter) is definitely possible, and by using the SOA pattern, async I/O, component replication, etc., we can do this to almost anything. So if anyone from Twitter (or anyone else for that matter) wants to talk specifics or other examples, feel free to leave a comment or reach me on Twitter (@techmusings) any time.
Sorry for picking on Twitter; they just seemed like a good example given my struggles. We all wish we had the “problems” that come with such a huge success.
(Guest post by Paul Michaud, Global Executive IT Architect for the Financial Markets sector at IBM. Paul blogs @ Technology Musings)

On the subject of load balancing, why not get the highest availability while not getting caught in high prices? Kemp’s got some great load balancers that are low priced and high in quality:
http://www.kemptechnologies.com/?utm_source=blog&utm_medium=pv&utm_content=zs&utm_campaign=home
Actually Frank, any load balancer would do; I just picked one as an example. It need not be BIG-IP.
Paul – This is a great article and architecture. One small improvement – the DataPower box is a hardware appliance that is costly to scale, as you have to buy more physical boxes when Twitter on Steroids reaches its limit – it’s the only part of the architecture that is still living in the early 2000s…
What about software alternatives to DataPower that are faster and can be virtualized? This would reduce cost over time and increase scalability, especially on cheap Intel Multi-core servers (Intel chips just keep getting faster).
It is definitely possible to do this with a software bridge in the place where I put the DataPower. We do it all the time for high-performance apps, including for stock exchanges where we need to parse and route messages at very high rates. That said, this particular DataPower box (XM70) is new and is very fast (it will easily saturate a GigE network with very low latency). The next version of it will be even faster, but that’s a ways off on the roadmap.
Bottom line, you can absolutely do this with software.