For anyone who has been living under a rock for the past few weeks, Amazon Web Services (AWS) had an outage a week or so back, the latest in a relatively long list of outages that have plagued its US-East region. Commentary has generally fallen into two camps. The naysayers have come out yet again, questioning the validity of cloud as a delivery model and suggesting that AWS’ woes call into question the judgment of organizations using cloud. More thoughtful commentators, on the other hand, have said that the outages show us, once again, that good planning, a multi-faceted deployment model, automation and monitoring of the infrastructure will together deliver good levels of uptime, regardless of individual node outages.
I took the opportunity to chat with Beau Christensen, Lead Site Reliability Engineer at identity vendor Ping Identity, about what Ping uses to ensure uptime, both through completely unplanned outages such as those plaguing AWS East and through more predictable events like superstorm Sandy. Ping takes a hybrid public/private cloud approach to its architecture and, interestingly (and a bit of salt in the wound for AWS), all its public cloud systems are multi-AZ and purposely NOT in the US-East region.
To the credit of its systems, Ping was not impacted by either the Amazon outage or the storm, so this is a good “best practice” story about hedging bets on IT infrastructure and using monitoring and configuration software to create a stable public/private cloud infrastructure. So, what approaches does Ping take to give itself a better-than-average chance of staying live?
Ping started using Boundary, the application monitoring product, in its testing environment when assessing the scalability of one of its applications. According to Christensen, the analysis showed several areas for performance optimization. Its use of the product has grown to the point where Boundary now delivers a comprehensive dashboard across the complete Ping application infrastructure, including monitoring the traffic between three data centers. Christensen believes that Boundary has helped the company improve application scalability, reduce bottlenecks on its networks, improve the distribution of traffic across its servers, and model network traffic to prepare financially for adding a new data center in the coming months.
Ping is using IT automation software from Puppet Labs to help scale its applications both on AWS and on its own private network. Puppet automates configuration management tasks across the Ping Identity environment and was used to deploy Boundary to three distinct data centers.
Closed Loop Between Analysis and Automation
This is where things start to get interesting. When Puppet completes configuration tasks, often several times a day, those updates appear in the Boundary dashboard. It’s the sum of the parts that proves valuable to Ping: together, Puppet Labs and Boundary give Ping real-time visibility AND control over its dynamically changing environment. I’ve long said that visibility without control, or indeed control without visibility, is a nonsensical approach to IT management. For this reason I’ve been bullish on solutions that combine the two aspects into one. Having spoken with Ping, and seen what the combination of Boundary and Puppet means for them, I’m happy to accept that well-integrated but discrete best-of-breed approaches can work well. Says Christensen on this point:
“We’re not [going] to sacrifice any metrics that are important to us by going with one single monitoring/automation solution.”
Together, the IT automation and application monitoring systems help prevent performance issues for customers and allow the security application company to operate more efficiently. Puppet also automates the deployment of Boundary, so Ping can quickly scale Boundary out across its hybrid cloud environment.
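For readers who like to see the idea in code, here is a minimal sketch of that closed loop: configuration runs are recorded on the same timeline as monitoring data, so a performance change can be traced back to whatever was changed just before it. This is purely illustrative and assumes nothing about how Ping, Puppet or Boundary actually implement it. `ConfigChange`, `Timeline`, the host names and the timestamps are all hypothetical, and the timeline is a simple in-memory list rather than a call to any real API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ConfigChange:
    """One completed configuration run (hypothetical record, not a real Puppet report)."""
    host: str
    summary: str
    finished_at: datetime


@dataclass
class Timeline:
    """In-memory stand-in for a monitoring dashboard's event stream (hypothetical)."""
    changes: list = field(default_factory=list)

    def annotate(self, change: ConfigChange) -> None:
        """Record a configuration change alongside the monitoring data."""
        self.changes.append(change)

    def changes_before(self, moment: datetime, window: timedelta) -> list:
        """Return the changes that finished within `window` before `moment`."""
        return [c for c in self.changes
                if moment - window <= c.finished_at <= moment]


timeline = Timeline()
start = datetime(2012, 11, 1, 9, 0)  # made-up timestamp for illustration
timeline.annotate(ConfigChange("web-1", "updated load balancer pool", start))
timeline.annotate(ConfigChange("db-1", "rotated credentials", start + timedelta(hours=3)))

# Suppose a latency spike is observed ten minutes after the first change.
spike = start + timedelta(minutes=10)
suspects = timeline.changes_before(spike, timedelta(minutes=30))
print([c.host for c in suspects])
```

The point of the sketch is the design choice rather than the code: the automation tool and the monitoring tool stay separate, and the only thing they share is a stream of timestamped events.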
(Cross-posted @ The Diversity Blog - SaaS, Cloud & Business Strategy)