If you’ve been following breaking developments in Cloud this past week, you will probably have seen at least eleventy-six references to VMware’s Cloud Foundry announcement, along with some excellent write-ups on the topic – including this one from my Cloudave brother, Krish Subramanian, and this one, quite literally from the horse’s mouth, ex-Spring head honcho Rod Johnson.
I’m going to leave the deep analysis of Cloud Foundry to those who are paid to fill pixels, but as I mentioned during a recently recorded Infosmack podcast, I think this is a fantastic step forward for organizations that can, even at this early stage, begin to realize the potential effect this open source project could have on one of the most important metrics of any IT shop: “speed”.
As Sam Johnson eloquently puts it in this recent tweet:
Sam’s point, with which I wholeheartedly agree, is that it’s not the open source nature of the project that will kick it on, but the level of interoperability (specifically across federated clouds) that comes “built in” to Cloud Foundry. That, I believe, is what can really help enable “speed” in the sense of time to market for an organization’s applications.
I’ve been mulling over the question of “speed” as it relates to time to market, or the ability to deliver quickly. That’s a pretty logical goal for any cloud strategy (irrespective of denomination): automate the swathes of redundant “hands and eyes” work out of today’s manual application deployment processes, and the idea is surprisingly simple in concept and highly effective in return. But what about speed as it relates to performance? Now that I have a growing ability to choose where I deploy my application(s), how can I be sure that wherever I choose to chuck my code is the right place to ensure it performs well?
There’s no point putting a Hyundai Accent engine inside a Porsche 911 body. Period.
As I considered this, and tossed around what it implies for the potentially unfamiliar networking territory facing the cool new breed of DevOps, I remembered a question posed a short while ago by Ben Kepes on the excellent Focus site, specifically around the topic of “QoS for Enterprise Cloud?”:
With the advent of the SpotCloud clearing house, people are concerned about a lack of transparency and SLAs. I contend that even highbrow enterprise customers expect a lower level of service and quality from a clearing-house play, and they will choose which workloads to deploy on SpotCloud based on their understanding that a reduced price may introduce a degree of risk.
Of course, SpotCloud and Cloud Foundry are completely different concepts (though it could be argued they may have synergies down the road), but my response to Ben’s question still captures the essence of what I’ve been wondering about.
One of the great unspoken worries around public cloud is the performance (and cost of transfer in/out) of the network that carries the traffic to and from the provider. There are some concerns that even the “leading” IaaS providers don’t have particularly great networks in terms of peering relationships, so whether I would accept lower stability or quality depends on the workload. If I used SpotCloud to run a dev project, I’m inclined to care less about the network performance than about the stability of the kit under the application, but in most production cases, I don’t think anyone in their right mind would want to forgo network performance for cheaper compute and storage. But I could be wrong.
Underneath the somewhat rhetorical question of “how will my application perform?” sits the equally complex question of “how good is the network?”. In my experience of building our private cloud, it became incredibly important to understand not just the “private” network side but equally (if not more so) the “public” side. In simple terms, if you’re going to provide business-critical services using the public internet as your transport, then it certainly makes sense to understand the important characteristics of the network that you or your service provider offers.
Without delving too deeply, and deliberately leaving aside the options available to us through various CDN solutions, a key component of our network provisioning strategy was exploring “how the internet really works”. Terms that generally never pass through an enterprise IT shop became topics of concentrated focus as we worked hard and fast to gain the understanding, and the ability, to answer this question:
How can we operate like an ISP, but with a captive set of customers (our global projects)? How can we make the Internet work for us?
What is peering? How does it differ from transit? How do I configure BGP? What can I do with this AS? What effect does hot-potato routing have on me?
If some of the above resonates with you, then you’re either a networking geek, working for a carrier, or just plain strange. The fact is, we naturally assume that, like most commodities (and I used to count internet capacity as one), all carriers are roughly the same. We would never have known, nor imagined, that the world of the internets is actually quite a murky one. Peering is the staple diet, but in some cases the underlying “BGP economy” boils down to “get your packet off my network as quickly as I can”: cost savings or avoidance win out over performance, and that’s pretty much the norm.
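To make that slightly less abstract, here’s a minimal sketch (not anything we ran in-house, just an illustration) of how you can answer the most basic of those questions – “which AS does this address actually sit in?” – by querying Team Cymru’s public IP-to-ASN whois service from Python. The address in the example is a documentation placeholder.

```python
import socket

def ip_to_asn(ip: str, server: str = "whois.cymru.com", port: int = 43) -> str:
    """Ask Team Cymru's public IP-to-ASN service which AS originates this IP.

    Returns the raw response, e.g. columns like:
    AS | IP | BGP Prefix | CC | Registry | Allocated | AS Name
    """
    # Bulk query format: begin / verbose / <ip> / end, one token per line.
    query = f"begin\nverbose\n{ip}\nend\n".encode()
    with socket.create_connection((server, port), timeout=15) as sock:
        sock.sendall(query)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after "end"
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

if __name__ == "__main__":
    # Placeholder address; substitute the public IP you actually care about.
    print(ip_to_asn("192.0.2.1"))
```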
So, what do you do? Well…
1) First – there’s the “do nothing” option: hope that the flagship carrier you choose is a good bet and has a great network. It’s worked before and will probably continue to be the norm for most enterprises for the foreseeable future (until they realize that inbound traffic is as important as outbound).
2) Second – you can do what we did and think about how to make the internet work for you. This requires a great deal of sideways thinking and presupposes that, as in our case, you can get to grips with certain fundamental elements (an AS, some v4/v6 IP space, BGP route announcements, etc.). Here’s a quick peek into some publicly available information on how we are structured today; the sketch just after this list shows one way to pull that kind of information for any AS.
3) Third – be really smart and diligent and explore providers’ networks using the tools that are publicly available. Of course, this doesn’t guarantee anything (you have SLAs for that – cough cough), because the murky underbelly of the internets changes literally on a daily basis, but you will at least be a little more clued in than if you were flying blind.
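As a taste of what “publicly available information” means in option 2 (and it works just as well for snooping on anyone else’s AS), here’s a rough sketch, assuming RIPEstat’s public announced-prefixes endpoint, that lists the prefixes an AS is currently announcing. The AS number is only an example.

```python
import json
import urllib.request

def announced_prefixes(asn: str) -> list[str]:
    """Fetch the prefixes RIPEstat currently sees announced by the given AS."""
    url = f"https://stat.ripe.net/data/announced-prefixes/data.json?resource={asn}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    return [entry["prefix"] for entry in payload["data"]["prefixes"]]

if __name__ == "__main__":
    # AS14618 is used here only because it appears later in this post.
    for prefix in announced_prefixes("AS14618"):
        print(prefix)
```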
Here’s a quick example of how I did an assessment on AWS. Please note that this is simply meant to be indicative and is intended without prejudice (caveat emptor: everything can, and does, change on the internet!). A rough script that automates the same lookups appears after the summary below.
1) I took note of one of the Elastic IP addresses assigned to an EC2 instance.
2) Use the University of Oregon Route Views tool (telnet to route-views.routeviews.org and use rviews as the username).
3) Traceroute to the Elastic IP address identified in step 1; this will give you the source, the target, and the intermediate AS path to the destination.
4) In my case, the destination AS was identified as AS 14618.
5) Use a tool such as CAIDA’s AS Rank to show the publicly known information on the AS. (The link takes you straight to it, for ease of explanation.)
6) You will see from CAIDA’s output that, in this case, AS 14618 has one upstream AS (this is an AS adjacency) acting as a “provider”: AS 16509.
7) In this example, AS 16509 shows a number of additional “connections” as provider, peer, or customer, which CAIDA’s graph makes easy to see.
In summary, from the above we can see that the “destination” AS 14618 for my Elastic IP address sits at the edge of the graph; its only upstream connection is to AS 16509, which in turn connects on to many other networks, each of which has its own AS number (just like administrative domains) and associated agreements for passing traffic between them. Simple, eh?
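If you’d rather script steps 2 to 4 than type them, the sketch below drives a similar Route Views session from Python and asks the route server what BGP path it sees toward an address; the right-most AS in that path is the origin (AS 14618 in my case). It is only indicative: it uses the standard-library telnetlib (removed in Python 3.13) and assumes the rviews login behaves as described in step 2. You can equally type traceroute at the same prompt for step 3.

```python
import telnetlib  # note: removed from the standard library in Python 3.13

ROUTE_SERVER = "route-views.routeviews.org"
TARGET_IP = "203.0.113.10"  # placeholder; substitute the Elastic IP from step 1

tn = telnetlib.Telnet(ROUTE_SERVER, 23, timeout=30)

# Step 2: log in. Some route servers drop you straight at a prompt,
# so treat the username prompt as optional.
index, _, _ = tn.expect([b"Username:", b">"], timeout=15)
if index == 0:
    tn.write(b"rviews\n")
    tn.read_until(b">", timeout=15)

tn.write(b"terminal length 0\n")  # disable paging so output comes back in one go
tn.read_until(b">", timeout=15)

# Steps 3/4: ask the route server for its BGP view of the target address.
# The right-most AS in the reported path is the destination ("origin") AS.
tn.write(f"show ip bgp {TARGET_IP}\n".encode())
tn.write(b"exit\n")

print(tn.read_all().decode(errors="replace"))
```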
I’ll end by assuming that if you’re still reading this, you’re interested in how performance, along with speed, is a critical element of any application deployment strategy, for both public and private clouds (provided the latter has a public-facing element, like ours). There’s a lot more to it, of course, especially in understanding how to engineer and manage your external and internal (eBGP/iBGP) route advertisements and announcements, and how to deal with return paths, because they aren’t always the same as the outbound ones!
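On that last point about return paths: your outbound path is whatever your traceroute shows, but how the rest of the world routes back toward your prefixes is decided on other people’s routers, not yours. One rough way to peek at it, sketched below on the assumption that RIPEstat’s public looking-glass endpoint returns the structure shown, is to ask the RIPE RIS collectors what AS paths they currently hold toward a prefix you announce (the prefix here is just an example).

```python
import json
import urllib.request

def paths_toward(prefix: str) -> list[str]:
    """Ask RIPEstat's looking-glass what AS paths RIS collectors see toward a prefix."""
    url = f"https://stat.ripe.net/data/looking-glass/data.json?resource={prefix}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    paths = []
    # Access the payload defensively; field names are assumed from RIPEstat's docs.
    for rrc in payload.get("data", {}).get("rrcs", []):
        for peer in rrc.get("peers", []):
            paths.append(f'{rrc.get("rrc", "?"):>6}  {peer.get("as_path", "")}')
    return paths

if __name__ == "__main__":
    # Example prefix only; substitute one of your own announcements.
    for line in paths_toward("193.0.0.0/21"):
        print(line)
```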
If you at least consider these in your overall plan, then you can absolutely assure your management team that you have checked another “due diligence” box on your way to cloud superstardom.
