Not all clouds are created equal

With many people looking to adopt cloud services, it is worth taking a look at what you currently run and how that fits with the two major cloud providers with IaaS offerings, i.e. Amazon Web Services (AWS) and Microsoft Azure.  Most people will be running one of the editions of VMware vSphere, so this post looks at the basics of what you need to know if you’re coming from that as a typical environment.

First, two of the big headline features of vSphere that are important to consider, and some of what they are used for:

  • High Availability (HA) – my VM starts up again somewhere else in the case of a host failure
  • Distributed Resource Scheduler (DRS) – separating workloads with anti-affinity rules, e.g. web01 and web02 shouldn’t run on the same physical host

Why are these two features in particular important?  Both are about providing high availability for your workload at the infrastructure layer.  Your application architecture may also have other tricks up its sleeve, such as multiple front-ends sitting behind a load balancer, clustering and other complementary ways of ensuring high availability.

How do these translate into a cloud environment?  First off, let’s look at Amazon’s AWS.

High availability in AWS

As a starting point, AWS services are divided up into multiple geographic regions, each of which comprises at least 2 availability zones.  What does this actually mean?

I’ll use the Sydney region as an example, as this is most relevant for those of us down under (note: this region is otherwise known as ap-southeast-2).  There are currently 3 availability zones within ap-southeast-2, with availability zones basically meaning:

  • Separate datacentres, each with their own independent compute, storage, network and other infrastructure.  The sites are apparently selected so that significant events (e.g. flooding, power loss, earthquakes) should not affect multiple zones simultaneously.
  • These availability zones are all connected to each other via high speed, low latency links.  In practice, in my experience, transfer speeds and latency between availability zones are basically as if everything were connected to the same LAN.

To use the vendor-specific terminology: within AWS the “VM” service is called EC2, and individual VMs are referred to as “instances”.

Important things you need to know:

  • If the physical host running your instance fails, the instance does not start on another host automatically.
    • You have to manually intervene to start up your instance somewhere else.  You will need to be using Elastic Block Store (EBS) persistent storage to do this, which means you can connect the EBS disk from the failed instance to another instance that you configure and start up.
    • There is one kind of workaround for this: you could use an auto-scaling group with a single member to launch a new instance to replace the old one if it fails.  The new instance has to be able to come up automatically without human intervention (so a fully automated restore), and note that this REPLACES the instance; the old instance is not moved.
  • Within an availability zone AWS do not guarantee that two instances of the same type will be running on different hardware.  There is a good chance that they will be, but it isn’t guaranteed.
    • It is important to understand that under the hood AWS don’t mix and match instance types on a particular physical host.  So if you were running a t2.medium instance, it would only be located on a physical host running other t2.medium instances.
    • So of course you should split your workload across availability zones, but also see the next point.
  • Layer 2 domains (i.e. subnets) don’t span availability zones, so in terms of your network design if you want to split workloads across availability zones then they will be on different subnets.
  • Availability zones are ordered differently per customer account.  So in terms of the Sydney region, what I have assigned as 2a could be assigned to you as 2c.  This is presumably done so that usage across the availability zones is balanced.
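To make the subnet point concrete, here is a minimal planning sketch in Python.  It only does address arithmetic with the standard library and does not call any AWS API.  The VPC range 10.0.0.0/16, the /24 subnet size and the function name plan_placement are assumptions chosen for illustration; the instance names follow the web01/web02 style used earlier.

```python
import ipaddress
from itertools import cycle


def plan_placement(vpc_cidr, zones, instance_names, subnet_prefix=24):
    """Give each availability zone its own subnet, then spread instances round-robin."""
    vpc = ipaddress.ip_network(vpc_cidr)
    # One subnet per zone, because a subnet cannot span availability zones.
    subnets = vpc.subnets(new_prefix=subnet_prefix)
    zone_subnet = {zone: str(next(subnets)) for zone in zones}

    placement = {}
    zone_cycle = cycle(zones)
    for name in instance_names:
        zone = next(zone_cycle)
        placement[name] = (zone, zone_subnet[zone])
    return placement


# Assumed example values: a /16 VPC and the three Sydney zone names.
plan = plan_placement(
    "10.0.0.0/16",
    ["ap-southeast-2a", "ap-southeast-2b", "ap-southeast-2c"],
    ["web01", "web02", "web03", "web04"],
)
for name, (zone, subnet) in plan.items():
    print(f"{name}: {zone} {subnet}")
```

With these inputs web01 and web02 land in different zones and therefore on different subnets, which is the network design consequence described above.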

Fundamentally, the way AWS works is that they provide you with the tools to make things highly available at the application layer; they don’t provide it at the infrastructure level like a vSphere environment does.
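The difference between replacing an instance and moving it can be shown with a toy model.  This is a sketch only: it is not the AWS auto-scaling API, and the class name, the reconcile method and the instance ID format are all hypothetical.

```python
import itertools


class SingleMemberGroup:
    """Toy model of a group that keeps exactly one healthy instance running."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.history = []  # every instance ever launched, oldest first
        self.current = self._launch()

    def _launch(self):
        # Hypothetical ID format, for illustration only.
        instance_id = f"i-{next(self._ids):04d}"
        self.history.append(instance_id)
        return instance_id

    def reconcile(self, healthy):
        """Replace the member if it is unhealthy; otherwise leave it alone."""
        if not healthy:
            # The old instance is discarded, not moved.  The new one must be
            # able to configure itself without human intervention.
            self.current = self._launch()
        return self.current


group = SingleMemberGroup()
first = group.current
same = group.reconcile(healthy=True)
replacement = group.reconcile(healthy=False)
print(first, same, replacement)
```

The replacement has a different identity from the instance it replaces, so anything that only lived on the old instance is gone unless it was stored somewhere persistent such as EBS.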

High availability in Azure

Microsoft divide up their Azure cloud into regions similar to AWS, and within Australia they have two regions that are based in Sydney and Melbourne.  To be clear, Microsoft’s names for these are Australia East (Sydney) and Australia Southeast (Melbourne).

Microsoft have a couple of concepts to understand: fault domains and update domains.  They are as follows:

  • Fault domains:  A fault domain is a separate set of hardware, i.e. compute, storage and local network that runs independently from another set.
  • Update domains:  Microsoft do infrastructure maintenance every so often and sometimes this is disruptive and will require a VM to be restarted.  The environment is divided up into separate update domains, so that they are only taking down a portion of the environment at any given time.

So now onto the important things you need to know:

  • If the physical host running your VM fails, the VM does start up on another host automatically.
  • To keep your VMs of the same type running within separate update and/or fault domains you need to add the VMs to what is called an availability set.
    • There are an undefined number of fault domains.  All Microsoft will guarantee (within a 99.95% SLA) is that not ALL of your VMs within an availability set will go down, which does mean you could end up with just a single VM in your availability set running.
    • There are 5 update domains by default, so if the number of VMs within an availability set exceeds this number, the distribution of the VMs wraps around, e.g. if you have 7 VMs, you would have 1 VM in each update domain, with an additional VM in 2 of the update domains.
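The wrap-around for update domains is simple modular arithmetic.  Here is a minimal sketch, assuming VMs are assigned to update domains in round-robin order; the function names are hypothetical, and which particular domains receive the extra VMs is an assumption of this model.

```python
from collections import Counter


def assign_update_domains(vm_count, update_domains=5):
    """Return the update domain index for each VM, assigned round-robin."""
    return [vm % update_domains for vm in range(vm_count)]


def vms_per_domain(vm_count, update_domains=5):
    """Count how many VMs end up in each update domain."""
    counts = Counter(assign_update_domains(vm_count, update_domains))
    return [counts.get(domain, 0) for domain in range(update_domains)]


distribution = vms_per_domain(7)
print(distribution)       # [2, 2, 1, 1, 1]
print(max(distribution))  # 2
```

The largest count is the number of VMs that could be restarted together when a single update domain is being maintained, so with 7 VMs that is 2 of the 7.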

So fundamentally Microsoft Azure provides a closer experience to what you have with VMware vSphere from the point of view of infrastructure high availability, but still with caveats and catches.  This may be important if the application you’re running can’t be made highly available outside of ensuring the infrastructure is.

Aside from this, one other consideration is that both AWS and Azure provide certain managed software services (e.g. Database-as-a-Service) where they guarantee high availability of services and deal with any availability issues without you having to lift a finger.  These are alternatives that are also worth considering in terms of how you operate your infrastructure.

There are a heck of a lot more considerations when moving to the cloud than just availability, and there is more to availability than the areas discussed here, but this should at least provide some initial food for thought and push you to start questioning everything before making the jump into cloud.
