Key characteristics of a truly cloud-enabled environment include multi-tenancy, redundancy, failover, availability, and capacity (of the underlying physical hardware and VMs), along with security. Clients expect the system to be available 99.99% of the time or better, which allows at most about 53 minutes of downtime per year.
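
The arithmetic behind an availability target is easy to check. The following Python sketch converts an availability percentage into the downtime it permits per year. The function name allowed_downtime_minutes is illustrative only, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Compare a few common targets side by side.
    for target in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.1f} minutes per year")
```

A 99.99% target works out to roughly 52.6 minutes of downtime per year, while a 99.999% target allows only about 5.3 minutes.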

 

To achieve this, availability and capacity best practices must be employed. AWS provisions many of the underlying components automatically, which supports high availability and capacity for cloud systems of any size. Self-hosted or third-party-hosted cloud systems should follow best practices to provide a scalable and redundant architecture.

 

Hybrid Cloud:

 

AVAILABILITY MANAGEMENT

 

Cloud providers should use technologies such as autoscaling and bursting, redundancy, failover, disaster recovery, data replication, and multiple datacenters to ensure system availability. Availability statistics should be included in the cloud management portal for customer visibility.

 

The technologies used in a modern datacenter and in the cloud make it possible to fail over active servers, VMs, and applications to secondary systems to accommodate maintenance and upgrades. The expectation is that a cloud provider should never, intentionally or accidentally, have all systems offline and unavailable to its customers.
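
A rough way to see why redundant secondary systems matter: if failures are assumed to be independent (an idealization, since real systems often share power, network, or software faults), a service that stays up as long as any one of n redundant systems is up has a combined availability of 1 - (1 - a)^n. The Python sketch below applies that formula; the function name combined_availability is hypothetical.

```python
def combined_availability(single_availability: float, replicas: int) -> float:
    """Availability of `replicas` redundant systems where any one can carry the load.

    Assumes failures are independent, which is a simplifying assumption.
    `single_availability` is a fraction between 0 and 1 (0.99 means 99%).
    """
    if not 0.0 <= single_availability <= 1.0:
        raise ValueError("single_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only when every replica is down at the same time.
    probability_all_down = (1.0 - single_availability) ** replicas
    return 1.0 - probability_all_down


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} system(s) at 99% each -> {combined_availability(0.99, n):.6f}")
```

Under these assumptions, two systems that are each 99% available combine to about 99.99%, which is why failover to a secondary system is central to meeting a high availability target.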

 

CAPACITY MANAGEMENT

 

Constant monitoring of the cloud compute servers and storage systems is required. Because ordering and provisioning are done automatically, 24/7, it is easy for the system to run out of available physical servers or storage, causing new orders to fail at provisioning time. There is lead time required to purchase, install, configure, and certify any new equipment, so monitoring and establishing alert thresholds is critical; the cloud provider needs sufficient time to add capacity. The cloud provider could purchase excess capacity that sits idle until it is needed, but this costs money to procure, power, and cool, and those costs are eventually passed on to customers. It is far preferable to keep a reasonable amount of extra capacity and also put rapid replenishment plans in place.
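
The relationship between alert thresholds and procurement lead time can be sketched as follows. This is a minimal Python illustration, not a production monitor: the CapacityPool type, the linear-growth assumption, and the 14-day default safety margin are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class CapacityPool:
    """A pool of physical capacity (for example TB of storage or server slots)."""
    total_units: float
    used_units: float
    daily_growth_units: float  # average net consumption per day (assumed linear)


def days_until_exhausted(pool: CapacityPool) -> float:
    """Estimate days of runway left at the current growth rate."""
    free = pool.total_units - pool.used_units
    if free <= 0:
        return 0.0
    if pool.daily_growth_units <= 0:
        return float("inf")  # no growth means no projected exhaustion
    return free / pool.daily_growth_units


def should_order_now(pool: CapacityPool, lead_time_days: float,
                     safety_margin_days: float = 14.0) -> bool:
    """Alert when the runway no longer covers lead time plus a safety margin."""
    return days_until_exhausted(pool) <= lead_time_days + safety_margin_days


if __name__ == "__main__":
    storage = CapacityPool(total_units=1000, used_units=700, daily_growth_units=5)
    print("Days of runway:", days_until_exhausted(storage))
    print("Order now (50-day lead time)?", should_order_now(storage, lead_time_days=50))
```

The point of the design is that the alert threshold is derived from the lead time rather than set as a fixed percentage, so a longer purchase-and-certify cycle automatically triggers earlier alerts.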

 

Capacity management needs to consider the following:

 

  • All cloud compute physical servers normally run a hypervisor product such as VMware or Hyper-V. These servers and VMs boot from shared storage and may have no local hard drives.
  • Thin provisioning is commonly used throughout the SAN, thus you need to carefully calculate actual disk usage versus what has been sold and what is remaining in capacity.
  • Thin provisioning free-space reclamation might be a scheduled process, not an automatic one. Automatic is preferable, but not all SAN systems support it.
  • If oversubscription of processors or memory was factored into the hypervisor configuration, monitoring of system performance and capacity is even more critical.
  • Usable capacity on the SAN does not include the additional space needed to hold daily backups or snapshots, so the capacity actually required will be 25%-50% higher than the usable figure.
  • Consider having the SAN supplier provide a utility storage agreement, whereby it stages additional SAN capacity at the cloud provider’s datacenters but does not charge the cloud provider until it is utilized. This shares the costs and risk of managing extra storage capacity between the cloud provider and its SAN vendor.
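
The thin-provisioning and snapshot considerations above can be combined into a simple capacity check. The Python sketch below is illustrative only: the function name, the returned field names, and the default 25% snapshot overhead (the low end of the 25%-50% range mentioned above) are assumptions, and a real SAN reports these figures through its own tooling.

```python
def thin_provisioning_report(physical_tb: float, sold_tb: float,
                             written_tb: float,
                             snapshot_overhead: float = 0.25) -> dict:
    """Summarize thin-provisioned storage against physical capacity.

    physical_tb:       usable physical capacity installed on the SAN
    sold_tb:           capacity allocated (sold) to customers
    written_tb:        data customers have actually written
    snapshot_overhead: extra fraction reserved for backups/snapshots (assumed)
    """
    if physical_tb <= 0:
        raise ValueError("physical_tb must be positive")
    # Space actually consumed once backup/snapshot overhead is included.
    required_tb = written_tb * (1 + snapshot_overhead)
    return {
        "overcommit_ratio": sold_tb / physical_tb,
        "required_tb": required_tb,
        "remaining_tb": physical_tb - required_tb,
    }


if __name__ == "__main__":
    report = thin_provisioning_report(physical_tb=100, sold_tb=250,
                                      written_tb=60, snapshot_overhead=0.5)
    for name, value in report.items():
        print(f"{name}: {value}")
```

In this example 250 TB has been sold against 100 TB of physical capacity, and although only 60 TB has been written, snapshot overhead leaves just 10 TB of real headroom.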

 

 

The most important thing to remember with capacity management in a cloud environment is the impact of failure. In a traditional IT environment, running out of capacity might cause a minor inconvenience, a delay in staging a new service or application, or even a short outage while you free up some disk space. In an automated, highly elastic, rapid-provisioning, multitenant cloud environment, failure to monitor, anticipate, and keep up with capacity needs will affect a significant number of customers, costing the organization money, reputation, future business, and more. The bottom line is that capacity management has gone from a relatively low-importance item to an extremely high-importance role in a cloud environment.