Key characteristics of a truly cloud-enabled environment include Multi-tenancy, Redundancy, Failover, Availability, and Capacity (of the underlying hardware, bare metal, and VMs), along with Security. Clients expect the system to be available 99.99% of the time or better, which allows roughly 53 minutes of downtime or less per year.
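The downtime allowance implied by an availability target is simple arithmetic: the fraction of the year the system may be unavailable, multiplied by the minutes in a year. A minimal sketch follows; the function name `allowed_downtime_minutes` is illustrative, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; assumes a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} minutes/year")
# 99.9% -> 525.6 minutes/year
# 99.99% -> 52.6 minutes/year
# 99.999% -> 5.3 minutes/year
```

Each additional "nine" cuts the allowance by a factor of ten, which is why a 99.99% commitment leaves under an hour per year for all planned and unplanned outages combined.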
To achieve this, Availability and Capacity best practices must be employed. AWS can automatically provision many of the components needed for high Availability and Capacity in Cloud systems of any size. Self-hosted or third-party-hosted Cloud systems should follow the same best practices by providing a scalable and redundant architecture.
Cloud providers should use technologies such as autoscaling and bursting, redundancy, failover, disaster recovery, data replication, and multiple datacenters to ensure system availability. Availability statistics should be published in the cloud management portal for customer visibility.
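One way the availability statistic shown in a portal could be derived is from a log of outage windows over a reporting period. The sketch below is an assumption about how such a figure might be computed, not a description of any particular portal. The function name `measured_availability` and the tuple-based outage format are hypothetical, and outage windows are assumed not to overlap one another.

```python
from datetime import datetime, timedelta


def measured_availability(period_start, period_end, outages):
    """Return availability (percent) for a period, given (start, end) outage windows.

    Assumes the outage windows do not overlap each other.
    """
    period_seconds = (period_end - period_start).total_seconds()
    if period_seconds <= 0:
        raise ValueError("period_end must be after period_start")
    down_seconds = 0.0
    for start, end in outages:
        # Count only the part of each outage that falls inside the period.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down_seconds += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (1 - down_seconds / period_seconds)


# Example: a single 30-minute outage in a 30-day reporting period.
start = datetime(2024, 1, 1)
end = start + timedelta(days=30)
outage = (datetime(2024, 1, 10, 2, 0), datetime(2024, 1, 10, 2, 30))
print(f"{measured_availability(start, end, [outage]):.3f}%")
```

Even this single half-hour outage puts the month below a 99.99% target, which illustrates how little slack such a commitment leaves.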
The technologies used in a modern datacenter and in the cloud make it possible to fail over active servers, VMs, and applications to secondary systems to accommodate maintenance and upgrades. The expectation is that a cloud provider should never, intentionally or accidentally, have all systems offline and unavailable to its customers.
Constant monitoring of the cloud compute servers and storage systems is required. Because ordering and provisioning are automated and run 24/7, it is easy for the system to run out of available physical servers or storage, causing new orders to fail at provisioning time. Purchasing, installing, configuring, and certifying new equipment takes lead time, so monitoring and establishing alert thresholds is critical; the cloud provider needs sufficient time to add capacity. The cloud provider could purchase excess capacity that sits idle until it is needed, but this costs money to procure, power, and cool, and those costs are eventually passed on to customers. It is far preferable to keep a reasonable amount of extra capacity and also put rapid replenishment plans in place.
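The relationship between alert thresholds and procurement lead time can be sketched as a simple runway check: estimate how many days of free capacity remain at the current growth rate, and raise an alert once that runway is no longer comfortably longer than the time needed to bring new equipment online. This is a minimal illustration under stated assumptions: the names `PoolStatus`, `days_until_exhaustion`, and `should_order_capacity` are hypothetical, growth is assumed to be linear, and the 14-day safety margin is an arbitrary example value.

```python
from dataclasses import dataclass


@dataclass
class PoolStatus:
    """Snapshot of one resource pool (e.g. TB of storage or number of hosts)."""
    total_units: float
    used_units: float
    daily_growth_units: float  # average net consumption per day


def days_until_exhaustion(pool: PoolStatus) -> float:
    """Estimate days of runway left, assuming linear growth."""
    free_units = pool.total_units - pool.used_units
    if free_units <= 0:
        return 0.0
    if pool.daily_growth_units <= 0:
        return float("inf")  # usage is flat or shrinking
    return free_units / pool.daily_growth_units


def should_order_capacity(pool: PoolStatus, lead_time_days: float,
                          safety_margin_days: float = 14.0) -> bool:
    """Alert when the runway no longer covers lead time plus a safety margin."""
    return days_until_exhaustion(pool) <= lead_time_days + safety_margin_days


# Example: 1,000 TB pool, 820 TB used, growing 4 TB/day, 45-day lead time.
storage = PoolStatus(total_units=1000, used_units=820, daily_growth_units=4)
print(days_until_exhaustion(storage))                      # 45.0
print(should_order_capacity(storage, lead_time_days=45))   # True
```

Expressing the threshold in days of runway rather than as a fixed utilization percentage ties the alert directly to the provider's lead time, so a faster-growing pool triggers earlier.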
Capacity management needs to consider the following:
The most important thing to remember about capacity management in a cloud environment is the impact of failure. In a traditional IT environment, running out of capacity might cause a minor inconvenience, a delay in staging a new service or application, or even a short outage while you free up some disk space. In an automated, highly elastic, rapid-provisioning, multitenant cloud environment, failure to monitor, anticipate, and keep up with capacity needs will affect a significant number of customers, costing the organization money, reputation, future business, and more. The bottom line is that capacity management has gone from a relatively low-importance task to an extremely high-importance role in a cloud environment.