As mentioned in my previous post, virtualization helped datacenters deploy rapidly, improve business continuity, and save a tremendous amount of capital by reducing hardware. However, the substantial benefits and ease of deployment also brought virtualization sprawl and resource proliferation. Right-sizing is the process of reclaiming under-utilized resources, such as compute and memory. In conjunction, there should be a process to validate that a guest virtual machine is still required by the business; this is typically called recertification.
A regular right-sizing lifecycle, run on a quarterly or semi-annual basis, can ensure maximum performance of your workloads and efficient use of your underlying hardware. However, to avoid impacting the business, you need a structured process for understanding the application workload.
Business Approval Process
Right-sizing of virtual machines should be done on a routine basis, such as monthly, quarterly, or semi-annually. This gives application owners and business partners an opportunity to control virtual machine costs and helps the underlying infrastructure run efficiently.
This process should be a collaboration and not done in a silo. Right-sizing a virtual machine without understanding the application requirements can have a serious impact on production systems. For instance, an application batch process may run only on a quarterly, semi-annual, or annual basis, and a database may sit at 0 to 5 percent utilization on all but a few days of the year (seasonal workloads). Also, if an application team has a new system that hasn’t gone “live”, the virtual machine may look underutilized because the user load has not yet arrived.
A right-sizing lifecycle should include the following steps:
- Alert the application owner or business partner of the overprovisioned virtual machine based on trends and reports
- Gain agreement to adjust resource overprovisioning
- Adjust resources
- Notify application owner of adjustments and verify that there were no incidents after the virtual machine was right-sized
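The four steps above can be sketched as a small ordered workflow. This is only an illustration of the sequencing, not part of any VMware tooling: the `Step` and `RightSizingRequest` names are hypothetical, and the one rule it enforces is that resources cannot be adjusted before the owner has been alerted and has agreed.

```python
from enum import Enum


class Step(Enum):
    """The four lifecycle steps, in the order listed above."""
    ALERT_OWNER = 1
    GAIN_AGREEMENT = 2
    ADJUST_RESOURCES = 3
    NOTIFY_AND_VERIFY = 4


class RightSizingRequest:
    """Hypothetical tracker for one virtual machine's right-sizing request."""

    def __init__(self, vm_name):
        self.vm_name = vm_name
        self.completed = []

    def complete(self, step):
        # The next allowed step is the one that follows the last completed step.
        if len(self.completed) < len(Step):
            expected = Step(len(self.completed) + 1)
        else:
            expected = None
        if step is not expected:
            raise ValueError(f"expected {expected}, got {step}")
        self.completed.append(step)

    @property
    def done(self):
        return len(self.completed) == len(Step)
```

Calling `complete(Step.ADJUST_RESOURCES)` right after `Step.ALERT_OWNER` raises a `ValueError`, which mirrors the point that agreement comes before any change.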
Sample Reclamation Letter
Dear Application Owner,
Every quarter we perform a reclamation process on our virtual machines to right-size them for underutilized resources. This operation provides us an opportunity to maximize the efficiency of the virtual machines and the underlying servers hosting the resources. According to our analysis, your virtual machine VMAPP-001 can be configured with 2 vCPU and 2 GB of memory. This could increase the overall performance of your virtual machine by decreasing the amount of ready time needed for scheduling the 4 virtual CPUs that are presently configured.
In addition, you would reduce the annual virtual machine chargeback for VMAPP-001 by $475.20.
We recognize that there may be monthly, quarterly, or seasonal workloads that we are not taking into account. If you could provide us with justification for the current virtual machine configuration, we will review our reports and request an exception approval from senior leadership.
If you would like further information on the current virtual machine resource utilization, please reply to email@example.com and we will assign an infrastructure engineer to work with you.
Thank you for working with us on maintaining peak efficiency on our hosting platform. We would appreciate a response in 15 days, which would be October 16, 2014. If we don’t receive a response, we will send an email to the department head asking for a review of the reclamation and approval to right-size the resource.
Why is recertification such an important process? Before we abstracted the operating system away from the physical resources with the hypervisor, we had a regular server refresh process. This process was typically performed on a three-year cycle, before we had to purchase extended maintenance from the platform vendor. During the server refresh, the application running on that physical server needed to be migrated to the new compute resource. The application owner was advised of the upcoming server replacement and had to either work with infrastructure services to move the application or notify them that the resource was no longer needed.
That isn’t the case with virtualization. When an underlying server host is refreshed, the virtual machines are not reviewed for retirement. Instead, the virtual machines are live migrated to the new host in the cluster and the legacy host is retired. This leads to virtual machine sprawl and a lack of lifecycle management for the virtual machines.
Recertification of the virtual machines should be practiced on an annual basis. This helps reduce the number of virtual machines in the environment, which improves capital effectiveness through better utilization of the hosting resources and improves operational efficiency by reducing the number of virtual machines the infrastructure operations team has to support.
Sample Recertification Letter
Dear Application Owner,
Every year we perform a recertification process to ensure the virtual machines are still required by the application owner and business unit. This process provides us an opportunity to maximize the efficiency of the hosting infrastructure by retiring unneeded virtual machines.
If you retire this virtual machine, you would reduce the annual virtual machine chargeback by $1,914.61.
If we don’t hear from you in the next 30 days, which would be October 16, 2014, we will assume the virtual machine is still needed and mark it as recertified.
If you would like further information, please reply to firstname.lastname@example.org and we will work with you to answer any questions.
Thank you for working with us on maintaining peak efficiency on our hosting platform.
Utilizing vCenter Operations Manager
Capacity management means establishing and preserving a safe and reliable amount of resources to meet business demand. Demand management is an important factor in providing reliable computing services. It calls for a variety of non-IT skills and knowledge, and is therefore often neglected. However, it is becoming significantly more important as virtual machine growth weighs down IT staffing ratios and infrastructure budgets. In large organizations, most capacity management is done with spreadsheets; fortunately, vCenter Operations Manager provides data to better account for needed resources and underutilized capacity.
The criteria that vCenter Operations Manager uses to classify a virtual machine as Oversized are found in the Configuration screen. In the default policy, a virtual machine is considered oversized when its CPU demand is below 30%, its memory demand is below 30%, or both. By default, it analyzes 30 weeks of data, which is roughly seven months. This range can be changed in Manage Display Settings.
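As a rough illustration of the default rule described above, the sketch below flags each resource whose demand falls under the 30% threshold. It is not how vCenter Operations Manager computes its result internally; the `VmDemand` type, the field names, and the idea of feeding it a single averaged demand figure per resource are assumptions made for the example.

```python
from dataclasses import dataclass

# Default threshold taken from the text: demand below 30% counts as oversized.
OVERSIZED_THRESHOLD_PCT = 30.0


@dataclass
class VmDemand:
    """Hypothetical record of a VM's demand over the analysis window."""
    name: str
    cpu_demand_pct: float
    memory_demand_pct: float


def oversized_resources(vm, threshold_pct=OVERSIZED_THRESHOLD_PCT):
    """Return the resources ('cpu', 'memory') whose demand is below the threshold."""
    flagged = []
    if vm.cpu_demand_pct < threshold_pct:
        flagged.append("cpu")
    if vm.memory_demand_pct < threshold_pct:
        flagged.append("memory")
    return flagged


def is_oversized(vm, threshold_pct=OVERSIZED_THRESHOLD_PCT):
    """A VM is oversized when CPU, memory, or both are below the threshold."""
    return bool(oversized_resources(vm, threshold_pct))
```

Returning the list of flagged resources, rather than a single yes/no, makes it easier to tell the application owner which setting is being proposed for reduction.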
To produce a report that shows the oversized virtual machines, on the Reports tab of VMware vCenter Operations Manager, click the Run Now button for the Oversized Virtual Machine Report for the specific cluster.
vCenter Operations Manager produces a detailed report by policy on the configured amount of resources and the recommended configured resources. Again, you want to make sure you understand the application workload and take into account any application requirements.
For a more detailed view of usage in vCenter Operations Manager, go to the Operations tab and select the All Metrics button. Under the Metric Selector for the virtual machine, select the specific demand counter you want to look at and specify the date range to analyze. This gives you greater detail on the CPU and memory demand of the virtual machine and helps you understand whether any scheduled application workloads hit peak utilization at certain times of the week, month, or quarter.
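If you export the demand samples for a virtual machine, a few lines of code can surface the months in which demand spikes even though the average looks low. The sketch below assumes the data has already been exported as (date, demand percentage) pairs; the function names and the 80% peak threshold are hypothetical choices for illustration, not product defaults.

```python
from collections import defaultdict
from datetime import date


def monthly_peaks(samples):
    """Map (year, month) to the highest demand % seen in that month.

    `samples` is an iterable of (date, demand_pct) pairs with non-negative demand.
    """
    peaks = defaultdict(float)
    for day, demand_pct in samples:
        key = (day.year, day.month)
        peaks[key] = max(peaks[key], demand_pct)
    return dict(peaks)


def busy_months(samples, peak_threshold_pct=80.0):
    """Return the (year, month) keys whose peak demand reaches the threshold."""
    peaks = monthly_peaks(samples)
    return sorted(key for key, peak in peaks.items() if peak >= peak_threshold_pct)
```

A virtual machine that idles at 3% for most of the quarter but shows up in `busy_months` for the quarter-end month is a seasonal workload, and a candidate for an exception rather than reclamation.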
Idle Virtual Machines Report
vCenter Operations Manager identifies virtual machines whose resource usage stays below a certain threshold for a significant portion of their lifetime. Another report that can be very helpful when starting the recertification process is the Idle Virtual Machines report. It shows virtual machines that have had a significant amount of idle time, which may indicate that the application on the virtual machine is no longer in use.
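The idea behind an idle check can be sketched as the share of samples that fall below an idle threshold. This is an illustration of the concept only, not the product's actual calculation; the 5% idle threshold and the 90% minimum idle fraction are assumed values chosen for the example.

```python
def idle_fraction(demand_samples_pct, idle_threshold_pct=5.0):
    """Return the fraction of samples whose demand is below the idle threshold."""
    if not demand_samples_pct:
        raise ValueError("at least one demand sample is required")
    idle_count = sum(1 for demand in demand_samples_pct if demand < idle_threshold_pct)
    return idle_count / len(demand_samples_pct)


def is_idle_candidate(demand_samples_pct, idle_threshold_pct=5.0, min_idle_fraction=0.9):
    """Flag a VM for recertification review when it is idle most of the time."""
    return idle_fraction(demand_samples_pct, idle_threshold_pct) >= min_idle_fraction
```

A flagged virtual machine is only a candidate for review; as with right-sizing, the application owner should confirm whether it is genuinely unused before it is retired.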