Thursday, October 23, 2014

Right-Sizing and Recertification (Part 1) - Resource Consumption and Efficiency

In the past 35 years, IT organizations have evolved from narrowly focused data processing elements into a function that supports, and in many cases drives, nearly every area of a company. But with this increase in technology dependency, the number of applications, and therefore servers, supported by IT has increased dramatically, placing strains on datacenter floor space, power, and operational support. Over the past 15 years, virtualization has helped businesses make the most of their technology investments, has become a disruptive technology in datacenter consolidation, and has relieved pressure on IT operations.

However, without IT governance in place to measure the efficiency of the hosting resources, IT organizations are now faced with virtual machine sprawl and resource waste. At one of my accounts, virtual servers have grown 81% over the past five years, adding nearly 2,000 virtual machines in the past two years alone. That places tremendous stress on IT operations staff and on the infrastructure resources.

It is very important to properly size your virtual machines from a vCPU and memory perspective to get the most out of your virtualization infrastructure, while keeping application users happy with the performance. We also need to ensure there is a life-cycle management process for the virtual machine. This is done by having a mature right-sizing and recertification process in place, and using tools like vCenter Operations Manager to understand the guest workload.

In my next two posts I am going to focus on right-sizing and recertification. This first post covers resource consumption; the material I used to create it can be found at the end of the article. In my next post, I will discuss the business approval process for right-sizing and recertification, and how to use vCenter Operations Manager reports to identify candidates for these workflows.

Resource Consumption

VMware supports running Monster VMs, and application performance rests on four infrastructure measures: vCPUs, memory, storage capacity, and storage I/O. Today, virtual machines on vSphere 5.5 can scale to 64 vCPUs, 1 TB of memory, 62 TB of storage, and 1 million storage IOPS. The benefit of this large scale is that resource-intensive applications that are mission critical to your business perform very well on vSphere.

Although the benefits of being able to scale up resources to meet the requirements of demanding applications are clear, there are drawbacks to scaling up resources that the virtual machine does not actually use. If you overprovision a virtual machine with the intention of letting the vSphere scheduler manage the underlying system resources, you will degrade overall system performance, efficiency, and density.

vCPU Consumption

VMware Virtual Symmetric Multiprocessing (Virtual SMP) enhances virtual machine performance by enabling a single virtual machine to use multiple physical processor cores simultaneously. The biggest advantage of an SMP system is the ability to use multiple processors to execute multiple tasks concurrently, thereby increasing throughput. Only workloads that support parallelization can really benefit from SMP.

The virtual processors from SMP-enabled virtual machines are co-scheduled. That is, if physical processor cores are available, the virtual processors are mapped one-to-one onto physical processors and are then run simultaneously. In other words, if one vCPU in the virtual machine is running, a second vCPU is co-scheduled so that they execute synchronously.

If multiple idle physical CPUs are not available when the virtual machine wants to run, the virtual machine remains in a special wait state. The time a virtual machine spends in this wait state is called ready time.
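As a rough illustration of how ready time shows up in practice: vCenter's realtime performance charts report CPU Ready as a summation in milliseconds over a 20-second sample interval. The helper below is my own sketch (not a VMware tool) for converting that raw value into a per-vCPU percentage:

```python
# Sketch: convert vCenter's CPU Ready summation (ms) into a percentage.
# The realtime chart samples every 20 seconds, so the interval is 20,000 ms.

REALTIME_INTERVAL_MS = 20_000  # vCenter realtime sampling interval

def cpu_ready_percent(ready_ms: float, num_vcpus: int = 1) -> float:
    """Average per-vCPU ready time as a percentage of the sample interval."""
    return (ready_ms / (REALTIME_INTERVAL_MS * num_vcpus)) * 100

# A 4-vCPU VM reporting 4,000 ms of ready time in one 20 s sample
# spent, on average, 5% of the interval waiting per vCPU:
print(cpu_ready_percent(4000, num_vcpus=4))  # 5.0
```

Sustained values above a few percent per vCPU are a common warning sign that a host is overcommitted or that VMs carry more vCPUs than they need.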

Even idle processors perform a limited amount of work in an operating system. Beyond this minimal amount, the ESX host must also manage these “idle” processors, creating additional work for the hypervisor. These lightly utilized vCPUs compete with other vCPUs for system resources.

Mark Achtemichuk wrote a great VMware blog post on vCPU SMP consumption efficiency (referenced below). VMware conducted a lab test using a single-threaded, CPU-intensive process as a fixed workload to benchmark the inefficiency. The benchmark was then run on multiple virtual machines with different numbers of vCPUs assigned and left idle, simulating oversized virtual machines.

The resulting data demonstrates that “CPU Efficiency” decreases as virtual machines are assigned additional idle vCPUs. The waste is small, and it only becomes visible when very large virtual machine configurations are under-utilized.

Next, VMware repeated the same benchmark, but this time ensured that each additional vCPU that was assigned to the virtual machine was also running the CPU intensive process. This simulated a right-sized virtual machine that was using all vCPUs.

The resulting data demonstrates that “CPU Efficiency” was constant as the virtual machines were scaled up to meet the application demand. This highlights the fact that there is no measurable waste when virtual machines are using all their assigned vCPUs.

Non-Uniform Memory Access (NUMA) systems contain multiple nodes, each consisting of a set of processors and memory. Access to memory in the same node is local, while access to memory in another node is remote. Remote access takes longer because it involves a multi-hop operation. NUMA-aware applications attempt to keep threads local to improve performance.

When a virtual machine is powered on, ESXi assigns it a home node. A virtual machine runs only on processors within its home node, and its newly allocated memory comes from the home node as well. Unless a virtual machine’s home node changes, it uses only local memory, avoiding the performance penalties associated with remote memory access to other NUMA nodes.

In applications that scale out, it is beneficial to size virtual machines with the NUMA node size in mind. For example, in a system with four quad-core processors and 128 GB of memory (above picture), each node has four cores and 32 GB; sizing the virtual machine to four vCPUs and 32 GB or less (ex. NUMA Node 3) means that the virtual machine does not have to span multiple nodes.
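The sizing rule above can be sketched as a quick check. The node dimensions match the example host (four quad-core sockets, 128 GB of RAM, so 4 cores and 32 GB per node); the helper function itself is hypothetical:

```python
# Sketch: does a proposed VM size fit within a single NUMA node?
# Node values below are assumptions matching the example host:
# 4 sockets x 4 cores, 128 GB RAM -> 4 cores and 32 GB per node.

NODE_CORES = 4
NODE_MEMORY_GB = 32

def fits_in_numa_node(vcpus: int, memory_gb: int) -> bool:
    """True if the VM can be scheduled entirely within one NUMA node."""
    return vcpus <= NODE_CORES and memory_gb <= NODE_MEMORY_GB

print(fits_in_numa_node(4, 32))  # True  - stays local to one node
print(fits_in_numa_node(8, 24))  # False - vCPUs would span two nodes
```

A VM that fails this check is not broken, but it pays remote-memory latency that a slightly smaller configuration would avoid.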

When working with application teams, you will often find their stated minimum processor requirements are excessive. For this reason, VMware recommends reducing the number of vCPUs when monitoring shows that the virtual machine’s actual workload is not benefiting from the additional virtual CPUs. Virtual CPUs that are allocated but sit idle reduce the consolidation level and efficiency of the vSphere host. Conversely, by reducing the number of virtual CPUs, a virtual machine may gain significant performance improvements through lower CPU wait times and by avoiding multi-hop operations across NUMA nodes.

Memory Consumption

vSphere virtualizes guest physical memory by adding an extra level of address translation. Shadow page tables make it possible to provide this additional translation with little overhead.

Managing memory in the hypervisor enables memory sharing across virtual machines that hold similar data (such as redundant copies of the same guest OS memory pages), memory compression, and the memory ballooning technique, whereby virtual machines that do not need all the memory they were allocated give it up for virtual machines that require additional memory.

Over-allocating memory to virtual machines not only wastes memory unnecessarily; it also increases the memory overhead required to run each virtual machine, reducing the memory available to other virtual machines. Memory space overhead comes in two components:

  •     A fixed, system-wide overhead for the VMkernel
  •     Additional overhead for each virtual machine

Overhead memory includes space reserved for the virtual machine and various virtualization data structures, such as shadow page tables. Overhead memory depends on the number of vCPUs and the configured memory for the guest operating system. As an example, a running virtual machine with two vCPUs and 16 GB of memory may consume approximately 143 MB of memory overhead. The table below shows the configured memory in the first column and the number of vCPUs in the first row; the cells underneath show the amount of memory overhead in megabytes.

Another aspect to keep in mind: when you configure the memory size for a guest virtual machine, vSphere creates a virtual swap file of equal size (less any memory reservation) alongside the virtual machine’s configuration file. These swap files add up as your virtual environment grows, and they can consume a significant amount of storage capacity.
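To make the swap-file cost concrete, here is a minimal sketch of the sizing rule: each .vswp file equals configured memory minus the memory reservation, so a fully reserved VM creates an effectively empty swap file. The helper and the sample fleet are illustrative assumptions:

```python
# Sketch: estimate datastore capacity consumed by per-VM swap (.vswp) files.
# Rule: swap file size = configured memory - memory reservation.

def vswp_size_gb(configured_gb: float, reservation_gb: float = 0) -> float:
    """Size of the VM's swap file in GB (never negative)."""
    return max(configured_gb - reservation_gb, 0)

# Hypothetical fleet: (configured GB, reservation GB) per VM.
vms = [(16, 0), (16, 8), (32, 32)]
total = sum(vswp_size_gb(c, r) for c, r in vms)
print(total)  # 24 GB of datastore capacity held by swap files
```

Scaled across hundreds of over-sized VMs with no reservations, this is storage spent purely as insurance against memory overcommit.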

Proper sizing of memory for a virtual machine is based on many factors. Some applications are consistent in how they utilize resources and may not perform as expected with vSphere memory management techniques. Others, such as Web servers, have periods where resources can be reclaimed and are perfect candidates for higher consolidation ratios. For virtual machines running Search and SQL services that consume more memory resources than other applications, memory reservations may be considered to guarantee that those services have the resources they require while still allowing for high consolidation ratios of other virtual machine workloads.

In all cases, making sure the memory resources are being used effectively helps maximize the technology investment in the infrastructure hardware, while ensuring peak performance for business partners.

In conclusion, at many companies there is an enormous disconnect between the application owners, who believe they need a certain amount of resources for their virtual machines, and IT operations’ ability to put a business process in place to make sure those resources are being used wisely and efficiently. In my next post, we will discuss the business approval process and utilizing vCenter Operations Manager.

Resources for this Post
