Saturday, February 21, 2015

Right-Sizing with vRealize Operations 6.0

With vRealize Operations Manager 6.0, VMware has redesigned the Reclaimable Capacity analysis badge and added the ability to set the CPU and memory on oversized virtual machines using the vCenter Python Actions Adapter.

Because we want to take advantage of vRealize Operations Manager's ability to configure CPU and memory settings on our virtual machines, let's take a look at how to configure the vCenter Python Actions Adapter. Click on the Administration link in the top left corner of the vRealize Operations product UI. Select the vCenter Python Actions Adapter in the VMware vSphere Solution Details window and click the Configure button, as shown in the image below.

Make sure you have the vCenter Python Actions Adapter selected, then click the green + icon to create a new instance. In the Adapter Settings, create a Display Name, add the vCenter Server FQDN or IP address, add the privileged credentials for the vCenter Server and then test the connection. When you configure the vCenter Python Actions Adapter, you must provide vCenter Server credentials that have sufficient privileges to connect and make changes to objects on the vCenter Server. After the test runs successfully, click Save Settings.

Now click on the Environment link and select a vCenter cluster. Select the Analysis tab and click on the Reclaimable Capacity icon. An ESXi host has resource capacity that can be under-committed or over-committed, and the load of a virtual machine on the host may not require all the resources it was originally configured with when deployed. By reclaiming these resources, we can use them elsewhere and have a hosting environment that runs much more efficiently. This can lower capital costs and in some cases improve virtual machine performance. The Reclaimable Capacity badge shows the amount of capacity that can be regained without causing stress or performance degradation. In the image below, you can see that in my lab environment I have a Reclaimable Capacity badge score of 64, which is in the warning threshold.

The Reclaimable Capacity badge score is made up of several different factors. It includes the oversized extra capacity (current capacity vs optimal capacity), idle capacity, powered off capacity, and unused file capacity. Unused file capacity includes old templates and snapshots that have not been accessed in a considerable amount of time. Like unused file capacity, reclaiming capacity from powered off virtual machines reclaims disk space.

I am using the default settings. The Capacity Rules for Analysis are set to flag virtual machines as oversized when they have 50% reclaimable capacity, as idle when they have been idle for more than 90% of the time, and as powered off when they have been powered off for more than 90% of the time.
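To make those default rules concrete, here is a minimal sketch of how the three thresholds could be applied to a single virtual machine. It is only an illustration: the class, field, and function names are hypothetical and not vRealize Operations identifiers, and treating each threshold as a strict "more than" comparison is my assumption.

```python
# Hypothetical sketch of the default capacity rules described above.
# Names are illustrative only; they are not vRealize Operations identifiers.
from dataclasses import dataclass
from typing import List

OVERSIZED_RECLAIMABLE_PCT = 50.0  # oversized threshold (reclaimable capacity)
IDLE_TIME_PCT = 90.0              # idle threshold (share of time idle)
POWERED_OFF_TIME_PCT = 90.0       # powered-off threshold (share of time off)


@dataclass
class VmStats:
    reclaimable_pct: float       # reclaimable share of configured capacity
    idle_time_pct: float         # share of the analysis window spent idle
    powered_off_time_pct: float  # share of the analysis window powered off


def classify(vm: VmStats) -> List[str]:
    """Return the capacity-rule flags that apply to one virtual machine."""
    flags = []
    # Assumption: each rule fires only when the value exceeds its threshold.
    if vm.reclaimable_pct > OVERSIZED_RECLAIMABLE_PCT:
        flags.append("oversized")
    if vm.idle_time_pct > IDLE_TIME_PCT:
        flags.append("idle")
    if vm.powered_off_time_pct > POWERED_OFF_TIME_PCT:
        flags.append("powered off")
    return flags


if __name__ == "__main__":
    # Made-up sample values, not taken from my lab.
    print(classify(VmStats(63.0, 10.0, 0.0)))
    print(classify(VmStats(20.0, 95.0, 0.0)))
```

If you tighten or loosen the rules in your own policy, only the three constants at the top would change.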

On the Reclaimable Waste page, we can see the Reclaimable Capacity Trend over the last 6 weeks. As we start to reclaim resources, the trend line should go down. If you deploy more virtual machines that are not sized correctly and are flagged as oversized, you will see the trend line grow over time. In the middle of the screen we see the Reclaimable Capacity Breakdown, which tells us the amount of resources that can be reclaimed for CPU, memory, and disk space. The reclaimable disk space includes virtual machines that are powered off and snapshot space from old data.

At the bottom of the page, we recognize that the resource that has the most wasted capacity is memory. We have 28.75 GB provisioned and we can reclaim 18.38 GB, which is a 63.94% reclaimable percentage. 
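The reclaimable percentage is simply reclaimable capacity divided by provisioned capacity. As a quick sanity check, here is a small sketch using the rounded figures from my lab; the function name is my own. With these rounded inputs the result comes out at roughly 63.93%, so the 63.94% in the UI is presumably calculated from unrounded values.

```python
def reclaimable_percentage(provisioned_gb: float, reclaimable_gb: float) -> float:
    """Return reclaimable capacity as a percentage of provisioned capacity."""
    if provisioned_gb <= 0:
        raise ValueError("provisioned capacity must be greater than zero")
    return reclaimable_gb / provisioned_gb * 100.0


if __name__ == "__main__":
    # Memory figures from the page: 28.75 GB provisioned, 18.38 GB reclaimable.
    pct = reclaimable_percentage(28.75, 18.38)
    print(f"{pct:.2f}%")  # prints 63.93%
```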

If you scroll down the screen, you will see the child objects associated with the hosts. It shows that 84% of my objects are oversized and the remaining 16% is a single powered off virtual machine. The objects listed show the amount of CPU, memory, and disk that is reclaimable. Notice that the VMware vCenter Server Appliance has 5 GB of memory that we can reclaim.

Switching over to the Workload analysis for the VMware vCenter Server Appliance, it shows us that the demand for the guest is 836 MB (10% of the configured capacity) and the usage has been 2 GB (23% of the usable capacity). vRealize Operations Manager is configured to give us a yellow Reclaimable Capacity badge when there is more than 50% of usable capacity available.

Moving over to the vSphere Web Client, we can see that the VMware vCenter Server Appliance is configured with 8,192 MB of memory and 901 MB is being used.

But, as IT professionals, we want further details of the memory consumption for our vCenter Server Appliance over the past 7 days. On the right-hand side of the screen, notice we have options for Further Analysis. I am going to select Virtual Machine Memory Diagnose.

Select the Virtual Machine Memory Demand Forecast Trend. In the image below, we see that memory had a few spikes over the past 7 days, rising to 2 GB when I initially built my environment, but demand has typically stayed under 1 GB, or about 10% of capacity. vRealize Operations has calculated that we can take the 2 GB peak in memory demand and add an additional 10% to ensure we don't experience performance degradation or stress to the system. That leaves us with 5 GB of memory to reclaim.

This is where the vCenter Python Actions Adapter comes into play. We are going to click on the Actions drop down list and select Set CPU Count and Memory for VM.

It brings us to the CPU Count and Memory for VM screen. Notice that it has already filled in the recommended configuration for us: the new CPU count is 1 vCPU and the new memory is 3,056 MB, which will reclaim 5 GB of memory. After we click OK, the vCenter Python Actions Adapter will configure the virtual machine. That is a major improvement in functionality, something that doesn't exist in vCenter Operations Manager 5.8.
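If you want to double-check the 5 GB figure, the arithmetic is just the configured memory minus the recommended memory. The sketch below uses the 8,192 MB configured and 3,056 MB recommended values from this walkthrough; the function name is hypothetical and this is plain arithmetic, not a call into the adapter.

```python
MB_PER_GB = 1024


def reclaimable_memory_mb(configured_mb: int, recommended_mb: int) -> int:
    """Return the memory freed by resizing, never less than zero."""
    return max(configured_mb - recommended_mb, 0)


if __name__ == "__main__":
    # Values from the vCenter Server Appliance in this walkthrough.
    reclaim_mb = reclaimable_memory_mb(8192, 3056)
    print(reclaim_mb)                         # prints 5136
    print(round(reclaim_mb / MB_PER_GB, 2))   # prints 5.02
```

That is 5,136 MB, or just over 5 GB, which lines up with the number shown for the appliance.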

This process should be a collaboration and not done in a silo. Right-sizing a virtual machine without understanding the application requirements can have a serious impact on production systems. For instance, an application batch process may run on a quarterly, semi-annual, or annual basis, and a database may run at 0 to 5 percent utilization for all but a few days of the year (seasonal workloads). Also, if an application team has a new system that hasn’t gone “live”, the virtual machine may look underutilized because the user load has not yet arrived.