Thursday, May 29, 2014

vCenter Operations Manager Workloads

We are going to dive a little deeper into the Workload badge, because it is fundamental to understanding the health of your infrastructure. Let's go back to the Operations tab and click on the Workload badge.


On the right-hand side of the pane, you will notice CPU, Memory, Disk I/O, and Network I/O. For each category, there is one bar that illustrates the host performance levels and another that illustrates the virtual machine performance levels.


The demand is shown in green, the usage in grey, and the configured amount is the white background. Pop quiz: what is demand and what is usage? The demand is what is being requested and the usage is what is being delivered. In our case above, the demand from the virtual machines is 395 MHz (4% of configured) and the host is delivering 396 MHz (4% of configured). Because the demand is about even with the usage, it is very unlikely that the application owners are seeing any performance degradation. The virtual machines are getting the resources they request, so none of them are suffering. Now if you mouse over one of the virtual machines, it shows the amount of MHz being consumed by that specific virtual machine, in this case my VMware vCenter Server Appliance.
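As a rough illustration of the demand versus usage comparison for CPU, here is a minimal Python sketch. The function name `unmet_demand_mhz` is hypothetical and not part of vCenter Operations Manager; the numbers are the ones from the screenshot described above.

```python
def unmet_demand_mhz(demand_mhz: float, usage_mhz: float) -> float:
    """Return how many MHz were requested but not delivered (never negative).

    Hypothetical helper for illustration only; vCOps does not expose this function.
    """
    return max(0.0, demand_mhz - usage_mhz)


# Values from the example: VMs demand 395 MHz, the host delivers 396 MHz.
shortfall = unmet_demand_mhz(demand_mhz=395, usage_mhz=396)
print(f"Unmet CPU demand: {shortfall} MHz")
```

A shortfall of zero matches what the bars show: the host is delivering everything that is being requested.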



What if the demand bar is close to the right edge? It changes color to warn you that you are getting close to being unable to meet the demand. If I mouse over the host memory bar, it shows that the demand for memory is 4,720 MB. Since this is the lab on my MacBook Pro, that is close to the amount I have allocated for this particular host, which is 6,095 MB.


Taking a closer look at the memory on this host, we can see a little blue bar at the bottom, which indicates the demand threshold. This is the range vCOps expects the host to be working within that day. We can also see that even though we don't have much capacity remaining, the usage is still higher than the demand, which should indicate that our business partners are not seeing a performance problem.



Let's dive into memory a bit more. It is one of the hardest metrics to understand conceptually because of the way hypervisors work. If we look at the diagram below, we can see the demand is 2,349 MB and the usage is 7,262 MB. What the heck? The virtual machines are looking for 2,349 MB and the host is delivering 7,262 MB of memory?


ESXi allocates physical RAM only as needed. My vCenter Server Appliance has 8,192 MB of memory configured, but it is currently demanding only 738 MB. Because the virtual machine has touched 7,887 MB of physical memory, that is the amount allocated to it. The usage is the memory address blocks that are being held in physical memory but may not be in active use by the virtual machine. The normal demand is between 227 MB and 1,507 MB, whereas the usage is 7,887 MB, which is 96% of what is configured.
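To make the arithmetic concrete, here is a small Python sketch that works out the percentages for the vCenter Server Appliance figures quoted above. The helper name `percent_of_configured` is only illustrative.

```python
def percent_of_configured(value_mb: float, configured_mb: float) -> float:
    """Express a memory figure as a percentage of the configured amount."""
    return 100.0 * value_mb / configured_mb


# Figures quoted in the post for the vCenter Server Appliance.
configured_mb = 8192
demand_mb = 738
usage_mb = 7887
normal_demand_range_mb = (227, 1507)

usage_pct = percent_of_configured(usage_mb, configured_mb)
demand_pct = percent_of_configured(demand_mb, configured_mb)
demand_is_normal = normal_demand_range_mb[0] <= demand_mb <= normal_demand_range_mb[1]

print(f"Usage:  {usage_pct:.0f}% of configured")
print(f"Demand: {demand_pct:.0f}% of configured")
print(f"Demand within normal range: {demand_is_normal}")
```

The gap between the two percentages is the memory the virtual machine has touched at some point but is not actively using right now.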


Again, the demand is the recent activity for memory and the usage is the amount of memory that has been used, whether it has been touched recently or not. That is why it is common for memory usage to be higher than demand. Another thing to keep in mind: ESXi only allocates memory as needed, and if it has plenty of memory resources it doesn't bother to reclaim what it has handed out. If there is memory contention, though, it does go through a process of reclaiming some of that memory.

Workload is always demand divided by capacity. When we start to look at Disk I/O, we have to remember that a host has multiple datastores and could have different types of disk that provide different levels of I/O. Disk I/O is very complex to measure. How many datastores does the host have? How many hosts are sharing the datastores? How much cache is on the storage array? How many other servers are sharing the same storage processor ports? There are a lot of factors that go into how many I/Os this host should provide. vCOps handles that by estimating from past behavior. It takes the highest percentage over a 20 second interval, averages that over a 5 minute interval, and then compares the result to the maximum reading it has received in the past.
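Here is one way to picture that calculation in Python. This is only a sketch of my reading of the description above, not the actual vCOps algorithm: I am assuming that each sample is the peak reading from one 20 second interval, that a 5 minute window therefore holds 15 samples, and that the historical maximum stands in for capacity. The names `estimate_disk_workload` and `SAMPLES_PER_WINDOW` are hypothetical.

```python
from statistics import mean

# Assumption: 5 minutes / 20 seconds = 15 peak samples per window.
SAMPLES_PER_WINDOW = 15


def estimate_disk_workload(peak_samples: list[float], historical_max: float) -> float:
    """Estimate workload (%) as the 5 minute average of 20 second peaks,
    compared to the highest reading seen in the past.

    Illustrative only; the real vCOps calculation may differ.
    """
    if len(peak_samples) != SAMPLES_PER_WINDOW:
        raise ValueError(f"expected {SAMPLES_PER_WINDOW} samples, got {len(peak_samples)}")
    if historical_max <= 0:
        raise ValueError("historical_max must be positive")
    window_average = mean(peak_samples)
    # Workload = demand / capacity, with the past maximum used as the capacity estimate.
    return 100.0 * window_average / historical_max


# Made-up readings: every 20 second peak is 50 units, and the past maximum was 200.
workload_pct = estimate_disk_workload([50.0] * SAMPLES_PER_WINDOW, historical_max=200.0)
print(f"Estimated Disk I/O workload: {workload_pct:.0f}%")
```

Using the past maximum as the capacity figure is what lets the same demand divided by capacity formula work even though nobody can say up front how many I/Os the storage should deliver.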



Network I/O is pretty similar: the workload is calculated against the maximum value vCOps has seen in the past.

For more detailed information and longer range views, we will go into the All Metrics button on the Operations page. With the Metric Selector, we can drill down on specific metrics for a higher level of detail and specify the date range we want to observe.



In my next post, we will talk about some of the key performance metrics to check for infrastructure problems.