Tuesday, November 11, 2014

vCenter Operations Manager - Anomalies

For each attribute vCenter Operations Manager (vCOps) collects, it maintains thresholds of normal behavior. These can be either hard thresholds that you define or dynamic thresholds that vCenter Operations Manager calculates for the upper and lower range of normal behavior. Every 12 hours, the vCenter Operations Manager analytics engine pulls the full history of all the metrics amassed in the vCOps repository. It then runs the data history through eight different algorithms, which determine the expected upper and lower levels of each specific metric for each of the 12 upcoming hours. Once that completes, another algorithm competitively scores each upper and lower level for each hour and selects which of the eight algorithms wins for that level in that hour. This process helps produce the optimal hour-by-hour range of normal behavior, which is the dynamic threshold.
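To make the "competitive scoring" idea more concrete, here is a minimal Python sketch. It is not the vCOps implementation: the eight real algorithms and the scoring function are not public, so the three estimators (mean_2sd, min_max, iqr_fence), the holdout-based scoring, and the sample data are all hypothetical stand-ins. What it does show is the structure described above: several algorithms propose a range for one hour, and the lower and upper levels are scored and won independently.

```python
import statistics
from typing import Callable, Dict, List, Tuple

Range = Tuple[float, float]

# Hypothetical stand-ins for the vCOps algorithms: each maps one hour's
# history of samples to a candidate (lower, upper) range.
def mean_2sd(xs: List[float]) -> Range:
    mu, sd = statistics.mean(xs), statistics.pstdev(xs)
    return mu - 2 * sd, mu + 2 * sd

def min_max(xs: List[float]) -> Range:
    return min(xs), max(xs)

def iqr_fence(xs: List[float]) -> Range:
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

ESTIMATORS: Dict[str, Callable[[List[float]], Range]] = {
    "mean_2sd": mean_2sd,
    "min_max": min_max,
    "iqr_fence": iqr_fence,
}

def dynamic_threshold(train: List[float], holdout: List[float]):
    """Score every candidate on held-out samples and pick the lower and upper
    bounds independently, so different algorithms can win each level."""
    ranges = {name: est(train) for name, est in ESTIMATORS.items()}
    # Lower bound: fewest held-out values below it, then the highest (tightest) bound.
    lo_name = min(ranges, key=lambda n: (sum(v < ranges[n][0] for v in holdout), -ranges[n][0]))
    # Upper bound: fewest held-out values above it, then the lowest (tightest) bound.
    up_name = min(ranges, key=lambda n: (sum(v > ranges[n][1] for v in holdout), ranges[n][1]))
    return (ranges[lo_name][0], lo_name), (ranges[up_name][1], up_name)

# Hypothetical CPU usage (%) samples for the 09:00 hour on ten past days.
history_0900 = [22, 25, 24, 30, 27, 26, 23, 28, 21, 29]
train, holdout = history_0900[:7], history_0900[7:]
(lower, lo_algo), (upper, up_algo) = dynamic_threshold(train, holdout)
print(f"09:00 normal range: {lower:.1f}-{upper:.1f} (lower from {lo_algo}, upper from {up_algo})")
# → 09:00 normal range: 20.3-30.0 (lower from mean_2sd, upper from min_max)
```

In this run a different algorithm wins each level, which is the point of scoring the upper and lower levels separately rather than picking one algorithm for the whole range.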

How long the learning takes depends on how much variation the data presents. Usually, in the first week the system develops a basic understanding of the thresholds within which metrics are classified. In the second week, the thresholds are validated. In the third week, the system reacts to abnormal and normal behavior. The longer the data history grows, the better the system can establish the baseline of normal behavior.

When a metric violates its attribute's threshold, vCenter Operations Manager generates an anomaly, that is, a value outside the expected range. The Anomalies badge on the Operations tab can be somewhat confusing, so on the image below I have noted some of the key facets of the dashboard. Next to the Memory sub-category under the Virtual Machine symptoms, you will see the numbers (1 of 3). This indicates that there are 3 child objects in the selected parent object. In the diagram below, I have selected the host 172.16.78.130, and its child objects are the VMware vCenter Server Appliance, the vCloud Connector Node, and the vCloud Connector Server. The 1 shows that a single child object has the Memory anomaly. In the bar, you will notice 33%, which is the percentage of child objects that have the Memory symptom: 1 child object / 3 total child objects = 33%.
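The (1 of 3) label and the 33% bar are simple arithmetic over the child objects. The sketch below reproduces them for the host 172.16.78.130. The post does not say which child has the Memory symptom, so assigning it to the vCenter Server Appliance is an assumption for illustration, and subcategory_summary is a hypothetical helper, not a vCOps API. Rounding to a whole percentage is also an assumption.

```python
from typing import Dict, List, Tuple

def subcategory_summary(children: Dict[str, List[str]], subcategory: str) -> Tuple[str, int]:
    """Return the '(affected of total)' label and the bar percentage for one sub-category."""
    total = len(children)
    affected = sum(1 for symptoms in children.values() if subcategory in symptoms)
    percent = round(100 * affected / total)  # assumed: bar shows a rounded whole percentage
    return f"({affected} of {total})", percent

# Child objects of host 172.16.78.130; which child has the Memory symptom is assumed.
children = {
    "VMware vCenter Server Appliance": ["Memory"],
    "vCloud Connector Node": [],
    "vCloud Connector Server": [],
}
label, percent = subcategory_summary(children, "Memory")
print(label, f"{percent}%")  # (1 of 3) 33%
```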

Another key item is the (5 out of 7 Symptoms) label for the Host System: a single sub-category shows at most 5 abnormal metrics. In this case there are 7 total Symptoms, but vCenter Operations Manager displays only the top 5.
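The "top 5 of 7" behavior amounts to a top-N selection. The post does not say how vCOps ranks symptoms, so this sketch assumes a hypothetical ranking key, the relative distance of each value above its upper threshold. The metric names, values, and thresholds below are invented for illustration.

```python
import heapq
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Symptom:
    metric: str
    value: float
    upper_threshold: float

    @property
    def overshoot(self) -> float:
        """Relative distance above the upper threshold (hypothetical ranking key)."""
        return (self.value - self.upper_threshold) / self.upper_threshold

def top_symptoms(symptoms: List[Symptom], limit: int = 5) -> Tuple[List[Symptom], str]:
    """Keep the `limit` worst symptoms and build the 'N out of M' label."""
    shown = heapq.nlargest(limit, symptoms, key=lambda s: s.overshoot)
    return shown, f"({len(shown)} out of {len(symptoms)} Symptoms)"

# Seven hypothetical abnormal metrics on one host.
host_symptoms = [
    Symptom("CPU usage", 90, 60),
    Symptom("CPU ready", 12, 10),
    Symptom("Memory consumed", 95, 80),
    Symptom("Memory balloon", 3, 1),
    Symptom("Disk latency", 40, 20),
    Symptom("Disk commands", 550, 500),
    Symptom("Network received", 130, 100),
]
shown, label = top_symptoms(host_symptoms)
print(label)  # (5 out of 7 Symptoms)
for s in shown:
    print(f"  {s.metric}: {s.overshoot:.0%} above threshold")
```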


Anomalies badge score ranges are:
  • 0-50 Normal anomaly range
  • 50-75 Exceeds the normal range
  • 75-90 The range is high
  • 90-100 Most metrics are beyond their thresholds
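The score ranges above can be expressed as a small lookup. The listed ranges share their endpoints (50, 75, 90), so this sketch assumes a boundary value belongs to the higher band; classify_anomaly_score is a hypothetical helper, not part of vCOps.

```python
from bisect import bisect_right

# Band lower bounds and descriptions, taken from the list above.
BANDS = [
    (0, "Normal anomaly range"),
    (50, "Exceeds the normal range"),
    (75, "The range is high"),
    (90, "Most metrics are beyond their thresholds"),
]
LOWER_BOUNDS = [low for low, _ in BANDS]

def classify_anomaly_score(score: float) -> str:
    """Map an Anomalies badge score (0-100) to its band.
    Assumption: a shared endpoint such as 50 falls into the higher band."""
    if not 0 <= score <= 100:
        raise ValueError(f"badge score must be between 0 and 100, got {score}")
    return BANDS[bisect_right(LOWER_BOUNDS, score) - 1][1]

print(classify_anomaly_score(82))  # The range is high
```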

When the vCOps algorithms determine that a combination of anomalies may indicate a real issue, vCOps generates an alert.

As mentioned in one of my previous posts, having anomalies doesn't necessarily indicate an issue or degraded performance. If the application on a virtual machine hasn't gone live yet, the VM may have a very low dynamic threshold for CPU or memory. Once more concurrent connections start accessing the application, you will get an anomaly whenever resource usage exceeds the expected threshold for that hour. If your Workload and Anomalies badges are both red, then something is brewing that you want to investigate.
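The rule of thumb above (only dig in when both badges are red) can be sketched as a simple check. The red cut-offs are assumptions: 90 for Anomalies matches the top band in the score list, while 95 for Workload is a hypothetical value not taken from this post, so adjust both to your own badge settings. needs_investigation is an illustrative helper, not a vCOps function.

```python
def needs_investigation(workload_score: float, anomalies_score: float,
                        workload_red: float = 95, anomalies_red: float = 90) -> bool:
    """Flag an object only when both badges are red.
    The default cut-offs are assumptions; tune them to your badge thresholds."""
    return workload_score >= workload_red and anomalies_score >= anomalies_red

# A VM whose application just went live: many anomalies, but low workload.
print(needs_investigation(workload_score=40, anomalies_score=92))  # False
# High workload and abnormal behavior together: worth a closer look.
print(needs_investigation(workload_score=97, anomalies_score=93))  # True
```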