Wednesday, November 26, 2014

Trickle Down Servernomics

VMware recommends that all hosts in a cluster have similar CPU and memory configurations to have a balanced cluster and optimal HA resource calculations. This will not only help you in the event of a physical server outage in a cluster, it can help improve performance by taking advantage of all the capabilities in your latest generation servers.

To have multiple processor architectures in a single cluster, you need to enable Enhanced vMotion Compatibility (EVC) mode. EVC mode allows migration of virtual machines between different generations of CPUs, making it possible to aggregate older and newer server hardware generations in a single cluster.

However, despite the obvious advantages of EVC mode, you need to factor in the costs associated with this feature. Some applications may lose performance because certain advanced CPU features are not made available to the guest, even though the underlying host supports them. When an ESXi host with a newer-generation CPU joins the cluster, the EVC baseline automatically hides the CPU features that are new and unique to that CPU generation. The table below lists the EVC levels and describes the features that each one enables.

To illustrate some of the performance variations, VMware ran tests that replicated applications found in customer environments to measure the impact of EVC mode. They created several guest virtual machines and ran workloads under different EVC modes ranging from Intel Merom to Intel Westmere. For the Java-based server-side applications, performance on an ESXi host varied by a negligible 0.0007% between EVC modes as new as Westmere and as old as Merom. For OpenSSL (AES), the Intel Westmere EVC mode outperformed the other modes by more than three times. The improved performance is due to the encryption acceleration made possible by the AES-NI instruction set, which Intel introduced with its Westmere processors.
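
If you want to confirm whether a given EVC level is hiding AES-NI from a Linux guest, one rough way is to look for the `aes` flag in the guest's `/proc/cpuinfo`. The sketch below is only an illustration: the function names are my own, and the two sample flag lines are shortened, made-up stand-ins for what a Westmere-level and a Merom-level guest might report.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_aes_ni(cpuinfo_text):
    # Linux reports the AES-NI instruction set as the "aes" flag.
    return "aes" in cpu_flags(cpuinfo_text)


# Hypothetical, abbreviated flag lines used only for illustration.
westmere_like = "flags\t\t: fpu sse2 ssse3 sse4_1 sse4_2 popcnt aes"
merom_like = "flags\t\t: fpu sse2 ssse3"

print(has_aes_ni(westmere_like))  # True
print(has_aes_ni(merom_like))     # False
```

Inside a real guest you would pass in the contents of `/proc/cpuinfo` instead of the sample strings.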

Tuesday, November 11, 2014

vCenter Operations Manager - Anomalies

For each attribute that vCenter Operations Manager (vCOps) collects, it maintains thresholds of normal behavior. These can be either hard thresholds that you define or dynamic thresholds that vCenter Operations Manager calculates for the upper and lower range of normal behavior. Every 12 hours, the vCenter Operations Manager analytics engine pulls the full history of all the metrics amassed in the vCOps repository. It then runs the data history through eight different algorithms, each of which determines the expected upper and lower level for a specific metric for each of the 12 upcoming hours. Once that is complete, another algorithm competitively scores each upper and lower level for each hour and selects which of the eight algorithms wins for that level for that hour. This process produces the optimal hour-by-hour range for normal behavior, which is the dynamic threshold.
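
To make the "competing algorithms" idea more concrete, here is a toy sketch. It is not the vCOps implementation: the two candidate algorithms (mean plus or minus two standard deviations, and a plain min/max band), the 95% coverage target, and every name in the code are my own assumptions. It only shows how the upper and the lower level for a given hour can each be won by a different algorithm.

```python
from statistics import mean, pstdev


def mean_std_band(samples, k=2.0):
    # Candidate 1 (assumed): mean +/- k standard deviations.
    m, s = mean(samples), pstdev(samples)
    return m - k * s, m + k * s


def min_max_band(samples):
    # Candidate 2 (assumed): the observed minimum and maximum.
    return min(samples), max(samples)


CANDIDATES = {"mean_std": mean_std_band, "min_max": min_max_band}


def share(samples, keep):
    """Fraction of samples for which keep(sample) is true."""
    return sum(1 for x in samples if keep(x)) / len(samples)


def pick_bounds(samples, target=0.95):
    """Score each level separately: the tightest bound that still
    covers at least `target` of the history wins that level."""
    bands = {name: fn(samples) for name, fn in CANDIDATES.items()}
    lowers = [(lo, name) for name, (lo, _) in bands.items()
              if share(samples, lambda x: x >= lo) >= target]
    uppers = [(hi, name) for name, (_, hi) in bands.items()
              if share(samples, lambda x: x <= hi) >= target]
    return max(lowers), min(uppers)


def dynamic_thresholds(history_by_hour):
    # history_by_hour maps an hour to the past samples seen at that hour.
    return {hour: pick_bounds(s) for hour, s in history_by_hour.items()}


# Made-up history: 39 quiet samples and one spike at hour 9.
history = {9: [50] * 39 + [90]}
(lower, lower_algo), (upper, upper_algo) = dynamic_thresholds(history)[9]
print(lower_algo, upper_algo)  # min_max mean_std
```

In this made-up history the min/max band wins the lower level, while the mean-based band wins the upper level because it ignores the single spike and still covers more than 95% of the samples.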

Learning behavior depends on how much variation the data presents. Usually, in the first week the system builds a basic understanding of the thresholds within which metrics are classified. In the second week, the thresholds are validated. In the third week, the system reacts to abnormal and normal behavior. The longer the data accumulates, the better the system can establish the baseline of normal behavior.

When a metric violates its attribute's threshold, vCenter Operations Manager generates an anomaly, which is a value that is outside the expected range. The Anomalies badge on the Operations tab can be moderately confusing; in the image below I have noted some of the key facets of the dashboard. In the Memory sub-category under the Virtual Machine symptoms, you will see the numbers (1 of 3). The 3 indicates that there are 3 child objects in the selected parent object. In the diagram below, I have selected the host, and the child objects are the VMware vCenter Server Appliance, the vCloud Connector Node, and the vCloud Connector Server. The 1 shows that a single child object has the Memory anomaly. In the bar you will notice 33%, which is the percentage of the child objects that have the Memory symptom: 1 child object / 3 total child objects = 33%.
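
The (1 of 3) and 33% figures are simple arithmetic, sketched below. The function name and the data layout are hypothetical, and since the post does not say which of the three child objects has the Memory anomaly, I picked one arbitrarily.

```python
def symptom_share(children, symptom):
    """Return (affected, total, percent) for one symptom across child objects."""
    affected = sum(1 for symptoms in children.values() if symptom in symptoms)
    total = len(children)
    percent = round(100 * affected / total) if total else 0
    return affected, total, percent


# Child objects of the selected host; which one has the Memory
# symptom is an assumption made for this example.
children = {
    "VMware vCenter Server Appliance": {"Memory"},
    "vCloud Connector Node": set(),
    "vCloud Connector Server": set(),
}

affected, total, percent = symptom_share(children, "Memory")
print(f"({affected} of {total}) {percent}%")  # (1 of 3) 33%
```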

Another key item is the (5 out of 7 Symptoms) for the Host System; a single sub-category only shows up to 5 abnormal metrics. For instance, in this case there are 7 total Symptoms, but vCenter Operations Manager is providing the top 5.

Anomalies badge score ranges are:
  • 0-50 Normal anomaly range
  • 50-75 Exceeds the normal range
  • 75-90 The range is high
  • 90-100 Most metrics are beyond their thresholds
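
If you pull badge scores into a script, the ranges above can be turned into a small lookup. This is only a sketch with names of my own choosing. The listed ranges share their boundary values (50, 75, and 90), so I have assumed that each boundary belongs to the higher range.

```python
# (low, high, description); low is inclusive and high is exclusive,
# which is an assumption because the listed ranges overlap at the edges.
ANOMALY_RANGES = [
    (0, 50, "Normal anomaly range"),
    (50, 75, "Exceeds the normal range"),
    (75, 90, "The range is high"),
    (90, 100, "Most metrics are beyond their thresholds"),
]


def describe_anomaly_score(score):
    """Map an Anomalies badge score (0-100) to its range description."""
    if not 0 <= score <= 100:
        raise ValueError("badge score must be between 0 and 100")
    for low, high, description in ANOMALY_RANGES:
        if low <= score < high:
            return description
    return ANOMALY_RANGES[-1][2]  # a score of exactly 100


print(describe_anomaly_score(42))  # Normal anomaly range
print(describe_anomaly_score(80))  # The range is high
```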
