As this year comes to a close, I thought I would kick off a new series of blog posts on vRealize Operations 6.0. There has been a significant amount of change in vRealize Operations 6.0: it was re-architected from the ground up with over a million lines of new code, and it merges the functionality of the UI virtual machine and the analytics virtual machine into a single scalable platform.
Each install of the software includes the entire stack of components: the UI (product and admin), the collector, controller, analytics, and persistence layer.
- User Interface: Admin UI and Product UI
- Collector: Resource data collection initiated by adapters
- Controller: Determines mapping for data insertion and queries
- Analytics: Metric calculations, threshold processing, alert generation, and stats
- Persistence: Each node persists its partition of data to disk
Gone is the PostgreSQL database; it has been replaced by EMC Documentum xDB, a high-performance, scalable, native XML database that is ideal for data-intensive uses. This is the database that will be used in future product releases; the intent is to have a uniform standard across our product platforms.
Some of the exciting new features in vRealize Operations 6.0 include its scalability and resiliency, uniform UI and functionality, actionable alerts, automated remediation, and user definable views, dashboards, and reports.
The vCHS 1.0 plug-in I installed on my vCenter Server Appliance back in the spring hasn't been working for a few months, so it is time to update to vCHS 1.5. In order for the plug-in to work, we need the latest version of the VMware vCenter Server Appliance. To make sure you are at the latest build version, log into https://vcenteripaddress:5480.
Click on the Update tab and make sure that the Appliance Version is 5.5.0.20200 Build 2183109. If you are running an older version of the appliance, click on Check Updates and then Install Updates to acquire the latest build.
To start our process, we need to go to the vCloud Hybrid Service vSphere Client Plug-in Download page. From the Version 1.0 page, we need to download the Installer for vCloud Hybrid Service Plug-in.
What is a VMware Technical Account Manager?
The VMware TAM is part of the Professional Services Organization (PSO). We are the trusted advisors who help our customers get the most out of their investments in VMware products, which helps their organizations thrive. We work directly with our customers to ensure our technology solutions meet their strategic business plans through a mutual partnership. As VMware Technical Account Managers, we are responsible for providing technical solutions and advice, and for managing relationships with VMware's largest and most strategic customers.
To be a great Technical Account Manager, you need to be entrepreneurial, possess excellent communication skills, be fluent in building strong relationships, be motivated to exceed your customers' expectations, and don't forget the secret sauce: "Technical".
For VMware Technical Account Managers, it is important that we help our customers realize their full potential.
VMware recommends that all hosts in a cluster have similar CPU and memory configurations to have a balanced cluster and optimal HA resource calculations. This will not only help you in the event of a physical server outage in a cluster, it can help improve performance by taking advantage of all the capabilities in your latest generation servers.
In order to have multiple processor generations in a single cluster, you need to enable Enhanced vMotion Compatibility (EVC) mode. EVC mode allows migration of virtual machines between different generations of CPUs, making it possible to aggregate older and newer server hardware generations in a single cluster.
However, despite the obvious advantages of EVC mode, you need to factor in the costs associated with this feature. Some applications will potentially lose performance due to certain advanced CPU features not being made available to the guest, even though the underlying host supports them. When an ESXi host with a newer generation CPU joins the cluster, the baseline will automatically hide the CPU features that are new and unique to that CPU generation. The below table lists the EVC Levels and a description of the features that are enabled.
To illustrate some of the performance variations, VMware ran tests that replicated applications found in customer environments to find out the impact of EVC mode. They created several guest virtual machines to run workloads under different EVC modes, ranging from Intel Merom to Intel Westmere. For the Java-based server-side application, performance on an ESXi host with a processor as new as Westmere and as old as Merom showed a negligible variation of 0.0007%. For OpenSSL (AES), the Intel Westmere EVC mode outperformed the other modes by more than three times. The improved performance is due to the encryption acceleration made possible by the AES-NI instruction set introduced with Intel's Westmere processors.
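Conceptually, an EVC baseline behaves like the intersection of the CPU feature sets of every host generation admitted to the cluster. Here is a minimal sketch of that idea; the feature sets below are simplified illustrations, not complete EVC level definitions, and this is not a VMware API.

```python
# Simplified, illustrative CPU feature sets per Intel generation.
MEROM = {"sse3", "ssse3"}
PENRYN = MEROM | {"sse4.1"}
NEHALEM = PENRYN | {"sse4.2", "popcnt"}
WESTMERE = NEHALEM | {"aes", "pclmulqdq"}

def evc_baseline(*host_feature_sets):
    """Features a VM may use: the intersection across all cluster hosts."""
    return set.intersection(*map(set, host_feature_sets))

# A Westmere host joining a Merom-mode cluster loses AES-NI from the baseline,
# which is exactly why the OpenSSL (AES) workload slows down in older modes.
print("aes" in evc_baseline(WESTMERE, MEROM))  # False
```

This mirrors the behavior described above: the newer host's unique instructions are hidden from guests so that vMotion across generations stays safe.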
For each attribute vCenter Operations Manager (vCOps) collects, it maintains thresholds of normal behavior. These can either be hard thresholds that you define or dynamic thresholds that vCenter Operations Manager calculates for the upper and lower range of normal behavior. The vCenter Operations Manager analytics engine pulls the full history of all the metrics amassed in the vCOps repository every 12 hours. It then runs the data history through eight different algorithms, each of which determines an expected upper and lower level for that specific metric for each of the 12 upcoming hours. Once completed, another algorithm competitively scores each upper and lower level for each hour and selects which of the eight algorithms wins for that level for that hour. This process produces the optimal hour-by-hour range of normal behavior, which is the dynamic threshold.
Learning behavior depends on the amount of variation presented. Usually, in the first week the system generates a basic understanding of the thresholds into which metrics are classified. In the second week, the thresholds are validated. In the third week, the system reacts to abnormal and normal behavior. The longer the data evolves, the better the system can establish the baseline of normal behavior.
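The competitive-scoring process described above can be sketched roughly as follows. The two candidate "algorithms" and the scoring rule (prefer the tightest range that still covers recent history) are hypothetical stand-ins for illustration, not vCOps' actual forecasters or scoring math.

```python
# Hypothetical sketch of competitive threshold selection: for each upcoming
# hour, score each candidate algorithm's predicted range against recent
# history and keep the winner.
def pick_threshold(history, algorithms):
    best = None
    for algo in algorithms:
        lower, upper = algo(history)
        covered = all(lower <= x <= upper for x in history)
        # Prefer tight ranges; disqualify ranges that miss observed values.
        score = (upper - lower) if covered else float("inf")
        if best is None or score < best[0]:
            best = (score, lower, upper)
    return best[1], best[2]

min_max = lambda h: (min(h), max(h))          # tight envelope
padded = lambda h: (min(h) - 10, max(h) + 10)  # loose envelope

history = [40, 42, 45, 41, 44]
print(pick_threshold(history, [min_max, padded]))  # (40, 45)
```

The winning range becomes that hour's dynamic threshold; repeating this per hour yields the hour-by-hour band of normal behavior.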
When a metric violates its attribute's threshold, vCenter Operations Manager generates an anomaly: a value that is out of the expected range. The Anomalies badge on the Operations tab can be moderately confusing; in the image below I have noted some of the key facets of the dashboard. In the Memory sub-category under the Virtual Machine symptoms, you will observe the numbers (1 of 3), indicating that there are 3 child objects in the selected parent object. In the diagram below, I have selected the host 172.16.78.130, and the child objects are the VMware vCenter Server Appliance, the vCloud Connector Node, and the vCloud Connector Server. The 1 shows that a single child object has the Memory anomaly. In the bar, you will notice 33%; that is the percentage of the child objects that have the Memory symptom: 1 child object / 3 total child objects = 33%.
Another key item is the (5 out of 7 Symptoms) for the Host System; a single sub-category only shows up to 5 abnormal metrics. In this case there are 7 total symptoms, but vCenter Operations Manager is displaying the top 5.
Anomalies badge score ranges are:
- 0-50 Normal anomaly range
- 50-75 Exceeds the normal range
- 75-90 The range is high
- 90-100 Most metrics are beyond their thresholds
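The arithmetic and score ranges above can be reproduced in a few lines; this is just a restatement of the math in the post, not anything vCOps exposes programmatically.

```python
# The bar percentage is affected child objects over total child objects,
# and the Anomalies badge score maps onto the ranges listed above.
def symptom_percentage(affected, total):
    return round(100 * affected / total)

def anomalies_range(score):
    if score < 50:
        return "Normal anomaly range"
    if score < 75:
        return "Exceeds the normal range"
    if score < 90:
        return "The range is high"
    return "Most metrics are beyond their thresholds"

print(symptom_percentage(1, 3))  # 33
print(anomalies_range(80))       # The range is high
```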
As mentioned in my previous post, virtualization helped datacenters provide rapid deployment, increased business continuity, and delivered tremendous capital savings through the reduction of hardware. However, with the substantial benefits and ease of deployment came virtualization sprawl and resource proliferation. Right-sizing is the process of reclaiming under-utilized resource components, such as compute and memory resources. In conjunction, there should be a process in place to validate that a guest virtual machine is still required by the business; this is typically called recertification.
A regular right-sizing lifecycle on a quarterly or semi-annual basis can ensure maximum performance of your workloads and efficient use of your underlying hardware. But in order to make certain you don't impact the business, you are going to want a structured process for understanding the application workload.
Business Approval Process
Resource Reclamation
Right-sizing of virtual machines should be done on a routine basis, such as monthly, quarterly, or semi-annually. This ensures application owners and business partners have an opportunity to control virtual machine costs, and helps the underlying infrastructure run efficiently.
In the past 35 years, IT organizations have evolved from narrowly focused data processing functions to a function that supports, and in many cases drives, nearly every area of a company. But with this increase in technology dependency, the number of applications, and therefore servers, supported by IT increased dramatically, placing strains on datacenter floor space, power, and operational support. Over the past 15 years, virtualization has helped businesses make the most of their technology investments, has become a disruptive technology in datacenter consolidation, and has relieved pressure on IT operations.
However, without IT governance in place to measure the efficiency of hosting resources, IT organizations are now faced with virtual machine sprawl and resource waste. One of my accounts has seen an 81% growth rate in virtual servers over the past five years, adding nearly 2,000 virtual machines in the past two years alone. That places a tremendous amount of stress on IT operations staff and infrastructure resources.
It is very important to properly size your virtual machines from a vCPU and memory perspective to get the most out of your virtualization infrastructure, while keeping application users happy with the performance. We also need to ensure there is a life-cycle management process for the virtual machine. This is done by having a mature right-sizing and recertification process in place, and using tools like vCenter Operations Manager to understand the guest workload.
VMware CloudVolumes is a great addition to the EUC portfolio. It is a mechanism for application abstraction, which helps ease the burden of application life-cycle management, delivers applications quickly, and makes an end user's entitled applications portable. The acquisition of CloudVolumes was announced just before VMworld US 2014.
Let's look at the traditional method for application integration into a Windows operating system. The applications, settings, user profile, and in many cases user data are tightly coupled with the OS and the overall user experience. Unfortunately, this practice means the applications can only be associated with a single system. The CloudVolumes application model decouples the applications from the Windows operating system and places them into AppStacks. From a conceptual point of view, it isn't much different from the abstraction of the operating system from the underlying hardware, which we are familiar with in traditional hypervisor technology. It also uses an entirely separate container for persisting user changes between sessions.
You can combine your core applications into a single AppStack, making it easy to deploy to users by using Active Directory object assignments. Applications are delivered through VMDK virtual disks. CloudVolumes can dynamically attach the virtual disks to a VDI or RDSH desktop, even when users are logged into their entitled virtual workstation. You can make these updates immediately, or on next login or reboot.
Overall vSphere 5.5 Enhancements
- Virtual Machine Compatibility ESXi 5.5 (vHW 10)
- Expanded vGPU and GP-GPU Support
- Hot-Plug SSD PCIe Devices
- Support for Reliable Memory
- New Single Sign On
- OS X support for vSphere Web Client
- vSphere App HA
- Support 62TB VMDK
- 16Gb E2E FC Support
- MSCS supportability enhancements
- Storage vMotion and SDRS compatibility
- VAAI UNMAP and VMFS Heap enhancements
- Enhancements to LACP feature
- Enhanced SR-IOV
- Traffic Filtering
- QoS Tagging
- Host Level Packet Capture
- 40 Gig Support
What if you wanted to troubleshoot an application issue that was happening on a regular basis in your production environment? You could create a custom vR Ops dashboard that would show the counters you wanted to measure from both a host and a virtual machine perspective. With this type of visibility, you would have all the performance details you require when working a major incident call.
To start with, we are going to launch our vRealize Operations Custom UI, which is at https://(vRealize Operations host)/vcops-custom/. At the end of the tab bar, click on the + tab after the last dashboard to add a new dashboard tab. For our new dashboard, we are going to supply a Tab name and select the 1 Column layout. Next, we are going to drag two Resource widgets onto our new dashboard tab.
In order for us to focus on a specific subset of virtual machines and hosts that make up our application, we need to create an application grouping. Click on Environment and then select Application Overview. To add a new application, click on the button with the green + symbol.
One of my customers wanted to create Microsoft Excel reports and charts from the All Metrics reports in vCenter Operations Manager. This gives you the ability to drill down into the numbers at a more granular level and create custom graphs in Microsoft Excel. When troubleshooting a problem in the environment, having this information can be exceedingly valuable, especially if you can pinpoint the time frame when the anomaly happened. With that information in hand, you can work directly with the application teams or change control coordinator to analyze the changes that occurred in that window.
In vCenter Operations Manager under the Operations tab, click on All Metrics. Then select the specific metrics you want to report on from the Metric Selector. You will notice in the image below that I have created a graph on my vCenter Operations Manager Analytics VM for Memory with the Guest Active (KB) counter. The date range I have specified is the Last 12 Hours. Once you have the graph in the Metrics Chart, click on the green down-arrow icon, Download comma separated data.
This saves the data as a comma-separated file you can open in Excel. You can then select the data and insert a chart to map out the data points. If you choose Recommended Charts, Excel will select one that is relatively close to the graph in vCenter Operations Manager.
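If you prefer to summarize the export before charting it, the downloaded file can also be parsed in a few lines of Python. This is a hedged sketch: the actual export's column names and layout may differ from the assumed two-column (timestamp, value) shape shown here.

```python
import csv
import io

# Simulated contents of the "Download comma separated data" export;
# the real file's headers may differ.
sample = io.StringIO(
    "Timestamp,Guest Active (KB)\n"
    "2014-07-01 08:00,524288\n"
    "2014-07-01 08:05,786432\n"
    "2014-07-01 08:10,655360\n"
)
rows = list(csv.DictReader(sample))
values = [int(r["Guest Active (KB)"]) for r in rows]
print(f"samples={len(values)} min={min(values)} max={max(values)}")
# → samples=3 min=524288 max=786432
```

A quick min/max like this helps pinpoint the time frame of an anomaly before you build the full chart.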
One problem I have run into frequently is esx.problem.visorfs.ramdisk.full. This occurs when one of the RAM disks that make up the ESXi host file system reaches its resource pool limit and the host becomes unresponsive. The virtual machines stay online and available to the business users, but you are no longer able to manage the host. In a corporate environment, that means you have a choice: restart the server outside of normal change windows after hours, which will cause an outage to the virtual machines; or leave the host online until the next approved change window and take the chance that if something happens to a virtual machine during production hours, you won't be able to fix the problem.
If you catch this problem early enough, you may be able to get to the host before it becomes fully inaccessible. The server may struggle with vMotion activity when entering maintenance mode, but you can move the virtual machines and restart the host without affecting the business community until the underlying issue is resolved.
Two of the issues I have come across recently are:
- VMware ESXi 5.x host becomes unresponsive when attempting a vMotion or a configuration change - This issue occurs when SNMPD is enabled and the /var/spool/snmp folder is filled with Simple Network Management Protocol (SNMP) trap files. This issue is resolved in ESXi 5.1 Patch 04. A detailed article explaining the entire process can be found here - http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2040707
- ESXi ramdisk full due to /var/log/hpHelper.log file size - The file (located at /var/log/) grows excessively, producing the error "Unable to connect to the MKS:". To resolve this issue, HP Support has provided an updated hp-ams VIB to stop the excessive logging to the hpHelper.log file. A detailed article explaining the entire process can be found here - http://kb.vmware.com/selfservice/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=2055924
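For the SNMP trap-file case, a quick check of /var/spool/snmp can tell you whether a host is headed toward a full ramdisk before it becomes unmanageable. This is a hedged sketch to run from the ESXi shell (which ships a Python interpreter); the 1,000-file threshold is an arbitrary illustration, not a VMware-documented limit.

```python
import os

def count_trap_files(path="/var/spool/snmp"):
    """Count SNMP .trp trap files, the symptom described in KB 2040707."""
    if not os.path.isdir(path):
        return 0
    return sum(1 for name in os.listdir(path) if name.endswith(".trp"))

if count_trap_files() > 1000:  # arbitrary warning threshold
    print("ramdisk at risk: consider disabling snmpd and clearing trap files")
```

Pairing a check like this with your monitoring gives you a chance to act inside a planned change window instead of during an outage.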
I am excited about the future VMware vRealize Air offerings! What is VMware vRealize Air? It is VMware's SaaS-based solution for its management stack, which was announced at VMworld 2014. It is going to help companies work faster, be more flexible, and adopt new capabilities. You can register for the vRealize Air Automation beta at the vRealize Air web page.
There are many reasons for organizations to consider moving from traditional IT to cloud computing. One of the most cited benefits is the economics. Some of the key factors that contribute to the cost savings, especially with SaaS-based solutions, are lowering the cost of running the technology, allowing a shift from capital expenditure to operating expenditure, and giving organizations the ability to add business value through a renewed focus on strategic activities.
This is where VMware vRealize Air is going to come into play in the future. VMware is planning on delivering a comprehensive cloud management platform for managing hybrid and heterogeneous clouds as a service. It is a new way of consuming VMware's cloud management solutions. Although this solution could be considered for large organizations, I believe this is a great opportunity for mid-market and commercial organizations to get the full benefit of the management component of the software-defined datacenter. VMware research shows that 20% of customers prefer SaaS based offerings.
Personally, I maintain a firm belief that if an application doesn't provide some type of business differentiator or hold a significant financial impact for the company, then a cloud strategy with SaaS-based solutions should be considered.
So what is EVO:RAIL? EVO:RAIL is a rapid deployment, configuration, and management engine that enables hyper-converged infrastructure. It combines compute, network, and storage virtualization with vSphere and Virtual SAN. EVO:RAIL is going to transform and simplify IT operations. You will be able to power on the hyper-converged appliance and create virtual machines in minutes; it provides easy VM deployment and one-click non-disruptive patching and upgrades.
If you are not excited yet, you should be! Hyper-converged architecture is going to be a non-disruptive, disruptive technology.
EVO:RAIL is a software package that will run on qualified partner hardware. These 2U, four-node platforms will be optimized for running EVO:RAIL: four independent nodes for compute, networking, and storage, each node with dual processors and 192 GB of memory, and a total of 16 TB of flash and magnetic storage delivered by VMware Virtual SAN.
EVO:RAIL will scale out to four hyper-converged infrastructure appliances (HCIA). New appliances are automatically discovered and added to the cluster with zero configuration. Each appliance will accommodate roughly 100 general-purpose server VMs or 250 VDI instances, with a maximum of 400 general-purpose server VMs or 1,000 VDI instances in a four-appliance, 16-node configuration.
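The scaling math above is simple enough to capture in a small helper; this sketch just restates the figures from the post (roughly 100 general-purpose VMs or 250 VDI instances per appliance, up to four appliances).

```python
def evo_rail_capacity(appliances, per_appliance=100, max_appliances=4):
    """Rough VM capacity for an EVO:RAIL cluster of N appliances."""
    if not 1 <= appliances <= max_appliances:
        raise ValueError("EVO:RAIL scales from 1 to 4 appliances")
    return appliances * per_appliance

print(evo_rail_capacity(4))                     # 400 general-purpose VMs
print(evo_rail_capacity(4, per_appliance=250))  # 1000 VDI instances
```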
Here are some of the high level specifications:
VMware is trying to change the operational model of storage. VMware is introducing a virtual data plane to abstract and pool the physical storage resources to make things easier from an operational standpoint, and much simpler for the IT professional. From Virtual Data Services perspective, VMware is trying to make things more VM centric and provide services like data protection, performance, and mobility.
One of the big keys to software-defined storage is the policy-driven control plane: the capability of provisioning storage based on VM business requirements and orchestrating the entire process.
VMware Virtual SAN is fairly unique in the marketplace; it allows you to aggregate locally attached storage from each ESXi host in a cluster, with a flash-optimized storage layer. VSAN was designed for resiliency with a distributed RAID architecture to help ensure there are no single points of failure. I will discuss this further later in the post.
Virtual SAN is not a traditional virtual storage appliance (VSA); it is fully integrated into vSphere, with drivers embedded in ESXi 5.5 that contain the Virtual SAN smarts.
Lowering TCO with Virtual SAN comes down to hardware. VSAN requires magnetic devices, flash devices, network cards, and storage controllers that support pass-through or RAID 0 mode, and it is recommended to install ESXi on a 4 GB to 8 GB USB drive, SD card, or SATADOM. The minimum is 3 hosts contributing storage; this is an important consideration when thinking about VSAN for remote and branch offices, and for small businesses with just a few virtual instances. Virtual SAN may not meet those specific use cases.
Today, I thought I would write about VMware Virtual Volumes and the impact I think they are going to have on the way we manage storage in the future. Since VVols are currently under development by VMware and storage providers, the final product is not yet available. However, this was a topic discussed at EMC World 2014, and I am sure it will get much more coverage at VMworld 2014. VMware as a company is trying to address several key pain points with storage, including cost, storage management complexity, and the difficulty of ensuring predictable performance. VMware believes it is strategically positioned to solve these problems at the virtual layer.
How do existing environments look today? Almost 98% of organizations have a 1 to 1 mapping between a datastore and a LUN. That is how we have been working with storage and virtualization for a long time. All your data services; including snapshots, cloning, replication, and recovery are done at the datastore level. This was a storage paradigm that was introduced by vSphere.
The new method being developed by VMware is a "per VM" storage approach. The data operations will be taking place at the VM/VMDK level rather than on the entire LUN/datastore level. This will give the vSphere Administrator the ability to provision compliant storage policies, based on business requirements on a "per VM" basis. This is a completely new concept. We are now moving away from the thought process of LUNs and datastores.
You will be able to provision data services on a fine granular level. For example, think of the possibility of replicating one or two applications instead of an entire datastore. It has a dramatic impact on the efficiency of storage, which influences the business performance by reducing infrastructure costs.
In my previous post The Cost of Business; I went through the elements I used for creating a pricing structure for show-back or charge-back.
It is important to understand the underlying costs for technology investments. Stephanie Overby at CIO magazine states, "Business outcomes from technology investments are all that really matter." IT spend should provide the ability for an organization to achieve or exceed its business objectives.
So what is a business outcome? Typical business outcomes include capital hardware and software avoidance, factory or application uptime, time to market, opening new market segments, optimizing existing markets, etc.
With that in mind, I wanted to take a look at the internal cost structure we created for our bronze SLA tier and compare that to a resource we can purchase from Amazon's EC2. Understanding our underlying costs will help us in our decision making process for deciding which infrastructure components we may want to move to the public cloud.
From Amazon EC2, I decided to go with a small Windows virtual machine. I selected an On-Demand instance (no contract), which is $0.036 an hour. The 1-year cost for the compute resource is $316.32.
I also included 25 GB of magnetic storage, which comes to $1.25 a month. My total Amazon EC2 service bill for US-East is $27.61 a month.
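The monthly figure above can be sanity-checked by spreading the 1-year compute cost across 12 months and adding the storage line item; this just reproduces the arithmetic with the numbers quoted in the post.

```python
annual_compute = 316.32  # small Windows On-Demand instance, 1 year
storage_monthly = 1.25   # 25 GB magnetic storage

monthly_total = annual_compute / 12 + storage_monthly
print(f"${monthly_total:.2f}")  # $27.61
```

Running your internal bronze-tier cost model through the same per-month framing makes the public-cloud comparison straightforward.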
In my role as a VMware Technical Account Manager, I like to provide my recommendations for the upcoming VMworld 2014 sessions. Here is a list of 41 sessions that I think are worth checking out!
TAM - VMworld 2014 Session Recommendations
1. HBC2708 - Customer Case Study & Demo on Application Mobility: How to seamlessly move applications and stretch networks to vCloud Hybrid Service
Hear firsthand from customers on how they have seamlessly extended their data center to vCloud Hybrid Service. See a live demo that showcases the experience of application migration to vCloud Hybrid Service. In the session we will demonstrate how to move VM templates, applications, networks, and policies from your data center to vCloud Hybrid Service. The session will include customers talking about their experience with moving applications and stretching their Layer 2 networks on to vCloud Hybrid Service. We will show their use cases and how they truly extended their data center to vCloud Hybrid Service. The live demo will include - Connecting a vSphere environment to vCloud Hybrid Service - Selecting and moving applications to vCloud Hybrid Service - Selecting and stretching L2 networks from your data center to vCloud Hybrid Service.
Allwyn Sequeira, VMware
Serge Maskalik - Sr. Director of Engineering, VMware
2. HBC1533 - How to Build a Hybrid Cloud - Steps to Extend Your Datacenter
This session will help attendees understand the various steps to build an extended datacenter with vCloud Hybrid Service. Generally speaking, customers find it easier to view the vCloud Hybrid Service cloud as just another datacenter; however, instead of being physical, it is all software-defined and available on demand. We will leverage examples and use cases from current customers, as well as review a specific setup the vCloud Hybrid Service Technical Marketing team has built in a “Customer” lab. We will also explore the networking specifics, considerations, and other options to truly build a hybrid cloud.
David Hill - Senior Technical Marketing Architect, VMware
Chris Colotti - Principal Technical Marketing Architect, VMware
In my mind, understanding the cost of running infrastructure operations and providing cost transparency are among the most critical aspects of transforming IT operations into a service provider. If you can't demonstrate the underlying value of the technology units being consumed by the business, and the overall business value of the services provided, then the lines of business are going to view you as overhead.
IT cost transparency is a key factor in communicating the value of IT. When the operations organization cannot state the overall cost of the IT services it provides, it is impossible for IT decision makers and organizational leaders to strike a balance between price and performance: to run current applications, grow the business, and transform product lines to serve new customer segments with new products and services.
While my approach may not work for everyone, in my previous role in IT leadership, I worked on a method for providing cost transparency of the services my team was looking to provide. I wanted to share this with you!
I used server density ratios based on the ratio of virtual machine vCPUs to physical cores. In my scenario, I am going to use a conservative 3:1 production workload ratio for my silver-level virtual machines, which have a 4-year lifecycle.
Here are the components I am going to include in my cost analysis:
- Physical Server Cost
- Sales Tax
- Server Power
- VMware vCloud Suite Standard License
- Windows Data Center License
- vC Ops License
- Norton Anti-Virus License for all VMs
- VMware TAM Support
- Microsoft TAM Support
In addition, I include a Cluster HA Charge for maintenance and a level of risk avoidance.
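Putting the components above together, a per-VM monthly cost falls out of amortizing the per-host costs over the VM slots a host can carry at the 3:1 density ratio, across the 4-year lifecycle. This is a hedged sketch with entirely hypothetical dollar figures; substitute your own component costs.

```python
def monthly_cost_per_vm(host_costs, physical_cores, vcpus_per_vm=1,
                        density_ratio=3, lifecycle_years=4, ha_charge=0.10):
    """Amortize per-host costs over VM slots at a vCPU:core density ratio."""
    vm_slots = physical_cores * density_ratio // vcpus_per_vm
    total = sum(host_costs.values()) * (1 + ha_charge)  # add cluster HA charge
    return total / lifecycle_years / 12 / vm_slots

costs = {  # illustrative 4-year figures, not real pricing
    "server_and_tax": 12000, "power": 3000,
    "vmware_licensing": 6000, "windows_datacenter": 5000,
    "monitoring_and_av": 2000, "vendor_support": 4000,
}
print(round(monthly_cost_per_vm(costs, physical_cores=16), 2))  # 15.28
```

The 10% HA charge models the maintenance and risk-avoidance overhead mentioned above; tune it to match the failover headroom your clusters actually reserve.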
An aspect of vCenter Operations Manager that provides measurable benefits is Groups, which lets you build an application group so IT professionals can more easily determine how applications are affected when change occurs. An application is a logical grouping of resources that represents a critical business application or business service. In software engineering, the standard multi-tier architecture includes web servers (presentation), application servers (application processing), and database servers (data management).
After you have specified an application group, you can view the real-time analysis across all tiers and resources that are contained within the application. This gives you the ability to get an early understanding of a major change that is occurring across an application, which might be an indication of a cascading performance problem.
In general, when you are going to create an application group there are a couple of approaches to consider for the design.
The first is a business reason; this usually encompasses all the components that make up a business application as seen by the application owner. As mentioned earlier, this is the typical client-server model with the presentation, application, and data management functions. For example, this may include the application resources that make up the healthcare system Epic, or the systems that support an internally written benefits enrollment application.
Most companies have a high reliance on the operations infrastructure that they have created. Outages of phone systems, e-mail, customer facing applications, and ERP systems can cripple a company.
Capacity management includes establishing and maintaining a safe and reliable amount of resources to meet the business demand. Demand management is an important component of providing reliable computing services. It requires a variety of non-IT oriented skills and knowledge, and is therefore an often-neglected area. However, it is becoming significantly more important as virtual machine growth weighs down IT staffing ratios and infrastructure budgets. For large organizations, most capacity management is done with spreadsheets; fortunately vCenter Operations Manager provides information to better account for the needed resources and capacity.
Enhancing data for capacity planning in vCenter Operations Manager involves finding opportunities for resource optimization and cost savings. On the Planning tab of VMware vCenter Operations Manager, you can monitor the use of resources and the available capacity in your virtual environment, and plan for capacity upgrades or optimization. On the Reports tab, you can create a report to capture the details related to current or predicted resource needs and export that report to a file.
What's vSphere Big Data Extensions?
VMware vSphere Big Data Extensions (BDE) is a feature within vSphere to support Big Data and Hadoop workloads. BDE provides an integrated set of management tools to help enterprises deploy, run, and manage Hadoop on the vSphere platform. Through the vCenter user interface, enterprises are able to manage and scale Hadoop seamlessly. By combining BDE with vCloud Automation Center, we can also provide an on-premises Hadoop-as-a-Service solution for Hadoop users.
What's new in BDE 2.0?
Support for the latest distributions of Apache Hadoop 2.0 software. In addition to the previously supported Hadoop distributions, Big Data Extensions users may now also deploy and manage Apache Bigtop 0.7.0, Cloudera CDH5, Hortonworks HDP 2.1, MapR 3.1, and Pivotal PHD 2.0.
CentOS 6.4 Operating System for the Hadoop Template Virtual Machine. The Hadoop Template Virtual Machine now uses CentOS 6.4 as its default operating system. This provides an increase in performance, as well as native support for all Hadoop distributions for use with Big Data Extensions.
I wanted to follow up my previous two posts with vSphere performance troubleshooting. There are three golden questions to ask when doing general performance troubleshooting:
- What are the symptoms?
- When did it work last?
- What has changed since it worked?
Don't let the user speculate on the root cause of the problem. Get a specific description of what they are doing and exactly what symptoms they see. Keep an open mind and consider all the components involved while focusing on the symptoms.
Performance issues are very subjective; it basically comes down to the fact that the end user is not happy. When diagnosing performance problems in vCenter, vCenter Operations Manager, or esxtop, there are thousands of metrics you can examine. But very few of them will actually tell you, "Is the user happy?"
We are going to dive a little deeper into the Workload badge, because it is fundamental to understanding the health of your infrastructure. Let's go back to the Operations tab and click on the Workload badge.
On the right-hand side of the pane, you will notice CPU, Memory, Disk I/O, and Network I/O. For each category, there is a bar that illustrates the host performance levels and a bar that illustrates the virtual machine performance levels.
Demand is shown in green, usage in grey, and the configured amount as the white background. Pop quiz: what is demand and what is usage? Demand is what is being requested, and usage is what is being delivered. In our case above, the demand from the virtual machines is 395 MHz (4% of configured) and the host is delivering 396 MHz (4% of configured). Because demand is about even with usage, it is very unlikely that the application owners are seeing any performance degradation; none of the virtual machines is suffering from not getting the resources it requested. If you mouse over one of the virtual machines, it shows the amount of MHz being consumed by that specific virtual machine, in this case my VMware vCenter Server Appliance.
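The demand-versus-usage check above is simple arithmetic; here is a sketch using the numbers from the example (the configured capacity is a hypothetical figure, and this is not a vC Ops API, just the reading logic):

```python
configured_mhz = 9_800   # hypothetical configured host CPU capacity
demand_mhz = 395         # what the VMs are asking for (green)
usage_mhz = 396          # what the host is delivering (grey)

demand_pct = round(100 * demand_mhz / configured_mhz)
usage_pct = round(100 * usage_mhz / configured_mhz)

# Contention only matters when VMs ask for more than they receive.
contention = demand_mhz > usage_mhz
print(demand_pct, usage_pct, contention)  # 4 4 False
```

When demand meaningfully exceeds usage, some VM is being throttled; when they track each other, as here, the host is keeping up.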
vCenter Operations Manager 5.8 (vC Ops) is a tool from VMware that collects massive amounts of data from a variety of sources. You might wonder what the difference is between the metrics collected from ESXi by vCenter Server and the metrics collected by vCenter Operations Manager. VMware vCenter shows you a lot of different metrics for the past hour in 20-second increments, but as you look further back in time it shows fewer metrics and the data points become more averaged out. For example, the past day is shown in 5-minute intervals, the past week in 30-minute intervals, and the past month in two-hour intervals. A two-hour average can hide a lot of peaks and valleys; it might be good for some general capacity planning, but it isn't good if you are trying to troubleshoot the root cause of an application performance issue. It is simply too large an interval; you need a much finer data sampling. That is where vC Ops comes in!
vCenter Operations Manager does three things differently: it keeps all the metrics, it keeps them at five-minute intervals, and it keeps them for six months. You can retroactively go back and tell an application owner whether they were having a performance problem at a certain time. That gives you a lot more confidence about providing relevant information to your IT business partners.
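A quick synthetic illustration of why coarse roll-ups hide problems: the same two hours of CPU data viewed at 5-minute granularity versus as a single two-hour average.

```python
# Two hours of 5-minute CPU-usage samples (synthetic data), mostly idle
# with a 15-minute saturation spike in the middle.
five_min = [20] * 24
five_min[10:13] = [95, 98, 96]

two_hour_avg = sum(five_min) / len(five_min)
print(max(five_min), round(two_hour_avg, 1))  # 98 29.5
```

The 5-minute view shows a host pegged near 100% for a quarter of an hour; the two-hour average reports a placid 29.5%, and the incident disappears.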
Another feature of vCenter Operations Manager is dynamic thresholds. vC Ops takes the collected metrics and looks for patterns over time. It can then use those patterns to make predictions about the future that help you proactively maintain your environment.
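vC Ops's analytics are far more sophisticated than this, but as a toy model of a dynamic threshold, imagine learning a per-hour baseline from history and flagging values that fall outside it. Everything here (the data, the two-sigma rule) is an illustrative assumption:

```python
import statistics

# Hypothetical history: hour of day -> past CPU % readings at that hour.
history = {9: [55, 60, 58, 62, 57], 3: [5, 6, 4, 5, 6]}

def is_anomaly(hour, value):
    """Flag a reading outside mean +/- 2 standard deviations for that hour."""
    samples = history[hour]
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)
    return abs(value - mean) > 2 * sd

print(is_anomaly(9, 59))   # False: 59% is normal for 9 a.m.
print(is_anomaly(3, 40))   # True: 40% at 3 a.m. breaks the learned pattern
```

That is the essence of dynamic thresholds: the same 40% CPU reading is unremarkable during business hours but anomalous in the middle of the night, and a static threshold can't capture that.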
Under the Operations tab there are four badges for every resource: Health, Workload, Anomalies, and Faults. Health is nothing more than the aggregate of the other three badges and is scored from 0 to 100, with 100 being the best score.
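The exact weighting vC Ops applies is internal to the product, but as a rough mental model of "Health is the aggregate of the other three," here is a naive sketch in which the worst contributor dominates the roll-up:

```python
def health_score(workload, anomalies, faults):
    """Naive model: workload/anomalies/faults are 0-100 'badness' scores,
    and Health (100 = best) is dragged down by the worst of the three."""
    return max(0, 100 - max(workload, anomalies, faults))

print(health_score(40, 10, 0))   # 60: elevated workload caps Health
print(health_score(5, 5, 90))    # 10: a severe fault dominates
```

The real scoring is proprietary; the point is simply that a resource can't have a good Health score while any one of the three underlying badges is bad.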
Faults show alert information, such as a link state down. When an event-triggered alert occurs, it does not automatically clear; this is by design, to ensure that someone looks at the state and takes corrective action so it doesn't happen again in the future.
For most operations teams, one of the top goals is to ensure quality of service for infrastructure, applications, and desktops. When you look at the day in the life of the typical operations engineer, there are two tasks that ensure the business end users are happy, one is reactive problem solving and the other is proactive maintenance.
Reactive problem solving generally starts with an alert if you have a monitoring system in place; if you don't, it likely starts with the user facing the performance problem. The job of the operations practitioner is to detect the problem (such as slow performance), isolate the issue, and then remediate it (for example, by rolling back a patch).
The other approach is proactive maintenance to prevent problems from happening in the first place. This task involves planning (such as looking at utilization), optimizing (reclaiming resources), and even automating the maintenance. In my previous role in IT leadership, we called this being a "Good Shepherd".
That is where vCenter Operations Manager for Horizon View comes into play. vC Ops for View monitors the entire infrastructure stack, building correlations between observed performance metrics and the end user experience. All the data is organized, maintained, and displayed by user. After the first two weeks, vC Ops for View establishes dynamic thresholds for resource consumption by user.
Several dashboards specific to vCenter Operations Manager for Horizon View appear in the Custom user interface when you install the Horizon View adapter. Administrators can change the default number of widgets and the types of metrics that appear on each dashboard, and create their own custom dashboards.
The View Main dashboard shows the overall status of the Horizon View environment. It helps you to visualize the end-to-end Horizon View environment, its underlying environment, and alerts.
Version 1.5 focused on improved scale to support the complexity of current Horizon View 5.2/5.3 configurations. It leverages the scalability gains made in vCenter Operations Manager 5.8. The new vC Ops for View can support up to 7,500 concurrent sessions.
I am inquisitive by nature; I think as technologists, most of us like to figure things out. As Albert Einstein said, "I have no particular talent. I am merely inquisitive." So with that perspective, I wanted to see if I could take the vSphere lab running on my notebook and connect it to my vCloud Hybrid Service (vCHS) account.
The nested lab on my notebook consists of two ESXi 5.5 Advanced hosts, the vCenter Server Appliance 5.5 with the vCHS plug-in, vCenter Operations Manager Advanced 5.8, a vCloud Connector Server, a vCloud Connector Node, and a couple of other virtual machines. All the virtual machines running on my notebook are virtual appliances to reduce overhead.
My notebook is a late 2013 MacBook Pro with 2.6 GHz Intel Core i7 processor, 16 GB of 1600 MHz DDR3 memory, and the NVIDIA GeForce GT 750M 2048 MB. For the most part, it handles everything I have in my lab environment without an issue.
Additionally, I have a vCloud Hybrid Service account with a Virtual Data Center that has 10 GHz of CPU, 20 GB of memory, and 2 TB of storage.
Since its introduction into the VMware portfolio, Horizon Mirage has been one of my favorite products. If you aren't familiar with Horizon Mirage, it is a software solution that provides layered, single-image management for end user personal computers. Horizon Mirage complements Horizon View; combined, the two products provide a complete management solution for both virtual desktops and physical endpoints.
In the diagram below, the green layers are managed by IT operations and the orange layers are unmanaged; however, they are continuously backed up to the Mirage server.
I go into more detail on Horizon Mirage's capabilities in the following posts: Horizon Mirage, Mirage Windows 7 Upgrade, and Mirage Endpoint Protection. Today, I wanted to share some of the new features that were announced with Horizon Suite 6.
Horizon Mirage now provides image management for virtual desktops. This brings single-image management to physical workstations, virtual desktops, and local virtual desktops running on VMware Fusion Pro. It has become a single tool for IT professionals to manage and update desktop images without wiping out user-installed applications and data.
I recently attended a briefing for the EMC Elect on the new VNXe system. EMC is bringing enterprise-class storage capabilities to the small and medium-size businesses. The new VNXe3200 system provides affordable pricing, simple integration, and efficient data storage in a flash optimized hybrid VNX storage system.
The VNXe3200 has efficiency features that are important to businesses of all sizes. These features help lower the overall cost of your data center infrastructure, which matters when using show-back to provide cost transparency to your business partners. The VNXe3200 hybrid system lowers the overall cost per IOPS and per GB by leveraging just a small amount of flash and using the FAST Suite. The system was not only re-architected to support flash drives, it was optimized for flash by automatically tiering data into and out of flash drives based on hot-spot recognition.
It also boasts up to a 50% reduction in capacity requirements by exploiting advanced capabilities like thin provisioning and file deduplication.
Additionally, EMC has added MCx, Fibre Channel support, and enhanced software functionality like security and compliance, monitoring and reporting, and snapshots for no additional cost.
The below infographic distills InformationWeek’s 2014 Strategic CIO Survey results into top focus areas. Perhaps not surprisingly, for IT execs, cutting costs ranks at the top of the list. What else are top priorities? Speed to market, insufficient budgets, and the skills gap.
Of course, most of these priorities are not new to corporate organizations, but what is changing is the recognition that the way IT does business needs to change. I couldn't agree more with examining IT organizational structure to determine whether it is set up to meet today's technology challenges and service-oriented mindset. The way IT delivers services to the business is significantly different than it was 5 to 10 years ago, and the legacy silo structure doesn't work effectively in today's environment.
In a past article, I reviewed the Horizon View 5.2 pod architecture. Horizon View components fit together within a physical architecture based on the concept of building blocks and pods. It is a scalable approach that allows IT operations to build out their VDI environment as end users move to virtual desktops.
A Horizon View pod integrates five 2,000-user building blocks into a View Manager installation that you can manage as one entity. By taking this approach, you can be fairly user-agnostic and deploy additional blocks or pods with varying performance characteristics, geographical locations, or access mechanisms as required.

A typical Horizon View deployment can consist of 500 to 10,000 virtual desktops hosted across one or more ESXi clusters managed by a management building block.
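Because a pod is built from fixed-size 2,000-user building blocks, sizing one is simple ceiling division. A small sketch of that back-of-the-envelope math (the desktop count is a made-up example):

```python
BLOCK_USERS = 2000          # users per building block
MAX_BLOCKS_PER_POD = 5      # blocks per pod, per the architecture above

def blocks_needed(desktops):
    """Ceiling division: how many 2,000-user blocks cover this many desktops?"""
    return -(-desktops // BLOCK_USERS)

print(blocks_needed(7500))                        # 4 blocks
print(blocks_needed(7500) <= MAX_BLOCKS_PER_POD)  # True: fits in one pod
```

A hypothetical 7,500-desktop deployment needs four blocks and fits comfortably inside a single pod; anything past 10,000 desktops means standing up another pod.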
The Cloud pod architecture extends the entitlement capabilities for Horizon desktops across datacenters and physical sites with Global Entitlements.
VMware had some very exciting news yesterday with the announcement of VMware Horizon 6! VMware Horizon is a family of virtual desktop and application solutions designed to deliver business applications to end users. With Horizon, VMware extends the power of virtualization from data centers to devices. It delivers desktops and applications with a great user experience, manageability, and flexibility. The endpoint conversation is no longer about VDI; it is about delivering application and mobility services to your business users that fit their specific needs.
I know I preach about this on a regular basis, but operations needs to be seen as delivering services to the business and get away from the perception of executing tasks. This helps them become much more strategic partners to their organizations. Horizon 6 will help transform end user computing to offer a portfolio of delivery options for application services.
Horizon 6 allows IT professionals to deliver virtual desktops or applications through a unified platform, the Horizon Workspace, to their corporate users. While accessing virtual desktops isn't something new with Horizon Workspace, the application services have been greatly expanded to include RDS-hosted applications, SaaS and web applications, Office 365, Google Apps, ThinApp packaged applications, and even XenApp applications from Citrix; all delivered from a single platform. This provides corporate business users with all the resources necessary to enable them to work effectively from any device, at any time, from anywhere. VMware Horizon 6 is going to raise the bar for delivering mobile services to business professionals and aligns nicely with the new era of the software-defined enterprise.