Wednesday, December 31, 2014

vRealize Operations 6.0 - Under The Hood

As this year comes to a close, I thought I would kick off a new series of blog posts on vRealize Operations 6.0. A significant amount has changed in this release: it was re-architected from the ground up with over a million lines of new code, and it merges the functionality of the UI virtual machine and the analytics virtual machine into a single scalable platform.

Each install of the software includes the entire stack of components: the UI (product and admin), the collector, controller, analytics, and persistence layer.
  • User Interface: Admin UI and Product UI
  • Collector: Resource data collection initiated by adapters
  • Controller: Determines mapping for data insertion and queries
  • Analytics: Metric calculations, threshold processing, alert generation, and stats
  • Persistence: Each node persists its partition of data to disk
Gone is the Postgres DB; it has been replaced by EMC Documentum xDB, a high-performance, scalable, native XML database that is ideal for data-intensive uses. This is the database that will be used in future product releases; the intent is to have a uniform standard across our product platforms.

Some of the exciting new features in vRealize Operations 6.0 include its scalability and resiliency, uniform UI and functionality, actionable alerts, automated remediation, and user-definable views, dashboards, and reports.

vRealize Operations 6.0 has four node types: the master node, data node, replica node, and remote collector node.
  • Master Node: The initial, required node in the cluster. The master node manages all other nodes. In a single-node installation, the master node must also perform data collection and analysis because it is the sole node, and the only place where vRealize Operations Manager adapters are installed.
  • Data Node: In large deployments, additional nodes have adapters installed to perform the collection and analysis. Large deployments usually include adapters only on data nodes, not on the master node or replica node.
  • Replica Node: To enable high availability (HA), the cluster requires that you convert a data node into a replica of the master node.
  • Remote Collector Node: Distributed deployments might require a remote collector node that can navigate firewalls, interface with a remote data source, reduce bandwidth across data centers, or reduce the load on the vRealize Operations Manager analytics cluster. Remote collectors only gather objects for the inventory, without storing data or performing analysis.
With the new platform, you can scale up or scale out the environment. When deploying the .OVA file, you choose a size for the master node, ranging from an extra-small deployment to a large deployment. This option determines the initial size of the node: an extra-small deployment has 2 vCPU and 8 GB of memory, and a large deployment has 16 vCPU and 48 GB of memory. The resources needed for your deployment depend on several factors: how large the environment is that you plan to monitor, how many metrics you plan to collect, and how long you plan to store the data. You can find the sizing guidelines in KB 2083783. After the vRealize Operations instance outgrows the existing size, you expand the cluster by adding nodes of the same size. This is an important aspect to be mindful of: you cannot mix node sizes. If you start with a small deployment, all the additional data nodes will be small deployments as well.
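
To make the same-size rule concrete, here is a minimal Python sketch. The `NodeSize` record and both helper functions are hypothetical names used only for illustration; they are not part of any vRealize Operations API, and only the two sizes quoted above are included.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class NodeSize:
    """Hypothetical record for a deployment size (not a product API)."""
    name: str
    vcpu: int
    memory_gb: int


# Only the two sizes quoted in this post; see the KB for the sizes in between.
EXTRA_SMALL = NodeSize("extra-small", vcpu=2, memory_gb=8)
LARGE = NodeSize("large", vcpu=16, memory_gb=48)


def can_add_node(existing: Sequence[NodeSize], new_node: NodeSize) -> bool:
    """Return True only if the new node matches every existing node's size."""
    return all(node == new_node for node in existing)


def cluster_resources(nodes: Sequence[NodeSize]) -> Tuple[int, int]:
    """Total (vCPU, memory in GB) across all nodes in the cluster."""
    return sum(n.vcpu for n in nodes), sum(n.memory_gb for n in nodes)
```

For example, `can_add_node([LARGE, LARGE], EXTRA_SMALL)` is rejected, while three large nodes add up to 48 vCPU and 144 GB of memory.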

A common practice will be to add nodes as the monitored environment grows larger. vRealize Operations can scale out to 8 nodes, which can collect 64,000 objects and 20 million metrics and can accommodate 32 concurrent connections.
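
As a rough back-of-the-envelope aid, the cluster maximums above can be turned into a node estimate. The sketch below assumes capacity scales linearly with node count, which is my simplification and not something the sizing guidelines promise; the function name `nodes_needed` is hypothetical. Treat the result as a starting point and check it against the KB.

```python
import math

# Cluster maximums quoted in this post.
MAX_NODES = 8
MAX_OBJECTS = 64_000
MAX_METRICS = 20_000_000

# ASSUMPTION: capacity divides evenly across nodes (linear scaling).
OBJECTS_PER_NODE = MAX_OBJECTS // MAX_NODES   # 8,000 objects per node
METRICS_PER_NODE = MAX_METRICS // MAX_NODES   # 2,500,000 metrics per node


def nodes_needed(objects: int, metrics: int) -> int:
    """Estimate the node count for a given number of objects and metrics."""
    by_objects = math.ceil(objects / OBJECTS_PER_NODE)
    by_metrics = math.ceil(metrics / METRICS_PER_NODE)
    needed = max(1, by_objects, by_metrics)  # the tighter limit wins
    if needed > MAX_NODES:
        raise ValueError("environment exceeds the 8-node cluster maximum")
    return needed
```

Under that assumption, an environment with 20,000 objects and 4 million metrics is limited by the object count and comes out at 3 nodes.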

vRealize Operations 6.0 uses the GemFire Locator for connection information to one or more Controller API GemFire Cache Servers. GemFire is the glue that logically connects all the layers together. It creates a shared memory cluster and maps the work across the systems. GemFire remoting provides the map-reduce technology, GemFire caching handles the analytics layer, and GemFire persistence handles sharding of the data.

Let's look at how data gets into a multi-node vRealize Operations deployment. When a new resource is discovered, it comes into the collector with its defined metrics and properties and is added to the system. An auto-discovery task is created and sent to the controller, which routes the new resource to the master node. The resource kind is then sent to the Global xDB database. Afterwards, a resource cache is created in the analytics region on one of the data nodes.

For new data on a resource that has already been discovered, the metric information comes in through the collector and goes straight to the resource cache in the analytics region. After threshold processing is completed, the data is persisted in the FSDB. If an alert is triggered, it is persisted in the alarm/alert database.
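
The two paths above (discovery of a new resource, then ongoing metric ingestion) can be sketched as a toy in-memory model. Everything here is hypothetical: the `Cluster` class, its field names, and the single static threshold are stand-ins to show the order of the steps, not how the product implements them.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of the stores named in this post (all names hypothetical)."""
    global_xdb: dict = field(default_factory=dict)      # resource id -> resource kind
    resource_cache: dict = field(default_factory=dict)  # resource id -> latest metrics
    fsdb: list = field(default_factory=list)            # persisted metric samples
    alert_db: list = field(default_factory=list)        # persisted alerts

    def discover(self, resource_id: str, kind: str) -> None:
        """Discovery path: record the resource kind, then create its cache."""
        self.global_xdb[resource_id] = kind
        self.resource_cache[resource_id] = {}

    def ingest(self, resource_id: str, metric: str, value: float,
               threshold: float) -> bool:
        """Data path: cache -> threshold check -> FSDB -> alert store."""
        if resource_id not in self.resource_cache:
            raise KeyError(f"{resource_id} has not been discovered yet")
        self.resource_cache[resource_id][metric] = value
        breached = value > threshold  # ASSUMPTION: a simple static threshold
        self.fsdb.append((resource_id, metric, value))
        if breached:
            self.alert_db.append((resource_id, metric, value))
        return breached
```

A call to `discover("vm-1", "VirtualMachine")` followed by `ingest("vm-1", "cpu", 95.0, threshold=90.0)` leaves one sample in the FSDB list and one entry in the alert list.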

To end the post, I want to talk a little bit about availability, specifically with the high availability (HA) feature turned on. Using the high availability feature cuts your node scalability in half because it requires twice the amount of resources. Instead of having an 8-node deployment, which can handle 20 million metrics, you can deploy 4 data nodes that handle 10 million metrics. The other 4 nodes will be replica nodes.

In the diagram below, you will see there is a primary copy and a secondary copy. At the analytics layer, R1 is the primary copy and R1' is the secondary copy. The copies are split up across nodes, similar to how data is distributed in RAID arrays. GemFire decides where the primary and secondary resources go and balances them between the nodes. The same happens at the persistence layer: again, there is the R1 primary resource and a secondary R1' resource on an alternate node.
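
To illustrate the idea of keeping a primary and a secondary copy on different nodes, here is a small Python sketch. It uses a plain round-robin rule that I picked for simplicity; the post does not describe GemFire's actual placement logic, and the function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def place_copies(resources: Sequence[str],
                 nodes: Sequence[str]) -> Dict[str, Tuple[str, str]]:
    """Map each resource to (primary node, secondary node), never the same."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to hold two copies")
    placement = {}
    for i, resource in enumerate(resources):
        primary = nodes[i % len(nodes)]
        secondary = nodes[(i + 1) % len(nodes)]  # always the next node over
        placement[resource] = (primary, secondary)
    return placement


def serving_node(placement: Dict[str, Tuple[str, str]],
                 failed_node: str) -> Dict[str, str]:
    """After one node fails, use the secondary wherever the primary is lost."""
    return {
        resource: secondary if primary == failed_node else primary
        for resource, (primary, secondary) in placement.items()
    }
```

With four resources on four nodes, R1 gets its primary on the first node and R1' on the second, so losing the first node still leaves a copy of R1 available.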

All the user customizations go into the Global xDB (U1-3); these would be the custom views, dashboards, and reports. This data does not get shared out across the nodes.

For the master node, the replica node is used as a backup to handle failover internally. Database replication is used to sync the Global xDB database. The best practice is to put the master and master replica on separate hardware.

In my next post, we will start diving into all the changes that have been made to the UI.