Each installation of the software includes the entire stack of components: the UI (product and admin), the collector, the controller, analytics, and the persistence layer.
- User Interface: Admin UI and Product UI
- Collector: Resource data collection initiated by adapters
- Controller: Determines mapping for data insertion and queries
- Analytics: Metric calculations, threshold processing, alert generation, and stats
- Persistence: Each node persists its partition of data to disk
Some of the exciting new features in vRealize Operations 6.0 include improved scalability and resiliency, a uniform UI and functionality, actionable alerts, automated remediation, and user-definable views, dashboards, and reports.
There are a few node types in vRealize Operations 6.0: the master node, data node, replica node, and remote collector node. A short sketch after the list shows how these roles divide up the component stack.
- Master Node: The initial, required node in the cluster. The master node manages all other nodes. In a single-node installation, the master node must also perform data collection and analysis, because it is the sole node and the only place where vRealize Operations Manager adapters are installed.
- Data Node: In large deployments, additional nodes have adapters installed to perform the collection and analysis. Large deployments usually include adapters only on data nodes, not on the master node or replica node.
- Replica Node: To enable high availability (HA), the cluster requires that you convert a data node into a replica of the master node.
- Remote Collector Node: Distributed deployments might require a remote collector node that can navigate firewalls, interface with a remote data source, reduce bandwidth across data centers, or reduce the load on the vRealize Operations Manager analytics cluster. Remote collectors only gather objects for the inventory, without storing data or performing analysis.
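To make that division of labor concrete, here is a minimal sketch of which parts of the component stack each node role carries. This is purely my own illustrative model based on the descriptions above, not anything shipped with or exposed by vRealize Operations.

```java
// Illustrative model only: node roles and the components each role runs.
// The names and structure are invented for this sketch, not a vROps API.
import java.util.EnumSet;
import java.util.Set;

enum Component { UI, COLLECTOR, CONTROLLER, ANALYTICS, PERSISTENCE }

enum NodeRole {
    MASTER, DATA, REPLICA, REMOTE_COLLECTOR;

    // Remote collectors only gather inventory, so they carry just the
    // collector; the other roles run the full stack.
    Set<Component> components() {
        return this == REMOTE_COLLECTOR
                ? EnumSet.of(Component.COLLECTOR)
                : EnumSet.allOf(Component.class);
    }
}
```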
A common practice is to add nodes as the monitored environment grows. vRealize Operations can scale out to 8 nodes, which can collect 64,000 objects and 20 million metrics, and can accommodate 32 concurrent connections.
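Assuming the load spreads roughly evenly across the cluster (my assumption for illustration; the official per-node maximums come from VMware's sizing guidance), the back-of-the-envelope per-node arithmetic looks like this:

```java
// Rough per-node arithmetic, assuming an even spread across 8 nodes.
// Back-of-the-envelope numbers only, not official sizing figures.
public class ClusterSizing {
    public static void main(String[] args) {
        int nodes = 8;
        int totalObjects = 64_000;
        long totalMetrics = 20_000_000L;

        System.out.println("Objects per node: " + totalObjects / nodes); // 8000
        System.out.println("Metrics per node: " + totalMetrics / nodes); // 2500000
    }
}
```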
vRealize Operations 6.0 uses a GemFire locator to provide connection information to one or more Controller API GemFire cache servers. GemFire is the glue that logically connects all the layers together: it creates a shared-memory cluster and maps the work across the nodes. GemFire remoting provides the map-reduce functionality, GemFire caching backs the analytics layer, and GemFire persistence handles the sharding of the data.
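vRealize Operations does not expose its internal GemFire cluster, but to give a feel for the locator/cache-server pattern described above, here is a generic GemFire (now Apache Geode) client sketch: the client asks a locator for the available cache servers and then works against a named region. The host, port, and region name are placeholders of my own, not values used by vRealize Operations.

```java
// Generic GemFire/Geode client sketch: connect through a locator, then
// read and write a region. Host, port, and region name are placeholders.
import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;

public class GemFireClientSketch {
    public static void main(String[] args) {
        ClientCache cache = new ClientCacheFactory()
                .addPoolLocator("locator.example.com", 10334) // locator hands out the cache servers
                .create();

        Region<String, String> region = cache
                .createClientRegionFactory(ClientRegionShortcut.PROXY)
                .create("exampleRegion");

        region.put("resource-1", "some cached value");
        System.out.println(region.get("resource-1"));

        cache.close();
    }
}
```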
Let's look at how data gets into a multi-node vRealize Operations deployment. When a new resource is discovered, it comes into the collector with its defined metrics and properties and is added to the system. An auto-discovery task is created and sent to the controller, which routes the new resource to the master node. The master node then takes the resource kind and sends it to the Global xDB. Afterwards, a resource cache is created in the analytics region on one of the data nodes.
For new data on a resource that has already been discovered, the metric information comes in through the collector and goes straight to the resource cache in the analytics region. After threshold processing is completed, the data is persisted in the FSDB. If an alert is triggered, it is persisted in the alarm/alert database.
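A simplified sketch of that flow for an already-discovered resource might look like the following. The class and method names are invented to line up with the steps above; they are not part of the product.

```java
// Simplified, illustrative flow for an incoming metric on a known resource.
// Method names are invented to mirror the steps described above.
public class MetricFlowSketch {

    void onMetricReceived(String resourceId, String metricKey, double value) {
        // 1. The collector hands the metric to the resource cache in the analytics region.
        updateResourceCache(resourceId, metricKey, value);

        // 2. Threshold processing runs against the cached values.
        boolean breached = runThresholdProcessing(resourceId, metricKey, value);

        // 3. The data point is persisted to the FSDB.
        persistToFsdb(resourceId, metricKey, value);

        // 4. If a threshold was breached, the alert is persisted to the alarm/alert database.
        if (breached) {
            persistAlert(resourceId, metricKey, value);
        }
    }

    void updateResourceCache(String id, String key, double v) { /* ... */ }
    boolean runThresholdProcessing(String id, String key, double v) { return false; }
    void persistToFsdb(String id, String key, double v) { /* ... */ }
    void persistAlert(String id, String key, double v) { /* ... */ }
}
```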
To end the post, I want to talk a little bit about availability with the high availability (HA) feature turned on. Enabling HA cuts your node scalability in half because it requires twice the amount of resources. Instead of an 8-node deployment that can handle 20 million metrics, you can deploy 4 data nodes that handle 10 million metrics; the other 4 nodes will be replica nodes.
In the diagram below, you will see there is a primary copy and a secondary copy. At the analytics layer, R1 is the primary copy and R1' is the secondary copy. The copies are split across nodes, similar to the way data is distributed in a RAID array. GemFire decides where the primary and secondary resources go and balances them across the nodes. The same happens at the persistence layer: notice again that the R1 primary resource has a secondary R1' resource on an alternate node.
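How GemFire actually places the copies is internal to the product, but the idea can be illustrated with a simplification of my own: pick a node for the primary copy and put the secondary on a different node so the two never share hardware.

```java
// Illustrative only: place a primary copy on one node and the secondary
// on a different node. A simplification, not GemFire's real placement logic.
public class HaPlacementSketch {

    static int primaryNode(String resourceId, int nodeCount) {
        return Math.floorMod(resourceId.hashCode(), nodeCount);
    }

    static int secondaryNode(String resourceId, int nodeCount) {
        // Next node in the ring, so the copy never lands on the primary's node.
        return (primaryNode(resourceId, nodeCount) + 1) % nodeCount;
    }

    public static void main(String[] args) {
        int nodes = 4;
        System.out.println("R1  primary   -> node " + primaryNode("R1", nodes));
        System.out.println("R1' secondary -> node " + secondaryNode("R1", nodes));
    }
}
```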
All the user customizations (U1-3), such as custom views, dashboards, and reports, go into the Global xDB. That content does not get shared out across the nodes.
For the master node, the replica node is used as a backup to handle failover internally, and database replication keeps the Global xDB in sync. The best practice is to put the master and the master replica on separate hardware.
In my next post, we will start diving into all the changes that have been made to the UI.