Monday, December 28, 2015

EMC ScaleIO Overview

EMC ScaleIO is a flexible, software-only solution that leverages host-based internal storage media to create a scalable virtual storage pool.

Three primary components make up EMC ScaleIO:

  • ScaleIO Data Client (SDC)
  • ScaleIO Data Server (SDS)
  • Metadata Manager (MDM)

The ScaleIO Data Client (SDC) is a block device driver that exposes ScaleIO storage volumes to applications. The SDC runs locally on any application server that requires access to the block storage volumes. The blocks that the SDC exposes can be blocks from any device in the ScaleIO storage pool. This enables the local application to issue an I/O request, which the SDC fulfills regardless of where the particular blocks reside.

The ScaleIO Data Server (SDS) contributes local storage to the ScaleIO storage pools. An instance of the SDS runs on every server that contributes some or all of its local storage space. The role of the SDS is to perform I/O operations on its local devices as requested by an SDC on the same server or elsewhere in the cluster.

The Metadata Manager (MDM) holds the cluster-wide mapping information and is responsible for decisions regarding migration, rebuilds, and all system-related functions. It manages the ScaleIO system. The MDM is installed on at least three servers and functions as a quorum: a primary MDM server, a secondary MDM server, and a tie-breaker. The ScaleIO monitoring dashboard communicates with the MDM to retrieve system information for display in the ScaleIO GUI. The MDM is not on the ScaleIO data path; reads and writes never traverse it.
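To make the division of labor concrete, here is a minimal Python sketch of the control path versus the data path. This is an illustration only, not ScaleIO's actual wire protocol or data structures; the node names and chunk size are hypothetical. The key point it models is that the SDC obtains its mapping from the MDM once (control path), then sends reads and writes directly to the owning SDS (data path), never through the MDM.

```python
# Conceptual sketch of the ScaleIO I/O path (illustration only).
# Node names and the chunk size below are made up for the example.

class MDM:
    """Holds the cluster-wide mapping of (volume, chunk) -> SDS."""
    def __init__(self, mapping):
        self.mapping = mapping            # {(volume, chunk_index): sds_name}

    def get_mapping(self):
        return dict(self.mapping)         # SDC pulls the map, then caches it


class SDC:
    """Block driver: routes each I/O to the SDS that owns the chunk."""
    CHUNK_SIZE = 1024 * 1024              # illustrative chunk size, 1 MiB

    def __init__(self, mdm):
        self.mapping = mdm.get_mapping()  # control path: talk to the MDM once

    def read(self, volume, offset):
        chunk = offset // self.CHUNK_SIZE
        sds = self.mapping[(volume, chunk)]   # data path: straight to the SDS
        return f"read {volume} chunk {chunk} from {sds}"


mdm = MDM({("vol1", 0): "sds-01", ("vol1", 1): "sds-02"})
sdc = SDC(mdm)
print(sdc.read("vol1", 0))           # chunk 0 is served by sds-01
print(sdc.read("vol1", 1_500_000))   # offset falls in chunk 1, on sds-02
```

Once the mapping is cached, the MDM drops out of the picture entirely, which is why MDM load does not grow with application I/O.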

In a VMware environment, you use the ScaleIO vSphere plug-in deployment wizard to install the MDM and SDS components on a dedicated ScaleIO virtual machine (SVM), whereas the SDC is installed directly on the ESX host. New as of version 1.31, the SDC is installed inside the hypervisor using a vSphere Installation Bundle (VIB).


In an ESX environment, you need to decide how devices will be added to the SVM. You can use an RDM mapping, where a device created on the SVM points to the physical disk on the ESX host, or you can create a VMDK and add it to the SVM. ScaleIO requires thick provisioning for VMDKs, so this process can take a long time. VCE highly recommends RDM mappings in its installation guide.

To access EMC ScaleIO you can use the vSphere Web Client plug-in or the ScaleIO GUI. The plug-in communicates with the MDM node and the vSphere server, enabling you to view components as well as perform configuration and provisioning tasks right from within the vCenter environment.


There are five elements that make up ScaleIO’s virtual storage pool:

  • Protection Domains
  • Storage Pools
  • Fault Sets
  • Volumes
  • Chunks

EMC ScaleIO uses the concept of protection domains: a protection domain is a unique set of ScaleIO Data Servers (SDSs) grouped together for reliability and tenancy separation. In the diagram below, there are four protection domains: a Dev/Test environment, VDI desktops and SQL DB, Oracle and SAP HANA, and a NoSQL DB environment. Protection domains are logical groupings, such as business-specific tenants, application groups, or geographic separation.


In the vSphere Web Client plug-in, you add a new protection domain by creating a name and selecting the size of the RAM Read Cache per SDS. The RAM Read Cache is the memory that is reserved for caching storage devices in the storage pool. ScaleIO does not have auto-tiering between different storage media (SSD and HDD); instead it uses the RAM Read Cache to help accelerate read requests.

By default, all volumes and storage pools are configured to use caching, and all SDSs have caching enabled using a cache size of 128 MB. You can configure individual SDSs in the storage pool with different RAM values, or disable caching completely. The maximum amount of RAM cache is 128 GB.

As of version 1.31, RAM Read Cache can be enabled for an entire SDS, a storage pool, or a specific volume.
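The effect of a read cache like this is easy to sketch. The snippet below is a generic LRU read cache in Python, purely to illustrate the idea of serving repeat reads from RAM instead of the device; it is not ScaleIO's internal implementation, and the capacity and chunk identifiers are hypothetical.

```python
# Illustrative sketch of how a RAM read cache accelerates reads
# (not ScaleIO's actual internals). Recently read chunks are kept
# in a bounded in-memory map; cache hits skip the disk entirely.

from collections import OrderedDict

class RamReadCache:
    def __init__(self, capacity_chunks):
        self.capacity = capacity_chunks
        self.cache = OrderedDict()        # chunk_id -> data, in LRU order
        self.hits = self.misses = 0

    def read(self, chunk_id, read_from_disk):
        if chunk_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(chunk_id)   # mark as recently used
            return self.cache[chunk_id]
        self.misses += 1
        data = read_from_disk(chunk_id)        # slow path: hit the device
        self.cache[chunk_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least recently used
        return data

cache = RamReadCache(capacity_chunks=2)
disk = lambda cid: f"data-{cid}"               # stand-in for a device read
cache.read(1, disk)      # miss: goes to disk
cache.read(1, disk)      # hit: served from RAM
cache.read(2, disk)      # miss
print(cache.hits, cache.misses)   # 1 2
```

Because reads dominate many workloads, even a modest cache (the 128 MB default per SDS) can absorb a meaningful share of device I/O.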


A storage pool is a subset of the physical storage devices in a protection domain. Each storage device can belong to only one storage pool. Storage pools allow the creation of different storage tiers in the ScaleIO system. For illustration, the SSD drives can be pooled together to create Tier 1 storage services, while your spinning media could make up your Tier 2 storage pool.
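The tiering rule above can be shown in a few lines. This is a toy model with made-up SDS and device names, only demonstrating the one constraint that matters: each device lands in exactly one pool, and pools group devices by media class.

```python
# Toy model of tiered storage pools within a protection domain.
# SDS and device names are hypothetical; the point is that each
# device belongs to exactly one pool, grouped by media type.

devices = [("sds-1", "ssd0", "SSD"), ("sds-1", "hdd0", "HDD"),
           ("sds-2", "ssd0", "SSD"), ("sds-2", "hdd1", "HDD")]

pools = {"tier1-ssd": [], "tier2-hdd": []}
for sds, dev, media in devices:
    pool = "tier1-ssd" if media == "SSD" else "tier2-hdd"
    pools[pool].append((sds, dev))        # one pool per device, never two

print(pools["tier1-ssd"])   # [('sds-1', 'ssd0'), ('sds-2', 'ssd0')]
```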



A fault set is a logical entity that ensures SDS data is mirrored on hosts that belong to other fault sets, preventing double-point-of-failure scenarios such as a rack power outage. This is similar to VMware Virtual SAN's fault domains.
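The placement constraint a fault set imposes can be sketched directly. In this illustration (node and rack names are hypothetical, and this is not ScaleIO's actual placement code), a fault set corresponds to a rack, and the two copies of any piece of data must land on SDS nodes in different fault sets:

```python
# Sketch of the fault-set placement constraint (illustration only):
# the two copies of a chunk must land on SDS nodes in *different*
# fault sets, so losing a whole rack never takes out both copies.

import itertools
import random

def place_copies(sds_fault_set, rng=random):
    """Pick two SDS nodes for a chunk's copies from different fault sets."""
    nodes = list(sds_fault_set)
    valid_pairs = [(a, b) for a, b in itertools.combinations(nodes, 2)
                   if sds_fault_set[a] != sds_fault_set[b]]
    return rng.choice(valid_pairs)

# Hypothetical cluster: fault set == rack name.
cluster = {"sds-01": "rack-A", "sds-02": "rack-A",
           "sds-03": "rack-B", "sds-04": "rack-B"}

primary, secondary = place_copies(cluster)
assert cluster[primary] != cluster[secondary]   # never the same rack
```

With this constraint in force, a rack losing power degrades the system but never destroys both copies of any chunk.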


A volume consists of multiple blocks spread evenly across a storage pool's devices. A single volume is divided into chunks, which are striped across physical disks throughout the storage pool in a balanced and random manner.

Each volume block has two copies located on two different SDSs; this is called ScaleIO mesh mirroring. It allows the system to maintain data availability following a single-point failure. The data will still be available following multiple failures, as long as each failure took place in a different storage pool.

In the diagram below, we can see that there are two copies of each chunk on different SDS nodes. SDS 1 contains chunk F from Volume 1, along with chunks E and B from Volume 2.
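A minimal sketch of mesh mirroring makes the layout rule concrete. This is an illustration, not ScaleIO's actual placement algorithm, and the node names are hypothetical; it simply enforces that each chunk gets two copies on two distinct SDS nodes, spread pseudo-randomly across the pool:

```python
# Minimal sketch of mesh mirroring (illustrative only): every chunk
# of a volume gets two copies on two different SDS nodes, spread
# across the pool in a balanced, random manner.

import random

def mirror_chunks(num_chunks, sds_nodes, seed=0):
    rng = random.Random(seed)            # seeded for a repeatable example
    placement = {}
    for chunk in range(num_chunks):
        primary, secondary = rng.sample(sds_nodes, 2)  # two distinct nodes
        placement[chunk] = (primary, secondary)
    return placement

layout = mirror_chunks(8, ["sds-1", "sds-2", "sds-3", "sds-4", "sds-100"])
for chunk, (p, s) in layout.items():
    assert p != s                        # the two copies never share an SDS
```

Because chunks are scattered rather than mirrored node-to-node, every SDS ends up holding a mix of primary and secondary copies from many volumes, which is what spreads rebuild traffic across the whole cluster.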



To illustrate how mesh mirroring protects against a failure: when SDS 1 goes down, the system begins a forward rebuild of the chunks that resided on the failed node. The degraded chunks are rebuilt across the surviving SDS nodes in a balanced way to help accelerate performance. Chunk F from Volume 1 is recreated on SDS 5, chunk B is recreated on SDS 100, and chunk E is recreated on SDS 4.
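The forward rebuild can be sketched as follows. Again this is an illustration under simplified assumptions, not ScaleIO's rebuild logic: each chunk that lost a copy on the failed node is re-mirrored from its surviving copy onto another node, chosen so the new copies spread evenly across the pool.

```python
# Sketch of a forward rebuild after an SDS failure (illustration only).
# Each chunk that lost a copy on the failed node is re-mirrored onto a
# surviving node, chosen to keep the rebuild load balanced.

def forward_rebuild(placement, failed, survivors):
    """placement: {chunk: (sds_a, sds_b)}; returns the updated placement."""
    load = {s: 0 for s in survivors}            # track rebuild balance
    for chunk, copies in placement.items():
        if failed not in copies:
            continue                            # chunk unaffected
        remaining = copies[0] if copies[1] == failed else copies[1]
        # Pick the least-loaded survivor that doesn't already hold the chunk.
        target = min((s for s in survivors if s != remaining),
                     key=lambda s: load[s])
        load[target] += 1
        placement[chunk] = (remaining, target)
    return placement

layout = {"F": ("sds-1", "sds-2"), "E": ("sds-1", "sds-3"),
          "B": ("sds-4", "sds-1")}
layout = forward_rebuild(layout, failed="sds-1",
                         survivors=["sds-2", "sds-3", "sds-4", "sds-5"])
assert all("sds-1" not in copies for copies in layout.values())
```

Because the surviving copies are themselves scattered across many nodes, a rebuild is a many-to-many copy rather than a single disk-to-disk clone, which is why it completes quickly.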


To create a storage pool in the vSphere Web Client, you simply give the storage pool a name and select the appropriate protection domain.



Next, we select the hosts that will participate in the protection domain and provide storage capacity as SDS resources.



The screen below shows the devices whose free space can be added to the selected SDS. To add a device's space, select the Add Device box and choose the storage pool it will be assigned to.



After everything is configured, you can drill down into the summary of the protection domain and all its related objects.



In addition to the vSphere plug-in, you can use the ScaleIO GUI. You access it by pointing to the IP address or host name of the MDM node.


The ScaleIO GUI is very intuitive. The main dashboard provides an overview of the ScaleIO environment under management, including the total amount of capacity, I/O workload, mapped volumes, protection domains and storage pools, SDCs, SDSs and the number of devices, and the MDM cluster status.

The main dashboard also indicates if ScaleIO is currently rebuilding RAID 1 data. A rebuild is usually a result of a recovery due to failure of a server or a storage device.

The backend view provides detailed information about objects in the system, and lets you perform various configuration operations.

The Command menu button on the backend view displays a list of commands that you can perform on rows selected in the table. The contents of the Command menu differ depending on the object selected in the table. Many of the commands can also be accessed from the context-sensitive menu when table rows are right-clicked.

The different commands include:
  • Overview
  • Capacity Usage
  • Capacity Health
  • Rebuild and Rebalance
  • Application I/O
  • Overall I/O
  • I/O Bandwidth
  • State Summary
  • Configuration
  • Device Details
  • RAM Read Cache
  • Rebuild and Rebalance (Detailed)
  • Planned Rebuilds (Advanced)
  • Planned Rebalancing (Advanced)
  • Rebuild I/O Priority (Advanced)
  • Rebalance I/O Priority (Advanced)
  • Network Throttling (Advanced)
  • RAM Read Cache (Internal)




The Alerts view provides a list of the alert messages currently active in the system, in table format. You can filter the table rows according to alert severity, and according to object types in the system.


The ScaleIO system includes advanced I/O priorities and bandwidth settings, which can be used to fine-tune system performance. The number of concurrent rebuild and rebalance jobs can be configured, and bandwidth for rebalance jobs can be adjusted. I/O prioritization is configured per storage pool. 

Network throttling affects network limits, and is used to control the flow of traffic over the network. It is configured per protection domain. The SDS nodes transfer data between themselves. This data consists of user-data being replicated as part of the RAID protection, and data copied for internal rebalancing and recovery from failures. You can modify the balance between these types of data loads by limiting the data copy bandwidth. This change affects all SDSs in the specified protection domain. 
