Copy Data Management (CDM) is a relatively new category of IT solutions designed to manage the creation, use, distribution, retention, and clean-up of copies of production data, or “copy data.” Copy data is the collective set of all data not currently being used in production: snapshots, backups, vaulted copies, and replicas made for IT and business functions such as data recovery, Dev-Test, analytics, and other operational needs.
Today CDM solutions are available from several players in the market, large and small, and investment in this area is accelerating.
The most easily observable problem with copy data is that there is simply too much of it. The problem starts in the IT department, which has a core requirement for protection copies. Multiple versions of data must be maintained for recovery, including local copies for operational recovery and remote copies for disaster recovery. But protection copies are just the beginning.
The largest consumers of data are business users. This group includes:
These groups and more all demand frequent access to data. Not only does this dramatically grow the storage footprint, it also consumes large amounts of IT time, since every request results in IT work that often involves several roles.
The problem has become acute. According to a recent study by analyst firm IDC, copy data will carry a cost of $50 billion by 2018. IDC estimates that copy data consumes up to 60 percent of current data growth, and there is no indication it will slow down.
The problem is far greater than the cost of the storage used to house the data. IT organizations expend significant time and energy keeping a complex status quo environment, built from a hodgepodge of products, tools, and scripts, up and running. A recent EMA paper found that implementing a centralized CDM solution can save 61% of operational costs. Catalogic users often see processes that used to take many hours fall to minutes or less, thanks to the automation and self-service capabilities in ECX, our flagship CDM platform.
Finally, because of the time and effort spent managing this environment, which grows daily in size and complexity, IT is failing to meet the demands of its internal and external customers. Business demands for data uptime and data recovery are becoming more aggressive: the SLAs for recovery point objective (RPO) and recovery time objective (RTO) keep getting shorter. Development teams are increasingly agile and focused on rapid software development, which places significant demand (and strain) on the IT team to deliver the copies needed for Dev-Test environments. Business units frequently need access to recent copies of data to make near real-time decisions. Traditional IT infrastructure, designed for performance, uptime, and security, is simply not equipped to meet all of these demands.
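RPO is the maximum amount of data loss, measured in time, that the business will tolerate. As a minimal sketch (plain Python; the function names are hypothetical and not part of any CDM product), one can check whether an existing copy schedule honors an RPO target by finding the longest window during which new writes were unprotected. It assumes every listed copy completed successfully and that `now` is no earlier than the newest copy.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(copy_times: list[datetime], now: datetime) -> timedelta:
    """Largest gap between consecutive copies, including the gap from the newest
    copy to `now`. Data written just after one copy is unprotected until the next."""
    if not copy_times:
        raise ValueError("no copies exist: potential data loss is unbounded")
    points = sorted(copy_times) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def meets_rpo(copy_times: list[datetime], now: datetime, rpo: timedelta) -> bool:
    """True if no window of unprotected writes exceeds the RPO target."""
    return worst_case_data_loss(copy_times, now) <= rpo


# Example: copies at 00:00, 04:00 and 06:00, checked at 08:00.
copies = [datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 4), datetime(2024, 1, 1, 6)]
now = datetime(2024, 1, 1, 8)
print(worst_case_data_loss(copies, now))           # 4:00:00
print(meets_rpo(copies, now, timedelta(hours=4)))  # True
print(meets_rpo(copies, now, timedelta(hours=1)))  # False
```

Tightening the RPO in this example from four hours to one means copies must be taken at least hourly, which multiplies the number of copies that must be created, tracked, and eventually expired.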
Together, these challenges have reached a crescendo that is fueling growing interest in CDM solutions. The ROI on a CDM investment can be enormous, given the ability to dramatically reduce capital and operational costs and to spark the agility and innovation that all organizations are seeking.
Problems in IT always fuel innovation… and big problems fuel significant innovation. CDM solutions have recently emerged as a new category to tackle the copy data crisis common to most IT environments. Deploying a CDM solution can have a significant impact and deliver immediate, dramatic benefits. Don’t just take our word for it. eWeek described it well:
“With a CDM solution implemented correctly organizations will quickly feel the benefits. CDM enables organizations to discover, automate and optimize copy data to increase efficiency and streamline operations to reduce costs. By ensuring that the right copy data is on the right kind of storage in the right place and can be retrieved when needed, storage efficiency is maximized on both primary and protection storage. Fewer resources are required, reducing the storage burden, while at the same time enabling businesses to do more.”
As this category is still new, it has fewer formal definitions than mature product categories such as data backup or data replication software. That said, any true CDM offering should share a few common characteristics, outlined below:
One problem today is that even though many copies of data exist, it is difficult to find and use the right copy when it is needed. A CDM solution is intended to centralize the management of copy data, so it starts with a central catalog that tracks all copies across the environment, including remote sites and cloud environments, and maintains the vital metadata for each copy: lineage, location, ownership and access rights, creation timestamps, related copies, completion status, and so on.
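A central catalog is, at heart, an index of copy records. The minimal in-memory sketch below assumes a `CopyRecord` whose fields mirror the metadata listed above; the class, method, and field names are hypothetical and not taken from ECX or any other product. It shows how lineage (following parent links) and related copies (copies of the same production source) fall out of that metadata.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CopyRecord:
    """Catalog entry for one copy; field names are illustrative only."""
    copy_id: str
    source: str               # production dataset this copy derives from
    parent_id: Optional[str]  # lineage: the copy this one was made from, if any
    location: str             # e.g. "primary-array", "dr-site", "cloud"
    owner: str
    allowed_roles: set[str]   # access rights
    created_at: datetime
    completed: bool           # did the copy job finish successfully?


class CopyCatalog:
    """Minimal in-memory catalog tracking copies across all sites."""

    def __init__(self) -> None:
        self._copies: dict[str, CopyRecord] = {}

    def register(self, record: CopyRecord) -> None:
        self._copies[record.copy_id] = record

    def lineage(self, copy_id: str) -> list[str]:
        """Follow parent links from a copy back to the first copy of production data."""
        chain: list[str] = []
        current = self._copies.get(copy_id)
        while current is not None and current.copy_id not in chain:  # guard against cycles
            chain.append(current.copy_id)
            current = self._copies.get(current.parent_id) if current.parent_id else None
        return chain

    def related(self, copy_id: str) -> list[str]:
        """Other copies derived from the same production source."""
        source = self._copies[copy_id].source
        return [c.copy_id for c in self._copies.values()
                if c.source == source and c.copy_id != copy_id]

    def at_location(self, location: str) -> list[CopyRecord]:
        """All copies held at one site, e.g. everything at the DR site."""
        return [c for c in self._copies.values() if c.location == location]


catalog = CopyCatalog()
catalog.register(CopyRecord("snap-1", "erp-db", None, "primary-array", "dba",
                            {"dba"}, datetime(2024, 1, 1, 0), True))
catalog.register(CopyRecord("rep-1", "erp-db", "snap-1", "dr-site", "dba",
                            {"dba"}, datetime(2024, 1, 1, 1), True))
print(catalog.lineage("rep-1"))   # ['rep-1', 'snap-1']
print(catalog.related("snap-1"))  # ['rep-1']
```

A production catalog would of course persist this metadata and populate it automatically from the storage, hypervisor, and application layers rather than through manual `register` calls.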
The CDM solution allows IT to define a set of policies for creating copies of data to serve the various IT and business functions that require data access. Policy engines should be robust and flexible, allowing IT to tune policies precisely to its unique needs. Because products in this category vary, so do the policy options they offer. Options include: setting policies at the virtual machine level in addition to the storage volume or file share level; setting policies for using the data, including automated instantiation of working application environments built from the copy data; application hooks that coordinate copy creation and use with the application itself; self-service capabilities; and others. Importantly, policies should cover the full lifecycle of copy data, including expiration, deletion, and the return of resources to the available pool.
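To make the lifecycle point concrete, here is a minimal sketch of a policy that covers both creation (how often to copy) and expiration (how long to keep each copy). `CopyPolicy`, its fields, and the helper functions are hypothetical; real policy engines expose far richer options, such as the VM-level and application-aware settings listed above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CopyPolicy:
    """Hypothetical policy covering creation and expiration of copies."""
    name: str
    interval: timedelta   # create a new copy this often
    retention: timedelta  # expire each copy after this long


def copy_due(policy: CopyPolicy, last_copy: datetime, now: datetime) -> bool:
    """Creation side of the lifecycle: is a new copy due?"""
    return now >= last_copy + policy.interval


def expired_copies(policy: CopyPolicy, copies: dict[str, datetime], now: datetime) -> list[str]:
    """Expiration side: IDs of copies older than the retention period. These are
    candidates for deletion and for returning their storage to the available pool."""
    return sorted(cid for cid, created in copies.items() if now - created > policy.retention)


policy = CopyPolicy("dev-test-refresh", interval=timedelta(days=1), retention=timedelta(days=7))
now = datetime(2024, 1, 10)
copies = {"c1": datetime(2024, 1, 1), "c2": datetime(2024, 1, 5), "c3": datetime(2024, 1, 9)}
print(copy_due(policy, copies["c3"], now))  # True (24 hours since the last copy)
print(expired_copies(policy, copies, now))  # ['c1']
```

Keeping retention in the same policy object as the creation schedule is the design point here: a copy is never created without an answer to when it will go away.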
Today’s status quo environment lets users create copies but does a dismal job of giving IT real-time visibility into the state of its environment. Did the ERP database replicate successfully to the DR site? Is the copy of the SQL database for the Dev team’s application consistent? How many copies of the financial database exist in my storage environment? CDM solutions should let the IT team run reports on a schedule and on demand, so that it knows the current status of its copy environment every day and can make good decisions and take key actions, such as addressing problems or cleaning up old, unused copies.
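A report of this kind is essentially a query over catalog metadata. The sketch below assumes a hypothetical `CopyStatus` record with a completion flag and a last-access time, and answers three of the questions above: how many copies exist per source, which copy or replication jobs did not complete, and which copies have sat unused long enough to be cleanup candidates.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CopyStatus:
    """Hypothetical per-copy status pulled from the catalog."""
    copy_id: str
    source: str
    location: str
    completed: bool         # did the copy or replication job succeed?
    last_accessed: datetime


def copy_report(copies: list[CopyStatus], now: datetime, unused_after: timedelta) -> dict:
    """Summarize the copy environment for a scheduled or on-demand report."""
    return {
        # "How many copies of the financial database are there?"
        "copies_per_source": dict(Counter(c.source for c in copies)),
        # "Did the replication to the DR site succeed?"
        "failed": [c.copy_id for c in copies if not c.completed],
        # Cleanup candidates: copies nobody has touched for a while.
        "unused": [c.copy_id for c in copies if now - c.last_accessed > unused_after],
    }


now = datetime(2024, 3, 1)
copies = [
    CopyStatus("f1", "finance-db", "dr-site", True, datetime(2024, 2, 28)),
    CopyStatus("f2", "finance-db", "primary", True, datetime(2023, 12, 1)),
    CopyStatus("e1", "erp-db", "dr-site", False, datetime(2024, 2, 29)),
]
print(copy_report(copies, now, unused_after=timedelta(days=30)))
```

The 30-day threshold is an arbitrary example value; in practice it would be set per dataset or per business function.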
CDM offerings are diverse, and the options they provide vary widely. Some key things IT should look for when evaluating solutions:
Data Protection: The ability to instantiate a copy in case the primary copy is lost or damaged. Daily automated protection policies make up the core of CDM solutions, but no one can predict when the copies will be needed. Moreover, when copies are needed, it is usually a time of high stress for the team (think fire or flood in the data center, a lost database table, and so on). Thus, the CDM solution needs to give IT the ability to quickly identify the right copy, verify its consistency, and instantiate it rapidly at the push of a button. Nobody wants complexity when the clock is running.
Disaster Recovery: The extension of data protection to a remote site, ensuring data availability in case of a data-center-wide outage. CDM solutions need to give IT quick access to the right copies, allowing for rapid recovery when needed. The CDM solution should also support frequent validation of DR readiness by letting IT automatically instantiate the remote environment on a periodic basis, without any disruption to production systems.
Development and Test Data Access: During the software development cycle, developers very often need access to production data sets to verify that new functionality integrates as expected with the current systems of record. Similarly, as an application nears readiness, it needs a full QA cycle run against live copies of production data. CDM systems can automate the creation and delivery of these copies on a schedule or on demand. Self-service capabilities and role-based access control (RBAC) in some CDM solutions allow the IT department to offload data requests to the dev and test teams themselves. The result is usually happier developers and faster software product cycles.
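The Data Protection item above calls for quickly identifying the right copy and verifying its consistency. A minimal sketch of that selection step: pick the newest copy taken at or before the desired recovery point that passes verification, falling back to older copies if it does not. It assumes each copy stores a SHA-256 checksum recorded at creation; a checksum only confirms the bytes are unchanged, whereas application-consistent recovery (of a database, for example) needs application-aware checks that this sketch does not attempt. `ProtectionCopy` and the function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ProtectionCopy:
    """Hypothetical protection copy with a checksum recorded at creation time."""
    copy_id: str
    taken_at: datetime
    data: bytes      # stand-in for the copy's contents
    checksum: str    # SHA-256 hex digest recorded when the copy was made


def make_copy(copy_id: str, taken_at: datetime, data: bytes) -> ProtectionCopy:
    return ProtectionCopy(copy_id, taken_at, data, hashlib.sha256(data).hexdigest())


def is_consistent(copy: ProtectionCopy) -> bool:
    """Integrity check only: re-hash the contents and compare with the stored digest."""
    return hashlib.sha256(copy.data).hexdigest() == copy.checksum


def pick_recovery_copy(copies: list[ProtectionCopy],
                       recover_to: datetime) -> Optional[ProtectionCopy]:
    """Newest copy taken at or before `recover_to` that passes verification."""
    candidates = sorted((c for c in copies if c.taken_at <= recover_to),
                        key=lambda c: c.taken_at, reverse=True)
    return next((c for c in candidates if is_consistent(c)), None)


copies = [
    make_copy("c1", datetime(2024, 1, 1), b"orders v1"),
    make_copy("c2", datetime(2024, 1, 2), b"orders v2"),
    make_copy("c3", datetime(2024, 1, 3), b"orders v3"),
]
copies[1].data = b"orders v2 (damaged)"  # simulate a corrupted copy
chosen = pick_recovery_copy(copies, recover_to=datetime(2024, 1, 2, 12))
print(chosen.copy_id if chosen else "no usable copy")  # c1
```

Putting the fallback in the selection logic, rather than leaving it to the operator, is what keeps a push-button restore fast when the newest copy turns out to be bad.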
CDM can be a powerful enabler of the hybrid cloud, allowing IT to take advantage of cloud compute resources while maintaining control and without disruptive changes to the existing environment. CDM can help move data to the cloud, and it should be able to bring up live application environments there to deliver disaster recovery, Test and Dev, analytics, or other key functions on the cloud’s less expensive, elastic compute infrastructure.
Organizations are increasingly adopting DevOps methods to increase speed and agility, with the goal of delivering new applications to market faster. CDM can play an important role by giving IT access to live running environments built from fresh copies of data from the “systems of record,” and by exposing those copies through APIs so that using them becomes a natural extension of the development process. Look for integration with popular DevOps tools such as Chef, Puppet, and Jenkins.