What is Copy Data Management?

Create, optimize and manage copies of production data.

Copy Data Management (CDM) is a category of IT solutions designed to manage the creation, use, distribution, retention, and clean-up of copies of production data, or “copy data.” Copy data is the collective set of all data not currently being used in production: for example, a snapshot, backup, vault, or replica made for data recovery, Dev-Test, analytics, or other business or operational functions.

CDM resolves a multi-faceted IT problem:

  1. Copy data consumes the majority of storage capacity and is growing at a multiple of the rate of the production data environment (as much as 20x).
  2. Demand for access to recent copies of production data is soaring, driven both by traditional IT functions and by new business use cases.
  3. Until recently, there were no central solutions for managing the creation and distribution of copies, leaving most IT organizations with a complex mishmash of scripts and vendor tools and no centralized control.

The Copy Data Challenge

The most easily observable problem with copy data is that there is simply too much of it. The problem starts in the IT department, which has a core requirement for protection copies. Multiple versions of data must be maintained for recovery, including local copies for operational recovery and remote copies for disaster recovery. But protection copies are just the beginning.

The largest consumers of data are business users. This group includes:

  • Software development and test
  • Software quality assurance (QA) and acceptance testing
  • Application patch and sandbox testing
  • Reporting and analytics
  • Legal compliance
  • Internal system training 

These groups and more all demand frequent access to data. Not only does this grow the storage footprint dramatically, but it also consumes large amounts of IT time, since each request results in IT work, often involving several roles.

Addressing the Copy Data Problem

Problems in IT always fuel innovation… and big problems fuel significant innovation. A new category of CDM solutions has recently emerged to tackle the copy data crisis that is common across most IT environments.

As this category is still new, its definitions are less formal than those for mature product categories such as data backup or data replication software. That said, a few common characteristics should exist in any true CDM offering, as outlined below:

Central Catalog

One problem today is that even while many copies of data exist, it is difficult to access and use the right copy when needed. A CDM solution is intended to centralize the management of the copy data and therefore starts with a central catalog that tracks all copies across the environment, including remote sites and cloud environments, and maintains all of the vital metadata for each copy—the lineage, location, ownership and access rights, timestamps of the copy, related copies, completion status, etc.  
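As a rough illustration of the metadata described above, a catalog entry could be modeled as follows. This is a minimal sketch, not any product's schema: the `CopyRecord` and `Catalog` names and every field are assumptions chosen to mirror the lineage, location, ownership, timestamp, and completion status mentioned in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class CopyRecord:
    """One catalog entry; every field name here is an illustrative assumption."""
    copy_id: str
    source: str                      # production data set the copy was made from
    kind: str                        # e.g. "snapshot", "backup", "replica"
    location: str                    # site or cloud region holding the copy
    owner: str
    created_at: datetime
    complete: bool = True            # completion status
    parent_id: Optional[str] = None  # lineage: the copy this one was derived from


class Catalog:
    """In-memory stand-in for a central catalog."""

    def __init__(self) -> None:
        self._records: dict[str, CopyRecord] = {}

    def register(self, record: CopyRecord) -> None:
        self._records[record.copy_id] = record

    def copies_of(self, source: str) -> list[CopyRecord]:
        """All copies of one production data set, newest first."""
        matches = [r for r in self._records.values() if r.source == source]
        return sorted(matches, key=lambda r: r.created_at, reverse=True)

    def lineage(self, copy_id: str) -> list[str]:
        """Follow parent links from a copy back to its earliest known ancestor."""
        chain: list[str] = []
        current = self._records.get(copy_id)
        # The membership check guards against accidental cycles in parent links.
        while current is not None and current.copy_id not in chain:
            chain.append(current.copy_id)
            current = self._records.get(current.parent_id) if current.parent_id else None
        return chain


catalog = Catalog()
catalog.register(CopyRecord("snap-1", "erp-db", "snapshot", "dc-east", "dba-team",
                            datetime(2024, 1, 1, tzinfo=timezone.utc)))
catalog.register(CopyRecord("rep-1", "erp-db", "replica", "dr-west", "dba-team",
                            datetime(2024, 1, 2, tzinfo=timezone.utc),
                            parent_id="snap-1"))
```

With both records registered, `catalog.copies_of("erp-db")` lists the replica ahead of the snapshot it was derived from, and `catalog.lineage("rep-1")` traces the replica back to that snapshot.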

Policy Engine 

A CDM solution allows IT to define a set of policies for creating copies of data to meet the needs of the various IT and business functions that need data access. Policy engines should be robust and flexible, allowing IT to tune the policies precisely to their unique needs. Importantly, policies should cover the full lifecycle of the copy data, including expiration, deletion, and the return of resources to the available pool.
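One way to picture a lifecycle-aware policy is as a creation frequency paired with a retention period. The sketch below is an assumption-laden illustration: `CopyPolicy`, `is_due`, and `expired` are hypothetical names, and a real policy engine would carry many more settings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CopyPolicy:
    """Hypothetical policy: how often to copy and how long to keep each copy."""
    name: str
    frequency: timedelta   # how often a new copy should be created
    retention: timedelta   # how long a copy is kept before it expires


def is_due(policy: CopyPolicy, last_copy_at: datetime, now: datetime) -> bool:
    """A new copy is due once `frequency` has elapsed since the last one."""
    return now - last_copy_at >= policy.frequency


def expired(policy: CopyPolicy, copy_times: list[datetime], now: datetime) -> list[datetime]:
    """Copies older than `retention`; deleting them returns capacity to the pool."""
    return [t for t in copy_times if now - t > policy.retention]


dev_refresh = CopyPolicy("dev-refresh",
                         frequency=timedelta(days=1),
                         retention=timedelta(days=7))
now = datetime(2024, 1, 10)
copies = [datetime(2024, 1, 1), datetime(2024, 1, 5), datetime(2024, 1, 9, 12)]

to_delete = expired(dev_refresh, copies, now)       # only the January 1 copy
needs_copy = is_due(dev_refresh, copies[-1], now)   # False: last copy is 12 hours old
```

Keeping expiration in the same object as creation frequency reflects the point above: the policy describes the whole life of a copy, not just how it is made.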

Robust Reporting

Today’s status quo environment allows users to create copies but does a dismal job of helping IT get a real-time view of the state of their environment. Did the ERP database replicate successfully to the DR site? Is the copy of the SQL database for the Dev team application consistent? How many copies of the financial database are there in my storage environment? CDM solutions should give the IT team the ability to run reports on a schedule and on demand, so that the team knows the current status of its copy environment, can make good decisions, and can take action to address problems or clean up old, unused copies.
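The questions above map naturally onto simple queries over catalog data. The following sketch assumes a hypothetical row layout of (source, kind, creation time, completed successfully) and an arbitrary 30-day threshold for flagging old copies; neither comes from a specific product.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical catalog rows: (source, kind, created_at, completed_ok)
copies = [
    ("erp-db", "replica", datetime(2024, 1, 9), True),
    ("erp-db", "snapshot", datetime(2023, 10, 1), True),
    ("finance-db", "backup", datetime(2024, 1, 8), False),
    ("finance-db", "snapshot", datetime(2024, 1, 9), True),
]


def build_report(rows, now, stale_after=timedelta(days=30)):
    """Summarize copy counts, failed copies, and copies old enough to review."""
    return {
        "copies_per_source": dict(Counter(source for source, *_ in rows)),
        "failed": [(s, k) for s, k, _, ok in rows if not ok],
        "stale": [(s, k) for s, k, created, _ in rows if now - created > stale_after],
    }


report = build_report(copies, now=datetime(2024, 1, 10))
```

Here `report["copies_per_source"]` answers "how many copies are there?", `report["failed"]` surfaces the finance backup that did not complete, and `report["stale"]` flags the October snapshot as a clean-up candidate.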


Many options exist among the diverse set of CDM offerings. Key questions IT should ask when evaluating solutions:

  • Does it provide copy data management for cloud environments in addition to the data center environments?
  • Does it work with existing storage and virtual infrastructure in the data center, calling on these existing solutions to perform key copy services (like snapshot or replication), while providing the central management and control of the entire environment?
  • Does it make available RESTful APIs, allowing other management or development tools to access it, creating CDM workflows that integrate into existing processes and practices?
  • Does it allow for database integration, ensuring careful coordination with these (often difficult) workloads to ensure maximum uptime and availability, while enabling better copy data management and access?
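To make the API question in the checklist concrete, here is a minimal sketch of how a script or build tool might ask a CDM service for a fresh copy over REST. The base URL, the `/copies` endpoint, the JSON fields, and the bearer-token scheme are all invented for illustration; a real product's API documentation defines the actual calls.

```python
import json
from urllib.request import Request

# Hypothetical base URL; substitute the value from your product's API documentation.
BASE_URL = "https://cdm.example.com/api"


def build_copy_request(source: str, policy: str, token: str) -> Request:
    """Build (but do not send) a POST asking the CDM service for a new copy."""
    body = json.dumps({"source": source, "policy": policy}).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/copies",          # assumed endpoint name
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_copy_request("erp-db", "dev-refresh", token="EXAMPLE-TOKEN")
# urllib.request.urlopen(request) would send it; omitted because the endpoint is fictional.
```

Because the request is an ordinary HTTP call, the same step could be triggered from a CI job or a provisioning script, which is what lets CDM workflows fit into existing processes.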

Traditional IT use cases helped by CDM

Data Protection: The ability to instantiate a copy in case the primary copy is lost or damaged. Daily automated protection policies make up the core of CDM solutions, but no one can predict when the copies will be needed. Moreover, when the copies are needed, it is usually a time of high stress for the team (think fire or flood in the data center, a lost database table, and so on). Thus, the CDM solution needs to give IT the ability to quickly identify the right copy, verify its consistency, and instantiate it rapidly at the push of a button. Nobody wants complexity when the clock is running.
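Identifying "the right copy" usually means finding the newest copy that passed a consistency check and predates the damage. The sketch below assumes a hypothetical `Copy` record with a `verified` flag; how that flag gets set (checksums, application-level checks) is outside its scope.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Copy(NamedTuple):
    copy_id: str
    created_at: datetime
    verified: bool   # assumed flag: a consistency check has passed


def pick_recovery_copy(copies: list[Copy], before: datetime) -> Optional[Copy]:
    """Newest verified copy taken at or before the last moment the data was good."""
    candidates = [c for c in copies if c.verified and c.created_at <= before]
    return max(candidates, key=lambda c: c.created_at, default=None)


copies = [
    Copy("mon", datetime(2024, 1, 8), verified=True),
    Copy("tue", datetime(2024, 1, 9), verified=False),   # failed its check
    Copy("wed", datetime(2024, 1, 10), verified=True),   # taken after the damage
]
best = pick_recovery_copy(copies, before=datetime(2024, 1, 9, 18))
```

In this example the Tuesday copy is skipped because it is unverified and the Wednesday copy because it postdates the incident, so `best` is the Monday copy.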

Disaster Recovery: The extension of data protection to a remote site, ensuring data availability in case of a data-center-wide outage. CDM should integrate with array replication processes and leverage their replication engines and data reduction functions.

Development and Test Data Access: During the software development cycle, access to production data sets is needed to verify that new functionality integrates as expected with the current systems of record. Similarly, as an application nears readiness, it needs a full QA cycle that runs against live copies of production data. CDM systems can automate the creation and delivery of copies on a schedule or on demand. Self-service capabilities and role-based access control (RBAC) in some CDM solutions allow the IT department to offload data requests to the dev and test teams. The result is usually happier developers and faster software product cycles.

In addition to core IT functions, many transformational IT initiatives rely on access to copy data:

Hybrid Cloud

CDM can be a powerful enabler of the hybrid cloud, allowing IT to take advantage of cloud compute resources while maintaining control and without disruptive changes to the existing environment. CDM can help move data to the cloud, and should provide the ability to bring up live application environments there in order to deliver disaster recovery, Test and Dev, analytics, or other key functions while leveraging the less expensive, elastic compute infrastructure in the cloud.

DevOps

Organizations are increasingly moving toward DevOps methods in order to increase speed and agility, with the goal of delivering new applications to market faster. CDM can play an important role by giving teams access to live running environments that use fresh copies of data from the “systems of record,” and by exposing APIs so that using these copies becomes a natural extension of the development process. Look for integration with popular DevOps tools such as Chef, Puppet, and Jenkins.

Check out the ECX product page for more details on use cases.