Copy data management, as you might expect, concerns itself with creating and managing copies of data. That isn't all there is to it, however. The critical and often overlooked piece is that, more than creating and managing copies of data, copy data management is about making that data useful.
Let's consider a virtual machine to be the data in question. Copy data management can and does deal with other forms of data, but virtual machines are the primary containers of data in today's datacenters, so we'll start there.
Making a copy of a virtual machine and then turning it on isn't a particularly good idea. A perfect copy will have the same data in the virtual machine as well as the same configuration. This means the same name at the virtualization management layer and the same name at the operating system level, as well as many other unique elements ranging from the MAC addresses of the network cards to the unique IDs of the hard drives, operating systems and applications.
A perfect copy of a virtual machine can't be registered with the same virtualization management software; it doesn't like having two VMs of the same name. Similarly, two VMs of the same name on the same subnet make Windows very upset, and DNS registration can get tricky as well. Unless the copy is designed to work as part of a cluster with the original VM, data intended for the original VM could start showing up at the clone and vice versa. All in all, not good.
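The collisions described above can be sketched as a simple pre-flight check. This is only an illustration: the `VMIdentity` record, its field names and the `identity_conflicts` function are hypothetical, and a real product would compare many more identifiers (disk IDs, OS and application IDs) than the three shown here.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical, simplified view of a VM's identity; field names are illustrative.
@dataclass(frozen=True)
class VMIdentity:
    inventory_name: str  # name at the virtualization management layer
    hostname: str        # name at the operating system level
    mac_address: str     # MAC address of the network card
    subnet: str          # subnet the VM is attached to


def identity_conflicts(original: VMIdentity, copy: VMIdentity) -> List[str]:
    """Return the identity fields that would collide if both VMs were online."""
    conflicts = []
    # Assumption: both VMs are registered with the same management software.
    if original.inventory_name == copy.inventory_name:
        conflicts.append("inventory_name")
    # Hostname and MAC collisions only matter when both VMs share a subnet.
    same_subnet = original.subnet == copy.subnet
    if same_subnet and original.hostname == copy.hostname:
        conflicts.append("hostname")
    if same_subnet and original.mac_address.lower() == copy.mac_address.lower():
        conflicts.append("mac_address")
    return conflicts
```

A perfect copy reports every field as a conflict, while a copy that has been renamed and moved to another subnet reports none.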
To be made useful, copies of VMs have to go through a process known as genericization. The degree to which a virtual machine is genericized depends on the intended purpose of the copy and the context in which it is to be deployed.
A copy of a VM made for disaster recovery purposes may need no genericization, or it may need only its network address changed. A copy of a golden master, template or development environment may need all systemwide unique IDs, names and addresses changed, as well as license keys wiped so that new ones can be entered.
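As a rough sketch of how the degree of genericization might follow from the purpose of the copy, consider the example below. Everything in it is an assumption made for illustration: the `Purpose` enum, the dictionary keys describing the VM and the `genericize` function are hypothetical and do not correspond to any particular product's API.

```python
import copy
import uuid
from enum import Enum
from typing import Optional


class Purpose(Enum):
    DISASTER_RECOVERY = "disaster_recovery"
    DEV_TEST = "dev_test"


def genericize(vm: dict, purpose: Purpose, new_ip: Optional[str] = None) -> dict:
    """Return a modified copy of a VM description; the input is left untouched."""
    clone = copy.deepcopy(vm)

    if purpose is Purpose.DISASTER_RECOVERY:
        # A DR copy may need nothing changed, or only its network address.
        if new_ip is not None:
            clone["ip_address"] = new_ip
        return clone

    # Dev/test or template copies: replace every system-wide unique value.
    suffix = uuid.uuid4().hex[:8]
    clone["name"] = f"{vm['name']}-{suffix}"
    clone["hostname"] = f"{vm['hostname']}-{suffix}"
    clone["system_id"] = str(uuid.uuid4())
    clone["mac_address"] = None   # assumption: the hypervisor assigns a fresh one
    clone["ip_address"] = new_ip  # None here is taken to mean "use DHCP"
    clone["license_key"] = None   # wiped so that a new key can be entered
    return clone
```

Returning a new dictionary rather than editing in place keeps the description of the original VM intact, which mirrors the point that the source VM should carry on unaffected by whatever is done to its copies.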
Today's copy data management software has to deal with all of this. It is part of what sets it apart from simple backups. It has been quite some time since simply taking a copy of data or a VM was "good enough": in the real world we make copies of data because we eventually want to do something with those copies, so the ability to manipulate basic configuration aspects before bringing the copy online is both critical and fundamental.
It is important to bear in mind that copy data management isn't just backups. It certainly can be, but making copies of data and stashing them in a corner somewhere is such a small fraction of the functionality that it is almost irrelevant in the grand scheme of things.
Copy data management is as much about packaging data together as it is about making copies. A single service might consist of a master and slave pair of databases, file storage, several web servers, a load balancer, a security system and a firewall. This could be packaged together as a single entity, then snapshotted, cloned, replicated and copied to dev & test as a unit.
Indeed, an entire department's worth of data can be handled this way, or workloads could be broken into tiers, with each tier getting a different treatment regarding data protection, offsite replication and dev & test availability.
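The idea of packaging workloads and treating each tier differently can be sketched as follows. The tier names, the `TierPolicy` fields and the `plan_actions` function are hypothetical, chosen only to illustrate the concept; real products define their own policy models.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TierPolicy:
    offsite_replication: bool  # replicate copies to another site?
    dev_test_copies: bool      # make copies available to dev & test?


# Hypothetical tiers; every tier is snapshotted, the rest varies.
TIERS: Dict[str, TierPolicy] = {
    "gold": TierPolicy(offsite_replication=True, dev_test_copies=True),
    "bronze": TierPolicy(offsite_replication=False, dev_test_copies=False),
}


@dataclass
class ServicePackage:
    name: str
    tier: str
    members: List[str] = field(default_factory=list)


def plan_actions(package: ServicePackage) -> List[Tuple[str, str]]:
    """List (action, member) pairs so the whole package is handled as a unit."""
    policy = TIERS[package.tier]
    actions = ["snapshot"]
    if policy.offsite_replication:
        actions.append("replicate_offsite")
    if policy.dev_test_copies:
        actions.append("clone_to_dev_test")
    # Each action is applied to every member, never to a subset.
    return [(action, member) for action in actions for member in package.members]
```

For example, a gold-tier package containing a database pair and a web server would yield snapshot, replication and dev & test actions for all three members, while the same package in the bronze tier would only be snapshotted.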
Copy data management, then, is about data lifecycle management: monitoring all copies that are in play, creating copies as needed, modifying copies to be useful, bringing them online, and providing the means to do all of this in an orchestrated, automatable and easy-to-use fashion. It isn't easy, but it is increasingly a basic requirement of the modern datacenter.
Trevor Pott is a guest writer with Catalogic Software. Trevor is a full-time nerd from Edmonton, Alberta, Canada. He splits his time between systems administration, technology writing, and consulting. As a consultant he helps Silicon Valley start-ups better understand systems administrators and how to sell to them. He currently pens a weekly column for The Register, one of the world's largest online science and technology magazines, with a monthly readership of 7.2 million people worldwide. Trevor can be found at http://www.egeek.ca/ for those looking to engage his jedi-like guidance.