The Copy Data Management (CDM) space has gotten quite a bit of attention this year. Storage industry leaders Dell EMC and IBM both announced CDM offerings. A number of start-up players are growing rapidly, Catalogic Software among them! But there remains a fundamental difference in approach among the up-and-comers: should you use in-place CDM or go with an out-of-band, appliance-based model? Let’s look at the differences, but first let’s agree on what CDM means in the first place.
Long story short, Copy Data Management is managing the creation, use, distribution, retention and clean-up of data copies. Different feature sets may be layered on top, such as automation, user self-service options, REST APIs for DevOps integration, and reporting (Catalogic offers all of these features).
The goal of CDM is to eliminate the many manual steps currently used to create and deliver copies – hand-written scripts, for example – and to centralize operations into one tool rather than several. The payoff is greater efficiency in storage consumption and – much more importantly – dramatically lower operational overhead and improved reliability.
In the Out-of-Band (OoB) model, data is moved from the source production storage to a target storage device; that is, a separate storage stack, different from the primary. The data is typically moved by a client agent residing on the host server. The copy method is usually some kind of changed-block model. While this reduces the host impact compared to traditional backup, the impact is not zero: the copy still consumes host CPU and host bandwidth.
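To make the changed-block idea concrete, here is a minimal sketch in Python. It is not any vendor's agent: the function names, the tiny block size and the hash-comparison approach are all assumptions chosen for illustration (real products typically track changes as writes happen rather than rescanning). It does show why some host work remains: something on the host has to find the changed blocks and ship them.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def changed_blocks(previous: bytes, current: bytes,
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block_index: new_contents} for blocks that differ or are new.

    Hypothetical helper: compares per-block hashes of two points in time.
    Assumes the volume never shrinks between copies.
    """
    prev = split_blocks(previous, block_size)
    curr = split_blocks(current, block_size)
    changes = {}
    for index, block in enumerate(curr):
        is_new = index >= len(prev)
        if is_new or hashlib.sha256(prev[index]).digest() != hashlib.sha256(block).digest():
            changes[index] = block
    return changes


def apply_blocks(target: bytes, changes: dict[int, bytes],
                 block_size: int = BLOCK_SIZE) -> bytes:
    """Bring the target copy up to date by writing only the changed blocks."""
    blocks = split_blocks(target, block_size)
    for index, block in sorted(changes.items()):
        if index < len(blocks):
            blocks[index] = block   # overwrite an existing block
        else:
            blocks.append(block)    # the volume grew; append the new block
    return b"".join(blocks)


previous = b"AAAABBBBCCCC"
current = b"AAAAXXXXCCCCDDDD"
changes = changed_blocks(previous, current)
updated_copy = apply_blocks(previous, changes)
```

In this example only two of the four blocks cross the wire, yet the target copy ends up identical to the current source.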
The target device is usually an appliance, though some vendors offer “bring your own storage” software versions as well in the form of a virtualized appliance. It is important to realize that the appliance is a separate storage device with its own controller and firmware. We’ll get back to that later.
When using the data copies (for reporting, dev-test, etc.), a workload server connects to the OoB appliance. If changes need to go back to the production side, they would be copied back to production over the network. Multiple workload servers can connect to a single copy, which is parceled out via snapshot images. This is how efficiency benefits are realized. But the performance characteristics of the OoB appliance will, of course, differ from those of the production storage.
The out-of-band approach can be visualized as follows:
Since Catalogic is an in-place CDM vendor, we’ll discuss this approach in specific Catalogic terms rather than generic terms.
Catalogic’s in-place approach makes use of primary storage copy processes – snapshots, replication, and clones – as well as any data reduction inherent in the array (deduplication, compression). Catalogic ECX functions as a control plane and doesn’t replace any storage features; it makes use of them, providing features like automation, self-service, application integration (for data consistency, log management, etc.), hypervisor integration, and so on. There is no separate storage device involved. Catalogic communicates with both the host server and the production storage, driving copy processes.
In this model, there is no data movement as storage snapshots are created in-place. Workload server access is provided by using the storage cloning features and connecting the snapshot clone to the server. With the in-place approach, recoveries or data copying are far quicker because little or no data movement is involved, and nothing traverses the network.
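If snapshots and clones are new to you, a toy model may help. The Python sketch below is an illustration only: the Volume, Snapshot and Clone classes are hypothetical and say nothing about how any particular array (or Catalogic ECX) is implemented. It simply shows the general copy-on-write idea: a clone shares every block with its snapshot and only stores the blocks a workload server actually changes.

```python
class Volume:
    """A toy production volume: block number -> block contents."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)

    def snapshot(self):
        # A real array records block pointers instead of copying data.
        # Here the Snapshot copies the mapping, not the block contents.
        return Snapshot(self.blocks)


class Snapshot:
    """A read-only, point-in-time view of a volume."""

    def __init__(self, blocks):
        self._blocks = dict(blocks)

    def read(self, block_number):
        return self._blocks[block_number]

    def clone(self):
        return Clone(self)


class Clone:
    """A writable view of a snapshot that stores only its own changes."""

    def __init__(self, parent):
        self._parent = parent
        self._own = {}  # blocks written through this clone

    def read(self, block_number):
        if block_number in self._own:
            return self._own[block_number]
        return self._parent.read(block_number)  # shared with the snapshot

    def write(self, block_number, data):
        self._own[block_number] = data  # copy-on-write: the parent is untouched

    def private_blocks(self):
        return len(self._own)


prod = Volume({0: b"users", 1: b"orders"})
snap = prod.snapshot()
dev = snap.clone()    # handed to a dev workload server
test = snap.clone()   # handed to a test workload server

dev.write(1, b"orders-v2")      # dev changes one block
prod.blocks[0] = b"users-new"   # production keeps moving after the snapshot
```

Two workload servers now have their own writable copies, but the only extra space consumed is the single block that dev changed. Nothing was moved to create either clone.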
The benefits of this approach are apparent.
We should mention that when EMC entered the CDM market, they chose the in-place approach pioneered by Catalogic (they call it “integrated CDM”). IBM also uses this approach.
There is one aspect of the in-place model that should be discussed. Not everyone is comfortable with running non-production workloads on production storage. Even though all-flash arrays have plenty of IOPS to spare and can run non-prod workloads without performance impact (you can’t say that for spinning disk), many organizations simply won’t allow non-prod work to happen on prod storage. Fair enough!
We have an easy solution for that. Since Catalogic ECX can manage replication as well as snapshots, it’s simple enough to replicate to an alternate array (can be in the same storage rack or physically separated) and run non-prod workloads on the alternate array.
In this way, you have complete array separation. A major benefit of this approach is that by using array replication, you maintain all the data reduction from the primary array, thereby greatly decreasing the amount of data replicated. And nothing goes through the host, which limits network and host impact and reduces complexity.
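A quick back-of-the-envelope calculation shows why preserving data reduction matters for replication. The numbers below are made up for illustration (a 100 TB logical data set and a 4:1 reduction ratio); they are assumptions, not measurements, and your array's actual ratio will vary with the data.

```python
def replicated_tb(logical_tb: float, reduction_ratio: float) -> float:
    """Data actually sent when replication preserves the array's data reduction.

    reduction_ratio is the combined dedupe/compression ratio,
    for example 4.0 for 4:1. Both inputs are illustrative assumptions.
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return logical_tb / reduction_ratio


logical_tb = 100.0      # hypothetical logical data set size
reduction_ratio = 4.0   # hypothetical 4:1 data reduction

sent_reduced = replicated_tb(logical_tb, reduction_ratio)   # reduction preserved
sent_full = replicated_tb(logical_tb, 1.0)                  # no reduction at all
savings_pct = 100.0 * (1 - sent_reduced / sent_full)

print(f"Reduced: {sent_reduced} TB, full: {sent_full} TB, saved: {savings_pct}%")
```

With these assumed figures, 25 TB is replicated instead of 100 TB, a 75% saving.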
There’s something else to consider, as we learned at a customer. Consider the case of Dev-Test. In the out-of-band model, your Dev work is being done on a storage system that is fundamentally different than your production storage. It has different software, a different file system, different disks, which all equate to different performance characteristics. So if you are doing development work on a different system, when you move the application back to production you can’t be certain it will work properly. An element of risk remains. And when something does go wrong, it inevitably leads to finger-pointing as development blames IT for an infrastructure problem and IT blames development for problems with their code. Troubleshooting this kind of problem can be nightmarish, because the source of the problem could be anywhere on either side of the equation.
In the Catalogic approach, the same array type is used in prod and non-prod, so all your Dev work is done on a storage stack that is 100% identical to production. Even the copy processes use the native array software. Catalogic does not introduce any software changes; it only manages the process by talking to the array APIs. This is a no-risk infrastructure model for development.
In some ways, this choice in approaches reminds me of the early days of VTLs and deduplication. There was a huge battle about the better approach: inline dedupe during backup, or post-processing dedupe after backup completes. The battle raged for a few years until hardware improvements made it impossible to argue anymore: inline dedupe won, because it was clearly more efficient. I think we’re seeing a similar shakeup in the copy data space. Out-of-band was the first approach that came to market, but in-place is proving to be the better choice (witness both EMC and IBM using it – two companies that know a thing or two about storage).
The differences are straightforward. The out-of-band model requires you to deploy an entirely different storage hardware/software stack with different operational and performance behaviors. And incidentally, the appliances are often quite expensive. But the costs are more than the initial purchase price. You must manage and maintain a separate storage environment just for the sake of your copies. And you also have to consider scale. We’ve seen users with thousands of database instances, each of which spins off 8 or 10 copies for non-prod workloads. How many copy appliances will you need to deliver on this kind of workload demand?
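You can run the scale question through a few lines of Python. The instance and copy counts echo the range mentioned above; the number of copies a single appliance can serve is a purely hypothetical figure that you would need to replace with your vendor's real sizing guidance.

```python
import math


def appliances_needed(db_instances: int, copies_per_instance: int,
                      copies_per_appliance: int) -> int:
    """Rough count of copy appliances for a given non-prod copy demand.

    copies_per_appliance is a hypothetical sizing input, not a vendor spec.
    """
    total_copies = db_instances * copies_per_instance
    return math.ceil(total_copies / copies_per_appliance)


# 2,000 database instances, 8 to 10 copies each, and an assumed
# (made-up) limit of 500 active copies per appliance.
low = appliances_needed(2000, 8, 500)
high = appliances_needed(2000, 10, 500)
print(f"Appliances needed: {low} to {high}")
```

Under these assumptions the answer lands between 32 and 40 appliances, each one a separate storage system to buy, manage and maintain.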
Catalogic’s in-place approach lets you maximize the storage investment you’ve already made (investments in both money and operational know-how). If required by budget limits and allowed by policy, you can in fact run non-production workloads on the production array. That is, with an easy, non-disruptive software tool you can bring automation, self-service and other operational enhancements to your existing storage footprint. If you need to deploy a second array (or third, fourth, fifth…), Catalogic fully manages operations across nodes: production array snapshots for recovery (scheduling, retention, easy recovery workflows, etc.) and replication (scheduling, retention, easy use of data at the second node, etc.) to put data on a separate frame.
Your choices in a copy data management approach are clear, and we think the better choice is clear as well.