Are All Storage Snapshots Equal? Seven Things To Consider

Posted on May 5th, 2017 by Sathya Sankaran

Categories: Data Backup, Data Recovery, ECX

Every primary storage vendor offers the ability to snapshot, replicate and/or vault datasets. But when it comes to end users, snapshots are like gym memberships: everyone’s got them, but very few use them. At Catalogic, our software is uniquely designed and developed to take advantage of these native capabilities of your storage arrays, but we are aware that (like body types and skill levels) not all snapshots are the same. The technology needs good genes, just as much as good form and the ability to perform and persevere.

When choosing primary storage with snapshots in mind, we recommend that you build your own scorecard of what’s important to you and how the various storage arrays compare on each of these variables. Once you make that decision and procure a platform, we hope it’s one of the many that Catalogic supports, so you can maximize your storage investment by leveraging these snapshots.

To help you with your decision process, here are some key parameters to consider for your storage snapshot scorecard. The importance of these variables may vary for your specific data types. However, keep in mind that the alternative to using storage snapshots and native array replication typically means:

  1. A dedicated hardware appliance.
  2. A new hardware stack with different management needs, differing performance characteristics and most likely a system never tuned to run your production at scale.
  3. An infrastructure model where the secondary doesn’t scale with your primary – exacerbated when you use a software-only appliance model.
  4. Higher impact on your servers and applications for your backups, because the host systems will be involved in data movement.
  5. A process that will have to rehydrate, decrypt, re-encrypt and re-compress datasets as part of data movement. That’s a whole lot of effort spent for a net zero effect.
  6. Almost always higher cost compared to maximizing investments already made.
  7. More importantly, a new vendor lock-in introduced and designed to milk the customer forever.

Choose this approach at your own peril. Let’s get back to the Storage Snapshot Scorecard. Here’s what our experience shows us is important:

1. Performance Impact

There are essentially two major types of snapshots: copy-on-write and redirect-on-write. The differences between them and a few other variations are well documented in this article. Key criteria for selecting the right type of snapshot include:

  • Low read performance impact
  • Low write performance impact
  • Mitigation options for performance impact, if any
  • Minimal snapshot consolidation overhead
  • Consistent performance of the array when close to full capacity
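
As a rough illustration of why the snapshot type matters for write performance, here is a minimal Python sketch that counts back-end I/Os for the first overwrite of a block after a snapshot. It assumes the textbook behaviour: copy-on-write reads the original block, copies it to the snapshot area and then overwrites in place, while redirect-on-write simply writes the new data to a fresh location. The function names are hypothetical, and real arrays will differ because of caching, metadata updates and later consolidation work.

```python
def first_write_ios(snapshot_type):
    """Back-end I/Os for the first overwrite of a block after a snapshot.

    Simplified textbook model; metadata updates and caching are ignored.
    """
    if snapshot_type == "copy-on-write":
        # Read the original block, write it to the snapshot area,
        # then write the new data in place.
        return {"reads": 1, "writes": 2}
    if snapshot_type == "redirect-on-write":
        # Write the new data to a new location; the snapshot keeps
        # pointing at the untouched original block.
        return {"reads": 0, "writes": 1}
    raise ValueError(f"unknown snapshot type: {snapshot_type}")


def total_ios(snapshot_type, changed_blocks):
    """Total back-end I/Os when `changed_blocks` blocks are overwritten once."""
    per_block = first_write_ios(snapshot_type)
    return changed_blocks * (per_block["reads"] + per_block["writes"])


for kind in ("copy-on-write", "redirect-on-write"):
    print(kind, total_ios(kind, changed_blocks=1000))
```

The model only covers the write path. Redirect-on-write designs can pay later, for example through fragmentation on reads or consolidation when snapshots are deleted, which is why those items appear in the list above.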

2. Scale Factors and Max Limits

Every snapshot technology has its boundaries. If a vendor says there is no technical limit, always ask what the trade-off is. Typical limits to factor into your selection would include:

  • Maximum number of volumes
  • Maximum number of clones
  • Maximum number of snapshots
  • Maximum number of consistency groups
  • Maximum number of concurrent mounts
  • Minimum space reserve
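
One way to make these limits concrete is to compare them with the number of snapshots your retention schedule would actually keep. The Python sketch below does that arithmetic. The schedule, the volume count and both limits are made-up values for illustration; take the real limits from your vendor's documentation.

```python
# Hypothetical retention schedule: snapshots kept per volume at steady state.
SCHEDULE = {"hourly": 24, "daily": 30, "weekly": 12}

# Hypothetical limits for illustration only.
LIMITS = {"snapshots_per_volume": 256, "snapshots_per_array": 4096}


def check_limits(schedule, volumes, limits):
    """Return a list of limit violations (empty if the plan fits)."""
    per_volume = sum(schedule.values())
    total = per_volume * volumes
    problems = []
    if per_volume > limits["snapshots_per_volume"]:
        problems.append(
            f"{per_volume} snapshots per volume exceeds "
            f"{limits['snapshots_per_volume']}"
        )
    if total > limits["snapshots_per_array"]:
        problems.append(
            f"{total} snapshots per array exceeds "
            f"{limits['snapshots_per_array']}"
        )
    return problems


for line in check_limits(SCHEDULE, volumes=80, limits=LIMITS) or ["within limits"]:
    print(line)
```

The same pattern extends to clones, consistency groups and concurrent mounts by adding more entries to the limits dictionary.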

3. Space Efficiency Impact

The impact of snapshots on deduplication is not well documented by vendors. Some have truly global deduplication and compression. Some limit it to volumes, and some limit it to snapshots within volumes. This certainly impacts deduplication and compression efficiency. If your changes are pretty much the exact same every day, the snapshot should dedupe itself to just one copy with the right technology. The keys to measure are:

  • Space savings on Day 0
  • Space savings after N snapshots with a representative amount of changed data. A suggested value for N is 30.
  • Space savings when multiple versions are in use from the same underlying snapshot.
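
The measurements above can be turned into comparable numbers with a little arithmetic. The Python sketch below assumes space savings are defined as 1 minus physical capacity consumed divided by logical capacity referenced, and that each daily snapshot adds one day's worth of changed data to the logical total. Both definitions and all figures are assumptions for illustration; use whatever your array actually reports.

```python
def space_savings(logical, physical):
    """Fraction of logical capacity saved: 1 - physical / logical."""
    if logical <= 0:
        raise ValueError("logical capacity must be positive")
    return 1 - physical / logical


def logical_after_snapshots(base, daily_change, n):
    """Logical capacity referenced by a base volume plus n daily snapshots."""
    return base + n * daily_change


# Made-up figures in GiB; 'physical' is what the array reports as consumed.
base, daily_change, n = 10_000, 200, 30
day0 = space_savings(base, physical=4_000)
day_n = space_savings(
    logical_after_snapshots(base, daily_change, n), physical=5_500
)
print(f"Day 0 savings: {day0:.1%}")
print(f"Savings after {n} snapshots: {day_n:.1%}")
```

Running the same calculation for each candidate array, with the same test data, gives you a like-for-like figure for the scorecard.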

4. Chaining and Concatenation

In test/dev processes, as well as in continuous integration/continuous deployment (CI/CD) DevOps processes, there is a tremendous emphasis on path dependencies. How you got to a certain point may affect how the application behaves at that point given the same input data. This is often tested by branching, versioning, and rewinding your test data sets. The snapshot technology needs to support chaining and concatenating snapshots to make this possible. Factors to check are:

  • Can I snapshot a snapshot clone? Some vendors call this a 2nd gen snapshot.
  • How many generations of snapshots can be created?
  • What is the performance impact with each generation?
  • What is the impact of deleting parent snapshots on the child versions?
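
To reason about generations and parent/child dependencies, it can help to model the snapshot lineage as a tree. The Python sketch below is a hypothetical in-memory model, not any vendor's API: it reports the generation of each copy and lists which descendants you would need to check before deleting a parent.

```python
class Snap:
    """A node in a hypothetical snapshot/clone lineage tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def child(self, name):
        """Create a snapshot or clone derived from this one."""
        node = Snap(name, parent=self)
        self.children.append(node)
        return node

    def generation(self):
        """Number of ancestors: 0 for the base volume, 1 for its snapshots, ..."""
        depth, node = 0, self
        while node.parent is not None:
            depth += 1
            node = node.parent
        return depth

    def descendants(self):
        """All copies that depend, directly or indirectly, on this one."""
        found = []
        for c in self.children:
            found.append(c)
            found.extend(c.descendants())
        return found


base = Snap("prod-volume")
clone = base.child("test-clone")
second_gen = clone.child("snap-of-test-clone")

print(second_gen.name, "is generation", second_gen.generation())
print("Check before deleting", clone.name, "->",
      [d.name for d in clone.descendants()])
```

Whether deleting a parent actually breaks, promotes or silently consolidates its children depends on the array, which is exactly the question to put to the vendor.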

5. Replication Carryovers

Replication of snapshots is key to having an off-box copy of your critical data. The key factors to consider are:

  • One-to-Many and Many-to-One capabilities
  • Does the replication target volume support snapshots?
  • RPOs possible within the framework.
  • Ability to replicate to low performance, lower cost storage.
  • Ability to retain compression/dedupe/encryption etc. during replication.
  • Does replication support different retention compared to primary storage?
  • Does the replication target behave the same way as the primary array for restores?
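
For the RPO item above, a back-of-the-envelope estimate is often enough to compare options. The Python sketch below assumes scheduled, snapshot-based replication where the worst-case RPO is roughly the replication interval plus the time needed to transfer one interval's changed data. This is a simplification that ignores scheduling delays and compression on the wire, and all names and figures are hypothetical.

```python
def transfer_seconds(changed_gib, throughput_mib_s):
    """Time to ship one interval's changed data at a sustained throughput."""
    if throughput_mib_s <= 0:
        raise ValueError("throughput must be positive")
    return changed_gib * 1024 / throughput_mib_s


def worst_case_rpo_seconds(interval_s, changed_gib, throughput_mib_s):
    """Rough worst case: data written right after a snapshot is unprotected
    until the next snapshot is taken and fully transferred."""
    return interval_s + transfer_seconds(changed_gib, throughput_mib_s)


def keeps_up(interval_s, changed_gib, throughput_mib_s):
    """True if each transfer finishes before the next one is due."""
    return transfer_seconds(changed_gib, throughput_mib_s) <= interval_s


# Made-up example: hourly replication, 45 GiB changed per hour, 100 MiB/s link.
rpo = worst_case_rpo_seconds(3600, changed_gib=45, throughput_mib_s=100)
print(f"Worst-case RPO: about {rpo / 60:.0f} minutes")
print("Replication keeps up:", keeps_up(3600, 45, 100))
```

If the transfer cannot finish within the interval, replication falls further behind with every cycle, so the check in keeps_up matters as much as the RPO figure itself.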

6. Software Costs

Nothing is free, but any cost is acceptable as long as it is not hidden and the pricing is transparent. Discounts at the point of sale can differ significantly from discounts on later add-ons, so it’s best to understand the costs involved from all fronts. Consider the cost of:

  • Snapshot licenses
  • Mirror licenses
  • Unlimited clone licenses
  • Management software licenses

7. Miscellaneous

There are always items that don’t fall into a category, but for some organizations, these may need to be separate categories as well. A few to think about are:

  • At-rest encryption on snapshots and whether replication retains encryption during data transfer.
  • Can you restore from snapshots, revert a snapshot and mount snapshots as read/write copies?
  • Availability of a robust set of [REST] APIs for automation.
  • Ability to manage snapshots across multiple arrays/storage vendors on-premises and in-cloud (wink, wink… there is always a plug somewhere).
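
To pull the seven categories together, here is a minimal Python sketch of a weighted scorecard. The weights, the 1-to-5 rating scale and the two example arrays are all hypothetical; the point is only to show how your own priorities can be turned into a single comparable number per array.

```python
# Hypothetical weights for the seven categories; they should sum to 1.0.
WEIGHTS = {
    "performance": 0.25,
    "scale_limits": 0.15,
    "space_efficiency": 0.20,
    "chaining": 0.10,
    "replication": 0.15,
    "software_costs": 0.10,
    "miscellaneous": 0.05,
}


def weighted_score(ratings, weights=WEIGHTS):
    """Weighted average of per-category ratings (here on a 1-5 scale)."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[c] * ratings[c] for c in weights)


# Made-up ratings for two fictional arrays.
array_a = {"performance": 4, "scale_limits": 3, "space_efficiency": 5,
           "chaining": 2, "replication": 4, "software_costs": 3,
           "miscellaneous": 4}
array_b = {"performance": 3, "scale_limits": 4, "space_efficiency": 3,
           "chaining": 5, "replication": 3, "software_costs": 4,
           "miscellaneous": 3}

for name, ratings in (("Array A", array_a), ("Array B", array_b)):
    print(f"{name}: {weighted_score(ratings):.2f}")
```

Adjust the weights to reflect your own data types and workloads; a test/dev-heavy shop, for instance, would likely weight chaining much higher than shown here.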