Rethinking Snapshots to Accelerate Performance

Posted by Gal Turchinski on June 17, 2019

Previously, we discussed the challenges of providing the strong levels of data protection that are required today. Specifically, outdated storage architectures force organizations to sacrifice application performance and exceed their budgets in order to obtain acceptable levels of data protection and resiliency. Of all data protection capabilities, snapshots are the most CPU- and memory-intensive, and as a result they are a leading cause of performance slowdowns. In this installment, we discuss how StorOne has rewritten snapshots to address this challenge.

The Problem with Traditional Snapshots

Storage snapshots capture the state of a storage system, logical unit number (LUN), volume, or virtual machine at a specific point in time. Effectively, a snapshot is a virtual, read-only, point-in-time copy of a specific data set. Snapshots help ensure data consistency, and they accelerate backup and recovery while requiring far less storage capacity than taking complete full backups of very large storage systems. This is especially important as recovery time objectives (RTOs, the maximum acceptable time to restore service after an outage) and recovery point objectives (RPOs, the maximum acceptable data loss, measured as the time between the last recoverable copy and the outage) become more stringent.

The problem is that traditional storage snapshots halt the storage system's input/output operations per second (IOPS) and throughput while they are taken, and these halts can degrade application performance. Organizations typically take hundreds of snapshots per day to meet the aggressive RTOs and RPOs that application owners and users demand. As a result, storage managers must choose between taking more snapshots for stronger data protection at the cost of significantly lower IOPS and throughput, or preserving performance at the cost of weaker data protection.
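As a back-of-the-envelope illustration of how quickly snapshot counts climb, the Python sketch below assumes one snapshot per volume at a fixed interval. The function name `snapshot_plan` and the inputs (a 5-minute interval, 50 volumes, 7-day retention) are hypothetical, chosen only to make the arithmetic concrete; they are not StorOne figures. The interval also bounds the worst-case RPO, since a failure just before the next scheduled snapshot loses up to one interval of writes.

```python
from datetime import timedelta

DAY = timedelta(days=1)


def snapshot_plan(interval: timedelta, volumes: int, retention: timedelta) -> dict:
    """Hypothetical helper: rough numbers for a fixed-interval snapshot schedule.

    Assumes one snapshot per volume per interval, and that `interval`
    divides both a day and the retention window evenly.
    """
    per_volume_per_day = DAY // interval  # timedelta // timedelta -> int
    return {
        # A failure just before the next snapshot loses up to one interval of writes.
        "worst_case_rpo": interval,
        "per_volume_per_day": per_volume_per_day,
        "total_per_day": per_volume_per_day * volumes,
        "retained_per_volume": retention // interval,
    }


# Hypothetical inputs: 5-minute snapshots on 50 volumes, kept for 7 days.
plan = snapshot_plan(timedelta(minutes=5), volumes=50, retention=timedelta(days=7))
print(plan["per_volume_per_day"], plan["total_per_day"], plan["retained_per_volume"])
# prints: 288 14400 2016
```

Even this modest schedule means 14,400 snapshot operations per day; if each one pauses I/O, the performance trade-off described above follows directly.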

How StorOne Is Different

At StorOne, we wrote the S1 Unified Enterprise Storage (UES) system to allow an unlimited quantity and frequency of snapshots without degrading storage IOPS or throughput, by minimizing the CPU and memory impact of each snapshot. Specifically, our snapshots are policy-based and redirect-on-write (ROW), rather than the more common copy-on-write (COW). With COW, the first write to a block after a snapshot forces the system to read the original block and copy it to a snapshot area before overwriting it in place, so one application write becomes a read plus two writes. With ROW, the new data is written to a free location and the volume's pointer for that block is updated to point to it, while the snapshot keeps pointing to the original, unchanged block. As a result, S1 reduces the computation required per snapshot by as much as 10x compared with most storage and hypervisor snapshot technologies, reducing snapshot CPU and memory overhead by an average of 90%. To further save capacity, snapshots are thin provisioned, and users can change retention requirements as needed. Customers can therefore meet demanding, near-zero RPOs by keeping millions of snapshots on a single volume, without trading off system performance and while keeping storage capacity requirements in check.
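To make the COW/ROW difference concrete, here is a toy Python model. It is our own illustration of the two general techniques, not StorOne's S1 implementation. It assumes a volume is a list of numbered logical blocks and counts only physical data reads and writes; real arrays add caching, metadata journaling, and space reclamation that the sketch leaves out. The class names `CowVolume` and `RowVolume` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CowVolume:
    """Copy-on-write: live data is overwritten in place; each snapshot keeps
    a save area holding original copies of blocks overwritten after it."""
    blocks: list
    snapshots: list = field(default_factory=list)  # one {lba: old data} per snapshot
    reads: int = 0   # physical data reads caused by writes
    writes: int = 0  # physical data writes

    def snapshot(self) -> int:
        self.snapshots.append({})  # no data is copied when the snapshot is taken
        return len(self.snapshots) - 1

    def write(self, lba: int, data: str) -> None:
        latest = self.snapshots[-1] if self.snapshots else None
        if latest is not None and lba not in latest:
            # First overwrite since the latest snapshot: read the old block
            # and copy it into the snapshot's save area.
            self.reads += 1
            latest[lba] = self.blocks[lba]
            self.writes += 1
        self.blocks[lba] = data  # then overwrite the live block in place
        self.writes += 1

    def read(self, lba: int) -> str:
        return self.blocks[lba]

    def read_snapshot(self, snap_id: int, lba: int) -> str:
        # The preserved copy sits in the first save area, from snap_id onward,
        # that holds this block; otherwise the block was never overwritten.
        for saved in self.snapshots[snap_id:]:
            if lba in saved:
                return saved[lba]
        return self.blocks[lba]


@dataclass
class RowVolume:
    """Redirect-on-write: every write lands in a new physical slot and only the
    live pointer map changes; a snapshot is a frozen copy of that map."""
    store: list     # physical slots; append-only here (assumption: no reclamation)
    live_map: dict  # lba -> index into store
    snapshots: list = field(default_factory=list)
    reads: int = 0
    writes: int = 0

    @classmethod
    def from_blocks(cls, blocks) -> "RowVolume":
        data = list(blocks)
        return cls(store=data, live_map={lba: lba for lba in range(len(data))})

    def snapshot(self) -> int:
        self.snapshots.append(dict(self.live_map))  # metadata only, no data I/O
        return len(self.snapshots) - 1

    def write(self, lba: int, data: str) -> None:
        self.store.append(data)                   # redirect the write to a free slot
        self.writes += 1
        self.live_map[lba] = len(self.store) - 1  # repoint the live volume

    def read(self, lba: int) -> str:
        return self.store[self.live_map[lba]]

    def read_snapshot(self, snap_id: int, lba: int) -> str:
        return self.store[self.snapshots[snap_id][lba]]


if __name__ == "__main__":
    for vol in (CowVolume(list("ABCD")), RowVolume.from_blocks("ABCD")):
        snap = vol.snapshot()
        for lba, data in [(0, "a"), (1, "b"), (0, "a2")]:
            vol.write(lba, data)
        print(type(vol).__name__, vol.read(0), vol.read_snapshot(snap, 0), vol.reads, vol.writes)
    # prints:
    # CowVolume a2 A 2 5
    # RowVolume a2 A 0 3
```

In this run, three application writes after one snapshot cost COW two extra reads and two extra writes, while ROW issues exactly three writes and creates the snapshot by copying pointers only. The trade-off, which the sketch ignores, is that ROW leaves superseded blocks that must eventually be reclaimed and can scatter the live volume's data across the disk over time.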

Conclusion

Data protection of any form can affect performance, and snapshots and their management are a major consumer of the resources needed to deliver and sustain high performance. However, several other key data protection and resiliency capabilities must also be revamped to maximize value for customers. In our next post, we will shift gears to discuss how volume-level erasure coding, used instead of RAID, increases data resiliency and accelerates storage rebuilds without taxing storage I/O and throughput.
