Data is the foundation of business advantage in today’s economy. Analytics and artificial intelligence (AI) are helping businesses to uncover new competitive opportunities and to operate in a more efficient and streamlined fashion. At the same time, requirements for data privacy are higher than ever before, because consumers are becoming more discerning about how their information is used, and stringent data privacy regulations are emerging globally. Simply put, it is mission-critical to the business that data be available, accurate, consistent and secure.
The problem is that legacy storage architectures force tradeoffs between cost, data resilience, protection, and performance. Across this blog series, we will explore how StorOne has rewritten the storage algorithms to eliminate these tradeoffs, enabling customers to achieve complete data protection without sacrificing performance or breaking the budget. In this blog, we will level set on what data resilience and data protection are, why they matter, and the challenges in achieving them with legacy storage architectures.
What are Data Resiliency and Data Protection?
Data resiliency and data protection are similar in that both serve to preserve data and keep it available in the event that it is compromised. As a result, both are important to ensuring data integrity: that data is accurate and readily available when the user needs it. The major differences lie in the time it takes to restore the data (the recovery time objective, or RTO), the point in time to which the data is restored (the recovery point objective, or RPO), and the types of data outages that are protected against.
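To make the two objectives concrete, here is a minimal Python sketch that measures a single incident against them. The function names, timestamps, and targets are hypothetical and chosen only for illustration; they do not describe any particular product.

```python
from datetime import datetime, timedelta

def recovery_metrics(last_recovery_point, outage_start, service_restored):
    """Return (data_loss_window, downtime) for one incident.

    data_loss_window is compared against the RPO: how much work was lost.
    downtime is compared against the RTO: how long the service was unavailable.
    """
    data_loss_window = outage_start - last_recovery_point
    downtime = service_restored - outage_start
    return data_loss_window, downtime

def meets_objectives(data_loss_window, downtime, rpo, rto):
    """True only if both the recovery point and recovery time targets were met."""
    return data_loss_window <= rpo and downtime <= rto

# Hypothetical incident: nightly backup at 01:00, outage at 15:30,
# service back at 19:30 the same day.
loss, down = recovery_metrics(
    datetime(2024, 1, 10, 1, 0),
    datetime(2024, 1, 10, 15, 30),
    datetime(2024, 1, 10, 19, 30),
)
print("data loss window:", loss)
print("downtime:", down)
print("meets 24h RPO / 8h RTO:",
      meets_objectives(loss, down, timedelta(hours=24), timedelta(hours=8)))
```

In this made-up incident, a nightly backup satisfies a 24-hour RPO but would miss a one-hour RPO by a wide margin, which is the gap that resiliency features are meant to close.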
Data resiliency is, in effect, the ability of an IT system or an entire data center to return quickly to production after a disruption. It is typically architected into the design of the IT system or data center itself. For example, erasure coding (the process of breaking data into fragments, encoding them with additional redundant fragments, and storing the fragments across multiple drives), parity (or redundant) drives, and storage mirroring (whereby logical storage volumes are copied and maintained across multiple disks) facilitate data availability in the event of a storage drive or node failure. Storage controllers can be deployed in active-passive configurations, in which one node remains on standby to take over input/output (I/O) processing if the active node goes down, or in active-active configurations, in which every node serves I/O and the surviving nodes absorb the load of a failed one. Also, non-volatile dual in-line memory modules (NVDIMMs) can be used to protect against data loss in the event of a power outage, because NVDIMMs retain their contents when power is lost.
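The idea behind parity-based redundancy can be sketched in a few lines of Python. This is a simplified illustration using a single XOR parity fragment, which tolerates the loss of any one fragment; production erasure codes generally use more sophisticated schemes, and all function names here are our own assumptions rather than any vendor's API.

```python
from functools import reduce
from typing import List, Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))

def split_with_parity(data: bytes, n_data: int) -> List[bytes]:
    """Split data into n_data equal fragments plus one XOR parity fragment."""
    frag_len = -(-len(data) // n_data)              # ceiling division
    padded = data.ljust(frag_len * n_data, b"\0")   # zero-pad the last fragment
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(n_data)]
    parity = reduce(xor_bytes, frags)
    return frags + [parity]

def rebuild(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Recover at most one missing fragment (marked None) from the survivors."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost fragment")
    repaired = list(fragments)
    if missing:
        survivors = [f for f in fragments if f is not None]
        # XOR of all surviving fragments (data and parity) equals the lost one.
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired

def read_back(fragments: List[bytes], original_len: int) -> bytes:
    """Reassemble the data fragments (parity is last) and strip the padding."""
    return b"".join(fragments[:-1])[:original_len]

# Simulate losing the "drive" that holds fragment 1, then recover.
data = b"mission-critical data"
stripes = split_with_parity(data, n_data=3)
stripes[1] = None
print(read_back(rebuild(stripes), len(data)) == data)
```

Note that the rebuild step has to read every surviving fragment and recompute the lost one, a small-scale view of why these features consume I/O and compute.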
Data resiliency is designed to facilitate near-zero RPOs and RTOs – that is, continuous availability, and instantaneous recovery to the moment in time immediately before an outage occurred. At the same time, however, data is still at risk from hardware failures, human errors, malicious attacks such as malware, and software corruption. This is where data protection comes in.
Data protection technologies protect against these data loss risks, but at the expense of longer RTOs (typically hours or even days) and coarser RPOs (for example, data might be restorable only from the previous night's backup, rather than to its exact state immediately preceding the outage). Examples of storage data protection technologies include snapshots (which capture the state of an entire storage volume at a specific point in time, and then track subsequent changes to that volume) and clones (full, independent copies of an entire storage volume as of a specific point in time, rather than records of only the changes to the volume).
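The difference between a snapshot and a clone can be illustrated with a toy in-memory volume. The sketch below assumes a copy-on-write design, which is one common way to implement snapshots but not the only one; the `Volume` class and its methods are hypothetical and exist only for this example.

```python
class Volume:
    """A toy block volume with copy-on-write snapshots and full clones."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        # One dict per snapshot: block index -> content at snapshot time.
        # A block is saved only when it is first overwritten after the snapshot.
        self.snapshots = []

    def snapshot(self) -> int:
        """Mark the current state; costs almost nothing until blocks change."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index: int, data: bytes) -> None:
        """Overwrite a block, first preserving the old content for snapshots."""
        for saved in self.snapshots:
            # setdefault keeps the earliest saved version for each snapshot.
            saved.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap_id: int):
        """Reconstruct the volume as it looked when the snapshot was taken."""
        saved = self.snapshots[snap_id]
        return [saved.get(i, block) for i, block in enumerate(self.blocks)]

    def clone(self) -> "Volume":
        """Full independent copy of every block (without snapshot history)."""
        return Volume(self.blocks)

vol = Volume([b"a", b"b", b"c"])
snap = vol.snapshot()      # stores nothing yet
copy = vol.clone()         # duplicates all three blocks immediately
vol.write(1, b"B")         # the snapshot now preserves the old block 1
print(vol.read_snapshot(snap))
print(copy.blocks)
print(vol.blocks)
```

In this sketch every write after a snapshot carries extra bookkeeping, while a clone pays its full copy cost up front.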
The Problem with Enabling Data Resiliency and Data Protection
With traditional storage architectures, snapshots, RAID, mirroring, replication and other data resiliency and protection technologies all come at a significant cost in input/output operations per second (IOPS) and throughput. That is to say, they are computationally intensive, and when they are running, they dramatically slow the workload's performance. As a result, the customer is forced to choose among data integrity (which is fundamental to the business's success), performance (which is also very important to business operations) and cost. Customers either cannot achieve the levels of performance that their premium storage resources are capable of, or they are forced into expensive workarounds such as adding customized application-specific integrated circuits (ASICs) or dynamic random-access memory (DRAM) caching.
Computationally intensive data resilience and protection capabilities are becoming more and more business-critical, at a time when maximizing the utilization of CPU, storage capacity and memory is also becoming paramount. A rewrite of core storage algorithms is needed to deliver data integrity while keeping storage infrastructure cost and complexity in check and ensuring the required levels of performance. In our next blog, we will discuss snapshots in more detail.