Data is the foundation of business advantage in today’s economy. Analytics and artificial intelligence (AI) are helping businesses to uncover new competitive opportunities and to operate in a more efficient and streamlined fashion. At the same time, requirements for data privacy are higher than ever before, because consumers are becoming more discerning about how their information is used, and stringent data privacy regulations are emerging globally. Simply put, it is mission-critical to the business that data be available, accurate, consistent and secure.
The problem is that legacy storage architectures force tradeoffs between cost, data resilience, protection, and performance. Across this blog series, we will explore how StorONE has rewritten the storage algorithms to eliminate these tradeoffs, enabling customers to achieve complete data protection without sacrificing performance or breaking the budget. In this blog, we will level set on what data resilience and data protection are, why they matter, and the challenges in achieving them with legacy storage architectures.
What are Data Resiliency and Data Protection?
Data resiliency and data protection are similar in that both serve to preserve data and keep it available in the event that it is compromised. As a result, both are important when it comes to ensuring data integrity, meaning that data is accurate and readily available when the user needs it. The major differences lie in the time it takes to restore the data (the recovery time objective, or RTO), the point in time to which the data is restored (the recovery point objective, or RPO), and the types of data outages that are protected against.
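To make the two measures concrete, here is a minimal sketch that computes the data-loss window and the downtime for a single outage. The function name and the timestamps are hypothetical and chosen only for illustration. Strictly speaking, RPO and RTO are targets, and the values computed here are what you would compare against those targets.

```python
from datetime import datetime


def recovery_metrics(last_recovery_point, outage_start, service_restored):
    """Return (data_loss_window, downtime) for one outage.

    data_loss_window is compared against the RPO target;
    downtime is compared against the RTO target.
    """
    data_loss_window = outage_start - last_recovery_point
    downtime = service_restored - outage_start
    return data_loss_window, downtime


# Hypothetical timeline: nightly backup at midnight, outage mid-afternoon.
last_backup = datetime(2019, 3, 1, 0, 0)
outage = datetime(2019, 3, 1, 14, 30)
restored = datetime(2019, 3, 1, 18, 30)

loss_window, downtime = recovery_metrics(last_backup, outage, restored)
print(f"Data written in the last {loss_window} is lost; service was down for {downtime}.")
```

In this made-up timeline, everything written in the 14.5 hours since the last backup is lost and the service is down for 4 hours. A resilient design aims to push both numbers toward zero.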
Data resiliency is, in effect, the ability of an IT system or an entire data center to return quickly to production after a disruption. It is typically architected into the design of the IT system or data center itself. For example, erasure coding (the process of breaking a file up into redundant fragments and storing those fragments across multiple drives), parity (or redundant) drives, and storage mirroring (whereby logical storage volumes are copied and maintained across multiple disks) facilitate data availability in the event of a storage drive or node failure. Storage controllers can be deployed in redundant configurations, either active-passive (one node remains continuously on standby) or active-active (all nodes serve requests), so that a surviving node continues to process input/output (I/O) operations if another node goes down. Also, non-volatile dual in-line memory modules (NVDIMMs) can be used to protect against data loss in the event of a power outage, because NVDIMMs retain their content when powered off.
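As an illustration of how a parity drive allows lost data to be rebuilt, here is a minimal sketch using single XOR parity. It is a generic textbook technique, not a description of any particular vendor's implementation, and the block contents and drive layout are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data drives, each holding one equal-sized block.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]

# The parity drive stores the XOR of all data blocks.
parity = xor_blocks(data_blocks)

# Simulate losing the second drive, then rebuild its block
# from the surviving data blocks plus the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_blocks[1])
```

Single parity tolerates the loss of one drive. Erasure coding generalizes the idea so that several fragments can be lost and still be reconstructed, at the cost of more computation per write.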
Data resiliency is designed to facilitate near-zero RPOs and RTOs – that is, continuous availability, and instantaneous recovery to the moment in time immediately before an outage occurred. At the same time, however, data is still at risk from hardware failures, human errors, malicious attacks such as malware, and software corruption. This is where data protection comes in.
Data protection technologies protect against these data loss risks, but at the expense of longer RTOs (typically hours or even days) and less granular RPOs (for example, data might be restorable only from the previous night’s backup, rather than to its exact state immediately preceding the outage). Examples of storage data protection technologies include snapshots (which mark the state of an entire storage volume at a specific point in time, and then track subsequent changes to that storage volume over time) and clones (which capture a full copy of an entire storage volume at a specific point in time, rather than capturing only changes to the storage volume).
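The difference between a snapshot and a clone can be sketched with a toy in-memory volume. This sketch assumes a copy-on-write snapshot scheme, which is one common approach and not necessarily how any given storage product works. The `Volume` class and its method names are hypothetical.

```python
class Volume:
    """Toy block volume: a dict mapping block number to block contents."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        # Each snapshot stores only the original contents of blocks
        # that were overwritten after the snapshot was taken.
        self.snapshots = []

    def snapshot(self):
        """Mark a point in time; nothing is copied yet."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, block_no, data):
        """Copy-on-write: preserve the old contents before overwriting."""
        for snap in self.snapshots:
            if block_no not in snap:
                snap[block_no] = self.blocks.get(block_no)
        self.blocks[block_no] = data

    def read_snapshot(self, snap_id):
        """Rebuild the point-in-time view from current blocks plus preserved ones."""
        view = dict(self.blocks)
        for block_no, old in self.snapshots[snap_id].items():
            if old is None:
                view.pop(block_no, None)  # block did not exist at snapshot time
            else:
                view[block_no] = old
        return view

    def clone(self):
        """A clone is a full, independent copy of every block."""
        return Volume(self.blocks)


vol = Volume({0: "a", 1: "b"})
snap_id = vol.snapshot()   # cheap: stores nothing until a block changes
full_copy = vol.clone()    # expensive: duplicates the whole volume
vol.write(1, "B")

print(vol.snapshots[snap_id])      # only the changed block was preserved
print(vol.read_snapshot(snap_id))  # original state is still recoverable
```

Note that every write now carries extra bookkeeping for each open snapshot. This kind of per-write overhead is one reason snapshot-heavy configurations can slow down application I/O.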
The Problem with Enabling Data Resiliency and Data Protection
With traditional storage architectures, snapshots, RAID, mirroring, replication and other data resiliency and protection technologies all come at a significant cost in input/output operations per second (IOPS) and throughput. That is to say, they are computationally intensive, and when they are running, they dramatically slow the workload’s performance. As a result, the customer is forced to choose among data quality (which is fundamental to the business’s success), performance (which is also very important to business operations) and cost. They either cannot achieve the levels of performance that their premium storage resources are capable of, or they are forced into expensive workarounds such as adding customized application-specific integrated circuits (ASICs) or dynamic random-access memory (DRAM) caching.
Computationally intensive data resilience and protection capabilities are becoming more and more business critical, at a time when maximizing the utilization of CPU, memory and storage capacity is also becoming paramount. A re-write of core storage algorithms is needed to deliver data integrity, while keeping storage infrastructure costs and complexities in check and ensuring required levels of performance. In our next blog, we will discuss snapshots specifically in more detail.
We live in an age of tremendous storage hardware innovation.
Solid-state drives (SSDs) that are capable of delivering more than 100,000 input/output operations per second (IOPS) in raw performance have hit the market. The reality, though, is that customers are not getting the full benefits of these innovations. They are only able to obtain a fraction of these levels of performance from their storage arrays, because the storage array is bogged down by wildly inefficient legacy storage software algorithms.
Most storage vendors take 12-36 months to come to market since they simply integrate their inefficient legacy storage software code base with a couple of new features and faster hardware. This does not fix the storage hardware utilization problem, because it does not address the underlying root of the problem. True innovation that re-writes core storage algorithms is required to tackle this issue.
StorONE has taken the innovation, rather than the integration, approach. At StorONE, we have spent six years re-writing the storage software stack from the ground up so that customers can enjoy the full potential of modern hardware capabilities. Through high performance erasure coding and algorithm techniques, we created our Unified Enterprise Storage (UES) platform, S1. S1 unlocks previously unobtainable levels of storage system efficiency – something that we call Total Resource Utilization (TRU). Through changing the efficiency equation, S1 enables you to utilize significantly more of your hardware’s capabilities. The same results are achieved with far less hardware.
StorONE’s innovation-first culture has led to a total of 33 patents that have been granted in only six years, and we have tens of additional patent applications pending. Our large number of patents reflects our heavy investment in research and development, as well as our focus on re-architecting the storage stack for levels of efficiency that are, quite simply, transformative for our customers’ businesses.
Most recently, in the first quarter of 2019, StorONE was granted two new patents. Patent No. 10198321, entitled “System and method for continuous data protection,” recognized our groundbreaking approach to integrated data retention without compromising on performance, and Patent No. 10169021, entitled “System and method for deploying a data-path-related plug-in for a logical storage entity of a storage system,” addresses creating, verifying and executing tasks that ensure availability of data in distributed storage systems.
The new patents reflect the core focus of StorONE and of S1, which is to enable customers to obtain high-performance storage without sacrificing on enterprise-class data protection services, at a cost point that is in fact lower than both legacy storage architectures and cloud storage services.
At the core of how we are removing legacy constraints is the fact that we have designed our data services to be highly computationally efficient, so that they do not hog valuable CPU cycles that should be spent on serving the application itself. The customer gets more value out of each core and out of each gigabyte of storage capacity, because they do not need to overprovision either one to obtain the levels of performance that they need. S1 supports the full range of high-performance and high-capacity use cases, including all-flash and secondary storage, with a fully flexible, mix-and-match approach that provides the freedom for you to quickly integrate the newest innovations and to customize your infrastructure according to your unique application needs.
Storage managers have always been pressured to do more with less.
That pressure intensifies as the volume of data explodes, as the number of performance-hungry workloads grows, and as faster-performing but also more expensive storage technologies such as solid-state drives (SSDs) and non-volatile memory express drives (NVMe) enter the equation. Delivering the throughput, processing power, and storage capacity required by today’s workload ecosystem without breaking the bank necessitates new levels of hardware utilization that are not possible with legacy storage software.
Amazing innovations have occurred over the past five to ten years within storage media; for example, there are enterprise SSDs available on the market today that are capable of achieving hundreds of thousands of input/output operations per second (IOPS). However, most storage software stacks have not been re-written to utilize these drives’ abilities, resulting in wild inefficiencies that the customer ends up paying for.
In the era of Moore’s Law and hard-disk drives (HDDs), storage software programmers did not need to worry about writing efficient code. Central processing unit (CPU) performance was accelerating at a rate with which storage media simply could not keep pace, so bloated storage software could be masked by significantly lagging HDD performance. Programmers prioritized getting their software to market as quickly as possible, versus taking the extra time that would be necessary to write more efficient code.
Today, the tables have turned: CPU performance gains have become incremental, while storage media performance is accelerating and density is increasing drastically. The end result is storage arrays that deliver only 20% or less of the IOPS that the storage media is capable of, forcing customers to dramatically overbuy to meet storage performance or capacity needs.
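A quick back-of-the-envelope calculation shows what that overbuying looks like. This is a simplified sketch: the workload target of 1,000,000 IOPS and the 100,000 IOPS per drive are hypothetical round numbers, the function name is made up for illustration, and real sizing also depends on capacity, redundancy, and workload mix.

```python
def drives_needed(required_iops, per_drive_iops, efficiency_pct):
    """Drives required when the array delivers only efficiency_pct of raw drive IOPS.

    Uses integer ceiling division to avoid floating-point rounding.
    """
    numerator = required_iops * 100
    denominator = per_drive_iops * efficiency_pct
    return (numerator + denominator - 1) // denominator


REQUIRED_IOPS = 1_000_000   # hypothetical workload target
PER_DRIVE_IOPS = 100_000    # hypothetical raw SSD performance

at_20_pct = drives_needed(REQUIRED_IOPS, PER_DRIVE_IOPS, 20)
at_100_pct = drives_needed(REQUIRED_IOPS, PER_DRIVE_IOPS, 100)

print(f"Drives needed at 20% efficiency: {at_20_pct}")
print(f"Drives needed at 100% efficiency: {at_100_pct}")
```

Under these assumptions, an array that extracts only 20% of the media's raw performance needs five times as many drives to hit the same IOPS target as one that extracts all of it.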
Extracting as much functionality and value as possible from every CPU cycle requires a rethink and a ground up re-write of the storage controller to serve as a consolidation engine.
Consolidating to a single interface for the storage operating system and services streamlines deployment and management of the underlying storage infrastructure.
- Storage managers can more easily make changes. For example, they do not need to worry about dealing with complex RAIDs when a capacity expansion is required.
- Furthermore, consolidation reduces the impact of notoriously CPU- and memory-intensive storage software services such as snapshots, thus freeing up IOPS and throughput for the application itself.
- Lowering the storage controller’s CPU requirements enables the system to use fewer drives and lower-priced CPUs, all without sacrificing performance or storage services.