Life Without StorONE
Poor and Inconsistent Performance
The performance of storage environments is often overlooked until it becomes a problem. Poor performance can affect your company’s ability to maintain an efficient work environment, and can even result in a poor customer experience. Below, we walk through the most common problems and workarounds, and then go into detail about how we’ve built a storage data platform that solves these problems, giving you the highest performance.
Most Common Causes of Poor Data Storage Performance
Misleading published and benchmark performance specs
You’re consistently disappointed that your data storage system’s performance never comes close to the vendor’s published specs.
An increase in capacity utilization results in a decrease in data storage performance
You constantly see that when your capacity utilization exceeds 50%, your performance begins to drop noticeably and falls off a cliff at 80% utilization. Your vendor probably even recommends never exceeding 80% capacity utilization regardless of the amount of capacity installed. You always have at least 20% capacity that can never be used. You’re probably thinking ‘Why am I paying for 100% capacity when I can only use 80% of it?!’ It’s a good question.
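The cost of that stranded 20% is easy to quantify. A minimal sketch of the arithmetic, using hypothetical capacity and pricing figures (the 80% ceiling is the vendor guidance cited above):

```python
# Illustrative arithmetic: the cost of capacity you pay for but cannot use.
# raw_tb and cost_per_tb are hypothetical example figures.
raw_tb = 100                 # installed raw capacity, TB
usable_fraction = 0.80       # vendor-recommended utilization ceiling
cost_per_tb = 50.0           # hypothetical $ per installed TB

usable_tb = raw_tb * usable_fraction
stranded_tb = raw_tb - usable_tb
effective_cost_per_usable_tb = (raw_tb * cost_per_tb) / usable_tb

print(f"Usable: {usable_tb:.0f} TB, stranded: {stranded_tb:.0f} TB")
print(f"Effective cost per usable TB: ${effective_cost_per_usable_tb:.2f}")
```

At an 80% ceiling, every usable terabyte effectively costs 25% more than the list price per installed terabyte.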
RAID drive rebuild negatively impacts performance
Your RAID 5 rebuild of a single failed drive reduces data storage performance by 50%. Your RAID 6 rebuild of two concurrent drive failures reduces data storage system performance by as much as 80%. Larger capacity drives (15TB, 30TB, and 128TB SSDs; 18TB, 20TB, and 22TB HDDs) take much more time to rebuild, and that time kills performance and puts the data in that RAID group at risk of loss.
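Rebuild exposure scales directly with drive capacity. A rough sketch of the estimate, assuming the rebuild is limited by a sustained write rate to the replacement drive (the rates here are illustrative, not vendor specs):

```python
# Rough rebuild-time estimate for a single failed drive, assuming the
# rebuild is bounded by a sustained write rate to the replacement drive.
def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    capacity_mb = capacity_tb * 1_000_000  # TB -> MB (decimal units)
    return capacity_mb / rebuild_mb_per_s / 3600

# A 20TB HDD rebuilt at 100 MB/s takes ~56 hours; throttled to 50 MB/s
# to preserve foreground IO, ~111 hours of exposure to a second failure.
print(f"{rebuild_hours(20, 100):.1f} h at full speed")
print(f"{rebuild_hours(20, 50):.1f} h throttled")
```

Halving the rebuild rate to protect foreground performance doubles the window in which a second failure can cause data loss.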
Snapshots cause a hit to your performance
Snapshots are nearly instantaneous; however, each consumes a lot of CPU resources and memory. So when you need frequent snapshots of different volumes that target a small recovery point objective or RPO – amount of data that can be lost – there is a severe IO performance reduction while the snapshots are occurring. The more snapshots you take, the greater the reduction in resources available for reads and writes.
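The tighter the RPO, the more snapshot operations compete with foreground IO. A minimal sketch of the relationship, with illustrative volume counts:

```python
# Worst-case data loss equals the interval between snapshots, so a
# tighter RPO forces more frequent snapshots. Figures are illustrative.
def snapshots_per_day(rpo_minutes: float) -> float:
    return 24 * 60 / rpo_minutes

per_volume = snapshots_per_day(15)   # a 15-minute RPO: 96 snapshots/day
total = per_volume * 50              # across 50 volumes: 4,800 per day
print(per_volume, total)
```

Each of those thousands of daily snapshot operations draws CPU and memory away from reads and writes.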
Deduplication and/or compression are supposed to lower costs but with a performance penalty
You want to take advantage of the potential capacity and cost savings of deduplication and compression. But both require considerable computational resources on the storage system, increasing latency and reducing performance for both writes and reads, since deduplicated and/or compressed data must be rehydrated to be read. If you have data types such as video or relational database data that do not deduplicate or compress well, the performance hit is amplified even further: these data types slow down the data reduction while yielding little savings. The licensing, subscription, or as-a-service cost of deduplication and compression can end up being greater than any potential savings.
Max IOPS and throughput don’t add up
Your data storage system’s maximum IOPS and throughput are significantly less than the sum total of your drives. If you have an all-flash system, it’s especially noticeable, but even in hybrid or HDD systems there’s a huge shortfall. The bottleneck is the storage controller or server and inefficient data storage software.
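The gap is easy to see when you compare the system spec to the drive specs. A sketch of the efficiency check, with illustrative per-drive and system figures:

```python
# Efficiency check: system max IOPS vs. the sum of its drives.
# Per-drive and system figures are illustrative examples.
drive_iops = 800_000          # a typical enterprise NVMe SSD read spec
drive_count = 24
system_max_iops = 1_000_000   # what the array actually delivers

raw_sum = drive_iops * drive_count
efficiency = system_max_iops / raw_sum
print(f"Drives can supply {raw_sum:,} IOPS; the system delivers "
      f"{system_max_iops:,} ({efficiency:.1%} of the raw total)")
```

When the delivered fraction is in the single digits, adding or upgrading drives cannot fix the shortfall; the controller and software are where the IOPS are lost.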
Data storage software takes resources away from IO
Resources consumed by the data storage software are resources unavailable for IO. That limits the maximum performance in IOPS and throughput.
Most Common Workarounds When Data Performance Drops
Forklift replace the current data storage system with an all-flash SSD data storage system
Most storage vendors will simply say to replace the current system with an all-flash one. That’s not practical if your current data storage system is not yet amortized, you have to break a lease, or you will have penalties for changing out a subscription or as-a-service contract. The cost can be excessive.
It also assumes your performance problem is the data storage media – either HDDs or older SSDs. That may be the case, or it may not. If it is, this workaround will help. If it’s not, your money is wasted.
Faster SSDs with more IOPS and throughput
Your fastest SSDs tend to be NVMe, and the fastest NVMe SSDs are based on single-level cell (SLC) flash. NVMe runs on PCIe, and PCIe slots are severely limited in any given storage controller or server. Those slots must also be shared with NICs, DPUs, adapters, and drives, which puts a hard limit on the number of NVMe SSDs. The workaround for this is to use the NVMe SSDs as a caching tier for hot data.
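The hard limit comes from the PCIe lane budget. A minimal sketch of the accounting, using illustrative lane counts (NVMe drives typically consume x4 lanes each; NIC and adapter allocations here are example assumptions):

```python
# PCIe lane budgeting in a single storage server (illustrative figures).
cpu_lanes = 128               # e.g. lanes exposed by one modern server CPU
nic_lanes = 2 * 16            # two x16 NICs or DPUs
adapter_lanes = 16            # HBAs, boot, management adapters
lanes_per_nvme = 4            # typical per-drive NVMe link width

available = cpu_lanes - nic_lanes - adapter_lanes
max_nvme_drives = available // lanes_per_nvme
print(f"{available} lanes left -> at most {max_nvme_drives} NVMe SSDs")
```

Every NIC or DPU added to the server directly subtracts from the number of NVMe SSDs it can host.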
The problem is that your current data storage system may not be able to take advantage of the latest high-performance SSDs, and may not be set up for a caching tier; more likely, it supports neither.
Faster drives assume that the drives are the bottleneck. More often than not it’s your data storage controllers or servers. This workaround generally only nominally moves the performance needle, especially when the SSD performance is not the performance bottleneck. It’s also a high cost for not much gain.
Add more drive shelves and drives
This only works when your performance bottleneck is from too few drives. The far more common situation is your data storage system performance is less than the sum of your drives. Adding more drives only increases capacity and cost.
Use smaller capacity drives and RAID groups
Smaller capacity drives increase costs significantly because it means more drives to reach your required capacity, more shelves, more rack units, and a much reduced overall system capacity. Smaller RAID groups also raise costs with additional parity drives that are not available for writes. Neither solves the problem. They merely somewhat mitigate it at a very high cost.
Run RAID rebuilds in background
This mitigates the negative performance impact of a RAID-based drive rebuild, but it does not eliminate it, and it makes the rebuild take much longer. Longer rebuilds increase the risk of data loss for the entire RAID group.
Use DRAM caching
DRAM has the lowest latency of any media. However, your data storage controller/server DRAM has severe limits: it tops out at approximately 1.5TB per socket (CXL will change that in a few years). There are several other problems with this workaround. Cache coherency must be maintained between data storage controllers/servers, otherwise non-recoverable errors will occur in relational databases, and that coherency overhead marginalizes DRAM’s performance advantage. DRAM also requires a battery backup or supercapacitor to move data to persistent storage during a power outage, and it carries a much higher probability of data corruption and loss. All of this makes DRAM caching orders of magnitude more expensive than the fastest storage.
Faster storage networking
This workaround assumes your performance bottleneck is the storage network. It can put more pressure on your application server and data storage CPUs, and it requires faster NICs or DPUs for all of your application servers, plus new switches and transceivers. It may have little to no impact on your overall latency, IOPS, and throughput for undue additional cost.
Turn off deduplication and compression except for backups and archiving
The workaround here is to manually turn it off for all your performance applications where the savings are nominal. Leave it on for those applications where performance is not that important. This only works if your data storage system has the granularity of enabling it to be turned on or off by volume, file system, or object store. But all performance applications will still be negatively impacted by the deduplication and/or compression and rehydration of the non-performance applications.
How StorONE Enables High-Performance Data Storage
Optimized Storage IO Engine: Cacheless Architecture
Our Cacheless Architecture enables you to extract maximum performance from modern storage hardware, servers, and media. We modernized and completely rewrote the storage software stack from the ground up. Taking this approach eliminated the layers of code that added huge latencies. Storage services such as vSNAP snapshots, vSecure encryption, vRAID erasure coding, vReplicate replication, and our Virtual Storage Container™ have no impact on performance. This enables the StorONE Storage Data Platform to achieve over 1 million IOPS with 12 NVMe flash drives in a single storage server/controller.
High capacity utilization without performance degradation
You can take advantage of much more of your data storage capacity without degrading performance. Customers report capacity utilization in excess of 90% with the same performance.
The cacheless architecture shortens the write path, which accelerates performance. Your data is written close to max storage media speeds, eliminating the need for a write cache. The results are extremely high-performance writes with the industry's highest level of data integrity.
The StorONE engine’s vRAID erasure coding rebuilds failed drives considerably faster than any other RAID implementation. Rebuild times are fast enough to return a 20TB HDD to full protection within 3 hours. More importantly, it does not reduce IO performance while rebuilds are occurring.
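A quick sanity check on that rebuild figure shows why it cannot come from a conventional one-to-one rebuild onto a spare:

```python
# Implied sustained rebuild rate for 20TB restored to protection in 3 hours.
capacity_tb = 20
hours = 3
rate_gb_per_s = capacity_tb * 1000 / (hours * 3600)  # TB -> GB, hours -> s
print(f"~{rate_gb_per_s:.2f} GB/s sustained")
```

That rate is well beyond the sustained write speed of any single HDD, which is consistent with an erasure-coded rebuild spreading the work across many drives in parallel rather than funneling it onto one replacement.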