IT professionals need better high-availability during pandemics like COVID-19. Better high-availability (HA) is the ability to continue operations at full performance, even with failed hardware. COVID-19 and any future pandemic lead to work-from-home orders, which means data centers become truly ‘lights-out’. Those work-from-home orders also apply to storage administrators, who can’t get into their data centers.
The data center and the storage infrastructure that supports it become even more critical during work-from-home situations. They are the circulatory system that keeps employees connected and productive. The storage infrastructure must stay up and running even through a prolonged hardware failure, because work-from-home orders mean that IT can’t get to the data center to address broken hardware.
In our survey, “How has COVID-19 Impacted Your Data Center?” 100% of respondents indicated that the pandemic had impacted data center accessibility, with 40% stating they had either no access to the data center or extremely limited access. The limited access means that broken hardware has to wait to be repaired. (If you haven’t taken our survey yet, please do; we’d like to understand how COVID-19 is impacting your data center operations.)
Why Better High-Availability During Pandemics?
Regardless of the quality of hardware your organization buys, it will eventually break. Therefore, storage systems deploy techniques like high-availability and RAID to make sure data isn’t lost if there is a hardware failure. These techniques also give IT the hours needed to replace the broken hardware. Whatever it is that causes hardware to break (fate, Murphy’s Law, bad luck), it doesn’t take time off during a pandemic, and we have countless examples of organizations dealing with broken hardware that they can’t fix during this time.
Better High-Availability During Pandemics Needs More than RAID
The reality is that during a pandemic, IT needs to improve all the techniques that storage systems employ to recover from failure. The most common component to fail on a storage system is the drive. Both flash and hard disk drives can fail, and so protection from that eventuality is a table-stakes requirement of any storage system. Most systems deploy RAID to protect against media failure and use hot spares to help an organization return to a protected state without physically being present.
Even in times when access to the data center is immediate, relying on a hot spare has risks. The problem arises once the system has used the hot spare to replace a failed drive: with no spare left, the customer is exposed to data loss if further failures occur. Most storage systems do not do an adequate job of reporting and alerting on a zero-hot-spare condition. The problem of hot spare replacement gets worse if you can’t get to the data center at all, or can only get into it once per week.
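To make the hot spare risk concrete, here is a minimal Python sketch of how protection erodes as drives fail and nobody can swap them. It is an illustration only, not any vendor's implementation: the `RaidGroup` class, its fields, and the status names are hypothetical, and it assumes a rebuild onto a spare completes instantly.

```python
from dataclasses import dataclass, field


@dataclass
class RaidGroup:
    """Hypothetical model of a RAID group that nobody can physically service."""
    parity_drives: int             # failures the group survives without a spare
    hot_spares: int                # spares still available
    unreplaced_failures: int = 0   # failed drives with no spare to take over
    alerts: list = field(default_factory=list)

    def fail_drive(self) -> None:
        """Simulate one drive failure (rebuild to a spare is assumed instant)."""
        if self.hot_spares > 0:
            self.hot_spares -= 1
            if self.hot_spares == 0:
                # The condition the text says is often reported poorly.
                self.alerts.append("zero hot spares remaining")
        else:
            self.unreplaced_failures += 1

    def status(self) -> str:
        if self.unreplaced_failures > self.parity_drives:
            return "DATA_LOSS"
        if self.unreplaced_failures > 0:
            return "DEGRADED"
        if self.hot_spares == 0:
            return "PROTECTED_NO_SPARE"
        return "PROTECTED"
```

Calling `fail_drive()` repeatedly on `RaidGroup(parity_drives=1, hot_spares=1)` walks the group from protected, to protected with no spare, to degraded, and finally to data loss, which is the sequence an unattended data center is exposed to.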
At StorONE, our vRAID takes RAID data protection to the next level. Not only does vRAID allow extreme flexibility in drive redundancy settings and very rapid rebuilds, but it also operates without the need for hot spares. vRAID will rebalance the data segments from a failed drive to any available drive in the system. If another drive fails, it will do the same thing. Customers can run for months with failed drives in their storage system and still be in a 100% protected state. Also, none of these efforts impact storage performance.
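The general idea of rebuilding without a dedicated spare can be sketched in a few lines of Python. This is not vRAID's actual algorithm, which the post does not describe. The function name, the segment lists, and the per-drive `capacity` limit are all assumptions made for illustration, and the sketch treats the failed drive's segments as already reconstructed from redundancy. A real system would also avoid placing two members of the same stripe on one drive.

```python
def rebalance_after_failure(drives, failed, capacity):
    """Spread a failed drive's segments over the surviving drives.

    drives:   dict mapping drive name -> list of segment ids (hypothetical layout)
    failed:   name of the drive that failed
    capacity: maximum number of segments any one drive can hold
    Returns a new layout; the input dict is left untouched.
    """
    layout = {name: list(segs) for name, segs in drives.items()}
    orphaned = layout.pop(failed)
    if not layout:
        raise RuntimeError("no surviving drives")
    for segment in orphaned:
        # Pick the least-full surviving drive so the load stays even.
        target = min(layout, key=lambda name: len(layout[name]))
        if len(layout[target]) >= capacity:
            raise RuntimeError("no free capacity left for rebuild")
        layout[target].append(segment)
    return layout
```

Because the function only needs free capacity somewhere in the pool, it can be called again after a second failure, which mirrors the "if another drive fails, it will do the same thing" behavior described above.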
* vRAID is one star in StorONE’s Five Star Safety Data Availability Features. Learn more about vRAID and the other five-star team members by downloading our white paper, “How to Reduce Backup Costs with Better Primary Storage.”
Better High-Availability Requires More Than Basic HA
Modern name-brand servers don’t fail very often, and each has a fair amount of redundancy built into it. When IT counts on these systems for storage servers, the need for better availability becomes obvious. Better high-availability during pandemics means more than sophisticated RAID. It requires new levels of high availability (HA). At a minimum, the system needs to ensure that one storage controller’s loss won’t mean a loss of access. A second system needs to take over.
During a pandemic, when IT can’t get to the data center quickly, or at all, basic HA may not be enough. You need non-stop operations. Better HA means synchronously replicated systems, where one HA storage system is synchronously replicated to a second HA system within the same data center. If one node fails, the other node can continue running, or IT can switch operations to the other two-node system. For the ultimate protection, the organization may want to replicate again to an off-site location.
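As a rough illustration of synchronous replication between two systems, the Python sketch below acknowledges a write only after every online system has stored it, and serves reads from whichever system is still up. The class and method names are hypothetical, each "system" stands in for a whole HA pair, and resynchronizing a system after it comes back online is deliberately left out.

```python
class StorageSystem:
    """Hypothetical stand-in for one HA storage system."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.blocks = {}

    def write(self, key, data):
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        self.blocks[key] = data


class SyncReplicatedPair:
    """Writes go to both systems before they are acknowledged."""

    def __init__(self, primary, secondary):
        self.systems = [primary, secondary]

    def write(self, key, data):
        targets = [s for s in self.systems if s.online]
        if not targets:
            raise RuntimeError("both systems are offline")
        for system in targets:
            system.write(key, data)
        # Number of copies written; 1 means the pair is running degraded.
        return len(targets)

    def read(self, key):
        for system in self.systems:
            if system.online and key in system.blocks:
                return system.blocks[key]
        raise KeyError(key)
```

In this sketch, taking one system offline does not interrupt reads or writes; the pair simply reports that only one copy was written until the other system returns.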
StorONE’s HA is an active-active cluster, which means you benefit from the servers’ full performance while everything is in operation. Our DirectWrite feature, which we discuss in our blog “The Write Cache Crutch,” makes active-active HA fast while maintaining data integrity. If you want even more availability, you can synchronously replicate to another HA pair located on-premises. Our Q3 release, which is less than 30 days away, will significantly improve our already industry-leading data protection and resilience abilities.
We Need You!
The most critical aspect of creating a pandemic-resilient storage infrastructure is you. The above recommendations can improve your storage systems’ immune system, but we need you to be “highly-available” too. Please do everything you can to stay safe and take precautions.
Another way you can help is to complete our COVID-19 survey. This unique survey asks specific questions about the data center, and it will also help us design better solutions for you. The results will show you how your peers have been dealing with the pandemic.
A next step in designing better high-availability in your storage infrastructure is to join me and StorONE’s CEO, Gal Naor, for a live webinar on July 29th at 1:00 pm ET. We will share the survey results during the webinar and provide additional recommendations on how you can create a pandemic-resilient storage infrastructure. If you pre-register this week, you’ll receive an advance copy of Gal’s latest white paper “10 Recommendations for Storage Managers to Prepare for Future Pandemics,” loaded with advice on how to make your storage infrastructure more resilient now and in the future. The paper is unavailable anywhere else, so register now to get your copy.