Let's Get Real About Data Protection and Disaster Recovery

Personally, I am getting rather tired of the dismissive tone adopted by virtualization and cloud vendors when you raise the issue of disaster recovery. We previously discussed the limited scope of virtual systems clustering and failover: active-passive and active-active server clusters with data mirroring are generally inadequate for recovery from interruption events that have a footprint larger than a given equipment rack or subnetwork. Extending mirroring and cluster failover over distances greater than 80 kilometers is a dicey strategy, especially given the impact of latency and jitter on data transport over WAN links. Both can create data deltas between the primary and the mirror copy that prevent successful application or database recovery altogether.

Despite the physics, just about everyone from your hypervisor vendor to your Disaster Recovery as a Service (DRaaS) provider tends to dismiss these issues. Some suggest compressing and/or de-duplicating data to reduce the volume of data that must traverse networks, as though that will increase throughput. It doesn’t. In a traffic jam on a major thoroughfare, the biggest lorry and the smallest SMART car both move at the same rate of speed – which is to say, not at all. In this case, size doesn’t matter.
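To put rough numbers on the physics, here is a minimal back-of-the-envelope sketch. It assumes light travels through fiber at about 200,000 km/s and ignores switching, queuing, and protocol overhead, so real links will do worse. The 1 Gbit/s link speed, the 8 KiB write size, and the 4:1 compression ratio are hypothetical values chosen only for illustration, and the function names are mine.

```python
# Assumption: signal speed in optical fiber is roughly two-thirds of c.
FIBER_KM_PER_S = 200_000.0


def round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip propagation delay over fiber, in milliseconds."""
    return 2.0 * distance_km / FIBER_KM_PER_S * 1000.0


def sync_write_ms(distance_km: float, payload_bytes: int, link_bits_per_s: float) -> float:
    """Time for one acknowledged remote write: time on the wire plus the round trip."""
    serialization_ms = payload_bytes * 8 / link_bits_per_s * 1000.0
    return serialization_ms + round_trip_ms(distance_km)


LINK_BPS = 1e9          # hypothetical 1 Gbit/s WAN link
WRITE_BYTES = 8 * 1024  # hypothetical 8 KiB write

for km in (80, 400, 1600):
    plain = sync_write_ms(km, WRITE_BYTES, LINK_BPS)
    # Hypothetical 4:1 compression shrinks the payload, not the distance.
    squeezed = sync_write_ms(km, WRITE_BYTES // 4, LINK_BPS)
    print(f"{km:>5} km: RTT {round_trip_ms(km):.2f} ms, "
          f"write {plain:.3f} ms, compressed write {squeezed:.3f} ms")
```

Under these assumptions the round trip at 80 kilometers is about 0.8 ms, and it grows linearly with distance. Shrinking the payload trims only the time-on-the-wire term. The round trip that every acknowledged write must wait for stays exactly where it was.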

Another issue we need to confront is one of data isolation. As work continues on hypervisor technology and, especially, on the software-defined storage and software-defined networking technologies that underpin products like hyper-converged infrastructure appliances, many leading vendors are up to their old tricks again, seeking to dominate the entire hardware/software stack and to exclude competitors from their customers’ IT budgets. What we are seeing is a return to isolated islands of automation similar to what we sought to attack throughout the 1990s with technologies like SAMBA (developed in 1991 and 1992 to facilitate data access and sharing between unlike operating systems) and even Fibre Channel fabric “storage area networks” (intended originally to enable all storage arrays, regardless of vendor, to interoperate within the same shared storage pool).

We can argue about whether we actually achieved the openness and interoperability that visionaries were seeking, but we must also agree that these technologies and others were aimed at breaking through the moat and stockade defenses that IBM, HP, EMC, Sun, Oracle, Microsoft and other big-name vendors had erected around their monolithic hardware/software stacks. We succeeded to some extent, but the continued proprietary nature of some technologies, and their high price tags, created some of the momentum behind the virtualization and software-defined movements. Now, those movements have also, in some respects, been co-opted and converted into competing proprietary technology stacks.

Ask VMware whether their VSAN storage architecture can store data from Microsoft Hyper-V. (Spoiler alert: the answer is no.) Ask Microsoft if you can share your Storage Spaces with VMware workload data. (Another spoiler: the answer is “yes”, so long as you convert your VMDK files to Hyper-V VHD.) This is hardly an improvement over the pre-SAMBA days, when the only guidance Microsoft provided for sharing data with UNIX systems was to “get rid of all that Sun crap and standardize on Microsoft.”

Bottom line: if you are running multiple hypervisors (as most larger companies are) and also maintaining non-virtualized workloads, such as high-performance transaction systems that you don’t want to risk slowing down with virtualization, you are going to need multiple disaster recovery strategies, each tailored to an isolated data stack.

The situation gets more complicated when you attempt to use cloud services for backup, emergency hosting (geo-clustering), and business recovery. In addition to the physics of the issue (replicating data over distance on a WAN and latency/jitter-induced data deltas), you will also confront the need to obtain services from multiple cloud service providers – each specialized in a particular virtual server hypervisor and software-defined technology stack.

I know, we all liked the pretty advertisements for clouds a few years ago. They reminded me of toilet paper commercials with fluffy white clouds, rainbows, doves, hot air balloons and sunflowers.

But the reality has become more like an action game: BattleClouds™. Most cloud services vendors, including the DRaaS crowd, have teamed up with a specific hypervisor/software-defined stack vendor and can only support the part of your infrastructure that uses that same stack.

Given this reality, planners who want to use cloud-based DR will likely need to use multiple cloud services and to figure out a way to coordinate and synchronize recovery between different service providers. This means that, in addition to being dependent upon your current WAN service for access to each cloud, your recovery is also contingent upon the networks that each vendor uses to connect up to a competitor’s service. Be afraid, be very afraid.
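The arithmetic behind that worry is simple enough to sketch. The example below assumes every link in the recovery chain must work at the same time and that the links fail independently of one another, which is optimistic in a regional disaster. The availability figures and link names are hypothetical placeholders, not measurements from any provider.

```python
from math import prod


def chained_availability(availabilities):
    """Availability of a recovery path that needs every link in the chain to be up."""
    return prod(availabilities)


# Hypothetical availability figures, for illustration only.
path = {
    "your WAN service": 0.999,
    "cloud provider A": 0.999,
    "inter-provider network": 0.995,
    "cloud provider B": 0.999,
}

combined = chained_availability(path.values())
hours_down_per_year = (1.0 - combined) * 8760

print(f"combined availability: {combined:.4f}")
print(f"expected unavailable hours per year: {hours_down_per_year:.1f}")
```

With these made-up inputs the chain comes out at roughly 99.2 percent, worse than its weakest link at 99.5 percent. Every additional provider or network you depend on can only push the number lower.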

Some firms, including IBM, are working on a software layer (they call it “Resiliency as a Service”) that will help you orchestrate both your on-site disaster recovery and business continuity strategies and the services of remote providers such as DRaaS vendors and legacy hot site vendors. But this coordination/orchestration layer is still a work in progress.

And no, unfortunately, the idea of simply using a “Swiss Army Knife” changed-block data copy engine (the latest evolution of backup software) to copy your data to tape or to a cloud target somewhere is not an acceptable alternative. The latest stats collected on World Backup Day in March showed that 1 in 3 users had lost data that they had placed in a cloud for safe-keeping. Not a good record. Plus, storing a mish-mash of changed-block backups rarely provides a restore capability that will match your time-to-data needs.

At the end of the day, DR planning remains a challenging exercise. The smart IT planner will confront it head-on, like any appdev effort. At a minimum, we cannot afford to buy the propaganda that certain vendors are peddling, that you can simply entrust your recovery to hypervisor failover or to the clouds. Time to get real.
