Fault Tolerance and High Availability
I/O makes or breaks the system, so storage performance always matters, especially in virtualized environments, where VMs are hungry for IOPS. All-flash storage is extremely expensive to implement, and all-RAM storage even more so; both are usually considered overkill. The industry therefore typically combines tiers: slower spinning disks, faster flash and much faster RAM.
Hardware failure is a bigger problem in a virtualized environment than it was in an all-physical one, because a single failing physical machine brings down every VM it hosts. Each VM plays the role of an entire server, so such a failure can mean a catastrophic interruption of service. It is even more disastrous with VDI and thin clients, where one hypervisor box going down can halt a noticeable part of the company’s operations. Hypervisor clusters must therefore be built to be fault-tolerant and fully redundant. Shared storage is an essential part of the virtualization infrastructure because it holds the VMs of the virtual environment, so it must never be a single point of failure.
All virtual applications run on a single hypervisor host, which is the single point of failure in this scenario.
To make the storage subsystem fault-tolerant, all critical components are duplicated or even triplicated. In a converged deployment, StarWind runs virtual storage on multiple hypervisor nodes. In a non-converged deployment, storage runs on multiple dedicated commodity servers. The shared Logical Unit is essentially “mirrored” between the hosts, which maintains data integrity and continuous operation even if one or more nodes fail. Every active host acts as a storage controller, and every Logical Unit has a duplicated or triplicated data back end. Multipath I/O ensures that even if some I/O paths fail, work continues over the remaining paths with no downtime. This way, 99.99% uptime is achieved with a 2-way replica and 99.9999% with a 3-way replica. Going beyond triplication is considered pointless in most cases, unless the system is life-critical, such as nuclear reactor control or cruise missile guidance.
The hypervisors run as a cluster, which eliminates the single point of failure.
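The uptime figures quoted above for 2-way and 3-way replicas can be reproduced with a simple back-of-the-envelope model. The sketch below is an illustration only. It assumes that node failures are independent and that each node is available 99% of the time; that per-node figure is an assumption chosen because it yields the quoted numbers, not a value stated in the text. The function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def replica_availability(node_availability: float, replicas: int) -> float:
    """Availability of a mirrored volume that stays up while at least
    one replica is up, assuming nodes fail independently (assumption)."""
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The volume is down only when every replica is down at the same time.
    return 1.0 - (1.0 - node_availability) ** replicas


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes in a 365-day year."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    NODE_AVAILABILITY = 0.99  # assumed per-node availability, not from the text
    for n in (1, 2, 3):
        a = replica_availability(NODE_AVAILABILITY, n)
        print(f"{n}-way: {a:.6%} available, "
              f"{downtime_minutes_per_year(a):.2f} min/year down")
```

Under these assumptions, a 2-way replica is unavailable for roughly 53 minutes per year and a 3-way replica for about half a minute. Real deployments rarely have fully independent failures (shared power, network or software faults), so the model is best read as an upper bound on what replication alone can deliver.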
StarWind Virtual SAN eliminates the single point of failure for storage in a virtualized infrastructure by duplicating or triplicating data, caches and I/O controllers, essentially “mirroring” them all between independent, physically separate hosts. This way, the virtual shared storage becomes fault-tolerant and highly available while delivering higher performance at low cost.
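The mirroring idea can be illustrated with a toy model. The sketch below is not StarWind's implementation. It is a minimal in-memory simulation, with hypothetical class names (`StorageNode`, `MirroredLU`), of a Logical Unit whose writes go to every reachable replica and whose reads fail over to the next replica when a node is down. Resynchronizing a node that comes back online is deliberately left out.

```python
class NodeDown(Exception):
    """Raised when an I/O is sent to a node that is offline."""


class StorageNode:
    """Hypothetical storage host holding one replica of the data."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.blocks = {}  # logical block address -> data

    def write(self, lba, data):
        if not self.online:
            raise NodeDown(self.name)
        self.blocks[lba] = data

    def read(self, lba):
        if not self.online:
            raise NodeDown(self.name)
        return self.blocks[lba]


class MirroredLU:
    """Toy Logical Unit mirrored across several nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def write(self, lba, data):
        """Write to every reachable replica; return how many acknowledged."""
        acked = 0
        for node in self.nodes:
            try:
                node.write(lba, data)
                acked += 1
            except NodeDown:
                continue  # skip the failed path, keep the others going
        if acked == 0:
            raise OSError("no replica reachable")
        return acked

    def read(self, lba):
        """Read from the first reachable replica (path failover)."""
        for node in self.nodes:
            try:
                return node.read(lba)
            except NodeDown:
                continue
        raise OSError("no replica reachable")


if __name__ == "__main__":
    a, b, c = StorageNode("a"), StorageNode("b"), StorageNode("c")
    lu = MirroredLU([a, b, c])
    lu.write(0, b"vm-disk-block")
    a.online = False       # simulate one host failing
    print(lu.read(0))      # still served by a surviving replica
```

With three replicas, the volume keeps serving reads and writes until the last node goes offline, which is the property the text describes as eliminating the single point of failure.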