Architecture: GRID

July 20, 2024

Ivan Ischenko

StarWind Director of Product Management. Ivan is an expert in virtualization and storage architecture. With deep knowledge of software-defined storage and data protection, he provides technical leadership in solution design and product strategy. Ivan delivers high-authority insights into modernizing enterprise-scale IT infrastructure and optimizing virtualized ecosystems.

Grid architecture ensures high availability and scalability, maintaining performance and redundancy even when nodes fail.

Intro:

High Availability (HA) is becoming increasingly important for businesses worldwide. To achieve HA in virtualization environments, you need to build a cluster. However, a cluster on its own doesn't solve other challenges, such as scalability and the number of failures to tolerate (FTT) at the storage layer.

Problem:

Traditional N+1 or N+2 cluster configurations do not provide sufficient redundancy and performance for certain workloads. They may be adequate for some data storage scenarios, but for many independent workloads the resilience they provide is not enough. One or two nodes going down should not take the whole cluster, and all of its VMs, down with them.

It is difficult to build a system that scales gradually while maintaining “data locality”. Without data locality, the cluster suffers a drop in performance: when a workload's compute and storage resources sit on different nodes, much of its data has to travel over the network fabric. As for scaling, without the ability to grow flexibly, the cluster loses one of its main benefits, and adding resources becomes more expensive.
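To make the locality argument concrete, here is a minimal Python sketch under simplifying assumptions: a VM counts as “local” when the node running its compute also holds a replica of its disk, so its reads can be served without crossing the fabric. The `VM` class and the `local_read_fraction` function are hypothetical illustrations, not part of any StarWind product API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VM:
    name: str
    host: int            # node running the VM's compute (hypothetical model)
    replicas: List[int]  # nodes holding copies of the VM's disk


def local_read_fraction(vms: List[VM]) -> float:
    """Share of VMs whose host also stores a replica of their disk.

    Assumption: such VMs read locally; all others send reads over the fabric.
    """
    if not vms:
        return 0.0
    local = sum(1 for vm in vms if vm.host in vm.replicas)
    return local / len(vms)


vms = [
    VM("vm-a", host=0, replicas=[0, 1]),
    VM("vm-b", host=1, replicas=[0, 1]),
    VM("vm-c", host=2, replicas=[0, 1]),  # compute separated from storage
    VM("vm-d", host=3, replicas=[2, 3]),
]
print(local_read_fraction(vms))  # 0.75
```

In this toy layout, `vm-c` is the only VM whose every read crosses the network; keeping replicas on the same nodes that run the workload is what preserves locality as the cluster grows.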

Solution:

The so-called “grid architecture” allows the cluster to maintain a high level of fault tolerance without giving up the principle of “data locality”, and without collapsing if multiple nodes go down. As the name implies, this cluster topology resembles a grid, where each group of nodes acts as a cluster of its own. These “clusters” are much more resilient than typical N+1 or N+2 systems, where each component has only one or two backup partners, allowing the system to withstand the failure of only roughly 1/3 of the nodes. In virtualization, it is much better to have part of the cluster keep working than to have all of it go down.
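The difference in failure behavior can be sketched with a toy model in Python. It assumes, hypothetically, that a monolithic N+k cluster stops all VMs once more than k nodes fail, while a grid splits the nodes into independent groups, each tolerating its own number of failures. The function names, group size, per-group tolerance, and VMs per node are illustrative parameters, not StarWind defaults.

```python
from typing import List, Sequence, Set


def make_grid(num_nodes: int, group_size: int) -> List[List[int]]:
    """Split node IDs 0..num_nodes-1 into consecutive groups (hypothetical layout)."""
    return [
        list(range(start, min(start + group_size, num_nodes)))
        for start in range(0, num_nodes, group_size)
    ]


def monolithic_survivors(num_nodes: int, ftt: int, failed: Set[int], vms_per_node: int) -> int:
    """One N+ftt cluster: assume every VM stops once more than `ftt` nodes fail;
    otherwise assume all VMs restart on the surviving nodes."""
    if len(failed) > ftt:
        return 0
    return num_nodes * vms_per_node


def grid_survivors(groups: Sequence[Sequence[int]], group_ftt: int, failed: Set[int], vms_per_node: int) -> int:
    """Grid: each group is an independent cluster tolerating `group_ftt` failures.
    Only groups that exceed their own tolerance lose their VMs."""
    running = 0
    for group in groups:
        lost = sum(1 for node in group if node in failed)
        if lost <= group_ftt:
            running += len(group) * vms_per_node
    return running


# Example: 12 nodes, 10 VMs per node, nodes 0, 1 and 5 fail.
failed_nodes = {0, 1, 5}
print(monolithic_survivors(12, ftt=2, failed=failed_nodes, vms_per_node=10))  # 0
grid = make_grid(12, group_size=3)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
print(grid_survivors(grid, group_ftt=1, failed=failed_nodes, vms_per_node=10))  # 90
```

With the same three failed nodes, the monolithic N+2 cluster loses every VM, while the grid loses only the group in which two of its three nodes failed. The model ignores the capacity needed to restart VMs on surviving nodes, which a real design has to account for.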

Conclusion:

“Grid architecture” is the way to build a highly redundant, high-performing cluster. It connects the nodes into a resilient grid that preserves data locality to a certain extent. Instead of the entire cluster crashing when several nodes go down, “grid architecture” keeps the “healthy” nodes working.