What's Split Brain and how to avoid it like the plague?

In this article, we will discuss the split brain issue, and different approaches to prevent it from happening.

A network partition is a loss of communication between cluster nodes caused by network connection problems, and it can lead to split brain. According to Wikipedia, split brain is a computer term based on an analogy with the medical split-brain syndrome. Handling the consequences of a split brain can be a real horror for a system administrator.

A split brain can happen if active nodes lose all synchronization and heartbeat connections between them at the same time and can no longer communicate to determine the synchronization state of the partner node. Split brain indicates data or availability inconsistencies that originate from maintaining two separate data sets with overlap in scope.

A split brain can lead to serious, unrecoverable errors. If each node considers itself Primary and commits transactions that the other does not, you cannot resynchronize the nodes, because there is no way to “merge” the information on each node into a single HA storage that holds the correct data. In practice, the transactions committed to the “wrong primary” storage during the dual-primary situation will be lost. Having dual primaries can also lead to other errors.
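To make the “no way to merge” point concrete, here is a minimal sketch that compares the write histories of two nodes. The log format and the function name first_divergence are hypothetical and chosen only for illustration; they are not part of StarWind VSAN.

```python
from typing import List, Optional


def first_divergence(log_a: List[str], log_b: List[str]) -> Optional[int]:
    """Return the index of the first position where two write logs differ.

    None means one log is a prefix of the other, so the node that is behind
    can simply catch up. An index means both nodes accepted different writes
    at the same position, which is the dual-primary case described above.
    """
    for i, (a, b) in enumerate(zip(log_a, log_b)):
        if a != b:
            return i
    return None


# Hypothetical histories: both nodes agree on w1 and w2, then the partition
# happens and each node keeps accepting writes from its own clients.
node1_log = ["w1", "w2", "w3-from-client-A"]
node2_log = ["w1", "w2", "w3-from-client-B", "w4-from-client-B"]

print(first_divergence(node1_log, node2_log))      # 2: histories conflict
print(first_divergence(["w1", "w2"], node2_log))   # None: plain resync works
```

Once the result is an index rather than None, one of the two histories has to be discarded from that point on, which is why the writes on the “wrong primary” are lost.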

So, when dealing with network partition, the highest priority is to eliminate the risk of split brain and data corruption.

There are a few different approaches to dealing with a network partition after it has occurred.

With the so-called “optimistic” approach, the system administrator simply restores the communication channel between the nodes and lets the partitioned nodes keep working as usual for some time, hoping that they will synchronize automatically after a while. This is a very careless approach, as there is always a chance of data corruption because of split brain.

The other, so-called “pessimistic” approach sacrifices system availability in favor of data consistency. Once a network partition has been detected, access to the sub-partitions is limited in order to guarantee consistency. In this case, only one component can continue to make read/write requests to the storage, which avoids history divergence.

Modern commercial general-purpose HA clusters typically use a combination of heartbeat network connections between cluster hosts, and quorum witness storage.

Today we will mostly focus on the storage part of HA clusters, using StarWind VSAN as an example, especially since it now offers several split brain prevention mechanisms:

Heartbeat strategy
Node Majority strategy

Heartbeat strategy or hand on the pulse

Heartbeat is an advanced mechanism used to avoid data corruption in case of synchronization channel failure. If data can't be transferred through the synchronization channel, StarWind VSAN attempts to ping the partner nodes using the provided heartbeat links. If the partner nodes do not respond, StarWind VSAN assumes they are offline. In this case, StarWind VSAN marks the other nodes as not synchronized, and all HA devices on the current node flush the data from the cache to the disk and continue to operate in Write-Through caching mode. This is done to preserve data integrity in case the node goes out of service unexpectedly. If the heartbeat ping is successful, StarWind blocks the nodes with the lower priority until the synchronization channels are re-established.
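The decision described above can be sketched as a small function. This is only an illustration of the logic, not StarWind code: the names Action and on_sync_failure are hypothetical, and the sketch assumes that a smaller number means a higher priority.

```python
from enum import Enum


class Action(Enum):
    # Partner did not answer the heartbeat ping: treat it as offline.
    CONTINUE_WRITE_THROUGH = "mark partner not synchronized, flush cache, serve in write-through mode"
    # Partner answered and this node has the higher priority.
    KEEP_SERVING = "keep serving clients"
    # Partner answered and this node has the lower priority.
    BLOCK_SELF = "stop serving clients until the sync channel is back"


def on_sync_failure(heartbeat_ok: bool, my_priority: int, partner_priority: int) -> Action:
    """Decide what a node does after the synchronization channel has failed.

    Assumption: a smaller number means a higher priority. On equal priorities
    the node blocks itself, which is the conservative choice in this sketch.
    """
    if not heartbeat_ok:
        return Action.CONTINUE_WRITE_THROUGH
    if my_priority < partner_priority:
        return Action.KEEP_SERVING
    return Action.BLOCK_SELF


# Sync channel is down, heartbeat still works: only the higher-priority node serves.
print(on_sync_failure(True, my_priority=1, partner_priority=2).name)   # KEEP_SERVING
print(on_sync_failure(True, my_priority=2, partner_priority=1).name)   # BLOCK_SELF
# Heartbeat is down as well: the node assumes the partner is offline.
print(on_sync_failure(False, my_priority=2, partner_priority=1).name)  # CONTINUE_WRITE_THROUGH
```

The last call also shows the weak spot of the strategy: if both nodes are alive but every link between them is down, each of them takes the write-through branch and keeps serving clients.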

Heartbeat has proven to be a very easy and effective way to prevent split brain. StarWind VSAN has supported this functionality for a long time.

In general, the functionality itself is based on the constant status check of the cluster participants.

Let’s say we have two HA nodes. When a write request arrives at the first node, the data is transferred to the partner node over the sync channel. If the sync channel fails, the data cannot be sent to the partner node, so the devices cannot synchronize. The Primary node initiates a status check of the system. Both nodes were synchronized before the failure, so they are in equal condition and the node priority decides the outcome. As a result, the node that has the Primary priority remains operational, while the Secondary priority node becomes “not synchronized” and stops receiving and responding to client requests. The Secondary node periodically checks the availability of the sync channel. As soon as it is available, the synchronization process starts, and once it completes the Secondary node becomes active and allows client connections.

Benefits:

The device continues to work even if only one node remains alive.

Disadvantages:

If all the communication channels between the nodes are lost while the nodes remain connected to the clients, a split brain condition arises. This problem can be addressed by also configuring heartbeat connections over the same communication channels that the clients use to access the nodes.

To summarize, this kind of strategy is mostly applicable to the systems where you have enough network links that can be used as the additional heartbeat channels and are physically separated from the primary ones.

Node Majority strategy or heartbeat life support

Another way to eliminate the risk of a split brain is adding a special “witness” node. In StarWind VSAN it is called Node Majority Strategy, as opposed to the Heartbeat strategy.

The witness node is effectively a “router” between the main nodes: it is used to get information about the partner node status when the direct connection between Node 1 and Node 2 is lost. It can be deployed as a separate StarWind instance in the cloud or on a physical host that is connected to the storage nodes. The Witness node does not participate in data exchange and is not available to the client device. One could ask, “Why should I consider using a Witness node if I can configure 3-node HA storage and keep my data replicated twice?” Unlike a third replica, the Witness node doesn’t require the same amount of storage for replicated data; it just keeps the storage cluster settings. This node has a vote, so it participates in quorum voting when deciding which network partition is primary.

Let’s take a look under the hood.

So, we have the same configuration with two HA storage nodes. For example, client requests are transferred from Node 1 to Node 2 over the Sync channel. If the Sync channel fails, the devices cannot synchronize. In this case, the Witness Node, which is connected to both storage nodes, forms a majority with Node 1, as it has the most relevant data. As a result, Node 2 becomes “not synchronized” and stops receiving and responding to client requests. With a witness in place, you can lose a node and still keep the quorum: even if a node is down, the cluster keeps working. As soon as the sync channel is available again, the synchronization process starts. After successful synchronization, Node 2 becomes active again and allows client connections.
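The voting in this scenario can be sketched as follows. This is a simplified model, not StarWind's implementation: the function names are hypothetical, every participant holds one vote, and “most relevant data” is assumed to mean the highest last committed write number.

```python
from typing import Dict, Optional, Set


def has_quorum(votes_seen: int, total_votes: int) -> bool:
    """A partition may stay active only with a strict majority of all votes."""
    return votes_seen > total_votes // 2


def witness_pick(candidates: Dict[str, int]) -> str:
    """The witness sides with the reachable node that has the newest data.

    candidates maps node name -> last committed write number (hypothetical).
    Ties go to the alphabetically first node name.
    """
    return max(sorted(candidates), key=lambda name: candidates[name])


def active_nodes(last_write: Dict[str, int], reaches_witness: Set[str], sync_ok: bool) -> Set[str]:
    """Return the storage nodes that are allowed to keep serving clients."""
    total_votes = len(last_write) + 1  # storage nodes plus the witness
    candidates = {n: s for n, s in last_write.items() if n in reaches_witness}
    winner: Optional[str] = witness_pick(candidates) if candidates else None

    active = set()
    for node in last_write:
        votes = 1  # the node's own vote
        if sync_ok:
            votes += len(last_write) - 1      # it still sees its partner(s)
            if node in reaches_witness:
                votes += 1                    # and the witness
        elif node == winner:
            votes += 1                        # the witness sides with one node only
        if has_quorum(votes, total_votes):
            active.add(node)
    return active


writes = {"node1": 120, "node2": 118}
# Sync channel fails, both nodes still reach the witness: only node1 stays active.
print(sorted(active_nodes(writes, {"node1", "node2"}, sync_ok=False)))  # ['node1']
# Every link is down: nobody has a majority, so nobody serves and no split brain occurs.
print(sorted(active_nodes(writes, set(), sync_ok=False)))               # []
```

Because the witness grants its vote to a single node, two partitions can never both reach two votes out of three at the same time.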

So, when you have an even number of nodes, a quorum witness is required. When you have an odd number of nodes, however, you should not add a quorum witness, because it would make the total number of votes even again.
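The arithmetic behind this rule is easy to check. The sketch below assumes one vote per node and one vote for the witness; the helper names are made up for the example.

```python
def majority(total_votes: int) -> int:
    """Smallest number of votes that is more than half of all votes."""
    return total_votes // 2 + 1


def tolerated_failures(total_votes: int) -> int:
    """How many voters can fail while the rest still form a majority."""
    return total_votes - majority(total_votes)


def needs_witness(storage_nodes: int) -> bool:
    """An even number of storage nodes needs a witness to make the vote count odd."""
    return storage_nodes % 2 == 0


for nodes in (2, 3, 4):
    votes = nodes + 1 if needs_witness(nodes) else nodes
    print(f"{nodes} storage nodes -> {votes} votes, "
          f"majority {majority(votes)}, tolerated failures {tolerated_failures(votes)}")

# Adding a witness to 3 storage nodes gives 4 votes: the majority rises to 3,
# so the cluster still survives only one failure and a 2-2 tie becomes possible.
print(tolerated_failures(3), tolerated_failures(4))  # 1 1
```

With three votes, whether they come from three storage nodes or from two storage nodes and a witness, exactly one voter can be lost before the remaining ones can no longer form a majority.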

StarWind Witness Node Configuration Guide can be found here: https://www.starwindsoftware.com/resource-library/creating-highly-available-device-using-node-majority-failover-strategy

Conclusion

So, let’s break it down. Cluster members cannot determine which of them should be active if all network connections between them fail. A Witness node is a proven way to keep the storage cluster operational in such situations and eliminate the chance of split brain. With that in mind, we can point out the following benefits and disadvantages of implementing a Node Majority configuration.

Benefits:

The possibility of a split brain is completely excluded;
An additional communication channel for the heartbeat is NOT required.

Disadvantages:

In a configuration with two storage nodes, a third node is required;
If an HA device consists of three storage nodes, or of two storage nodes and one witness node, only one node can fail. If two nodes fail, the third node loses the majority and shuts down as well.

In conclusion, the Node Majority strategy can be implemented with StarWind VSAN and used as an alternative to the Heartbeat strategy. This is the best choice if you wish to be 100% sure that split brain will not happen.
