
Introduction

Working at StarWind Support, my colleagues and I receive feedback from time to time from customers about their experience with StarWind VSAN, along with suggestions about what they would most like to see in the software. It appears that customers do not really like the full synchronization procedure, which may take from hours to days, depending on the size of the StarWind HA devices provisioned and the characteristics of the underlying storage (i.e., the type of disks used in the RAID arrays, the type of RAID arrays, and the settings used when creating them).

Now, the time has come to introduce a feature that many people have been waiting for: Log-structured Write Cache (LWC). Let me briefly describe what it does and what it needs to work as designed.

In this blog post, I will give an overview of how Log-structured Write Cache can be configured. Additionally, I will run some power outage tests to check how LWC operates under conditions close to real power outages, which are, in fact, the most common reason for full synchronization being triggered on StarWind HA devices. Check the list of possible reasons for full sync to get a better idea of the cases where LWC can help you.

The idea behind Log-structured Write Cache

First of all, LWC has been developed to perform two main functions. One of them is buffering random write operations. That is what makes it similar to StarWind L2 cache. The difference is that LWC operates as a write-back cache, unlike L2 cache, which is write-through only. The other purpose, and probably the more anticipated one, is eliminating the need for full synchronization of highly available devices. During its operation, LWC journals the latest snapshots, so only the latest fragments need to be synchronized instead of the whole volume.
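To make the idea more tangible, here is a minimal Python sketch of a log-structured write-back cache. It is a toy model and not StarWind's actual implementation: the class name, the dictionary standing in for the main storage, and the `flush_threshold` parameter are all assumptions made for illustration. It only shows the general technique: random writes are appended to a sequential journal and acknowledged right away, and the data reaches the main storage later, in block order.

```python
class LogStructuredWriteCache:
    """Toy model of a log-structured write-back cache (illustrative only)."""

    def __init__(self, backing, flush_threshold=4):
        self.backing = backing                  # dict: block number -> data (stands in for the main storage)
        self.journal = []                       # append-only list of (block, data) entries
        self.flush_threshold = flush_threshold  # hypothetical number of entries that triggers a flush

    def write(self, block, data):
        # A random write becomes a sequential append to the journal.
        self.journal.append((block, data))
        if len(self.journal) >= self.flush_threshold:
            self.flush()

    def read(self, block):
        # The newest journal entry for a block wins over the main storage.
        for journaled_block, data in reversed(self.journal):
            if journaled_block == block:
                return data
        return self.backing.get(block)

    def flush(self):
        # Keep only the latest data per block and write it out in block order.
        latest = dict(self.journal)
        for block in sorted(latest):
            self.backing[block] = latest[block]
        self.journal.clear()


backing = {}
cache = LogStructuredWriteCache(backing, flush_threshold=3)
cache.write(7, b"a")
cache.write(2, b"b")
pending = dict(backing)   # still empty: nothing has been flushed yet
cache.write(7, b"c")      # third entry reaches the threshold and triggers a flush
```

Until the flush happens, the main storage is untouched and reads are served from the journal. This is the write-back behavior described above, as opposed to a write-through cache that would update the main storage on every write.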

Setting up LWC

I will not cover the steps of configuring an LWC-enabled device, as you can read about them in the technical documentation in our Resource Library. I will just point out several important differences between setting up an LWC-enabled HA device and the familiar HA device based on StarWind flat disks:

  • The usual usage scenario assumes that LWC journals are located on separate SSDs or an SSD-based RAID array, while the flat device itself is located on another volume which can be either HDD or SSD.
  • It is impossible to configure StarWind L1 cache in write-back mode for LWC-enabled devices. So, you have to either use it in write-through mode or leave the device without L1 caching.

One more important thing to know about LWC is that it will eventually consume all free space on the volume where it is placed. Of course, data is flushed to the main storage in due course, mainly once enough of it has accumulated in LWC.

For my test lab, I set up two nodes running Windows Server 2016 and configured a Windows failover cluster with both nodes as its members. For the shared storage, I used a StarWind device with LWC enabled.


Then, I presented the shared storage to the failover cluster as a Cluster Shared Volume and created a test virtual machine there running Windows Server 2016.



Power outage tests

For the purpose of this post, I decided to simulate a power outage on both nodes simultaneously. With StarWind flat image files, such situations are likely to cause StarWind devices to lose synchronization on both nodes. The nodes are then unable to determine which of them contains the latest data and thus should become the synchronization source.

To make things worse, I was copying a large 10 GB file into the clustered VM right at the moment I cut the power on both cluster nodes. As expected, the clustered VM failed too, and the data transfer did not complete.

When I powered up the cluster nodes, the StarWind LWC device performed a fast synchronization, which took only a few minutes, instead of the “traditional” full synchronization or even a sync loss on both nodes. The clustered VM came online automatically and was just fine.
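The difference between the two kinds of synchronization can be illustrated with a small Python sketch. This is only a model under stated assumptions, not how StarWind performs synchronization internally: the function names, the 1,000-block volume, and the set of journaled block numbers are all hypothetical. The point is simply that a journal of recent changes lets the partner copy catch up by transferring a few blocks instead of the whole volume.

```python
def full_sync(source, target):
    """Copy every block from source to target; return the number of blocks transferred."""
    for block, data in source.items():
        target[block] = data
    return len(source)


def fast_sync(source, target, journaled_blocks):
    """Copy only the blocks recorded in the journal; return the number of blocks transferred."""
    for block in journaled_blocks:
        target[block] = source[block]
    return len(journaled_blocks)


# Hypothetical 1,000-block volume where only 3 blocks changed before the outage.
source = {n: f"data-{n}" for n in range(1000)}
stale = dict(source)            # state of the partner copy before the changes
journaled = {10, 11, 500}       # block numbers recorded in the journal (assumed)
for n in journaled:
    source[n] = f"new-{n}"

partner_a = dict(stale)
partner_b = dict(stale)
full_count = full_sync(source, partner_a)             # transfers all 1,000 blocks
fast_count = fast_sync(source, partner_b, journaled)  # transfers only the 3 journaled blocks
```

Both partner copies end up identical to the source, but the fast path moves a tiny fraction of the data, which is in line with the synchronization taking minutes rather than hours in my test.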

Wrap-up

To sum up, LWC can be used as a means of avoiding the full synchronization scenario, especially for systems with no power backup. The feature is also useful for environments that would benefit from caching write operations on drives faster than the main drive array, e.g., SSD caching for HDD arrays or NVMe caching for SSD arrays.

Boris Yurchenko
Boris is a Tech Support Engineer at StarWind. He is keen on virtualization and adores playing around with Java and PowerShell.