StarWind Virtual SAN® 2-node Stretched Cluster on VMware vSphere 6.5

INTRODUCTION

Stretched clustering allows organizations to maintain business continuity by replicating data and distributing workloads not only across different nodes, but also across remote locations. Mission-critical applications are kept up and running 24/7/365 thanks to storage mirroring between distant server sites.

This technical paper provides detailed instructions on how to set up a 2-node stretched cluster on VMware vSphere 6.5 with StarWind Virtual SAN as a storage provider.

StarWind Virtual SAN® is a hardware-agnostic VM storage solution. By mirroring the existing servers’ storage and RAM between the participating cluster nodes, it creates fault-tolerant, high-performing storage purpose-built for intensive virtualization workloads. All I/O is processed by local RAM, SSD cache, and disks, so it never gets bottlenecked by the storage fabric. The mirrored storage is used by all cluster nodes and is treated by hypervisors and clustered applications as one large local storage. StarWind Virtual SAN features stretched clustering to ensure constant application uptime through live migration between different geographical locations.

A full set of up-to-date technical documentation is available here, or by pressing the Help button in the StarWind Management Console.

For any technical inquiries, please visit our online community or Frequently Asked Questions page, or use the support form to contact our technical support department.

Pre-Configuring the Servers

The diagram below illustrates the connection scheme of the StarWind stretched cluster configuration described in this guide.

connection scheme of the StarWind stretched cluster configuration

Make sure that the prerequisites for deploying StarWind stretched cluster on VMware are met:

  • An L2/L3 multisite network configured according to the chosen StarWind failover strategy.
  • Each iSCSI and Synchronization network channel should provide at least 1 Gbps of throughput.
    A 10 Gbps or faster link is highly recommended.
  • The maximum supported latency for StarWind synchronous storage replication is 10 ms round-trip time (RTT).
  • The maximum supported latency between the ESXi Ethernet networks is 10 ms round-trip time (RTT).
  • vSphere 6.5 or newer installed on the servers to be clustered.
  • StarWind Virtual SAN installed on Windows Server 2016 VMs.

StarWind Failover Strategies

Select the appropriate StarWind failover strategy before creating the device; once selected, the failover strategy applies for the entire lifetime of the device. StarWind Virtual SAN provides two options: the Heartbeat failover strategy and the Node Majority failover strategy.

Heartbeat Failover Strategy

Heartbeat is a technology that helps avoid the so-called “split-brain” scenario, in which the HA cluster nodes are unable to synchronize yet continue to accept write commands from the initiators independently. Split-brain can occur when all synchronization and heartbeat channels are disconnected simultaneously and the partner nodes do not respond to the node’s requests. In that case, the StarWind service assumes the partner nodes are offline and continues operating in single-node mode using the data written to it.

If at least one heartbeat link is online, the StarWind services can still communicate with each other over it. The services mark the device with the lower priority as not synchronized, and it is blocked for further read and write operations until the synchronization channel is restored. The partner device on the synchronized node then flushes data from the cache to the disk to preserve data integrity in case that node goes down unexpectedly. It is recommended to assign several independent heartbeat channels during replica creation to improve system stability and avoid the “split-brain” issue. With the Heartbeat failover strategy, the storage cluster keeps working even if only one StarWind node is available.

Heartbeat Failover Strategy Network Design

  • Management / Heartbeat – 100 Mbps network or higher.
  • iSCSI / Heartbeat – 1 Gbps network or higher. A 10 Gbps or faster link is highly recommended.
  • Synchronization – 1 Gbps network or higher. A 10 Gbps or faster link is highly recommended.

Node Majority Failover Strategy

This strategy maintains the synchronization connection without any additional heartbeat links. Failure handling starts when a node detects that the connection with its partner is lost. To remain operational, a node must have an active connection with more than half of the HA device’s nodes; the count of available partners is based on their “votes”. In a two-node HA storage, the majority is lost whenever there is a problem with a node itself or with communication within the cluster. Therefore, the Node Majority failover strategy does not work if only two synchronous nodes are available; a third entity is required. This can be a Witness node, which counts toward the majority but neither contains data nor processes clients’ requests.

The Node Majority failover strategy tolerates the failure of only one node. If two nodes fail, the third one also becomes unavailable to clients’ requests. An HA device replicated between 2 nodes and using the Node Majority failover strategy requires the additional configuration of a Witness node, while an HA device replicated among 3 nodes requires no Witness node.

Node Majority Failover Strategy Network Design

  • Management / Heartbeat / Synchronization – 1 Gbps network or higher. A 10 Gbps or faster link is highly recommended.

Preparing Hypervisor for StarWind Deployment

Configuring Networks

Configure the network interfaces on each node so that the vMotion and StarWind Synchronization interfaces are in different subnets and connected according to one of the failover strategy scenarios (see the network diagrams above). In this document, Management/iSCSI traffic uses the 10.212.0.x subnet, while Synchronization traffic uses the 10.212.1.x subnet.

1. Create two vSwitches: one for the iSCSI/StarWind Heartbeat channel and the other for the Synchronization channel.

NOTE: Both the iSCSI/StarWind Heartbeat and the Synchronization vSwitches require a Virtual Machine Port Group, while only the iSCSI traffic requires a VMKernel port as well. A static IP address must be assigned to each VMKernel port.

2. Create a VMKernel port for the iSCSI/StarWind Heartbeat channel.

3. Add Virtual Machine Port Group on the vSwitch for iSCSI traffic and on the vSwitch for Synchronization traffic.


NOTE: It is recommended to set the MTU to 9000 (jumbo frames) on the vSwitches and VMKernel ports used for iSCSI and Synchronization traffic. Additionally, users can enable vMotion on the VMKernel ports.

4. Repeat steps 1 to 3 for any other links intended for Synchronization and iSCSI/Heartbeat traffic on both ESXi nodes.
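
For reference, the same networking layout can also be scripted with VMware PowerCLI. The sketch below is only an illustration: the vSwitch names, uplink NICs (vmnic1/vmnic2), port group names, and IP/subnet values are placeholders to be replaced with the values used in your environment.

# Assumes an existing connection to the ESXi host (Connect-VIServer)
$VMHost = Get-VMHost
# vSwitch for iSCSI/StarWind Heartbeat: VM port group plus a VMKernel port with a static IP, MTU 9000
$iscsiSwitch = New-VirtualSwitch -VMHost $VMHost -Name "vSwitch1" -Nic vmnic1 -Mtu 9000
New-VirtualPortGroup -VirtualSwitch $iscsiSwitch -Name "iSCSI for VMs"
New-VMHostNetworkAdapter -VMHost $VMHost -VirtualSwitch $iscsiSwitch -PortGroup "iSCSI" -IP 10.212.0.20 -SubnetMask 255.255.255.0 -Mtu 9000
# vSwitch for Synchronization: VM port group only, MTU 9000
$syncSwitch = New-VirtualSwitch -VMHost $VMHost -Name "vSwitch2" -Nic vmnic2 -Mtu 9000
New-VirtualPortGroup -VirtualSwitch $syncSwitch -Name "Sync for VMs"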

Preparing StarWind Virtual Machines

5. Create a Virtual Machine (VM) on each ESXi host with Windows Server 2016 (2012 R2) and StarWind VSAN installed.

Configure the following settings for StarWind VMs on ESXi hosts:

RAM: at least 4 GB (plus the size of the RAM cache if applicable)
CPUs: at least 4 virtual processors with 2 GHz reserved;
Hard disk 1: 100 GB for OS (recommended);
Hard disk 2: Depends on the intended shared storage volume.

NOTE: Each hard disk of the StarWind VMs must be Thick Provisioned Eager Zeroed.

Networking:

Network adapter 1: Management
Network adapter 2: iSCSI
Network adapter 3: Sync

NOTE: Select the VMXNET3 adapter type for all network adapters.

NOTE: If necessary, the StarWind Virtual Machine (VM) can serve as a domain controller; to do so, add the Active Directory Domain Services role to it.

NOTE: When using a StarWind Virtual Machine with synchronous replication, it is recommended not to take backups or snapshots of the VM running the StarWind service, since these operations can pause the VM. Pausing a VM while the StarWind service is under load may lead to split-brain issues in devices with synchronous replication and to data corruption.

Configuring StarWind VMs Startup/Shutdown

Set up the VM startup policy on both ESXi hosts from the Manage -> System tab in the ESXi web console. In the window that appears, check Yes to enable the option. Click Save to proceed.


To configure VM autostart, right-click on it, navigate to Autostart and click Enable.

Complete the actions above for the StarWind VM located on the other host.

Start both virtual machines and install the OS and StarWind Virtual SAN.
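
If preferred, the same startup behavior can be configured with VMware PowerCLI; the sketch below is only an illustration, and the VM name "StarWindVM" is a placeholder.

# Enable VM autostart on the host and set the StarWind VM to power on automatically
$VMHost = Get-VMHost
Get-VMHostStartPolicy -VMHost $VMHost | Set-VMHostStartPolicy -Enabled:$true
Get-VMStartPolicy -VM (Get-VM -Name "StarWindVM") | Set-VMStartPolicy -StartAction PowerOn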

Downloading, Installing, and Registering the Software

6. Download the StarWind setup executable file from the official StarWind website:

https://www.starwind.com/registration-starwind-virtual-san

NOTE: The setup file is the same for x86 and x64 systems, as well as for all Virtual SAN deployment scenarios.

7. Launch the downloaded setup file on the server to install StarWind Virtual SAN or one of its components. The Setup wizard will appear. Read and accept the License Agreement. Click Next to continue.

8. Read the important information about new features and improvements. Text in red indicates warnings for users that are updating their existing software installations. Click Next to continue.

9. Select Browse to modify the installation path if necessary. Click Next to continue.

10. Select the following components for the minimum setup:

  • StarWind Virtual SAN Service. The StarWind service is the “core” of the software. It allows creating iSCSI targets as well as sharing virtual and physical devices. StarWind Management Console allows managing the service from any Windows computer or VSA in the same network. Alternatively, the service can be managed through a separately deployed StarWind Web Console.
  • StarWind Management Console. StarWind Management Console is the Graphical User Interface (GUI) part of the software that controls and monitors all storage-related operations (e.g., it allows users to create targets and devices on StarWind Virtual SAN servers connected to the network).

Click Next to continue.

11. Specify Start Menu Folder. Click Next to continue.

12. Enable the checkbox if a desktop icon is needed. Click Next to continue.

13. When the license key prompt appears, select the appropriate option:

  • Request time-limited fully functional evaluation key
  • Request FREE version key.
  • Thank you, I do have a key already.

Click Next to continue.

14. Click Browse to locate the license file. Press Next to continue.

15. Read the licensing information. Click Next to continue.

16. Verify the installation settings. Click Back to make any changes. Press Install to proceed with the installation.

17. Enable the appropriate checkbox to launch StarWind Management Console right after the setup wizard closes. Click Finish to close the wizard.

18. Repeat the installation steps on the partner node.

NOTE: Managing StarWind Virtual SAN on a Windows Server Core edition without GUI requires installation of StarWind Management Console on a different computer running a GUI-enabled Windows edition.

Configuring Automatic Storage Rescan

Configure automatic storage rescan for each ESXi host.

19. Log in to the StarWind VM and install VMware vSphere PowerCLI on each StarWind virtual machine by adding the PowerShell module (Internet connectivity is required). To do so, run the following command in PowerShell:
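
Assuming the module is installed from the PowerShell Gallery, a typical command is:

# Install VMware PowerCLI from the PowerShell Gallery (requires Internet access)
Install-Module -Name VMware.PowerCLI -AllowClobber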

NOTE: On Windows Server 2012 R2, online installation of PowerCLI requires Windows Management Framework 5.1 or a later version to be installed on the VMs. Download Windows Management Framework 5.1 from the following link:

https://go.microsoft.com/fwlink/?linkid=839516

20. Open PowerShell and change the Execution Policy to Unrestricted by running the following command:
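
The standard cmdlet for this is:

# Allow local scripts such as rescan_script.ps1 to run
Set-ExecutionPolicy -ExecutionPolicy Unrestricted -Force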

21. Create the PowerShell script which will perform an HBA rescan on the hypervisor host.

In the appropriate lines, specify the IP address and login credentials of the ESXi host on which the current StarWind VM is stored and running:

$ESXiHost1 = "IP address"
$ESXiUser = "Login"
$ESXiPassword = "Password"

Save the script as rescan_script.ps1 to the VM’s C:\ drive root.
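
A minimal sketch of what rescan_script.ps1 might contain, assuming VMware PowerCLI is installed, is shown below; the script provided by StarWind may include additional logic, such as the path selection policy changes mentioned later in this guide.

# rescan_script.ps1 - rescan storage on the ESXi host that runs this StarWind VM
$ESXiHost1 = "IP address"
$ESXiUser = "Login"
$ESXiPassword = "Password"
Connect-VIServer -Server $ESXiHost1 -User $ESXiUser -Password $ESXiPassword | Out-Null
# Rescan all HBAs and VMFS volumes so StarWind devices and datastores are re-detected
Get-VMHost | Get-VMHostStorage -RescanAllHba -RescanVmfs | Out-Null
Disconnect-VIServer -Server $ESXiHost1 -Confirm:$false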

22. Perform the configuration steps above on the partner node.

23. Go to Control Panel -> Administrative Tools -> Task Scheduler -> Create Basic Task and follow the wizard steps:

24. Specify the task name, select When a specific event is logged, and click Next.

25. Select Application in the Log drop-down list, type StarWindService as the event source and 788 as the event ID. Click the Next button.

26. Select Start a Program as the task action and click Next.

27. Type powershell.exe in the Program/script field. In the Add arguments field, type:

-ExecutionPolicy Bypass -NoLogo -NonInteractive -NoProfile -WindowStyle Hidden -File C:\rescan_script.ps1

Click the Next button to continue.

28. Click Finish to exit the Wizard.

29. Configure the task to run with highest privileges by enabling the checkbox at the bottom of the window. Also, make sure that the Run whether user is logged on or not option is enabled.

30. Switch to the Triggers tab. Verify that the trigger on event 788 is set up correctly.

31. Click New and add other triggers by Event ID 782, 257, 773, and 817.

32. The list of triggers should look like in the picture below:

33. Switch to the Actions tab and verify the parameters for the task.

Press OK and, when prompted, enter the credentials of the user account that will run the task.

34. Perform the same steps on the second StarWind VM, specifying the corresponding settings.

Provisioning Shared Storage with StarWind VSAN

35. Open StarWind Management Console and click the Add Device (advanced) button.

36. Right-click on the Servers field and click on the Add Server button. Add new StarWind Server to use as the second StarWind VSAN node.

37. Select the StarWind Server to create the device on and press the Add Device (advanced) button on the toolbar.

38. Add Device Wizard will appear. Select Hard Disk Device and click Next.

39. Select Virtual Disk and click Next.

40. Specify the Virtual Disk Location, Name, and Size. Click Next.

41. Specify the Virtual Disk options and click Next.

NOTE: The sector size should be 512 bytes when the device is used with ESXi.

42. Define the caching policy and specify the cache size (in GB). Click Next to continue.

NOTE: It’s recommended to assign 1 GB of L1 cache in Write-Back or Write-Through mode per 1 TB of storage capacity. However, the cache size should correspond to the storage working set of the servers.

43. Define Flash Cache Parameters and Size if necessary. Specify SSD location in the Wizard. Press Next.

NOTE: The recommended size of the L2 cache is 10% of the initial StarWind device capacity.

44. Specify the Target Parameters. Enable the Target Name checkbox to customize the target name. Otherwise, the program generates the name automatically based on the Target Alias. Click Next.

45. Click Create to add new device and attach it to the target.

46. Click Finish to close the Wizard.

47. Right-click on the recently created device and select Replication Manager from the shortcut menu.

48. Then, click Add Replica.

49. Select Synchronous “Two-Way” Replication. Click Next to proceed.

50. Specify the partner server IP address. The default StarWind management port is 3261. For a different port, type it in the Port Number field. Click Next to continue.

Heartbeat Failover Strategy

51. Check Heartbeat Failover Strategy according to the network design. Click Next to continue.

NOTE: To use the Node Majority failover strategy instead, jump to the Node Majority Failover Strategy section (Step 59).

52. Select Create new Partner Device. Click Next.

53. Specify the partner device location if necessary and/or modify the device Target Name. Click Next.

54. Select the Synchronization and Heartbeat networks for the HA device by clicking Change Network Settings.

55. Specify the interfaces for the Synchronization and Heartbeat. Press OK. Then, click Next.

NOTE: It is recommended to configure the Heartbeat and iSCSI channels on the same interfaces to avoid the split-brain issue. If the Synchronization and Heartbeat interfaces are located on the same network adapter, it is recommended to assign an additional Heartbeat interface to a separate adapter.

56. Select Synchronize from existing Device for the partner device initialization mode.

Click Next.

57. Press the Create Replica button. Then, click Close.

58. The added device will appear in StarWind Management Console.

Repeat the HA device creation steps for any other virtual disks to be used as datastores.

Once created, the devices appear in the left pane of the Management Console as shown in the screenshot below.

Node Majority Failover Strategy

59. Check Node Majority Failover Strategy according to the network design. Click Next to continue.

60. Select Create new Partner Device. Click Next.

61. Specify the partner device location if necessary and/or modify the target name of the device. Click Next.

62. Select the Synchronization and Heartbeat networks for the HA device by clicking Change Network Settings.

63. Specify the interfaces for Synchronization and Heartbeat. Press OK. Then, click Next.

64. Select Synchronize from existing Device for the partner device initialization mode. Click Next.

65. Press the Create Replica button. Then click Close.

66. The added device will appear in StarWind Management Console.

Repeat the HA device creation steps for any other virtual disks that will be used as datastores.

Once created, the devices appear in the left pane of the Management Console as shown in the screenshot below.

Adding Witness Node

This section describes adding a Witness node. A Witness node counts toward the majority but neither contains data nor processes any clients’ requests.

Configure the Witness node at a separate location. There are two options: it can be either a virtual machine running in the cloud or a host at another site. The Witness node must have the StarWind Virtual SAN service installed.

67. Open the StarWind Management Console, right-click on the Servers field and press the Add Server button. Add new StarWind Server to be used as the Witness node and click OK.

68. Right-click the HA device with the configured Node Majority failover policy and select Replication Manager. The Replication Manager window will appear. Press the Add Replica button.

69. Select Witness Node and click Next.

Specify the Witness node name or its IP address.

70. Specify the Witness device location and its target name if necessary. Click Next.

71. For the HA device, select the synchronization channel with the Witness node by clicking on the Change network settings button.

72. Specify Interfaces for Synchronization Channels, confirm, and click Next.

73. Click Create Replica.

After the device creation is completed, close the Wizard by pressing the Close button.

74. Repeat the steps above to create other virtual disks.

75. The added device will appear in StarWind Management Console. The list of HA devices should look as follows:

Preparing Datastores

Adding Discover Portals

76. To connect the previously created devices to the ESXi host, click Storage -> Adapters -> Configure iSCSI and select the Enabled option to enable the Software iSCSI storage adapter.

77. In the Configure iSCSI window, under Dynamic Targets, click on the Add dynamic target button to specify iSCSI interfaces.

78. Enter the iSCSI IP address of the first StarWind node from the virtual local network (e.g. 10.212.0.x).

79. Add the IP address of the second StarWind node (e.g. 10.212.0.x).

Confirm your actions by pressing Save configuration.

80. The list of targets should look like in the image below:

81. Click on the Rescan button to rescan the storage.

Now, the previously created StarWind devices are visible.

82. Repeat all the steps from this section on the other ESXi node, specifying corresponding IP addresses for the iSCSI subnet.
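
As an alternative to the host client UI, the dynamic targets can also be added with VMware PowerCLI; the sketch below is only an illustration, and the IP addresses are placeholders for the StarWind nodes' iSCSI addresses.

# Add both StarWind nodes as dynamic (Send Targets) discovery portals, then rescan
$VMHost = Get-VMHost
$iscsiHba = Get-VMHostHba -VMHost $VMHost -Type IScsi
New-IScsiHbaTarget -IScsiHba $iscsiHba -Address "10.212.0.11" -Type Send
New-IScsiHbaTarget -IScsiHba $iscsiHba -Address "10.212.0.12" -Type Send
Get-VMHostStorage -VMHost $VMHost -RescanAllHba | Out-Null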

Creating Datastores

83. Open the Storage tab on one of your hosts and click New Datastore.

84. Specify the Datastore name, select the previously discovered StarWind device, and click Next.

85. Enter datastore size. Click Next.

86. Verify the settings. Click Finish.

87. Add another Datastore (DS2) in the same way but select the second device for the second datastore.

88. Verify that your storages (DS1, DS2) are connected to both hosts. Otherwise, rescan the storage adapter.

89. The rescan script already changes the Path Selection Policy for the datastores from Most Recently Used (VMware) to Round Robin (VMware), so this is performed automatically. To check or change this parameter manually, connect the hosts to vCenter.

90. The multipathing configuration can be checked only from vCenter. To check it, click the Configure tab, open Storage Devices, select the proper storage device, and click the Edit Multipathing button.

Additional Tweaks

91. In the ESXi web console, click the Manage tab on one of the hosts and proceed to Services. Select TSM-SSH and start it, then set its policy to Start and stop with host.

92. Connect to the host using an SSH client (e.g., PuTTY).

93. Check the device list using the following command:
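
The standard ESXi shell command for listing NMP devices is:

esxcli storage nmp device list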

94. For the StarWind devices, adjust the Round Robin IOPS limit from the default 1000 down to 1 using the following command:
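
On ESXi, this is typically done with the command below (replace <device UID> with the UID of the StarWind device shown in the device list):

esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device=<device UID>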

NOTE: Paste the UID of the StarWind device at the end of the command. The rescan script already applies this setting, so it is performed automatically.

95. Repeat the steps above on each host for each datastore.

96. Click the Manage tab on one of the hosts and open Advanced Settings.

97. Select Disk and change the Disk.DiskMaxIOSize parameter to 512.
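
If preferred, the same value can be set from an SSH session instead of the UI; a typical esxcli command is:

esxcli system settings advanced set -o /Disk/DiskMaxIOSize -i 512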

Creating Datacenter

NOTE: A vCenter Server must be deployed before a Datacenter can be created.

98. Connect to vCenter, select the Getting Started tab, click Create Datacenter.

99. Enter the Datacenter name and click OK.

Creating Cluster

100. Click the Datacenter’s Getting Started tab and click Create a cluster. Enter the cluster name and click Next.

Adding Hosts to Cluster

101. Open the Cluster tab and click Add a host.

102. Enter the ESXi host name or IP address and enter the administrative account.

103. Lockdown mode is disabled by default.

104. Assign the License from the appropriate tab.

105. Click Cluster -> Configure -> Edit and turn on vSphere HA.

CONCLUSION

This technical paper has explained how to set up a stretched cluster on VMware vSphere 6.5 with StarWind VSAN as a storage provider. A simple 2-node ESXi scenario was taken as the basis for this technical paper. Using StarWind VSAN, the distributed storage of both ESXi hosts is converted into fault-tolerant shared storage synchronously “mirrored” between the nodes, regardless of their location. The technical paper has also covered the so-called “split-brain” problem and the ways to avoid it by properly configuring the failover strategies.
