The basic principle for building any highly available environment is eliminating any single points of failure in the hardware and software configurations. Since a single hardware failure can lead to downtime for the whole system, it is vital to achieve the redundancy of all the elements in the system in order to eliminate or minimize downtime that is caused by failures.
StarWind Virtual SAN makes it possible for customers to minimize or avoid downtime associated with storage or host failures. StarWind also enables virtualization environment maintenance with zero downtime. This is achieved by clustering multiple StarWind servers into a fault-tolerant storage cluster that guarantees seamless storage failover in the event of hardware/software failures and power outages.
StarWind HA relies on redundant network links between the StarWind hosts to ensure storage resilience. This allows StarWind to maintain a fully fault-tolerant storage cluster with just the storage hosts themselves. In contrast, most storage solutions available on the market require some sort of third entity in the storage cluster to maintain resilience and arbitrate the storage cluster in the event of hardware failure.
Using internal and native OS tools StarWind HA constantly monitors the state of all the network links between the servers in the HA cluster. Should any of the cluster nodes fail or stop processing requests properly, the failover is instantly initiated from the client OS/Hypervisor side. This guarantees correct failover procedure and makes StarWind HA automatically compatible with various initiators. StarWind also provides an internal heartbeat mechanism, which ensures proper storage path isolation in the event of synchronization network failures, and prevents so called storage “Split-brain”.
StarWind HA can be easily combined with native Windows file storage features and can also leverage Microsoft Scale-Out File Server (SoFS) to act as a continuously available file share service for multiple non-clustered client computers.
StarWind HA has multiple advantages over traditional storage failover solutions:
• Hardware agnostic – no proprietary hardware is necessary; commodity x86 servers are supported.
• Reduced TCO – does not require dedicated storage hardware and can be installed directly into the hypervisor.
• Minimal setup time – installation and full HA storage configuration take under 30 minutes.
• Ease of use and management – a native Windows application with a user-friendly centralized management console.
• Instant failover and failback – StarWind HA leverages the MPIO driver on the initiator host for a fully transparent failover procedure.
A full set of up-to-date technical documentation can always be found here, or by pressing the Help button in the StarWind Management Console.
Proper equipment selection is a very important step of architecture planning. Always choose equipment appropriate to the tasks you assign to the highly available environment. The equipment used should support redundant hot-swappable parts, especially if a minimalistic cluster (fewer than 4 nodes) is planned. Note that overestimating the storage requirements is not always good practice: equipment purchased according to inflated estimations may never be used effectively, resulting in low ROI. Always plan for the scalability of the servers you are purchasing. Keep in mind that you can not only scale out to more nodes, but also scale up the existing hosts as your compute and storage requirements grow.
StarWind HA operates in active-active mode; thus, it is a best practice to use identical hardware for all the nodes participating in an HA storage cluster. In rare cases, one HA node can have faster storage to improve read performance. In this case, ALUA (Asymmetric Logical Unit Access) is configured to achieve optimal performance. ALUA marks a certain storage path as optimal or non-optimal for write IO. The initiator server (when supported) uses these marks to optimize LU access. Please refer to the Asymmetric configurations chapter of this document for more information.
*Author's note: over the last 10 years I have seen just a few configurations where ALUA was applicable. The implementation of the ALUA mechanisms on the initiator side did not always result in optimal performance, even when all prerequisites were fulfilled on the storage side.
When configuring StarWind Virtual SAN, it is a best practice to have the same Windows Server OS version installed on all StarWind hosts. An edition difference is possible, though: e.g. host 1 running Windows 2012 R2 Standard and host 2 running Windows 2012 R2 Datacenter. Please note that certain Windows editions are not supported. Please refer to the System requirements page for the list of supported operating systems.
It is mandatory to have the same version and build of StarWind Virtual SAN installed on all nodes of the HA storage cluster. Always update all StarWind Virtual SAN servers in the environment to avoid version and build mismatches caused by differences in HA device compatibility, performance, and operational features. It is strongly recommended to keep all StarWind servers in the environment up to date and to install StarWind updates as soon as they become available. By default, you will receive an e-mail notification about available updates at the address you used to register on the StarWind website. Please make sure the @starwind.com and @starwindsoftware.com domains are whitelisted in your mailbox so you do not miss any important announcements.
An HA environment has special requirements for uptime and availability. In order to fulfill these requirements, certain settings in the operating system need to be modified.
Since any clustered environment has strict requirements for maintenance downtime, administrators are required to control all update processes on the server. All automatic Windows updates should be either disabled or set to “Check for updates but let me choose whether to download and install them”. Never apply updates to more than one node of the HA cluster at a time. After applying updates to a node, verify that functionality is intact and that all iSCSI devices are resynchronized and reconnected to the initiator hosts. Only after verifying the above should you start the update process on the next HA node.
StarWind operates through ports 3260 and 3261. 3260 is used for iSCSI traffic and 3261 for the StarWind management console connections. StarWind installer automatically opens these ports in the Windows Firewall during the initial installation. If a third party firewall is used, ports 3260 and 3261 have to be opened manually.
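When troubleshooting connectivity, a quick TCP reachability probe of these two ports from a partner node can confirm whether a firewall is in the way. Below is a minimal sketch in Python (the host name in the usage comment is hypothetical):

```python
import socket

def check_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example usage (host name is hypothetical):
# for port in (3260, 3261):          # 3260 = iSCSI, 3261 = management console
#     print(port, check_port("sw-node-01", port))
```

A `False` result for port 3260 from an initiator host usually points at a firewall rule rather than the StarWind service itself.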
It is not recommended to install any kind of third party applications on the server running StarWind Virtual SAN. Exceptions here are benchmarking tools, remote access utilities, and hardware management utilities such as Network card managers or RAID controller management packs. If you have any doubts about the software that you want to install on the server running StarWind Virtual SAN, please consult with StarWind Software support.
For certain tasks, e.g. fulfilling an IO pattern like 90% read with a high degree of random IO, a StarWind HA cluster can be configured in an asymmetric way: all HA nodes have identical network performance, but one node uses flash storage to accommodate the high rate of random read IO. An asymmetric configuration allows users to increase the read performance of the HA SAN while keeping the TCO lower. In this case, ALUA is configured for the HA device required to serve this IO pattern. With ALUA, all the network paths to the storage array remain active, but the initiator writes data only over those paths that are marked as optimal in the ALUA configuration, which prevents the slower storage from becoming a write bottleneck.
Network is one of the most important parts of the Virtual SAN environment. Determining correct network bandwidth for your SAN is the #1 task along with finding the approximate IOPS amount your storage needs to deliver in order to fulfill the requirements of applications it will be serving.
Networking speed considerations
Once finished with the IOPS calculations, you need to pick networking equipment that won't cause bottlenecks at the interconnect level. E.g., if your calculations say the cluster consumes 63,000 IOPS (or demands ~250 MB/s streaming speed capabilities), then a 1 GbE network will already be a bottleneck. Networking throughput demand grows along with IOPS demand, so after ~250,000 IOPS (or ~1,000 MB/s), a single 10 GbE card becomes a bottleneck. Below is a table showing the recommended network equipment throughput depending on the IOPS demand.
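The arithmetic behind these figures can be sketched as follows, assuming the commonly used 4 KB average block size (the actual average depends on your workload):

```python
def required_throughput_mbps(iops: int, block_size_kb: int = 4) -> float:
    """Approximate sustained throughput (MB/s) needed for a given IOPS demand."""
    return iops * block_size_kb / 1024.0

# 63,000 IOPS at 4 KB blocks exceeds a 1 GbE link (~117 MB/s usable):
print(required_throughput_mbps(63_000))   # → 246.09375
# ~250,000 IOPS approaches the limit of a single 10 GbE link:
print(required_throughput_mbps(250_000))  # → 976.5625
```

Larger average block sizes shift the bottleneck toward the network sooner, so it is worth rerunning the estimate with your own measured IO profile.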
Networking layout recommendations
The main goal of a highly available storage solution is 24/7 uptime with zero downtime in the event of most failures, or during maintenance and upgrades. Thus, it is very important to understand that High Availability is not achieved by just clustering the servers. It is always a combination of redundant hardware, special software, and a set of configurations that make the system truly highly available. Below you can find reference architecture diagrams showing the recommended redundant networking for HA. These network layouts are considered the best practice of StarWind Virtual SAN design.
All switches used in the StarWind Virtual SAN deployment have to be redundant. This applies to both iSCSI traffic switches and, if used, synchronization channel switches.
The connections pictured on the diagrams are dedicated for StarWind traffic. In hyper-converged scenarios (StarWind is running on the hypervisor host) it is possible to share the heartbeat with other networks except synchronization e.g. vMotion/Live Migration.
Fig. 1: Hyper-converged setup. Two-node hypervisor cluster converged with StarWind Virtual SAN. Direct connections used for synchronization and iSCSI channels.
Fig. 1a: Hyper-converged setup. Two-node hypervisor cluster converged with StarWind Virtual SAN. Switched connections used for synchronization and iSCSI channels.
Fig. 1b: Hyper-converged setup. Two-node vSphere cluster converged with StarWind Virtual SAN. Detailed diagram.
Fig. 2: Hyper-converged setup. Three node hypervisor cluster converged with StarWind Virtual SAN.
Fig. 2a: Hyper-converged setup. Three node vSphere cluster converged with StarWind Virtual SAN. Detailed diagram.
Fig. 3: Compute and storage separated configuration: Two node hypervisor cluster connected to a two node StarWind Virtual SAN, direct connections used for synchronization channels.
NOTE: all of the diagrams above show only the SAN connections. LAN connections, internal cluster communication, and any auxiliary connections have to be configured using separate network equipment or separated from the iSCSI traffic using VLANs. Networking inside the cluster should be configured according to your hypervisor vendor's recommendations and best practices.
Shielded cabling (e.g. Cat 6a or higher) has to be used for all the network links used for StarWind Virtual SAN traffic. Cat. 5e cables are not recommended. StarWind Virtual SAN does not have specific requirements for 10/40/56/100 GbE cabling. Should you have any doubts about the cabling type to use, please contact your networking equipment vendor for recommendations.
Teaming and Multipathing best practices
StarWind Virtual SAN does not support any form of NIC teaming for resiliency or throughput aggregation.
All configurations on the diagrams shown in the “Networking Layouts Recommendations” section (Fig. 1-5) are configured for MPIO. In Compute & Storage Separated configurations the recommended MPIO mode is “Round-Robin”. In Hyper-Converged configurations the recommended MPIO mode is “Failover Only” (Hyper-V) or “Fixed path” (vSphere). Please note: Multipathing is not designed to show linear performance growth along with increasing the number of network links between the servers.
To aggregate synchronization throughput and achieve network redundancy StarWind Virtual SAN can use multiple non-bonded network interfaces.
Synchronization channel recommendations
The synchronization channel is a critical part of the HA configuration. It is used to mirror every write operation addressing the HA storage. It is mandatory for the synchronization link throughput to be equal to or higher than the total throughput of all links between the client servers and the Virtual SAN cluster.
For the HA storage, the maximum performance is limited by 3 factors:
1. Performance of the storage arrays used for HA storage.
2. Total performance of the round-robin multipathed iSCSI links from the client servers.
3. Synchronization channel performance.
It is required that (3) ≥ (2). In real-world scenarios, (1) may be slower or faster than (2) or (3), but the HA device performance will always be limited by the smallest of the three values above.
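The relationship above reduces to a simple minimum; here is a sketch with illustrative throughput figures (not measurements from any specific hardware):

```python
def ha_device_max_throughput(array_mbps: float,
                             iscsi_links_mbps: float,
                             sync_channel_mbps: float) -> float:
    """HA device write throughput is capped by the slowest of the three factors:
    (1) the local array, (2) the client iSCSI links, (3) the sync channel."""
    return min(array_mbps, iscsi_links_mbps, sync_channel_mbps)

# Illustrative: a 2,000 MB/s array behind 2x10 GbE iSCSI (~2,200 MB/s)
# and a single 10 GbE sync link (~1,100 MB/s) is limited to ~1,100 MB/s.
print(ha_device_max_throughput(2000, 2200, 1100))  # → 1100
```

In the example, upgrading the array or adding iSCSI links would change nothing until the synchronization channel is widened as well.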
The HA device IO response time directly depends on the synchronization link latency. Certain hypervisor vendors have very strict requirements for storage response time. Exceeding the recommended response time limits can lead to various application or virtual machine issues. In addition, certain features (e.g. Microsoft Hyper-V Live Migration or VMware vSphere HA) may fail or work incorrectly if the storage response time is not within the recommended limits.
The maximum synchronization channel latency values are provided below:
• HA SANs are located in different buildings/data centers – 5 ms
• HA SANs are located in one building/data center – 3 ms.
Heartbeat channel recommendations
Heartbeat is a technology that helps avoid so-called “split-brain” situations, when the HA cluster nodes are unable to synchronize but continue to accept write commands from the initiators. With StarWind Heartbeat technology, if the synchronization channel fails, StarWind attempts to ping the partner nodes using the provided heartbeat links. If the partner nodes do not respond, StarWind assumes that they are offline. In this case, StarWind marks the other nodes as not synchronized, and all HA devices on the node flush the write cache to the disk and continue to operate in write-through caching mode to preserve data integrity in case the node goes out of service unexpectedly.
If the heartbeat ping is successful, StarWind blocks the nodes with the lower priority until the synchronization channels are re-established. This is accomplished by designating node priorities. These priorities are used only in the event of a synchronization channel failure and are configured automatically during the HA device creation. Please note that these settings have no effect on the multipath IO distribution between the HA nodes.
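The failure-handling logic described in the last two paragraphs can be sketched as a simple decision function. The return strings and the priority convention (lower number = higher priority) are illustrative only, not StarWind's actual API:

```python
def on_sync_channel_failure(heartbeat_ok: bool,
                            local_priority: int,
                            partner_priority: int) -> str:
    """Sketch of the split-brain avoidance decision after a sync-link failure."""
    if not heartbeat_ok:
        # Partner presumed offline: keep serving IO, flush the write cache,
        # and fall back to write-through to protect data integrity.
        return "mark-partner-not-synchronized; switch-to-write-through"
    # Partner is alive but sync is down: only the higher-priority node keeps
    # serving IO; the lower-priority node is blocked until resynchronization.
    if local_priority < partner_priority:
        return "continue-serving-io"
    return "blocked-until-resync"
```

The key property is that at most one node keeps accepting writes whenever the mirror cannot be maintained, which is exactly what prevents split-brain.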
In order to minimize the number of network links in the Virtual SAN cluster, heartbeat is configured to run on the same network cards with iSCSI traffic. Heartbeat is only activated if the synchronization channel has failed and, therefore, it cannot affect the performance.
It is recommended to enable additional heartbeat connections for all the links between the HA SANs, except the ones used for HA device synchronization and the connection used for the Failover cluster management (for hyper-converged configurations with Hyper-V).
A network performance issue discovered after the HA SAN deployment often becomes a showstopper. It is often nearly impossible to diagnose and fix the problem without taking the whole SAN infrastructure offline. Therefore, every network link and every disk in a Virtual SAN environment has to be verified to operate at peak performance before the system is deployed into production. Detailed guidelines for Virtual SAN benchmarking can be found in the StarWind Virtual SAN benchmarking guide.
It is critical to properly measure your storage requirements. This includes two factors: the actual storage capacity and the performance. Performance and capacity planning is the number one objective for the administrator planning a Virtual SAN deployment.
• Performance: Calculate the approximate IOPS amount your system needs to sustain. In addition, it is never a bad idea to add a power reserve with plans for future growth in mind.
• Capacity: Figure out how many terabytes of data you need to store.
StarWind recommends the storage configuration used in the HA SAN cluster to be identical. This includes RAID controllers, volumes, and settings, as well as OS-level partitioning. This section will cover 2 configuration approaches:
• FLAT image file architecture – traditional data layout on the disks; general-purpose storage.
• Log-Structured File System (LSFS) architecture – log-structure-based data layout; recommended for write-intensive workloads. A VM-centric file system that does not support certain workload types.
Each of the approaches assumes a different storage architecture.
There is no preferred vendor for RAID controllers; StarWind recommends using RAID controllers from any industry-standard vendor. There are 2 basic sets of requirements for the RAID controller in an HA SAN. We will go through them one by one below.
FLAT image file configuration:
• Write-Back caching with BBU
• RAID 0, 1, and 10 support, RAID 5, 50, 6, and 60 only supported for all-flash arrays
LSFS configuration:
• Write-back caching with BBU
• RAID 0, 1, 10, 5, 50, 6, and 60 support
Software RAID controllers are not supported for use with StarWind Virtual SAN.
Currently use of Microsoft Storage Spaces is not recommended due to performance limitations.
iSCSI uses 64 KB blocks for network transfers, so it is recommended to align your stripe size to this value. Modern RAID controllers often show similar performance regardless of the stripe size used. However, it is still recommended to keep the stripe size aligned at 64 KB so that each write operation changes a full stripe. This increases the life cycle of flash arrays and ensures optimal performance with spindle arrays.
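The effect of alignment can be illustrated with a small calculation: an aligned 64 KB write touches exactly one 64 KB stripe unit, while a misaligned one spans two. The offsets below are illustrative:

```python
def stripes_touched(offset_kb: int, size_kb: int, stripe_kb: int = 64) -> int:
    """Number of stripe units a write of size_kb starting at offset_kb spans."""
    first = offset_kb // stripe_kb
    last = (offset_kb + size_kb - 1) // stripe_kb
    return last - first + 1

print(stripes_touched(0, 64))   # aligned 64 KB write  → 1
print(stripes_touched(32, 64))  # misaligned 64 KB write → 2
```

Every extra stripe unit touched means an additional read-modify-write cycle on parity arrays and extra wear on flash, which is why keeping writes stripe-aligned matters.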
There are 2 data separation approaches with StarWind Virtual SAN. One is to keep both the OS and the StarWind device images on one physical volume, segregated using partitions. The second option is to segregate the OS to a dedicated disk/RAID 1 array. Both options are supported and apply to both FLAT image file and LSFS configurations.
It is recommended to use a GUID Partitioning Table (GPT) when initializing the disks used for StarWind devices. This allows creating volumes bigger than 2 TB, and makes it possible to expand the partitions without taking the volume offline. This recommendation applies for both FLAT image file and LSFS configurations.
Virtual disk device sector size
StarWind supports underlying storage with both 512B and 4KB physical sector size. However, to optimize interoperability and performance one needs to select the sector size of the underlying storage when creating virtual disks in the StarWind Management Console. The value can be picked when choosing the location to store the virtual disk file(s).
Please note, when creating a virtual disk device on storage spaces please use the 4K sector size option.
It is critical to benchmark the disks and RAID arrays installed in the StarWind servers to avoid possible performance problems after deploying StarWind Virtual SAN into production. Make sure the array does not exhibit abnormal performance drops on mixed read and write operations, or on random write tests. The local array benchmark should be used as a reference point for judging the performance of an HA device.
Please note that file copy is not a valid way to benchmark the performance of either local or iSCSI-attached storage. Always use disk benchmarking tools like Intel IOmeter, VDbench, or ATTO Disk Benchmark to get relevant performance information about your storage.
HA Device Considerations
Size, Provisioning considerations
There is no strict requirement for the size of the HA devices created with StarWind Virtual SAN. However, creating one big HA device consuming all the available space on the SAN can cause management inconvenience. This is not an issue for devices up to 5-6 TB in size, but bigger devices can cause the above-mentioned inconvenience due to increased full synchronization times. The use of bigger devices also makes granular VM/application restores after outages or major failures more difficult. Segregating your mission-critical VMs or applications onto separate HA devices can make management easier.
Since the HA caching is provisioned per device, segregating the devices according to the application load profiles also allows for better utilization of the memory allocated for HA device caching.
For Hyper-V environments it is an optimum performance best practice to create at least 1 HA device per Hyper-V/Microsoft Failover Cluster.
LSFS configuration: Device size limit – 11 TB
FLAT image file configuration: Device size limited by the underlying FS maximum file size.
RAM consumption and Caching
StarWind HA is designed to show peak performance with Write-Back caching. Each written block is first cached on the local SAN node, and then synchronized with the second node’s cache, and only after that StarWind considers the block as written. Along with providing great performance improvements, write-back caching also introduces specific requirements to underlying hardware stability. UPS units have to be installed to guarantee the graceful shutdown of at least one StarWind Virtual SAN (*1) node in case of a power outage.
With Write-Through caching, write operations can become significantly slower and will fully depend on the underlying array performance. This happens because the write is only confirmed when the block is written to the disk itself. Although write-through caching gives no boost to the write performance, it does not depend on power stability (compared to write-back) and still maintains read cache, which balances the read/write access distribution to the underlying disk array.
Minimum recommended cache size is 128 MB. For VDI scenarios the minimum recommended cache size should be equal to the size of the golden image.
The cache effectiveness depends on cache size to working set size ratio. General benchmarks have shown the following cache improvement levels.
(*1) For scale-out deployments, the number of hosts configured for graceful shutdown is determined by both the number of hosts and the number of times the HA device is replicated within the scale-out cluster. We recommend consulting StarWind support to determine the most appropriate setup for your particular case.
LSFS devices consume additional 3.5 GB of RAM per terabyte of stored data. This value is hardcoded and cannot be changed from within the StarWind Management console.
Please keep in mind that the write cache size affects the time the server needs for a graceful shutdown.
During shutdown, the server needs to flush the cache to the disk. The shutdown time can be estimated as the total amount of RAM provisioned as write-back cache divided by the performance of the disk array under a 100% random, 100% 4K write load.
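The estimate works out dimensionally as time = cache size ÷ flush throughput. A minimal sketch with illustrative numbers (measure your own array's random-write rate rather than reusing the figure below):

```python
def flush_time_seconds(cache_mb: float, flush_rate_mbps: float) -> float:
    """Worst-case time to flush a full write-back cache to disk.

    flush_rate_mbps: array throughput under a 100% random 4K write load
    (illustrative; benchmark your own array for a real figure).
    """
    return cache_mb / flush_rate_mbps

# E.g. 8 GB of write-back cache over an array sustaining 50 MB/s of
# random 4K writes needs roughly 164 seconds to flush:
print(flush_time_seconds(8 * 1024, 50))  # → 163.84
```

This is worth checking against your UPS runtime: the battery must carry the node at least long enough for a full cache flush plus the OS shutdown itself.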