
Eliminating Blue Screen or Errors during failover

  • March 15, 2017
  • 5 min read
Director of Sales Engineering with more than 15 years of professional IT experience. Almost 2 years of Technical Support and Engineering at StarWind. Storage and virtualization expert. IT systems engineer. Web designer as a hobby.

Introduction

This post was prompted by a recent case in which one of our customers ran into trouble when their SAN switch failed. Their VMs generated an enormous number of errors, caused by the active paths switching at the time of failover.

Problem

A typical fault-tolerant setup consists of one or more server HBAs connected to one or several storage processors, with the active path used by the server shown in the properties of the LUN. Path failover occurs when the LUN is switched from one path to another because a SAN component along the active path has failed.

During failover (a scenario that can be simulated by pulling out a cable), data I/O is likely to halt for 30-60 seconds while the system determines whether the link is available. If you try to access the data, a VM, or its adapter during that time, the operation will stall until the failover process is completed.

If a disaster breaks multiple links in the LUN paths and all connections to the drive are lost, the failover itself fails, producing I/O errors on multiple iSCSI disks.

These scenarios can be mitigated by eliminating single points of failure that could disrupt path failover, by keeping regular backups and snapshots, and by increasing the standard disk timeout value on the guest operating systems.

Solution

Back up the registry first, then increase the TimeOutValue parameter as described below. This lets the guest OS tolerate the I/O pause during path failover instead of reporting errors.

So, what you will need to do is:

  1. Right-click Start and select Run.
  2. Type regedit.exe and click OK.
  3. In the left-panel tree, go to HKEY_LOCAL_MACHINE -> SYSTEM -> CurrentControlSet -> Services -> Disk.


  4. Double-click the TimeOutValue parameter, set the value data to 0x3c (hexadecimal) or 60 (decimal), and click OK.
  5. Reboot the guest OS for the change to take effect.
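
If the same change has to be made on several guests, the registry steps above can also be captured as a .reg file and imported on each machine. The following is only a minimal sketch in Python, not part of the original procedure: the function name render_timeout_reg is hypothetical, and the output uses the standard "Windows Registry Editor Version 5.00" file layout. The script just generates the text; it does not touch the registry itself.

```python
# Sketch: render a .reg file that sets the disk timeout from the steps above.
# render_timeout_reg is a hypothetical helper name used only for illustration.

REG_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk"


def render_timeout_reg(timeout_seconds: int = 60) -> str:
    """Return .reg file text that sets TimeOutValue to timeout_seconds."""
    if not 0 < timeout_seconds <= 0xFFFFFFFF:
        raise ValueError("timeout must be a positive value that fits in a DWORD")
    # .reg files write DWORD values as eight hexadecimal digits,
    # so 60 seconds becomes 0000003c.
    dword = format(timeout_seconds, "08x")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{REG_KEY}]",
        f'"TimeOutValue"=dword:{dword}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(render_timeout_reg(60))
```

Saving the printed text to a file with a .reg extension and importing it on the guest should have the same effect as steps 3 and 4. The registry backup and the reboot are still required.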

Conclusion

After making this change, Windows will wait up to 60 seconds for delayed disk operations to complete before generating errors.
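
To see why 60 seconds is a reasonable choice, it helps to compare the timeout with the 30-60 second I/O pause mentioned in the Problem section. The sketch below is a deliberately simplified model, not a description of how the Windows storage stack works internally (it ignores retries, for example). The function name surfaces_io_error and the 10-second comparison value are assumptions chosen for illustration only.

```python
# Simplified model: an I/O stall becomes a guest-visible error only if it
# lasts longer than the disk timeout. Retries in the storage stack are ignored.
# surfaces_io_error and the 10-second value are illustrative assumptions.


def surfaces_io_error(stall_seconds: float, timeout_seconds: int) -> bool:
    """Return True if the stall outlasts the configured disk timeout."""
    return stall_seconds > timeout_seconds


if __name__ == "__main__":
    for timeout in (10, 60):          # hypothetical short timeout vs. 60 s
        for stall in (30, 45, 60):    # failover pauses from the 30-60 s range
            outcome = "error" if surfaces_io_error(stall, timeout) else "ok"
            print(f"timeout={timeout}s stall={stall}s -> {outcome}")
```

In this model, every stall in the 30-60 second range outlasts the hypothetical 10-second timeout, while none of them outlasts a 60-second timeout.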
