Eliminating Blue Screen or Errors during failover

Posted by Taras Shved on March 15, 2017

Introduction

This post was prompted by a recent case in which one of our customers ran into an issue when their SAN switch failed. Their VMs generated a large number of errors, caused by the switching of active paths at the time of failover.

Problem

A typical fault-tolerant setup consists of one or more server HBAs connected to one or several storage processors, with the active path used by the server shown in the properties of the LUN. Path failover occurs when the LUN is switched from one path to another because a SAN component that is part of the path has failed.

During failover (a scenario that can be simulated by pulling out a cable), data I/O is likely to pause for 30-60 seconds while the system determines whether the link is available. If you try to access the data, the VM, or its adapter during that time, the operation will stall until the failover process is completed.

If a disaster breaks multiple links in the LUN path and all connections to the drive are lost, the failover itself fails and multiple iSCSI disks report I/O errors.

The scenario above can be mitigated by eliminating single points of failure that could disrupt path failover, by keeping regular backups and snapshots, and by increasing the standard disk timeout value in the guest operating systems.
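To see why the disk timeout matters, here is a deliberately simplified Python model. It is not a description of how the Windows storage stack actually handles retries: it simply treats a pending I/O as delayed if the path comes back within the guest's disk timeout, and as an error otherwise. The function name `io_outcome` and the 10-second comparison value are assumptions chosen purely for illustration.

```python
def io_outcome(stall_seconds: float, disk_timeout_seconds: float) -> str:
    """Classify one pending I/O under a simplified model.

    Assumption: the request succeeds (late) if the path recovers within
    the disk timeout, and is reported as an error if it does not.
    """
    if stall_seconds < 0 or disk_timeout_seconds <= 0:
        raise ValueError("stall must be >= 0 and timeout must be > 0")
    return "delayed" if stall_seconds <= disk_timeout_seconds else "error"


if __name__ == "__main__":
    # 30-60 s is the failover pause mentioned in the post;
    # 10 s is a hypothetical short timeout used only for comparison.
    for timeout in (10, 60):
        for stall in (30, 45, 60):
            outcome = io_outcome(stall, timeout)
            print(f"timeout={timeout:>2}s stall={stall}s -> {outcome}")
```

Under this model, a 60-second timeout covers the whole 30-60 second failover window, while a shorter timeout turns the same pause into errors.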

Solution

Once you have backed up the registry and increased the TimeOutValue parameter as described below, the guest OS should tolerate the pause during path failover instead of reporting errors.

Here is what you need to do:

  1. Right-click Start and select Run.
  2. Type regedit.exe and click OK.
  3. In the left-panel tree, go to HKEY_LOCAL_MACHINE -> System -> CurrentControlSet -> Services -> disk.

[Screenshot: Registry Editor at HKEY_LOCAL_MACHINE -> System -> CurrentControlSet -> Services -> disk]

  4. Double-click the TimeOutValue parameter, set the value data to 0x3c (hexadecimal) or 60 (decimal), and click OK to apply.
  5. Reboot the guest OS for the change to take effect.
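If you need to apply the same change to several guests, the steps above can be scripted. The following is a minimal Python sketch, not an official tool: the function names `build_reg_file` and `set_disk_timeout` are hypothetical, and it assumes Python is installed in the guest and is run with administrator rights. It uses the standard `winreg` module, which is available on Windows only.

```python
DISK_KEY = r"SYSTEM\CurrentControlSet\Services\Disk"
VALUE_NAME = "TimeOutValue"
TIMEOUT_SECONDS = 60  # 0x3c, the value used in the steps above


def build_reg_file(timeout_seconds: int = TIMEOUT_SECONDS) -> str:
    """Return the text of a .reg file that sets the disk timeout."""
    if not 0 < timeout_seconds <= 0xFFFFFFFF:
        raise ValueError("timeout must fit in a positive 32-bit DWORD")
    return (
        "Windows Registry Editor Version 5.00\n\n"
        f"[HKEY_LOCAL_MACHINE\\{DISK_KEY}]\n"
        f'"{VALUE_NAME}"=dword:{timeout_seconds:08x}\n'
    )


def set_disk_timeout(timeout_seconds: int = TIMEOUT_SECONDS) -> None:
    """Write TimeOutValue directly (Windows only, needs admin rights)."""
    import winreg  # imported here so the module still loads elsewhere

    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, DISK_KEY, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(
            key, VALUE_NAME, 0, winreg.REG_DWORD, timeout_seconds
        )


if __name__ == "__main__":
    # Printing is side-effect free; call set_disk_timeout() explicitly
    # when you actually want to modify the registry.
    print(build_reg_file())
```

You can save the printed text as a .reg file and import it, or call `set_disk_timeout()` from an elevated Python session. Either way, back up the registry first and reboot the guest OS afterwards, as in the manual steps.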

Conclusion

After making this change, Windows will wait up to 60 seconds for delayed disk operations to complete before generating errors.
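To confirm that the setting is in place after the reboot, you can read the value back. This is a small sketch under the same assumptions as before (Python available in the guest, Windows-only `winreg` module); the helper names `read_disk_timeout` and `describe_timeout` are hypothetical.

```python
import sys

DISK_KEY = r"SYSTEM\CurrentControlSet\Services\Disk"
VALUE_NAME = "TimeOutValue"


def describe_timeout(value):
    """Format a TimeOutValue reading for display; None means not set."""
    if value is None:
        return "TimeOutValue is not set"
    return f"TimeOutValue = {value} seconds (0x{value:x})"


def read_disk_timeout():
    """Return the current TimeOutValue, or None if it is missing."""
    import winreg  # Windows only

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, DISK_KEY) as key:
            value, value_type = winreg.QueryValueEx(key, VALUE_NAME)
    except FileNotFoundError:
        return None
    if value_type != winreg.REG_DWORD:
        raise TypeError("TimeOutValue is expected to be a REG_DWORD")
    return value


if __name__ == "__main__":
    if sys.platform == "win32":
        print(describe_timeout(read_disk_timeout()))
    else:
        # Off Windows, only demonstrate the formatting.
        print(describe_timeout(60))
```

On a guest configured as described above, the expected report is `TimeOutValue = 60 seconds (0x3c)`.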

Taras Shved
Director of Sales Engineering at StarWind
Director of Sales Engineering with more than 15 years of professional IT experience. Almost 2 years of Technical Support and Engineering at StarWind. Storage and virtualization expert. IT systems engineer. Web designer as a hobby.