
NIC Load Balancing on ESXi: Recovery & Real-World Best Practices

  • February 25, 2025
  • 8 min read
StarWind Post-Sales Support Engineer. Vitalii specializes in storage, virtualization, and backup solutions. With expertise in infrastructure implementation and system recovery, he provides technical leadership in optimizing virtualized environments. Vitalii delivers expert guidance on data protection and high-availability infrastructure, focusing on seamless post-deployment support and performance tuning.

A mismatch between the ESXi load balancing policy and the physical switch configuration is one of the most common causes of host isolation. The classic scenario: a junior admin changes the policy to “Route based on IP Hash” without first configuring a static EtherChannel on the physical switch.

Result: The host drops off the network immediately. vCenter access is lost. The only way back is the local console (DCUI) or an out-of-band controller such as iDRAC or iLO.

This guide covers the emergency recovery runbook via esxcli and why you should probably avoid IP Hash in the first place.

The Emergency Fix (ESXCLI Runbook)

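If you locked yourself out, stop guessing. Open the physical console (or iDRAC/iLO), press Alt + F1 to switch to the ESXi Shell, and log in as root. If the shell prompt does not appear, enable ESXi Shell first from the DCUI under Troubleshooting Options.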

1. Inspect the Damage

Check the vSwitch policy first. In most setups, the management network is on vSwitch0.

Bash

esxcli network vswitch standard policy failover get -v vSwitch0
  • Look for: The Load Balancing field.
  • Diagnosis: If it says iphash but the physical switch ports are not bundled into a static EtherChannel (port channel), that is your problem.
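
If you would rather not eyeball the output, a minimal sketch like the one below pulls out just that field. The helper names get_lb_policy and diagnose_lb are hypothetical, and the parsing assumes the field is printed as "Load Balancing: <value>" on its own line; check that against the output of your ESXi build before relying on it.

```shell
#!/bin/sh
# Print the value of the "Load Balancing" field from failover-policy output on stdin.
# The anchored pattern skips lines that merely end in "Load Balancing",
# such as a per-port-group override flag.
get_lb_policy() {
    sed -n 's/^[[:space:]]*Load Balancing:[[:space:]]*//p' | head -n 1
}

# Turn the raw value into a one-line hint (hypothetical helper).
diagnose_lb() {
    case "$1" in
        iphash) echo "iphash: requires a static EtherChannel on the physical switch" ;;
        *)      echo "$1: no port channel expected on the physical switch" ;;
    esac
}

# On the host (not executed here):
#   policy=$(esxcli network vswitch standard policy failover get -v vSwitch0 | get_lb_policy)
#   diagnose_lb "$policy"
```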

2. Check for Overrides

Even if the vSwitch is correct, the “Management Network” port group might have a specific override.

Bash

esxcli network vswitch standard portgroup policy failover get -p "Management Network"
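
To spot overrides quickly, the sketch below filters that output for override flags set to true. It assumes the port group output contains lines of the form "Override Vswitch <Setting>: true"; that format and the helper name list_overrides are assumptions to confirm on your own host.

```shell
#!/bin/sh
# Print the name of every setting the port group overrides, one per line.
# Assumed input format: "   Override Vswitch <Setting>: true|false"
list_overrides() {
    sed -n 's/^[[:space:]]*Override Vswitch \(.*\):[[:space:]]*true[[:space:]]*$/\1/p'
}

# On the host (not executed here):
#   esxcli network vswitch standard portgroup policy failover get \
#       -p "Management Network" | list_overrides
```

An empty result would mean the port group simply inherits the vSwitch policy.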

3. The Fix: Revert to “Port ID”

To restore connectivity, force the policy back to the default “Route based on originating port ID”.

To fix the vSwitch:

Bash

esxcli network vswitch standard policy failover set -v vSwitch0 -l portid

To fix the Port Group (if an override exists):

Bash

esxcli network vswitch standard portgroup policy failover set -p "Management Network" -l portid
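
When you are typing on a laggy remote console, it can help to wrap both commands in one function with a preview mode. This is only a sketch: run, revert_to_portid, and the DRY_RUN variable are hypothetical names, while the esxcli invocations are the two shown above.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, execute it otherwise.
# Note: the preview does not preserve quoting around arguments with spaces.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Revert both the vSwitch and one port group to "Route based on originating port ID".
revert_to_portid() {
    vswitch=$1
    portgroup=$2
    run esxcli network vswitch standard policy failover set -v "$vswitch" -l portid
    run esxcli network vswitch standard portgroup policy failover set -p "$portgroup" -l portid
}

# Preview first, then run for real (not executed here):
#   DRY_RUN=1; revert_to_portid vSwitch0 "Management Network"
#   DRY_RUN=0; revert_to_portid vSwitch0 "Management Network"
```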

4. Verify

Ping the gateway to confirm you are back online:

Bash

vmkping -I vmk0 <gateway_ip>
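
The physical switch may take a few seconds to relearn the host's MAC addresses, so a single failed ping is not conclusive. A minimal retry loop might look like this; wait_for_gateway, PING_CMD, and RETRY_DELAY are hypothetical names, and the default command is the vmkping call above limited to one packet with -c 1.

```shell
#!/bin/sh
# Retry a reachability check until it succeeds or the attempts run out.
# PING_CMD can be overridden so the loop can be exercised off-host.
wait_for_gateway() {
    gateway=$1
    attempts=${2:-5}
    i=1
    while [ "$i" -le "$attempts" ]; do
        if ${PING_CMD:-vmkping -I vmk0 -c 1} "$gateway" >/dev/null 2>&1; then
            echo "reachable after $i attempt(s)"
            return 0
        fi
        i=$((i + 1))
        sleep "${RETRY_DELAY:-2}"
    done
    echo "unreachable after $attempts attempt(s)"
    return 1
}

# On the host (not executed here):
#   wait_for_gateway <gateway_ip> 10
```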

The Policies: What Actually Matters

Route based on originating port ID (Default)

  • How it works: Maps a virtual NIC (vNIC) to a physical uplink. Traffic from that VM is “pinned” to that uplink.
  • The Verdict: This is the correct setting for 95% of standard vSwitch deployments. It requires zero physical switch configuration (no EtherChannel/LACP).
  • Limit: A single VM cannot exceed the speed of one physical link (e.g., 10 Gbps). The policy is not load-aware, but the virtual ports of 50 VMs will still spread across all uplinks, so aggregate traffic balances out reasonably well in practice.
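
To make the pinning idea concrete, here is a toy model. It assumes the uplink is chosen as the virtual port ID modulo the number of uplinks, which is an illustration only and not ESXi's documented algorithm; pinned_uplink and uplink_counts are hypothetical helpers.

```shell
#!/bin/sh
# Toy model (assumption): uplink index = virtual port ID modulo uplink count.
pinned_uplink() {
    echo $(( $1 % $2 ))
}

# Count how many of the first N port IDs land on each uplink.
uplink_counts() {
    ports=$1
    uplinks=$2
    u=0
    while [ "$u" -lt "$uplinks" ]; do
        n=0
        p=0
        while [ "$p" -lt "$ports" ]; do
            if [ "$(pinned_uplink "$p" "$uplinks")" -eq "$u" ]; then
                n=$((n + 1))
            fi
            p=$((p + 1))
        done
        echo "uplink$u: $n ports"
        u=$((u + 1))
    done
}

uplink_counts 50 2
```

Under this model, 50 ports on two uplinks split 25 and 25, yet any single port never leaves its uplink, which is exactly the per-VM limit described above.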

Route based on physical NIC load (LBT)

  • Requirement: Distributed Switch (VDS) only.
  • How it works: The VDS checks physical uplink load every 30 seconds. If an uplink stays above 75% utilization, the VDS moves the virtual port of a busy VM to a less loaded adapter. It moves whole ports, not individual flows.
  • The Verdict: This is the Gold Standard for enterprise clusters. It provides dynamic load balancing without the complexity of LACP.
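
The decision can be pictured with a toy model. The sketch below only applies the 75% threshold to a list of utilization figures you feed it; it does not query a VDS, and lbt_plan and LBT_THRESHOLD are hypothetical names rather than anything ESXi exposes.

```shell
#!/bin/sh
# Toy model of the LBT decision. Input: one "uplink utilization_percent" pair per line.
# Output: a suggested move for every uplink above the threshold (default 75).
lbt_plan() {
    awk -v limit="${LBT_THRESHOLD:-75}" '
        { name[NR] = $1; util[NR] = $2 + 0 }
        END {
            # Find the least loaded uplink as the move target.
            low = 1
            for (i = 2; i <= NR; i++) if (util[i] < util[low]) low = i
            for (i = 1; i <= NR; i++)
                if (util[i] > limit + 0 && i != low)
                    printf "move a port from %s (%d%%) to %s (%d%%)\n",
                           name[i], util[i], name[low], util[low]
        }'
}

printf 'vmnic0 82\nvmnic1 30\n' | lbt_plan
```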

Route based on IP Hash

  • Requirement: Static EtherChannel (LAG) on the physical switch.
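  • The Trap: If you turn this on before the switch is ready, you disconnect. If the physical switch ports are set to negotiate LACP instead of a static channel, you also disconnect, because a Standard Switch supports only static EtherChannel and cannot speak LACP.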
  • Verdict: Avoid unless a single VM talking to many destinations has a very specific bandwidth requirement that exceeds one link. Even then, each source/destination IP pair still rides a single uplink.
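
For intuition on how uplinks get picked, the sketch below follows the simplified calculation VMware has published as a sample: XOR the source and destination IPv4 addresses, then take the result modulo the number of uplinks. Treat it as a model for reasoning under that assumption, not as the exact in-kernel implementation; ip_to_int and iphash_uplink are hypothetical helper names.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Simplified IP hash: (source XOR destination) modulo uplink count.
iphash_uplink() {
    src=$(ip_to_int "$1")
    dst=$(ip_to_int "$2")
    echo $(( (src ^ dst) % $3 ))
}

# Same source, two destinations, two uplinks.
iphash_uplink 10.0.0.1 10.0.0.100 2
iphash_uplink 10.0.0.1 10.0.0.101 2
```

In this model the two destinations land on different uplinks, while any one source/destination pair always maps to the same one.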

Field Insights

1. LACP is a Trap (Mostly)

The overwhelming consensus in sysadmin communities is that LACP on ESXi is rarely worth the headache.

  • Why? It adds a rigid dependency between the host and the switch. If you need to restore a host configuration or replace a switch, the LACP mismatch can leave you isolated.
  • Better approach: Use Route based on physical NIC load (LBT). It achieves load balancing purely in software, keeping the physical layer dumb and reliable.

2. Management Network = Keep It Simple

Don’t mix complex load balancing with your Management Network.

  • Best Practice: Use Explicit Failover (Active/Standby) for the Management VMkernel.
  • Why: If your sophisticated LACP/LBT data network creates a loop or breaks, you need a simple, bulletproof “back door” to access the host.
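
As a sketch of what that looks like from the shell, the helper below prints an esxcli command that sets explicit failover with one active and one standby uplink. The uplink names vmnic0 and vmnic1 are placeholders (list yours with esxcli network nic list), and mgmt_failover_cmd is a hypothetical helper that only prints the command so you can review it before running it.

```shell
#!/bin/sh
# Print (not run) the command that pins a port group to explicit failover order.
# Arguments: port group name, active uplink, standby uplink.
mgmt_failover_cmd() {
    printf 'esxcli network vswitch standard portgroup policy failover set -p "%s" -l explicit -a %s -s %s\n' \
        "$1" "$2" "$3"
}

mgmt_failover_cmd "Management Network" vmnic0 vmnic1
```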

3. The “Beacon Probing” Myth

Do not enable Beacon Probing if you only have 2 uplinks.

  • The Risk: With only 2 NICs, when beacons stop arriving the host cannot determine which uplink is the bad one, leading to “flapping” where traffic is sent to the dead link.
  • Rule: Use “Link Status Only” for 2 uplinks. Only consider Beacon Probing if you have 3+ uplinks.
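
The rule is easy to check mechanically. The sketch below counts the adapters listed in failover-policy output and tells you whether beacon probing is even on the table; it assumes the output contains "Active Adapters:" and "Standby Adapters:" lines with comma-separated vmnic names, and count_uplinks and beacon_probing_ok are hypothetical helpers.

```shell
#!/bin/sh
# Count active plus standby uplinks from failover-policy output on stdin.
count_uplinks() {
    awk -F': *' '/^ *(Active|Standby) Adapters:/ {
        n += split($2, parts, /, */)
    } END { print n + 0 }'
}

# Succeeds only when there are enough uplinks to consider beacon probing.
beacon_probing_ok() {
    [ "$1" -ge 3 ]
}

# On the host (not executed here):
#   n=$(esxcli network vswitch standard policy failover get -v vSwitch0 | count_uplinks)
#   beacon_probing_ok "$n" || esxcli network vswitch standard policy failover set -v vSwitch0 -f link
```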

Verdict

Technical skills on ESXi are about risk management. The esxcli commands above are your parachute when a policy change goes wrong. For day-to-day design, resist the urge to over-engineer. Use Originating Port ID for Standard Switches and Load Based Teaming (LBT) for Distributed Switches. Leave LACP for the networking team’s core switches, not your hypervisors.
