Hello! I have 3 ESXi 6.5 hosts. vCenter version: 7515524, hosts: 7388607.
Each host has a dedicated 10Gb port for vMotion with MTU 9000. All three ports are connected to an isolated VLAN on a D-Link DXS3600 switch (jumbo frames are enabled, obviously).
But when I try to vMotion, I get the following error (from any host to any host):
I totally don't get why.
Forgot to mention: vmkping -s 9000 works fine from any host to any host.
Message was edited by: Ivan Karmyshin
Hi!
You need to include the '-d' option in the vmkping command, which sets the DF (Don't Fragment) bit on the IPv4 packet.
You definitely need to double-check the MTU size on all hops.
If I set the MTU to 8000 on the hosts, vMotion works fine. The reason for this post is more educational: I want to understand why 9000 didn't work. The hosts are connected only to the switch and everything is 100% configured for MTU 9000, so I don't understand why. The maximum size that passes unfragmented is 8958.
What MTU size is configured on the network switch?
9000 (I also tried 10000 and even 12000)
From my perspective it's an issue with jumbo frames on the network switch.
To check what's going on, you can capture network traffic on the ESXi hosts during a vMotion attempt with pktcap-uw (see the VMware Knowledge Base) and read the dump with Wireshark.
Also check that you have the latest driver and firmware installed for your network card.
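A rough sketch of what such a capture could look like. The interface name vmk1 and the output path are assumptions, not values from this thread, so substitute your own vMotion vmkernel interface. The snippet only builds and prints the command so it can be reviewed before it is run on the host.

```shell
#!/bin/sh
# Assumed values (hypothetical): adjust to your environment.
VMK=vmk1                        # vMotion vmkernel interface
OUT=/tmp/vmotion_${VMK}.pcap    # where the capture file is written

# pktcap-uw captures on a vmkernel interface with --vmk and writes a pcap with -o.
CAPTURE_CMD="pktcap-uw --vmk $VMK -o $OUT"

# Print the command for review. Run it on the ESXi host, start the vMotion,
# then stop the capture with Ctrl+C and open the file in Wireshark.
echo "$CAPTURE_CMD"
```

Running the capture on both the source and the destination host makes it easier to see on which side the large frames disappear.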
Hi Chekan,
There was a similar issue and error in our environment. It was due to a mismatch in jumbo frame size between the vmkernel and the physical router (gateway).
Please check the vMotion network and try to vmkping between the vMotion vmknic IP addresses of the source and destination hosts.
Verify that the security settings/policies on the vSwitch/port groups (such as promiscuous mode) are identical on both hosts.
Please check this knowledge base, related to similar errors:
Regards, Raj
You know what is super strange? Yesterday, using vmkping -d, I found out that the maximum size of a non-fragmented packet is 8958. So I set the MTU on ESXi to 8958 (on the physical switch I left it at 9000) and vMotion started to work. Today I tried to vMotion again and got the error... I ran vmkping again, and now the maximum size has changed to 8930. How is this possible??? No changes were made since yesterday. The hosts are connected directly to the switch, the simplest possible setup.
I just realized and double-checked: if I reduce the MTU on the hosts (both vSwitch and vmkernel), the actual maximum packet size reduces by 28 bits. For example, if I set the MTU to 8930 on the host, the max vmkping -d size will be 8902. So vMotion doesn't work at all... I am totally confused.
I guess we can only help if we see your D-Link configuration for jumbo frames and your vmkernel/vSwitch configuration.
Also, different NICs behave differently when jumbo frames don't work on the physical switch. Some drivers can split a big frame into smaller ones when it doesn't fit, but most drivers can't do this.
Again, collect a dump when you get the error. This will help a lot.
Excuse me, what do you mean by "collect dump"?
The packet size that you specify for vmkping is the size of the data portion. The actual packet size is 28 bytes bigger, so to test MTU 9000 you specify vmkping -d -s 8972.
See also VMware Knowledge Base
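To spell out the arithmetic, here is a small sketch in plain shell. The 28 bytes are the 20-byte IPv4 header (without options) plus the 8-byte ICMP header. The interface vmk1 and the address 10.0.0.2 are placeholders, not values from this thread, and the vmkping command is only printed, not executed.

```shell
#!/bin/sh
# IPv4 header (20 bytes, no options) + ICMP echo header (8 bytes).
OVERHEAD=28

# Largest vmkping -s value that still fits into one unfragmented packet
# at the given MTU.
max_payload() {
    echo $(( $1 - OVERHEAD ))
}

# Print the matching test command. vmk1 and 10.0.0.2 are hypothetical
# placeholders for the vMotion vmknic and the peer's vMotion address.
ping_cmd() {
    echo "vmkping -I vmk1 -d -s $(max_payload "$1") 10.0.0.2"
}

for mtu in 9000 8958 8930; do
    echo "MTU $mtu -> max -s $(max_payload "$mtu")"
done
ping_cmd 9000
```

For host MTUs of 8958 and 8930 this gives 8930 and 8902, which are exactly the maximum vmkping -d sizes reported earlier in the thread.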
"the actual maximum packet size reduces by 28 bits"
That's expected: the 28 bytes (not bits) are the headers added to the vmkping payload, a 20-byte IPv4 header plus an 8-byte ICMP header (see e.g. Troubleshooting ESXi Jumbo Frames).
What should actually work is to set everything to MTU 9000: physical switch/ports, vSwitch, and VMkernel port group.
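On the ESXi side this could look roughly like the sketch below. It assumes a standard (not distributed) vSwitch, and the names vSwitch1 and vmk1 are hypothetical, so replace them with your own. The commands are only printed so they can be checked before being run on each host.

```shell
#!/bin/sh
# Assumed names (hypothetical): adjust to your vMotion vSwitch and vmknic.
VSWITCH=vSwitch1
VMK=vmk1
MTU=9000

# Print the commands that set and then verify the MTU on a standard vSwitch
# and on the vmkernel interface.
mtu_commands() {
    echo "esxcli network vswitch standard set -v $VSWITCH -m $MTU"
    echo "esxcli network ip interface set -i $VMK -m $MTU"
    echo "esxcli network vswitch standard list -v $VSWITCH"
    echo "esxcli network ip interface list"
}

mtu_commands
```

The two list commands at the end show the MTU that is actually in effect, which is worth comparing across all three hosts.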
I'm not familiar with your physical switch. However a quick search showed that it supports a maximum MTU of 9216, so if 9000 doesn't work you may try 9216.
André
The maximum jumbo frame size on my physical switch (DXS-3600-32S) is 12288. I set both the vmkernel and the vSwitch to MTU 9000 and the ports on the physical switch to 12288, and I still get a timeout error.
Hi Chekan,
It's a different error again now.
Disable IPv6 on the vmnic interfaces and check.
I hope you have already verified the security policies on both hosts, vSwitch/port groups...
Okay, so check the network devices connected to ESXi and the interfaces connected to it.
If your business allows, please reboot the network switch (physical switch, DXS-3600-32S) connected to these ESXi hosts.
The issue could be there. It's definitely on the network side.
IPv6 is disabled on all hosts, and the security settings are identical. Switch rebooted. No success.
Can you show the output of
esxcli network nic get -n <vmnic#>
where vmnic# is the port used for the vmkernel's uplink?
Do the hosts have the same NIC or different ones?
All vmnics are the same: AOC-STGN-i2s (Supermicro). Today I replaced the switch with a MikroTik SFP+ switch and tried to vMotion. What was strange again is that 2 machines (30 and 60 GB in size) did well. But when I tried a bigger one (230 GB), it failed at around 55% with an error like the one in the latest screenshot.
Advertised Auto Negotiation: false
Advertised Link Modes: 10000BaseT/Full
Auto Negotiation: false
Cable Type: FIBRE
Current Message Level: 7
Driver Info:
   Bus Info: 0000:82:00.1
   Driver: ixgbe
   Firmware Version: 0x800006da
   Version: 3.7.13.7.14iov-NAPI
Link Detected: true
Link Status: Up
Name: vmnic3
PHYAddress: 0
Pause Autonegotiate: true
Pause RX: false
Pause TX: false
Supported Ports: FIBRE
Supports Auto Negotiation: false
Supports Pause: true
Supports Wakeon: false
Transceiver: external
Virtual Address: 00:50:56:53:9d:eb
Wakeon: None
When you do a vMotion, you copy only the VM's RAM.
Or do you mean you tried Storage vMotion?
Also try updating the ixgbe driver on the hosts.