VMware Cloud Community
jojo22
Contributor

Hot migration doesn't work (one way only)

Hi folks, I have a strange issue that happens on all VMs in one direction ONLY, so it must be a host-related issue.
If I hot-migrate a VM from Host-B to Host-A, it fails. Cold migration works OK.
Example:
"VM1" on "Host-A", live migration to "Host-B": all OK.
"VM1" on "Host-B", live migration to "Host-A": fails.
Error: Event Type Description:

Failed to migrate the virtual machine for reasons described in the event message
Possible Causes:
  • The virtual machine did not migrate. This condition can occur if vMotion IPs are not configured, the source and destination hosts are not accessible, and so on. Action: Check the reason in the event message to find the cause of the failure. Ensure that the vMotion IPs are configured on source and destination hosts, the hosts are accessible, and so on.

Sometimes the running task gives the error above, and sometimes it is stuck at 36% forever (I mostly tested migration with a small VM that has only a 10 GB disk).
Cold migration works OK both ways.
Both hosts are in an EVC cluster; all VMs are affected, but ONLY when live-migrated from Host-B.
Both hosts run ESXi 6.5, and each host has 2 physical NICs, each configured with a unique IP, of course.

Something seems strange on Host-A (as the migration recipient): I tried to modify the "Migrate.Enabled" system parameter in vCenter and directly on the host. On Host-B this was possible, but on Host-A I got "Failed to change Migrate.Enabled".

Anybody any ideas?

13 Replies
virtualqc
Enthusiast

One possible cause is that the vMotion network configuration is not consistent or correct on both hosts. Make sure that the vMotion IPs are valid, reachable, and unique on each host, and that the vMotion network adapters are connected to the same vSwitch or distributed vSwitch and use the same DNS. You can check and edit the vMotion network settings in the vSphere Client or from the ESXi Shell.

Another possible cause is that the CPU compatibility or EVC mode is not the same on both hosts. Make sure that the hosts have the same or compatible CPU features and that EVC mode is enabled and set to the appropriate level on the cluster or the VM. You can check and change the CPU compatibility or EVC mode in the vSphere Client or with PowerCLI.

Another possible cause is that the time is not the same on both servers. Check that the NTP service is running and that the time matches on each host.

If none of the above methods work, you can also try to use the hybrid mode feature, which allows you to migrate a VM between two different vDS versions. You need to enable the hybrid mode by adding a property to the vCenter Server advanced settings.

zchris06
Enthusiast

Hi,

You may have a mismatched subnet mask in the vMotion configuration on one of the hosts. Also, SSH to each host and run vmkping. Hopefully the links below will help.

https://kb.vmware.com/s/article/1003728

https://kb.vmware.com/s/article/65184
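
To make the subnet mask suggestion above concrete, here is a minimal sketch in Python that compares the vMotion address and mask of two hosts. It only uses the standard `ipaddress` module and does not talk to ESXi; you would copy the values from each host's VMkernel adapter settings by hand. The function name `vmotion_config_problems` and the addresses are made up for illustration and are not taken from this thread.

```python
import ipaddress


def vmotion_config_problems(a, b):
    """Compare two vMotion interfaces (ipaddress.IPv4Interface objects).

    Returns a list of human-readable problems; an empty list means
    the two settings look consistent with each other.
    """
    problems = []
    if a.ip == b.ip:
        problems.append("duplicate IP")
    if a.netmask != b.netmask:
        problems.append("subnet masks differ")
    elif a.network != b.network:
        problems.append("different subnets")
    return problems


# Hypothetical values, typed in from each host's vMotion VMkernel adapter.
host_a = ipaddress.ip_interface("192.168.50.11/255.255.255.0")
host_b = ipaddress.ip_interface("192.168.50.12/255.255.0.0")

print(vmotion_config_problems(host_a, host_b))  # ['subnet masks differ']
```

A clean result here only says that the two settings agree with each other. It does not prove that the hosts can reach each other over the vMotion network, which is what vmkping tests.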

aurora-chase
Contributor

To make sure the VM is successfully moved to the new server, a few requirements must be met:

1. The shared storage should be in the same directory, which means the image directory should not be changed, but the names of storage pools can be different.

2. The operating systems of the source server and destination server should be the same.

3. The name of the bridge should not be changed.

4. SELinux should be disabled on both the source and destination server.

5. Firewall should be configured to allow hot migration between the source server and destination server.

6. Name resolution is workable on both the source and destination server.

If that still doesn't work, you can learn the three methods to migrate VM without vMotion here.

jojo22
Contributor

Hi, thanks.
I believe the migration used to work some time ago.
- Both hosts have unique IPs, their DNS names resolve, and they use the same DNS servers.
- CPU compatibility should be valid (one host is a Xeon X7560, the other a Xeon E7540): the hosts must have passed the pre-checks when EVC was set up, otherwise we could not have created the EVC cluster.
- Time is the same; NTP is set to identical servers.
Any more ideas on which host is misconfigured? As I mentioned, hot migration works in one direction only, so can anybody tell at least which host is suspected of having invalid settings?
Is it possible to get more information from some logs about the exact root cause of the failure? Are there any migration-specific logs stored somewhere other than vCenter / VM / Monitor / Events?
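
On the log question: as far as I know, ESXi keeps host logs such as vmkernel.log, hostd.log and vpxa.log under /var/log, and each VM has a vmware.log in its own folder on the datastore. Below is a rough Python sketch for pulling migration-related lines out of a copy of one of those files. The keyword pattern is my own guess, not an official filter, and the sample lines are placeholders, not real ESXi log output.

```python
import re
from typing import Iterable, Iterator

# Assumption: lines of interest mention "vmotion" or "migrat..." somewhere.
# Adjust the pattern to whatever the event message actually says.
PATTERN = re.compile(r"vmotion|migrat", re.IGNORECASE)


def migration_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that match the migration keyword pattern."""
    for line in lines:
        if PATTERN.search(line):
            yield line.rstrip("\n")


# Placeholder input for illustration only.
sample = [
    "placeholder line that mentions vMotion\n",
    "placeholder line about something else\n",
    "placeholder line that mentions Migration\n",
]
for hit in migration_lines(sample):
    print(hit)
```

To use it on a real file, pass an open file object instead of `sample`, for example `open("vmkernel.log", errors="replace")` for a copy downloaded from the host. Comparing the hits from Host-A and Host-B for the same failed attempt may show which side reports the error first.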

jojo22
Contributor

The subnet on both hosts is the same, and testing over SSH from each host shows connectivity with latency below 1 ms. Any more ideas?

virtualqc
Enthusiast

Are there any DRS rules (such as an anti-affinity rule) or storage policies in place?

jojo22
Contributor

Hi, thanks for the reply.
1. I am moving the VM storage together with the compute resources, not using a shared folder/NAS/SAN.
2. The OS is the same.
3. Which "bridge" do you mean? The vSwitch?
4. SELinux: I could not find anywhere how to turn it off in ESXi, but since migration used to work some time ago, I believe it is in the state it should be.
5. The firewall allows all IPs on both hosts.
6. Name resolution is working OK for both hosts.
Any idea why it works from Host-A to Host-B but not from Host-B to Host-A?

jojo22
Contributor

Hi, thanks. There are no DRS rules and no storage or anti-affinity rules.
Any more ideas? Again, I want to highlight that live migration works OK from Host-A to Host-B but not from Host-B to Host-A; only cold migration works in that direction.
Thanks in advance.

virtualqc
Enthusiast

Do you have the latest version of VMware Tools installed inside the VM?

zchris06
Enthusiast

Can you check whether the vMotion interface has a duplicate IP, or change it to an unused one?

virtualqc
Enthusiast

It really looks like a network mask problem on the vMotion NIC on one of the servers.

jojo22
Contributor

I have got to the point where a few VMs migrate OK in both directions, but some VMs still fail when live-migrated from Host-B to Host-A (the other direction is OK).
For some VMs it helped a lot to change the network adapter type from VMXNET3 to E1000E in Settings/Devices; for others it helped to uncheck DirectPath I/O on the network adapter. I also tried removing all unused hardware settings, resetting the display settings from custom to auto, etc.
Now I have only 2 VMs that cannot be migrated from Host-B to Host-A. The other 10 work OK in both directions.
So the troubleshooting lets us rule out host misconfiguration, IP conflicts, etc. The cause must be ONLY something in the VM settings, I guess.
VMs with the issue (live migration works one way only):
VM1 - Windows, VMTools version 12294
VM3 - vCenter, VMTools version 10346

VMs where live migration is OK both ways:
VM9 - Windows, VMTools version 12294
VM6 - Ubuntu, VMTools version 12352
For testing purposes, all VMs now have the same network adapter type, no special devices, etc., so it is not about hardware, VMTools version, etc.
For 1 VM that previously did not work in both directions, it seems to have helped to lower the allocated RAM from 40 GB to 4 GB, even though both hosts still have about 140 GB out of 256 GB RAM free.
The last 2 VMs I cannot live-migrate no matter what I try. Neither of them can be cold migrated (they are the vCenter and the domain controller), so I need to find a way to hot-migrate them.
Any more ideas?
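
One possible reading of the RAM observation above: live migration has to copy the VM's memory over the vMotion network while the VM keeps running, so a 40 GB VM needs far more transfer time than a 4 GB one, regardless of how much RAM is free on the hosts. The sketch below is only back-of-the-envelope arithmetic in Python. The 1 Gbit/s link speed is an assumption (the NIC speed was never stated in this thread), and the result is a lower bound that ignores protocol overhead and memory pages that change during the copy and must be sent again.

```python
def min_transfer_seconds(memory_gb: float, link_gbps: float) -> float:
    """Lower bound on the time to send the VM memory once over the link.

    memory_gb is in gigabytes, link_gbps in gigabits per second,
    so the memory size is multiplied by 8 to convert bytes to bits.
    """
    return memory_gb * 8 / link_gbps


ASSUMED_LINK_GBPS = 1.0  # assumption: a 1 Gbit/s vMotion link

for ram_gb in (40, 4):
    seconds = min_transfer_seconds(ram_gb, ASSUMED_LINK_GBPS)
    print(f"{ram_gb} GB: at least {seconds:.0f} s")
# 40 GB: at least 320 s
# 4 GB: at least 32 s
```

This does not explain why only the Host-B to Host-A direction fails, but it suggests the vMotion path in that direction may be slower or less reliable than the other, which would hurt large or busy VMs first.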

jojo22
Contributor

As you can see in my last reply in this thread, the VMTools version doesn't affect whether a VM can be migrated or not. I have 2 VMs with the same VMTools version; one works OK and the other does not. Any ideas?
