VMware Cloud Community
Silvester
Contributor

Update VSAN configuration job seems to be hung, then "An error occurred while communicating with the remote host."

Dear all,

I've just built a small POC environment for VSAN. It consists of three UCS 240M3S servers running ESXi 5.5 U1, managed by vCenter 5.5 U1c.

The cluster was created beforehand with the 3 hosts added, and everything looked fine so far.

When I enabled VSAN on the cluster, the job stayed "in progress" for about 40 minutes, then failed with the error "An error occurred while communicating with the remote host." or a timeout.

I've set up the VMkernel ports for management/vMotion/VSAN traffic as in the attached screenshot. I'm using 10Gb NICs and confirmed I can ping/vmkping each VMkernel port from vCenter, and from each host to the others.
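For reference, this is roughly how I ran the reachability checks from the ESXi Shell (a sketch; `vmk2` and the peer address are placeholders for your actual VSAN VMkernel interface and IPs):

```shell
# Ping a peer host's VSAN VMkernel IP, forcing the packet out a specific
# local VMkernel interface with -I (vmk2 and 10.0.0.12 are example values).
vmkping -I vmk2 10.0.0.12

# Repeat from every host to every other host's VSAN VMkernel IP.
```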

I suspect the issue lies with the network, but I can't figure out which part.

Are there any network configuration caveats here? Or are there any steps to narrow down the problem?

Silvester

4 Replies
MightySuite
VMware Employee

What vSAN disk-claiming mode are you using? If it is Automatic, can you try Manual?

  • Remove the current cluster
  • Add all the hosts directly to Datacenter
  • Create a cluster (do not enable HA and DRS) and turn on vSAN and set the mode to manual
  • Add all the hosts to the above cluster
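After moving the hosts into the new cluster, each host's view of vSAN membership can be checked from its ESXi Shell (a sketch, assuming the `esxcli vsan` namespace available in ESXi 5.5):

```shell
# Show this host's view of the vSAN cluster:
# sub-cluster UUID, local node state (MASTER/BACKUP/AGENT), member count.
esxcli vsan cluster get

# List the VMkernel interfaces tagged for vSAN traffic on this host.
esxcli vsan network list
```

If the member count does not match the number of hosts, the hosts cannot see each other on the vSAN network.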
Silvester
Contributor

I followed the steps: removed the cluster and created an empty cluster with VSAN enabled in Manual mode.

1. Requests timed out when adding hosts, although I could ping the host names without any problem. I had to restart the management agents on the hosts before they could be re-added to vCenter.

2. When trying to move hosts into the cluster, I suddenly found one of the hosts not responding; after I removed it and added it back, it gave me the timeout error again.

3. After rebooting all the hosts several times, the hosts could finally be added to the cluster.

What's interesting is that all the disks (SAS and SSD) are now shown on the console as in use, with disk groups created, although the cluster still shows Manual mode.

BTW, the hardware configuration is as follows:

3 x UCS 240M3S

Each server has 10 SAS disks (900GB each) and 2 SSDs (200GB each).

Each server has 128GB RAM and 2 x 10Gb NICs.

I also notice the disk group deletion job is hung now.

Questions:

It seems to me that enabling/disabling VSAN on the cluster breaks something on the hosts, leaving them unable to connect to vCenter. What might be the cause?

With my disk layout (10 SAS plus 2 SSD), how long should it take to create/delete a disk group if one group contains 5 SAS disks plus 1 SSD?

Thanks

Silvester

lvaibhavt
Hot Shot

Is multicast enabled on your network? It is required on the VSAN network for inter-host communication.

Silvester
Contributor

The issue is resolved. The root cause was IGMP being disabled on the network. After enabling IGMP, everything works fine again.
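For anyone hitting the same symptom, multicast delivery on the VSAN network can be observed directly from the ESXi Shell (a sketch; `vmk2` is an assumed interface name, and the ports are vSAN's default multicast ports for the master and agent groups):

```shell
# Watch for vSAN multicast heartbeats arriving on the vSAN VMkernel interface.
# By default the master group uses 224.1.2.3:12345 and the agent group 224.2.3.4:23451.
tcpdump-uw -i vmk2 udp port 12345 or udp port 23451
```

If no packets from the other hosts show up here, IGMP snooping/querier configuration on the physical switches is the usual suspect.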
