So we have been running vSAN 6.2 in a stretched cluster for a few months now and everything has been looking good. I am about to add some new nodes into the cluster for capacity and noticed something strange. The bulk of the hosts in the cluster are showing two disk groups, as expected. Just under half the hosts, though, are showing no disk groups - but they are still using all 8 disks and contributing storage to the cluster:
I initially thought it was an issue with the web client, but digging into the RVC view of the cluster I can see that the host has no disk mappings:
2017-03-09 17:07:03 +0000: Fetching host info from invldnectx211.uk.corpdomain (may take a moment) ...
Product: VMware ESXi 6.0.0 build-4600944
VSAN enabled: yes
Cluster info:
Cluster role: agent
Cluster UUID: 5287918e-b4ae-314f-d2c0-d4a44435ccd2
Node UUID: 582430f9-6067-fc7c-4be5-109836a85524
Member UUIDs: ["582430fd-2a54-4830-6b24-109836a85868", "582430e6-5c21-ad90-51b7-109836a8588c", "582430e2-804a-3f3c-b512-109836a85854", "58411c33-4cb3-5bb4-e4a0-005056a264b2", "582430f9-6067-fc7c-4be5-109836a85524", "583c67a6-b0d6-a9e7-b647-484d7e968d51", "583dccfe-dbec-5ea7-afb8-484d7e968e3f", "583ed5eb-d1fd-94a5-fa20-484d7e968db9", "583ed65f-64a5-8ca4-dc11-484d7e9693f7", "582e2035-6092-e082-3750-484d7e8cbe11", "58b8000b-092d-1da8-39a5-484d7e8cbeff", "58b6eeae-f7aa-27ff-19ab-484d7e8cd691", "58b8005b-b87a-0fcb-6592-484d7e8cd77f"] (13)
Node evacuated: no
Storage info:
Auto claim: no
Disk Mappings:
None
FaultDomainInfo:
Gresham Street (Preferred)
NetworkInfo:
Adapter: vmk2 ()
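For anyone wanting to reproduce the check, this is roughly the RVC session I used (the vCenter name and inventory path are placeholders for my environment - adjust for yours):

```shell
# Connect RVC to vCenter (hostname/credentials are assumptions)
rvc administrator@vsphere.local@invldnvc01.uk.corpdomain

# Dump per-host vSAN info, including the Disk Mappings section shown above
vsan.host_info /invldnvc01.uk.corpdomain/DC/computers/Cluster/hosts/invldnectx211.uk.corpdomain

# Cross-check which disks the host has actually claimed for vSAN
vsan.disks_info /invldnvc01.uk.corpdomain/DC/computers/Cluster/hosts/invldnectx211.uk.corpdomain
```

If vsan.disks_info shows the disks in use while vsan.host_info reports no disk mappings, you get exactly the mismatch described here.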
Has anyone seen this before? Is this even an issue? It just seems strange to have hosts contributing storage to a cluster while vSphere seems to be unaware of it...
That is strange indeed; I have never seen this. I recommend calling support to see how this can be fixed.
I feared as much. Will post back the results here in case anyone else gets this.
The result was to ensure any newly introduced nodes were on the correct, matching ESXi version/build. The error looked to be the PERC controller maxing out its memory, so the suggestion was to update to new firmware on all RAID controllers across the cluster. Carried this out along with a reboot of all hosts, so will have to see how this goes.
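For verifying the two things support flagged - build consistency and controller state - these host-side commands are a rough sketch of what to check on each node (run via SSH on the ESXi host):

```shell
# Confirm the ESXi version/build matches the rest of the cluster
vmware -vl

# List storage adapters to confirm the PERC controller and its driver
esxcli storage core adapter list

# Show the disks this host has claimed for vSAN and their disk-group membership
esxcli vsan storage list
```

Comparing the build number and the vSAN storage list across all hosts should quickly surface any node that was introduced on a mismatched build.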
A couple of weeks in and everything is going well so far.