VMware Cloud Community
kentleeKX
Contributor

vSAN container CSI lost

Dear all,

I have a 3-host vSAN cluster.
It runs a Kubernetes (k8s) service and uses the vSAN CSI driver to provide k8s volumes.
Yesterday I found that the system SSD on one ESXi host had lost its connection. At that point the VMs on that host were still running normally, and I could still log in to the ESXi Host Client.
However, the host was disconnected from VCSA, and I could not vMotion the VMs on it.
Although the VMs were running normally, PVCs in k8s could not be created, and no HA failover happened.
I tried restarting hostd and running services.sh restart; the commands executed successfully,
but I still only got 503 Service Unavailable on the web interface.

This continued until I forced a reboot of the host, which triggered HA to restart the VMs on the other hosts. After that, everything was back to normal.
After checking the k8s cluster, I found that some data had been lost.

The host did not fail completely; only the OS disk had a problem. Because of that, HA was not triggered and vMotion was not possible, yet my k8s cluster could not create or update PVCs, which caused the data loss.

How can I avoid this situation in the future?

 
