Had a lab vCenter crash and am trying to figure out why.
Current symptoms:
* If starting vCenter from the shell with "service-control --start --all", the process fails because vPostgres cannot start.
* If starting vPostgres manually ("service-control --start vmware-vpostgres") and then starting vCenter ("service-control --start --all"), the process fails because vpxd-svcs cannot start.
* I logged into vCenter VCDB and verified administrator account
* I reset vCenter certificates and validated with lsdoctor
* vpxd.log shows "Failed to connect to Authz service" and "Failed to initialize authorizeManager"
Anyone seen something like this?
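To see when the two vpxd messages from the symptom list occur, they can be pulled out of the log directly. This is a minimal sketch, assuming the default VCSA log location /var/log/vmware/vpxd/vpxd.log; the function name authz_errors is only illustrative and not part of any VMware tool.

```shell
#!/bin/bash
# Sketch: list the Authz-related failures quoted above from a vpxd log.
# The default path is an assumption; pass another file as the first argument.
authz_errors() {
    local log="${1:-/var/log/vmware/vpxd/vpxd.log}"
    # -F matches the messages as fixed strings, one -e per message
    grep -F -e "Failed to connect to Authz service" \
            -e "Failed to initialize authorizeManager" "$log"
}

# Only run automatically when the default log is readable (i.e. on a VCSA).
if [ -r /var/log/vmware/vpxd/vpxd.log ]; then
    authz_errors | tail -n 20
fi
```

Comparing the timestamps of these lines with the vPostgres start attempts may show whether vpxd is failing before or after the database comes up.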
Well, was able to recover. VMware sent a certificate tool (vCert) which identified some trust issues and registrations which the standard tools didn't address.
Then I found an issue with setting up logging within the tomcat instance. I commented out the "isAccessLogCreated" and "accessLogCleaner" beans from the Tomcat config.
I also had to manually rebuild the vPostgresql certificate store.
I restarted the services and got the core up and running. I got a good snapshot of the VCSA. I attempted a VCSA backup and it failed continuously. I decided to attempt an upgrade to repair the VCSA. It took about 2 hours, but the upgrade from 7.0.3f to 7.0.3g completed. I continued to walk the update path all the way to the latest 7.0.3 release.
I tested the VCSA backup and it ran successfully.
I tested the Tomcat by uncommenting the previously commented out beans. It ran successfully.
In summary, there was corruption at multiple points within the VCSA. With the help here and from VMware, I was able to recover it. Thank you all.
Hi,
looks like a certificate issue.
Have you checked all certificates with
for store in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list | grep -v TRUSTED_ROOT_CRLS); do echo "[*] Store :" $store; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $store --text | grep -ie "Alias" -ie "Not After";done;
Does lsdoctor say all good after the trustfix?
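The one-liner above only prints the dates, so expired entries still have to be spotted by eye. A rough sketch that marks them automatically follows. It assumes GNU date and that the --text output contains openssl-style "Not After : <date>" lines; flag_expiry is a made-up helper name.

```shell
#!/bin/bash
# Sketch: flag expired entries in `vecs-cli entry list --text` output.
# Assumes GNU date and "Not After : <date>" lines (both are assumptions).
flag_expiry() {
    local line when end now
    now=$(date -u +%s)
    while IFS= read -r line; do
        case "$line" in
            *Alias*)
                echo "$line"
                ;;
            *"Not After"*)
                # Keep only the date, e.g. "Dec 12 10:00:00 2022 GMT"
                when=$(printf '%s\n' "$line" | sed 's/.*Not After *: *//')
                if end=$(date -u -d "$when" +%s 2>/dev/null) && [ "$end" -gt "$now" ]; then
                    echo "  OK      $when"
                else
                    echo "  EXPIRED or unparsable: $when"
                fi
                ;;
        esac
    done
}

VECS=/usr/lib/vmware-vmafd/bin/vecs-cli
# Only query the stores when vecs-cli is present (i.e. on a VCSA).
if [ -x "$VECS" ]; then
    for store in $("$VECS" store list | grep -v TRUSTED_ROOT_CRLS); do
        echo "[*] Store: $store"
        "$VECS" entry list --store "$store" --text | flag_expiry
    done
fi
```

Entries in BACKUP_STORE will also be flagged if they are old, so the store name printed above each group still matters when reading the result.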
Upon running the suggested code, all certs are dated 2025 and beyond. There is a BACKUP_STORE cert for __MACHINE_CERT dated December 2022, but my understanding was that those are inactive.
LSDOCTOR shows all good.
Can you verify with this command that the hostname is correct and matches the certificate?
/usr/lib/vmware-vmafd/bin/vmafd-cli get-pnid --server-name localhost
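The comparison can also be scripted. The sketch below checks the PNID against the local FQDN without regard to case and then prints the Subject Alternative Name of the machine SSL certificate for a manual look. It assumes the usual MACHINE_SSL_CERT store with the __MACHINE_CERT alias and that openssl is available; same_name is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: compare two host names, ignoring case.
same_name() {
    local a b
    a=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    b=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    [ "$a" = "$b" ]
}

VMAFD=/usr/lib/vmware-vmafd/bin/vmafd-cli
VECS=/usr/lib/vmware-vmafd/bin/vecs-cli
# Only run the checks when the VMware tools are present (i.e. on a VCSA).
if [ -x "$VMAFD" ] && [ -x "$VECS" ]; then
    pnid=$("$VMAFD" get-pnid --server-name localhost)
    fqdn=$(hostname -f)
    if same_name "$pnid" "$fqdn"; then
        echo "PNID matches hostname: $pnid"
    else
        echo "Mismatch: PNID=$pnid hostname=$fqdn"
    fi
    # Print the SAN of the machine SSL certificate for comparison with the PNID.
    "$VECS" entry getcert --store MACHINE_SSL_CERT --alias __MACHINE_CERT \
        | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"
fi
```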
@hirschinho
Yes. It returns the FQDN of the VCSA.
Which build of vCenter are you running?
How did you reset the certificates?
/usr/lib/vmware-vmca/bin/certificate-manager
with option 8?
If not, please do it with option 8.
@hirschinho
VCSA - 7.0.3.01000
BUILD - 20395099
Yes, it was the Certificate Manager with option 8.
Got a new error:
VPXD - Failed to read X509 cert
Try this KB
I would reset the certificate to the default VMware certificate and then create a new CSR.
@hirschinho
First, thank you for all of the assistance.
I have executed that KB. The STS was in good standing, but I replaced it anyway.
@maksym007
All certificates are VMware self-signed certificates.
what about that option?
There is something with vPostgres and the certificates. When attempting to start vPostgres on its own, there is a long list of messages about trying to build the root_crl.pem file. It makes many requests to the auth service, but eventually fails.
Well, can get most of the services up, but the vSphere-UI just won't play nice.
Interesting. I wonder what is causing such issues.
Anyone know if we can just deploy a new vCenter and have it discover or reregister the existing cluster (vSAN, NSX, etc.)?
If not, I will have to plan a big "new deployment and migration".
1) Remove host from existing cluster
2) Clean host
3) Deploy vCenter to single host
4) Enable vSAN
5) Enable NSX
6) Begin migration of workloads (how without a working vCenter?)
7) Roll hosts between clusters
Hi @Dr_Virt
First, is this a stretched cluster with a witness host or a standard cluster? OSA or ESA?
vSAN can work without vCenter, so in my opinion it's not necessary to destroy everything.
The important thing is to install a new vCenter. Do you have a local datastore on one of your ESXi hosts, for example a boot device with about 200 GB of space? You can temporarily deploy a vCenter there.
Then follow this
Which NSX version do you use? The nodes must be redeployed.