VMware Cloud Community
Dr_Virt
Hot Shot

Recovering vCenter 7

Had a lab vCenter crash and am trying to figure out why.

Current symptoms:

* Starting vCenter from the shell with "service-control --start --all" fails because vPostgres cannot start.

* Starting vPostgres manually ("service-control --start vmware-vpostgres") and then starting vCenter ("service-control --start --all") fails because vpxd-svcs cannot start.

* I logged into the vCenter VCDB and verified the administrator account

* I reset the vCenter certificates and validated them with lsdoctor

* vpxd.log shows "Failed to connect to Authz service" and "Failed to initialize authorizeManager"
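
To pull just these two failures out of the log, a small helper along these lines may help. This is only a sketch: the function name vpxd_failures is made up for illustration, and the default path assumes the usual VCSA log location, /var/log/vmware/vpxd/vpxd.log.

```shell
# Hypothetical helper: print the lines of a vpxd log that match the two
# failure messages quoted in the symptoms above.
# The default path is an assumption; pass another file as $1 to override it.
vpxd_failures() {
    local log=${1:-/var/log/vmware/vpxd/vpxd.log}
    grep -E 'Failed to (connect to Authz service|initialize authorizeManager)' "$log"
}
```

Calling vpxd_failures with no argument reads the default log; appending "| tail -n 20" keeps only the most recent matches.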

Anyone seen something like this?

Accepted Solutions
Dr_Virt
Hot Shot

Well, was able to recover. VMware sent a certificate tool (vCert) which identified some trust issues and registrations which the standard tools didn't address. 

Then I found an issue with the logging setup within the Tomcat instance. I commented out the "isAccessLogCreated" and "accessLogCleaner" beans from the Tomcat config.

I also had to manually rebuild the vPostgres certificate store.

I restarted the services and got the core up and running. I took a good snapshot of the VCSA. I attempted a VCSA backup and it failed every time. I decided to attempt an upgrade to repair the VCSA. It took about 2 hours, but the upgrade from 7.0.3f to g completed. I continued to walk the update path all the way to the latest 7.0.3 release.

I tested the VCSA backup and it ran successfully. 

I tested Tomcat by uncommenting the previously commented-out beans. It ran successfully.

In summary, there was corruption at multiple points within the VCSA. With the help here and from VMware, I was able to recover it. Thank you all.

21 Replies
hirschinho
Contributor

Hi,

Looks like a certificate issue.

Have you checked all certificates with this loop?

for store in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list | grep -v TRUSTED_ROOT_CRLS); do
    echo "[*] Store : $store"
    /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store "$store" --text | grep -ie "Alias" -ie "Not After"
done
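
If the list is long, the output of the loop above can be piped through a small filter that flags expired entries. A minimal sketch, assuming GNU date and openssl-style dates such as "Dec 10 12:00:00 2022 GMT" on the "Not After" lines; check_cert_dates is a hypothetical helper, not part of any VMware tool.

```shell
# Hypothetical helper: read "Alias" / "Not After" lines on stdin and print
# "EXPIRED <alias>" or "OK <alias>" for each certificate.
# Assumes GNU date can parse the date text after the first colon.
check_cert_dates() {
    local now alias="" line when epoch
    now=${1:-$(date +%s)}   # optional $1: reference time in epoch seconds
    while IFS= read -r line; do
        case "$line" in
            *Alias*)
                alias=${line#*:}                          # text after the first colon
                alias=${alias#"${alias%%[![:space:]]*}"}  # trim leading whitespace
                ;;
            *"Not After"*)
                when=${line#*:}
                # Skip lines whose date cannot be parsed instead of guessing.
                epoch=$(date -d "$when" +%s 2>/dev/null) || continue
                if [ "$epoch" -lt "$now" ]; then
                    echo "EXPIRED $alias"
                else
                    echo "OK $alias"
                fi
                ;;
        esac
    done
}
```

Append "| check_cert_dates" after the final "done" of the loop to get one status line per certificate. The "[*] Store" lines are ignored by the filter.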

 

Does lsdoctor report all good after the trustfix?

Dr_Virt
Hot Shot

@hirschinho 

After running the suggested loop, all certs expire in 2025 or later. There is a BACKUP_STORE cert for __MACHINE_CERT dated December 2022, but my understanding is that those are inactive.

lsdoctor shows all good.

Dr_Virt
Hot Shot

Dr_Virt_0-1690206096460.png

 

hirschinho
Contributor

Can you verify with this command that the hostname matches the certificate?

/usr/lib/vmware-vmafd/bin/vmafd-cli get-pnid --server-name localhost

 

Dr_Virt
Hot Shot

@hirschinho 

Yes. It returns the FQDN of the VCSA.

hirschinho
Contributor

Which build of vCenter are you running?

How did you reset the certificates? With

/usr/lib/vmware-vmca/bin/certificate-manager

and option 8?

If not, please do it with option 8.

Dr_Virt
Hot Shot

@hirschinho 

VCSA - 7.0.3.01000
BUILD - 20395099


Yes, it was the Certificate Manager with option 8.

Dr_Virt
Hot Shot

Got a new error:

VPXD - Failed to read X509 cert

Dr_Virt_0-1690207531810.png

 

hirschinho
Contributor

maksym007
Expert

I would reset the certificate to the default VMware certificate and then create a new CSR.

Dr_Virt
Hot Shot

@hirschinho 

First, thank you for all of the assistance. 

I have executed that KB. The STS was in good standing, but I replaced it anyway.

Dr_Virt
Hot Shot

@maksym007 

All certificates are VMware self-signed certificates.

maksym007
Expert

What about this option?

https://kb.vmware.com/s/article/82332 

Dr_Virt
Hot Shot

@maksym007 

 

All certs are in good standing and the STS was replaced today.

Dr_Virt
Hot Shot

There is something with vPostgres and the certificates. When attempting to start vPostgres on its own, there is a long list of messages about trying to build the root_crl.pem file. It makes many requests to the auth service, but eventually fails.

Dr_Virt_0-1690229137985.png

 

 

Dr_Virt
Hot Shot

Well, I can get most of the services up, but the vSphere-UI just won't play nice.

Dr_Virt_0-1690233035215.png

 

maksym007
Expert

Interesting. I wonder what is causing these issues.

Dr_Virt
Hot Shot

Anyone know if we can just deploy a new vCenter and have it discover or reregister the existing cluster (vSAN, NSX, etc.)?

If not, I will have to plan a big "new deployment and migration". 

1) Remove host from existing cluster

2) Clean host

3) Deploy vCenter to single host

4) Enable vSAN

5) Enable NSX

6) Begin migration of workloads (how without a working vCenter?)

7) Roll hosts between clusters

hirschinho
Contributor

Hi @Dr_Virt 

First, is this a stretched cluster with a witness host or a standard cluster, and is it OSA or ESA?

vSAN can work without the vCenter, so in my opinion it's not necessary to destroy everything.

The important thing is to install a new vCenter. Do you have a local datastore on one of your ESXi hosts, for example a boot device with about 200 GB of space? You can temporarily deploy a vCenter there.

Then follow these steps:

  1. Create a cluster and enable vSAN on the new cluster
  2. Check vSAN health first (esxcli vsan cluster get,....)
  3. Move each host.
  4. Reapply storage policies to the VMs
  5. Re-enable the stretched cluster if necessary
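
For the health check in step 2, a quick pass/fail test on each host before moving it could look like the sketch below. It assumes the "esxcli vsan cluster get" output contains lines of the form "Local Node Health State: HEALTHY" and "Sub-Cluster Member Count: N"; verify those labels on your own hosts first. vsan_node_ok is a hypothetical helper name.

```shell
# Hypothetical helper: read `esxcli vsan cluster get` output on stdin and
# succeed (exit 0) only if the node reports HEALTHY and the member count
# equals the expected number of hosts given as $1.
# The two field labels matched below are assumptions about the output format.
vsan_node_ok() {
    expected=$1
    awk -F': *' -v want="$expected" '
        /Local Node Health State/  { health = $2 }
        /Sub-Cluster Member Count/ { count = $2 }
        END { exit !(health == "HEALTHY" && count == want) }
    '
}
```

On a host in a three-node cluster this would be used as: esxcli vsan cluster get | vsan_node_ok 3 && echo "safe to continue". It only checks cluster membership as seen by that one host, so it does not replace the full vSAN health checks.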

Which NSX version do you use? The nodes must be redeployed.
