This happened to me a few months ago on a fresh install of the vCenter appliance 6.5. It just stopped working a week or two after applying an update. Services would not start and there was no indication as to why. It wasn't a space issue, and it wasn't the duplicate-value issue in the vPostgres database that I had read about either. I finally gave up and wiped it out to redeploy from scratch.
Well, lo and behold, sometime last night vCenter stopped working again. This time it wasn't even a full week after having applied the 6.5.0c patch. Only two services start; none of the others will. My deployment is two appliances, a PSC and the vCenter. The PSC appears fine and its services are showing healthy. The vCenter turned to garbage again. Here's the output of service-control:
root@mp1vsivcs501 [ ~ ]# service-control --status
Running:
lwsmd vmafdd
Stopped:
applmgmt vmcam vmonapi vmware-cm vmware-content-library vmware-eam vmware-imagebuilder vmware-mbcs vmware-netdumper vmware-perfcharts vmware-rbd-watchdog vmware-rhttpproxy vmware-sca vmware-sps vmware-statsmonitor vmware-updatemgr vmware-vapi-endpoint vmware-vcha vmware-vmon vmware-vpostgres vmware-vpxd vmware-vpxd-svcs vmware-vsan-health vmware-vsm vsphere-client vsphere-ui
Trying to start any service produces a similar output:
root@mp1vsivcs501 [ ~ ]# service-control --start vmware-vpxd-svcs
Perform start operation. vmon_profile=None, svc_names=['vmware-vpxd-svcs'], include_coreossvcs=False, include_leafossvcs=False
2017-04-24T19:36:49.136Z Running command: ['/usr/bin/systemctl', 'set-environment', 'VMON_PROFILE=NONE']
2017-04-24T19:36:49.140Z Done running command
2017-04-24T19:36:49.143Z Running command: ['/usr/bin/systemctl', 'daemon-reload']
2017-04-24T19:36:49.222Z Done running command
2017-04-24T19:36:49.222Z Running command: ['/usr/bin/systemctl', 'set-property', u'vmware-vmon.service', 'MemoryAccounting=true', 'CPUAccounting=true', 'BlockIOAccounting=true']
2017-04-24T19:36:49.227Z Done running command
2017-04-24T19:36:49.231Z RC = 1
Stdout =
Stderr = Failed to execute operation: Unit file is masked
2017-04-24T19:36:49.231Z {
"resolution": null,
"detail": [
{
"args": [
"Stderr: Failed to execute operation: Unit file is masked\n"
],
"id": "install.ciscommon.command.errinvoke",
"localized": "An error occurred while invoking external command : 'Stderr: Failed to execute operation: Unit file is masked\n'",
"translatable": "An error occurred while invoking external command : '%(0)s'"
}
],
"componentKey": null,
"problemId": null
}
2017-04-24T19:36:49.231Z Running command: ['/usr/bin/systemctl', 'unset-environment', 'VMON_PROFILE']
2017-04-24T19:36:49.235Z Done running command
Error executing start on service vpxd-svcs. Details {
"resolution": null,
"detail": [
{
"args": [
"vmware-vmon"
],
"id": "install.ciscommon.service.failstart",
"localized": "An error occurred while starting service 'vmware-vmon'",
"translatable": "An error occurred while starting service '%(0)s'"
}
],
"componentKey": null,
"problemId": null
}
Service-control failed. Error {
"resolution": null,
"detail": [
{
"args": [
"vmware-vmon"
],
"id": "install.ciscommon.service.failstart",
"localized": "An error occurred while starting service 'vmware-vmon'",
"translatable": "An error occurred while starting service '%(0)s'"
}
],
"componentKey": null,
"problemId": null
}
The first thing that pops out for me is line 11, "Failed to execute operation: Unit file is masked". I get that on every service I attempt to start and I'm not finding anything in VMware's knowledge portal about it. This is extremely frustrating.
**Additional info**
Running a search on just "unit file is masked" took me to a generic Ubuntu thread about systemctl showing masked unit files. Here's the output of systemctl list-unit-files:
root@mp1vsivcs501 [ ~ ]# systemctl list-unit-files | grep vmware
vmware-bigsister.service static
vmware-cm.service masked
vmware-content-library.service masked
vmware-eam.service masked
vmware-firewall.service enabled
vmware-imagebuilder.service masked
vmware-mbcs.service masked
vmware-netdump.service masked
vmware-perfcharts.service masked
vmware-rbd-watchdog.service masked
vmware-rhttpproxy.service masked
vmware-sca.service masked
vmware-sps.service masked
vmware-statsmonitor.service masked
vmware-updatemgr.service masked
vmware-vapi.service masked
vmware-vcha.service masked
vmware-vmon.service masked
vmware-vmonapi.service masked
vmware-vpostgres.service masked
vmware-vpxd-svcs.service masked
vmware-vpxd.service masked
vmware-vsan-health.service masked
vmware-vsm.service masked
vmware-bigsister.timer disabled
Not sure if that's normal or not, but it appears to be what the error message is complaining about?
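For anyone comparing their own appliance against the listing above, here is a minimal sketch that pulls just the masked unit names out of the `systemctl list-unit-files` output. The function name `masked_units` is made up for illustration; it is not a VMware or systemd tool, and it only assumes the two-column "name state" layout shown above.

```shell
#!/bin/sh
# Sketch: print the names of masked units from `systemctl list-unit-files` output.
# masked_units is a hypothetical helper name, not part of VMware or systemd.
masked_units() {
    # Column 1 is the unit file name, column 2 is its state.
    awk '$2 == "masked" { print $1 }'
}

# Example on the appliance (not run here):
#   systemctl list-unit-files | masked_units
```

The output is one unit name per line, which makes it easy to diff between a working and a broken appliance.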
Message was edited by: jhboricua
I ran into the same issue as part of a very long (20+ hour) P1 call about my vpxd service crashing when a VM gets assigned an invalid VDS network port group. Unfortunately, the only resolution was to restore from backup or redeploy.
There's gotta be something else to this. I'm not running a VDS in my setup. It's all standard vSwitches.
We had this issue last week. After a 3-hour support call, the vCenter support engineer came up with the idea of looking around the forums. The fix is to unmask the vmon service (the unit is listed as vmware-vmon.service in the list-unit-files output above):
systemctl unmask vmware-vmon.service
Then reboot your appliance. This fixes the issue.
We still do not know why the vmon service got masked to begin with. Maybe it is some kind of race condition during shutdown, since the appliance startup and shutdown scripts do a lot of systemctl masking and unmasking.
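If more than one unit needs unmasking, a small wrapper with a dry-run default lets you preview what would be run first. This is only a sketch: `unmask_units` and the `DRY_RUN` variable are hypothetical names, and the unit name in the example is taken from the list-unit-files output earlier in the thread. The only real command it calls is `systemctl unmask`.

```shell
#!/bin/sh
# Sketch: unmask every unit named on stdin, one per line.
# unmask_units and DRY_RUN are hypothetical names used for illustration.
unmask_units() {
    while IFS= read -r unit; do
        [ -n "$unit" ] || continue              # skip blank lines
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: only show what would be done.
            echo "would run: systemctl unmask $unit"
        else
            systemctl unmask "$unit"
        fi
    done
}

# Example on the appliance (not run here):
#   echo vmware-vmon.service | DRY_RUN=0 unmask_units
```

Leaving `DRY_RUN` unset keeps it in preview mode, which seems the safer default on a production vCenter.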
Hi,
Warning 1. The following is a Linux solution to the problem and does not take into account any of the configurations or the reasons for the masking.
Warning 2. The ongoing failures seem to be caused by the system boot / shutdown process, so external issues may still be in play. Be careful; I suggest this only for lab testing.
Log in as root ...
enter
shell <cr>
cd /etc/systemd/system
<Please note this may just seem to be a directory, BUT there is a lot going on here: systemd reads its unit configuration from it>
ls -lisa
<here are the units I found masked>
root@localhost [ ~ ]# systemctl list-unit-files | grep masked
applmgmt.service masked
vmcam.service masked
vmware-cis-license.service masked
vmware-cm.service masked
vmware-content-library.service masked
vmware-eam.service masked
vmware-imagebuilder.service masked
vmware-mbcs.service masked
vmware-netdump.service masked
vmware-perfcharts.service masked
vmware-pschealth.service masked
vmware-rbd-watchdog.service masked
vmware-rhttpproxy.service masked
vmware-sca.service masked
vmware-sps.service masked
vmware-statsmonitor.service masked
vmware-updatemgr.service masked
vmware-vapi.service masked
vmware-vcha.service masked
vmware-vmonapi.service masked
vmware-vpostgres.service masked
vmware-vpxd-svcs.service masked
vmware-vpxd.service masked
vmware-vsan-health.service masked
vmware-vsm.service masked
vsphere-client.service masked
vsphere-ui.service masked
ctrl-alt-del.target masked
then look in the directory
root@localhost [ /etc/systemd/system ]# ls -lisa
total 108
451046 4 drwxr-xr-x 24 root root 4096 May 9 03:03 .
450562 4 drwxr-xr-x 7 root root 4096 May 9 01:35 ..
452681 0 lrwxrwxrwx 1 root root 9 May 8 08:11 applmgmt.service -> /dev/null
467464 4 drwxr-xr-x 2 root root 4096 May 8 08:19 applmgmt.service.d
451876 0 lrwxrwxrwx 1 root root 40 Oct 22 2016 default.target -> /usr/lib/systemd/system/runlevel3.target
451048 4 drwxr-xr-x 2 root root 4096 May 8 17:08 getty.target.wants
467484 4 drwxr-xr-x 2 root root 4096 May 8 08:19 halt.target.wants
451195 4 drwxr-xr-x 2 root root 4096 Oct 22 2016 local-fs.target.wants
467460 4 drwxr-xr-x 2 root root 4096 May 8 08:19 lwsmd.service.d
451050 4 drwxr-xr-x 2 root root 4096 May 8 09:03 multi-user.target.wants
451054 4 drwxr-xr-x 2 root root 4096 Oct 22 2016 network-online.target.wants
467486 4 drwxr-xr-x 2 root root 4096 May 8 08:19 poweroff.target.wants
467482 4 drwxr-xr-x 2 root root 4096 May 8 08:19 reboot.target.wants
451834 4 -rw-r--r-- 1 root root 268 Jun 7 2016 sendmail.service
467117 4 drwxr-xr-x 2 root root 4096 May 8 08:19 shutdown.target.wants
452083 4 -rw-r--r-- 1 root root 476 Aug 22 2016 snmpd.service
451056 4 drwxr-xr-x 2 root root 4096 Oct 22 2016 sockets.target.wants
451058 4 drwxr-xr-x 2 root root 4096 Oct 22 2016 sysinit.target.wants
451107 0 lrwxrwxrwx 1 root root 39 Oct 22 2016 syslog.service -> /usr/lib/systemd/system/rsyslog.service
452464 4 -r-xr-xr-x 1 root root 470 Jan 18 10:08 vcha-hacheck.service
452104 4 drwxr-xr-x 2 root root 4096 May 8 08:19 vmafdd.service.d
452121 4 drwxr-xr-x 2 root root 4096 May 8 08:19 vmcad.service.d
452752 0 lrwxrwxrwx 1 root root 9 May 8 08:17 vmcam.service -> /dev/null
467023 4 drwxr-xr-x 2 root root 4096 May 8 17:10 vmcam.service.d
452116 4 drwxr-xr-x 2 root root 4096 May 8 08:19 vmdird.service.d
452157 4 drwxr-xr-x 2 root root 4096 May 8 08:19 vmdnsd.service.d
451129 4 drwxr-xr-x 2 root root 4096 Oct 22 2016 vmtoolsd.service.requires
452654 0 lrwxrwxrwx 1 root root 9 May 8 08:10 vmware-cis-license.service -> /dev/null
452651 0 lrwxrwxrwx 1 root root 9 May 8 08:09 vmware-cm.service -> /dev/null
452726 0 lrwxrwxrwx 1 root root 9 May 8 08:14 vmware-content-library.service -> /dev/null
452734 0 lrwxrwxrwx 1 root root 9 May 8 08:16 vmware-eam.service -> /dev/null
452761 0 lrwxrwxrwx 1 root root 9 May 8 08:18 vmware-imagebuilder.service -> /dev/null
452707 0 lrwxrwxrwx 1 root root 9 May 8 08:12 vmware-mbcs.service -> /dev/null
452684 0 lrwxrwxrwx 1 root root 9 May 8 08:11 vmware-netdump.service -> /dev/null
452763 0 lrwxrwxrwx 1 root root 9 May 8 08:18 vmware-perfcharts.service -> /dev/null
467114 4 drwxr-xr-x 2 root root 4096 May 8 08:19 vmware-psc-client.service.d
452247 0 lrwxrwxrwx 1 root root 9 May 8 08:11 vmware-pschealth.service -> /dev/null
452745 0 lrwxrwxrwx 1 root root 9 May 8 08:16 vmware-rbd-watchdog.service -> /dev/null
452646 0 lrwxrwxrwx 1 root root 9 May 8 08:09 vmware-rhttpproxy.service -> /dev/null
452664 0 lrwxrwxrwx 1 root root 9 May 8 08:10 vmware-sca.service -> /dev/null
452435 0 lrwxrwxrwx 1 root root 9 May 8 08:16 vmware-sps.service -> /dev/null
452690 0 lrwxrwxrwx 1 root root 9 May 8 08:11 vmware-statsmonitor.service -> /dev/null
467472 4 drwxr-xr-x 2 root root 4096 May 8 08:19 vmware-stsd.service.d
467476 4 drwxr-xr-x 2 root root 4096 May 8 08:19 vmware-sts-idmd.service.d
452749 0 lrwxrwxrwx 1 root root 9 May 8 08:17 vmware-updatemgr.service -> /dev/null
452667 0 lrwxrwxrwx 1 root root 9 May 8 08:10 vmware-vapi.service -> /dev/null
452751 0 lrwxrwxrwx 1 root root 9 May 8 08:17 vmware-vcha.service -> /dev/null
467468 4 drwxr-xr-x 2 root root 4096 May 8 08:19 vmware-vmon.service.d
452692 0 lrwxrwxrwx 1 root root 9 May 8 08:11 vmware-vpostgres.service -> /dev/null
452718 0 lrwxrwxrwx 1 root root 9 May 8 08:12 vmware-vpxd.service -> /dev/null
452704 0 lrwxrwxrwx 1 root root 9 May 8 08:11 vmware-vpxd-svcs.service -> /dev/null
452483 0 lrwxrwxrwx 1 root root 9 May 8 17:11 vmware-vsan-health.service -> /dev/null
452758 0 lrwxrwxrwx 1 root root 9 May 8 08:18 vmware-vsm.service -> /dev/null
<This will display all the files and, more importantly, the links in the directory. We need to remove all the links to /dev/null>
<I have removed them all, but it may be the case that only some of them should be removed. Remember this is a systemd control area>
<and there is usually a good reason to stop root from doing things. The masking is a protective process, a bit like a database holding a process until it finishes a write>
<this command will remove the vmware* links and leave the directories alone, since rm without -r refuses to delete directories and prints an error for each one>
rm vmware*
<I also did the other files that were linked to /dev/null ...>
then reboot
reboot <cr>
Hope that helps. I am not a VMware specialist; my knowledge is Linux, and this is a systems solution that may not fix an underlying issue.
regards
Jeremy
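A slightly more selective take on the `rm vmware*` step above, as a sketch only: it removes just the symlinks that point at /dev/null and match a name pattern, and prints each one so there is a record for putting the masks back. `remove_mask_links` is a hypothetical helper name. Restricting by name also leaves alone unrelated masks such as the ctrl-alt-del.target entry in the list above, which may well be masked on purpose.

```shell
#!/bin/sh
# Sketch: remove only symlinks to /dev/null whose names match a pattern,
# printing each removed path so the masks can be recreated later.
# remove_mask_links is a hypothetical helper name.
remove_mask_links() {
    dir=$1
    pattern=${2:-'vmware*'}
    # -maxdepth 1 stays out of the *.service.d and *.wants directories;
    # -lname matches on the link target, so regular files and
    # directories are never touched.
    find "$dir" -maxdepth 1 -name "$pattern" -lname /dev/null -print -exec rm -- {} \;
}

# Example on the appliance (not run here; the output file is an arbitrary choice):
#   remove_mask_links /etc/systemd/system 'vmware*' > /root/removed-masks.txt
#   systemctl daemon-reload
```

A removed mask can be restored with `ln -s /dev/null <path>` for each path in the saved list. Units outside the `vmware*` pattern (applmgmt, vmcam, vsphere-client, vsphere-ui in the list above) would need a second call with their own pattern.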
Still a bug in the December patch. Here are some easier steps to unmask all services.
# List all masked units (symlinks to /dev/null) for removal.
find /etc/systemd/system/ -lname '/dev/null' -exec ls {} \;
# Automatically remove them (or rm each file)
find /etc/systemd/system/ -lname '/dev/null' -exec rm {} \;
# Reload systemctl daemon
systemctl daemon-reload
# Start services or Reboot
service-control --start --all
Those 4 lines did the trick for me, JacobDEvans. Thanks for the post.
Hi all,
this issue isn't fixed with the latest VCSA 6.5U1g update.
Information from VMware Support:
This issue is tracked and will be fixed with vCenter 6.5 UPDATE2.
For now the workaround described by JacobDEvans is supported.
Thanks and BR/JO!
Just wanted to update on status of this. Still having the same problem with 6.5 U2b. Solution still works.
Thanks for the solution. It was driving me crazy. I would still like to know the cause.
So what caused it for me was snapshotting the vCenter server while it was powered on. After restoring from that snapshot, it had this issue.
This happened to my VCSA 6.7 appliance after I cloned it to migrate to a new cluster, and started it up. So thankful to find this post!
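Since the last two reports point at snapshot restores and clones, a quick check right after reverting or cloning might catch this before anyone notices vCenter is down. The following is a sketch under that assumption: `check_mask_links` is a hypothetical helper name, and the default directory is the one used in the steps earlier in the thread. Whether some of these masks are expected in normal operation is not settled in this thread, so treat a hit as a prompt to look closer rather than proof of a fault.

```shell
#!/bin/sh
# Sketch: report leftover mask links; return status 1 if any are found.
# check_mask_links is a hypothetical helper name.
check_mask_links() {
    dir=${1:-/etc/systemd/system}
    # A masked unit shows up here as a symlink pointing at /dev/null.
    found=$(find "$dir" -maxdepth 1 -lname /dev/null)
    if [ -n "$found" ]; then
        printf 'masked unit links in %s:\n%s\n' "$dir" "$found"
        return 1
    fi
    echo "no mask links in $dir"
}

# Example on the appliance (not run here):
#   check_mask_links || echo "review the links above before starting services"
```

The non-zero return status makes it usable from a post-restore checklist or a monitoring script.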