I tried to stop our VIO deployment in vCenter because we had to do hardware maintenance. The stop task took forever, so I decided to shut down the VMs manually.
After that I had to stop and restart the deployment's services with these commands:
sudo viocli services stop
sudo viocli services start
The deployment works now, but in vCenter the status still shows "error".
If I check on the console the deployment seems ok:
# viocli deployment status
Collector Name                      Overall Status
----------------------------------  ----------------
VerifyTimeSynchronization           SUCCESS
VerifyConnection                    SUCCESS
VerifyMariaDatabaseClusterSize      SUCCESS
VerifyDatabaseConnectionPerProcess  SUCCESS
VerifyRunningProcess                SUCCESS
But if I try to stop it, I get:
# viocli deployment stop
Deployment with name: VIO has a task in progress, waiting for it to finish...
and it doesn't seem that waiting helps. How can I find this running task and fix this?
You can look in the /var/log/jarvis directory. Find out which file is still being updated and take a look at its last 100 or so lines.
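The check suggested above can be sketched in Python as follows. This is only a minimal illustration, not part of viocli: the helper names `newest_log` and `tail` are made up for this example, and it assumes the logs are plain text files that the current user can read.

```python
import os
from collections import deque


def newest_log(log_dir):
    """Return the most recently modified regular file in log_dir, or None."""
    paths = [os.path.join(log_dir, name) for name in os.listdir(log_dir)]
    files = [p for p in paths if os.path.isfile(p)]
    return max(files, key=os.path.getmtime) if files else None


def tail(path, n=100):
    """Return the last n lines of a text file as a list of strings."""
    with open(path, errors="replace") as fh:
        # A bounded deque keeps only the final n lines while reading.
        return list(deque(fh, maxlen=n))


if __name__ == "__main__":
    log_dir = "/var/log/jarvis"  # directory named in the thread
    if os.path.isdir(log_dir):
        latest = newest_log(log_dir)
        if latest is not None:
            print(latest)
            print("".join(tail(latest)), end="")
```

If the newest modification time is hours or days old, as in the listing that follows, no file is being written to any more and the blocking task is probably not producing log output.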
thank you for your answer.
The files in /var/log/jarvis don't seem to be updated anymore:
-rw-r--r-- 1 jarvis adm 3975869 May 10 14:50 ansible.log
-rw-r--r-- 1 jarvis adm 5613701 May 15 10:23 jarvis.log
-rw-r--r-- 1 jarvis adm 2283852 May 10 14:50 pecan.log
The error in jarvis.log doesn't help me:
2017-05-11 09:10:44,368 INFO [jarvis.ans.util][MainThread] Using Customization file /opt/vmware/vio/custom/custom.yml
2017-05-11 09:10:44,368 INFO [jarvis.ans.util][MainThread] No customization file params were specified.
2017-05-11 09:10:44,508 ERROR [wsme.api][MainThread] Server-side error: "need more than 1 value to unpack". Detail:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/wsmeext/pecan.py", line 82, in callfunction
    result = f(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/jarvis/api/controllers/v1.py", line 170, in status
    period=period)
  File "/usr/local/lib/python2.7/dist-packages/jarvis/ans/manager.py", line 967, in report_viomon_status
    period=period)
  File "/usr/lib/python2.7/dist-packages/viomon/util/log.py", line 190, in report_viomon_status
    collector_map[collector_name].processor.send(message)
  File "/usr/lib/python2.7/dist-packages/viomon/module/base.py", line 86, in process_message
    self._further_process_message(message)
  File "/usr/lib/python2.7/dist-packages/viomon/module/database.py", line 124, in _further_process_message
    process_name, count = message.value.split(':')
ValueError: need more than 1 value to unpack
2017-05-11 09:10:44,513 ERROR [pecan.commands.serve][MainThread] "GET /deployment/VIO/status?period=300 HTTP/1.1" 500 1029
2017-05-11 09:12:14,171 INFO [jarvis.api.controllers.v1][MainThread] Retrieve status of deployment VIO with period: 300 second(s).
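For readers puzzled by the ValueError in the traceback: the last frame unpacks the result of `split(':')` into two names, which only works when the value contains exactly one colon. The sketch below reproduces that failure and shows a guarded alternative. It is not the viomon code; the function `parse_process_count` and the sample values `nova-api:4` and `nova-api` are assumptions for illustration, since the real message format is not shown in the thread.

```python
def parse_process_count(value):
    """Split a 'name:count' string into (name, count), or return None if malformed.

    Hypothetical helper for illustration only; not part of viomon.
    """
    name, sep, count = value.partition(":")
    if not sep or not count.strip().isdigit():
        return None
    return name, int(count)


# The unguarded form from the traceback raises when the colon is missing.
try:
    process_name, count = "nova-api".split(":")  # assumed sample value
except ValueError as exc:
    # Python 2.7 words this as "need more than 1 value to unpack".
    print("unpack failed:", exc)

print(parse_process_count("nova-api:4"))
print(parse_process_count("nova-api"))
```

So the status request probably failed because a monitoring message carried a value without the expected `name:count` shape, which would explain the HTTP 500 on the status call but does not by itself identify the task that is stuck.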
I have now also tried "viocli deployment configure", and it runs without an error.
This procedure also didn't update the logs in /var/log/jarvis.
Do you have other suggestions?
Thank you.
Can you confirm that the postgres and OMS services are running on your setup:
service vpostgres status
service oms status
Yes, I checked this on the management server:
# service vpostgres status
vpostgres start/running, process 1870
# service oms status
oms start/running, process 12844
I'm still stuck on this.
Is it possible to see the running process that blocks my deployment?
~# viocli deployment stop
Deployment with name: VIO has a task in progress, waiting for it to finish...
Or do I have to reinstall the deployment and restore from a backup?
I have now tried setting the uncompleted task in the management DB to 'COMPLETED'. Now I get a different message:
root@ids-ost-1:/home/viouser# viocli deployment start
Deployment: VIO is not in STOPPED state.
Cannot start the deployment.
root@ids-ost-1:/home/viouser# viocli deployment stop
Deployment: VIO is not in RUNNING state.
Cannot stop the deployment.
root@ids-ost-1:/home/viouser# viocli deployment status
Collector Name                      Overall Status
----------------------------------  ----------------
VerifyTimeSynchronization           SUCCESS
VerifyConnection                    SUCCESS
VerifyMariaDatabaseClusterSize      SUCCESS
VerifyDatabaseConnectionPerProcess  SUCCESS
VerifyRunningProcess                SUCCESS
I couldn't find the location where the state is saved. Can anybody tell me where it is, so that I can set it to 'RUNNING'?
Run these two commands to get the status to reset.
viocli show
# Get the VM name for controller02
viocli recover -n <VMname-Controller-1>
When a node is rebuilt, the cluster status is reset as part of the process. Because controllers are stateless, they are easier to recover: you do not need a backup to import, as you would for a database node.
Thank you, that solved the issue.
You're welcome.