VMware Cloud Community
maslow
Enthusiast
Enthusiast

Problems with payload exchange between vRA and vRO after crash

Hi community,

I am experiencing the following problem since a ESXi host crash that contained vRA/vRo appliance and the IaaS host. I set up a PoC for a company and all was running fine so far. Then during adding new subscriptions for some vRO workflows a host crashed and my vRA appliance with the integrated vRO andalso my IaaS machine crashed. After booting the VMs again once the host was up again all seemed to be fine.

But my workflows do not work correct any longer, allthough they are finished with a green arrow. I cant update the properties that I receive via payload from vRA, it just dont work. I then wrote a function to output all properties when I start a vRO workflow. I noticed that my updated or added properties from vRO are added some kind of wrong:

[2017-11-22 10:33:02.470] [I]

payload

   payload> IaaSTimeoutId : 12044

   payload> blueprintName : Windows Server 2012 R2 tests

   payload> componentId : vSphere__vCenter__Machine_1

   payload> componentTypeId : Infrastructure.CatalogItem.Machine.Virtual.vSphere

   payload> endpointId : 6b4a0c64-b3a6-47ed-918f-5ec69bd0ec5c

   payload> requestId : 15907846-a1d1-4c15-ac06-f46ef57d63c5

   payload> workflow.execution.endDate : Wed Nov 22 2017 10:32:57 GMT+0100 (CET)

   payload> workflow.execution.id : d8628f54-3723-4ffc-bbc1-b249ea492dde

   payload> workflow.execution.startDate : Wed Nov 22 2017 10:32:49 GMT+0100 (CET)

   payload> workflow.execution.state : COMPLETED

payload>lifecycleState

   payload>lifecycleState> phase : PRE

   payload>lifecycleState> state : VMPSMasterWorkflow32.Requested

payload>machine

   payload>machine> id : 918bc93d-890b-483a-94c2-ae6a1814d98a

   payload>machine> name : VCTVRAPoCBu0072

   payload>machine> owner : xxxxx

   payload>machine> type : 0

payload>machine>properties

   payload>machine>properties> Cafe.Shim.VirtualMachine.TotalStorageSize : 80

   payload>machine>properties> EWOE.ADcomputer.created : false

   payload>machine>properties> EWOE.scripts.toExecute : service;StoreAdminPWinPWVault.ps1,local;mkdir1.ps1,service;logUser.ps1,sccm;logUser.ps1

   payload>machine>properties> EWOE.scripts.toExecuteIfSuccessful : service;"c:\tmp\mkdir2.ps1",service;"c:\tmp\logUser.ps1",service;"c:\tmp\enableLUA.ps1"

   payload>machine>properties> EWOE.vm.name : poc-ntnx-vm04

   payload>machine>properties> Extensibility.Lifecycle.Properties.VMPSMasterWorkflow32.BuildingMachine : *

   payload>machine>properties> Extensibility.Lifecycle.Properties.VMPSMasterWorkflow32.DeactivateMachine : *

   payload>machine>properties> Extensibility.Lifecycle.Properties.VMPSMasterWorkflow32.Disposing : *

   payload>machine>properties> Extensibility.Lifecycle.Properties.VMPSMasterWorkflow32.MachineActivated : *

   payload>machine>properties> Extensibility.Lifecycle.Properties.VMPSMasterWorkflow32.MachineProvisioned : *

   payload>machine>properties> Extensibility.Lifecycle.Properties.VMPSMasterWorkflow32.RegisterMachine : *

   payload>machine>properties> Extensibility.Lifecycle.Properties.VMPSMasterWorkflow32.Requested : *

   payload>machine>properties> Extensibility.Lifecycle.Properties.VMPSMasterWorkflow32.UnprovisionMachine : *

   payload>machine>properties> Extensibility.Lifecycle.Properties.VMPSMasterWorkflow32.WaitingToBuild : *

   payload>machine>properties> VirtualMachine.Admin.TotalDiskUsage : 81920

   payload>machine>properties> VirtualMachine.CPU.Count : 1

   payload>machine>properties> VirtualMachine.Cafe.Blueprint.Component.Cluster.Index : 0

   payload>machine>properties> VirtualMachine.Cafe.Blueprint.Component.Id : vSphere__vCenter__Machine_1

   payload>machine>properties> VirtualMachine.Cafe.Blueprint.Component.TypeId : Infrastructure.CatalogItem.Machine.Virtual.vSphere

   payload>machine>properties> VirtualMachine.Cafe.Blueprint.Id : WindowsServer2012R2tests

   payload>machine>properties> VirtualMachine.Cafe.Blueprint.Name : Windows Server 2012 R2 tests

   payload>machine>properties> VirtualMachine.Disk0.IsClone : true

   payload>machine>properties> VirtualMachine.Disk0.Label : Hard disk 1

   payload>machine>properties> VirtualMachine.Disk0.Size : 80

   payload>machine>properties> VirtualMachine.Disk0.Storage : vRApoc

   payload>machine>properties> VirtualMachine.Memory.Size : 4096

   payload>machine>properties> VirtualMachine.Network0.Name : VM Network

   payload>machine>properties> VirtualMachine.Storage.Name : vRApoc

   payload>machine>properties> __Cafe.Request.BlueprintType : 1

   payload>machine>properties> __Cafe.Root.Request.Id : 5d52d60a-99ec-4631-8111-225da4a4caac

   payload>machine>properties> __Clone_Type : CloneWorkflow

   payload>machine>properties> __InterfaceType : vSphere

   payload>machine>properties> __Legacy.Workflow.ImpersonatingUser :

   payload>machine>properties> __Legacy.Workflow.User : xxxx

   payload>machine>properties> __VirtualMachine.Allocation.InitialMachineState : SubmittingRequest

   payload>machine>properties> __VirtualMachine.ProvisioningWorkflowName : CloneWorkflow

   payload>machine>properties> __api.request.callback.service.id : 54316982-fdb1-4c6d-b522-263d6ecbe004

   payload>machine>properties> __api.request.id : 15907846-a1d1-4c15-ac06-f46ef57d63c5

   payload>machine>properties> __clonefrom : Woerth_Template_Okt

   payload>machine>properties> __clonefromid : 421ca1dc-930f-46e3-8d32-98f1e78faff5

   payload>machine>properties> __clonespec : vRA PoC Windows 2012 R2

   payload>machine>properties> __iaas_request_binding_id : 240d9f96-de1e-4aaa-9aa3-d2a4c831df92

   payload>machine>properties> __request_reason :

   payload>machine>properties> __reservationPolicyID : bf056e7c-bf5c-47c8-a5ee-26ed2cc1ea3e

   payload>machine>properties> __trace_id : 1GfSUeZk

   payload>machine>properties> _number_of_instances : 1

   payload>machine>properties> errorMessage :

   payload>machine>properties> failedWorkflow :

   payload>machine>properties> isFailed : false

payload>virtualMachineAddOrUpdateProperties

   payload>virtualMachineAddOrUpdateProperties> VirtualMachine.Network0.Address : xx.8.255.76

   payload>virtualMachineAddOrUpdateProperties> VirtualMachine.Network0.Gateway : xx.8.255.1

   payload>virtualMachineAddOrUpdateProperties> VirtualMachine.Network0.PrimaryDns : xx.8.2.254

   payload>virtualMachineAddOrUpdateProperties> VirtualMachine.Network0.SecondaryDns : xx8.60.65

   payload>virtualMachineAddOrUpdateProperties> VirtualMachine.Network0.SubnetMask : 255.255.255.0

[2017-11-22 10:33:02.712] [I] AD Computer poc-ntnx-vm04 already exists in AD

[2017-11-22 10:33:02.714] [E] Error in (Workflow:vRA PoC Create AD computer object in OU / objectAlreadyExistsInAD (item6)#1) AD Computer poc-ntnx-vm04 already exists in AD

As you can see at the bottom, the IP data is not added to the path payload>machine>properties> VirtualMachine. but to payload>virtualMachineAddOrUpdateProperties.

I changed nothing within the workflows, prior to the crash all workflows worked and now they dont hand back the properties correct 😕

I cant see errors in the logs, so I dont know where to look.

I already tried to fix it with deleting/adding subscriptions, endpoints and blueprints. That doesnt helped me. Then I wrote new workflows to test the behaviour and it is still the same.

Any ideas?

0 Kudos
6 Replies
vklinden
VMware Employee
VMware Employee

I ran into the same issue today. We had a power outtage in a lab environment, and it appears that the payload data isn't be passed back from vRO to vRA. 
0 Kudos
maslow
Enthusiast
Enthusiast

Could you figure out a solution for this issue? I still experience this issue in our environment and the VMware support is really Slow. They are still analyzing the logs... for over one week now.

0 Kudos
vklinden
VMware Employee
VMware Employee

My issue turned out to be something else. it looks like from your log the hostname already exists in AD when it tries to create it.

0 Kudos
maslow
Enthusiast
Enthusiast

Yeah I know that but the error I have is above the AD error message regarding network Settings in the given example, I just ran a workflow again, that is why the AD error appears.

As I wrote the edited payload Data was Not written correctly. I have that with a lot of workflows.

0 Kudos
maslow
Enthusiast
Enthusiast

So if you have found a fix for your problem, please share it Smiley Happy Maybe it helps other ppl or even myself Smiley Wink

0 Kudos
maslow
Enthusiast
Enthusiast

THe VMware support found out that the vRA node has inconsistencies in the database:


catalina:
20-Nov-2017 08:56:06.516 ERROR [tomcat-http--26] org.hibernate.engine.jdbc.spi.SqlExceptionHelper.logExceptions Batch entry 0 insert into ApprovalPhaseType (allow_updates, description, approval_form, details_fo
rm, name, phaseOrder, id) values ('1', '{com.vmware.csp.core.cafe.catalog@request.approval.phase.pre.desc}', NULL, NULL, '{com.vmware.csp.core.cafe.catalog@request.approval.phase.pre.name}', 0, 'com.vmware.cafe
.catalog.request.pre') was aborted: ERROR: duplicate key value violates unique constraint "approvalphasetype_pkey"
Detail: Key (id)=(com.vmware.cafe.catalog.request.pre) already exists.  Call getNextException to see other errors in the batch.

20-Nov-2017 08:56:06.517 WARN [tomcat-http--26] org.hibernate.engine.jdbc.spi.SqlExceptionHelper.logExceptions SQL Error: 0, SQLState: 23505
20-Nov-2017 08:56:06.517 ERROR [tomcat-http--26] org.hibernate.engine.jdbc.spi.SqlExceptionHelper.logExceptions ERROR: duplicate key value violates unique constraint "approvalphasetype_pkey"
Detail: Key (id)=(com.vmware.cafe.catalog.request.pre) already exists.

20-Nov-2017 08:56:06.518 ERROR [tomcat-http--26] org.hibernate.engine.jdbc.batch.internal.BatchingBatch.performExecution HHH000315: Exception executing batch [could not execute batch]

20-Nov-2017 08:56:10.327 INFO [tomcat-http--30] org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl.release HHH000010: On release of batch it still contained JDBC statements
20-Nov-2017 08:56:10.327 WARN [tomcat-http--30] org.hibernate.engine.jdbc.spi.SqlExceptionHelper.logExceptions SQL Error: 0, SQLState: 23505

20-Nov-2017 08:56:10.328 ERROR [tomcat-http--30] org.hibernate.engine.jdbc.spi.SqlExceptionHelper.logExceptions Batch entry 0 insert into ApprovalPhaseType (allow_updates, description, approval_form, details_fo
rm, name, phaseOrder, id) values ('1', '{com.vmware.csp.core.cafe.catalog@request.approval.phase.pre.desc}', NULL, NULL, '{com.vmware.csp.core.cafe.catalog@request.approval.phase.pre.name}', 0, 'com.vmware.cafe
.catalog.request.pre') was aborted: ERROR: duplicate key value violates unique constraint "approvalphasetype_pkey"
Detail: Key (id)=(com.vmware.cafe.catalog.request.pre) already exists.  Call getNextException to see other errors in the batch.

20-Nov-2017 08:56:10.328 WARN [tomcat-http--30] org.hibernate.engine.jdbc.spi.SqlExceptionHelper.logExceptions SQL Error: 0, SQLState: 23505
20-Nov-2017 08:56:10.328 ERROR [tomcat-http--30] org.hibernate.engine.jdbc.spi.SqlExceptionHelper.logExceptions ERROR: duplicate key value violates unique constraint "approvalphasetype_pkey"
Detail: Key (id)=(com.vmware.cafe.catalog.request.pre) already exists.

controlcenter.log
SEVERE [syncScheduler-1] org.postgresql.Driver.connect Connection error: 
org.postgresql.util.PSQLException: Connection to 127.0.0.1:5433 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections. at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:265)

They stated they can try to solve the issues, but they might fail and problems cant be solved to 100%. Suggestion is to redeploy everything and configure vRA node from scratch without importing. That is bad cause we would have to set up everything again as we are also using the integrated vRO node 😕

0 Kudos