I am trying to enable Workload Management on vSphere 7.0 with NSX-T 3.0, unfortunately without success.
The constant factor is a large number of Unauthorized errors from NSX in wcpsvc.log.
I am logged into vCenter with the local Administrator@vsphere.local account, vCenter has been added to NSX as a Compute Manager, and trust has been enabled. vCenter, NSX, the Edge and the ESXi hosts are all configured in the same VLAN.
Does anyone have an idea? See the log excerpt below.
2020-11-15T12:55:04.203Z error wcp [opID=5fb24d4d] Error occurred sending Principal Identity request to NSX: principal identity already created
2020-11-15T12:55:04.203Z error wcp [opID=5fb24d4d] Failed to create PI in NSX managers. Err: principal identity already created
2020-11-15T12:55:04.203Z debug wcp [opID=5fb24d4d] WCP cluster principal identity (for cluster domain-c1006, service account wcp-cluster-user-domain-c1006-07eef2dd-e023-4707-a61e-f393cdd86090) already created
2020-11-15T12:55:10.658Z debug wcp [opID=5fb24d4d-domain-c1006] Cluster Network Provider is NSXT Container Plugin. Performing additional NCP-specific configuration.
2020-11-15T12:55:10.659Z error wcp [opID=5fb24d4d-domain-c1006] Error checking if NSX resources exist. Err: Unauthorized
2020-11-15T12:55:10.659Z error wcp [opID=5fb24d4d-domain-c1006] Error checking if NSX resources exist for VMs: [vm-4041]. Err: Unauthorized
2020-11-15T12:55:10.659Z error wcp [opID=5fb24d4d-domain-c1006] Error creating NSX resources. Err: Unauthorized
2020-11-15T12:55:10.659Z error wcp [opID=5fb24d4d-domain-c1006] Failed to create cluster network interface for MasterNode: VirtualMachine:vm-4041. Err: Unauthorized
2020-11-15T12:55:10.659Z error wcp [opID=5fb24d4d-domain-c1006] Error configuring cluster NIC on master VM vm-4041: Unauthorized
2020-11-15T12:55:10.659Z error wcp [opID=5fb24d4d-domain-c1006] Error configuring API server on cluster domain-c1006 Error configuring cluster NIC on master VM. This operation is part of API server configuration and will be retried.
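Every line in the excerpt above carries the same operation ID (opID=5fb24d4d, suffixed with -domain-c1006 for the per-cluster steps), so pulling all lines for that ID out of the current and rotated logs shows the whole sequence for one attempt. A minimal sketch; `wcp_op_trace` is a hypothetical helper name, and it simply matches on the opID prefix:

```shell
# wcp_op_trace (hypothetical helper name): print every log line on stdin whose
# opID starts with the given prefix. "5fb24d4d" matches both [opID=5fb24d4d]
# and [opID=5fb24d4d-domain-c1006].
wcp_op_trace() {
  grep -F -- "[opID=$1"
}
```

On vCenter this could be fed with `zcat -f /var/log/vmware/wcp/wcpsvc* | wcp_op_trace 5fb24d4d`; `zcat -f` decompresses the rotated .gz files and passes the plain wcpsvc.log through unchanged.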
Did you enable trust on the Compute Manager in NSX-T?
In NSX-T --> System --> Fabric --> Compute Managers --> <Your vCenter config> --> Enable Trust
Yes, Trust has been enabled.
It looks like there are authorization problems from vCenter to NSX but not the other way around.
Configure operation for the Master node VM with identifier vm-7008 failed.
2020-11-15T19:09:15.671Z debug wcp [opID=EAMAgent] Ignore non WCP agency vCLS
2020-11-15T19:09:15.671Z debug wcp informer.processLoop() lister.List() returned
2020-11-15T19:09:18.02Z error wcp [opID=vapi] Security Context missing in the request
2020-11-15T19:09:18.02Z debug wcp [opID=vapi] SecurityContext not passed in the request. Creating an empty security context
2020-11-15T19:09:18.02Z debug wcp [opID=vapi] opId was not present for the request
2020-11-15T19:09:18.02Z debug wcp [opID=vapi] Handling new request with input {"STRUCTURE":{"operation-input":{}}}
2020-11-15T19:09:18.02Z debug wcp [opID=vapi] Service specific authorization scheme for com.vmware.vapi.std.introspection.service not found.
2020-11-15T19:09:18.02Z debug wcp [opID=vapi] Service specific authorization scheme for com.vmware.vapi.std.introspection.service not found.
2020-11-15T19:09:18.02Z debug wcp [opID=vapi] Could not find package specific auth scheme for com.vmware.vapi.std.introspection.service
2020-11-15T19:09:18.02Z debug wcp [opID=vapi] Authn scheme Id is not provided but NO AUTH is allowed hence invoking the operation
2020-11-15T19:09:18.02Z error wcp [opID=vapi] SecurityCtx doesn't have property AUTHN_IDENTITY
2020-11-15T19:09:18.02Z error wcp [opID=vapi] Invalid authentication result
2020-11-15T19:09:18.021Z debug wcp [opID=vapi] Skipping authorization checks, because there is no authentication data for: com.vmware.vapi.std.introspection.service.list
I'm having this issue as well. I think the NSX account check is a red herring, as the principal identity doesn't appear to be created on the manager side. I'm going to reset the root password on vCenter (unrelated), rebuild trust with NSX-T, and then retry:
2020-12-12T19:46:29.839Z debug wcp [opID=5fd5201f] NSX HTTP Request is: &{Method:POST URL:https://10.66.0.204:443/api/v1/trust-management/token-principal-identities/ Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[] Body:{Reader:{"description":"Principal Identity for WCP service","display_name":"wcp-1b34a1bc-6fec-4ca6-8728-984065fb69c7","id":"wcp-1b34a1bc-6fec-4ca6-8728-984065fb69c7","name":"wcp-1b34a1bc-6fec-4ca6-8728-984065fb69c7","node_id":"wcp-1b34a1bc-6fec-4ca6-8728-984065fb69c7"}} GetBody:0x843110 ContentLength:261 TransferEncoding:[] Close:false Host:10.66.0.204:443 Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr: RequestURI: TLS:<nil> Cancel:<nil> Response:<nil> ctx:0xc0000ce000}
2020-12-12T19:46:30Z debug wcp [opID=5fd5201f] NSX HTTP Response is: &{409 409 HTTP/1.1 1 1 map[Cache-Control:[no-cache, no-store, max-age=0, must-revalidate] Content-Type:[application/json] Date:[Sat, 12 Dec 2020 19:46:29 GMT] Expires:[0] Keep-Alive:[timeout=60] Pragma:[no-cache] Server:[NSX] Set-Cookie:[JSESSIONID=00B54AAEC7982E12A1DDEEF2B92170F5; Path=/; Secure; HttpOnly] Strict-Transport-Security:[max-age=31536000 ; includeSubDomains] Vary:[accept-encoding] X-Content-Type-Options:[nosniff] X-Frame-Options:[SAMEORIGIN] X-Nsx-Requestid:[9c2965e2-9ec8-4094-823b-bcdedc8579e3] X-Xss-Protection:[1; mode=block]] 0xc001f795c0 -1 [chunked] false true map[] 0xc001da6d00 0xc00145b970}, {
2020-12-12T19:46:30Z error wcp [opID=5fd5201f] Error occurred sending Principal Identity request to NSX: principal identity already created
2020-12-12T19:46:30Z error wcp [opID=5fd5201f] Failed to create PI in NSX managers. Err: principal identity already created
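The "409 409" at the start of the response dump is the HTTP status NSX returned for the POST to token-principal-identities, which matches the "principal identity already created" errors that follow. A small sketch that pulls the timestamp, status code and X-Nsx-Requestid header out of each "NSX HTTP Response is" line; `nsx_response_status` is a hypothetical helper, and the sed pattern assumes the exact line layout shown in these excerpts. The request ID may help when looking for the same call on the NSX Manager side.

```shell
# nsx_response_status (hypothetical helper name): for each "NSX HTTP Response is"
# line on stdin, print "<timestamp> <HTTP status> <X-Nsx-Requestid>".
# Assumes the struct-dump layout seen in the wcpsvc.log excerpts in this thread.
nsx_response_status() {
  sed -n 's/^\([^ ]*\) .*NSX HTTP Response is: &{\([0-9]\{3\}\) .*X-Nsx-Requestid:\[\([^]]*\)].*/\1 \2 \3/p'
}
```

For example, `zcat -f wcpsvc* | nsx_response_status` lists every NSX response status in the logs, so a run of 409s versus 401s stands out quickly.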
Hey guys,
Are you all trying this with fewer than 3 compute nodes?
These nodes seem to be reporting "NCP00010 TN ID not found". At first I thought TN meant Edge transport node, but it could also mean host transport node. It would take me a while to scare up more hosts, and I'd like to validate my hypothesis a bit more before going on Craigslist.
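To check whether NCP00010 is the dominant NCP message rather than a one-off, the codes can be tallied. A rough sketch; `ncp_error_codes` is a hypothetical helper name, it assumes codes are "NCP" followed by five digits as in NCP00010, and it works on NCP log text however you collect it (for example, the logs of the NCP pod on the Supervisor cluster):

```shell
# ncp_error_codes (hypothetical helper name): count NCP message codes such as
# NCP00010 in log text on stdin, most frequent first.
# Assumes the code format is "NCP" plus five digits.
ncp_error_codes() {
  grep -o 'NCP[0-9]\{5\}' | sort | uniq -c | sort -rn
}
```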
Are the following components all in the same subnet?
Other common causes are DNS (it must be both reachable and resolving correctly) and NTP. The errors all look very similar, if not identical.
Regards
Stephan
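Both the DNS and NTP checks can be scripted from the vCenter shell. A minimal sketch; `dns_roundtrip` and `max_skew` are hypothetical helper names, the DNS check assumes glibc's getent (for `ahostsv4`) and should be given FQDNs, and the clock readings are assumed to be collected separately, e.g. with `date -u +%s` on each component, so network and SSH latency adds some error to the comparison:

```shell
# dns_roundtrip (hypothetical helper name): resolve a name to an IPv4 address,
# then resolve that address back and compare (case-insensitively).
# getent consults /etc/hosts as well as DNS.
dns_roundtrip() {
  local name=$1 ip back
  ip=$(getent ahostsv4 "$name" | awk 'NR == 1 { print $1 }')
  if [ -z "$ip" ]; then
    echo "FAIL: $name does not resolve"
    return 1
  fi
  back=$(getent hosts "$ip" | awk 'NR == 1 { print $2 }')
  if [ "$(printf '%s' "$back" | tr 'A-Z' 'a-z')" != "$(printf '%s' "$name" | tr 'A-Z' 'a-z')" ]; then
    echo "WARN: $name -> $ip -> ${back:-<no reverse record>}"
    return 1
  fi
  echo "OK: $name <-> $ip"
}

# max_skew (hypothetical helper name): given epoch seconds collected from each
# component, print the largest difference between any two of them.
max_skew() {
  printf '%s\n' "$@" | sort -n | awk 'NR == 1 { min = $1 } { max = $1 } END { print max - min }'
}
```

Running `dns_roundtrip` for vCenter, each NSX Manager, the Edge and each ESXi host covers "reachable and resolvable"; a `max_skew` result of more than a few seconds points at NTP.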
I noticed that as well when re-reviewing the logs.
Most people seem to be filtering based on severity because good docs are not yet available for this.
A LEVEL SET
Readers, if you're wondering which log:
root@vcenter [ /var/log/vmware/wcp ]# ls
stdstream.log-0.stderr stdstream.log-1.stdout stdstream.log-3.stderr stdstream.log-4.stdout tkg-telemetry wcpsvc-2020-12-12T19-45-55.657.log.gz wcpsvc.log
stdstream.log-0.stdout stdstream.log-2.stderr stdstream.log-3.stdout stdstream.log.stderr wcpsvc-2020-12-12T15-38-27.008.log.gz wcpsvc-2020-12-13T03-39-49.311.log.gz wcp-telemetry
stdstream.log-1.stderr stdstream.log-2.stdout stdstream.log-4.stderr stdstream.log.stdout wcpsvc-2020-12-12T17-36-11.039.log.gz wcpsvc-2020-12-13T19-50-23.995.log.gz
wcpsvc.log is the log in question; the wcpsvc-<timestamp>.log.gz files are its rotated, compressed predecessors.
From there, we can draw the following conclusions:
The following log lines can be ignored in most cases, as they're just noise.
wcpsvc-2020-12-12T17-36-11.039.log:2020-12-12T17:31:20.927Z error wcp [opID=5fc87d1a-domain-c1008] Error checking if NSX resources exist for VMs: [vm-4011]. Err: Unauthorized
wcpsvc-2020-12-12T17-36-11.039.log:2020-12-12T17:31:21.978Z error wcp [opID=5fc87d1a-domain-c1008] Error checking if NSX resources exist. Err: Unauthorized
wcpsvc-2020-12-12T17-36-11.039.log:2020-12-12T17:31:21.978Z error wcp [opID=5fc87d1a-domain-c1008] Error checking if NSX resources exist for VMs: [vm-4013]. Err: Unauthorized
wcpsvc-2020-12-12T17-36-11.039.log:2020-12-12T17:35:11.472Z error wcp [opID=5fc87e99] Error occurred sending Principal Identity request to NSX: principal identity already created
wcpsvc-2020-12-12T17-36-11.039.log:2020-12-12T17:35:11.472Z error wcp [opID=5fc87e99] Failed to create PI in NSX managers. Err: principal identity already created
wcpsvc-2020-12-12T17-36-11.039.log:2020-12-12T17:32:22.364Z error wcp [opID=5fc87e99] Error occurred sending Principal Identity request to NSX: principal identity already created
wcpsvc-2020-12-12T17-36-11.039.log:2020-12-12T17:32:22.364Z error wcp [opID=5fc87e99] Failed to create PI in NSX managers. Err: principal identity already created
A starter filter:
zgrep ' error wcp' wcpsvc* | grep -Ev 'Error checking if NSX resources exist|principal identity already created'
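A step beyond the starter filter is to collapse the error lines into distinct messages with counts, so the repeated noise and the rare, specific errors separate on their own. A sketch under the assumption that the lines look like the excerpts in this thread; `wcp_error_summary` is a hypothetical helper name, and masking vm-NNNN and domain-cNNNN IDs is a choice made here so that repeats for different VMs group together:

```shell
# wcp_error_summary (hypothetical helper name): collapse error-level wcpsvc lines
# on stdin into distinct messages with counts, most frequent first.
# Timestamps, opIDs and numeric VM/cluster IDs are masked so repeats group together.
wcp_error_summary() {
  grep ' error wcp ' |
    sed -e 's/^[^ ]* error wcp //' \
        -e 's/\[opID=[^]]*] //' \
        -e 's/vm-[0-9][0-9]*/vm-N/g' \
        -e 's/domain-c[0-9][0-9]*/domain-cN/g' |
    sort | uniq -c | sort -rn
}
```

For example, `zcat -f wcpsvc* | wcp_error_summary | tail -n 20` shows the least frequent error messages, which are often the more informative ones.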
There are definitely more messages here generating noise, but these errors only say "there is an error" rather than "the error is X."
The second step I'd recommend is to go into NSX-T Manager and review what Tanzu built for you.
Thanks for the reply. I think the problem is this:
"Error configuring cluster NIC on master VM"
Everything else looks fine in vCenter and NSX-T. I could even download the kubectl CLI tools from the IP addresses of the Kubernetes control plane members. Only the cluster IP address does not work.
Because I couldn't find a solution, I also tried configuring Tanzu with HAProxy instead of NSX-T. To my surprise, I got a similar error on HAProxy: there was a problem enabling the VIP address. The HAProxy configuration looked fine, but the VIP address was not configured on the network interface of the HAProxy server; only the management IP address was.
I am even wondering whether it might be a hardware issue. I am running everything on one workstation with 128 GB of memory and an i9 processor, with nested ESXi 7.0.1.
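The VIP check described above (whether an address is assigned to any interface) can be done against `ip -o addr show` output. A sketch; `has_addr` is a hypothetical helper name. It only looks at interface addresses and does not say how a given load balancer appliance is meant to publish its VIPs, so a negative result is something to compare against the appliance documentation, not proof of a fault.

```shell
# has_addr (hypothetical helper name): succeed if the given IP address is
# assigned to any interface in `ip -o addr show` output on stdin.
# Matching "address/" as a prefix avoids 10.0.0.5 matching 10.0.0.50.
has_addr() {
  awk -v ip="$1" '
    { for (i = 1; i < NF; i++)
        if (($i == "inet" || $i == "inet6") && index($(i + 1), ip "/") == 1) found = 1 }
    END { exit !found }'
}
```

On the HAProxy VM this would be run as `ip -o addr show | has_addr <VIP> && echo present`.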
I think "Error configuring cluster NIC on master VM" is just a blanket statement that means it failed after NSX config stand-up was completed...
Did you have time to check? I am seeing the exact same issue on a single-host build: the cluster VIP isn't there, but one node IP is.
Are all three Supervisor VMs already up?
If so, how many NICs does each of them have?
Regards,
Stephan
Not the original poster, but we are both able to reach each Supervisor VM via the overlay network.
If you want to compare: each one has 2 NICs in total and is reachable on both.
Yes, all 3 Supervisor VMs were up.
I have deleted my cluster, so I don't know how many NICs there were, but I believe the master node had 5 IP addresses (IPv4/IPv6) and the other nodes fewer. The issue is with the cluster/namespace IP address.
Is there a solution for this? I am seeing the same behavior in vSphere 7 Update 1:
2021-01-07T18:04:54.088Z error wcp [opID=5ff66f83] Error occurred sending Principal Identity request to NSX: principal identity already created
2021-01-07T18:04:54.088Z error wcp [opID=5ff66f83] Failed to create PI in NSX managers. Err: principal identity already created
2021-01-07T18:04:54.088Z debug wcp [opID=5ff66f83] WCP service principal identity already created.
2021-01-07T18:04:54.113Z debug wcp [opID=5ff66f83] NSX HTTP Request is: &{Method:POST URL:https://10.202.xx.xx:443/api/v1/trust-management/token-principal-identities/ Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[] Body:{Reader:{"description":"Principal Identity for WCP cluster service account: wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7","display_name":"wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7","id":"wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7","name":"wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7","node_id":"wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7"}} GetBody:0x843110 ContentLength:434 TransferEncoding:[] Close:false Host:10.202.xx.xx:443 Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr: RequestURI: TLS:<nil> Cancel:<nil> Response:<nil> ctx:0xc000056058}
2021-01-07T18:04:54.24Z debug wcp [opID=5ff66f83] NSX HTTP Response is: &{409 409 HTTP/1.1 1 1 map[Cache-Control:[no-cache, no-store, max-age=0, must-revalidate] Content-Type:[application/json] Date:[Thu, 07 Jan 2021 18:04:54 GMT] Expires:[0] Keep-Alive:[timeout=60] Pragma:[no-cache] Server:[NSX] Set-Cookie:[JSESSIONID=BC0BF8015F1BA6308A935EF617AEDED6; Path=/; Secure; HttpOnly] Strict-Transport-Security:[max-age=31536000 ; includeSubDomains] Vary:[accept-encoding] X-Content-Type-Options:[nosniff] X-Frame-Options:[SAMEORIGIN] X-Nsx-Requestid:[8b1465a1-37e1-4fd8-993b-e12ab892fc42] X-Xss-Protection:[1; mode=block]] 0xc000a51380 -1 [chunked] false true map[] 0xc000275800 0xc0010ecb00}, {
"httpStatus" : "CONFLICT",
"error_code" : 2039,
"module_name" : "internal-framework",
"error_message" : "Principal TokenBasedPrincipalIdentityEntity{schemaValue=, identifier=null/null, touched=false, revision=0, displayName=wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7, description=Principal Identity for WCP cluster service account: wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7, createUser=null, lastModifiedUser=null, createTime=null, lastModifiedTime=null, systemResourceFlag=false, tags=[], name=wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7, nodeId=wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7, dataProtected=true} already exists."
}
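The body above is the NSX side of the "principal identity already created" message: httpStatus CONFLICT with error_code 2039, and an error_message stating that the principal already exists. A sketch for pulling those two fields out of a pretty-printed error body; `nsx_error_fields` is a hypothetical helper name, it assumes one "key" : value pair per line as shown here, and jq is a more robust choice where it is installed.

```shell
# nsx_error_fields (hypothetical helper name): print the httpStatus and
# error_code fields from a pretty-printed NSX error body on stdin, one per line.
# Assumes one "key" : value pair per line, as in the response above.
nsx_error_fields() {
  sed -n \
    -e 's/^[[:space:]]*"httpStatus"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/httpStatus=\1/p' \
    -e 's/^[[:space:]]*"error_code"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/error_code=\1/p'
}
```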
I'd recommend verifying that the logged principal identity is created in NSX, and that vSphere and NSX-T trust each other.
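One way to do the first check is to list principal identities through the NSX-T API and look for the name from the log. A sketch, assuming the token-based identities created by WCP show up in the `GET /api/v1/trust-management/principal-identities` listing; `pi_exists` is a hypothetical helper name:

```shell
# pi_exists (hypothetical helper name): succeed if the JSON on stdin contains a
# "name" field exactly equal to $1. The name is used as an extended regex, which
# is fine for WCP names (letters, digits and hyphens only).
pi_exists() {
  grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"$1\""
}
```

Usage: `curl -sk -u admin "https://<nsx-manager>/api/v1/trust-management/principal-identities" | pi_exists wcp-cluster-user-domain-c8-51e560c5-3043-46a1-b532-89995fe028c7 && echo present`. With `-u admin` and no password, curl prompts for it; `-k` skips certificate verification and is only appropriate in a lab.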