VMware Networking Community
Punkgeek
Enthusiast

NSX-T faulty TEP

I created a VLAN segment in NSX and a Tier-0 gateway, and added the VLAN segment as the Tier-0 interface. Then I created an overlay segment and connected it to the Tier-0 gateway.

Everything was fine until I connected the overlay segment to one of the virtual machines, at which point the Edge node and the ESXi host went down.

I checked the TEP connectivity between the ESXi hosts and the edge node, and they respond at an MTU of 1700.

I'm using 192.168.8.0/24 for management; the ESXi hosts, NSX appliance, vCenter, and Edge node management IPs are all in this range.
And 192.168.10.9/24 for the VTEP IP range.

Here are the error messages shown in NSX (the CLI checks they reference are sketched after the list):

TEP Health, Faulty TEP
Description :
TEP:vmk10 of VDS:VDS at Transport node:2e7e8310-aabb-4c9e-aec1-9f37ed1f9fa8. Overlay workloads using this TEP will face network outage.

Recommended Action
1. Check if TEP has valid IP or any other underlay connectivity issues.
2. Enable TEP HA to failover workloads to other healthy TEPs.


Infrastructure Communication, Edge Tunnels Down

Description:
The overall tunnel status of Edge node 31829895-3a35-432f-a2d3-0b3d24469dd6 is down.

Recommended Action:
Invoke the NSX CLI command `get tunnel-ports` to get all tunnel ports, then check each tunnel's stats by invoking NSX CLI command `get tunnel-port <UUID> stats` to check if there are any drops. Also check /var/log/syslog if there are tunnel related errors.

High Availability, Tier0 Gateway Failover
Description:
The tier0 gateway 94bd643e-a463-452c-9c66-b734a6c31623 failover from Active to Down, service-router 3b7b34f6-ebee-4dd6-afc4-ae777f7d4fd3

Recommended Action:
Invoke the NSX CLI command `get logical-router <service_router_id>` to identify the tier0 service-router vrf ID. Switch to the vrf context by invoking `vrf <vrf-id>` then invoke `get high-availability status` to determine the service that is down.
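
For reference, the CLI checks those recommended actions describe look roughly like this (a sketch; the UUIDs are placeholders for the IDs in the alarms, and the commands run in the NSX CLI on the edge node):

# List all tunnel ports on the edge node
get tunnel-ports

# Check a specific tunnel port's counters for drops (UUID taken from the output above)
get tunnel-port <UUID> stats

# Find the Tier-0 service router's VRF ID, switch into that VRF, and check which HA service is down
get logical-router <service_router_id>
vrf <vrf-id>
get high-availability status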

7 Replies
bmcb555
Enthusiast

Hi,


Could you confirm a few things?

Double-check all your appliances and confirm they all have a valid TEP IP and that they're not overlapping. Are all the TEPs in the same IP address range and VLAN? 192.168.10.9/24 isn't a valid range.
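
On each host you can double-check the TEP vmkernel interface and its IP over SSH, something like this (a sketch; the vxlan netstack is the one NSX-T uses for its TEPs):

# List the vmkernel interfaces bound to the vxlan (TEP) netstack
esxcli network ip interface list --netstack=vxlan

# Show the IPv4 addresses so you can compare hosts for overlaps
esxcli network ip interface ipv4 get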

What versions of NSX, vCenter and ESXi are you running?

Punkgeek
Enthusiast

Hello, thank you for your response. 

Sorry, I wrote that incorrectly. All the IPs are in the 192.168.10.0/24 range, and there is no VLAN configured.

NSX-T version 4.

I didn't understand the part about the appliances' VTEPs. How can an appliance have a VTEP?

It's a nested environment; I only have a single edge node and a single NSX appliance.

I've attached some screenshots from nodes and hosts.

Thanks

bmcb555
Enthusiast

Okay, a nested environment. It doesn't look like any of your tunnels are up. Since it's nested, have you configured the vSS/vDS on the top-level host to support jumbo frames? Geneve needs at least 1600, although larger is recommended. Whatever MTU you have configured will be visible in the uplink profile you applied to your hosts in NSX, as it is configurable there. Whatever you have set (I believe the default is 1700), set the top-level vSS/vDS to at least that.

I'd also configure the top-level vSS/vDS to support VLAN tagging by applying VLAN 4095 (trunk all VLANs) so tagged frames are passed through.
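
If the top-level host uses a standard vSwitch, the two changes would look roughly like this on that outer host (a sketch; vSwitch0 and the "Nested-Trunk" port group name are only examples, and on a vDS you would set the same MTU and VLAN trunk values through vCenter instead):

# Raise the MTU of the outer vSwitch carrying the nested ESXi/edge traffic to match the uplink profile
esxcli network vswitch standard set -v vSwitch0 -m 1700

# Set the nested hosts' port group to VLAN 4095 so tagged frames are passed through
esxcli network vswitch standard portgroup set -p "Nested-Trunk" --vlan-id 4095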

DanielKrieger
Enthusiast

I had the same problem in my nested lab, and it turned out that my MTU size was configured correctly, but a switch in between could not handle jumbo frames. Have you pinged all TEP IP addresses from one of your ESXi servers? ping ++netstack=vxlan <dst IP> -s 1600 -d

-s sets the payload size, -d sets the don't-fragment bit.
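
Note that with -d set, the 28 bytes of IP/ICMP headers count on top of the payload, so -s 1600 produces a 1628-byte packet; to test the exact limit of a 1600-byte MTU path you would use -s 1572. A sketch of the test from a host (the destination is a placeholder for one of your TEP IPs):

# Test a 1600-byte MTU path: 1572-byte payload + 28 bytes of headers = 1600-byte packet
vmkping ++netstack=vxlan -d -s 1572 <remote-TEP-IP>

# The same test for a 1700-byte MTU path
vmkping ++netstack=vxlan -d -s 1672 <remote-TEP-IP>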

How is your lab structured? Are the edge VMs inside your nested lab or outside? If the edge VM is inside, you can run into problems, because the host TEP and the edge TEP are then in the same VLAN, and that is only supported under certain conditions with a single network adapter.

----------------------------------------------------------------------
My Blog: https://evoila.com/blog/author/danielkrieger/
Punkgeek
Enthusiast

Hello,

Yes, the edge node is inside the nested lab, but I'm using a single uplink for the ESXi host and the edge VM.

I've checked the MTU, and everything is working fine:

From ESXi to ESXi:

vmkping -I vmk10 -S vxlan -d -s 8000 192.168.10.61

8008 bytes from 192.168.10.61: icmp_seq=0 ttl=64 time=0.374 ms

 

From ESXi to Edge Node:

vmkping -I vmk10 -S vxlan -d -s 8000 192.168.10.61

8008 bytes from 192.168.10.61: icmp_seq=0 ttl=64 time=0.374 ms

 

How can ESXi communicate with the edge node if they are in different VLANs? Could you please explain this?

 

Regards,

DanielKrieger
Enthusiast

Normally you would use different VLANs for the host and edge TEPs, and the routing between them is done via the top-of-rack switches. The exact setup can be found in the reference design guide. If your edge is inside the nested environment, you either need to give the nested ESXi VM more than one NIC (they can all be on the same network) and then use, for example, pNIC1 for the ESXi host TEP and pNIC2 for the edge TEP, or you run both TEPs on a shared VLAN.

Here is a KB article that covers the requirements for a shared VLAN TEP and under which circumstances the whole thing works.

https://kb.vmware.com/s/article/83743

I've built the whole thing in my lab before and was going to write a blog article about it for my company, but I haven't had time yet.

----------------------------------------------------------------------
My Blog: https://evoila.com/blog/author/danielkrieger/
siddhartha2303
Contributor

If it's a home lab without nested ESXi, configured natively on VMware Workstation, then make sure to select a LAN segment for the host NICs instead of a VMnet.
