VMware Networking Community
scbennett72
Contributor

NSX Overlay Performance on 25G NICs

We recently purchased BCM57414 dual-port 25G NICs. Testing conditions are the following:

  • ESXi hosts are connected to Nexus 9K switches configured in a Cisco vPC
  • (4) interface LAG configured on each host
  • The NIC supports TSO, LRO, RSSv2, Geneve Offload, and Geneve RX-Filter
  • Running iperf2 or iperf3 with MTU 1500 and a single stream results in 9 Gbps throughput
  • Running iperf2 or iperf3 with MTU 1500 and two or more streams results in 18 Gbps throughput (no higher); example commands are shown after this list
  • The test environment includes very basic DFW rules. The VMs have also been added to the NSX-T DFW exclusion list, with no difference in the results.
  • The VMs used for testing are on the same overlay network on different hosts. Both hosts are connected to the same 9K pair.
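For reference, the single- and multi-stream runs look roughly like this (iperf3 syntax; the IP address and stream count are just placeholders):

iperf3 -s                              # on the receiving VM
iperf3 -c 192.168.10.20 -t 30          # single stream from the sending VM
iperf3 -c 192.168.10.20 -t 30 -P 4     # four parallel streams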

Now,

  • When the VMs are on a standard VDS port group (no NSX-T) in the same configuration, we are able to achieve line rate
  • When the VMs are on an NSX-T VLAN-backed segment on the same VDS as the overlays:
    • A single stream results in 18 Gbps (2x the throughput)
    • Multiple streams result in line-rate throughput

We are currently working with support, and it has been escalated to engineering. The case has been open for 2+ months. Single-stream throughput on the 25G NICs is not much better than on the 10G NICs. Even when the hosts had 10G network adapters in the same configuration, throughput was not optimal (in my mind), but I just chalked it up to the Geneve overhead.

Any thoughts? What is others' experience in terms of throughput on the overlay? There is not much documentation regarding the Geneve overhead, though I understand that, since environments are different. I am just curious how much overhead others are seeing on the overlay.
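For what it's worth, my rough math on the encapsulation overhead itself (assuming an IPv4 underlay and no Geneve options): outer Ethernet 14 + outer IPv4 20 + UDP 8 + Geneve base header 8 = 50 bytes per packet. With a 1500-byte inner MTU on a 9000-byte underlay nothing should fragment, and the on-the-wire overhead is only a few percent, so the cost that seems to matter is the per-packet encap/decap work.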

7 Replies
DanielKrieger
Enthusiast

We use Intel 2x25 Gb/s network cards with an MTU of 9000 for the overlay, and our VMs use an MTU of 8800. We see near line-rate east/west traffic. I have no experience with the cards you are using, but as long as the driver and the card are supported, there should be no problems. Does the overlay network work consistently without fragmentation?
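A quick way to confirm that the TEP vmkernel interfaces really carry the large MTU is to list them on the ESXi shell, for example:

esxcfg-vmknic -l

and check the MTU column for the vmk interfaces in the vxlan netstack.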

----------------------------------------------------------------------
My Blog: https://evoila.com/blog/author/danielkrieger/
scbennett72
Contributor

Hi Daniel,

Thanks for the response. We do use 9000 on the overlay, but we use 1500 for the VMs (except on our storage interfaces, where we run 8800). We have tested with 8800 on the test VMs, and the results are in line with what we see on a VLAN-backed segment with the VMs at 1500 MTU: roughly 18 Gbps with a single stream, and line speed with multiple streams.

Regarding your fragmentation question, we don't see any fragmentation on the overlay. I just ran a tcpdump between my two VMs while running iperf to confirm, and the capture is clean. There is no fragmenting, no retransmits, no dup ACKs, etc.
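For anyone curious, a capture filter along these lines is enough to spot fragments inside the guest (the interface name is just an example):

tcpdump -ni eth0 'ip[6:2] & 0x3fff != 0'

It only matches packets with the MF flag set or a non-zero fragment offset, so a silent capture means no IP fragmentation on that interface.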

I would be curious to see what your throughput is with two VMs set to an MTU of 1500.

pcgeek2009
Hot Shot

We found that with the overlay we had to set our systems to an MTU of 8914 to prevent fragmentation. We only use 9000 for some specific systems and backup dumps (my DBAs prefer to do flat-file dumps rather than use an agent). We also use Windows Failover Clustering on some, and once again, those had to be 8914 to prevent issues with the cluster. NSX does not let you go above 9000, which we used to do in our Cisco UCS environment. We have now gone to all vSAN Ready Nodes with 100G NICs and network switches. However, the VMs are still set to 10G.
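For reference, inside the Windows guests that change is just something like (the adapter name is a placeholder):

netsh interface ipv4 set subinterface "Ethernet0" mtu=8914 store=persistent

with netsh interface ipv4 show subinterfaces to verify the setting afterwards.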

DanielKrieger
Enthusiast

There is a lot of SMB 3 traffic on my platform, which is why we have set our VMs to 8800. Our physical switches are consistently at 9216; VMware's distributed switch supports a maximum of 9000.

Maybe next week I can run tests with two test VMs and 1500 MTU. If the traffic goes over a VLAN port group, the Geneve overhead goes away, and therefore that throughput is higher. VMware always tests overlay performance with multiple streams, and then you should get about 20 Gb/s with -P 4 in iperf and almost line speed with 8800 MTU.

But in the real world the results depend on much more than the MTU. It depends on the protocol, whether single or multiple sessions are used, etc. In addition, the number of CPU cores and the network card used also influence network performance. VMXNET3 should be used for maximum performance. You can also tweak the RSS settings in Windows (a sketch is below).
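As a rough sketch of the Windows RSS tweaks I mean (the adapter name and processor numbers are placeholders; check your own NUMA/core layout first):

Get-NetAdapterRss -Name "Ethernet0"
Set-NetAdapterRss -Name "Ethernet0" -Enabled $true -BaseProcessorNumber 2 -MaxProcessors 4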

The easiest way to test your VTEP network is with a vmkping to see if you have a fragmentation problem. Simply ping through all tunnel endpoints. If the ping works with a packet size greater than 8XXX, then you have no fragmentation.

vmkping ++netstack=vxlan <destination VTEP IP address> -d -s <packet size>

-d = don't fragment
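-s = ICMP payload size in bytes

For example, with a 9000-byte underlay MTU the largest payload that fits unfragmented is 9000 - 20 (IP header) - 8 (ICMP header) = 8972 bytes, so a test could look like:

vmkping ++netstack=vxlan <destination VTEP IP address> -d -s 8972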

----------------------------------------------------------------------
My Blog: https://evoila.com/blog/author/danielkrieger/
DanielKrieger
Enthusiast

A VMXNET3 network card always shows a 10 Gb link in the operating system; this is normal and has no influence on the actual speed.

 

----------------------------------------------------------------------
My Blog: https://evoila.com/blog/author/danielkrieger/
DanielKrieger
Enthusiast

@AbbedSedkaoui 

Both links are about bare-metal N/S performance. That is a) not achieved with a single stream, and b) bare-metal edges need specific hardware and are also tuned by VMware (see disabled hyperthreading and ring-buffer modifications).

The thread creator is concerned with east/west performance between two VMs on different hosts when a single TCP stream is used with an MTU of 1500 bytes.

 

 

----------------------------------------------------------------------
My Blog: https://evoila.com/blog/author/danielkrieger/