VMware Cloud Community
joachimbuechse
Contributor

Linux + vmxnet3 UDP propagation delay / select timeout overshoot

Good day,


I'm seeing rare but excessive UDP packet propagation delays for incoming traffic with the Debian wheezy kernel 3.2.0-p4 (the default kernel of the stable Debian release) and vmxnet3 on ESXi 5.1.


We have a VM that drives digitizers which send data at about 40-60 MB/s. After some tuning of the physical NIC we see zero packet drops. However, over a 12-hour period I see 180 cases (on average one every 4 minutes) where a UDP packet was delivered with an extreme delay. We read from the UDP socket with a blocking select() with a timeout, followed by recv().
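A minimal sketch of the read pattern just described, assuming Linux and a single UDP socket; select_recv_timed() and its elapsed_ms output parameter are hypothetical names, not the poster's code. It brackets select() with CLOCK_MONOTONIC readings so the actual wait can be compared with the requested timeout.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Milliseconds elapsed between two CLOCK_MONOTONIC readings. */
static double ms_between(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) * 1e3 +
           (double)(b->tv_nsec - a->tv_nsec) / 1e6;
}

/*
 * Wait up to timeout_ms for a datagram on fd with select(), then read it
 * with recv(). *elapsed_ms receives how long select() actually blocked.
 * Returns the datagram length, 0 on timeout, or -1 on error (errno set).
 * Note: in this sketch a zero-length datagram also returns 0.
 */
static ssize_t select_recv_timed(int fd, int timeout_ms, void *buf,
                                 size_t len, double *elapsed_ms)
{
    fd_set rfds;
    struct timeval tv;
    struct timespec t0, t1;
    int rc;

    assert(fd >= 0 && fd < FD_SETSIZE && timeout_ms >= 0);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *elapsed_ms = ms_between(&t0, &t1);

    if (rc < 0)
        return -1;          /* e.g. EINTR */
    if (rc == 0)
        return 0;           /* timeout: nothing readable */
    return recv(fd, buf, len, 0);
}
```

On a timeout, the overshoot is `*elapsed_ms - timeout_ms`: with a 24 ms timeout, a select() that returns after 54 ms has overshot by 30 ms.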


Observations:

  • With a select() timeout of 24 ms, the default kernel often overshoots the timeout by up to 30 ms, i.e. the select() call returns not after 24 ms but after 30, 40 or even 54 ms.
  • Every single time we see a timeout overshoot (of up to 30 ms!), the next packet arrives less than 0.1 ms later.
  • The incoming packets have sequence numbers, so I could verify that it is not packet loss.
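The reasoning in these observations (an overshoot, followed within 0.1 ms by the next packet, with no gap in the sequence numbers, means delay rather than loss) can be written down as a small classifier. This is a sketch assuming a 32-bit sequence counter that increments by one per datagram, arrives in order without duplicates, and wraps at 2^32; the function names, the enum and the threshold parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome of one select()/recv() round. */
enum rx_event { RX_IN_ORDER, RX_LOSS, RX_DELAYED };

/* Datagrams missing between two consecutive receives. Unsigned 32-bit
 * arithmetic makes the wrap from 0xFFFFFFFF to 0 count as contiguous.
 * Assumes in-order delivery without duplicates. */
static uint32_t seq_missing(uint32_t prev_seq, uint32_t cur_seq)
{
    return (uint32_t)(cur_seq - prev_seq - 1u);
}

/*
 * overshoot_ms: how far select() ran past its timeout (0 if it did not)
 * gap_ms:       time from select() returning to the next datagram
 * threshold_ms: overshoot the caller is willing to tolerate
 */
static enum rx_event classify(uint32_t prev_seq, uint32_t cur_seq,
                              double overshoot_ms, double gap_ms,
                              double threshold_ms)
{
    assert(threshold_ms >= 0.0);
    if (seq_missing(prev_seq, cur_seq) != 0)
        return RX_LOSS;               /* a sequence number was skipped */
    if (overshoot_ms > threshold_ms && gap_ms < 0.1)
        return RX_DELAYED;            /* overshoot, then data within 0.1 ms */
    return RX_IN_ORDER;
}
```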


I've compiled a custom kernel with

# CONFIG_RCU_FAST_NO_HZ is not set
CONFIG_NO_HZ=y
CONFIG_HZ_1000=y
CONFIG_HZ=1000

which drastically reduces the timeout overshoot. So my first thought is that this is an RCU issue in, or interacting with, the driver. However, what really puzzles me is that the data is always there 0.1 ms later. It almost seems as if the driver occasionally fails to "raise an interrupt" on incoming data.
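One way to test the "missing interrupt" hypothesis, offered as an assumption-laden sketch rather than a known fix: Linux can attach a kernel receive timestamp to each datagram via the SO_TIMESTAMPNS socket option. Comparing that timestamp with the moment select() gave up would suggest whether a late packet only reached the network stack late (pointing at the driver/interrupt path) or was already queued on the socket while the process stayed asleep (pointing at the wakeup/timer side). The helper names and the loopback setup used to exercise them are hypothetical. The timestamp is CLOCK_REALTIME, so it must be compared with CLOCK_REALTIME readings, not CLOCK_MONOTONIC.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <time.h>
#include <unistd.h>

/* Open a UDP socket bound to ip:port (host byte order; port 0 = any free
 * port) with SO_TIMESTAMPNS enabled. The bound address is written to
 * *bound. Returns the socket, or -1 on error. */
static int open_timestamped_udp(uint32_t ip, uint16_t port,
                                struct sockaddr_in *bound)
{
    int on = 1;
    socklen_t alen = sizeof *bound;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(bound, 0, sizeof *bound);
    bound->sin_family = AF_INET;
    bound->sin_addr.s_addr = htonl(ip);
    bound->sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)bound, sizeof *bound) != 0 ||
        getsockname(fd, (struct sockaddr *)bound, &alen) != 0 ||
        setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPNS, &on, sizeof on) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* recvmsg() wrapper: returns the datagram length (or -1) and stores the
 * kernel receive time (CLOCK_REALTIME) in *kernel_ts. *have_ts is set to
 * 1 only if the SCM_TIMESTAMPNS control message was present. */
static ssize_t recv_with_kernel_ts(int fd, void *buf, size_t len,
                                   struct timespec *kernel_ts, int *have_ts)
{
    union {                              /* correctly aligned cmsg buffer */
        char raw[CMSG_SPACE(sizeof(struct timespec))];
        struct cmsghdr align;
    } ctrl;
    struct iovec iov;
    struct msghdr msg;
    struct cmsghdr *cm;
    ssize_t n;

    assert(kernel_ts != NULL && have_ts != NULL);
    iov.iov_base = buf;
    iov.iov_len = len;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.raw;
    msg.msg_controllen = sizeof ctrl.raw;

    *have_ts = 0;
    n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;
    for (cm = CMSG_FIRSTHDR(&msg); cm != NULL; cm = CMSG_NXTHDR(&msg, cm)) {
        if (cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_TIMESTAMPNS) {
            memcpy(kernel_ts, CMSG_DATA(cm), sizeof *kernel_ts);
            *have_ts = 1;
        }
    }
    return n;
}

/* Milliseconds from a CLOCK_REALTIME timestamp to now. */
static double ms_since_realtime(const struct timespec *ts)
{
    struct timespec now;

    clock_gettime(CLOCK_REALTIME, &now);
    return (double)(now.tv_sec - ts->tv_sec) * 1e3 +
           (double)(now.tv_nsec - ts->tv_nsec) / 1e6;
}
```

For a delayed packet, a small ms_since_realtime() value right after recv() would mean the kernel itself only saw the packet late.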

Any help appreciated.

JarryG
Expert

I hope by "Debian 3.0.2" you mean "Debian with Linux kernel version 3.0.2", because Debian 3.0 was released back in 2002. Anyway, maybe it is time to update your kernel. IIRC, there have been some changes to the vmxnet3 kernel driver since 3.x. BTW, I think anything above CONFIG_HZ_250 is counter-productive for a server; higher values are generally recommended for desktop systems...

joachimbuechse
Contributor

Sorry, 3.0.2 was a typo; the kernel version is 3.2.0-p4 (i.e. the default kernel of the stable Debian release).

This is not really a server in the sense that it does not need to deliver outgoing data at high bandwidth. It needs to handle incoming data in a very timely manner.
