We have been working on implementing M10 vGPUs in our VMware environment and have been experiencing performance issues. We worked with NVIDIA to verify that the environment is set up correctly. Here is a quick bullet-point list for the environment.
After an initial small round of user testing with no issues, we began a larger test and noticed that once we hit about 15 users per M10 we started getting reports of performance problems. Users see lag in the interface: a right-click on the desktop can take 5-10 seconds for the context menu to appear, and the same happens with the Start menu. These issues occur only on vGPU VMs; non-vGPU VMs on the same host do not experience the slowdowns.
In Task Manager we noticed that pcoip_server_win32.exe was consuming a lot of CPU and GPU time. We tried different versions of the VMware agent and direct connect, various revisions of the ESXi driver, and fresh standalone copies of Windows 10 at various build numbers. So far no combination we have attempted has resolved the issue for vGPU machines once there are more than a few users per host. The performance problem also appears when we use the Horizon software client set to PCoIP.
We tried running it with different Horizon Agent versions (6, 7.0.2, 7.2.0, 7.4.0, 7.5.1, 7.6 & 7.7) and using direct connect, bypassing the Horizon Server.
We also tried running VMs with the VMware Blast protocol, which did not have the high GPU usage issue; unfortunately, almost all of our thin clients only support PCoIP.
Screenshot attached below; please note the GPU utilization of the PCoIP Server (32-bit) process.
UPDATE 1:
After a long discussion, NVIDIA concluded that the issue is not on their side.
They pointed us at this KB: https://nvidia.custhelp.com/app/answers/detail/a_id/4156/~/nvidia-smi-shows-high-gpu-utilization-for...
Looks like it's a known issue with the Teradici PCoIP protocol that hasn't been fixed yet.
UPDATE 2:
I tried downloading and installing Teradici's PCoIP Agent (PCoIP_agent_release_installer_graphics.exe) directly from Teradici. Then I ran "NvFBCEnable.exe -disable", which disables NvFBC capture and switches back to CPU capture. It works great: no GPU spike when idle and much better performance overall.
However, when I try this with the Horizon Agent's Teradici protocol, it disables NvFBC only briefly: as soon as I reconnect via PCoIP it re-enables it. See this extract from a log:
- Svgadevtap: NvFBC – Fixed capture by enabling NvFBC
Is there a way to permanently disable NvFBC with the Horizon Agent's Teradici protocol?
We were able to find a workaround for our problem.
We used a combination of a memory dump and Sysinternals Process Monitor to find the registry keys.
Below is the combination of settings we use to achieve satisfactory performance:
[HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware SVGA DevTap]
"Win32FrameRate"=dword:0000002d
"MaxAppFrameRate"=dword:0000002d
"ForceWin32Capture"=dword:00000001
[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Teradici\PCoIP\pcoip_admin_defaults]
"pcoip.audio_bandwidth_limit"=dword:000001c2
"pcoip.enable_build_to_lossless"=dword:00000000
"pcoip.enable_console_access"=dword:00000000
"pcoip.minimum_image_quality"=dword:00000028
"pcoip.maximum_initial_image_quality"=dword:00000050
"pcoip.maximum_frame_rate"=dword:0000002d
"pcoip.use_client_img_settings"=dword:00000000
The config is for 45 FPS (dword 0x2d), which is the maximum frame rate we can achieve with our current NVIDIA license.
We have already spoken with VMware and they confirmed that this is indeed the workaround for now, until they get Teradici to fix the PCoIP code.
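For anyone rolling this out across several gold images, the settings above can be emitted as a single .reg file. A minimal Python sketch (the key paths and DWORD values are exactly those from the post; the script itself is just an illustration):

```python
# Sketch: build a .reg file for the workaround settings above.
# All key paths and values are copied from the post; note that
# 0x2d = 45 (the frame-rate cap) and 0x1c2 = 450 (audio bandwidth limit).
SETTINGS = {
    r"HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware SVGA DevTap": {
        "Win32FrameRate": 0x2D,        # 45 fps
        "MaxAppFrameRate": 0x2D,       # 45 fps
        "ForceWin32Capture": 0x1,
    },
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Teradici\PCoIP\pcoip_admin_defaults": {
        "pcoip.audio_bandwidth_limit": 0x1C2,          # 450
        "pcoip.enable_build_to_lossless": 0x0,
        "pcoip.enable_console_access": 0x0,
        "pcoip.minimum_image_quality": 0x28,           # 40
        "pcoip.maximum_initial_image_quality": 0x50,   # 80
        "pcoip.maximum_frame_rate": 0x2D,              # 45
        "pcoip.use_client_img_settings": 0x0,
    },
}

def build_reg() -> str:
    """Render the settings as importable .reg text."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, values in SETTINGS.items():
        lines.append(f"[{key}]")
        for name, value in values.items():
            lines.append(f'"{name}"=dword:{value:08x}')
        lines.append("")
    return "\n".join(lines)

print(build_reg())
```

Save the output as e.g. settings.reg and import it on the VM (double-click or `reg import settings.reg` in an elevated prompt).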
I have found this registry hack: create a new DWORD "NoNvFBC" with data 1 under HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware SVGA DevTap.
Then open CMD, cd to "C:\Program Files\Common Files\VMware\Teradici PCoIP Server", run "NvFBCEnable.exe -disable", and reboot.
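The manual steps above can also be captured as a small helper. A sketch that only builds the equivalent command strings (key path, value, and install path are taken from the post; run the output yourself in an elevated prompt on the VM):

```python
# Sketch of the NoNvFBC workaround as a command generator.
# The registry key and the PCoIP Server path come from the post above;
# this script does not touch the registry, it only prints the commands.
KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware SVGA DevTap"
NVFBC_DIR = r"C:\Program Files\Common Files\VMware\Teradici PCoIP Server"

def nonvfbc_commands() -> list[str]:
    return [
        # Create the NoNvFBC DWORD with data 1.
        f'reg add "{KEY}" /v NoNvFBC /t REG_DWORD /d 1 /f',
        # Disable NvFBC capture, as described in the post.
        f'"{NVFBC_DIR}\\NvFBCEnable.exe" -disable',
        # Reboot to apply.
        "shutdown /r /t 0",
    ]

for cmd in nonvfbc_commands():
    print(cmd)
```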
It works fine with one monitor; however, when we add a second monitor to our Zero Client, the picture on both monitors becomes distorted...
I spoke to one of Teradici's representatives and they said:
"Unfortunately, Teradici cannot change the behavior within Horizon as it is a VMware product and they control any PCoIP changes that goes into Horizon."
So now we are hoping for a resolution from the VMware team.
We decided to give it another go and rebuilt everything from scratch: brand-new ESXi, drivers, a 1803 image, etc.
The VM below isn't joined to the domain and has nothing installed apart from the NVIDIA drivers and Horizon Agent 7.7.
See the screenshot below: GPU utilization is 30% at idle.
Hi Alex,
I currently have the same issue at a customer site.
You marked your thread as solved. Did your registry changes really help?
Do you have any experience with these changes on Zero Clients with two screens?
It would be great to benefit from your effort.
Thanks,
Robin
Hi Robin,
Yes, changing the registry helped, and it works great on two monitors.
See before & after the reg change.
Before - with 30 VM sessions:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.68 Driver Version: 410.68 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M10 On | 00000000:3D:00.0 Off | N/A |
| N/A 51C P0 31W / 53W | 8142MiB / 8191MiB | 79% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M10 On | 00000000:3E:00.0 Off | N/A |
| N/A 46C P0 30W / 53W | 8142MiB / 8191MiB | 77% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla M10 On | 00000000:3F:00.0 Off | N/A |
| N/A 28C P0 17W / 53W | 8142MiB / 8191MiB | 29% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla M10 On | 00000000:40:00.0 Off | N/A |
| N/A 40C P0 30W / 53W | 8142MiB / 8191MiB | 73% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla M10 On | 00000000:DA:00.0 Off | N/A |
| N/A 52C P0 33W / 53W | 8142MiB / 8191MiB | 94% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla M10 On | 00000000:DB:00.0 Off | N/A |
| N/A 44C P0 19W / 53W | 5094MiB / 8191MiB | 41% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla M10 On | 00000000:DC:00.0 Off | N/A |
| N/A 41C P0 35W / 53W | 6110MiB / 8191MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla M10 On | 00000000:DD:00.0 Off | N/A |
| N/A 43C P0 25W / 53W | 6110MiB / 8191MiB | 57% Default |
+-------------------------------+----------------------+----------------------+
After - with 50 VM sessions:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.91 Driver Version: 410.91 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M10 On | 00000000:3D:00.0 Off | N/A |
| N/A 29C P8 10W / 53W | 8142MiB / 8191MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M10 On | 00000000:3E:00.0 Off | N/A |
| N/A 30C P8 10W / 53W | 8142MiB / 8191MiB | 18% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla M10 On | 00000000:3F:00.0 Off | N/A |
| N/A 27C P0 20W / 53W | 8142MiB / 8191MiB | 5% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla M10 On | 00000000:40:00.0 Off | N/A |
| N/A 27C P8 11W / 53W | 8142MiB / 8191MiB | 34% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla M10 On | 00000000:DA:00.0 Off | N/A |
| N/A 29C P8 10W / 53W | 8142MiB / 8191MiB | 4% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla M10 On | 00000000:DB:00.0 Off | N/A |
| N/A 35C P0 19W / 53W | 7126MiB / 8191MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla M10 On | 00000000:DC:00.0 Off | N/A |
| N/A 31C P0 19W / 53W | 7126MiB / 8191MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla M10 On | 00000000:DD:00.0 Off | N/A |
| N/A 29C P8 10W / 53W | 7126MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
Hello @Alex,
It appears that we are hitting exactly the same problem on our end with PCoIP; however, I do not seem to have that registry key.
Dear Alex,
Thank you for your fix - it resolved our idle GPU issue as well - but our PCoIP Server process's CPU usage is now always at 20-30% regardless of what is on the monitor.
Any idea what we could do?
Best regards
This has been fixed by Teradici and the Horizon team in Horizon 7.12, which we released yesterday. Internal testing results:
GPU usage for NVIDIA is drastically reduced: from 2-3x for a single HD display (5% before -> 2% after) up to 10x for 4x UHD displays (50% before -> 5% after).
CPU usage is equal to the old path (slightly less, but within testing variance).