VMware Horizon Community
FCOETech
Enthusiast

View 6.1 Disconnects: Looking for Troubleshooting Tips

Full disclosure: I'm a tier 2 tech with some responsibility for maintaining our VDI. I'm not strong in View administration or in understanding virtual infrastructure (working on it), so I could use some advice from those who are!

We recently moved to 6.1 and have since been seeing a lot of disconnects on Tera1 and Tera2 zero clients and on ThinPC (Windows ThinPC on various Dell OptiPlexes). We use a third-party service for our VDI builds. I'm finding disconnects a frustrating and difficult issue to troubleshoot and would love some advice on where to start. Is it the VDI? View? The VM, the .vmx config, driver conflicts? PCoIP policies? The physical hardware (Tera1/Tera2)? Firmware on that hardware? The network? I try to do basic isolation and omission testing to quickly cut out variables, but often what works in one test doesn't in another, or vice versa. I then question whether an issue seen in testing was actually transient and unrelated. (True story, don't laugh: I once spent several days chasing disconnects before realizing users were ejecting their Ethernet adapters instead of their USB drives from the Safely Remove Hardware prompt in Windows.)

I timestamp and gather as much environmental info as I can when an issue occurs. The View diagnostic log bundles seem to grab pretty much everything: PCoIP messages, messages from the View Agent and server, guest OS events, policies, etc. What general tips and methods do you use to troubleshoot disconnects? Are some logs more useful than others for this?
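One generic way to start on a log bundle is to extract it and search every file for a few keywords, printing the file and line number so hits can be lined up against the timestamps you recorded. Below is a minimal Python sketch, assuming the bundle has already been extracted to a local folder. The keyword list is a hypothetical starting point, and the sketch makes no assumption about any particular log's format; it simply treats every file as text.

```python
import os

# Hypothetical starting keywords; tune them to what your issue looks like.
DEFAULT_KEYWORDS = ("disconnect", "pcoip_server_win32.exe", "error")


def search_bundle(root, keywords=DEFAULT_KEYWORDS):
    """Yield (path, line_number, line) for each line containing any keyword.

    Matching is case-insensitive. Assumes the log bundle has already been
    extracted under `root`. Files are read as UTF-8 with undecodable bytes
    replaced, so binary files (if any) will not match meaningfully.
    """
    lowered = [k.lower() for k in keywords]
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    for number, line in enumerate(f, start=1):
                        if any(k in line.lower() for k in lowered):
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # skip files that cannot be opened
```

Usage would be something like `for path, number, line in search_bundle("extracted_bundle"): print(f"{path}:{number}: {line}")`, where `extracted_bundle` is a placeholder folder name. Keeping it to plain substring matching means it works on any text log, at the cost of some noise from broad keywords like "error".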

I've reviewed KB1030697:

     1. Our power settings are set by policy and are as recommended in the KB. I verified these are enforced and getting to the machines.

     2. Our networking team is reviewing the network config against the View 6 networking diagram to verify open ports and communication between services. It should not have changed, and it has been the same in our environment for a few years now. Also, the disconnects affect identical setups on the same network and VLAN as others that do not experience them.

     3. We use the policies in the provided ADM templates to set the recommended limits on available network resources. These policies also haven't changed, though I'm not clear on how to test how much bandwidth is being used or how good the available resources are.

     4. Our VM hardware spec exceeds the recommendations in the article, and I don't see spikes or resources maxing out at 100% when the disconnects occur. However, I have not used the tools in the article for a more thorough investigation.

     5. We've seen the disconnects on Tera1, Tera2, and ThinPCs. I know Tera1 is no longer supported; I'm not sure about the workstations running ThinPC. Our eVGA PD06s should be fine, though, and they are seeing disconnects as well.

     6. Timeout value for all pools is set globally as recommended to 99999.

     7. Third party. Ah. Well. I'm building a new VM outside of our third-party vendor's process and will see if I can recreate the disconnects. I can't build a machine the same way our VDI vendor does, though, so failing to recreate the issue wouldn't be a true isolation. It could still be View, the View Agent, or something else on VMware's side. Obviously VMware can't troubleshoot other people's solutions, and I wouldn't expect them to, but I feel I should still be able to get a point of failure/disconnect from VMware's perspective that I can take back to the vendor if needed. If I can figure out how and where to identify the disconnects in the different logs from the bundle, I'm hoping I can provide more info to both VMware and the vendor in the future.
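On item 3 (how much bandwidth is actually being used), one simple approach is to sample a cumulative byte counter twice and convert the difference to an average rate. Which counter to sample is left open here and is an assumption: any counter that reports total bytes sent or received so far will work with the arithmetic below. This is a minimal Python sketch of that calculation only, not a monitoring tool.

```python
def average_kbps(bytes_start, bytes_end, seconds):
    """Average throughput in kilobits per second between two samples of a
    cumulative byte counter taken `seconds` apart.

    The counter source is an assumption; this only does the arithmetic:
    (bytes_end - bytes_start) * 8 bits / 1000 / seconds.
    """
    if seconds <= 0:
        raise ValueError("sample interval must be positive")
    delta = bytes_end - bytes_start
    if delta < 0:
        raise ValueError("counter went backwards (reset or wrap?)")
    return delta * 8 / 1000 / seconds
```

For example, 7,500,000 bytes transferred over a 60-second sample works out to `average_kbps(0, 7_500_000, 60)`, which is 1000.0 kbps. An average over a minute hides short bursts, so shorter intervals around a disconnect timestamp may tell you more.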

Thanks in advance!

EDIT: This turned out to be one of two fixes, which were applied around the same time, so we can't pinpoint the root cause: either stale APEX drivers installed in the OS layer of our VDI solution, or a bad upgrade of the View Agent and VMware Tools after moving to View 6.1. After we updated the drivers for the cards on the hosts, we hadn't updated them on the virtual desktops. Once we did, we saw a drop in the PCoIP errors in the logs timestamped at the times of the disconnects. However, we had also reinstalled VMware Tools and the View Agent at the same time. Who knows. Thanks for the assist, and happy Viewing!

Linjo
Leadership

You describe the symptoms very briefly; it would be good to understand a bit more about the issue.

For example:

Was it an issue before upgrading to 6.1?

How often do these disconnects happen?

Is it at the same time every day?

Does it happen to several users?

What is the error-message?

Can they reconnect after being disconnected?

What is the status of the disconnected desktop in the Admin GUI?

// Linjo

Best regards, Linjo Please follow me on twitter: @viewgeek If you find this information useful, please award points for "correct" or "helpful".
FCOETech
Enthusiast

You describe the symptoms very briefly; it would be good to understand a bit more about the issue.

- This is true, apologies! While I'd certainly appreciate help with this specific issue (and have a case open), I was also looking to improve my log troubleshooting by comparing other people's methods.

Was it an issue before upgrading to 6.1?

- No.

How often do these disconnects happen?

- For affected users, they cannot hold a session. After being in session for 1-5 minutes, they are disconnected and then continue to be almost as soon as they get back in.

Is it at the same time every day?

- No.

Does it happen to several users?

- It can, but not particularly as a group (like several sessions do not disconnect at exactly the same time).

What is the error-message?

- The user sees the guest session freeze, then they get a "You have been disconnected" message and are returned to either the View desktop client (ThinPC), or the Teradici logon prompt (zero client). In the guest logs, Windows has an application event for a fault in pcoip_server_win32.exe at the time of the disconnect. The guest session stays up and is otherwise unaffected.
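Since the pcoip_server_win32.exe fault lines up with the disconnect, a useful check is whether every disconnect has a fault near it, or only some do. A minimal Python sketch, assuming you have already pulled the two sets of timestamps out of the logs by hand or with a search like the one above; the 30-second window is an arbitrary assumption, kept loose because client, guest, and server clocks may not be perfectly in sync.

```python
from datetime import datetime, timedelta


def correlate(disconnects, faults, window=timedelta(seconds=30)):
    """For each disconnect time, list the fault times within +/- window of it.

    `disconnects` and `faults` are lists of datetime objects; the default
    window is an assumption to allow for clock skew between machines.
    """
    fault_times = sorted(faults)
    return [
        (d, [f for f in fault_times if abs(f - d) <= window])
        for d in sorted(disconnects)
    ]


def unmatched(pairs):
    """Disconnects with no nearby fault: candidates for a different cause."""
    return [d for d, nearby in pairs if not nearby]
```

With made-up timestamps, a disconnect at 09:00:05 and a fault at 09:00:03 pair up, while a second disconnect at 09:04:00 with no fault nearby shows up in `unmatched`. If nearly every disconnect has a matching fault, the PCoIP server process on the guest is a good place to focus.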

Can they reconnect after being disconnected?

- Yes. The next disconnect doesn't always come immediately, but it usually comes within a few minutes, and the disconnects then become more frequent until they are immediate.

What is the status of the disconnected desktop in the Admin GUI?

- Disconnected. Under Monitoring > Events, View reports a pool request, agent accept, reconnect, and disconnect, in that order, all within 5-10 seconds.
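The pool request, agent accept, reconnect, disconnect sequence inside a few seconds is a recognizable signature, so it can be worth counting how often it shows up per user or per desktop. A minimal Python sketch, assuming the events have been exported or copied into (timestamp, label) pairs; the four labels below are hypothetical stand-ins, and the actual event text in View Administrator will differ.

```python
from datetime import datetime, timedelta

# Hypothetical labels standing in for the four events seen under
# Monitoring > Events; the real View event text will differ.
CYCLE = ("pool request", "agent accept", "reconnect", "disconnect")


def find_cycles(events, max_span=timedelta(seconds=10)):
    """Return the start time of each run of consecutive events that matches
    CYCLE in order and completes within max_span.

    `events` is a list of (datetime, label) tuples. Sorting by timestamp only
    (a stable sort) keeps the original order for events in the same second.
    """
    ordered = sorted(events, key=lambda e: e[0])
    n = len(CYCLE)
    hits = []
    for i in range(len(ordered) - n + 1):
        window = ordered[i:i + n]
        labels = tuple(label for _, label in window)
        if labels == CYCLE and window[-1][0] - window[0][0] <= max_span:
            hits.append(window[0][0])
    return hits
```

Requiring the four events to be consecutive keeps the check strict; if other users' events are interleaved in the export, filter to one user or desktop first.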

AlexeyKhudyakov
Enthusiast

Hi FCOETech,

It would be much easier if you described the configuration of the faulted virtual machines.

Are the machines using NVIDIA vGPU?

FCOETech
Enthusiast

It would be much easier if you described a configuration of the faulted virtual machines.

- Sure! Is there a specific log from the log bundle that outlines the info you're looking for? The machines are on vmx-09. 2CPU, 3072MB RAM, VMXNet3 NIC, VMCI disabled.

Are the machines using NVIDIA vGPU?

- We have Grid cards in two of the three hosts these machines are on. 3D support is enabled. The pools are set to "3D Render: Automatic (not nVidia GRID vGPU specifically) ", max mon: 2, max res: 1920x1200. The machines have 128MB of vRAM.

AlexeyKhudyakov
Enthusiast

I have the same symptoms on machines with vGPU.

Maybe you should try upgrading the VMs' hardware version from vmx-09 to vmx-10 or vmx-11?

FCOETech
Enthusiast

I'll look into this, Alexey, thank you. I have three hosts, two Dell R720s and one R820, which we are in the process of upgrading from 5.1 to 5.5; I'll work with our VM admin to evaluate moving some of these VMs to vmx-10 for testing once all the hosts are updated. Have you seen any official reference from VMware on compatibility problems between vmx-09 and vGPU (or unofficial threads on this)?


Also, do you know how this might present in the logs if it's vGPU causing problems?

Linjo
Leadership

But vGPU is not compatible with vSphere 5.5 and earlier, maybe you are using vSGA?

// Linjo

FCOETech
Enthusiast

But vGPU is not compatible with vSphere 5.5 and earlier, maybe you are using vSGA?


     - This is correct, yes. Again, apologies; I'm not highly familiar with VMware's product lines and features. I think of the GRID cards generically as providing a shared virtual GPU and used the term vGPU casually. I am aware of the branded vGPU feature and cited it incorrectly. The VMware SVGA 3D drivers are deployed on the VMs, and yes, we use vSGA with the GRID cards on the hosts.


Linjo
Leadership

Hope I did not sound too harsh in my reply. I can understand that the whole graphics thing is confusing, since there are a lot of different technologies and marketing lingo involved.

vGPU is actually an NVIDIA trademark that they use in a few different places.

Anyway, I hope that the issue is out of the way and that your users are happy!

// Linjo
