VMware Cloud Community
russ79
Enthusiast

ESXi 10-15-2014 Patches break Citrix Netscaler VPX Appliance

Part warning, part question....

My ESXi hosts were on v5.5.0 build 2068190 (straight from the 5.5U2-RollupISO2 CD). I applied the following patches:

ID: ESXi550-201410101-SG  Impact: Important  Release date: 2014-10-15  Products: embeddedEsx 5.5.0 Updates esx-base

ID: ESXi550-201410401-BG  Impact: Critical  Release date: 2014-10-15  Products: embeddedEsx 5.5.0 Updates esx-base

ID: ESXi550-201410402-BG  Impact: Important  Release date: 2014-10-15  Products: embeddedEsx 5.5.0 Updates misc-drivers

ID: ESXi550-201410403-BG  Impact: Important  Release date: 2014-10-15  Products: embeddedEsx 5.5.0 Updates sata-ahci

ID: ESXi550-201410404-BG  Impact: Important  Release date: 2014-10-15  Products: embeddedEsx 5.5.0 Updates xhci-xhci

ID: ESXi550-201410405-BG  Impact: Important  Release date: 2014-10-15  Products: embeddedEsx 5.5.0 Updates tools-light

ID: ESXi550-201410406-BG  Impact: Important  Release date: 2014-10-15  Products: embeddedEsx 5.5.0 Updates net-vmxnet3

Afterwards they were on v5.5.0 build 2143827.

Once the hosts were upgraded, my two Citrix NetScaler VPX appliances (VMware hardware version 8, Citrix NetScaler OS v10.5.52.11nc (latest), FreeBSD 64-bit, E1000 NIC) stopped responding on the network. They work fine on the older 5.5U2 release, but when migrated to a host with the patches listed above they start dropping packets intermittently and eventually drop off the network entirely. I have two in High Availability mode; this happens even with only one powered on and the other off. Everything returns to normal when the VM is vMotioned back to an unpatched host.

I've tried switching networks, removing and re-adding NICs, upgrading to hardware version 10, and deleting the appliances altogether and deploying the latest version from Citrix's website. If the appliance doesn't drop off the network entirely, it drops a lot of packets. You can watch the NetScaler console show NODE FAIL, NODE DOWN, and NODE UP events many times a minute.
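Because the problem follows the VM onto patched hosts, it helps to know which hosts already run the post-patch build before migrating a VPX. The POSIX shell sketch below is hypothetical: the function names are made up, it assumes the version string looks like the output of `vmware -v` on an ESXi host (for example `VMware ESXi 5.5.0 build-2143827`), and it only recognizes build 2143827, the post-patch build reported in the opening post.

```shell
#!/bin/sh
# Hypothetical helpers; the version-string format is an assumption.

# Print the build number from a string such as
# "VMware ESXi 5.5.0 build-2143827". Fails if no "build-" is present.
esxi_build() {
  case "$1" in
    *build-*) printf '%s\n' "${1##*build-}" ;;
    *) return 1 ;;
  esac
}

# Succeed only if the string reports build 2143827, the build the
# hosts showed after the 2014-10-15 patches in this thread.
is_oct2014_patched() {
  build=$(esxi_build "$1") || return 1
  [ "$build" = "2143827" ]
}
```

On a host, something like `is_oct2014_patched "$(vmware -v)" && echo patched` would flag a patched host; adjust the build number if your patch level differs.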

64 Replies
MarcHuppert
Enthusiast

No, they are Windows 2008R2....

VCDX #181, VSP, VTSP, VCA, VCP-DCV(2+3+4+5+6+6.5+6.7+2019), VCP-DT, VCP-NV, VCAP(DCA4+5+DCD4+5), VCIX-NV, VCIX-DCV, VCI, vExpert, vExpert NSX, vExpert VSAN and VCDX
matthewkoeman
Contributor

Then it's possibly another issue; I recommend opening a case with VMware. We do not have any problems with other OSes.

asananikone
Contributor

Same issue, resolved by rolling back ESXi.

light_man
Contributor

Stupid question maybe, but how do you roll back easily?

Same problem here with a NetScaler VPX.

rufust3
Contributor

light_man,

When the host reboots, look at the lower-right corner of the initial "Please wait while VMware is loading ..." screen; you'll see Shift-R (Recovery) as an option. When you press Shift-R, you'll be offered the current build or the previous one to boot into. Select the previous build. Once you've done that, you'll no longer have the option of booting into the other build number.

revordr
Contributor

We also hit this issue last night after the updates, so we rolled back. We also tried four different builds of the NetScaler VPX, from 10.0 up to the latest 10.5, and all had the same problem. If we moved the VM from a patched to an unpatched host, it immediately started working again.

JBergson
Contributor

Same problem for me: NetScaler on ESXi 5.5 Update 2.

Itsjustus
Contributor

We had the same problem and have a new working situation now.

We haven't figured out the exact cause yet, but it has to do with a misconfiguration of the NetScaler software in combination with delivery controllers/subnets/network and/or ARP requests.

It "works as designed" (which I think was wrong in the first place).

It took us 3 days to solve.

Yes, we did the VMware update and had the same issue. You bring the network interfaces down and up, you get a ping, and suddenly the NetScaler stops responding. From the console we were not able to ping any gateway or external server. But we had 100+ other VM servers with no problems. And yes, it turned out this was our only FreeBSD-based server, and our only NetScaler. So why did only that one have the problem?

I think the software decides it has a new route based on an incoming packet from a Citrix delivery controller, and on its own changes the default route to another interface (not sure yet which one: NSIP, VIP, MIP, SNIP or whatever).

We got the latest NetScaler software, built it on our Xen hypervisor environment, and got it working. After a few hours we got a similar problem. This time the IP addresses were still pingable, but the result was the same: no access.

In the NetScaler virtual server's STA (Secure Ticket Authority) list we had 4 controllers: 2 down and 2 up. The ones that were up were on a different subnet than the ones that were down.

The up ones were our most recently added delivery controllers. They were in the same IP subnet as the NSIP interface.

So we removed them, rebooted the NetScaler, and the problem was solved.

My conclusion (for now): the STA must not be in the same subnet as the NSIP and/or the 2 VIPs we had. Our MIP was already in a different subnet.
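The subnet rule discussed here (STA servers should not share the NSIP's subnet) comes down to comparing network prefixes. The POSIX shell sketch below is hypothetical: the function names, the prefix length, and the example addresses (from the 192.0.2.0/24 and 198.51.100.0/24 documentation ranges) are placeholders, not values from this thread or from any Citrix tool.

```shell
#!/bin/sh
# Hypothetical helpers for checking whether STA addresses share the NSIP subnet.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split "a.b.c.d" into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_subnet IP1 IP2 PREFIXLEN
# Succeeds (exit 0) if both addresses fall in the same /PREFIXLEN network.
same_subnet() {
  a=$(ip_to_int "$1")
  b=$(ip_to_int "$2")
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( a & mask )) -eq $(( b & mask )) ]
}

# check_sta NSIP PREFIXLEN STA...
# Prints a warning for every STA in the NSIP's subnet; returns 1 if any were found.
check_sta() {
  nsip=$1
  len=$2
  shift 2
  found=0
  for sta in "$@"; do
    if same_subnet "$nsip" "$sta" "$len"; then
      echo "WARNING: STA $sta is in the same /$len subnet as NSIP $nsip"
      found=1
    fi
  done
  return $found
}
```

For example, `check_sta 192.0.2.10 24 198.51.100.5 192.0.2.50` would warn about 192.0.2.50 only and return a non-zero status.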

Now I'm trying to figure out, together with the external party who installed our NetScaler appliance 2 years ago, what the requirements are for all the different interfaces and subnets/VIPs/STAs etc. within the NetScaler software.

And I think it was pure luck that for the past 2 years the VMware hosts probably blocked that unintended network traffic/ARP requests or whatever.

When the smoke clears we will try to install it back on the VMware platform and see if we can get it working again.

Forgive my typos etc., I'm in a hurry (lost 3 days puzzling 😉)

Cheers and have a nice weekend.

Eric Burger

JBergson
Contributor

Does Citrix have any solution (other than rolling back the ESXi updates)?

Has anyone here opened a case with Citrix, so I can give them a reference when I open mine?

melay
Contributor

Do you happen to have a VM with a VMXNET 2 adapter in the same port group?

VMware KB: Virtual machines in the same vSwitch or vNetwork Distributed Switch (VDS) as a virtual ma...

matthewkoeman
Contributor

VMware tech support told me: "The issue seems to be a NetScaler driver problem when setting the tx_ring size. The ESXi patch just exposes the issue, but it is not the root cause. This is the recommendation from engineering so far: suggest the customer consult Citrix about upgrading the e1000 driver on NetScaler to see if the issue disappears." Can somebody open a case with Citrix and keep us updated? (I only have SA with Citrix, not a support contract.)

russ79
Enthusiast

I'm in the same boat at this point... it's not VMware's problem, and the status of our Citrix support contract is shaky.

JJ6709
Contributor

I have an open ticket with Citrix regarding this issue, and they say they are already working with VMware to resolve it.

There is no update from Citrix support yet.

JJ6709
Contributor

Citrix Technical Support Case - 64255155

nsojka
Contributor

Thank you for posting this! We've been searching up and down for the last 24 hours trying to find what caused the sudden issues on our NetScaler. Rolling back the ESXi patch using the instructions above solved the issue. Please post back when a permanent fix is released.

dshapiro
Contributor

We had exactly the same problem. It was resolved by downloading the updated NetScaler appliance from Citrix, importing it into VMware, and configuring it. Since then everything has been stable. Hope this helps.

JJ6709
Contributor

What is the version/build of the updated NetScaler appliance? We have the latest update, NetScaler ADC 10.5 Build 5211 (Oct 1 2014).

Itsjustus
Contributor

Can anyone confirm whether they also solved the problem by moving the STA IP address(es) into another subnet?

Afterwards I thought there could still be 2 workarounds for the problem: the one I proposed, or maybe putting all the STAs in the same subnet.

That's because we had 2 in one subnet and 2 in the NSIP's subnet.

I would wait for a patch, because I think it's going wrong at the network level (by design), and my experience is that even the big multinationals make the same mistakes twice.

Eric

JJ6709
Contributor

You should never have STAs in the same subnet as the NetScaler (NSIP).

The SNIP (subnet IP) is designed to communicate with the STAs.

Anyway, Citrix and VMware need to work together to fix the code, though it is mostly Citrix that needs to take action to resolve this.

We have a few hundred VMs, and only one, the Citrix NetScaler, breaks when moved to an ESXi host on the latest version, v5.5.0 2143827.

matthewkoeman
Contributor

FIX (or workaround?)

Hi all,

While working with VMware Technical Support, they found a possible solution, which has been working for me for almost an hour now.

Maybe you guys can test this solution too?

Enter shell mode on the NetScaler, then:

1) Find where loader.conf is located on the NetScaler VM: # find / -name loader.conf

On the uploaded NetScaler VM there are 2 loader.conf files, ./flash/boot/defaults/loader.conf and ./flash/boot/loader.conf; we only need to change the first one.

2) Add "hw.em.txd=512" to loader.conf. This changes the Tx ring size to 512 (note: do not set the ring size to 256, as this causes a NetScaler VM core dump).

3) Reboot the NetScaler VM.

4) Migrate it back to a host with the latest patches.
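The edit in step 2 can be scripted so it is safe to run more than once. This is a minimal POSIX shell sketch, not something supplied by VMware or Citrix: the function name `set_em_txd` is hypothetical, the default path is the `/flash/boot/defaults/loader.conf` location reported above, and it assumes the NetScaler shell provides standard `grep`, `cp`, `tail`, and `printf`.

```shell
#!/bin/sh
# Hypothetical helper: append hw.em.txd=<size> to a loader.conf unless an
# hw.em.txd line is already there. Not a VMware or Citrix tool.
set_em_txd() {
  conf=${1:-/flash/boot/defaults/loader.conf}  # path as reported in this thread
  size=${2:-512}                               # 512 per the workaround above
  if [ "$size" = "256" ]; then
    # The thread reports that 256 makes the NetScaler VM core dump.
    echo "refusing hw.em.txd=256" >&2
    return 2
  fi
  [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
  if grep -q '^hw\.em\.txd=' "$conf"; then
    echo "hw.em.txd already set in $conf:" >&2
    grep '^hw\.em\.txd=' "$conf" >&2
    return 0
  fi
  cp -p "$conf" "$conf.bak" || return 1        # keep a backup before editing
  # If the file lacks a trailing newline, add one so the new line stays separate.
  if [ -n "$(tail -c 1 "$conf")" ]; then
    echo >> "$conf"
  fi
  printf 'hw.em.txd=%s\n' "$size" >> "$conf"
}
```

It would be run as `set_em_txd /flash/boot/defaults/loader.conf` before step 3. On a stock FreeBSD kernel, `kenv hw.em.txd` after the reboot should show the value the loader passed in; whether the NetScaler build behaves the same way is an assumption.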

Good luck.....

Edit: still working fine, including under user load in our production environment.