VMware Cloud Community
mattjk
Enthusiast

BIG bug in ESX 3.5 Update 2 - If you're using 3.5u2 read this now! - A general system error occurred: Internal Error

The express patches have been posted. This thread is long.

Please post technical experiences here and non-technical feedback here. --JohnTroyer

Hi all,

We've just encountered a serious bug with our ESX cluster - serious enough that I thought I should post about it here as a prior warning for others running ESX 3.5 Update 2.

The VMware tech support person we spoke to wouldn't 100% confirm whether this was / would be affecting all ESX 3.5u2 installs, but he strongly hinted that it was widespread. For others' sake I hope I'm wrong and it's limited.

The bug:

Starting this morning, we could neither power on nor VMotion any of our Virtual Machines. The VI Client threw the error "A general system error occurred: Internal Error".

Further digging led us to messages like this one in /var/log/vmware/hostd.log, and in the log file for any virtual machine we tried to power on or VMotion:

Aug 12 10:40:10.792: vmx| This product has expired.

Aug 12 10:40:10.792: vmx| Be sure that your host machine's date and time are set correctly.

Aug 12 10:40:10.792: vmx| There is a more recent version available at the VMware Web site: "http://www.vmware.com/info?id=4".
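
For anyone who wants to check several logs at once, here is a rough sketch that lists which files contain the expiry message. The function name is made up for illustration, and the per-VM vmware.log path in the example is an assumption about where your VM directories live; only the hostd.log path comes from the post above.

```shell
#!/bin/sh
# Print the names of the given log files that contain the expiry message.
# find_expired_logs is a hypothetical helper name, not a VMware tool.
find_expired_logs() {
    grep -l "This product has expired" "$@" 2>/dev/null
}

# Example call (the vmware.log glob is an assumed datastore layout):
#   find_expired_logs /var/log/vmware/hostd.log /vmfs/volumes/*/*/vmware.log
```

If nothing is printed, none of the files you passed contained the message.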

A call to tech support confirmed this as a known problem with a temporary workaround.

The work-around:

Turn off NTP (if you're using it), and then manually set the date of all ESX 3.5u2 hosts back to 10th of August. This can be done either through the VI Client (Host -> Configuration -> Time Configuration) or by typing date -s "08/10/2008" at the Service Console command line on the ESX hosts.

As soon as the date was reset to the 10th - problem solved.
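
Put together, the Service Console steps might look like the sketch below. It assumes a classic ESX 3.5 host with a Service Console where NTP runs as the usual ntpd service (ESXi has no Service Console, so this does not apply there). The run and apply_workaround helpers are made up for illustration, and by default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch of the temporary workaround: stop NTP, then roll the date back.
ROLLBACK_DATE="08/10/2008"   # the date given in the post

run() {
    # Hypothetical helper: with DRY_RUN=1 (the default) only print the command.
    # Note the printed form loses the quotes around arguments.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

apply_workaround() {
    run service ntpd stop          # stop NTP so it cannot correct the clock
    run chkconfig ntpd off         # assumption: keep ntpd off across reboots
    run date -s "$ROLLBACK_DATE"   # the command from the post
}

# Dry run:   apply_workaround
# For real:  DRY_RUN=0 apply_workaround   (as root, on each affected host)
```

Remember to re-enable NTP and restore the correct date once a proper fix is installed.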

Note that running VMs were operating fine; this only seems to affect initial VM power-on (including from suspended state) and VMotion.

So, it sounds like a serious licensing bug has crept into 3.5u2. Further testing shows that the problem begins as soon as the date hits 12th August - 10th is fine, 11th is fine, 12th and the problem appears.
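
A tiny check along those lines, in case it helps when scripting across hosts. The cutoff comes from our testing above (12th August is the first bad day); the function name is made up, and we don't know of any end date, so this only tests "on or after the 12th".

```shell
#!/bin/sh
# First date on which we saw the problem, as YYYYMMDD.
FIRST_BAD_DAY=20080812

date_is_affected() {
    # $1 = date as YYYYMMDD; defaults to today's date on this host.
    [ "${1:-$(date +%Y%m%d)}" -ge "$FIRST_BAD_DAY" ]
}

# Example:
#   if date_is_affected; then echo "host clock is in the affected range"; fi
```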

There wasn't any real reference to similar problems in the forums as far as I could see, but it's quite possible we're seeing this before most of the rest of the world as we're in Australia, and therefore the date here ticked over to the 12th "before" those in Europe, America, etc.

Hope this helps others... took us a couple of hours to get this far - at least we can power on VMs again though!

Cheers,

Matt Kilham

Stratton Car Finance

Message was edited by: JohnTroyer to add new thread links.

Cheers, Matt
704 Replies
gdragats
Contributor

LudoS,

Agreed,

Question to VMware:

  • What will the format be of the fix?

  • How do we know something like this will not occur again, or anything else for that matter?

I spent a whole day in a data centre upgrading from 3.0.2 to 3.5u2. I hope there will not be a complete reinstall of ESX but only a simple update.

PLEASE!

Although there is a bit of a verbal cloud going on, I suggest we temporarily fix the issue, let the engineers find an efficient fix, and then do a post mortem.

Although our customers are impacted, there is no use in throwing negativity. Let's fix and move on and ensure this doesn't happen again by taking appropriate measures. (VMware included)

rabbie
Contributor

How on earth would we perform a full re-install?? It would mean (for many of us) going into a data centre with a portable HDD and maybe a notebook PC, copying all the VMs from a server, re-installing the OS, and then restoring the VMs... unless you're fortunate enough to have a secondary server on the network you can move your VMs to temporarily. The fix for this must be made available through Update Manager. I mean, otherwise, what would be the point of having the Update Manager tool?

davidbarclay
Virtuoso

As an SI we typically don't install bleeding edge at customer installations..so have seen minimal damage. However..as technologists we have it running in internal production and lab environments...

...so we are having this same annoying problem!!

Changing the date/time just doesn't cut it really... When was the last time a time/date change "cracked" a piece of software anyway lol

Dave

s1xth
VMware Employee

Agreed!!! I JUST moved to ESXi a week ago and have only 12 VMs on two boxes, not like you guys with a data center, but still, they BETTER release a patch through Update Manager. I was sweating my a** off this morning when I had to reboot a host and nothing would start back up!... not very good, VMware...

http://www.virtualizationimpact.com http://www.handsonvirtualization.com Twitter: @jfranconi
jasonboche
Immortal

Although our customers are impacted, there is no use in throwing negativity. Let's fix and move on and ensure this doesn't happen again by taking appropriate measures. (VMware included)

Impacted customers will choose on their own how to react. Trying to manage the reaction of peers will probably just add fuel to the fire. The forum moderators can provide some cleanup if it gets out of hand or inappropriate. The degree to which each customer is impacted will vary. No doubt the time spent addressing this issue along with the heat from their management and their customers is going to cause some to be emotional, especially for those who have been up all night in the Datacenter spending long hours troubleshooting. Unfortunately the nature of this issue really adds insult to injury.

A little emotion could be just what the doctor ordered for the VMware camp so that they can grasp a firm understanding of the effectiveness and impacts of their quality control process, which has been called into question at least twice already this year (see the February 2008 re-release of the December 17th, 2007 ESX 3.5 release, and the MD5SUM inconsistency pointing to an incorrectly posted ESX build, also in the spring of 2008).


Jason Boche

VMware Communities User Moderator (http://communities.vmware.com/docs/DOC-2444)

VCDX3 #34, VCDX4, VCDX5, VCAP4-DCA #14, VCAP4-DCD #35, VCAP5-DCD, VCPx4, vEXPERTx4, MCSEx3, MCSAx2, MCP, CCAx2, A+
rabittom
Contributor

Hi everybody,

please note that setting the date with date -s "08/10/2008" sets the time back to 0:00:00. We're using ESXpress for VMDK backup, and a few minutes after the change the daemon started backing up our VMs.
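
One way around this is to pass the current time of day along with the old date, so only the date part changes. A small sketch, assuming the GNU date that ships with the Service Console (which accepts a "MM/DD/YYYY HH:MM:SS" string); the rollback_stamp name is made up for illustration.

```shell
#!/bin/sh
# Build a "MM/DD/YYYY HH:MM:SS" string for date -s.
rollback_stamp() {
    # $1 = date to roll back to (MM/DD/YYYY)
    # $2 = optional time of day (HH:MM:SS); defaults to the current time
    echo "$1 ${2:-$(date +%H:%M:%S)}"
}

# Example (as root on the host):
#   date -s "$(rollback_stamp 08/10/2008)"
```

That keeps the clock away from midnight, so jobs scheduled around 0:00 should not fire right after the change.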

Hope VMware will provide a fix quickly!

regards

gernot

gdragats
Contributor

Here's one for the books:

http://keznews.com/2087_Next_Vista_2099_Time_crack

Who said time cracks don't work? LOL

stuart_ling
Contributor

I am sure that VMware are well aware that they have upset a lot of customers; the biggest problem with this issue is that it could not have been identified in customer test labs.

Ironically I've just received a 2008 VMware customer survey - perhaps it will not look that great this time round.

gdragats
Contributor

Hi Jasonboche,

Agreed, although as tech consultants we will be assisting them with their decisions.

I also think that with our clients or business units we need to maintain a level of calmness towards all stakeholders, and although it might be a bit emotional in here, emotion will not cut it when dealing with key business people.

I would maintain a plan of action, and I have started preparing a fallback plan in case there is no resolution (or no efficient one) soon.

I also agree that VMware should feel the heat from this, but let's do it post-fix while we focus on the above. (I am visualising a shareholders' meeting scenario with a board and angry shareholders (in this case techies) screaming "how could this happen" etc.)

A13x
Hot Shot

I had loads of VMs which wouldn't power back on, and I have only slept for a couple of hours myself. After applying the time fix to get the VMs back up, the domain controllers in our AD all went crazy because they picked up the ESX host date, which is strange (I think it's something to do with when I updated the VMware Tools). The entire domain was living in the past, and this had a knock-on effect: DHCP, DNS, etc. all went down.

VMware tech support were fast to look into it when I called, and gave me a KB article which they said would be updated, but it was later pulled from their KB 😕 Now I am wondering if the VMs will stay up, and why this wasn't spotted sooner.

ChicaneUK
Enthusiast

Haven't had a chance to dig through all 14 pages of this but we seem to be affected by this too - on VMware ESX 3.5.0 U1. Mistakenly just powered off a VM and cannot power it back on - getting the same error messages and licensing alerts in the hostd.log files.

We updated the server last night (deliberately excluding 3.5.0 U2) just because we didn't want to be totally bleeding edge but we're still suckered with this bug...

(edit)

Just found that despite deliberately excluding the update to U2, it seems as though some files that have gone through have updated the server to U2. Great.
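
For anyone else unsure what their host actually ended up on: the Service Console command vmware -v prints the version and build, and the build number can be compared against the release notes for each update level. A rough sketch for pulling out just the build number; the build_of name is made up, and the parsing assumes the output contains "build-" followed by digits.

```shell
#!/bin/sh
# Extract the numeric build from a version line such as the output of `vmware -v`.
build_of() {
    # $1 = version string expected to contain "build-<digits>"
    echo "$1" | sed -n 's/.*build-\([0-9][0-9]*\).*/\1/p'
}

# Example (on the ESX host):
#   build_of "$(vmware -v)"
```

An empty result means the string did not contain a build number in that form.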

ADHDave
Contributor

thanks for the screenshot, the kb.vmware.com site is unavailable

LeoKurz2
Enthusiast

To avoid time issues within the VMs, for those who start the VMs from the GUI: set the time of the ESX host to a date before today, then use the new option to boot the VM into the BIOS. Set the right date & time on the ESX machine, set the right date & time in the BIOS of the VM, then exit the BIOS and the VM will start with accurate date & time. Disable timesync within VMware Tools if enabled and get the right time within the guest OS (DC, NTP...).
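
To see which VMs still have Tools timesync switched on before rolling the host clock back, one option is to look at the .vmx files. This is only a sketch: it assumes the setting is stored as tools.syncTime = "TRUE" in the .vmx, the timesync_enabled name is made up, and the datastore glob in the example is an assumed layout.

```shell
#!/bin/sh
# Succeed if the given .vmx file has Tools time sync enabled.
timesync_enabled() {
    # $1 = path to a .vmx file
    grep -Eiq '^[[:space:]]*tools\.syncTime[[:space:]]*=[[:space:]]*"TRUE"' "$1"
}

# Example (path pattern is an assumption):
#   for f in /vmfs/volumes/*/*/*.vmx; do
#       timesync_enabled "$f" && echo "timesync on: $f"
#   done
```

The check only reads the files; changing the setting is still best done through the Tools options as described above.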

__Leo

Vodder
Enthusiast

I got the same survey - quite ironic to get it today of all days!! Thankfully we aren't even running Update 1, let alone Update 2. Glad for the notice about Update Manager doing its own thing as well.

@timgleed | VCAP5-DCA/DCD | VCAP4-DCA/DCD | VCP5 | VCP4 | VCP3 | VCP4-DT | VCA4-DT | VTSP4 | MCITP | PRINCE2 | ITIL | BSc Hons
THP
Contributor

@ ChicaneUK - That is exactly what happened to us: we tried to exclude it but somehow it sneaked through.

An issue I would hope VMware will address after they fix the main issue.

Hem09
Contributor

I'm replying to receive email updates on this thread.

▲ www.vLemon.com ▲ VCP 3,4,5 VCAP-DCA & VCAP-DCD
LeoKurz2
Enthusiast

If you install all patches from Update Manager, you'll get U2 as well... The U2 patch is simply an XML file referring to all available patches. Just take a look at the tarball update: it's more than twice(!) the size of the ISO because it includes all the patches. The directories for U1 & U2 simply contain the XML metadata. Makes the tarballs quite useless...

__Leo

max70
Contributor

Same problem here on two brand new ESXi servers.

Fortunately the date/time workaround fixed the problem (thanks!). We are waiting for the patch!

DeloitteIT
Contributor

Hi, has anybody got an official workaround from VMware customer support?

Chris

waghekk
Contributor

What's the latest VMware is giving us on this?
