The express patches have been posted. This thread is long.
Please post technical experiences here and non-technical feedback here. --JohnTroyer
Hi all,
We've just encountered a serious bug with our ESX cluster - serious enough that I thought I should post about it here as a prior warning for others running ESX 3.5 Update 2.
The VMware tech support person we spoke to wouldn't confirm 100% whether this was, or would be, affecting all ESX 3.5u2 installs, but he strongly implied that it was widespread. For others' sake I hope I'm wrong and it's limited.
The bug:
Starting this morning, we could not power on nor VMotion any of our Virtual Machines. The VI Client threw the error "A general system error occurred: Internal Error".
Further digging led us to messages like this one in /var/log/vmware/hostd.log, and in the log file for any virtual machine we tried to power on or VMotion:
Aug 12 10:40:10.792: vmx| This product has expired.
Aug 12 10:40:10.792: vmx| Be sure that your host machine's date and time are set correctly.
Aug 12 10:40:10.792: vmx| There is a more recent version available at the VMware Web site: "http://www.vmware.com/info?id=4".
A call to tech support confirmed this as a known problem with a temporary workaround.
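For anyone who wants to check a copied log file for the same symptom, here is a minimal sketch. It assumes Python 3 on an admin workstation (not the Service Console), and the function name find_expiry_lines is made up for illustration. It only matches the "This product has expired." text quoted above.

```python
import re

# Matches the vmx expiry message quoted in the post above.
EXPIRY_PATTERN = re.compile(r"vmx\|\s*This product has expired\.")

def find_expiry_lines(lines):
    """Return every log line that carries the expiry message."""
    return [line.rstrip("\n") for line in lines if EXPIRY_PATTERN.search(line)]

# Sample input taken from the log excerpt in the post.
sample = [
    "Aug 12 10:40:10.792: vmx| This product has expired.",
    "Aug 12 10:40:10.792: vmx| Be sure that your host machine's date and time are set correctly.",
]

for hit in find_expiry_lines(sample):
    print(hit)
```

To scan a real file, open a copy of hostd.log or a VM's log and pass the file object to find_expiry_lines in place of the sample list.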
The work-around:
Turn off NTP (if you're using it), and then manually set the date of all ESX 3.5u2 hosts back to the 10th of August. This can be done either through the VI Client (Host -> Configuration -> Time Configuration) or by typing date -s "08/10/2008" at the Service Console command line on each ESX host.
As soon as the date was reset to the 10th - problem solved.
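With more than a couple of hosts, it may help to generate the command once and paste it into each Service Console session. The sketch below only builds and prints the date -s string from the workaround; it does not connect to anything. The host names are hypothetical placeholders, and Python 3 on a workstation is assumed.

```python
from datetime import date

def build_date_command(target):
    """Build the Service Console command from the workaround (MM/DD/YYYY format)."""
    return 'date -s "{}"'.format(target.strftime("%m/%d/%Y"))

# Hypothetical host names, for illustration only.
hosts = ["esx01", "esx02"]

command = build_date_command(date(2008, 8, 10))
for host in hosts:
    # Printed as a checklist; run the command by hand on each host.
    print("{}: {}".format(host, command))
```

Remember that NTP has to be off first, otherwise the clock will be corrected again.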
Note that running VMs were operating fine; this only seems to affect initial VM power-on (including from suspended state) and VMotion.
So, it sounds like a serious licensing bug has crept into 3.5u2. Further testing shows that the problem begins as soon as the date hits 12th August - 10th is fine, 11th is fine, 12th and the problem appears.
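The observed behaviour can be written down as a simple date check. This is only a sketch of what we saw in testing, not VMware's actual licensing logic; the year 2008 is taken from the date string in the workaround, and the function name is made up.

```python
from datetime import date

# First date on which we saw the problem (assumed year: 2008).
EXPIRY_START = date(2008, 8, 12)

def power_on_blocked(host_date):
    """True if a host clock at host_date falls in the range where power-on and VMotion failed for us."""
    return host_date >= EXPIRY_START

for day in (10, 11, 12):
    d = date(2008, 8, day)
    print(d.isoformat(), "blocked" if power_on_blocked(d) else "ok")
```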
There wasn't any real reference to similar problems in the forums as far as I could see, but it's quite possible we're seeing this before most of the rest of the world as we're in Australia, and therefore the date here ticked over to the 12th "before" those in Europe, America, etc.
Hope this helps others... took us a couple of hours to get this far - at least we can power on VMs again though!
Cheers,
Matt Kilham
Message was edited by: JohnTroyer to add new thread links.
Another important thing to note:
If you are using VC 2.5U1, and you added any ESX 3.5 U2 hosts to the VC, you have to upgrade VC to 2.5 U2 or the service will fail to start / restart / do anything if you reboot the machine.
Upgrade VC to 2.5U2 - Use the .zip, not the ISO, there is something borked with the ISO also.
VMware is aware of the problem and has KBs up related to the ISO problem - I don't recall the exact #, but it's out there.
All you have to do is build another VC and point it at your database, add in the host, and then VMotion your virtual VC to a working server. Then shut down the new VC and start up the virtual VC to continue on.
Dan L. Buchanan | Microsoft Engineer
If the service does not start, have you had a look at the VC log files to find out what the problem is? Also, do you have your database server up and running?
Start by taking a look in the following directory C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\Logs
Leonard...
-
Don't forget if the answers help, award points
Just downloading the VC 2.5 U2 patch. In the meantime, here is my log:
Log for VMware VirtualCenter, pid=3332, version=2.5.0, build=build-84767, option=Release, section=2
Current working directory: C:\WINDOWS\system32
HOSTINFO: Seeing Intel CPU, numCoresPerCPU 4 numThreadsPerCore 1.
HOSTINFO: This machine has 1 physical CPUS, 4 total cores, and 4 logical CPUs.
Log path: C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\Logs
Using system libcrypto, version 90709F
Vmacore::InitSSL: doVersionCheck = true, handshakeTimeoutUs = 120000000
Starting VMware VirtualCenter 2.5.0 build-84767
Log directory: C:\Documents and Settings\Default User\Local Settings\Application Data\VMware\vpx.
Enabled low-frag process heap.
Unable to recover from 42000:195
Unable to recover from 42000:8180
Couldn't read registry entry vmdbPort
Couldn't read registry entry managedIP
Ignoring unknown entry from DB: AgentUpgrade.maxConcurrentUpgrades
Ignoring unknown entry from DB: Perf.Stats.FilterLevel
Invoking callbacks for key vpxd.npivWwnGeneration.singleNodeWwn
Invoking callbacks for key vpxd.npivWwnGeneration.portWwnNumber
Locale subsystem initialized from C:\Program Files\VMware\Infrastructure\VirtualCenter Server\locale/ with default locale en.
Setting VerifySSLCertificates to FALSE
Failed to initialize: not well-formed (invalid token)
Failed to intialize VMware VirtualCenter. Shutting down...
Forcing shutdown of VMware VirtualCenter now
Any help would be much appreciated
Lee Richardson
Hi Lee Roy
What I would do is take a look at this other thread: http://communities.vmware.com/thread/155382. If that does not help, it might be time to call support.
Leonard...
-
Don't forget if the answers help, award points
I didn't want to read through 46 pages of posts, so if someone can give me a quick answer I would appreciate it.
IS THE PATCH DOWNLOADABLE VIA UPDATE MANAGER? MY UPDATE MANAGER IS NOT SEEING THE PATCH. IS THIS A PROBLEM ON MY END?
VMware says to view the KB articles, but the LINKS ARE BROKEN.
Anyone...
Yes, I was able to get it from Update Manager yesterday. However, I did have to cycle the VMware Update Manager service before I could see it and include it in a baseline. But it did come down when I forced the download from my client.
-Doug
The update manager works for applying the latest patch which fixes the time bomb issue with U2 hosts. I used the Update Manager to patch my hosts.
I also used Update Manager to roll out the ESX U2 time bomb fix. One ESX box had to be bounced (it lost its connection to the VC server) while the other did not. This was in a lab.
Jason Boche, MCSE NT4/2000/2003, MCSA 2000/2003, MCP, VCPx2, CCAx2, A+
Systems Support Analyst 4
Wells Fargo Bank
625 Marquette Ave.
16th floor, N9311-162
Minneapolis, MN 55479
612-667-2473 office 612-910-5281 mobile
jason.g.boche@wellsfargo.com Email/AIM
Great, thanks for the info.
Ed Valenciano
Global Technical Enablement - HP Software
303.886.1544 mobile | Ed.Valenciano@hp.com
3404 E Harmony Road | FTC06 | Fort Collins | CO 80528
It also occurs if you vmotion a VM to a host with a wrong date.
Personally experienced this in my environment, so I can't wait for you to tell me it didn't happen.
--Matt
hey martin -- just left you a voicemail. You can reach me on my mobile 510 520 7832.
Everything ok?
Thanks for the replies regarding Update Manager and the recent patches. In case anyone else is running into an issue with Update Manager, the fix for me was to change the schedule once again: I set it to two minutes in the future, daily instead of weekly, and let it run. It finally detected the updates and is downloading them.
I have managed to get my VC back online by applying the 2.5 U2 patch and I have reconnected all my ESX servers, but I am still unable to migrate any live VMs. Does anyone have any idea what's causing this?
Lee Richardson
Ok Guys I have a question,
We haven't updated to U2 yet, but Update Manager has downloaded it (25/07) along with the fix, and here's the rub: should we install the original U2 and then the fix, or somehow remove the downloaded U2 update, since I understand U2 itself has now been fixed?
Any pointers gratefully received.
Harvey Dowler
IT Support Manager
Ideal Shopping Direct PLC
Registered Office: Ideal Home House, Newark Road, Peterborough, PE1 5WG, UK. Registered in England No. 1534758
www.idealshoppingdirect.tv, www.idealworld.tv,www.createandcraft.tv, www.idealvitality.tv, www.createandcraft.biz,
Freeview Channel 22, Satellite Channel 634, Virgin TV 747
Hi Lee
First thing to do would be to start a new thread as a lot of people have given up on this one because it is too big.
Having said that, have you put the patch onto your 3.5U2 servers so that the licensing bug is fixed? If you have, create the new thread and describe exactly what you have got, versions and all. Can you power on VMs on the same servers that you cannot VMotion from / to?
How did you restore the patch? Put this information into a new thread, or if you are under maintenance and you need it fixed quickly, I would speak to support.
Leonard...
-
Don't forget if the answers help, award points
Clarification needed on patching.
I updated my VirtualCenter Server from 2.5 U1 to U2 on August 10th from the 3.5 Infrastructure Management Installation download. My ESX 3.5 servers are at U1 and were not upgraded to U2 (I was planning this for this week - Wed).
My understanding is that ONLY ESX 3.5 servers running the pre-12 Aug U2 code are affected, so VC 2.5 U2 is OK?
And I should just proceed to install the latest build 110268 'ESX 3.5 U2 Refresh'?
Appreciate any guidance from the community.
It also occurs if you vmotion a VM to a host with a wrong date.
Personally experienced this in my environment, so I can't wait for you to tell me it didn't happen.
--Matt
Hi Matt,
This is where things get interesting! I tested this on 3 (yes, three) environments, and was unable to get a VM to jump back in time by VMotioning it to a backdated ESX host. I tested this by monitoring the VMotioned VM, and its date simply remained unaffected... I am wondering what might have caused your leap in time to happen. I would like to know, because this kind of unexpected behaviour might sting you later on...