VMware Cloud Community
benny_hauk
Enthusiast

Newer IBM servers not vMotion compatible despite identical CPUs

I just installed a new HS22V blade, identical to a pack of others installed about a year ago.  CPUs are identical, UEFI version is identical, Processor Settings in the firmware are identical.  Still not vMotion-compatible with existing cluster of blades though.  Crazy.  With VMware tech support and an IBM KB article we finally figured it out and fixed it.

It seems that IBM servers that shipped with UEFI older than 1.10 came with the Westmere "AES" feature disabled, while servers shipping with 1.10 and later have it enabled.  Even if you upgrade an older server to 1.10 or 1.12, etc., it still has AES disabled.  For some reason AES isn't something you can enable/disable from the BIOS settings, so it's not readily obvious what's out of sync.

Your 0x1 level, ecx row may vary from mine depending on which other CPU features you have enabled (VT-x, XD, and so on).  Mine was 0000:0010:1001:1010:0010:0010:0000:0011.  (The CPUID CD reported ID1ECX: 0x029ee3ff, which differs in some other bits but has the same two bits set.)  The bits that broke vMotion compatibility were these:  ----:--1-:----:----:----:----:----:--1-  Counting from bit 0 on the right, those are bit 25 and bit 1 of CPUID leaf 1 ECX, which Intel documents as AESNI and PCLMULQDQ.  In other words, the two "1's" in the bit mask above needed to be "0's" for vMotion to work.  In our case, disabling AES on the new blade turned those two "1's" into "0's" and made it vMotion-compatible with the other Westmere blades in our cluster.
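If you'd rather not count bits by eye, here is a minimal Python sketch for checking the two AES-related bits. It is only an illustration, not anything from VMware or IBM: the function names are made up, and it assumes the string you paste in is the 32-bit leaf 1 ECX row in the colon-grouped format shown above. The bit positions (25 for AESNI, 1 for PCLMULQDQ) are the ones Intel documents for CPUID leaf 1 ECX.

```python
# CPUID leaf 1, ECX bit positions as documented by Intel.
AESNI_BIT = 25      # AES-NI instructions
PCLMULQDQ_BIT = 1   # carry-less multiply, introduced alongside AES-NI


def parse_host_bits(text):
    """Turn a colon-grouped bit string like '0000:0010:...' into an int."""
    bits = text.replace(":", "").strip()
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected 32 binary digits, got %r" % text)
    return int(bits, 2)


def aes_flags(ecx):
    """Report whether the two AES-related bits are set in a leaf 1 ECX value."""
    return {
        "aesni": bool((ecx >> AESNI_BIT) & 1),
        "pclmulqdq": bool((ecx >> PCLMULQDQ_BIT) & 1),
    }


if __name__ == "__main__":
    # Host bits string from the vMotion error, as quoted above.
    host = parse_host_bits("0000:0010:1001:1010:0010:0010:0000:0011")
    print(hex(host), aes_flags(host))
    # Value reported by the CPUID CD, as quoted above.
    print(hex(0x029EE3FF), aes_flags(0x029EE3FF))
```

Both values quoted above come back with both flags set, which matches the mask: a blade with AES disabled should report both as False.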

AES ("Advanced Encryption Standard") instructions enable fast and secure data encryption and decryption.  I suppose if you have VMs doing a lot of encryption/decryption you might benefit from having this enabled.  For us it's not worth the hassle of rebooting all our VMs to get it enabled on all our hosts (there's a vMotion brick wall between hosts with AES enabled and hosts with AES disabled).  I suppose we could enable EVC while we fix each host, and that might let us fix the issue without rebooting VMs, but I don't see enough benefit to warrant that much time, so we're just disabling it on the new blade and moving on.

To disable AES (the alternative is to enable it on the existing blades, which one way or another will require you to reboot all VMs sooner or later), you create a bootable CD from IBM's ISO (linked below).  When the server boots from that CD, it automatically fixes the problem without you having to push a button.  Here's what the screen looks like once the CD finishes booting (as you can tell, it's like IBM created this CD specifically for this exact issue):

aes-disable.jpg

Here's the goods:

KB Reference: http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-5086963&brandind=5000020

Bootable ISO for changing the AES feature: ftp://testcase.boulder.ibm.com/eserver/fromibm/xseries/BoMC-2.20-uEFI-AesEnable-to-enabled-vmotion-f...

Description of the AES CPU feature:  http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-aes-instructions-set/

Benny

Benny Hauk, Systems Admin, VCP3/VCP4, LifeWay Christian Resources
16 Replies
MauroBonder
VMware Employee

1. "Intel virtualization Technology" - Enable

2. "Intel execute disable bit" - Enable

If that doesn't solve it, I recommend updating the BIOS to the latest version.

benny_hauk
Enthusiast

Yeah, my post above is for solving the issue after you've already enabled those CPU settings and upgraded the BIOS (UEFI) to the latest version.  If IBM servers with identical CPUs, identical BIOS settings, and identical BIOS versions still aren't vMotion-compatible, it may be because one has AES enabled and the other doesn't (again: for some crazy reason you can't set this in the BIOS on IBM hardware; you have to use a bootable CD to change it).  I think the only way to tell how it's set is to look at the two bits in the "Host bits" string that's part of the error message you see when you attempt to vMotion (see above).
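To spot which bits are out of sync between two hosts, a rough sketch along these lines may help. It assumes you've converted each host's 32-bit ECX value to an integer; the function name is made up for illustration, and the "old blade" value is hypothetical (it's just the host bits from the first post with bits 25 and 1 cleared).

```python
def differing_bits(ecx_a, ecx_b):
    """Return bit positions (0 = least significant) where two 32-bit values differ."""
    diff = (ecx_a ^ ecx_b) & 0xFFFFFFFF
    return [bit for bit in range(32) if (diff >> bit) & 1]


if __name__ == "__main__":
    new_blade = 0x029A2203  # host bits from the first post, written as hex
    old_blade = 0x009A2201  # hypothetical: same value with bits 25 and 1 cleared
    print(differing_bits(new_blade, old_blade))  # prints [1, 25]
```

If the only differences are bits 1 and 25, the AES setting is the likely culprit; any other positions point to a different feature mismatch.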

Benny

idle-jam
Immortal

The above should solve it. Give it a try. Good luck.

cawn
Enthusiast

Hello to you all

I think I have the same problem, just on a Dell R610 host.

I had a faulty CPU and got a technician to replace the CPU and motherboard. He chose to update the BIOS to 6.3.0, and afterwards I got the same error.

We then tried downgrading the BIOS to the same level as the other host, 2.1.9, but after the downgrade I still get the error when I try to migrate a guest to the host.

I need to somehow disable AES-NI, but how?

kunukg
Contributor

Hello Benny

Greetings from Greenland!

I have exactly the same problem right now. I've tried updating the BIOS, no luck.

I've tried to download the .iso, but I think it's no longer on the FTP server.

Any chance you can fix it?

Thanks in advance

/Kunuk

benny_hauk
Enthusiast

Yeah, the FTP link no longer works.  It looks like the process for disabling AES has changed slightly.  See MIGR-5086963 for more details.

Take note in the IBM link above that you can enable AES now just by installing "UEFI version 1.10 or newer and restore the default settings in UEFI".  I'm pretty certain that that applies to your old blades too so you can either disable AES on all your newer blades going forward or bite the bullet and enable AES on all your older blades (after upgrading the UEFI on them to 1.10 or newer).  Going forward AES will always be enabled on newer blades I'm sure so I'd recommend enabling it on older blades whose UEFI couldn't support it when they first came out.

Blessings!

ami857
Contributor

Hi,

I have a mix of IBM blades (HS21, HS22 and HS23). Do you think I can get vMotion working between these blades (it works between the HS22 and HS23 but not with the HS21)?

The FTP download link is dead, and I tried to use the IBM ASU utility to enable AES on the HS21, but no luck.

Should I give up?

Thank You

benny_hauk
Enthusiast

It all depends on what CPUs are inside the HS21s, but from what it sounds like you've already tried, I bet you're going to have to give up.  We're in the same boat.  Our HS21s are completely incompatible with our HS22s because of the processor families.  Our HS21s have E5450 CPUs.  In our case the newer CPUs in the HS22s can't be dumbed down to an E5450 even with Enhanced vMotion Compatibility (EVC) mode turned on (they can only "dumb down" so much, if that makes sense).

In our case though, we've found it most cost-effective to simply sunset our HS21s because of these factors:

1) They have paltry memory compared to our current standard (32-48GB vs. the 256GB blades we buy now)

2) Upgrading memory on that model of blade is prohibitively expensive (we could buy a new blade with newer memory for the same price as only upgrading memory in the HS21s; I don't know why, but the memory used in the HS21 must be hard to come by)

3) If you divide the cost of your fully populated/configured chassis by 14, you'll have the "cost per slot".  Turns out for us that the cost-per-slot we pay is more than the cost of the HS21s themselves.  That essentially means it's more expensive for us to have the HS21s than to not have them and let their slots be taken up by denser, newer blades.  Since we're upside down on our HS21 blades, we're phasing them out.  In other words, we'll keep using them, but if all chassis slots are full, we'll throw away an HS21 before we'll buy a new chassis.

4) Besides, when you consider that one HX5 can replace what, 8 HS21s?  Phasing them out isn't a difficult choice.  The vSphere CPU licenses you'll recoup for deployment on future blades alone are pretty significant.  Upgrade one HS21 blade's vSphere licenses to Enterprise Plus (if they're not there already), then replace up to 8 HS21s with a single HX5 (using that upgraded vSphere Enterprise Plus license for the HX5).  Then you have 12 CPU sockets of vSphere licenses at your disposal.  Using methods like this, we rarely need to purchase additional vSphere licenses.  We just keep recycling them on denser and denser CPUs.
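The arithmetic in points 3 and 4 can be sketched roughly like this. All dollar figures are placeholders for illustration, not our real numbers, and the socket counts (2 per HS21, 4 for the HX5) are an assumption chosen to match the 12 freed sockets mentioned above. Plug in your own.

```python
SLOTS_PER_CHASSIS = 14  # blade slots per chassis, as used in point 3


def cost_per_slot(chassis_cost):
    """Cost of a fully populated/configured chassis spread over its slots."""
    return chassis_cost / SLOTS_PER_CHASSIS


def upside_down(chassis_cost, blade_value):
    """True when a slot costs more than the blade sitting in it."""
    return cost_per_slot(chassis_cost) > blade_value


def freed_license_sockets(old_blades, sockets_per_old, sockets_new):
    """Licensed CPU sockets left over after consolidating onto one new blade."""
    return old_blades * sockets_per_old - sockets_new


if __name__ == "__main__":
    # Placeholder figures only.
    print(cost_per_slot(70000))            # 5000.0
    print(upside_down(70000, 3000))        # True
    print(freed_license_sockets(8, 2, 4))  # 12
```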

ami857
Contributor

Thank you for the advice. It is very sad that IBM doesn't offer a solution to this problem. For small deployments the HS21 may still be usable, if you have a blade with some memory installed (16GB RAM).

Thank You again

benny_hauk
Enthusiast

I understand and can relate to your frustration, but the fact is the limitation at its core is a technical one: the compatibility of one CPU family with another.  The last thing you'd want is your VM executing on one CPU, then vMotioning to a CPU that doesn't have an instruction set the VM expects to be there.  The server would simply crash after most vMotions.  The good news is that the people who really have to address this technical limitation (the CPU manufacturers) have addressed it, with the ability to "dumb down" newer CPUs on command to maintain backward compatibility where desired.  The HS21s just missed the boat, coming out right before those CPUs.  Going forward you shouldn't run into these headaches.

The rest is just a matter of economics.  Blades, like all high-tech stuff, depreciate at incredible rates.  It doesn't matter whether you're talking about computers or not.  Look at how an electronic keyboard depreciates compared to a well-made piano.  The piano holds its value (or perhaps increases) over time, while the value of even a mint-condition keyboard plummets by comparison.  It's because of the timeless nature of the piano (built with the same guts as pianos 100 years ago) vs. the constantly changing nature of high tech.  As newer and better comes along, the old inherently loses value.  On top of that, inventory of out-of-date tech shrinks as demand for it shrinks, so while the value of the blade decreases, the cost to maintain it (upgrading or replacing memory, for example) actually increases.  Ultimately I see this as the cost of being in a business sector that uses high-tech anything.  You definitely pay a "depreciation penalty", and the more "bleeding edge" your leadership is committed to being, the bigger it is.

The lesson: find creative ways to breathe new life into older machines (HS21s) so you can replace them with denser, newer, more efficient HX5s or whatever for virtualization.  There are lots of benefits to the business that you can use to justify the expenditure:

  1. The capability to virtualize "large VMs", so the big servers gain the benefits of virtualization and not just the smaller servers
  2. The older machines fill a role you'd otherwise have to buy new, smaller servers for anyway (sandbox, test, or lab servers)
  3. There are hardware firms out there willing to buy your old hardware
  4. Power and cooling savings (datacenter managers need to be able to quantify this, because there's definitely a cost savings there)
  5. More available blade slots (fewer blade chassis to buy, power, cool, maintain, and support) down the road, because one dense blade replaces lots of HS21s

... what else?

Personally, I think a successful hardware replacement strategy will be impacted by virtualization.  I'd be curious to learn what other orgs' hardware lifecycle strategies look like and how virtualization has impacted them.  I think I'll start a new post posing that question...

ami857
Contributor

I totally agree with you. But now I think I made a mistake last year when I ran the IBM CD on an HS22 blade. If the customer wants to add a new blade alongside the HS22 and eliminate the HS21, I think there will be a problem. That's why IBM doesn't offer that CD.

benny_hauk
Enthusiast

For what it's worth, if you want to undo what the "disable AES CD" does, just reflash/reset the uEFI back to the defaults and it'll undo what the CD did.  If you had other settings you'd gone in and configured in the BIOS (Disabling GEN2 for any CNA cards, for instance) you'll want to go back in and make those settings changes after resetting the BIOS.  Backing out the CD change is actually easier than making the CD change.

We recently went through the same thing.  For a while we wanted new servers to have the AES feature disabled, but eventually (lately) we wanted it enabled because all new servers were coming with it enabled.  We reset the BIOS on the old ones to enable it, and we're all set now.

The only caveat I can think of there is that you'll want those older servers to be on the latest BIOS.  Once they are though and you reset it back to defaults, the AES feature should be enabled.  That's our experience anyway (as always, check with IBM support first as mileage may vary).

StukaRT
Contributor

Does anyone still have that image somewhere? It seems I have to disable AES, but I have no idea how to do it through ASU on my ESXi host... This seems like a straightforward and easy solution, but it's not available anymore.

Javierbarreto1
Contributor

Please, I need the .iso file. I can't download it.

waelkhalileid
Contributor

Hi Benny,

can you share a link for the BoMC-2.20-uEFI-AesEnable-to-enabled-vmotion-fix.iso

as the ftp link is dead
