VMware Performance Community
aminf13
Enthusiast

Forcing ESX to allocate remote memory

Hello

I'm running a few experiments with ESX and want to manually set the vCPU and memory affinity of a virtual machine. Specifically, I have a two-socket server and I want ESX to allocate a given VM's memory on the remote node. I used the affinity controls in the Resources section of the VM Properties and set them as follows:

CPU Affinity: 0-1

Memory Affinity: Node 1

My impression was that this would force the vCPUs onto node 0 and the VM's memory onto node 1. However, esxtop shows that all the memory is in fact allocated on node 1, yet N%L reads 100. I interpret that as meaning all of this VM's memory is on its local node. Does this mean the CPU affinity I specified is incorrect or is not taking effect? Is something overriding the specified affinities?

Just to be on the safe side, I also have set the value of Numa.AutoMemAffinity to 0 (in the Advanced Settings section). Here is the relevant portion of esxtop:

NAME             NHN                               NMIG  NRMEM    NLMEM  N%L  GST_ND0  OVD_ND0  GST_ND1  OVD_ND1    MEMSZ
javaserver0-1.1  0,1,2,3,4,5,6,7,8,9,10,11,12,13,     0   0.00  1024.00  100     0.00     4.34  1024.00     8.16  1024.00
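For reference, the N%L value can be cross-checked against the NLMEM and NRMEM columns. This is a minimal Python sketch, assuming N%L is simply local memory as a share of local plus remote memory; the function name is made up for illustration and is not part of esxtop.

```python
def numa_locality_percent(nlmem_mb: float, nrmem_mb: float) -> float:
    """Return local memory as a percentage of local + remote memory.

    Assumption: this mirrors how esxtop derives N%L from NLMEM and NRMEM.
    """
    total = nlmem_mb + nrmem_mb
    if total == 0:
        return 0.0  # nothing allocated yet; avoid dividing by zero
    return 100.0 * nlmem_mb / total


# Values from the esxtop row above: NLMEM = 1024.00 MB, NRMEM = 0.00 MB
print(numa_locality_percent(1024.00, 0.00))
```

With these inputs the result is 100, which matches the reported N%L: esxtop considers every page of this VM local to its home node.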

I would appreciate any help in explaining how I can force ESX to use the remote memory rather than local.

Thanks,

Amin

15 Replies
jpschnee
VMware Employee

Not sure if the text is garbled, but there appears to be something wrong with your NUMA home node (NHN): "0,1,2,3,4,5,6,7,8,9,10,11,12,13,". Normally that would just be "1", since you've constrained it to node 1 only. Did you shut the VM down, make the changes, and then start it back up?

-Joshua
aminf13
Enthusiast

Sorry about the garbled text. Yes, that surprised me as well: esxtop shows 0,1,2,3,4,5,6,7,8,9,10,11,12,13 as the NUMA home node, which is what made me suspect that CPU affinity is not working.

Yes, I powered off the VM, changed the settings, and then powered it back on. No matter what I set the CPU affinity to, esxtop always shows the same string. However, when I change the memory affinity I can see the memory moving from node 0 to node 1 and vice versa, but the vCPUs seem to move with it, so the memory always stays local, which is the opposite of what I want!

Amin

aminf13
Enthusiast

Well, after playing with the settings again, I turned Numa.AutoMemAffinity back on in the Advanced Settings (Numa section), and now CPU affinity seems to be working. I no longer see a list of 0 to 13; I just see 0 and 1, and I can force all memory allocation to go remote. Thanks!

aminf13
Enthusiast

Hello,

I've run into another issue with allocating memory on the remote node. After a while, ESX starts to move the memory of some of the virtual machines back to the local node, even though the affinity pins the memory to the remote node. On one occasion I saw up to 70% of a VM's memory moved to the local node. Why does this happen, and how can I disable it?

Thanks,

Amin

jpschnee
VMware Employee

Can you post the VMX file for the VM you're working on? That would make it easier to see everything that is currently configured for the VM.

-Joshua
aminf13
Enthusiast

The VMX file is inlined below. For what it's worth, I'm trying this with VMmark 1.1.1. I noticed that 70% of the javaserver's memory and 15% of the fileserver's memory were moved to the local node during the run. The other VMs had 0-5% of their memory on the local node. The affinity settings for all of them were set to force the use of remote memory. The host is a two-socket server with 8 physical cores (16 logical with hyperthreading) per socket.

Amin

.encoding = "UTF-8"
config.version = "8"
virtualHW.version = "7"
pciBridge0.present = "TRUE"
pciBridge4.present = "TRUE"
pciBridge4.virtualDev = "pcieRootPort"
pciBridge4.functions = "8"
pciBridge5.present = "TRUE"
pciBridge5.virtualDev = "pcieRootPort"
pciBridge5.functions = "8"
pciBridge6.present = "TRUE"
pciBridge6.virtualDev = "pcieRootPort"
pciBridge6.functions = "8"
pciBridge7.present = "TRUE"
pciBridge7.virtualDev = "pcieRootPort"
pciBridge7.functions = "8"
vmci0.present = "TRUE"
nvram = "javaserver0-1.1.1.nvram"
virtualHW.productCompatibility = "hosted"
powerType.powerOff = "soft"
powerType.powerOn = "hard"
powerType.suspend = "hard"
powerType.reset = "soft"
displayName = "javaserver0-1.1.1"
extendedConfigFile = "javaserver0-1.1.1.vmxf"
floppy0.present = "TRUE"
numvcpus = "2"
scsi0.present = "TRUE"
scsi0.sharedBus = "none"
scsi0.virtualDev = "lsilogic"
memsize = "1024"
scsi0:0.present = "TRUE"
scsi0:0.fileName = "javaserver0-1.1.1.vmdk"
scsi0:0.deviceType = "scsi-hardDisk"
ide1:0.present = "TRUE"
ide1:0.clientDevice = "TRUE"
ide1:0.deviceType = "atapi-cdrom"
ide1:0.startConnected = "FALSE"
floppy0.startConnected = "FALSE"
floppy0.fileName = ""
floppy0.clientDevice = "TRUE"
ethernet0.present = "TRUE"
ethernet0.virtualDev = "e1000"
ethernet0.networkName = "VMmark Net"
ethernet0.addressType = "generated"
guestOS = "winnetenterprise-64"
uuid.location = "56 4d 78 8e d3 23 ff 61-1a 99 3a 48 39 9e 29 2e"
uuid.bios = "56 4d 78 8e d3 23 ff 61-1a 99 3a 48 39 9e 29 2e"
vc.uuid = "52 d0 ac 00 8b c2 3f 33-0c 54 e0 d2 b7 5f 8d 28"
ide1:0.fileName = ""
ethernet0.generatedAddress = "00:0c:29:9e:29:2e"
vmci0.id = "1378771856"
tools.syncTime = "FALSE"
cleanShutdown = "FALSE"
replay.supported = "FALSE"
unity.wasCapable = "TRUE"
sched.swap.derivedName = "/vmfs/volumes/4fd73629-0e676e00-72fe-001e673d56b0/javaserver0-1.1.1/javaserver0-1.1.1-918fcd45.vswp"
replay.filename = ""
scsi0:0.redo = ""
pciBridge0.pciSlotNumber = "17"
pciBridge4.pciSlotNumber = "21"
pciBridge5.pciSlotNumber = "22"
pciBridge6.pciSlotNumber = "23"
pciBridge7.pciSlotNumber = "24"
scsi0.pciSlotNumber = "16"
ethernet0.pciSlotNumber = "32"
vmci0.pciSlotNumber = "33"
ethernet0.generatedAddressOffset = "0"
hostCPUID.0 = "0000000d756e65476c65746e49656e69"
hostCPUID.1 = "000206d70020080017bee3ffbfebfbff"
hostCPUID.80000001 = "0000000000000000000000012c100800"
guestCPUID.0 = "0000000d756e65476c65746e49656e69"
guestCPUID.1 = "000206d700010800829822030febfbff"
guestCPUID.80000001 = "00000000000000000000000128100800"
userCPUID.0 = "0000000d756e65476c65746e49656e69"
userCPUID.1 = "000206d700200800029822030febfbff"
userCPUID.80000001 = "00000000000000000000000128100800"
evcCompatibilityMode = "FALSE"
vmotion.checkpointFBSize = "4194304"
sched.cpu.affinity = "0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15"
sched.mem.affinity = "1"
sched.cpu.htsharing = "any"
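
As a sanity check on the two affinity lines above, here is a minimal Python sketch that reads `sched.cpu.affinity` and `sched.mem.affinity` from VMX text and reports which NUMA nodes each one covers. It assumes logical CPUs are numbered contiguously per node, 16 per node as on the host described above (so 0-15 would be node 0); that numbering is an assumption, not something ESX guarantees. The helper names `parse_vmx` and `cpu_nodes` are hypothetical.

```python
def parse_vmx(text: str) -> dict:
    """Parse `key = "value"` lines from VMX text into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank lines and anything without an '='
            settings[key.strip()] = value.strip().strip('"')
    return settings


def cpu_nodes(affinity: str, lcpus_per_node: int) -> set:
    """Map a comma-separated logical CPU list to the NUMA nodes it covers.

    Assumption: logical CPUs are numbered contiguously per node.
    """
    return {int(cpu) // lcpus_per_node for cpu in affinity.split(",") if cpu.strip()}


# The two affinity lines from the VMX file posted above.
vmx_text = '''
sched.cpu.affinity = "0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15"
sched.mem.affinity = "1"
'''

settings = parse_vmx(vmx_text)
cpus = cpu_nodes(settings["sched.cpu.affinity"], lcpus_per_node=16)
mem = {int(node) for node in settings["sched.mem.affinity"].split(",")}
print("CPU nodes:", cpus, "memory nodes:", mem, "all remote:", cpus.isdisjoint(mem))
```

Under that numbering assumption, the vCPUs are confined to node 0 and the memory to node 1, so every access should be remote if both affinities are honoured.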

jpschnee
VMware Employee

Can you also confirm the version/build of ESX used as well as the total number of LCPUs for your ESX host?

-Joshua
aminf13
Enthusiast

The host has 32 logical CPUs in total (that is what I see when I go to specify CPU affinity).

I'm using ESXi 5.0.0, VMkernel release build 623860.

Amin

jpschnee
VMware Employee

FYI, we're looking into this internally.

-Joshua
aminf13
Enthusiast

Playing with the settings again, I set Numa.PageMigEnabled to 0, and now only a tiny amount of local memory is used, although it is not completely zero. All my VMs (as in VMmark 1.1.1) now use mostly memory on the remote node, as the affinity settings dictate. Please let me know if there is any other setting I should change. I was also wondering why the local memory usage is not completely zero.

Thanks,

Amin

jpschnee
VMware Employee

Can you please provide some stats to help us understand the issue?

Output file of `schedsnapshot <dir> 300 –notrace`

Thanks,

-Joshua
aminf13
Enthusiast

I will try to do this as soon as I get a chance, but as I mentioned earlier, when I set Numa.PageMigEnabled to zero in the advanced settings I get the correct behaviour: memory is allocated on the remote node and stays there. There is still a residue in local memory (generally less than 1 MB), but I think I can live with that. Do you want me to run the command with Numa.PageMigEnabled on or off?

aminf13
Enthusiast

When I try to run this command in the ESX shell I get the error: sh: schedsnapshot: not found. Can you please provide more information on exactly how to do this?

Thanks,

Amin

jpschnee
VMware Employee

Hmm. OK, let's try another way.

Either in your console or via ssh, cd into a LUN with some free space (~100MB should do it). Then run:

vsi_traverse

If the output is under 50MB, you should be able to attach it here. Otherwise, I'll send you instructions on how to upload it to our FTP site.

-Joshua
aminf13
Enthusiast

I ran the command successfully; the resulting file is 47 MB. I still cannot upload it via the forums, as the progress bar in the upload section gets stuck at 0%. Here are the details anyway. Please let me know how I can get the file to you.

VM name: javaserver0-1.1.1

Numa.PageMigEnabled is turned on in the advanced settings.

At the time I ran the command, around 7 MB of memory had been moved from the remote node to the local node.

As I mentioned earlier, if I turn off Numa.PageMigEnabled, I no longer see memory moving to the local node.
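
To put the figures in this thread on the same scale, here is a small Python sketch that converts between megabytes on the local node and a percentage of the VM's memory. It assumes the percentages are measured against the configured `memsize` of 1024 MB from the VMX file rather than against currently allocated memory; both function names are hypothetical.

```python
MEMSIZE_MB = 1024  # memsize from the posted VMX file


def local_percent(local_mb: float, memsize_mb: float = MEMSIZE_MB) -> float:
    """Percentage of the VM's configured memory sitting on the local node."""
    return 100.0 * local_mb / memsize_mb


def local_mb_from_percent(percent: float, memsize_mb: float = MEMSIZE_MB) -> float:
    """Megabytes on the local node for a given percentage of configured memory."""
    return memsize_mb * percent / 100.0


# Roughly 7 MB had migrated when vsi_traverse was run.
print(f"{local_percent(7):.2f}% local")
# The 70% case seen earlier on the javaserver VM.
print(f"{local_mb_from_percent(70):.1f} MB local")
```

On these assumptions, the 7 MB captured here is well under 1% of the VM's memory, while the earlier 70% case corresponds to roughly 717 MB.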
