I have just written some software in C++ that is NUMA aware and am running it on an ESX guest with Windows 10.
At the start I call GetNumaHighestNodeNumber(...) and it returns 1, indicating that there should be 2 nodes, 0 and 1.
Later on I try to start two new processes, using CreateProcess, and include a PROC_THREAD_ATTRIBUTE_PREFERRED_NODE attribute.
The first process is started on node 0 and this call succeeds.
The second process is started on node 1 and this call fails with the error code ERROR_INVALID_PARAMETER.
Is it possible that the call to GetNumaHighestNodeNumber(...) is returning the host's capability, ignoring the fact that the guest VM is limited to using only NUMA node 0?
If you look at the remarks
The number of the highest node is not guaranteed to be the total number of nodes.
To retrieve a list of all processors in a node, use the GetNumaNodeProcessorMask function.
I can't find an API that tells you directly whether there are NUMA nodes or not. I think you should try GetNumaNodeProcessorMask and check whether the returned ProcessorMask is non-zero. If the mask is zero, that node has no processors, which roughly means it is a non-existent node. From there you could count how many nodes returned a non-zero ProcessorMask.
https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-getnumanodeprocessormask
I suspect the Windows 10 guest does not see the NUMA architecture but just sees the two virtual sockets.
What does Task Manager's Set affinity dialog show in the Windows 10 guest?
On a dual-socket physical machine running Windows 10 Pro, Set affinity in Task Manager shows this picture
In a Windows 10 Pro VM running inside VMware Workstation 16 beta on the same physical machine, configured with 2 virtual sockets of 4 cores each, Set affinity shows this picture
The physical Windows 10 Pro machine shows Node 0 and Node 1, while the Windows 10 Pro VM does not, although the VM does see 2 sockets.
The Task Manager on my VM shows the same list as the one you have posted for your VM.
Nevertheless GetNumaHighestNodeNumber returns 1, which causes my program logic to start processes on both node 0 and node 1.
Which API call should I be using to determine that there are no NUMA nodes on the VM?
Use the GetLogicalProcessorInformation call to get the number of NUMA nodes.
I prefer the solution that uses GetNumaHighestNodeNumber and then loops over the nodes to see where the GROUP_AFFINITY.Mask is non-zero.
However, you must use GetNumaNodeProcessorMaskEx and not GetNumaNodeProcessorMask, as the latter only returns information about nodes in the same processor group as the calling thread.
The solution using GetLogicalProcessorInformation doesn't appear to work quite as I expected, depending on the hardware configuration. And looping through the nodes actually gives you a list of the available nodes rather than just a count, which is a bonus.
You should still use the helper function CountSetBits from the GetLogicalProcessorInformation sample code, because the bit positions set in the mask returned from GetNumaNodeProcessorMaskEx differ from node to node.
On my physical machine with NUMA and hyper-threading enabled, the GROUP_AFFINITY.Mask values returned from GetNumaNodeProcessorMaskEx are:
node 0: 0x000000ff
node 1: 0x0000ff00
So each set bit has to be counted to get the number of logical processors in the node (including hyper-threads); in this case, 8 logical CPUs in each node.
If hyper-threading is disabled, the mask values are:
node 0: 0x0000000f
node 1: 0x000000f0
so 4 logical CPUs in each node.