Hello,
ESXi, 6.5.0, 18678235
HP ProLiant ML350 G6
The vSphere Client shows a red alarm: "Host memory status".
I ran memtest86+, but it didn't find any errors:
Just a question: what does the string "PROC 2 DIMM 8" mean?
Host memory status does not mean something is wrong with the RAM. It means the ESXi host has consumed more than 80% of its memory. When your server is running, what is the total RAM usage with all your VMs powered on?
It's not a problem, just a warning that you're getting close to maxing the server out. It will go from yellow to red once you exceed 90% usage.
This host has 48 GB of RAM and only one VM with 10 GB of memory:
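As a quick sanity check on the usage theory, the figures above can be turned into a percentage. This is only a sketch: the 80% and 90% thresholds are the ones quoted in this thread, the function names are made up for illustration, and the VM's configured 10 GB is used as a rough stand-in for consumed memory (hypervisor overhead is ignored).

```python
def usage_percent(used_gb, total_gb):
    """Return memory usage as a percentage of total host memory."""
    return 100.0 * used_gb / total_gb


def usage_alarm(pct, warn=80.0, crit=90.0):
    """Map a usage percentage to an alarm colour.

    The warn/crit defaults are the thresholds quoted in this thread,
    not values read from vCenter.
    """
    if pct > crit:
        return "red"
    if pct > warn:
        return "yellow"
    return "green"


# Assumption: the single VM's configured 10 GB approximates host consumption.
pct = usage_percent(10, 48)
print(f"{pct:.1f}% -> {usage_alarm(pct)}")  # prints "20.8% -> green"
```

At roughly 21%, a usage-based alarm should not fire on this host, which suggests the red alarm has a different source.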
What do the iLO / IML logs show for the memory? Any issues, or are all modules active and healthy?
The only other thing I can suggest is to try installing the HPE Customized ESXi ISO. It includes drivers from the vendor. I've always used those images on my G9-G10 servers.
Also check the hardware compatibility guide to make sure your G6 is actually supported by this version of ESXi.
I hope some of this helps you find the cause. Good luck !
There are no memory errors in iLO.
The server was already installed from an HPE custom image.
Hello.
Even if no physical hardware errors are reported, they could be going undetected, so it is a good idea to update the UEFI (BIOS), iLO and other firmware. These updates should be done at least once a year if new firmware versions are available.
From the build you indicate, you have installed the latest patch stack (October 2021). By any chance, did the memory error messages start after this update?
BIOS and iLO2 already have the latest firmware.
ESXi was installed three days ago from the HP custom ISO and does not have any other updates.
When you reset it, how long does it take to come back?
Can you post a picture of the host -> monitor -> hardware health tab?
The 2 PROC 8 DIMM in memtests means 2 PROCessors (sockets / packages) and 8 DIMM slots (not sure if populated or just available).
Hello
For the HP ML350 G6, the latest supported ESXi version is ESXi 5.5 U3.
If you can work with this memory error message or others that may occur, you can continue with version 6.5.
If it is a new installation that is not in use, you may want to try version 6.0, which is also not supported for this server model by HP or VMware.
Hello.
Another option in your case would be to install version 6.5 from a standard VMware ISO (without the HP drivers) and, if it installs and works without problems, try installing the HP driver for the disk controller (which is the most critical one).
To be covered, you will need to configure the iLO to report hardware failures.
>>When you reset it, how long does it take to come back?
6 hours and 5 hours
>>Can you post a picture of the host -> monitor -> hardware health tab?
>>The 2 PROC 8 DIMM in memtests means 2 PROCessors (sockets / packages) and 8 DIMM slots
The host has 18 slots (12 populated).
So "System Board 10 Memory" does show a warning. Check whether "esxcli hardware ipmi sdr list" gives you any additional information or whether anything is logged in "esxcli hardware ipmi sel list".
> The host has 18 slots (12 populated).
You are right, that actually looks like a locator straight out of SMBIOS, so presumably it comes from somewhere in dmi.c. Looking very briefly at https://github.com/Distrotech/memtest86/ (if that is the same version), I can't find where it is printed though. I also can't make out the ASCII characters before it, unless they are merged. You might want to pop out that DIMM and verify again.
esxcli hardware ipmi sdr list
Node-Sensor  Description                                Entity-Instance  Computed Reading       Base Unit    Raw Reading  Sensor Type   Timestamp/Comment    Raw
-----------  -----------------------------------------  ---------------  ---------------------  -----------  -----------  ------------  -------------------  ---
0.4          Power Supply 1 Power Supply 1              10.1             Presence detected      Watts        1            Power Supply  2021-11-23T14:53:22
0.5          Power Supply 2 Power Supply 2              10.2             Presence detected      Watts        1            Power Supply  2021-11-23T14:53:22
0.6          Power Supply 3 Power Supplies              10.3             Fully Redundant        unspecified  1            Power Supply  2021-11-23T14:53:22
0.7          System Board 1 Fan 1                       7.1              Transition to Running  unspecified  1            Fan           2021-11-23T14:53:22
0.8          System Board 2 Fan 2                       7.2              Transition to Running  unspecified  1            Fan           2021-11-23T14:53:22
0.9          System Board 3 Fan 3                       7.3              Transition to Running  unspecified  1            Fan           2021-11-23T14:53:22
0.10         System Board 4 Fan 4                       7.4              Transition to Running  unspecified  1            Fan           2021-11-23T14:53:22
0.11         System Board 5 Fans                        7.5              Fully Redundant        unspecified  1            Fan           2021-11-23T14:53:22
0.12         External Environment 1 Temp 1              39.1             19                     degrees C    19           Temperature   2021-11-23T14:53:22
0.13         Processor 1 Temp 2                         3.1              40                     degrees C    40           Temperature   2021-11-23T14:53:22
0.14         Processor 2 Temp 3                         3.2              40                     degrees C    40           Temperature   2021-11-23T14:53:22
0.15         Memory Module 1 Temp 4                     8.1              32                     degrees C    32           Temperature   2021-11-23T14:53:22
0.16         Memory Module 2 Temp 5                     8.2              26                     degrees C    26           Temperature   2021-11-23T14:53:22
0.17         Memory Module 3 Temp 6                     8.3              25                     degrees C    25           Temperature   2021-11-23T14:53:22
0.18         Memory Module 4 Temp 7                     8.4              25                     degrees C    25           Temperature   2021-11-23T14:53:22
0.19         Memory Module 5 Temp 8                     8.5              32                     degrees C    32           Temperature   2021-11-23T14:53:22
0.20         Memory Module 6 Temp 9                     8.6              28                     degrees C    28           Temperature   2021-11-23T14:53:22
0.21         Memory Module 7 Temp 10                    8.7              31                     degrees C    31           Temperature   2021-11-23T14:53:22
0.22         Memory Module 8 Temp 11                    8.8              35                     degrees C    35           Temperature   2021-11-23T14:53:22
0.23         System Internal Expansion Board 1 Temp 12  16.1             34                     degrees C    34           Temperature   2021-11-23T14:53:22
0.24         System Internal Expansion Board 2 Temp 13  16.2             32                     degrees C    32           Temperature   2021-11-23T14:53:22
0.25         System Internal Expansion Board 3 Temp 14  16.3             31                     degrees C    31           Temperature   2021-11-23T14:53:22
0.26         System Internal Expansion Board 4 Temp 15  16.4             29                     degrees C    29           Temperature   2021-11-23T14:53:22
0.27         System Internal Expansion Board 5 Temp 16  16.5             27                     degrees C    27           Temperature   2021-11-23T14:53:22
0.28         System Internal Expansion Board 6 Temp 17  16.6             26                     degrees C    26           Temperature   2021-11-23T14:53:22
0.29         System Internal Expansion Board 7 Temp 18  16.7             25                     degrees C    25           Temperature   2021-11-23T14:53:22
0.30         Processor 3 Temp 19                        3.3              24                     degrees C    24           Temperature   2021-11-23T14:53:22
0.31         Memory Module 9 Temp 20                    8.9              27                     degrees C    27           Temperature   2021-11-23T14:53:22
0.32         Drive Backplane 1 Temp 21                  15.1             35                     degrees C    35           Temperature   2021-11-23T14:53:22
0.33         System Board 6 Temp 22                     7.6              50                     degrees C    50           Temperature   2021-11-23T14:53:22
0.34         System Board 7 Temp 23                     7.7              34                     degrees C    34           Temperature   2021-11-23T14:53:22
0.35         System Board 8 Temp 24                     7.8              35                     degrees C    35           Temperature   2021-11-23T14:53:22
0.36         System Board 9 Power Meter                 7.9              Device Enabled         Watts        2            Current       2021-11-23T14:53:22
0.37         System Board 10 Memory                     7.10             Presence Detected      error        65           Memory        2021-11-23T14:53:22
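To pick the odd sensor out of a listing like this, the table can be split on the dashed ruler line and scanned for rows that mention an error. The following is a minimal sketch, not an official tool: the helper names are hypothetical, the column widths are taken from the ruler in the output above, the two sample rows are copied from it, and treating any field containing "error" as suspicious is just a heuristic.

```python
import re


def parse_esxcli_table(text):
    """Split a fixed-width esxcli table into dicts, one per data row.

    Column boundaries are taken from the dashed ruler under the header.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The ruler is the first line made up only of dashes and spaces.
    ruler_idx = next(i for i, ln in enumerate(lines) if set(ln.strip()) <= set("- "))
    spans = [m.span() for m in re.finditer(r"-+", lines[ruler_idx])]
    header = lines[ruler_idx - 1]
    names = [header[s:e].strip() for s, e in spans]
    return [
        {name: ln[s:e].strip() for name, (s, e) in zip(names, spans)}
        for ln in lines[ruler_idx + 1:]
    ]


def suspicious_sensors(rows):
    """Heuristic (an assumption): flag rows where any field mentions 'error'."""
    return [r for r in rows if any("error" in v.lower() for v in r.values())]


# Rebuild a small aligned sample; widths follow the ruler in the posted output.
WIDTHS = (11, 41, 15, 21, 11, 11, 12, 19)
HEADER = ("Node-Sensor", "Description", "Entity-Instance", "Computed Reading",
          "Base Unit", "Raw Reading", "Sensor Type", "Timestamp/Comment")
SAMPLE_ROWS = [
    ("0.15", "Memory Module 1 Temp 4", "8.1", "32", "degrees C", "32",
     "Temperature", "2021-11-23T14:53:22"),
    ("0.37", "System Board 10 Memory", "7.10", "Presence Detected", "error", "65",
     "Memory", "2021-11-23T14:53:22"),
]


def render(cells):
    """Pad each cell to its column width and join with two spaces."""
    return "  ".join(c.ljust(w) for c, w in zip(cells, WIDTHS))


sample = "\n".join(
    [render(HEADER), render(tuple("-" * w for w in WIDTHS))]
    + [render(r) for r in SAMPLE_ROWS]
)

for row in suspicious_sensors(parse_esxcli_table(sample)):
    print(row["Description"], row["Entity-Instance"])  # System Board 10 Memory 7.10
```

On the host itself, the same parser could be fed the saved output of the command instead of the rebuilt sample.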
"esxcli hardware ipmi sel list" does not show any information.
Make sure that the DIMM slots are populated in the correct order (specified in letter sequence A through I), as described at the following link:
https://support.hpe.com/hpesc/public/docDisplay?docLocale=en_US&docId=c01727710#N1051A
That said, it is hard to imagine that the server would still boot if the memory modules were populated incorrectly.
From the 54 GB example, it looks like the lower-capacity DIMMs are populated first, before the larger ones (2 GB in A to C, 8 GB in D to I).
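The 54 GB example and the letter-order rule can be checked with a few lines. This is a rough sketch that assumes a strict reading of "letter sequence" (slots filled from A onward with no gaps, per processor); the function names are hypothetical, and the HPE guide linked above remains the authority on the real population rules.

```python
SLOT_ORDER = "ABCDEFGHI"  # slot letters per processor, as named in this thread


def follows_letter_order(populated):
    """True if the populated slots are exactly the first N letters of A..I."""
    letters = sorted(populated)
    return "".join(letters) == SLOT_ORDER[:len(letters)]


def total_gb(layout):
    """Sum the DIMM sizes (in GB) of a slot -> size mapping."""
    return sum(layout.values())


# The 54 GB example: 2 GB DIMMs in A to C, 8 GB DIMMs in D to I.
example = {"A": 2, "B": 2, "C": 2, "D": 8, "E": 8, "F": 8, "G": 8, "H": 8, "I": 8}

print(total_gb(example))                    # 3*2 + 6*8 = 54
print(follows_letter_order(example))        # True
print(follows_letter_order({"A", "B", "D"}))  # False: slot C was skipped
```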
As you can see, the memory is installed correctly.