ESX 3.5 Host with 16 Cores and 64 GB RAM
Guest in question: Windows Server 2003 R2 64-bit, 16 GB RAM, 4 vCPUs
Started with 1 GB RAM, and the server booted fine. I configured my server with updates and shut down.
I added 15 GB of RAM and the server began taking up to 30 minutes to boot all the way up, hanging on the Windows splash screen with the scrolling graphic for the majority of that time.
Since then, here is what I have done (checking each time that the server sees the right amount of RAM and the correct number of CPUs; it does):
-scoured the community
-checked the services for any that didn't start; nothing glaring
-checked the event log (the event log service didn't start until Windows finally came up)
-checked limits and reservations = no reservations, and unlimited is checked
-no ballooning or swapping taking place on host
-shut down the server
-removed 14 GB of RAM from guest(2 total now) = boot in 2 minutes
-shutdown, add 2 GB of RAM(4 total now) = boot in 2 minutes
-shutdown, add 4 GB of RAM(8 total now) = boot in 2 minutes
-shutdown, add 4 GB of RAM(12 total now) = boot in 2 minutes
-shutdown, add 4 GB of RAM(back to 16 now) = boot in 2 minutes
-let the server stand, then selected "Restart" from the shutdown menu = 30 minutes to come back to Windows
-doubled the page file to 10 GB (I know that ideally I want my page file to equal the amount of memory in the OS, but my OS partition is too small. COULD THIS BE MY PROBLEM? It wouldn't explain why the server booted in 2 minutes with 16 GB of RAM when I stepped it up, though...)
-selected "Restart" from the shutdown menu = 30 minutes to come back to Windows
OK, so you see my issue. I am looking for some help and I have some additional questions:
1. Could it be the pagefile?
2. Could I have a bad pair of memory modules in my host? Is there a log or a memory test in ESX that I could look at to find out?
3. Any other suggestions?
thanks,
Dallas
Sounds right. Physical servers take longer to boot with more memory; it's probably doing a memory test.
Have you tried going into the BIOS of the VM and turning memory testing off?
Any particular reason why you gave the VM 4 vCPUs? I would recommend always starting with 1 vCPU and adding more only if necessary! Check your drive for free space and check the page file location. Make sure the latest VMware Tools are installed and the latest patches are applied to the ESX host. If you had bad memory, you would probably get "panic attacks" from the ESX hosts. You can download Veeam's monitor product with an eval license from http://www.veeam.com/vmware-esx-monitoring.html to help you work out whether the problem is outside of the box or inside.
I am in the BIOS and I don't see where to disable memory testing. A co-worker also suggested this before I posted today.
Can you give me any direction?
Dallas
I've seen this more with the 64-bit systems and have found that setting the page file to "system managed" in Windows helps some. We also saw this when using an older, slower SAN for testing. I think the delay is waiting for Windows to create the page file; I believe system managed allows it to start out smaller and grow if needed.
The option you're looking for will usually be Boot Time Diagnostic Screen. It's disabled by default. You should be able to check the vmware.log in the same folder as your VM for more info. Also, do you have "clear pagefile on shutdown" activated in Windows? This would also cause problems. Usually, when a boot takes a long time on the Windows splash screen, it could be memory, but it could also be network connections and/or mapped drives that are timing out before Windows comes up.
-KjB
We have seen similar behaviour in our systems when we increase the number of vCPUs in a virtual machine.
When we gave a VM 4 vCPUs, it took a long time to boot. With only 2 vCPUs it was a lot faster. The fastest boot was with only 1 vCPU.
We're facing this issue with ESX 3.5 and some W2K3 (SE/EE) guests. Sometimes a guest comes up normally, sometimes it takes up to 15 minutes to boot, even if the guest is the only one on the host.
VMware Tools are installed.
I don't know why this sometimes happens and sometimes doesn't.
We also have W2K3 systems in the same cluster where we have never seen this behaviour.
AWo
Are you sure you're using the correct HAL on those VMs?
Since it is Microsoft Windows, I hope so. If I change hardware, e.g. increase the CPUs from 2 to 4 in a VM, and start it afterwards, a balloon tip is shown within the VM saying that new hardware has been detected and installed correctly. If I look in the device manager (I don't know the correct English term) in the control panel of this VM, I see 4 CPUs. So in my opinion it should be the correct HAL. Am I wrong?
We're seeing this issue also (it happened when we increased the vRAM from 1 GB to 12 GB). I suspect the extra time is due to:
-Windows and its pagefile mechanism,
-ESX creating the .vswp file during boot (we set a memory reservation of 10 GB, which lowered the .vswp to 2 GB, but it still takes quite a bit of time to boot...)
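The 2 GB .vswp figure in this post follows from simple arithmetic: ESX sizes the per-VM swap file as configured memory minus the memory reservation. A minimal Python sketch of that calculation (the function name is made up for illustration and is not part of any VMware tool):

```python
def vswp_size_gb(configured_gb, reservation_gb=0.0):
    """Expected .vswp size in GB: configured memory minus memory reservation.

    Hypothetical helper for illustration only.
    """
    if reservation_gb < 0 or reservation_gb > configured_gb:
        raise ValueError("reservation must be between 0 and configured memory")
    return configured_gb - reservation_gb


if __name__ == "__main__":
    # 12 GB guest with a 10 GB reservation, as described in the post.
    print(vswp_size_gb(12, 10))
    # The same guest with no reservation gets a swap file as large as its RAM.
    print(vswp_size_gb(12))
```

With no reservation the swap file is as large as the configured memory, which is why a full reservation is sometimes used to save datastore space.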
Seeing 4 CPUs does not always mean you're using the correct HAL. Check Device Manager and, under the Computer branch, see if it says ACPI Multiprocessor. Also, check c:\windows\repair\setup.log and look for the string "hal"; it should have an "m" in there (halmacpi.dll, NOT halaacpi.dll).
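If you have many guests to check, the setup.log search can be scripted. Below is a rough Python sketch that pulls every HAL file name out of the text of a setup.log and reports whether halmacpi.dll is among them. The function names are made up, and the sample line is a hypothetical stand-in, since the real file's layout may differ; the only thing relied on is that the HAL file name appears somewhere in the text.

```python
import re

# Matches names such as hal.dll, halmacpi.dll or halaacpi.dll.
HAL_PATTERN = re.compile(r"\bhal\w*\.dll", re.IGNORECASE)


def find_hal_names(setup_log_text):
    """Return the set of HAL file names (lower-cased) found in the text."""
    return {m.group(0).lower() for m in HAL_PATTERN.finditer(setup_log_text)}


def uses_multiprocessor_acpi_hal(setup_log_text):
    """True if halmacpi.dll (the ACPI multiprocessor HAL) is mentioned."""
    return "halmacpi.dll" in find_hal_names(setup_log_text)


if __name__ == "__main__":
    # Hypothetical sample line for illustration; not copied from a real log.
    sample = 'hal.dll = "halmacpi.dll"'
    print(sorted(find_hal_names(sample)))
    print(uses_multiprocessor_acpi_hal(sample))
```

To run it against a real guest, read c:\windows\repair\setup.log into a string and pass it to uses_multiprocessor_acpi_hal.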
-KjB
The pagefile is created/processed at boot time; if you have the "clear pagefile on exit" policy set, it will be cleared out at shutdown. I have very rarely seen a scenario where a pagefile that large is beneficial. From what I remember, a pagefile over 4 GB is actually bad, since a core dump will sometimes not process correctly in a pagefile over 4 GB. Try lowering the size of your pagefile, but I'm not sure that will help here. I'm not sure how long a .vswp file creation would take, since ESX creating a disk doesn't take long at all.
Still, if it helped speed up your boot, it's all the better, but now you have a reservation to deal with as well. I would check out the pagefile a little more.
-KjB
With you 100% on the dubiousness of large pagefiles being beneficial -- unfortunately SAP doesn't feel that way.
You alluded to "....now you have a reservation to deal with..." Do you see reservations as a negative? For our SAP scenario we have a limited number of VMs per physical host, so I'm not seeing reservations as an issue (especially since they cut down on 12 GB .vswp files consuming local ESX VMFS disk space for MSCS configs), but perhaps I'm missing something...
What VMware ESX version are you running? ESX 3.5 had an issue with 4 vCPU guests running slowly; there was a patch that solved this. I cannot find it right now, but Update 1 should fix it. However, a 4 vCPU guest with a large amount of memory will take significantly longer to boot than a single vCPU guest with less memory; this is expected behaviour. Windows does a memory check of all its available memory on boot, writing to each sector to verify.
Tom Howarth
VMware Communities User Moderator
Yes, I try not to use reservations unless I absolutely have to. A reservation adds more overhead to the DRS calculations, and is usually forgotten about until problems arise and questions get posted on this board. Not that I have an issue with answering questions. It's good to reduce the size of the .vswp file, but that file isn't really used unless the guests are starved for memory and have to swap to disk. If you have disk issues, this is good, but if it's not helping you to change the default, then I don't suggest changing it. It helped marginally here, so it's good, and if the app requires it, it's good, but that isn't so here.
-KjB
DRS = Not supported for MSCS config so a non-issue w/reservations & adding complexity but point taken, thank you.
I would love to figure out exactly what is causing these long reboot times though.
It doesn't appear that the vswp is getting created each time the Guest OS (Windows 2K3 Enterprise, 64-bit) is rebooted so I am definitely suspecting something within Windows.
The original poster is seeing long reboot times with 4 vCPUs. In my case it is similar, although we have 2 vCPUs (running on ESX 3.5 Update 1).
Maybe you can post the vmware.log after a reboot, so we can see whether something shows up in the log? Hopefully we can come to a resolution, not just an understanding.
-KjB
Below is a vmware.log file.
VM was shut down at 6:03 PM
VM came back online at 6:17 PM
This gap in time is represented by these two lines in the vmware.log file:
May 06 18:03:41.096: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
May 06 18:17:33.227: vcpu-0| SVGA: Unregistering IOSpace at 0x1060
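For longer logs, a gap like this can be located automatically. The following is a minimal Python sketch, assuming every relevant line starts with a timestamp in the "May 06 18:03:41.096:" form seen here and that month names are in English. The log carries no year, so the code supplies an arbitrary placeholder year; the function names are made up for illustration.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in this vmware.log, e.g. "May 06 18:03:41.096: "
STAMP = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d{3}): ")


def parse_stamp(line):
    """Return the line's timestamp as a datetime, or None if it has none."""
    m = STAMP.match(line)
    if not m:
        return None
    # The log omits the year; 2000 is an arbitrary placeholder (assumption).
    return datetime.strptime("2000 " + m.group(1), "%Y %b %d %H:%M:%S.%f")


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the longest pause
    between consecutive timestamped lines, or None if fewer than two exist."""
    best = None
    prev_time, prev_line = None, None
    for line in lines:
        t = parse_stamp(line)
        if t is None:
            continue  # skip lines without a timestamp
        if prev_time is not None:
            gap = (t - prev_time).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = t, line
    return best


if __name__ == "__main__":
    sample = [
        "May 06 18:03:41.096: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)",
        "May 06 18:17:33.227: vcpu-0| SVGA: Unregistering IOSpace at 0x1060",
    ]
    gap, before, after = largest_gap(sample)
    print(f"{gap:.3f} seconds of silence after: {before}")
```

For the two lines quoted in this post the gap works out to roughly 832 seconds, a little under 14 minutes. The sketch does not handle a log that wraps past midnight on the last day of a month any better than the placeholder year allows.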
vmware.log contents during reboot:
May 06 18:02:50.221: vcpu-1| Guest: toolbox: Got a logoff event.
May 06 18:02:50.247: vcpu-1| GuestRpc: Channel 1 reinitialized.
May 06 18:02:51.353: vcpu-1| Guest: toolbox: Got a logoff event.
May 06 18:02:52.802: vcpu-1| Guest: toolbox: VMware Tools Service Shutdown.
May 06 18:02:52.803: vcpu-1| Guest: toolbox: VMware Tools Service Stopping.
May 06 18:02:52.838: vcpu-0| TOOLS autoupgrade protocol version 0
May 06 18:02:52.840: vcpu-0| TOOLS ToolsCapabilityGuestTempDirectory received 0
May 06 18:02:52.841: vcpu-0| GuestRpc: Channel 0 reinitialized.
May 06 18:02:52.845: vcpu-1| Guest: toolbox: Service: waiting for GuestInfoServer thread.
May 06 18:02:52.845: vcpu-0| Guest: toolbox: GuestInfoServer received quit event.
May 06 18:02:52.846: vcpu-0| Guest: toolbox: GuestInfoServer exiting.
May 06 18:02:52.846: vcpu-1| Guest: toolbox: Service: GuestInfoServer thread exited.
May 06 18:03:17.933: vcpu-0| VMMouse: CMD Disable
May 06 18:03:17.933: vcpu-0| VMMouse: Disabling VMMouse mode
May 06 18:03:17.933: vcpu-0| MKS switching absolute mouse on
May 06 18:03:17.953: vcpu-1| CPU reset: soft
May 06 18:03:17.953: vcpu-0| CPU reset: soft
May 06 18:03:18.098: mks| VNCENCODE 6 encoding mode change: (640x480x16depth,16bpp)
May 06 18:03:18.103: mks| VNCENCODE 7 encoding mode change: (640x480x16depth,16bpp)
May 06 18:03:18.120: vcpu-1| CPU reset: soft
May 06 18:03:18.131: vcpu-0| SVGA: Unregistering IOSpace at 0x1060
May 06 18:03:18.131: vcpu-0| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
May 06 18:03:18.266: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
May 06 18:03:18.287: vcpu-0| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
May 06 18:03:18.607: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
May 06 18:03:18.634: vcpu-0| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
May 06 18:03:18.637: vcpu-0| SVGA: Registering IOSpace at 0x1060
May 06 18:03:18.637: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
May 06 18:03:18.664: vcpu-1| CPU reset: soft
May 06 18:03:18.688: mks| VNCENCODE 7 encoding mode change: (720x400x16depth,16bpp)
May 06 18:03:18.688: mks| VNCENCODE 6 encoding mode change: (720x400x16depth,16bpp)
May 06 18:03:18.809: mks| VNCENCODE 6 encoding mode change: (640x480x16depth,16bpp)
May 06 18:03:18.818: mks| VNCENCODE 7 encoding mode change: (640x480x16depth,16bpp)
May 06 18:03:18.821: vcpu-0| SIO: Skipping bogus enable for COM1
May 06 18:03:18.822: vcpu-0| SIO: Skipping bogus enable for COM2
May 06 18:03:18.887: vcpu-0| DISKUTIL: scsi0:0 : geometry=5221/255/63
May 06 18:03:18.917: vcpu-0| DISKUTIL: scsi1:0 : geometry=525/255/63
May 06 18:03:18.917: vcpu-0| DISKUTIL: scsi1:1 : geometry=7314/255/63
May 06 18:03:19.591: vcpu-1| CPU reset: soft
May 06 18:03:19.609: vcpu-0| BIOS-UUID is 50 04 09 e1 33 cf 5c 85-6b 21 7c b9 f6 c2 0e b7
May 06 18:03:20.024: vcpu-0| DISKUTIL: scsi1:1 : toolsVersion = 7300
May 06 18:03:20.024: vcpu-0| DISKUTIL: scsi1:0 : toolsVersion = 7300
May 06 18:03:20.024: vcpu-0| DISKUTIL: scsi0:0 : toolsVersion = 7300
May 06 18:03:20.024: vcpu-0| DISKUTIL: scsi1:1 : toolsVersion = 7300
May 06 18:03:20.024: vcpu-0| DISKUTIL: scsi1:0 : toolsVersion = 7300
May 06 18:03:20.024: vcpu-0| DISKUTIL: scsi0:0 : toolsVersion = 7300
May 06 18:03:20.030: mks| VNCENCODE 7 encoding mode change: (720x400x16depth,16bpp)
May 06 18:03:20.030: mks| VNCENCODE 6 encoding mode change: (720x400x16depth,16bpp)
May 06 18:03:20.303: vcpu-0| Unknown int 10h func 0x2000
May 06 18:03:38.792: mks| VNCENCODE 7 encoding mode change: (640x480x16depth,16bpp)
May 06 18:03:38.792: mks| VNCENCODE 6 encoding mode change: (640x480x16depth,16bpp)
May 06 18:03:39.164: vcpu-1| CPU reset: soft
May 06 18:03:41.090: vcpu-0| SVGA: Unregistering IOSpace at 0x1060
May 06 18:03:41.090: vcpu-0| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
May 06 18:03:41.095: vcpu-0| SVGA: Registering IOSpace at 0x1060
May 06 18:03:41.096: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
May 06 18:17:33.227: vcpu-0| SVGA: Unregistering IOSpace at 0x1060
May 06 18:17:33.228: vcpu-0| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
May 06 18:17:33.230: vcpu-0| SVGA: Registering IOSpace at 0x1060
May 06 18:17:33.231: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
May 06 18:17:36.401: mks| VNCENCODE 7 encoding mode change: (800x600x16depth,16bpp)
May 06 18:17:36.401: mks| VNCENCODE 6 encoding mode change: (800x600x16depth,16bpp)
May 06 18:17:38.273: vcpu-0| Balloon: Start: vmmemctl reset balloon
May 06 18:17:38.273: vcpu-0| Balloon: Reset (n=2 pages=0)
May 06 18:17:38.273: vcpu-0| Balloon: Reset: nUnlocked=0 (size=0)
May 06 18:17:41.952: mks| MKS remote display status changed, enabling remote optimizations
May 06 18:17:42.778: vcpu-0| GuestRpc: Channel 0, registration number 1, guest application toolbox.
May 06 18:17:42.779: vcpu-0| DISKUTIL: scsi1:1 : toolsVersion = 7300
May 06 18:17:42.779: vcpu-0| DISKUTIL: scsi1:0 : toolsVersion = 7300
May 06 18:17:42.779: vcpu-0| DISKUTIL: scsi0:0 : toolsVersion = 7300
May 06 18:17:42.784: vcpu-0| TOOLS autoupgrade protocol version 2
May 06 18:17:42.800: vcpu-0| TOOLS ToolsCapabilityGuestTempDirectory received 1 C:\WINNT\TEMP
May 06 18:17:42.800: vcpu-0| TOOLS setting the tools version to '7300'
May 06 18:17:42.848: vcpu-0| TOOLS soft reset detected.
May 06 18:17:42.848: vcpu-0| DISKUTIL: scsi1:1 : toolsVersion = 7300
May 06 18:17:42.848: vcpu-0| DISKUTIL: scsi1:0 : toolsVersion = 7300
May 06 18:17:42.848: vcpu-0| DISKUTIL: scsi0:0 : toolsVersion = 7300
May 06 18:17:42.848: vcpu-0| TOOLS installed version 7300, available version 7300
May 06 18:17:42.848: vcpu-0| TOOLS don't need to be upgraded.
May 06 18:17:42.942: vcpu-0| Guest: toolbox: Version: build-82663
May 06 18:17:42.943: vcpu-0| TOOLS unified loop capability requested by 'toolbox'; now sending options via TCLO
May 06 18:19:07.004: vcpu-0| VMMouse: CMD Read ID
May 06 18:19:07.004: vcpu-0| MKS switching absolute mouse on
May 06 18:19:16.329: vcpu-0| TOOLS unified loop capability requested by 'toolbox-dnd'; now sending options via TCLO
May 06 18:19:16.329: vcpu-0| GuestRpc: Channel 1, registration number 1, guest application toolbox-dnd.
May 06 18:19:16.329: vcpu-0| DISKUTIL: scsi1:1 : toolsVersion = 7300
May 06 18:19:16.329: vcpu-0| DISKUTIL: scsi1:0 : toolsVersion = 7300
May 06 18:19:16.329: vcpu-0| DISKUTIL: scsi0:0 : toolsVersion = 7300
May 06 18:50:51.004: vcpu-1| TOOLS unified loop capability requested by 'toolbox-ui'; now sending options via TCLO
May 06 18:50:51.004: vcpu-1| GuestRpc: Channel 3, registration number 1, guest application toolbox-ui.
May 06 18:50:51.004: vcpu-1| DISKUTIL: scsi1:1 : toolsVersion = 7300
May 06 18:50:51.004: vcpu-1| DISKUTIL: scsi1:0 : toolsVersion = 7300
May 06 18:50:51.004: vcpu-1| DISKUTIL: scsi0:0 : toolsVersion = 7300
May 06 18:50:51.011: vcpu-1| TOOLS unified loop capability requested by 'toolbox-ui'; now sending options via TCLO
May 06 18:50:51.012: vcpu-1| GuestRpc: Channel 2, conflict: guest application toolbox-ui tried to register, but it is still registered on channel 3
May 06 18:50:51.012: vcpu-1| GuestRpc: Channel 2 reinitialized.
May 06 18:50:51.012: vcpu-1| GuestRpc: Channel 2 reinitialized.
May 06 18:50:54.419: vcpu-1| GuestRpc: Channel 3 reinitialized.
May 06 19:29:01.900: mks| SOCKET 7 recv error 110: Connection timed out
May 06 19:29:01.900: mks| SOCKET 7 destroying VNC backend on socket error: 110
May 06 19:29:02.208: mks| SOCKET 6 recv error 110: Connection timed out
May 06 19:29:02.208: mks| SOCKET 6 destroying VNC backend on socket error: 110
May 06 19:54:09.792: vcpu-1| Guest: toolbox: Got a logoff event.
May 06 19:54:09.821: vcpu-1| GuestRpc: Channel 1 reinitialized.
May 06 19:54:10.858: vcpu-0| Guest: toolbox: Got a logoff event.
May 06 19:54:13.296: mks| SOCKET 8 recv error 5: Input/output error
FYI
I have opened an SR for this and will let everyone know what we hear back...