Currently, our standard deployment of ESX servers and Windows guest VMs involves everything that the following article espouses:
In the guests, VMware Tools runs with the checkbox checked to sync time with the host, and the Windows Time service itself is stopped/disabled. These settings have been part of our Windows guest VM templates for about two years now.
ESX hosts are set up for NTP time sync (firewall port opened, step-tickers, ntp.conf, etc.). This is all per . The ESX hosts all sync with our chosen time server (which in turn has been verified as accurate).
We are a single Windows domain shop, and the PDC emulator (none of our domain controllers are virtual machines) syncs to the same time source as our ESX hosts. A check of our domain controllers shows their time is indeed in sync with the same NTP source as the ESX hosts.
All that said, I don't think many of us think to check on Windows guests unless a particular problem requires our attention, such as, oh I don't know, Kerberos authentication issues. Recently, one of our guest VMs had difficulty authenticating to the domain, and I found that, sure enough, its time was offset by about 6 minutes relative to the domain controller in its site. First, I checked the VMware Tools checkbox (checked, fine). Then I verified that the Windows Time service itself was stopped/disabled as per our standard practice; indeed it was. Next, I checked the time of the ESX host the VM was running on, and guess what? Its time was off by that same 6 minutes, down to the second. So VMware Tools time sync was indeed doing its job. That's good. Fortunately, the other two guests on that host were not as user-affecting as the one that led me to investigate all this.
The question, of course, becomes: why would the time on the ESX host, controlled by the NTP daemon, have such a large offset from the NTP source I configured it for? I immediately checked ALL of our ESX clusters/hosts, and out of 15 ESX hosts total, NINE of them had various offsets. NONE of my physical Windows hosts (whether domain members or domain controllers) had offsets as severe as these. Fortunately, all 9 offsets were between 7 seconds fast and 3.5 minutes slow, per the output of ntpdate -q (time source).
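Checking 15 hosts by hand is tedious, so here is a minimal sketch of a per-host check. It assumes the Service Console shell is bash and that `ntpdate -q` prints one `server ..., stratum N, offset X, delay Y` line per server (compare against your own output first). The function names and the 1-second default threshold are hypothetical, not part of ESX or the ntp package. `-q` only queries the server; it does not set the clock.

```shell
#!/bin/bash
# Hypothetical helpers, not part of ESX or ntp.
# Assumes "ntpdate -q" prints lines of the form:
#   server 10.1.254.5, stratum 2, offset -0.000123, delay 0.02565

# parse_ntp_offset [MAX_SECONDS]
# Reads ntpdate -q output on stdin and reports the first server offset.
# Prints "OK offset=..." or "DRIFT offset=...", or "NO-OFFSET" if no
# server line was found. Exit status: 0 = OK, 1 = DRIFT, 2 = NO-OFFSET.
parse_ntp_offset() {
    awk -v max="${1:-1}" '
        /^server .*offset/ && !done {
            for (i = 1; i < NF; i++) {
                if ($i == "offset") {
                    off = $(i + 1)
                    sub(/,$/, "", off)      # drop the trailing comma
                    off = off + 0           # force a numeric value
                    mag = (off < 0) ? -off : off
                    rc = (mag > max) ? 1 : 0
                    printf "%s offset=%.6f\n", (rc ? "DRIFT" : "OK"), off
                    done = 1
                    exit rc
                }
            }
        }
        END { if (!done) { print "NO-OFFSET"; exit 2 } }
    '
}

# check_ntp_offset SERVER [MAX_SECONDS]
# Queries SERVER without touching the local clock and classifies the offset.
check_ntp_offset() {
    ntpdate -q "$1" | parse_ntp_offset "${2:-1}"
}
```

For example, `check_ntp_offset 10.1.254.5 5` run on each host would flag any host more than 5 seconds away from that server, and the exit status makes it easy to use from a script.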
A simple service ntpd restart on the affected ESX hosts has resolved this, for now, until they drift again over time(?!). They all use the same ntp.conf and step-tickers files, yet 6 of the 15 had no (or negligible) offset, in the 0.0001/2/3/4... second range. On all hosts, the NTP service is configured to start/stop with the system, and it was certainly running on every one of them. If ALL hosts had some sort of offset, I'd actually feel more comfortable about this, but as noted, 6 did not (yet they are all configured the same to the best of my knowledge; I built them all and have set standards for NTP config, etc.).
Now I'll have to hand-check every single VM and make sure that its time, if off, is set EARLIER than the host's, so that VMware Tools will sync the time correctly. I get to check about 95 VMs. Hooray.
All hosts are ESX 3.0.2. They don't have every single patch for 3.0.2, but I don't think any of the patches released for 3.0.2 deal with drift in the service console's time.
Short of setting up a cron job to restart the NTP daemon every day, does anyone know of drift issues on the hosts? Have you checked your hosts lately for offset? How about your Windows guests syncing with your hosts via VMware Tools? While I'm at it, does anyone have Linux guests (with VMware Tools installed)? The time on my one Red Hat Enterprise Linux 4 guest is off as well. Thanks.
jftwp,
The log messages indicate problems with your ntp.conf file. You need to add a "restrict 127.0.0.1" line so that the NTP daemon can contact your DNS resolver on the Service Console. Also, the NTP daemon doesn't run as root for security reasons, and does not have permission to write into the /etc/ntp directory. You should place your NTP drift file in /var/lib/ntp, or another directory where the "ntp" user has write permission. Finally, it appears that your ntp.conf file wasn't created on the ESX Service Console, so the line terminators are DOS-based, rather than Unix/Linux-based. Make sure you transfer ntp.conf from your Windows box to your ESX server as "text" rather than "binary".
Joe
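Joe's last point can be checked and fixed directly on the Service Console. A minimal sketch, assuming bash plus the standard `awk` and `tr` utilities; both function names are hypothetical. `count_cr_lines` reports how many lines end in a carriage return (a DOS line ending), and `strip_cr` rewrites the file with Unix line endings, writing back through `cat` so the file keeps its existing owner and permissions.

```shell
#!/bin/bash
# Hypothetical helpers for DOS (CRLF) line endings in config files.

# count_cr_lines FILE: print how many lines of FILE end in a carriage return.
count_cr_lines() {
    awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}

# strip_cr FILE: remove every carriage return from FILE, in place.
# Writing back with "cat >" keeps FILE's existing owner and permissions.
strip_cr() {
    local tmp="$1.tmp.$$"
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1" && rm -f "$tmp"
}
```

For example, `count_cr_lines /etc/ntp.conf` should print 0; if it does not, run `strip_cr /etc/ntp.conf` (and the same for `/etc/ntp/step-tickers`), then `service ntpd restart`. `cat -v /etc/ntp.conf` is another quick check, since carriage returns show up as `^M`.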
Here's something else I've noticed when searching the /var/log/messages file for clues...
On both 'offset' AND 'non-offset' servers, I get a variety of messages. Here's a sampling of results of grep -i ntp /var/log/messages, taken from various hosts. Something is clearly wrong.
Jan 4 05:15:14 sfesxdev1 ntpd[923]: can't open /etc/ntp/drift.TEMP: Permission denied
Jan 3 23:45:53 sfesxdev1 ntpd[923]: synchronisation lost
Jan 4 14:08:49 nyesx1 ntpd[10917]: precision = 6 usec
Jan 4 14:08:49 nyesx1 ntpd[10917]: kernel time discipline status 0040
... ntpd[10917]: configure: keyword "notrap^M" unknown, line ignored
... ntpd[10917]: configure: keyword "^M" unknown, line ignored
... ntpd[10917]: configure: keyword "^M" unknown, line ignored
... ntpd_initres[10920]: couldn't resolve `10.4.254.5^M', giving up on it
... ntpd_initres[10920]: couldn't resolve `10.1.254.5^M', giving up on it
... ntpd[10917]: can't open /etc/ntp/drift^M.TEMP: Permission denied
... ntpd[10917]: can't open /etc/ntp/drift^M.TEMP: Permission denied
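Log lines like the last several above get mangled on screen because each contains a carriage return: the text after the CR overwrites the start of the line, hiding the timestamp and host. Piping the matches through `cat -v` shows the CR as `^M` instead. A minimal sketch, assuming the default `/var/log/messages` location; the function name is hypothetical and the three search patterns are simply the symptoms quoted above.

```shell
#!/bin/bash
# Hypothetical helper: ntp_log_symptoms [LOGFILE]
# Lists ntpd/ntpd_initres lines matching the symptoms seen in this thread,
# with control characters made visible (a carriage return prints as ^M).
ntp_log_symptoms() {
    grep -E 'ntpd.*(Permission denied|unknown, line ignored|giving up on it)' \
        "${1:-/var/log/messages}" | cat -v
}
```

Any `^M` in the output points back at DOS line endings in `/etc/ntp.conf` or `/etc/ntp/step-tickers`.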
This is the procedure I use, and it works every single time. I've entered your NTP servers; try it on one ESX server and see if it fixes the problem:
Edit /etc/ntp.conf:
restrict 127.0.0.1
restrict default kod nomodify notrap
server 10.1.254.5
server 10.4.254.5
driftfile /var/lib/ntp/drift
Edit /etc/ntp/step-tickers:
10.1.254.5
10.4.254.5
Then type the following at the CLI:
esxcfg-firewall --enableService ntpClient
service ntpd restart
chkconfig --level 345 ntpd on
hwclock --systohc
Duncan
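Duncan's two files can also be generated from the Service Console shell itself, which sidesteps the DOS line-ending problem Joe describes. A minimal sketch under that assumption: `write_ntp_config` is a hypothetical helper that takes the two file paths followed by the server addresses and writes exactly the directives listed above. Back up the existing files before overwriting them.

```shell
#!/bin/bash
# Hypothetical helper: write_ntp_config CONF_FILE TICKERS_FILE SERVER...
# Writes the ntp.conf and step-tickers contents from the procedure above.
# Files created this way on the Service Console get Unix (LF) line endings.
write_ntp_config() {
    local conf="$1" tickers="$2"
    shift 2
    {
        echo "restrict 127.0.0.1"
        echo "restrict default kod nomodify notrap"
        printf 'server %s\n' "$@"        # one server line per address
        echo "driftfile /var/lib/ntp/drift"
    } > "$conf"
    # step-tickers: one server per line
    printf '%s\n' "$@" > "$tickers"
}
```

On a host, running `write_ntp_config /etc/ntp.conf /etc/ntp/step-tickers 10.1.254.5 10.4.254.5` as root would be followed by the `esxcfg-firewall`, `service`, `chkconfig` and `hwclock` commands above.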
Thank you. I'll test this and follow up with results after a week of running with the changes, checking for offsets, permission errors, etc.
Did the fix work for you?
I have a "virtually" identical situation, but with VMs consistently running fast by 1-5 minutes. VMware Tools is set to sync with the host, and the Windows Time service is disabled. The difference is that my hosts are not out of sync with their time source by more than 0.0005 seconds. I've read all the documentation and still found no resolution for a fast-running VM. We don't have any other time sync running on the VMs that would cause the jump in time. Has anything changed since 3.0.2 build 63195 to fix this?
This worked for me:
Release found: Red Hat Enterprise Linux 3
Resolution:
This error is caused by an incorrectly configured /etc/ntp.conf file. In earlier versions of Red Hat Enterprise Linux, the drift file was located in the /etc/ntp directory, owned by root. As the ntp daemon does not run as root, it cannot create a new drift file. The preferred location for the drift file in the newer releases of Red Hat Enterprise Linux is the /var/lib/ntp directory.
To correct this error, change the line in /etc/ntp.conf that reads:
driftfile /etc/ntp/drift
to:
driftfile /var/lib/ntp/drift
Richard J Minick, VCP
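The one-line change quoted above can be scripted as well. A minimal sketch, assuming standard `sed` on the Service Console; `fix_driftfile` is a hypothetical name. It rewrites an existing `driftfile` line to point at `/var/lib/ntp/drift` and leaves every other line alone.

```shell
#!/bin/bash
# Hypothetical helper: fix_driftfile [CONF_FILE]
# Points the driftfile directive at /var/lib/ntp/drift (default file:
# /etc/ntp.conf). Lines other than "driftfile ..." are left unchanged.
fix_driftfile() {
    local conf="${1:-/etc/ntp.conf}"
    local tmp="$conf.tmp.$$"
    sed 's|^driftfile[[:space:]].*|driftfile /var/lib/ntp/drift|' "$conf" > "$tmp" &&
        cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

Afterwards, make sure /var/lib/ntp exists and is writable by the daemon (for example `mkdir -p /var/lib/ntp` and `chown ntp /var/lib/ntp`, assuming ntpd runs as a user named ntp as described earlier in the thread), then run `service ntpd restart`.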