(2008-03-19) Windows Time Service
Posted by Jorge on 2008-03-19
In addition to a previous post I did, I would like to point you to a Microsoft blog about the Windows Time Service (W32TIME). That blog contains interesting information. My favorite posts on that blog are:
- Keeping the Domain On Time (Explaining how Windows Time Service works in an AD forest)
- Configuring the Time Service: NtpServer and SpecialPollInterval (Explaining specific W32TIME configurations)
- Configuring the Time Service: Enabling the Debug Log (Explaining how to use the debug log for W32TIME)
- Configuring the Time Service: Max[Pos/Neg]PhaseCorrection (Explaining how to protect yourself if your external/internal clock goes crazy by jumping back/forward in time)
You can use the links above, but you can also read the information below, which was copied from that Microsoft blog about the Windows Time Service (W32TIME). All credit goes to Ryan Sizemore, who wrote the original posts.
Keeping the Domain On Time
(Explaining how Windows Time Service works in an AD forest)
Windows Time Service on a domain (referred to as ‘Domain Synchronization’ or ‘Domain Sync’ for short) is a huge topic. I will do my best to cover all of its aspects in this article, but some concepts won’t be covered until a later date, and others still relate directly to the original RFC for NTP.
As I stated in my previous post, the original reasons for developing w32time stemmed from the requirements imposed by Kerberos. In order for Kerberos to function securely, the time difference between the participating machines needs to be less than five minutes. In time, other components have come to rely on w32time, including Active Directory Replication and Windows Update. In a Windows domain, w32time needs to keep machines synchronized, and it needs to do so in a quick, efficient, and quiet manner.
The NTP protocol described in the RFC goes a long way toward designing a robust time synchronization solution. But in the end, what we are really interested in is just that: the solution. Keeping time synchronized between two machines is possible, but the solution needs to be more robust to deal with computers belonging to a domain. In particular, w32time works to answer these questions (just to name a few):
- How do we ensure that in a large network of computers, an efficient chain of time sources is picked?
- How do we auto-configure so that an administrator has to do a minimal amount of work to set it up?
- How do we keep it secure and still auto-configurable?
- How do we allow administrators to get a look at what is happening?
- How do we alert the administrators when something goes wrong?
These questions are important, specifically in the domain scenario (as opposed to the home user scenario), since the needs of the home user and the needs of the domain user are quite different.
Designing Inside the Box
Because many components within Windows depend on w32time to keep the clock synchronized, w32time can take hardly any dependencies itself. If w32time relied on component X to do something fancy, and component X relied on Kerberos, then we would have a problem, since Kerberos relies on w32time. This would create a circular dependency and, well, that’s a bad thing.
For this reason, w32time has a simplified mechanism to authenticate time syncs. More information on the authentication mechanism will be covered in a future post.
The first issue to address is finding someone to synchronize with. Each machine needs to sync with another machine to get its time. To do this efficiently and automatically, w32time uses the domain hierarchy created with the domain itself. In the simplest terms, a domain consists of the following distinct entities (aka computers):
- Exactly one primary domain controller (or PDC-emulator)
- Zero or more replica domain controllers (DCs)
- Zero or more member computers (either servers or workstations)
The inner workings of what a domain is and how it operates are beyond the scope of this post, but this should be enough to provide the groundwork for our discussion.
Time Source Selection
Each member of the domain follows a different set of requirements, based on its role. Let’s take a look at those roles:
- Primary Domain Controller – This machine is the authoritative time source for a domain. It will have the most accurate time available in the domain, and must sync with a DC in the parent domain (except in special cases).
- Replica Domain Controller – This machine will act as a time source for clients and member servers in the domain. A DC can sync with the PDC of its own domain, or a DC or PDC in the parent domain.
- Clients/Member Servers – This machine can sync with any DC or PDC of its own domain, or a DC or PDC in the parent domain.
These are the default rules of where a machine can go looking for a time source. Keep in mind that there are corner cases where the rules can be bent a little. A few additional rules:
- A machine can only look for a time source in its own domain or the parent domain. A machine will never go to a domain on a parallel level, or a "skip-level" parent domain.
- Within a domain, a machine cannot sync with its own kind. A DC cannot sync with another DC. A client cannot sync with another client.
Also, you may have noticed that a PDC can only sync from a DC or PDC in the parent domain. Well, what if you are in the parent domain already? This is a special case, which is detailed below in the section "Special Case: The Root PDC".
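The default eligibility rules above can be written as a small predicate. This is a minimal Python sketch under stated assumptions, not w32time’s actual code: the `Domain` and `Machine` classes and the role labels `"pdc"`, `"dc"` and `"member"` are hypothetical, sites and "reliable" machines are left out, and the corner cases mentioned above are ignored. Note that the PDC of the forest root domain ends up with no eligible source at all, which is exactly the special case described in "Special Case: The Root PDC".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Domain:
    name: str
    parent: Optional["Domain"] = None  # None for the forest root domain


@dataclass(frozen=True)
class Machine:
    name: str
    role: str  # hypothetical labels: "pdc", "dc" (replica DC) or "member"
    domain: Domain


def is_eligible(me: Machine, candidate: Machine) -> bool:
    """Apply the default time-source rules; corner cases are ignored."""
    if candidate == me or candidate.role == "member":
        return False  # in this model only DCs and PDCs serve time
    same_domain = candidate.domain == me.domain
    parent_domain = me.domain.parent is not None and candidate.domain == me.domain.parent
    if me.role == "pdc":
        return parent_domain  # a PDC looks only to the parent domain
    if me.role == "dc":
        # a replica DC may use its own PDC, or any DC/PDC in the parent domain
        return (same_domain and candidate.role == "pdc") or parent_domain
    # clients and member servers: any DC/PDC in their own or the parent domain
    return same_domain or parent_domain


parent = Domain("Parent")
left = Domain("Left", parent)
foo = Machine("foo", "member", left)
print(is_eligible(foo, Machine("PDC Emulator", "pdc", parent)))  # True
```

The "skip-level" and "parallel domain" rules fall out naturally here: a candidate is only considered if its domain equals the machine’s own domain or its direct parent.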
The time source selection mechanism works great to enumerate the possible machines to sync from. The problem is that this usually leaves more than one machine as a possible partner. We need a way to pick the "best" one of the group, and that is what scoring does for us.
Each possible machine is given a score, based on certain criteria. Once all of the candidates have a score, w32time simply chooses the machine with the highest score. Here is what the scoring looks like:
- 8 points if the machine is in-site
- 4 points if the machine is "reliable"
- 2 points if the machine is in the parent domain
- 1 point if the machine is a PDC (or PDC emulator)
So why are these points given? Let’s look at the rules individually. Machines that are in the same site as the one in question have the best chance of providing us with good time.
- Machines that are out of site are probably physically separated in one way or another, and would likely introduce network delay.
- A machine that is "reliable" is pre-configured to be directly connected to a reliable time source, such as a GPS or atomic clock. These devices provide very accurate, very stable time samples. If a machine is configured to sync directly with one of these devices, a registry value can be changed to indicate that this machine will be a source of reliable time.
- A machine higher in the forest will be closer to the root, and hence will have more accurate time than a machine in the current domain.
- A PDC (or PDC-emulator) will be more accurate than a DC in the same domain because it is guaranteed to sync with a machine in the parent domain.
From this, we can derive a score for each machine, and then choose the machine with the highest score.
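The scoring rules translate directly into a weighted sum. The following Python sketch is illustrative only: the `Candidate` class and its field names are hypothetical, and the tie-breaking behavior (Python’s `max()` keeps the first candidate it sees) is an assumption, since tie-breaking is not covered here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    in_site: bool           # in the same AD site as the machine doing the choosing
    reliable: bool          # configured as a "reliable" time source
    in_parent_domain: bool
    is_pdc: bool            # PDC or PDC emulator


def score(c: Candidate) -> int:
    """Sum the points: 8 in-site, 4 reliable, 2 parent domain, 1 PDC (max 15)."""
    return 8 * c.in_site + 4 * c.reliable + 2 * c.in_parent_domain + 1 * c.is_pdc


def best_source(candidates: list[Candidate]) -> Candidate:
    # max() keeps the first candidate on a tie; real tie-breaking is not modeled
    return max(candidates, key=score)
```

Because each weight is larger than the sum of all smaller ones (8 > 4 + 2 + 1, 4 > 2 + 1, 2 > 1), the criteria behave like a strict priority order: an in-site candidate always beats an out-of-site one, whatever its other attributes.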
When a machine boots up, it will go looking for a time source. Depending on its role, it will be required to choose from a subset of possible machines to sync with. But how do we prioritize between the available choices? Let’s take a look at the following example:
Example 1: This example is based on a forest diagram with three domains, referred to as the "Left Domain", the "Right Domain", and the "Parent Domain". The Parent Domain is the parent of the Left Domain, and the Right Domain sits on the same level as the Left Domain.
Computer foo has just been joined to the Left Domain as a regular client (not a DC), and is booting up for the first time on a domain. First, we need to enumerate which machines are possible sync partners. We will look at each machine to see if it is a valid choice.
- "Domain Controller" [Left Domain] is a DC in the same domain, so it is a valid choice
- "PDC Emulator" [Left Domain] is a PDC in the same domain, so it is a valid choice
- "Domain Controller" [Parent Domain] is a DC in the parent domain, so it is a valid choice
- "PDC Emulator" [Parent Domain] is a PDC in the parent domain, so it is a valid choice
Which machines aren’t valid? Let’s take a look (and find out why):
- "Workstation" [Left Domain] is not a DC, so it is not a valid choice
- "Server" [Left Domain] is not a DC, so it is not a valid choice
- "Workstation" [Parent Domain] is not a DC, so it is not a valid choice
- "Server" [Parent Domain] is not a DC, so it is not a valid choice
- Anything in the [Right Domain] is not in the same domain, and not in the parent domain, so it is not a valid choice
Ok, so we have our possible choices, but now we need to prioritize them to pick the best one. To do this, we will utilize the scoring system. Assuming that our entire forest is in one site, and we don’t have any machines configured as "reliable":
- "Domain Controller" [Left Domain] Score = 8
- "PDC Emulator" [Left Domain] Score = 8 + 1 = 9
- "Domain Controller" [Parent Domain] Score = 8 + 2 = 10
- "PDC Emulator" [Parent Domain] Score = 8 + 2 + 1 = 11
So there we have it. The PDC in the parent domain will be our time source. But what if the [Left Domain] was put into a separate site?
Example 2: Assume the same scenario as the above example, except that the [Left Domain] exists in a different site from the rest of the forest. We will use the same logic applied above to determine a time source.
So the [Left Domain] is in a different site. Since the first part of time source selection does not take site location into consideration, we will get the same possible machines to sync with. However, the scoring system will give us a different machine when all is said and done. Let’s look at how the scoring would now occur:
- "Domain Controller" [Left Domain] Score = 8
- "PDC Emulator" [Left Domain] Score = 8 + 1 = 9
- "Domain Controller" [Parent Domain] Score = 2
- "PDC Emulator" [Parent Domain] Score = 2 + 1 = 3
Because the DC and PDC in the [Parent Domain] are in a different site, they don’t get the +8 to their score. This leaves us with the PDC of the current domain (the [Left Domain]), with a score of 9. But where does the PDC of the [Left Domain] get its own time?
Example 3: Assume the same scenario as Example 2, but this time we are choosing a time source for the PDC of the [Left Domain] itself. Again, we will use the same logic applied above to determine a time source.
With the left domain in a different site from the rest of the forest, and with the PDC of the [Left Domain] being the authoritative time source for the [Left Domain], we will need to go out of site for a time source – we have no other choice. So we will look at the scores for the various eligible time sources:
- "Domain Controller" [Parent Domain] Score = 2
- "PDC Emulator" [Parent Domain] Score = 2 + 1 = 3
We cannot sync with any time sources in our own domain, so we only have the time sources from the [Parent Domain]. The scoring will give us the PDC of the [Parent Domain].
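Putting eligibility and scoring together reproduces the three examples. In this minimal Python sketch the forest layout, the machine names, the contents of the [Right Domain], and the `site_of` mapping (one site per domain) are hypothetical simplifications of the scenario described above; no machine is marked "reliable".

```python
from typing import Callable, Optional

# Hypothetical forest from the examples: domain -> parent domain (None = forest root).
PARENT_OF = {"Left": "Parent", "Right": "Parent", "Parent": None}

# Hypothetical machines as (name, role, domain); roles are "pdc", "dc" or "member".
MACHINES = [
    ("DC [Left]", "dc", "Left"), ("PDC [Left]", "pdc", "Left"),
    ("Workstation [Left]", "member", "Left"), ("Server [Left]", "member", "Left"),
    ("DC [Parent]", "dc", "Parent"), ("PDC [Parent]", "pdc", "Parent"),
    ("Workstation [Parent]", "member", "Parent"), ("Server [Parent]", "member", "Parent"),
    ("DC [Right]", "dc", "Right"), ("PDC [Right]", "pdc", "Right"),
]


def eligible(role: str, domain: str, cand_role: str, cand_domain: str) -> bool:
    """Default rules: only DCs/PDCs, only in the own or the parent domain."""
    if cand_role == "member":
        return False
    same = cand_domain == domain
    parent = cand_domain == PARENT_OF[domain]
    if role == "pdc":
        return parent
    if role == "dc":
        return (same and cand_role == "pdc") or parent
    return same or parent


def pick(role: str, domain: str, site_of: Callable[[str], str],
         reliable: frozenset = frozenset()) -> tuple[Optional[str], dict[str, int]]:
    """Score every eligible machine and return (best name, all scores)."""
    scores = {}
    for name, cand_role, cand_domain in MACHINES:
        if not eligible(role, domain, cand_role, cand_domain):
            continue
        scores[name] = (8 * (site_of(cand_domain) == site_of(domain))
                        + 4 * (name in reliable)
                        + 2 * (cand_domain == PARENT_OF[domain])
                        + 1 * (cand_role == "pdc"))
    best = max(scores, key=scores.get) if scores else None
    return best, scores


def one_site(domain: str) -> str:
    return "Site A"  # Example 1: the whole forest in one site


def left_apart(domain: str) -> str:
    return "Site B" if domain == "Left" else "Site A"  # Examples 2 and 3


print(pick("member", "Left", one_site))    # client foo, Example 1
print(pick("member", "Left", left_apart))  # client foo, Example 2
print(pick("pdc", "Left", left_apart))     # PDC of the Left Domain, Example 3
```

The three calls select the parent PDC (scores 8, 9, 10, 11), the Left Domain’s own PDC (scores 8, 9, 2, 3), and the parent PDC again (scores 2, 3), matching the examples. Calling `pick("pdc", "Parent", one_site)` returns no source at all, which is the root PDC special case.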
Plan B: Fail over
So what happens when things don’t go as planned? Windows Time Service has been built to handle fail over situations from the beginning. For a generic example, assume that a client is currently synchronizing with a time source. If the time source goes away for one reason or another, the client will need to go looking for another time source.
For this reason, we use the scoring system illustrated above. The client will reassess the available time sources, score each of them, and choose the best one. Since the previous time source (which was probably the best first choice) has gone away, w32time will pick the next highest scoring time source.
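Fail-over then amounts to re-running the selection without the source that went away. This minimal Python sketch assumes the candidate scores have already been computed (the names and scores below reuse the first example for illustration); the function name is hypothetical, and how w32time detects that a source is gone is not modeled.

```python
from typing import Optional


def reselect(scores: dict[str, int], gone: set[str]) -> Optional[str]:
    """Pick the highest-scoring source that is still reachable, or None."""
    remaining = {name: s for name, s in scores.items() if name not in gone}
    if not remaining:
        return None  # nothing left to fail over to
    return max(remaining, key=remaining.get)


scores = {"DC [Left]": 8, "PDC [Left]": 9, "DC [Parent]": 10, "PDC [Parent]": 11}
print(reselect(scores, {"PDC [Parent]"}))  # falls back to the next best: DC [Parent]
```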
Special Case: The Root PDC
The PDC for the domain at the root of the forest (the root PDC) poses a problem. Since it has no time sources that are more authoritative than it, it cannot choose a time source automatically. Thus, the administrator will need to set one up manually, or the domain will operate in a "standalone" mode. In the case of a standalone domain, the root PDC will still be the authoritative time source, but its time will come from its own clock.
We have taken a look at how w32time operates in a domain at a very high level. Future posts will dive deeper into specific areas of w32time, and this will provide a groundwork for those other articles. If you have specific thoughts or questions about this post, please feel free to leave a comment. For general questions about w32time, especially if you have problems with your w32time setup, I encourage you to ask them on the Windows Vista Applications section of the Microsoft Technet forums. One way or another, questions posted there should make their way to my inbox, and I will do my very best to answer them.
Configuring the Time Service: NtpServer and SpecialPollInterval
(Explaining specific W32TIME configurations)
One of the most talked about configuration options for W32Time has to be the list of time sources that W32Time connects to for synchronization. It is important to note that W32Time will only actively synchronize with one time source at a time, even though you are able to list more than one time source. The reason for allowing a list is simple: if your favorite time source goes down, it is good to have a backup, or possibly a list of backups.
W32Time configures the list of time sources through the following key:
The NtpServer key is a space-delimited list of time servers, given either as DNS names or as IP addresses. Each server in the list can optionally have a set of flags, which are denoted as a hex value appended to the address and separated from it by a comma. We will get to the flags in a moment. Here are two examples of NtpServer values:
time.windows.com,0x09
time.windows.com,0x09 time.nist.gov,0x01 my.time.server.com,0x02
In the first example, we are specifying a time source of time.windows.com, with the 0x01 and 0x08 flags. In the second example, we are specifying 3 time sources, each with a different set of flags (0x01 & 0x08; 0x01; 0x02 respectively).
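To make the flag notation concrete, here is a minimal Python sketch that splits an NtpServer value into peers and decodes each hex flag value into flag names. The bit-to-name mapping (0x01 SpecialInterval, 0x02 UseAsFallbackOnly, 0x04 SymmetricActive, 0x08 Client) follows Microsoft’s W32Time documentation; the function itself is hypothetical and does no validation.

```python
# Flag bits for NtpServer entries, as documented by Microsoft for W32Time.
NTP_FLAGS = {
    0x01: "SpecialInterval",
    0x02: "UseAsFallbackOnly",
    0x04: "SymmetricActive",
    0x08: "Client",
}


def parse_ntp_server(value: str) -> list[tuple[str, list[str]]]:
    """Split a space-delimited NtpServer value into (peer, [flag names]) pairs."""
    peers = []
    for entry in value.split():
        host, _, flag_text = entry.partition(",")
        flags = int(flag_text, 16) if flag_text else 0  # "0x09" -> 9
        names = [name for bit, name in NTP_FLAGS.items() if flags & bit]
        peers.append((host, names))
    return peers


for peer, names in parse_ntp_server(
        "time.windows.com,0x09 time.nist.gov,0x01 my.time.server.com,0x02"):
    print(peer, names)
```

This shows why 0x09 means "0x01 and 0x08": the flags are bits that are simply added (OR-ed) together.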
Now let’s take a look at the flags. We have 4 possible flags:
- 0x01 – SpecialInterval
- 0x02 – UseAsFallbackOnly
- 0x04 – SymmetricActive
- 0x08 – Client
For 99% of cases, we only care about the first two options, so that is where we will focus. If you use the SpecialInterval flag, then you need to also set the "SpecialPollInterval" key:
Normally, W32Time will poll (make a time request) on a floating interval, based on the quality of the time samples being returned by the time source. You can, however, specify a static interval on which the time service will synchronize. This value is in seconds. For example, if you set a SpecialPollInterval of 3600, the time service will synchronize every hour (60 minutes * 60 seconds).
The second flag is the UseAsFallbackOnly option. Setting this flag will tell the time service that you want to try every other time server specified before trying this one.
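The two flags we focused on can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the function names are hypothetical, regular peers are assumed to be tried in the order they are listed, and the floating poll interval is not modeled (it is represented as `None`).

```python
from typing import Optional

SPECIAL_INTERVAL = 0x01
USE_AS_FALLBACK_ONLY = 0x02


def try_order(peers: list[tuple[str, int]]) -> list[str]:
    """Regular peers first (in listed order), UseAsFallbackOnly peers last."""
    regular = [host for host, flags in peers if not flags & USE_AS_FALLBACK_ONLY]
    fallback = [host for host, flags in peers if flags & USE_AS_FALLBACK_ONLY]
    return regular + fallback


def fixed_poll_seconds(flags: int, special_poll_interval: int) -> Optional[int]:
    """SpecialPollInterval applies only with SpecialInterval; None = floating interval."""
    return special_poll_interval if flags & SPECIAL_INTERVAL else None


peers = [("my.time.server.com", 0x02), ("time.windows.com", 0x09), ("time.nist.gov", 0x01)]
print(try_order(peers))
print(fixed_poll_seconds(0x09, 60 * 60))  # 3600 seconds: one poll per hour
```

Even though my.time.server.com is listed first here, its UseAsFallbackOnly flag moves it to the end of the list.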
That wraps up this one. As usual, if you have specific thoughts or questions about this post, please feel free to leave a comment. For general questions about w32time, especially if you have problems with your w32time setup, I encourage you to ask them on the Directory Services section of the Microsoft Technet forums.
Configuring the Time Service: Enabling the Debug Log
(Explaining how to use the debug log for W32TIME)
The debug log is a powerful tool in the W32Time bag of tricks when you need to figure out why something isn’t working. The debug log tells you (for better or worse) what the Time Service is doing under the hood: where it is connecting, how long it waits between polls, and so on.
In Windows Vista/Server 2008, we added the /debug option to the w32tm.exe command. This is the quickest and easiest way to configure debug logging, and should be used if possible. A secondary option (if you are running XP/W2k3) is to edit the settings in the registry. Both will have the same effect, but using the w32tm command will keep you from having to get your hands dirty with registry editing. We will take a look at the w32tm command first:
Using the w32tm.exe command
To enable the w32time debug logging:
w32tm /debug /enable /file:C:\windows\temp\w32time.log /size:10000000 /entries:0-300
The command uses the following options:
- /debug – This tells w32tm that you will be changing the debug log settings
- /enable – We are turning on the debug log (as opposed to turning it off)
- /file – Here we are specifying the full path of where the log file will be created; in this case: "C:\windows\temp\w32time.log"
- /size – The maximum size of the log file, in bytes; in this case, 10,000,000 bytes (10 MB). When the log is full, the w32time service will wrap to the top of the log file.
- /entries – This field is a mask, where you can mask off certain types of entries. More about this later.
Turning off the debug log is just as easy:
w32tm /debug /disable
Using the registry
In essence, the w32tm.exe command shown above does exactly what we are about to do here. The only real difference is that when you use w32tm, it handles the reloading of the config, which will actually apply the values found in the registry. Since we will now be making the changes ourselves, we will need to reload the config ourselves.
Note: If you just want a quick .reg file that you can modify and merge, skip to the bottom of this post.
To get started, fire up the Windows registry editor:
Start -> Run -> Regedit.exe
Next, browse to the w32time config key, where we keep all of the w32time configuration:
Here, you will be creating the following three keys (if they do not exist):
- FileLogName (REG_SZ)
- FileLogSize (REG_DWORD)
- FileLogEntries (REG_SZ)
Once they are created, go ahead and add the values that you want.
- FileLogName should point to the full path where you want to store the log file. C:\windows\temp is the preferred location. Just ensure that a service running as LOCAL_SYSTEM has write access to the directory.
- FileLogSize should be the maximum size of the log file, in bytes. Remember to convert to hex as needed: 10 MB (10,000,000 bytes) is 0x989680 in hex.
- FileLogEntries is a numerical mask of the entries that you want to have logged in the log file. Each number in the range 1 – 300 represents a particular logging entry, such as polling intervals, packets received, etc. For the sake of simplicity, you should enable all logging; masking is really only useful if you need to track a particular entry over a long period of time and you don’t want all of the other logging to clobber your file. Using 0-300 will guarantee that everything possible will be logged.
Once you apply the changes to the registry, you need to tell the w32time service that it needs to re-read the configuration information. To do this, you can use the following command:
w32tm /config /update
Example .reg file
Here is an example .reg file you can modify to simplify the process:
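As a minimal sketch, a short Python function can produce such a .reg file from the three values described above. The function name is hypothetical, and the key path is deliberately left to the caller: pass the w32time config key you browsed to in regedit. The "Windows Registry Editor Version 5.00" header, bracketed key line, doubled backslashes in strings, and `dword:` hex notation are standard .reg syntax.

```python
def debug_log_reg(key_path: str, log_file: str, size_bytes: int,
                  entries: str = "0-300") -> str:
    """Build the text of a .reg file that sets the three debug-log values."""
    escaped = log_file.replace("\\", "\\\\")  # .reg strings double each backslash
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key_path}]",
        f'"FileLogName"="{escaped}"',
        f'"FileLogSize"=dword:{size_bytes:08x}',  # 10,000,000 -> 00989680
        f'"FileLogEntries"="{entries}"',
        "",
    ]
    return "\r\n".join(lines)  # .reg files conventionally use CRLF line endings
```

For example, `debug_log_reg(config_key, r"C:\windows\temp\w32time.log", 10_000_000)` yields a file you can save with a .reg extension and merge; afterwards, run `w32tm /config /update` so the service re-reads its configuration.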
As usual, if you have specific thoughts or questions about this post, please feel free to leave a comment. For general questions about w32time, especially if you have problems with your w32time setup, I encourage you to ask them on the Directory Services section of the Microsoft Technet forums.
Configuring the Time Service: Max[Pos/Neg]PhaseCorrection
(Explaining how to protect yourself if your external/internal clock goes crazy by jumping back/forward in time)
In the last few months of Windows Server 2008 development, a good friend of mine was discussing a problem they had been seeing with customers. The problem, lovingly titled the "Large Time Jump" issue, involves a machine in the domain (usually the PDC) making a large jump in time, either forward or backward. Regardless of the direction of the jump, the results are equally catastrophic.
How it all happens
Let’s look at how this can happen. Some of the causes are more likely than you think. Here is a quick list:
- Hardware changes. It is quite common for a company to have a hardware failure on a DC, even the PDC. Assume a situation where the motherboard in the PDC fails. After the new hardware is installed, the technician boots the machine back up. As hoped, the machine looks to be running just fine. However, the part that the manufacturer shipped out is considered an "after market" part, so the BIOS isn’t configured correctly. By default, the BIOS date is set to the manufacture date of the motherboard, which is sometime in the past. Of course, when the technician gets the new part, he sees that everything looks to be in order, but he neglects to notice that the date is wrong.
- Bad external time source. Sometimes, the time source that you are syncing the PDC with (such as a network device, like a high-end router) can get a wild hair and jump to an invalid time.
- Bad CMOS battery. Every once in a while, the BIOS battery in a computer will fail. After all, they don’t last forever. If the PDC isn’t configured to sync with an external time source, it will by default use its own internal clock, which is based on the BIOS clock. A computer cannot maintain its time across a reboot by itself, so it stores the current time in the BIOS. If the CMOS battery has failed, the time after the reboot will be incorrect.
- User error. It is not outside of the realm of possibility for someone to log onto the root PDC and change the clock. It sounds unlikely, but it is still possible.
Fixing the glitch
The real problem here is that in a domain environment, domain controllers completely and utterly trust the time that they get from another DC. The root PDC is a special case, but it still counts.
The solution is to do a "sanity check" on the time that any domain controller gets from anywhere. In this way, you are ensuring that if a domain controller gets out of whack, it will not spread that time to other DCs. This is done by setting the MaxPosPhaseCorrection and MaxNegPhaseCorrection values.
MaxPosPhaseCorrection and MaxNegPhaseCorrection limit the allowable offset taken from a time sample. When any instance of w32time polls another machine for the time, it will determine the offset between the time source and itself. This value is known as the "Sample Offset". Before the sample is used by the time service, it will be compared to the phase correction limits. If the sample offset is greater than the phase correction limit, then the sample will be thrown out and a "TOO BIG" event will be generated. The event contains all of the information about the time sample, including who sent it. The purpose of doing this is to isolate domain controllers in the network that get into a bad time state. In this way, the other DCs will log an error about the time samples being too big rather than blindly accepting them.
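The sanity check boils down to comparing the sample offset against one of the two limits, depending on the direction of the correction. This Python sketch is illustrative only: the function name, the sign convention for the offset, and expressing both limits in seconds are assumptions, and the "TOO BIG" event is reduced to a `False` return value.

```python
def accept_sample(offset_seconds: float, max_pos: int, max_neg: int) -> bool:
    """Return False for a sample that would be discarded with a "TOO BIG" event.

    offset_seconds is assumed to be (time source clock - local clock):
    a positive offset would move the local clock forward.
    """
    if offset_seconds >= 0:
        return offset_seconds <= max_pos  # forward correction limit
    return -offset_seconds <= max_neg     # backward correction limit


limit = 48 * 60 * 60  # 48 hours in seconds
print(accept_sample(300, limit, limit))            # a 5-minute correction is fine
print(accept_sample(-(limit + 1), limit, limit))   # more than 48 hours back: rejected
```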
Knowing your limits
The next question is: What is an acceptable limit of phase correction? After much analysis and debate, we are advising a value of 48 hours. If a domain controller receives a sample that says it is more than 48 hours off, either in the future or in the past, the domain controller will throw it out. However, every customer should evaluate their own situation to be sure.
This value is advised for both limits, forward and backward.
A packaged solution
Here is an example of a registry entry that you can merge on-demand to apply a 48 hour limit to the phase correction:
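As a minimal sketch, the following Python function builds .reg text that sets both limits to the same number of hours. It assumes the limits are stored as DWORD values in seconds (48 hours is 172,800 seconds, or 0x2A300) and that they live under the same w32time config key used for the debug-log settings; the key path is passed in by the caller, and the function name is hypothetical.

```python
def phase_correction_reg(key_path: str, limit_hours: int = 48) -> str:
    """Build .reg text that sets both phase-correction limits to limit_hours."""
    seconds = limit_hours * 60 * 60  # the limits are expressed in seconds
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key_path}]",
        f'"MaxPosPhaseCorrection"=dword:{seconds:08x}',  # 48 h -> 0002a300
        f'"MaxNegPhaseCorrection"=dword:{seconds:08x}',
        "",
    ]
    return "\r\n".join(lines)  # .reg files conventionally use CRLF line endings
```

Using one parameter for both values follows the advice above to apply the same limit forward and backward; after merging the file, `w32tm /config /update` makes the service pick up the new values.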
As usual, if you have specific thoughts or questions about this post, please feel free to leave a comment. For general questions about w32time, especially if you have problems with your w32time setup, I encourage you to ask them on the Directory Services section of the Microsoft Technet forums.
* This posting is provided "AS IS" with no warranties and confers no rights!
* Always evaluate/test yourself before using/implementing this!
* DISCLAIMER: https://jorgequestforknowledge.wordpress.com/disclaimer/
############### Jorge’s Quest For Knowledge #############
######### http://JorgeQuestForKnowledge.wordpress.com/ ########