Jorge's Quest For Knowledge!

All About Identity And Security On-Premises And In The Cloud – It's Just Like An Addiction, The More You Have, The More You Want To Have!

(2013-09-11) Follow-Up On “AD DB Becomes Corrupted When W2K12 Hyper-V Host Server Crashes”

Posted by Jorge on 2013-09-11


The guys from the AskPFE Team Blog have written a great follow-up article about the corruption of Active Directory databases in virtualized domain controllers running on Windows Server 2012 Hyper-V host computers. Kudos and credits of course go to the writers of the post on the AskPFE Team Blog. BE AWARE THAT THIS NOW ALSO APPLIES TO W2K8R2 HYPER-V HOSTS AND OTHER HYPER-V GUESTS!

SOURCE: Clarifications on KB 2853952, Server 2012 and Active Directory error c00002e2 or c00002e3

<QUOTE SOURCE=”Clarifications on KB 2853952, Server 2012 and Active Directory error c00002e2 or c00002e3”>

Hey y’all, Mark and Tom here to clear up some confusion on MSKB 2853952, that describes the corruption of Active Directory databases in virtualized domain controllers running on Windows Server 2012 Hyper-V host computers.

The article was released in July 2013 with the title “Active Directory database becomes corrupted when a Windows Server 2012-based Hyper-V host server crashes” but has since been renamed to “Loss of consistency with IDE-attached virtual hard disks when a Windows Server 2012-based Hyper-V host server experiences an unplanned restart”. Confused already? Please continue reading!!

The Problem

Following “hard” shutdowns (i.e. the plug is pulled) on Windows Server 2012 Hyper-V hosts, virtualized Domain Controller role computers may experience boot failures with error 2e2.

2e2 boot failures have occurred for years on DCs running on physical hardware when some specific guidelines (we’ll get to those in a minute) were not being followed. Deploying Active Directory, and therefore AD databases, which are really just Jet databases (as discussed in our AD Internals post), in a virtual environment introduces an additional root cause, which is mitigated by MSKB 2853952.

The KB tells us that Jet databases placed on virtual IDE drives on virtual guests are vulnerable to corruption when the underlying Windows Server 2012 Hyper-V host computer experiences an unplanned shutdown. Possible causes for such unscheduled shutdowns might include a loss of power to the data center or simply the intern tripping on the power cable in the data center. It has happened before and it will happen again.

Domain controllers whose log files or database files are damaged by an unscheduled shutdown may experience normal mode boot failures with a stop c00002e2 or c00002e3 error. If auto reboot is enabled on your domain controllers following a blue screen, DCs may continually reboot once their Hyper-V host restarts.

Text and graphical examples of the c00002e2 error are shown below:

“c00002e2 Directory Services could not start because of the following error: %hs Error Status: 0x%x. Please shutdown this system and reboot into Directory Services Restore Mode, check the event log for more detailed information.”

Figure 1: Uh oh…

The KB goes on to explain that this behavior occurs because the Hyper-V virtual IDE controller incorrectly reports “success” if the guest requests to disable the disk cache. Consequently, an application like Active Directory may think an I/O was written directly to disk when it was actually written to the disk cache. Since the power was lost, so were the contents of the disk cache.

The Fix

There are four fundamental configuration changes that reduce the likelihood of this occurring (whether DCs are deployed on physical or virtual machines):

  1. Make sure you are running on Server class hardware. That means that physical hard drives hosting Active Directory databases and other jet-dependent server roles (DHCP, FRS, WINS, etc) reside on SAS drives as opposed to IDE drives. IDE drives may not support forced unit access that is needed to ensure that critical writes by VM guests get transitively committed through the virtual hosts to underlying disk.
  2. Drive controllers should have battery-backed caches so that jet operations can be replayed when the Hyper-V hosts and guests are restarted.
  3. If Hyper-V hosts can be configured with UPS devices so that both the host and the guest enjoy graceful shutdowns in the event of power losses, all the better.
  4. If you feel like the auto-reboot behavior masks the 2e2 or 2e3 boot errors, then disable the “automatically restart” option by going to the advanced tab on system properties under startup and recovery.
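
For item 4, the same setting can also be changed without the GUI. The lines below are a minimal sketch, assuming an elevated PowerShell session on the domain controller; the “automatically restart” checkbox is backed by the AutoReboot value under the CrashControl registry key.

```powershell
# Sketch only: turns off "Automatically restart" after a system failure.
# Assumes an elevated session on the domain controller itself.
$crashControl = 'HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl'

# Show the current value first (1 = restart automatically, 0 = stay on the stop screen).
(Get-ItemProperty -Path $crashControl -Name AutoReboot).AutoReboot

# Disable the automatic restart so a 2e2/2e3 stop error stays visible on the console.
Set-ItemProperty -Path $crashControl -Name AutoReboot -Value 0
```

Set the value back to 1 once the DC is healthy again if automatic restart is your standard.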

Next, MSKB 2853952, or the July 2013 cumulative rollup 2855336 (we’ve detailed these rollups in an earlier post), which includes standalone QFE 2853952, should be installed on Windows Server 2012 Hyper-V hosts and Windows Server 2012 guests.

A pending update, currently scheduled for release today (September 10th, 2013), will update 2853952 to apply to:

  • Windows Server 2008 R2 Hyper-V hosts.
  • Windows 7 and Windows Server 2008 R2 virtual guests running on either Windows Server 2008 R2 or Windows Server 2012 Hyper-V hosts.

In summary, the updated version of KB 2853952 should be installed on both Windows Server 2008 R2 and Windows Server 2012 Hyper-V hosts (using the existing version of KB 2853952), and Windows 7 / Windows Server 2008 R2 virtual guests utilizing a jet-based store like Active Directory.
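
A quick way to see whether a host or guest already lists the fix is to compare the installed hotfix IDs against the two KB numbers mentioned above. This is a minimal sketch, not an official detection method: Test-FixListed is a hypothetical helper name, and a machine that received the fix through some later rollup may not list either ID.

```powershell
# Sketch only: Test-FixListed is a hypothetical helper, not part of any module.
# KB2853952 is the standalone fix; KB2855336 is the July 2013 rollup that includes it.
function Test-FixListed {
    param(
        [string[]]$InstalledIds   # hotfix IDs as reported by the machine
    )
    $wanted = 'KB2853952', 'KB2855336'
    # True when at least one of the wanted IDs is in the installed list.
    return [bool]($InstalledIds | Where-Object { $wanted -contains $_ })
}

# Example on a Windows machine (Get-HotFix reads the installed-update list):
# Test-FixListed -InstalledIds (Get-HotFix | Select-Object -ExpandProperty HotFixID)
```

A result of False is a prompt to look closer, not proof that the machine is unpatched.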

A workaround that can be deployed NOW is to place jet databases, including the Active Directory database and log files, on virtual SCSI drives when Windows Server 2008 R2 and Windows Server 2012 virtual guests reside on Windows Server 2012 virtual hosts.

The reason SCSI or Virtual SCSI is recommended is that SCSI controllers will honor forced unit access or requests to disable write cache. Forced Unit Access (FUA) is a flag that NTFS uses to bypass the cache on the disk – essentially writing directly to the disk. SCSI has supported this via the T10 specification but this support was not available in the original T13 ATA specifications. While FUA support was added to the T13 ATA specifications after the original release, support for this has been inconsistent. More importantly, Windows does not support FUA on ATA drives.

Active Directory uses FUA to perform un-buffered writes to preserve the integrity of the database in the event of a power failure. AD will behave this way on physical and virtual platforms. If the underlying disk subsystem does not honor the FUA write, there could be database corruption and/or a “USN Bubble”. Further, some SCSI controllers feature a battery backed cache, just in case there are IOs still in memory when power is lost. (Thanks to fellow PFE Brent Caskey for doing some digging on this)
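
To make “asking for an un-buffered write” a bit more concrete, here is a minimal sketch of an application requesting write-through on a file from PowerShell via .NET. It is an illustration only and not how the AD database engine is implemented; the file name is hypothetical. On Windows the WriteThrough option asks the storage stack to push the write past intermediate caches, and whether the data really reaches stable media still depends on the disk subsystem honoring that request, which is exactly the point of this article.

```powershell
# Sketch only: open a file with write-through requested, then write a few bytes.
# The path is a hypothetical demo file in the temp folder.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'fua-demo.bin'

$streamArgs = @(
    $path,
    [System.IO.FileMode]::Create,
    [System.IO.FileAccess]::Write,
    [System.IO.FileShare]::None,
    4096,                                   # buffer size in bytes
    [System.IO.FileOptions]::WriteThrough   # request that writes bypass intermediate caches
)
$stream = New-Object System.IO.FileStream -ArgumentList $streamArgs
try {
    $bytes = [System.Text.Encoding]::ASCII.GetBytes('committed')
    $stream.Write($bytes, 0, $bytes.Length)
    $stream.Flush()
}
finally {
    $stream.Dispose()
}
```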

Applying the July update rollup and the pending September updates on the relevant Hyper-V hosts and virtual guests will greatly reduce the likelihood of damage to jet files when Hyper-V guests reside on virtual IDE disks. However, the recommendation is still to use virtual SCSI disks for jet-based workloads and other critical data.

FAQ about this update

This update probably set many of your admin spidey senses tingling, and for good reason. Let’s try to answer the questions you are probably thinking about.

Does this only affect Active Directory?

By reading the actual problem you’ll notice it’s not a problem with Active Directory itself so the answer is no. The title of the KB has been updated to reflect this and hopefully provide some clarity. The problem is with applications that require I/O guarantee. IDE doesn’t provide I/O guarantee and neither does Virtual IDE.

 

How Should I Be Configured?

You are going to want to have your data stored on Virtual SCSI (vSCSI) disks for the reasons stated above.
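
To find out where you stand, you can list the virtual disks that are still attached to a virtual IDE controller. This is a minimal sketch, assuming an elevated session on a Windows Server 2012 Hyper-V host with the Hyper-V PowerShell module installed. Expect boot disks to show up in the list, since these VMs boot from IDE; the ones to act on are the disks holding jet databases or other critical data.

```powershell
# Sketch only: run in an elevated session on the Hyper-V host.
# Lists every virtual hard disk that is attached to a virtual IDE controller.
Get-VM |
    Get-VMHardDiskDrive |
    Where-Object { $_.ControllerType -eq 'IDE' } |
    Select-Object VMName, ControllerNumber, ControllerLocation, Path
```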

What about physical machines on IDE drives, are they at risk too?

Yes. If you still have physical machines that are running on IDE drives, you will want to try to move the server data to SCSI disks as well.

I have all my data on the boot drive, can I boot off Virtual SCSI?

You cannot. In Server 2012 R2 we actually have Virtual SAS which you can use for both boot and data. For now you’ll need to use a separate virtual SCSI disk for data.

Is only Server 2012 affected by this?

No, this also affects 2008 R2. However, the updated KB now covers both 2008 R2 and 2012.

Where do I apply this update, host, guest or both?

The update should be applied to Windows Server 2012 hosts, and in a post July 2013 update, Windows Server 2008 R2 Hyper-V hosts, and Windows Server 2012/Windows Server 2008 R2 / Windows 7 virtual guests.

Anything else we should be doing for this?

You’ll want to make sure any operational and configuration changes are in place to avoid any unscheduled down time until you are able to move the data to a virtual SCSI disk and apply the appropriate updates.

 

I have a lot of DCs that are set up improperly, a little help?

Tom recently helped out a customer with moving their DB and logs to SCSI disks. Thanks to PowerShell and his powershell-fu, this is all pretty simple but it does take the AD service down on the target DC for a period of time.

First, on the Hyper-V host, you’ll need to attach a new disk to the virtual machine. Launch PowerShell as an admin on the host. Pre-identify the VM name and the physical location where you’ll create the new VHDX file.

Then run:

$vhd = New-VHD -Path [PATH TO VHDX] -SizeBytes 10GB -Dynamic:$false
Add-VMHardDiskDrive -path $vhd.path -ControllerType:SCSI -ControllerNumber 0 -VMName [VMNAME]

Obviously, replace the bracketed parameters with your parameters. Also modify the disk size to something appropriate for your database. 10GB will cover most customers.

Afterwards, you need to log on to the guest VM to create the volume and move the DB. In the example below, we’ve used drive letter E. Modify this based on your company standards or configuration.

First, check to see if the disk is offline, and set it to online if it is.

Get-Disk | Where { $_.OperationalStatus -eq "Offline" } | Set-Disk -IsOffline:$false

Once it’s online, you just need to create the volume. PowerShell makes this very easy on Windows Server 2012. If you’re using 2008, you will need to replace this part with diskpart commands. For the sake of brevity, we’ll just cover PowerShell.

Get-Disk | Where { $_.PartitionStyle -eq "RAW" } | Initialize-Disk -PartitionStyle:GPT -PassThru | New-Partition -UseMaximumSize -DriveLetter E | Format-Volume -Full:$false -FileSystem:NTFS -NewFileSystemLabel "NTDS" -Force

Ok, that doesn’t look easy, but it’s all one line, making use of the PowerShell pipeline. As we complete each task, we pass the result to the next cmdlet. Finally, we end up with an E drive. Next, we need to move the database and logs. We’ll use ntdsutil to do this.

First, stop NTDS. Then, run ntdsutil. Modify the paths below to fit the drive letter you chose above.

#stop NTDS
Stop-Service NTDS -Force
#use NTDSutil to move logs/db
ntdsutil
activate instance ntds
files
move db to e:\NTDS
move logs to e:\NTDS
quit
quit
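
If you prefer not to type the ntdsutil menu commands interactively, ntdsutil also accepts them as quoted command-line arguments. The lines below are a minimal sketch of the same move, assuming an elevated session on the DC and the E:\NTDS target used above; test it on a non-production DC first.

```powershell
# Sketch only: same steps as above, passed to ntdsutil as arguments.
# Assumes the new volume is mounted as E: and that taking NTDS down is acceptable.
Stop-Service NTDS -Force

# Each quoted string is one ntdsutil menu command, executed in order.
ntdsutil "activate instance ntds" files "move db to e:\NTDS" "move logs to e:\NTDS" quit quit
```

Read the ntdsutil output before going any further; the sketch does not check it for you.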

Verify the output from ntdsutil. If you’re scripting this out, I recommend extensive testing ahead of time. You may be able to use Test-Path to figure out if the database and logs moved successfully or not. Assuming everything checked out, run Start-Service NTDS to restart NTDS. Congrats, you’ve made it to SCSI disks.
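
One way to script the Test-Path check mentioned above is sketched below. Test-NtdsFilesMoved is a hypothetical helper name, and it only confirms that the database file (ntds.dit) and the current transaction log (edb.log) exist in the target folder; it says nothing about their integrity, so it complements the ntdsutil output rather than replacing it.

```powershell
# Sketch only: Test-NtdsFilesMoved is a hypothetical helper, not a built-in cmdlet.
function Test-NtdsFilesMoved {
    param(
        [Parameter(Mandatory = $true)]
        [string]$TargetPath   # folder the DB and logs were moved to, e.g. E:\NTDS
    )
    # ntds.dit is the database; edb.log is the current transaction log.
    $expected = 'ntds.dit', 'edb.log'
    $missing = @($expected | Where-Object {
        -not (Test-Path -LiteralPath (Join-Path $TargetPath $_))
    })
    foreach ($name in $missing) {
        Write-Warning "$name not found in $TargetPath"
    }
    # True only when every expected file is present.
    return ($missing.Count -eq 0)
}

# Example: only restart the service when both files are where we expect them.
# if (Test-NtdsFilesMoved -TargetPath 'E:\NTDS') { Start-Service NTDS }
```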

Any questions please let us know in the comments.

Mark “Crash test dummy #1” Morowczynski and Tom “Crash test dummy #2” Moser

</QUOTE SOURCE=”Clarifications on KB 2853952, Server 2012 and Active Directory error c00002e2 or c00002e3”>

Cheers,

Jorge

———————————————————————————————

* This posting is provided "AS IS" with no warranties and confers no rights!

* Always evaluate/test yourself before using/implementing this!

* DISCLAIMER: https://jorgequestforknowledge.wordpress.com/disclaimer/

———————————————————————————————

############### Jorge’s Quest For Knowledge #############

######### http://JorgeQuestForKnowledge.wordpress.com/ ########

———————————————————————————————
