Back to the Basics; Recovering from a Crash

TL;DR summary of how we recovered test.local from a crash, and why we write these articles.

Welcome to Part III of our ‘Back to the Basics’ series!

Part I: NTDS.dit vs SAM

Part II: Ownership Matters

Part III: Recovering from a Crash

Part IV: Setting up a Simple Honeypot Account

Part V: Automating DC Deployment

Part VI: Sometimes it’s the dumbest thing

Background

We had been running test.local in the free version of ESXi, which itself was running in the free version of VMware Player on my desktop. I had given it 1 TB of HD and 36 GB of RAM. It worked well enough and was a free solution. I could use the desktop for other things like gaming and basic office tasks, no issues.

The problem, of course, was that the underlying OS on the desktop had to pull updates and restart sometimes. I would occasionally get a corrupted *.vmdk in ESXi during this process. For some reason one vendor’s VM with a pre-built AD loved corrupting itself, to the point where I kept a saved session in PuTTY just for SSHing into ESXi and copy/pasting the below:

cd "/vmfs/volumes/datastore1/AD Security"
vmkfstools -x check "AD Security.vmdk"
vmkfstools -x repair "AD Security.vmdk"

However, I didn’t think the *.vmdk or *.vmx of the ESXi VM itself would get corrupted. Unfortunately it did. Fortunately, it happened right after I finally pulled the proverbial trigger and ordered a refurbished server off Amazon.

Restoring the domain

The steps to restore the domain are below. The ‘on prem’ AD portions were covered to a degree previously. I cleaned up and consolidated things a bit and added the AAD portion, so I’m making a new article. I have said before that I take notes because my memory sucks, and I blog because it forces me to take better notes. If you stumbled on this because you were Googling “howto seize FSMO roles” or “manually set DNS PowerShell” or “cheap refurbished server for home lab”, and any of this answers your question, then I am happy I could help.

Honestly though, the big reason why I write any of this is that I go back and check myself all the time.

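So without further ado, here are the steps to restore the domain, each explained in its own section below: seize the FSMO roles, set up ESXi, configure the new VMs, move the FSMO roles to the new DC, restore the Azure AD Connect member server, disable and delete the old sync account, and swap the old DCs for the new ones in Azure Health Monitor.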

Seizing the FSMO roles

I had one backup Domain Controller (DC) in VMware Player on my laptop (a refurbished Dell 5580, so fitting for this lab). DCs are great in that by default they sync AD and Group Policy automatically. In this lab they also sync the share drive as we are using DFS replication. Hence the first thing to do is to seize the FSMO roles.

Move-ADDirectoryServerOperationMasterRole -Identity "BackupDC4" -OperationMasterRole PDCEmulator, RIDMaster, InfrastructureMaster, SchemaMaster, DomainNamingMaster -Force

This is covered in more detail here. It bears repeating that if you seize the roles following a crash then it is NOT recommended to bring the prior PDC back online, as it will likely cause issues. If you are simply moving the roles to a new DC and all DCs are still online, leave ‘-Force’ off the command above so the roles are transferred rather than seized.

On a funny side note, Microsoft deprecated the command ‘netdom query fsmo’, yet they still recommend using it in Azure AD Health Monitoring messages.
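For a PowerShell way to answer the same question, the three domain-wide roles are properties of Get-ADDomain and the two forest-wide roles are properties of Get-ADForest. The sketch below wraps that in a small helper that lists any role not sitting on the DC you expect. The function name Get-FsmoMismatch is made up for illustration, and it assumes role holders are reported as FQDNs such as BackupDC4.test.local.

```powershell
# Hypothetical helper: list FSMO roles that are NOT held by the expected DC.
# $Domain / $Forest are the objects returned by Get-ADDomain / Get-ADForest.
function Get-FsmoMismatch {
    param(
        [Parameter(Mandatory)] $Domain,
        [Parameter(Mandatory)] $Forest,
        [Parameter(Mandatory)] [string] $ExpectedDC
    )
    $holders = [ordered]@{
        PDCEmulator          = $Domain.PDCEmulator
        RIDMaster            = $Domain.RIDMaster
        InfrastructureMaster = $Domain.InfrastructureMaster
        SchemaMaster         = $Forest.SchemaMaster
        DomainNamingMaster   = $Forest.DomainNamingMaster
    }
    foreach ($role in $holders.Keys) {
        $holder = [string]$holders[$role]
        # Accept either the short name or the FQDN (assumed format: name.domain)
        if ($holder -ne $ExpectedDC -and $holder -notlike "$ExpectedDC.*") {
            [pscustomobject]@{ Role = $role; Holder = $holder }
        }
    }
}

# Usage on a machine with the ActiveDirectory module (no output means all five match):
# Get-FsmoMismatch -Domain (Get-ADDomain) -Forest (Get-ADForest) -ExpectedDC "BackupDC4"
```

Running it right after the seizure, and again after moving the roles later on, is a cheap sanity check before touching anything else.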

Setup ESXi

Thanks to the wonders of the Internet and used devices I found a refurbished HP ProLiant DL360p Gen8 with 96 GB of RAM and a pair of 300 GB HDs for $220. I grabbed a 1 TB SSD and a pack of drive caddies for about $100, so for right around $320 we were in business.

Mishka is using an older monitor that has VGA on her ‘tiny computer’ and, like most, I have a couple of USB corded mice & keyboards lying around. ESXi was simple to install and configure using the ‘crash cart’ and a USB DVD-RW drive. Simply set the IPv4 settings on ESXi and the rest is done remotely.

Configure the new VMs

Adding a new VM to ESXi and loading the OS from an ISO is quite simple, to the point I don’t even bother to include it in my notes. The important part is that you plan out your IP scheme at least a little bit ahead of time. For example I use .100 for ESXi, .101-.110 for servers, and .111-.120 for clients. Obviously use whatever makes sense for you. This is just what’s easy for me to remember when I am RDPing and PSSessioning around, not to mention using Invoke-Command for ‘one to many’.
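If you want the scheme written down somewhere other than your head, a tiny lookup does the job. This is only a sketch of my own ranges (.100 ESXi, .101-.110 servers, .111-.120 clients). The function name Get-LabRole and the ‘Unassigned’ label are made up for illustration.

```powershell
# Hypothetical helper: map an IPv4 address to its role in the lab's IP scheme.
function Get-LabRole {
    param([Parameter(Mandatory)] [string] $IPAddress)

    $parsed = $null
    $isValid = [System.Net.IPAddress]::TryParse($IPAddress, [ref]$parsed)
    # TryParse also accepts shorthand like "5", so insist on four dotted parts
    if (-not $isValid -or ($IPAddress -split '\.').Count -ne 4) {
        throw "'$IPAddress' is not a dotted-quad IPv4 address"
    }

    $lastOctet = [int]$parsed.GetAddressBytes()[3]
    if ($lastOctet -eq 100) { 'ESXi' }
    elseif ($lastOctet -ge 101 -and $lastOctet -le 110) { 'Server' }
    elseif ($lastOctet -ge 111 -and $lastOctet -le 120) { 'Client' }
    else { 'Unassigned' }
}

# Usage:
# Get-LabRole "192.168.0.105"
```

It could also be called on whatever gets typed into the setup script’s IP prompt, so a typo is caught before a static address is set.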

I simply run the below to get new VMs online:

Write-Host "Welcome to Mishky's networking setup script for new Windows servers"
Write-Host "Please enter the below info for IPv4 to set a static IP and the right DNS"
Write-Host "FYSA Mishky also disables IPv6 & NetBIOS, because the network isn't using them"
$IP = Read-Host "Please enter the server's IP address"
$Gateway = Read-Host "Please enter the gateway IP address"
$ServerName = Read-Host "Please enter the server's name"
#Disable IPv6 (change Ethernet0 if your adapter has a different name)
Disable-NetAdapterBinding -InterfaceAlias "Ethernet0" -ComponentID ms_tcpip6
#Disable NetBIOS over TCP/IP on every interface (NetbiosOptions 2 = disabled)
$regkey = "HKLM:\SYSTEM\CurrentControlSet\Services\NetBT\Parameters\Interfaces"
Get-ChildItem $regkey | ForEach-Object { Set-ItemProperty -Path "$regkey\$($_.PSChildName)" -Name NetbiosOptions -Value 2 -Verbose }
#Set IPv4 address, gateway, & DNS servers
New-NetIPAddress -InterfaceAlias "Ethernet0" -AddressFamily IPv4 -IPAddress $IP -PrefixLength 24 -DefaultGateway $Gateway
Set-DnsClientServerAddress -InterfaceAlias "Ethernet0" -ServerAddresses ("192.168.0.101", "192.168.0.102", "192.168.0.104", "<ISP DNS #1>", "<ISP DNS #2>")
#Rename the server and restart
Rename-Computer -NewName $ServerName -LocalCredential Administrator -PassThru -Restart -Force

Then join the lab’s domain:

Write-Host "Join the test.local domain"
$User = Read-Host "Please enter your domain admin username"
#Passing just a username to -Credential prompts for the password
Add-Computer -DomainName test.local -Credential $User -Restart -Force

Finally if the VM will be part of DFS:

#Prep a new folder for adding to an existing DFS namespace
$NewDirPath = "C:\Test Share"
$NewShareName = "Test Share"
#Create the folder only if it does not already exist
if (Test-Path -Path $NewDirPath)
{
    Write-Host "Dir already exists, reusing it." -ForegroundColor Yellow
}
else
{
    Write-Host "Dir not found. Cleared hot." -ForegroundColor Green
    New-Item -Path $NewDirPath -ItemType Directory
}
New-SmbShare -Name $NewShareName -Path $NewDirPath
#Install DFS tools
Install-WindowsFeature -Name FS-DFS-Namespace
Install-WindowsFeature -Name FS-DFS-Replication
Install-WindowsFeature -Name RSAT-DFS-Mgmt-Con
#Add a new server to DFS. BackupDC4 is already hosting the namespace \\test.local\Mishky's Share\Test Share
$newDFSserver = "TestDC"
New-DfsnFolderTarget -Path "\\test.local\Mishky's Share\Test Share" -TargetPath "\\$newDFSserver\Test Share" -ReferralPriorityClass SiteCostNormal
Get-DfsReplicationGroup -GroupName "test.local\Mishky's Share\Test Share" | Get-DfsReplicatedFolder -FolderName "Test Share" | Add-DfsrMember -ComputerName $newDFSserver
Add-DfsrConnection -GroupName "test.local\Mishky's Share\Test Share" -SourceComputerName BackupDC4 -DestinationComputerName $newDFSserver
Set-DfsrMembership -GroupName "test.local\Mishky's Share\Test Share" -FolderName "Test Share" -ComputerName $newDFSserver -ContentPath "C:\Test Share"
#Confirm
Get-DfsReplicationGroup -GroupName "test.local\Mishky's Share\Test Share" | Get-DfsReplicatedFolder -FolderName "Test Share" | Get-DfsrMembership

Move the FSMO roles to the new DC

This part is simple. Just run:

Move-ADDirectoryServerOperationMasterRole -Identity "TestDC" -OperationMasterRole PDCEmulator, RIDMaster, InfrastructureMaster, SchemaMaster, DomainNamingMaster

Restore the Azure AD Connect member server

Simply stand up a VM in ESXi, set the IPv4, DNS, and domain membership as shown above, and then install Microsoft’s Azure AD Connect. We outlined setting up Azure AD Connect here. As before we are using the default settings in Azure AD Connect: Password Hash Sync, and letting the installer automatically create the account used for syncing.

You can skip the steps about configuring Group Policy and alternate UPNs as this was already done. Thanks to having a backup DC there is no need to re-do it.

Azure recommends that you roll the Kerberos decryption key on the computer account Azure uses with SSO. Just run:

#https://azurecloudai.blog/2020/08/03/roll-over-kerberos-decryption-key-for-seamless-sso-computer-account/
Set-Location "C:\Program Files\Microsoft Azure Active Directory Connect\"
Import-Module .\AzureADSSO.psd1
New-AzureADSSOAuthenticationContext
Get-AzureADSSOStatus | ConvertFrom-Json
Update-AzureADSSOForest

Verify in AAD:

Disable/delete the old account used for syncing

It’s a good idea to disable the old account Azure AD Connect used for syncing. It’s easy to find since by default it includes “MSOL” in the account name and the computer name of the Azure AD Connect server in the account description.

Get-ADUser -Filter 'Name -like "*MSOL*"' -Properties * | Select-Object SamAccountName, CreateTimeStamp, Description
Disable-ADAccount -Identity MSOL_xyz

Please note that some of these screenshots are redacted in order to omit publicly accessible AAD info.

Naturally you’ll want to disable it in AAD as well. Either of these works. It’s just shooter’s preference as far as which module you like for managing AAD.

#Option 1: AzureAD module
Connect-AzureAD
Set-AzureADUser -ObjectId <ID> -AccountEnabled $false
#Option 2: MSOnline module
Connect-MsolService
Set-MsolUser -ObjectId <ID> -BlockCredential $true

You can simply run

Get-AzureADUser

Copy/paste the ObjectID for the old sync account, and run the above to disable it. This works fine on my dinky little hybrid AD environment since we have more DCs than we do users. But what if you need to locate this account out of hundreds or thousands?

Rather annoyingly AAD does not use similar syntax to AD in many cases, so it’s not as simple as

Get-ADUser -Filter 'Name -like "*Sync*"'

However we can use

Get-AzureADUser -All $true | Where-Object {$_.DisplayName -like "*Sync*"}

And then copy/paste the ObjectID. The account is easy to locate using this method, as the only two that should show up are the old and new sync accounts. This time around I gave the new member server hosting Azure AD Connect a computer name that makes sense. Last time I was just using a pre-existing member server. The computer name of the Azure AD Connect host is part of the sync account’s UPN, so this makes it easy to tell which account is which.

The ‘easy button’ method is to just wait at least an hour after disabling these accounts and then confirm in AAD that it is still syncing with ‘on prem’ AD.

Or if you’re impatient then run:

Get-MsolCompanyInformation | Select-Object DisplayName, LastDirSyncTime, LastPasswordSyncTime

Please note that these timestamps are in UTC (GMT), not your local time.
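If the mental timezone math gets old, the timestamp can be converted on the spot. A minimal sketch, assuming the value really is UTC as noted above. The function name Convert-UtcToLocal is made up for illustration.

```powershell
# Hypothetical helper: treat a timestamp as UTC and convert it to local time.
function Convert-UtcToLocal {
    param([Parameter(Mandatory)] [datetime] $UtcTime)

    # Mark the value as UTC first, otherwise ToLocalTime() may treat it as already local
    [datetime]::SpecifyKind($UtcTime, [System.DateTimeKind]::Utc).ToLocalTime()
}

# Usage (after Connect-MsolService):
# $info = Get-MsolCompanyInformation
# Convert-UtcToLocal $info.LastDirSyncTime
```

The SpecifyKind step is the part that matters, since a DateTime whose Kind is not set to Utc can be converted incorrectly or not at all.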

If all is well then go ahead and delete the old accounts from AAD:

Remove-AzureADUser -ObjectId <ID>

And from AD:

Remove-ADUser MSOL_xyz

One may also wish to remove the old MSOL account from the AD ACLs it was added to. The Azure AD Connect installer delegates certain privileges to the AD account it creates automatically. One can check this with the query below, run with the ActiveDirectory module loaded so that the AD: drive exists:

(Get-Acl "AD:\$((Get-ADDomain).DistinguishedName)").Access | Where-Object {$_.IdentityReference -like "*MSOL*"}

Please note that deleting an account from AD does NOT remove it from the AD ACLs it was given privileges on. It does mean that the ACLs will now show a SID instead of the account name. You can remove it from all ACLs in AD by removing it from the domain root’s ACL, since it inherits down from there.

Import-Module ActiveDirectory
Set-Location AD:
#Problem = (Get-Acl "ou=vips,dc=test,dc=local").Access | Where-Object {$_.IdentityReference -like "*S-1-5-21-4103247791-2828088783-3009141321-3631*"}
#https://ex-shell.com/2017/06/16/remove-a-usergroup-permission-on-an-ad-object-via-powershell/
$DistinguishedName = (Get-ADDomain).DistinguishedName
#$user = "domain\jdoe" (to use this, substitute $user for $Stale_SID in the comparison inside the loop)
$Stale_SID = "S-1-5-21-4103247791-2828088783-3009141321-3631"
#Collect the current ACL
$Acl = Get-Acl $DistinguishedName
#Loop through each access rule in the ACL and drop the ones that match
foreach ($access in $Acl.Access)
{
    if ($access.IdentityReference.Value -eq $Stale_SID)
    {
        #RemoveAccessRule returns a boolean; discard it to keep the output clean
        $Acl.RemoveAccessRule($access) | Out-Null
    }
}
#Set the ACL back on the AD object
Set-Acl $DistinguishedName -AclObject $Acl

This removes all ACEs with the given SID from the domain root ACL. If, unlike me, you haven’t deleted the account yet, then substitute the account name for the SID accordingly.
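Before removing anything, it can help to list which ACEs point at a SID that no longer resolves to a name. The sketch below just checks whether an IdentityReference value still looks like a raw SID. The function name Test-IsUnresolvedSid and the regex are my own illustration, and it assumes anything still shown as S-1-... is an identity Windows could not translate.

```powershell
# Hypothetical helper: $true when an ACE identity is still a raw SID string
# (e.g. a deleted account) rather than a resolved DOMAIN\name.
function Test-IsUnresolvedSid {
    param([Parameter(Mandatory)] [string] $Identity)

    $Identity -match '^S-1-\d+(-\d+)+$'
}

# Usage on a machine with the ActiveDirectory module loaded:
# $root = "AD:\$((Get-ADDomain).DistinguishedName)"
# (Get-Acl $root).Access | Where-Object { Test-IsUnresolvedSid $_.IdentityReference.Value }
```

Anything the usage lines return is a candidate for $Stale_SID in the removal script, but each one is worth eyeballing first rather than deleting blindly.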

Remove the old DCs from Azure Health Monitor and add the new ones

If you are using the AAD Health Monitor then you will obviously want to remove the old DCs that no longer exist thanks to the crash. Once that’s complete simply install the Health Monitor on the new DCs. I keep all this stuff on the lab’s share drive to make this simple.

Summary

I restored a client VM from a backup that was a year old. It had a “trust relationship” issue, so I took it offline, logged in with cached credentials, dropped it off the domain, re-added it, and let it pull Group Policy. Maybe not the ideal method, but it worked fine. My users who are synced with AAD can utilize Seamless SSO.

The Cloud and Group Policy ‘on prem’ take care of most of the recovery. The great thing about Group Policy is that you just need to spin up replacement VMs, configure them via copy/pasting some PowerShell, and let Group Policy push the domain configuration to them.

The trick is to keep a backup DC on separate physical hardware.

References

Transfer or seize FSMO roles: https://docs.microsoft.com/en-us/troubleshoot/windows-server/identity/transfer-or-seize-fsmo-roles-in-ad-ds

Roll the SSO: https://azurecloudai.blog/2020/08/03/roll-over-kerberos-decryption-key-for-seamless-sso-computer-account/

Disable AAD account: https://docs.microsoft.com/en-us/microsoft-365/enterprise/block-user-accounts-with-microsoft-365-powershell?view=o365-worldwide

Manage AAD in PS: https://docs.microsoft.com/en-us/microsoft-365/enterprise/view-user-accounts-with-microsoft-365-powershell?view=o365-worldwide

Find AAD users via filter: https://www.easy365manager.com/get-azureaduser-filter-example/

Rich

I work various IT jobs & like Windows domain security as a hobby. Most of what’s here is my notes from auditing or the lab.