Hybrid Active Directory with Real Disaster Recovery

Everyone has a disaster recovery plan. Almost nobody has tested it. Here’s how I built hybrid AD at Whitepages with DR that actually works.

The Architecture

Two on-premises domain controllers, two in AWS. The AWS DCs aren’t idle replicas: they actively serve authentication for cloud workloads. That means failover isn’t a cold start. The on-prem DCs go offline and the AWS DCs keep serving.

The Hard Part: DNS

AD is DNS. Clients find domain controllers through DNS SRV records, so if your DNS failover doesn’t work, your AD failover doesn’t work. I configured conditional forwarders and split-brain DNS so that cloud workloads always resolve against the AWS DCs, while on-prem workloads prefer the on-prem DCs and fall back to AWS automatically. This was the most time-consuming part of the project, and the most important.
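To make the intended preference order concrete, here is a minimal Python sketch. It is a toy model of the behavior I wanted, not the Windows DC locator algorithm and not our actual configuration. The site labels ("OnPrem", "AWS"), DC names, and the corp.example.com domain are all made up for illustration. The only real convention here is the site-specific SRV name format that AD registers in DNS.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class DomainController:
    name: str
    site: str  # AD site label; "OnPrem" and "AWS" are hypothetical names


def srv_query_name(domain: str, site: Optional[str] = None) -> str:
    """Build the SRV record name a client queries to locate DCs.

    With a site, this is the site-specific record; without one,
    it is the domain-wide record.
    """
    if site:
        return f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}"
    return f"_ldap._tcp.dc._msdcs.{domain}"


def resolution_order(
    client_site: str,
    fallback_sites: List[str],
    dcs: List[DomainController],
    is_up: Callable[[DomainController], bool],
) -> List[DomainController]:
    """Return reachable DCs, local site first, then each fallback site in order."""
    ordered: List[DomainController] = []
    for site in [client_site, *fallback_sites]:
        ordered.extend(dc for dc in dcs if dc.site == site and is_up(dc))
    return ordered


if __name__ == "__main__":
    dcs = [
        DomainController("dc1", "OnPrem"),
        DomainController("dc2", "OnPrem"),
        DomainController("dc3", "AWS"),
        DomainController("dc4", "AWS"),
    ]
    # Simulate the on-prem DCs being offline.
    onprem_down = lambda dc: dc.site != "OnPrem"
    print(srv_query_name("corp.example.com", "AWS"))
    print([dc.name for dc in resolution_order("OnPrem", ["AWS"], dcs, onprem_down)])
```

Cloud workloads get an empty fallback list, so they never leave the AWS site. On-prem workloads list AWS as their fallback, which is the "prefer on-prem, fall back to AWS" rule in code form.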

Testing DR (For Real)

We ran quarterly DR tests in which we actually shut down the on-prem DCs. Not a tabletop exercise: a real failover. Before we started testing, six services silently depended on on-prem DC availability and nobody knew. We documented every service that broke, fixed it, and added it to the runbook. After three quarterly tests, our failover was genuinely clean.
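One common way a service ends up silently tied to on-prem is a config file that points at a specific DC hostname instead of the domain name. I’m not claiming that was the cause for all six of ours. As a sketch of how you might hunt for that class of problem before a test, here is a small Python helper. The service names, config snippets, and hostnames are hypothetical, and the match is a plain case-insensitive substring check, so use fully qualified names to avoid false hits.

```python
from typing import Dict, List


def find_onprem_dependencies(
    service_configs: Dict[str, str],
    onprem_dcs: List[str],
) -> Dict[str, List[str]]:
    """Map each service to the on-prem DC hostnames that appear in its config text.

    Services whose config mentions no on-prem DC are left out of the result.
    """
    lowered = [dc.lower() for dc in onprem_dcs]
    findings: Dict[str, List[str]] = {}
    for service, config_text in service_configs.items():
        text = config_text.lower()
        hits = sorted(dc for dc in lowered if dc in text)
        if hits:
            findings[service] = hits
    return findings


if __name__ == "__main__":
    # Hypothetical config snippets, keyed by service name.
    configs = {
        "wiki": "ldap_url = ldap://DC01.corp.example.com:389",
        "vpn": "ldap_url = ldap://corp.example.com",
    }
    onprem = ["dc01.corp.example.com", "dc02.corp.example.com"]
    for service, dcs in find_onprem_dependencies(configs, onprem).items():
        print(f"{service}: hard-coded to {', '.join(dcs)}")
```

A scan like this only catches hard-coded hostnames. It won’t find dependencies hidden in IP addresses, cached credentials, or application code, which is why shutting the DCs down for real is still the test that counts.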

The Azure AD Bridge

With the DC upgrade to Server 2016, we enabled Azure AD Connect for hybrid join. This gave us Intune device management alongside traditional GPO — a bridge strategy while we migrated workloads to cloud-native management. The key is not trying to do everything at once. Hybrid is a transition state, not an end state.