What happened?
Between 15:45 UTC on 29 October and 00:05 UTC on 30 October 2025, customers and Microsoft services using Azure Front Door (AFD) may have experienced increased latencies, timeouts, and errors.
Affected Azure services include, but are not limited to: App Service, Azure Active Directory B2C, Azure Communication Services, Azure Databricks, Azure Healthcare APIs, Azure Maps, Azure Portal, Azure SQL Database, Azure Virtual Desktop, Container Registry, Media Services, Microsoft Copilot for Security, Microsoft Defender External Attack Surface Management, Microsoft Entra ID (Mobility Management Policy Service, Identity & Access Management, and User Management UX), Microsoft Purview, Microsoft Sentinel (Threat Intelligence), and Video Indexer.
Customer configuration changes to AFD remain temporarily blocked; we will notify customers once this block has been lifted. While error rates and latency have returned to pre-incident levels, a small number of customers may still be experiencing issues, and we are working to mitigate this long tail. Updates will be provided directly via Azure Service Health.
What went wrong and why?
An inadvertent tenant configuration change within Azure Front Door (AFD) triggered a widespread service disruption affecting both Microsoft services and customer applications dependent on AFD for global content delivery. The change introduced an invalid or inconsistent configuration state that caused a significant number of AFD nodes to fail to load properly, leading to increased latencies, timeouts, and connection errors for downstream services.
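To illustrate the failure mode described above, the hedged sketch below shows how an edge node might validate a tenant configuration snapshot before loading it and fall back to a last known good copy when validation fails. All names and the configuration shape are hypothetical assumptions for the example and are not drawn from AFD's actual implementation.

```python
# Minimal sketch (not AFD's implementation) of guarding a tenant config load.
# All names and the config shape here are hypothetical.
from dataclasses import dataclass


@dataclass
class TenantConfig:
    tenant_id: str
    routes: dict          # hostname -> list of backends (hypothetical shape)
    version: int


def validate(config: TenantConfig) -> bool:
    """Reject obviously invalid or inconsistent states before they are loaded."""
    if not config.tenant_id or config.version <= 0:
        return False
    # Every route must point at a non-empty backend pool.
    return all(bool(backends) for backends in config.routes.values())


def load_config(node_state: dict, candidate: TenantConfig,
                last_known_good: TenantConfig) -> TenantConfig:
    """Load the candidate config if it validates; otherwise keep the last known good."""
    if validate(candidate):
        node_state[candidate.tenant_id] = candidate
        return candidate
    # An invalid snapshot must never replace a working one,
    # so the node keeps serving the previous configuration.
    node_state[last_known_good.tenant_id] = last_known_good
    return last_known_good
```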
As unhealthy nodes dropped out of the global pool, traffic distribution across the remaining healthy nodes became imbalanced, amplifying the impact and causing intermittent availability issues even in regions that were partially healthy. We immediately blocked all further configuration changes to prevent additional propagation of the faulty state and began deploying a ‘last known good’ configuration across the global fleet. Recovery required reloading configurations across a large number of nodes and rebalancing traffic gradually to avoid overloading nodes as they returned to service. This deliberate, phased recovery was necessary to stabilize the system while restoring scale and ensuring the issue did not recur.
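As an illustration of a phased recovery of this kind, the following sketch ramps traffic to recovered nodes in stages and pulls a node back out of rotation if it degrades under the new load. The step sizes, soak time, and the is_healthy / set_traffic_weight callbacks are assumptions for the example, not details of AFD's traffic management.

```python
# Hypothetical sketch of a phased traffic ramp for recovering nodes: each
# node is re-admitted at a small share of its normal weight and stepped up
# only if it stays healthy, so returning capacity is not overloaded.
import time


def phased_rebalance(recovered_nodes, is_healthy, set_traffic_weight,
                     steps=(0.1, 0.25, 0.5, 1.0), soak_seconds=300):
    """Ramp traffic to recovered nodes in stages (recovered_nodes is a mutable list)."""
    for weight in steps:
        for node in list(recovered_nodes):
            set_traffic_weight(node, weight)
        time.sleep(soak_seconds)              # let the new weight soak before stepping up
        for node in list(recovered_nodes):
            if not is_healthy(node):          # node degraded under the added load
                set_traffic_weight(node, 0.0) # pull it back out of rotation
                recovered_nodes.remove(node)
```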
The trigger was traced to a faulty tenant configuration deployment process. Our protection mechanisms, which are designed to validate and block erroneous deployments, failed due to a software defect that allowed the deployment to bypass safety validations. Safeguards have since been reviewed, and additional validation and rollback controls have been implemented immediately to prevent similar issues in the future.
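The sketch below shows, under stated assumptions, the general shape of such a deployment gate: every validator must pass before a candidate configuration is applied, and a failed post-deployment health check triggers an automatic rollback to the last known good state. It is not Microsoft's deployment pipeline; all function names are placeholders.

```python
# Illustrative sketch of a pre-deployment validation gate with automatic
# rollback. All callables are placeholders supplied by the caller.
def deploy_with_gate(candidate, last_known_good, validators, apply, health_check):
    """Apply a candidate config only if every validator passes; roll back on failed health checks."""
    for check in validators:
        if not check(candidate):
            # A single failed validation blocks the rollout entirely.
            raise ValueError(f"deployment blocked by validator {check.__name__}")
    apply(candidate)
    if not health_check():
        # Post-deployment health regression: restore the last known good state.
        apply(last_known_good)
        raise RuntimeError("deployment rolled back after failed health check")
```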
How did we respond?
Upon identifying the issue, we blocked all further configuration changes to AFD to prevent additional propagation of the faulty state, deployed the ‘last known good’ configuration across the global fleet, and brought nodes back into service in a deliberate, phased manner, rebalancing traffic gradually to avoid overload. Error rates and latency have since returned to pre-incident levels, and we continue to work through the remaining long tail of impact with affected customers.
What happens next?
Our team will complete an internal retrospective to understand the incident in more detail. Once that retrospective is complete, generally within 14 days, we will publish a final Post Incident Review (PIR) to all impacted customers.
To get notified when that happens, and/or to stay informed about future Azure service issues, make sure that you configure and maintain Azure Service Health alerts – these can trigger emails, SMS, push notifications, webhooks, and more: https://aka.ms/ash-alerts
For more information on Post Incident Reviews, refer to https://aka.ms/AzurePIRs
The impact times above represent the full incident duration, so they are not specific to any individual customer. Actual impact to service availability may vary between customers and resources – for guidance on implementing monitoring to understand granular impact, see: https://aka.ms/AzPIR/Monitoring
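For customers who want a starting point for such monitoring, here is a minimal, hedged probe against your own AFD-fronted endpoint using only the Python standard library. The URL and timeout are placeholders; production monitoring would typically rely on Azure Monitor availability tests or your existing observability tooling.

```python
# Minimal availability/latency probe for an AFD-fronted endpoint.
# The URL below is a placeholder for your own hostname.
import time
import urllib.error
import urllib.request


def probe(url="https://contoso.example.net/health", timeout=5.0):
    """Return (ok, latency_seconds) for a single request against the endpoint."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 400
    except (urllib.error.URLError, TimeoutError):
        ok = False
    return ok, time.monotonic() - start


if __name__ == "__main__":
    healthy, latency = probe()
    print(f"healthy={healthy} latency={latency:.3f}s")
```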
Finally, for broader guidance on preparing for cloud incidents, refer to https://aka.ms/incidentreadiness
How can we make our incident communications more useful?
You can rate this PIR and provide any feedback using our quick 3-question survey: https://aka.ms/AzPIR/YKYN-BWZ