Below is the practical active-passive APIM circuit breaker pattern for two Microsoft Foundry model endpoints, for example:
- Primary: Sweden Central
- Backup: East US 2
- APIM: front door / AI Gateway
The recommended APIM shape is:
Client
↓
APIM API
↓ set-backend-service
APIM backend pool
├─ foundry-primary priority 1
└─ foundry-backup priority 2
Use backend pool + circuit breaker + retry together. APIM backend pools support round-robin, weighted, and priority-based routing. With priority-based routing, lower-priority backends are used only when higher-priority backends are unavailable because their circuit breaker has tripped.
Reference: https://learn.microsoft.com/en-us/azure/api-management/backends
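The priority rule above can be sketched as a toy model. This is not APIM's implementation; the `Backend` class and `eligible_backends` function are hypothetical names used only to illustrate how a tripped breaker moves traffic to the next priority group.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    priority: int               # lower number means more preferred
    circuit_open: bool = False  # True while the breaker is tripped


def eligible_backends(pool):
    """Return the healthy backends in the most preferred priority group."""
    healthy = [b for b in pool if not b.circuit_open]
    if not healthy:
        return []  # every breaker is open, so nothing can serve the request
    best = min(b.priority for b in healthy)
    return [b for b in healthy if b.priority == best]


pool = [Backend("foundry-primary", 1), Backend("foundry-backup", 2)]
print([b.name for b in eligible_backends(pool)])  # ['foundry-primary']

pool[0].circuit_open = True  # simulate the primary breaker tripping
print([b.name for b in eligible_backends(pool)])  # ['foundry-backup']
```

In this model the backup receives traffic only while the primary's breaker is open, which is the active-passive behaviour described above.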
For failover, the primary and backup should use the same model family, version, and API contract. Microsoft’s guidance is to avoid failing over from one model/version to another because that can change client behavior unexpectedly.
Reference: https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/azure-openai-gateway-multi-backend
For Azure OpenAI-compatible Foundry deployments, I strongly recommend using the same deployment name in both regions.
Example:
Sweden Central Foundry deployment: gpt-5-prod
East US 2 Foundry deployment: gpt-5-prod
This matters because Azure OpenAI-style calls often include the deployment name in the path:
POST /openai/deployments/gpt-5-prod/chat/completions
If the backup deployment has a different name, APIM may fail over correctly, but the backup endpoint can still reject the request because the path references a deployment name that does not exist there.
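To make the naming issue concrete, here is a minimal sketch that checks whether the deployment named in a request path exists in a given region. The `DEPLOYMENTS` inventory and the region keys are made up for illustration; a real check would query each Foundry resource.

```python
import re

# Hypothetical deployment inventory per region, for illustration only.
DEPLOYMENTS = {
    "sweden-central": {"gpt-5-prod"},
    "east-us-2-matching": {"gpt-5-prod"},
    "east-us-2-mismatched": {"gpt-5-backup"},
}

DEPLOYMENT_PATH = re.compile(r"^/openai/deployments/(?P<name>[^/]+)/")


def region_can_serve(region, path):
    """True if the deployment named in the request path exists in the region."""
    match = DEPLOYMENT_PATH.match(path)
    return bool(match) and match.group("name") in DEPLOYMENTS[region]


path = "/openai/deployments/gpt-5-prod/chat/completions"
for region in DEPLOYMENTS:
    print(region, region_can_serve(region, path))
```

Because APIM forwards the same path to whichever backend the pool selects, only the region with a matching deployment name can serve the failed-over request.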
You can import Microsoft Foundry model endpoints directly into API Management. The Foundry import wizard can configure operations, a backend resource, set-backend-service, and managed identity authentication to the backend.
Reference: https://learn.microsoft.com/en-us/azure/api-management/azure-ai-foundry-api
In APIM:
APIM instance
→ APIs
→ + Add API
→ Microsoft Foundry
→ Select Foundry resource / model deployment
→ Choose client compatibility
Choose the compatibility option based on how clients call the endpoint:
| Option | Use when |
|---|---|
| Azure OpenAI | You are exposing Azure OpenAI in Foundry Models with /openai/deployments/... paths |
| Azure AI | You want Azure AI Model Inference API style with /models/... |
| Azure OpenAI v1 | You want the newer Azure OpenAI v1 endpoint style |
APIM supports these Foundry import modes.
Use APIM managed identity rather than API keys where possible.
In APIM:
APIM instance
→ Managed identities
→ System assigned
→ On
Then assign the APIM managed identity access to both Foundry/Azure OpenAI resources.
For Azure OpenAI in Foundry Models, assign:
Cognitive Services OpenAI User
APIM can use authentication-managed-identity to obtain a token for https://cognitiveservices.azure.com, and Microsoft documents this as the managed identity pattern for Azure OpenAI / Foundry model deployments.
Reference: https://learn.microsoft.com/en-us/azure/api-management/api-management-authenticate-authorize-ai-apis
In APIM:
APIM instance
→ APIs
→ Backends
→ + Add
Create:
foundry-primary
foundry-backup
Example backend URLs:
https://<primary-foundry-or-aoai-endpoint>.services.ai.azure.com/openai
https://<backup-foundry-or-aoai-endpoint>.services.ai.azure.com/openai
or for older Azure OpenAI resource endpoints:
https://<primary-resource>.openai.azure.com/openai
https://<backup-resource>.openai.azure.com/openai
Use the same path shape that your imported API expects.
For model endpoints, the most important status code is usually:
429 Too Many Requests
Azure OpenAI / Foundry model endpoints may return 429 with a Retry-After header. Microsoft specifically recommends circuit breaker rules that handle 429 and accept the Retry-After duration.
Reference: https://learn.microsoft.com/en-us/azure/api-management/backends
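As a rough illustration of what accepting Retry-After means, the sketch below parses the header (either delta-seconds or an HTTP date) and picks a trip duration. It assumes that a usable header value takes the place of the configured trip duration; `parse_retry_after` and `trip_seconds` are hypothetical helpers, not APIM APIs.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return the Retry-After delay in seconds, or None if it cannot be parsed."""
    value = value.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form, e.g. "60"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def trip_seconds(configured, retry_after_header, accept_retry_after=True):
    """Assumed behaviour: a usable Retry-After replaces the configured duration."""
    if accept_retry_after and retry_after_header is not None:
        parsed = parse_retry_after(retry_after_header)
        if parsed is not None:
            return parsed
    return configured


print(trip_seconds(60, "120"))  # 120.0
print(trip_seconds(60, None))   # 60
```

Under this assumption, a long Retry-After from the model endpoint keeps the primary out of rotation for longer than the configured one-minute trip duration.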
For each backend:
Backends
→ foundry-primary
→ Settings
→ Circuit breaker settings
→ Add new
Recommended starting point for active-passive failover:
| Setting | Value |
|---|---|
| Rule name | foundry-breaker |
| Failure count | 1 |
| Failure interval | 1 minute |
| Failure status code range | 429-429 and optionally 500-599 |
| Trip duration | 1 minute |
| Check Retry-After header | True / Accept |
To fail over on throttling only, use:
429-429
Failure count: 1
Accept Retry-After: true
To fail over on throttling and server errors, use:
429-429
500-599
Failure count: 1
Accept Retry-After: true
The trade-off is that with 500-599 and failure count 1, a single transient 500 can trip the primary. If that is too aggressive, use count 2 or 3, but then the current request may not fail over immediately.
Important limitation: APIM currently supports only one circuit breaker rule per backend, so you cannot have one threshold for 429 and a different threshold for 5xx on the same backend.
Reference: https://learn.microsoft.com/en-us/azure/api-management/backends
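The failure-count trade-off can be illustrated with a small model. It assumes the breaker opens once the number of matching failures inside the trailing interval reaches the configured count; the exact comparison APIM uses is not confirmed here, and `breaker_trips` is a hypothetical helper.

```python
def breaker_trips(failure_times, now, count, interval_seconds):
    """Assumed rule: open once `count` failures fall inside the trailing interval."""
    recent = [t for t in failure_times if 0 <= now - t <= interval_seconds]
    return len(recent) >= count


# A single transient 500 at t=0 seconds.
one_failure = [0.0]
print(breaker_trips(one_failure, now=0.0, count=1, interval_seconds=60))  # True
print(breaker_trips(one_failure, now=0.0, count=3, interval_seconds=60))  # False

# Three failures inside one minute reach a count of 3.
burst = [0.0, 10.0, 20.0]
print(breaker_trips(burst, now=20.0, count=3, interval_seconds=60))  # True

# Three failures spread over five minutes do not.
spread = [0.0, 150.0, 300.0]
print(breaker_trips(spread, now=300.0, count=3, interval_seconds=60))  # False
```

With a count of 1 the first failure already opens the circuit, so the same request can be retried against the backup. With a higher count, the first failures are still routed to the primary.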
Now create a pool:
Backends
→ Load balancer
→ + Create new pool
Name:
foundry-model-pool
Add the two backends:
| Backend | Priority | Weight |
|---|---|---|
| foundry-primary | 1 | 1 |
| foundry-backup | 2 | 1 |
This gives you active-passive routing:
Use primary while healthy.
Use backup only when primary circuit is open.
APIM uses lower-priority backends only when all higher-priority backends are unavailable because circuit breaker rules are tripped.
Reference: https://learn.microsoft.com/en-us/azure/api-management/backends
At the API or operation level, set the backend to the pool.
<policies>
    <inbound>
        <base />
        <!-- Route to the APIM backend pool, not directly to one Foundry endpoint -->
        <set-backend-service backend-id="foundry-model-pool" />
        <!-- Authenticate APIM to Foundry / Azure OpenAI using APIM managed identity -->
        <authentication-managed-identity
            resource="https://cognitiveservices.azure.com"
            output-token-variable-name="managed-id-access-token"
            ignore-error="false" />
        <set-header name="Authorization" exists-action="override">
            <value>@("Bearer " + (string)context.Variables["managed-id-access-token"])</value>
        </set-header>
        <!-- Optional: make sure clients cannot pass their own backend key through -->
        <set-header name="api-key" exists-action="delete" />
    </inbound>
    <backend>
        <!--
            Retry gives the current request a chance to be retried through the pool.
            Without this, the circuit breaker may help future requests, but the first failed
            request can still return 429/5xx to the client.
        -->
        <retry condition="@(
                context.Response != null &&
                (
                    context.Response.StatusCode == 429 ||
                    context.Response.StatusCode >= 500
                )
            )"
            count="2"
            interval="1"
            first-fast-retry="true">
            <!-- buffer-request-body keeps the POST body available for the retried attempt -->
            <forward-request timeout="120" buffer-request-body="true" />
        </retry>
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
Why the retry is needed: the APIM retry policy re-executes its child policies while the retry condition remains true, and Microsoft’s Azure OpenAI/APIM sample uses retry around forward-request for 429 handling.
References:
- https://learn.microsoft.com/en-us/azure/api-management/retry-policy
- https://techcommunity.microsoft.com/blog/fasttrackforazureblog/using-azure-api-management-circuit-breaker-and-load-balancing-with-azure-openai-/4041003
For two backends, count="2" is a sensible starting point. Microsoft’s FastTrack sample also uses count="2" for the Azure OpenAI backend pool pattern.
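The interaction between retry, breaker, and pool can be sketched as a simplified model of a single request. It assumes a failure count of 1 and a 503 response when every breaker is open; `send_with_retry` and `fake_call` are hypothetical names, and real APIM gateways track breaker state per instance rather than in one shared list.

```python
def send_with_retry(pool, call_backend, retry_count=2):
    """Model one request: pick a backend, trip its breaker on failure, retry."""
    status = None
    for _ in range(1 + retry_count):  # first attempt plus retries
        healthy = [b for b in pool if not b["open"]]
        if not healthy:
            return 503, None  # assumed response when every breaker is open
        target = min(healthy, key=lambda b: b["priority"])
        status = call_backend(target["name"])
        if status == 429 or status >= 500:
            target["open"] = True  # failure count of 1 trips immediately
            continue
        return status, target["name"]
    return status, None


def fake_call(name):
    # Hypothetical backends: the primary is throttled, the backup is healthy.
    return 429 if name == "foundry-primary" else 200


pool = [
    {"name": "foundry-primary", "priority": 1, "open": False},
    {"name": "foundry-backup", "priority": 2, "open": False},
]
print(send_with_retry(pool, fake_call))  # (200, 'foundry-backup')
```

Calling the same function with `retry_count=0` on a fresh pool returns `(429, None)`: the breaker still opens for later requests, but the current caller sees the throttling error.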
This is the infrastructure shape.
param apimName string
param primaryUrl string
param backupUrl string

resource apim 'Microsoft.ApiManagement/service@2024-05-01' existing = {
  name: apimName
}

resource foundryPrimary 'Microsoft.ApiManagement/service/backends@2025-03-01-preview' = {
  parent: apim
  name: 'foundry-primary'
  properties: {
    description: 'Primary Foundry model endpoint'
    protocol: 'http'
    url: primaryUrl
    circuitBreaker: {
      rules: [
        {
          name: 'foundry-primary-breaker'
          acceptRetryAfter: true
          tripDuration: 'PT1M'
          failureCondition: {
            count: 1
            interval: 'PT1M'
            statusCodeRanges: [
              {
                min: 429
                max: 429
              }
              {
                min: 500
                max: 599
              }
            ]
          }
        }
      ]
    }
  }
}

resource foundryBackup 'Microsoft.ApiManagement/service/backends@2025-03-01-preview' = {
  parent: apim
  name: 'foundry-backup'
  properties: {
    description: 'Backup Foundry model endpoint'
    protocol: 'http'
    url: backupUrl
    circuitBreaker: {
      rules: [
        {
          name: 'foundry-backup-breaker'
          acceptRetryAfter: true
          tripDuration: 'PT1M'
          failureCondition: {
            count: 1
            interval: 'PT1M'
            statusCodeRanges: [
              {
                min: 429
                max: 429
              }
              {
                min: 500
                max: 599
              }
            ]
          }
        }
      ]
    }
  }
}

resource foundryPool 'Microsoft.ApiManagement/service/backends@2025-03-01-preview' = {
  parent: apim
  name: 'foundry-model-pool'
  properties: {
    description: 'Priority-based Foundry model endpoint pool'
    type: 'Pool'
    pool: {
      failureResponse: {
        statusCode: 503
      }
      services: [
        {
          id: foundryPrimary.id
          priority: 1
          weight: 1
        }
        {
          id: foundryBackup.id
          priority: 2
          weight: 1
        }
      ]
    }
  }
}
The backend resource schema supports circuitBreaker, pool, priorities, weights, and acceptRetryAfter.
Reference: https://learn.microsoft.com/en-us/azure/templates/microsoft.apimanagement/service/backends
Do not test by deleting the model deployment first. Deleting the deployment often produces a 404 or model/deployment error, which may not match your circuit breaker rule and is not representative of throttling.
Better tests:
Use a dev deployment with low quota, or temporarily point foundry-primary to a mock endpoint that returns:
HTTP/1.1 429 Too Many Requests
Retry-After: 60
Expected result:
1. Primary returns 429.
2. APIM circuit breaker opens for primary.
3. Retry runs.
4. Backend pool selects backup.
5. Client receives successful response from backup, assuming backup is healthy.
Temporarily point foundry-primary to a test endpoint returning:
HTTP/1.1 503 Service Unavailable
Expected result:
Primary trips.
Retry re-enters the pool.
Backup handles the request.
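Both tests above need a stand-in endpoint that returns a fixed status. One way to build one is a small standard-library HTTP server, sketched below. The names `make_handler`, `start_mock`, and `probe` are hypothetical, and the server binds to localhost here, so it would have to be hosted somewhere APIM can reach before foundry-primary could point at it.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(status, retry_after=None):
    """Build a handler class that answers every GET/POST with a fixed status."""

    class FixedStatusHandler(BaseHTTPRequestHandler):
        def _respond(self):
            # Drain the request body so the client is not cut off mid-send.
            length = int(self.headers.get("Content-Length") or 0)
            self.rfile.read(length)
            body = b'{"error": {"message": "mock failure"}}'
            self.send_response(status)
            if retry_after is not None:
                self.send_header("Retry-After", str(retry_after))
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        do_GET = _respond
        do_POST = _respond

        def log_message(self, *args):
            pass  # keep console output quiet

    return FixedStatusHandler


def start_mock(status, retry_after=None, port=0):
    """Start the mock on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), make_handler(status, retry_after))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(server):
    """Send one POST to the mock and return (status, Retry-After header)."""
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        conn.request("POST", "/openai/deployments/gpt-5-prod/chat/completions", body=b"{}")
        response = conn.getresponse()
        response.read()
        return response.status, response.getheader("Retry-After")
    finally:
        conn.close()


throttled = start_mock(429, retry_after=60)
print(probe(throttled))  # (429, '60')
throttled.shutdown()
throttled.server_close()
```

Calling `start_mock(503)` gives the server-error variant without a Retry-After header.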
Check:
APIM → Monitoring → Metrics
APIM → Logs / Application Insights
APIM → Backend response codes
APIM also emits Event Grid events when a backend circuit breaker trips or resets, which can be useful for alerting.
Reference: https://learn.microsoft.com/en-us/azure/api-management/backends
| Gotcha | Why it matters |
|---|---|
| Circuit breaker is not supported in Consumption tier | Use a supported APIM tier. |
| Circuit breaker and load balancing are approximate | APIM gateway instances do not perfectly synchronize breaker state across all instances. |
| Use retry with the pool | Circuit breaker helps future routing; retry helps the current failed request. |
| Same deployment name is strongly recommended | Avoid backup failure due to mismatched /deployments/{name} path. |
| Accept Retry-After for 429 | Foundry/Azure OpenAI may return long Retry-After; APIM should respect it. |
| Do not fail over to a different model version casually | It can change behaviour and break client assumptions. |
For a Sweden Central primary and East US 2 backup Foundry model setup:
Backend: foundry-primary
Priority: 1
Circuit breaker: 429 + 500-599
Failure count: 1
Trip duration: 1 minute
Accept Retry-After: true
Backend: foundry-backup
Priority: 2
Circuit breaker: 429 + 500-599
Failure count: 1
Trip duration: 1 minute
Accept Retry-After: true
Backend pool: foundry-model-pool
foundry-primary → priority 1
foundry-backup → priority 2
API policy:
set-backend-service → foundry-model-pool
authentication-managed-identity → cognitiveservices
retry on 429 and 5xx
That gives you the cleanest APIM-native circuit breaker pattern for Foundry model endpoint failover.