APIM Circuit Breaker Pattern for Foundry Model Endpoints

Below is the practical active-passive APIM circuit breaker pattern for two Microsoft Foundry model endpoints, for example:

Primary: Sweden Central
Backup: East US 2
APIM: front door / AI Gateway

The recommended APIM shape is:

Client
  ↓
APIM API
  ↓ set-backend-service
APIM backend pool
  ├─ foundry-primary   priority 1
  └─ foundry-backup    priority 2

Use backend pool + circuit breaker + retry together. APIM backend pools support round-robin, weighted, and priority-based routing. With priority-based routing, lower-priority backends are used only when higher-priority backends are unavailable because their circuit breaker has tripped.

Reference: https://learn.microsoft.com/en-us/azure/api-management/backends

1. Keep the model contract identical

For failover, the primary and backup should use the same model family, version, and API contract. Microsoft’s guidance is to avoid failing over from one model/version to another because that can change client behavior unexpectedly.

Reference: https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/azure-openai-gateway-multi-backend

For Azure OpenAI-compatible Foundry deployments, I strongly recommend using the same deployment name in both regions.

Example:

Sweden Central Foundry deployment: gpt-5-prod
East US 2 Foundry deployment:     gpt-5-prod

This matters because Azure OpenAI-style calls often include the deployment name in the path:

POST /openai/deployments/gpt-5-prod/chat/completions

If the backup deployment has a different name, APIM may fail over correctly but the backup endpoint can still reject the request because the path references a deployment name that does not exist there.

2. Import or create the Foundry API in APIM

You can import Microsoft Foundry model endpoints directly into API Management. The Foundry import wizard can configure operations, a backend resource, set-backend-service, and managed identity authentication to the backend.

Reference: https://learn.microsoft.com/en-us/azure/api-management/azure-ai-foundry-api

In APIM:

APIM instance
  → APIs
  → + Add API
  → Microsoft Foundry
  → Select Foundry resource / model deployment
  → Choose client compatibility

Choose the compatibility option based on how clients call the endpoint:

Option	Use when
Azure OpenAI	You are exposing Azure OpenAI in Foundry Models with `/openai/deployments/...` paths
Azure AI	You want Azure AI Model Inference API style with `/models/...`
Azure OpenAI v1	You want the newer Azure OpenAI v1 endpoint style

APIM supports these Foundry import modes.

3. Enable APIM managed identity and grant backend access

Use APIM managed identity rather than API keys where possible.

In APIM:

APIM instance
  → Managed identities
  → System assigned
  → On

Then assign the APIM managed identity access to both Foundry/Azure OpenAI resources.

For Azure OpenAI in Foundry Models, assign:

Cognitive Services OpenAI User

APIM can use authentication-managed-identity to obtain a token for https://cognitiveservices.azure.com, and Microsoft documents this as the managed identity pattern for Azure OpenAI / Foundry model deployments.

Reference: https://learn.microsoft.com/en-us/azure/api-management/api-management-authenticate-authorize-ai-apis

4. Create two APIM backend entities

In APIM:

APIM instance
  → APIs
  → Backends
  → + Add

Create:

foundry-primary
foundry-backup

Example backend URLs:

https://<primary-foundry-or-aoai-endpoint>.services.ai.azure.com/openai
https://<backup-foundry-or-aoai-endpoint>.services.ai.azure.com/openai

or for older Azure OpenAI resource endpoints:

https://<primary-resource>.openai.azure.com/openai
https://<backup-resource>.openai.azure.com/openai

Use the same path shape that your imported API expects.

5. Configure circuit breaker on each backend

For model endpoints, the most important status code is usually:

429 Too Many Requests

Azure OpenAI / Foundry model endpoints may return 429 with a Retry-After header. Microsoft specifically recommends circuit breaker rules that handle 429 and accept the Retry-After duration.

Reference: https://learn.microsoft.com/en-us/azure/api-management/backends

For each backend:

Backends
  → foundry-primary
  → Settings
  → Circuit breaker settings
  → Add new

Recommended starting point for active-passive failover:

Setting	Value
Rule name	`foundry-breaker`
Failure count	`1`
Failure interval	`1 minute`
Failure status code range	`429-429` and optionally `500-599`
Trip duration	`1 minute`
Check Retry-After header	`True / Accept`

For capacity failover only

Use:

429-429
Failure count: 1
Accept Retry-After: true

For capacity + regional outage failover

Use:

429-429
500-599
Failure count: 1
Accept Retry-After: true

The trade-off is that with 500-599 and failure count 1, a single transient 500 can trip the primary. If that is too aggressive, use count 2 or 3, but then the current request may not fail over immediately.

Important limitation: APIM currently supports only one circuit breaker rule per backend, so you cannot have one threshold for 429 and a different threshold for 5xx on the same backend.

Reference: https://learn.microsoft.com/en-us/azure/api-management/backends

6. Create a backend pool

Now create a pool:

Backends
  → Load balancer
  → + Create new pool

Name:

foundry-model-pool

Add the two backends:

Backend	Priority	Weight
`foundry-primary`	`1`	`1`
`foundry-backup`	`2`	`1`

This gives you active-passive routing:

Use primary while healthy.
Use backup only when primary circuit is open.

APIM uses lower-priority backends only when all higher-priority backends are unavailable because circuit breaker rules are tripped.

Reference: https://learn.microsoft.com/en-us/azure/api-management/backends

7. Update the API policy to use the pool

At the API or operation level, set the backend to the pool.

<policies>
  <inbound>
    <base />

    <!-- Route to the APIM backend pool, not directly to one Foundry endpoint -->
    <set-backend-service backend-id="foundry-model-pool" />

    <!-- Authenticate APIM to Foundry / Azure OpenAI using APIM managed identity -->
    <authentication-managed-identity
        resource="https://cognitiveservices.azure.com"
        output-token-variable-name="managed-id-access-token"
        ignore-error="false" />

    <set-header name="Authorization" exists-action="override">
      <value>@("Bearer " + (string)context.Variables["managed-id-access-token"])</value>
    </set-header>

    <!-- Optional: make sure clients cannot pass their own backend key through -->
    <set-header name="api-key" exists-action="delete" />
  </inbound>

  <backend>
    <!-- 
      Retry gives the current request a chance to be retried through the pool.
      Without this, the circuit breaker may help future requests, but the first failed
      request can still return 429/5xx to the client.
    -->
    <retry condition="@(
        context.Response != null &&
        (
          context.Response.StatusCode == 429 ||
          context.Response.StatusCode &gt;= 500
        )
      )"
      count="2"
      interval="1"
      first-fast-retry="true">

      <forward-request timeout="120" />
    </retry>
  </backend>

  <outbound>
    <base />
  </outbound>

  <on-error>
    <base />
  </on-error>
</policies>

Why the retry is needed: the APIM retry policy re-executes its child policies while the retry condition remains true, and Microsoft’s Azure OpenAI/APIM sample uses retry around forward-request for 429 handling.

References:

For two backends, count="2" is a sensible starting point. Microsoft’s FastTrack sample also uses count="2" for the Azure OpenAI backend pool pattern.

8. Optional Bicep version

This is the infrastructure shape.

param apimName string
param primaryUrl string
param backupUrl string

resource apim 'Microsoft.ApiManagement/service@2024-05-01' existing = {
  name: apimName
}

resource foundryPrimary 'Microsoft.ApiManagement/service/backends@2025-03-01-preview' = {
  parent: apim
  name: 'foundry-primary'
  properties: {
    description: 'Primary Foundry model endpoint'
    protocol: 'http'
    url: primaryUrl
    circuitBreaker: {
      rules: [
        {
          name: 'foundry-primary-breaker'
          acceptRetryAfter: true
          tripDuration: 'PT1M'
          failureCondition: {
            count: 1
            interval: 'PT1M'
            statusCodeRanges: [
              {
                min: 429
                max: 429
              }
              {
                min: 500
                max: 599
              }
            ]
          }
        }
      ]
    }
  }
}

resource foundryBackup 'Microsoft.ApiManagement/service/backends@2025-03-01-preview' = {
  parent: apim
  name: 'foundry-backup'
  properties: {
    description: 'Backup Foundry model endpoint'
    protocol: 'http'
    url: backupUrl
    circuitBreaker: {
      rules: [
        {
          name: 'foundry-backup-breaker'
          acceptRetryAfter: true
          tripDuration: 'PT1M'
          failureCondition: {
            count: 1
            interval: 'PT1M'
            statusCodeRanges: [
              {
                min: 429
                max: 429
              }
              {
                min: 500
                max: 599
              }
            ]
          }
        }
      ]
    }
  }
}

resource foundryPool 'Microsoft.ApiManagement/service/backends@2025-03-01-preview' = {
  parent: apim
  name: 'foundry-model-pool'
  properties: {
    description: 'Priority-based Foundry model endpoint pool'
    type: 'Pool'
    pool: {
      failureResponse: {
        statusCode: 503
      }
      services: [
        {
          id: foundryPrimary.id
          priority: 1
          weight: 1
        }
        {
          id: foundryBackup.id
          priority: 2
          weight: 1
        }
      ]
    }
  }
}

The backend resource schema supports circuitBreaker, pool, priorities, weights, and acceptRetryAfter.

Reference: https://learn.microsoft.com/en-us/azure/templates/microsoft.apimanagement/service/backends

9. Test the failover

Do not test by deleting the model deployment first. Deleting the deployment often produces a 404 or model/deployment error, which may not match your circuit breaker rule and is not representative of throttling.

Better tests:

Test 1: Simulate 429

Use a dev deployment with low quota, or temporarily point foundry-primary to a mock endpoint that returns:

HTTP/1.1 429 Too Many Requests
Retry-After: 60

Expected result:

1. Primary returns 429.
2. APIM circuit breaker opens for primary.
3. Retry runs.
4. Backend pool selects backup.
5. Client receives successful response from backup, assuming backup is healthy.

Test 2: Simulate 5xx

Temporarily point foundry-primary to a test endpoint returning:

HTTP/1.1 503 Service Unavailable

Expected result:

Primary trips.
Retry re-enters the pool.
Backup handles the request.

Test 3: Observe APIM

Check:

APIM → Monitoring → Metrics
APIM → Logs / Application Insights
APIM → Backend response codes

APIM also emits Event Grid events when a backend circuit breaker trips or resets, which can be useful for alerting.

Reference: https://learn.microsoft.com/en-us/azure/api-management/backends

10. Important gotchas

Gotcha	Why it matters
Circuit breaker is not supported in Consumption tier	Use a supported APIM tier.
Circuit breaker and load balancing are approximate	APIM gateway instances do not perfectly synchronize breaker state across all instances.
Use retry with the pool	Circuit breaker helps future routing; retry helps the current failed request.
Same deployment name is strongly recommended	Avoid backup failure due to mismatched `/deployments/{name}` path.
Accept `Retry-After` for 429	Foundry/Azure OpenAI may return long `Retry-After`; APIM should respect it.
Do not fail over to a different model version casually	It can change behaviour and break client assumptions.

Recommended setup

For a Sweden Central primary and East US 2 backup Foundry model setup:

Backend: foundry-primary
  Priority: 1
  Circuit breaker: 429 + 500-599
  Failure count: 1
  Trip duration: 1 minute
  Accept Retry-After: true

Backend: foundry-backup
  Priority: 2
  Circuit breaker: 429 + 500-599
  Failure count: 1
  Trip duration: 1 minute
  Accept Retry-After: true

Backend pool: foundry-model-pool
  foundry-primary → priority 1
  foundry-backup  → priority 2

API policy:
  set-backend-service → foundry-model-pool
  authentication-managed-identity → cognitiveservices
  retry on 429 and 5xx

That gives you the cleanest APIM-native circuit breaker pattern for Foundry model endpoint failover.

ManojNair/apimcircuitbreaker.md

Select an option

No results found

Select an option

No results found

APIM Circuit Breaker Pattern for Foundry Model Endpoints

1. Keep the model contract identical

2. Import or create the Foundry API in APIM

3. Enable APIM managed identity and grant backend access

4. Create two APIM backend entities

5. Configure circuit breaker on each backend

For capacity failover only

For capacity + regional outage failover

6. Create a backend pool

7. Update the API policy to use the pool

8. Optional Bicep version

9. Test the failover

Test 1: Simulate 429

Test 2: Simulate 5xx

Test 3: Observe APIM

10. Important gotchas

Recommended setup