@ManojNair
Created May 5, 2026 00:15
APIM Circuit Breaker

APIM Circuit Breaker Pattern for Foundry Model Endpoints

Below is a practical active-passive APIM circuit breaker pattern for two Microsoft Foundry model endpoints, for example:

  • Primary: Sweden Central
  • Backup: East US 2
  • APIM: front door / AI Gateway

The recommended APIM shape is:

Client
  ↓
APIM API
  ↓ set-backend-service
APIM backend pool
  ├─ foundry-primary   priority 1
  └─ foundry-backup    priority 2

Use backend pool + circuit breaker + retry together. APIM backend pools support round-robin, weighted, and priority-based routing. With priority-based routing, lower-priority backends are used only when higher-priority backends are unavailable because their circuit breaker has tripped.

Reference: https://learn.microsoft.com/en-us/azure/api-management/backends


1. Keep the model contract identical

For failover, the primary and backup should use the same model family, version, and API contract. Microsoft’s guidance is to avoid failing over from one model/version to another because that can change client behavior unexpectedly.

Reference: https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/azure-openai-gateway-multi-backend

For Azure OpenAI-compatible Foundry deployments, I strongly recommend using the same deployment name in both regions.

Example:

Sweden Central Foundry deployment: gpt-5-prod
East US 2 Foundry deployment:     gpt-5-prod

This matters because Azure OpenAI-style calls often include the deployment name in the path:

POST /openai/deployments/gpt-5-prod/chat/completions

If the backup deployment has a different name, APIM may fail over correctly but the backup endpoint can still reject the request because the path references a deployment name that does not exist there.
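
A pre-flight check can catch this mismatch before a real failover does. The following is a minimal sketch in Python, assuming you can list the deployment names per region yourself; the `inventory` dictionary and both helper functions are hypothetical and not part of any Azure SDK.

```python
import re
from typing import Dict, List, Optional, Set

# Matches Azure OpenAI-style paths such as /openai/deployments/<name>/chat/completions
DEPLOYMENT_PATH = re.compile(r"/openai/deployments/(?P<name>[^/]+)/")


def deployment_from_path(path: str) -> Optional[str]:
    """Return the deployment name embedded in the request path, if any."""
    match = DEPLOYMENT_PATH.search(path)
    return match.group("name") if match else None


def regions_missing_deployment(path: str, inventory: Dict[str, Set[str]]) -> List[str]:
    """List the regions whose deployments do not include the name used in the path."""
    name = deployment_from_path(path)
    if name is None:
        return []
    return sorted(region for region, names in inventory.items() if name not in names)


# Hypothetical inventory: the backup region uses a different deployment name.
inventory = {
    "swedencentral": {"gpt-5-prod"},
    "eastus2": {"gpt-5-backup"},
}
path = "/openai/deployments/gpt-5-prod/chat/completions"
print(regions_missing_deployment(path, inventory))  # ['eastus2']
```

An empty list means every region can serve the path as written.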


2. Import or create the Foundry API in APIM

You can import Microsoft Foundry model endpoints directly into API Management. The Foundry import wizard can configure operations, a backend resource, set-backend-service, and managed identity authentication to the backend.

Reference: https://learn.microsoft.com/en-us/azure/api-management/azure-ai-foundry-api

In APIM:

APIM instance
  → APIs
  → + Add API
  → Microsoft Foundry
  → Select Foundry resource / model deployment
  → Choose client compatibility

Choose the compatibility option based on how clients call the endpoint:

| Option | Use when |
| --- | --- |
| Azure OpenAI | You are exposing Azure OpenAI in Foundry Models with /openai/deployments/... paths |
| Azure AI | You want Azure AI Model Inference API style with /models/... |
| Azure OpenAI v1 | You want the newer Azure OpenAI v1 endpoint style |

APIM supports these Foundry import modes.


3. Enable APIM managed identity and grant backend access

Use APIM managed identity rather than API keys where possible.

In APIM:

APIM instance
  → Managed identities
  → System assigned
  → On

Then assign the APIM managed identity access to both Foundry/Azure OpenAI resources.

For Azure OpenAI in Foundry Models, assign:

Cognitive Services OpenAI User

APIM can use authentication-managed-identity to obtain a token for https://cognitiveservices.azure.com, and Microsoft documents this as the managed identity pattern for Azure OpenAI / Foundry model deployments.

Reference: https://learn.microsoft.com/en-us/azure/api-management/api-management-authenticate-authorize-ai-apis


4. Create two APIM backend entities

In APIM:

APIM instance
  → APIs
  → Backends
  → + Add

Create:

foundry-primary
foundry-backup

Example backend URLs:

https://<primary-foundry-or-aoai-endpoint>.services.ai.azure.com/openai
https://<backup-foundry-or-aoai-endpoint>.services.ai.azure.com/openai

or for older Azure OpenAI resource endpoints:

https://<primary-resource>.openai.azure.com/openai
https://<backup-resource>.openai.azure.com/openai

Use the same path shape that your imported API expects.
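
To sanity-check that both backend URLs share the same path shape, a small comparison like the one below may help. It is a sketch only: `same_path_shape` is a hypothetical helper, and the hostnames in the usage lines are placeholders.

```python
from urllib.parse import urlsplit


def same_path_shape(primary_url: str, backup_url: str) -> bool:
    """True when both backend URLs use https and share the same base path."""
    primary, backup = urlsplit(primary_url), urlsplit(backup_url)
    same_scheme = primary.scheme == backup.scheme == "https"
    # Ignore a trailing slash so /openai and /openai/ compare as equal.
    return same_scheme and primary.path.rstrip("/") == backup.path.rstrip("/")


# Placeholder hostnames, for illustration only.
print(same_path_shape(
    "https://primary-example.services.ai.azure.com/openai",
    "https://backup-example.services.ai.azure.com/openai/",
))  # True
print(same_path_shape(
    "https://primary-example.services.ai.azure.com/openai",
    "https://backup-example.openai.azure.com",
))  # False
```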


5. Configure circuit breaker on each backend

For model endpoints, the most important status code is usually:

429 Too Many Requests

Azure OpenAI / Foundry model endpoints may return 429 with a Retry-After header. Microsoft specifically recommends circuit breaker rules that handle 429 and accept the Retry-After duration.

Reference: https://learn.microsoft.com/en-us/azure/api-management/backends
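
To make the Retry-After handling concrete, here is a simplified Python sketch of how a gateway could turn the header into a wait time. It is an illustration of the idea, not APIM's implementation; both function names and the 60-second default are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After value (delta-seconds or HTTP-date) to seconds.

    Returns None when the header is missing or cannot be parsed.
    """
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return int(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0, int((when - now).total_seconds()))


def effective_trip_seconds(headers, configured_trip_seconds=60):
    """Honour Retry-After when present, else fall back to the configured trip duration."""
    seconds = retry_after_seconds(headers.get("Retry-After"))
    return configured_trip_seconds if seconds is None else seconds


print(effective_trip_seconds({"Retry-After": "120"}))  # 120
print(effective_trip_seconds({}))                      # 60
```

The practical consequence is that a long Retry-After from the model endpoint can keep the primary out of rotation for longer than the configured trip duration.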

For each backend:

Backends
  → foundry-primary
  → Settings
  → Circuit breaker settings
  → Add new

Recommended starting point for active-passive failover:

| Setting | Value |
| --- | --- |
| Rule name | foundry-breaker |
| Failure count | 1 |
| Failure interval | 1 minute |
| Failure status code range | 429-429 and optionally 500-599 |
| Trip duration | 1 minute |
| Check Retry-After header | True / Accept |

For capacity failover only

Use:

429-429
Failure count: 1
Accept Retry-After: true

For capacity + regional outage failover

Use:

429-429
500-599
Failure count: 1
Accept Retry-After: true

The trade-off is that with 500-599 and failure count 1, a single transient 500 can trip the primary. If that is too aggressive, use count 2 or 3, but then the current request may not fail over immediately.

Important limitation: APIM currently supports only one circuit breaker rule per backend, so you cannot have one threshold for 429 and a different threshold for 5xx on the same backend.

Reference: https://learn.microsoft.com/en-us/azure/api-management/backends
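
The failure-count trade-off is easier to see with a toy model of a single rule. The class below is a sketch written for this guide, not APIM code; it assumes the breaker trips once the number of matching failures inside the interval reaches the configured count.

```python
class CircuitBreaker:
    """Toy model of one rule: trip after `count` matching failures within `interval` seconds."""

    def __init__(self, count=1, interval=60, trip_duration=60,
                 ranges=((429, 429), (500, 599))):
        self.count = count
        self.interval = interval
        self.trip_duration = trip_duration
        self.ranges = ranges
        self.failures = []      # timestamps of recent matching failures
        self.open_until = 0.0   # breaker is open (tripped) while now < open_until

    def is_open(self, now):
        return now < self.open_until

    def record(self, status, now):
        """Record a backend response; return True if this response tripped the breaker."""
        if not any(low <= status <= high for low, high in self.ranges):
            return False
        # Keep only failures inside the sliding interval, then add this one.
        self.failures = [t for t in self.failures if now - t < self.interval] + [now]
        if len(self.failures) >= self.count:
            self.open_until = now + self.trip_duration
            self.failures = []
            return True
        return False


aggressive = CircuitBreaker(count=1)
tolerant = CircuitBreaker(count=3)
print(aggressive.record(500, now=0.0))  # True: one transient 500 trips the backend
print(tolerant.record(500, now=0.0))    # False: two more failures are needed
```

With count 3 the first two failing requests are still sent to the primary, which is why a higher count delays failover for the requests that hit those failures.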


6. Create a backend pool

Now create a pool:

Backends
  → Load balancer
  → + Create new pool

Name:

foundry-model-pool

Add the two backends:

| Backend | Priority | Weight |
| --- | --- | --- |
| foundry-primary | 1 | 1 |
| foundry-backup | 2 | 1 |

This gives you active-passive routing:

Use primary while healthy.
Use backup only when primary circuit is open.

APIM uses lower-priority backends only when all higher-priority backends are unavailable because circuit breaker rules are tripped.

Reference: https://learn.microsoft.com/en-us/azure/api-management/backends
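
The selection rule can be sketched in a few lines of Python. This is a simplified model that ignores weights (each priority group here has a single backend); `select_backend` is a hypothetical helper, and `None` stands in for whatever error the gateway returns when no backend is available.

```python
def select_backend(pool, tripped):
    """Pick the available backend with the best (lowest-numbered) priority.

    pool: list of (name, priority) tuples; a lower number means higher priority.
    tripped: set of backend names whose circuit breaker is currently open.
    Returns None when every backend is tripped.
    """
    available = [(priority, name) for name, priority in pool if name not in tripped]
    if not available:
        return None
    return min(available)[1]


pool = [("foundry-primary", 1), ("foundry-backup", 2)]
print(select_backend(pool, tripped=set()))                # foundry-primary
print(select_backend(pool, tripped={"foundry-primary"}))  # foundry-backup
```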


7. Update the API policy to use the pool

At the API or operation level, set the backend to the pool.

<policies>
  <inbound>
    <base />

    <!-- Route to the APIM backend pool, not directly to one Foundry endpoint -->
    <set-backend-service backend-id="foundry-model-pool" />

    <!-- Authenticate APIM to Foundry / Azure OpenAI using APIM managed identity -->
    <authentication-managed-identity
        resource="https://cognitiveservices.azure.com"
        output-token-variable-name="managed-id-access-token"
        ignore-error="false" />

    <set-header name="Authorization" exists-action="override">
      <value>@("Bearer " + (string)context.Variables["managed-id-access-token"])</value>
    </set-header>

    <!-- Optional: make sure clients cannot pass their own backend key through -->
    <set-header name="api-key" exists-action="delete" />
  </inbound>

  <backend>
    <!-- 
      Retry gives the current request a chance to be retried through the pool.
      Without this, the circuit breaker may help future requests, but the first failed
      request can still return 429/5xx to the client.
    -->
    <retry condition="@(
        context.Response != null &&
        (
          context.Response.StatusCode == 429 ||
          context.Response.StatusCode &gt;= 500
        )
      )"
      count="2"
      interval="1"
      first-fast-retry="true">

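      <!-- buffer-request-body lets each retry resend the original POST body -->
      <forward-request timeout="120" buffer-request-body="true" />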
    </retry>
  </backend>

  <outbound>
    <base />
  </outbound>

  <on-error>
    <base />
  </on-error>
</policies>

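Why the retry is needed: a tripped circuit breaker only changes routing for later attempts, so without a retry the request that hit the failure still returns 429/5xx to the client. The APIM retry policy re-executes its child policies while the retry condition remains true, so each retry re-runs forward-request and lets the pool choose again, this time skipping the tripped primary. Microsoft’s Azure OpenAI/APIM sample uses retry around forward-request for 429 handling.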

For two backends, count="2" is a sensible starting point. Microsoft’s FastTrack sample also uses count="2" for the Azure OpenAI backend pool pattern.
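
How retry, pool, and breaker interact for a single request can be simulated as follows. This is a sketch under stated assumptions: failure count 1, priority-only selection, and a 503 when no backend is available. `handle_request` is a hypothetical function written for illustration, not an APIM API.

```python
def handle_request(backend_status, pool, tripped, retries=2):
    """Simulate one client request through a priority pool with retry.

    backend_status: dict mapping backend name -> HTTP status it currently returns.
    pool: list of (name, priority) tuples; a lower number means higher priority.
    tripped: set of backend names with an open breaker (updated in place).
    retries: retries after the first attempt, like count on the retry policy.
    Returns (final_status, backends_tried).
    """
    tried = []
    status = 503
    for _ in range(1 + retries):
        candidates = sorted((prio, name) for name, prio in pool if name not in tripped)
        if not candidates:
            status = 503  # nothing left to route to
            break
        name = candidates[0][1]
        tried.append(name)
        status = backend_status[name]
        if status == 429 or status >= 500:
            tripped.add(name)  # failure count 1: the breaker trips immediately
            continue           # retry condition holds: forward the request again
        break
    return status, tried


pool = [("foundry-primary", 1), ("foundry-backup", 2)]
statuses = {"foundry-primary": 429, "foundry-backup": 200}

print(handle_request(statuses, pool, set(), retries=2))
# (200, ['foundry-primary', 'foundry-backup'])
print(handle_request(statuses, pool, set(), retries=0))
# (429, ['foundry-primary'])
```

With retries the client sees the backup's 200; without them the same request surfaces the primary's 429 even though the breaker has already tripped for the next caller.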


8. Optional Bicep version

This is the infrastructure shape.

param apimName string
param primaryUrl string
param backupUrl string

resource apim 'Microsoft.ApiManagement/service@2024-05-01' existing = {
  name: apimName
}

resource foundryPrimary 'Microsoft.ApiManagement/service/backends@2025-03-01-preview' = {
  parent: apim
  name: 'foundry-primary'
  properties: {
    description: 'Primary Foundry model endpoint'
    protocol: 'http'
    url: primaryUrl
    circuitBreaker: {
      rules: [
        {
          name: 'foundry-primary-breaker'
          acceptRetryAfter: true
          tripDuration: 'PT1M'
          failureCondition: {
            count: 1
            interval: 'PT1M'
            statusCodeRanges: [
              {
                min: 429
                max: 429
              }
              {
                min: 500
                max: 599
              }
            ]
          }
        }
      ]
    }
  }
}

resource foundryBackup 'Microsoft.ApiManagement/service/backends@2025-03-01-preview' = {
  parent: apim
  name: 'foundry-backup'
  properties: {
    description: 'Backup Foundry model endpoint'
    protocol: 'http'
    url: backupUrl
    circuitBreaker: {
      rules: [
        {
          name: 'foundry-backup-breaker'
          acceptRetryAfter: true
          tripDuration: 'PT1M'
          failureCondition: {
            count: 1
            interval: 'PT1M'
            statusCodeRanges: [
              {
                min: 429
                max: 429
              }
              {
                min: 500
                max: 599
              }
            ]
          }
        }
      ]
    }
  }
}

resource foundryPool 'Microsoft.ApiManagement/service/backends@2025-03-01-preview' = {
  parent: apim
  name: 'foundry-model-pool'
  properties: {
    description: 'Priority-based Foundry model endpoint pool'
    type: 'Pool'
    pool: {
      failureResponse: {
        statusCode: 503
      }
      services: [
        {
          id: foundryPrimary.id
          priority: 1
          weight: 1
        }
        {
          id: foundryBackup.id
          priority: 2
          weight: 1
        }
      ]
    }
  }
}

The backend resource schema supports circuitBreaker, pool, priorities, weights, and acceptRetryAfter.

Reference: https://learn.microsoft.com/en-us/azure/templates/microsoft.apimanagement/service/backends


9. Test the failover

Do not test by deleting the model deployment first. Deleting the deployment often produces a 404 or model/deployment error, which may not match your circuit breaker rule and is not representative of throttling.

Better tests:

Test 1: Simulate 429

Use a dev deployment with low quota, or temporarily point foundry-primary to a mock endpoint that returns:

HTTP/1.1 429 Too Many Requests
Retry-After: 60

Expected result:

1. Primary returns 429.
2. APIM circuit breaker opens for primary.
3. Retry runs.
4. Backend pool selects backup.
5. Client receives successful response from backup, assuming backup is healthy.
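
A mock endpoint for this test can be as small as the Python sketch below, which uses only the standard library. The handler and helper names are made up for this example. Note that it binds to localhost for a quick self-check; for a real test it would need to be hosted somewhere your APIM instance can reach.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ThrottledHandler(BaseHTTPRequestHandler):
    """Answers every POST with 429 and Retry-After: 60, like a throttled endpoint."""

    def do_POST(self):
        # Drain the request body so the client is not left mid-send.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        body = b'{"error": {"code": "429", "message": "simulated throttling"}}'
        self.send_response(429)
        self.send_header("Retry-After", "60")
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


def start_mock(port=0):
    """Start the mock on localhost; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), ThrottledHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(url):
    """POST an empty JSON body and return (status, Retry-After header)."""
    request = urllib.request.Request(
        url, data=b"{}", headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.status, response.headers.get("Retry-After")
    except urllib.error.HTTPError as error:
        return error.code, error.headers.get("Retry-After")


if __name__ == "__main__":
    server = start_mock()
    url = f"http://127.0.0.1:{server.server_port}/openai/deployments/gpt-5-prod/chat/completions"
    print(probe(url))  # (429, '60')
    server.shutdown()
    server.server_close()
```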

Test 2: Simulate 5xx

Temporarily point foundry-primary to a test endpoint returning:

HTTP/1.1 503 Service Unavailable

Expected result:

Primary trips.
Retry re-enters the pool.
Backup handles the request.

Test 3: Observe APIM

Check:

APIM → Monitoring → Metrics
APIM → Logs / Application Insights
APIM → Backend response codes

APIM also emits Event Grid events when a backend circuit breaker trips or resets, which can be useful for alerting.

Reference: https://learn.microsoft.com/en-us/azure/api-management/backends


10. Important gotchas

| Gotcha | Why it matters |
| --- | --- |
| Circuit breaker is not supported in Consumption tier | Use a supported APIM tier. |
| Circuit breaker and load balancing are approximate | APIM gateway instances do not perfectly synchronize breaker state across all instances. |
| Use retry with the pool | Circuit breaker helps future routing; retry helps the current failed request. |
| Same deployment name is strongly recommended | Avoid backup failure due to mismatched /deployments/{name} path. |
| Accept Retry-After for 429 | Foundry/Azure OpenAI may return long Retry-After; APIM should respect it. |
| Do not fail over to a different model version casually | It can change behaviour and break client assumptions. |
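
To illustrate the "approximate" gotcha, the sketch below counts how many failing requests still reach a bad backend when each gateway instance keeps its own failure counter. The round-robin spread of requests across instances is an assumption made purely for this example, and `failures_reaching_backend` is a hypothetical helper.

```python
def failures_reaching_backend(instances, failure_count, requests):
    """Count failing requests a bad backend receives before every instance has tripped.

    Each gateway instance tracks failures independently; requests are spread
    round-robin across instances (an assumption for this sketch).
    """
    counters = [0] * instances
    reached = 0
    for i in range(requests):
        idx = i % instances
        if counters[idx] >= failure_count:
            continue  # this instance's breaker is already open
        counters[idx] += 1
        reached += 1
    return reached


print(failures_reaching_backend(instances=1, failure_count=1, requests=10))  # 1
print(failures_reaching_backend(instances=3, failure_count=1, requests=10))  # 3
print(failures_reaching_backend(instances=3, failure_count=2, requests=10))  # 6
```

Under these assumptions, a failure count of 1 can still let roughly one failing request per gateway instance through to the primary, which is another reason to keep the retry in place.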

Recommended setup

For a Sweden Central primary and East US 2 backup Foundry model setup:

Backend: foundry-primary
  Priority: 1
  Circuit breaker: 429 + 500-599
  Failure count: 1
  Trip duration: 1 minute
  Accept Retry-After: true

Backend: foundry-backup
  Priority: 2
  Circuit breaker: 429 + 500-599
  Failure count: 1
  Trip duration: 1 minute
  Accept Retry-After: true

Backend pool: foundry-model-pool
  foundry-primary → priority 1
  foundry-backup  → priority 2

API policy:
  set-backend-service → foundry-model-pool
  authentication-managed-identity → cognitiveservices
  retry on 429 and 5xx

That gives you the cleanest APIM-native circuit breaker pattern for Foundry model endpoint failover.
