Design Specification
Status: Draft
Depends on: [[Multi-Fabric VXLAN Support]]


Table of Contents

  1. Overview
  2. Problem Statement
  3. Goals and Non-Goals
  4. Background and Prior Art
  5. Detailed Design
  6. End-to-End Data Flow
  7. Patch Inventory
  8. Backward Compatibility
  9. Known Limitations and Future Work
  10. References

1. Overview

This document describes a design for enabling EVPN BGP Type-5 (IP Prefix) route advertisement for tenant networks within the multi-fabric VXLAN deployment described in the companion design. It allows a tenant to connect their Neutron network or router to a BGP VPN scoped to a specific physical fabric, with subnet prefixes advertised as EVPN Type-5 routes into that fabric’s BGP EVPN control plane.

The design builds on networking-bgpvpn as the Neutron API layer, extends the network_segment_range-style VNI pool mechanism from the prior design to cover L3VNIs, and defines a pluggable backend driver interface (implementing BGPVPNDriverBase) that realizes the VRF and BGP configuration on the physical fabric switches.


2. Problem Statement

The multi-fabric VXLAN design provides L2 isolation and Nova scheduling constraints per fabric. It does not address L3 reachability between tenant networks and the broader routing domain. Specifically:

  • Tenant subnet prefixes are not advertised into the fabric’s BGP EVPN control plane.
  • There is no mechanism to associate a Neutron router or network with a VRF on the physical fabric switches.
  • L3VNIs — required for symmetric IRB and EVPN Type-5 prefix advertisement — have no allocation mechanism in Neutron.
  • There is no pluggable driver interface in the existing stack for a switch-backed BGP VPN realization that does not depend on OVN or a software BGP stack (FRR/bagpipe).

3. Goals and Non-Goals

3.1 Goals

  • Allow an admin to create a BGPVPN resource scoped to a specific fabric, with an L3VNI automatically allocated from a fabric-scoped pool at creation time.
  • Allow a tenant (member role) to associate that BGPVPN to a Neutron router or network, triggering realization of the VRF and Type-5 route advertisement on the fabric switches.
  • Use symmetric IRB throughout: the same L3VNI is used across all fabrics participating in a given BGPVPN, since all fabrics are visible within the same Neutron instance.
  • Define a pluggable BGPVPNDriverBase backend that abstracts switch realization, keeping Neutron-side logic switch-agnostic.
  • Reuse the physical_network-scoped VNI pool mechanism from the prior design for L3VNI allocation.
  • Require no database schema changes beyond what networking-bgpvpn already defines, in v1.

3.2 Non-Goals (v1)

  • OVN or FRR involvement. This design targets hardware switch realization only.
  • Per-fabric RT/RD configuration. A single symmetric Route Target is used per BGPVPN in v1.
  • Automated RT/RD assignment from a pool. RT values are admin-specified at BGPVPN creation in v1.
  • Multi-fabric spanning within a single BGPVPN. Each BGPVPN is scoped to one fabric in v1.
  • L2 BGPVPN type. Only type=l3 is in scope.

4. Background and Prior Art

4.1 Multi-Fabric VXLAN Design (Companion Document)

The prior design introduces:

  • physical_network-scoped VXLAN network_segment_range entries, giving each fabric a named L2VNI pool.
  • A fabric attribute on Neutron networks, selecting the L2VNI pool and driving Nova placement trait injection.

This design reuses the same physical_network scoping and naming convention for L3VNI pools, and the same fabric attribute as the coordination key between networks/routers and their BGPVPNs.

4.2 networking-bgpvpn

networking-bgpvpn provides an API and framework to interconnect L3VPNs and E-VPNs with Neutron resources — Networks, Routers, and Ports — supporting both RFC 4364 BGP/MPLS IP VPNs and RFC 7432 E-VPN.

The relevant API objects are:

  • BGPVPN resource: defines the VPN properties — type (l2/l3), route_targets, import_targets, export_targets, route_distinguishers, and vni. The vni field sets the VXLAN Network Identifier to be used for this BGPVPN when VXLAN encapsulation is used. This is the L3VNI in our context.
  • Network association: attaches a BGPVPN to a Neutron network. For type L3, all subnets bound to the network will be interconnected with the BGP VPN.
  • Router association: attaches a BGPVPN to a Neutron router. All subnets bound to the router will be interconnected with the BGPVPN. Only supported for type=l3.

The BGPVPNDriverBase interface defines callbacks for all CRUD operations on these resources, which backend drivers implement to realize the configuration on their respective forwarding planes.

4.3 networking-generic-switch Analogy

networking-generic-switch implements the ML2 MechanismDriver interface and provides a pluggable backend for physical switch port operations, with switch-specific drivers (Cisco IOS, NX-OS, Juniper, etc.) behind a common interface. This design follows the same pattern: a common BGPVPNDriverBase implementation handles Neutron-side logic (pool allocation, validation, DB), and delegates switch realization to a pluggable backend that can be implemented per-platform.

4.4 EVPN Type-5 and Symmetric IRB

EVPN Type-5 (IP Prefix) routes advertise subnet prefixes into the BGP EVPN control plane without requiring a MAC address, making them the appropriate route type for inter-subnet and inter-fabric L3 reachability. Symmetric IRB uses a dedicated L3VNI for routed traffic between VTEPs: ingress VTEP encapsulates routed packets with the L3VNI, and the egress VTEP decapsulates and routes locally. This requires the same L3VNI to be configured consistently across all VTEPs participating in a given VRF, which is achievable here because all fabrics are managed by the same Neutron instance.


5. Detailed Design

5.1 L3VNI Pool: network_segment_range Extension for L3VNI

L3VNI allocation reuses the network_segment_range extension from the prior design, using a naming convention on the physical_network value to distinguish L3VNI ranges from L2VNI ranges. No new range type is introduced.

The convention is that L3VNI ranges use a physical_network value suffixed with :l3, e.g.:

POST /v2.0/network_segment_ranges

{
  "network_segment_range": {
    "name": "fabric-a-l3vni",
    "network_type": "vxlan",
    "physical_network": "fabric-a:l3",
    "minimum": 3000001,
    "maximum": 3000999,
    "shared": false
  }
}

Setting shared to false here is intentional: L3VNI ranges are admin-managed and not directly accessible to tenants. Allocation from these ranges is performed by the BGPVPN service plugin, not by the type driver directly.

The allocator follows the same flat-pool semantics from the prior design: a single global VNI uniqueness table, with the physical_network value as a selection filter. The :l3 suffix ensures L3VNI ranges are never selected by the L2VNI allocation path and vice versa.

Note: The :l3 suffix is a convention enforced by the BGPVPN service plugin at validation time. The network_segment_range extension itself does not distinguish L2 from L3 VNI ranges — it stores whatever physical_network value is provided.
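
The following is an illustrative, in-memory sketch of the flat-pool allocator semantics described above. The real implementation would back this with the network_segment_range and VNI allocation tables from the prior design; the class and method names here are hypothetical.

# Illustrative sketch only; names are hypothetical, not an existing Neutron API.

class L3VNIPool:
    L3_SUFFIX = ":l3"

    def __init__(self):
        self.ranges = {}        # physical_network -> (minimum, maximum)
        self.allocated = set()  # single global VNI uniqueness set

    def add_range(self, fabric, minimum, maximum):
        self.ranges[fabric + self.L3_SUFFIX] = (minimum, maximum)

    def allocate(self, fabric, requested_vni=None):
        physnet = fabric + self.L3_SUFFIX
        if physnet not in self.ranges:
            raise LookupError("no L3VNI range for %s" % physnet)  # API: HTTP 409
        minimum, maximum = self.ranges[physnet]
        if requested_vni is not None:
            if not minimum <= requested_vni <= maximum or requested_vni in self.allocated:
                raise ValueError("VNI %d unavailable" % requested_vni)  # HTTP 409
            self.allocated.add(requested_vni)
            return requested_vni
        for vni in range(minimum, maximum + 1):  # first free VNI, no fallback
            if vni not in self.allocated:
                self.allocated.add(vni)
                return vni
        raise ValueError("L3VNI range %s exhausted" % physnet)  # HTTP 409

    def release(self, vni):
        self.allocated.discard(vni)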

5.2 BGPVPN Resource: fabric Attribute and Auto-Allocation

A new fabric attribute is added to the BGPVPN resource, parallel to the fabric attribute on networks from the prior design:

  • fabric (string, nullable): read/write on create, read-only afterwards; admin-only on create. Names the physical fabric this BGPVPN is scoped to. It maps to the physical_network value of the fabric's L2VNI ranges, and to physical_network <fabric>:l3 for the fabric's L3VNI range.

On bgpvpn create with fabric set and no explicit vni:

  • The service plugin looks up the network_segment_range where physical_network="<fabric>:l3".
  • A VNI is allocated from that range using the same flat-pool allocator as the prior design.
  • The allocated VNI is stored as the vni field on the BGPVPN object.
  • If the range is exhausted or does not exist, the API returns HTTP 409. No fallback.

On bgpvpn create with fabric set and an explicit vni:

  • The admin-specified vni is validated against the <fabric>:l3 range bounds.
  • If within bounds, it is allocated directly (marked used in the pool). If out of bounds or already allocated, HTTP 409.

On bgpvpn create without fabric:

  • Behavior is identical to current networking-bgpvpn: vni is either admin-specified or left null (backend-managed). No pool allocation occurs.

On bgpvpn delete:

  • The allocated L3VNI is returned to the pool.
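
A minimal sketch of the create and delete paths just described, assuming the L3VNIPool sketch from Section 5.1 (the hook names are hypothetical; the real logic would live in the networking-bgpvpn service plugin):

# Hypothetical service-plugin hooks tying the fabric attribute to the pool.

def on_bgpvpn_create(bgpvpn, pool):
    fabric = bgpvpn.get("fabric")
    if fabric is None:
        # Unchanged networking-bgpvpn behavior: vni is admin-specified or None.
        return bgpvpn
    # fabric set: allocate from, or validate against, the <fabric>:l3 range.
    bgpvpn["vni"] = pool.allocate(fabric, requested_vni=bgpvpn.get("vni"))
    return bgpvpn

def on_bgpvpn_delete(bgpvpn, pool):
    if bgpvpn.get("fabric") is not None and bgpvpn.get("vni") is not None:
        pool.release(bgpvpn["vni"])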

5.3 Fabric Consistency Validation

When a tenant creates a network or router association to a BGPVPN, the service plugin validates that the associated network (or all networks attached to the associated router) have a fabric attribute matching the BGPVPN’s fabric. If any network’s fabric does not match, the association is rejected with HTTP 409 and a descriptive error.

This validation ensures that a BGPVPN scoped to fabric-a is never associated to a network on fabric-b, preventing misconfiguration at the API layer before any switch realization occurs.

For router associations, all networks currently attached to the router are validated at association time. Networks attached to the router after the association is created are also validated at attach time (Neutron subnet-router interface add path).
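
A sketch of the consistency check follows; get_network stands in for the Neutron DB lookup, and the function name is an assumption for illustration. For router associations, the caller would pass the IDs of all networks currently attached to the router.

# Hypothetical association-time validation; errors surface to the API as HTTP 409.

def validate_fabric_consistency(bgpvpn, network_ids, get_network):
    for network_id in network_ids:
        fabric = get_network(network_id).get("fabric")
        if fabric != bgpvpn["fabric"]:
            raise ValueError(
                "network %s is on fabric %r but BGPVPN %s is scoped to %r"
                % (network_id, fabric, bgpvpn["id"], bgpvpn["fabric"]))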

5.4 Route Target and Route Distinguisher Handling

In v1, RT assignment is admin-specified at BGPVPN creation time using the existing route_targets field. A single symmetric RT (used for both import and export) is the expected v1 configuration:

POST /v2.0/bgpvpn/bgpvpns

{
  "bgpvpn": {
    "name": "tenant-a-fabric-a",
    "type": "l3",
    "fabric": "fabric-a",
    "route_targets": ["4200420008:10001"],
    "route_distinguishers": ["10.100.20.1:10001"]
  }
}

The route_distinguishers field provides a hint to the backend for RD assignment per VTEP. When not specified, the backend driver is responsible for deriving per-VTEP RDs (e.g. <loopback-ip>:<index>), which is the standard FRR/NX-OS auto-RD behavior.
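
As a purely illustrative example of that fallback, a driver might derive per-VTEP RDs along these lines; the derivation is backend-specific and not mandated by this design.

# Illustrative only: fall back to <loopback-ip>:<index> when no RD hint is given.

def derive_rd(vtep_loopback_ip, vrf_index, admin_rds=None):
    if admin_rds:
        return admin_rds[0]                          # admin-provided hint wins
    return "%s:%d" % (vtep_loopback_ip, vrf_index)   # e.g. "10.100.20.1:3"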

Auto-allocation of RT values from a pool is a known gap and is deferred to a future design iteration. See Section 9.

5.5 BGPVPNDriverBase: Pluggable Backend Interface

The existing BGPVPNDriverBase interface in networking-bgpvpn already defines the correct callback surface. No new interface is required. The backend driver receives fully-populated objects including the allocated vni, fabric, route_targets, and route_distinguishers.

The relevant callbacks for this design are:

  • create_bgpvpn: triggered when an admin creates a BGPVPN (the L3VNI is already allocated by the service plugin). The backend may optionally pre-provision a VRF skeleton on the fabric switches; otherwise it is a no-op until an association is created.
  • delete_bgpvpn: triggered when an admin deletes a BGPVPN. The backend removes the VRF from the fabric switches and withdraws all Type-5 routes.
  • create_router_association: triggered when a tenant associates the BGPVPN to a router. The backend creates the VRF with the L3VNI on the relevant fabric switches, configures RD/RT, and advertises Type-5 routes for all subnets attached to the router.
  • delete_router_association: triggered when a tenant removes a router association. The backend withdraws Type-5 routes for the router's subnets and removes the VRF if no associations remain.
  • create_network_association: triggered when a tenant associates the BGPVPN to a network. The backend creates the VRF with the L3VNI on the relevant fabric switches and advertises Type-5 routes for the network's subnets.
  • delete_network_association: triggered when a tenant removes a network association. The backend withdraws Type-5 routes for the network's subnets and removes the VRF if no associations remain.

The backend driver is responsible for determining which physical switches are relevant for a given association. The mechanism for this (e.g. querying Ironic node-to-switch mappings, a topology database, or Nautobot) is implementation-defined and outside the scope of this design. The Neutron-side plugin provides the VNI, fabric name, RT, RD hints, and the set of subnet CIDRs to advertise — the driver handles all switch interaction.

Switch configuration is managed via templating in the concrete driver implementation. The interface is intentionally agnostic to the templating mechanism (RESTCONF, NETCONF, SSH, Ansible, Nautobot Jobs) so that different operators can provide their own backend.
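
The following skeleton sketches the shape such a backend driver might take. It deliberately omits the networking-bgpvpn base-class import and exact callback signatures; SwitchBackend, its methods, and the association payload layout are all assumptions for illustration, not an existing interface.

# Hypothetical driver skeleton; all backend method names are illustrative.

class FabricSwitchBGPVPNDriver:
    def __init__(self, switch_backend):
        # switch_backend hides all device interaction (templating over
        # RESTCONF/NETCONF/SSH/Ansible/Nautobot Jobs, per the note above).
        self.backend = switch_backend

    def create_bgpvpn(self, context, bgpvpn):
        pass  # L3VNI already allocated; nothing pushed until an association exists

    def delete_bgpvpn(self, context, bgpvpn):
        self.backend.remove_vrf(fabric=bgpvpn["fabric"], vni=bgpvpn["vni"])

    def create_router_association(self, context, bgpvpn, subnet_cidrs):
        switches = self.backend.switches_for_fabric(bgpvpn["fabric"])
        self.backend.ensure_vrf(
            switches, vni=bgpvpn["vni"],
            route_targets=bgpvpn["route_targets"],
            route_distinguishers=bgpvpn.get("route_distinguishers") or [])
        self.backend.advertise_prefixes(switches, vni=bgpvpn["vni"],
                                        cidrs=subnet_cidrs)

    def delete_router_association(self, context, bgpvpn, subnet_cidrs):
        switches = self.backend.switches_for_fabric(bgpvpn["fabric"])
        self.backend.withdraw_prefixes(switches, vni=bgpvpn["vni"],
                                       cidrs=subnet_cidrs)

    # Network associations follow the same pattern as router associations.
    create_network_association = create_router_association
    delete_network_association = delete_router_association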

5.6 BGP AS Number

The BGP AS number for the EVPN peering between the OpenStack gateway nodes and the fabric route reflectors is configured per-deployment in the backend driver configuration, not stored as a per-BGPVPN attribute. This mirrors the approach taken by ovn-bgp-agent and is appropriate because all BGPVPNs on a given fabric share the same underlay BGP sessions.

A separate AS number from the fabric underlay AS is recommended to keep tenant VPN BGP sessions segregated from the fabric IGP/underlay BGP. The specific AS value is driver-configured.
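
A sketch of how this driver-level configuration might be declared with oslo.config follows; the group and option names are assumptions for illustration, not an existing upstream schema.

# Hypothetical driver configuration options registered via oslo.config.

from oslo_config import cfg

fabric_driver_opts = [
    cfg.IntOpt("bgp_as",
               help="AS number for the tenant VPN EVPN sessions toward the "
                    "fabric route reflectors; kept separate from the underlay AS."),
    cfg.ListOpt("route_reflectors", default=[],
                help="Addresses of the fabric route reflectors to peer with."),
]

cfg.CONF.register_opts(fabric_driver_opts, group="bgpvpn_fabric_driver")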


6. End-to-End Data Flow

  1. Admin configures L3VNI pool for each fabric via POST /v2.0/network_segment_ranges with physical_network="fabric-a:l3".

  2. Admin creates BGPVPN scoped to a fabric:

    POST /v2.0/bgpvpn/bgpvpns
    { "fabric": "fabric-a", "type": "l3", "route_targets": ["4200420008:10001"] }
    

    Service plugin allocates L3VNI from the fabric-a:l3 range and stores it as vni on the BGPVPN object.

  3. Admin associates BGPVPN to tenant project (existing networking-bgpvpn tenant_id scoping or sharing mechanism).

  4. Tenant creates a network with fabric=fabric-a (per the prior design). L2VNI is allocated from the fabric-a range.

  5. Tenant creates a router and attaches the network to it via a subnet interface.

  6. Tenant creates a router association:

    POST /v2.0/bgpvpn/bgpvpns/{bgpvpn_id}/router_associations
    { "router_id": "<router-uuid>" }
    

    Service plugin validates that all networks on the router have fabric=fabric-a. On success, the create_router_association callback is invoked on the backend driver with the full BGPVPN object (including vni, route_targets, route_distinguishers) and the router’s subnet CIDRs.

  7. Backend driver realizes the VRF on the relevant fabric switches: creates the VRF with the L3VNI, sets RD/RT, configures BGP EVPN to advertise Type-5 routes for the subnet CIDRs.

  8. Type-5 routes are advertised into the fabric’s BGP EVPN control plane. Remote VTEPs (other bare metal nodes, gateway devices) import the routes and can route traffic to the tenant subnets.

  9. On association delete, the backend driver withdraws Type-5 routes and removes the VRF from the switches. The L3VNI remains allocated to the BGPVPN until the BGPVPN itself is deleted.
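
For reference, the same flow expressed with the OpenStack client might look like the following. The bgpvpn commands are the standard networking-bgpvpn OSC commands; the --fabric options come from this design and the companion design and do not exist upstream today, and names and values are examples only.

    # Admin: create the BGPVPN scoped to fabric-a (L3VNI auto-allocated)
    openstack bgpvpn create --type l3 --route-target 4200420008:10001 \
        --fabric fabric-a tenant-a-fabric-a

    # Tenant: network and router on fabric-a, then associate the router
    openstack network create --fabric fabric-a tenant-net
    openstack subnet create --network tenant-net --subnet-range 192.0.2.0/24 tenant-subnet
    openstack router create tenant-router
    openstack router add subnet tenant-router tenant-subnet
    openstack bgpvpn router association create tenant-a-fabric-a tenant-router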


7. Patch Inventory

  • openstack/neutron, network_segment_range extension: no change required; L3VNI ranges use the existing extension with the :l3 physical_network convention.
  • openstack/networking-bgpvpn, BGPVPN API extension: add the fabric attribute to the BGPVPN resource; admin-only on create.
  • openstack/networking-bgpvpn, BGPVPN service plugin: on bgpvpn create with fabric set, allocate an L3VNI from the <fabric>:l3 range and validate vni if admin-specified; return the L3VNI to the pool on delete.
  • openstack/networking-bgpvpn, network/router association validation: validate fabric consistency between the BGPVPN and the associated network or router at association create time.
  • openstack/networking-bgpvpn, BGPVPNDriverBase: no interface changes required; existing callbacks receive the populated fabric and vni fields.
  • rackerlabs/understack, new networking-bgpvpn backend driver: implement BGPVPNDriverBase for the target switch platform; handle VRF creation, RD/RT configuration, and Type-5 route advertisement via switch templating.
  • rackerlabs/understack, deployment config: configure L3VNI ranges per fabric; configure the backend driver AS number and switch inventory source.

8. Backward Compatibility

  • All changes to networking-bgpvpn are additive. Existing BGPVPN deployments without fabric set continue to function unchanged.
  • BGPVPNs created without fabric do not interact with the L3VNI pool and retain existing behavior (admin-specified vni or backend-managed).
  • The network_segment_range changes are additive: the :l3 suffix convention is only enforced by the BGPVPN service plugin path; other consumers of the range extension are unaffected.
  • No database migrations are required in v1 beyond what networking-bgpvpn already defines.

9. Known Limitations and Future Work

  • RT values are admin-specified manually. Impact: the operator must coordinate RT assignment across fabrics and tenants, with a risk of RT collision or misconfiguration. Future mitigation: add RT pool allocation analogous to the L3VNI pool, auto-assigning from a per-fabric or global RT range.
  • Each BGPVPN is scoped to a single fabric. Impact: a cross-fabric L3VPN (the same VRF spanning fabric-a and fabric-b) requires two BGPVPN objects and manual RT coordination between them. Future mitigation: add multi-fabric BGPVPN support with an explicit fabric list and shared L3VNI/RT assignment.
  • Fabric consistency is validated at association time only. Impact: a router's networks could change fabric after association, creating inconsistency. Future mitigation: add validation on subnet-router interface attach when the router has an active BGPVPN association.
  • RD assignment is driver-managed. Impact: no Neutron-level visibility into per-VTEP RDs. Future mitigation: add RD pool allocation and expose assigned RDs on the BGPVPN object.
  • No auto-allocation of RT values. Impact: covered above. Future mitigation: an RT pool as a network_segment_range-style resource.
  • Backend driver switch inventory is implementation-defined. Impact: no standard way for the driver to discover which switches are relevant for a given association. Future mitigation: define a standard topology discovery interface (e.g. query Neutron port bindings or an external CMDB) as part of a future driver API extension.
  • The L2 BGPVPN type is excluded. Impact: no Type-2 (MAC/IP) route advertisement for stretched L2 across fabrics. Future mitigation: separate design; out of scope for the bare metal Type-5 use case.

10. References
