# Design Specification

Status: Draft
1. Overview
2. Problem Statement
3. Goals and Non-Goals
4. Background and Prior Art
5. Detailed Design
6. End-to-End Data Flow
7. Patch Inventory
8. Backward Compatibility
9. Known Limitations and Future Work
10. References
## 1. Overview

This document describes a design for supporting multiple VXLAN fabrics within a single OpenStack Neutron instance. The goal is to allow non-admin users (member role) to create networks scoped to a specific physical VXLAN fabric and to provision bare metal servers onto those networks, with Nova/Placement scheduling constrained to nodes physically connected to the correct fabric.
The design is intentionally conservative for v1. It requires minimal to no database schema changes and is fully backward compatible with existing Neutron VXLAN deployments. Known limitations are documented explicitly with paths for future improvement.
## 2. Problem Statement

Neutron's VXLAN type driver maintains a single global VNI allocation pool with no awareness of physical topology. This creates several issues in multi-fabric deployments:
- Neutron has no mechanism to express that some nodes are connected to one VXLAN fabric and others to a different fabric.
- A network allocated a VNI from the global pool carries no signal about which fabric it belongs to, so Nova/Ironic placement scheduling cannot use it as a constraint.
- The VLAN type driver already supports `physical_network` scoping to partition VLAN ID pools, but no equivalent exists for VXLAN.
- The `network_segment_range` extension supports `physical_network` for VLAN type ranges, but not for VXLAN.
The result is that in a deployment with multiple isolated VXLAN fabrics, an operator must use workarounds (separate Neutron instances, manual coordination, admin-only provider networks) that are operationally costly and do not expose self-service to tenant users.
## 3. Goals and Non-Goals

### Goals

- Allow a Neutron network to be associated with a named fabric at creation time by a non-admin user.
- Allocate the VNI for that network from a VNI range scoped to that fabric's `physical_network` value.
- Preserve full backward compatibility: existing VXLAN deployments with no fabric annotation continue to work without modification.
- Integrate with Nova scheduling so that placement requests for servers on a fabric-scoped network automatically require the corresponding trait on candidate nodes.
- Expose available fabrics to tenant users via a discoverable API.
- Require no database schema changes in v1.
### Non-Goals

- Per-fabric VNI uniqueness guarantees. Fabrics are assumed to be isolated L2 domains, so duplicate VNIs across fabrics are harmless. This is a known v1 limitation; see Section 9.
- Upstream merge in v1. Patches are developed downstream first with upstream contribution as the stated goal.
## 4. Background and Prior Art

The Neutron VLAN type driver accepts a `physical_network` parameter on network creation and in `ml2_type_vlan` configuration. Each `physical_network` has its own VLAN ID range, and allocation is scoped to the physical network specified by the user or matched from a shared pool. Tenant networks automatically allocate from physical networks marked as shared.
This is the direct inspiration for the VXLAN extension described here, with the key difference that for VXLAN we do not require `physical_network` on tenant networks and do not auto-allocate from shared fabric pools: `physical_network=None` remains the tenant default.
The `network_segment_range` extension (introduced in Stein) allows operators to define named VNI/VLAN ranges and associate them with projects or mark them as shared. It already supports `physical_network` for VLAN type ranges. This design extends that support to VXLAN type ranges.
The networking-baremetal project's `baremetal-l2vni` plugin provides a direct precedent for using `physical_network` as a leaf-switch scoping mechanism in the context of VXLAN and bare metal provisioning. That plugin maps VXLAN segment VNIs to VLAN IDs on physical switch ports, using `physical_network` as the coordination key for identifying the leaf switches. This design reuses the same conceptual mapping: `physical_network` names a fabric, not a specific physical interface, and the VNI pool is scoped to it accordingly.
Nova already has a mechanism for Neutron networks to inject placement resource requests and traits. The minimum-bandwidth QoS rule is the canonical example: Neutron annotates port binding data with required placement resources, and Nova's network metadata handling in `nova.network.neutron` translates these into placement allocation candidates and required traits. This same channel is the primary path for injecting `CUSTOM_FABRIC_<value>` traits, avoiding direct patches to Nova's scheduler logic where possible.
## 5. Detailed Design

### 5.1 VXLAN Type Driver Changes

`neutron/plugins/ml2/drivers/type_vxlan.py` is patched to:
- Accept an optional `physical_network` parameter in `allocate_tenant_segment`, `allocate_partially_specified_segment`, and the segment creation path.
- When `physical_network` is `None` (the default), behavior is identical to the current implementation: the VNI is allocated from the global pool.
- When `physical_network` is set, the VNI is allocated from the sub-range of the global pool associated with that `physical_network` value (see Section 5.2). If the sub-range is exhausted, allocation fails hard; there is no fallback to the `None` pool.
- The allocator does not partition the VNI namespace by `physical_network`. All VNIs remain in a single flat allocation table. The `physical_network` value on a range is a selection filter, not a separate pool. This is the critical difference from the VLAN type driver and is what allows v1 to avoid DB schema changes.
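A minimal sketch of this selection-filter allocation, with illustrative names (`FABRIC_RANGES`, `allocate_vni`) rather than the actual `type_vxlan.py` internals:

```python
# Illustrative sketch only: the real driver allocates from rows in a single
# flat database table; this models that table as one shared `allocated` set.

FABRIC_RANGES = {
    None: range(1, 2000001),               # global pool (current behavior)
    "fabric-a": range(2000001, 2001000),   # per-fabric sub-ranges from
    "fabric-b": range(2001001, 2002000),   # network_segment_range config
}

def allocate_vni(allocated, physical_network=None):
    """Pick the first free VNI, filtering the single flat table by the
    range configured for `physical_network` (a filter, not a pool)."""
    vni_range = FABRIC_RANGES.get(physical_network)
    if vni_range is None:
        raise ValueError(f"no VXLAN range for {physical_network!r}")
    for vni in vni_range:
        if vni not in allocated:
            allocated.add(vni)
            return vni
    # Fail hard on exhaustion: no fallback to the None pool.
    raise RuntimeError(f"VNI range exhausted for {physical_network!r}")
```

Note that exhaustion of a fabric's sub-range raises rather than falling back, matching the hard-failure behavior described above.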
**v1 Limitation:** Because the allocator uses a single flat table, VNI uniqueness is guaranteed only within a given fabric's configured range. Two fabrics with overlapping configured ranges can therefore be assigned duplicate VNIs. Since this design considers only the L2VNI use case (L3VNI is out of scope) and distinct L2VNI domains can overlap without concern, this is treated as a permissible configuration. A future version can tighten this.
### 5.2 `network_segment_range` Extension

The `network_segment_range` extension is patched to accept `physical_network` on VXLAN type ranges, mirroring existing VLAN behavior:
```http
POST /v2.0/network_segment_ranges
{
    "network_segment_range": {
        "name": "fabric-a-vxlan",
        "network_type": "vxlan",
        "physical_network": "fabric-a",
        "minimum": 2000001,
        "maximum": 2000999,
        "shared": true
    }
}
```

The `physical_network` value is stored in the existing `network_segment_ranges` table. No schema migration is required in v1 because the `physical_network` column already exists on this table for VLAN ranges.
Validation is added to warn on overlapping minimum/maximum bounds across VXLAN ranges with different `physical_network` values (a warning at configuration time, not a hard API rejection in v1).
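The overlap check can be sketched as a pairwise interval comparison (a hypothetical helper; the real validation would hook into range creation or configuration loading):

```python
import logging

LOG = logging.getLogger(__name__)

def warn_on_overlaps(ranges):
    """Warn (do not reject) when VXLAN ranges with different
    physical_network values have overlapping [minimum, maximum] bounds.

    `ranges` is a list of (physical_network, minimum, maximum) tuples.
    Returns True if any cross-fabric overlap was found.
    """
    found = False
    for i, (pn_a, lo_a, hi_a) in enumerate(ranges):
        for pn_b, lo_b, hi_b in ranges[i + 1:]:
            # Two closed intervals overlap iff each starts before the
            # other ends; only differing physical_network values matter.
            if pn_a != pn_b and lo_a <= hi_b and lo_b <= hi_a:
                LOG.warning("VXLAN ranges for %r and %r overlap", pn_a, pn_b)
                found = True
    return found
```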
### 5.3 `fabric` Network Attribute

A new Neutron API extension adds a `fabric` attribute to the network resource:
| Attribute | Type | Access | Description |
|---|---|---|---|
| `fabric` | string (nullable) | R/W on create; read-only after | Names the physical VXLAN fabric. Maps to a `physical_network` value on a configured VXLAN `network_segment_range`. |
Policy is configured to allow the member role to set `fabric` on network create. No admin privilege is required.
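A possible shape for that policy rule, assuming the extension registers an attribute-level key named `create_network:fabric` (the key name is illustrative, following Neutron's provider-attribute convention):

```yaml
# Hypothetical policy.yaml fragment -- key names are illustrative.
"create_network:fabric": "role:member"
"get_network:fabric": "rule:regular_user"
```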
On network creation with `fabric` set:
- The type driver looks up VXLAN `network_segment_ranges` where `physical_network` matches the `fabric` value and the range is accessible to the requesting project (`shared=True` or `project_id` matches).
- If no matching range exists or all matching ranges are exhausted, the API returns HTTP 409 with a descriptive error. There is no fallback to `physical_network=None` ranges.
- The allocated segment's `physical_network` is set to the `fabric` value and stored normally.
On network creation without `fabric` set:
- Behavior is identical to current Neutron VXLAN tenant network creation: the VNI is allocated from ranges with `physical_network=None`.
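The range lookup in the fabric-set path can be sketched as a filter over configured ranges (a hypothetical helper and dict shape, not Neutron's actual objects):

```python
def accessible_ranges(ranges, project_id, fabric):
    """Return VXLAN segment ranges matching the requested fabric that the
    project may allocate from (shared, or owned by the project)."""
    return [
        r for r in ranges
        if r["network_type"] == "vxlan"
        and r["physical_network"] == fabric
        and (r["shared"] or r["project_id"] == project_id)
    ]
```

An empty result here corresponds to the HTTP 409 case above: no fallback lookup with `physical_network=None` is attempted.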
### 5.4 Fabric Discovery

As a low-effort v1 approach, each configured fabric is exposed as a network availability zone:
```http
GET /v2.0/availability_zones
```
Tenant users see fabric names alongside standard availability zones. No new endpoints required.
A cleaner future endpoint will expose fabric names filtered to those accessible to the requesting project, based on the `shared` flag and `project_id` scoping on the underlying `network_segment_ranges`:
```http
GET /v2.0/fabrics
```

or

```http
GET /v2.0/network_segment_ranges/fabrics
{
    "fabrics": [
        "fabric-a",
        "fabric-b"
    ]
}
```

This avoids polluting the availability zone namespace. Targeted for a follow-on patch.
### 5.5 Nova Scheduling Integration (Option B: Port `resource_request`)

Nova already processes port binding data from Neutron when building placement requests. The minimum-bandwidth QoS rule demonstrates this: Neutron sets `resource_request` on port binding data, which `nova.network.neutron.API` translates into placement allocation candidates and required traits.
This design uses the same channel. When a network has a `fabric` attribute set, Neutron sets a required-traits entry in the port's `resource_request`:
```json
"resource_request": {
    "required": ["CUSTOM_FABRIC_FABRIC_A"]
}
```

`CUSTOM_FABRIC_<value>` follows Nova's custom trait naming convention (uppercase, underscores, `CUSTOM_` prefix). The fabric name is normalized: `fabric-a` becomes `CUSTOM_FABRIC_FABRIC_A`.
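The normalization rule can be sketched as follows (an illustrative helper only; the actual implementation would reuse existing trait utilities):

```python
import re

def fabric_trait(fabric):
    """CUSTOM_FABRIC_<value>: uppercase the fabric name and replace any
    non-alphanumeric character with an underscore."""
    return "CUSTOM_FABRIC_" + re.sub(r"[^A-Z0-9]", "_", fabric.upper())
```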
Nova's existing placement request construction code (`nova/scheduler/utils.py`) already aggregates required traits from port resource requests. No scheduler logic changes are needed if the port data channel carries the trait correctly.
Note: This path must be validated against the current Nova codebase during implementation. The minimum-bandwidth path sets `resource_class` and `amount`, not bare traits. If traits-only `resource_request` entries are not handled by the existing consumer code, the Option A fallback below is required.
### 5.6 Option A Fallback: Direct Nova Patch

If the port binding data channel does not support bare trait injection without a resource class, the fallback is a targeted patch to Nova's network metadata handling:
- In `nova.network.neutron.API` (likely `_get_port_binding_info` or equivalent), detect the `fabric` attribute on network details fetched during server build.
- Inject `CUSTOM_FABRIC_<value>` into the required traits set passed to the placement API for the scheduling request.
This is more invasive but a clean, single-purpose change. It is the explicit fallback if Option B proves insufficient.
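The fallback amounts to a small injection step like the following (a hypothetical function and call site, not actual Nova code):

```python
def inject_fabric_trait(required_traits, network):
    """Add CUSTOM_FABRIC_<value> to the required traits passed to
    Placement when the Neutron network details carry a fabric attribute."""
    fabric = network.get("fabric")
    if fabric:
        required_traits.add(
            "CUSTOM_FABRIC_" + fabric.upper().replace("-", "_"))
    return required_traits
```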
### 5.7 Ironic Node Traits

Ironic nodes connected to a given fabric must have the corresponding `CUSTOM_FABRIC_<value>` trait set. This is performed out-of-band by operators at node enrollment time:
```shell
openstack baremetal node add trait <node-uuid> CUSTOM_FABRIC_FABRIC_A
```

Understack automation (e.g. via Nautobot Jobs or ArgoCD-driven node enrollment workflows) is expected to set these traits as part of the standard node onboarding process. Automated discovery of fabric membership is out of scope for v1.
## 6. End-to-End Data Flow

1. User discovers available fabrics via `GET /v2.0/availability_zones?resource=network` (v1) or `GET /v2.0/fabrics` (v2).
2. User creates a network with `fabric` set: `POST /v2.0/networks` with `{"fabric": "fabric-a"}`.
3. Neutron allocates a VNI from the `network_segment_range` where `physical_network="fabric-a"`. The segment is stored with `physical_network="fabric-a"`.
4. User creates a port on the network and requests a server. Nova fetches port and network details from Neutron.
5. Nova constructs the placement request. `CUSTOM_FABRIC_FABRIC_A` is included as a required trait via the port `resource_request` (Option B) or via a direct Nova patch (Option A fallback).
6. Placement returns only candidates with `CUSTOM_FABRIC_FABRIC_A` set, i.e. Ironic nodes physically connected to `fabric-a`.
7. Nova schedules and Ironic provisions the server on a node connected to the correct fabric. The VXLAN network is reachable from the provisioned node.
## 7. Patch Inventory

| Repository | Component | Change Summary |
|---|---|---|
| `openstack/neutron` | `ml2/drivers/type_vxlan.py` | Add optional `physical_network` param to the allocation path; fail hard on an exhausted fabric range |
| `openstack/neutron` | `network_segment_range` extension | Allow `physical_network` on VXLAN type ranges; add overlap validation warning |
| `openstack/neutron` | New API extension | Add `fabric` attribute to the network resource; add policy rule for member access |
| `openstack/neutron` | Availability zone driver | Expose configured fabric names as network availability zones (v1 discovery) |
| `openstack/neutron` | Port binding / `resource_request` | Set `CUSTOM_FABRIC_<value>` in the port `resource_request` when the network has `fabric` set |
| `openstack/nova` (validation) | `nova.network.neutron` | Verify/extend traits-only `resource_request` consumption from port binding data |
| `openstack/nova` (fallback) | `nova.network.neutron` / `scheduler/utils.py` | Inject `CUSTOM_FABRIC_<value>` required trait from the network `fabric` attribute during placement request construction |
## 8. Backward Compatibility

- All changes are additive. Existing VXLAN deployments with no fabric configuration continue to function unchanged.
- Tenant networks created without the `fabric` attribute allocate VNIs from `physical_network=None` ranges, identical to current behavior.
- The `network_segment_range` changes are additive: existing VLAN ranges are unaffected, and existing VXLAN ranges without `physical_network` continue to work.
- The Nova integration is additive: ports on networks without a `fabric` attribute carry no new placement constraints.
- No database migrations are required in v1.
## 9. Known Limitations and Future Work

| Limitation | Impact | Future Mitigation |
|---|---|---|
| No cross-fabric VNI uniqueness guarantee | Duplicate VNIs possible if ranges overlap across fabrics. Harmless for isolated L2 fabrics. | Enforce non-overlapping ranges at configuration time; add a per-fabric VNI allocation table in a future DB migration. |
| Fabric discovery via availability zones is a workaround | AZ namespace is polluted with fabric names. | Implement a dedicated `/v2.0/fabrics` endpoint with project-scoped visibility. |
| Nova scheduling integration unverified | Traits-only `resource_request` may not be consumed by current Nova placement code. | Confirm during patch development; fall back to Option A if needed. |
## 10. References

- Neutron ML2 VLAN type driver: `neutron/plugins/ml2/drivers/type_vlan.py`
- Neutron ML2 VXLAN type driver: `neutron/plugins/ml2/drivers/type_vxlan.py`
- `network_segment_range` extension: `neutron/extensions/network_segment_range.py`
- networking-baremetal `baremetal-l2vni` plugin: https://opendev.org/openstack/networking-baremetal
- Nova minimum-bandwidth QoS placement integration: `nova/network/neutron.py`, `nova/scheduler/utils.py`
- OpenStack Placement traits API: https://docs.openstack.org/placement/latest/usage/traits.html