@scyto
Last active April 27, 2025
New version of my mesh network using openfabric

Enable Dual Stack (IPv4 and IPv6) OpenFabric Routing

Version 2.2 (2025.04.25)

This gist is part of this series.

This assumes you are running Proxmox 8.4 and that the line source /etc/network/interfaces.d/* is at the end of the interfaces file (this is added automatically to both new and upgraded installations of Proxmox 8.2).

This changes the previous file design, thanks to @NRGNet and @tisayama, to make the system more reliable in general and more maintainable, especially for folks using IPv4 on the private cluster network (I still recommend the IPv6 FC00 network you will see in these docs).

Notable changes from original version here

  • move IP address configuration from interfaces.d/thunderbolt to the frr configuration
  • new approach to remove the dependency on post-up; new script in if-up.d that logs to the system log
  • reminder to copy frr.conf > frr.conf.local to prevent breakage if you enable Proxmox SDN
  • dependent on the changes to the udev link scripts here

This will result in an IPv4 and IPv6 routable mesh network that can survive any one node failure or any one cable failure. All the steps in this section must be performed on each node.

NOTES on Dual Stack

I have included dual stack for completeness, but I only run the FC00:: IPv6 network as Ceph does not support dual stack, and I strongly recommend you consider using only IPv6. For Ceph, do not dual stack: use either IPv4 or IPv6 addresses for all the monitors, MDS and daemons. Despite the docs implying it is OK, my findings on Quincy are that it is funky....

With all the scripts and changes folks have contributed, IPv4 should now be stable. I am recommending new folks use IPv4 for Ceph as documented in the gists in the series. This is to avoid ongoing issues with SDN and IPv6. I have yet to decide if I will migrate my Ceph back to IPv4 so I can play with SDN or just wait for the SDN issues to be solved.

Defining thunderbolt network

Create a new file with nano /etc/network/interfaces.d/thunderbolt and populate it with the following. There should no longer be any IP addresses in this file for lo and lo:6.

allow-hotplug en05
iface en05 inet manual
        mtu 65520

allow-hotplug en06
iface en06 inet manual
        mtu 65520

Save file, repeat on each node.
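
Optional check: once a Thunderbolt cable is connected and the udev renaming from the linked gist has taken effect, the interfaces should come up with the configured MTU (assuming the en05/en06 names used throughout this guide):

ip link show en05
ip link show en06

Both should report mtu 65520 once the link is up.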

Enable IPv4 and IPv6 forwarding

  1. use nano /etc/sysctl.conf to open the file
  2. uncomment #net.ipv6.conf.all.forwarding=1 (remove the # symbol)
  3. uncomment #net.ipv4.ip_forward=1 (remove the # symbol; the finished lines are shown after this list)
  4. save the file
  5. issue reboot now for a complete reboot
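
After the edit, the two forwarding lines in /etc/sysctl.conf should read:

net.ipv4.ip_forward=1
net.ipv6.conf.all.forwarding=1

(The reboot in step 5 applies them; running sysctl -p would also load them without a reboot, but this guide assumes the full reboot.)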

FRR Setup

Install & enable FRR

Install Free Range Routing (FRR) with apt install frr, then enable it with systemctl enable frr.
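
For copy-paste, the two commands are:

apt install frr
systemctl enable frr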

Enable the fabricd daemon

  1. edit the frr daemons file (nano /etc/frr/daemons) to change fabricd=no to fabricd=yes
  2. save the file
  3. restart the service with systemctl restart frr (a non-interactive alternative is shown after this list)
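
If you prefer a non-interactive edit, a one-liner that makes the same change (this assumes the stock daemons file, which ships with fabricd=no on its own line):

sed -i 's/^fabricd=no/fabricd=yes/' /etc/frr/daemons
systemctl restart frr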

Mitigate FRR Timing Issues (I need someone with an MS-101 to confirm whether this helps solve their IPv4 issues)

Create a script that is automatically processed when en05/en06 are brought up to restart frr.

This should make IPv4 more stable for all users (I ended up seeing IPv4 issues too, just less commonly than MS-101 users).

  1. create a new file with nano /etc/network/if-up.d/en0x
  2. add the following to the file
#!/bin/bash
# note the logger entries log to the system journal in the pve UI etc

INTERFACE=$IFACE

if [ "$INTERFACE" = "en05" ] || [ "$INTERFACE" = "en06" ]; then
    logger "Checking if frr.service is running for $INTERFACE"
    
    if ! systemctl is-active --quiet frr.service; then
        logger -t SCYTO "   [SCYTO SCRIPT ] frr.service not running. Starting service."
        if systemctl start frr.service; then
            logger -t SCYTO "   [SCYTO SCRIPT ] Successfully started frr.service"
        else
            logger -t SCYTO "   [SCYTO SCRIPT ] Failed to start frr.service"
        fi
        exit 0
    fi

    logger "Attempting to reload frr.service for $INTERFACE"
    if systemctl reload frr.service; then
        logger -t SCYTO "   [SCYTO SCRIPT ] Successfully reloaded frr.service for $INTERFACE"
    else
        logger -t SCYTO "   [SCYTO SCRIPT ] Failed to reload frr.service for $INTERFACE"
    fi
fi
  3. make it executable with chmod +x /etc/network/if-up.d/en0x (a quick verification is shown below)
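
To confirm the hook actually fires, you can bounce one of the interfaces and check the tagged journal entries (this assumes ifupdown2's ifup/ifdown, which a standard Proxmox install provides):

ifdown en05 && ifup en05
journalctl -b -t SCYTO

You should see the reload or start messages logged by the script above.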

Mitigate issues caused by things that reset the loopback

Create a script that is automatically processed when lo is reprocessed by ifreload, ifupdown2, pvesh set, etc.

  1. create a new file with nano /etc/network/if-up.d/lo
  2. add the following to the file
#!/bin/bash

INTERFACE=$IFACE

if [ "$INTERFACE" = "lo" ]  ; then
    logger "Attempting to restart frr.service for $INTERFACE"
    if systemctl restart frr.service; then
        logger -t SCYTO "   [SCYTO SCRIPT ] Successfully restart frr.service for $INTERFACE"
    else
        logger -t SCYTO "   [SCYTO SCRIPT ] Failed to restart frr.service for $INTERFACE"
    fi
fi

  3. make it executable with chmod +x /etc/network/if-up.d/lo

Configure OpenFabric (perform on all nodes)

**Note: if (and only if) you have already configured SDN, you should make these settings in /etc/frr/frr.conf.local and reapply your SDN configuration so that SDN propagates them into frr.conf (you can also make the edits to both files if you prefer). If you make these edits only in frr.conf with SDN active and then reapply SDN, it will lose these settings.

  1. enter the FRR shell with vtysh
  2. optionally show the current config with show running-config
  3. enter the configure mode with configure
  4. Apply the configuration below (it is possible to cut and paste this into the shell instead of typing it manually; you may need to press return to apply the last !. Also check there were no errors in response to the pasted text).

Note: the x should be the number of the node you are working on. For example, node 1 would use 1 in place of x (a filled-in example for node 1 follows the configuration block).

ip forwarding
ipv6 forwarding
!
interface en05
ip router openfabric 1
ipv6 router openfabric 1
exit
!
interface en06
ip router openfabric 1
ipv6 router openfabric 1
exit
!
interface lo
ip address 10.0.0.8x/32
ipv6 address fc00::8x/128
ip router openfabric 1
ipv6 router openfabric 1
openfabric passive
exit
!
router openfabric 1
net 49.0000.0000.000x.00
exit
!
exit
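
For example, on node 1 (x = 1) the node-specific lines above become:

ip address 10.0.0.81/32
ipv6 address fc00::81/128
net 49.0000.0000.0001.00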

  5. you may need to press return after the last exit to get to a new line - if so do this
  6. save the config with write memory
  7. check the configuration applied correctly with show running-config - note the order of the items will be different from how you entered them and that's OK. (If you made a mistake, I found the easiest way to fix it was to edit /etc/frr/frr.conf - but be careful if you do that.)
  8. use the command exit to leave setup
  9. repeat steps 1 to 8 on the other nodes
  10. once you have configured all 3 nodes, issue the command vtysh -c "show openfabric topology". If you did everything right you will see the following (note it may take 45 seconds for all routes to show if you just restarted frr for any reason):
Area 1:
IS-IS paths to level-2 routers that speak IP
Vertex               Type         Metric Next-Hop             Interface Parent
pve1                                                                  
10.0.0.81/32         IP internal  0                                     pve1(4)
pve2                 TE-IS        10     pve2                 en06      pve1(4)
pve3                 TE-IS        10     pve3                 en05      pve1(4)
10.0.0.82/32         IP TE        20     pve2                 en06      pve2(4)
10.0.0.83/32         IP TE        20     pve3                 en05      pve3(4)

IS-IS paths to level-2 routers that speak IPv6
Vertex               Type         Metric Next-Hop             Interface Parent
pve1                                                                  
fc00::81/128         IP6 internal 0                                     pve1(4)
pve2                 TE-IS        10     pve2                 en06      pve1(4)
pve3                 TE-IS        10     pve3                 en05      pve1(4)
fc00::82/128         IP6 internal 20     pve2                 en06      pve2(4)
fc00::83/128         IP6 internal 20     pve3                 en05      pve3(4)

IS-IS paths to level-2 routers with hop-by-hop metric
Vertex               Type         Metric Next-Hop             Interface Parent

Now you should be able to ping each node from every node across the thunderbolt mesh using IPv4 or IPv6 as you see fit.
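
For example, from node 1 (using the addresses configured above):

ping -c 3 10.0.0.82
ping -c 3 fc00::82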

IMPORTANT - you need to do this to stop SDN breaking you in the future

If all is working, issue cp /etc/frr/frr.conf /etc/frr/frr.conf.local. This is because when you enable Proxmox SDN, Proxmox will overwrite frr.conf - however it will read the .local file and apply that.

**Note: if you already have SDN configured, do not do the step above as you will mess up both your SDN and this openfabric topology (see the note at the start of the FRR instructions).

Based on this response https://forum.proxmox.com/threads/relationship-of-frr-conf-and-frr-conf-local.165465/ - if you have SDN, all local (non-SDN) configuration changes should be made in .local; this is read the next time SDN apply is used. Do not copy frr.conf > frr.conf.local after doing anything with SDN, or when you tear down SDN the settings will not be removed from frr.conf.


0xD4 commented Apr 22, 2025

Hi @scyto, by disabling the comment function in the old Gist, all helpful posts and comments have disappeared. I was just about to document these for myself and would have liked to look something up.

Could you either keep this up for a few more days so that everyone is able to back up what's needed, or at least save it via the Web Archive and provide a link to the archive? That way, at least the more recent comments would still be available.


scyto commented Apr 22, 2025

@jochristian thanks i will try those, i like your reload vs restart approach as that will stop the service from ever getting blocked from too many restarts in a short window....

i just identified an issue: if pvesh set /nodes/pve1/network is called (this is the same as clicking apply on the node's network config in the GUI, i think) or ifupdown2 is called directly by any process, this removes the loopback addresses applied to lo - meaning an frr reload is required. not sure of the best way to handle this as i think technically lo never went down or came up.

also the VMs on node 1 lose their connectivity via vmbr0 when that command is processed - i am not sure what the heck is going on there (could be the hacky version of ifupdown2 i have...)

this means the script doesn't get executed at all... i am not sure what to do about that, ideas?

--edit--
scratch that, if pvesh set runs and wipes the IP addresses set via frr on the loopback interface a restart is the only thing that will bring them back....

--edit 2-- ok, amending the interfaces file to call restart in the lo stanza works (reload doesn't), maybe there is a more elegant way of doing it?

auto lo
iface lo inet loopback
  post-up sh -c 'sleep 5 && /usr/bin/systemctl restart frr.service'

if you are interested this is the settings it applies.... it doesn't explicitly set anything on lo - but because lo is marked as auto in (my) interfaces it gets processed and wipes away whatever frr did and frr reload doesn't correct it.... bugger

pvesh get /nodes/pve1/network --output-format json-pretty


scyto commented Apr 22, 2025

could you keep this up

done


scyto commented Apr 22, 2025

@jochristian maybe we check for frr state - if it's started, reload it; if it is not, restart it. chatgpt expanded this for me, haven't tested - thoughts? (only thought i had is if proxmox is in the middle of restarting the service, the service may barf having too many start requests in a short time and lock itself out - my initial gut says not to account for that condition on the service).

i have already seen failures where proxmox does something to the network and frr exits (stops or crashes) and doesn't restart....

#!/bin/bash
# note the logger entries log to the system journal in the pve UI etc

INTERFACE=$IFACE

if [ "$INTERFACE" = "en05" ] || [ "$INTERFACE" = "en06" ]; then
    logger "Checking if frr.service is running for $INTERFACE"
    
    if ! systemctl is-active --quiet frr.service; then
        logger -t SCYTO "   [SCYTO SCRIPT ] frr.service not running. Starting service."
        if systemctl start frr.service; then
            logger -t SCYTO "   [SCYTO SCRIPT ] Successfully started frr.service"
        else
            logger -t SCYTO "   [SCYTO SCRIPT ] Failed to start frr.service"
        fi
        exit 0
    fi

    logger "Attempting to reload frr.service for $INTERFACE"
    if systemctl reload frr.service; then
        logger -t SCYTO "   [SCYTO SCRIPT ] Successfully reloaded frr.service for $INTERFACE"
    else
        logger -t SCYTO "   [SCYTO SCRIPT ] Failed to reload frr.service for $INTERFACE"
    fi
fi

as an aside, i am loving using chatgpt for this type of stuff, it even explained to me how the evaluation by the if was done, when i asked...

https://chatgpt.com/share/6807f3f9-0934-800d-9931-b1d4937473f2


jochristian commented Apr 23, 2025

Hi @scyto,

I like your new approach.
But did you do systemctl enable frr in this setup? Or maybe it's not needed with the way you did lo interface?


xenpie commented Apr 24, 2025

Hi @scyto,

I think you can condense the loopback and frr fix into one single script. The following is what I run at the moment and I didn't face any issues so far, even after rebooting numerous times to make sure it works.
Thank you for the gists, by the way! They really helped me to get my own Thunderbolt ring network up and running stable.
Now I am in the same situation as you, where I want to use the existing network also for VM and/or Ceph traffic. I saw the SDN forum post, but so far it really has been a pain, things are constantly breaking...

nano /etc/network/if-up.d/frr
#!/bin/bash
# note the logger entries log to the system journal in the pve UI etc

INTERFACE=$IFACE

if [ "$INTERFACE" = "en05" ] || [ "$INTERFACE" = "en06" ] || [ "$INTERFACE" = "lo" ]; then
    if systemctl is-active --quiet frr.service; then
        logger -t SCYTO "   [SCYTO SCRIPT ] Reloading frr.service for $INTERFACE"
        if systemctl reload frr.service; then
            logger -t SCYTO "   [SCYTO SCRIPT ] Successfully reloaded frr.service for $INTERFACE"
        else
            logger -t SCYTO "   [SCYTO SCRIPT ] Failed to reload frr.service for $INTERFACE"
        fi
    else
        logger -t SCYTO "   [SCYTO SCRIPT ] frr.service is not running, attempting to restart for $INTERFACE"
        if systemctl restart frr.service; then
            logger -t SCYTO "   [SCYTO SCRIPT ] Successfully restarted frr.service for $INTERFACE"
        else
            logger -t SCYTO "   [SCYTO SCRIPT ] Failed to restart frr.service for $INTERFACE"
        fi
    fi
fi
chmod +x /etc/network/if-up.d/frr

Example output:

root@pve1:~# journalctl -b -t SCYTO
Apr 24 01:46:44 pve1 SCYTO[1511]:    [SCYTO SCRIPT ] frr.service is not running, attempting to restart for lo
Apr 24 01:46:49 pve1 SCYTO[1610]:    [SCYTO SCRIPT ] Successfully restarted frr.service for lo
Apr 24 01:46:50 pve1 SCYTO[1715]:    [SCYTO SCRIPT ] Reloading frr.service for en05
Apr 24 01:46:50 pve1 SCYTO[1806]:    [SCYTO SCRIPT ] Successfully reloaded frr.service for en05
Apr 24 01:46:50 pve1 SCYTO[1846]:    [SCYTO SCRIPT ] Reloading frr.service for en06
Apr 24 01:46:51 pve1 SCYTO[1893]:    [SCYTO SCRIPT ] Successfully reloaded frr.service for en06
Apr 24 01:46:51 pve1 SCYTO[2165]:    [SCYTO SCRIPT ] Reloading frr.service for en06
Apr 24 01:46:52 pve1 SCYTO[2210]:    [SCYTO SCRIPT ] Successfully reloaded frr.service for en06
Apr 24 01:46:54 pve1 SCYTO[2348]:    [SCYTO SCRIPT ] Reloading frr.service for en05
Apr 24 01:46:54 pve1 SCYTO[2393]:    [SCYTO SCRIPT ] Successfully reloaded frr.service for en05
root@pve1:~#


scyto commented Apr 24, 2025

But did you do systemctl enable frr in this setup? Or maybe it's not needed with the way you did lo interface?

well, interesting question, it's certainly not in https://gist.github.com/scyto/58b5cd9a18e1f5846048aabd4b152564#enable-the-fabricd-daemon - i wonder if a version of proxmox enabled it by default, or if i just missed it from the docs; i will add it! it wasn't in the old one either, so i guess people self-fixed or something else enabled the service by default....


scyto commented Apr 24, 2025

@xenpie

I saw the SDN forum post, but so far it really has been a pain, things are constantly breaking...

thanks, yeah my adventures on that have been, um, interesting. it seems until there is a definitely good version of ifupdown2 there is no progress to be made; i am hoping proxmox patch it. there seem to be more issues with EVPN on IPv6 underlays - i am unclear how much that affects us as i really am only on day 4 of understanding EVPNs, SDN etc, but i think until FRRouting/frr#18539 is in the upstream code and comes down to proxmox we may be screwed

i am going to try and use a VXLAN next to see if i can get a VM to talk to the mesh IP addresses - but so far my attempts at that have failed - can't even ssh to the 10.x series addresses from a VM, until i can do that i won't even contemplate migrating my ceph from IPv6 to IPv4.

I think you can condense the loopback and frr fix into one single script

nice, i was going to do that at the weekend, so thanks for taking a stab. one thing tho - in my testing there are certain conditions when a restart vs a reload is needed - that's why for lo it's not a reload in my script - the reload did not work for lo when that went up and down - that's why it's two scripts currently up in the gist, but it won't be hard to make it one script


xenpie commented Apr 24, 2025

thanks, yeah my adventues on that have been, um interesting, it seems until there is a defintely good version of ipdown2 there is no progress to be made, i am hoping proxmox patch it, there seems to be more issues with EVPN on IPv6 underlays, i am unclear how much that affects us as I really am only on day 4 of understanding EVPNs, SDN etc but i think until this FRRouting/frr#18539 is in the upstream code and comes down to promox we may be scrwed

I am currently only using IPv4 as it seems to work stable for me. But I am also not using Ceph at the moment - I mainly use the Thunderbolt ring for HA migration and ZFS replication. Next step for me is allowing the VMs to access this network, then I'll move onto Ceph.

i am going to try and use a VXLAN next to see if i can get a VM to talk to the mesh IP addresses - but so far my attempts at that have failed - can't even ssh to the 10.x series addresses from a VM, until i can do that i won't even contemplate migrating my ceph from IPv6 to IPv4.

Yeah, that's also where I'm bashing my head against a wall right now.
Maybe it helps with troubleshooting, I found a "solution" (more a hack) to at least automatically fix the frr.conf file after applying SDN settings. I got tired of having to fix it manually every time for every node and this way I only lose a maximum of 10 pings and don't have to touch anything. Ideally this won't be needed once I have a working setup, but while testing and constantly adjusting settings, this really helps me.

Basically this addresses what is written in section "2.6 - Fixing up FRR config".
It rewrites the bgp router entries, which get set to the management address instead of the lo address, and it also removes the node's own lo address from the neighbor list (not sure if that causes any issues, just following the forum tutorial here). Afterwards it restarts the frr service. This is working fine for me, but may need tweaking for other setups, especially the hostname and ip handling, but those could just be hardcoded for each node instead as well.

The script:

nano /usr/local/bin/fix-frr-config.sh
#!/bin/bash
set -e

LOCKFILE="/run/fix-frr-config.lock"
if [ -e "$LOCKFILE" ]; then
        exit 0
fi
touch "$LOCKFILE"
trap "rm -f $LOCKFILE" EXIT

FRR_CONF="/etc/frr/frr.conf"
HOSTNAME_SHORT=$(hostname -s) # pve1, pve2, pve3
NODE_NUMBER="${HOSTNAME_SHORT//[!0-9]/}" # extracts the number from pve1, pve2, etc.
ROUTER_ID="10.0.0.8$NODE_NUMBER" # ADJUST THIS TO YOUR OWN NETWORK SETTINGS
WRONG_ROUTER="192.168.8.8$NODE_NUMBER" # ADJUST THIS TO YOUR OWN NETWORK SETTINGS
NEEDS_UPDATE=0

if grep -q "bgp router-id $WRONG_ROUTER" "$FRR_CONF"; then
    NEEDS_UPDATE=1
        sed -i "s/^ *bgp router-id $WRONG_ROUTER/ bgp router-id $ROUTER_ID/" "$FRR_CONF"
fi

if grep -q "neighbor $ROUTER_ID peer-group VTEP" "$FRR_CONF"; then
    NEEDS_UPDATE=1
        sed -i "/neighbor $ROUTER_ID peer-group VTEP/d" "$FRR_CONF"
fi

if [ "$NEEDS_UPDATE" -eq 1 ]; then
        systemctl restart frr.service
fi
chmod +x /usr/local/bin/fix-frr-config.sh

The config watcher:

nano /etc/systemd/system/fix-frr.path
[Unit]
Description=Watch frr.conf for changes

[Path]
PathChanged=/etc/frr/frr.conf

[Install]
WantedBy=multi-user.target

The service:

nano /etc/systemd/system/fix-frr.service
[Unit]
Description=Fix FRR configuration after SDN apply

[Service]
Type=oneshot
ExecStart=/usr/local/bin/fix-frr-config.sh

Enable and start it:

systemctl daemon-reexec
systemctl daemon-reload
systemctl enable --now fix-frr.path

nice, i was going to do that at the weekend, so thanks for take a stab, one thing tho - in my testing there are certain conditions when a restart vs a reload is needed - thats why for lo its not a reload in my script - the reload did not work for lo when that went up and down - thats why its two scripts currently up in the gist, but it wont be hard to make it one script

Ah I see, didn't run into that issue myself yet. Then maybe this would be a safer option:

#!/bin/bash
# note the logger entries log to the system journal in the pve UI etc

INTERFACE=$IFACE

# Always restart for lo (reload is not enough when the loopback addresses are wiped)
if [ "$INTERFACE" = "lo" ]; then
    logger -t SCYTO "   [SCYTO SCRIPT ] Restarting frr.service for $INTERFACE"
    if systemctl restart frr.service; then
        logger -t SCYTO "   [SCYTO SCRIPT ] Successfully restarted frr.service for $INTERFACE"
    else
        logger -t SCYTO "   [SCYTO SCRIPT ] Failed to restart frr.service for $INTERFACE"
    fi
fi

# Reload or restart for en05 / en06
if [ "$INTERFACE" = "en05" ] || [ "$INTERFACE" = "en06" ]; then
    if systemctl is-active --quiet frr.service; then
        logger -t SCYTO "   [SCYTO SCRIPT ] Reloading frr.service for $INTERFACE"
        if systemctl reload frr.service; then
            logger -t SCYTO "   [SCYTO SCRIPT ] Successfully reloaded frr.service for $INTERFACE"
        else
            logger -t SCYTO "   [SCYTO SCRIPT ] Failed to reload frr.service for $INTERFACE"
        fi
    else
        logger -t SCYTO "   [SCYTO SCRIPT ] frr.service is not running, attempting to restart for $INTERFACE"
        if systemctl restart frr.service; then
            logger -t SCYTO "   [SCYTO SCRIPT ] Successfully restarted frr.service for $INTERFACE"
        else
            logger -t SCYTO "   [SCYTO SCRIPT ] Failed to restart frr.service for $INTERFACE"
        fi
    fi
fi

Edit:

I also noticed in the "/etc/network/interfaces.d/sdn" file the ip addresses for "vxlan-local-tunnelip" in the "vrfvx_evpnPRD" and "vxlan_vxnet1" sections are (seemingly) randomly switching after every sdn apply between my node management ip addresses and lo addresses. Not sure why that happens or which one would be correct for our intended use case.


ronindesign commented Apr 24, 2025

Working through a fresh 3 node cluster setup with MS-01s now. Nothing actionable, but just an FYI, on pve-manager/8.4.1/2a5fa54a8503f96d (running kernel: 6.8.12-10-pve), frr package appears to already be installed by default:

# apt install -y frr && systemctl enable frr
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
frr is already the newest version (10.2.2-1+pve1).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Created symlink /etc/systemd/system/multi-user.target.wants/frr.service → /lib/systemd/system/frr.service.

EDIT: Confirming the above (and all steps so far) are working as expected with no issues on 3 node cluster, MS-01, with Proxmox v8.4.1 (kernel: 6.8.12-10-pve). All nodes can ping one another on IPv4, IPv6. Thunderbolt network survives reboot, all nodes interpingable.


scyto commented Apr 24, 2025

Maybe it helps with troubleshooting, I found a "solution" (more a hack) to at least automatically fix the frr.conf file after applying SDN settings.

good to hear; i can say the config in my gists does not break when SDN is applied now. maybe the proxmox gang fixed some stuff since you did your script approach (or i just haven't hit your issue) - the biggest issue was SDN blew away IPs on interface lo, i haven't seen anything else yet.... rofl.....


scyto commented Apr 24, 2025

package appears to already be installed by default:

yeah it is these days i think because of SDN

good to hear it's working. did you use my old gist, this new gist, or forge your own path? (i am asking to know if the instructions above work, along with the changes in the thunderbolt gist)


scyto commented Apr 24, 2025

so given the state of SDN and IPv6 i would say that most people should use IPv4 for ceph until that stuff is all actually working with IPv6


scyto commented Apr 24, 2025

Basically this addresses what is written in section "2.6 - Fixing up FRR config"​.

yes, i saw that section; i was very confused at first because SDN wasn't writing anything like any of that stuff into frr.conf!

i will note it's possible for SDN to get into some very weird states, along with networking.service; for example, even after removing all of the SDN config in the UI the network service would not reload or restart - it required a reboot to truly fix the networking.service state

i think there are lots of edge cases where SDN can bork itself.... esp when we are playing around and messing up.....

i haven't even been able to get the simple vxlan up and running on IPv4 - the IPAM wouldn't even issue an IP to a VM bridged on the vnet...... i need to reboot all 3 nodes to get rid of the strange error and maybe that will clear up several of my weird issues....

basically this https://forum.proxmox.com/threads/cant-restart-or-reload-networking-service-error-cannot-add-dependency-job.165461/ - i don't know how much this was causing my SDN to not work.... waiting to see if i get a response tomorrow; if not i will kill the remaining two nodes this happened on and start my SDN from scratch - starting with a VXLAN to get my head around that before i move to bgp...

@christensenjairus

I saw your post in the Proxmox forum. I think I'm trying to do the same thing as you. I need frr for my local mesh network (100gbe) but SDN blows away the file. I also get strange functionality when using simple routing instead of frr, so I'm interested to see what the answer is there.


ronindesign commented Apr 25, 2025

good to hear its working, did you use my old gist this new gist or forge your own path (i am asking to know if the instructions work above, along with the chnages in the thunderbolt gist)

I followed all of your most recent gists (including this gist @ v2.1), including edits made over the last few days. I thought it would be a helpful opportunity to provide feedback with a fresh cluster deployment using your latest instructions.

All has worked 100%, no issues, multiple reboots, TB network survives every time, without errors; haven't tried unplugging/replugging much at all yet. I doubt it matters, but I am using BIOS v1.26 (latest) on the MS-01s, due to Proxmox instability on previous BIOS versions (e.g. kernel panics, etc.)

Anyways, don't want to add any further noise -- clearly bigger fish to fry with SDN it looks like! Just wanted to give a 👍for latest revisions of the guide. Thanks again so much, coming back to this a year or more later and it's such a helpful resource, appreciate all your time and energy on it (hope the surgery went ok!)


scyto commented Apr 25, 2025

I thought it would be a helpful opportunity to provide feedback with a fresh cluster deployment using your latest instructions.

thanks, i really appreciate that, glad to hear it worked!

Eek on the BIOS issues, i hadn't heard about that. Add as much noise as you want, i do :-) (and yes my surgery went well, thanks for asking)


scyto commented Apr 25, 2025

I saw your post in the Proxmox forum. I think I'm trying to do the same thing as you. I need frr for my local mesh network (100gbe) but SDN blows away the file. I also get strange functionality when using simple routing instead of frr, so I'm interested to see what the answer is there.

yeah, i searched for frr.conf.local in the forum and realized i couldn't find a good description of how it is used. i also found that SDN left networking.service in weird invalid states until a reboot - i will repeat my SDN tests if i get time (though this weekend is a new server rack so that will take most of my time!)


scyto commented Apr 25, 2025

@ALL i changed the guidance on copying frr.conf after SDN has been configured - if you copy frr.conf to frr.conf.local after configuring SDN then SDN won't tear down the settings, as it thinks they are local and not SDN settings, and this means SDN settings remain in your frr.conf when they shouldn't


scyto commented Apr 25, 2025

@folks using these settings

 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
  • how long does it take from doing an frr restart till you see all 3 routes doing vtysh -c "sh open topo"?
  • have you had any issues with flapping routes - where the route changes constantly (this could cause variable ping times for example or even dropped packets as the routing changes)?

my testing shows it doesn't make convergence of routes faster at frr service start - seems to always take 45 seconds+

hmm well this is interesting https://chatgpt.com/share/680bcb97-3598-800d-9c54-22f27173f658


scyto commented Apr 25, 2025

i think the 3 settings above are basically irrelevant on startup, i don't think they harm, i don't know what benefit they are giving - like the vtysh line that is also irrelevant (and i notice that SDN adds it too).

try adding the 3 spf and 1 lsp settings below to your router section - for me the routes converge almost instantly compared to >45 seconds before on frr start.... this would mean ceph has the chance to come up 45 seconds faster.....

--edit= those 3 spf settings caused crashes as they were not supposed to be in the router section, thanks chatgpt

i have this configured on all 3 nodes, if no one experiences issues i will add these 3 new settings to the gist

(these settings may not be a good thing where there is a large routed network, but fine for homelabs / esp isolated mesh)

example of what my node 3 looks like:

!
router openfabric 1
 net 49.0000.0000.0003.00
 lsp-gen-interval 5
exit
!

it might also be good to move to point to point link than broadcasts, then csnp and hello timings are basically irrelevant, might test that over the weekend

@xenpie
Copy link

xenpie commented Apr 25, 2025

 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2

I have been using these for a while, since I saw them in the SDN forum tutorial and they didn't seem to cause any harm so I just kept them in.

* how long does it take from doing an frr restart till you see all 3 routes doing `vtysh -c "sh open topo"`?

I just checked on my system, after restarting the frr service it takes less than 5 seconds before I see all the routes. Tried it multiple times on all nodes, always with the same result.

* have you had any issues woith flapping routes - where the route changes constantly (this could cause variable ping times for example or even dropped packets as the routing changes)?

I'd say no but then again not sure if I would notice it with my current use case. I just ran a quick ping test for 10 minutes and it looks good to me.

--- 10.0.0.82 ping statistics ---
574 packets transmitted, 574 received, 0% packet loss, time 586720ms
rtt min/avg/max/mdev = 0.038/0.145/0.358/0.059 ms
root@pve1:~#

--- 10.0.0.83 ping statistics ---
570 packets transmitted, 570 received, 0% packet loss, time 582585ms
rtt min/avg/max/mdev = 0.044/0.131/0.316/0.050 ms
root@pve2:~#

--- 10.0.0.81 ping statistics ---
567 packets transmitted, 567 received, 0% packet loss, time 579606ms
rtt min/avg/max/mdev = 0.045/0.139/0.345/0.054 ms
root@pve3:~#


scyto commented Apr 25, 2025

I just checked on my system, after restarting the frr service it takes less than 5 seconds before I see all the routes.

thanks, interesting, those made no difference to the route convergence time on startup for me, agree they are harmless in a small isolated mesh


scyto commented Apr 26, 2025

@ALL i edited the settings under the router section - don't use the spf settings i had there earlier; remove them immediately if you implemented them or things will get very wonky


scyto commented Apr 27, 2025

so i have spent the day with chatgpt and ceph - trying several new topologies. hilariously most didn't work, but i understood why and chatgpt moved me on - until we got right back to basically the design in this gist with a few key differences. i am not ready to post that, but as part of this i needed to move my cluster from having /128s on each node to having /64 addresses (part of a plan to try different routing options, as it really looks like thunderbolt ports cannot be bridged!)

Anyhoo, this is the migration plan i did; chatgpt made the document content and markdown for me too, based on the hours of conversations i had....

https://gist.github.com/scyto/64e79a694b286d3b70f8b3663d19eb76

not linking to this in my gists, but thought folks might be interested. i can share the chatgpt logs of how i got here, but it's long and starts with a broadcast storm issue (after trying a bridging solution to allow VMs to bridge to the thunderbolt network) and is several hours of troubleshooting very very broken ceph clusters, times when i ignored its instructions, etc. if anyone thinks that would be interesting i can link to that too

this is an FYI as i just thought it was incredibly interesting how chatgpt let me try many different mesh network configurations, sometimes gave me wrong answers, but ultimately helped me in the back and forth

-edit-

shit, i asked it to summarize what the setup was when we started before the migration and how to make it,

and it gave me this straight away! https://gist.github.com/scyto/bdd5381fe9170ec10009cddf8687446b - not sure why it insists this is IS-IS when it's openfabric, but whatever, i can edit that, the rest is right

--edit2--
so now i am using it for options on how to connect VMs to the ceph mesh, it remembered from hours ago that bridging doesn't work with thunderbolt (at least it doesn't for me and that's what i told it)

now it offers to summarize what to do AND, because i have twice asked for gist.md format, asks me if i want it in that - i am beyond impressed


scyto commented Apr 27, 2025

[image]

it is a bit too fucking chipper mind you


scyto commented Apr 27, 2025

i have been doing this nearly 10 hrs straight....

i now have a fully routed mesh network - VMs can access the ceph mesh network, anything anywhere on my lan can access the mesh network - i have tested with ssh and ping, ceph next..... going to bed now.... oh and so far i see no evidence i need the frr restart scripts either.... but no promises.... but it now seems to all work as it should.... will publish a v3 setup in the next few days.... no complex SDN stuff needed....

@ronindesign

Success -- very nice! Can't wait to see the results, well done! Will be great to be able to bridge for VM access.

@eidgenosse

A first rough test with an MS-01 shows that the reboot problem is fixed with the script. Thanks a lot for that.
