This builds upon excellent foundational work by @scyto.
- Original TB4 research from @scyto: https://gist.github.com/scyto/76e94832927a89d977ea989da157e9dc
- My Original PVE 9 Writeup: https://gist.github.com/taslabs-net/9f6e06ab32833864678a4acbb6dc9131
Key contributions from @scyto's work:
- TB4 hardware detection and kernel module strategies
- Systemd networking and udev automation techniques
- MTU optimization and performance tuning approaches
This guide provides a step-by-step, lightly tested procedure for building a high-performance Thunderbolt 4 + Ceph cluster on Proxmox VE 9 beta.
Lab Results:
- TB4 Mesh Performance: Sub-millisecond latency, 65520 MTU, full mesh connectivity
- Ceph Performance: 1,300+ MB/s write, 1,760+ MB/s read with optimizations
- Reliability: 0% packet loss, automatic failover, persistent configuration
- Integration: Full Proxmox GUI visibility and management
Hardware Environment:
- Nodes: 3x systems with dual TB4 ports (tested on MS01 mini-PCs)
- Memory: 64GB RAM per node (optimal for high-performance Ceph)
- CPU: 13th Gen Intel (or equivalent high-performance processors)
- Storage: NVMe drives for Ceph OSDs
- Network: TB4 mesh (10.100.0.0/24) + management (10.11.12.0/24)
Software Stack:
- Proxmox VE: 9.0 beta with native SDN OpenFabric support
- Ceph: Reef with BlueStore, LZ4 compression, 2:1 replication (size=2, min_size=1)
- OpenFabric: IPv4-only mesh routing for simplicity and performance
- 3 nodes minimum: Each with dual TB4 ports (tested with MS01 mini-PCs)
- TB4 cables: Quality TB4 cables for mesh connectivity
- Ring topology: Physical connections n2→n3→n4→n2 (or similar mesh pattern)
- Management network: Standard Ethernet for initial setup and management
- Proxmox VE 9.0 beta (test repository)
- SSH root access to all nodes
- Basic Linux networking knowledge
- Patience: TB4 mesh setup requires careful attention to detail!
- Management network: 10.11.12.0/24 (adjust to your environment)
- TB4 cluster network: 10.100.0.0/24 (for Ceph cluster traffic)
- Router IDs: 10.100.0.12 (n2), 10.100.0.13 (n3), 10.100.0.14 (n4)
Critical: Perform these steps on ALL mesh nodes (n2, n3, n4).
Load TB4 kernel modules:
# Execute on each node:
for node in n2 n3 n4; do
ssh $node "echo 'thunderbolt' >> /etc/modules"
ssh $node "echo 'thunderbolt-net' >> /etc/modules"
ssh $node "modprobe thunderbolt && modprobe thunderbolt-net"
done
Verify modules loaded:
for node in n2 n3 n4; do
echo "=== TB4 modules on $node ==="
ssh $node "lsmod | grep thunderbolt"
done
Expected output: Both the thunderbolt and thunderbolt_net modules should be present.
Find TB4 controllers and interfaces:
for node in n2 n3 n4; do
echo "=== TB4 hardware on $node ==="
ssh $node "lspci | grep -i thunderbolt"
ssh $node "ip link show | grep -E '(en0[5-9]|thunderbolt)'"
done
Expected: TB4 PCI controllers detected, TB4 network interfaces visible.
Critical: Create interface renaming rules based on PCI paths for consistent naming.
For all nodes (n2, n3, n4):
# Create systemd link files for TB4 interface renaming:
for node in n2 n3 n4; do
ssh $node "cat > /etc/systemd/network/00-thunderbolt0.link << 'EOF'
[Match]
Path=pci-0000:00:0d.2
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en05
EOF"
ssh $node "cat > /etc/systemd/network/00-thunderbolt1.link << 'EOF'
[Match]
Path=pci-0000:00:0d.3
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en06
EOF"
done
Note: Adjust the PCI paths if they differ on your hardware (check with lspci | grep -i thunderbolt).
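To confirm which PCI path each TB4 interface hangs off, you can read the ID_PATH udev property, which is what the Path= match key in the .link files is compared against. A quick check, assuming the interfaces still carry their default thunderbolt0/thunderbolt1 names before the rename takes effect:
# Show the ID_PATH udev property for the TB4 network interfaces:
for node in n2 n3 n4; do
echo "=== TB4 interface PCI paths on $node ==="
ssh $node "udevadm info /sys/class/net/thunderbolt0 2>/dev/null | grep ID_PATH"
ssh $node "udevadm info /sys/class/net/thunderbolt1 2>/dev/null | grep ID_PATH"
done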
Add TB4 interfaces to network configuration with optimal settings:
# Configure TB4 interfaces on all nodes:
for node in n2 n3 n4; do
ssh $node "cat >> /etc/network/interfaces << 'EOF'
auto en05
iface en05 inet manual
mtu 65520
auto en06
iface en06 inet manual
mtu 65520
EOF"
done
Required for systemd link files to work:
# Enable and start systemd-networkd on all nodes:
for node in n2 n3 n4; do
ssh $node "systemctl enable systemd-networkd && systemctl start systemd-networkd"
done
Automation for reliable interface bringup on cable insertion:
Create udev rules:
for node in n2 n3 n4; do
ssh $node "cat > /etc/udev/rules.d/10-tb-en.rules << 'EOF'
ACTION==\"move\", SUBSYSTEM==\"net\", KERNEL==\"en05\", RUN+=\"/usr/local/bin/pve-en05.sh\"
ACTION==\"move\", SUBSYSTEM==\"net\", KERNEL==\"en06\", RUN+=\"/usr/local/bin/pve-en06.sh\"
EOF"
done
Create interface bringup scripts:
# Create en05 bringup script for all nodes:
for node in n2 n3 n4; do
ssh $node "cat > /usr/local/bin/pve-en05.sh << 'EOF'
#!/bin/bash
LOGFILE=\"/tmp/udev-debug.log\"
echo \"\$(date): en05 bringup triggered\" >> \"\$LOGFILE\"
for i in {1..5}; do
    if ip link set en05 up mtu 65520; then
        echo \"\$(date): en05 up successful on attempt \$i\" >> \"\$LOGFILE\"
        break
    else
        echo \"\$(date): Attempt \$i failed, retrying in 3 seconds...\" >> \"\$LOGFILE\"
        sleep 3
    fi
done
EOF"
ssh $node "chmod +x /usr/local/bin/pve-en05.sh"
done
# Create en06 bringup script for all nodes:
for node in n2 n3 n4; do
ssh $node "cat > /usr/local/bin/pve-en06.sh << 'EOF'
#!/bin/bash
LOGFILE=\"/tmp/udev-debug.log\"
echo \"\$(date): en06 bringup triggered\" >> \"\$LOGFILE\"
for i in {1..5}; do
    if ip link set en06 up mtu 65520; then
        echo \"\$(date): en06 up successful on attempt \$i\" >> \"\$LOGFILE\"
        break
    else
        echo \"\$(date): Attempt \$i failed, retrying in 3 seconds...\" >> \"\$LOGFILE\"
        sleep 3
    fi
done
EOF"
ssh $node "chmod +x /usr/local/bin/pve-en06.sh"
done
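Optional sanity check: run the scripts once by hand and read the log they write before relying on the udev trigger (assumes the TB4 interfaces already exist; a failing interface retries for roughly 15 seconds):
# Manually run the bringup scripts and inspect the debug log:
for node in n2 n3 n4; do
echo "=== Manual bringup test on $node ==="
ssh $node "/usr/local/bin/pve-en05.sh && /usr/local/bin/pve-en06.sh && tail -n 4 /tmp/udev-debug.log"
done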
Apply all TB4 configuration changes:
# Update initramfs on all nodes:
for node in n2 n3 n4; do
ssh $node "update-initramfs -u -k all"
done
# Reboot all nodes to apply changes:
echo "Rebooting all nodes - wait for them to come back online..."
for node in n2 n3 n4; do
ssh $node "reboot"
done
# Wait and verify after reboot:
echo "Waiting 60 seconds for nodes to reboot..."
sleep 60
# Verify TB4 interfaces after reboot:
for node in n2 n3 n4; do
echo "=== TB4 interfaces on $node after reboot ==="
ssh $node "ip link show | grep -E '(en05|en06)'"
done
Expected result: TB4 interfaces should be named en05 and en06 with the correct MTU settings.
Essential: TB4 mesh requires IPv4 forwarding for OpenFabric routing.
# Configure IPv4 forwarding on all nodes:
for node in n2 n3 n4; do
ssh $node "echo 'net.ipv4.ip_forward=1' >> /etc/sysctl.conf"
ssh $node "sysctl -p"
done
Verify forwarding enabled:
for node in n2 n3 n4; do
echo "=== IPv4 forwarding on $node ==="
ssh $node "sysctl net.ipv4.ip_forward"
done
Expected: net.ipv4.ip_forward = 1 on all nodes.
Location: Datacenter → SDN → Fabrics
- Click: "Add Fabric" → "OpenFabric"
- Configure in the dialog:
  - Name: tb4
  - IPv4 Prefix: 10.100.0.0/24
  - IPv6 Prefix: (leave empty for IPv4-only)
  - Hello Interval: 3 (default)
  - CSNP Interval: 10 (default)
- Click: "OK"
Expected result: You should see a fabric named tb4 with Protocol OpenFabric and IPv4 10.100.0.0/24.

Still in: Datacenter → SDN → Fabrics → (select the tb4 fabric)
- Click: "Add Node"
- Configure for n2:
  - Node: n2
  - IPv4: 10.100.0.12
  - IPv6: (leave empty)
  - Interfaces: Select en05 and en06 from the interface list
- Click: "OK"
- Repeat for n3: IPv4 10.100.0.13, interfaces en05 and en06
- Repeat for n4: IPv4 10.100.0.14, interfaces en05 and en06
Expected result: You should see all 3 nodes listed under the fabric with their IPv4 addresses and interfaces (en05 and en06 for each).
Important: You need to manually configure /30 point-to-point addresses on the en05 and en06 interfaces to create mesh connectivity. Example addressing scheme:
- n2: en05: 10.100.0.1/30, en06: 10.100.0.5/30
- n3: en05: 10.100.0.9/30, en06: 10.100.0.13/30
- n4: en05: 10.100.0.17/30, en06: 10.100.0.21/30
These /30 subnets allow each interface to connect to exactly one other interface in the mesh topology. Configure these addresses in the Proxmox network interface settings for each node.
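If you prefer to set these from the shell, a minimal sketch (for n2 only, assuming you edit /etc/network/interfaces directly, which is the same file the Proxmox GUI writes) of what the en05/en06 stanzas end up looking like once the earlier "manual" entries gain static addresses; adapt the addresses for n3/n4 per the scheme above:
auto en05
iface en05 inet static
        address 10.100.0.1/30
        mtu 65520
auto en06
iface en06 inet static
        address 10.100.0.5/30
        mtu 65520
# Apply without a reboot:
ifreload -a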

Critical: This activates the mesh - nothing works until you apply!
In GUI: Datacenter → SDN → "Apply" (button in top toolbar)
Expected result: Status table shows all nodes with "OK" status like this:
SDN Node Status
localnet... n3 OK
localnet... n1 OK
localnet... n4 OK
localnet... n2 OK

Critical: OpenFabric routing requires FRR (Free Range Routing) to be running.
# Start and enable FRR on all mesh nodes:
for node in n2 n3 n4; do
ssh $node "systemctl start frr && systemctl enable frr"
done
Verify FRR is running:
for node in n2 n3 n4; do
echo "=== FRR status on $node ==="
ssh $node "systemctl status frr | grep Active"
done
Expected output:
=== FRR status on n2 ===
Active: active (running) since Mon 2025-01-27 20:15:23 EST; 2h ago
=== FRR status on n3 ===
Active: active (running) since Mon 2025-01-27 20:15:25 EST; 2h ago
=== FRR status on n4 ===
Active: active (running) since Mon 2025-01-27 20:15:27 EST; 2h ago
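You can also ask FRR directly whether OpenFabric adjacencies have formed. A hedged check, assuming the SDN-generated FRR config runs the standard fabricd daemon (which provides these vtysh show commands):
# Inspect OpenFabric adjacencies and topology via vtysh:
for node in n2 n3 n4; do
echo "=== OpenFabric state on $node ==="
ssh $node "vtysh -c 'show openfabric neighbor'"
ssh $node "vtysh -c 'show openfabric topology'"
done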
Check TB4 interfaces are up with correct settings:
for node in n2 n3 n4; do
echo "=== TB4 interfaces on $node ==="
ssh $node "ip addr show | grep -E '(en05|en06|10\.100\.0\.)'"
done
Expected output example (n2):
=== TB4 interfaces on n2 ===
inet 10.100.0.12/32 scope global dummy_tb4
11: en05: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UP group default qlen 1000
inet 10.100.0.1/30 scope global en05
12: en06: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UP group default qlen 1000
inet 10.100.0.5/30 scope global en06
What this shows:
- Router ID address: 10.100.0.12/32 on the dummy_tb4 interface
- TB4 interfaces UP: en05 and en06 with state UP
- Jumbo frames: mtu 65520 on both interfaces
- Point-to-point addresses: /30 subnets for mesh connectivity
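The OpenFabric-learned routes should also appear in the kernel routing table; each node should list the other nodes' /32 router IDs as reachable via en05/en06:
# Confirm routes to the TB4 mesh are installed on each node:
for node in n2 n3 n4; do
echo "=== Kernel routes to 10.100.0.0/24 on $node ==="
ssh $node "ip route show | grep 10.100.0"
done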
Critical test: Verify full mesh communication works.
# Test router ID connectivity (should be sub-millisecond):
for target in 10.100.0.12 10.100.0.13 10.100.0.14; do
echo "=== Testing connectivity to $target ==="
ping -c 3 $target
done
Expected output:
=== Testing connectivity to 10.100.0.12 ===
PING 10.100.0.12 (10.100.0.12) 56(84) bytes of data.
64 bytes from 10.100.0.12: icmp_seq=1 ttl=64 time=0.618 ms
64 bytes from 10.100.0.12: icmp_seq=2 ttl=64 time=0.582 ms
64 bytes from 10.100.0.12: icmp_seq=3 ttl=64 time=0.595 ms
--- 10.100.0.12 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
=== Testing connectivity to 10.100.0.13 ===
PING 10.100.0.13 (10.100.0.13) 56(84) bytes of data.
64 bytes from 10.100.0.13: icmp_seq=1 ttl=64 time=0.634 ms
64 bytes from 10.100.0.13: icmp_seq=2 ttl=64 time=0.611 ms
64 bytes from 10.100.0.13: icmp_seq=3 ttl=64 time=0.598 ms
--- 10.100.0.13 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2004ms
What to look for:
- All pings succeed: 3 received, 0% packet loss
- Sub-millisecond latency: time=0.6xx ms (typically ~0.6ms)
- No timeouts or errors: Should see a response for every packet
If connectivity fails: TB4 interfaces may need manual bring-up after reboot:
# Bring up TB4 interfaces manually:
for node in n2 n3 n4; do
ssh $node "ip link set en05 up mtu 65520"
ssh $node "ip link set en06 up mtu 65520"
ssh $node "ifreload -a"
done
Test mesh latency and basic throughput:
# Test latency between router IDs:
for node in n2 n3 n4; do
echo "=== Latency test from $node ==="
ssh $node "ping -c 5 -i 0.2 10.100.0.12 | tail -1"
ssh $node "ping -c 5 -i 0.2 10.100.0.13 | tail -1"
ssh $node "ping -c 5 -i 0.2 10.100.0.14 | tail -1"
done
Expected: Round-trip times under 1ms consistently.
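The ping loop only exercises latency; for a rough throughput spot-check across the mesh you can use iperf3 (not installed by default; apt install iperf3 on the nodes involved). A minimal sketch testing the n2 → n3 path via the router IDs:
# One-shot iperf3 server on n3, client from n2 across the TB4 mesh:
ssh n3 "iperf3 -s -1 -D"
ssh n2 "iperf3 -c 10.100.0.13 -t 10"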
Install Ceph packages on all mesh nodes:
# Initialize Ceph on mesh nodes:
for node in n2 n3 n4; do
echo "=== Installing Ceph on $node ==="
ssh $node "pveceph install --repository test"
done
Essential: Proper directory structure and ownership:
# Create base Ceph directories with correct ownership:
for node in n2 n3 n4; do
ssh $node "mkdir -p /var/lib/ceph && chown ceph:ceph /var/lib/ceph"
ssh $node "mkdir -p /etc/ceph && chown ceph:ceph /etc/ceph"
done
CLI Approach:
# Create initial monitor on n2:
ssh n2 "pveceph mon create"
Expected output:
Monitor daemon started successfully on node n2.
Created new cluster with fsid: 12345678-1234-5678-9abc-123456789abc
GUI Approach:
- Location: n2 node → Ceph → Monitor → "Create"
- Result: Should show green "Monitor created successfully" message
Verify monitor creation:
ssh n2 "ceph -s"
Expected output:
cluster:
id: 12345678-1234-5678-9abc-123456789abc
health: HEALTH_OK
services:
mon: 1 daemons, quorum n2 (age 2m)
mgr: n2(active, since 1m)
osd: 0 osds: 0 up, 0 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:
Set public and cluster networks for optimal TB4 performance:
# Configure Ceph networks:
ssh n2 "ceph config set global public_network 10.11.12.0/24"
ssh n2 "ceph config set global cluster_network 10.100.0.0/24"
# Configure monitor networks:
ssh n2 "ceph config set mon public_network 10.11.12.0/24"
ssh n2 "ceph config set mon cluster_network 10.100.0.0/24"
Create 3-monitor quorum on mesh nodes:
CLI Approach:
# Create monitor on n3:
ssh n3 "pveceph mon create"
# Create monitor on n4:
ssh n4 "pveceph mon create"
Expected output (for each):
Monitor daemon started successfully on node n3.
Monitor daemon started successfully on node n4.
GUI Approach:
- n3: n3 node → Ceph → Monitor → "Create"
- n4: n4 node → Ceph → Monitor → "Create"
- Result: Green success messages on both nodes
Verify 3-monitor quorum:
ssh n2 "ceph quorum_status"
Expected output:
{
"election_epoch": 3,
"quorum": [
0,
1,
2
],
"quorum_names": [
"n2",
"n3",
"n4"
],
"quorum_leader_name": "n2",
"quorum_age": 127,
"monmap": {
"epoch": 3,
"fsid": "12345678-1234-5678-9abc-123456789abc",
"modified": "2025-01-27T20:15:42.123456Z",
"created": "2025-01-27T20:10:15.789012Z",
"min_mon_release_name": "reef",
"mons": [
{
"rank": 0,
"name": "n2",
"public_addrs": {
"addrvec": [
{
"type": "v2",
"addr": "10.11.12.12:3300"
}
]
}
}
]
}
}
What to verify:
- 3 monitors in quorum: "quorum_names": ["n2", "n3", "n4"]
- All nodes listed: Should see all 3 mesh nodes
- Leader elected: "quorum_leader_name" should show one of the nodes
Create high-performance OSDs on NVMe drives:
CLI Approach:
# Create OSDs on n2:
ssh n2 "pveceph osd create /dev/nvme0n1"
ssh n2 "pveceph osd create /dev/nvme1n1"
# Create OSDs on n3:
ssh n3 "pveceph osd create /dev/nvme0n1"
ssh n3 "pveceph osd create /dev/nvme1n1"
# Create OSDs on n4:
ssh n4 "pveceph osd create /dev/nvme0n1"
ssh n4 "pveceph osd create /dev/nvme1n1"
Expected output (for each OSD):
Creating OSD on /dev/nvme0n1
OSD.0 created successfully.
OSD daemon started.
GUI Approach:
- Location: Each node → Ceph → OSD → "Create: OSD"
- Select: Choose /dev/nvme0n1 and /dev/nvme1n1 from the device list
- Advanced: Leave DB/WAL settings as default (co-located)
- Result: Green "OSD created successfully" messages
Verify all OSDs are up:
ssh n2 "ceph osd tree"
Expected output:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 5.45776 root default
-3 1.81959 host n2
0 ssd 0.90979 osd.0 up 1.00000 1.00000
1 ssd 0.90979 osd.1 up 1.00000 1.00000
-5 1.81959 host n3
2 ssd 0.90979 osd.2 up 1.00000 1.00000
3 ssd 0.90979 osd.3 up 1.00000 1.00000
-7 1.81959 host n4
4 ssd 0.90979 osd.4 up 1.00000 1.00000
5 ssd 0.90979 osd.5 up 1.00000 1.00000
What to verify:
- 6 OSDs total: 2 per mesh node (osd.0-5)
- All 'up' status: Every OSD shows up in the STATUS column
- Weight 1.00000: All OSDs have full weight (not being rebalanced out)
- Hosts organized: Each node (n2, n3, n4) shows as a separate host with 2 OSDs
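For a per-OSD view of utilization and placement-group distribution (should be roughly even across the 6 OSDs and near-empty at this stage):
# Per-OSD utilization and PG counts:
ssh n2 "ceph osd df tree"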
Configure optimal memory usage for high-performance hardware:
# Set OSD memory target to 8GB per OSD (ideal for 64GB nodes):
ssh n2 "ceph config set osd osd_memory_target 8589934592"
# Set BlueStore cache sizes for NVMe performance:
ssh n2 "ceph config set osd bluestore_cache_size_ssd 4294967296"
# Set memory allocation optimizations:
ssh n2 "ceph config set osd osd_memory_cache_min 1073741824"
ssh n2 "ceph config set osd osd_memory_cache_resize_interval 1"
Optimize for high-performance CPUs:
# Set CPU threading optimizations:
ssh n2 "ceph config set osd osd_op_num_threads_per_shard 2"
ssh n2 "ceph config set osd osd_op_num_shards 8"
# Set BlueStore threading for NVMe:
ssh n2 "ceph config set osd bluestore_sync_submit_transaction false"
ssh n2 "ceph config set osd bluestore_throttle_bytes 268435456"
ssh n2 "ceph config set osd bluestore_throttle_deferred_bytes 134217728"
# Set CPU-specific optimizations:
ssh n2 "ceph config set osd osd_client_message_cap 1000"
ssh n2 "ceph config set osd osd_client_message_size_cap 1073741824"
Optimize network settings for TB4 high-performance cluster communication:
# Set network optimizations for TB4 mesh (65520 MTU, sub-ms latency):
ssh n2 "ceph config set global ms_tcp_nodelay true"
ssh n2 "ceph config set global ms_tcp_rcvbuf 134217728"
ssh n2 "ceph config set global ms_tcp_prefetch_max_size 65536"
# Set cluster network optimizations for 10.100.0.0/24 TB4 mesh:
ssh n2 "ceph config set global ms_cluster_mode crc"
ssh n2 "ceph config set global ms_async_op_threads 8"
ssh n2 "ceph config set global ms_dispatch_throttle_bytes 1073741824"
# Set heartbeat optimizations for fast TB4 network:
ssh n2 "ceph config set osd osd_heartbeat_interval 6"
ssh n2 "ceph config set osd osd_heartbeat_grace 20"
Configure BlueStore for maximum NVMe and TB4 performance:
# Set BlueStore optimizations for NVMe drives:
ssh n2 "ceph config set osd bluestore_compression_algorithm lz4"
ssh n2 "ceph config set osd bluestore_compression_mode aggressive"
ssh n2 "ceph config set osd bluestore_compression_required_ratio 0.7"
# Set NVMe-specific optimizations:
ssh n2 "ceph config set osd bluestore_cache_trim_interval 200"
# Set WAL and DB optimizations for NVMe:
ssh n2 "ceph config set osd bluestore_block_db_size 5368709120"
ssh n2 "ceph config set osd bluestore_block_wal_size 1073741824"
Configure scrubbing for high-performance environment:
# Set scrubbing optimizations:
ssh n2 "ceph config set osd osd_scrub_during_recovery false"
ssh n2 "ceph config set osd osd_scrub_begin_hour 2"
ssh n2 "ceph config set osd osd_scrub_end_hour 6"
# Set deep scrub optimizations:
ssh n2 "ceph config set osd osd_deep_scrub_interval 1209600"
ssh n2 "ceph config set osd osd_scrub_max_interval 1209600"
ssh n2 "ceph config set osd osd_scrub_min_interval 86400"
# Set recovery optimizations for TB4 mesh:
ssh n2 "ceph config set osd osd_recovery_max_active 8"
ssh n2 "ceph config set osd osd_max_backfills 4"
ssh n2 "ceph config set osd osd_recovery_op_priority 1"
Create optimized storage pool with 2:1 replication ratio:
# Create pool with optimal PG count for 6 OSDs (256 PGs = ~85 PGs per OSD):
ssh n2 "ceph osd pool create cephtb4 256 256"
# Set 2:1 replication ratio (size=2, min_size=1) for test lab:
ssh n2 "ceph osd pool set cephtb4 size 2"
ssh n2 "ceph osd pool set cephtb4 min_size 1"
# Enable RBD application for Proxmox integration:
ssh n2 "ceph osd pool application enable cephtb4 rbd"
Check that cluster is healthy and ready:
ssh n2 "ceph -s"
Expected results:
- Health: HEALTH_OK (or HEALTH_WARN with minor warnings)
- OSDs: 6 osds: 6 up, 6 in
- PGs: All PGs active+clean
- Pools: cephtb4 pool created and ready
Run comprehensive performance testing to validate optimizations:
# Test write performance with optimized cluster:
ssh n2 "rados -p cephtb4 bench 10 write --no-cleanup -b 4M -t 16"
# Test read performance:
ssh n2 "rados -p cephtb4 bench 10 rand -t 16"
# Clean up test data:
ssh n2 "rados -p cephtb4 cleanup"
Results
Write Performance:
- Average Bandwidth: 1,294 MB/s
- Peak Bandwidth: 2,076 MB/s
- Average IOPS: 323
- Average Latency: ~48ms
Read Performance:
- Average Bandwidth: 1,762 MB/s
- Peak Bandwidth: 2,448 MB/s
- Average IOPS: 440
- Average Latency: ~36ms
Check that all optimizations are active in Proxmox GUI:
- Navigate: Ceph → Configuration Database
- Verify: All optimization settings visible and applied
- Check: No configuration errors or warnings
Key optimizations to verify:
- osd_memory_target: 8589934592 (8GB per OSD)
- bluestore_cache_size_ssd: 4294967296 (4GB cache)
- bluestore_compression_algorithm: lz4
- cluster_network: 10.100.0.0/24 (TB4 mesh)
- public_network: 10.11.12.0/24
Problem: TB4 interfaces not coming up after reboot
# Solution: Manually bring up interfaces and reapply SDN config:
for node in n2 n3 n4; do
ssh $node "ip link set en05 up mtu 65520"
ssh $node "ip link set en06 up mtu 65520"
ssh $node "ifreload -a"
done
Problem: Mesh connectivity fails between some nodes
# Check interface status:
for node in n2 n3 n4; do
echo "=== $node TB4 status ==="
ssh $node "ip addr show | grep -E '(en05|en06|10\.100\.0\.)'"
done
# Verify FRR routing service:
for node in n2 n3 n4; do
ssh $node "systemctl status frr"
done
Problem: OSDs going down after creation
- Root Cause: Usually network connectivity issues (TB4 mesh not working)
- Solution: Fix TB4 mesh first, then restart OSD services:
# Restart OSD services after fixing mesh:
for node in n2 n3 n4; do
ssh $node "systemctl restart ceph-osd@*.service"
done
Problem: Inactive PGs or slow performance
# Check cluster status:
ssh n2 "ceph -s"
# Verify optimizations are applied:
ssh n2 "ceph config dump | grep -E '(memory_target|cache_size|compression)'"
# Check network binding:
ssh n2 "ceph config get osd cluster_network"
ssh n2 "ceph config get osd public_network"
Problem: Proxmox GUI doesn't show OSDs
- Root Cause: Usually config database synchronization issues
- Solution: Restart Ceph monitor services and check GUI again
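A hedged sketch of that restart (the monitor unit name includes each node's hostname; restarting pvestatd, the Proxmox status daemon that feeds the GUI, may also help and is an assumption beyond the original note):
# Restart the monitor on each node, then the Proxmox status daemon:
for node in n2 n3 n4; do
ssh $node "systemctl restart ceph-mon@\$(hostname -s).service"
ssh $node "systemctl restart pvestatd"
done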
For even better performance on high-end hardware:
# Apply on all mesh nodes:
for node in n2 n3 n4; do
ssh $node "
# Network tuning:
echo 'net.core.rmem_max = 268435456' >> /etc/sysctl.conf
echo 'net.core.wmem_max = 268435456' >> /etc/sysctl.conf
echo 'net.core.netdev_max_backlog = 30000' >> /etc/sysctl.conf
# Memory tuning:
echo 'vm.swappiness = 1' >> /etc/sysctl.conf
echo 'vm.min_free_kbytes = 4194304' >> /etc/sysctl.conf
# Apply settings:
sysctl -p
"
done