@trickkiste
Created May 20, 2026 20:14
siderolabs/talos#12619 — full forensic evidence bundle. DL320e Gen8 v2 / Intel VT-d / IOMMU PTE fault during v1.13.2 install. See README.md for the bundle guide.

siderolabs/talos#12619 — focused evidence bundle

Distilled from the full forensic capture at bug-12619-forensics/2026-05-{19,20}-* (~6 MB of raw kmsg + machined logs across multiple boot/install/wipe cycles). Each file in this directory tells one specific part of the story; together they paint the full picture without drowning a reviewer in repetitive ATA-reset cycles or routine boot kmsg.

TL;DR

The v1.13.2 install on the HP ProLiant DL320e Gen8 v2 does not fail in Talos's installer code path. It fails because the kernel's IOMMU subsystem rejects DMA from the AHCI SATA controller while the installer writes to disk, which traces back to a firmware bug in the DL320e Gen8 v2's VT-d implementation (BIOS P80 03/28/2014). The v1.13.x kernel auto-enables Intel VT-d when it is built with CONFIG_INTEL_IOMMU=y and CONFIG_INTEL_IOMMU_DEFAULT_ON=y; v1.11.6's older kernel did not enable it on this hardware vintage. Workaround: add intel_iommu=off iommu=off to the kernel cmdline.
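
As a rough illustration of checking for the workaround, the sketch below tests whether a captured kernel command line carries both arguments. The function name is hypothetical, and the sample string is a shortened form of the Case C command line. It only inspects text, so it is meant for the log-collecting host rather than the Talos node.

```python
def iommu_workaround_active(cmdline: str) -> bool:
    """Return True if both workaround arguments appear as whole tokens."""
    tokens = cmdline.split()
    # Compare whole tokens so that "intel_iommu=off" alone is not
    # mistaken for the separate "iommu=off" argument.
    return "intel_iommu=off" in tokens and "iommu=off" in tokens


# Shortened sample based on the installed-system cmdline in Case C.
sample = "talos.platform=metal talos.config=none console=tty0 intel_iommu=off iommu=off"
print(iommu_workaround_active(sample))
```

Matching on whole tokens rather than substrings avoids a false positive when only one of the two arguments is present.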

File guide

| File | What it shows | Why it matters |
| --- | --- | --- |
| reproduction-matrix.md | 6-row test matrix, hardware constants ruled out across machines + disks + bays + schematics, regression boundary at the kernel version | Sets the boundary for triage |
| case-A-failing-install-DMAR-cascade.log | Decoded kmsg, install start → format BOOT → DMAR fault → ATA host bus error → cascade → emergency cleanup failure | Smoking gun for the failure mode |
| case-B-success-install-with-iommu-off.log | Decoded kmsg, same Talos installer code path, but intel_iommu=off iommu=off on the maintenance kernel cmdline | Zero DMAR/ATA faults; install completes cleanly in ~24 s |
| case-C-disk-boot-post-success.log | Decoded kmsg of the first cold disk-boot after the successful install | DMAR: IOMMU disabled line proves the workaround persists into the installed system |
| case-D-machined-installer-progress-iommu-fault.log | JSON-lines machined service-log view of the same failing install as Case A | Complementary channel: captures sequencer/controller events that don't go through /dev/kmsg |
| case-E-machined-installer-progress-success.log | Same channel, successful install | Side-by-side with Case D |
| disk-state-after-failed-install.txt | parted + MBR/bios_grub dd dump of /dev/sda on hp5 after a failed install | Disk-side picture: GPT and BOOT staged, MBR all-zero, bios_grub all-zero; installer aborted before writing the bootloader |
| kernel-cmdlines.txt | Side-by-side cmdline strings: failing maintenance, successful maintenance, installed system, vanilla v1.13.2 reference | Shows exactly which knob produces which behaviour |

Capture pipeline (for the curious)

The original report had a forensic gap: iLO4 SOL on the DL320e relays only POST + iPXE phases, and Talos's kernel does not ship CONFIG_NETCONSOLE (zero references in the Talos source tree). The working capture path is Talos-native:

  1. kmsg shipper: the kernel cmdline argument talos.logging.kernel=tcp://<host>:<port> activates KmsgLogDeliveryController, which ships /dev/kmsg as JSON-lines over TCP. The sequencer-driven installer writes its container stdout/stderr directly to /dev/kmsg via internal/pkg/install/install.go:211-218 (os.OpenFile("/dev/kmsg", …) → kmsg.Writer → cio.WithStreams(r, w, w)), so every line the installer prints lands in this stream.
  2. machined service-log shipper: .machine.logging.destinations[] in the machine config sets a parallel TCP endpoint for machined's own zap-logger output. It is activated immediately by apply-config; no reboot is needed.

Listeners on the boot server:

socat -u TCP-LISTEN:6666,reuseaddr,fork - >> kmsg.log
socat -u TCP-LISTEN:6667,reuseaddr,fork - >> machined.log

Both channels carry JSON-lines. The case-* files above are decoded extracts from these raw streams. The full unfiltered captures (3-5 MB each) are at bug-12619-forensics/2026-05-20-decisive/kmsg-shipper-*.log and machined-shipper-*.log.
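
For readers who want to decode the machined stream themselves, here is a minimal sketch. It assumes the record shape visible in Case D, where relayed kernel lines carry a msg of the form "facility: level: [timestamp]: text". The names KmsgEvent and decode_line are hypothetical, and the raw kmsg-shipper stream on port 6666 may use a different field layout that this sketch does not cover.

```python
import json
import re
from typing import NamedTuple, Optional

# Matches relayed kernel messages such as:
#   "kern: err: [2026-05-20T13:53:48.385045766Z]: DMAR: DRHD: ..."
KMSG_RE = re.compile(
    r"^(?P<facility>\w+): (?P<level>\w+): \[(?P<ts>[^\]]+)\]: (?P<text>.*)$",
    re.DOTALL,
)


class KmsgEvent(NamedTuple):
    facility: str
    level: str
    kernel_time: str
    text: str


def decode_line(line: str) -> Optional[KmsgEvent]:
    """Decode one JSON-lines record; return None if it is not a relayed kmsg line."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    msg = record.get("msg")
    if not isinstance(msg, str):
        return None
    match = KMSG_RE.match(msg)
    if match is None:
        # For example machined's own "[talos] phase install ..." records.
        return None
    return KmsgEvent(match["facility"], match["level"], match["ts"], match["text"])


sample = (
    '{"msg":"kern: err: [2026-05-20T13:53:48.385045766Z]: DMAR: DRHD: handling '
    'fault status reg 3","talos-level":"info","talos-service":"kernel",'
    '"talos-time":"2026-05-20T13:53:48.569030016Z"}'
)
print(decode_line(sample))
```

The decoder reads the kernel severity from the msg prefix because, in the Case D excerpt, talos-level is "info" even for kern: err lines.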

Suggested next steps for siderolabs/talos

  1. Loud installer failure when the kernel logs DMAR / host-bus errors during the install task. Today the installer exits 0 even when every DMA write failed. Either of two checks would catch this and save the next victim hours: a sync plus re-read of MBR sector 0 after the bootloader install step (it must not be all-zero), or a dmesg scan for the patterns "DMAR:", "host bus error", and "I/O error, dev sd", scoped to the install task's wall-clock window.
  2. Validation message for install.extraKernelArgs + install.grubUseUKICmdline: true could include a pointer to the Image Factory's customization.extraKernelArgs — that's where you put kernel args when the UKI cmdline path is active. The current message is correct but doesn't tell the user where to go.
  3. Doc callout in the metal-install / boot-asset guide about kernel IOMMU defaults and the intel_iommu=off escape hatch for older HP/Dell/Lenovo hardware with broken VT-d firmware.
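
The two checks proposed in step 1 can be sketched roughly as follows. This is an illustration of the logic only, not Talos code (the installer is written in Go). All function names are hypothetical, and the patterns are borrowed from the Case A occurrence counters. Reading a real block device such as /dev/sda needs root privileges.

```python
import re
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple

# Fault signatures, taken from the counters in the Case A header.
FAULT_PATTERNS = {
    "dmar_fault": re.compile(r"DMAR.*fault"),
    "host_bus_error": re.compile(r"host bus error"),
    "io_error": re.compile(r"I/O error, dev sd"),
}


def parse_ns(ts: str) -> int:
    """Parse a UTC timestamp with 0-9 fractional digits into nanoseconds since the epoch."""
    base, _, frac = ts.rstrip("Z").partition(".")
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    # Pad the fraction because the captured logs trim trailing zeros.
    return int(parsed.timestamp()) * 1_000_000_000 + int(frac.ljust(9, "0")[:9])


def count_faults(events: Iterable[Tuple[str, str]], start: str, end: str) -> Dict[str, int]:
    """Count fault patterns in (timestamp, text) events inside [start, end]."""
    lo, hi = parse_ns(start), parse_ns(end)
    counts = {name: 0 for name in FAULT_PATTERNS}
    for ts, text in events:
        if not lo <= parse_ns(ts) <= hi:
            continue  # outside the install window, e.g. boot-time DMAR lines
        for name, pattern in FAULT_PATTERNS.items():
            if pattern.search(text):
                counts[name] += 1
    return counts


def read_sector0(device_path: str, sector_size: int = 512) -> bytes:
    """Read the first sector of a device or image file."""
    with open(device_path, "rb") as dev:
        data = dev.read(sector_size)
    if len(data) != sector_size:
        raise IOError(f"short read from {device_path}: {len(data)} bytes")
    return data


def sector_is_blank(sector: bytes) -> bool:
    """True if every byte is zero, the MBR state described for the failed install."""
    return not any(sector)
```

An installer-side check could fail the task when sector_is_blank(read_sector0(...)) is true after the bootloader step, or when any counter from count_faults is non-zero for the install window.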
# Case A: hp4 v1.13.2 install attempt WITHOUT intel_iommu=off
# Source: bug-12619-forensics/2026-05-20-decisive/kmsg-shipper-iommu-fault.log (3,243 raw kmsg events)
# Captured via talos.logging.kernel=tcp://10.133.0.66:6666 (Talos's native kmsg shipper)
# Filtered for narrative: each ATA/DMAR error pattern appears once; full installer-progress kept.
# Window covered: install start → emergency-cleanup failure
# Total occurrence counts across the captured run:
# DMAR.*fault : 14
# host bus error : 25
# SError: { HostInt } : 12
# WRITE FPDMA QUEUED : 12
# hard resetting link : 12
# ata1: EH complete : 12
# configured for UDMA : 12
# SATA link up : 12
# XFS \(sd : 10
# I/O error, dev sd : 3
# ----- timeline -----
2026-05-20T13:53:32.855 user /warning : 2026/05/20 13:53:33 running Talos installer v1.13.2
2026-05-20T13:53:32.966 user /warning : 2026/05/20 13:53:33 writing /boot/grub/grub.cfg to disk
2026-05-20T13:53:45.816 user /warning : 2026/05/20 13:53:46 formatting the partition "/dev/sda1" as "vfat" with label "EFI"
2026-05-20T13:53:45.855 user /warning : 2026/05/20 13:53:46 formatting the partition "/dev/sda2" as "zeroes" with label "BIOS"
2026-05-20T13:53:45.872 user /warning : 2026/05/20 13:53:46 formatting the partition "/dev/sda3" as "xfs" with label "BOOT"
2026-05-20T13:53:45.872 user /warning : 2026/05/20 13:53:46 creating xfs filesystem on /dev/sda3 with args: [-n ftype=1 -c options=/usr/share/xfsprogs/mkfs/lts_6.18.conf -f -L BOOT -p file=/proc/self/fd/6 /dev/sda3]
2026-05-20T13:53:48.344 user /warning : 2026/05/20 13:53:48 formatting the partition "/dev/sda4" as "zeroes" with label "META"
2026-05-20T13:53:48.381 kern /notice : XFS (sda3): Mounting V5 Filesystem 5367e674-a51a-45f4-bb6d-7bdff37dffae
2026-05-20T13:53:48.385 kern /err : DMAR: DRHD: handling fault status reg 3
2026-05-20T13:53:48.453 kern /err : ata1.00: irq_stat 0x20000000, host bus error
2026-05-20T13:53:48.453 kern /err : ata1: SError: { HostInt }
2026-05-20T13:53:48.453 kern /err : ata1.00: failed command: WRITE FPDMA QUEUED
2026-05-20T13:53:48.454 kern /info : ata1: hard resetting link
2026-05-20T13:53:48.072 kern /info : ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
2026-05-20T13:53:48.103 kern /info : ata1.00: configured for UDMA/133
2026-05-20T13:53:48.125 kern /info : ata1: EH complete
2026-05-20T13:53:50.784 kern /err : I/O error, dev sda, sector 2256954 op 0x1:(WRITE) flags 0x1800 phys_seg 32 prio class 2
2026-05-20T13:53:49.838 user /warning : [talos] task install (1/1): failed: task "upgrade" failed: exit code 1
2026-05-20T13:53:49.839 user /warning : [talos] phase install (2/8): failed
2026-05-20T13:53:49.927 user /warning : [talos] emergencyVolumeCleanup sequence: 4 phase(s)
2026-05-20T13:53:49.930 user /warning : [talos] phase volumeFinalize (4/4): 1 tasks(s)
2026-05-20T13:53:49.930 user /warning : [talos] task teardownLifecycle (1/1): starting
2026-05-20T13:54:19.927 user /warning : [talos] task teardownLifecycle (1/1): failed: context deadline exceeded
2026-05-20T13:54:19.927 user /warning : [talos] phase volumeFinalize (4/4): failed
2026-05-20T13:54:19.927 user /warning : [talos] emergencyVolumeCleanup sequence: failed
2026-05-20T13:54:19.927 user /warning : [talos] WARNING: emergency volume cleanup failed: error running phase 4 in emergencyVolumeCleanup sequence: task 1/1: failed, context deadline exceeded
# Case B: hp4 v1.13.2 install attempt WITH intel_iommu=off iommu=off on maintenance kernel cmdline
# Source: bug-12619-forensics/2026-05-20-decisive/kmsg-shipper-success.log (3,683 raw kmsg events)
# Same forensic capture pipeline as Case A — talos.logging.kernel=tcp://10.133.0.66:6666
# Counters across the install window (compare to Case A: 14 / 25 / 3):
# DMA Read fault (PTE corrupt) : 0 ← Case A's pathology, here ZERO
# ATA host bus error : 0
# I/O error, dev sd* : 0
# [INTR-REMAP] (NIC, unrelated) : 1 ← interrupt-remap rejection on a bnx2x NIC,
# single non-fatal event, does not affect install
# ----- timeline -----
2026-05-20T14:45:22.400 user /warning : 2026/05/20 14:45:22 running Talos installer v1.13.2
2026-05-20T14:45:22.512 user /warning : 2026/05/20 14:45:22 writing /boot/grub/grub.cfg to disk
2026-05-20T14:45:35.456 user /warning : 2026/05/20 14:45:35 formatting the partition "/dev/sda1" as "vfat" with label "EFI"
2026-05-20T14:45:35.495 user /warning : 2026/05/20 14:45:35 formatting the partition "/dev/sda2" as "zeroes" with label "BIOS"
2026-05-20T14:45:35.520 user /warning : 2026/05/20 14:45:35 formatting the partition "/dev/sda3" as "xfs" with label "BOOT"
2026-05-20T14:45:35.520 user /warning : 2026/05/20 14:45:35 creating xfs filesystem on /dev/sda3 with args: [-n ftype=1 -c options=/usr/share/xfsprogs/mkfs/lts_6.18.conf -f -L BOOT -p file=/proc/self/fd/6 /dev/sda3]
2026-05-20T14:45:36.912 user /warning : 2026/05/20 14:45:37 formatting the partition "/dev/sda4" as "zeroes" with label "META"
2026-05-20T14:45:37.440 user /warning : 2026/05/20 14:45:38 META: loading from /dev/sda4
2026-05-20T14:45:37.446 user /warning : 2026/05/20 14:45:38 META: loaded 0 keys
2026-05-20T14:45:37.447 user /warning : 2026/05/20 14:45:38 META: saving to /dev/sda4
2026-05-20T14:45:37.448 user /warning : 2026/05/20 14:45:38 META: saved 0 keys
2026-05-20T14:45:37.449 user /warning : 2026/05/20 14:45:38 installation of v1.13.2 complete
2026-05-20T14:45:37.504 user /warning : [talos] task install (1/1): install successful
2026-05-20T14:45:37.504 user /warning : [talos] task install (1/1): waiting for the image cache copy
2026-05-20T14:45:37.505 user /warning : [talos] task install (1/1): done, 27.527444271s
# Total Talos install duration (start → done): 27.527444271s, taken from the `task install` timing line above.
# Case C: hp4 first cold disk-boot after the successful install (Case B)
# Source: bug-12619-forensics/2026-05-20-decisive/kmsg-shipper-success.log
# Significance: proves the iommu workaround persists into the installed system —
# the iommu args are baked into the disk's grub.cfg via machine.install.extraKernelArgs
# (legacy path) OR will be in the UKI's .cmdline section once we migrate to the new
# Factory schematic (modern path; see follow-up Case F when that test completes).
# Counters in the disk-boot window:
# DMA Read fault (PTE) : 0
# ATA host bus error : 0
# ----- timeline -----
2026-05-20T14:46:08.720 kern /info : Command line: talos.platform=metal talos.config=none console=tty0 init_on_alloc=1 slab_nomerge pti=on consoleblank=0 nvme_core.io_timeout=4294967295 printk.devkmsg=on selinux=1 module.sig_enforce=1 proc_mem.force_override=never intel_iommu=off iommu=off tal...
2026-05-20T14:46:08.725 kern /info : DMI: HP ProLiant DL320e Gen8 v2, BIOS P80 03/28/2014
2026-05-20T14:46:08.767 kern /notice : Kernel command line: talos.platform=metal talos.config=none console=tty0 init_on_alloc=1 slab_nomerge pti=on consoleblank=0 nvme_core.io_timeout=4294967295 printk.devkmsg=on selinux=1 module.sig_enforce=1 proc_mem.force_override=never intel_iommu=off iommu=...
2026-05-20T14:46:08.767 kern /info : DMAR: IOMMU disabled
2026-05-20T14:46:08.850 kern /info : DMAR: Host address width 39
2026-05-20T14:46:08.850 kern /info : DMAR: DRHD base: 0x000000fed91000 flags: 0x1
2026-05-20T14:46:08.850 kern /info : DMAR: dmar0: reg_base_addr fed91000 ver 1:0 cap d2008c20660462 ecap f010da
2026-05-20T14:46:08.850 kern /info : DMAR: RMRR base: 0x000000edffd000 end: 0x000000edffffff
2026-05-20T14:46:08.850 kern /info : DMAR: RMRR base: 0x000000edff6000 end: 0x000000edffcfff
2026-05-20T14:46:08.850 kern /info : DMAR: RMRR base: 0x000000edf93000 end: 0x000000edf94fff
2026-05-20T14:46:08.850 kern /info : DMAR: RMRR base: 0x000000edf8f000 end: 0x000000edf92fff
2026-05-20T14:46:08.850 kern /info : DMAR: RMRR base: 0x000000edf7f000 end: 0x000000edf8efff
2026-05-20T14:46:08.850 kern /info : DMAR: RMRR base: 0x000000edf7e000 end: 0x000000edf7efff
2026-05-20T14:46:08.851 kern /info : DMAR: RMRR base: 0x000000000f4000 end: 0x000000000f4fff
2026-05-20T14:46:08.851 kern /info : DMAR: RMRR base: 0x000000000e8000 end: 0x000000000e8fff
2026-05-20T14:46:08.851 kern /err : DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000000e8000-0x00000000000e8fff], contact BIOS vendor for fixes
2026-05-20T14:46:08.851 kern /warning : DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000000e8000-0x00000000000e8fff]
BIOS vendor: HP; Ver: P80; Product Version:
2026-05-20T14:46:08.851 kern /info : DMAR: RMRR base: 0x000000eddee000 end: 0x000000eddeefff
2026-05-20T14:46:08.852 kern /err : DMAR: DRHD: handling fault status reg 2
2026-05-20T14:46:08.852 kern /err : DMAR: [INTR-REMAP] Request device [01:00.0] fault index 0x1a [fault reason 0x26] Blocked an interrupt request due to source-id verification failure
# Case D: machined service-level log view of the install attempt without iommu=off
# Source: bug-12619-forensics/2026-05-20-decisive/machined-shipper-iommu-fault.log
# Captured via .machine.logging.destinations[].endpoint=tcp://10.133.0.66:6667
# Format: JSON-lines from machined's zap logger.
# Field schema: msg, talos-time, talos-level, talos-service, component, controller
# Shows the lifecycle/sequencer/controller view of the same failure Case A captures in kmsg.
{"msg":"kern: info: [2026-05-20T13:52:26.019103232Z]: DMAR: Host address width 39","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:21.763853671Z"}
{"msg":"kern: info: [2026-05-20T13:52:26.019236164Z]: DMAR: DRHD base: 0x000000fed91000 flags: 0x1","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:21.763905726Z"}
{"msg":"kern: info: [2026-05-20T13:52:26.019389696Z]: DMAR: dmar0: reg_base_addr fed91000 ver 1:0 cap d2008c20660462 ecap f010da","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:21.763960838Z"}
{"msg":"kern: info: [2026-05-20T13:52:26.019582084Z]: DMAR: RMRR base: 0x000000edffd000 end: 0x000000edffffff","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:21.76403405Z"}
{"msg":"kern: info: [2026-05-20T13:52:26.019752872Z]: DMAR: RMRR base: 0x000000edff6000 end: 0x000000edffcfff","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:21.764093104Z"}
{"msg":"kern: info: [2026-05-20T13:52:26.019910216Z]: DMAR: RMRR base: 0x000000edf93000 end: 0x000000edf94fff","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:21.764164461Z"}
{"msg":"kern: info: [2026-05-20T13:52:26.020071352Z]: DMAR: RMRR base: 0x000000edf8f000 end: 0x000000edf92fff","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:21.764228637Z"}
{"msg":"kern: info: [2026-05-20T13:52:26.020228806Z]: DMAR: RMRR base: 0x000000edf7f000 end: 0x000000edf8efff","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:21.764359858Z"}
{"msg":"kern: info: [2026-05-20T13:52:26.020387336Z]: DMAR: RMRR base: 0x000000edf7e000 end: 0x000000edf7efff","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:21.764429083Z"}
{"msg":"kern: info: [2026-05-20T13:52:26.020544774Z]: DMAR: RMRR base: 0x000000000f4000 end: 0x000000000f4fff","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:21.764592666Z"}
{"msg":"kern: info: [2026-05-20T13:52:26.020713068Z]: DMAR: RMRR base: 0x000000000e8000 end: 0x000000000e8fff","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:21.764686921Z"}
{"msg":"kern: err: [2026-05-20T13:52:26.020870527Z]: DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000000e8000-0x00000000000e8fff], contact BIOS vendor for fixes","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:21.764739475Z"}
{"msg":"kern: warning: [2026-05-20T13:52:26.021143397Z]: DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000000e8000-0x00000000000e8fff]","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:21.764843262Z"}
{"msg":"kern: info: [2026-05-20T13:52:26.021440039Z]: DMAR: RMRR base: 0x000000eddee000 end: 0x000000eddeefff","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:21.765004988Z"}
{"msg":"2026/05/20 13:53:21.742954 [talos] phase install (2/8): 1 tasks(s)","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T13:53:21.773549918Z"}
{"msg":"2026/05/20 13:53:21.743066 [talos] task install (1/1): starting","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T13:53:21.773583165Z"}
{"msg":"2026/05/20 13:53:21.743289 [talos] task install (1/1): waiting for the image cache","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T13:53:21.773620365Z"}
{"msg":"2026/05/20 13:53:21.743547 [talos] task install (1/1): installing Talos to disk /dev/sda","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T13:53:21.773657696Z"}
{"component":"controller-runtime","controller":"block.VolumeConfigController","error":"error flushing meta: file does not exist","msg":"2026-05-20T13:53:21.723Z \u001b[31mERROR\u001b[0m controller failed","talos-level":"info","talos-service":"controller-runtime","talos-time":"2026-05-20T13:53:21.988901911Z"}
{"component":"controller-runtime","controller":"network.RouteSpecController","error":"1 error occurred:\n\t* error removing route: netlink receive: no such process\n\n","msg":"2026-05-20T13:53:21.774Z \u001b[31mERROR\u001b[0m controller failed","talos-level":"info","talos-service":"controller-runtime","talos-time":"2026-05-20T13:53:21.99493029Z"}
{"msg":"kern: info: [2026-05-20T13:52:26.331442109Z]: DMAR: No ATSR found","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:21.998049861Z"}
{"msg":"kern: info: [2026-05-20T13:52:26.33151777Z]: DMAR: No SATC found","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:21.99808675Z"}
{"msg":"kern: info: [2026-05-20T13:52:26.331521172Z]: DMAR: dmar0: Using Queued invalidation","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:21.998105553Z"}
{"msg":"kern: info: [2026-05-20T13:52:26.336715945Z]: DMAR: Intel(R) Virtualization Technology for Directed I/O","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:22.000582239Z"}
{"msg":"kern: err: [2026-05-20T13:52:26.606981217Z]: DMAR: DRHD: handling fault status reg 2","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:22.010267704Z"}
{"msg":"kern: err: [2026-05-20T13:52:26.607062373Z]: DMAR: [INTR-REMAP] Request device [01:00.0] fault index 0x1a [fault reason 0x26] Blocked an interrupt request due to source-id verification failure","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:22.010294211Z"}
{"msg":"kern: info: [2026-05-20T13:52:28.396974451Z]: ata1.00: ATA-8: KINGSTON SV300S37A60G, 505ABBF0, max UDMA/133","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:22.014311058Z"}
{"msg":"kern: info: [2026-05-20T13:52:28.397431819Z]: ata1.00: 117231408 sectors, multi 16: LBA48 NCQ (depth 32), AA","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:22.01432836Z"}
{"msg":"kern: info: [2026-05-20T13:52:28.397734385Z]: ata1.00: Features: HIPM DIPM","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:22.014346522Z"}
{"msg":"kern: info: [2026-05-20T13:52:28.419398025Z]: ata1.00: configured for UDMA/133","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:22.014360557Z"}
{"msg":"user: warning: [2026-05-20T13:53:21.738764842Z]: [talos] phase install (2/8): 1 tasks(s)","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:22.018630106Z"}
{"msg":"user: warning: [2026-05-20T13:53:21.738889269Z]: [talos] task install (1/1): starting","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:22.01864447Z"}
{"msg":"user: warning: [2026-05-20T13:53:21.739092236Z]: [talos] task install (1/1): waiting for the image cache","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:22.018660344Z"}
{"msg":"user: warning: [2026-05-20T13:53:21.739351964Z]: [talos] task install (1/1): installing Talos to disk /dev/sda","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:22.018681341Z"}
{"msg":"user: warning: [2026-05-20T13:53:32.854978559Z]: 2026/05/20 13:53:33 running Talos installer v1.13.2","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:33.305071442Z"}
{"msg":"user: warning: [2026-05-20T13:53:32.966474816Z]: 2026/05/20 13:53:33 writing /boot/grub/grub.cfg to disk","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:33.360980618Z"}
{"msg":"user: warning: [2026-05-20T13:53:45.816310746Z]: 2026/05/20 13:53:46 formatting the partition \"/dev/sda1\" as \"vfat\" with label \"EFI\"","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:46.28496189Z"}
{"msg":"user: warning: [2026-05-20T13:53:45.855790724Z]: 2026/05/20 13:53:46 formatting the partition \"/dev/sda2\" as \"zeroes\" with label \"BIOS\"","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:46.30494964Z"}
{"msg":"user: warning: [2026-05-20T13:53:45.872522445Z]: 2026/05/20 13:53:46 formatting the partition \"/dev/sda3\" as \"xfs\" with label \"BOOT\"","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:46.312964197Z"}
{"msg":"user: warning: [2026-05-20T13:53:48.344496704Z]: 2026/05/20 13:53:48 formatting the partition \"/dev/sda4\" as \"zeroes\" with label \"META\"","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:48.548966642Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.385045766Z]: DMAR: DRHD: handling fault status reg 3","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:48.569030016Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.385128483Z]: DMAR: [DMA Read NO_PASID] Request device [00:1f.2] fault addr 0xff400000 [fault reason 0x0c] non-zero reserved fields in PTE","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:48.56904113Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.45330378Z]: ata1.00: exception Emask 0x60 SAct 0x1000 SErr 0x800 action 0x6 frozen","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:48.605008147Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.453425939Z]: ata1.00: irq_stat 0x20000000, host bus error","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:48.6050372Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.453564084Z]: ata1.00: failed command: WRITE FPDMA QUEUED","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:48.605058826Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.453642484Z]: ata1.00: cmd 61/00:60:3a:70:22/10:00:00:00:00/40 tag 12 ncq dma 2097152 ou","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:48.605067425Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.453859183Z]: ata1.00: status: { DRDY }","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:48.605079614Z"}
{"msg":"kern: info: [2026-05-20T13:53:48.103540188Z]: ata1.00: configured for UDMA/133","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:48.929107285Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.126306359Z]: DMAR: DRHD: handling fault status reg 3","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:48.941034552Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.126410754Z]: DMAR: [DMA Read NO_PASID] Request device [00:1f.2] fault addr 0xff200000 [fault reason 0x0c] non-zero reserved fields in PTE","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:48.941059477Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.197371016Z]: ata1.00: exception Emask 0x60 SAct 0x80000000 SErr 0x800 action 0x6 frozen","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:48.977071447Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.197517428Z]: ata1.00: irq_stat 0x20000000, host bus error","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:48.977196966Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.197728626Z]: ata1.00: failed command: WRITE FPDMA QUEUED","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:48.977330104Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.197838118Z]: ata1.00: cmd 61/00:f8:3a:70:22/10:00:00:00:00/40 tag 31 ncq dma 2097152 ou","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:48.977370707Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.198107461Z]: ata1.00: status: { DRDY }","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:48.977424271Z"}
{"msg":"kern: info: [2026-05-20T13:53:48.845410659Z]: ata1.00: configured for UDMA/133","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:49.301046092Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.864311056Z]: DMAR: DRHD: handling fault status reg 3","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:49.30938125Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.864467599Z]: DMAR: [DMA Read NO_PASID] Request device [00:1f.2] fault addr 0xff400000 [fault reason 0x0c] non-zero reserved fields in PTE","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:49.309425818Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.933411026Z]: ata1.00: exception Emask 0x60 SAct 0x400000 SErr 0x800 action 0x6 frozen","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:49.345120113Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.933616749Z]: ata1.00: irq_stat 0x20000000, host bus error","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:49.345207109Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.933875638Z]: ata1.00: failed command: WRITE FPDMA QUEUED","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:49.345294842Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.934015687Z]: ata1.00: cmd 61/00:b0:3a:70:22/10:00:00:00:00/40 tag 22 ncq dma 2097152 ou","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:49.34532432Z"}
{"msg":"kern: err: [2026-05-20T13:53:48.93437596Z]: ata1.00: status: { DRDY }","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:49.345388257Z"}
{"msg":"kern: info: [2026-05-20T13:53:49.583456142Z]: ata1.00: configured for UDMA/133","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:49.669153412Z"}
{"msg":"kern: err: [2026-05-20T13:53:49.60624664Z]: DMAR: DRHD: handling fault status reg 3","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:49.681092684Z"}
{"msg":"kern: err: [2026-05-20T13:53:49.669434516Z]: ata1.00: exception Emask 0x60 SAct 0x8 SErr 0x800 action 0x6 frozen","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:49.713224371Z"}
{"msg":"kern: err: [2026-05-20T13:53:49.669610156Z]: ata1.00: irq_stat 0x20000000, host bus error","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:49.713318092Z"}
{"msg":"kern: err: [2026-05-20T13:53:49.669842284Z]: ata1.00: failed command: WRITE FPDMA QUEUED","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:49.713442725Z"}
{"msg":"kern: err: [2026-05-20T13:53:49.66995906Z]: ata1.00: cmd 61/00:18:3a:70:22/10:00:00:00:00/40 tag 3 ncq dma 2097152 ou","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:49.713591463Z"}
{"msg":"kern: err: [2026-05-20T13:53:49.670256982Z]: ata1.00: status: { DRDY }","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:49.713668541Z"}
{"msg":"kern: info: [2026-05-20T13:53:49.325336323Z]: ata1.00: configured for UDMA/133","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:50.04110013Z"}
{"msg":"kern: err: [2026-05-20T13:53:49.389439669Z]: ata1.00: exception Emask 0x60 SAct 0x40000 SErr 0x800 action 0x6 frozen","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:50.073219113Z"}
{"msg":"kern: err: [2026-05-20T13:53:49.389623462Z]: ata1.00: irq_stat 0x20000000, host bus error","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:50.073342072Z"}
{"msg":"kern: err: [2026-05-20T13:53:49.389865249Z]: ata1.00: failed command: WRITE FPDMA QUEUED","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:50.073430215Z"}
{"msg":"kern: err: [2026-05-20T13:53:49.389986914Z]: ata1.00: cmd 61/00:90:3a:70:22/10:00:00:00:00/40 tag 18 ncq dma 2097152 ou","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:50.073468854Z"}
{"msg":"kern: err: [2026-05-20T13:53:49.390294696Z]: ata1.00: status: { DRDY }","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:50.073521345Z"}
{"msg":"kern: info: [2026-05-20T13:53:50.045233436Z]: ata1.00: configured for UDMA/133","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:50.401106538Z"}
{"msg":"kern: err: [2026-05-20T13:53:50.10943014Z]: ata1.00: exception Emask 0x60 SAct 0x80000 SErr 0x800 action 0x6 frozen","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:50.433163363Z"}
{"msg":"kern: err: [2026-05-20T13:53:50.109634792Z]: ata1.00: irq_stat 0x20000000, host bus error","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:50.43327493Z"}
{"msg":"kern: err: [2026-05-20T13:53:50.10989712Z]: ata1.00: failed command: WRITE FPDMA QUEUED","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T13:53:50.433382398Z"}
# 80 events shown.
# Case E: machined service-level log view of the success install
# Source: bug-12619-forensics/2026-05-20-decisive/machined-shipper-success.log
# Captured via .machine.logging.destinations[].endpoint=tcp://10.133.0.66:6667
# Format: JSON-lines from machined's zap logger; one event per kmsg/log emission.
# Field schema: msg, talos-time, talos-level, talos-service, component, controller
# (when present). Filtered for installer/lifecycle/block-mount relevance.
{"msg":"2026/05/20 14:45:10.517278 [talos] phase install (2/8): 1 tasks(s)","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:45:10.551976366Z"}
{"msg":"2026/05/20 14:45:10.517370 [talos] task install (1/1): starting","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:45:10.55203121Z"}
{"msg":"2026/05/20 14:45:10.517529 [talos] task install (1/1): waiting for the image cache","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:45:10.552044322Z"}
{"msg":"2026/05/20 14:45:10.517690 [talos] task install (1/1): installing Talos to disk /dev/sda","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:45:10.552056552Z"}
{"component":"controller-runtime","controller":"block.VolumeConfigController","error":"error flushing meta: file does not exist","msg":"2026-05-20T14:45:10.513Z \u001b[31mERROR\u001b[0m controller failed","talos-level":"info","talos-service":"controller-runtime","talos-time":"2026-05-20T14:45:10.561453123Z"}
{"msg":"user: warning: [2026-05-20T14:45:10.453564684Z]: [talos] phase install (2/8): 1 tasks(s)","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:45:10.578071452Z"}
{"msg":"user: warning: [2026-05-20T14:45:10.453675823Z]: [talos] task install (1/1): starting","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:45:10.578085451Z"}
{"msg":"user: warning: [2026-05-20T14:45:10.453813057Z]: [talos] task install (1/1): waiting for the image cache","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:45:10.578095078Z"}
{"msg":"user: warning: [2026-05-20T14:45:10.453985572Z]: [talos] task install (1/1): installing Talos to disk /dev/sda","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:45:10.578108347Z"}
{"component":"controller-runtime","controller":"network.RouteSpecController","error":"1 error occurred:\n\t* error adding route: netlink receive: network is unreachable, message {Family:2 DstLength:0 SrcLength:0 Tos:0 Table:0 Protocol:3 Scope:0 Type:1 Flags:0 Attributes:{Dst:<nil> Src:10.133.0.71 Gateway:10.133.0.111 OutIface:8 Priority:1024 Table:254 Mark:0 Pref:<nil> Expires:<nil> Metrics:<nil> Multipath:[]}}\n\n","msg":"2026-05-20T14:45:10.611Z \u001b[31mERROR\u001b[0m controller failed","talos-level":"info","talos-service":"controller-runtime","talos-time":"2026-05-20T14:45:10.611792404Z"}
{"msg":"user: warning: [2026-05-20T14:45:22.400446223Z]: 2026/05/20 14:45:22 running Talos installer v1.13.2","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:45:22.494483708Z"}
{"msg":"user: warning: [2026-05-20T14:45:22.512218233Z]: 2026/05/20 14:45:22 writing /boot/grub/grub.cfg to disk","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:45:22.550525309Z"}
{"msg":"user: warning: [2026-05-20T14:45:35.456138983Z]: 2026/05/20 14:45:35 formatting the partition \"/dev/sda1\" as \"vfat\" with label \"EFI\"","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:45:35.522488699Z"}
{"msg":"user: warning: [2026-05-20T14:45:35.495943577Z]: 2026/05/20 14:45:35 formatting the partition \"/dev/sda2\" as \"zeroes\" with label \"BIOS\"","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:45:35.54248548Z"}
{"msg":"user: warning: [2026-05-20T14:45:35.520264567Z]: 2026/05/20 14:45:35 formatting the partition \"/dev/sda3\" as \"xfs\" with label \"BOOT\"","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:45:35.554529879Z"}
{"msg":"user: warning: [2026-05-20T14:45:36.912864779Z]: 2026/05/20 14:45:37 formatting the partition \"/dev/sda4\" as \"zeroes\" with label \"META\"","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:45:37.750478244Z"}
{"msg":"user: warning: [2026-05-20T14:45:37.440770534Z]: 2026/05/20 14:45:38 META: loading from /dev/sda4","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:45:38.014533024Z"}
{"msg":"user: warning: [2026-05-20T14:45:37.446967157Z]: 2026/05/20 14:45:38 META: loaded 0 keys","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:45:38.018527955Z"}
{"msg":"user: warning: [2026-05-20T14:45:37.447047193Z]: 2026/05/20 14:45:38 META: saving to /dev/sda4","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:45:38.018569414Z"}
{"msg":"user: warning: [2026-05-20T14:45:37.448448062Z]: 2026/05/20 14:45:38 META: saved 0 keys","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:45:38.018592015Z"}
{"msg":"2026/05/20 14:45:38.044606 [talos] task install (1/1): install successful","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:45:38.044617056Z"}
{"msg":"2026/05/20 14:45:38.044711 [talos] task install (1/1): waiting for the image cache copy","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:45:38.044760253Z"}
{"msg":"2026/05/20 14:45:38.044814 [talos] task install (1/1): done, 27.527444271s","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:45:38.044821094Z"}
{"msg":"2026/05/20 14:45:38.044906 [talos] phase install (2/8): done, 27.527641054s","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:45:38.044933335Z"}
{"msg":"user: warning: [2026-05-20T14:45:37.504855315Z]: [talos] task install (1/1): install successful","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:45:38.046528614Z"}
{"msg":"user: warning: [2026-05-20T14:45:37.504948682Z]: [talos] task install (1/1): waiting for the image cache copy","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:45:38.046546386Z"}
{"msg":"user: warning: [2026-05-20T14:45:37.505083084Z]: [talos] task install (1/1): done, 27.527444271s","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:45:38.046575119Z"}
{"msg":"user: warning: [2026-05-20T14:45:37.505181937Z]: [talos] phase install (2/8): done, 27.527641054s","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:45:38.046583254Z"}
{"msg":"2026/05/20 14:46:05.852053 [talos] META: saving to /dev/sda4","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:46:05.852141886Z"}
{"msg":"2026/05/20 14:46:05.853215 [talos] META: saved 1 keys","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:46:05.853246957Z"}
{"msg":"user: warning: [2026-05-20T14:46:05.120642314Z]: [talos] META: saving to /dev/sda4","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:46:05.854574442Z"}
{"msg":"user: warning: [2026-05-20T14:46:05.121818124Z]: [talos] META: saved 1 keys","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:46:05.854596527Z"}
{"msg":"2026/05/20 14:46:05.854775 [talos] task teardownLifecycle (1/1): starting","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:46:05.85479467Z"}
{"msg":"2026/05/20 14:46:05.856332 [talos] task teardownLifecycle (1/1): done, 1.544468ms","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:46:05.856378289Z"}
{"msg":"user: warning: [2026-05-20T14:46:05.127481978Z]: [talos] task teardownLifecycle (1/1): starting","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:46:05.858766617Z"}
{"msg":"user: warning: [2026-05-20T14:46:05.129046272Z]: [talos] task teardownLifecycle (1/1): done, 1.544468ms","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:46:05.858813723Z"}
{"msg":"2026/05/20 14:46:15.128194 [talos] META: loading from /dev/sda4","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:46:23.173936403Z"}
{"msg":"2026/05/20 14:46:15.134824 [talos] META: loaded 1 keys","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:46:23.17408004Z"}
{"component":"controller-runtime","controller":"network.DNSResolveCacheController","error":"error updating dns runner 'Network: udp, Addr: 169.254.116.108:53': error creating runner: error creating \"udp\" packet conn: listen udp4 169.254.116.108:53: bind: cannot assign requested address","msg":"2026-05-20T14:46:15.158Z \u001b[31mERROR\u001b[0m controller failed","talos-level":"info","talos-service":"controller-runtime","talos-time":"2026-05-20T14:46:23.228016294Z"}
{"component":"controller-runtime","controller":"network.DNSResolveCacheController","error":"error updating dns runner 'Network: udp, Addr: 169.254.116.108:53': error creating runner: error creating \"udp\" packet conn: listen udp4 169.254.116.108:53: bind: cannot assign requested address","msg":"2026-05-20T14:46:15.486Z \u001b[31mERROR\u001b[0m controller failed","talos-level":"info","talos-service":"controller-runtime","talos-time":"2026-05-20T14:46:23.239475389Z"}
{"msg":"user: warning: [2026-05-20T14:46:14.374939359Z]: [talos] META: loading from /dev/sda4","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:46:23.253854338Z"}
{"msg":"user: warning: [2026-05-20T14:46:14.38949987Z]: [talos] META: loaded 1 keys","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:46:23.253870915Z"}
{"component":"controller-runtime","controller":"k8s.NodeApplyController","error":"2 error(s) occurred:\n\terror getting node: nodes \"talos-avs-wvw\" is forbidden: User \"system:node:hp4\" cannot get resource \"nodes\" in API group \"\" at the cluster scope: node 'hp4' cannot read 'talos-avs-wvw', only its own Node object\n\terror getting node: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline","msg":"2026-05-20T14:46:41.069Z \u001b[31mERROR\u001b[0m controller failed","talos-level":"info","talos-service":"controller-runtime","talos-time":"2026-05-20T14:46:41.069523741Z"}
{"component":"controller-runtime","controller":"k8s.NodeApplyController","error":"2 error(s) occurred:\n\tnodes \"hp4\" is forbidden: node \"hp4\" is not allowed to modify taints\n\tclient rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline","msg":"2026-05-20T14:46:51.286Z \u001b[31mERROR\u001b[0m controller failed","talos-level":"info","talos-service":"controller-runtime","talos-time":"2026-05-20T14:46:51.28613345Z"}
{"component":"controller-runtime","controller":"k8s.NodeApplyController","error":"2 error(s) occurred:\n\tnodes \"hp4\" is forbidden: node \"hp4\" is not allowed to modify taints\n\tclient rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline","msg":"2026-05-20T14:47:01.993Z \u001b[31mERROR\u001b[0m controller failed","talos-level":"info","talos-service":"controller-runtime","talos-time":"2026-05-20T14:47:01.993548643Z"}
{"component":"controller-runtime","controller":"k8s.NodeApplyController","error":"2 error(s) occurred:\n\tnodes \"hp4\" is forbidden: node \"hp4\" is not allowed to modify taints\n\tclient rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline","msg":"2026-05-20T14:47:12.679Z \u001b[31mERROR\u001b[0m controller failed","talos-level":"info","talos-service":"controller-runtime","talos-time":"2026-05-20T14:47:12.67983845Z"}
{"component":"controller-runtime","controller":"k8s.NodeApplyController","error":"2 error(s) occurred:\n\tnodes \"hp4\" is forbidden: node \"hp4\" is not allowed to modify taints\n\tclient rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline","msg":"2026/05/20 14:47:24.578868 [talos] controller failed","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:47:24.578927476Z"}
{"component":"controller-runtime","controller":"k8s.NodeApplyController","error":"2 error(s) occurred:\n\tnodes \"hp4\" is forbidden: node \"hp4\" is not allowed to modify taints\n\tclient rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline","msg":"2026-05-20T14:47:24.578Z \u001b[31mERROR\u001b[0m controller failed","talos-level":"info","talos-service":"controller-runtime","talos-time":"2026-05-20T14:47:24.578893421Z"}
{"component":"controller-runtime","controller":"k8s.NodeApplyController","error":"2 error(s) occurred:\n\tnodes \"hp4\" is forbidden: node \"hp4\" is not allowed to modify taints\n\tclient rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline","msg":"user: warning: [2026-05-20T14:47:24.171813951Z]: [talos] controller failed","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:47:24.580717585Z"}
{"component":"controller-runtime","controller":"k8s.NodeApplyController","error":"2 error(s) occurred:\n\tnodes \"hp4\" is forbidden: node \"hp4\" is not allowed to modify taints\n\tclient rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline","msg":"2026-05-20T14:47:37.048Z \u001b[31mERROR\u001b[0m controller failed","talos-level":"info","talos-service":"controller-runtime","talos-time":"2026-05-20T14:47:37.048861765Z"}
{"component":"controller-runtime","controller":"k8s.NodeApplyController","error":"2 error(s) occurred:\n\tnodes \"hp4\" is forbidden: node \"hp4\" is not allowed to modify taints\n\tclient rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline","msg":"2026/05/20 14:47:37.048865 [talos] controller failed","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:47:37.048887314Z"}
{"component":"controller-runtime","controller":"k8s.NodeApplyController","error":"2 error(s) occurred:\n\tnodes \"hp4\" is forbidden: node \"hp4\" is not allowed to modify taints\n\tclient rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline","msg":"user: warning: [2026-05-20T14:47:36.110760733Z]: [talos] controller failed","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:47:37.049972946Z"}
{"component":"controller-runtime","controller":"k8s.NodeApplyController","error":"2 error(s) occurred:\n\tnodes \"hp4\" is forbidden: node \"hp4\" is not allowed to modify taints\n\tclient rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline","msg":"2026-05-20T14:47:51.607Z \u001b[31mERROR\u001b[0m controller failed","talos-level":"info","talos-service":"controller-runtime","talos-time":"2026-05-20T14:47:51.607203215Z"}
{"component":"controller-runtime","controller":"k8s.NodeApplyController","error":"2 error(s) occurred:\n\tnodes \"hp4\" is forbidden: node \"hp4\" is not allowed to modify taints\n\tclient rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline","msg":"2026/05/20 14:47:51.607212 [talos] controller failed","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:47:51.607269439Z"}
{"component":"controller-runtime","controller":"k8s.NodeApplyController","error":"2 error(s) occurred:\n\tnodes \"hp4\" is forbidden: node \"hp4\" is not allowed to modify taints\n\tclient rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline","msg":"user: warning: [2026-05-20T14:47:51.227513541Z]: [talos] controller failed","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:47:51.608748516Z"}
{"component":"controller-runtime","controller":"k8s.NodeApplyController","error":"2 error(s) occurred:\n\tnodes \"hp4\" is forbidden: node \"hp4\" is not allowed to modify taints\n\tclient rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline","msg":"2026-05-20T14:48:05.254Z \u001b[31mERROR\u001b[0m controller failed","talos-level":"info","talos-service":"controller-runtime","talos-time":"2026-05-20T14:48:05.254250651Z"}
{"component":"controller-runtime","controller":"k8s.NodeApplyController","error":"2 error(s) occurred:\n\tnodes \"hp4\" is forbidden: node \"hp4\" is not allowed to modify taints\n\tclient rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline","msg":"2026/05/20 14:48:05.254235 [talos] controller failed","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:48:05.254281461Z"}
{"component":"controller-runtime","controller":"k8s.NodeApplyController","error":"2 error(s) occurred:\n\tnodes \"hp4\" is forbidden: node \"hp4\" is not allowed to modify taints\n\tclient rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline","msg":"user: warning: [2026-05-20T14:48:04.522405366Z]: [talos] controller failed","talos-level":"info","talos-service":"kernel","talos-time":"2026-05-20T14:48:05.256680192Z"}
{"component":"controller-runtime","controller":"k8s.NodeApplyController","error":"2 error(s) occurred:\n\tnodes \"hp4\" is forbidden: node \"hp4\" is not allowed to modify taints\n\tclient rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline","msg":"2026-05-20T14:48:26.855Z \u001b[31mERROR\u001b[0m controller failed","talos-level":"info","talos-service":"controller-runtime","talos-time":"2026-05-20T14:48:26.855436337Z"}
{"component":"controller-runtime","controller":"k8s.NodeApplyController","error":"2 error(s) occurred:\n\tnodes \"hp4\" is forbidden: node \"hp4\" is not allowed to modify taints\n\tclient rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline","msg":"2026/05/20 14:48:26.855436 [talos] controller failed","talos-level":"info","talos-service":"machined","talos-time":"2026-05-20T14:48:26.855454738Z"}
# 60 events shown.
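
The stream above carries most events more than once: the machined line and its kernel echo, and for the controller failures a controller-runtime copy as well. A minimal Python sketch for splitting such a JSON-lines capture by `talos-service` and reading back the install duration follows. The helper names (`load_events`, `by_service`, `install_duration`) are illustrative and not part of Talos, and the regex assumes the seconds-only `done, <n>s` wording seen in Case E (a duration such as `1m37.09s` would not match).

```python
import json
import re
from collections import defaultdict

# Wording taken from the Case E lines above; seconds-only durations are assumed.
DONE_RE = re.compile(r"task install \(1/1\): done, ([0-9.]+)s")


def load_events(lines):
    """Parse JSON-lines, skipping blank lines and '#' comment headers."""
    events = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        events.append(json.loads(line))
    return events


def by_service(events):
    """Group events by their 'talos-service' field."""
    groups = defaultdict(list)
    for ev in events:
        groups[ev.get("talos-service", "unknown")].append(ev)
    return groups


def install_duration(events):
    """Return the install task duration in seconds as logged by machined, or None."""
    for ev in events:
        if ev.get("talos-service") != "machined":
            continue
        match = DONE_RE.search(ev.get("msg", ""))
        if match:
            return float(match.group(1))
    return None
```

Reading only the machined copy avoids double-counting the kernel echo of the same message.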
# Case F: hp4 v1.13.2 install using the MODERN UKI path
# - Factory schematic 5151fcc4… with customization.extraKernelArgs:
# [intel_iommu=off, iommu=off] baked into the UKI's .cmdline PE section
# - machine.install.grubUseUKICmdline: true (v1.12+ default; explicit here)
# - NO machine.install.extraKernelArgs (validator would reject the combination)
# Source: bug-12619-forensics/2026-05-20-modern-uki/kmsg-shipper-modern-install.log
# Captured via talos.logging.kernel=tcp://10.133.0.66:6666
# Counters across the install window (compare to Case A: 14 / 25 / 3):
# DMA Read fault (PTE) : 0
# ATA host bus error : 0
# I/O error, dev sd* : 0
# ----- timeline -----
2026-05-20T19:27:33.837 user /warning : 2026/05/20 19:27:33 running Talos installer v1.13.2
2026-05-20T19:27:32.948 user /warning : 2026/05/20 19:27:33 writing /boot/grub/grub.cfg to disk
2026-05-20T19:27:34.163 user /warning : 2026/05/20 19:27:34 formatting the partition "/dev/sda1" as "vfat" with label "EFI"
2026-05-20T19:27:34.218 user /warning : 2026/05/20 19:27:34 formatting the partition "/dev/sda2" as "zeroes" with label "BIOS"
2026-05-20T19:27:34.227 user /warning : 2026/05/20 19:27:35 formatting the partition "/dev/sda3" as "xfs" with label "BOOT"
2026-05-20T19:27:34.228 user /warning : 2026/05/20 19:27:35 creating xfs filesystem on /dev/sda3 with args: [-n ftype=1 -c options=/usr/share/xfsprogs/mkfs/lts_6.18.conf -f -L BOOT -p file=/proc/self/fd/6 /dev/sda3]
2026-05-20T19:27:38.818 user /warning : 2026/05/20 19:27:39 formatting the partition "/dev/sda4" as "zeroes" with label "META"
2026-05-20T19:27:41.038 user /warning : 2026/05/20 19:27:41 META: loading from /dev/sda4
2026-05-20T19:27:41.044 user /warning : 2026/05/20 19:27:41 META: loaded 0 keys
2026-05-20T19:27:41.044 user /warning : 2026/05/20 19:27:41 META: saving to /dev/sda4
2026-05-20T19:27:41.046 user /warning : 2026/05/20 19:27:41 META: saved 0 keys
2026-05-20T19:27:41.053 user /warning : 2026/05/20 19:27:41 installation of v1.13.2 complete
2026-05-20T19:27:41.106 user /warning : [talos] task install (1/1): install successful
2026-05-20T19:27:41.106 user /warning : [talos] task install (1/1): waiting for the image cache copy
2026-05-20T19:27:41.107 user /warning : [talos] task install (1/1): done, 1m37.094128612s
# ----- after reboot: installed system's /proc/cmdline -----
talos.platform=metal console=tty0 init_on_alloc=1 slab_nomerge pti=on consoleblank=0
nvme_core.io_timeout=4294967295 printk.devkmsg=on selinux=1 module.sig_enforce=1
proc_mem.force_override=never intel_iommu=off iommu=off
# Notice: no `talos.logging.kernel`, no `netconsole`, no `console=ttyS*` —
# those were iPXE-only additions for the maintenance kernel. The installed
# system inherits ONLY the UKI's .cmdline section. The iommu args reach the
# installed system via the schematic's customization.extraKernelArgs, baked
# into the UKI at image-build time (pkg/imager/imager.go:407-413 →
# internal/pkg/uki/generate.go:51-65).
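
The three counters in the Case F header (0 / 0 / 0 against Case A's 14 / 25 / 3) are simple line counts over the kmsg capture. A hedged Python sketch of such a counter follows. The `host bus error` and `I/O error, dev sd` wordings come from lines quoted in this bundle. The DMAR pattern is an assumption about how the PTE fault line is worded and should be checked against the raw capture before trusting the numbers.

```python
import re

# Signature patterns. Only the DMAR pattern is a guess; adjust it to match
# the actual fault line in the raw kmsg capture.
SIGNATURES = {
    "dmar_fault": re.compile(r"DMAR:.*fault", re.IGNORECASE),
    "ata_host_bus_error": re.compile(r"ata\d+(\.\d+)?: .*host bus error"),
    "io_error": re.compile(r"I/O error, dev sd[a-z]+"),
}


def count_signatures(lines):
    """Count how many lines match each fault signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Running it over the same install window in both captures gives a like-for-like comparison between the failing and the successful install.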
# Disk state on hp5 after a v1.13.2 install that reported "Applied configuration without a reboot"
#
# Captured 2026-05-19 by booting hp5 from Alpine via the iPXE menu's
# `alpine` entry, then reading /dev/sda through parted/dd/blkid.
# Source: bug-12619-forensics/2026-05-19/disk-forensic-alpine.txt
#
# This was taken BEFORE we'd characterised the IOMMU/DMAR fault. It shows
# the disk-side picture from the installer's perspective: GPT and partition
# layout written, BOOT partition correctly staged with kernel+initrd+
# grub.cfg ready to chainload — but MBR bootcode and the BIOS-boot
# partition (sda2) are all zeros. The installer was writing happily until
# the moment grub-install needed to push bytes to the BIOS-grub embedded
# area; that's the point at which the AHCI DMA path was being rejected
# by IOMMU and every WRITE FPDMA QUEUED came back with host bus error.
# The installer's exit code did NOT reflect this — it reported success.
#
# === BEGIN VERBATIM DUMP ===
===== parted print =====
Model: ATA Samsung SSD 840 (scsi)
Disk /dev/sda: 120GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 1049kB 106MB 105MB fat32 EFI boot, esp
2 106MB 107MB 1049kB BIOS bios_grub, legacy_boot
3 107MB 2204MB 2097MB xfs BOOT
4 2204MB 2205MB 1049kB META
5 2205MB 2310MB 105MB xfs STATE
6 2310MB 120GB 118GB xfs EPHEMERAL
===== parted print + flags =====
Model: ATA Samsung SSD 840 (scsi)
Disk /dev/sda: 234441648s
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 2048s 206847s 204800s fat32 EFI boot, esp
2 206848s 208895s 2048s BIOS bios_grub, legacy_boot
3 208896s 4304895s 4096000s xfs BOOT
4 4304896s 4306943s 2048s META
5 4306944s 4511743s 204800s xfs STATE
6 4511744s 234440703s 229928960s xfs EPHEMERAL
===== MBR first 512 bytes (GRUB stage1?) =====
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001c0 02 00 ee ff ff ff 01 00 00 00 af 4b f9 0d 00 00 |...........K....|
000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.|
00000200
===== sda2 BIOS partition first 2KB (GRUB core.img?) =====
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000800
===== strings in sda2 (grub markers) =====
(sda2 size in bytes:)
1048576
===== blkid =====
/dev/sda6: LABEL="EPHEMERAL" UUID="d0b8864b-f7a8-41af-a2e5-a860b07369b9" TYPE="xfs"
/dev/sda5: LABEL="STATE" UUID="9f68523c-42ce-46fc-af05-a7207fe1ef39" TYPE="xfs"
/dev/sda3: LABEL="BOOT" UUID="f46dc095-8f95-408f-8b14-0d0cf12878dc" TYPE="xfs"
/dev/sda1: LABEL="EFI" UUID="3827-C1D7" TYPE="vfat"
/dev/loop/0: TYPE="squashfs"
/dev/loop0: TYPE="squashfs"
===== EFI partition (sda1) contents =====
/mnt/efi:
total 1
drwxr-xr-x 2 root root 512 Jan 1 1970 .
drwxr-xr-x 3 root root 60 May 19 01:14 ..
===== BOOT partition (sda3) contents =====
/mnt/boot:
total 0
drwxr-xr-x 4 root root 27 May 19 00:49 .
drwxr-xr-x 4 root root 80 May 19 01:14 ..
drwxr-xr-x 2 root root 41 May 19 00:49 A
drwx------ 2 root root 22 May 19 00:49 grub
/mnt/boot/A:
total 126436
drwxr-xr-x 2 root root 41 May 19 00:49 .
# === END DUMP ===
#
# Full unedited dump at bug-12619-forensics/2026-05-19/disk-forensic-alpine.txt
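
The hexdump above shows three things about sector 0: zeros through offset 0x1bf, a partition entry of type 0xee starting in the 0x1be table slot, and the 55 aa signature at the end. A minimal Python sketch that performs the same check on a 512-byte sector follows. The function name `inspect_mbr` is illustrative, and the 440-byte bootstrap area and 0x1BE table offset are the standard MBR layout rather than anything Talos-specific.

```python
import struct

SECTOR_SIZE = 512
BOOTCODE_LEN = 440       # classic MBR bootstrap code area
PART_TABLE_OFF = 0x1BE   # first of four 16-byte partition entries


def inspect_mbr(sector: bytes) -> dict:
    """Summarise the first sector: bootcode presence, signature, first entry."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected exactly 512 bytes")
    entry = sector[PART_TABLE_OFF:PART_TABLE_OFF + 16]
    # Bytes 8..15 of an entry: starting LBA and sector count, little-endian.
    first_lba, num_sectors = struct.unpack_from("<II", entry, 8)
    return {
        "bootcode_empty": not any(sector[:BOOTCODE_LEN]),
        "has_signature": sector[510:512] == b"\x55\xaa",
        "protective_gpt": entry[4] == 0xEE,
        "first_lba": first_lba,
        "num_sectors": num_sectors,
    }
```

On a live system the input would be the first 512 bytes read from the disk device (root required). A result with `bootcode_empty` true and `protective_gpt` true matches the state captured here: a valid GPT protective MBR with no GRUB stage 1 written.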
# Kernel cmdlines observed across the cases
## Case A — FAILING install
### Maintenance kernel (booted via iPXE for the install attempt)
talos.platform=metal console=tty0 console=ttyS0,115200n8 console=ttyS1,115200n8
init_on_alloc=1 slab_nomerge pti=on consoleblank=0 nvme_core.io_timeout=4294967295
printk.devkmsg=on selinux=1 talos.logging.kernel=tcp://10.133.0.66:6666
netconsole=+6666@10.133.0.71/eno1,6666@10.133.0.66/3e:66:15:c3:b6:ed
(No intel_iommu=off. Talos's v1.13.2 kernel auto-enables Intel VT-d when
CONFIG_INTEL_IOMMU_DEFAULT_ON=y is set in the kernel config, which it is.
DL320e Gen8 firmware emits invalid PTEs → install fails → reboot, BIOS
finds empty MBR, falls back to PXE.)
## Case B — SUCCESSFUL install with the iommu workaround
### Maintenance kernel cmdline (added intel_iommu=off iommu=off)
talos.platform=metal console=tty0 console=ttyS0,115200n8 console=ttyS1,115200n8
init_on_alloc=1 slab_nomerge pti=on consoleblank=0 nvme_core.io_timeout=4294967295
printk.devkmsg=on selinux=1 intel_iommu=off iommu=off
talos.logging.kernel=tcp://10.133.0.66:6666
netconsole=+6666@10.133.0.71/eno1,6666@10.133.0.66/3e:66:15:c3:b6:ed
### Installed-system cmdline (post-install disk boot, Case C captures the
### kernel log line "DMAR: IOMMU disabled" confirming this took effect)
talos.platform=metal talos.config=none console=tty0 init_on_alloc=1
slab_nomerge pti=on consoleblank=0 nvme_core.io_timeout=4294967295
printk.devkmsg=on selinux=1 module.sig_enforce=1 proc_mem.force_override=never
intel_iommu=off iommu=off
talos.logging.kernel=tcp://10.133.0.66:6666
(Installed system cmdline got the iommu args via
`machine.install.extraKernelArgs` in the machine config + the legacy path
`grubUseUKICmdline: false`. Talos's installer wrote this cmdline into
the grub.cfg on the BOOT partition; grub reads it at every boot.
A v1.12+ "modern" alternative is to bake intel_iommu=off into the UKI's
.cmdline PE section via a Factory schematic with
`customization.extraKernelArgs: [intel_iommu=off, iommu=off]` and let
`grubUseUKICmdline: true` (the default) read from there. Either path
works; we have validated the legacy path end-to-end.)
## Reference: vanilla v1.13.2 installed-system cmdline (for comparison)
talos.platform=metal talos.config=none console=tty0 init_on_alloc=1
slab_nomerge pti=on consoleblank=0 nvme_core.io_timeout=4294967295
printk.devkmsg=on selinux=1 module.sig_enforce=1 proc_mem.force_override=never
(No iommu args by default.)
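
The difference between the failing and working cmdlines above comes down to two arguments. A minimal Python sketch for checking any captured cmdline for them follows. The helper name `missing_iommu_args` is hypothetical, and it does a plain whitespace split with no awareness of quoted values.

```python
# The two workaround arguments used in Cases B and F.
REQUIRED = ("intel_iommu=off", "iommu=off")


def missing_iommu_args(cmdline: str) -> list[str]:
    """Return the workaround arguments that are absent from a kernel cmdline."""
    args = set(cmdline.split())
    return [arg for arg in REQUIRED if arg not in args]
```

An empty result means both arguments are present. On a running node the input would be the contents of `/proc/cmdline`.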

Reproduction matrix — siderolabs/talos#12619

All tests on HP ProLiant DL320e Gen8 v2 (Lynx Point PCH, AHCI controller at PCIe 00:1f.2, BIOS P80 dated 03/28/2014). hp4 and hp5 are the two servers used; both are the same SKU.

| # | Date | Node | Disk | Talos version | iPXE cmdline intel_iommu | machine-config extraKernelArgs | Schematic | Outcome |
|---|------|------|------|---------------|--------------------------|--------------------------------|-----------|---------|
| 1 | 2026-05-19 | hp5 | Samsung 840 EVO 120 GB | v1.13.2 | not set (default kernel: on) | not set | eecf8569… (Factory default, dual-boot/auto) | FAIL: MBR all-zero after install, BIOS falls through to PXE |
| 2 | 2026-05-19 | hp5 | Kingston SV300S3 60 GB | v1.13.2 | not set | not set | 1b1cb5fa… (dual-boot, no extras) | FAIL: same |
| 3 | 2026-05-19 | hp4 | Kingston SV300S3 (/dev/sdb) | v1.13.2 | not set | not set | 47cfeac6… (BIOS-grub-only) | FAIL: same |
| 4 | 2026-05-19 | hp4 | Kingston SV300S3 (/dev/sdb) | v1.11.6 | not set | not set | eecf8569… (v1.11.6 era) | WORKS: v1.11.6's older kernel does not auto-enable VT-d on this hardware |
| 5 | 2026-05-20 | hp4 | Kingston SV300S3 (now /dev/sda after physical move to Bay 0) | v1.13.2 | not set | grubUseUKICmdline: false (= early hypothesis: ec0a813 cmdline regression) | 47cfeac6… | FAIL: same install failure → disproves the cmdline hypothesis |
| 6 | 2026-05-20 | hp4 | Kingston SV300S3 | v1.13.2 | intel_iommu=off iommu=off | intel_iommu=off iommu=off + grubUseUKICmdline: false | 47cfeac6… | WORKS: clean install, no DMAR/ATA faults, post-install disk boot clean, joins cluster, RBAC up |
| 7 | (pending) | hp4 | Kingston SV300S3 or Samsung 840 EVO | v1.13.2 | intel_iommu=off iommu=off | none (cluster default grubUseUKICmdline: true) | 5151fcc4… (BIOS-grub + iommu args baked in UKI .cmdline) | (pending validation) |

The "modern UKI" Case 7 will land in this directory as case-F-modern-uki-success.log once executed.

Key data points

  • Cases 1, 2, 3 all fail identically across two machines, two SSD models, two drive bays, and three Factory schematics — rules out disk, cabling, drive-bay, bootloader-choice as the cause.
  • Case 4 establishes the regression boundary at the kernel version rather than at the installer code: same hardware, same disk, same installation flow; only the kernel version differs.
  • Case 5 disproves the v1.12 commit ec0a813 ("feat: unify cmdline handling GRUB/sd-boot") as the cause. Setting install.grubUseUKICmdline: false reproduces the failure identically. The cmdline source is not the regression.
  • Case 6 isolates the real root cause — Intel VT-d firmware bug on the DL320e Gen8 v2 — by adding the kernel-cmdline workaround. Zero DMAR/ATA faults, install completes in ~24 s wall-clock, MBR/bios_grub populated, node boots from disk and joins the cluster.

Schematic IDs

| Short | Full ID |
|-------|---------|
| eecf8569… | eecf8569481dd82573fc8b748e3c55d413c585971559907d66bee9427421b755 |
| 1b1cb5fa… | 1b1cb5fa41ada49940d8725c78dcd5a7891a85a0416dca29efa8260bbfb16ba3 |
| 47cfeac6… | 47cfeac685eb6251a8e413d08646e7726b6af469b39a8dc4d7b3e63918fef5b8 |
| 5151fcc4… | 5151fcc4a452d16f5ed76359b0013925a254a8ecb3dd1f11bba43eaec2ee07bf (newly registered, has customization.extraKernelArgs: [intel_iommu=off, iommu=off] baked into the UKI) |