SVE/SME/Neon helpers commonly receive vd == vn (or vd == vn == vm)
because the architectural instruction allows source and
destination registers to be the same. A reviewer who sees
uint8_t *d = vd; uint32_t *n = vn; with vd == vn may jump to
"in-place transformation corrupts the source". Before asserting
this, trace the order of operations within one iteration.

The lane-locked safe pattern looks like this:

    for (size_t i = 0; i < nelem; ++i) {
        float32 e0 = n0[H4(i)];                      /* load lane i */
        float32 e1 = n1[H4(i)];
        d[H2(2 * i + 0)] = fcvt_f32_to_fp8(e0, ...); /* store lane i */
        d[H2(2 * i + 1)] = fcvt_f32_to_fp8(e1, ...);
    }

Even when vd == vn, this is safe because:
- Loads precede stores within an iteration. e0 and e1 are read
  into locals before any d[...] write happens, so the writes
  cannot corrupt the values being converted in this iteration.
- Lanes are lockstep. The two destination stores in iteration i
  (d[H2(2*i+0)] and d[H2(2*i+1)], total 4 bytes) hit the same
  4-byte region that n0[H4(i)] was just read from. They do not
  reach into the lane that iteration i+1 will load next.
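
The two points above can be checked with a small stand-alone program. This is a sketch under stated assumptions, not the helper itself: narrow32 is a hypothetical stand-in for the real conversion, the H macros are left out (which assumes a little-endian host, where they are the identity), and memcpy is used for lane accesses so that aliasing stays well-defined C. It runs the same lane-locked loop once with a separate destination and once with vd == vn0, then compares the bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the real conversion: keep the top 16 bits. */
static uint16_t narrow32(uint32_t x)
{
    return (uint16_t)(x >> 16);
}

/*
 * Lane-locked narrowing: iteration i reads 32-bit lane i of vn0 and vn1,
 * then writes 16-bit lanes 2*i and 2*i+1 of vd (bytes 4*i .. 4*i+3).
 */
void narrow_pair(void *vd, const void *vn0, const void *vn1, size_t nelem)
{
    unsigned char *d = vd;
    const unsigned char *n0 = vn0, *n1 = vn1;

    assert(vd && vn0 && vn1);
    for (size_t i = 0; i < nelem; ++i) {
        uint32_t e0, e1;
        memcpy(&e0, n0 + 4 * i, 4);          /* load lane i */
        memcpy(&e1, n1 + 4 * i, 4);
        uint16_t r0 = narrow32(e0), r1 = narrow32(e1);
        memcpy(d + 4 * i + 0, &r0, 2);       /* store lane i */
        memcpy(d + 4 * i + 2, &r1, 2);
    }
}

/* Returns 1 if a run with vd == vn0 produces the same bytes as a run
 * with a separate destination. */
int narrow_alias_matches(void)
{
    enum { NELEM = 4 };
    uint32_t n0[NELEM] = { 0x11112222u, 0x33334444u, 0x55556666u, 0x77778888u };
    uint32_t n1[NELEM] = { 0xaaaabbbbu, 0xccccddddu, 0xeeeeffffu, 0x01234567u };
    uint32_t ref[NELEM];

    narrow_pair(ref, n0, n1, NELEM);   /* separate destination */
    narrow_pair(n0, n0, n1, NELEM);    /* vd == vn0 */
    return memcmp(ref, n0, sizeof(ref)) == 0;
}
```

Under these assumptions narrow_alias_matches() returns 1: each iteration only overwrites the 4 bytes it has already loaded, so the aliased run and the reference run agree.
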
Before reporting "in-place corruption" or "needs vectors_overlap
/ scratch copy", you must show at least one of the following:
- An iteration where a store precedes a load it depends on, or
- A store in iteration i that lands inside a lane that some
  iteration j > i has not yet read, or
- A widening (destination lane larger than the read lane it
  overlaps with), so writes spill into the next source lane.
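
For contrast, the widening case in the last bullet can be reproduced the same way. The sketch below is illustrative only: widen16 and widen_forward are hypothetical names, the H macros are again omitted, and the loop is a deliberately naive forward widening from 16-bit source lanes to 32-bit destination lanes. With vd == vn, the store in iteration i = 0 covers bytes 0..3, which includes bytes 2..3 of the source lane that iteration j = 1 has not read yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a widening conversion. */
static uint32_t widen16(uint16_t x)
{
    return (uint32_t)x << 16;
}

/*
 * Naive forward widening: iteration i reads bytes 2*i .. 2*i+1 of vn
 * and writes bytes 4*i .. 4*i+3 of vd. This is NOT lane-locked.
 */
void widen_forward(void *vd, const void *vn, size_t nelem)
{
    unsigned char *d = vd;
    const unsigned char *n = vn;

    assert(vd && vn);
    for (size_t i = 0; i < nelem; ++i) {
        uint16_t e;
        memcpy(&e, n + 2 * i, 2);        /* load 16-bit lane i */
        uint32_t r = widen16(e);
        memcpy(d + 4 * i, &r, 4);        /* store 32-bit lane i */
    }
}

/* Returns 1 if the in-place run matches the reference run, 0 otherwise. */
int widen_alias_matches(void)
{
    enum { NELEM = 4 };
    uint16_t src[NELEM] = { 0x1111, 0x2222, 0x3333, 0x4444 };
    unsigned char ref[4 * NELEM];
    unsigned char buf[4 * NELEM] = { 0 };

    memcpy(buf, src, sizeof(src));       /* source occupies the low half */
    widen_forward(ref, src, NELEM);      /* separate destination */
    widen_forward(buf, buf, NELEM);      /* vd == vn */
    return memcmp(ref, buf, sizeof(ref)) == 0;
}
```

Here widen_alias_matches() returns 0: source lane 1 is overwritten by iteration 0 before it is loaded, so the in-place result differs from the reference. This is the kind of concrete collision a report should point to.
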
If none of those is true, the helper is correct as written and
the vectors_overlap / memcpy(&scratch, ...) pattern is not
required. That pattern exists for helpers whose access pattern
genuinely cannot be lane-locked (e.g. do_tbl1 does arbitrary
index lookups; SQCVTN2 packs from two source registers into
overlapping halves of a destination).
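
As a rough illustration of why an arbitrary-index helper needs the scratch copy, here is a simplified byte table lookup. It is a sketch, not the actual do_tbl1: tbl_bytes, VEC_BYTES and the fixed 16-byte vector are assumptions, and the d == n pointer comparison is a simplified stand-in for a real overlap check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16 };   /* assumed fixed vector length for the sketch */

/* d[i] = n[idx[i]] for in-range indices, 0 otherwise. */
void tbl_bytes(uint8_t *d, const uint8_t *n, const uint8_t *idx)
{
    uint8_t scratch[VEC_BYTES];

    assert(d && n && idx);
    if (d == n) {                        /* simplified overlap check */
        memcpy(scratch, n, VEC_BYTES);   /* snapshot the table first */
        n = scratch;
    }
    for (size_t i = 0; i < VEC_BYTES; ++i) {
        uint8_t k = idx[i];              /* index may point at any lane */
        d[i] = k < VEC_BYTES ? n[k] : 0;
    }
}

/* Reverses a vector in place and checks the result; returns 1 on success. */
int tbl_reverse_in_place_ok(void)
{
    uint8_t v[VEC_BYTES], idx[VEC_BYTES];

    for (size_t i = 0; i < VEC_BYTES; ++i) {
        v[i] = (uint8_t)(i + 1);
        idx[i] = (uint8_t)(VEC_BYTES - 1 - i);
    }
    tbl_bytes(v, v, idx);                /* d == n */
    for (size_t i = 0; i < VEC_BYTES; ++i) {
        if (v[i] != (uint8_t)(VEC_BYTES - i)) {
            return 0;
        }
    }
    return 1;
}
```

Without the snapshot, iteration 8 of the reversal would read n[7] after iteration 7 had already overwritten it. Because the index is data-dependent, no iteration order can rule that out, which is what separates this case from the lane-locked loop.
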
When you do flag a real aliasing bug, name the specific iteration
indices i and j and the specific byte offsets where the
collision occurs. "Later iterations read from the same backing
storage" is not proof of a bug; it is the premise that needs
proving.
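
One way to produce that evidence is to compute it. The sketch below uses a hypothetical model, not an existing tool: it assumes vd == vn, forward iteration, and that iteration k loads load_bytes bytes at offset k * load_bytes and stores store_bytes bytes at offset k * store_bytes. It reports the first pair (i, j) with j > i where the store of iteration i overlaps the load of iteration j, together with the colliding byte range.

```c
#include <assert.h>
#include <stddef.h>

struct collision {
    int found;                 /* 1 if a collision exists */
    size_t i, j;               /* storing and loading iteration */
    size_t first_byte;         /* first colliding byte offset */
    size_t last_byte;          /* last colliding byte offset (inclusive) */
};

struct collision find_collision(size_t nelem, size_t load_bytes,
                                size_t store_bytes)
{
    struct collision c = { 0, 0, 0, 0, 0 };

    assert(load_bytes > 0 && store_bytes > 0);
    for (size_t i = 0; i < nelem; ++i) {
        size_t st_lo = i * store_bytes, st_hi = st_lo + store_bytes;
        for (size_t j = i + 1; j < nelem; ++j) {
            size_t ld_lo = j * load_bytes, ld_hi = ld_lo + load_bytes;
            size_t lo = st_lo > ld_lo ? st_lo : ld_lo;
            size_t hi = st_hi < ld_hi ? st_hi : ld_hi;
            if (lo < hi) {     /* half-open ranges intersect */
                c.found = 1;
                c.i = i;
                c.j = j;
                c.first_byte = lo;
                c.last_byte = hi - 1;
                return c;
            }
        }
    }
    return c;
}
```

For the lane-locked loop shown earlier (4 bytes loaded from a register and 4 bytes stored per iteration), find_collision(4, 4, 4) finds nothing. For a forward widening from 2-byte to 4-byte lanes, find_collision(4, 2, 4) reports i = 0, j = 1, bytes 2..3, which is the level of detail a bug report should carry.
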