
@chrisforbes
Last active January 25, 2021 02:09
AGX GPU open questions
1) Either the fadd.16 source-operand encoding is wrong, or the 16-bit load/store encoding is wrong:
-ld_compute w0, [thread_position_in_grid].x
# 85 08 00 0E 00 C0 10 00
+load h4, h#0.abs.neg, h#0, h0
# 85 0C 04 0E 00 C0 10 00
+load h6, h#4.abs.neg, h#0, h0
# 38 00
-unk38 h0
# 26 84 C3 20 0C 00
-fadd.16 h2, h3, h2 <---- but we just put the operands in h4, h6!
# C5 08 08 0E 00 C0 10 00
+store h4, h#8.abs.neg, h#0, h0
# 88 00 00 00
+stop h0, h#0, h#0
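The register numbers in this listing differ by exactly a factor of two. If the load/store numbers are halved (h4 → h2, h6 → h3), the two loads feed fadd.16 and the store reads its result. The sketch below only checks that arithmetic on the register names as printed. It assumes a single constant scale between the two numberings, and it cannot tell which decoder is the wrong one. `reg` and `explains` are hypothetical helper names.

```python
"""Consistency check for listing 1 (a hypothetical sketch, not a decoder).

Assumption: the only decode error is a constant scale factor between the
register numbers printed for 16-bit load/store and those printed for fadd.16.
"""

# Register names copied verbatim from the disassembly.
LOAD_DSTS = ["h4", "h6"]   # load h4, ...   /   load h6, ...
STORE_SRC = "h4"           # store h4, ...
FADD_DST = "h2"            # fadd.16 h2, h3, h2
FADD_SRCS = ["h3", "h2"]


def reg(name: str) -> int:
    """Return the numeric index of a register name such as 'h6'."""
    return int(name[1:])


def explains(scale: int) -> bool:
    """True if dividing load/store indices by `scale` makes the data flow line up."""
    ls_regs = LOAD_DSTS + [STORE_SRC]
    if any(reg(r) % scale for r in ls_regs):
        return False  # this scale would leave a fractional register index
    loaded = {reg(r) // scale for r in LOAD_DSTS}
    stored = reg(STORE_SRC) // scale
    # fadd.16 must read exactly the two loaded values and write the stored one.
    return {reg(r) for r in FADD_SRCS} == loaded and reg(FADD_DST) == stored


if __name__ == "__main__":
    for scale in (1, 2, 4):
        print(f"scale {scale}: {'consistent' if explains(scale) else 'inconsistent'}")
```

Running it prints `consistent` only for scale 2. That fits the "one side is off by 2x" reading, but it does not say which side.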
2) For float32 negation, the compiler prefers an XOR with a constant (presumably 0x80000000). Why? This seems like a bad trade if it requires introducing a driver-managed uniform.
Similarly, for float32 abs the compiler prefers & ~0x80000000 when there is no expression to fold the modifier into. In both cases, if an fadd is needed next anyway, the compiler folds the neg/abs into that instruction's source operand, and we decode it correctly.
# 7E 05 88 29 6C 00
bitop w2, #0x6 [a_xor_b], const_4, w2
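For reference, these are the bit-level identities the compiler relies on. A float32 keeps its sign in bit 31, so XOR with 0x80000000 flips the sign and AND with ~0x80000000 clears it. Both touch only the sign bit, so they are exact for every input, including signed zeros. The helper names below are illustrative and do not come from any driver or compiler. The last line shows one property that may or may not be relevant to the compiler's choice (speculation, not confirmed): computing `0 - x` is not an exact negation, because it maps +0.0 to +0.0.

```python
import math
import struct

SIGN32 = 0x80000000  # IEEE 754 binary32 sign bit


def f2u(x: float) -> int:
    """Bit pattern of x rounded to float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def u2f(u: int) -> float:
    """float32 value with bit pattern u."""
    return struct.unpack("<f", struct.pack("<I", u))[0]


def neg_xor(x: float) -> float:
    """-x as a bitop xor with the sign bit."""
    return u2f(f2u(x) ^ SIGN32)


def abs_and(x: float) -> float:
    """|x| as a bitop and with the complement of the sign bit."""
    return u2f(f2u(x) & ~SIGN32)


if __name__ == "__main__":
    print(neg_xor(1.5), abs_and(-2.25))       # -1.5 2.25
    # Only the sign bit changes, so signed zero is handled exactly...
    print(math.copysign(1.0, neg_xor(0.0)))    # -1.0
    # ...whereas "0 - x" with x = +0.0 gives +0.0, not -0.0.
    print(math.copysign(1.0, 0.0 - 0.0))       # 1.0
```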
Confirmed: the abs and neg source modifiers combine in the expected way when encoding -abs(x).
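Below is a minimal model of that behaviour. It assumes abs is applied before neg, which is the order that makes .abs.neg mean -abs(x). Under that order, the result is always the same as forcing the float32 sign bit on. `apply_src_mods` is a hypothetical name, not part of any existing tool.

```python
import struct

SIGN32 = 0x80000000  # IEEE 754 binary32 sign bit


def bits(x: float) -> int:
    """float32 bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def apply_src_mods(x: float, use_abs: bool, use_neg: bool) -> float:
    """Hypothetical model of a source operand's .abs/.neg modifiers.

    Assumes abs is applied first, then neg, so .abs.neg yields -|x|.
    """
    if use_abs:
        x = abs(x)
    if use_neg:
        x = -x
    return x


if __name__ == "__main__":
    for x in (3.0, -3.0, 0.0, -0.0):
        got = apply_src_mods(x, use_abs=True, use_neg=True)
        # -abs(x) is the same as setting the sign bit.
        assert bits(got) == bits(x) | SIGN32
        print(x, "->", got)
```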
For float16, the compiler emits an fadd.16, which seems reasonable. However, our decoder has the abs/neg bits in the wrong places: the source below is a negation, yet it disassembles as h2.abs.
// half: out[index] = -inA[index]
# 26 84 C2 04 00 02
-fadd.16 h2, h2.abs, h#0 /* unk5 = 02 */
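One way to pin down which bit is which is to evaluate the decoded instruction against the known source expression, using test inputs of both signs. The sketch below makes several assumptions. It treats h#0 as an immediate +0.0, models fadd.16 with Python's `struct` binary16 format (`'e'`), and assumes the two modifier bits are simply swapped. `run_fadd16` and `READINGS` are hypothetical names. Negative inputs alone cannot separate the two readings, because |x| = -x when x < 0, so at least one positive input is required.

```python
import struct


def f16(x: float) -> float:
    """Round x to IEEE 754 binary16 and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def run_fadd16(a: float, b: float, a_abs: bool, a_neg: bool) -> float:
    """Hypothetical model of fadd.16 with .abs/.neg on the first source."""
    if a_abs:
        a = abs(a)
    if a_neg:
        a = -a
    return f16(f16(a) + f16(b))


def expected(x: float) -> float:
    """The shader source: out[index] = -inA[index]."""
    return f16(-x)


# Two readings of the modifier bit that the disassembler prints as ".abs".
READINGS = {"as decoded (.abs)": (True, False), "swapped (.neg)": (False, True)}

if __name__ == "__main__":
    inputs = [1.5, -2.0, 0.25]
    for name, (a_abs, a_neg) in READINGS.items():
        ok = all(run_fadd16(x, 0.0, a_abs, a_neg) == expected(x) for x in inputs)
        print(f"{name}: {'matches' if ok else 'mismatch'}")
```

With these inputs, only the swapped reading matches. The next step would be to confirm this against hardware output instead of this Python model.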