Last active
January 25, 2021 02:09
agx gpu open questions
1) either the fadd.16 src encoding is wrong, or the 16b load/store encoding is wrong:
-ld_compute w0, [thread_position_in_grid].x
# 85 08 00 0E 00 C0 10 00
+load h4, h#0.abs.neg, h#0, h0
# 85 0C 04 0E 00 C0 10 00
+load h6, h#4.abs.neg, h#0, h0
# 38 00
-unk38 h0
# 26 84 C3 20 0C 00
-fadd.16 h2, h3, h2 <---- but we just put the operands in h4, h6!
# C5 08 08 0E 00 C0 10 00
+store h4, h#8.abs.neg, h#0, h0
# 88 00 00 00
+stop h0, h#0, h#0
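One way to narrow down question 1 is to diff the raw encodings bit by bit. The sketch below is only an illustration: the helper name differing_bits and the bit numbering (bit 0 = LSB of byte 0, instruction read as one little-endian integer) are assumptions of this note, not something the listing confirms. The byte strings are copied from the listing above.

```python
def differing_bits(a: bytes, b: bytes) -> list[int]:
    """Return the bit positions where two encodings differ.

    Assumed numbering: bit 0 is the least significant bit of byte 0,
    i.e. the instruction is treated as one little-endian integer.
    """
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    x = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return [i for i in range(len(a) * 8) if (x >> i) & 1]


# Encodings copied from the listing above.
load_h4 = bytes.fromhex("85 08 00 0E 00 C0 10 00")   # load h4, offset 0
load_h6 = bytes.fromhex("85 0C 04 0E 00 C0 10 00")   # load h6, offset 4
store_h4 = bytes.fromhex("C5 08 08 0E 00 C0 10 00")  # store h4, offset 8

print(differing_bits(load_h4, load_h6))   # bits that change between the two loads
print(differing_bits(load_h4, store_h4))  # bits that change between load and store
```

Between the two loads only one bit in byte 1 and one bit in byte 2 change, and byte 2 reads 00, 04, 08 across the three memory ops, which at least lines up with the h#0, h#4, h#8 operands shown by the disassembler. That does not by itself say which side of question 1 is wrong.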
2) the compiler likes xor with a constant [presumably 0x80000000] for float32 negate. why? this seems a bad trade if it requires introducing a driver-managed uniform.
similarly, the compiler likes &~0x80000000 for float32 abs when there's no expression to fold into. In both cases, if an fadd is required next anyway, the compiler folds neg/abs into the source operand and we decode that correctly.
# 7E 05 88 29 6C 00
bitop w2, #0x6 [a_xor_b], const_4, w2
Confirmed: abs/neg modifiers combine in the expected way for encoding -abs(x).
for float16 the compiler emits an fadd.16, which seems reasonable, although we have the abs/neg bits in the wrong places:
// half: out[index] = -inA[index]
# 26 84 C2 04 00 02
-fadd.16 h2, h2.abs, h#0 /* unk5 = 02 */
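For reference, the bit tricks discussed in question 2 can be checked on the host. This is a minimal sketch of the IEEE 754 sign-bit manipulation only; the function names are made up for this note, and nothing here models the AGX bitop instruction or what the compiler actually emits beyond the xor and and-not cases described above. The or-based -abs(x) and the float16 variant are included purely for comparison.

```python
import struct

SIGN32 = 0x80000000  # float32 sign bit (the constant presumed above)
SIGN16 = 0x8000      # float16 sign bit


def f32_bits(x: float) -> int:
    """Reinterpret a float32 value as its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Reinterpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def neg_f32(x: float) -> float:
    # xor with the sign bit: float32 negate
    return bits_f32(f32_bits(x) ^ SIGN32)


def abs_f32(x: float) -> float:
    # and with ~sign bit: float32 abs
    return bits_f32(f32_bits(x) & ~SIGN32)


def neg_abs_f32(x: float) -> float:
    # or with the sign bit: bitwise equivalent of -abs(x), for comparison only
    return bits_f32(f32_bits(x) | SIGN32)


def neg_f16(x: float) -> float:
    # Same xor trick on a 16-bit half pattern, for comparison only
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    return struct.unpack("<e", struct.pack("<H", bits ^ SIGN16))[0]


print(neg_f32(1.5), abs_f32(-2.0), neg_abs_f32(3.0), neg_f16(0.5))
print(hex(f32_bits(neg_f32(0.0))))  # the xor form also flips the sign of zero
```

The xor and and-not forms touch only the sign bit, so they are exact for every input, including zeros.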