chrisforbes · January 25, 2021 02:09
1) Either the fadd.16 source encoding is wrong, or the 16-bit load/store encoding is wrong:

-ld_compute w0, [thread_position_in_grid].x
# 85 08 00 0E 00 C0 10 00
+load h4, h#0.abs.neg, h#0, h0
# 85 0C 04 0E 00 C0 10 00
+load h6, h#4.abs.neg, h#0, h0
# 38 00
-unk38 h0
# 26 84 C3 20 0C 00
-fadd.16 h2, h3, h2            <---- but we just put the operands in h4, h6!
# C5 08 08 0E 00 C0 10 00
+store h4, h#8.abs.neg, h#0, h0
# 88 00 00 00
+stop h0, h#0, h#0
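Taken as data, the mismatch looks systematic. Every register the fadd.16 touches (h2, h3, h2) is exactly half the index that the loads and the store use (h4, h6, h4). The Python check below tests that hypothesis using only values copied from the dump. The helper names are made up for illustration, and the factor of two is a guess being tested, not a known encoding rule.

```python
# Register indices as printed by the current disassembler (copied from the dump).
LOAD_DSTS = {4, 6}                    # load h4, [0] / load h6, [4]
STORE_SRC = 4                         # store h4, [8]
FADD_DST, FADD_SRCS = 2, (3, 2)       # fadd.16 h2, h3, h2


def to_load_units(idx: int, scale: int) -> int:
    """Hypothetical mapping of an fadd.16 register index into load/store numbering."""
    return idx * scale


def consistent(scale: int) -> bool:
    """True if, under `scale`, the fadd reads both loaded values and writes the stored one."""
    srcs = {to_load_units(r, scale) for r in FADD_SRCS}
    dst = to_load_units(FADD_DST, scale)
    return srcs == LOAD_DSTS and dst == STORE_SRC


for scale in (1, 2):
    print(f"scale {scale}: {'consistent' if consistent(scale) else 'mismatch'}")
```

Only a scale of 2 makes the program read as "load, load, add, store". That fits one decoder counting 16-bit halves while the other counts 32-bit registers, but the dump alone doesn't show which side is wrong. Settling it would need a second shader that uses different registers.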



2) For a float32 negate, the compiler prefers an XOR with a constant (presumably 0x80000000). Why? This seems like a bad trade if it requires introducing a driver-managed uniform.
Similarly, for a float32 abs, the compiler prefers & ~0x80000000 when there is no expression to fold into. In both cases, if an fadd is needed next anyway, the compiler folds the neg/abs into that fadd's source modifiers, and we decode those correctly.
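The bit tricks themselves are standard IEEE-754: bit 31 is the binary32 sign bit, so XOR flips it (negate) and AND-NOT clears it (abs). The sketch below uses Python's `struct` to reinterpret bit patterns. The helper names are illustrative only.

```python
import struct

SIGN32 = 0x80000000  # IEEE-754 binary32 sign bit


def f32_bits(x: float) -> int:
    """Reinterpret a float as its binary32 bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Reinterpret a 32-bit pattern as a binary32 float."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def neg_via_xor(x: float) -> float:
    # Float32 negate as the compiler emits it: flip the sign bit.
    return bits_f32(f32_bits(x) ^ SIGN32)


def abs_via_andnot(x: float) -> float:
    # Float32 abs as the compiler emits it: clear the sign bit.
    return bits_f32(f32_bits(x) & ~SIGN32)


for x in (1.5, -2.25, 0.0):
    print(x, neg_via_xor(x), abs_via_andnot(x))
```

The XOR version is an exact negate for every input, and it turns 0.0 into -0.0. Adding +0.0 to a negated source would not do that (-0.0 + 0.0 rounds to +0.0). That may be part of the compiler's reasoning, though this is only a guess.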

# 7E 05 88 29 6C 00
bitop w2, #0x6 [a_xor_b], const_4, w2
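The `#0x6 [a_xor_b]` immediate reads naturally as a 2-input, 4-bit truth table, because 0b0110 is XOR's. The sketch below works under that assumption and indexes the table with bit `(a << 1) | b`. That ordering is also an assumption: XOR is symmetric, so 0x6 can't reveal which operand selects the high index bit. For the same reason, the AND-NOT value the compiler uses for abs would come out as 0x4 under this ordering but 0x2 with the operands swapped. The `bitop` helper is hypothetical.

```python
def bitop(lut: int, a: int, b: int, width: int = 32) -> int:
    """Apply a 2-input truth table bitwise across `width` bits.

    Assumed layout: bit ((a_bit << 1) | b_bit) of `lut` is the result for that input pair.
    """
    out = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        out |= ((lut >> ((ai << 1) | bi)) & 1) << i
    return out


XOR_LUT = 0b0110     # 1 when exactly one input bit is set (== 0x6)
ANDNOT_LUT = 0b0100  # a & ~b under the assumed index order

SIGN = 0x80000000
NEG_1_5 = 0xBFC00000  # binary32 bit pattern of -1.5

print(hex(bitop(XOR_LUT, SIGN, NEG_1_5)))     # sign flipped
print(hex(bitop(ANDNOT_LUT, NEG_1_5, SIGN)))  # sign cleared
```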

Confirmed that the abs/neg modifiers combine in the expected way when encoding -abs(x).
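Here "the expected way" is read as abs applied first and neg second, so setting both bits yields -abs(x). The reverse order (neg, then abs) could never produce a negative value, which is why a -abs(x) test shader distinguishes the two. The small model below is hypothetical and only spells out that ordering.

```python
def apply_src_mods(x: float, abs_: bool, neg: bool) -> float:
    """Hypothetical fadd source-modifier model: abs first, then neg."""
    if abs_:
        x = abs(x)
    if neg:
        x = -x
    return x


for x in (2.0, -2.0):
    for abs_ in (False, True):
        for neg in (False, True):
            print(f"x={x:+} abs={abs_:d} neg={neg:d} -> {apply_src_mods(x, abs_, neg):+}")
```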

For float16, the compiler emits an fadd.16, which seems reasonable, although our decoder has the abs/neg bits in the wrong places:

// half: out[index] = -inA[index]
# 26 84 C2 04 00 02
-fadd.16 h2, h2.abs, h#0 /* unk5 = 02 */
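Checking the decoded instruction against the shader source narrows things down. If only an abs bit were set on the first source, the fadd.16 could never return a negative result for a positive input, so `out = -in` could not come out of it. This suggests that the bit printed as `.abs` is really neg. The minimal check below models half precision with `struct`'s `'e'` format and assumes `h#0` is +0.0. The candidate table and helpers are illustrative only. The samples avoid zero so that signed-zero rounding does not muddy the comparison.

```python
import struct


def f16(x: float) -> float:
    """Round-trip a value through IEEE-754 binary16."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fadd16(a: float, b: float) -> float:
    """Half-precision add: round inputs and result to binary16."""
    return f16(f16(a) + f16(b))


# Candidate meanings of the modifier printed on the first fadd.16 source.
CANDIDATES = {
    "abs": lambda x: abs(x),
    "neg": lambda x: -x,
}


def matches_source(mod) -> bool:
    """Does fadd.16 (mod(x), +0.0) reproduce out = -in for the sample inputs?"""
    samples = [1.5, -0.75, 3.0]
    return all(fadd16(mod(x), 0.0) == f16(-x) for x in samples)


for name, mod in CANDIDATES.items():
    print(name, matches_source(mod))
```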