@arichardson
Created April 17, 2026 21:26
8-Bit Floating-Point Immediate Specification (Uniform Scaling)

This specification defines an 8-bit immediate encoding designed for fine-grained memory allocation sizing. It uses a 3-bit exponent and a 5-bit mantissa to provide byte-granular precision for allocations below 64 bytes, with the step size doubling per exponent up to 64 bytes (a typical cache-line size) for allocations of up to 4,032 bytes, just under 4 KB.

1. Instruction Field Layout

The 8-bit immediate (imm8) is partitioned into two fields:

  • imm8[7:5] (3 bits): Exponent (Exp). Determines the implicit bit injection and the fixed left-shift amount.
  • imm8[4:0] (5 bits): Mantissa (M). Forms the base value.
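
The field split above can be sketched in software as a pair of shift-and-mask operations. This is an illustrative model only; the function name `split_imm8` is hypothetical and does not appear in the specification.

```python
def split_imm8(imm8: int) -> tuple[int, int]:
    """Split an 8-bit immediate into its (Exp, M) fields.

    Hypothetical helper for illustration; not part of the specification.
    """
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in 8 bits")
    exp = (imm8 >> 5) & 0b111    # imm8[7:5]: 3-bit exponent
    mantissa = imm8 & 0b11111    # imm8[4:0]: 5-bit mantissa
    return exp, mantissa
```

For example, `split_imm8(0b101_00011)` yields Exp = 5 and M = 3.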

2. Decoding Logic

The hardware decoder applies a single formula to every nonzero exponent. The only special case is Exp = 0, the subnormal range, where no implicit bit is injected.

  • Subnormal Range (Exp = 0)
    • Logic: No implicit bit is added. The mantissa is unshifted.
    • Formula: Value = M
  • Shifted Normal Range (Exp = 1 through 7)
    • Logic: An implicit leading 1 is injected at bit 5 (adding 32 to the mantissa). The result is shifted left by Exp - 1.
    • Formula: Value = (32 + M) << (Exp - 1)
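
As a minimal sketch, the two formulas above can be written as a small reference decoder. The function name `decode_imm8` is an assumption made for this example; the arithmetic follows the Formula lines directly.

```python
def decode_imm8(imm8: int) -> int:
    """Reference decoder for the 8-bit immediate (illustrative sketch)."""
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in 8 bits")
    exp = imm8 >> 5          # imm8[7:5]
    m = imm8 & 0x1F          # imm8[4:0]
    if exp == 0:
        # Subnormal range: no implicit bit, no shift.
        return m
    # Normal range: implicit leading 1 at bit 5, then shift by Exp - 1.
    return (32 + m) << (exp - 1)
```

For instance, imm8 = 0b100_00001 (Exp = 4, M = 1) decodes to (32 + 1) << 3 = 264 bytes.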

3. Allocation Ranges and Alignment

This uniform encoding provides 256 unique states with zero overlaps. The alignment naturally scales to match standard hardware boundaries (e.g., 64-byte cache lines) at the upper end of the range.

Exponent (Exp)   Shift Amount   Representable Range     Alignment (Step Size)
0 (000)          << 0           0 B to 31 B             1 Byte
1 (001)          << 0           32 B to 63 B            1 Byte
2 (010)          << 1           64 B to 126 B           2 Bytes
3 (011)          << 2           128 B to 252 B          4 Bytes
4 (100)          << 3           256 B to 504 B          8 Bytes
5 (101)          << 4           512 B to 1,008 B        16 Bytes
6 (110)          << 5           1,024 B to 2,016 B      32 Bytes
7 (111)          << 6           2,048 B to 4,032 B      64 Bytes
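
The ranges and step sizes in the table, along with the claim that all 256 encodings are distinct, can be checked by brute force. The following is a sketch under the decoding formulas from Section 2; `decode_imm8` and `exponent_rows` are hypothetical names used only for this check.

```python
def decode_imm8(imm8: int) -> int:
    """Decode per Section 2: M for Exp = 0, else (32 + M) << (Exp - 1)."""
    exp, m = imm8 >> 5, imm8 & 0x1F
    return m if exp == 0 else (32 + m) << (exp - 1)


def exponent_rows() -> list[tuple[int, int, int, int]]:
    """Return (exp, min value, max value, step size) for each exponent."""
    rows = []
    for exp in range(8):
        values = [decode_imm8((exp << 5) | m) for m in range(32)]
        step = values[1] - values[0]   # spacing between adjacent mantissas
        rows.append((exp, values[0], values[-1], step))
    return rows


all_values = [decode_imm8(i) for i in range(256)]
# Every encoding is distinct, and values increase with imm8.
assert len(set(all_values)) == 256
assert all_values == sorted(all_values)

for exp, lo, hi, step in exponent_rows():
    print(f"Exp={exp}: {lo} B to {hi} B, step {step} B")
```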

4. Hardware Implementation (Zero-Cost Shifts)

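This encoding is designed for purely combinational decoding. It requires no barrel shifter, adder, or ALU, so the only logic delay is that of one multiplexer stage.

The implementation consists of a single 8-to-1 multiplexer whose data path is 12 bits wide (the largest value, 4,032, needs 12 bits). The 8 inputs to the multiplexer are created purely through static wire routing: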

  • The implicit 1 (or 0 for Exp=0) is concatenated with the 5-bit M bus to form a 6-bit base vector.
  • For each multiplexer input, this 6-bit vector is physically wired to the output bus offset by the required shift amount (0 to 6 bit positions). All lower bits are tied permanently to 0.
  • The 3-bit Exp field is wired directly to the multiplexer's select pins.
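
The wiring described above can be modeled in software to confirm that it matches the decoding formulas. This is a behavioral sketch, not RTL: the function name `mux_decode` is hypothetical, and the list `inputs` stands in for the eight statically routed buses. Only bitwise OR and constant shifts are used when building the inputs, mirroring concatenation and wire offsets rather than arithmetic.

```python
def mux_decode(imm8: int) -> int:
    """Behavioral model of the 8-to-1 multiplexer decoder (illustrative)."""
    exp = (imm8 >> 5) & 0b111    # drives the multiplexer select pins
    m = imm8 & 0b11111           # 5-bit M bus

    # Input 0: implicit bit is 0, so the 6-bit base vector is just M, unshifted.
    inputs = [m]
    # Inputs 1..7: base vector {1, M}, wired at a fixed offset of (sel - 1) bits.
    for sel in range(1, 8):
        base = (1 << 5) | m              # concatenation, no adder involved
        inputs.append(base << (sel - 1)) # constant shift; low bits tied to 0

    return inputs[exp] & 0xFFF   # 12-bit output bus
```

Because each shift amount is a constant per input, a hardware realization needs no shifter; the select lines simply pick one of the pre-routed buses.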

Note on Future Expansion: Large Page Allocations

This 8-bit floating-point immediate stops at 4,032 bytes, just short of the common 4 KB (4,096-byte) OS page size.

To support large memory management operations without bloating this primary instruction, architectures should implement a separate macro-allocation instruction (YBNDSWIP2).

YBNDSWIP2 Architecture Draft:

  • Immediate: 5-bit unsigned integer (imm5).
  • Logic: Dedicated power-of-two allocator. The exponent is biased by +12 so that its smallest value (4,096 bytes) sits directly above the 8-bit micro-allocator's maximum (4,032 bytes).
  • Formula: Value = 2^(imm5 + 12)
  • Range Covered: 4 KB (imm5 = 0) up to 8 TB, i.e. 2^43 bytes (imm5 = 31).
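
The draft formula can be sketched as a single shift. The function name `decode_ybndswip2` is an assumption for this example; only the formula Value = 2^(imm5 + 12) comes from the draft.

```python
def decode_ybndswip2(imm5: int) -> int:
    """Decode the draft YBNDSWIP2 immediate: Value = 2^(imm5 + 12)."""
    if not 0 <= imm5 <= 31:
        raise ValueError("imm5 must fit in 5 bits")
    return 1 << (imm5 + 12)   # power of two, biased by +12


print(decode_ybndswip2(0))    # 4096 bytes (4 KB)
print(decode_ybndswip2(31))   # 2**43 bytes (8 TB)
```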

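By pairing the 8-bit micro-allocator with the YBNDSWIP2 macro-allocator, the architecture covers allocation sizes from 0 bytes up to 8 TB (2^43 bytes): fine-grained steps up to 4,032 bytes and power-of-two sizes from 4 KB upward. This falls short of the full 64-bit address space, and sizes from 4,033 to 4,095 bytes, as well as non-power-of-two sizes above 4 KB, cannot be encoded directly.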
