Currently, the std.Io interface isn't too optimizable:

  1. All forms of concurrent execution go through io.async/io.asyncConcurrent(f) and io.await, which can't be statically aware of when .await will be called, so they must dynamically allocate the context needed to run f.
  2. The main API for blocking/unblocking on an arbitrary state is Mutex & Condvar, which require locking a mutex to wait (& to wake correctly) and restrict implementors to only a usize with a biased representation for each state (see the sketch after this list).
  3. It ties cancellation to the task/concurrency model instead of to blocking operations (which are really the ones getting cancelled); to cancel a set of operations, they must be wrapped in a new spawned Future. It's also unclear, due to the racy nature of io.cancel, whether a blocking operation consumes the stored cancellation request, or whether it persists & causes all future blocking ops in that Future to return Cancelled.
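
To illustrate point 2: a minimal sketch of the lock-to-wait / lock-to-wake pattern it describes, written against the plain blocking std.Thread.Mutex & std.Thread.Condition as stand-ins (not std.Io's own types; the function names here are made up for the example):

const std = @import("std");

var mutex: std.Thread.Mutex = .{};
var cond: std.Thread.Condition = .{};
var ready: bool = false; // the arbitrary state being waited on

// Waiting means taking the mutex, checking the predicate, then sleeping on the condvar.
fn waitUntilReady() void {
    mutex.lock();
    defer mutex.unlock();
    while (!ready) cond.wait(&mutex); // releases the mutex while asleep, re-acquires it on wakeup
}

// Waking correctly also means taking the same mutex before flipping the state.
fn markReady() void {
    mutex.lock();
    ready = true;
    mutex.unlock();
    cond.signal();
}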

I've thought of some ideas on how to address these + the all-encompassing nature of the std.Io interface:

  1. Every atomic object has a timeline (TL) of writes:

    • A write is either a store or a read-modify-write (RMW): it reads the latest write & pushes a new one.
    • A write is either tagged Relaxed, Release, or SeqCst.
    • A read observes some write on the timeline:
      • On the same thread, future reads can't go backwards on the timeline.
      • A read is either tagged Relaxed, Acquire, or SeqCst.
      • RMWs can also be tagged Acquire (or AcqRel). If so, the Acquire refers to the "read" portion of "RMW".
  2. Each thread has its own view of the world:

    • Write timelines are shared, but each thread could be reading at a different point on each (see the sketch below).
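
To make the timeline model above concrete, here's a minimal sketch of my own (not part of the proposal) using std.atomic.Value: the writer thread pushes a Release-tagged write onto flag's timeline, and once a reader observes that write with an Acquire-tagged load, it also sees every write that happened before it on the writer's thread:

const std = @import("std");

var data: u32 = 0; // plain (non-atomic) payload
var flag = std.atomic.Value(u32).init(0); // atomic variable with its own timeline of writes

fn writer() void {
    data = 42; // ordinary write, published by the Release store below
    flag.store(1, .release); // push a Release-tagged write onto flag's timeline
}

fn reader() void {
    // Spin until we observe the Release write; the Acquire load pairs with it,
    // so once it's seen, data == 42 is guaranteed to be visible on this thread.
    while (flag.load(.acquire) == 0) {}
    std.debug.assert(data == 42);
}

pub fn main() !void {
    const t = try std.Thread.spawn(.{}, writer, .{});
    reader();
    t.join();
}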
const std = @import("std");

const M = 12; // Probability scale for the rANS state. Symbol frequencies live in this log2 range. Usually 8-12.
const L = 23; // Renormalization factor controlling when the rANS state is dumped to the bitstream. From rans_byte.h.
const m_min = 8 - 2 - (std.math.divCeil(u32, M, 4) catch unreachable); // Small-size-opt limit when compressing frequencies.
const m_max = [_]u16{ m_min, m_min + 16, m_min + 16 + 256, m_min + 16 + 256 + 4096, 1 << M }; // Size ranges for frequencies after the small-size limit.

fn compress(dst: anytype, src: []const u8) !void {
    // Histogram of the frequency of each byte in the input.
    var hist = [_]u32{0} ** 256;
    for (src) |byte| hist[byte] += 1;
@kprotty
kprotty / lz4_block.zig
Last active January 2, 2024 11:14
Simple LZ4 block enc/dec in 100 LOC
const assert = @import("std").debug.assert;

fn compressBlock(writer: anytype, src: []const u8) !void {
    var table = [_]u32{0} ** 4096; // size is pow2. bump to match more. ideal = (0xffff+1) / sizeof(u32)
    var anchor: u32 = 0;
    if (src.len > 12) { // LZ4 spec restriction: last match must start 12b before end of block.
        var pos: u32 = 0;
        while (pos + 4 < src.len - 5) { // LZ4 spec restriction: last 5b are always literal.
            const blk: u32 = @bitCast(src[pos..][0..4].*);
const std = @import("std");
const assert = std.debug.assert;
const log = std.log.scoped(.lsm_manifest_fuzz);
const vsr = @import("../vsr.zig");
const constants = @import("../constants.zig");
const stdx = @import("../stdx.zig");
const fuzz = @import("../testing/fuzz.zig");
const allocator = fuzz.allocator;
const snapshot_latest = @import("tree.zig").snapshot_latest;
#include <cstdint>
#include <cstddef>
#include <cstring>
#include <cassert>
#include <x86intrin.h>
typedef struct {
__m256i state[2];
__m256i counter;
__m256i output;
type Event:
    wait()
    set()

type Link:
    get(ptr: Ptr):
        if LOAD(ptr, Acquire) |value|:
            return value
        e = Event{}
@kprotty
kprotty / faa_vs_cas.zig
Last active July 9, 2025 02:55
Updated to Zig 0.13
const std = @import("std");
const builtin = @import("builtin");

pub fn main() !void {
    try bench(struct {
        pub const name = "FAA";
        pub fn inc(counter: *std.atomic.Value(usize), current: usize, rng: *u32) usize {
            _ = rng;
            _ = current;
const std = @import("std");

const Channel = struct {
    value: u32 = undefined,
    frame: anyframe,
};

fn generate(out_channel: **Channel) void {
    var ch = Channel{ .frame = @frame() };
    suspend out_channel.* = &ch;
partial ordering: time/events are related to each other as graphs of dependencies
total ordering: time/events are related to each other as a sequence of steps
atomic op: op which is observed to either happen fully or not at all (no in-between)
atomic variable: memory location which atomic ops happen to
data race: non-atomic ops from 2+ threads on same memory location where one op is a write
load: an atomic read to observe a value
store: an atomic write to publish a value
read-modify-write (rmw): a read, updating the value, then a write, all atomically
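
As a small illustration of the last three definitions (mine, not from the notes), using Zig's std.atomic.Value, where Relaxed is spelled .monotonic:

const std = @import("std");

var counter = std.atomic.Value(u32).init(0); // atomic variable: a memory location atomic ops happen to

fn demo() void {
    counter.store(5, .monotonic); // store: atomically publish a value
    const seen = counter.load(.monotonic); // load: atomically observe a value
    _ = seen;
    _ = counter.fetchAdd(1, .monotonic); // rmw: read the value, add 1, and write it back, all atomically
    // By contrast, 2+ threads incrementing a plain (non-atomic) u32, with at least one writing,
    // would be a data race.
}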