Currently, the `std.Io` interface isn't too optimizable:

1. All forms of concurrent execution go through `io.async`/`asyncConcurrent(f)` and `io.await`, which can't be statically aware of when `.await` will be called, so the context needed to run `f` must be dynamically allocated.
2. The main API for blocking/unblocking on an arbitrary state is Mutex & Condvar, which require locking a mutex to wait (& wake correctly) + restrict implementors to only a `usize` with a biased representation for each state.
3. It ties cancellation to the task/concurrency model instead of to the blocking operations (which are really the ones getting cancelled); to cancel a set of operations, they must be wrapped in a new spawned Future. It's also unclear, due to the racy nature of `io.cancel`, whether a blocking operation consumes the stored cancellation request, or whether it persists & causes all future blocking ops in that Future to return `Cancelled`.
I've thought of some ideas on how to address these + the all-encompassing nature of the interface. Here's a brainstorm: https://zigbin.io/54da01
The idea starts with consolidating the "async" bits of the interface into their own thing. Ignoring the I/O stuff, their real goal is to abstract away the coroutine implementation (stackful, stackless, threads, serial). This means that, for maximum flexibility, it should at least expose the core functionality that old Zig async provided: `async`/`await`/`suspend`/`resume`. Here I borrowed from thread terminology to avoid conflating the interface with how async as a language feature (i.e. stackless) may be referred to, so: `spawn`/`join`/`park`/`unpark` respectively.
Spawn & join roughly match the `asyncConcurrent`/`await` from before. Park & unpark are actually closer to `suspend`/`resume` from old Zig async, but the suspend-block is now an interface passed into park, & the anyframe that can be "resumed" (more like "scheduled to run soon" here) is an opaque ref provided by the Io impl.
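To make that concrete, here's roughly the shape I have in mind for the park/unpark half. This is my own sketch of a vtable; the `Suspended`/`Parker` names and signatures are approximations of the idea, not the paste's exact definitions:

```zig
const Task = struct {
    vtable: *const VTable,

    /// Opaque ref to a parked task, provided by the impl; only the impl knows
    /// what it really is (stackless frame, stackful coroutine, OS thread, ...).
    pub const Suspended = opaque {};

    /// Plays the role of the old suspend-block: the impl calls `onPark` once
    /// the current task is actually suspended, handing over the handle that a
    /// later unpark() can use to schedule it to run again soon.
    pub const Parker = struct {
        onPark: *const fn (parker: *Parker, handle: *Suspended) void,
    };

    pub const VTable = struct {
        /// Suspend the calling task; returns only after unpark() on its handle.
        park: *const fn (task: *Task, parker: *Parker) void,
        /// Schedule a previously parked task to run again soon.
        unpark: *const fn (task: *Task, handle: *Suspended) void,
        // spawn/join (the asyncConcurrent/await analogues) would also live here.
    };

    pub fn park(task: *Task, parker: *Parker) void {
        task.vtable.park(task, parker);
    }

    pub fn unpark(task: *Task, handle: *Suspended) void {
        task.vtable.unpark(task, handle);
    }
};
```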
The name of the whole thing has also changed to `Task` instead of `Io`. The idea was for `Io` to cover networking/timers + a separate `Task` interface, but that's just a suggestion. I hope instead for the following to be the main takeaways:
Park & unpark replace Mutex & Condvar from before as the more efficient blocking API. They only concern themselves with the scheduling of a single task, and work with intrusive interfaces & opaque pointers to allow flexible coroutine suspend/yield/resume impls (solving point 2 from the start). I provided a `ResetEvent` built upon it, and all sync primitives (like Mutex & Condvar) can be built with an Event as well.
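Roughly along these lines (a simplified sketch of the idea rather than the paste's actual code; it assumes a single-threaded/cooperative impl, a real one needs atomics on `is_set` and the waiter list):

```zig
// Stand-ins for the Task/Parker/Suspended shapes sketched above.
const Suspended = opaque {};
const Parker = struct { onPark: *const fn (*Parker, *Suspended) void };
const Task = struct {
    parkFn: *const fn (*Task, *Parker) void,
    unparkFn: *const fn (*Task, *Suspended) void,
};

/// wait() parks until set() is called; no allocation, each waiter lives in
/// the waiting task's own stack frame (intrusive list).
const ResetEvent = struct {
    is_set: bool = false,
    waiters: ?*Waiter = null,

    const Waiter = struct {
        next: ?*Waiter = null,
        handle: ?*Suspended = null,
        parker: Parker = .{ .onPark = onPark },
        event: *ResetEvent,

        // Runs once the task is suspended: remember the handle & enqueue.
        fn onPark(parker: *Parker, handle: *Suspended) void {
            const waiter: *Waiter = @fieldParentPtr("parker", parker);
            waiter.handle = handle;
            waiter.next = waiter.event.waiters;
            waiter.event.waiters = waiter;
        }
    };

    pub fn wait(event: *ResetEvent, task: *Task) void {
        if (event.is_set) return;
        var waiter = Waiter{ .event = event };
        task.parkFn(task, &waiter.parker); // returns once set() unparks us
    }

    pub fn set(event: *ResetEvent, task: *Task) void {
        event.is_set = true;
        while (event.waiters) |waiter| {
            event.waiters = waiter.next;
            task.unparkFn(task, waiter.handle.?);
        }
    }
};
```

Mutex & Condvar fall out the same way: each is a small state machine plus an intrusive queue of such waiters, with the impl only ever seeing park & unpark.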
There's no (pollable) Future-level cancellation: it can be built on top of park/unpark. Have a `CancelToken` that works similarly to the `ResetEvent` internals from earlier, where 1. before parking, the waiter pushes a callback interface to the token that impls how that park wishes to be cancelled, 2. on unpark it removes its callback from the token, and 3. on trigger the token invokes all pushed callbacks. The runtime is unaware of it, and it can be localized without Future-level concurrency (solving point 3 from the start).
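In sketch form (hypothetical names, single-threaded for brevity; not the paste's actual code):

```zig
const CancelToken = struct {
    triggered: bool = false,
    callbacks: ?*Callback = null, // intrusive list: "how to cancel my park"

    pub const Callback = struct {
        prev: ?*Callback = null,
        next: ?*Callback = null,
        onCancel: *const fn (*Callback) void,
    };

    /// 1. Register before parking. Returns false if already triggered, in
    /// which case the caller should not park at all.
    pub fn push(token: *CancelToken, cb: *Callback) bool {
        if (token.triggered) return false;
        cb.prev = null;
        cb.next = token.callbacks;
        if (token.callbacks) |head| head.prev = cb;
        token.callbacks = cb;
        return true;
    }

    /// 2. Unregister after a normal unpark.
    pub fn remove(token: *CancelToken, cb: *Callback) void {
        if (cb.prev) |prev| {
            prev.next = cb.next;
        } else {
            token.callbacks = cb.next;
        }
        if (cb.next) |next| {
            next.prev = cb.prev;
        }
    }

    /// 3. Cancel everything currently parked on this token.
    pub fn trigger(token: *CancelToken) void {
        token.triggered = true;
        while (token.callbacks) |cb| {
            token.callbacks = cb.next;
            cb.onCancel(cb); // typically: unpark the blocked task as Cancelled
        }
    }
};
```

A cancellable blocking op then becomes: push a `Callback` whose `onCancel` unparks the waiter as `Cancelled`, park, and remove the callback on a normal wake. Nothing in the Io/Task impl knows the token exists, and there's no ambiguity about a request "persisting": each blocking op decides at `push()` time whether the token was already triggered.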
There's now a `Runnable` interface and `run_all_fn([]*Runnable)`: this is a fork/join-style API that allows running multiple functions concurrently but with a statically known lifetime, meaning Io/Task impls don't need to heap allocate (solving point 1 from the start). `io.select` can be implemented via `run_all` + a `(CancelToken, AtomicBool)` pair, where every task is Runnable-wrapped and the first to set the bool triggers the CancelToken. And Loris' `io.async(saveFile)` example can be replaced by it (see my provided `forkJoinExample`).
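A sketch of the shape (again my approximation of the idea, not the paste's definitions), including how a select() wrapper could sit on top of it:

```zig
const std = @import("std");

const Runnable = struct {
    runFn: *const fn (*Runnable) void,
};

const Task = struct {
    /// Runs every Runnable (possibly concurrently) and returns only once all
    /// of them are done, so their state can live in the caller's stack frame:
    /// no heap allocation needed by the impl.
    runAllFn: *const fn (*Task, []const *Runnable) void,
};

/// select()-ish wrapper: each branch is Runnable-wrapped; the first one to
/// finish wins the atomic bool and cancels the others (e.g. by triggering a
/// shared CancelToken like the one sketched earlier).
const Select = struct {
    done: std.atomic.Value(bool) = std.atomic.Value(bool).init(false),
    cancelAll: *const fn (*Select) void,

    const Branch = struct {
        runnable: Runnable = .{ .runFn = run },
        select: *Select,
        body: *const fn (*Branch) void,

        fn run(runnable: *Runnable) void {
            const branch: *Branch = @fieldParentPtr("runnable", runnable);
            branch.body(branch); // body should bail out early once cancelled
            if (!branch.select.done.swap(true, .acq_rel)) {
                branch.select.cancelAll(branch.select); // we won; cancel the rest
            }
        }
    };
};
```

Calling `runAllFn` with the branches' `&runnable` pointers then only returns once every branch body has finished (the losers having been cancelled), and all of the state, branches included, lived in the caller's stack frame.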
There's no `io.async`: differentiating the "asynchrony" (as it's being called now) is useful logic-wise, but I was confused about how an impl could take advantage of it. In pretty much all cases, it would try to `asyncConcurrent` first, then fall back to serial on `OutOfMemory`. For single-threaded blocking, the concurrent path could simply always fail and hit the serial path. With `runAll` now handling the fork/join cases more efficiently, it seems like an "asynchrony" API could be kept above the interface (maybe as a helper) rather than go through it.
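Something like this helper is what I mean (hypothetical names & error set, and a deliberately crude spawn signature just for the sketch):

```zig
/// Handle to a concurrently running function; opaque to the caller.
const Spawned = opaque {};

// Minimal stand-in for the spawn/join half of the Task interface.
const Task = struct {
    spawnFn: *const fn (*Task, *const fn (*anyopaque) void, *anyopaque) error{OutOfMemory}!*Spawned,
    joinFn: *const fn (*Task, *Spawned) void,
};

/// "Asynchrony" kept above the interface: prefer a concurrent spawn, but if
/// the impl can't provide one (a single-threaded blocking impl can simply
/// always fail), run the function right here and make finish() a no-op.
const Async = struct {
    spawned: ?*Spawned,

    pub fn start(task: *Task, f: *const fn (*anyopaque) void, ctx: *anyopaque) Async {
        if (task.spawnFn(task, f, ctx)) |handle| {
            return .{ .spawned = handle };
        } else |_| {
            f(ctx); // serial fallback: run eagerly on the caller's task
            return .{ .spawned = null };
        }
    }

    pub fn finish(self: Async, task: *Task) void {
        if (self.spawned) |handle| task.joinFn(task, handle);
    }
};
```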
I also thought about what to do for I/O: the argument for merging it with the coroutine stuff seems to be being able to block on it efficiently. For example, how would a separate I/O instance poll in a single-threaded event-loop model? One answer is to have the Io/Task interface expose hooks for various internal events, in particular `on_thread_park` as seen in Rust's tokio. This runs per thread when there are no ready tasks & would allow a separate I/O thing to poll for parked Suspendables before the coroutine impl (regardless of single-thread or multi-thread) goes to sleep, which is what a combined impl would do anyways.
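Very roughly (the name is borrowed from tokio's `on_thread_park`; the Zig shape here is just illustrative):

```zig
/// Hook registered by a separate I/O implementation.
const ParkHook = struct {
    /// Called on a worker thread that has run out of ready tasks, right
    /// before it would go to sleep. `timeout_ns` is how long the worker
    /// intends to sleep; the I/O impl can poll (epoll/kqueue/IOCP/...) for
    /// up to that long and unpark any tasks whose operations completed.
    onThreadPark: *const fn (hook: *ParkHook, timeout_ns: ?u64) void,
};

const Task = struct {
    /// Lets an external I/O impl get a chance to poll before the scheduler
    /// blocks, which is what a combined coroutine+I/O impl would do anyway.
    setParkHookFn: *const fn (*Task, *ParkHook) void,
};
```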
For networking, I think it could standardize on `sendmmsg`/`recvmmsg` for sockets (covers TCP + UDP, along with batching both the I/O calls and the packets) and `poll_readable`/`poll_writeable` for things like pipes or other misc devices where the non-blocking I/O itself could be specialized.
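As a strawman for what that surface could look like (types & error sets purely illustrative, and per the next paragraph it's an open question whether the raw `fd_t`/`socket_t` should even be visible here):

```zig
const std = @import("std");

const Io = struct {
    /// Batched socket I/O in the spirit of sendmmsg/recvmmsg: one call
    /// submits many messages (each with its own iovec buffers) and reports
    /// how many completed. Covers both TCP streams and UDP packets.
    sendmmsgFn: *const fn (io: *Io, socket: std.posix.socket_t, msgs: []Message) Error!usize,
    recvmmsgFn: *const fn (io: *Io, socket: std.posix.socket_t, msgs: []Message) Error!usize,

    /// For pipes & other misc handles: just block until the handle is ready,
    /// and let the caller do its own specialized non-blocking I/O.
    pollReadableFn: *const fn (io: *Io, fd: std.posix.fd_t) Error!void,
    pollWritableFn: *const fn (io: *Io, fd: std.posix.fd_t) Error!void,

    pub const Message = struct {
        buffers: []std.posix.iovec,
        addr: ?*std.posix.sockaddr = null, // for UDP; null for connected sockets
        transferred: usize = 0, // filled in per message on completion
    };

    pub const Error = error{ Canceled, Unexpected };
};
```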
For files (& sockets), I'm still unsure if the OS handle should be exposable, to allow for virtual fs/net stubbing. There's also the issue that an efficient epoll/kqueue impl would use edge-triggering, which requires pinning state for the lifetime of a handle, & the current API doesn't expose a place to put that state.
So, anything worthwhile in this braindump?