Feature request draft: Integrate Apple's Foveated Streaming framework into ALVR for eye-tracked foveated encoding on Vision Pro

Apple Foveated Streaming framework support

Apple dropped the Foveated Streaming framework today (visionOS 26.4 beta) along with a reference implementation. It's a first-party API that gives streaming apps access to gaze-directed foveation data — the thing that's been blocked by Apple until now.

Directly relevant to #20, #133, and #157. As @shinyquagsire23 noted in #133: "Eye tracking is an Apple limitation." This framework is Apple's answer.

What the framework gives you

The framework is built around a session management protocol that is independent of the video transport (a minimal discovery and state sketch follows this list):

  • mDNS discovery via _apple-foveated-streaming._tcp
  • TCP + JSON messaging for connection lifecycle and QR code pairing
  • Session states: WAITING → CONNECTING → CONNECTED → PAUSED → DISCONNECTED
  • Auto pause/resume on headset removal
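
For orientation, here is a minimal sketch of the discovery and lifecycle side as seen from a client, assuming the standard Network framework is used for Bonjour browsing. The state names come straight from the list above; the enum and everything else around it are assumptions about shape, not the framework's actual API:

```swift
import Network

// Lifecycle states named by the protocol; how the framework actually exposes
// them (enum, delegate callbacks, etc.) is an assumption here.
enum StreamingSessionState {
    case waiting, connecting, connected, paused, disconnected
}

// Browse for endpoints advertising the documented Bonjour service type.
let browser = NWBrowser(
    for: .bonjour(type: "_apple-foveated-streaming._tcp", domain: nil),
    using: .tcp
)

browser.browseResultsChangedHandler = { results, _ in
    for result in results {
        // Each result is a discovered streaming endpoint; the TCP + JSON
        // handshake and QR-code pairing would continue from here.
        print("Discovered endpoint: \(result.endpoint)")
    }
}
browser.start(queue: .main)
```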

On the visionOS side:

  • FoveatedStreamingSession manages the connection to the streaming endpoint
  • FoveatedStreamingSpace is a new ImmersiveSpace variant with native foveation support
  • Bidirectional message channel for custom data exchange between client and endpoint (see the payload sketch after this list)
  • Supports progressive and mixed immersion styles
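
That message channel is the natural home for ALVR-specific data the foveation protocol doesn't carry itself. As a rough sketch (the payload name and fields below are made up for illustration; they are not part of Apple's protocol or of ALVR today), the client could define small Codable payloads and ship them as JSON:

```swift
import Foundation

// Hypothetical ALVR-specific payload for the framework's custom message
// channel. Field names and units are illustrative, not a defined schema.
struct GazeRegionUpdate: Codable {
    let centerU: Float      // normalized horizontal gaze center, 0...1
    let centerV: Float      // normalized vertical gaze center, 0...1
    let timestampNs: UInt64 // client timestamp for latency accounting
}

let update = GazeRegionUpdate(
    centerU: 0.62,
    centerV: 0.48,
    timestampNs: DispatchTime.now().uptimeNanoseconds
)

// Encode to JSON; the resulting Data would be handed to the session's
// message channel (that API isn't shown here, since its exact shape isn't
// documented in this draft).
let payload = try! JSONEncoder().encode(update)
```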

The endpoint receives approximate gaze region data and renders high-res content only in that region, reducing bandwidth and compute while improving perceived quality.

How this maps to ALVR's current architecture

I spent some time reading through the alvr-visionos code. Here's where things stand and what would need to change:

Where ALVR is now:

  • FFR.swift does fixed foveated rendering with static center coordinates from the server config (centerSizeX/Y, centerShiftX/Y, edgeRatioX/Y); the fovea is always at the center of the FOV (parameters sketched after this list).
  • RealityKitEyeTrackingSystem.swift is the current side-channel workaround: screen-recording broadcast extensions plus CFNotificationCenter shift registers to recover an approximate eye position. It requires the RealityKit renderer and has limited precision.
  • EventHandler.swift handles mDNS/Bonjour discovery and connection management.
  • ALVRClientApp.swift defines three immersive spaces (DummyImmersiveSpace, RealityKitClient, MetalClient) via CompositorLayer / CompositorServices.
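
For reference, the foveation parameters named above boil down to something like the following. This is a simplified stand-in for the config that FFR.swift consumes; the field names mirror the keys above, while the types and example values are assumptions:

```swift
// Simplified sketch of the server-supplied foveation settings used by FFR.swift.
struct FoveationSettings {
    var centerSizeX: Float   // width of the full-resolution region, as a fraction of the frame
    var centerSizeY: Float   // height of the full-resolution region
    var centerShiftX: Float  // horizontal offset of that region; 0 = centered
    var centerShiftY: Float  // vertical offset of that region; 0 = centered
    var edgeRatioX: Float    // horizontal compression ratio outside the center
    var edgeRatioY: Float    // vertical compression ratio outside the center
}

// Today these values are static for the whole session, so the high-resolution
// region stays pinned to the middle of the FOV regardless of where the user looks.
let staticConfig = FoveationSettings(
    centerSizeX: 0.4, centerSizeY: 0.4,
    centerShiftX: 0, centerShiftY: 0,
    edgeRatioX: 4, edgeRatioY: 4
)
```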

Proposed integration (hybrid approach — keeps ALVR's transport):

  1. Add a FoveatedStreamingSpace alongside the existing immersive spaces in ALVRClientApp.swift as a new rendering mode.

  2. Implement the session management protocol on the streamer side. It's straightforward TCP + JSON, similar to what ALVR already does for its own connection management. The mDNS service type changes to _apple-foveated-streaming._tcp and the handshake adds QR code pairing.

  3. Use Apple's gaze region data to drive FFR.swift: dynamically update centerShiftX/Y from the framework's gaze callbacks instead of using static values (see the sketch after this list). This replaces the RealityKitEyeTrackingSystem workaround entirely.

  4. Keep ALVR's video transport (H.264/HEVC/AV1 pipeline). The framework uses NVIDIA CloudXR as its reference transport, but the session management layer is independent. ALVR can implement the session protocol for foveation data while keeping its own streaming pipeline.
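
A minimal sketch of step 3, reusing the FoveationSettings shape from the earlier sketch. The gaze callback and its payload are assumptions about how the framework delivers updates; the mapping itself only shows where dynamic values would replace the static config:

```swift
// Convert a normalized gaze position (0...1 in frame coordinates, 0.5 = center)
// into updated centerShift values. The exact mapping and clamping depend on how
// ALVR defines centerShift, so treat this as illustrative.
func foveationSettings(forGazeU u: Float, gazeV v: Float,
                       base: FoveationSettings) -> FoveationSettings {
    var updated = base
    // Shift the high-resolution region toward the gaze point instead of
    // leaving it pinned to the center of the FOV.
    updated.centerShiftX = min(max((u - 0.5) * 2, -1), 1)
    updated.centerShiftY = min(max((v - 0.5) * 2, -1), 1)
    return updated
}

// Hypothetical wiring: whatever callback the framework provides for gaze
// region updates would feed the function above, and the result would be
// pushed to the renderer and forwarded to the server-side encoder (step 4).
//
// session.onGazeRegionUpdate { region in
//     let settings = foveationSettings(forGazeU: region.centerU,
//                                      gazeV: region.centerV,
//                                      base: staticConfig)
//     applyFoveation(settings) // hypothetical helper: update FFR.swift + notify streamer
// }
```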

There's also the option of going full CloudXR, but that would drop ALVR's cross-platform support and require NVIDIA GPUs. Probably only interesting as a niche mode for users with the right hardware.

Client-side changes (alvr-visionos):

  • New immersive space type wrapping FoveatedStreamingSpace
  • FoveatedStreamingSession connection management alongside existing ALVR connection
  • Deprecate RealityKitEyeTrackingSystem.swift broadcast extension workaround
  • Forward gaze region data to server via ALVR's existing data channel

Server-side changes (ALVR streamer):

  • Implement Apple's session management protocol (mDNS + TCP/JSON)
  • Accept dynamic foveation center coordinates from the client
  • Update encoder foveation parameters per-frame based on gaze data

Worth noting: Apple's sample repo also includes a StreamingSession.xcframework for iOS, so this could eventually enable ALVR streaming to iPhones/iPads too.


Beta status

The framework is on the visionOS 26.4 developer beta, so APIs could still change before release. It probably makes sense to develop this on a feature branch and gate it behind a setting in GlobalSettings, following the same pattern as experimental40ppd, enablePersonaFaceTracking, etc. That way it can ship disabled by default and be enabled once the framework goes stable.
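
Following that pattern, the gate could be a single boolean added next to the existing experimental toggles. The struct below is a simplified stand-in for the real GlobalSettings, and the new flag name is a placeholder:

```swift
// Simplified stand-in for ALVR's GlobalSettings; only the relevant flags shown.
struct GlobalSettings {
    var experimental40ppd: Bool = false
    var enablePersonaFaceTracking: Bool = false
    // New gate for the Foveated Streaming integration; ships disabled so
    // current behavior is unchanged until visionOS 26.4 goes stable.
    var enableAppleFoveatedStreaming: Bool = false
}
```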

The session protocol itself is fully documented with complete message formats and is implementable independently of CloudXR. Given how long eye-tracked foveation has been blocked (#20 is from Feb 2024), it seems worth starting the work now even if it can't ship until visionOS 26.4 goes GA.
