@mxschmitt
Last active May 22, 2026 18:01
DCV Cursor Capture Investigation — train/inference mismatch + migration plan to GPU instances

DCV Cursor Capture Investigation

RECOMMENDED PATH FORWARD: Patch ffmpeg gdigrab (PROVEN WORKING)

The Fix

Patch ffmpeg's libavdevice/gdigrab.c:paint_mouse_pointer() to accept CURSOR_SUPPRESSED and use GetCursor() + AttachThreadInput() for the cursor handle. This is proven working — we built and tested a patched ffmpeg on a headless EC2 instance (2026-05-22).

Results:

  • Baseline (unpatched): ffmpeg gdigrab -draw_mouse 1 → 131KB video, NO cursor visible
  • Patched: same command → 160KB video, CURSOR VISIBLE and moving

The Patch (~15 lines in libavdevice/gdigrab.c)

// Line 495-496: Replace strict CURSOR_SHOWING check
// BEFORE:
if (ci.flags != CURSOR_SHOWING)
    return;

// AFTER:
if (ci.flags == 0)  // flags=0 = truly hidden, respect that
    return;
// flags=1 (SHOWING) or flags=2 (SUPPRESSED): both attempt cursor draw

// Lines 498-503: Enhanced cursor handle fallback
// BEFORE:
if (!icon) {
    icon = CopyCursor(LoadCursor(NULL, IDC_ARROW));
}

// AFTER:
if (!icon) {
    /* CURSOR_SUPPRESSED: hCursor is NULL. Use GetCursor() via
     * thread input attachment to get real cursor shape. */
    HWND fg = GetForegroundWindow();
    if (fg) {
        DWORD tid = GetWindowThreadProcessId(fg, NULL);
        DWORD my_tid = GetCurrentThreadId();
        if (tid && tid != my_tid) {
            AttachThreadInput(my_tid, tid, TRUE);
            icon = CopyCursor(GetCursor());
            AttachThreadInput(my_tid, tid, FALSE);
        }
    }
    if (!icon)
        icon = CopyCursor(LoadCursor(NULL, IDC_ARROW));
}

Why It Works

  • GetCursorInfo still provides valid position (ci.ptScreenPos) even when SUPPRESSED
  • GetCursor() + AttachThreadInput() returns valid cursor handles with correct shapes (I-beam, hand, resize) even when GetCursorInfo.hCursor is NULL
  • DrawIcon() (existing ffmpeg code, unchanged) composites the cursor at the correct position
  • No performance overhead — AttachThreadInput is ~0 cost
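The effect of the changed flag check can be sketched in Python. This is only a model of the two C conditions shown in the patch, not ffmpeg code. The flag values are the ones from winuser.h; the function names and CURSOR_HIDDEN are illustrative (the header defines no constant for 0).

```python
# CURSORINFO.flags values from winuser.h.
CURSOR_HIDDEN = 0x0      # illustrative name only; winuser.h has no constant for 0
CURSOR_SHOWING = 0x1
CURSOR_SUPPRESSED = 0x2


def draws_cursor_unpatched(flags: int) -> bool:
    """Model of the original check: draw only when flags == CURSOR_SHOWING."""
    return flags == CURSOR_SHOWING


def draws_cursor_patched(flags: int) -> bool:
    """Model of the patched check: skip only when the cursor is truly hidden."""
    return flags != 0


if __name__ == "__main__":
    for flags in (CURSOR_HIDDEN, CURSOR_SHOWING, CURSOR_SUPPRESSED):
        print(flags, draws_cursor_unpatched(flags), draws_cursor_patched(flags))
```

The only row that changes between the two functions is flags=2, which is the headless EC2 case.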

Upstream Status

  • No existing issue or PR in ffmpeg for CURSOR_SUPPRESSED on headless VMs
  • Last cursor-related change to gdigrab.c was 2019 (HiDPI fix)
  • The ddagrab filter (DXGI) has the same problem — uses PointerPosition.Visible which is also FALSE on headless
  • This is an industry-wide gap: Google's WebRTC, python-mss, OBS all skip cursor when SUPPRESSED

Also Proven Working: python-mss patch (~85 lines Python)

Tested and working on the same headless EC2 instance (2026-05-22).

Results:

  • mss.MSS(with_cursor=True).shot() → 105,448 bytes (cursor visible at (400,300))
  • mss.MSS(with_cursor=False).shot() → 105,236 bytes (no cursor)
  • Difference: 212 bytes = cursor pixels composited by mss's built-in _merge()

Changes required (3 files, ~85 lines total):

  1. mss/base.py (1 line) — allow with_cursor on Windows:
("with_cursor", with_cursor, ["Linux", "Windows"]),
  2. mss/windows/gdi.py __init__ (2 lines) — accept kwargs:
def __init__(self, **kwargs) -> None:
    super().__init__(**kwargs)
  3. mss/windows/gdi.py cursor() (~80 lines) — replace no-op with:
def cursor(self):
    # GetCursorPos() for position (works when SUPPRESSED)
    # AttachThreadInput() + GetCursor() for real cursor handle
    # DrawIconEx() to render cursor into 32x32 BGRA bitmap
    # Alpha recovery (white background = transparent)
    # Return ScreenShot with position for mss's built-in _merge()

The key insight: mss already has the _merge() compositor infrastructure (used by Linux/Xlib backend). The Windows backend just needed a working cursor() implementation that doesn't rely on GetCursorInfo.hCursor (which is NULL when SUPPRESSED).
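The "alpha recovery (white background = transparent)" step in the outline above could look roughly like this. It is a sketch under assumptions: it takes a raw BGRA buffer that was rendered over a white background, and the function name is hypothetical, not part of mss or of the actual patch.

```python
def recover_alpha_white_bg(bgra: bytes) -> bytes:
    """Return a copy of a BGRA buffer with pure-white pixels made transparent.

    Assumes the cursor was drawn onto a white background, so any pixel that
    is still pure white is treated as background. All other pixels are made
    fully opaque.
    """
    if len(bgra) % 4:
        raise ValueError("BGRA buffer length must be a multiple of 4")
    out = bytearray(bgra)
    for i in range(0, len(out), 4):
        is_background = out[i] == 255 and out[i + 1] == 255 and out[i + 2] == 255
        out[i + 3] = 0 if is_background else 255
    return bytes(out)


if __name__ == "__main__":
    # One white (background) pixel followed by one black (cursor) pixel.
    sample = bytes([255, 255, 255, 255, 0, 0, 0, 0])
    print(list(recover_alpha_white_bg(sample)))
```

One known weakness of this heuristic: a cursor that itself contains pure-white pixels loses them. Rendering twice (once over black, once over white) and comparing would avoid that, but that is a different technique from the one the outline describes.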

Upstream contribution path: mss PR #272 uses GetCursorInfo (same limitation). Our approach using GetCursor() + AttachThreadInput() is strictly better — could be contributed as an improved version of that PR.

Other applicable targets

  • ffmpeg ddagrab — could add GetCursor() fallback when PointerPosition.Visible=FALSE
  • TREC SDK (trajectory-recorder) — currently uses draw_mouse=0 and captures cursor separately. With patched ffmpeg, just change to draw_mouse=1.
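For reference, a gdigrab invocation with cursor drawing enabled might be assembled like this. The sketch assumes that `ffmpeg` on PATH is the patched build; the helper name, frame rate, and output file are placeholders and do not reflect the TREC SDK's actual configuration. `-f gdigrab`, `-framerate`, `-draw_mouse`, and `-i desktop` are standard gdigrab options.

```python
def gdigrab_command(output: str, framerate: int = 30, draw_mouse: bool = True) -> list:
    """Build an ffmpeg argv that records the desktop via gdigrab.

    Input options such as -draw_mouse must come before -i.
    """
    return [
        "ffmpeg", "-y",
        "-f", "gdigrab",
        "-framerate", str(framerate),
        "-draw_mouse", "1" if draw_mouse else "0",
        "-i", "desktop",
        output,
    ]


if __name__ == "__main__":
    print(" ".join(gdigrab_command("capture.mp4")))
```

Passing the list to `subprocess.run` on the Windows instance would start the recording. Setting `draw_mouse=False` reproduces the current draw_mouse=0 behaviour.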

Other solutions (lower priority now that ffmpeg patch is proven):

  1. Improve CursorPainter — Use GetCursor() + AttachThreadInput() in packages/client/src/client/cursor_overlay.py. Pure Python, no binary to ship. Good for the mss pipeline.

  2. Windows.Graphics.Capture IsCursorCaptureEnabled=true (UNTESTED) — DWM-level composite. May work independently. Needs C++ test program.

  3. QEMU screendump from outside — Capture from Linux host. Cursor IS included. Different architecture.


dockur/windows: Valuable for Win11 Fidelity, Does NOT Fix Cursor

dockur/windows runs real Windows 11 inside QEMU/KVM in Docker. Valuable for desktop fidelity (Start menu, Store, modern apps) but does NOT solve cursor capture.

Tested extensively (2026-05-21/22) on c5.metal:

| VGA Mode | GetCursorInfo flags | Cursor in mss? | Cursor in QEMU screendump/VNC? |
| --- | --- | --- | --- |
| -vga virtio (default) | flags=0 (HIDDEN) | No | Yes (overlay) |
| -vga cirrus | flags=1 (SHOWING!) | No | Yes (overlay) |
| -vga cirrus + disabled driver | flags=1 (SHOWING!) | No | Yes (overlay) |

Key finding: Even with GetCursorInfo returning flags=1 (SHOWING) on Cirrus VGA, the cursor is NOT in mss/GDI captures. QEMU's cirrus_cursor_draw_line composites into QEMU's internal DisplaySurface (what VNC/screendump reads), NOT into the guest-visible VRAM (what Windows GDI BitBlt / mss reads). These are two separate memory regions.

Why WindowsAgentArena doesn't actually solve it either: Their ShowCursor(True) + SendInput() hack doesn't put cursor into GDI captures. Their actual capture methods are: (1) QEMU screendump from the hypervisor level (includes overlay), (2) deprecated cursor.png manual compositing (same as our CursorPainter).

Industry validation: This is an industry-wide unsolved problem. Google's WebRTC (mouse_cursor_monitor_win.cc) sends an empty bitmap when CURSOR_SUPPRESSED. No one has native cursor capture on headless VMs.


CRITICAL FINDING (2026-05-20): GPU instances also suppress cursor

Tested on g4dn.xlarge (NVIDIA Tesla T4, driver v32.0.15.9636) — GetCursorInfo still returns flags=2 (CURSOR_SUPPRESSED).

The cursor suppression is NOT DCV-specific. It is a property of any display without a physical monitor connected — including GPU instances. The WDDM driver creates a display output (1280x800 on T4), but the hardware cursor plane remains inactive without something physically consuming the signal.

Test instance: i-0c884ac115dd3be55 (g4dn.xlarge, Windows Server 2025, auto-logon Administrator active)

  • Display adapter: NVIDIA Tesla T4 (GRID driver 596.36)
  • SetCursorPos(500, 500) — works, position is tracked
  • ShowCursor(True) × 5 — no effect on flags
  • GetCursorInfo → flags=2, hCursor=None, pos=(500,500)

This invalidates the "just switch to GPU" hypothesis. The migration plan in this doc (Option 2) needs revision.
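A probe along the following lines reproduces the readings above. It is a sketch, not the script we ran: the structure layout follows the documented CURSORINFO definition, while the helper names and the "HIDDEN" label for flags=0 are illustrative. Off Windows the probe simply returns None.

```python
import ctypes
import sys

CURSOR_SHOWING = 0x1
CURSOR_SUPPRESSED = 0x2


class POINT(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]


class CURSORINFO(ctypes.Structure):
    _fields_ = [
        ("cbSize", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
        ("hCursor", ctypes.c_void_p),
        ("ptScreenPos", POINT),
    ]


def describe_flags(flags: int) -> str:
    """Map CURSORINFO.flags to the labels used in this document."""
    if flags == 0:
        return "HIDDEN"
    if flags & CURSOR_SUPPRESSED:
        return "SUPPRESSED"
    if flags & CURSOR_SHOWING:
        return "SHOWING"
    return "UNKNOWN({})".format(flags)


def probe_cursor():
    """Return (flags, hCursor, (x, y)), or None off Windows or on failure."""
    if sys.platform != "win32":
        return None
    info = CURSORINFO()
    info.cbSize = ctypes.sizeof(CURSORINFO)  # required before calling GetCursorInfo
    if not ctypes.windll.user32.GetCursorInfo(ctypes.byref(info)):
        return None
    return info.flags, info.hCursor, (info.ptScreenPos.x, info.ptScreenPos.y)


if __name__ == "__main__":
    result = probe_cursor()
    if result is None:
        print("GetCursorInfo unavailable or failed")
    else:
        flags, hcursor, pos = result
        print(describe_flags(flags), hcursor, pos)
```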

What does this mean?

The cursor is suppressed on ANY headless EC2 instance regardless of display driver:

  • DCV IDD driver → CURSOR_SUPPRESSED
  • NVIDIA GRID/T4 driver → CURSOR_SUPPRESSED
  • Microsoft Basic Display Adapter → GetCursorInfo returns False entirely

The only known way to get flags=1 (CURSOR_SHOWING) is to have something actively consuming the display output as a "monitor" — e.g., a physical HDMI dongle, a connected DCV/RDP viewer, or potentially a custom IDD that claims to be a monitor.

Things we tried on g4dn.xlarge that DID NOT fix cursor

| Attempt | Result |
| --- | --- |
| Install NVIDIA GRID driver (596.36) | Adapter present (T4 @ 1280x800), cursor still SUPPRESSED |
| Auto-logon + interactive Administrator session | Session active (confirmed via query session), cursor still SUPPRESSED |
| SetCursorPos(500, 500) | Position IS tracked correctly, but flags remain 2 |
| ShowCursor(True) × 5 | No effect on flags |
| ConnectedMonitor=DFP-0 registry hack (tells NVIDIA a DFP is attached) | No effect: cursor still SUPPRESSED after reboot |

What options remain?

Option A: Improve CursorPainter to get real cursor shapes (RECOMMENDED — lowest effort, highest impact)

What works today: CursorPainter composites cursor position into frames. Position is accurate even when SUPPRESSED.

What's broken: When flags=2, hCursor from GetCursorInfo is NULL, so we only composite the arrow fallback.

Fix: Use GetCursor() (user32.dll), which returns the cursor handle of the calling thread's message queue. Or better, use GetClassLongPtr(hwnd, GCLP_HCURSOR) or WM_SETCURSOR tracking to get the cursor that the foreground app has set. This gives us:

  • I-beam over text fields
  • Hand over links
  • Resize arrows on window edges
  • Busy/wait spinners
  • App-custom cursors

Approach:

# In the capture thread, before each frame grab:
import ctypes

user32 = ctypes.windll.user32
current_tid = ctypes.windll.kernel32.GetCurrentThreadId()
hCursor = user32.GetCursor()  # returns current cursor for the calling thread
# OR: attach to foreground thread to get ITS cursor
foreground_hwnd = user32.GetForegroundWindow()
tid = user32.GetWindowThreadProcessId(foreground_hwnd, None)
# Attach only to a different, valid thread; always detach afterwards
if tid and tid != current_tid and user32.AttachThreadInput(current_tid, tid, True):
    try:
        hCursor = user32.GetCursor()
    finally:
        user32.AttachThreadInput(current_tid, tid, False)

Tradeoff: Still compositing (not native in framebuffer), but with CORRECT shapes. Sub-pixel positioning difference remains but is imperceptible at 1920x1080.

Effort: ~1-2 days. Modify packages/client/src/client/cursor_overlay.py to try GetCursor() / AttachThreadInput before falling back to the IDC_ARROW stub.

Option B: Keep a lightweight viewer connected to activate cursor plane

What it does: A connected RDP or DCV viewer activates the hardware cursor plane → GetCursorInfo returns flags=1 with valid hCursor.

Current implementation: start_dcv_viewer() in apps/eval-script/src/eval_script/run_remote.py launches headless Puppeteer → DCV web client → authenticates → maintains WebSocket connection.

Problem: Heavyweight (needs Node, Puppeteer, Chrome, Xvfb on Linux), fragile (Puppeteer timeouts, DCV auth failures), adds 15-20s startup per session.

Lighter alternatives:

  • Minimal RDP client: xfreerdp /v:host /u:Administrator /p:pass /cert:ignore /headless or similar. RDP activates the cursor plane same as DCV.
  • FreeRDP without display — connects, authenticates, doesn't render pixels. Just enough to activate cursor.
  • Custom WebSocket DCV handshake — only needs the DCV auth protocol, not full video decode. Could be 50 lines of Python with websockets.

Tradeoff: Adds a dependency (viewer process must stay alive during capture). If viewer dies, cursor reverts to SUPPRESSED. But gives the "true" native cursor in framebuffer.
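Because the cursor reverts to SUPPRESSED when the viewer dies, the capture loop would need some form of supervision. A minimal sketch, assuming the viewer is an ordinary child process; `ensure_viewer` is a hypothetical helper and the command used in the demo is a stand-in, not a real viewer invocation.

```python
import subprocess
import sys


def ensure_viewer(proc, command):
    """Return a live viewer process, starting a new one if the old one exited."""
    if proc is not None and proc.poll() is None:
        return proc  # still running, nothing to do
    return subprocess.Popen(command)


if __name__ == "__main__":
    # Placeholder: a short-lived Python process stands in for the viewer.
    viewer_cmd = [sys.executable, "-c", "import time; time.sleep(1)"]
    viewer = ensure_viewer(None, viewer_cmd)
    viewer.wait()                                # simulate the viewer dying
    viewer = ensure_viewer(viewer, viewer_cmd)   # a fresh process is started
    viewer.wait()
```

Calling `ensure_viewer` once per capture iteration keeps the check cheap. It does not cover the window between a viewer crash and the restart, during which frames would again be captured without a cursor.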

Effort: ~2-3 days if switching to FreeRDP. 0 days if keeping existing start_dcv_viewer() as-is.

Option C: Custom IDD (Indirect Display Driver) that renders cursor into framebuffer

What it does: A custom Windows driver that creates a virtual monitor AND composites the cursor into its framebuffer (instead of relying on the hardware cursor plane).

How: Microsoft's IddSampleDriver (on GitHub) creates virtual displays. Extend it to call GetCursorInfo in its frame-render callback and alpha-blend the cursor sprite.

Tradeoff:

  • Most "correct" solution — cursor IS in the framebuffer natively
  • But: requires building/signing a Windows kernel driver, deploying it to AMIs, maintaining it across Windows updates
  • Security review would be intense for a custom kernel driver on production machines

Effort: ~1-2 weeks for a prototype, ~1 month production-ready with signing and testing.

Option D: Vendor with physical machines (Daytona, Mac mini farm, etc.)

What it does: Real hardware with physical displays → cursor just works natively.

Status: Daytona access pending. They claim Windows sandbox support.

Tradeoff: External dependency, cost unknown, may not meet scale requirements (300 concurrent). But eliminates ALL mismatches (cursor, DPI, font rendering, etc.)

Effort: 0 engineering effort if it works. Unknown timeline for access.

Option E: RDP/VNC loopback (connect from the same machine to itself)

What it does: Run an RDP or VNC server + a headless client on the same instance, looping back to localhost. The server sees a "connected viewer" and activates the cursor plane.

How: mstsc /v:localhost or a headless RDP client connecting to 127.0.0.1:3389. Alternatively, TightVNC server + viewer both on the same box.

Tradeoff: Simpler than cross-machine viewer. RDP is already available on Server 2025 (we just disabled it). Re-enabling it for localhost-only + connecting a headless client might be the lightest-weight approach.

Effort: ~0.5-1 day to test. If RDP loopback activates the cursor → minimal code change.


AWS Infrastructure Alternatives Comparison

Three options investigated for getting native cursor in ffmpeg video output:

1. EC2 + RDP Loopback to localhost

How: Enable RDP on the instance, connect a headless FreeRDP client to 127.0.0.1:3389 from within the same instance. The RDP server activates the display/cursor.

Cursor: YES — RDP session has an active display stack with cursor rendering in the framebuffer.

But: On non-Server Windows (Win11), RDP takes over session 1 (only one session allowed). It doesn't create a separate session 2. Connecting RDP from within the console session is circular — it reconnects to the same session or gets blocked.

On Windows Server (which we use today): RDSH allows multiple sessions, so you could create a session 2 via RDP while session 1 is the console. But then which session does ffmpeg capture? It captures the console (session 1) which is now DISCONNECTED.

Performance: RDP encoder on localhost adds 5-15% CPU overhead (H.264 compression even for loopback).

Verdict: Doesn't cleanly work. The session model fights us.

| Aspect | Rating |
| --- | --- |
| Cursor in ffmpeg video? | Probably yes but session routing is complex |
| Closeness to real machine | Medium: RDP introduces its own display driver quirks |
| Complexity to implement | Medium: need to solve session routing |
| Works with existing infra? | Yes: just a script change |
| Windows 11? | Server 2025 (single-session limitation for Win11 BYOL) |

2. Amazon WorkSpaces (Core / Pools)

How: Use WorkSpaces API to programmatically create/terminate desktop instances. WorkSpaces uses DCV internally and creates an always-active console session at boot.

Cursor: UNCLEAR — WorkSpaces uses DCV internally (same IDD driver issue). The display is "active" for the streaming protocol, but GetCursorInfo may still return flags=2 unless a WorkSpaces client is connected. Needs testing.

Key facts:

  • Windows 11 supported via BYOL (Enterprise 22H2, 23H2, 24H2, 25H2, LTSC 2024)
  • Programmatic via CreateWorkspaces / TerminateWorkspaces API
  • Custom images/bundles supported
  • Resume from stopped: <90s
  • Fresh provision: 10-20 min
  • SSM access possible (need to install/enable SSM agent)
  • No native SSH (install OpenSSH yourself)
  • Pricing: AlwaysOn ~$21-35/month base, or AutoStop hourly

Closeness to real machine:

  • Windows 11 ✓ (BYOL)
  • DPI/Scaling: configurable
  • Start menu, Store, modern apps: all present on Win11
  • BUT: still a virtual display (DCV IDD) — same cursor suppression likely

Complexity to migrate:

  • Need to replace EC2 ASG infrastructure with WorkSpaces API calls
  • Different image creation workflow (WorkSpace → Image → Bundle)
  • Worker code needs adaptation (SSM for remote commands, or baked into image)
  • Auto-scaling is manual API (no native CloudWatch policies like AppStream)

Verdict: Attractive for Win11 + fast resume, but likely same cursor problem (DCV underneath). Would need to test if WorkSpaces client connection activates cursor differently than raw DCV.

| Aspect | Rating |
| --- | --- |
| Cursor in ffmpeg video? | UNKNOWN: likely same as DCV (SUPPRESSED) without client |
| Closeness to real machine | HIGH: real Win11, full desktop |
| Complexity to implement | HIGH: rewrite fleet infra for WorkSpaces API |
| Works with existing infra? | No: different API, image model, lifecycle |
| Windows 11? | Yes (BYOL) |

3. Amazon AppStream 2.0

How: Create a fleet of streaming instances. AppStream keeps an always-active desktop with the DCV encoder running.

Cursor: UNCLEAR — same DCV stack underneath. The encoder is always running (consuming frames), which MAY activate the cursor plane since there IS a signal consumer. Needs testing.

Key facts:

  • Windows Server only (2019, 2022) — NO Windows 10/11 support
  • Fleet auto-scaling is excellent (CloudWatch policies, target tracking)
  • Custom images supported
  • No SSH/SSM access to fleet instances — this is a dealbreaker for running automation headlessly
  • Designed for user-facing streaming, not headless automation
  • Elastic fleets: per-second billing, AWS-managed provisioning
  • Always-On fleets: instances running 24/7, ~$0.10/hr (stream.standard.medium)

Closeness to real machine:

  • Windows Server only — no Win11 ✗
  • No direct access for automation ✗
  • Designed for app streaming TO users, not headless capture

Verdict: Not viable. No SSH/SSM access (can't run pywinauto/ffmpeg headlessly), no Windows 11, wrong abstraction level.

| Aspect | Rating |
| --- | --- |
| Cursor in ffmpeg video? | UNKNOWN: probably yes if encoder activates cursor |
| Closeness to real machine | LOW: Server OS only, no direct access |
| Complexity to implement | VERY HIGH: completely different paradigm |
| Works with existing infra? | No: can't SSH in, can't run headless automation |
| Windows 11? | No |

Summary Matrix

| | EC2 + RDP Loopback | WorkSpaces Core | AppStream 2.0 |
| --- | --- | --- | --- |
| Cursor native in video | Likely (needs test) | Unlikely (DCV underneath) | Unknown |
| Windows 11 | BYOL (dedicated host) | BYOL (built-in support) | No |
| Direct SSH/SSM access | Yes | Yes (with setup) | No |
| 300 concurrent | Yes (existing ASG) | Yes (API, but manual scaling) | Yes (auto-scaling) |
| Boot time | 5-10 min | <90s (from stopped) | 1-2 min |
| Cost (per instance/hr) | $0.16-0.68 (spot/OD) | ~$0.30-0.50 | ~$0.10-0.20 |
| Custom images | AMI | Image → Bundle | Image |
| Migration effort | None (script change) | Rewrite fleet infra | Not viable |
| Closeness to real machine | Medium (Server 2025) | High (Win11 BYOL) | Low (Server only) |

Research: How Others Solve This

Root Cause (from Microsoft docs)

CURSOR_SUPPRESSED (flags=2) was introduced in Windows 8. Documented as "system is not drawing the cursor because the user is providing input through touch or pen instead of the mouse." In practice, fires on ANY headless/virtual display where the hardware cursor plane isn't active.

The cursor is NOT part of the desktop framebuffer on modern Windows. The DWM composites it via a separate hardware overlay plane. Screen capture APIs (GDI BitBlt, DXGI Desktop Duplication, mss) return frames WITHOUT cursor. ffmpeg -draw_mouse 1 internally calls GetCursorInfo and skips drawing when CURSOR_SUPPRESSED — so it fails on headless VMs too.

Solutions from the community

1. Virtual Display Driver (VDD) by itsmikethetech — MOST PROMISING

GitHub: github.com/itsmikethetech/Virtual-Display-Driver (MIT license)

Creates virtual monitors via IddCx that Windows treats as real physical displays. The key differentiator: it calls IddCxMonitorSetupHardwareCursor which tells Windows to activate the hardware cursor on its virtual monitor.

  • IddCx hardware cursor mode means Windows should report CURSOR_SHOWING for sessions targeting that display
  • Supports up to 4K, HDR, ARM64, custom EDIDs
  • Signed driver available (no test-signing needed)
  • User-mode install via companion VDC application
  • Known issue: IddCxMonitorSetupHardwareCursor returns STATUS_INVALID_PARAMETER on Windows Server 2019 (GitHub issue #304). Works on Windows 10/11.
  • Headless-server use case explicitly listed

TESTED 2026-05-20 on c6i.xlarge / Windows Server 2025: VDD installs and runs (Status: Started, oem9.inf, <HardwareCursor>true</HardwareCursor>), creates a virtual monitor (DesktopMonitor2 at 800x600), BUT GetCursorInfo still returns flags=2 (CURSOR_SUPPRESSED).

The IddCx hardware cursor flag does NOT fix the CURSOR_SUPPRESSED state on Server 2025. The GitHub issue #304 (fails on Server 2019) appears to also affect Server 2025 — IddCxMonitorSetupHardwareCursor may succeed without error but the OS still doesn't activate the cursor plane for screen capture APIs.

This option is ELIMINATED for our use case.

2. Parsec VDD (nomi-san/parsec-vdd) — Does NOT solve cursor

Uses Parsec's IddCx driver for virtual displays. Explicitly does NOT support hardware cursor (marked with ✗ in comparison table). Cursor remains suppressed.

3. Sunshine game streaming (LizardByte) — Composites via GPU shaders

Uses DXGI Desktop Duplication's GetFramePointerShape to get cursor data, then composites via DirectX shaders. Same principle as CursorPainter but GPU-accelerated. Does not activate cursor in framebuffer — composites it in the streaming pipeline.

4. DXGI Desktop Duplication API — Provides cursor data separately

IDXGIOutputDuplication::GetFramePointerShape returns cursor bitmap + hotspot alongside each frame. The cursor is always provided separately (never in the captured surface). You must composite it yourself. This is how OBS, Sunshine, and most screen recorders handle cursor on headless VMs.
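"Composite it yourself" comes down to an alpha blend of the cursor bitmap onto the frame at the cursor position, offset by the hotspot. A pure-Python sketch on raw BGRA buffers, for illustration only: the function name and argument layout are assumptions, and a real pipeline would do this with numpy or on the GPU.

```python
def composite_cursor(frame, frame_w, frame_h, cursor, cur_w, cur_h, pos, hotspot=(0, 0)):
    """Alpha-blend a BGRA cursor bitmap onto a BGRA frame (both as bytes).

    pos is the cursor position in frame coordinates. The bitmap's top-left
    corner lands at pos minus hotspot. Cursor pixels that fall outside the
    frame are clipped. Returns a new bytes object; the inputs are untouched.
    """
    out = bytearray(frame)
    left, top = pos[0] - hotspot[0], pos[1] - hotspot[1]
    for cy in range(cur_h):
        fy = top + cy
        if not 0 <= fy < frame_h:
            continue
        for cx in range(cur_w):
            fx = left + cx
            if not 0 <= fx < frame_w:
                continue
            ci = (cy * cur_w + cx) * 4
            fi = (fy * frame_w + fx) * 4
            alpha = cursor[ci + 3]
            for ch in range(3):  # blend B, G, R; leave the frame's alpha as-is
                out[fi + ch] = (cursor[ci + ch] * alpha + out[fi + ch] * (255 - alpha)) // 255
    return bytes(out)


if __name__ == "__main__":
    black_frame = bytes([0, 0, 0, 255]) * 4        # 2x2 opaque black
    white_dot = bytes([255, 255, 255, 255])        # 1x1 opaque white cursor
    print(list(composite_cursor(black_frame, 2, 2, white_dot, 1, 1, (1, 1))))
```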

OBS specifically: checks (ci.flags & CURSOR_SHOWING) == 0 and marks cursor invisible. Treats CURSOR_SUPPRESSED same as hidden. OBS does not handle headless cursor.

5. ffmpeg gdigrab/ddagrab — Fails on CURSOR_SUPPRESSED

-draw_mouse 1 internally uses GetCursorInfo. When flags=2, ffmpeg skips cursor drawing. Does not work headless.

6. HDMI dummy plug — Works but not applicable to cloud

Physical HDMI dongle simulates a monitor's EDID → GPU activates hardware cursor plane → GetCursorInfo returns CURSOR_SHOWING. Trivial solution for physical servers. Not applicable to EC2.

What this means for us

| Approach | Cursor in ffmpeg video natively? | Needs driver install? | Works on Server 2025? |
| --- | --- | --- | --- |
| VDD (itsmikethetech) | NO: tested, still SUPPRESSED on Server 2025 | Yes (signed driver) | ✗ Tested and failed |
| Parsec VDD | No | Yes | N/A |
| DXGI + manual composite | Yes (but requires custom capture) | No | Yes |
| Improve CursorPainter | No (composited in Python, not ffmpeg) | No | Yes |
| RDP loopback | Probably yes | No | Session model issues |
| Sunshine-style GPU composite | Yes (but huge implementation) | No | Yes |

FINAL Revised Recommendation

TESTED AND ELIMINATED:

  • VDD (itsmikethetech) — Installed, driver running, HardwareCursor=true, still CURSOR_SUPPRESSED
  • NVIDIA GRID driver (g4dn.xlarge) — Driver installed, display active, still CURSOR_SUPPRESSED
  • ConnectedMonitor=DFP-0 registry hack — No effect
  • ShowCursor(True) — No effect on flags

REMAINING VIABLE OPTIONS:

Priority 1: Test GetCursor() from UI thread — TESTED, IT WORKS!!! (2026-05-20)

Tested on i-0bda2456beb91d51e (c6i.xlarge, Windows Server 2025, auto-logon Administrator, session 1):

GetCursorInfo: flags=2 hCursor=0 pos=(400,300)     ← SUPPRESSED, no handle
GetCursorPos: (400,300)                              ← position WORKS
GetCursor (own thread): 65543                        ← VALID HANDLE!
AttachThreadInput: True                              ← attached to foreground thread
GetCursor (attached to fg): 65539                    ← REAL CURSOR HANDLE!
LoadCursor(IDC_ARROW): 65539                         ← matches arrow (desktop idle)

GetCursor() returns a valid cursor handle even when GetCursorInfo says SUPPRESSED.

When attached to the foreground window's thread via AttachThreadInput(), GetCursor() returns the real cursor that the application has set:

  • Desktop/Explorer idle → IDC_ARROW (65539)
  • Text field → IDC_IBEAM
  • Link → IDC_HAND
  • Window edge → IDC_SIZENWSE etc.

This is the solution. Combine:

  1. GetCursorPos() → position (works even when SUPPRESSED)
  2. AttachThreadInput(myThread, foregroundThread, TRUE) → attach to UI thread
  3. GetCursor() → real cursor handle with correct shape
  4. DrawIconEx() → composite into frame (existing CursorPainter code)

No infrastructure changes needed. No viewer needed. No driver changes. Pure code fix in packages/client/src/client/cursor_overlay.py.
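One detail the four steps leave implicit: DrawIconEx() draws from the bitmap's top-left corner, so the cursor's hotspot has to be subtracted from the GetCursorPos() position, otherwise non-arrow shapes (I-beam, resize) land off target. A sketch of reading the hotspot with GetIconInfo(), assuming the handle comes from step 3. The helper names are hypothetical and this is not the existing CursorPainter code. Off Windows the lookup returns None.

```python
import ctypes
import sys


class ICONINFO(ctypes.Structure):
    _fields_ = [
        ("fIcon", ctypes.c_int32),
        ("xHotspot", ctypes.c_uint32),
        ("yHotspot", ctypes.c_uint32),
        ("hbmMask", ctypes.c_void_p),
        ("hbmColor", ctypes.c_void_p),
    ]


def draw_origin(pos, hotspot):
    """Top-left corner to draw at so that the hotspot lands exactly on pos."""
    return pos[0] - hotspot[0], pos[1] - hotspot[1]


def cursor_hotspot(hcursor):
    """Return (xHotspot, yHotspot) for a cursor handle, or None on failure."""
    if sys.platform != "win32":
        return None
    user32 = ctypes.windll.user32
    gdi32 = ctypes.windll.gdi32
    user32.GetIconInfo.argtypes = [ctypes.c_void_p, ctypes.POINTER(ICONINFO)]
    gdi32.DeleteObject.argtypes = [ctypes.c_void_p]
    info = ICONINFO()
    if not user32.GetIconInfo(hcursor, ctypes.byref(info)):
        return None
    # GetIconInfo returns bitmap copies that the caller is responsible for freeing.
    for hbm in (info.hbmMask, info.hbmColor):
        if hbm:
            gdi32.DeleteObject(hbm)
    return info.xHotspot, info.yHotspot


if __name__ == "__main__":
    print(draw_origin((400, 300), (8, 4)))
```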

Priority 2: Test RDP loopback — TESTED EXTENSIVELY, CONFIRMED DOESN'T WORK (2026-05-20)

Successfully established an RDP loopback connection from within the VM using mstsc /admin /v:127.0.0.1 (pre-trusted cert + stored credentials). Session 3 (rdp-tcp#0) was created and became Active.

However: GetCursorInfo in the console session (session 1) STILL returned flags=2 (SUPPRESSED).

This definitively proves: cursor plane activation is PER-SESSION, not global. An active RDP session on the same machine does NOT activate the cursor for OTHER sessions. You would need to capture FROM the RDP session itself — meaning ffmpeg would have to run inside session 3, not session 1.

Approaches tested:

  • mstsc /admin /v:127.0.0.1 → session created, console cursor still SUPPRESSED ✗
  • mstsc /v:127.0.0.1 (new session as RdpViewer) → stuck at credential dialog ✗
  • ActiveX MsTscAx COM control → failed to connect ✗
  • TightVNC server running → cursor SUPPRESSED (VNC isn't Windows-native) ✗
  • TightVNC viewer loopback → cursor SUPPRESSED ✗
  • WTSConnectSession API → ACCESS_DENIED ✗
  • LogonUser INTERACTIVE → token created but doesn't activate cursor ✗

This is the definitive answer: you cannot activate the cursor for a given session from within that same VM without being in that specific session's RDP/DCV viewer.

Remaining:

Priority 3: Daytona / physical hardware vendor

  • Only option that gives TRULY native cursor (no compositing at all)
  • Physical monitor = hardware cursor plane active
  • Blocked on vendor access

How Microsoft's WindowsAgentArena Handles This

Repo: github.com/microsoft/WindowsAgentArena

They face the EXACT same problem. From their code:

# fixme: This is a temporary fix for the cursor not being captured on Windows and Linux

Their architecture:

  • Windows 11 runs inside QEMU/KVM (via dockur/windows Docker image), NOT on EC2 directly
  • QEMU provides its own virtual display (VGA/QXL device)
  • They expose a web viewer on port 8006 (noVNC) and RDP on port 3389
  • Screenshots are taken via QEMU QMP screendump command (captures VM framebuffer from outside the guest)

Their cursor workaround (computer.py lines 23-40):

### mouse fix:
# the cursor doesn't show up in screenshots otherwise
user32 = ctypes.WinDLL('user32')
user32.ShowCursor(True)
user32.SendInput(1, ctypes.pointer(x), sizeof(INPUT))  # synthetic mouse move

They call ShowCursor(True) + send a synthetic mouse move event at module load. This is a QEMU-specific hack — in QEMU, ShowCursor(True) combined with mouse activity can make the guest render the cursor into the framebuffer (because QEMU's VGA device handles cursor differently than IddCx/WDDM virtual displays).

Their deprecated fallback (main.py lines 309-334):

  • On Windows: composite a static cursor.png at pyautogui.position() — exactly what our CursorPainter does
  • On Linux: use XFixesGetCursorImage to get the real cursor shape and composite it
  • On macOS: use screencapture -C which includes cursor natively

Key difference from us: QEMU captures the framebuffer from the hypervisor level (QMP screendump), not from within the guest OS. The ShowCursor hack works because QEMU's VGA/QXL virtual device composites the guest cursor into the framebuffer when the cursor display counter is positive.

Could we use QEMU? Theoretically yes — run Windows in QEMU inside an EC2 instance (nested virtualization). The dockur/windows Docker image does exactly this. But:

  • Nested virtualization adds overhead (15-25% CPU, memory overhead for host+guest)
  • EC2 metal instances needed for KVM (or c5.metal/m5.metal at ~$4/hr)
  • OR: EC2 instances support nested virtualization on c5/m5/r5 since 2020 (with linux-kvm on the host)
  • More complex AMI/deployment (Docker + QEMU layer)
  • But: real Windows 11, cursor works via QMP screendump, no viewer needed

CONCLUSION: On any headless EC2 instance (Server 2025, any driver), native cursor in ffmpeg is impossible without a connected viewer. The only code-level fix is improving CursorPainter to get real cursor shapes via GetCursor() + compositing them into frames (either in Python or via a custom ffmpeg filter).

Alternative architecture: Run Windows inside QEMU (like WindowsAgentArena does). QEMU's screendump captures cursor from the hypervisor level. Adds complexity but gives real Win11 + cursor + no viewer needed.


FINDING (2026-05-21): dockur/windows (QEMU/KVM) Also Does NOT Give Cursor in mss/ffmpeg

Tested on c5.metal with dockur/windows Win11, KVM enabled, full desktop session.

Results inside the QEMU Win11 guest:

GetCursorInfo returned: True
flags = 0 (HIDDEN, not SUPPRESSED)
hCursor = 0
pos = (640, 360)

  • flags=0 (different from EC2's flags=2): cursor is HIDDEN (never initialized by the display path), not SUPPRESSED
  • mss screenshot from inside guest: no cursor visible (confirmed visually)
  • ShowCursor(True) called 5 times: flags stays 0
  • SendInput mouse move: flags stays 0

What the QEMU screendump cursor actually is:

The cursor visible in QEMU screendump pixel diffs and in the noVNC web viewer is NOT the Windows framebuffer cursor. It's QEMU's VGA hardware cursor overlay:

Windows kernel (win32k.sys)
  → writes cursor to VGA hardware cursor registers
    → QEMU's virtual VGA device intercepts these writes
      → QEMU sends cursor via VNC protocol to web viewer (client-side composite)
      → QEMU composites cursor into screendump output (hypervisor overlay)
      → BUT: does NOT write cursor into the guest-visible VGA framebuffer RAM
        → mss/GDI/ffmpeg inside guest read framebuffer RAM → no cursor

Why WindowsAgentArena's "ShowCursor hack" doesn't actually solve it:

Their ShowCursor(True) + SendInput() does not put cursor into GDI captures. Their actual capture:

  • QEMU QMP screendump — hypervisor overlay (NOT the guest framebuffer)
  • Deprecated cursor.png compositing — same as our CursorPainter

Possible paths with dockur/QEMU:

  1. Capture via QMP screendump from outside the guest — This DOES include cursor (hypervisor overlay). Record video by taking rapid screendumps from the Linux host instead of ffmpeg inside Windows. Downside: QMP screendump is slow (~200ms per frame), limited to ~5fps.

  2. Use QEMU's -display spice with Spice streaming agent — Spice protocol captures full framebuffer + cursor composited. The Spice streaming agent inside the guest could potentially provide a capture with cursor. Needs investigation.

  3. Read VGA cursor registers from inside the guest — Windows writes cursor bitmap + position to the virtual VGA's hardware cursor registers. A custom driver or direct port I/O could read these and composite the cursor. Deep custom development.

  4. QEMU -cursor show or display options — Some QEMU VGA/display options might force cursor into the framebuffer instead of using hardware cursor overlay. Needs investigation of QEMU display backend options.

  5. Use virtio-gpu instead of VGA — virtio-gpu may handle cursor differently. If it renders cursor into the scanout buffer instead of as a separate plane, captures inside the guest would include it.
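Path 1 (QMP screendump from the host) can be sketched with the standard library alone. QMP is line-delimited JSON: the server sends a greeting, the client sends qmp_capabilities, and after that commands such as screendump are accepted. The socket path and output file below are placeholders, and where dockur/windows exposes its QMP socket (if at all) is an assumption that would need checking.

```python
import json
import socket


def qmp_message(command, **arguments):
    """Serialize one QMP command as a newline-terminated JSON line."""
    msg = {"execute": command}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg).encode() + b"\n"


def max_fps(seconds_per_frame):
    """Upper bound on frame rate when each capture takes seconds_per_frame."""
    return 1.0 / seconds_per_frame


def _await_reply(reader):
    """Read lines until a command reply arrives, skipping asynchronous events."""
    for line in reader:
        reply = json.loads(line)
        if "return" in reply or "error" in reply:
            return reply
    raise ConnectionError("QMP socket closed before a reply arrived")


def qmp_screendump(socket_path, filename):
    """Ask QEMU to write the current display to filename (a path on the host)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        reader = sock.makefile("rb")
        reader.readline()                              # greeting banner
        sock.sendall(qmp_message("qmp_capabilities"))  # leave negotiation mode
        _await_reply(reader)
        sock.sendall(qmp_message("screendump", filename=filename))
        return _await_reply(reader)


if __name__ == "__main__":
    print(qmp_message("screendump", filename="/tmp/frame.ppm"))
    print(max_fps(0.2))  # ~200 ms per screendump
```

At roughly 200 ms per screendump, `max_fps(0.2)` gives the ~5 fps ceiling quoted above, before any time spent reading and encoding the dumped file.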

SOLVED (2026-05-21): -vga cirrus gives CURSOR_SHOWING!

Tested on c5.metal + dockur/windows + VGA=cirrus environment variable:

{
  "GetCursorInfo": true,
  "flags": 1,
  "flags_meaning": "SHOWING",
  "hCursor": 65539,
  "pos": [400, 300]
}

flags=1 (CURSOR_SHOWING) with a valid hCursor handle! The Cirrus VGA emulation in QEMU composites the hardware cursor directly into the display surface via cirrus_cursor_draw_line. At the time we took this to mean the following (the 2026-05-22 follow-up below contradicts the mss claim):

  • ffmpeg gdigrab -draw_mouse 1 works natively (checks GetCursorInfo, sees SHOWING, draws cursor)
  • mss captures include cursor (Cirrus renders cursor into the framebuffer)
  • No compositing needed — cursor is natively in the captured frames

How to use:

docker run -d --name win11 --device /dev/kvm \
  -e VGA=cirrus \
  -e RAM_SIZE=8G -e CPU_CORES=16 \
  -v /shared:/shared \
  dockurr/windows

HOWEVER (further testing 2026-05-22): Despite GetCursorInfo returning flags=1 (SHOWING) with Cirrus VGA, the cursor is still NOT visible in mss/GDI captures. cirrus_cursor_draw_line composites the cursor into QEMU's internal DisplaySurface (what VNC/screendump reads), NOT into the guest-visible VRAM (what Windows GDI BitBlt / mss reads).

Also tested: disabling the Cirrus driver inside the guest to force "Microsoft Basic Display Adapter" (software cursor). Result: still flags=1 but cursor NOT in mss capture.

Conclusion: GetCursorInfo flags=1 is necessary but NOT sufficient. The cursor rendering always goes to QEMU's overlay, never into the guest-mapped VRAM framebuffer that capture APIs read.
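
Because flags=1 turned out not to be sufficient, the only check we can trust is a pixel-level one. A minimal sketch, assuming two captures of a static desktop taken with the cursor parked at two known positions, stored as flat row-major byte buffers (4 bytes per pixel). The function names and the 16-pixel radius are illustrative assumptions, not part of the existing tooling.

```python
def patch(frame, width, height, cx, cy, radius, bpp=4):
    """Return the bytes of a square patch centred on (cx, cy), clipped to the frame."""
    x0, x1 = max(cx - radius, 0), min(cx + radius + 1, width)
    y0, y1 = max(cy - radius, 0), min(cy + radius + 1, height)
    rows = []
    for y in range(y0, y1):
        start = (y * width + x0) * bpp
        rows.append(bytes(frame[start:start + (x1 - x0) * bpp]))
    return b"".join(rows)


def cursor_in_capture(frame_a, frame_b, width, height, pos_a, pos_b, radius=16):
    """True if the area around either cursor position differs between the frames.

    frame_a was captured with the cursor at pos_a, frame_b with it at pos_b.
    On a static desktop, a difference near those points means the cursor
    was actually drawn into the captured pixels.
    """
    for cx, cy in (pos_a, pos_b):
        before = patch(frame_a, width, height, cx, cy, radius)
        after = patch(frame_b, width, height, cx, cy, radius)
        if before != after:
            return True
    return False
```

This only works if nothing else on screen changes between the two captures (clock, animations), so it is a smoke test rather than proof.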

Trade-off: Cirrus VGA is limited to 800x600 resolution. More importantly, even with SHOWING flags, cursor doesn't appear in guest-side captures.

FINAL Conclusion (2026-05-22):

There is NO way to get cursor natively into mss/GDI/ffmpeg captures on any virtualized Windows environment.

Tested exhaustively:

  • -vga cirrus with GetCursorInfo flags=1 (SHOWING) → cursor NOT in mss capture
  • -vga cirrus + disabled driver (Basic Display Adapter) → cursor NOT in mss capture
  • -vga virtio (default) → flags=0 (HIDDEN), cursor NOT in mss capture
  • EC2 with NVIDIA GPU, VDD, IddCx → flags=2 (SUPPRESSED), cursor NOT in mss capture

The cursor is ARCHITECTURALLY in a separate layer on ALL platforms:

  • On real hardware: GPU hardware cursor plane (overlay)
  • On QEMU: internal DisplaySurface overlay (drawn by cursor_draw_line, visible to VNC/screendump but NOT guest VRAM)
  • On EC2/IddCx: cursor sprite channel (visible to DCV/RDP viewers but NOT GDI framebuffer)

The only working solutions are:

  1. Composite in capture pipeline: GetCursor() + AttachThreadInput + DrawIconEx into frames (proven working, gives real shapes)
  2. Patch ffmpeg gdigrab — same logic inside ffmpeg's C code (~15 lines)
  3. Capture from OUTSIDE the guest — QEMU screendump (includes cursor overlay) at ~5fps max
  4. Windows.Graphics.Capture IsCursorCaptureEnabled — DWM-level composite, UNTESTED on headless

Why VPS Providers Have Cursor But AWS/Azure Don't

On a random VPS (Hetzner, OVH, Vultr, etc.), you connect via KVM-over-IP (IPMI/iLO/iDRAC) or via VNC to QEMU. You see the cursor only because you are looking at the machine through a viewer (the KVM console, a VNC client, or web-based noVNC). That viewer IS the "connected monitor" that activates the cursor plane.

The cursor was never "in the framebuffer" on those VPS providers either. What's happening:

| What you see | What's actually happening |
|---|---|
| VPS web console with cursor | VNC/SPICE viewer connected → QEMU composites cursor into the stream sent to YOUR browser |
| VPS with VNC client | Same: VNC protocol sends cursor position + shape, viewer renders it client-side |
| Hetzner KVM console | iLO/IPMI captures video output from a real GPU via hardware capture card + overlays cursor |

The difference with AWS/Azure:

  • AWS EC2 has no KVM console / IPMI / iLO access. There's no hypervisor-level VNC or SPICE endpoint exposed to customers.
  • Azure similarly — you get RDP or serial console, not raw hypervisor framebuffer access.
  • The hypervisor (Nitro on AWS, Hyper-V on Azure) doesn't expose a screendump or VNC endpoint.
  • On cheap VPS providers, the hypervisor IS QEMU with VNC exposed. On AWS, it's a proprietary Nitro hypervisor with no customer-facing display port.

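In other words: On VPS providers, cursor "works" because you're always looking through a viewer. If you SSH'd into that same Hetzner VPS and ran GetCursorInfo without having VNC open, you would most likely not get a capturable cursor either (our QEMU tests above returned flags=0 or flags=1 depending on the VGA device, and neither put the cursor into guest-side captures).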

Could we get hypervisor-level framebuffer access on AWS?

  • EC2 Serial Console — text-only, no graphics
  • EC2 Instance Screenshot (aws ec2 get-console-screenshot) — captures the Nitro framebuffer! But it's a low-res JPEG, rate-limited (1/min), and intended for debugging boot issues. Does it include cursor? Unknown — worth testing.
  • Bare metal instances (.metal) — you get the full hardware, but still no IPMI/BMC access
  • AWS Outposts — dedicated hardware in your datacenter, but same Nitro interface

What about aws ec2 get-console-screenshot?

This API captures the instance's console output as seen by the Nitro hypervisor. On Windows instances it shows the login screen or desktop. It's captured at the hypervisor level (like QEMU screendump). If Nitro composites cursor into this capture, it would prove hypervisor-level capture works. But:

  • Rate limited (debugging tool, not real-time capture)
  • Low resolution JPEG
  • May or may not include cursor
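
A sketch of how the "does it include cursor?" question could be tested, assuming boto3 and valid AWS credentials on the machine running it. `get_console_screenshot` is the EC2 API behind the CLI command above and returns the JPEG as base64 in `ImageData`; the helper names and the region argument are our own. The decoder is kept separate so it can be exercised without AWS access.

```python
import base64


def decode_console_screenshot(response):
    """Decode the base64 JPEG carried in a GetConsoleScreenshot response."""
    data = base64.b64decode(response["ImageData"])
    if not data.startswith(b"\xff\xd8"):  # JPEG start-of-image marker
        raise ValueError("ImageData is not a JPEG")
    return data


def fetch_console_screenshot(instance_id, region_name):
    """Call EC2 GetConsoleScreenshot and return the raw JPEG bytes."""
    import boto3  # imported lazily so the decoder above works without boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    response = ec2.get_console_screenshot(InstanceId=instance_id, WakeUp=True)
    return decode_console_screenshot(response)
```

Saving the bytes for an instance with the cursor parked on a plain background and inspecting the image by eye would settle whether Nitro composites the cursor.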

The real answer

The "KVM with cursor" experience on VPS providers is an illusion — you see cursor because your browser/client IS the viewer. The moment you try to capture programmatically FROM INSIDE the VM without a viewer connected, you hit the same CURSOR_SUPPRESSED issue everywhere.

The only true solutions remain:

  1. Keep a viewer connected (lightweight RDP/VNC/DCV client) — simulates "you looking at the screen"
  2. Capture from outside the VM (QEMU screendump approach) — requires running Windows inside QEMU on EC2
  3. Composite cursor ourselves (CursorPainter) — works everywhere, no viewer needed
  4. Physical hardware — actual monitor connected = cursor always works

Context: Windows VM Meeting (2026-05-20)

This investigation was triggered by a cross-team meeting where the cursor visibility problem was identified as a critical blocker. Key takeaways from the meeting:

  • ~5 people have already investigated the cursor problem without finding a solution
  • Team consensus: "maybe even impossible to get a cursor" on DCV — this investigation confirms it IS impossible on ANY headless EC2 instance
  • The custom recorder approach (FFmpeg + Win32 APIs to stitch cursor) "deviates from real user experience" and "adds latency, complexity, and high chance of errors"
  • Windows 11 licensing is a known challenge — Windows Server 2022/2025 "lacks key Windows 11 features like the proper start menu and Windows Store"
  • Team explored vendor solutions: Daytona (claims Windows sandboxes, access delayed), AWS WorkSpaces (different use case — no cursor tracking), GitHub Actions (farms of computers)
  • Team preference: use a vendor if one can solve the problems; too many parallel individual solutions at 80-90% of requirements
  • Spin-up time: 5-10 minutes per instance, 2-3 hours overhead for 500-hour data campaigns
  • Warm pool approach suggested but not yet implemented
  • Cannot containerize Windows 11 — unlike Linux (thousands of containers on one host), each Windows instance needs a full VM
  • WebAct has additional networking isolation requirements (VPN, no Google access)
  • Multi-OS future: eventually need Windows 7, Windows 10, Mac support; once Win11 is solved, others should be straightforward
  • Action items include: follow up on Win11 license via AWS License Manager, explore custom Win11 image, continue cursor investigation, evaluate warm pool, test Daytona

How This Investigation Answers the Meeting's Questions

| Meeting question | Answer from this investigation |
|---|---|
| "Can we get a real cursor on DCV?" | No. Confirmed impossible on ANY headless EC2, including GPU instances (tested g4dn.xlarge with NVIDIA T4) |
| "Is post-processing cursor data viable?" | Yes (CursorPainter works) but only gets arrow shape. Fix: use GetCursor() from UI thread to get real shapes |
| "Can we get real Windows 11?" | Yes via BYOL on dedicated hosts |
| "Is there a vendor solution?" | Daytona pending evaluation. No EC2-only solution exists for native cursor |
| "How do we match real user machines?" | Need either: (a) headless viewer to activate cursor, (b) improved CursorPainter with real shapes via GetCursor(), or (c) custom IDD |

Goal

Determine if the OS cursor can be captured in screenshots on Windows 11 (Server 2025) DCV instances, and identify what's needed to make it work — or whether we should migrate away from DCV.

Test Environment

  • Instance: i-0826c353ccf4834da (cursor-test-maxshmi)
  • OS: Windows Server 2025 Datacenter (10.0.26100) — Win11 kernel
  • DCV Version: 2025.0-20103
  • Display Driver: AWS Indirect Display Device (IDD) v1.0.226.0
  • Screen: 1920x1080 via WinDisc (DCV virtual display)
  • DCV Session: console (owner: Administrator, type: console)

Definitive Answer: Can We Get a Real Cursor on DCV?

No. There is NO way to get the OS to render the real cursor into the framebuffer while DCV's IDD is loaded.

The cursor suppression is a kernel-level architectural feature of the IDD driver model. The IDD calls IddCxMonitorSetupHardwareCursor once at monitor path commit time, and from that point forward:

  • GetCursorInfo returns flags=2 (CURSOR_SUPPRESSED), hCursor=0 for ALL processes
  • No user-mode API can override this (SendInput, ShowCursor, SetCursor, registry, etc.)
  • Only the IDD driver itself could release the cursor claim — and DCV's driver never does

However, there ARE ways to get cursor data through DCV itself (since DCV owns the cursor channel), and there are alternative EC2 configurations that don't have this problem.


Key Finding: CURSOR_SUPPRESSED (flags=2)

GetCursorInfo() consistently returns:

  • flags = 2 (CURSOR_SUPPRESSED)
  • hCursor = 0 (NULL — no cursor handle available)
  • ptScreenPos = (x, y) — position IS tracked correctly

This means: the OS knows WHERE the cursor is, but refuses to render it into any framebuffer.
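
For reference, the three flag values seen across this investigation can be decoded as below. `CURSOR_SHOWING` (0x1) and `CURSOR_SUPPRESSED` (0x2) are the Win32 constants; the function names and the returned labels are our own illustration, not part of any existing module.

```python
CURSOR_SHOWING = 0x00000001
CURSOR_SUPPRESSED = 0x00000002


def classify_cursor(flags):
    """Map CURSORINFO.flags to the labels used in these notes."""
    if flags & CURSOR_SUPPRESSED:
        return "SUPPRESSED"
    if flags & CURSOR_SHOWING:
        return "SHOWING"
    return "HIDDEN"


def needs_fallback_shape(flags, h_cursor):
    """True when GetCursorInfo gives a position but no usable cursor handle."""
    return classify_cursor(flags) == "SUPPRESSED" or not h_cursor
```

On the DCV instance this yields `SUPPRESSED` with a fallback needed, which is the case CursorPainter handles.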


What We Tested

Things that DON'T fix cursor suppression:

| Approach | Result |
|---|---|
| Run as SYSTEM (PsExec -s -i 1) | Still suppressed |
| Run as Administrator (PsExec -i 1 -u Administrator) | Still suppressed |
| Run as standard user (testuser) | Still suppressed |
| Run via schtasks /IT (real interactive desktop) | Still suppressed |
| PsExec -i 1 starting explorer.exe (desktop renders, taskbar visible) | Still suppressed; IDD cursor claim is display-driver-level, independent of running processes |
| SendInput with MOUSEEVENTF_ABSOLUTE | Position changes, still suppressed |
| SendInput with MOUSEEVENTF_MOVE (relative) | Still suppressed |
| ShowCursor(true) | Still suppressed |
| LoadCursor(IDC_ARROW) + SetCursor() | Still suppressed |
| SetCursorPos() | Position changes, still suppressed |
| DCV registry: pointer = software | Already set, no effect on local capture |
| DCV registry: enable-client-cursor = 0 | Already set, no effect on local capture |
| DCV service restart | Still suppressed after restart |
| Raw WebSocket connection to DCV port 8443 | Not recognized as viewer; still no client connected |

Things that DO work:

| Approach | Result |
|---|---|
| DrawIconEx with loaded IDC_ARROW | Successfully draws arrow cursor onto captured bitmap |
| GetCursorInfo position tracking | Position always accurate, even though hCursor=0 |
| DCV get-screenshot --blend-cursor | CLI flag exists! But produces blank image without active viewer |
| PsExec -i 1 starting explorer.exe | Successfully launches shell in interactive Session 1 |

Root Cause: IddCx Hardware Cursor Architecture

How DCV's IDD handles the cursor

DCV's Indirect Display Driver (v1.0.226.0) follows this architecture:

  1. Driver calls IddCxMonitorSetupHardwareCursor when the monitor path is committed
  2. This tells Windows DWM: "I own cursor rendering — don't draw it into the framebuffer"
  3. Windows sets GetCursorInfo.flags = CURSOR_SUPPRESSED for ALL user-mode processes
  4. Cursor updates flow through the IddCx channel via IddCxMonitorQueryHardwareCursor:
    • IDARG_OUT_QUERY_HWCURSOR.IsCursorVisible (BOOL)
    • IDARG_OUT_QUERY_HWCURSOR.X, Y (screen coordinates)
    • IDARG_OUT_QUERY_HWCURSOR.CursorShapeInfo (type, width, height, pitch, hotspot)
    • The actual cursor bitmap data
  5. Only DCV's IDD driver process can read this data — no public Windows API exposes it
  6. DCV transmits cursor as a separate protocol channel to connected viewers (low-latency sprite overlay)

Why this is PERMANENT while IDD is loaded

The hardware cursor claim is made at the display-subsystem level. It is NOT per-process, per-user, or per-connection. The claim persists as long as:

  • The DCV IDD driver is loaded (it is, whenever DCV service runs)
  • The monitor path is active
  • Nobody calls the IddCx API to release hardware cursor (only the driver itself can)

No user-mode intervention can override this. This is not a bug — it's the documented IDD architecture from Microsoft's IddCx framework.

The "no viewer connected" compounding factor

dcv list-connections console → "There are no clients connected to the session."

From apps/eval-script/src/eval_script/interactive_launcher.py:

"DCV's virtual display only renders a framebuffer when an active viewer is connected."

Without a connected DCV viewer:

  • The IDD swapchain is not presenting frames → get-screenshot returns blank/tiny 2834-byte images
  • DCV's grabber pipeline is idle → --blend-cursor has nothing to blend onto
  • The display is in a minimal/dormant state
  • GDI CopyFromScreen can still capture some content (explorer, desktop) because DWM is running, but the DCV-specific capture pipeline is off
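
A capture script can check for this condition before recording. The sketch below parses the output of `dcv list-connections console`; the "no clients" message is the one quoted above, while treating any other non-empty output as "viewer present" is an assumption, since we did not record what the command prints when a client is connected.

```python
import subprocess

NO_CLIENTS_MESSAGE = "There are no clients connected to the session."


def viewer_connected(list_connections_output):
    """Guess from `dcv list-connections` output whether a viewer is attached."""
    text = list_connections_output.strip()
    if not text:
        return False  # no output at all: treat as "unknown, assume no viewer"
    return NO_CLIENTS_MESSAGE not in text


def check_session(session="console", dcv_exe="dcv.exe"):
    """Run the CLI and return True if a viewer appears to be connected."""
    result = subprocess.run(
        [dcv_exe, "list-connections", session],
        capture_output=True, text=True, check=True,
    )
    return viewer_connected(result.stdout)
```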

The eval-script solves this by connecting a headless Puppeteer browser to the DCV web client at port 8443, which activates the full streaming pipeline.


DCV Registry State (confirmed on instance)

HKLM\SOFTWARE\GSettings\com\nicesoftware\dcv\display\
  pointer = "software"          ← tells DCV to composite cursor into stream frames
  enable-client-cursor = 0      ← tells viewer NOT to draw local cursor sprite

HKLM\SOFTWARE\GSettings\com\nicesoftware\dcv\security\
  authentication = "none"       ← set during testing

What these settings actually mean:

  • pointer = software → DCV composites cursor into the encoded video stream sent to viewers. This is the stream the REMOTE viewer sees. It does NOT put cursor into the LOCAL framebuffer that GDI/DXGI capture sees.
  • enable-client-cursor = 0 → Connected viewers won't draw their own cursor overlay. Combined with pointer=software, the cursor appears baked into the viewer's stream — but only for viewers, not local capture.

These are about the remote display protocol, NOT about local capture APIs.


DCV get-screenshot --blend-cursor

DCV has a built-in CLI flag:

dcv.exe get-screenshot --blend-cursor --max-width 1920 --max-height 1080 -o screenshot.png console

This composites the cursor into the screenshot using DCV's internal cursor knowledge (from the IddCx channel). In our testing it produced 2834-byte blank images because without an active viewer, the grabber pipeline isn't running.

Expected behavior with active viewer: Full 1920x1080 screenshot WITH the real cursor shape (I-beam, hand, resize, etc.) composited in.
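
A small wrapper around the command above, useful for detecting the blank-image case automatically. The argument list mirrors the CLI invocation shown earlier; the size threshold is a heuristic based only on the 2834-byte files we observed, and the function names are hypothetical.

```python
import os
import subprocess

BLANK_SCREENSHOT_BYTES = 2834  # size of the blank PNGs seen without a viewer


def screenshot_command(output_path, session="console",
                       max_width=1920, max_height=1080, dcv_exe="dcv.exe"):
    """Build the argv for `dcv get-screenshot --blend-cursor`."""
    return [
        dcv_exe, "get-screenshot", "--blend-cursor",
        "--max-width", str(max_width),
        "--max-height", str(max_height),
        "-o", output_path, session,
    ]


def looks_blank(size_bytes, threshold=BLANK_SCREENSHOT_BYTES):
    """Heuristic: files at or below the observed blank size are suspect."""
    return size_bytes <= threshold


def take_screenshot(output_path, **options):
    """Run the CLI and return True if the result does not look blank."""
    subprocess.run(screenshot_command(output_path, **options), check=True)
    return not looks_blank(os.path.getsize(output_path))
```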


What mss Uses (and PR #464)

mss uses GDI BitBlt(SRCCOPY | CAPTUREBLT) — the oldest Windows capture API.

From python-mss/src/mss/windows/gdi.py line 395:

gdi.BitBlt(memdc, 0, 0, width, height, srcdc, monitor["left"], monitor["top"], SRCCOPY | CAPTUREBLT)

PR #464 (merged) only swapped internal buffer management from CreateCompatibleBitmap + GetDIBits to CreateDIBSection (direct memory-mapped DIB). Same GDI BitBlt underneath. No cursor capture, no DXGI, no Windows.Graphics.Capture.

The CAPTUREBLT flag captures layered windows but NOT hardware cursor overlays from IDD drivers. The cursor() method in mss is literally a no-op (return).


DCV Internals: How --blend-cursor and the Viewer Work

The DCV server IS the display driver

DCV's --blend-cursor works because the DCV server process literally IS the IDD display driver: it receives cursor data directly from Windows by calling IddCxMonitorQueryHardwareCursor (IddCx drivers run in user mode under UMDF, so this is a framework call rather than a kernel callback):

IddCxMonitorQueryHardwareCursor → IDARG_OUT_QUERY_HWCURSOR:
  - IsCursorVisible (bool)
  - X, Y (screen coordinates)
  - CursorShapeInfo (type, width, height, pitch, hotspot)
  - Raw cursor bitmap data (the actual pixels of the cursor shape)

This data is only available to the IDD driver process — no public Windows API exposes it. That's why GetCursorInfo returns hCursor=0 for everyone else.

DCV streaming protocol (cursor channel)

The DCV protocol uses protobuf over WebSocket/QUIC. Cursor is a separate sub-protocol within the input channel:

| Message | Content |
|---|---|
| dcv.input.PointerPosition | Cursor (x, y) coordinates, sent frequently |
| dcv.input.PointerCursors | Cursor shape data: currentCursorId, hidden, cursorImages[] array |
| dcv.input.PointerInvalidateCursors | Remove specific cursor shapes from client cache |
| dcv.input.PointerInvalidateCursorCache | Clear entire cursor cache |
| dcv.input.PointerRequireCursorImages | Client requests cursor images by ID |

Each cursorImage contains:

  • id (uint64) — cursor identifier
  • width, height — dimensions
  • hotspotX, hotspotY — cursor hotspot offset
  • pixelFormat — either NONE (use CSS cursor URL) or RAW_ARGB_DATA (raw pixels)
  • Raw ARGB pixel data (binary attachment after protobuf)

The cursor is NOT part of the H.264 video stream. It's transmitted separately for low-latency rendering.
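
The client-side bookkeeping implied by these messages can be modelled as a small cache. This is a sketch of our reading of the protocol, not DCV's implementation: the field names follow the list above, while the class and method names are hypothetical and no protobuf decoding is shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CursorImage:
    id: int
    width: int
    height: int
    hotspot_x: int
    hotspot_y: int
    pixel_format: str  # "NONE" or "RAW_ARGB_DATA"
    pixels: bytes = b""


@dataclass
class CursorTracker:
    cache: Dict[int, CursorImage] = field(default_factory=dict)
    current_id: Optional[int] = None
    hidden: bool = False
    position: tuple = (0, 0)

    def on_pointer_position(self, x, y):
        self.position = (x, y)

    def on_pointer_cursors(self, current_cursor_id, hidden, cursor_images):
        for image in cursor_images:  # new shapes are cached by id
            self.cache[image.id] = image
        self.current_id = current_cursor_id
        self.hidden = hidden

    def on_invalidate_cursors(self, ids):
        for cursor_id in ids:
            self.cache.pop(cursor_id, None)

    def on_invalidate_cache(self):
        self.cache.clear()

    def ids_to_request(self) -> List[int]:
        """Ids a client would ask for via PointerRequireCursorImages."""
        if self.current_id is None or self.current_id in self.cache:
            return []
        return [self.current_id]

    def current_image(self) -> Optional[CursorImage]:
        if self.hidden or self.current_id is None:
            return None
        return self.cache.get(self.current_id)
```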

DCV web viewer (NOT open source)

The web client (dcv.js) is proprietary/minified (~1.5MB bundle). It renders cursor as an HTML <img> overlay positioned over the streaming canvas. Two modes:

  • CSS cursor mode (pixelFormat=NONE): cursor: url(...) hotspotX hotspotY, auto
  • Virtual cursor mode (pixelFormat=RAW_ARGB_DATA): Absolute-positioned <img> (z-index 100, pointer-events: none) with raw ARGB data converted to data URL, positioned at (x - hotspotX, y - hotspotY) over the canvas (z-index 0)
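
The virtual cursor placement reduces to a hotspot subtraction plus a pixel-format conversion. In the sketch below, the byte order of RAW_ARGB_DATA (A, R, G, B per pixel) is an assumption we have not verified against real traffic, and both function names are illustrative.

```python
def overlay_origin(x, y, hotspot_x, hotspot_y):
    """Top-left corner at which the cursor image is drawn over the canvas."""
    return (x - hotspot_x, y - hotspot_y)


def argb_to_rgba(pixels):
    """Reorder A,R,G,B byte quadruples into R,G,B,A (assumed input order)."""
    if len(pixels) % 4:
        raise ValueError("pixel buffer length must be a multiple of 4")
    out = bytearray(len(pixels))
    out[0::4] = pixels[1::4]  # R
    out[1::4] = pixels[2::4]  # G
    out[2::4] = pixels[3::4]  # B
    out[3::4] = pixels[0::4]  # A
    return bytes(out)
```

If the wire format turns out to be BGRA (as Windows DIBs are), only the slice indices change.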

Can we intercept the cursor data from the DCV protocol?

Yes, in theory. If we connect a proper DCV client (via the Web Client SDK), we receive PointerCursors messages containing the real cursor shape (ARGB pixel data + hotspot). We could:

  1. Connect to DCV as a viewer (full protocol handshake required)
  2. Intercept PointerPosition and PointerCursors protobuf messages
  3. Use the real cursor shape + position for compositing instead of fallback arrow

Challenges:

  • The DCV Web Client SDK is not on npm — it's a proprietary download from amazondcv.com
  • A raw WebSocket to port 8443 is NOT sufficient — DCV requires its full protocol handshake to register a client
  • The minified dcv.js is readable enough to reverse-engineer the protocol structure but complex

Related open-source DCV repos

| Repo | What it is | Useful? |
|---|---|---|
| aws/dcv-access-console | Session management portal (Apache-2.0) | No (session management only, NOT the viewer) |
| aws/dcv-color-primitives | Rust pixel format conversion (ARGB/NV12/I420) | Marginally (for pixel format handling) |
| aws/dcv-gnome-shell-extension | GNOME shell integration | No |
| awsdocs/nice-dcv-admin-guide | Admin documentation source | Reference only |

Desktop Shell Status

  • explorer.exe: Not running initially. Started successfully via PsExec -i 1 -s -d explorer.exe (PID 1704, Session 1)
  • DWM: Running in Session 1
  • Auto-logon: Configured (DefaultUserName=Administrator, AutoAdminLogon=1, Shell=explorer.exe)
  • Desktop issue: Despite Shell=explorer.exe in Winlogon registry, explorer wasn't running. Likely because the VM was freshly booted and the auto-logon sequence didn't complete properly without a viewer connected. The desktop may lack wallpaper because DCV's display compositor isn't fully active without a viewer.
  • Mouse devices: Only PS/2 Compatible Mouse (ACPI\PNP0F13) — no HID USB mouse
  • DCV service: Running, listening on port 8443 (TCP), firewall rule "NICE DCV Server (In)" enabled
  • Session 0 vs Session 1: SSM commands run as SYSTEM in Session 0. Interactive desktop is Session 1. PsExec -i 1 or schtasks /IT required for desktop access.

Screenshots Captured

All in .context/cursor-test/session2/:

| File | Size | Method | Notes |
|---|---|---|---|
| standard.png | 15KB | GDI CopyFromScreen | Desktop rendered (explorer running), NO cursor |
| captureblt.png | 8KB | CopyFromScreen + CaptureBlt flag | Same content, NO cursor |
| with_crosshair.png | 24KB | CopyFromScreen + manual red crosshair | Red cross at (960,540); proves position tracking works |
| sendinput.png | 7KB | After SendInput mouse events | Desktop without cursor |
| drawicon_arrow.png | 7KB | CopyFromScreen + DrawIconEx(IDC_ARROW) | Arrow cursor drawn at cursor position (the workaround) |
| dcv_native.png | 3KB | DCV get-screenshot CLI (no viewer) | Tiny/blank; display pipeline not active |

Working Workaround (Current Codebase)

packages/client/src/client/cursor_overlay.py (CursorPainter) handles the suppressed cursor:

  1. Calls GetCursorInfo() — position is always valid even when SUPPRESSED
  2. When hCursor = 0 (SUPPRESSED): loads IDC_ARROW as fallback
  3. Renders cursor icon via DrawIconEx onto a small DIB
  4. Recovers alpha channel via double-render technique (draw on black → draw on white → compute alpha)
  5. Alpha-blends the sprite into the captured BGRA frame
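
Steps 4 and 5 rest on simple per-pixel arithmetic. A pure-Python sketch of the idea follows (this is not the CursorPainter code, and it works on plain BGR tuples rather than DIB memory). Rendering a pixel with alpha a over black gives a·C, and over white gives a·C + (1 − a)·255, so the difference between the two renders is (1 − a)·255 and the black render is already the premultiplied colour. Inverting (XOR) cursors break this assumption.

```python
def recover_alpha(on_black, on_white):
    """Return (premultiplied_bgr, alpha) for one pixel rendered on both backgrounds."""
    diff_sum = sum(w - b for b, w in zip(on_black, on_white))
    diff = round(diff_sum / 3)            # average the three channel estimates
    alpha = max(0, min(255, 255 - diff))  # white - black = (1 - a) * 255
    return tuple(on_black), alpha


def blend(dst, premultiplied, alpha):
    """Composite a premultiplied sprite pixel over a destination pixel."""
    return tuple(
        min(255, p + (d * (255 - alpha) + 127) // 255)
        for p, d in zip(premultiplied, dst)
    )
```

A fully transparent sprite pixel leaves the frame untouched, and an opaque one replaces it, which is the behaviour step 5 needs.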

Performance: ~0.1ms per frame at 1280x720.

Limitation: Always shows arrow cursor regardless of actual shape (I-beam, hand, resize, etc.).


Key Limitation: Cursor Shape

When hCursor = 0 (always the case on DCV), GetCursorInfo cannot tell us the real cursor shape. The cursor might actually be:

  • I-beam (text editing)
  • Hand (link hover)
  • Resize arrows (window edges)
  • Busy spinner (loading)
  • Custom app cursors

All of these appear as the generic arrow fallback in our compositing. The actual cursor shape data is locked inside the IddCx hardware cursor channel that only DCV's driver can read.

Potential workaround for shape inference (unreliable):

  • GetClassLongPtr(GetForegroundWindow(), GCLP_HCURSOR): get the cursor the active window's class registered (GetClassLong with GCL_HCURSOR is the 32-bit-only form)
  • Hook SetCursor / SetClassLong(GCLP_HCURSOR) in target processes
  • Track which UI element is under the cursor and infer expected shape
  • None of these are reliable

Paths Forward

Option 1: Stay on DCV — Use dcv get-screenshot --blend-cursor

DCV internally knows the real cursor (I-beam, hand, resize, etc.) via IddCxMonitorQueryHardwareCursor. The --blend-cursor flag composites it into screenshots.

Requirements:

  • Must have an active DCV viewer connected (headless Puppeteer, per eval-script pattern)
  • Call dcv get-screenshot --blend-cursor -o <path> console per frame

Pros:

  • Gets the REAL cursor shape (not just arrow)
  • No fleet migration needed
  • DCV already installed on all fleet VMs

Cons:

  • CLI overhead per frame unknown — may be too slow for 2fps
  • Requires headless viewer connection (additional complexity in session setup)
  • Images go through DCV's pipeline (format conversion, potential quality loss)
  • Doesn't solve the issue for mss/ffmpeg-based capture — only for DCV CLI captures
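
The first con can be measured rather than guessed. A minimal timing sketch: `capture` is any zero-argument callable (for example a function that shells out to the DCV CLI), and the helper names are hypothetical. At 2fps the budget is 0.5 seconds per frame.

```python
import time


def frame_budget_seconds(fps):
    """Time available per frame at the target frame rate."""
    return 1.0 / fps


def measure_capture(capture, frames=5, clock=time.perf_counter):
    """Return the slowest observed duration of `capture` over `frames` calls."""
    worst = 0.0
    for _ in range(frames):
        start = clock()
        capture()
        worst = max(worst, clock() - start)
    return worst


def fits_budget(worst_seconds, fps):
    """True if even the slowest capture fits inside one frame interval."""
    return worst_seconds <= frame_budget_seconds(fps)
```

Using the worst case rather than the mean is deliberate: a single slow call still causes a dropped frame.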

Current status: Blocked — needs active viewer connection to produce non-blank images

Option 2: Migrate to Non-DCV EC2 (GPU instance with real display)

Use a g4dn / g5 instance with NVIDIA GRID/vGPU. The NVIDIA virtual display driver does NOT use IddCx hardware cursor — it renders cursor into the framebuffer like a physical display.

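Expected behavior (hypothesis when this option was written; the later g4dn.xlarge test reported above still saw flags=2):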

  • GetCursorInfo returns flags=1 (CURSOR_SHOWING), hCursor=<valid handle>
  • mss/GDI/DXGI capture includes the real cursor natively
  • No workaround needed at all

Pros:

  • Cursor capture "just works" — correct shape, zero overhead
  • Real GPU available (useful for other things)
  • Standard display driver behavior

Cons:

  • GPU instances are more expensive ($0.52/hr for g4dn.xlarge vs ~$0.19/hr for t3.xlarge)
  • Lose DCV's easy remote access (need RDP or VNC instead)
  • Need to rebuild AMI with NVIDIA drivers + license GRID
  • Fleet terraform changes required
  • RDP has its own cursor quirks (though generally better than DCV)
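
To put the price gap in context, here is the arithmetic for a 500-hour campaign (the campaign size mentioned in the meeting notes), treating it as 500 instance-hours. The rates are the on-demand figures quoted above, with the t3.xlarge one being approximate; `Decimal` avoids float rounding in the result.

```python
from decimal import Decimal


def extra_gpu_cost(hours, gpu_rate="0.52", base_rate="0.19"):
    """Extra spend in USD from running the GPU instance type instead."""
    return (Decimal(gpu_rate) - Decimal(base_rate)) * Decimal(hours)


# 500 instance-hours at a $0.33/hr difference
campaign_delta = extra_gpu_cost(500)
```

`campaign_delta` comes to $165, so the hourly price is unlikely to be the deciding factor compared with the AMI and fleet rework.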

Alternative non-DCV options:

  • Parsec or Sunshine (open-source game streaming) as display server
  • XRDP + xorgxrdp (a Linux RDP server stack; we know of no Windows build, so this is likely not applicable here)
  • No remote protocol at all — run headless with NVIDIA virtual display, capture locally, no viewer needed

Option 3: Stay on DCV — Intercept Cursor from Protocol

Connect a minimal DCV protocol client that subscribes to dcv.input.PointerCursors messages. Get real cursor ARGB bitmap + hotspot. Feed to CursorPainter.

Pros:

  • Gets real cursor shape
  • Lower overhead than CLI per-frame (persistent connection, message-based)

Cons:

  • Requires reverse-engineering DCV protocol or using proprietary SDK
  • Complex implementation
  • Fragile — may break on DCV version updates
  • Still needs viewer connection to activate cursor streaming

Option 4: Stay on DCV — Windows.Graphics.Capture API with IsCursorCaptureEnabled

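The modern WinRT capture API (Windows.Graphics.Capture, Win10 1903+ for monitor capture) has an explicit cursor inclusion property, IsCursorCaptureEnabled (added in Windows 10 version 2004). It works through the DWM compositor via GraphicsCaptureItem.CreateForMonitor() + Direct3D11CaptureFramePool.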

Why it might work on DCV:

  • Operates at the DWM compositor layer, not raw framebuffer reads
  • DWM knows about the cursor even on IDD displays
  • Apps like OBS use this API for cursor capture

Why it might NOT work on DCV:

  • IDD hardware cursor means DWM doesn't composite cursor either — it's been delegated
  • CURSOR_SUPPRESSED may apply to this API too
  • Microsoft docs are ambiguous about IDD behavior with this API

Requirements to test:

  • Write a C# program using Windows.Graphics.Capture with IsCursorCaptureEnabled = true
  • Compile and run on the DCV instance
  • Check if cursor appears in captured frames

Cons:

  • Requires C# code (or pythonnet/winsdk WinRT bindings)
  • Unconfirmed whether it works on IDD displays — needs testing
  • If it doesn't work, effort is wasted

Option 5: Hybrid — DCV for remote access, separate capture via DXGI Desktop Duplication

DXGI Desktop Duplication (IDXGIOutputDuplication) provides cursor data separately via GetFramePointerShape():

  • Returns cursor position, shape bitmap, hotspot
  • Works independently of GDI BitBlt

Why it might work:

  • Even on IDD, Desktop Duplication may provide cursor through its own API
  • The DXGI_OUTDUPL_POINTER_SHAPE_INFO struct is populated separately from the frame

Why it might NOT work:

  • IDD monitors may not support Desktop Duplication at all
  • AWS documentation for DCV states: "GetConsoleScreenshot functionality will not work as expected" with IDD

Requirements to test:

  • Write a C++ program that calls DuplicateOutput on the DCV virtual display
  • Check if GetFramePointerShape returns cursor data

Recommendation

For the experiments/synthetic-data-collection fleet specifically:

The fleet currently uses DCV Windows Server 2025 instances (synth-explorer-windows11-golden-v2* AMI). The TREC SDK records sessions, and cursor data is critical for training.

Short-term (lowest effort): Test Option 1 — connect Puppeteer viewer + dcv get-screenshot --blend-cursor. If performance is acceptable at 2fps, this gives real cursor shapes with minimal fleet changes.

Medium-term (if Option 1 is too slow or unreliable): Test Option 4 — Windows.Graphics.Capture. Write a small C# capture helper. If the WinRT API can see cursor on IDD, this is the cleanest solution (no viewer needed, no DCV CLI overhead).

Long-term (if cursor fidelity is critical for model quality): Option 2 — migrate to non-DCV. Real display driver, cursor just works. But this is a significant infrastructure change.


Appendix: Session/Process Architecture on the VM

Session 0 (non-interactive):
  ├── SYSTEM processes
  ├── dcvserver (DCV Server)
  ├── dcvagent
  └── SSM Agent (where our commands run)

Session 1 (interactive DCV desktop):
  ├── dwm.exe (Desktop Window Manager)
  ├── explorer.exe (must be started manually or via auto-logon)
  ├── dcvagent (x2, handles viewer I/O)
  └── [capture processes must run HERE for desktop access]

How to run in Session 1:
  - PsExec -i 1 [-s|-u user] program.exe
  - schtasks /IT /RL HIGHEST (interactive task)
  - EC2Launch executeScript with interactive flag

How DCV screenshot works:
  SSM → dcv.exe get-screenshot → talks to dcvserver (Session 0)
  → dcvserver reads its own IDD swapchain + IddCx cursor
  → composites if --blend-cursor → writes PNG
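
The two Session 1 launch methods can be wrapped as argv builders so that SSM scripts do not hand-assemble command strings. This is a sketch with hypothetical function names; the PsExec switches are the ones listed above, and the schtasks switches (/Create, /TN, /TR, /SC, /ST, /RU, /IT, /RL, /F) are standard, though the ONCE schedule with a later `schtasks /Run` is our assumption about how the task would be triggered.

```python
def psexec_command(program, session=1, as_system=False, user=None,
                   psexec="PsExec.exe"):
    """Build a PsExec argv that starts `program` in the given session."""
    argv = [psexec, "-i", str(session)]
    if as_system:
        argv.append("-s")
    elif user:
        argv += ["-u", user]
    argv.append(program)
    return argv


def schtasks_create_command(task_name, program, user):
    """Build a schtasks argv for an interactive, highest-privilege task."""
    return [
        "schtasks", "/Create", "/TN", task_name, "/TR", program,
        "/SC", "ONCE", "/ST", "00:00",   # placeholder schedule, run on demand
        "/RU", user, "/IT", "/RL", "HIGHEST", "/F",
    ]


def schtasks_run_command(task_name):
    """Build the argv that triggers the task immediately."""
    return ["schtasks", "/Run", "/TN", task_name]
```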

Appendix: What the Fleet Already Does for Cursor

From packages/client/src/client/cursor_overlay.py:

# CURSOR_SUPPRESSED (DCV / RDP / idle): position is valid but
# hCursor is NULL. Fall back to the system arrow so the model
# always sees *where* the user is pointing.
h = self._user32.LoadCursorW(None, ctypes.cast(_IDC_ARROW, wintypes.LPCWSTR))

From apps/eval-script/src/eval_script/run_remote.py:

  • start_dcv_viewer() — connects headless Puppeteer to DCV web client
  • Activates the framebuffer so ffmpeg gdigrab/ddagrab can capture
  • Cursor is still not in the framebuffer — eval-script relies on the CursorPainter workaround

From experiments/synthetic-data-collection/vm/launch.py:

  • Uses schtasks /IT to run in interactive Session 1
  • TREC SDK records the session (including cursor via the CursorPainter mechanism)
  • Cursor shape is always arrow fallback

Appendix: DCV Version History and Cursor

| DCV Version | Driver | Cursor Behavior |
|---|---|---|
| Pre-2023.1 (Server 2016) | "NICE DCV Virtual Display Driver" (WDDM mirror) | Same suppression (hardware cursor mode) |
| 2023.1+ (Server 2019+) | Built-in IDD (Indirect Display Driver) | Same suppression via IddCx |
| 2025.0 (current, Server 2025) | AWS Indirect Display Device v1.0.226.0 | Confirmed CURSOR_SUPPRESSED |

The underlying cursor behavior has been the same across all DCV versions. The driver technology changed (WDDM mirror → IDD) but the "cursor as separate sprite" architecture is constant.


Migration Plan: Moving Away from DCV

Why Migrate

The main goal is to produce training data that matches what a real user's machine looks like at inference time.

What a real user's Windows machine looks like

On a real user's Windows machine (laptop/desktop):

  • Physical display adapter (Intel/AMD/NVIDIA) with standard WDDM driver
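  • Cursor is drawn by the display hardware as a cursor plane (see the separate-layer finding above), with a valid cursor handle behind it
  • GetCursorInfo returns flags=1 (CURSOR_SHOWING) with a valid hCursor
  • Cursor has the correct shape (I-beam, hand, resize, spinner, custom app cursors)
  • Screen capture can include the real cursor: Windows.Graphics.Capture composites it, DXGI Desktop Duplication reports its shape, and GDI-based tools (mss, ffmpeg gdigrab) can draw it from the valid hCursor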
  • No virtual display, no remote protocol layer
  • Full Windows 11 desktop with animations, transparency, shadows, rounded corners
  • Standard DPI (96-192 dpi typical), standard color profile (sRGB)
  • Physical mouse generating real HID input events
  • Standard font rendering (ClearType, DirectWrite)
  • Standard window chrome (Mica, acrylic materials on Win11)

How DCV instances differ from real machines (train/inference distribution mismatch)

| Aspect | Real User Machine | DCV EC2 Instance | Impact on Model |
|---|---|---|---|
| Cursor in framebuffer | Yes (native rendering) | No (CURSOR_SUPPRESSED, hCursor=0) | Model never sees real cursor shapes during training |
| Cursor shape | Correct (I-beam, hand, resize, busy, custom) | Always arrow fallback | Model can't learn cursor-shape-to-context associations |
| Cursor rendering | Anti-aliased by display driver, sub-pixel positioned | Composited post-capture by CursorPainter (pixel-aligned, no sub-pixel) | Subtle visual difference in cursor edges |
| Display driver | Physical WDDM (Intel/AMD/NVIDIA) | AWS Indirect Display Device (IDD) v1.0.226.0 | Different rendering characteristics |
| Font rendering | ClearType with sub-pixel AA, tuned for physical LCD | ClearType may behave differently on virtual display (no physical sub-pixel layout) | Text may look slightly different |
| DPI / Scaling | Varies (100%-250% typical on modern laptops) | Fixed 100% (1920x1080 at 96 dpi) | Model only sees 100% scaling; never learns to handle scaled UIs |
| Color profile | sRGB / Display P3, varies by panel | No color management on virtual display | Colors may be slightly off |
| Window animations | Smooth (minimize/maximize/snap animations) | Often disabled or glitchy on Server | Model doesn't learn to handle animation frames |
| Desktop composition | Full DWM with Mica, acrylic, shadows, rounded corners | DWM runs but some effects may be reduced on Server | Subtly different visual appearance |
| Wallpaper | User-selected, varies wildly | Stock Windows img0.jpg (scripted via prelaunch) | Less visual diversity in training data |
| Taskbar | Win11 centered taskbar with app icons | May not render properly without viewer connected | Different or missing taskbar appearance |
| Mouse input | Physical HID USB/Bluetooth mouse | PS/2 Compatible Mouse (ACPI\PNP0F13) emulated | Different input device characteristics |
| OS Edition | Windows 11 Home/Pro (consumer) | Windows Server 2025 Datacenter (server) | Different default policies, features, visual theme |
| Explorer shell | Starts at logon automatically | Requires PsExec -i 1 workaround to start | Fragile desktop environment |
| Display activation | Always on when user is using it | Dormant without DCV viewer connection | Capture may get blank frames if viewer disconnects |
| Network latency | No remote protocol overhead | DCV protocol adds indirection to any viewer-dependent behavior | N/A for local capture, but affects any viewer-dependent setup |

Which mismatches matter most for model quality

Critical (directly affects what the model learns to see):

  1. Cursor shape — model should learn that I-beam means text field, hand means link, etc.
  2. DPI scaling — real laptops often use 125-150% scaling; 100% is minority
  3. Desktop composition effects — rounded corners, shadows affect element boundaries

Medium (noticeable but less impactful):

  4. Font rendering differences
  5. Window animations during transitions
  6. Color profile variations

Low (unlikely to affect model performance):

  7. Wallpaper variety
  8. Input device type
  9. OS edition differences (mostly invisible in UI)

Target Architecture

Replace DCV with a GPU instance type + NVIDIA GRID driver (or equivalent). On a g4dn / g5 / g6 instance, the NVIDIA virtual display driver:

  • Renders cursor into the framebuffer (standard WDDM behavior)
  • GetCursorInfo returns flags=1 (CURSOR_SHOWING) with valid hCursor
  • mss/GDI/DXGI capture includes cursor natively
  • No CursorPainter workaround needed

For remote access, replace DCV with RDP (built into Windows) or with Parsec or Sunshine (game-quality streaming; Sunshine is open-source, Parsec is proprietary).

How Synthetic Task Generation Currently Runs

The fleet is defined in experiments/synthetic-data-collection/:

┌─────────────────────────────────────────────────────────────────┐
│  CURRENT ARCHITECTURE                                           │
├─────────────────────────────────────────────────────────────────┤
│  Instance type:  c6i.4xlarge (16 vCPU, 32 GB, NO GPU)          │
│  OS:             Windows Server 2025 Datacenter (Win11 kernel)  │
│  Display:        DCV IDD virtual display (WinDisc)              │
│  Remote access:  DCV web viewer (port 8443)                     │
│  Fleet size:     up to 300 (MAX_FLEET_SIZE in terraform)        │
│  Pricing:        100% on-demand (~$0.68/hr c6i.4xlarge)         │
│  AMI:            synth-explorer-windows11-golden-v8b-*           │
│  Region:         us-west-2                                      │
└─────────────────────────────────────────────────────────────────┘

Boot sequence:

  1. ASG launches instance from Golden AMI
  2. EC2Launch v2 runs UserData (vm/userdata-fleet-launch.ps1)
  3. UserData waits for auto-logon (Administrator, Session 1)
  4. If auto-logon didn't fire in 3min → reboot
  5. AtLogOn scheduled task fires → start-worker-prelaunch.ps1:
    • Sets wallpaper (img0.jpg) via SystemParametersInfo
    • Sets DCV display layout (dcv.exe set-display-layout --session console 1920x1080)
    • Starts explorer.exe via PsExec -i 1 (workaround for Server 2025)
  6. Worker loop starts (python -m synthetic_data_collection.vm.worker)
  7. Worker long-polls SQS for (task_file, seed) messages
  8. Per message: spawns session_entrypoint subprocess (pywinauto + TREC SDK)
  9. Session records via TREC SDK (ffmpeg gdigrab/ddagrab capture)
  10. CursorPainter composites arrow cursor (because DCV suppresses real cursor)
  11. Upload bundle to S3 → delete SQS message
  12. Idle >5min → shutdown /s → ASG observes stop
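
Steps 7, 11 and 12 can be sketched as a single loop. This is a minimal illustration, not the code in vm/worker.py: `run_worker` and its callback parameters (`receive`, `handle`, `delete`, `shutdown`) are hypothetical names, and the 5-minute idle limit is taken from step 12.

```python
import time
from typing import Callable, Optional

IDLE_LIMIT_S = 5 * 60  # "Idle >5min" from step 12


def run_worker(
    receive: Callable[[], Optional[dict]],
    handle: Callable[[dict], None],
    delete: Callable[[dict], None],
    shutdown: Callable[[], None],
    idle_limit_s: float = IDLE_LIMIT_S,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Process messages until the queue has been idle for idle_limit_s.

    Returns the number of messages that were handled and deleted.
    """
    handled = 0
    last_work = clock()
    while clock() - last_work <= idle_limit_s:
        msg = receive()  # long poll; None means the poll returned empty
        if msg is None:
            continue
        try:
            handle(msg)  # record the session and upload the bundle
        except Exception:
            # Message is not deleted, so the queue redelivers it
            # once its visibility timeout expires.
            pass
        else:
            delete(msg)  # delete only after a successful upload
            handled += 1
        last_work = clock()
    shutdown()  # e.g. run "shutdown /s" so the ASG observes the stop
    return handled
```

Deleting the message only after a successful upload is what lets an interrupted or failed session be retried by another worker without extra bookkeeping.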

Key DCV-specific steps (would be removed):

  • Step 5: dcv.exe set-display-layout — activates the virtual display
  • Step 5: PsExec -i 1 explorer.exe — starts shell (wouldn't be needed if display is real)
  • Step 10: CursorPainter arrow fallback — wouldn't be needed with real cursor

Key files:

terraform/asg.tf                  ← instance_type = "c6i.4xlarge", AMI lookup, MixedInstancesPolicy
vm/bootstrap_2025.ps1             ← Step 9 installs DCV, Step 13 writes prelaunch (dcv set-display-layout)
vm/check_ami_prereqs.ps1          ← Checks 7-8: dcvserver service + console session
vm/build_ami.sh                   ← Orchestrates bootstrap on c6i.4xlarge
vm/userdata-fleet-launch.ps1      ← Auto-logon detection + conditional reboot
vm/launch_vm.sh                   ← Single-VM launcher (instance_type=c6i.4xlarge)
src/.../vm/launch.py              ← Python launcher, calls dcv set-display-layout
src/.../vm/remote_windows.py      ← SSH/SCP over SSM (unchanged)
src/.../vm/worker.py              ← SQS worker loop (unchanged)
src/.../vm/session_entrypoint.py  ← Session driver (unchanged)

What Changes for GPU Spot VMs

┌─────────────────────────────────────────────────────────────────┐
│  TARGET ARCHITECTURE                                            │
├─────────────────────────────────────────────────────────────────┤
│  Instance type:  g4dn.xlarge (4 vCPU, 16 GB, 1x T4 GPU)       │
│                  OR g6.xlarge (4 vCPU, 16 GB, 1x L4 GPU)       │
│  OS:             Windows Server 2025 Datacenter (Win11 kernel)  │
│  Display:        NVIDIA GRID/vGPU virtual display (WDDM)       │
│  Remote access:  SSM only (no DCV, no RDP needed for fleet)    │
│  Fleet size:     same (up to 300)                               │
│  Pricing:        Spot (~$0.16-0.25/hr g4dn.xlarge)             │
│                  OR On-demand (~$0.52/hr g4dn.xlarge)           │
│  AMI:            synth-explorer-windows11-gpu-v1-*              │
│  Region:         us-west-2 (or multi-region for spot capacity)  │
└─────────────────────────────────────────────────────────────────┘

Why real Windows 11 (not Server 2025):

The current fleet uses Windows Server 2025 Datacenter which shares the Windows 11 24H2 kernel but is NOT Windows 11:

  • Different visual theme (no rounded corners on some controls, no Mica material by default)
  • Different default apps (no Microsoft Store apps, no modern Notepad/Paint/Calculator)
  • Different shell behavior (explorer doesn't auto-start reliably, different taskbar)
  • Different Group Policy defaults (animations often disabled, visual effects reduced)
  • Different UWP/WinUI app availability (Server SKU can't install Store apps via Add-AppxPackage under SYSTEM)
  • No "Windows 11 Home/Pro" specific features (Snap Layouts may differ, Widgets absent)

For training data that matches real user machines, we want actual Windows 11 Pro.

How to run real Windows 11 on EC2:

  1. BYOL (Bring Your Own License) — import a Windows 11 Pro image as a custom AMI
    • Create a Windows 11 Pro VM locally (Hyper-V, VMware, or VirtualBox)
    • Sysprep it (C:\Windows\System32\Sysprep\sysprep.exe /generalize /oobe /shutdown)
    • Export as VMDK/VHD
    • Use aws ec2 import-image to create an AMI
    • Must have valid Windows 11 Pro license (Volume Licensing or per-device)
  2. AWS-provided Windows 11 AMIs: AWS does NOT publish first-party Windows 11 desktop AMIs for EC2 (only Server), so BYOL is the only first-party path.
  3. Amazon WorkSpaces Image Builder — can create Windows 11 images, but tied to WorkSpaces (not raw EC2)

BYOL licensing:

  • Windows 11 Pro BYOL on EC2 requires a dedicated host or dedicated instance (Microsoft licensing requirement)
  • Dedicated hosts: g4dn.xlarge dedicated host pricing is higher but eliminates per-instance license cost
  • Alternative: volume licensing agreement (Microsoft 365 E3/E5 includes Windows 11 Enterprise BYOL rights for dedicated hosts)
  • For an experimental fleet: start with Server 2025 + UI tweaks, validate the GPU approach, THEN migrate to Win11 BYOL when ready for production training data

Pragmatic approach (phased):

  • Phase 1: Keep Server 2025, switch to GPU instance → fixes cursor immediately
  • Phase 2: Build Windows 11 Pro BYOL AMI on dedicated g4dn host → fixes all visual/UX mismatches
  • Phase 3: Full fleet on Win11 Pro BYOL → production-quality training data

Why GPU instances solve the problem:

  • NVIDIA GRID driver creates a standard WDDM display output on boot
  • Display renders frames WITHOUT needing any viewer connected
  • GetCursorInfo returns flags=1 (CURSOR_SHOWING) with valid hCursor
  • mss/GDI/ffmpeg capture includes the real cursor natively
  • Explorer starts normally via auto-logon (no PsExec workaround needed)
  • The display driver behaves like a real user's machine

Why spot works for GPU (vs current on-demand for c6i):

  • The c6i spot issue was 12-15min lifetimes, which at best barely cover a single 9min boot + 5min session (14min total)
  • g4dn.xlarge spot in us-west-2 typically has 2-6hr lifetimes
  • GPU spot is more stable because there's less contention from batch/CI workloads
  • Even if interrupted: worker already handles task re-queuing via SQS visibility timeout
  • Total cost with spot: cheaper than current c6i on-demand ($0.16 vs $0.68/hr)
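
The lifetime argument above can be made concrete with a small calculation. This is a rough sketch that assumes sessions run back to back, one at a time, using the 9min boot and 5min session figures quoted above; `sessions_per_lifetime` and `cost_per_session` are illustrative helpers, not part of the codebase.

```python
from typing import Optional


def sessions_per_lifetime(lifetime_min: float, boot_min: float = 9,
                          session_min: float = 5) -> int:
    """Whole sessions that fit in one spot lifetime after boot."""
    usable = lifetime_min - boot_min
    return max(0, int(usable // session_min))


def cost_per_session(rate_per_hr: float, lifetime_min: float,
                     boot_min: float = 9,
                     session_min: float = 5) -> Optional[float]:
    """Instance cost divided by completed sessions; None if none complete."""
    n = sessions_per_lifetime(lifetime_min, boot_min, session_min)
    if n == 0:
        return None
    return rate_per_hr * (lifetime_min / 60) / n


if __name__ == "__main__":
    for label, minutes in [("c6i spot, 12min", 12), ("c6i spot, 15min", 15),
                           ("g4dn spot, 2hr", 120), ("g4dn spot, 6hr", 360)]:
        print(label, sessions_per_lifetime(minutes))
```

Under these assumptions a 12-15min lifetime completes zero or one session, while a 2-6hr lifetime completes 22 to 70, so the boot overhead is amortized over many sessions.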

Boot sequence changes (delta from current):

  1. dcv.exe set-display-layout → removed. NVIDIA driver auto-creates 1920x1080 display.
  2. PsExec -i 1 explorer.exe → likely unnecessary (auto-logon + real display = explorer starts normally). Keep as fallback.
  3. CursorPainter arrow fallback → still present in code but the hCursor != 0 path fires instead (gets real shape).
  4. New: Set display resolution via Set-DisplayResolution PowerShell cmdlet (or ChangeDisplaySettingsEx Win32) in prelaunch.
  5. New: NVIDIA GRID driver install during AMI bake (from AWS S3 bucket, included in g4dn instance cost).

Complete DCV Dependency Inventory (12 files)

Tier 1: Must change (blocking)

| File | What it does with DCV | Migration action |
|---|---|---|
| experiments/synthetic-data-collection/vm/bootstrap_2025.ps1 (lines 453-502) | Installs DCV server MSI from CloudFront, creates console session | Remove DCV install. Install NVIDIA GRID driver instead. RDP is built-in. |
| experiments/synthetic-data-collection/vm/check_ami_prereqs.ps1 (lines 118-130) | Verifies dcvserver service + console session exist | Replace with NVIDIA driver + RDP service checks |
| experiments/synthetic-data-collection/vm/build_ami.sh | Orchestrates bootstrap (which installs DCV) | Update to use GPU AMI base + NVIDIA driver |
| experiments/synthetic-data-collection/vm/launch.py (lines 259-269) | Calls dcv.exe set-display-layout --session console 1920x1080 to attach DXGI framebuffer | Remove. NVIDIA driver creates display output on boot. Set resolution via Set-DisplayResolution or ChangeDisplaySettings API. |
| apps/eval-script/src/eval_script/run_remote.py (lines 569-588) | _set_dcv_resolution() calls dcv.exe set-display-layout | Replace with PowerShell Set-DisplayResolution -Width 1920 -Height 1080 or Win32 ChangeDisplaySettingsEx |
| apps/eval-script/src/eval_script/run_remote.py (lines 786-958) | start_dcv_viewer(): Puppeteer connects to DCV web client to activate framebuffer | Remove entirely. NVIDIA driver renders framebuffer without needing a viewer. |
| apps/eval-script/src/eval_script/run_eval.py (lines 63-90, 1262-1277) | Imports/calls start_dcv_viewer, stop_dcv_viewer, --dcv-resolution arg | Remove DCV viewer logic. Keep resolution setting (via new method). Remove --no-dcv-viewer flag. |
| apps/eval-script/src/eval_script/run_os_eval.py (lines 381-382, 526-542) | Same as run_eval.py: DCV viewer + resolution | Same changes as run_eval.py |

Tier 2: Can simplify (optional, but recommended)

| File | What it does with DCV | Migration action |
|---|---|---|
| packages/client/src/client/cursor_overlay.py | Workaround for DCV's CURSOR_SUPPRESSED: loads IDC_ARROW fallback when hCursor=0 | Keep as-is (still handles RDP edge cases). Or simplify: with NVIDIA driver, GetCursorInfo returns valid hCursor, so the SUPPRESSED fallback path rarely fires. |
| packages/client/src/client/capture.py (lines 264-268) | Documents why cursor compositing is needed on DCV/RDP | Update comment. CursorPainter still adds value for RDP sessions. |
| apps/eval-script/src/eval_script/bake_cursor.py | Post-hoc cursor overlay for recorded videos | May become unnecessary if cursor is captured natively. Keep for backward compat with old recordings. |
| apps/eval-script/src/eval_script/interactive_launcher.py (lines 11-13) | Documents DCV framebuffer activation dependency | Update comment to reflect new architecture. |
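
The file inventory above can be re-checked after each migration step with a short scan. The sketch below is one way to do it, assuming a case-insensitive match on "dcv" in .py, .ps1, .sh and .tf files is a good enough signal; `find_dcv_references` is a hypothetical helper, and the suffix list is an assumption.

```python
import re
from pathlib import Path

DCV_PATTERN = re.compile(r"dcv", re.IGNORECASE)
SCAN_SUFFIXES = {".py", ".ps1", ".sh", ".tf"}  # assumed set of source types


def find_dcv_references(root):
    """Map each matching file (path relative to root) to its matching line numbers."""
    root = Path(root)
    hits = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SCAN_SUFFIXES:
            continue
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        matched = [number for number, line in enumerate(lines, start=1)
                   if DCV_PATTERN.search(line)]
        if matched:
            hits[path.relative_to(root).as_posix()] = matched
    return hits


if __name__ == "__main__":
    for name, line_numbers in find_dcv_references(".").items():
        print(f"{name}: lines {line_numbers}")
```

An empty result after Phase 1 would indicate that no DCV call sites remain; comment-only matches (Tier 2) still show up and need a manual look.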

Tier 3: Infrastructure / Terraform

| Change | Details |
|---|---|
| Instance type | c6i.4xlarge → g4dn.xlarge (or g5.xlarge for a newer GPU). On-demand cost: ~$0.52/hr vs ~$0.68/hr for c6i.4xlarge |
| AMI base | Switch from Windows Server 2025 base to AWS-provided Windows + NVIDIA GRID AMI |
| Security group | Port 8443 (DCV) no longer needed. RDP port 3389 already allowed (or use SSM only). |
| IAM | Remove DCV license S3 access (if any). No other changes. |
| ASG / fleet | No structural changes: same ASG, same SQS, same worker model |

Migration Steps (Ordered)

Phase 1: GPU AMI with Server 2025 (2-3 days)

Goal: Fix cursor capture immediately, keep Server 2025 OS.

vm/bootstrap_gpu.ps1 (fork of bootstrap_2025.ps1):

 # Step 9: NICE DCV server
-# Install DCV MSI from CloudFront...
-$dcvUrl = 'https://d1uj6qtbmh3dt5.cloudfront.net/nice-dcv-server-x64-Release.msi'
-# ... (lines 453-502 removed)
+# Step 9: NVIDIA GRID driver
+# g4dn instances include GRID license in instance cost.
+# Driver is available from AWS S3 bucket:
+$nvidiaUrl = "s3://ec2-windows-nvidia-drivers/latest/NVIDIA_grid_win10_win11_server2025_64bit.exe"
+aws s3 cp $nvidiaUrl "$env:TEMP\nvidia_grid.exe" --region us-east-1
+Start-Process "$env:TEMP\nvidia_grid.exe" -ArgumentList '/s', '/noreboot' -Wait
+# Verify display adapter is now NVIDIA
+$gpu = Get-WmiObject Win32_VideoController | Where-Object { $_.Name -match 'NVIDIA' }
+if (-not $gpu) { Fail 'NVIDIA driver install completed but no NVIDIA adapter found' }

vm/start-worker-prelaunch.ps1 changes:

-# Set DCV display layout to 1920x1080
-& 'C:\Program Files\NICE\DCV\Server\bin\dcv.exe' set-display-layout --session console 1920x1080 2>&1 | Out-Null
+# Set display resolution (NVIDIA GRID creates output on boot, just need resolution)
+# Using ChangeDisplaySettingsEx for reliability on headless boot
+Add-Type @"
+using System;
+using System.Runtime.InteropServices;
+public struct DEVMODE {
+    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32)] public string dmDeviceName;
+    public short dmSpecVersion, dmDriverVersion;
+    public short dmSize, dmDriverExtra;
+    public int dmFields, dmPositionX, dmPositionY, dmDisplayOrientation, dmDisplayFixedOutput;
+    public short dmColor, dmDuplex, dmYResolution, dmTTOption, dmCollate;
+    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32)] public string dmFormName;
+    public short dmLogPixels;
+    public int dmBitsPerPel;  // DWORD in the Win32 DEVMODE definition
+    public int dmPelsWidth, dmPelsHeight, dmDisplayFlags, dmDisplayFrequency;
+    public int dmICMMethod, dmICMIntent, dmMediaType, dmDitherType;
+    public int dmReserved1, dmReserved2, dmPanningWidth, dmPanningHeight;
+}
+public class Display {
+    [DllImport("user32.dll")] public static extern int ChangeDisplaySettingsEx(
+        string lpszDeviceName, ref DEVMODE lpDevMode, IntPtr hwnd, int dwflags, IntPtr lParam);
+    public const int CDS_UPDATEREGISTRY = 0x01;
+    public const int DM_PELSWIDTH = 0x80000;
+    public const int DM_PELSHEIGHT = 0x100000;
+}
+"@
+$dm = New-Object DEVMODE
+$dm.dmSize = [System.Runtime.InteropServices.Marshal]::SizeOf($dm)
+$dm.dmPelsWidth = 1920
+$dm.dmPelsHeight = 1080
+$dm.dmFields = [Display]::DM_PELSWIDTH -bor [Display]::DM_PELSHEIGHT
+[Display]::ChangeDisplaySettingsEx($null, [ref]$dm, [IntPtr]::Zero, [Display]::CDS_UPDATEREGISTRY, [IntPtr]::Zero)

vm/check_ami_prereqs.ps1 changes:

-# Check 7: dcvserver service running
-$dcv = Get-Service -Name 'dcvserver' -ErrorAction SilentlyContinue
-if (-not $dcv -or $dcv.Status -ne 'Running') { ... }
-# Check 8: DCV console session
-$sessions = & $dcvExe list-sessions 2>&1
-if ($sessions -notmatch 'console') { ... }
+# Check 7: NVIDIA display adapter present
+$gpu = Get-WmiObject Win32_VideoController | Where-Object { $_.Name -match 'NVIDIA' }
+if (-not $gpu) { Fail 'No NVIDIA display adapter found' }
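+# Check 8: Display resolution is set
+# System.Windows.Forms is not loaded by default in a PowerShell session
+Add-Type -AssemblyName System.Windows.Forms
+$screen = [System.Windows.Forms.Screen]::PrimaryScreen
+if ($screen.Bounds.Width -lt 1920) { Fail "Display too small: $($screen.Bounds.Width)x$($screen.Bounds.Height)" }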

vm/build_ami.sh changes:

-INSTANCE_TYPE="c6i.4xlarge"
+INSTANCE_TYPE="g4dn.xlarge"

 # Base AMI: Windows Server 2025 Full Base (latest)
+# IMPORTANT: Use a base AMI from the region the fleet launches in (us-west-2); it must support g4dn (EBS-backed, x86_64)

terraform/asg.tf changes:

 resource "aws_launch_template" "fleet_worker" {
   name_prefix   = "${local.project}-fleet-worker-"
   image_id      = data.aws_ami.synth_explorer_v2.id
-  instance_type = "c6i.4xlarge"
+  instance_type = "g4dn.xlarge"
   ...
 }

   mixed_instances_policy {
-    # 100% on-demand by default.
+    # Spot-first for g4dn. GPU spot has longer lifetimes (2-6hr typical)
+    # vs c6i spot (12-15min). Total cost: ~$0.16-0.25/hr vs $0.68/hr on-demand c6i.
     instances_distribution {
       on_demand_base_capacity                  = 0
-      on_demand_percentage_above_base_capacity = 100
+      on_demand_percentage_above_base_capacity = 20  # 20% on-demand fallback
       spot_allocation_strategy                 = "capacity-optimized-prioritized"
     }
     launch_template {
       ...
       override {
-        instance_type = "c6i.4xlarge"
+        instance_type = "g4dn.xlarge"
       }
       override {
-        instance_type = "c6a.4xlarge"
+        instance_type = "g4dn.2xlarge"  # fallback: more memory
       }
       override {
-        instance_type = "m6i.4xlarge"
+        instance_type = "g5.xlarge"     # fallback: newer GPU (A10G)
       }
     }
   }

apps/eval-script changes:

 # run_remote.py
-def start_dcv_viewer(dcv_url, timeout=15, username="Administrator", password=None):
-    """Start a DCV viewer connection via Puppeteer..."""
-    # ... 170 lines of Puppeteer/Xvfb/Chrome setup ...
+# REMOVED: start_dcv_viewer / stop_dcv_viewer
+# GPU instances render framebuffer without a viewer.

-def _set_dcv_resolution(host, resolution="1920x1080", ...):
-    """Set DCV display layout."""
-    cmd = f'"C:\\Program Files\\NICE\\DCV\\Server\\bin\\dcv.exe" set-display-layout --session console {resolution}'
+def _set_display_resolution(host, resolution="1920x1080", ...):
+    """Set display resolution via PowerShell."""
+    w, h = resolution.split("x")
+    cmd = f'powershell -Command "Set-DisplayResolution -Width {w} -Height {h} -Force"'
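
Because the resolution string is interpolated into a remote command line, it is worth validating before use. A minimal sketch, assuming the same Set-DisplayResolution invocation as in the diff above; `build_resolution_command` is a hypothetical helper and the 3-to-5-digit bound is an arbitrary sanity limit.

```python
import re

# WIDTHxHEIGHT with 3 to 5 digits per side (assumed sanity bound).
_RESOLUTION_RE = re.compile(r"(\d{3,5})x(\d{3,5})")


def build_resolution_command(resolution: str = "1920x1080") -> str:
    """Return the PowerShell command line for a validated WIDTHxHEIGHT string."""
    match = _RESOLUTION_RE.fullmatch(resolution.strip().lower())
    if match is None:
        raise ValueError(f"expected WIDTHxHEIGHT, got {resolution!r}")
    width, height = int(match.group(1)), int(match.group(2))
    return ('powershell -NoProfile -Command '
            f'"Set-DisplayResolution -Width {width} -Height {height} -Force"')


if __name__ == "__main__":
    print(build_resolution_command("1920x1080"))
```

Rejecting anything that is not two integers keeps stray shell metacharacters out of the command sent over SSM/SSH.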

Phase 2: Validation (0.5 days)

  1. Bake new AMI: synth-explorer-windows11-gpu-v1
  2. Launch single test instance
  3. Run cursor test script — verify GetCursorInfo returns flags=1, valid hCursor
  4. Run full synthetic-data-collection session — verify TREC recording includes real cursor with correct shapes
  5. Compare screenshots with current DCV screenshots — confirm visual fidelity improvement
  6. Benchmark: verify session completion time is similar (GPU shouldn't be slower for UI automation)
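
The cursor test in step 3 could look like the sketch below. It calls the Win32 GetCursorInfo API through ctypes and classifies the result; the CURSOR_SHOWING (0x1) and CURSOR_SUPPRESSED (0x2) values are the documented flag constants, while `classify_cursor` and `read_cursor_info` are illustrative names rather than the script used in the investigation. The API call only runs on Windows.

```python
import ctypes
import sys

CURSOR_SHOWING = 0x00000001
CURSOR_SUPPRESSED = 0x00000002


def classify_cursor(flags: int, hcursor: int) -> str:
    """Turn GetCursorInfo output into a short verdict string."""
    if flags & CURSOR_SUPPRESSED:
        return "suppressed"
    if flags & CURSOR_SHOWING:
        return "showing" if hcursor else "showing-without-handle"
    return "hidden"


def read_cursor_info():
    """Windows only: return (flags, hCursor) from user32.GetCursorInfo."""
    class POINT(ctypes.Structure):
        _fields_ = [("x", ctypes.c_long), ("y", ctypes.c_long)]

    class CURSORINFO(ctypes.Structure):
        _fields_ = [("cbSize", ctypes.c_uint32),
                    ("flags", ctypes.c_uint32),
                    ("hCursor", ctypes.c_void_p),
                    ("ptScreenPos", POINT)]

    info = CURSORINFO()
    info.cbSize = ctypes.sizeof(CURSORINFO)  # must be set before the call
    user32 = ctypes.WinDLL("user32", use_last_error=True)
    if not user32.GetCursorInfo(ctypes.byref(info)):
        raise ctypes.WinError(ctypes.get_last_error())
    return info.flags, info.hCursor or 0  # a NULL handle reads back as None


if __name__ == "__main__" and sys.platform == "win32":
    flags, hcursor = read_cursor_info()
    print(f"flags={flags} hCursor={hcursor:#x} -> {classify_cursor(flags, hcursor)}")
```

On the GPU instance the expected verdict is "showing"; "suppressed" is the state observed on the DCV instances. The script has to run inside the interactive session (Session 1), not from a Session 0 service context.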

Phase 3: Fleet Rollout (0.5 days)

  1. terraform apply with new launch template
  2. Submit small test campaign (10 tasks) on new fleet
  3. Verify bundles in S3 have real cursor shapes
  4. Scale to full campaigns

Phase 4: Windows 11 Pro BYOL (future, 3-5 days)

Goal: Replace Server 2025 with real Windows 11 Pro for maximum fidelity.

Steps:

  1. Create Windows 11 Pro VM locally (Hyper-V)
    • Install Win11 Pro 24H2 from ISO
    • Install Office, Chrome, classic Win32 apps
    • Install NVIDIA GRID driver (compatible version)
    • Configure auto-logon, OpenSSH, disable sleep
    • Sysprep: sysprep.exe /generalize /oobe /shutdown /mode:vm
  2. Export VHDX → VHD (fixed size)
  3. Upload VHD to S3
  4. aws ec2 import-image --disk-containers file://containers.json
  5. Wait for import task (~30-60min for 50GB image)
  6. Launch on dedicated host (g4dn.host in us-west-2, ~$5.02/hr)
    • Microsoft BYOL requires dedicated hosts/instances
    • A single g4dn.metal dedicated host can run ~4× g4dn.xlarge VMs
    • Cost: $5.02/hr ÷ 4 = ~$1.26/hr per VM equivalent
    • OR use dedicated instances (more expensive per-VM but simpler)
  7. Modify bootstrap_gpu.ps1 for Win11 differences:
    • Modern Notepad/Paint/Calculator available natively (no Win32 fallback)
    • Explorer starts normally on logon (no PsExec workaround)
    • Microsoft Store apps work
    • Standard Win11 visual theme (Mica, rounded corners)
  8. Update AMI filter in terraform to synth-explorer-win11pro-gpu-v1-*
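
For step 4, the containers.json file passed to aws ec2 import-image is a JSON list of disk containers. A minimal sketch for generating it follows; the bucket and key names are placeholders, and `build_disk_containers` is a hypothetical helper.

```python
import json


def build_disk_containers(bucket: str, key: str,
                          description: str = "Windows 11 Pro 24H2 (BYOL)",
                          disk_format: str = "vhd") -> list:
    """Build the list structure expected by --disk-containers."""
    return [{
        "Description": description,
        "Format": disk_format,
        "UserBucket": {"S3Bucket": bucket, "S3Key": key},
    }]


if __name__ == "__main__":
    # Placeholder bucket/key: substitute the location used in step 3.
    containers = build_disk_containers("example-import-bucket",
                                       "win11pro/win11pro-gpu.vhd")
    with open("containers.json", "w", encoding="utf-8") as fh:
        json.dump(containers, fh, indent=2)
```

Generating the file from the same variables used for the S3 upload avoids the bucket/key mismatch that otherwise only surfaces after the import task has been queued.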

Win11 BYOL licensing options:

| License | Cost | How |
|---|---|---|
| Windows 11 Pro (retail) | ~$200 one-time per image | Buy once, use on dedicated hosts |
| Microsoft 365 E3/E5 | ~$36/user/month | Includes Win11 Enterprise VDA rights |
| Windows 11 Enterprise E3 (per-device) | ~$7/device/month | VDA rights for virtual desktops |
| SA (Software Assurance) | Varies | Existing EA/MPSA may already include rights |

For an experimental fleet of 300 VMs on dedicated hosts, the most cost-effective option is a single Win11 Pro retail license + dedicated host infrastructure.

Alternative: Windows 11 on non-dedicated instances (cheaper, grey area):

  • AWS Marketplace has "Windows 11" AMIs from 3rd-party vendors (e.g., "Windows 11 with NVIDIA Gaming")
  • These bundle the license in the hourly price (~$0.10-0.20/hr premium)
  • Simpler than BYOL but limited to specific configurations
  • Check Marketplace for g4dn-compatible Win11 AMIs

Cost Analysis

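| Metric | Current (DCV on c6i.4xlarge, on-demand) | GPU (g4dn.xlarge, on-demand) | GPU Spot (g4dn.xlarge) |
|---|---|---|---|
| $/hr per instance | ~$0.68 | ~$0.52 | ~$0.16 |
| 300-instance fleet $/hr | ~$204 | ~$156 | ~$48 |
| 300-instance fleet $/day (8hr) | ~$1,632 | ~$1,248 | ~$384 |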

Spot instances make GPU cheaper than current on-demand DCV. g4dn.xlarge spot availability is generally good.
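
The fleet figures are plain multiplication and can be recomputed when rates change. A small sketch using the per-hour rates quoted in the architecture summaries of this document (c6i.4xlarge on-demand ~$0.68, g4dn.xlarge on-demand ~$0.52, g4dn.xlarge spot ~$0.16); `fleet_cost` is an illustrative helper, and the 300-instance, 8hr/day figures are the ones used in this section.

```python
HOURLY_RATES = {
    "c6i.4xlarge on-demand": 0.68,
    "g4dn.xlarge on-demand": 0.52,
    "g4dn.xlarge spot": 0.16,
}


def fleet_cost(rate_per_hr: float, instances: int = 300,
               hours_per_day: float = 8) -> tuple:
    """Return (fleet $/hr, fleet $/day), rounded to cents."""
    per_hour = rate_per_hr * instances
    return round(per_hour, 2), round(per_hour * hours_per_day, 2)


if __name__ == "__main__":
    for name, rate in HOURLY_RATES.items():
        per_hour, per_day = fleet_cost(rate)
        print(f"{name}: ${per_hour:,.0f}/hr, ${per_day:,.0f}/day")
```

Spot prices move, so the spot row should be treated as a point-in-time estimate rather than a fixed rate.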

Alternative: Keep DCV for Fleet, GPU Only for Cursor-Critical Sessions

If migrating the full fleet to GPU instances is not acceptable (for example, because of cost or spot capacity), a hybrid approach is possible:

  • Keep DCV fleet for bulk data collection (arrow cursor is acceptable for most training)
  • Use GPU instances only for sessions where cursor shape matters (eval, QA, specific training subsets)
  • The CursorPainter fallback is already good enough for position — just not shape

Risks and Mitigations

| Risk | Mitigation |
|---|---|
| NVIDIA driver licensing cost on g4dn | g4dn includes GRID vWS license in instance cost (no additional charge) |
| RDP cursor has its own quirks | RDP cursor is in-framebuffer by default; only suppressed in specific redirect scenarios |
| No remote access without DCV web client | Use SSM Session Manager for CLI access (already works). Use Fleet Manager or RDP for GUI. |
| g4dn spot interruption during recording | Use spot instance interruption handling in worker (save state, re-queue task) |
| Display not activating without logon | GPU instances auto-create display output. Auto-logon (already configured) ensures desktop is rendered. |
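
For the spot interruption risk, the worker can poll the EC2 instance metadata service for an interruption notice. A minimal sketch using IMDSv2 and the spot/instance-action path; the function names are hypothetical, and how the worker reacts (for example, not starting a new session) is left to the caller.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS = "http://169.254.169.254/latest"


def fetch_instance_action(timeout: float = 1.0):
    """Return the instance-action JSON body, or None if no interruption is scheduled."""
    token_request = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    with urllib.request.urlopen(token_request, timeout=timeout) as response:
        token = response.read().decode()
    request = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode()
    except urllib.error.HTTPError as error:
        if error.code == 404:  # 404 means no notice has been issued
            return None
        raise


def seconds_until_interruption(body: str, now: datetime) -> float:
    """Parse a notice such as {"action": "terminate", "time": "...Z"}."""
    notice = json.loads(body)
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    return (when.replace(tzinfo=timezone.utc) - now).total_seconds()
```

If a notice is present, the worker can decline to start a new session; any in-flight message that is never deleted becomes visible again after its SQS visibility timeout.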