Patch ffmpeg's libavdevice/gdigrab.c:paint_mouse_pointer() to accept CURSOR_SUPPRESSED and use GetCursor() + AttachThreadInput() for the cursor handle. This is proven working — we built and tested a patched ffmpeg on a headless EC2 instance (2026-05-22).
Results:
- Baseline (unpatched): `ffmpeg gdigrab -draw_mouse 1` → 131KB video, NO cursor visible
- Patched: same command → 160KB video, CURSOR VISIBLE and moving
    // Lines 495-496: Replace strict CURSOR_SHOWING check
    // BEFORE:
    if (ci.flags != CURSOR_SHOWING)
        return;
    // AFTER:
    if (ci.flags == 0)  // flags=0 = truly hidden, respect that
        return;
    // flags=1 (SHOWING) or flags=2 (SUPPRESSED): both attempt cursor draw

    // Lines 498-503: Enhanced cursor handle fallback
    // BEFORE:
    if (!icon) {
        icon = CopyCursor(LoadCursor(NULL, IDC_ARROW));
    }
    // AFTER:
    if (!icon) {
        /* CURSOR_SUPPRESSED: hCursor is NULL. Use GetCursor() via
         * thread input attachment to get real cursor shape. */
        HWND fg = GetForegroundWindow();
        if (fg) {
            DWORD tid = GetWindowThreadProcessId(fg, NULL);
            DWORD my_tid = GetCurrentThreadId();
            if (tid && tid != my_tid) {
                AttachThreadInput(my_tid, tid, TRUE);
                icon = CopyCursor(GetCursor());
                AttachThreadInput(my_tid, tid, FALSE);
            }
        }
        if (!icon)
            icon = CopyCursor(LoadCursor(NULL, IDC_ARROW));
    }

- `GetCursorInfo` still provides valid position (`ci.ptScreenPos`) even when SUPPRESSED
- `GetCursor()` + `AttachThreadInput()` returns valid cursor handles with correct shapes (I-beam, hand, resize) even when `GetCursorInfo.hCursor` is NULL
- `DrawIcon()` (existing ffmpeg code, unchanged) composites the cursor at the correct position
- No performance overhead: `AttachThreadInput` is ~0 cost
- No existing issue or PR in ffmpeg for CURSOR_SUPPRESSED on headless VMs
- Last cursor-related change to gdigrab.c was 2019 (HiDPI fix)
- The `ddagrab` filter (DXGI) has the same problem: it uses `PointerPosition.Visible`, which is also FALSE on headless
- This is an industry-wide gap: Google's WebRTC, python-mss, OBS all skip cursor when SUPPRESSED
Tested and working on the same headless EC2 instance (2026-05-22).
Results:
- `mss.MSS(with_cursor=True).shot()` → 105,448 bytes (cursor visible at (400,300))
- `mss.MSS(with_cursor=False).shot()` → 105,236 bytes (no cursor)
- Difference: 212 bytes = cursor pixels composited by mss's built-in `_merge()`
Changes required (3 files, ~85 lines total):
1. `mss/base.py` (1 line): allow `with_cursor` on Windows:

       ("with_cursor", with_cursor, ["Linux", "Windows"]),

2. `mss/windows/gdi.py` `__init__` (2 lines): accept kwargs:

       def __init__(self, **kwargs) -> None:
           super().__init__(**kwargs)

3. `mss/windows/gdi.py` `cursor()` (~80 lines): replace the no-op with:

       def cursor(self):
           # GetCursorPos() for position (works when SUPPRESSED)
           # AttachThreadInput() + GetCursor() for real cursor handle
           # DrawIconEx() to render cursor into 32x32 BGRA bitmap
           # Alpha recovery (white background = transparent)
           # Return ScreenShot with position for mss's built-in _merge()
           ...  # outline only; the tested body is the ~80 lines summarized above

The key insight: mss already has the `_merge()` compositor infrastructure (used by the Linux/Xlib backend). The Windows backend just needed a working `cursor()` implementation that doesn't rely on `GetCursorInfo.hCursor` (which is NULL when SUPPRESSED).
Upstream contribution path: mss PR #272 uses GetCursorInfo (same limitation). Our approach using GetCursor() + AttachThreadInput() is strictly better — could be contributed as an improved version of that PR.
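The "alpha recovery (white background = transparent)" step in the `cursor()` outline can be illustrated with a minimal, platform-independent sketch. It assumes the cursor was already rendered by `DrawIconEx()` onto an all-white BGRA bitmap whose alpha bytes are unreliable; the function name `recover_alpha_from_white` is hypothetical and is not part of mss. A plain white key also makes genuinely white cursor pixels transparent, so the tested implementation may well handle this differently.

```python
WHITE_BGR = (255, 255, 255)


def recover_alpha_from_white(bgra: bytes) -> bytes:
    """Return a copy of a BGRA buffer with alpha rebuilt by white-keying.

    Assumption: the cursor was drawn over a pure white background, so any
    pixel that is still pure white is treated as background (alpha 0) and
    every other pixel is treated as fully opaque cursor (alpha 255).
    """
    if len(bgra) % 4:
        raise ValueError("BGRA buffer length must be a multiple of 4")
    out = bytearray(bgra)
    for i in range(0, len(out), 4):
        is_background = (out[i], out[i + 1], out[i + 2]) == WHITE_BGR
        out[i + 3] = 0 if is_background else 255
    return bytes(out)


# Two pixels: untouched white background, then a black cursor outline pixel.
sample = bytes([255, 255, 255, 77, 0, 0, 0, 0])
recovered = recover_alpha_from_white(sample)
```

The resulting buffer carries straight alpha, which is the form a compositor such as mss's `_merge()` would need in order to blend the cursor over the screenshot.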
- ffmpeg ddagrab: could add a `GetCursor()` fallback when `PointerPosition.Visible=FALSE`
- TREC SDK (trajectory-recorder): currently uses `draw_mouse=0` and captures cursor separately. With patched ffmpeg, just change to `draw_mouse=1`.
Other solutions (lower priority now that ffmpeg patch is proven):
- Improve CursorPainter: use `GetCursor()` + `AttachThreadInput()` in `packages/client/src/client/cursor_overlay.py`. Pure Python, no binary to ship. Good for the mss pipeline.
- Windows.Graphics.Capture `IsCursorCaptureEnabled=true` (UNTESTED): DWM-level composite. May work independently. Needs C++ test program.
- QEMU screendump from outside: capture from the Linux host. Cursor IS included. Different architecture.
dockur/windows runs real Windows 11 inside QEMU/KVM in Docker. Valuable for desktop fidelity (Start menu, Store, modern apps) but does NOT solve cursor capture.
Tested extensively (2026-05-21/22) on c5.metal:
| VGA Mode | GetCursorInfo flags | Cursor in mss? | Cursor in QEMU screendump/VNC? |
|---|---|---|---|
| `-vga virtio` (default) | flags=0 (HIDDEN) | No | Yes (overlay) |
| `-vga cirrus` | flags=1 (SHOWING!) | No | Yes (overlay) |
| `-vga cirrus` + disabled driver | flags=1 (SHOWING!) | No | Yes (overlay) |
Key finding: Even with GetCursorInfo returning flags=1 (SHOWING) on Cirrus VGA, the cursor is NOT in mss/GDI captures. QEMU's cirrus_cursor_draw_line composites into QEMU's internal DisplaySurface (what VNC/screendump reads), NOT into the guest-visible VRAM (what Windows GDI BitBlt / mss reads). These are two separate memory regions.
Why WindowsAgentArena doesn't actually solve it either: Their ShowCursor(True) + SendInput() hack doesn't put cursor into GDI captures. Their actual capture methods are: (1) QEMU screendump from the hypervisor level (includes overlay), (2) deprecated cursor.png manual compositing (same as our CursorPainter).
Industry validation: This is an industry-wide unsolved problem. Google's WebRTC (mouse_cursor_monitor_win.cc) sends an empty bitmap when CURSOR_SUPPRESSED. No one has native cursor capture on headless VMs.
Tested on g4dn.xlarge (NVIDIA Tesla T4, driver v32.0.15.9636) — GetCursorInfo still returns flags=2 (CURSOR_SUPPRESSED).
The cursor suppression is NOT DCV-specific. It is a property of any display without a physical monitor connected — including GPU instances. The WDDM driver creates a display output (1280x800 on T4), but the hardware cursor plane remains inactive without something physically consuming the signal.
Test instance: i-0c884ac115dd3be55 (g4dn.xlarge, Windows Server 2025, auto-logon Administrator active)
- Display adapter: NVIDIA Tesla T4 (GRID driver 596.36)
- `SetCursorPos(500, 500)`: works, position is tracked
- `ShowCursor(True)` × 5: no effect on flags
- `GetCursorInfo` → `flags=2, hCursor=None, pos=(500,500)`
This invalidates the "just switch to GPU" hypothesis. The migration plan in this doc (Option 2) needs revision.
The cursor is suppressed on ANY headless EC2 instance regardless of display driver:
- DCV IDD driver → CURSOR_SUPPRESSED
- NVIDIA GRID/T4 driver → CURSOR_SUPPRESSED
- Microsoft Basic Display Adapter → GetCursorInfo returns False entirely
The only known way to get flags=1 (CURSOR_SHOWING) is to have something actively consuming the display output as a "monitor" — e.g., a physical HDMI dongle, a connected DCV/RDP viewer, or potentially a custom IDD that claims to be a monitor.
| Attempt | Result |
|---|---|
| Install NVIDIA GRID driver (596.36) | Adapter present (T4 @ 1280x800), cursor still SUPPRESSED |
| Auto-logon + interactive Administrator session | Session active (confirmed via query session), cursor still SUPPRESSED |
| `SetCursorPos(500, 500)` | Position IS tracked correctly, but flags remain 2 |
| `ShowCursor(True)` × 5 | No effect on flags |
| `ConnectedMonitor=DFP-0` registry hack (tells NVIDIA a DFP is attached) | No effect: cursor still SUPPRESSED after reboot |
Option A: Improve CursorPainter to get real cursor shapes (RECOMMENDED — lowest effort, highest impact)
What works today: CursorPainter composites cursor position into frames. Position is accurate even when SUPPRESSED.
What's broken: When flags=2, hCursor from GetCursorInfo is NULL, so we only composite the arrow fallback.
Fix: Use `GetCursor()` (user32.dll), which returns the cursor handle of the calling thread's message queue. To read the cursor that the foreground app has set, attach to that app's thread first. Alternatives are `GetClassLongPtr(hwnd, GCLP_HCURSOR)`, which only yields the window class's default cursor, or `WM_SETCURSOR` tracking. This gives us:
- I-beam over text fields
- Hand over links
- Resize arrows on window edges
- Busy/wait spinners
- App-custom cursors
Approach:
    # In the capture thread, before each frame grab:
    import ctypes
    user32 = ctypes.windll.user32
    kernel32 = ctypes.windll.kernel32
    user32.GetCursor.restype = ctypes.c_void_p  # HCURSOR is pointer-sized
    hCursor = user32.GetCursor()  # returns current cursor for the calling thread
    # OR: attach to foreground thread to get ITS cursor
    current_tid = kernel32.GetCurrentThreadId()
    foreground_hwnd = user32.GetForegroundWindow()
    tid = user32.GetWindowThreadProcessId(foreground_hwnd, None)
    if tid and tid != current_tid and user32.AttachThreadInput(current_tid, tid, True):
        try:
            hCursor = user32.GetCursor()
        finally:
            user32.AttachThreadInput(current_tid, tid, False)  # always detach

Tradeoff: Still compositing (not native in framebuffer), but with CORRECT shapes. Sub-pixel positioning difference remains but is imperceptible at 1920x1080.
Effort: ~1-2 days. Modify packages/client/src/client/cursor_overlay.py to try GetCursor() / AttachThreadInput before falling back to the IDC_ARROW stub.
What it does: A connected RDP or DCV viewer activates the hardware cursor plane → GetCursorInfo returns flags=1 with valid hCursor.
Current implementation: start_dcv_viewer() in apps/eval-script/src/eval_script/run_remote.py launches headless Puppeteer → DCV web client → authenticates → maintains WebSocket connection.
Problem: Heavyweight (needs Node, Puppeteer, Chrome, Xvfb on Linux), fragile (Puppeteer timeouts, DCV auth failures), adds 15-20s startup per session.
Lighter alternatives:
- Minimal RDP client: `xfreerdp /v:host /u:Administrator /p:pass /cert:ignore /headless` or similar. RDP activates the cursor plane same as DCV.
- FreeRDP without display: connects, authenticates, doesn't render pixels. Just enough to activate cursor.
- Custom WebSocket DCV handshake: only needs the DCV auth protocol, not full video decode. Could be 50 lines of Python with `websockets`.
Tradeoff: Adds a dependency (viewer process must stay alive during capture). If viewer dies, cursor reverts to SUPPRESSED. But gives the "true" native cursor in framebuffer.
Effort: ~2-3 days if switching to FreeRDP. 0 days if keeping existing start_dcv_viewer() as-is.
What it does: A custom Windows driver that creates a virtual monitor AND composites the cursor into its framebuffer (instead of relying on the hardware cursor plane).
How: Microsoft's IddSampleDriver (on GitHub) creates virtual displays. Extend it to call GetCursorInfo in its frame-render callback and alpha-blend the cursor sprite.
Tradeoff:
- Most "correct" solution — cursor IS in the framebuffer natively
- But: requires building/signing a Windows kernel driver, deploying it to AMIs, maintaining it across Windows updates
- Security review would be intense for a custom kernel driver on production machines
Effort: ~1-2 weeks for a prototype, ~1 month production-ready with signing and testing.
What it does: Real hardware with physical displays → cursor just works natively.
Status: Daytona access pending. They claim Windows sandbox support.
Tradeoff: External dependency, cost unknown, may not meet scale requirements (300 concurrent). But eliminates ALL mismatches (cursor, DPI, font rendering, etc.)
Effort: 0 engineering effort if it works. Unknown timeline for access.
What it does: Run an RDP or VNC server + a headless client on the same instance, looping back to localhost. The server sees a "connected viewer" and activates the cursor plane.
How: mstsc /v:localhost or a headless RDP client connecting to 127.0.0.1:3389. Alternatively, TightVNC server + viewer both on the same box.
Tradeoff: Simpler than cross-machine viewer. RDP is already available on Server 2025 (we just disabled it). Re-enabling it for localhost-only + connecting a headless client might be the lightest-weight approach.
Effort: ~0.5-1 day to test. If RDP loopback activates the cursor → minimal code change.
Three options investigated for getting native cursor in ffmpeg video output:
How: Enable RDP on the instance, connect a headless FreeRDP client to 127.0.0.1:3389 from within the same instance. The RDP server activates the display/cursor.
Cursor: YES — RDP session has an active display stack with cursor rendering in the framebuffer.
But: On non-Server Windows (Win11), RDP takes over session 1 (only one session allowed). It doesn't create a separate session 2. Connecting RDP from within the console session is circular — it reconnects to the same session or gets blocked.
On Windows Server (which we use today): RDSH allows multiple sessions, so you could create a session 2 via RDP while session 1 is the console. But then which session does ffmpeg capture? It captures the console (session 1) which is now DISCONNECTED.
Performance: RDP encoder on localhost adds 5-15% CPU overhead (H.264 compression even for loopback).
Verdict: Doesn't cleanly work. The session model fights us.
| Aspect | Rating |
|---|---|
| Cursor in ffmpeg video? | Probably yes but session routing is complex |
| Closeness to real machine | Medium — RDP introduces its own display driver quirks |
| Complexity to implement | Medium — need to solve session routing |
| Works with existing infra? | Yes — just a script change |
| Windows 11? | Server 2025 (single-session limitation for Win11 BYOL) |
How: Use WorkSpaces API to programmatically create/terminate desktop instances. WorkSpaces uses DCV internally and creates an always-active console session at boot.
Cursor: UNCLEAR — WorkSpaces uses DCV internally (same IDD driver issue). The display is "active" for the streaming protocol, but GetCursorInfo may still return flags=2 unless a WorkSpaces client is connected. Needs testing.
Key facts:
- Windows 11 supported via BYOL (Enterprise 22H2, 23H2, 24H2, 25H2, LTSC 2024)
- Programmatic via `CreateWorkspaces` / `TerminateWorkspaces` API
- Custom images/bundles supported
- Resume from stopped: <90s
- Fresh provision: 10-20 min
- SSM access possible (need to install/enable SSM agent)
- No native SSH (install OpenSSH yourself)
- Pricing: AlwaysOn ~$21-35/month base, or AutoStop hourly
Closeness to real machine:
- Windows 11 ✓ (BYOL)
- DPI/Scaling: configurable
- Start menu, Store, modern apps: all present on Win11
- BUT: still a virtual display (DCV IDD) — same cursor suppression likely
Complexity to migrate:
- Need to replace EC2 ASG infrastructure with WorkSpaces API calls
- Different image creation workflow (WorkSpace → Image → Bundle)
- Worker code needs adaptation (SSM for remote commands, or baked into image)
- Auto-scaling is manual API (no native CloudWatch policies like AppStream)
Verdict: Attractive for Win11 + fast resume, but likely same cursor problem (DCV underneath). Would need to test if WorkSpaces client connection activates cursor differently than raw DCV.
| Aspect | Rating |
|---|---|
| Cursor in ffmpeg video? | UNKNOWN — likely same as DCV (SUPPRESSED) without client |
| Closeness to real machine | HIGH — real Win11, full desktop |
| Complexity to implement | HIGH — rewrite fleet infra for WorkSpaces API |
| Works with existing infra? | No — different API, image model, lifecycle |
| Windows 11? | Yes (BYOL) |
How: Create a fleet of streaming instances. AppStream keeps an always-active desktop with the DCV encoder running.
Cursor: UNCLEAR — same DCV stack underneath. The encoder is always running (consuming frames), which MAY activate the cursor plane since there IS a signal consumer. Needs testing.
Key facts:
- Windows Server only (2019, 2022) — NO Windows 10/11 support
- Fleet auto-scaling is excellent (CloudWatch policies, target tracking)
- Custom images supported
- No SSH/SSM access to fleet instances — this is a dealbreaker for running automation headlessly
- Designed for user-facing streaming, not headless automation
- Elastic fleets: per-second billing, AWS-managed provisioning
- Always-On fleets: instances running 24/7, ~$0.10/hr (stream.standard.medium)
Closeness to real machine:
- Windows Server only — no Win11 ✗
- No direct access for automation ✗
- Designed for app streaming TO users, not headless capture
Verdict: Not viable. No SSH/SSM access (can't run pywinauto/ffmpeg headlessly), no Windows 11, wrong abstraction level.
| Aspect | Rating |
|---|---|
| Cursor in ffmpeg video? | UNKNOWN — probably yes if encoder activates cursor |
| Closeness to real machine | LOW — Server OS only, no direct access |
| Complexity to implement | VERY HIGH — completely different paradigm |
| Works with existing infra? | No — can't SSH in, can't run headless automation |
| Windows 11? | No |
| | EC2 + RDP Loopback | WorkSpaces Core | AppStream 2.0 |
|---|---|---|---|
| Cursor native in video | Likely (needs test) | Unlikely (DCV underneath) | Unknown |
| Windows 11 | BYOL (dedicated host) | BYOL (built-in support) | No |
| Direct SSH/SSM access | Yes | Yes (with setup) | No |
| 300 concurrent | Yes (existing ASG) | Yes (API, but manual scaling) | Yes (auto-scaling) |
| Boot time | 5-10 min | <90s (from stopped) | 1-2 min |
| Cost (per instance/hr) | $0.16-0.68 (spot/OD) | ~$0.30-0.50 | ~$0.10-0.20 |
| Custom images | AMI | Image → Bundle | Image |
| Migration effort | None (script change) | Rewrite fleet infra | Not viable |
| Closeness to real machine | Medium (Server 2025) | High (Win11 BYOL) | Low (Server only) |
CURSOR_SUPPRESSED (flags=2) was introduced in Windows 8. Documented as "system is not drawing the cursor because the user is providing input through touch or pen instead of the mouse." In practice, fires on ANY headless/virtual display where the hardware cursor plane isn't active.
The cursor is NOT part of the desktop framebuffer on modern Windows. The DWM composites it via a separate hardware overlay plane. Screen capture APIs (GDI BitBlt, DXGI Desktop Duplication, mss) return frames WITHOUT cursor. ffmpeg -draw_mouse 1 internally calls GetCursorInfo and skips drawing when CURSOR_SUPPRESSED — so it fails on headless VMs too.
GitHub: github.com/itsmikethetech/Virtual-Display-Driver (MIT license)
Creates virtual monitors via IddCx that Windows treats as real physical displays. The key differentiator: it calls IddCxMonitorSetupHardwareCursor which tells Windows to activate the hardware cursor on its virtual monitor.
- IddCx hardware cursor mode means Windows should report `CURSOR_SHOWING` for sessions targeting that display
- Supports up to 4K, HDR, ARM64, custom EDIDs
- Signed driver available (no test-signing needed)
- User-mode install via companion VDC application
- Known issue: `IddCxMonitorSetupHardwareCursor` returns `STATUS_INVALID_PARAMETER` on Windows Server 2019 (GitHub issue #304). Works on Windows 10/11.
- Headless-server use case explicitly listed
TESTED 2026-05-20 on c6i.xlarge / Windows Server 2025: VDD installs and runs (Status: Started, oem9.inf, <HardwareCursor>true</HardwareCursor>), creates a virtual monitor (DesktopMonitor2 at 800x600), BUT GetCursorInfo still returns flags=2 (CURSOR_SUPPRESSED).
The IddCx hardware cursor flag does NOT fix the CURSOR_SUPPRESSED state on Server 2025. The GitHub issue #304 (fails on Server 2019) appears to also affect Server 2025 — IddCxMonitorSetupHardwareCursor may succeed without error but the OS still doesn't activate the cursor plane for screen capture APIs.
This option is ELIMINATED for our use case.
Uses Parsec's IddCx driver for virtual displays. Explicitly does NOT support hardware cursor (marked with ✗ in comparison table). Cursor remains suppressed.
Uses DXGI Desktop Duplication's GetFramePointerShape to get cursor data, then composites via DirectX shaders. Same principle as CursorPainter but GPU-accelerated. Does not activate cursor in framebuffer — composites it in the streaming pipeline.
IDXGIOutputDuplication::GetFramePointerShape returns cursor bitmap + hotspot alongside each frame. The cursor is always provided separately (never in the captured surface). You must composite it yourself. This is how OBS, Sunshine, and most screen recorders handle cursor on headless VMs.
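"You must composite it yourself" amounts to an alpha blend at the pointer position minus the hotspot. The sketch below is a pure-Python illustration, not the OBS or Sunshine implementation: it assumes the cursor shape has already been converted to straight-alpha BGRA (the DXGI monochrome and masked-color shape types would need conversion first), and `composite_cursor` is a hypothetical helper name.

```python
from __future__ import annotations


def composite_cursor(frame: bytearray, frame_w: int, frame_h: int,
                     cursor: bytes, cur_w: int, cur_h: int,
                     pos: tuple[int, int],
                     hotspot: tuple[int, int] = (0, 0)) -> None:
    """Alpha-blend a straight-alpha BGRA cursor into a BGRA frame, in place.

    `pos` is the pointer position in frame coordinates; the bitmap's
    top-left corner lands at pos - hotspot. Pixels outside the frame
    are clipped.
    """
    left = pos[0] - hotspot[0]
    top = pos[1] - hotspot[1]
    for cy in range(cur_h):
        fy = top + cy
        if not 0 <= fy < frame_h:
            continue
        for cx in range(cur_w):
            fx = left + cx
            if not 0 <= fx < frame_w:
                continue
            ci = (cy * cur_w + cx) * 4
            fi = (fy * frame_w + fx) * 4
            alpha = cursor[ci + 3]
            for ch in range(3):  # blend B, G, R; leave frame alpha untouched
                blended = cursor[ci + ch] * alpha + frame[fi + ch] * (255 - alpha)
                frame[fi + ch] = (blended + 127) // 255


# 2x2 grey frame, 1x1 opaque cursor pixel placed at (1, 1).
frame = bytearray([10, 10, 10, 255] * 4)
composite_cursor(frame, 2, 2, bytes([200, 100, 50, 255]), 1, 1, pos=(1, 1))
```

A production path would do this on the GPU or with a vectorized library; the per-pixel loop is only meant to make the hotspot offset and clipping explicit.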
OBS specifically: checks (ci.flags & CURSOR_SHOWING) == 0 and marks cursor invisible. Treats CURSOR_SUPPRESSED same as hidden. OBS does not handle headless cursor.
-draw_mouse 1 internally uses GetCursorInfo. When flags=2, ffmpeg skips cursor drawing. Does not work headless.
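The three visibility checks discussed in this document differ only in how they treat `flags=2`. The sketch below restates them side by side as plain Python predicates so the behavior can be compared per flag value. The constants are the documented `CURSORINFO` flag values; the function names are hypothetical labels for the stock gdigrab check, the OBS-style mask check, and the patched gdigrab check described earlier (`ci.flags == 0` returns early).

```python
CURSOR_SHOWING = 0x00000001
CURSOR_SUPPRESSED = 0x00000002


def gdigrab_unpatched_draws(flags: int) -> bool:
    # Stock gdigrab: returns early unless flags is exactly CURSOR_SHOWING.
    return flags == CURSOR_SHOWING


def obs_style_draws(flags: int) -> bool:
    # OBS-style mask test: cursor is invisible when the SHOWING bit is clear.
    return (flags & CURSOR_SHOWING) != 0


def gdigrab_patched_draws(flags: int) -> bool:
    # Patched gdigrab: only flags == 0 (truly hidden) skips the draw.
    return flags != 0


NAMES = {0: "HIDDEN", CURSOR_SHOWING: "SHOWING", CURSOR_SUPPRESSED: "SUPPRESSED"}

for flags, name in NAMES.items():
    print(name,
          gdigrab_unpatched_draws(flags),
          obs_style_draws(flags),
          gdigrab_patched_draws(flags))
```

Only the patched predicate returns true for SUPPRESSED, which is the case every headless EC2 test in this document hit.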
Physical HDMI dongle simulates a monitor's EDID → GPU activates hardware cursor plane → GetCursorInfo returns CURSOR_SHOWING. Trivial solution for physical servers. Not applicable to EC2.
| Approach | Cursor in ffmpeg video natively? | Needs driver install? | Works on Server 2025? |
|---|---|---|---|
| VDD (itsmikethetech) | NO — tested, still SUPPRESSED on Server 2025 | Yes (signed driver) | ✗ Tested and failed |
| Parsec VDD | No | Yes | N/A |
| DXGI + manual composite | Yes (but requires custom capture) | No | Yes |
| Improve CursorPainter | No (composited in Python, not ffmpeg) | No | Yes |
| RDP loopback | Probably yes | No | Session model issues |
| Sunshine-style GPU composite | Yes (but huge implementation) | No | Yes |
TESTED AND ELIMINATED:
- VDD (itsmikethetech): installed, driver running, HardwareCursor=true, still CURSOR_SUPPRESSED
- NVIDIA GRID driver (g4dn.xlarge): driver installed, display active, still CURSOR_SUPPRESSED
- `ConnectedMonitor=DFP-0` registry hack: no effect
- `ShowCursor(True)`: no effect on flags
REMAINING VIABLE OPTIONS:
Priority 1: Test `GetCursor()` from UI thread. TESTED, IT WORKS!!! (2026-05-20)
Tested on i-0bda2456beb91d51e (c6i.xlarge, Windows Server 2025, auto-logon Administrator, session 1):
GetCursorInfo: flags=2 hCursor=0 pos=(400,300) ← SUPPRESSED, no handle
GetCursorPos: (400,300) ← position WORKS
GetCursor (own thread): 65543 ← VALID HANDLE!
AttachThreadInput: True ← attached to foreground thread
GetCursor (attached to fg): 65539 ← REAL CURSOR HANDLE!
LoadCursor(IDC_ARROW): 65539 ← matches arrow (desktop idle)
GetCursor() returns a valid cursor handle even when GetCursorInfo says SUPPRESSED.
When attached to the foreground window's thread via AttachThreadInput(), GetCursor() returns the real cursor that the application has set:
- Desktop/Explorer idle → IDC_ARROW (65539)
- Text field → IDC_IBEAM
- Link → IDC_HAND
- Window edge → IDC_SIZENWSE etc.
This is the solution. Combine:
1. `GetCursorPos()` → position (works even when SUPPRESSED)
2. `AttachThreadInput(myThread, foregroundThread, TRUE)` → attach to UI thread
3. `GetCursor()` → real cursor handle with correct shape
4. `DrawIconEx()` → composite into frame (existing CursorPainter code)
No infrastructure changes needed. No viewer needed. No driver changes. Pure code fix in packages/client/src/client/cursor_overlay.py.
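A minimal sketch of the first three steps, using only `ctypes` and Win32 calls named in this document. `get_cursor_state` is a hypothetical helper name, not the existing CursorPainter API, and the `DrawIconEx()` rendering step is left out. The function returns `None` off Windows so it can be imported anywhere; the `try`/`finally` is there so the thread input queues are always detached again, even if `GetCursor()` raises.

```python
import ctypes
import sys


def get_cursor_state():
    """Return (x, y, hcursor) for the foreground thread's cursor.

    Returns None when not running on Windows or when GetCursorPos fails.
    hcursor may be None/0 if no cursor handle could be obtained.
    """
    if sys.platform != "win32":
        return None
    from ctypes import wintypes

    user32 = ctypes.WinDLL("user32", use_last_error=True)
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # Declare pointer-sized return types so handles are not truncated.
    user32.GetCursor.restype = wintypes.HANDLE
    user32.GetForegroundWindow.restype = wintypes.HWND
    user32.GetWindowThreadProcessId.argtypes = [
        wintypes.HWND, ctypes.POINTER(wintypes.DWORD)]
    user32.GetWindowThreadProcessId.restype = wintypes.DWORD
    user32.AttachThreadInput.argtypes = [
        wintypes.DWORD, wintypes.DWORD, wintypes.BOOL]
    user32.AttachThreadInput.restype = wintypes.BOOL
    kernel32.GetCurrentThreadId.restype = wintypes.DWORD

    # Step 1: position (reported correctly even when SUPPRESSED).
    pt = wintypes.POINT()
    if not user32.GetCursorPos(ctypes.byref(pt)):
        return None

    # Steps 2-3: attach to the foreground UI thread and read ITS cursor.
    hcursor = None
    my_tid = kernel32.GetCurrentThreadId()
    fg = user32.GetForegroundWindow()
    fg_tid = user32.GetWindowThreadProcessId(fg, None) if fg else 0
    if fg_tid and fg_tid != my_tid and user32.AttachThreadInput(my_tid, fg_tid, True):
        try:
            hcursor = user32.GetCursor()
        finally:
            user32.AttachThreadInput(my_tid, fg_tid, False)
    if not hcursor:
        hcursor = user32.GetCursor()  # fall back to this thread's cursor
    return pt.x, pt.y, hcursor


state = get_cursor_state()
```

The caller would still fall back to `IDC_ARROW` when `hcursor` is empty, as the existing stub does.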
Priority 2: Test RDP loopback — TESTED EXTENSIVELY, CONFIRMED DOESN'T WORK (2026-05-20)
Successfully established an RDP loopback connection from within the VM using mstsc /admin /v:127.0.0.1 (pre-trusted cert + stored credentials). Session 3 (rdp-tcp#0) was created and became Active.
However: GetCursorInfo in the console session (session 1) STILL returned flags=2 (SUPPRESSED).
This definitively proves: cursor plane activation is PER-SESSION, not global. An active RDP session on the same machine does NOT activate the cursor for OTHER sessions. You would need to capture FROM the RDP session itself — meaning ffmpeg would have to run inside session 3, not session 1.
Approaches tested:
- `mstsc /admin /v:127.0.0.1` → session created, console cursor still SUPPRESSED ✗
- `mstsc /v:127.0.0.1` (new session as RdpViewer) → stuck at credential dialog ✗
- ActiveX MsTscAx COM control → failed to connect ✗
- TightVNC server running → cursor SUPPRESSED (VNC isn't Windows-native) ✗
- TightVNC viewer loopback → cursor SUPPRESSED ✗
- WTSConnectSession API → ACCESS_DENIED ✗
- LogonUser INTERACTIVE → token created but doesn't activate cursor ✗
This is the definitive answer: you cannot activate the cursor for a given session from within that same VM without being in that specific session's RDP/DCV viewer.
Remaining:
Priority 3: Daytona / physical hardware vendor
- Only option that gives TRULY native cursor (no compositing at all)
- Physical monitor = hardware cursor plane active
- Blocked on vendor access
Repo: github.com/microsoft/WindowsAgentArena
They face the EXACT same problem. From their code:
    # fixme: This is a temporary fix for the cursor not being captured on Windows and Linux

Their architecture:
- Windows 11 runs inside QEMU/KVM (via `dockur/windows` Docker image), NOT on EC2 directly
- QEMU provides its own virtual display (VGA/QXL device)
- They expose a web viewer on port 8006 (noVNC) and RDP on port 3389
- Screenshots are taken via QEMU QMP `screendump` command (captures VM framebuffer from outside the guest)
Their cursor workaround (computer.py line 23-40):
    ### mouse fix:
    # the cursor doesn't show up in screenshots otherwise
    user32 = ctypes.WinDLL('user32')
    user32.ShowCursor(True)
    user32.SendInput(1, ctypes.pointer(x), sizeof(INPUT))  # synthetic mouse move

They call `ShowCursor(True)` + send a synthetic mouse move event at module load. This is a QEMU-specific hack: in QEMU, `ShowCursor(True)` combined with mouse activity can make the guest render the cursor into the framebuffer (because QEMU's VGA device handles cursor differently than IddCx/WDDM virtual displays).
Their deprecated fallback (main.py line 309-334):
- On Windows: composite a static `cursor.png` at `pyautogui.position()`, which is exactly what our CursorPainter does
- On Linux: use `XFixesGetCursorImage` to get the real cursor shape and composite it
- On macOS: use `screencapture -C` which includes cursor natively
Key difference from us: QEMU captures the framebuffer from the hypervisor level (QMP screendump), not from within the guest OS. The ShowCursor hack works because QEMU's VGA/QXL virtual device composites the guest cursor into the framebuffer when the cursor display counter is positive.
Could we use QEMU? Theoretically yes — run Windows in QEMU inside an EC2 instance (nested virtualization). The dockur/windows Docker image does exactly this. But:
- Nested virtualization adds overhead (15-25% CPU, memory overhead for host+guest)
- EC2 metal instances needed for KVM (e.g. c5.metal/m5.metal at ~$4/hr)
- OR: EC2 instances support nested virtualization on c5/m5/r5 since 2020 (with linux-kvm on the host)
- More complex AMI/deployment (Docker + QEMU layer)
- But: real Windows 11, cursor works via QMP `screendump`, no viewer needed
CONCLUSION: On any headless EC2 instance (Server 2025, any driver), native cursor in ffmpeg is impossible without a connected viewer. The only code-level fix is improving CursorPainter to get real cursor shapes via GetCursor() + compositing them into frames (either in Python or via a custom ffmpeg filter).
Alternative architecture: Run Windows inside QEMU (like WindowsAgentArena does). QEMU's screendump captures cursor from the hypervisor level. Adds complexity but gives real Win11 + cursor + no viewer needed.
Tested on c5.metal with dockur/windows Win11, KVM enabled, full desktop session.
GetCursorInfo returned: True
flags = 0 (HIDDEN, not SUPPRESSED)
hCursor = 0
pos = (640, 360)
- `flags=0` (different from EC2's `flags=2`): cursor is HIDDEN (never initialized by the display path), not SUPPRESSED
- mss screenshot from inside guest: no cursor visible (confirmed visually)
- `ShowCursor(True)` called 5 times: flags stays 0
- SendInput mouse move: flags stays 0
The cursor visible in QEMU screendump pixel diffs and in the noVNC web viewer is NOT the Windows framebuffer cursor. It's QEMU's VGA hardware cursor overlay:
Windows kernel (win32k.sys)
→ writes cursor to VGA hardware cursor registers
→ QEMU's virtual VGA device intercepts these writes
→ QEMU sends cursor via VNC protocol to web viewer (client-side composite)
→ QEMU composites cursor into screendump output (hypervisor overlay)
→ BUT: does NOT write cursor into the guest-visible VGA framebuffer RAM
→ mss/GDI/ffmpeg inside guest read framebuffer RAM → no cursor
Their ShowCursor(True) + SendInput() does not put cursor into GDI captures. Their actual capture:
- QEMU QMP `screendump`: hypervisor overlay (NOT the guest framebuffer)
- Deprecated `cursor.png` compositing: same as our CursorPainter
- Capture via QMP `screendump` from outside the guest: this DOES include cursor (hypervisor overlay). Record video by taking rapid screendumps from the Linux host instead of ffmpeg inside Windows. Downside: QMP screendump is slow (~200ms per frame), limited to ~5fps.
- Use QEMU's `-display spice` with Spice streaming agent: Spice protocol captures full framebuffer + cursor composited. The Spice streaming agent inside the guest could potentially provide a capture with cursor. Needs investigation.
- Read VGA cursor registers from inside the guest: Windows writes cursor bitmap + position to the virtual VGA's hardware cursor registers. A custom driver or direct port I/O could read these and composite the cursor. Deep custom development.
- QEMU `-cursor show` or display options: some QEMU VGA/display options might force cursor into the framebuffer instead of using hardware cursor overlay. Needs investigation of QEMU display backend options.
- Use virtio-gpu instead of VGA: virtio-gpu may handle cursor differently. If it renders cursor into the scanout buffer instead of as a separate plane, captures inside the guest would include it.
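For the QMP `screendump` option, the host-side capture loop is small. The sketch below shows the QMP handshake (`qmp_capabilities`) followed by one `screendump` command over a Unix socket, plus the frame-rate arithmetic behind the ~5fps ceiling. It assumes QEMU was started with a QMP Unix socket (for example `-qmp unix:/run/qmp.sock,server=on,wait=off`; the path is hypothetical, and whether the `dockur/windows` container exposes one has not been checked here). `qmp_command`, `max_fps` and `screendump` are illustrative names.

```python
import json
import socket


def qmp_command(name: str, **arguments) -> bytes:
    """Encode one QMP command as a JSON line."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return (json.dumps(msg) + "\r\n").encode()


def max_fps(seconds_per_frame: float) -> float:
    """Upper bound on capture rate when each screendump blocks this long."""
    return 1.0 / seconds_per_frame


def screendump(sock_path: str, filename: str) -> dict:
    """Ask QEMU to write one screendump to `filename` (a path on the host).

    Returns the QMP reply dict, which contains either "return" or "error".
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        stream = sock.makefile("rwb")
        json.loads(stream.readline())  # greeting: {"QMP": {...}}
        stream.write(qmp_command("qmp_capabilities"))
        stream.flush()
        json.loads(stream.readline())  # {"return": {}}
        stream.write(qmp_command("screendump", filename=filename))
        stream.flush()
        while True:  # skip asynchronous event messages
            reply = json.loads(stream.readline())
            if "return" in reply or "error" in reply:
                return reply


# ~200 ms per frame, as measured above, caps capture at about 5 fps.
ceiling = max_fps(0.200)
```

A recorder built on this would call `screendump()` in a loop with a new filename per frame and assemble the frames afterwards; reusing one connection instead of reconnecting per frame is an obvious refinement.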
Tested on c5.metal + dockur/windows + VGA=cirrus environment variable:
    {
      "GetCursorInfo": true,
      "flags": 1,
      "flags_meaning": "SHOWING",
      "hCursor": 65539,
      "pos": [400, 300]
    }

`flags=1` (CURSOR_SHOWING) with a valid hCursor handle! The Cirrus VGA emulation in QEMU composites the hardware cursor directly into the display surface via `cirrus_cursor_draw_line`. This means:
- ffmpeg `gdigrab -draw_mouse 1` works natively (checks GetCursorInfo, sees SHOWING, draws cursor)
- mss captures include cursor (Cirrus renders cursor into the framebuffer)
- No compositing needed: cursor is natively in the captured frames
How to use:
    docker run -d --name win11 --device /dev/kvm \
      -e VGA=cirrus \
      -e RAM_SIZE=8G -e CPU_CORES=16 \
      -v /shared:/shared \
      dockurr/windows

HOWEVER (further testing 2026-05-22): Despite GetCursorInfo returning flags=1 (SHOWING) with Cirrus VGA, the cursor is still NOT visible in mss/GDI captures. The Cirrus `cursor_draw_line` composites into QEMU's internal DisplaySurface (what VNC/screendump reads), NOT into the guest-visible VRAM (what Windows GDI BitBlt / mss reads).
Also tested: disabling the Cirrus driver inside the guest to force "Microsoft Basic Display Adapter" (software cursor). Result: still flags=1 but cursor NOT in mss capture.
Conclusion: GetCursorInfo flags=1 is necessary but NOT sufficient. The cursor rendering always goes to QEMU's overlay, never into the guest-mapped VRAM framebuffer that capture APIs read.
Trade-off: Cirrus VGA is limited to 800x600 resolution. More importantly, even with SHOWING flags, cursor doesn't appear in guest-side captures.
There is NO way to get cursor natively into mss/GDI/ffmpeg captures on any virtualized Windows environment.
Tested exhaustively:
- `-vga cirrus` with `GetCursorInfo flags=1` (SHOWING) → cursor NOT in mss capture
- `-vga cirrus` + disabled driver (Basic Display Adapter) → cursor NOT in mss capture
- `-vga virtio` (default) → `flags=0` (HIDDEN), cursor NOT in mss capture
- EC2 with NVIDIA GPU, VDD, IddCx → `flags=2` (SUPPRESSED), cursor NOT in mss capture
The cursor is ARCHITECTURALLY in a separate layer on ALL platforms:
- On real hardware: GPU hardware cursor plane (overlay)
- On QEMU: internal DisplaySurface overlay (drawn by `cursor_draw_line`, visible to VNC/screendump but NOT guest VRAM)
- On EC2/IddCx: cursor sprite channel (visible to DCV/RDP viewers but NOT GDI framebuffer)
The only working solutions are:
- Composite in capture pipeline: `GetCursor()` + `AttachThreadInput` + `DrawIconEx` into frames (proven working, gives real shapes)
- Patch ffmpeg gdigrab: same logic inside ffmpeg's C code (~15 lines)
- Capture from OUTSIDE the guest: QEMU `screendump` (includes cursor overlay) at ~5fps max
- Windows.Graphics.Capture `IsCursorCaptureEnabled`: DWM-level composite, UNTESTED on headless
On a random VPS (Hetzner, OVH, Vultr, etc.), you connect via KVM-over-IP (IPMI/iLO/iDRAC) or VNC to QEMU. When you see the cursor — it's because you're looking at it through a viewer (the KVM console, VNC client, or web-based noVNC). That viewer IS the "connected monitor" that activates the cursor plane.
The cursor was never "in the framebuffer" on those VPS providers either. What's happening:
| What you see | What's actually happening |
|---|---|
| VPS web console with cursor | VNC/SPICE viewer connected → QEMU composites cursor into the stream sent to YOUR browser |
| VPS with VNC client | Same — VNC protocol sends cursor position + shape, viewer renders it client-side |
| Hetzner KVM console | iLO/IPMI captures video output from a real GPU via hardware capture card + overlays cursor |
The difference with AWS/Azure:
- AWS EC2 has no KVM console / IPMI / iLO access. There's no hypervisor-level VNC or SPICE endpoint exposed to customers.
- Azure similarly — you get RDP or serial console, not raw hypervisor framebuffer access.
- The hypervisor (Nitro on AWS, Hyper-V on Azure) doesn't expose a `screendump` or VNC endpoint.
- On cheap VPS providers, the hypervisor IS QEMU with VNC exposed. On AWS, it's a proprietary Nitro hypervisor with no customer-facing display port.
In other words: On VPS providers, cursor "works" because you're always looking through a viewer. If you SSH'd into that same Hetzner VPS and ran GetCursorInfo without having VNC open — you'd get CURSOR_SUPPRESSED too.
- EC2 Serial Console: text-only, no graphics
- EC2 Instance Screenshot (`aws ec2 get-console-screenshot`): captures the Nitro framebuffer! But it's a low-res JPEG, rate-limited (1/min), and intended for debugging boot issues. Does it include cursor? Unknown, worth testing.
- Bare metal instances (`.metal`): you get the full hardware, but still no IPMI/BMC access
- AWS Outposts: dedicated hardware in your datacenter, but same Nitro interface
This API captures the instance's console output as seen by the Nitro hypervisor. On Windows instances it shows the login screen or desktop. It's captured at the hypervisor level (like QEMU screendump). If Nitro composites cursor into this capture, it would prove hypervisor-level capture works. But:
- Rate limited (debugging tool, not real-time capture)
- Low resolution JPEG
- May or may not include cursor
The "KVM with cursor" experience on VPS providers is an illusion — you see cursor because your browser/client IS the viewer. The moment you try to capture programmatically FROM INSIDE the VM without a viewer connected, you hit the same CURSOR_SUPPRESSED issue everywhere.
The only true solutions remain:
- Keep a viewer connected (lightweight RDP/VNC/DCV client): simulates "you looking at the screen"
- Capture from outside the VM (QEMU `screendump` approach): requires running Windows inside QEMU on EC2
- Composite cursor ourselves (CursorPainter): works everywhere, no viewer needed
- Physical hardware: actual monitor connected = cursor always works
This investigation was triggered by a cross-team meeting where the cursor visibility problem was identified as a critical blocker. Key takeaways from the meeting:
- ~5 people have already investigated the cursor problem without finding a solution
- Team consensus: "maybe even impossible to get a cursor" on DCV — this investigation confirms it IS impossible on ANY headless EC2 instance
- The custom recorder approach (FFmpeg + Win32 APIs to stitch cursor) "deviates from real user experience" and "adds latency, complexity, and high chance of errors"
- Windows 11 licensing is a known challenge — Windows Server 2022/2025 "lacks key Windows 11 features like the proper start menu and Windows Store"
- Team explored vendor solutions: Daytona (claims Windows sandboxes, access delayed), AWS WorkSpaces (different use case — no cursor tracking), GitHub Actions (farms of computers)
- Team preference: use a vendor if one can solve the problems; too many parallel individual solutions at 80-90% of requirements
- Spin-up time: 5-10 minutes per instance, 2-3 hours overhead for 500-hour data campaigns
- Warm pool approach suggested but not yet implemented
- Cannot containerize Windows 11 — unlike Linux (thousands of containers on one host), each Windows instance needs a full VM
- WebAct has additional networking isolation requirements (VPN, no Google access)
- Multi-OS future: eventually need Windows 7, Windows 10, Mac support; once Win11 is solved, others should be straightforward
- Action items include: follow up on Win11 license via AWS License Manager, explore custom Win11 image, continue cursor investigation, evaluate warm pool, test Daytona
| Meeting question | Answer from this investigation |
|---|---|
| "Can we get a real cursor on DCV?" | No. Confirmed impossible on ANY headless EC2 — including GPU instances (tested g4dn.xlarge with NVIDIA T4) |
| "Is post-processing cursor data viable?" | Yes (CursorPainter works) but only gets arrow shape. Fix: use GetCursor() from UI thread to get real shapes |
| "Can we get real Windows 11?" | Yes via BYOL on dedicated hosts |
| "Is there a vendor solution?" | Daytona pending evaluation. No EC2-only solution exists for native cursor |
| "How do we match real user machines?" | Need either: (a) headless viewer to activate cursor, (b) improved CursorPainter with real shapes via GetCursor(), or (c) custom IDD |
Determine if the OS cursor can be captured in screenshots on Windows 11 (Server 2025) DCV instances, and identify what's needed to make it work — or whether we should migrate away from DCV.
- Instance: `i-0826c353ccf4834da` (cursor-test-maxshmi)
- OS: Windows Server 2025 Datacenter (10.0.26100), Win11 kernel
- DCV Version: 2025.0-20103
- Display Driver: AWS Indirect Display Device (IDD) v1.0.226.0
- Screen: 1920x1080 via `WinDisc` (DCV virtual display)
- DCV Session: `console` (owner: Administrator, type: console)
No. There is NO way to get the OS to render the real cursor into the framebuffer while DCV's IDD is loaded.
The cursor suppression is a kernel-level architectural feature of the IDD driver model. The IDD calls IddCxMonitorSetupHardwareCursor once at monitor path commit time, and from that point forward:
- `GetCursorInfo` returns `flags=2` (CURSOR_SUPPRESSED), `hCursor=0` for ALL processes
- No user-mode API can override this (SendInput, ShowCursor, SetCursor, registry, etc.)
- Only the IDD driver itself could release the cursor claim, and DCV's driver never does
However, there ARE ways to get cursor data through DCV itself (since DCV owns the cursor channel), and there are alternative EC2 configurations that don't have this problem.
GetCursorInfo() consistently returns:
- `flags = 2` (CURSOR_SUPPRESSED)
- `hCursor = 0` (NULL: no cursor handle available)
- `ptScreenPos = (x, y)`: position IS tracked correctly
This means: the OS knows WHERE the cursor is, but refuses to render it into any framebuffer.
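As an illustration, the three `GetCursorInfo` states seen in this investigation map onto three capture-side decisions. The sketch below is a hypothetical helper (`cursor_strategy` and its return strings are not from the codebase); only the two flag constants come from the Windows headers.

```python
CURSOR_SHOWING = 0x00000001     # winuser.h
CURSOR_SUPPRESSED = 0x00000002  # winuser.h


def cursor_strategy(flags: int, hcursor: int) -> str:
    """Decide what a capture pipeline should do for one GetCursorInfo result.

    Hypothetical helper; the return labels are illustrative only.
    """
    if flags == 0:
        # Truly hidden (e.g. an app called ShowCursor(FALSE)): respect it.
        return "skip"
    if hcursor:
        # CURSOR_SHOWING with a usable handle: draw it directly.
        return "draw_hcursor"
    # CURSOR_SUPPRESSED (or a NULL handle): position is still valid, so
    # resolve the shape some other way or fall back to IDC_ARROW.
    return "resolve_or_fallback"
```

On the DCV instance every call lands in the last branch, which is why position tracking works while the shape does not.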
| Approach | Result |
|---|---|
| Run as SYSTEM (PsExec -s -i 1) | Still suppressed |
| Run as Administrator (PsExec -i 1 -u Administrator) | Still suppressed |
| Run as standard user (testuser) | Still suppressed |
| Run via schtasks /IT (real interactive desktop) | Still suppressed |
| PsExec -i 1 starting explorer.exe (desktop renders, taskbar visible) | Still suppressed — IDD cursor claim is display-driver-level, independent of running processes |
| SendInput with MOUSEEVENTF_ABSOLUTE | Position changes, still suppressed |
| SendInput with MOUSEEVENTF_MOVE (relative) | Still suppressed |
| ShowCursor(true) | Still suppressed |
| LoadCursor(IDC_ARROW) + SetCursor() | Still suppressed |
| SetCursorPos() | Position changes, still suppressed |
| DCV registry: `pointer = software` | Already set, no effect on local capture |
| DCV registry: `enable-client-cursor = 0` | Already set, no effect on local capture |
| DCV service restart | Still suppressed after restart |
| Raw WebSocket connection to DCV port 8443 | Not recognized as viewer — still no client connected |
| Approach | Result |
|---|---|
| DrawIconEx with loaded IDC_ARROW | Successfully draws arrow cursor onto captured bitmap |
| GetCursorInfo position tracking | Position always accurate, even though hCursor=0 |
| DCV `get-screenshot --blend-cursor` | CLI flag exists! But produces blank image without active viewer |
| PsExec -i 1 starting explorer.exe | Successfully launches shell in interactive Session 1 |
DCV's Indirect Display Driver (v1.0.226.0) follows this architecture:
- Driver calls `IddCxMonitorSetupHardwareCursor` when the monitor path is committed
- This tells Windows DWM: "I own cursor rendering, don't draw it into the framebuffer"
- Windows sets `GetCursorInfo.flags = CURSOR_SUPPRESSED` for ALL user-mode processes
- Cursor updates flow through the IddCx channel via `IddCxMonitorQueryHardwareCursor`:
  - `IDARG_OUT_QUERY_HWCURSOR.IsCursorVisible` (BOOL)
  - `IDARG_OUT_QUERY_HWCURSOR.X`, `Y` (screen coordinates)
  - `IDARG_OUT_QUERY_HWCURSOR.CursorShapeInfo` (type, width, height, pitch, hotspot)
  - The actual cursor bitmap data
- Only DCV's IDD driver process can read this data — no public Windows API exposes it
- DCV transmits cursor as a separate protocol channel to connected viewers (low-latency sprite overlay)
The hardware cursor claim is made at the display-subsystem level. It is NOT per-process, per-user, or per-connection. The claim persists as long as:
- The DCV IDD driver is loaded (it is, whenever DCV service runs)
- The monitor path is active
- Nobody calls the IddCx API to release hardware cursor (only the driver itself can)
No user-mode intervention can override this. This is not a bug — it's the documented IDD architecture from Microsoft's IddCx framework.
dcv list-connections console → "There are no clients connected to the session."
From apps/eval-script/src/eval_script/interactive_launcher.py:
"DCV's virtual display only renders a framebuffer when an active viewer is connected."
Without a connected DCV viewer:
- The IDD swapchain is not presenting frames → `get-screenshot` returns blank/tiny 2834-byte images
- DCV's grabber pipeline is idle → `--blend-cursor` has nothing to blend onto
- The display is in a minimal/dormant state
- GDI CopyFromScreen can still capture some content (explorer, desktop) because DWM is running, but the DCV-specific capture pipeline is off
The eval-script solves this by connecting a headless Puppeteer browser to the DCV web client at port 8443, which activates the full streaming pipeline.
HKLM\SOFTWARE\GSettings\com\nicesoftware\dcv\display\
pointer = "software" ← tells DCV to composite cursor into stream frames
enable-client-cursor = 0 ← tells viewer NOT to draw local cursor sprite
HKLM\SOFTWARE\GSettings\com\nicesoftware\dcv\security\
authentication = "none" ← set during testing
What these settings actually mean:
- `pointer = software` → DCV composites cursor into the encoded video stream sent to viewers. This is the stream the REMOTE viewer sees. It does NOT put cursor into the LOCAL framebuffer that GDI/DXGI capture sees.
- `enable-client-cursor = 0` → Connected viewers won't draw their own cursor overlay. Combined with `pointer=software`, the cursor appears baked into the viewer's stream, but only for viewers, not local capture.
These are about the remote display protocol, NOT about local capture APIs.
DCV has a built-in CLI flag:
dcv.exe get-screenshot --blend-cursor --max-width 1920 --max-height 1080 -o screenshot.png console
This composites the cursor into the screenshot using DCV's internal cursor knowledge (from the IddCx channel). In our testing it produced 2834-byte blank images because without an active viewer, the grabber pipeline isn't running.
Expected behavior with active viewer: Full 1920x1080 screenshot WITH the real cursor shape (I-beam, hand, resize, etc.) composited in.
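A small wrapper makes the blank-image failure mode easy to detect when scripting this CLI. The sketch below only builds the argument list shown above and checks the output file size; `build_screenshot_cmd`, `looks_blank`, and the 4096-byte threshold are assumptions chosen for illustration (the only observed data point is the 2834-byte blank image).

```python
import os


def build_screenshot_cmd(out_path, session="console", max_width=1920,
                         max_height=1080, blend_cursor=True, exe="dcv.exe"):
    """Build the argv for `dcv get-screenshot` using the flags quoted above."""
    cmd = [exe, "get-screenshot"]
    if blend_cursor:
        cmd.append("--blend-cursor")
    cmd += ["--max-width", str(max_width),
            "--max-height", str(max_height),
            "-o", str(out_path), session]
    return cmd


def looks_blank(path, min_bytes=4096):
    """Heuristic: treat very small files as blank captures.

    min_bytes is an assumed threshold, not a documented DCV value.
    """
    return os.path.getsize(path) < min_bytes
```

A caller would pass the list to `subprocess.run` and retry, or connect a viewer first, when `looks_blank` returns `True`.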
mss uses GDI BitBlt(SRCCOPY | CAPTUREBLT) — the oldest Windows capture API.
From python-mss/src/mss/windows/gdi.py line 395:
`gdi.BitBlt(memdc, 0, 0, width, height, srcdc, monitor["left"], monitor["top"], SRCCOPY | CAPTUREBLT)`

PR #464 (merged) only swapped internal buffer management from CreateCompatibleBitmap + GetDIBits to CreateDIBSection (direct memory-mapped DIB). Same GDI BitBlt underneath. No cursor capture, no DXGI, no Windows.Graphics.Capture.
The CAPTUREBLT flag captures layered windows but NOT hardware cursor overlays from IDD drivers. The cursor() method in mss is literally a no-op (return).
DCV's --blend-cursor works because the DCV server process literally IS the IDD display driver — it receives cursor data directly from Windows via IddCxMonitorQueryHardwareCursor kernel callbacks:
IddCxMonitorQueryHardwareCursor → IDARG_OUT_QUERY_HWCURSOR:
- IsCursorVisible (bool)
- X, Y (screen coordinates)
- CursorShapeInfo (type, width, height, pitch, hotspot)
- Raw cursor bitmap data (the actual pixels of the cursor shape)
This data is only available to the IDD driver process — no public Windows API exposes it. That's why GetCursorInfo returns hCursor=0 for everyone else.
The DCV protocol uses protobuf over WebSocket/QUIC. Cursor is a separate sub-protocol within the input channel:
| Message | Content |
|---|---|
| `dcv.input.PointerPosition` | Cursor (x, y) coordinates, sent frequently |
| `dcv.input.PointerCursors` | Cursor shape data: currentCursorId, hidden, cursorImages[] array |
| `dcv.input.PointerInvalidateCursors` | Remove specific cursor shapes from client cache |
| `dcv.input.PointerInvalidateCursorCache` | Clear entire cursor cache |
| `dcv.input.PointerRequireCursorImages` | Client requests cursor images by ID |
Each cursorImage contains:
- `id` (uint64): cursor identifier
- `width`, `height`: dimensions
- `hotspotX`, `hotspotY`: cursor hotspot offset
- `pixelFormat`: either `NONE` (use CSS cursor URL) or `RAW_ARGB_DATA` (raw pixels)
- Raw ARGB pixel data (binary attachment after protobuf)
The cursor is NOT part of the H.264 video stream. It's transmitted separately for low-latency rendering.
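To make the message semantics concrete, here is a sketch of the client-side cursor cache these messages imply. It is an illustration only: the class and method names are hypothetical, the field names follow the list above, and the exact wire behavior of the DCV client is an assumption inferred from the message names.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class CursorImage:
    id: int
    width: int
    height: int
    hotspot_x: int
    hotspot_y: int
    pixel_format: str  # "NONE" or "RAW_ARGB_DATA"
    argb: bytes = b""  # raw pixels when pixel_format is RAW_ARGB_DATA


@dataclass
class CursorCache:
    """Hypothetical cache driven by the dcv.input.Pointer* messages."""
    images: Dict[int, CursorImage] = field(default_factory=dict)
    current_id: Optional[int] = None
    hidden: bool = False

    def on_pointer_cursors(self, current_cursor_id: int, hidden: bool,
                           cursor_images: Iterable[CursorImage]) -> None:
        # PointerCursors: store any new shapes, then switch the active one.
        for img in cursor_images:
            self.images[img.id] = img
        self.current_id = current_cursor_id
        self.hidden = hidden

    def on_invalidate_cursors(self, ids: Iterable[int]) -> None:
        # PointerInvalidateCursors: drop specific shapes.
        for cursor_id in ids:
            self.images.pop(cursor_id, None)

    def on_invalidate_cache(self) -> None:
        # PointerInvalidateCursorCache: drop everything.
        self.images.clear()

    def current(self) -> Optional[CursorImage]:
        if self.hidden or self.current_id is None:
            return None
        return self.images.get(self.current_id)

    def needs_request(self) -> bool:
        # True when a PointerRequireCursorImages request would be needed.
        return (not self.hidden and self.current_id is not None
                and self.current_id not in self.images)
```

Keeping shapes keyed by `id` means a shape change usually costs one small message rather than a bitmap transfer, which fits the low-latency goal stated above.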
The web client (dcv.js) is proprietary/minified (~1.5MB bundle). It renders cursor as an HTML <img> overlay positioned over the streaming canvas. Two modes:
- CSS cursor mode (`pixelFormat=NONE`): `cursor: url(...) hotspotX hotspotY, auto`
- Virtual cursor mode (`pixelFormat=RAW_ARGB_DATA`): Absolute-positioned `<img>` (z-index 100, `pointer-events: none`) with raw ARGB data converted to data URL, positioned at `(x - hotspotX, y - hotspotY)` over the canvas (z-index 0)
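The same hotspot rule applies when compositing a cursor sprite into a captured frame. The sketch below computes the destination rectangle and the matching offset into the sprite, clipping at the frame edges; `place_sprite` is a hypothetical helper, not part of dcv.js or CursorPainter.

```python
def place_sprite(x, y, hotspot_x, hotspot_y, sprite_w, sprite_h,
                 frame_w, frame_h):
    """Return (x0, y0, x1, y1, src_x, src_y) or None if fully off-frame.

    (x0, y0)-(x1, y1) is the clipped destination rectangle in the frame;
    (src_x, src_y) is where to start reading inside the sprite.
    """
    left = x - hotspot_x  # sprite origin so the hotspot lands on (x, y)
    top = y - hotspot_y
    x0, y0 = max(left, 0), max(top, 0)
    x1 = min(left + sprite_w, frame_w)
    y1 = min(top + sprite_h, frame_h)
    if x0 >= x1 or y0 >= y1:
        return None
    return (x0, y0, x1, y1, x0 - left, y0 - top)
```

Clipping matters at the screen edges: an arrow at the far right of a 1920x1080 frame has only one visible column, and an I-beam whose hotspot is in the middle of the sprite starts at a negative offset near the top-left corner.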
Yes, in theory. If we connect a proper DCV client (via the Web Client SDK), we receive PointerCursors messages containing the real cursor shape (ARGB pixel data + hotspot). We could:
- Connect to DCV as a viewer (full protocol handshake required)
- Intercept `PointerPosition` and `PointerCursors` protobuf messages
- Use the real cursor shape + position for compositing instead of fallback arrow
Challenges:
- The DCV Web Client SDK is not on npm — it's a proprietary download from amazondcv.com
- A raw WebSocket to port 8443 is NOT sufficient — DCV requires its full protocol handshake to register a client
- The minified `dcv.js` is readable enough to reverse-engineer the protocol structure but complex
| Repo | What it is | Useful? |
|---|---|---|
| `aws/dcv-access-console` | Session management portal (Apache-2.0) | No: session management only, NOT the viewer |
| `aws/dcv-color-primitives` | Rust pixel format conversion (ARGB/NV12/I420) | Marginally, for pixel format handling |
| `aws/dcv-gnome-shell-extension` | GNOME shell integration | No |
| `awsdocs/nice-dcv-admin-guide` | Admin documentation source | Reference only |
- explorer.exe: Not running initially. Started successfully via `PsExec -i 1 -s -d explorer.exe` (PID 1704, Session 1)
- DWM: Running in Session 1
- Auto-logon: Configured (DefaultUserName=Administrator, AutoAdminLogon=1, Shell=explorer.exe)
- Desktop issue: Despite Shell=explorer.exe in Winlogon registry, explorer wasn't running. Likely because the VM was freshly booted and the auto-logon sequence didn't complete properly without a viewer connected. The desktop may lack wallpaper because DCV's display compositor isn't fully active without a viewer.
- Mouse devices: Only PS/2 Compatible Mouse (ACPI\PNP0F13) — no HID USB mouse
- DCV service: Running, listening on port 8443 (TCP), firewall rule "NICE DCV Server (In)" enabled
- Session 0 vs Session 1: SSM commands run as SYSTEM in Session 0. Interactive desktop is Session 1. PsExec -i 1 or schtasks /IT required for desktop access.
All in .context/cursor-test/session2/:
| File | Size | Method | Notes |
|---|---|---|---|
| `standard.png` | 15KB | GDI CopyFromScreen | Desktop rendered (explorer running), NO cursor |
| `captureblt.png` | 8KB | CopyFromScreen + CaptureBlt flag | Same content, NO cursor |
| `with_crosshair.png` | 24KB | CopyFromScreen + manual red crosshair | Red cross at (960,540), proves position tracking works |
| `sendinput.png` | 7KB | After SendInput mouse events | Desktop without cursor |
| `drawicon_arrow.png` | 7KB | CopyFromScreen + DrawIconEx(IDC_ARROW) | Arrow cursor drawn at cursor position (the workaround) |
| `dcv_native.png` | 3KB | DCV get-screenshot CLI (no viewer) | Tiny/blank, display pipeline not active |
packages/client/src/client/cursor_overlay.py (CursorPainter) handles the suppressed cursor:
- Calls `GetCursorInfo()`: position is always valid even when SUPPRESSED
- When `hCursor = 0` (SUPPRESSED): loads `IDC_ARROW` as fallback
- Renders cursor icon via `DrawIconEx` onto a small DIB
- Recovers alpha channel via double-render technique (draw on black → draw on white → compute alpha)
- Alpha-blends the sprite into the captured BGRA frame
Performance: ~0.1ms per frame at 1280x720.
Limitation: Always shows arrow cursor regardless of actual shape (I-beam, hand, resize, etc.).
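The double-render alpha recovery follows from simple arithmetic. For a foreground value F with coverage α, drawing on black gives αF and drawing on white gives αF + 255(1 − α), so the difference between the two renders is 255(1 − α) and the black render is already the premultiplied color. The sketch below works on plain lists of (B, G, R) tuples to stay dependency-free; the function names are hypothetical and the real CursorPainter may vectorize this differently.

```python
def recover_premultiplied(on_black, on_white):
    """Recover (b, g, r, a) premultiplied pixels from two renders.

    on_black / on_white: equal-length lists of (b, g, r) tuples produced
    by drawing the same icon over black and over white backgrounds.
    """
    out = []
    for (bb, bg, br), (wb, wg, wr) in zip(on_black, on_white):
        # white - black = 255 * (1 - alpha); average channels to damp rounding.
        diff = ((wb - bb) + (wg - bg) + (wr - br)) / 3
        alpha = max(0, min(255, round(255 - diff)))
        out.append((bb, bg, br, alpha))
    return out


def blend_pixel(dst, src):
    """Blend one premultiplied (b, g, r, a) pixel over a (b, g, r) pixel."""
    b, g, r, a = src
    inv = 255 - a
    return tuple(min(255, c + (d * inv + 127) // 255)
                 for c, d in zip((b, g, r), dst))
```

Because the sprite is stored premultiplied, blending needs one multiply per channel, which suits a per-frame hot path.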
When hCursor = 0 (always the case on DCV), we CANNOT get the real cursor shape. The cursor might actually be:
- I-beam (text editing)
- Hand (link hover)
- Resize arrows (window edges)
- Busy spinner (loading)
- Custom app cursors
All of these appear as the generic arrow fallback in our compositing. The actual cursor shape data is locked inside the IddCx hardware cursor channel that only DCV's driver can read.
Potential workaround for shape inference (unreliable):
- `GetClassLongPtr(GetForegroundWindow(), GCLP_HCURSOR)`: get the cursor the active window's class registered (`GetClassLong` with `GCL_HCURSOR` is the 32-bit form)
- Hook `SetCursor` / `SetClassLongPtr(GCLP_HCURSOR)` in target processes
- None of these are reliable
DCV internally knows the real cursor (I-beam, hand, resize, etc.) via IddCxMonitorQueryHardwareCursor. The --blend-cursor flag composites it into screenshots.
Requirements:
- Must have an active DCV viewer connected (headless Puppeteer, per eval-script pattern)
- Call `dcv get-screenshot --blend-cursor -o <path> console` per frame
Pros:
- Gets the REAL cursor shape (not just arrow)
- No fleet migration needed
- DCV already installed on all fleet VMs
Cons:
- CLI overhead per frame unknown — may be too slow for 2fps
- Requires headless viewer connection (additional complexity in session setup)
- Images go through DCV's pipeline (format conversion, potential quality loss)
- Doesn't solve the issue for mss/ffmpeg-based capture — only for DCV CLI captures
Current status: Blocked — needs active viewer connection to produce non-blank images
Use a g4dn / g5 instance with NVIDIA GRID/vGPU. The NVIDIA virtual display driver does NOT use IddCx hardware cursor — it renders cursor into the framebuffer like a physical display.
Expected behavior:
- `GetCursorInfo` returns `flags=1` (CURSOR_SHOWING), `hCursor=<valid handle>`
- mss/GDI/DXGI capture includes the real cursor natively
- No workaround needed at all
Pros:
- Cursor capture "just works" — correct shape, zero overhead
- Real GPU available (useful for other things)
- Standard display driver behavior
Cons:
- GPU instances are more expensive ($0.52/hr for g4dn.xlarge vs ~$0.19/hr for t3.xlarge)
- Lose DCV's easy remote access (need RDP or VNC instead)
- Need to rebuild AMI with NVIDIA drivers + license GRID
- Fleet terraform changes required
- RDP has its own cursor quirks (though generally better than DCV)
Alternative non-DCV options:
- Parsec or Sunshine (open-source game streaming) as display server
- XRDP + xorgxrdp (less common; note that xrdp is an RDP server for Linux/Unix hosts, so it does not apply to a Windows guest)
- No remote protocol at all — run headless with NVIDIA virtual display, capture locally, no viewer needed
Connect a minimal DCV protocol client that subscribes to dcv.input.PointerCursors messages. Get real cursor ARGB bitmap + hotspot. Feed to CursorPainter.
Pros:
- Gets real cursor shape
- Lower overhead than CLI per-frame (persistent connection, message-based)
Cons:
- Requires reverse-engineering DCV protocol or using proprietary SDK
- Complex implementation
- Fragile — may break on DCV version updates
- Still needs viewer connection to activate cursor streaming
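The modern WinRT capture API has an explicit cursor inclusion property, `GraphicsCaptureSession.IsCursorCaptureEnabled`. Monitor capture needs Win10 1903+, and the cursor property itself was added in Win10 2004. It works through the DWM compositor via `GraphicsCaptureItem.CreateForMonitor()` + `Direct3D11CaptureFramePool`.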
Why it might work on DCV:
- Operates at the DWM compositor layer, not raw framebuffer reads
- DWM knows about the cursor even on IDD displays
- Apps like OBS use this API for cursor capture
Why it might NOT work on DCV:
- IDD hardware cursor means DWM doesn't composite cursor either — it's been delegated
- `CURSOR_SUPPRESSED` may apply to this API too
- Microsoft docs are ambiguous about IDD behavior with this API
Requirements to test:
- Write a C# program using `Windows.Graphics.Capture` with `IsCursorCaptureEnabled = true`
- Compile and run on the DCV instance
- Check if cursor appears in captured frames
Cons:
- Requires C# code (or pythonnet/winsdk WinRT bindings)
- Unconfirmed whether it works on IDD displays — needs testing
- If it doesn't work, effort is wasted
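DXGI Desktop Duplication (`IDXGIOutputDuplication`) provides cursor data separately from the frame pixels. Position and visibility arrive in `DXGI_OUTDUPL_FRAME_INFO.PointerPosition` from `AcquireNextFrame()`, and the shape comes from `GetFramePointerShape()`:
- Together these give cursor position, shape bitmap, hotspot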
- Works independently of GDI BitBlt
Why it might work:
- Even on IDD, Desktop Duplication may provide cursor through its own API
- The `DXGI_OUTDUPL_POINTER_SHAPE_INFO` struct is populated separately from the frame
Why it might NOT work:
- IDD monitors may not support Desktop Duplication at all
- AWS documentation for DCV states: "GetConsoleScreenshot functionality will not work as expected" with IDD
Requirements to test:
- Write a C++ program that calls `DuplicateOutput` on the DCV virtual display
- Check if `GetFramePointerShape` returns cursor data
For the experiments/synthetic-data-collection fleet specifically:
The fleet currently uses DCV Windows Server 2025 instances (synth-explorer-windows11-golden-v2* AMI). The TREC SDK records sessions, and cursor data is critical for training.
Short-term (lowest effort): Test Option 1 — connect Puppeteer viewer + dcv get-screenshot --blend-cursor. If performance is acceptable at 2fps, this gives real cursor shapes with minimal fleet changes.
Medium-term (if Option 1 is too slow or unreliable): Test Option 4 — Windows.Graphics.Capture. Write a small C# capture helper. If the WinRT API can see cursor on IDD, this is the cleanest solution (no viewer needed, no DCV CLI overhead).
Long-term (if cursor fidelity is critical for model quality): Option 2 — migrate to non-DCV. Real display driver, cursor just works. But this is a significant infrastructure change.
Session 0 (non-interactive):
├── SYSTEM processes
├── dcvserver (DCV Server)
├── dcvagent
└── SSM Agent (where our commands run)
Session 1 (interactive DCV desktop):
├── dwm.exe (Desktop Window Manager)
├── explorer.exe (must be started manually or via auto-logon)
├── dcvagent (x2, handles viewer I/O)
└── [capture processes must run HERE for desktop access]
How to run in Session 1:
- PsExec -i 1 [-s|-u user] program.exe
- schtasks /IT /RL HIGHEST (interactive task)
- EC2Launch executeScript with interactive flag
How DCV screenshot works:
SSM → dcv.exe get-screenshot → talks to dcvserver (Session 0)
→ dcvserver reads its own IDD swapchain + IddCx cursor
→ composites if --blend-cursor → writes PNG
From packages/client/src/client/cursor_overlay.py:
# CURSOR_SUPPRESSED (DCV / RDP / idle): position is valid but
# hCursor is NULL. Fall back to the system arrow so the model
# always sees *where* the user is pointing.
h = self._user32.LoadCursorW(None, ctypes.cast(_IDC_ARROW, wintypes.LPCWSTR))

From apps/eval-script/src/eval_script/run_remote.py:
- `start_dcv_viewer()`: connects headless Puppeteer to DCV web client
- Activates the framebuffer so ffmpeg gdigrab/ddagrab can capture
- Cursor is still not in the framebuffer — eval-script relies on the CursorPainter workaround
From experiments/synthetic-data-collection/vm/launch.py:
- Uses `schtasks /IT` to run in interactive Session 1
- TREC SDK records the session (including cursor via the CursorPainter mechanism)
- Cursor shape is always arrow fallback
| DCV Version | Driver | Cursor Behavior |
|---|---|---|
| Pre-2023.1 (Server 2016) | "NICE DCV Virtual Display Driver" (WDDM mirror) | Same suppression — hardware cursor mode |
| 2023.1+ (Server 2019+) | Built-in IDD (Indirect Display Driver) | Same suppression via IddCx |
| 2025.0 (current, Server 2025) | AWS Indirect Display Device v1.0.226.0 | Confirmed CURSOR_SUPPRESSED |
The underlying cursor behavior has been the same across all DCV versions. The driver technology changed (WDDM mirror → IDD) but the "cursor as separate sprite" architecture is constant.
The main goal is to produce training data that matches what a real user's machine looks like at inference time.
On a real user's Windows machine (laptop/desktop):
- Physical display adapter (Intel/AMD/NVIDIA) with standard WDDM driver
- Cursor is rendered into the framebuffer by the display driver
- `GetCursorInfo` returns `flags=1` (CURSOR_SHOWING) with a valid `hCursor`
- Cursor has the correct shape (I-beam, hand, resize, spinner, custom app cursors)
- Screen capture (mss, DXGI, Windows.Graphics.Capture) includes cursor natively
- No virtual display, no remote protocol layer
- Full Windows 11 desktop with animations, transparency, shadows, rounded corners
- Standard DPI (96-192 dpi typical), standard color profile (sRGB)
- Physical mouse generating real HID input events
- Standard font rendering (ClearType, DirectWrite)
- Standard window chrome (Mica, acrylic materials on Win11)
| Aspect | Real User Machine | DCV EC2 Instance | Impact on Model |
|---|---|---|---|
| Cursor in framebuffer | Yes — native rendering | No — CURSOR_SUPPRESSED, hCursor=0 | Model never sees real cursor shapes during training |
| Cursor shape | Correct (I-beam, hand, resize, busy, custom) | Always arrow fallback | Model can't learn cursor-shape-to-context associations |
| Cursor rendering | Anti-aliased by display driver, sub-pixel positioned | Composited post-capture by CursorPainter (pixel-aligned, no sub-pixel) | Subtle visual difference in cursor edges |
| Display driver | Physical WDDM (Intel/AMD/NVIDIA) | AWS Indirect Display Device (IDD) v1.0.226.0 | Different rendering characteristics |
| Font rendering | ClearType with sub-pixel AA, tuned for physical LCD | ClearType may behave differently on virtual display (no physical sub-pixel layout) | Text may look slightly different |
| DPI / Scaling | Varies (100%-250% typical on modern laptops) | Fixed 100% (1920x1080 at 96 dpi) | Model only sees 100% scaling; never learns to handle scaled UIs |
| Color profile | sRGB / Display P3, varies by panel | No color management on virtual display | Colors may be slightly off |
| Window animations | Smooth (minimize/maximize/snap animations) | Often disabled or glitchy on Server | Model doesn't learn to handle animation frames |
| Desktop composition | Full DWM with Mica, acrylic, shadows, rounded corners | DWM runs but some effects may be reduced on Server | Subtly different visual appearance |
| Wallpaper | User-selected, varies wildly | Stock Windows img0.jpg (scripted via prelaunch) | Less visual diversity in training data |
| Taskbar | Win11 centered taskbar with app icons | May not render properly without viewer connected | Different or missing taskbar appearance |
| Mouse input | Physical HID USB/Bluetooth mouse | PS/2 Compatible Mouse (ACPI\PNP0F13) emulated | Different input device characteristics |
| OS Edition | Windows 11 Home/Pro (consumer) | Windows Server 2025 Datacenter (server) | Different default policies, features, visual theme |
| Explorer shell | Starts at logon automatically | Requires PsExec -i 1 workaround to start | Fragile desktop environment |
| Display activation | Always on when user is using it | Dormant without DCV viewer connection | Capture may get blank frames if viewer disconnects |
| Network latency | No remote protocol overhead | DCV protocol adds indirection to any viewer-dependent behavior | N/A for local capture, but affects any viewer-dependent setup |
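For the DPI / Scaling row, the relationship between DPI and the scaling percentage is fixed: Windows treats 96 DPI as 100%. The sketch below shows the conversion and the resulting logical desktop size; the helper names are hypothetical and only illustrate why a fleet pinned at 96 DPI never produces scaled layouts.

```python
BASE_DPI = 96  # Windows defines 96 DPI as 100% scaling


def scale_percent(dpi: int) -> int:
    """Convert an effective DPI into the Settings-style scaling percentage."""
    return round(dpi * 100 / BASE_DPI)


def logical_size(width_px: int, height_px: int, dpi: int):
    """Size of the desktop in logical (96-DPI) units at the given DPI."""
    return (width_px * BASE_DPI // dpi, height_px * BASE_DPI // dpi)
```

At 150% scaling a 1920x1080 panel exposes the layout space of a 1280x720 desktop, so UI elements occupy noticeably more pixels than anything the 100% fleet captures.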
Critical (directly affects what the model learns to see):
1. Cursor shape: model should learn that I-beam means text field, hand means link, etc.
2. DPI scaling: real laptops often use 125-150% scaling; 100% is minority
3. Desktop composition effects: rounded corners, shadows affect element boundaries

Medium (noticeable but less impactful):
4. Font rendering differences
5. Window animations during transitions
6. Color profile variations

Low (unlikely to affect model performance):
7. Wallpaper variety
8. Input device type
9. OS edition differences (mostly invisible in UI)
Replace DCV with a GPU instance type + NVIDIA GRID driver (or equivalent). On a g4dn / g5 / g6 instance, the NVIDIA virtual display driver:
- Renders cursor into the framebuffer (standard WDDM behavior)
- `GetCursorInfo` returns `flags=1` (CURSOR_SHOWING) with valid `hCursor`
- mss/GDI/DXGI capture includes cursor natively
- No CursorPainter workaround needed
For remote access, replace DCV with RDP (built into Windows) or Parsec/Sunshine (open-source, game-quality streaming).
The fleet is defined in experiments/synthetic-data-collection/:
┌─────────────────────────────────────────────────────────────────┐
│ CURRENT ARCHITECTURE │
├─────────────────────────────────────────────────────────────────┤
│ Instance type: c6i.4xlarge (16 vCPU, 32 GB, NO GPU) │
│ OS: Windows Server 2025 Datacenter (Win11 kernel) │
│ Display: DCV IDD virtual display (WinDisc) │
│ Remote access: DCV web viewer (port 8443) │
│ Fleet size: up to 300 (MAX_FLEET_SIZE in terraform) │
│ Pricing: 100% on-demand (~$0.68/hr c6i.4xlarge) │
│ AMI: synth-explorer-windows11-golden-v8b-* │
│ Region: us-west-2 │
└─────────────────────────────────────────────────────────────────┘
Boot sequence:
1. ASG launches instance from Golden AMI
2. EC2Launch v2 runs UserData (`vm/userdata-fleet-launch.ps1`)
3. UserData waits for auto-logon (Administrator, Session 1)
4. If auto-logon didn't fire in 3min → reboot
5. AtLogOn scheduled task fires → `start-worker-prelaunch.ps1`:
   - Sets wallpaper (img0.jpg) via SystemParametersInfo
   - Sets DCV display layout (`dcv.exe set-display-layout --session console 1920x1080`)
   - Starts explorer.exe via PsExec -i 1 (workaround for Server 2025)
6. Worker loop starts (`python -m synthetic_data_collection.vm.worker`)
7. Worker long-polls SQS for (task_file, seed) messages
8. Per message: spawns `session_entrypoint` subprocess (pywinauto + TREC SDK)
9. Session records via TREC SDK (ffmpeg gdigrab/ddagrab capture)
10. CursorPainter composites arrow cursor (because DCV suppresses real cursor)
11. Upload bundle to S3 → delete SQS message
12. Idle >5min → `shutdown /s` → ASG observes stop
Key DCV-specific steps (would be removed):
- Step 5: `dcv.exe set-display-layout` activates the virtual display
- Step 5: `PsExec -i 1 explorer.exe` starts the shell (wouldn't be needed if display is real)
- Step 10: CursorPainter arrow fallback, which wouldn't be needed with real cursor
Key files:
terraform/asg.tf ← instance_type = "c6i.4xlarge", AMI lookup, MixedInstancesPolicy
vm/bootstrap_2025.ps1 ← Step 9 installs DCV, Step 13 writes prelaunch (dcv set-display-layout)
vm/check_ami_prereqs.ps1 ← Checks 7-8: dcvserver service + console session
vm/build_ami.sh ← Orchestrates bootstrap on c6i.4xlarge
vm/userdata-fleet-launch.ps1 ← Auto-logon detection + conditional reboot
vm/launch_vm.sh ← Single-VM launcher (instance_type=c6i.4xlarge)
src/.../vm/launch.py ← Python launcher, calls dcv set-display-layout
src/.../vm/remote_windows.py ← SSH/SCP over SSM (unchanged)
src/.../vm/worker.py ← SQS worker loop (unchanged)
src/.../vm/session_entrypoint.py ← Session driver (unchanged)
┌─────────────────────────────────────────────────────────────────┐
│ TARGET ARCHITECTURE │
├─────────────────────────────────────────────────────────────────┤
│ Instance type: g4dn.xlarge (4 vCPU, 16 GB, 1x T4 GPU) │
│ OR g6.xlarge (4 vCPU, 16 GB, 1x L4 GPU) │
│ OS: Windows Server 2025 Datacenter (Win11 kernel) │
│ Display: NVIDIA GRID/vGPU virtual display (WDDM) │
│ Remote access: SSM only (no DCV, no RDP needed for fleet) │
│ Fleet size: same (up to 300) │
│ Pricing: Spot (~$0.16-0.25/hr g4dn.xlarge) │
│ OR On-demand (~$0.52/hr g4dn.xlarge) │
│ AMI: synth-explorer-windows11-gpu-v1-* │
│ Region: us-west-2 (or multi-region for spot capacity) │
└─────────────────────────────────────────────────────────────────┘
Why real Windows 11 (not Server 2025):
The current fleet uses Windows Server 2025 Datacenter which shares the Windows 11 24H2 kernel but is NOT Windows 11:
- Different visual theme (no rounded corners on some controls, no Mica material by default)
- Different default apps (no Microsoft Store apps, no modern Notepad/Paint/Calculator)
- Different shell behavior (explorer doesn't auto-start reliably, different taskbar)
- Different Group Policy defaults (animations often disabled, visual effects reduced)
- Different UWP/WinUI app availability (Server SKU can't install Store apps via Add-AppxPackage under SYSTEM)
- No "Windows 11 Home/Pro" specific features (Snap Layouts may differ, Widgets absent)
For training data that matches real user machines, we want actual Windows 11 Pro.
How to run real Windows 11 on EC2:
- BYOL (Bring Your Own License) — import a Windows 11 Pro image as a custom AMI
  - Create a Windows 11 Pro VM locally (Hyper-V, VMware, or VirtualBox)
  - Sysprep it (`C:\Windows\System32\Sysprep\sysprep.exe /generalize /oobe /shutdown`)
  - Export as VMDK/VHD
  - Use `aws ec2 import-image` to create an AMI
  - Must have valid Windows 11 Pro license (Volume Licensing or per-device)
- AWS-provided Windows 11 AMIs — AWS does NOT provide Windows 11 desktop AMIs natively (only Server). BYOL is the only path.
- Amazon WorkSpaces Image Builder — can create Windows 11 images, but tied to WorkSpaces (not raw EC2)
BYOL licensing:
- Windows 11 Pro BYOL on EC2 requires a dedicated host or dedicated instance (Microsoft licensing requirement)
- Dedicated hosts: `g4dn.xlarge` dedicated host pricing is higher but eliminates per-instance license cost
- Alternative: volume licensing agreement (Microsoft 365 E3/E5 includes Windows 11 Enterprise BYOL rights for dedicated hosts)
- For an experimental fleet: start with Server 2025 + UI tweaks, validate the GPU approach, THEN migrate to Win11 BYOL when ready for production training data
Pragmatic approach (phased):
- Phase 1: Keep Server 2025, switch to GPU instance → fixes cursor immediately
- Phase 2: Build Windows 11 Pro BYOL AMI on dedicated g4dn host → fixes all visual/UX mismatches
- Phase 3: Full fleet on Win11 Pro BYOL → production-quality training data
Why GPU instances solve the problem:
- NVIDIA GRID driver creates a standard WDDM display output on boot
- Display renders frames WITHOUT needing any viewer connected
- `GetCursorInfo` returns `flags=1` (CURSOR_SHOWING) with valid `hCursor`
- mss/GDI/ffmpeg capture includes the real cursor natively
- Explorer starts normally via auto-logon (no PsExec workaround needed)
- The display driver behaves like a real user's machine
Why spot works for GPU (vs current on-demand for c6i):
- The c6i spot issue was 12-15min lifetimes (shorter than 9min boot + 5min session)
- g4dn.xlarge spot in us-west-2 typically has 2-6hr lifetimes
- GPU spot is more stable because there's less contention from batch/CI workloads
- Even if interrupted: worker already handles task re-queuing via SQS visibility timeout
- Total cost with spot: cheaper than current c6i on-demand ($0.16 vs $0.68/hr)
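The lifetime argument above reduces to simple arithmetic. The sketch below uses the 9min boot and 5min session figures from the text; the function name is illustrative, and it assumes sessions run back to back with no other overhead.

```python
def sessions_per_lifetime(lifetime_min: float,
                          boot_min: float = 9.0,
                          session_min: float = 5.0) -> int:
    """Whole sessions that fit in one spot lifetime after boot overhead."""
    usable = lifetime_min - boot_min
    if usable <= 0:
        return 0
    return int(usable // session_min)


for label, lifetime in [
    ("c6i spot, 12 min", 12),
    ("c6i spot, 15 min", 15),
    ("g4dn spot, 2 hr", 120),
    ("g4dn spot, 6 hr", 360),
]:
    print(f"{label}: {sessions_per_lifetime(lifetime)} session(s)")
```

Under these assumptions a 12-15min lifetime yields at most one session per boot, while a 2-6hr lifetime amortizes the boot over dozens of sessions.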
Boot sequence changes (delta from current):
- `dcv.exe set-display-layout` → removed. NVIDIA driver auto-creates 1920x1080 display.
- `PsExec -i 1 explorer.exe` → likely unnecessary (auto-logon + real display = explorer starts normally). Keep as fallback.
- CursorPainter arrow fallback → still present in code but the `hCursor != 0` path fires instead (gets real shape).
- New: Set display resolution via `Set-DisplayResolution` PowerShell cmdlet (or `ChangeDisplaySettingsEx` Win32) in prelaunch.
- New: NVIDIA GRID driver install during AMI bake (from AWS S3 bucket, included in g4dn instance cost).
| File | What it does with DCV | Migration action |
|---|---|---|
| `experiments/synthetic-data-collection/vm/bootstrap_2025.ps1` (lines 453-502) | Installs DCV server MSI from CloudFront, creates console session | Remove DCV install. Install NVIDIA GRID driver instead. RDP is built-in. |
| `experiments/synthetic-data-collection/vm/check_ami_prereqs.ps1` (lines 118-130) | Verifies dcvserver service + console session exist | Replace with NVIDIA driver + RDP service checks |
| `experiments/synthetic-data-collection/vm/build_ami.sh` | Orchestrates bootstrap (which installs DCV) | Update to use GPU AMI base + NVIDIA driver |
| `experiments/synthetic-data-collection/vm/launch.py` (lines 259-269) | Calls `dcv.exe set-display-layout --session console 1920x1080` to attach DXGI framebuffer | Remove. NVIDIA driver creates display output on boot. Set resolution via `Set-DisplayResolution` or `ChangeDisplaySettings` API. |
| `apps/eval-script/src/eval_script/run_remote.py` (lines 569-588) | `_set_dcv_resolution()` calls `dcv.exe set-display-layout` | Replace with PowerShell `Set-DisplayResolution -Width 1920 -Height 1080` or Win32 `ChangeDisplaySettingsEx` |
| `apps/eval-script/src/eval_script/run_remote.py` (lines 786-958) | `start_dcv_viewer()`: Puppeteer connects to DCV web client to activate framebuffer | Remove entirely. NVIDIA driver renders framebuffer without needing a viewer. |
| `apps/eval-script/src/eval_script/run_eval.py` (lines 63-90, 1262-1277) | Imports/calls `start_dcv_viewer`, `stop_dcv_viewer`, `--dcv-resolution` arg | Remove DCV viewer logic. Keep resolution setting (via new method). Remove `--no-dcv-viewer` flag. |
| `apps/eval-script/src/eval_script/run_os_eval.py` (lines 381-382, 526-542) | Same as `run_eval.py`: DCV viewer + resolution | Same changes as `run_eval.py` |
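
The `Set-DisplayResolution` replacement named in the table could be built along these lines. This is a sketch under assumptions: `build_set_resolution_command` is a hypothetical helper name, and the strict `WIDTHxHEIGHT` validation is an addition so that a malformed resolution string never reaches the remote shell.

```python
import re

# Accept only digits on both sides of a single "x", e.g. "1920x1080".
_RESOLUTION_RE = re.compile(r"(\d{3,5})x(\d{3,5})")


def build_set_resolution_command(resolution: str = "1920x1080") -> str:
    """Return the PowerShell command line that sets the display resolution."""
    match = _RESOLUTION_RE.fullmatch(resolution)
    if match is None:
        raise ValueError(f"expected WIDTHxHEIGHT, got {resolution!r}")
    width, height = int(match.group(1)), int(match.group(2))
    return (
        "powershell -NoProfile -Command "
        f'"Set-DisplayResolution -Width {width} -Height {height} -Force"'
    )


print(build_set_resolution_command("1920x1080"))
```

Parsing to integers before formatting keeps the command string free of anything other than the two numbers.
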
| File | What it does with DCV | Migration action |
|---|---|---|
| `packages/client/src/client/cursor_overlay.py` | Workaround for DCV's CURSOR_SUPPRESSED: loads IDC_ARROW fallback when hCursor=0 | Keep as-is (still handles RDP edge cases). Or simplify: with NVIDIA driver, GetCursorInfo returns valid hCursor, so the SUPPRESSED fallback path rarely fires. |
| `packages/client/src/client/capture.py` (lines 264-268) | Documents why cursor compositing is needed on DCV/RDP | Update comment. CursorPainter still adds value for RDP sessions. |
| `apps/eval-script/src/eval_script/bake_cursor.py` | Post-hoc cursor overlay for recorded videos | May become unnecessary if cursor is captured natively. Keep for backward compat with old recordings. |
| `apps/eval-script/src/eval_script/interactive_launcher.py` (lines 11-13) | Documents DCV framebuffer activation dependency | Update comment to reflect new architecture. |
| Change | Details |
|---|---|
| Instance type | t3.xlarge → g4dn.xlarge (or g5.xlarge for newer GPU). Cost: ~$0.52/hr (g4dn.xlarge on-demand) vs ~$0.19/hr (t3.xlarge) |
| AMI base | Switch from Windows Server 2025 base to AWS-provided Windows + NVIDIA GRID AMI |
| Security group | Port 8443 (DCV) no longer needed. RDP port 3389 already allowed (or use SSM only). |
| IAM | Remove DCV license S3 access (if any). No other changes. |
| ASG / fleet | No structural changes — same ASG, same SQS, same worker model |
Goal: Fix cursor capture immediately, keep Server 2025 OS.
vm/bootstrap_gpu.ps1 (fork of bootstrap_2025.ps1):
-# Step 9: NICE DCV server
-# Install DCV MSI from CloudFront...
-$dcvUrl = 'https://d1uj6qtbmh3dt5.cloudfront.net/nice-dcv-server-x64-Release.msi'
-# ... (lines 453-502 removed)
+# Step 9: NVIDIA GRID driver
+# g4dn instances include GRID license in instance cost.
+# Driver is available from AWS S3 bucket:
+$nvidiaUrl = "s3://ec2-windows-nvidia-drivers/latest/NVIDIA_grid_win10_win11_server2025_64bit.exe"
+aws s3 cp $nvidiaUrl "$env:TEMP\nvidia_grid.exe" --region us-east-1
+Start-Process "$env:TEMP\nvidia_grid.exe" -ArgumentList '/s', '/noreboot' -Wait
+# Verify display adapter is now NVIDIA
+$gpu = Get-WmiObject Win32_VideoController | Where-Object { $_.Name -match 'NVIDIA' }
+if (-not $gpu) { Fail 'NVIDIA driver install completed but no NVIDIA adapter found' }
vm/start-worker-prelaunch.ps1 changes:
-# Set DCV display layout to 1920x1080
-& 'C:\Program Files\NICE\DCV\Server\bin\dcv.exe' set-display-layout --session console 1920x1080 2>&1 | Out-Null
+# Set display resolution (NVIDIA GRID creates output on boot, just need resolution)
+# Using ChangeDisplaySettingsEx for reliability on headless boot
+Add-Type @"
+using System;
+using System.Runtime.InteropServices;
+public struct DEVMODE {
+ [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32)] public string dmDeviceName;
+ public short dmSpecVersion, dmDriverVersion;
+ public short dmSize, dmDriverExtra;
+ public int dmFields, dmPositionX, dmPositionY, dmDisplayOrientation, dmDisplayFixedOutput;
+ public short dmColor, dmDuplex, dmYResolution, dmTTOption, dmCollate;
+ [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32)] public string dmFormName;
+ public short dmLogPixels;
+ public int dmBitsPerPel;  // DWORD in the Win32 DEVMODE definition
+ public int dmPelsWidth, dmPelsHeight, dmDisplayFlags, dmDisplayFrequency;
+ public int dmICMMethod, dmICMIntent, dmMediaType, dmDitherType;
+ public int dmReserved1, dmReserved2, dmPanningWidth, dmPanningHeight;
+}
+public class Display {
+ [DllImport("user32.dll")] public static extern int ChangeDisplaySettingsEx(
+ string lpszDeviceName, ref DEVMODE lpDevMode, IntPtr hwnd, int dwflags, IntPtr lParam);
+ public const int CDS_UPDATEREGISTRY = 0x01;
+ public const int DM_PELSWIDTH = 0x80000;
+ public const int DM_PELSHEIGHT = 0x100000;
+}
+"@
+$dm = New-Object DEVMODE
+$dm.dmSize = [System.Runtime.InteropServices.Marshal]::SizeOf($dm)
+$dm.dmPelsWidth = 1920
+$dm.dmPelsHeight = 1080
+$dm.dmFields = [Display]::DM_PELSWIDTH -bor [Display]::DM_PELSHEIGHT
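+# [NullString]::Value passes a true NULL device name (PowerShell converts $null to "" for string parameters)
+[Display]::ChangeDisplaySettingsEx([NullString]::Value, [ref]$dm, [IntPtr]::Zero, [Display]::CDS_UPDATEREGISTRY, [IntPtr]::Zero)
vm/check_ami_prereqs.ps1 changes: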
-# Check 7: dcvserver service running
-$dcv = Get-Service -Name 'dcvserver' -ErrorAction SilentlyContinue
-if (-not $dcv -or $dcv.Status -ne 'Running') { ... }
-# Check 8: DCV console session
-$sessions = & $dcvExe list-sessions 2>&1
-if ($sessions -notmatch 'console') { ... }
+# Check 7: NVIDIA display adapter present
+$gpu = Get-WmiObject Win32_VideoController | Where-Object { $_.Name -match 'NVIDIA' }
+if (-not $gpu) { Fail 'No NVIDIA display adapter found' }
+# Check 8: Display resolution is set
+Add-Type -AssemblyName System.Windows.Forms  # Screen type is not loaded by default
+$screen = [System.Windows.Forms.Screen]::PrimaryScreen
+if ($screen.Bounds.Width -lt 1920) { Fail "Display too small: $($screen.Bounds.Width)x$($screen.Bounds.Height)" }
vm/build_ami.sh changes:
-INSTANCE_TYPE="c6i.4xlarge"
+INSTANCE_TYPE="g4dn.xlarge"
# Base AMI: Windows Server 2025 Full Base (latest)
+# IMPORTANT: Use an AMI from us-east-1 that supports g4dn (must be EBS-backed, x86_64)
terraform/asg.tf changes:
resource "aws_launch_template" "fleet_worker" {
name_prefix = "${local.project}-fleet-worker-"
image_id = data.aws_ami.synth_explorer_v2.id
- instance_type = "c6i.4xlarge"
+ instance_type = "g4dn.xlarge"
...
}
mixed_instances_policy {
- # 100% on-demand by default.
+ # Spot-first for g4dn. GPU spot has longer lifetimes (2-6hr typical)
+ # vs c6i spot (12-15min). Total cost: ~$0.16-0.25/hr vs $0.68/hr on-demand c6i.
instances_distribution {
on_demand_base_capacity = 0
- on_demand_percentage_above_base_capacity = 100
+ on_demand_percentage_above_base_capacity = 20 # 20% on-demand fallback
spot_allocation_strategy = "capacity-optimized-prioritized"
}
launch_template {
...
override {
- instance_type = "c6i.4xlarge"
+ instance_type = "g4dn.xlarge"
}
override {
- instance_type = "c6a.4xlarge"
+ instance_type = "g4dn.2xlarge" # fallback: more memory
}
override {
- instance_type = "m6i.4xlarge"
+ instance_type = "g5.xlarge" # fallback: newer GPU (A10G)
}
}
}
apps/eval-script changes:
# run_remote.py
-def start_dcv_viewer(dcv_url, timeout=15, username="Administrator", password=None):
- """Start a DCV viewer connection via Puppeteer..."""
- # ... 170 lines of Puppeteer/Xvfb/Chrome setup ...
+# REMOVED: start_dcv_viewer / stop_dcv_viewer
+# GPU instances render framebuffer without a viewer.
-def _set_dcv_resolution(host, resolution="1920x1080", ...):
- """Set DCV display layout."""
- cmd = f'"C:\\Program Files\\NICE\\DCV\\Server\\bin\\dcv.exe" set-display-layout --session console {resolution}'
+def _set_display_resolution(host, resolution="1920x1080", ...):
+ """Set display resolution via PowerShell."""
+ w, h = resolution.split("x")
+ cmd = f'powershell -Command "Set-DisplayResolution -Width {w} -Height {h} -Force"'
- Bake new AMI: `synth-explorer-windows11-gpu-v1`
- Launch single test instance
- Run cursor test script: verify `GetCursorInfo` returns `flags=1`, valid `hCursor`
- Run full synthetic-data-collection session: verify TREC recording includes real cursor with correct shapes
- Compare screenshots with current DCV screenshots: confirm visual fidelity improvement
- Benchmark: verify session completion time is similar (GPU shouldn't be slower for UI automation)
- `terraform apply` with new launch template
- Submit small test campaign (10 tasks) on new fleet
- Verify bundles in S3 have real cursor shapes
- Scale to full campaigns
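The cursor test step in the list above could look roughly like this. It is a sketch, not the project's actual test script: `classify_cursor` and `read_cursor_state` are hypothetical names, and the structure layout follows the Win32 `CURSORINFO` definition. The Windows call is guarded so the classification logic can be exercised anywhere.

```python
import ctypes
import sys

CURSOR_SHOWING = 0x00000001
CURSOR_SUPPRESSED = 0x00000002


class POINT(ctypes.Structure):
    _fields_ = [("x", ctypes.c_long), ("y", ctypes.c_long)]


class CURSORINFO(ctypes.Structure):
    _fields_ = [
        ("cbSize", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
        ("hCursor", ctypes.c_void_p),
        ("ptScreenPos", POINT),
    ]


def classify_cursor(flags: int, h_cursor) -> str:
    """Map raw CURSORINFO fields to a label the validation step can assert on."""
    if flags == CURSOR_SHOWING and h_cursor:
        return "showing"      # expected on the GPU instance
    if flags == CURSOR_SUPPRESSED:
        return "suppressed"   # what the DCV fleet reports today
    if flags == 0:
        return "hidden"
    return "unknown"


def read_cursor_state() -> str:
    """Query the live cursor state. Windows only."""
    if sys.platform != "win32":
        raise OSError("GetCursorInfo is only available on Windows")
    info = CURSORINFO()
    info.cbSize = ctypes.sizeof(CURSORINFO)
    if not ctypes.windll.user32.GetCursorInfo(ctypes.byref(info)):
        raise ctypes.WinError()
    return classify_cursor(info.flags, info.hCursor)
```

On the test instance the check would pass when `read_cursor_state()` returns `"showing"`.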
Goal: Replace Server 2025 with real Windows 11 Pro for maximum fidelity.
Steps:
- Create Windows 11 Pro VM locally (Hyper-V)
  - Install Win11 Pro 24H2 from ISO
  - Install Office, Chrome, classic Win32 apps
  - Install NVIDIA GRID driver (compatible version)
  - Configure auto-logon, OpenSSH, disable sleep
- Sysprep: `sysprep.exe /generalize /oobe /shutdown /mode:vm`
- Export VHDX → VHD (fixed size)
- Upload VHD to S3
- `aws ec2 import-image --disk-containers file://containers.json`
  - Wait for import task (~30-60min for 50GB image)
- Launch on dedicated host (`g4dn.host` in us-west-2, ~$5.02/hr)
  - Microsoft BYOL requires dedicated hosts/instances
  - A single `g4dn.metal` dedicated host can run ~4× g4dn.xlarge VMs
  - Cost: $5.02/hr ÷ 4 = ~$1.26/hr per VM equivalent
  - OR use dedicated instances (more expensive per-VM but simpler)
- Modify `bootstrap_gpu.ps1` for Win11 differences:
  - Modern Notepad/Paint/Calculator available natively (no Win32 fallback)
  - Explorer starts normally on logon (no PsExec workaround)
  - Microsoft Store apps work
  - Standard Win11 visual theme (Mica, rounded corners)
- Update AMI filter in terraform to `synth-explorer-win11pro-gpu-v1-*`
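The `containers.json` file passed to `aws ec2 import-image` in the steps above is a JSON list of disk containers. A minimal sketch for generating it follows; the bucket name, object key and description are placeholders, not values from this project.

```python
import json


def build_disk_containers(bucket: str, key: str,
                          description: str = "Windows 11 Pro image",
                          disk_format: str = "vhd") -> list:
    """Build the --disk-containers payload for a single uploaded disk."""
    if not bucket or not key:
        raise ValueError("bucket and key are required")
    return [
        {
            "Description": description,
            "Format": disk_format,
            "UserBucket": {"S3Bucket": bucket, "S3Key": key},
        }
    ]


# Placeholder bucket and key: substitute the real upload location.
containers = build_disk_containers("example-import-bucket", "images/win11pro.vhd")
containers_json = json.dumps(containers, indent=2)
print(containers_json)
```

Writing `containers_json` to `containers.json` produces the file the import command expects.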
Win11 BYOL licensing options:
| License | Cost | How |
|---|---|---|
| Windows 11 Pro (retail) | ~$200 one-time per image | Buy once, use on dedicated hosts |
| Microsoft 365 E3/E5 | ~$36/user/month | Includes Win11 Enterprise VDA rights |
| Windows 11 Enterprise E3 (per-device) | ~$7/device/month | VDA rights for virtual desktops |
| SA (Software Assurance) | Varies | Existing EA/MPSA may already include rights |
For an experimental fleet of 300 VMs on dedicated hosts, the most cost-effective option is a single Win11 Pro retail license + dedicated host infrastructure.
Alternative: Windows 11 on non-dedicated (cheaper, grey area):
- AWS Marketplace has "Windows 11" AMIs from 3rd-party vendors (e.g., "Windows 11 with NVIDIA Gaming")
- These bundle the license in the hourly price (~$0.10-0.20/hr premium)
- Simpler than BYOL but limited to specific configurations
- Check Marketplace for g4dn-compatible Win11 AMIs
| | Current (DCV on t3.xlarge) | GPU (g4dn.xlarge) | GPU Spot |
|---|---|---|---|
| Per-instance $/hr | ~$0.19 | ~$0.52 | ~$0.16 |
| 300-instance fleet $/hr | ~$57 | ~$156 | ~$48 |
| 300-instance fleet $/day (8hr) | ~$456 | ~$1,248 | ~$384 |
Spot pricing (~$0.16/hr) makes GPU instances cheaper than the current on-demand DCV fleet (~$0.19/hr). g4dn.xlarge spot availability is generally good.
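
The fleet figures in the cost table follow directly from the per-instance rates. The sketch below reproduces them, using the 300-instance fleet and 8hr day from the table; the rates are the approximate values quoted there, not live pricing.

```python
FLEET_SIZE = 300
HOURS_PER_DAY = 8

# Approximate per-instance hourly rates quoted in the cost table.
RATES = {
    "Current (DCV on t3.xlarge)": 0.19,
    "GPU (g4dn.xlarge)": 0.52,
    "GPU Spot": 0.16,
}


def fleet_cost(rate_per_hr: float, instances: int = FLEET_SIZE,
               hours: int = HOURS_PER_DAY) -> tuple:
    """Return (fleet $/hr, fleet $/day), rounded to cents."""
    per_hour = rate_per_hr * instances
    return round(per_hour, 2), round(per_hour * hours, 2)


for name, rate in RATES.items():
    per_hour, per_day = fleet_cost(rate)
    print(f"{name}: ${per_hour:,.0f}/hr, ${per_day:,.0f}/day")
```
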
If the cost increase is unacceptable for the full fleet, consider a hybrid approach:
- Keep DCV fleet for bulk data collection (arrow cursor is acceptable for most training)
- Use GPU instances only for sessions where cursor shape matters (eval, QA, specific training subsets)
- The CursorPainter fallback is already good enough for position — just not shape
| Risk | Mitigation |
|---|---|
| NVIDIA driver licensing cost on g4dn | g4dn includes GRID vWS license in instance cost (no additional charge) |
| RDP cursor has its own quirks | RDP cursor is in-framebuffer by default; only suppressed in specific redirect scenarios |
| No remote access without DCV web client | Use SSM Session Manager for CLI access (already works). Use Fleet Manager or RDP for GUI. |
| g4dn spot interruption during recording | Use spot instance interruption handling in worker (save state, re-queue task) |
| Display not activating without logon | GPU instances auto-create display output. Auto-logon (already configured) ensures desktop is rendered. |
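
The spot interruption mitigation in the table could be implemented by polling the EC2 instance metadata service between sessions. This is a sketch under assumptions: the function names are illustrative, and how the worker saves state and re-queues the task is left out. The metadata endpoint returns 404 until an interruption notice has been issued.

```python
import json
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"


def fetch_instance_action(timeout: float = 1.0):
    """Return the spot/instance-action JSON body, or None if no notice exists."""
    # IMDSv2: obtain a short-lived session token first.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as exc:
        if exc.code == 404:  # no interruption scheduled
            return None
        raise


def interruption_pending(fetch=fetch_instance_action) -> bool:
    """True when the instance has received a spot interruption notice."""
    body = fetch()
    if body is None:
        return False
    return json.loads(body).get("action") in {"terminate", "stop", "hibernate"}
```

A worker that sees `interruption_pending()` return True before starting a session can skip the message, leaving it to reappear after the SQS visibility timeout.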