Tapster Valet Overview
@hugs · May 6, 2025

Valet Vision Architecture Overview

Valet Vision is a flexible, open-source automation platform built on the Raspberry Pi. It enables precise control and observation of mobile devices through a combination of computer vision, hardware control, and network-accessible APIs.


📐 Architecture

At its core, Valet Vision runs an HTTP server on a Raspberry Pi. This server exposes a simple JSON-over-HTTP API to:

  • Access the camera feed
  • Simulate virtual mouse, keyboard, and stylus inputs
  • Capture screenshots and stream live MJPEG video

All requests can be made locally or over the network. The API is platform-agnostic, allowing automation scripts to run:

  • Locally on the Valet itself
  • On another machine within the same network
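
As a sketch of that flexibility, the same request works from the Pi itself or from any other machine; only the host address changes. The hostname and port below are illustrative, and the tap endpoint is the one shown later in this document:

export HOST=http://valet.local:8080   # illustrative address of the Pi

# Works locally on the Valet (HOST=http://localhost:8080) or remotely
curl -X POST "$HOST/api/touch/tap" \
  -H "Content-Type: application/json" \
  -d '{"x": 100, "y": 200}'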

🗺️ Architecture diagram: valetnet.dev/overview


🔌 USB Gadget Protocol

Valet Vision uses Linux’s USB Gadget protocol to emulate USB peripherals to a connected mobile device. This includes:

  • Virtual touch stylus
  • USB keyboard and mouse
  • Optional Ethernet gadget mode to share or restrict the Pi’s network access with the mobile device

This allows both input simulation and network configuration of the attached phone or tablet, all via a single USB connection.
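
As a rough illustration of the mechanism, Linux exposes gadget configuration through configfs. The sketch below shows the general shape of defining a composite HID + Ethernet gadget; the names, IDs, and descriptor details are illustrative (the HID report descriptor is omitted for brevity), and Valet Vision's actual setup may differ:

modprobe libcomposite
cd /sys/kernel/config/usb_gadget
mkdir -p valet && cd valet
echo 0x1d6b > idVendor                 # example vendor ID (Linux Foundation)
echo 0x0104 > idProduct                # example product ID (composite gadget)
mkdir -p strings/0x409
echo "Valet Vision" > strings/0x409/product

mkdir -p functions/hid.usb0            # HID function: keyboard/mouse/stylus
echo 8 > functions/hid.usb0/report_length
# (a real setup also writes functions/hid.usb0/report_desc)

mkdir -p functions/ecm.usb0            # optional Ethernet gadget (ECM)

mkdir -p configs/c.1
ln -s functions/hid.usb0 configs/c.1/
ln -s functions/ecm.usb0 configs/c.1/
ls /sys/class/udc > UDC                # bind to the USB device controller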


👁️ Vision + AI Capabilities

  • Screenshots can be fetched as image/jpeg or image/png
  • A live video stream is available as an MJPEG feed over HTTP
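
Both are plain HTTP, so standard tools can consume them. The endpoint paths below are assumptions for illustration, not confirmed routes:

curl -s "$HOST/api/screenshot" -H "Accept: image/png" -o screen.png   # one frame
mpv "$HOST/api/stream.mjpeg"   # live MJPEG; any MJPEG-capable player works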

Valet Vision includes:

  • OpenCV for computer vision and object detection
  • Tesseract OCR for text recognition

These tools allow automation scripts to detect and act on visual UI elements in screenshots.
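
As a small sketch of the OCR half (Tesseract ships a command-line tool; the OpenCV side would typically live in a script), a fetched screenshot can be checked for a UI label. The screenshot path and label text are just examples:

# OCR the screenshot saved above and look for a button label
tesseract screen.png stdout | grep -i "sign in"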

For more demanding AI workloads, there are further options, such as offloading inference to a more powerful machine on the same network.

Importantly, Valet Vision operates fully offline by default—it does not require or depend on any cloud services.


📲 Push Button Module (PBM)

For full-device automation, Valet Vision can control hardware side buttons (e.g., power, volume up/down) via an optional Push Button Module (PBM):

  • PBM actuators are digitally controlled servos
  • Connected via Dynamixel Protocol 2.0 over serial from the Pi
  • Placement is flexible—PBM arms can be positioned on either side of the device

Support for PBM control via HTTP API is on the roadmap.

🔗 More on the Dynamixel Protocol: emanual.robotis.com


🧪 Developer Tools & Open Source

The software that powers Valet Vision is fully open source.

Example: Simulate a Tap

curl -X POST "$HOST/api/touch/tap" \
  -H "Content-Type: application/json" \
  -d '{"x": 0, "y": 0}'

🧩 Modular, Local-First Automation

Valet Vision is designed to be adaptable:

  • Fully autonomous (all logic and inference on-device)
  • Or controlled remotely from any machine on the same network
  • No mandatory cloud infrastructure

This makes it ideal for labs, local QA environments, and regulated industries where network control and data locality matter.
