Valet Vision is a flexible, open-source automation platform built on the Raspberry Pi. It enables precise control and observation of mobile devices through a combination of computer vision, hardware control, and network-accessible APIs.
At its core, Valet Vision runs an HTTP server on a Raspberry Pi. This server exposes a simple JSON-over-HTTP API to:
- Access the camera feed
- Simulate virtual mouse, keyboard, and stylus inputs
- Capture screenshots and stream live MJPEG video
All requests can be made locally or over the network. The API is platform-agnostic, allowing automation scripts to run:
- Locally on the Valet itself
- On another machine within the same network
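Because the API is plain JSON over HTTP, a client is just a POST with a JSON body. A minimal sketch using only the Python standard library, based on the `/api/touch/tap` endpoint shown below (the helper name and `valet.local` hostname are illustrative, not part of the project):

```python
import json
import urllib.request

def tap_request(host: str, x: int, y: int) -> urllib.request.Request:
    """Build a POST request for the /api/touch/tap endpoint."""
    body = json.dumps({"x": x, "y": y}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/api/touch/tap",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # The same code works whether it runs on the Valet itself
    # (host = "http://localhost") or on another machine on the LAN.
    req = tap_request("http://valet.local", x=100, y=200)
    # urllib.request.urlopen(req)  # uncomment when a Valet is reachable
```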
🗺️ Architecture diagram: valetnet.dev/overview
Valet Vision uses Linux’s USB gadget framework to emulate USB peripherals to a connected mobile device. This includes:
- Virtual touch stylus
- USB keyboard and mouse
- Optional Ethernet gadget mode to share the Pi’s network connection with the mobile device (or restrict it)
This allows both input simulation and network configuration of the attached phone or tablet, all via a single USB connection.
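Once a HID function is configured through the kernel’s gadget framework, each input event is just a fixed-size report written to a `/dev/hidgN` device node. The sketch below packs a 3-byte boot-protocol relative mouse report; the report layout and device path are assumptions about a typical gadget setup, not taken from the Valet Vision source:

```python
import struct

def mouse_report(buttons: int, dx: int, dy: int) -> bytes:
    """Pack a 3-byte boot-protocol mouse report:
    byte 0 = button bitmask, bytes 1-2 = signed X/Y deltas
    (positive = right/down, negative = left/up)."""
    return struct.pack("Bbb", buttons & 0x07, dx, dy)

if __name__ == "__main__":
    # Move the pointer 10 px right and 5 px up with button 1 held.
    report = mouse_report(buttons=0x01, dx=10, dy=-5)
    # with open("/dev/hidg0", "wb") as hid:  # gadget device node (assumed path)
    #     hid.write(report)
```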
- Screenshots can be fetched as `image/jpeg` or `image/png`
- A live video stream is available as an MJPEG feed over HTTP
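One way to pick between the two screenshot encodings is an `Accept` header on the request. A standard-library sketch (the `/api/screenshot` path is an assumed endpoint name, not confirmed by this page):

```python
import urllib.request

def screenshot_request(host: str, fmt: str = "png") -> urllib.request.Request:
    """Build a GET request for a screenshot, selecting the encoding
    via the Accept header. /api/screenshot is an assumed path."""
    if fmt not in ("png", "jpeg"):
        raise ValueError("fmt must be 'png' or 'jpeg'")
    return urllib.request.Request(
        f"{host}/api/screenshot",
        headers={"Accept": f"image/{fmt}"},
    )

if __name__ == "__main__":
    req = screenshot_request("http://valet.local", fmt="jpeg")
    # data = urllib.request.urlopen(req).read()
    # open("screen.jpg", "wb").write(data)
```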
Valet Vision includes:
- OpenCV for computer vision and object detection
- Tesseract OCR for text recognition
These tools allow automation scripts to detect and act on visual UI elements in screenshots.
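The detect-then-act idea can be illustrated with a tiny pure-Python exact template match; in a real script, OpenCV’s `cv2.matchTemplate` plus `cv2.minMaxLoc` do the same job efficiently and with tolerance for noise:

```python
def find_template(screen, patch):
    """Return (row, col) of the first exact occurrence of the 2-D
    grayscale `patch` inside `screen`, or None if absent.
    This is the brute-force version of what cv2.matchTemplate does."""
    ph, pw = len(patch), len(patch[0])
    sh, sw = len(screen), len(screen[0])
    for r in range(sh - ph + 1):
        for c in range(sw - pw + 1):
            if all(
                screen[r + i][c + j] == patch[i][j]
                for i in range(ph)
                for j in range(pw)
            ):
                return (r, c)
    return None
```

A typical automation loop would fetch a screenshot, locate a button template this way, then POST a tap at the matched coordinates.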
For more demanding AI workloads, you have options:
- Add the Raspberry Pi AI Kit (a Hailo-based neural accelerator)
- Run ML inference remotely on a more powerful machine
- Use hosted services to offload heavy image processing
Importantly, Valet Vision operates fully offline by default—it does not require or depend on any cloud services.
For full-device automation, Valet Vision can control hardware side buttons (e.g., power, volume up/down) via an optional Push Button Module (PBM):
- PBM actuators are digitally controlled servos
- Connected via Dynamixel Protocol 2.0 over serial from the Pi
- Placement is flexible—PBM arms can be positioned on either side of the device
Support for PBM control via HTTP API is on the roadmap.
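For a sense of what travels over the serial line, here is a sketch of Protocol 2.0 instruction-packet framing as documented in the ROBOTIS e-manual (header, ID, length, instruction, parameters, CRC-16). The servo ID and the PING instruction are illustrative; in practice ROBOTIS’s `dynamixel_sdk` Python package handles this framing:

```python
def dxl_crc(data: bytes) -> int:
    """CRC-16 (polynomial 0x8005, init 0) used by Dynamixel Protocol 2.0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8005) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def dxl_packet(servo_id: int, instruction: int, params: bytes = b"") -> bytes:
    """Frame a Protocol 2.0 instruction packet:
    header (FF FF FD 00) | ID | length (little-endian) | instruction | params | CRC."""
    length = len(params) + 3  # instruction byte + 2 CRC bytes
    body = bytes([0xFF, 0xFF, 0xFD, 0x00, servo_id,
                  length & 0xFF, length >> 8, instruction]) + params
    crc = dxl_crc(body)
    return body + bytes([crc & 0xFF, crc >> 8])

# PING (instruction 0x01) to servo ID 1, e.g. one PBM arm:
ping = dxl_packet(1, 0x01)
```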
🔗 More on the Dynamixel Protocol: emanual.robotis.com
The software that powers Valet Vision is fully open source:
- Core Server: checkbox-server on GitHub
- Python Client: checkbox-client-python
For example, tapping coordinate (0, 0) from the command line:

```shell
curl -X POST $HOST/api/touch/tap \
  -H "Content-Type: application/json" \
  -d '{"x": 0, "y": 0}'
```
Valet Vision is designed to be adaptable:
- Fully autonomous (all logic and inference on-device)
- Or controlled remotely from any machine on the same network
- No mandatory cloud infrastructure
This makes it ideal for labs, local QA environments, and regulated industries where network control and data locality matter.