Goal: Make just-gemini usable as an LLM backend by any OpenAI-compatible client (including justbot's future subagent system).
Approach: Add standard OpenAI-compatible POST /v1/chat/completions (streaming SSE) and GET /v1/models endpoints on top of the existing session-based architecture. This is the most widely supported LLM API format — any client that speaks OpenAI can use just-gemini without custom integration code.
Target OpenAI format:
- `POST /v1/chat/completions` with `{ model, messages, stream: true, max_tokens? }`
- SSE response: `data: {"choices":[{"delta":{"content":"..."}}]}\n\n` chunks
- Terminal: `data: [DONE]\n\n`
- `GET /v1/models` returning `{ data: [{ id, object: "model" }] }`
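The wire format above could be produced by two small helpers. This is a minimal sketch, not the project's implementation: `deltaChunk` and `sseFrame` are hypothetical names, and the envelope fields (`id`, `object`, `created`, `model`) follow the usual OpenAI chunk shape.

```typescript
// One streaming chunk in the OpenAI chat.completion.chunk shape.
interface ChatChunk {
  id: string;
  object: "chat.completion.chunk";
  created: number; // Unix timestamp in seconds
  model: string;
  choices: Array<{
    index: number;
    delta: { content?: string };
    finish_reason: string | null;
  }>;
}

// Hypothetical helper: wraps one piece of generated text in a chunk.
function deltaChunk(id: string, model: string, created: number, text: string): ChatChunk {
  return {
    id,
    object: "chat.completion.chunk",
    created,
    model,
    choices: [{ index: 0, delta: { content: text }, finish_reason: null }],
  };
}

// Hypothetical helper: serializes a chunk (or the terminal marker) as one
// SSE frame. Each frame is a single `data:` line followed by a blank line.
function sseFrame(payload: ChatChunk | "[DONE]"): string {
  const data = typeof payload === "string" ? payload : JSON.stringify(payload);
  return `data: ${data}\n\n`;
}
```

A handler would write `sseFrame(deltaChunk(...))` once per piece of text and `sseFrame("[DONE]")` at the end of the stream.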
POST /v1/chat/completions
Request (standard OpenAI format):
{
"model": "gemini",
"messages": [
{"role": "system", "content": "You are helpful."},
{"role": "user", "content": "List files in src/"}
],
"stream": true,
"max_tokens": 8192
}

Model string mapping: The `model` field maps to a just-gemini provider. Format options:
- `"gemini"` → provider=gemini, model=default
- `"codex"` → provider=codex, model=default
- `"gemini/gemini-2.5-pro"` → provider=gemini, model=gemini-2.5-pro
Implementation flow:
- Parse request body, extract `model` → provider name + optional model
- Extract the last user message content from `messages[]` as the prompt
- Determine `cwd`: from `X-Working-Directory` header, or fallback to `COMPLETIONS_CWD` env var, or `process.cwd()`
- Create an ephemeral session: `manager.create({ provider, cwd, model })`
- Subscribe to session events: `manager.on("event", handler)`
- Send the prompt: `manager.sendMessage(sessionId, content)`
- If `stream: true` (the primary path):
  - Set headers: `Content-Type: text/event-stream`, `Cache-Control: no-cache`
  - On `text-delta` event → write OpenAI SSE chunk: `data: {"id":"chatcmpl-<id>","object":"chat.completion.chunk","created":<ts>,"model":"<model>","choices":[{"index":0,"delta":{"content":"<text>"},"finish_reason":null}]}`
  - On `run-ended` → write final chunk with `finish_reason` + `data: [DONE]`
  - Cleanup: `manager.off("event", handler)`, `manager.delete(sessionId)`
- If `stream: false`:
  - Accumulate all `text-delta` content into a string
  - On `run-ended` → return full OpenAI response: `{ "id": "chatcmpl-<id>", "object": "chat.completion", "created": 1709900000, "model": "gemini", "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}] }`
  - Cleanup session
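The model-string parsing and `cwd` resolution steps of the flow are pure functions and can be sketched in isolation. `parseModelString` and `resolveCwd` are hypothetical names. The sketch assumes header names arrive lower-cased and single-valued, and takes the environment and fallback directory as arguments so it does not depend on `process`.

```typescript
interface ModelTarget {
  provider: string;
  model?: string; // undefined means "use the provider's default model"
}

// Hypothetical helper: "gemini" -> { provider: "gemini" },
// "gemini/gemini-2.5-pro" -> { provider: "gemini", model: "gemini-2.5-pro" }.
function parseModelString(raw: string): ModelTarget {
  const value = raw.trim();
  const slash = value.indexOf("/");
  if (slash === -1) {
    return { provider: value };
  }
  // Split on the first slash only, so a model name may itself contain "/".
  const provider = value.slice(0, slash);
  const model = value.slice(slash + 1);
  return model.length > 0 ? { provider, model } : { provider };
}

// Hypothetical helper: header first, then the COMPLETIONS_CWD variable,
// then the supplied fallback (process.cwd() in the plan above).
function resolveCwd(
  headers: Record<string, string | undefined>,
  env: Record<string, string | undefined>,
  fallback: string,
): string {
  return headers["x-working-directory"] || env["COMPLETIONS_CWD"] || fallback;
}
```

In a handler this might be called as `resolveCwd(req.headers, process.env, process.cwd())`, with the exact request type depending on the HTTP framework in use.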
finish_reason mapping from StopReason:
| StopReason | finish_reason |
|---|---|
| `end_turn` | `"stop"` |
| `max_tokens` | `"length"` |
| `cancelled` | `"stop"` |
| `error` | `"stop"` (error details in separate handling) |
Tool call events: Ignored for v1. The CLI agent executes tools internally. These are NOT surfaced as OpenAI tool_calls in the response (justbot would try to execute them, which is wrong). Optionally, tool activity can be injected as inline text in the content stream (configurable).
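The finish_reason mapping and the decision to ignore tool events can be combined in a single event classifier. The sketch below is illustrative only: the event names `text-delta` and `run-ended` and the StopReason values come from this plan, but the field names `type`, `text`, and `stopReason` on the event object are assumptions about its shape, and `classifyEvent` is a hypothetical name.

```typescript
type FinishReason = "stop" | "length";

// Assumed event shape; only the event names are taken from the plan.
interface SessionEvent {
  type: string;
  text?: string;
  stopReason?: string;
}

type Action =
  | { kind: "content"; text: string }
  | { kind: "finish"; finishReason: FinishReason }
  | { kind: "ignore" };

// StopReason -> finish_reason, as in the table above.
const FINISH_REASONS = new Map<string, FinishReason>([
  ["end_turn", "stop"],
  ["max_tokens", "length"],
  ["cancelled", "stop"],
  ["error", "stop"],
]);

function classifyEvent(event: SessionEvent): Action {
  switch (event.type) {
    case "text-delta":
      return { kind: "content", text: event.text ?? "" };
    case "run-ended":
      // Unknown stop reasons fall back to "stop" (an assumption, not in the table).
      return {
        kind: "finish",
        finishReason: FINISH_REASONS.get(event.stopReason ?? "") ?? "stop",
      };
    default:
      // Tool call events and anything else are dropped for v1.
      return { kind: "ignore" };
  }
}
```

Both the streaming and non-streaming branches could share this classifier and differ only in what they do with a `content` or `finish` action.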
Error handling:
- Unknown provider → `400 {"error": {"message": "Unknown provider: foo", "type": "invalid_request_error"}}`
- Session creation failure → `500 {"error": {"message": "...", "type": "server_error"}}`
- `run-ended` with `stopReason: "error"` → stream the error event message as text, then close
- Client disconnect → cleanup handler, interrupt session, delete session
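The two error bodies above share one envelope, so they could be built by a pair of small constructors. This is a sketch under assumptions: the helper names are hypothetical, and using the underlying error's message for the 500 case is one possible choice for the elided `"..."`.

```typescript
type ErrorType = "invalid_request_error" | "server_error";

interface ErrorResponse {
  status: number;
  body: { error: { message: string; type: ErrorType } };
}

// Hypothetical helper for the unknown-provider case (HTTP 400).
function unknownProviderError(provider: string): ErrorResponse {
  return {
    status: 400,
    body: {
      error: { message: `Unknown provider: ${provider}`, type: "invalid_request_error" },
    },
  };
}

// Hypothetical helper for session creation failures (HTTP 500).
function sessionCreationError(cause: unknown): ErrorResponse {
  const message = cause instanceof Error ? cause.message : String(cause);
  return { status: 500, body: { error: { message, type: "server_error" } } };
}
```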
GET /v1/models
Response (standard OpenAI format):
{
"object": "list",
"data": [
{"id": "gemini", "object": "model", "created": 0, "owned_by": "just-gemini"},
{"id": "codex", "object": "model", "created": 0, "owned_by": "just-gemini"}
]
}

Lists registered providers as "models". Simple pass-through from `manager.getProviders()`.
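The pass-through could look like the sketch below. It assumes `manager.getProviders()` returns an array of provider name strings, which this plan does not confirm, and `toModelList` is a hypothetical name.

```typescript
interface ModelEntry {
  id: string;
  object: "model";
  created: number;
  owned_by: string;
}

interface ModelList {
  object: "list";
  data: ModelEntry[];
}

// Hypothetical helper: maps provider names to OpenAI-style model entries.
// Assumes the provider list is a plain array of names.
function toModelList(providers: string[]): ModelList {
  return {
    object: "list",
    data: providers.map((id) => ({
      id,
      object: "model" as const,
      created: 0,
      owned_by: "just-gemini",
    })),
  };
}
```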
Add the completions and models routers:
import { completionsRouter } from "./completions.js";
import { modelsRouter } from "./models.js";
router.use("/v1/chat", completionsRouter(manager));
router.use("/v1", modelsRouter(manager));

Add a new `docs/openai-compat.md` documenting:
- `POST /v1/chat/completions` request/response format
- `GET /v1/models` response format
- Model string mapping (`gemini`, `codex`, `gemini/model-name`)
- `X-Working-Directory` header
- Known limitations (no multi-turn history, no tool_calls forwarding)
- justbot integration example config
Update docs/README.md endpoint table with new endpoints.
Ephemeral sessions per request: Each completions request creates a fresh session and destroys it after. This means:
- No conversation history preserved between requests (known limitation)
- Clean slate each time (correct for stateless OpenAI API semantics)
- Subprocess startup overhead per request (acceptable for v1)
Why not session reuse: The OpenAI chat completions API is stateless — each request carries full message history. Maintaining persistent sessions would cause the CLI agent to accumulate duplicate context. Ephemeral is correct.
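Because each request owns its session, teardown can be triggered from more than one place (normal `run-ended`, an error, or a client disconnect) and should run only once. One way to guarantee that is an idempotent cleanup closure. This is a sketch: `SessionManagerLike` is a hypothetical, narrowed view of the manager containing only the `off` and `delete` calls named in this plan, and both are assumed to be synchronous.

```typescript
// Narrowed, hypothetical view of the session manager.
interface SessionManagerLike {
  off(event: "event", handler: (event: unknown) => void): void;
  delete(sessionId: string): void;
}

// Returns a function that unsubscribes and deletes the session at most once,
// no matter how many times it is called.
function makeCleanup(
  manager: SessionManagerLike,
  sessionId: string,
  handler: (event: unknown) => void,
): () => void {
  let done = false;
  return () => {
    if (done) return;
    done = true;
    manager.off("event", handler);
    manager.delete(sessionId);
  };
}
```

The same closure can then be registered for the run-ended path and for the connection-close path without risking a double delete.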
Only send the last user message: The CLI agent starts fresh each request. Sending the full message history as a concatenated prompt would be confusing to the agent. Just send the latest user message. If the user needs multi-turn context, they should use the session-based API directly.
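Selecting the prompt amounts to a reverse scan of `messages[]`. The sketch below assumes `content` is always a plain string; OpenAI clients may also send an array of content parts, which this version does not handle. `lastUserMessage` is a hypothetical name.

```typescript
interface ChatMessage {
  role: string;
  content: string; // assumption: string content only
}

// Returns the content of the most recent "user" message, or undefined
// when the request contains no user message at all.
function lastUserMessage(messages: ChatMessage[]): string | undefined {
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    if (message !== undefined && message.role === "user") {
      return message.content;
    }
  }
  return undefined;
}
```

An `undefined` result is a natural candidate for a 400 `invalid_request_error`, although the plan does not specify that case.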
Authorization header: Accepted but not validated. just-gemini has no auth layer. The header is silently ignored so OpenAI-compat clients (which always send it) work without errors.
| File | Action | Description |
|---|---|---|
| `src/http/completions.ts` | create | POST /v1/chat/completions endpoint |
| `src/http/models.ts` | create | GET /v1/models endpoint |
| `src/http/router.ts` | modify | Mount new routers |
| `docs/openai-compat.md` | create | Document new endpoints |
| `docs/README.md` | modify | Add new endpoints to table |
- No multi-turn context: Each request is a fresh CLI subprocess session. The agent doesn't see prior conversation.
- No tool_calls forwarding: CLI agent tools are internal. Not exposed as OpenAI function calls.
- Subprocess overhead: Each request spawns and tears down a CLI subprocess. Could add session pooling in v2.
- No system prompt passthrough: The `system` message from the request is ignored; CLI agents use their own system prompts.
- No auth: Authorization header is accepted but not validated.
- Start just-gemini: `npm run dev`
- Test streaming: `curl -N http://localhost:14354/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"gemini","messages":[{"role":"user","content":"Hello"}],"stream":true}'`
- Verify SSE format: `data: {"choices":[{"delta":{"content":"..."}}]}\n\n` followed by `data: [DONE]`
- Test non-streaming: same curl without `"stream":true`, verify full JSON response
- Test models: `curl http://localhost:14354/v1/models`
- Test with justbot: configure as openai-compat provider, send a message via TUI
- Run `npm test`
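The SSE format check can also be automated by parsing a captured response body. The sketch below is a hypothetical test helper, not part of the plan: it assumes the whole body is available as one string and that every frame is a single `data:` line.

```typescript
// Hypothetical test helper: reassembles streamed text from a raw SSE body
// and reports whether the terminal [DONE] marker was seen.
function collectSseContent(raw: string): { text: string; done: boolean } {
  let text = "";
  let done = false;
  // Frames are separated by a blank line.
  for (const frame of raw.split("\n\n")) {
    const line = frame.trim();
    if (!line.startsWith("data: ")) continue;
    const data = line.slice("data: ".length);
    if (data === "[DONE]") {
      done = true;
      break;
    }
    const chunk = JSON.parse(data) as {
      choices?: Array<{ delta?: { content?: string } }>;
    };
    // The final chunk may carry an empty delta, so default to "".
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return { text, done };
}
```

A test could feed it the output of the streaming curl command and assert that `done` is true and `text` is non-empty.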