Plan: Add OpenAI-Compatible Chat Completions Endpoint to just-gemini

Context

Goal: Make just-gemini usable as an LLM backend by any OpenAI-compatible client (including justbot's future subagent system).

Approach: Add standard OpenAI-compatible POST /v1/chat/completions (streaming SSE) and GET /v1/models endpoints on top of the existing session-based architecture. The OpenAI chat completions format is widely supported across LLM clients, so any client that speaks it can use just-gemini without custom integration code.

Target OpenAI format:

POST /v1/chat/completions with { model, messages, stream: true, max_tokens? }
SSE response: data: {"choices":[{"delta":{"content":"..."}}]}\n\n chunks
Terminal: data: [DONE]\n\n
GET /v1/models returning { data: [{ id, object: "model" }] }

Changes to just-gemini

1. Create `src/http/completions.ts` — OpenAI-compat endpoint

POST /v1/chat/completions

Request (standard OpenAI format):

{
  "model": "gemini",
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "List files in src/"}
  ],
  "stream": true,
  "max_tokens": 8192
}

Model string mapping: The model field maps to a just-gemini provider. Format options:

"gemini" → provider=gemini, model=default
"codex" → provider=codex, model=default
"gemini/gemini-2.5-pro" → provider=gemini, model=gemini-2.5-pro

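The mapping above can be sketched as a small parser. This is a minimal illustration, not the project's actual code: `ModelTarget` and `parseModelString` are hypothetical names, and splitting on the first slash only is an assumption made so that model names containing a slash would survive.

```typescript
// Hypothetical helper: splits an OpenAI "model" string into a provider name
// and an optional provider-specific model name.
interface ModelTarget {
  provider: string;
  model?: string; // undefined means "use the provider's default model"
}

function parseModelString(value: string): ModelTarget {
  const slash = value.indexOf("/");
  if (slash === -1) {
    // "gemini" or "codex": provider only, default model.
    return { provider: value };
  }
  // Split on the first slash only (assumption), so the remainder is the model.
  const provider = value.slice(0, slash);
  const model = value.slice(slash + 1);
  return model.length > 0 ? { provider, model } : { provider };
}
```

Whether the returned provider is actually registered is a separate check; the parser only splits the string.
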
Implementation flow:

Parse request body, extract model → provider name + optional model
Extract the last user message content from messages[] as the prompt
Determine cwd: from X-Working-Directory header, or fallback to COMPLETIONS_CWD env var, or process.cwd()
Create an ephemeral session: manager.create({ provider, cwd, model })
Subscribe to session events: manager.on("event", handler)
Send the prompt: manager.sendMessage(sessionId, content)
If stream: true (the primary path; a request that omits stream gets the non-streaming response, matching OpenAI's default):
- Set headers: Content-Type: text/event-stream, Cache-Control: no-cache
- On text-delta event → write OpenAI SSE chunk:
```
data: {"id":"chatcmpl-<id>","object":"chat.completion.chunk","created":<ts>,"model":"<model>","choices":[{"index":0,"delta":{"content":"<text>"},"finish_reason":null}]}
```
- On run-ended → write final chunk with finish_reason + data: [DONE]
- Cleanup: manager.off("event", handler), manager.delete(sessionId)
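
The chunk framing in the streaming path could look roughly like the sketch below. The helper names (`sseFrame`, `textChunk`, `finalChunk`, `SSE_DONE`) are hypothetical; the chunk fields follow the SSE example above, and the empty `delta` on the final chunk follows the usual OpenAI convention.

```typescript
interface ChunkChoice {
  index: number;
  delta: { content?: string };
  finish_reason: string | null;
}

interface ChatCompletionChunk {
  id: string;
  object: "chat.completion.chunk";
  created: number;
  model: string;
  choices: ChunkChoice[];
}

// Serializes one chunk as an SSE "data:" frame, terminated by a blank line.
// JSON.stringify escapes newlines in the text, so the payload stays on one line.
function sseFrame(chunk: ChatCompletionChunk): string {
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

// Chunk carrying one piece of text from a text-delta event.
function textChunk(id: string, created: number, model: string, text: string): ChatCompletionChunk {
  return {
    id,
    object: "chat.completion.chunk",
    created,
    model,
    choices: [{ index: 0, delta: { content: text }, finish_reason: null }],
  };
}

// Final chunk written on run-ended: empty delta plus the mapped finish_reason.
function finalChunk(id: string, created: number, model: string, finishReason: string): ChatCompletionChunk {
  return {
    id,
    object: "chat.completion.chunk",
    created,
    model,
    choices: [{ index: 0, delta: {}, finish_reason: finishReason }],
  };
}

// Terminal frame that tells OpenAI-compatible clients the stream is over.
const SSE_DONE = "data: [DONE]\n\n";
```

A handler would write `sseFrame(textChunk(...))` for each text-delta, then `sseFrame(finalChunk(...))` followed by `SSE_DONE` on run-ended.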

If stream: false:

Accumulate all text-delta content into a string

On run-ended → return full OpenAI response:

{
  "id": "chatcmpl-<id>",
  "object": "chat.completion",
  "created": 1709900000,
  "model": "gemini",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}]
}

Cleanup session
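
For the non-streaming path, the accumulation step might be sketched as follows. `buildCompletion` is a hypothetical helper, and collecting deltas in an array before joining is just one way to do it.

```typescript
interface ChatCompletion {
  id: string;
  object: "chat.completion";
  created: number;
  model: string;
  choices: {
    index: number;
    message: { role: "assistant"; content: string };
    finish_reason: string;
  }[];
}

// Joins the text-delta pieces collected during the run into one response body.
function buildCompletion(
  id: string,
  created: number,
  model: string,
  deltas: string[],
  finishReason: string,
): ChatCompletion {
  return {
    id,
    object: "chat.completion",
    created,
    model,
    choices: [
      {
        index: 0,
        message: { role: "assistant", content: deltas.join("") },
        finish_reason: finishReason,
      },
    ],
  };
}
```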

finish_reason mapping from StopReason:

| StopReason | finish_reason |
| --- | --- |
| `end_turn` | `"stop"` |
| `max_tokens` | `"length"` |
| `cancelled` | `"stop"` |
| `error` | `"stop"` (error details are handled separately) |
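
The table translates directly into a lookup. In this sketch the `StopReason` union is assumed to contain exactly the four values listed above; the real type in just-gemini may have more members.

```typescript
// Assumed from the table above; the actual StopReason type may differ.
type StopReason = "end_turn" | "max_tokens" | "cancelled" | "error";
type FinishReason = "stop" | "length";

// Record<StopReason, ...> makes the compiler flag any unmapped stop reason.
const FINISH_REASONS: Record<StopReason, FinishReason> = {
  end_turn: "stop",
  max_tokens: "length",
  cancelled: "stop",
  error: "stop",
};

function toFinishReason(reason: StopReason): FinishReason {
  return FINISH_REASONS[reason];
}
```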

Tool call events: Ignored for v1. The CLI agent executes tools internally, so they are NOT surfaced as OpenAI tool_calls in the response: a client such as justbot would try to execute them itself, even though the agent has already run them. Optionally, tool activity can be injected as inline text in the content stream (configurable).

Error handling:

Unknown provider → 400 {"error": {"message": "Unknown provider: foo", "type": "invalid_request_error"}}
Session creation failure → 500 {"error": {"message": "...", "type": "server_error"}}
run-ended with stopReason: "error" → stream the error event message as text, then close
Client disconnect → cleanup handler, interrupt session, delete session
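
The two JSON error cases could be built by helpers like the ones below. The function names and the `ErrorResponse` shape are hypothetical; only the status codes and body format come from the list above.

```typescript
type ErrorType = "invalid_request_error" | "server_error";

interface ErrorResponse {
  status: number;
  body: { error: { message: string; type: ErrorType } };
}

// 400: the model string named a provider that is not registered.
function unknownProviderError(name: string): ErrorResponse {
  return {
    status: 400,
    body: { error: { message: `Unknown provider: ${name}`, type: "invalid_request_error" } },
  };
}

// 500: creating the ephemeral session failed.
function sessionCreationError(cause: unknown): ErrorResponse {
  const message = cause instanceof Error ? cause.message : String(cause);
  return { status: 500, body: { error: { message, type: "server_error" } } };
}
```

A route handler would then send `body` as JSON with the given `status`.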

2. Create `src/http/models.ts` — Model listing endpoint

GET /v1/models

Response (standard OpenAI format):

{
  "object": "list",
  "data": [
    {"id": "gemini", "object": "model", "created": 0, "owned_by": "just-gemini"},
    {"id": "codex", "object": "model", "created": 0, "owned_by": "just-gemini"}
  ]
}

Lists registered providers as "models". Simple pass-through from manager.getProviders().
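
A minimal sketch of that pass-through, assuming `manager.getProviders()` yields a list of provider names as strings (the actual return type is not specified in this plan); `toModelList` is a hypothetical helper.

```typescript
interface ModelEntry {
  id: string;
  object: "model";
  created: number;
  owned_by: string;
}

interface ModelList {
  object: "list";
  data: ModelEntry[];
}

// Maps provider names (assumed input shape) to OpenAI-style model entries.
function toModelList(providerNames: string[]): ModelList {
  return {
    object: "list",
    data: providerNames.map((id) => ({
      id,
      object: "model",
      created: 0,
      owned_by: "just-gemini",
    })),
  };
}
```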

3. Modify `src/http/router.ts` — Mount new routes

Add the completions and models routers:

import { completionsRouter } from "./completions.js";
import { modelsRouter } from "./models.js";

router.use("/v1/chat", completionsRouter(manager));
router.use("/v1", modelsRouter(manager));

4. Update `docs/` — Document the new endpoints

Add a new docs/openai-compat.md documenting:

POST /v1/chat/completions request/response format
GET /v1/models response format
Model string mapping (gemini, codex, gemini/model-name)
X-Working-Directory header
Known limitations (no multi-turn history, no tool_calls forwarding)
justbot integration example config

Update docs/README.md endpoint table with new endpoints.

Design Decisions

Ephemeral sessions per request: Each completions request creates a fresh session and destroys it after. This means:

No conversation history preserved between requests (known limitation)
Clean slate each time (correct for stateless OpenAI API semantics)
Subprocess startup overhead per request (acceptable for v1)

Why not session reuse: The OpenAI chat completions API is stateless — each request carries full message history. Maintaining persistent sessions would cause the CLI agent to accumulate duplicate context. Ephemeral is correct.

Only send the last user message: The CLI agent starts fresh on each request, and a full message history concatenated into one prompt would likely confuse it. The endpoint therefore forwards only the latest user message. Clients that need multi-turn context should use the session-based API directly.
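
Picking the prompt out of messages[] might look like this. `lastUserMessage` is a hypothetical helper, and the sketch assumes `content` is always a plain string; OpenAI clients can also send content as an array of parts, which this does not handle.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string; // assumption: plain-string content only
}

// Returns the content of the most recent user message, or undefined if
// the request contains no user message at all.
function lastUserMessage(messages: ChatMessage[]): string | undefined {
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    if (message !== undefined && message.role === "user") {
      return message.content;
    }
  }
  return undefined;
}
```

An `undefined` result would be a natural place to return a 400 error, though the plan does not specify that case.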

Authorization header: Accepted but not validated. just-gemini has no auth layer. The header is silently ignored so OpenAI-compat clients (which always send it) work without errors.

Files Summary

| File | Action | Description |
| --- | --- | --- |
| `src/http/completions.ts` | create | `POST /v1/chat/completions` endpoint |
| `src/http/models.ts` | create | `GET /v1/models` endpoint |
| `src/http/router.ts` | modify | Mount new routers |
| `docs/openai-compat.md` | create | Document new endpoints |
| `docs/README.md` | modify | Add new endpoints to table |

Known Limitations (v1)

No multi-turn context: Each request is a fresh CLI subprocess session. The agent doesn't see prior conversation.
No tool_calls forwarding: CLI agent tools are internal. Not exposed as OpenAI function calls.
Subprocess overhead: Each request spawns and tears down a CLI subprocess. Could add session pooling in v2.
No system prompt passthrough: The system message from the request is ignored — CLI agents use their own system prompts.
No auth: Authorization header is accepted but not validated.

Verification

Start just-gemini: npm run dev
Test streaming: curl -N http://localhost:14354/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"gemini","messages":[{"role":"user","content":"Hello"}],"stream":true}'
Verify SSE format: data: {"choices":[{"delta":{"content":"..."}}]}\n\n followed by data: [DONE]
Test non-streaming: same curl without "stream":true, verify full JSON response
Test models: curl http://localhost:14354/v1/models
Test with justbot: configure as openai-compat provider, send a message via TUI
Run npm test
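
The SSE format check could also be automated with a small parser like the sketch below, for example inside a test. `collectContent` is a hypothetical helper that takes the raw response body as a string and reassembles the streamed text.

```typescript
// Reassembles streamed text from a raw SSE body and reports whether the
// terminal "data: [DONE]" frame was seen.
function collectContent(sseBody: string): { text: string; done: boolean } {
  let text = "";
  let done = false;
  for (const line of sseBody.split("\n")) {
    if (!line.startsWith("data: ")) continue; // skip blank separator lines
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    const chunk = JSON.parse(payload) as {
      choices?: { delta?: { content?: string } }[];
    };
    // The final chunk has an empty delta, which contributes no text.
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return { text, done };
}
```

A passing streaming response should yield the full reply in `text` and `done === true`.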
