@justinyanme
Last active March 8, 2026 12:46

Plan: Add OpenAI-Compatible Chat Completions Endpoint to just-gemini

Context

Goal: Make just-gemini usable as an LLM backend by any OpenAI-compatible client (including justbot's future subagent system).

Approach: Add standard OpenAI-compatible POST /v1/chat/completions (SSE streaming, plus a non-streaming mode) and GET /v1/models endpoints on top of the existing session-based architecture. The OpenAI format is the most widely supported LLM API format, so any client that speaks it can use just-gemini without custom integration code.

Target OpenAI format:

  • POST /v1/chat/completions with { model, messages, stream: true, max_tokens? }
  • SSE response: data: {"choices":[{"delta":{"content":"..."}}]}\n\n chunks
  • Terminal: data: [DONE]\n\n
  • GET /v1/models returning { object: "list", data: [{ id, object: "model", created, owned_by }] }
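The wire format above is small enough to sketch directly. Below is a minimal TypeScript helper for SSE framing that assumes plain JSON payloads. `sseData` and `SSE_DONE` are illustrative names, not existing just-gemini code.

```typescript
// Frame one SSE event: a single "data:" line followed by a blank line.
// JSON.stringify (without an indent argument) escapes newlines inside strings,
// so the payload always fits on one line.
function sseData(payload: unknown): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Terminal event; OpenAI-compatible clients stop reading when they see it.
const SSE_DONE = "data: [DONE]\n\n";
```

Because every payload stays on one `data:` line, clients never have to reassemble multi-line events.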

Changes to just-gemini

1. Create src/http/completions.ts — OpenAI-compat endpoint

POST /v1/chat/completions

Request (standard OpenAI format):

{
  "model": "gemini",
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "List files in src/"}
  ],
  "stream": true,
  "max_tokens": 8192
}
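Steps 2 and 3 of the implementation flow below derive the prompt and working directory from a request like this one. The sketch below assumes messages may use either string content or OpenAI's array-of-parts form. `extractPrompt` and `resolveCwd` are hypothetical helper names, and joining text parts with a newline is an assumption. The env value and server cwd are passed in so that both functions stay pure.

```typescript
// OpenAI message content is either a string or an array of typed parts.
interface ContentPart {
  type: string;
  text?: string;
}

interface ChatMessage {
  role: string;
  content: string | ContentPart[] | null;
}

// Step 2: text of the last user message, or undefined if there is none
// (a case the handler would presumably reject with a 400).
function extractPrompt(messages: ChatMessage[]): string | undefined {
  const last = [...messages].reverse().find((m) => m.role === "user");
  if (!last) return undefined;
  if (typeof last.content === "string") return last.content;
  if (Array.isArray(last.content)) {
    // Keep only text parts; images and other part types are dropped.
    return last.content
      .filter((p) => p.type === "text" && typeof p.text === "string")
      .map((p) => p.text)
      .join("\n");
  }
  return undefined;
}

// Step 3: header first, then the configured default, then the server's cwd.
// Node lowercases incoming header names, hence "x-working-directory".
function resolveCwd(
  headers: Record<string, string | string[] | undefined>,
  envDefault: string | undefined,
  serverCwd: string,
): string {
  const raw = headers["x-working-directory"];
  const fromHeader = Array.isArray(raw) ? raw[0] : raw;
  return fromHeader || envDefault || serverCwd;
}
```

In the handler, `envDefault` would be `process.env.COMPLETIONS_CWD` and `serverCwd` would be `process.cwd()`.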

Model string mapping: The model field maps to a just-gemini provider. Format options:

  • "gemini" → provider=gemini, model=default
  • "codex" → provider=codex, model=default
  • "gemini/gemini-2.5-pro" → provider=gemini, model=gemini-2.5-pro
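Here is a sketch of this mapping. It assumes the split happens at the first "/"; the plan does not say how model names containing "/" are handled. `parseModel` is a hypothetical name, and checking the provider against the registered list is left to the caller.

```typescript
interface ModelSelection {
  provider: string;
  model?: string; // undefined means "use the provider's default model"
}

// "gemini"                -> { provider: "gemini" }
// "gemini/gemini-2.5-pro" -> { provider: "gemini", model: "gemini-2.5-pro" }
// Splits on the first "/" only, so the model part may itself contain "/".
function parseModel(value: string): ModelSelection {
  const slash = value.indexOf("/");
  if (slash === -1) return { provider: value };
  const provider = value.slice(0, slash);
  const model = value.slice(slash + 1);
  return model ? { provider, model } : { provider };
}
```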

Implementation flow:

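  1. Parse the request body and split model into a provider name plus an optional model name
  2. Extract the content of the last user message in messages[] as the prompt (if content is an array of content parts, join its text parts)
  3. Determine cwd: from the X-Working-Directory header, falling back to the COMPLETIONS_CWD env var, then process.cwd()
  4. Create an ephemeral session: manager.create({ provider, cwd, model })
  5. Subscribe to session events before sending: manager.on("event", handler). The listener is registered on the manager rather than on one session, so the handler should ignore events for other sessionIds
  6. Send the prompt: manager.sendMessage(sessionId, content)
  7. If stream: true (the primary path; note that the OpenAI API treats an omitted stream as false):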
    • Set headers: Content-Type: text/event-stream, Cache-Control: no-cache
    • On text-delta event → write OpenAI SSE chunk:
      data: {"id":"chatcmpl-<id>","object":"chat.completion.chunk","created":<ts>,"model":"<model>","choices":[{"index":0,"delta":{"content":"<text>"},"finish_reason":null}]}
      
    • On run-ended → write a final chunk with an empty delta and the mapped finish_reason, then data: [DONE]
    • Cleanup: manager.off("event", handler), end the response, manager.delete(sessionId)
  8. If stream: false:
    • Accumulate all text-delta content into a string
    • On run-ended → return full OpenAI response:
      {
        "id": "chatcmpl-<id>",
        "object": "chat.completion",
        "created": 1709900000,
        "model": "gemini",
        "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}]
      }
    • Cleanup session
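The streaming branch (step 7) amounts to translating session events into chunk frames. This minimal sketch makes several assumptions:

- Events carry `sessionId`, `type`, and optional `text`/`stopReason` fields. The real just-gemini event type may differ.
- The session id is reused in the `chatcmpl-` id, purely for illustration.
- `createStreamHandler` is a hypothetical name.

Error-message streaming is omitted.

```typescript
// Assumed event shape; the real just-gemini event type may differ.
interface SessionEvent {
  sessionId: string;
  type: string; // e.g. "text-delta", "run-ended", tool events
  text?: string; // set on text-delta
  stopReason?: string; // set on run-ended
}

// Returns a listener for manager.on("event", ...) on the stream: true path.
function createStreamHandler(
  sessionId: string,
  model: string,
  write: (chunk: string) => void,
  onDone: () => void,
): (event: SessionEvent) => void {
  const id = `chatcmpl-${sessionId}`; // illustrative id scheme
  const created = Math.floor(Date.now() / 1000);
  const frame = (delta: Record<string, string>, finishReason: string | null): string =>
    `data: ${JSON.stringify({
      id,
      object: "chat.completion.chunk",
      created,
      model,
      choices: [{ index: 0, delta, finish_reason: finishReason }],
    })}\n\n`;

  let finished = false;
  return (event) => {
    // The listener sees every session's events; ignore the others.
    if (finished || event.sessionId !== sessionId) return;
    if (event.type === "text-delta" && event.text) {
      write(frame({ content: event.text }, null));
    } else if (event.type === "run-ended") {
      finished = true;
      // max_tokens -> "length"; end_turn, cancelled and error -> "stop".
      write(frame({}, event.stopReason === "max_tokens" ? "length" : "stop"));
      write("data: [DONE]\n\n");
      onDone();
    }
    // Tool and other event types are ignored in v1.
  };
}
```

The caller would pass the returned function to `manager.on("event", ...)`, with `onDone` removing that listener, ending the response, and deleting the session. The `finished` flag makes any events that arrive after `run-ended` harmless.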

finish_reason mapping from StopReason:

| StopReason | finish_reason |
|------------|---------------|
| end_turn   | "stop"        |
| max_tokens | "length"      |
| cancelled  | "stop"        |
| error      | "stop" (error details handled separately) |
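Encoding the table as a `Record` keyed by the stop-reason union lets the TypeScript compiler flag any new stop reason that lacks a mapping. This sketch assumes `StopReason` is a string union with exactly the four members listed. `toFinishReason` is a hypothetical name.

```typescript
// Assumed to match just-gemini's StopReason values.
type StopReason = "end_turn" | "max_tokens" | "cancelled" | "error";
type FinishReason = "stop" | "length";

// Omitting a key here (or adding a StopReason member later without a
// matching entry) is a compile-time error.
const FINISH_REASON: Record<StopReason, FinishReason> = {
  end_turn: "stop",
  max_tokens: "length",
  cancelled: "stop",
  error: "stop", // error details are reported separately
};

function toFinishReason(reason: StopReason): FinishReason {
  return FINISH_REASON[reason];
}
```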

Tool call events: Ignored for v1. The CLI agent executes tools internally, so they are NOT surfaced as OpenAI tool_calls in the response; otherwise justbot would try to execute them itself, which is wrong. Optionally (behind a config flag), tool activity can be injected as inline text in the content stream.

Error handling:

  • Unknown provider → 400 {"error": {"message": "Unknown provider: foo", "type": "invalid_request_error"}}
  • Session creation failure → 500 {"error": {"message": "...", "type": "server_error"}}
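  • run-ended with stopReason: "error" → stream the error event's message as content text, then close the stream normally (the 200 status and SSE headers have already been sent, so the error cannot become an HTTP status)
  • Client disconnect → remove the event handler, interrupt the running session, then delete it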
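The 400 and 500 bodies share one shape. Below is a sketch of a builder plus the provider check that produces the first case. `errorReply` and `checkProvider` are hypothetical names. These replies only work before any SSE headers have been written.

```typescript
type OpenAIErrorType = "invalid_request_error" | "server_error";

interface ErrorReply {
  status: number;
  body: { error: { message: string; type: OpenAIErrorType } };
}

// Builds the OpenAI-style error body for the 400 and 500 cases.
function errorReply(status: 400 | 500, message: string): ErrorReply {
  const type: OpenAIErrorType = status === 400 ? "invalid_request_error" : "server_error";
  return { status, body: { error: { message, type } } };
}

// Validates the provider before any session is created; null means "ok".
function checkProvider(provider: string, known: string[]): ErrorReply | null {
  return known.includes(provider) ? null : errorReply(400, `Unknown provider: ${provider}`);
}
```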

2. Create src/http/models.ts — Model listing endpoint

GET /v1/models

Response (standard OpenAI format):

{
  "object": "list",
  "data": [
    {"id": "gemini", "object": "model", "created": 0, "owned_by": "just-gemini"},
    {"id": "codex", "object": "model", "created": 0, "owned_by": "just-gemini"}
  ]
}

Lists each registered provider as a "model": a simple pass-through from manager.getProviders(), with created fixed at 0 and owned_by at "just-gemini".
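Here is a sketch of the response builder. It assumes `manager.getProviders()` yields provider names as strings; the plan does not specify its actual return type. `listModels` is a hypothetical name.

```typescript
interface ModelEntry {
  id: string;
  object: "model";
  created: number;
  owned_by: string;
}

// Maps provider names to the GET /v1/models response body.
function listModels(providers: string[]): { object: "list"; data: ModelEntry[] } {
  return {
    object: "list",
    data: providers.map((id): ModelEntry => ({
      id,
      object: "model",
      created: 0,
      owned_by: "just-gemini",
    })),
  };
}
```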

3. Modify src/http/router.ts — Mount new routes

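Add the completions and models routers. Paths are relative to the mount points, so completionsRouter(manager) should register POST /completions and modelsRouter(manager) should register GET /models: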

import { completionsRouter } from "./completions.js";
import { modelsRouter } from "./models.js";

router.use("/v1/chat", completionsRouter(manager));
router.use("/v1", modelsRouter(manager));

4. Update docs/ — Document the new endpoints

Add a new docs/openai-compat.md documenting:

  • POST /v1/chat/completions request/response format
  • GET /v1/models response format
  • Model string mapping (gemini, codex, gemini/model-name)
  • X-Working-Directory header
  • Known limitations (no multi-turn history, no tool_calls forwarding)
  • justbot integration example config

Update docs/README.md endpoint table with new endpoints.

Design Decisions

Ephemeral sessions per request: Each completions request creates a fresh session and destroys it after. This means:

  • No conversation history preserved between requests (known limitation)
  • Clean slate each time (correct for stateless OpenAI API semantics)
  • Subprocess startup overhead per request (acceptable for v1)
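A `try`/`finally` block keeps the per-request lifecycle leak-proof. The sketch below shows the non-streaming path against a hypothetical `SessionManager` interface. Its method names come from the implementation flow, but its signatures, the event shape, and the `"end_turn"` default are assumptions. `completeOnce` is a hypothetical name.

```typescript
// Assumed event shape and manager signatures (hypothetical).
interface SessionEvent {
  sessionId: string;
  type: string;
  text?: string;
  stopReason?: string;
}

interface SessionOptions {
  provider: string;
  cwd: string;
  model?: string;
}

interface SessionManager {
  create(opts: SessionOptions): Promise<{ id: string }>;
  sendMessage(sessionId: string, content: string): Promise<void>;
  on(name: "event", handler: (e: SessionEvent) => void): void;
  off(name: "event", handler: (e: SessionEvent) => void): void;
  delete(sessionId: string): Promise<void>;
}

// Non-streaming path: one fresh session per request, always torn down.
async function completeOnce(
  manager: SessionManager,
  opts: SessionOptions,
  prompt: string,
): Promise<{ text: string; stopReason: string }> {
  const { id } = await manager.create(opts);
  let text = "";
  let finish!: (stopReason: string) => void;
  const ended = new Promise<string>((resolve) => {
    finish = resolve;
  });
  const handler = (e: SessionEvent): void => {
    if (e.sessionId !== id) return; // events from other sessions
    if (e.type === "text-delta" && e.text) text += e.text;
    if (e.type === "run-ended") finish(e.stopReason ?? "end_turn"); // assumed default
  };
  manager.on("event", handler); // subscribe before sending
  try {
    await manager.sendMessage(id, prompt);
    const stopReason = await ended; // read text only after the run has ended
    return { text, stopReason };
  } finally {
    manager.off("event", handler);
    await manager.delete(id);
  }
}
```

Subscribing before `sendMessage` avoids missing early deltas. The `finally` block runs on both success and failure, so no listener or session outlives the request. A production version would also need a timeout in case `run-ended` never arrives.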

Why not session reuse: The OpenAI chat completions API is stateless — each request carries full message history. Maintaining persistent sessions would cause the CLI agent to accumulate duplicate context. Ephemeral is correct.

Only send the last user message: The CLI agent starts fresh on each request. Flattening the full message history into one concatenated prompt would blur the boundaries between turns and confuse the agent, so send just the latest user message. Clients that need multi-turn context should use the session-based API directly.

Authorization header: Accepted but not validated. just-gemini has no auth layer. The header is silently ignored so OpenAI-compat clients (which always send it) work without errors.

Files Summary

| File | Action | Description |
|------|--------|-------------|
| src/http/completions.ts | create | POST /v1/chat/completions endpoint |
| src/http/models.ts | create | GET /v1/models endpoint |
| src/http/router.ts | modify | Mount new routers |
| docs/openai-compat.md | create | Document new endpoints |
| docs/README.md | modify | Add new endpoints to table |

Known Limitations (v1)

  1. No multi-turn context: Each request is a fresh CLI subprocess session. The agent doesn't see prior conversation.
  2. No tool_calls forwarding: CLI agent tools are internal. Not exposed as OpenAI function calls.
  3. Subprocess overhead: Each request spawns and tears down a CLI subprocess. Could add session pooling in v2.
  4. No system prompt passthrough: The system message from the request is ignored — CLI agents use their own system prompts.
  5. No auth: Authorization header is accepted but not validated.

Verification

  1. Start just-gemini: npm run dev
  2. Test streaming: curl -N http://localhost:14354/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"gemini","messages":[{"role":"user","content":"Hello"}],"stream":true}'
  3. Verify SSE format: data: {"choices":[{"delta":{"content":"..."}}]}\n\n followed by data: [DONE]
  4. Test non-streaming: repeat the curl with "stream" omitted or set to false, and verify a single chat.completion JSON response
  5. Test models: curl http://localhost:14354/v1/models
  6. Test with justbot: configure just-gemini as an OpenAI-compatible provider (base URL http://localhost:14354/v1) and send a message via the TUI
  7. Run npm test
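Step 3 can be automated. Below is a hypothetical checker, `parseSSE`, that reassembles the assistant text from a captured SSE body, such as saved `curl -N` output. It assumes one `data:` line per event, which is how the server frames its output.

```typescript
// Reassembles streamed content and reports whether [DONE] arrived.
function parseSSE(body: string): { content: string; done: boolean; finishReason: string | null } {
  let content = "";
  let done = false;
  let finishReason: string | null = null;
  for (const event of body.split("\n\n")) {
    const line = event.trim();
    if (!line.startsWith("data:")) continue; // skip blanks and comments
    const data = line.slice("data:".length).trim();
    if (data === "[DONE]") {
      done = true;
      continue;
    }
    const chunk = JSON.parse(data);
    const choice = chunk.choices?.[0];
    if (typeof choice?.delta?.content === "string") content += choice.delta.content;
    if (choice?.finish_reason) finishReason = choice.finish_reason;
  }
  return { content, done, finishReason };
}
```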