Deep Dive: Quarkus Agent MCP Server — Information Architecture

Overview

This MCP server provides 22 tools across 7 classes that help AI agents build Quarkus applications. The Quarkus-specific information it delivers comes from 6 distinct sources, each gathered differently.

Source 1: Extension Skills (SKILL.md files embedded in JARs)

Where captured: Inside Quarkus extension deployment JARs at META-INF/quarkus-skill.md

How gathered: SkillReader.java (1,206 lines) implements a three-layer composition chain:

Layer 1, extension JARs (from ~/.m2/repository):
- Core extensions (io.quarkus group): SkillReader scans all quarkus-*-deployment directories in the local Maven repo for the project's detected Quarkus version. For each deployment JAR that contains META-INF/quarkus-skill.md, it reads the raw skill content and composes it with:
  - Extension metadata from the runtime JAR's META-INF/quarkus-extension.yaml (name, description, guide URL, categories)
  - MCP tools discovered via Jandex bytecode scanning of the deployment, runtime, and dev JARs (looking for @JsonRpcDescription, @DevMCPEnableByDefault, and @DevMcpBuildTimeTool annotations)
- Non-core extensions (Quarkiverse, custom): Resolved from the project's pom.xml/build.gradle dependencies via DependencyResolver, then passed through the same JAR composition process as core extensions
- Fallback: For older Quarkus versions without per-extension skills, SkillReader downloads an aggregated quarkus-extension-skills JAR from Maven Central

Layer 2, user-level skills (~/.quarkus/skills/): SKILL.md files whose frontmatter selects an enhance (append) or override (replace) composition mode

Layer 3, project-level skills (.agent/skills/): Standalone files, no composition
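
The layering described above can be sketched as a small function. This is an illustrative sketch only, not SkillReader's actual code: the enhance and override mode names come from the description above, while the function name, argument names, and the blank-line separator between parts are assumptions. Project-level skills (Layer 3) are standalone, so they are not part of the composition.

```python
from typing import Optional


def compose_skill(jar_skill: Optional[str],
                  user_skill: Optional[str] = None,
                  user_mode: str = "enhance") -> Optional[str]:
    """Combine a JAR-provided skill (Layer 1) with a user-level skill (Layer 2).

    'enhance' appends the user text to the JAR text; 'override' replaces it.
    The separator and the handling of missing inputs are assumptions.
    """
    if user_skill is None:
        return jar_skill
    if user_mode == "override" or jar_skill is None:
        return user_skill
    if user_mode == "enhance":
        return jar_skill.rstrip() + "\n\n" + user_skill.lstrip()
    raise ValueError(f"unknown composition mode: {user_mode!r}")
```

For example, `compose_skill(jar_text, user_text, "override")` would return only the user text, while the default mode keeps the JAR text and appends the user text after it.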

Content type: Markdown files with YAML frontmatter containing coding patterns, testing guidelines, configuration reference, common pitfalls, and Dev MCP tool tables. This is the primary source of "how to code with Quarkus" knowledge.

Key insight for extraction: The actual content lives inside Quarkus extension JARs that ship with each Quarkus release. The SKILL.md files are authored by extension developers and shipped as resources. You'd need to extract these from all extension deployment JARs for a given Quarkus version.
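
A minimal extraction sketch for the core extensions, assuming the standard Maven repository layout (groupId path, then artifactId, then version) and the META-INF/quarkus-skill.md entry named above. The function name and the returned dictionary shape are assumptions made for illustration; non-core extensions and the aggregated fallback JAR are not covered.

```python
import zipfile
from pathlib import Path
from typing import Dict

SKILL_ENTRY = "META-INF/quarkus-skill.md"  # entry path named in this document


def extract_skills(maven_repo: Path, version: str) -> Dict[str, str]:
    """Return {artifactId: skill markdown} for io.quarkus deployment JARs.

    Assumes the standard Maven layout:
    <repo>/io/quarkus/<artifactId>/<version>/<artifactId>-<version>.jar
    """
    skills: Dict[str, str] = {}
    group_dir = maven_repo / "io" / "quarkus"
    for artifact_dir in sorted(group_dir.glob("quarkus-*-deployment")):
        jar = artifact_dir / version / f"{artifact_dir.name}-{version}.jar"
        if not jar.is_file():
            continue  # this extension is not cached for the requested version
        with zipfile.ZipFile(jar) as zf:
            if SKILL_ENTRY in zf.namelist():
                skills[artifact_dir.name] = zf.read(SKILL_ENTRY).decode("utf-8")
    return skills
```

A call such as `extract_skills(Path.home() / ".m2" / "repository", "3.34.2")` would only see JARs that Maven has already downloaded, so a complete extraction would first need every extension resolved for that version.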

Source 2: Semantic Documentation Search (Pre-indexed pgvector)

Where captured: Pre-built Docker images at ghcr.io/quarkusio/chappie-ingestion-quarkus:<version>

How gathered: DocSearchTools.java + ContainerManager.java + EmbeddingModelLoader.java:

- On startup, loads the BGE Small EN v1.5 embedding model (quantized, 384 dimensions) via LangChain4j
- Starts a pgvector PostgreSQL container via Testcontainers with pre-indexed documentation
- The container images are version-specific (e.g., tag 3.34.2 has docs for that Quarkus version)
- Queries use vector similarity search with a minimum score threshold of 0.82
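
To make the thresholding step concrete, here is an in-memory sketch of similarity search with a minimum score. It assumes the score is plain cosine similarity; the server runs the search inside pgvector, and its actual scoring scale may differ. The function names and the (text, vector) chunk representation are assumptions.

```python
import math
from typing import List, Sequence, Tuple

MIN_SCORE = 0.82  # threshold quoted in this document


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: Sequence[float],
           chunks: List[Tuple[str, Sequence[float]]],
           min_score: float = MIN_SCORE) -> List[Tuple[float, str]]:
    """Score every chunk, drop those below min_score, return best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    return sorted([s for s in scored if s[0] >= min_score], reverse=True)
```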

Post-processing logic embedded in Java:

- Synonym expansion: Maps common terms to Quarkus concepts, 22 synonyms in total (e.g., "api" -> "rest", "orm" -> "hibernate", "di" -> "cdi")
- Score boosting: +0.15 for title/topic matches, +0.10 for repo path/category matches, +0.08 for section matches
- Legacy penalties: -0.50 for resteasy-classic guides, -0.50 for internal docs
- Modern boosts: +0.15 for modern rest/rest-json/rest-client guides
- Junk filtering: Skips chunks under 50 chars, pure whitespace, or config property tables
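
The post-processing rules above can be approximated as follows. This is a sketch under stated assumptions: the three synonyms and the numeric adjustments are the ones quoted above, but the Chunk fields, the use of substring matching to decide what counts as a "match", and the omission of the internal-docs penalty, the modern-guide boost, and the config-table filter are simplifications.

```python
from dataclasses import dataclass
from typing import List

SYNONYMS = {"api": "rest", "orm": "hibernate", "di": "cdi"}  # 3 of the 22 mappings


@dataclass
class Chunk:  # hypothetical shape of one search hit
    title: str
    path: str
    section: str
    text: str
    score: float  # raw similarity score


def expand_query(query: str) -> List[str]:
    """Lower-case the query terms and append the synonym of each known term."""
    terms = query.lower().split()
    return terms + [SYNONYMS[t] for t in terms if t in SYNONYMS]


def is_junk(chunk: Chunk) -> bool:
    """Chunks under 50 characters (including whitespace-only ones) are skipped."""
    return len(chunk.text.strip()) < 50


def rescore(chunk: Chunk, terms: List[str]) -> float:
    """Apply the boosts and the resteasy-classic penalty to the raw score."""
    score = chunk.score
    if any(t in chunk.title.lower() for t in terms):
        score += 0.15  # title/topic match
    if any(t in chunk.path.lower() for t in terms):
        score += 0.10  # repo path/category match
    if any(t in chunk.section.lower() for t in terms):
        score += 0.08  # section match
    if "resteasy-classic" in chunk.path.lower():
        score -= 0.50  # legacy penalty
    return score
```

With these numbers a legacy guide needs a very large raw-score advantage to outrank a modern one, which appears to be the intent of the -0.50 penalty.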

Key insight for extraction: The actual documentation content is inside those Docker images as pgvector data. The ingestion pipeline is the separate chappie-docling-rag project (external). You'd need either the raw indexed documents or the original Quarkus guide sources.

Source 3: Embedded Workflow Instructions (application.properties + Tool descriptions)

Where captured: Hardcoded in Java source code as annotation string literals and in application.properties

How gathered: Static content compiled into the server:

- application.properties (lines 7-53): A long multi-line instructions string that defines the complete AI agent workflow: when to use each tool, testing patterns, error handling, logging, and key rules. It is sent as the MCP server's instructions field.
- Tool description strings: Each @Tool(description = ...) annotation contains detailed behavioral instructions. For example, quarkus_create has ~20 lines of rules about extension-first development, skill loading order, testing patterns, and README maintenance.
- Generated project files: quarkus_create also writes AGENTS.md (180+ lines of project instructions) and CLAUDE.md into every new project, embedding the entire Quarkus development workflow.

Key insight for extraction: This is the "how to use the tools" / "Quarkus development methodology" content. It's easily extractable from the source code directly — it's all string literals.
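
Pulling the instructions string out of application.properties mostly means joining backslash-continued lines. The sketch below is a simplified reader, not a full implementation of the Java properties format: it only handles `key=value` lines, does not decode escape sequences, and the property key used in the usage note is hypothetical because the real key is not stated here.

```python
from typing import Optional


def read_property(text: str, key: str) -> Optional[str]:
    """Return the value of `key`, joining backslash-continued lines."""
    lines = iter(text.splitlines())
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith(("#", "!")) or "=" not in stripped:
            continue  # comment, blank line, or unsupported separator
        name, _, value = stripped.partition("=")
        value = value.lstrip()
        if name.strip() != key:
            # Skip the continuation lines that belong to another property.
            while _continues(value):
                value = next(lines, "")
            continue
        parts = []
        while _continues(value):
            parts.append(value[:-1])  # drop the trailing backslash
            value = next(lines, "").lstrip()  # leading whitespace is ignored
        parts.append(value)
        return "".join(parts)
    return None


def _continues(value: str) -> bool:
    """A line continues when it ends with an odd number of backslashes."""
    trailing = len(value) - len(value.rstrip("\\"))
    return trailing % 2 == 1
```

Usage would look like `read_property(Path("application.properties").read_text(), "server.instructions")`, where `server.instructions` stands in for whatever key the server actually uses.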

Source 4: Live Dev MCP Tool Discovery (Runtime introspection)

Where captured: Dynamically discovered from running Quarkus applications

How gathered: DevMcpProxyTools.java:

- Makes HTTP JSON-RPC calls to http://localhost:<port>/q/dev-mcp
- tools/list discovers the tools available on the running app
- tools/call invokes a discovered tool
- The tool list is dynamic: it changes when extensions are added or removed
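
The two calls above map onto JSON-RPC 2.0 requests like the ones built below. The method names and the `name`/`arguments` parameter shape follow the MCP specification; the helper names are assumptions, and the sketch does not cover any initialization handshake or streaming response format the endpoint may require.

```python
import json
import urllib.request
from itertools import count
from typing import Optional

_ids = count(1)  # monotonically increasing request ids


def rpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


def list_tools() -> dict:
    return rpc_request("tools/list")


def call_tool(name: str, arguments: dict) -> dict:
    return rpc_request("tools/call", {"name": name, "arguments": arguments})


def post(port: int, request: dict, timeout: float = 10.0) -> dict:
    """Send one request to the Dev MCP endpoint of a running app."""
    http_request = urllib.request.Request(
        f"http://localhost:{port}/q/dev-mcp",
        data=json.dumps(request).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(http_request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For instance, `post(8080, list_tools())` would ask an app in dev mode on port 8080 for its current tool list; the port is an example value.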

Content type: Tool schemas (name, description, parameters) from extensions that implement Dev MCP tools (testing, config, OpenAPI, scheduler management, etc.)

Key insight for extraction: This is inherently runtime/dynamic. The available tools depend on which extensions are in the project. However, the Jandex scanning in SkillReader already discovers these tool definitions statically from extension JARs and includes them in skill content. So the tool documentation is partially captured in Source 1.

Source 5: Version & Update Intelligence (GitHub + reference projects)

Where captured: External GitHub repository quarkusio/code-with-quarkus-compare

How gathered: UpdateTools.java:

- git ls-remote --tags to fetch available version tags
- HTTP fetch of reference build files (pom.xml/build.gradle) from raw GitHub content
- Runs quarkus update --dry-run (or Maven/Gradle plugin equivalent) locally
- Generates comparison diffs between current and latest versions
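
The first step, turning `git ls-remote --tags` output into a latest version, can be sketched like this. It assumes the tags are plain dotted numbers such as 3.34.2; tags with qualifiers (for example a trailing .CR1) are ignored, and how UpdateTools actually orders versions is not stated here.

```python
import re
from typing import List, Optional, Tuple

# Matches "refs/tags/<dotted number>", optionally followed by the "^{}" suffix
# that git prints for the dereferenced commit of an annotated tag.
_TAG = re.compile(r"refs/tags/(\d+(?:\.\d+)*)(?:\^\{\})?$")


def version_key(tag: str) -> Tuple[int, ...]:
    """Numeric sort key, so that 3.34.2 sorts after 3.9.1."""
    return tuple(int(part) for part in tag.split("."))


def parse_tags(ls_remote_output: str) -> List[str]:
    """Extract purely numeric version tags, sorted oldest to newest."""
    tags = set()
    for line in ls_remote_output.splitlines():
        match = _TAG.search(line.strip())
        if match:
            tags.add(match.group(1))
    return sorted(tags, key=version_key)


def latest(ls_remote_output: str) -> Optional[str]:
    tags = parse_tags(ls_remote_output)
    return tags[-1] if tags else None
```

The input would come from something like `subprocess.run(["git", "ls-remote", "--tags", repo_url], capture_output=True, text=True, check=True).stdout`, with `repo_url` pointing at the comparison repository.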

Key insight for extraction: This is operational tooling, not knowledge content. The version comparison logic and upgrade advice are algorithmic, not content-based.

Source 6: Extension Metadata (YAML from runtime JARs)

Where captured: Inside extension runtime JARs at META-INF/quarkus-extension.yaml

How gathered: SkillReader.parseExtensionYaml():

- Extracts: name, description, guide URL, categories
- Used to compose the header of each skill (name, description link, guide reference)
- Also used for categorizing skills in the index display

Key insight for extraction: Standard Quarkus metadata that ships with every extension. Easily extractable from Maven Central JARs.

Summary: What to Extract for agentskill.io

| Source | Content Type | Extraction Difficulty | Value |
| --- | --- | --- | --- |
| Extension Skills (SKILL.md) | Coding patterns, testing, pitfalls | Medium: need to iterate all extension deployment JARs for each version | Highest: this is the core "how to write Quarkus code" knowledge |
| Documentation (pgvector) | Full Quarkus guides, chunked | Medium: need the raw guide content from quarkus.io or the ingestion pipeline | High: comprehensive API/config documentation |
| Workflow Instructions | Development methodology | Easy: copy from source code strings | High: guides AI agents on Quarkus development process |
| Extension Metadata | Names, descriptions, guide URLs, categories | Easy: parse YAML from runtime JARs | Medium: provides the extension catalog |
| Dev MCP Tool Schemas | Tool names, descriptions, parameters | Medium: Jandex scanning or runtime query | Medium: already partially in skills |
| Update Intelligence | Version comparison, migration recipes | N/A: algorithmic, not content | Low: operational, not knowledge |

The Critical Path for agentskill.io Extraction

- Extract all SKILL.md files from every Quarkus extension deployment JAR for a target version. This is where the actionable coding patterns live.
- Extract the workflow instructions from application.properties and the tool descriptions. This is the development methodology.
- Get the raw documentation, either from a dump of the pgvector container or from the quarkus.io source guides.
- Build the extension metadata catalog from the runtime JAR YAML files.

The MCP server itself is primarily a delivery mechanism and runtime orchestrator. It does not generate Quarkus knowledge; it aggregates it from extension JARs, pre-indexed docs, and hardcoded instructions, then delivers it through MCP tools. The knowledge itself is extractable.

jwmatthews/deep-dive-analysis.md
