Status: Finalized for Experimental Phase
Framework Name: Seek.js (npm scope `@seekjs`, core package `@seekjs/core`)
SaaS Platform: Vaan, Vantage, or Koor (name TBD; placeholder domain vaan.ai, or a cheaper .ai domain)
Objective: Deliver an "AI-search-as-a-service" toolchain that completely eliminates the "Vector Database Tax" by shifting index generation to build time and search execution to the client's browser. This document serves as the definitive engineering roadmap for building the framework's 4 core SDK modules, outlining API contracts, data flows, and critical research paths.
In 2026, adding generative AI search (RAG) to a website is fundamentally broken for the modern frontend developer. To give users the ability to "Ask AI" about documentation or product catalogs, developers are forced to regress to legacy backend architectures. They must provision expensive managed vector databases (Pinecone, Qdrant), write fragile web-scraping ingestion scripts, and pay heavy LLM inference costs for every single user keystroke. We call this the "Vector Database Tax."
This project is a framework designed to completely eliminate that tax. We are redefining AI search not as a backend database challenge, but as a static asset delivery and edge compute challenge. Our mission is to give developers enterprise-grade AI search with the exact same developer experience (DX) as deploying a static website: zero provisioning, zero configuration, and sub-15ms latency.
We achieve this by fundamentally disaggregating the RAG (Retrieval-Augmented Generation) pipeline:
- Shift Indexing to Build-Time: Instead of a live database, we hook directly into the developer's framework (Next.js, Astro, Vite). We extract text using a WASM-based parser, vectorize it, and compile it into a highly compressed binary file (`.msp`).
- Shift Search to the Browser: That binary file is deployed to a global CDN. The user's browser downloads it, caches it in `IndexedDB`, and executes Hybrid Search (BM25 + Vector) entirely in local memory.
- Shift Reasoning to the Edge: Server-side compute is only invoked when a user asks for an AI summary. The browser sends local context to our Edge LLMs (Cloudflare Workers AI), which stream back cited answers grounded in that context.
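One common way to merge a keyword (BM25) ranking with a vector ranking in the browser is reciprocal rank fusion (RRF). It combines rank positions rather than raw scores, so BM25 scores and cosine similarities never have to be put on the same scale. The sketch below only illustrates the idea: `reciprocalRankFusion` is a hypothetical helper, the real ranking would come from the search engine's own hybrid mode, and `k = 60` is the conventional default constant.

```typescript
interface FusedHit {
  id: string;
  score: number;
}

// Reciprocal rank fusion: score(doc) = sum over rankings of 1 / (k + rank), with 1-based ranks.
// Each ranking is a list of document IDs ordered best-first (e.g. BM25 hits, vector hits).
function reciprocalRankFusion(rankings: string[][], k = 60): FusedHit[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}

// Usage: a page that ranks well in both lists rises to the top.
const bm25Hits = ['/docs/auth', '/docs/tokens', '/docs/sso'];
const vectorHits = ['/docs/sso', '/docs/auth', '/docs/login'];
// Logs the IDs in order: /docs/auth, /docs/sso, /docs/tokens, /docs/login
console.log(reciprocalRankFusion([bm25Hits, vectorHits]).map((hit) => hit.id));
```

Because only ranks matter, a document that appears in both lists (here `/docs/auth` and `/docs/sso`) outranks one that tops a single list.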
- The Open-Source Framework (Seek.js): A free, modular SDK. Developers can install bundler plugins to extract content, generate indexes using local models, and serve search from their own static hosting. Zero vendor lock-in.
- The SaaS Abstraction (Vaan.ai): A managed, zero-config cloud platform.
  - Automated Pipeline: We intercept the build, vectorize chunks on our Edge GPUs, and host sharded `.msp` files on our global CDN.
  - Managed Reasoning: We securely manage the Edge LLM endpoints required for the generative RAG summaries.
  - Revenue: Scalable, usage-based subscription for Edge AI compute and managed infrastructure.
By disaggregating the database, we drastically alter the performance and cost metrics for the end-user.
| Category | Competitors | Search Model | Architecture | Pricing (Avg) |
|---|---|---|---|---|
| Static Search | Pagefind, Stork | Lexical | Local-First | $0 (open source) |
| Vector DBs | Pinecone, Upstash | Vector-only | Centralized DB | $500/mo (Prod) |
| AI Chat SaaS | Mendable, Kapa.ai | RAG Chat | Centralized API | $200+/mo |
| Search Engines | Algolia AI, Orama | Neural/Hybrid | Centralized SaaS | $100 - $1,500/mo |
| Seek.js | N/A | Hybrid | Disaggregated | $0 (open source) / $19/mo (SaaS) |
- Against Pagefind: Pagefind is "AI-blind": it only matches keywords. Seek.js brings semantic intent to the browser.
- Against Mendable/Kapa.ai: These are "Black Boxes" that charge for data storage and message credits. Seek.js keeps the context in the browser, so you pay $0 for storage and only pennies for Edge reasoning.
- Against Pinecone: No 24/7 database instance required. Your DB is a static file on a CDN.
- The MCP Advantage: Seek.js natively supports the Model Context Protocol (MCP), allowing your documentation to be instantly "read" by AI agents like Claude or ChatGPT.
The Goal: Safely extract semantic text from HTML files and bind each chunk to its source URL for citation.
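```js
import { extractHtml } from '@seekjs/parser';

// Stream chunks from the build output instead of loading every page into memory.
const chunkStream = extractHtml({
  inputDir: './dist',                      // build output to scan
  urlBase: 'https://mysite.com',           // base used for citation URLs
  selectors: ['article', 'main .content'], // index only the main content regions
  ignorePaths: ['/404'],                   // pages that should never be cited
  chunkSize: 50                            // batch size (the flow below reads 50 files per batch)
});

// Top-level for-await requires an ES module (.mjs or "type": "module").
for await (const batch of chunkStream) {
  // batch: [{ text: "...", url: "/docs/auth", hash: "#setup" }]
  console.log(`Extracted ${batch.length} chunks...`);
}
```

```mermaid
sequenceDiagram
    participant CI as CI/CD Runner (Node/Bun/Deno)
    participant FS as File System
    participant WASM as @seekjs/parser (WASM)
    CI->>FS: Scan /dist directory
    FS-->>CI: List of .html files
    loop Every 50 files
        CI->>FS: Open ReadStreams
        FS->>WASM: Pipe raw HTML bytes
        WASM->>WASM: Parse tags & bind URL
        WASM-->>CI: Yield JSON batch
    end
```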
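Whatever the WASM side ends up doing, the extraction step needs two pieces of Node glue: mapping each built file to the URL it is served from (so every chunk can cite it) and opening files in bounded batches so a 10,000-file site stays below the `EMFILE` limit that Experiment 4 targets. A minimal sketch, assuming the common `dir/index.html` output layout; `fileToUrl`, `inBatches`, and `readHtmlInBatches` are hypothetical helpers, not part of `@seekjs/parser`.

```typescript
import { readFile } from 'node:fs/promises';
import { relative, sep } from 'node:path';

// Map a built HTML file to the URL path it is served from, e.g.
// dist/docs/auth/index.html -> /docs/auth and dist/404.html -> /404.
function fileToUrl(distDir: string, filePath: string): string {
  const parts = relative(distDir, filePath).split(sep);
  const last = parts[parts.length - 1];
  if (last === 'index.html') parts.pop();
  else parts[parts.length - 1] = last.replace(/\.html$/, '');
  return '/' + parts.join('/');
}

// Split a list into fixed-size batches (the "Every 50 files" loop).
function* inBatches<T>(items: T[], size: number): Generator<T[]> {
  if (size < 1) throw new RangeError('batch size must be at least 1');
  for (let i = 0; i < items.length; i += size) yield items.slice(i, i + size);
}

// Read one batch at a time, so at most `size` files are open concurrently.
async function* readHtmlInBatches(distDir: string, files: string[], size = 50) {
  for (const batch of inBatches(files, size)) {
    yield await Promise.all(
      batch.map(async (file) => ({ url: fileToUrl(distDir, file), html: await readFile(file, 'utf8') })),
    );
  }
}
```

The batch size trades throughput for file-handle headroom; Experiment 4 is where that number would be tuned per runtime.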
The Goal: Vectorize chunks and compile them into a binary MessagePack (.msp) database.
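```js
import { writeFile } from 'node:fs/promises';
import { extractHtml } from '@seekjs/parser';
import { compileIndex } from '@seekjs/compiler';
import { cloudflareEmbedder } from '@seekjs/embedders/cloudflare';

// Gather the parser's batches into one list of { text, url, hash } chunks.
const chunks = [];
for await (const batch of extractHtml({ inputDir: './dist', urlBase: 'https://mysite.com' })) {
  chunks.push(...batch);
}

const mspBuffer = await compileIndex({
  chunks,
  embedder: cloudflareEmbedder({
    apiKey: process.env.CF_API_TOKEN, // Workers AI's REST API is account-scoped, so an account ID is likely needed too
    model: '@cf/baai/bge-small-en-v1.5' // 384-dimensional English embeddings
  }),
  schema: {
    text: 'string',
    url: 'string',
    hash: 'string'
  }
});

// Write the index into the build output so the CDN serves it at /search_index.msp.
await writeFile('./dist/search_index.msp', mspBuffer);
```

```mermaid
flowchart TD
    A[Raw Chunk Batch] --> B{Schema Validation}
    B -->|Valid| C[Orama Instance]
    C --> D[Lexical Engine]
    D -->|BM25| F[In-Memory DB]
    C --> E[Embedder Function]
    E -->|Float32Array Vectors| F
    F --> G[MessagePack Serializer]
    G --> H[search_index.msp]
```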
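The "Embedder Function" box has to turn chunk texts into one `Float32Array` per chunk. A minimal sketch of what a Workers AI embedder could do, assuming the REST endpoint `accounts/{account_id}/ai/run/{model}` and the text-embedding request/response shape (`{ text: [...] }` in, `result.data` out), both of which should be checked against Cloudflare's docs. `createWorkersAiEmbedder`, its options, and the 100-texts-per-request default are hypothetical and are not the `@seekjs/embedders/cloudflare` API.

```typescript
type Embedder = (texts: string[]) => Promise<Float32Array[]>;

interface WorkersAiEmbedderOptions {
  accountId: string; // Workers AI REST URLs are scoped to a Cloudflare account
  apiToken: string;
  model: string; // e.g. '@cf/baai/bge-small-en-v1.5', which returns 384-dimensional vectors
  batchSize?: number; // assumed per-request limit; verify against the model's documentation
  fetchImpl?: typeof fetch; // injectable for tests
}

// Hypothetical embedder: POSTs chunk texts to Workers AI and returns one Float32Array per text.
function createWorkersAiEmbedder(opts: WorkersAiEmbedderOptions): Embedder {
  const doFetch = opts.fetchImpl ?? fetch;
  const batchSize = opts.batchSize ?? 100;
  const endpoint = `https://api.cloudflare.com/client/v4/accounts/${opts.accountId}/ai/run/${opts.model}`;

  return async (texts) => {
    const vectors: Float32Array[] = [];
    for (let i = 0; i < texts.length; i += batchSize) {
      const res = await doFetch(endpoint, {
        method: 'POST',
        headers: { Authorization: `Bearer ${opts.apiToken}`, 'Content-Type': 'application/json' },
        body: JSON.stringify({ text: texts.slice(i, i + batchSize) }),
      });
      if (!res.ok) throw new Error(`Workers AI request failed: HTTP ${res.status}`);
      const body = (await res.json()) as { result: { data: number[][] } };
      // One row per input text, in input order.
      for (const row of body.result.data) vectors.push(Float32Array.from(row));
    }
    return vectors;
  };
}
```

Batching keeps request counts low during CI builds; the injectable `fetchImpl` lets the embedder be tested without network access.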
The Goal: Deliver the index to the browser and execute <15ms hybrid queries locally.
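```jsx
import { useAiSearch } from '@seekjs/react';

function SearchWidget() {
  const { search, results, status } = useAiSearch({
    indexUrl: '/search_index.msp',
    storageStrategy: 'indexedDB'
  });
  return (
    <div>
      {/* Preload on intent; onFocus also covers keyboard and touch users, who never hover. */}
      <input
        type="search"
        onMouseEnter={() => search.preload()}
        onFocus={() => search.preload()}
        onChange={(e) => search.execute(e.target.value)}
      />
      {/* Assumed: a 'loading' status value and results shaped like parser chunks. */}
      {status === 'loading' && <p>Loading index...</p>}
      <ul>
        {results.map((r) => (
          <li key={r.url + r.hash}><a href={r.url + r.hash}>{r.text}</a></li>
        ))}
      </ul>
    </div>
  );
}
```

```mermaid
sequenceDiagram
    participant UI as Search Widget
    participant IDB as IndexedDB (Local)
    participant CDN as Edge CDN (R2)
    UI->>UI: User hovers input (Intent)
    UI->>IDB: Check cached .msp & ETag
    UI->>CDN: HEAD /search_index.msp
    CDN-->>UI: Returns remote ETag
    alt Local ETag == Remote ETag
        UI->>IDB: Load from cache
    else Local ETag != Remote ETag
        UI->>CDN: GET /search_index.msp (Brotli)
        CDN-->>UI: 600KB Payload
        UI->>IDB: Overwrite Cache & ETag
    end
    UI->>UI: Hydrate Orama WASM
    UI-->>UI: Ready for 15ms Searches!
```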
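The cache check in the diagram can be sketched as a small loader. This version swaps the HEAD-then-GET pair for a single conditional GET (`If-None-Match`), which reaches the same cache decision in one round trip; that is a design alternative, not what the diagram specifies. It also stores a schema version next to the ETag, so a layout change invalidates old `IndexedDB` entries (the concern in Experiment 3 below). `IndexStore`, `loadIndexBytes`, and `INDEX_SCHEMA_VERSION` are hypothetical; in the browser, `IndexStore` would wrap an IndexedDB object store.

```typescript
interface CachedIndex {
  etag: string;
  schemaVersion: number;
  bytes: Uint8Array;
}

// Minimal async key-value store. In the browser this would wrap an IndexedDB object store.
interface IndexStore {
  get(key: string): Promise<CachedIndex | undefined>;
  set(key: string, value: CachedIndex): Promise<void>;
}

// Hypothetical: bump whenever the .msp layout changes so stale caches are ignored and overwritten.
const INDEX_SCHEMA_VERSION = 1;

async function loadIndexBytes(url: string, store: IndexStore, doFetch: typeof fetch = fetch): Promise<Uint8Array> {
  const cached = await store.get(url);
  const usable = cached && cached.schemaVersion === INDEX_SCHEMA_VERSION ? cached : undefined;

  let res: Response;
  try {
    // Conditional GET: a matching ETag yields 304 Not Modified with no body.
    res = await doFetch(url, usable ? { headers: { 'If-None-Match': usable.etag } } : {});
  } catch (err) {
    if (usable) return usable.bytes; // offline: serve the cached copy
    throw err;
  }
  if (res.status === 304 && usable) return usable.bytes;
  if (!res.ok) {
    if (usable) return usable.bytes; // CDN error: serve the cached copy
    throw new Error(`Index download failed: HTTP ${res.status}`);
  }

  // Brotli Content-Encoding is decoded by the browser before arrayBuffer() resolves.
  const bytes = new Uint8Array(await res.arrayBuffer());
  // A cross-origin CDN must send Access-Control-Expose-Headers: ETag for this header to be readable.
  const etag = res.headers.get('ETag');
  if (etag) await store.set(url, { etag, schemaVersion: INDEX_SCHEMA_VERSION, bytes });
  return bytes;
}
```

On a repeat visit with an unchanged index, the CDN returns an empty 304 and the widget hydrates straight from the cached bytes.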
The Goal: Stream synthesized answers with clickable citations from the Edge.
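```js
import { streamAiResponse } from '@seekjs/ai-edge';

const MAX_CHUNKS = 5; // the browser sends its top 5 local hits

export async function POST(req) {
  const { query, chunks } = await req.json();
  // Reject malformed requests before paying for inference.
  if (typeof query !== 'string' || !query.trim() || !Array.isArray(chunks)) {
    return new Response('Expected { query: string, chunks: [] }', { status: 400 });
  }
  const stream = await streamAiResponse({
    query,
    context: chunks.slice(0, MAX_CHUNKS), // cap prompt size even if a client sends more
    provider: 'cloudflare',
    model: '@cf/meta/llama-3-8b-instruct'
  });
  return new Response(stream, {
    headers: { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' }
  });
}
```

```mermaid
sequenceDiagram
    participant Browser as User Browser
    participant API as Edge API (Cloudflare)
    participant LLM as Workers AI
    Browser->>Browser: Local Orama Search
    Browser->>API: POST Query + Top 5 Chunks
    API->>API: Assemble Context + Prompt
    API->>LLM: Dispatch Inference Request
    LLM-->>API: Stream tokens (SSE)
    API-->>Browser: Relay SSE Stream
    Browser->>Browser: Render Markdown Citations
```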
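The "Assemble Context + Prompt" and "Render Markdown Citations" steps do not have to trust model-written URLs. One option, and the kind of manual citation mapping Experiment 2 falls back to, is to number the chunks, ask the model to cite only `[n]` markers, and resolve those markers to URLs in code. A minimal sketch; `buildPrompt` and `resolveCitations` are hypothetical helpers and the prompt wording is illustrative.

```typescript
interface Chunk {
  text: string;
  url: string;
  hash: string;
}

// Number each retrieved chunk so the model can cite it as [1], [2], ...
function buildPrompt(query: string, chunks: Chunk[]): string {
  const sources = chunks.map((c, i) => `[${i + 1}] (${c.url}${c.hash})\n${c.text}`).join('\n\n');
  return [
    'Answer the question using only the sources below.',
    'Cite sources inline as [n]. If the sources do not contain the answer, say so.',
    '',
    sources,
    '',
    `Question: ${query}`,
  ].join('\n');
}

// Turn [n] markers into Markdown links. Markers that point outside the provided chunks are
// dropped and counted, which gives a per-answer citation drift count for Experiment 2.
function resolveCitations(answer: string, chunks: Chunk[]): { markdown: string; invalid: number } {
  let invalid = 0;
  const markdown = answer.replace(/\[(\d+)\]/g, (_match: string, n: string) => {
    const chunk = chunks[Number(n) - 1];
    if (!chunk) {
      invalid++;
      return '';
    }
    return `[[${n}]](${chunk.url}${chunk.hash})`;
  });
  return { markdown, invalid };
}
```

Since every link is built from the chunks the browser sent, a rendered citation can only point at a page that was actually retrieved.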
- The "Index Bloat" Problem: An index for 5,000 pages can reach 15MB+. We use int8 quantization (one byte per vector dimension instead of four for float32) and Brotli compression, targeting under 1.5MB.
- Abuse Prevention: Use Cloudflare Turnstile and aggressive Edge semantic caching to prevent LLM endpoint spam.
- Post-Build Stability: We act as a Vite/Rollup plugin to hook into the build lifecycle before obfuscation.
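The int8 step can be sketched with symmetric per-vector quantization, one common scheme (the roadmap does not specify which variant it will use): each vector keeps a single float scale plus one signed byte per dimension. For bge-small-en-v1.5's 384 dimensions, that is 384 bytes plus a 4-byte scale instead of 1,536 bytes of float32. `quantize`, `dequantize`, and `cosineInt8` are hypothetical names.

```typescript
// One float scale per vector plus one signed byte per dimension.
interface QuantizedVector {
  scale: number;
  values: Int8Array;
}

// Symmetric quantization: map [-maxAbs, maxAbs] onto [-127, 127].
function quantize(v: Float32Array): QuantizedVector {
  let maxAbs = 0;
  for (let i = 0; i < v.length; i++) maxAbs = Math.max(maxAbs, Math.abs(v[i]));
  const scale = maxAbs / 127 || 1; // all-zero vector: any non-zero scale works
  const values = new Int8Array(v.length);
  for (let i = 0; i < v.length; i++) values[i] = Math.round(v[i] / scale);
  return { scale, values };
}

// Approximate reconstruction; each dimension is off by at most scale / 2.
function dequantize(q: QuantizedVector): Float32Array {
  const out = new Float32Array(q.values.length);
  for (let i = 0; i < out.length; i++) out[i] = q.values[i] * q.scale;
  return out;
}

// Cosine similarity straight from the int8 values: each vector's single scale cancels out.
function cosineInt8(a: QuantizedVector, b: QuantizedVector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.values.length; i++) {
    dot += a.values[i] * b.values[i];
    normA += a.values[i] * a.values[i];
    normB += b.values[i] * b.values[i];
  }
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}
```

Because the scale cancels in cosine similarity, the browser can rank on the int8 values directly and never needs to rebuild float32 vectors.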
- Experiment 1: Vector Sharding: Test sharding a 50MB `.msp` file into 1MB fragments for incremental hydration on mobile devices.
- Experiment 2: LLM Citation Drift: Measure how often Llama 3 8B hallucinates citation links. If the rate exceeds 5%, fall back to manual JSON mapping of citations.
- Experiment 3: Cache Versioning: Ensure schema updates gracefully wipe old `IndexedDB` versions.
- Experiment 4: Runtime Limits: Stress-test the WASM parser against 10,000 files in Bun/Node to find the `EMFILE` breaking point.
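For Experiment 1, note that cutting the serialized `.msp` bytes at fixed 1MB offsets would not produce independently loadable fragments, because a MessagePack stream cut in the middle of a value cannot be decoded on its own. One option is to shard at the document level before serialization. A minimal sketch, where `shardDocs` is hypothetical and `jsonSize` stands in for the MessagePack-encoded size a real build would measure:

```typescript
interface Shard<T> {
  docs: T[];
  bytes: number;
}

// Greedily pack documents into shards whose estimated size stays under maxBytes,
// so each shard can be fetched, decoded, and hydrated on its own.
// A single document larger than maxBytes gets a shard to itself.
function shardDocs<T>(docs: T[], maxBytes: number, sizeOf: (doc: T) => number): Shard<T>[] {
  const shards: Shard<T>[] = [];
  let current: Shard<T> = { docs: [], bytes: 0 };
  for (const doc of docs) {
    const size = sizeOf(doc);
    if (current.docs.length > 0 && current.bytes + size > maxBytes) {
      shards.push(current);
      current = { docs: [], bytes: 0 };
    }
    current.docs.push(doc);
    current.bytes += size;
  }
  if (current.docs.length > 0) shards.push(current);
  return shards;
}

// Stand-in size estimate: UTF-8 length of the JSON encoding.
const encoder = new TextEncoder();
const jsonSize = (doc: unknown): number => encoder.encode(JSON.stringify(doc)).length;
```

With real chunks, `shardDocs(chunks, 1_000_000, jsonSize)` would yield roughly 1MB groups that a mobile client can hydrate one at a time.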
| Component | Cost (Seek.js / Vaan) | Cost (Pinecone + OpenAI) |
|---|---|---|
| Storage (R2) | $0.00 (within 10GB free tier) | $160.00 |
| Search (Local) | $0.00 | $150.00 |
| AI (10k Summaries) | $2.50 (Workers AI) | $5.00 |
| Total Monthly | ~$2.54 | ~$315.00+ |
Monetization Strategy: Offer a $19/mo Pro Tier. At a COGS of ~$2.54, we keep a gross margin of roughly 87% while saving the customer roughly $300/mo in "Vector Tax" compared with the ~$315/mo Pinecone + OpenAI stack above.
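The margin follows from the table's COGS estimate against the $19/mo price:

$$
\text{gross margin} = \frac{\$19.00 - \$2.54}{\$19.00} = \frac{\$16.46}{\$19.00} \approx 0.866 \approx 86.6\%
$$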
Because our architecture is disaggregated, we have zero server idle costs. We scale exactly with the user's traffic via Cloudflare's serverless edge.
Our Motto: "Search that pays for itself."
Hi @AchuAshwath, I really appreciate the time you took to compile the blueprint 👏
You have covered all possible scenarios to validate at this stage, but there are a few concerns I would like to highlight that might play a vital role in taking this project to the next phase.
Challenges at build time
1. Sending build files to SaaS
It is possible to send the files emitted after the build, but how do we make sure those files contain the data we need to generate the database (`.msp`) file for the search tool? Not all documentation sites generate Markdown, MDX, or HTML content.
2. i18n
As we are going to deal with documentation sites, 99% of them will come with multilingualism (i.e., the use of more than one language). It would be best to validate whether we will be able to maintain multi-language database (`.msp`) files for the search tool.
Additionally, we need to be aware that most documentation sites are static and will not require advanced i18n features such as plurals, genders, etc. But some may, which might lead to missing information from the documentation content. Look at the example below (used in Docusaurus):
These are `.js` or `.ts` files which dynamically inject the content into the DOM, so that content may be absent from the static HTML the parser reads.
Optimizing performance
Finally, let's use the Web Workers API to optimize the hydration phase of the database (`.msp`) file. A web worker does not run on the main thread, so the main thread will not be blocked and can keep handling other tasks seamlessly.
The client-side application and the web worker can communicate using the `postMessage` method and the `onmessage` event handler.
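A minimal sketch of that split, assuming the worker owns download, decoding, and hydration while the main thread only sends queries. `createIndexWorker`, `createSearchClient`, and the message shapes are hypothetical, and the substring filter stands in for the real index query. The worker-side handler takes an injected `post` function so it can run without a browser; the comments show how it would be wired to `self.onmessage` in web-worker.ts.

```typescript
interface Doc {
  text: string;
  url: string;
  hash: string;
}

// Messages exchanged over postMessage / onmessage.
type ToWorker = { type: 'load'; url: string } | { type: 'search'; id: number; term: string };
type FromWorker =
  | { type: 'ready'; count: number }
  | { type: 'results'; id: number; hits: Doc[] }
  | { type: 'error'; id?: number; message: string };

// Worker side. In web-worker.ts this would be wired up as:
//   const handle = createIndexWorker((msg) => self.postMessage(msg), fetchAndDecodeIndex);
//   self.onmessage = (event) => handle(event.data);
function createIndexWorker(post: (msg: FromWorker) => void, loadDocs: (url: string) => Promise<Doc[]>) {
  let docs: Doc[] = [];
  return async (msg: ToWorker): Promise<void> => {
    try {
      if (msg.type === 'load') {
        docs = await loadDocs(msg.url); // download + decode + hydrate run off the main thread
        post({ type: 'ready', count: docs.length });
      } else {
        const term = msg.term.toLowerCase();
        // Placeholder for the real hybrid query against the hydrated index.
        const hits = docs.filter((d) => d.text.toLowerCase().includes(term));
        post({ type: 'results', id: msg.id, hits });
      }
    } catch (err) {
      post({ type: 'error', id: msg.type === 'search' ? msg.id : undefined, message: String(err) });
    }
  };
}

// Main-thread side: anything with postMessage and an onmessage slot, such as a Worker.
interface WorkerLike {
  postMessage(msg: ToWorker): void;
  onmessage: ((event: { data: FromWorker }) => void) | null;
}

function createSearchClient(worker: WorkerLike) {
  let nextId = 0;
  let onReady: ((count: number) => void) | undefined;
  let onLoadError: ((err: Error) => void) | undefined;
  const pending = new Map<number, { resolve: (hits: Doc[]) => void; reject: (err: Error) => void }>();

  worker.onmessage = ({ data }) => {
    if (data.type === 'ready') {
      onReady?.(data.count);
      return;
    }
    if (data.type === 'error' && data.id === undefined) {
      onLoadError?.(new Error(data.message));
      return;
    }
    const id = data.id as number;
    const request = pending.get(id);
    pending.delete(id);
    if (data.type === 'results') request?.resolve(data.hits);
    else request?.reject(new Error(data.message));
  };

  return {
    // Resolves with the document count once the worker reports the index is hydrated.
    load(url: string): Promise<number> {
      return new Promise((resolve, reject) => {
        onReady = resolve;
        onLoadError = reject;
        worker.postMessage({ type: 'load', url });
      });
    },
    // Each query gets an id so concurrent searches resolve to the right caller.
    search(term: string): Promise<Doc[]> {
      return new Promise((resolve, reject) => {
        const id = nextId++;
        pending.set(id, { resolve, reject });
        worker.postMessage({ type: 'search', id, term });
      });
    },
  };
}
```

In the browser, `createSearchClient(new Worker(new URL('./web-worker.ts', import.meta.url), { type: 'module' }))` would replace the fake wiring, and the input handler would `await client.load(...)` once before the first search.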
I will look into how we can mitigate the first two challenges mentioned above; I need some time.