Open-Source Hybrid Keyword and Semantic Search Engines
OpenSearch (Elasticsearch Open Source Fork)

• Ease of Setup & Scalability: OpenSearch is a drop-in replacement for Elasticsearch, offering easy cluster deployment via Docker or managed services. It’s built on Lucene, so it scales horizontally and handles large indexes reliably. Unlike Elasticsearch’s free Basic tier, OpenSearch places no license restrictions on advanced features, making it fully open-source and cost-free to scale out.
• Semantic Search Flexibility: OpenSearch supports dense vector fields (knn_vector) for embeddings and can execute kNN (nearest-neighbor) searches natively. It enables hybrid search: you can combine traditional BM25 keyword matches with semantic vector similarity in one query (see the query sketch after this list). For example, a query can have both a full-text component and an embedding-based component, with their scores fused. It also provides a Neural Search plugin to automate embeddings – you can configure a transformer model in an ingest pipeline to generate vectors for documents at index time (and even at query time), eliminating the need to preprocess text externally. This means you’re free to choose your embedding model (e.g. SBERT or a custom model) and have OpenSearch apply it during ingestion.
• Performance: Built on Elasticsearch’s core, OpenSearch delivers high-performance full-text search and analytics. It uses approximate kNN algorithms (HNSW under the hood) for fast vector retrieval on large corpora. It’s designed for production workloads – you can index millions of HTML/Markdown documents and get low-latency queries by scaling out shards. Hybrid queries naturally add some overhead (since both BM25 and vector similarity are computed), but OpenSearch allows filtering to narrow the vector search scope and supports efficient ANN indices for speed. In practice it can handle enterprise-scale data if properly tuned, just as Elasticsearch does for keyword search.
• Integration with LLMs: OpenSearch exposes a RESTful JSON API and has wide language-client support, so it’s straightforward for an LLM-based system to query it. For example, one can use an OpenSearch DSL query as a tool for a GPT-based agent, or use LangChain’s OpenSearch integration to let an LLM perform searches. There’s no built-in OpenAI/Anthropic API integration (OpenSearch focuses on the search backend), but it pairs well with LLMs in a retrieval-augmented generation pipeline. You can use OpenAI to generate document and query embeddings and store them in OpenSearch, or use OpenSearch’s own ML inference capability to host a model. In summary, OpenSearch is a flexible, scalable choice if you need a mix of robust keyword search and custom semantic search in an open-source package.
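To make the hybrid-query idea concrete, here is a minimal sketch that hits OpenSearch’s REST API directly from Python. The index name (docs), field names (body, body_embedding), the model_id, and the search-pipeline name are placeholders for this example, not fixed OpenSearch names; score fusion for hybrid queries is normally configured through a search pipeline with a normalization processor.

```python
import requests

OPENSEARCH = "https://localhost:9200"
AUTH = ("admin", "admin")  # placeholder credentials for a local dev cluster

# Hybrid query: one BM25 match clause plus one neural (embedding) clause.
# "model_id" refers to a model registered with the ML / Neural Search plugin.
query = {
    "query": {
        "hybrid": {
            "queries": [
                {"match": {"body": {"query": "rotate api keys safely"}}},
                {
                    "neural": {
                        "body_embedding": {
                            "query_text": "rotate api keys safely",
                            "model_id": "<your-model-id>",  # placeholder
                            "k": 50,
                        }
                    }
                },
            ]
        }
    },
    "size": 5,
}

resp = requests.post(
    f"{OPENSEARCH}/docs/_search",
    params={"search_pipeline": "hybrid-norm-pipeline"},  # pipeline with a normalization processor
    json=query,
    auth=AUTH,
    verify=False,  # self-signed certs on a local cluster
)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```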
Weaviate

• Ease of Setup & Data Ingestion: Weaviate is an open-source vector search database that’s very quick to get started with – a single Docker container can spin it up, and it also offers a managed cloud option. It natively understands JSON data (with schema flexibility) and can easily ingest text from HTML/Markdown (after you strip markup) via its client or REST API. Weaviate automatically handles indexing and creates vector embeddings if you enable a module, so minimal manual preprocessing is needed. It’s designed for automation – for example, you can simply point Weaviate at your text and it will vectorize it on import using a chosen model.
• Flexibility (Embeddings & Features): Weaviate is built for semantic search from the ground up. It supports a variety of vectorization modules out of the box, allowing you to choose or plug in different embedding models. You can use OpenAI’s embeddings, Cohere, Hugging Face transformers, or others simply by configuring the corresponding module (e.g. text2vec-openai for OpenAI’s API). This means you could swap in Nomic’s Embed v2 MoE model if it’s available via Hugging Face or as an API – Weaviate will handle calling that model to embed your documents. In addition to semantic vector search, Weaviate also offers keyword filtering and a hybrid search mode that combines keyword and vector signals (see the sketch after this list). It allows similarity searches on embeddings while also supporting BM25-style term search on text fields for exact matches. Advanced features like cross-referencing between data objects (a graph-like data model) are available, and you can apply custom ranking rules or filters if needed.
• Performance & Scalability: Weaviate uses an in-memory HNSW index for vectors, giving very fast semantic queries. It can scale horizontally by sharding data across multiple instances (distributing both the vectors and the text). For moderate corpus sizes (millions of documents), it performs well on both lexical and vector searches. Its full-text search capabilities are not as elaborate as Lucene’s (e.g. fewer custom analyzers), but they are sufficient for most use cases and complement vector search well. Weaviate’s focus on embeddings may require more RAM/CPU, especially with large embedding dimensions or when running on CPUs without model acceleration. However, it supports approximate vector search and optional vector compression to keep performance high as data grows.
• Integration with OpenAI/Anthropic: Weaviate was designed with LLM integration in mind. It provides both REST and GraphQL APIs, plus client libraries (Python, JavaScript, etc.), making it easy for an LLM agent to query it. Notably, Weaviate has built-in modules for OpenAI: with an API key, it will directly use OpenAI’s models to vectorize data or even do question-answering with qna-openai. This seamless integration means an LLM (like GPT-4 or Claude) could invoke a Weaviate search and Weaviate would call OpenAI’s embedding API behind the scenes – all through simple queries from the LLM. Anthropic’s API isn’t available as a prebuilt module, but an Anthropic model could similarly use Weaviate by treating it as a vector store (Claude could call a tool that queries Weaviate). Weaviate also works with LangChain and other LLM frameworks for retrieval-augmented generation. Overall, Weaviate offers one of the most developer-friendly experiences for hybrid search with LLMs, thanks to its automatic embedding pipelines and high-level query interfaces.
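As an illustration of the hybrid mode and the text2vec-openai module, here is a minimal sketch using the v4 weaviate-client Python library. The collection name (Article), its properties, and the alpha weighting are assumptions made for this example rather than part of any particular setup.

```python
import os
import weaviate
from weaviate.classes.config import Configure, DataType, Property

# Connect to a local Docker instance; the OpenAI key lets the
# text2vec-openai module embed documents and queries on Weaviate's side.
client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]}
)

# One-time collection setup with automatic OpenAI vectorization.
client.collections.create(
    "Article",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ],
)

articles = client.collections.get("Article")
articles.data.insert({"title": "Rotating API keys", "body": "Stripped Markdown text..."})

# Hybrid search: alpha=0.5 weights the BM25 and vector scores equally.
result = articles.query.hybrid(query="how do I rotate an api key?", alpha=0.5, limit=5)
for obj in result.objects:
    print(obj.properties["title"])

client.close()
```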
Vespa

• Ease of Setup & Deployment: Vespa (open-sourced by Yahoo) is a powerful but complex engine. Setting it up requires defining an application package (schema and configuration) and deploying it to a Vespa instance (via Docker or on Vespa Cloud). The initial learning curve is higher – you’ll need to specify how documents are structured (fields for text, embeddings, etc.) and configure indexing and ranking in a config file or via the pyvespa SDK. Once up and running, Vespa can serve as a scalable cluster (it’s meant for large-scale deployments) and will index HTML/Markdown content (usually you’d preprocess to plain text or use Vespa’s text field with an analyzer for HTML). In short, Vespa setup isn’t as one-click as the others, but it’s designed to handle enterprise search scenarios once configured.
• Flexibility (Embedding Models & Ranking): Vespa shines in flexibility. It’s a fully featured search engine and vector database in one. You can store both inverted indices for text and vectors for embeddings, and query them in combination. Vespa supports hybrid ranking natively – for example, you can fetch the top N documents by semantic similarity and re-rank them using BM25 (or vice versa), or compute a fused ranking score within the same query (see the pyvespa sketch after this list). It allows custom ranking profiles where you can mathematically combine a text relevance score and a vector similarity score as you see fit. Importantly, Vespa lets you bring your own embedding models: you can preprocess text with any model and feed the vectors, or use Vespa’s integration to host models (it can load ONNX models for inference as part of the query pipeline). For instance, you could integrate the Nomic v2 MoE model by converting it to ONNX and deploying it in Vespa, enabling Vespa to generate or use those embeddings internally. Vespa also supports structured data and filters alongside text, so you can mix metadata filtering with semantic search easily. This level of control is unparalleled – you can fine-tune how results are retrieved and scored at a very granular level.
• Performance & Scalability: Vespa is built for massive scale and low latency. It has been used in production for web search and e-commerce with billions of documents. Under the hood, Vespa uses efficient C++ components; it implements approximate nearest-neighbor search (with algorithms like HNSW) and can utilize multiple cores and nodes effectively. Its serving topology can be scaled out with content nodes, and it handles query distribution and result merging for you. Vespa also supports real-time updates and document feeding, making it suitable for dynamic content. Because of its advanced capabilities, expect to invest some effort in tuning (e.g., configuring memory for ANN indices, tweaking ranking expressions). When properly tuned, Vespa can return results combining semantic and lexical relevance extremely fast, even at very large data sizes. It’s perhaps the closest to “Google-like” search on this list in terms of handling complex ranking at scale.
• Integration with OpenAI/Anthropic: Vespa offers a REST API for search queries and document updates, so any LLM can query it via HTTP calls. Additionally, Vespa has first-class support for integrating with LLM services – it can call external APIs like OpenAI’s or Anthropic’s as part of its processing pipeline. For example, you could configure Vespa to send the top search results to OpenAI’s GPT-4 to generate a summary answer, all within the Vespa query response flow. Vespa’s documentation highlights that you can use OpenAI’s ChatGPT or Anthropic’s Claude in queries, or even deploy local LLMs inside Vespa’s container runtime. This means you could build a system where the LLM (OpenAI/Anthropic) and Vespa are tightly coupled: Vespa handles retrieval and passes context to the LLM for generation. However, doing so requires writing some custom integration logic in Vespa’s configuration (Java components or specialized schema config), so it’s powerful but more manual. For simpler setups, one would use Vespa as a pure retrieval tool: e.g., a LangChain agent retrieves documents via Vespa (there’s community integration for Vespa in LangChain), then an OpenAI/Anthropic model produces the final answer. In summary, Vespa is the most flexible and scalable option, integrating deeply with LLM workflows, but it demands more effort to configure and maintain.
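To ground the hybrid-ranking discussion, here is a minimal pyvespa query sketch. It assumes an already deployed application with a doc schema containing a text field body, a tensor field embedding, a server-side embedder component, and a rank profile named hybrid that combines bm25(body) with closeness(field, embedding); all of those names are placeholders for this example.

```python
from vespa.application import Vespa

# Point at a locally running Vespa container (or a Vespa Cloud endpoint).
app = Vespa(url="http://localhost", port=8080)

question = "how do I rotate an api key without downtime?"

response = app.query(
    body={
        # Lexical match OR approximate nearest-neighbor match on the embedding field.
        "yql": (
            "select title, body from doc where userQuery() "
            "or ({targetHits:100}nearestNeighbor(embedding, q))"
        ),
        "query": question,            # consumed by userQuery()
        "text": question,             # consumed by the embed() expression below
        "input.query(q)": "embed(@text)",  # server-side embedder builds the query tensor
        "ranking": "hybrid",          # rank profile defined in the schema
        "hits": 5,
    }
)

for hit in response.hits:
    print(hit["relevance"], hit["fields"].get("title"))
```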
Marqo

• Ease of Setup & Use: Marqo is a newer open-source “end-to-end” vector search engine that prioritizes ease of use. It packages the entire pipeline – from embedding generation to indexing and searching – into one system. Getting started is as simple as running a Docker container or using the Python SDK (pip install marqo), and Marqo will handle indexing of your HTML/Markdown content (after you provide the text) and even generate embeddings automatically (see the indexing sketch after this list). There’s no need to run a separate model-inference service or manually preprocess documents. Marqo was built to be cloud-native and scalable; it supports horizontal sharding of indexes for scale-out, so it can handle tens of millions to hundreds of millions of documents by splitting them across nodes. This automation and scalability make it very convenient – you get a ready-to-go semantic search with minimal setup.
• Flexibility (Embedding Models & Features): One of Marqo’s key features is the ability to use a variety of embedding models easily. It comes with pre-configured defaults (for example, it recommends models like e5-base-v2 for text), but you can also bring your own model. Under the hood, Marqo leverages PyTorch and Hugging Face models, and even OpenAI models if configured, to generate embeddings. You can start with a built-in model or specify a model from the Hugging Face Hub – Marqo will download and use it, automatically converting it to ONNX for faster inference where possible. This means you have the flexibility to experiment with different embeddings (including multilingual or domain-specific models). Marqo also supports multi-modal data: you can index images alongside text and it will use vision-language models (like CLIP) to embed them. When it comes to search techniques, Marqo supports pure vector semantic search as well as hybrid search combining BM25 with vectors. In a hybrid query, Marqo can retrieve results via lexical search and re-rank them with vector similarity, or fuse the scores of lexical and semantic searches. This addresses cases where exact keyword matching is critical (e.g. numbers or codes) while still using embeddings for broader semantic matching. The system doesn’t expose as much low-level tweakability as Vespa or OpenSearch, but it covers the common needs out of the box (including typos and variants handled by semantic similarity).
• Performance: Marqo builds on proven components like the HNSW algorithm for vector search (storing embeddings in memory for fast ANN lookups) and uses ONNX-optimized inference to speed up model computations. It is designed to handle real-time search with both text and image data. Because Marqo performs model inference internally, query latency includes embedding the user query through a neural model – for large models this can be the main cost. However, with medium-sized models (like e5-base or CLIP) and ONNX optimization, Marqo can achieve good throughput. It supports asynchronous index operations and non-blocking searches, which helps maintain responsiveness under load. In terms of scale, Marqo’s documentation claims support for hundred-million-document corpora by sharding the index. This is still relatively unproven compared to long-established engines, but early users have reported it to be robust at moderate scales. If your use case involves a moderate number of documents and you want quick setup, Marqo’s integrated approach can deliver strong relevance (thanks to modern embeddings) without a complex tuning phase.
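Below is a minimal sketch of that workflow with Marqo’s Python client. The index name, document fields, and the hf/e5-base-v2 model identifier are example values chosen for this sketch; Marqo’s defaults or another supported model would work just as well.

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")  # default local Docker endpoint

# Create an index; the model identifier is an example -- other Hugging Face
# models from Marqo's registry can be substituted.
mq.create_index("docs", model="hf/e5-base-v2")

# Marqo embeds the listed tensor_fields itself at indexing time.
mq.index("docs").add_documents(
    [
        {"_id": "doc1", "title": "Rotating API keys", "body": "Stripped Markdown text..."},
        {"_id": "doc2", "title": "Zero-downtime deploys", "body": "Stripped Markdown text..."},
    ],
    tensor_fields=["title", "body"],
)

# Semantic search; the query is embedded with the same model internally.
results = mq.index("docs").search("how do I rotate an api key?", limit=5)
for hit in results["hits"]:
    print(hit["_id"], hit["_score"])
```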
• Integration with OpenAI/Anthropic: Marqo exposes a simple REST API and has a Python client, making it easy to plug into an LLM workflow. In a typical setup, you’d use Marqo as the “vector store” for retrieved context and then pass that context to an LLM like GPT-4 or Claude for answer generation (a short sketch of this pattern follows below). Marqo doesn’t host the generative LLM itself (it focuses on embedding and retrieval), but it plays nicely with them. For example, you can use OpenAI’s GPT-4 to formulate a query; Marqo will handle the search and return relevant passages, which the LLM can then consume to produce a final answer. Because Marqo can use OpenAI’s embedding models (if you choose to, by calling the OpenAI API to embed data), it’s possible to ensure the same embeddings are used at both indexing and query time. Anthropic’s models can be integrated similarly by using Marqo’s results as input to Claude. Marqo also integrates with LangChain, meaning you can easily use it as a retriever in a chain for LLM QA tasks. In summary, Marqo is highly convenient for LLM integration – it handles the heavy lifting of vector search and lets the LLM focus on reasoning over the results. Its design philosophy is to require little configuration, so you can get semantic search up and running (with your choice of embeddings) and hook it into an OpenAI/Anthropic-powered app with minimal code.
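As a hedged illustration of that retrieval-augmented pattern (not an official Marqo or OpenAI recipe), the sketch below reuses the docs index from the previous example and passes the retrieved passages to a chat model; the model name and prompt wording are arbitrary choices, and Anthropic’s SDK could be swapped in the same way.

```python
import marqo
from openai import OpenAI

mq = marqo.Client(url="http://localhost:8882")
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "How do I rotate an API key without downtime?"

# Retrieve the top passages from Marqo (the "docs" index and "body" field
# come from the indexing sketch above).
hits = mq.index("docs").search(question, limit=3)["hits"]
context = "\n\n".join(hit["body"] for hit in hits)

# Let the generative model answer over the retrieved context.
answer = llm.chat.completions.create(
    model="gpt-4o",  # example model name
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)
```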
Summary and Recommendations
All four solutions above are capable of indexing HTML/Markdown content and providing relevant results through a mix of keyword and embedding-based techniques. The “best” choice depends on your priorities:

• If you want minimal setup and automatic embedding generation: Weaviate and Marqo are the top choices. Weaviate offers a more mature ecosystem (a GraphQL interface, modules for various models, and an active community) and integrates seamlessly with OpenAI for both embeddings and Q&A out of the box. Marqo provides an all-in-one experience with even less setup – ideal if you just want to drop in a search engine that works with modern embeddings without managing pipelines. Both support hybrid search and will let you experiment with different embedding models easily.
• If you need a balance of powerful text search and decent vector support in a familiar package: OpenSearch is a great option. It gives you enterprise-grade keyword search (stemming, filtering, etc.) plus the ability to do ANN semantic search. You have to do a bit more work (e.g., set up an ingest pipeline or call an embedding API yourself), but it’s very scalable and well supported by tooling. It’s essentially the open-source way to get “Elasticsearch + vectors” without license fees, and it is proven in production for large-scale search use cases.
• If you require maximum flexibility, custom ranking, or extreme scaling: Vespa is unmatched in what it can do, blending lexical and semantic search in arbitrary ways within the same system. It’s the closest analog to what a Google/Bing-scale stack might look like internally. Vespa is ideal when you have specialized requirements (complex relevance formulas, integration of custom ML models, billions of documents, etc.) and the engineering resources to leverage it. It will integrate with LLMs at a deep level (even calling LLM APIs during search if needed). The trade-off is higher complexity in setup and maintenance.
In conclusion, all these open-source technologies will enable an LLM to perform high-quality retrieval. For most use cases, a hybrid search approach using either Weaviate or OpenSearch will provide a strong foundation – they offer a good mix of automation, flexibility in model choice, and scalability. Marqo is an excellent choice when ease-of-use is paramount and you want everything in one system, whereas Vespa caters to advanced scenarios where you need the search engine to be as smart and tunable as your LLM. Each of these can be integrated with OpenAI/Anthropic APIs, so you can’t go too far wrong – it’s about picking the tool that best matches your project’s scale and your team’s expertise.