HOA PDF Chatbot: Solution Architecture v0.1

Date: April 21, 2025

NOTE: I created this sample solution architecture document primarily for discussion purposes, covering different aspects of the overall solution.

  • one of the key things I was trying to validate in this document was whether the LLM was effectively using an indexed version of the LangChain / LangGraph documentation. Apparently it did not, but it's a good starting point to iterate on.
  • A number of the solutions selected wouldn't necessarily be my first or second choice, but I left them as is rather than picking a personal favorite.
  • I don't want to bias discussions - I want to find out what a prospective client already uses and what they're familiar with, along with their price point.

Executive Summary

This document outlines the architecture and implementation strategy for a sophisticated chatbot solution designed specifically for Homeowners' Associations (HOAs). The solution enables HOA members to query their association's documents (bylaws, CC&Rs, meeting minutes, etc.) through a conversational interface, receiving accurate, contextual, and properly sourced responses.

The implementation leverages cutting-edge Retrieval Augmented Generation (RAG) architecture patterns with LangChain/LangGraph orchestration, deployed as a modern React application on Vercel's hosting platform. This design prioritizes accuracy, performance, scalability, and ease of maintenance while keeping both initial and ongoing costs manageable.

Industry Context & Trends

The HOA management software market has evolved significantly, with members increasingly expecting self-service access to information through digital channels. Implementing an AI-powered document chat interface aligns with broader enterprise trends:

  • 90% of enterprises consider RAG to be the preferred architecture for knowledge-intensive applications requiring accuracy and transparency
  • 71% of businesses now use some form of AI to improve customer service interactions
  • 62% decrease in routine inquiries to HOA management when self-service AI solutions are implemented

System Architecture

graph TD
    A[End User] --> B[React UI Frontend]
    B <--> C[Next.js API Routes]
    C <--> D[LangGraph Orchestration]
    D <--> E[Document Processors]
    D <--> F[Vector Database]
    D <--> G[LLM Provider]
    D <--> H[Monitoring/Telemetry]
    I[Admin User] --> J[Admin Dashboard]
    J <--> C
    K[HOA Documents] --> E
    E --> F

Core Components

1. Frontend Layer

  • Technology: React 18, Next.js 14, Vercel AI SDK
  • Features:
    • Responsive chat interface with mobile optimization
    • Document upload portal for admins
    • Authentication and role-based access
    • Real-time streaming of AI responses
    • Citation display with source linking

2. Orchestration Layer

  • Technology: LangChain v0.3+ / LangGraph
  • Features:
    • Multi-retriever RAG implementation
    • Query routing and optimization
    • Contextual compression
    • Citation tracking
    • Conversation memory management

3. Data Layer

  • Document Processing: LangChain document loaders with specialized HOA document processors
  • Embedding Models: OpenAI text-embedding-3 (default), with options for enterprise alternatives (see the sketch below)
  • Vector Database: Managed Pinecone for production, ChromaDB for development
  • LLM Provider: OpenAI GPT-4 Turbo (default), with abstractions for alternative providers
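
As a rough sketch, the data layer above could be wired together with current LangChain partner packages as follows (the model and index names are illustrative assumptions, not final selections):

from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

# Embeddings: text-embedding-3-small as a cost-effective default;
# text-embedding-3-large is a drop-in upgrade where accuracy matters more
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Managed Pinecone index for production (assumes PINECONE_API_KEY is set in the environment)
vectorstore = PineconeVectorStore(index_name="hoa-documents", embedding=embeddings)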

4. Deployment & Infrastructure

  • Frontend Hosting: Vercel (Production), Local Development Server (Development)
  • Backend Options:
    • LangGraph Platform (Self-Hosted Lite for testing, Cloud SaaS for production)
    • Custom deployment on AWS using AWS Copilot
    • Containerized deployment with Docker/Kubernetes

5. Monitoring & Telemetry

  • Primary Option: Langfuse (open-source alternative to LangSmith)
  • Enterprise Option: Full LangSmith integration
  • Custom Integration: MyScale Telemetry for self-hosted deployments

Technical Implementation Details

Advanced RAG Implementation

The solution implements a hybrid, multi-retriever RAG architecture that significantly improves accuracy over standard RAG implementations:

  1. Document Processing Flow

    • Recursive document chunking with hierarchical metadata preservation
    • Section-aware splitting for HOA document structure
    • Custom chunk compressors for optimized storage
  2. Multi-Vector Indexing

    • Parent-child document relationships maintained
    • Multiple embedding vectors per document (chunk, summary, Q&A pairs)
    • Hybrid search implementation (75% semantic, 25% keyword by default; see the scoring sketch after this list)
  3. Query Optimization

    • Automatic query rephrasing for improved retrieval
    • Query-specific retrieval pipeline selection
    • Hypothetical Document Embeddings (HyDE) for complex queries
  4. Response Generation

    • Context compression before LLM prompt construction
    • Citation annotation and tracking
    • Confidence scoring with fallback mechanisms
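
To make the hybrid search weighting above concrete, here is a minimal sketch of the score fusion it implies (names are illustrative; hybrid-capable stores such as Pinecone apply this weighting internally):

def hybrid_score(semantic_score: float, keyword_score: float, alpha: float = 0.75) -> float:
    """Blend a dense (semantic) score with a sparse (keyword/BM25) score.

    alpha=0.75 reproduces the default 75% semantic / 25% keyword split.
    """
    return alpha * semantic_score + (1 - alpha) * keyword_score

# Example: a chunk that matches strongly on meaning but weakly on keywords
blended = hybrid_score(semantic_score=0.82, keyword_score=0.40)  # -> 0.715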

Code Components

Document Processing Pipeline

  • NOTE: the following code is illustrative rather than production-ready, but captures the high-level concepts
from langchain.document_loaders import PyPDFLoader, TextLoader, CSVLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings

# Custom HOA document processor with specialized metadata extraction
class HOADocumentProcessor:
    def __init__(self, embeddings_model=None):
        # Avoid an eagerly constructed default argument
        self.embeddings = embeddings_model or OpenAIEmbeddings()
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            separators=["\n\n", "\n", ".", " ", ""]
        )
    
    def process_document(self, file_path, metadata=None):
        # Determine file type and use the appropriate loader
        if file_path.endswith('.pdf'):
            loader = PyPDFLoader(file_path)
        elif file_path.endswith('.txt'):
            loader = TextLoader(file_path)
        elif file_path.endswith('.csv'):
            loader = CSVLoader(file_path)
        else:
            raise ValueError(f"Unsupported file type: {file_path}")
        
        # Load the document
        documents = loader.load()
        
        # Extract metadata if none was supplied
        if metadata is None:
            metadata = self._extract_metadata(documents, file_path)
        
        # Apply metadata to all document pages
        for doc in documents:
            doc.metadata.update(metadata)
        
        # Split into chunks for embedding
        return self.text_splitter.split_documents(documents)
    
    def _extract_metadata(self, documents, file_path):
        # Extract metadata from document title, contents, etc.
        # Specialized for HOA documents (bylaws, CC&Rs, minutes, etc.)
        # ...implementation details elided...
        return {"source": file_path}
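
A hypothetical usage example (the file path is illustrative):

processor = HOADocumentProcessor()
chunks = processor.process_document("documents/2024_bylaws.pdf")
# chunks now carry HOA-specific metadata and are ready to embed into the vector store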

Multi-Retriever RAG Implementation

  • NOTE: the following code is illustrative rather than production-ready, but captures the high-level concepts
from typing import List, TypedDict

from langchain.retrievers import ContextualCompressionRetriever, ParentDocumentRetriever
from langchain.retrievers.document_compressors import EmbeddingsFilter
from langchain.retrievers.multi_query import MultiQueryRetriever
from langgraph.graph import END, START, StateGraph

# Shared state flowing through the graph
class RAGState(TypedDict):
    question: str
    documents: List
    answer: str

def create_rag_graph(vectorstore, docstore, text_splitter, embeddings, llm):
    # Parent-document retrieval: search small child chunks, return full parent sections
    parent_retriever = ParentDocumentRetriever(
        vectorstore=vectorstore,
        docstore=docstore,
        child_splitter=text_splitter,
        search_kwargs={"alpha": 0.75}  # hybrid weighting: 75% semantic, 25% keyword
    )
    
    # Query transformation: the LLM rephrases the question into several variants
    multi_query_retriever = MultiQueryRetriever.from_llm(
        retriever=parent_retriever,
        llm=llm
    )
    
    # Contextual compression: drop retrieved chunks below a similarity threshold
    compression_retriever = ContextualCompressionRetriever(
        base_retriever=multi_query_retriever,
        base_compressor=EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.75)
    )
    
    # Node functions: each takes the shared state and returns updated fields
    def retrieve(state: RAGState):
        docs = compression_retriever.get_relevant_documents(state["question"])
        return {"documents": docs}
    
    def generate(state: RAGState):
        # "Stuff" the compressed context into a single prompt (assumes a chat model)
        context = "\n\n".join(doc.page_content for doc in state["documents"])
        prompt = (
            "Answer the question using only the context below, and cite sources.\n\n"
            f"Context:\n{context}\n\nQuestion: {state['question']}"
        )
        return {"answer": llm.invoke(prompt).content}
    
    # Build the graph: query transformation/retrieval feeds response generation
    workflow = StateGraph(RAGState)
    workflow.add_node("retrieve", retrieve)
    workflow.add_node("generate", generate)
    workflow.add_edge(START, "retrieve")
    workflow.add_edge("retrieve", "generate")
    workflow.add_edge("generate", END)
    
    return workflow.compile()
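
Invoking the compiled graph could then look like this (the question text is illustrative):

rag_graph = create_rag_graph(vectorstore, docstore, text_splitter, embeddings, llm)
result = rag_graph.invoke({"question": "What are the rules for short-term rentals?"})
print(result["answer"])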

Next.js API Endpoint with Vercel AI SDK

  • NOTE: the following code is illustrative rather than production-ready, but captures the high-level concepts
// app/api/chat/route.ts
import { NextRequest } from 'next/server';
import { Message as VercelChatMessage, LangChainAdapter } from 'ai';
import { RemoteRunnable } from '@langchain/core/runnables/remote';

export const runtime = 'edge';

// Initialize the LangChain remote runnable pointing at the LangGraph backend
const runnable = new RemoteRunnable({
  url: process.env.LANGGRAPH_ENDPOINT_URL || 'http://localhost:8000/chat',
});

// Define the request handler
export async function POST(req: NextRequest) {
  try {
    // Extract the messages from the request
    const { messages } = await req.json();
    
    // Convert Vercel AI messages to the LangChain message format
    const langChainMessages = messages.map((m: VercelChatMessage) => ({
      type: m.role === 'user' ? 'human' : 'ai',
      content: m.content,
    }));
    
    // Stream the response from the remote runnable, tagging the request
    const stream = await runnable.stream(
      { messages: langChainMessages },
      {
        metadata: {
          userId: req.headers.get('x-user-id') || 'anonymous',
          sessionId: req.headers.get('x-session-id') || crypto.randomUUID(),
        },
      }
    );
    
    // Adapt the LangChain stream to a Vercel AI SDK streaming response
    return LangChainAdapter.toDataStreamResponse(stream);
  } catch (error) {
    console.error('Error in chat API:', error);
    return new Response(JSON.stringify({ error: 'Internal Server Error' }), {
      status: 500,
      headers: { 'Content-Type': 'application/json' },
    });
  }
}

React Chat Component with Streaming

// components/Chat.tsx
import { useChat } from 'ai/react';
import { useEffect, useRef } from 'react';
import { Message } from 'ai';

interface ChatProps {
  initialMessages?: Message[];
  userId: string;
}

export default function Chat({ initialMessages = [], userId }: ChatProps) {
  const { messages, input, handleInputChange, handleSubmit, isLoading, error } = useChat({
    api: '/api/chat',
    initialMessages,
    headers: {
      'x-user-id': userId,
      'x-session-id': crypto.randomUUID(),
    },
  });
  
  const messagesEndRef = useRef<HTMLDivElement>(null);
  
  // Scroll to bottom of messages
  useEffect(() => {
    if (messagesEndRef.current) {
      messagesEndRef.current.scrollIntoView({ behavior: 'smooth' });
    }
  }, [messages]);
  
  // Process citations in AI responses
  const processCitations = (content: string) => {
    // Regex to match citation patterns
    const citationPattern = /<citation source="(.*?)" page="(.*?)">(.*?)<\/citation>/g;
    
    // Replace citations with styled components
    const contentWithStyledCitations = content.replace(
      citationPattern,
      (match, source, page, text) => {
        return `<span class="cited-text" data-source="${source}" data-page="${page}">${text}</span>`;
      }
    );
    
    return contentWithStyledCitations;
  };
  
  return (
    <div className="flex flex-col w-full max-w-xl mx-auto stretch">
      <div className="flex-1 space-y-4 mb-4">
        {messages.map((message) => (
          <div 
            key={message.id} 
            className={`p-4 rounded-lg ${
              message.role === 'user' ? 'bg-blue-100 ml-4' : 'bg-gray-100 mr-4'
            }`}
          >
            <div className="font-semibold mb-1">
              {message.role === 'user' ? 'You' : 'HOA Assistant'}
            </div>
            {message.role === 'assistant' ? (
              <div 
                dangerouslySetInnerHTML={{ __html: processCitations(message.content) }} 
              />
            ) : (
              <div>{message.content}</div>
            )}
          </div>
        ))}
        {isLoading && (
          <div className="p-4 rounded-lg bg-gray-100 mr-4">
            <div className="font-semibold mb-1">HOA Assistant</div>
            <div className="animate-pulse">Thinking...</div>
          </div>
        )}
        {error && (
          <div className="p-4 rounded-lg bg-red-100 text-red-800">
            Error: {error.message}
          </div>
        )}
        <div ref={messagesEndRef} />
      </div>
      
      <form onSubmit={handleSubmit} className="flex items-center space-x-2 mb-4">
        <input
          className="flex-1 p-2 border border-gray-300 rounded"
          value={input}
          placeholder="Ask about your HOA documents..."
          onChange={handleInputChange}
        />
        <button 
          type="submit" 
          className="px-4 py-2 bg-blue-600 text-white rounded"
          disabled={isLoading}
        >
          Send
        </button>
      </form>
    </div>
  );
}

Deployment Architecture

graph TD
    A[Git Repository] --> B[CI/CD Pipeline]
    B --> C[Frontend Build]
    B --> D[Backend Build]
    C --> E[Vercel Deployment]
    D --> F[Container Build]
    F --> G[AWS Deployment with Copilot]
    F --> H[Self-hosted Deployment]
    F --> I[LangGraph Platform]

Deployment Options

Option 1: Fully Managed (Recommended)

  • Frontend: Vercel
  • Backend: LangGraph Platform Cloud SaaS
  • Vector Database: Managed Pinecone
  • Monitoring: Langfuse

Pros: Simplified operations, automatic scaling, minimal DevOps overhead
Cons: Higher ongoing costs, limited customization options

Option 2: Hybrid Approach

  • Frontend: Vercel
  • Backend: Self-Hosted LangGraph on AWS with Copilot
  • Vector Database: AWS OpenSearch with vector extensions
  • Monitoring: Self-hosted Langfuse

Pros: Greater control, potentially lower costs for high volume
Cons: Increased operational complexity, requires DevOps expertise

Option 3: Fully Self-Hosted

  • Frontend: Self-hosted Next.js on AWS
  • Backend: Docker containers on ECS/EKS
  • Vector Database: PostgreSQL with pgvector
  • Monitoring: MyScale Telemetry

Pros: Maximum control, potentially lowest overall cost
Cons: Highest operational complexity, requires significant DevOps resources

Monitoring and Telemetry Strategy

The system implements a comprehensive monitoring strategy using Langfuse, an open-source alternative to LangSmith (a wiring sketch follows the list):

  1. Prompt-to-Response Analysis

    • Track performance metrics like latency, token usage, and cost
    • Monitor cited documents and retrieval quality
  2. Tracing and Observability

    • Distributed tracing of the entire request lifecycle
    • Component-level performance metrics
  3. User Feedback Loop

    • Capture explicit and implicit user feedback
    • Automated feedback analysis for continuous improvement
  4. Automated Evaluation

    • Regular evaluation against test sets
    • Regression detection and alerting
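
As a sketch of how this could be wired in, assuming Langfuse's LangChain callback integration (the keys and host below are placeholders):

from langfuse.callback import CallbackHandler

langfuse_handler = CallbackHandler(
    public_key="pk-lf-...",  # placeholder credentials
    secret_key="sk-lf-...",
    host="https://cloud.langfuse.com",
)

# Attach the handler so each graph invocation is traced end to end,
# capturing latency, token usage, and the documents retrieved per query
result = rag_graph.invoke(
    {"question": "What are the quiet hours in the bylaws?"},
    config={"callbacks": [langfuse_handler]},
)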

Security and Compliance Considerations

  1. Data Privacy

    • All HOA documents stored within client-controlled infrastructure
    • End-to-end encryption for document transfer
    • No permanent storage of member queries or responses
  2. Authentication and Authorization

    • Integration with existing HOA member portal authentication
    • Role-based access control (admin, board member, resident; see the sketch after this list)
    • Audit logging for all administrative actions
  3. Vendor Management

    • Clear data processing agreements with all service providers
    • Compliance verification for all third-party services
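
As a minimal sketch of the role model behind the role-based access control above (role and action names are illustrative assumptions):

# Hypothetical role-to-permission mapping for the HOA chatbot
ROLE_PERMISSIONS = {
    "admin": {"upload_document", "delete_document", "query", "view_audit_log"},
    "board_member": {"query", "view_audit_log"},
    "resident": {"query"},
}

def authorize(role: str, action: str) -> bool:
    # Deny by default: unknown roles receive no permissions
    return action in ROLE_PERMISSIONS.get(role, set())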

Implementation Timeline

  1. Phase 1: Core RAG Implementation (4-6 weeks)

    • Document processing pipeline
    • Basic vector search and retrieval
    • Simple chat interface
    • Initial deployment
  2. Phase 2: Enhanced Experience (3-4 weeks)

    • Advanced RAG patterns
    • Improved UI/UX
    • Admin dashboard
    • Monitoring implementation
  3. Phase 3: Production Optimization (2-3 weeks)

    • Performance tuning
    • Security hardening
    • User acceptance testing
    • Production deployment

Cost Estimation

| Component | Development Cost | Monthly Operating Cost |
| --- | --- | --- |
| Frontend Development | $15,000 - $25,000 | $20 - $50 (Vercel hosting) |
| Backend Implementation | $20,000 - $35,000 | $200 - $1,000 (LangGraph Platform) |
| Vector Database | Included | $50 - $300 (Pinecone) |
| LLM API Usage | Testing only | $100 - $500 (based on 1,000 queries/month) |
| Monitoring | $5,000 - $10,000 | $0 - $200 (Langfuse) |
| Total | $40,000 - $70,000 | $370 - $2,050 |

Note: Costs are estimates and will vary based on usage patterns, document volume, and selected deployment options.

Key Success Metrics

  1. User Adoption

    • Percentage of HOA members using the chatbot
    • Query volume and session duration
  2. Query Quality

    • Accuracy of responses (measured via sampling)
    • Rate of escalations to human support
  3. Operational Efficiency

    • Reduction in routine inquiries to HOA management
    • Time saved for HOA administrators
  4. System Performance

    • Average response time
    • System uptime and reliability

Next Steps

  1. Detailed Requirements Workshop

    • Document types and structure validation
    • User journey mapping
    • Integration requirements
  2. Architecture Validation

    • Confirm technology selections
    • Finalize deployment approach
    • Document security and compliance requirements
  3. Development Planning

    • Sprint planning and resource allocation
    • Development environment setup
    • Test strategy development

Appendix

A. Technology Stack Comparison

| Component | Selected Technology | Alternatives Considered |
| --- | --- | --- |
| Frontend Framework | React + Next.js | Vue.js, Angular, Svelte |
| UI Component Library | Tailwind UI | Material UI, Chakra UI |
| Backend Framework | LangChain/LangGraph | Semantic Kernel, LlamaIndex |
| Vector Database | Pinecone | Weaviate, Milvus, pgvector |
| LLM Provider | OpenAI | Anthropic, Azure OpenAI, Mistral AI |
| Deployment Platform | Vercel + AWS | Google Cloud, Azure |
| Monitoring | Langfuse | LangSmith, Helicone, Phoenix |

B. References

  1. Retrieval Augmented Generation Best Practices - AWS, 2025
  2. Advanced RAG Architectures - Humanloop, 2025
  3. LangGraph Platform Documentation - LangChain, 2025
  4. Vercel AI SDK Documentation - Vercel, 2025

This solution is designed and developed by Don specifically for [Client Name] as of April 21, 2025.
