@cmantas
Last active July 21, 2025 21:04

Video Search Diversity Problem & Solutions

Problem Definition

The Challenge

When searching for video content using AI-generated embeddings (like "man with white t-shirt"), current frame-by-frame indexing creates a result clustering problem:

  • Current Setup: Each video frame is stored as a separate searchable item
  • Search Behavior: Query returns individual frames ranked by similarity score
  • Undesired Outcome: Top results often come from the same video, even if other videos contain equally relevant content

Real-World Example

Searching for "man with white t-shirt" might return:

1. Video_A_Frame_15 (similarity: 0.95) ← Same video
2. Video_A_Frame_16 (similarity: 0.94) ← Same video  
3. Video_A_Frame_17 (similarity: 0.93) ← Same video
4. Video_B_Frame_08 (similarity: 0.85) ← Different video (buried)
5. Video_C_Frame_22 (similarity: 0.84) ← Different video (buried)

Business Impact: Users miss relevant content from other videos because results are dominated by sequential frames from one video.


Qdrant Solution: Multiple Vectors Per Point

Concept

Qdrant can store an entire video clip as a single searchable point that holds one vector per frame. As of Qdrant 1.10 this is the multivector feature: a point carries a list of same-dimension vectors, and search scores the point against all of them. (Named vectors also allow several vectors per point, but each query targets a single named vector, so they do not produce the grouping behavior described below.)

Data Structure Transformation

Before (Frame-level indexing):

Point 1: {id: "VIDEO_A_FRAME_001", vector: [...], payload: {video: "A"}}
Point 2: {id: "VIDEO_A_FRAME_002", vector: [...], payload: {video: "A"}}
Point 3: {id: "VIDEO_A_FRAME_003", vector: [...], payload: {video: "A"}}
Point 4: {id: "VIDEO_B_FRAME_001", vector: [...], payload: {video: "B"}}

After (Video-level indexing):

Point 1: {
  id: "VIDEO_A",
  vector: [        // multivector: one embedding per frame
    [...],         // frame_001
    [...],         // frame_002
    [...]          // frame_003
  ],
  payload: {video_metadata: "..."}
}

Point 2: {
  id: "VIDEO_B",
  vector: [
    [...],         // frame_001
    [...]          // frame_002
  ],
  payload: {video_metadata: "..."}
}

Search Behavior

  • Query: Single vector representing "man with white t-shirt"
  • Qdrant Processing: Compares the query against every frame vector inside each video point
  • Result: Returns best-matching videos (not individual frames)
  • Ranking: Each video is scored by its highest-scoring frame (Qdrant's MAX_SIM comparator)
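The ranking rule above (a video scores as its best-matching frame) can be sketched in a few lines of plain Python; the cosine-similarity helper and toy 2-D vectors here are illustrative assumptions, not Qdrant internals:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def video_score(query, frame_vectors):
    """A video's score is its best-matching frame (max similarity)."""
    return max(cosine(query, f) for f in frame_vectors)

# Toy data: each video is a list of frame embeddings
videos = {
    "VIDEO_A": [[1.0, 0.0], [0.9, 0.1]],
    "VIDEO_B": [[0.0, 1.0], [0.7, 0.7]],
}
query = [1.0, 0.0]
ranked = sorted(videos, key=lambda v: video_score(query, videos[v]), reverse=True)
# Each video appears at most once in the ranking, regardless of how many
# of its frames match the query well.
```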

Implementation Benefits

  1. Natural Diversity: Each video can appear at most once in the results
  2. Semantic Grouping: Videos are treated as coherent units
  3. Efficient Storage: Reduced metadata duplication
  4. Scalability: Fewer total points to manage (videos vs. frames)

Code Example

(Requires qdrant-client ≥ 1.10 for multivector support. Note that Qdrant point IDs must be unsigned integers or UUIDs, so the human-readable clip name belongs in the payload.)

from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Collection with multivector support: each point holds a list of frame
# embeddings, and MAX_SIM scores a point by its best-matching frame
client.create_collection(
    collection_name="video_clips",
    vectors_config=models.VectorParams(
        size=512,  # frame embedding dimensionality
        distance=models.Distance.COSINE,
        multivector_config=models.MultiVectorConfig(
            comparator=models.MultiVectorComparator.MAX_SIM
        ),
    ),
)

# Create a video-level point with multiple frame vectors
video_point = models.PointStruct(
    id=1,  # point IDs must be unsigned ints or UUIDs
    vector=[embedding_vector_1, embedding_vector_2, embedding_vector_3],
    payload={
        "clip_id": "EVENT0145_CLIP4",
        "event_name": "EVENT0145",
        "clip_number": 4,
        "duration_seconds": 10.5,
        "frame_count": 3,
    },
)
client.upsert(collection_name="video_clips", points=[video_point])

# Search returns video-level results
results = client.query_points(
    collection_name="video_clips",
    query=query_vector,
    limit=10,  # 10 different videos
)

Elasticsearch Alternative Solutions

Solution 1: Aggregation-Based Deduplication

Approach: Use Elasticsearch aggregations to group results by video and return only the best frame per video.

search_body = {
    "size": 0,  # Individual frames are not returned, only aggregations
    "query": {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                # cosineSimilarity requires a dense_vector field; +1.0 keeps scores non-negative
                "source": "cosineSimilarity(params.query_vector, 'embedding_vector') + 1.0",
                "params": {"query_vector": query_vector}
            }
        }
    },
    "aggs": {
        "unique_videos": {
            "terms": {
                "field": "video_id",
                "size": 10,  # Top 10 videos
                # Order buckets by best frame score; the default (doc count)
                # would rank videos by how many frames they have
                "order": {"max_score": "desc"}
            },
            "aggs": {
                "max_score": {"max": {"script": {"source": "_score"}}},
                "best_frame": {
                    "top_hits": {
                        "size": 1,  # Best frame per video
                        "sort": [{"_score": "desc"}]
                    }
                }
            }
        }
    }
}
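The aggregation response nests the winning frame of each video inside bucket objects, so a little unpacking is needed to get back an ordinary hit list. A sketch, assuming the `unique_videos` / `best_frame` aggregation names from the query above and the standard Elasticsearch response shape:

```python
def best_frames_from_aggs(response):
    """Flatten the terms/top_hits aggregation into one best frame per video."""
    buckets = response["aggregations"]["unique_videos"]["buckets"]
    hits = [b["best_frame"]["hits"]["hits"][0] for b in buckets]
    # Re-sort across videos, in case buckets were ordered by another criterion
    return sorted(hits, key=lambda h: h["_score"], reverse=True)

# Minimal fabricated response in the shape ES returns for the query above
example = {"aggregations": {"unique_videos": {"buckets": [
    {"key": "VIDEO_B", "best_frame": {"hits": {"hits": [
        {"_score": 0.85, "_source": {"video_id": "VIDEO_B"}}]}}},
    {"key": "VIDEO_A", "best_frame": {"hits": {"hits": [
        {"_score": 0.95, "_source": {"video_id": "VIDEO_A"}}]}}},
]}}}
deduped = best_frames_from_aggs(example)
```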

Pros:

  • Works with existing frame-level data
  • Leverages ES's powerful aggregation engine
  • Highly flexible grouping criteria

Cons:

  • More complex query structure
  • Potentially higher computational overhead
  • Requires careful aggregation configuration

Solution 2: Application-Level Post-Processing

Approach: Retrieve more results than needed, then deduplicate in application code.

def deduplicate_video_results(search_results, max_per_video=1):
    """Keep only top N frames per video"""
    video_counts = {}
    filtered_results = []
    
    for result in search_results:
        video_id = result['_source']['video_id']
        
        if video_counts.get(video_id, 0) < max_per_video:
            filtered_results.append(result)
            video_counts[video_id] = video_counts.get(video_id, 0) + 1
            
    return filtered_results
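A runnable demo of the over-fetch-then-deduplicate pattern, using fabricated hits that mirror the "man with white t-shirt" example at the top of this note (the helper is repeated here so the snippet runs standalone):

```python
def deduplicate_video_results(search_results, max_per_video=1):
    """Keep only the top N frames per video (same helper as above)."""
    video_counts = {}
    filtered_results = []
    for result in search_results:
        video_id = result["_source"]["video_id"]
        if video_counts.get(video_id, 0) < max_per_video:
            filtered_results.append(result)
            video_counts[video_id] = video_counts.get(video_id, 0) + 1
    return filtered_results

# Fabricated hits: three sequential frames from VIDEO_A dominate the top
raw_hits = [
    {"_score": 0.95, "_source": {"video_id": "VIDEO_A", "frame": 15}},
    {"_score": 0.94, "_source": {"video_id": "VIDEO_A", "frame": 16}},
    {"_score": 0.93, "_source": {"video_id": "VIDEO_A", "frame": 17}},
    {"_score": 0.85, "_source": {"video_id": "VIDEO_B", "frame": 8}},
    {"_score": 0.84, "_source": {"video_id": "VIDEO_C", "frame": 22}},
]
diverse = deduplicate_video_results(raw_hits)
# One hit per video survives: VIDEO_A frame 15, VIDEO_B frame 8, VIDEO_C frame 22
```

In practice you would request more hits than you intend to show (e.g. `size=50` to end up with 10 diverse videos), since deduplication discards the surplus frames.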

Pros:

  • Simple to implement and understand
  • Works with any vector database
  • Easy to adjust deduplication logic

Cons:

  • Network overhead (retrieving extra results)
  • Application-level complexity
  • Less efficient than database-level solutions

Solution 3: Nested Documents (Advanced)

Approach: Use ES nested documents to store videos with embedded frame data.

{
    "video_id": "EVENT0145_CLIP4",
    "video_metadata": {...},
    "frames": [
        {"frame_id": 1, "embedding": [...]},
        {"frame_id": 2, "embedding": [...]},
        {"frame_id": 3, "embedding": [...]}
    ]
}
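A sketch of the index mapping such a document implies, expressed as the Python dict you would send when creating the index. Field names follow the example above; `dims=512` is an assumption, and support for vector similarity search inside nested fields varies by Elasticsearch version:

```python
# Hypothetical mapping for the nested-document layout (dims=512 is assumed)
nested_mapping = {
    "mappings": {
        "properties": {
            "video_id": {"type": "keyword"},
            "frames": {
                "type": "nested",  # each frame indexed as its own hidden sub-document
                "properties": {
                    "frame_id": {"type": "integer"},
                    "embedding": {
                        "type": "dense_vector",
                        "dims": 512,
                        "index": True,
                        "similarity": "cosine",
                    },
                },
            },
        }
    }
}
```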

Pros:

  • Semantic grouping similar to Qdrant approach
  • Leverages ES nested query capabilities

Cons:

  • Complex nested queries required
  • Vector similarity search on nested fields is challenging
  • May require custom scoring functions

Efficiency Comparison

Qdrant Multiple Vectors Approach

  • Query Complexity: Simple (single API call)
  • Network Overhead: Minimal (only relevant videos returned)
  • Computational Efficiency: High (optimized for multi-vector points)
  • Storage Efficiency: Good (reduced metadata duplication)
  • Development Complexity: Low (built-in feature)

Elasticsearch Alternatives

Approach         Query Complexity  Network Overhead  Computational Cost  Development Effort
Aggregations     High              Low               Medium-High         Medium
Post-Processing  Low               High              Low                 Low
Nested Docs      Very High         Low               High                High

Recommendation

For Production Use:

  • Qdrant: Multiple vectors per point (if using Qdrant)
  • Elasticsearch: Aggregation-based approach for best balance of efficiency and functionality

For Prototyping:

  • Both platforms: Post-processing approach for simplicity and cross-platform compatibility

The Qdrant solution is more elegant and efficient for this specific use case, while Elasticsearch requires more complex workarounds but offers greater flexibility in other areas.
