@Slyracoon23
Created July 8, 2025 21:57

Happenstance Affinity Ranking - 1 Page Proposal

Problem & Solution

  • Need: Rank user connections by "affinity strength" for search results and question prioritization
  • Current: Affinity ranking only works for users with connected email headers
  • Goal: Works for ALL users, efficient queries, production-ready ASAP

Solution: XGBoost classifier trained on binary labels (0/1) that outputs probability scores (0.0-1.0) for ranking
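
For illustration, a minimal sketch of this setup using the xgboost Python package; the feature matrix, labels, and hyperparameters here are placeholders, not the production configuration:

```python
# Minimal sketch: train an XGBoost binary classifier and use its predicted
# probabilities (0.0-1.0) as affinity scores for ranking.
# Features and labels below are synthetic placeholders for the sketch.
import numpy as np
from xgboost import XGBClassifier

# X: one row per user-person pair; y: heuristic binary labels (0/1)
X = np.random.rand(1000, 6)                      # e.g. email counts, recency, overlap flags
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)        # placeholder labels

model = XGBClassifier(
    n_estimators=200,
    max_depth=4,
    learning_rate=0.1,
    eval_metric="logloss",
)
model.fit(X, y)

# predict_proba returns [P(label=0), P(label=1)]; the second column is the
# affinity score used for ranking.
affinity_scores = model.predict_proba(X)[:, 1]
ranked = np.argsort(-affinity_scores)            # pair indices, highest affinity first
```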

Features (What Makes People Connected?)

Affinity scores are only calculated for user-person pairs where at least one relevant feature or connection exists. This includes:

  • Email Signals: Total emails sent/received, bidirectional exchange score
  • LinkedIn Signals: Connected status, connection recency, mutual connections
  • Professional Overlap: Same company (current/past), shared education, network overlap
  • Engagement Patterns: Interaction consistency, response rates, communication trends

Pairs with no features or connections are ignored to ensure scalability and efficiency.
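
As a concrete illustration, one way the per-pair feature vector and the skip rule could look in Python; the field names are assumptions for this proposal, not a finalized schema:

```python
# Illustrative per-pair feature vector; field names are assumptions.
from dataclasses import dataclass, astuple

@dataclass
class PairFeatures:
    # Email signals
    emails_sent: int
    emails_received: int
    bidirectional_score: float   # e.g. min(sent, received) / max(sent, received, 1)
    # LinkedIn signals
    linkedin_connected: int      # 0/1
    connection_age_days: float
    mutual_connections: int
    # Professional overlap
    same_company: int            # 0/1 (current or past)
    shared_education: int        # 0/1
    # Engagement patterns
    response_rate: float         # fraction of the user's emails that got a reply

def has_any_signal(f: PairFeatures) -> bool:
    """Pairs with no features or connections are skipped entirely."""
    return any(astuple(f))       # all-zero vectors are ignored
```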

Training Data (Binary Labels)

  • High Affinity (1): Bidirectional email exchange + recent LinkedIn connection + professional overlap
  • Low Affinity (0): Old LinkedIn-only connections, one-way emails, distant network ties

Model Output: Probability scores (0.0-1.0), well suited for ranking
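
A sketch of the heuristic labeling rule; the 180-day recency threshold and the exact conjunction of signals are assumptions to be tuned:

```python
# Heuristic binary labeling for training data; thresholds are assumptions.
def heuristic_label(
    emails_sent: int,
    emails_received: int,
    linkedin_connected: bool,
    connection_age_days: float,
    same_company: bool,
    shared_education: bool,
) -> int:
    """1 = high affinity, 0 = low affinity (training label, not the final score)."""
    bidirectional_email = emails_sent > 0 and emails_received > 0
    recent_linkedin = linkedin_connected and connection_age_days <= 180
    professional_overlap = same_company or shared_education

    if bidirectional_email and recent_linkedin and professional_overlap:
        return 1
    return 0  # e.g. old LinkedIn-only ties, one-way emails, distant network ties
```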

Architecture

Email/LinkedIn Data → Feature Engineering → XGBoost Model → Affinity Scores (0-1)
                                                          ↓
Search Results: Ranked by affinity score
Questions: Show highest affinity people first

Storage

  • PostgreSQL: user_person_affinity(user_id, person_id, score, updated_at)
  • Redis: Hot scores cached for <10ms lookups
  • Indexes: (user_id, score DESC) for fast ranking queries
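
A minimal sketch of the schema and write path, assuming psycopg2; the DSN, column types, and batch shape are placeholders:

```python
# Sketch of the PostgreSQL schema, ranking index, and batch upsert.
import psycopg2
from psycopg2.extras import execute_values

DDL = """
CREATE TABLE IF NOT EXISTS user_person_affinity (
    user_id    BIGINT      NOT NULL,
    person_id  BIGINT      NOT NULL,
    score      REAL        NOT NULL,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (user_id, person_id)
);
CREATE INDEX IF NOT EXISTS idx_affinity_user_score
    ON user_person_affinity (user_id, score DESC);
"""

UPSERT = """
INSERT INTO user_person_affinity (user_id, person_id, score)
VALUES %s
ON CONFLICT (user_id, person_id)
DO UPDATE SET score = EXCLUDED.score, updated_at = now();
"""

def write_scores(conn, rows):
    """rows: iterable of (user_id, person_id, score) tuples from the batch job."""
    with conn, conn.cursor() as cur:
        cur.execute(DDL)
        execute_values(cur, UPSERT, rows)
```

The daily job would call write_scores(conn, rows) with the freshly computed (user_id, person_id, score) tuples, leaving updated_at to the database.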

Automated Affinity Score Update (Cron Job)

A daily cron job (scheduled batch process) is responsible for keeping affinity scores up to date in the database. This job performs the following steps:

  • Schedule: Runs every 24 hours (e.g., 2:00 AM UTC)
  • Steps:
    1. Select candidate pairs: Identify user-person pairs with at least one relevant feature or connection (e.g., email, LinkedIn, or professional overlap). Ignore pairs with no features.
    2. Extract latest features from Email and LinkedIn data for these candidate pairs
    3. Apply the trained XGBoost model to compute updated affinity scores (0-1)
    4. Update the user_person_affinity table in PostgreSQL with new scores and timestamps
    5. Refresh hot scores in Redis cache for fast access

Example Cron Schedule:

0 2 * * * /usr/bin/python3 /app/batch_update_affinity_scores.py

This ensures that search results and question prioritization always use the most recent data, while also supporting real-time updates for new interactions as needed, without incurring unnecessary computation for unrelated pairs.
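
For illustration, a possible skeleton for batch_update_affinity_scores.py; the helper functions are placeholders for the five steps above, and the model path is assumed:

```python
#!/usr/bin/env python3
"""Illustrative skeleton of batch_update_affinity_scores.py.

The helpers below are placeholders for the five steps listed above; model
path and data access are assumptions for this sketch.
"""
import joblib  # assumed: the trained XGBoost classifier is persisted with joblib


def select_candidate_pairs():
    """Step 1 (placeholder): user-person pairs with >=1 signal; skip the rest."""
    return []


def extract_features(pairs):
    """Step 2 (placeholder): latest email/LinkedIn features for each pair."""
    return []


def write_scores(pairs, scores):
    """Step 4 (placeholder): upsert rows into user_person_affinity."""


def refresh_cache(pairs, scores):
    """Step 5 (placeholder): refresh hot scores in Redis."""


def run_daily_update():
    pairs = select_candidate_pairs()
    if not pairs:
        return

    features = extract_features(pairs)

    # Step 3: apply the trained model; column 1 of predict_proba is P(high affinity)
    model = joblib.load("/app/models/affinity_xgb.joblib")  # assumed path
    scores = model.predict_proba(features)[:, 1]

    write_scores(pairs, scores)
    refresh_cache(pairs, scores)


if __name__ == "__main__":
    run_daily_update()
```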

Key Questions Answered

What signals? Email frequency, LinkedIn connections, professional overlap, network proximity

How to compute? Daily batch jobs for feature extraction + real-time updates on new interactions

What to store? Affinity scores (0-1) in PostgreSQL with user_id/person_id indexes

Runtime queries?

  • Search: SELECT p.*, a.score FROM people p JOIN user_person_affinity a ON a.person_id = p.id WHERE a.user_id = X ORDER BY a.score DESC
  • Questions: SELECT person_id FROM user_person_affinity WHERE user_id = X AND score > 0.5 ORDER BY score DESC

Performance targets: <50ms search ranking, <10ms individual lookups
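
A sketch of the two read paths under these targets, assuming psycopg2 and redis-py and a people table keyed by an id column; the cache key format is illustrative:

```python
# Read paths: ranked search via the (user_id, score DESC) index, and a
# Redis-first single lookup. Connection settings and key format are assumptions.
import psycopg2
import redis

SEARCH_SQL = """
SELECT p.*, a.score
FROM user_person_affinity a
JOIN people p ON p.id = a.person_id
WHERE a.user_id = %s
ORDER BY a.score DESC
LIMIT %s;
"""


def ranked_people(conn, user_id, limit=25):
    """Search ranking: served by the (user_id, score DESC) index, <50ms target."""
    with conn, conn.cursor() as cur:
        cur.execute(SEARCH_SQL, (user_id, limit))
        return cur.fetchall()


cache = redis.Redis()  # assumed local Redis instance


def affinity_score(conn, user_id, person_id):
    """Individual lookup: Redis first for the <10ms target, PostgreSQL fallback."""
    key = f"affinity:{user_id}:{person_id}"
    hit = cache.get(key)
    if hit is not None:
        return float(hit)
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT score FROM user_person_affinity "
            "WHERE user_id = %s AND person_id = %s",
            (user_id, person_id),
        )
        row = cur.fetchone()
    if row is None:
        return None
    cache.set(key, row[0], ex=86400)  # backfill the hot-score cache for a day
    return float(row[0])
```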

Implementation (4 weeks)

Week 1-2: MVP

  • Basic feature extraction (email + LinkedIn)
  • Heuristic binary labeling
  • Train XGBoost classifier
  • Batch scoring pipeline
  • Incorporate user feedback and LLMs for labeling: Begin collecting explicit user feedback on connection relevance and experiment with using large language models (LLMs) to assist in labeling ambiguous or large-scale data.

Week 3-4: Production

  • Optimize database schema/indexes
  • Real-time scoring API
  • Deploy with monitoring
  • A/B test vs current system
  • Expand user feedback and LLM labeling: Integrate user feedback loops into the product and use LLMs to continuously improve label quality and coverage.

Architecture Diagram

[Architecture diagram image]

Bottom Line: A simple, scalable system that works for all users. A binary classifier trained on clear heuristics outputs nuanced 0-1 scores that are ideal for ranking, getting us to production fast while laying the foundation for future improvements.
