Building a Hybrid Search System: Combining Vector and Lexical Search with Reciprocal Rank Fusion

Anablock
AI Insights & Innovations
April 26, 2026


Introduction

Search is rarely a one-size-fits-all problem. Vector embeddings excel at understanding semantic meaning and context, while traditional lexical methods like BM25 shine at exact keyword matching and term frequency analysis. So why choose one when you can have both?

In this article, we'll explore how to build a hybrid search system that combines the strengths of semantic search (using vector embeddings) and lexical search (using BM25) into a unified pipeline. We'll dive into the architecture, the mathematics behind result fusion, and see real-world examples of how this approach outperforms either method alone.

The Challenge: Different Strengths, Different Weaknesses

Before we jump into the solution, let's understand why we need hybrid search in the first place.

Vector Search (Semantic) excels at:

  • Understanding synonyms and related concepts
  • Capturing contextual meaning
  • Finding conceptually similar content even with different wording

But struggles with:

  • Exact keyword matches (like product codes or IDs)
  • Rare or domain-specific terminology
  • Proper nouns and technical identifiers

BM25 (Lexical) excels at:

  • Precise keyword matching
  • Finding exact phrases and identifiers
  • Handling rare terms effectively

But struggles with:

  • Understanding semantic relationships
  • Dealing with synonyms
  • Capturing conceptual similarity

A hybrid approach lets us leverage both strengths while mitigating their individual weaknesses.

The Multi-Index Architecture

The foundation of our hybrid system is a clean, modular architecture. Both our VectorIndex and BM25Index classes implement the same interface with two core methods:

  • add_document() - Index a new document
  • search() - Query the index and return ranked results
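In Python, this shared contract can be written down as a typing.Protocol. Here is a minimal sketch (the SearchIndex name follows the interface described above; the exact signatures are an assumption based on the methods listed):

```python
from typing import Any, Dict, List, Protocol, Tuple, runtime_checkable


@runtime_checkable
class SearchIndex(Protocol):
    """The contract every index implementation must satisfy."""

    def add_document(self, document: Dict[str, Any]) -> None:
        """Index a new document."""
        ...

    def search(self, query_text: str, k: int = 1) -> List[Tuple[Dict[str, Any], float]]:
        """Return up to k (document, score) pairs, best first."""
        ...
```

Any class that provides these two methods satisfies the protocol, so new index types can be dropped in without touching the rest of the pipeline.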

This API consistency is crucial—it allows us to treat different search implementations as interchangeable components. Here's where the magic happens: we introduce a Retriever class that acts as a coordinator.

from typing import Any, Dict

class Retriever:
    def __init__(self, *indexes: SearchIndex):
        if not indexes:
            raise ValueError("At least one index must be provided")
        self._indexes = list(indexes)
    
    def add_document(self, document: Dict[str, Any]):
        """Add document to all underlying indexes"""
        for index in self._indexes:
            index.add_document(document)
    
    def search(self, query_text: str, k: int = 1, k_rrf: int = 60):
        """Search all indexes and merge results using RRF"""
        # Implementation details below...

The Retriever forwards queries to all registered indexes, collects their results, and intelligently merges them. But how do we merge results from systems that use completely different scoring mechanisms?

Understanding Reciprocal Rank Fusion (RRF)

This is where Reciprocal Rank Fusion comes in. RRF is an elegant algorithm that combines rankings from multiple sources without needing to normalize their scores.

The Problem with Score-Based Merging

You might think: "Why not just combine the scores from each index?" The problem is that scores from different systems aren't comparable:

  • Vector search might return cosine similarities between 0 and 1
  • BM25 returns relevance scores that can range from 0 to infinity
  • Different indexes might have different score distributions

Trying to normalize these scores is complex and error-prone. RRF sidesteps this entirely by focusing on rank position instead of raw scores.

The RRF Formula

The RRF score for a document is calculated as:

RRF_score(d) = Σ(1 / (k + rank_i(d)))

Where:

  • d is the document being scored
  • k is a constant (typically 60, though we'll use 1 for clearer examples)
  • rank_i(d) is the rank position of document d in the i-th ranking
  • The sum is taken across all rankings where the document appears
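The formula translates almost directly into code. Here is a sketch that scores a single document against a set of rankings (rankings are assumed to be plain lists of document IDs, best first):

```python
def rrf_score(doc_id, rankings, k=60):
    """Sum 1 / (k + rank) over every ranking in which doc_id appears.

    rankings: a list of ranked lists of document IDs (position 0 = rank 1).
    """
    score = 0.0
    for ranking in rankings:
        if doc_id in ranking:
            rank = ranking.index(doc_id) + 1  # ranks are 1-based
            score += 1.0 / (k + rank)
    return score
```

A document missing from every ranking simply scores 0 and falls to the bottom.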

A Concrete Example

Let's walk through a real example. Suppose we search for "INC-2023-Q4-011" (an incident identifier) and get these results:

VectorIndex returns:

  1. Section 2 (Software Engineering)
  2. Section 7 (HR Policy)
  3. Section 6 (Marketing Campaign)

BM25Index returns:

  1. Section 6 (Marketing Campaign)
  2. Section 2 (Software Engineering)
  3. Section 7 (HR Policy)

Now we apply RRF with k=1:

Section 2:

  • Vector rank: 1 → 1/(1+1) = 0.500
  • BM25 rank: 2 → 1/(1+2) = 0.333
  • Total: 0.833

Section 7:

  • Vector rank: 2 → 1/(1+2) = 0.333
  • BM25 rank: 3 → 1/(1+3) = 0.250
  • Total: 0.583

Section 6:

  • Vector rank: 3 → 1/(1+3) = 0.250
  • BM25 rank: 1 → 1/(1+1) = 0.500
  • Total: 0.750

Final merged ranking:

  1. Section 2 (0.833) ✓
  2. Section 6 (0.750)
  3. Section 7 (0.583)

Notice how Section 2 rises to the top because it performed well in both indexes. This is the power of RRF—it rewards consensus while still considering results that appear in only one ranking.
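The arithmetic above can be double-checked mechanically. This standalone sketch fuses the two example rankings (section names taken from the example; any hashable ID would do):

```python
def rrf_merge(rankings, k=1):
    """Fuse several ranked lists into one, ordered by summed RRF score."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

vector_results = ["Section 2", "Section 7", "Section 6"]
bm25_results = ["Section 6", "Section 2", "Section 7"]
merged = rrf_merge([vector_results, bm25_results], k=1)
# merged order: Section 2, Section 6, Section 7
```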

Implementation Deep Dive

Here's the complete implementation of the search method with RRF fusion:

def search(self, query_text: str, k: int = 1, k_rrf: int = 60):
    """
    Search all indexes and merge results using Reciprocal Rank Fusion
    
    Args:
        query_text: The search query
        k: Number of results to return
        k_rrf: RRF constant (default 60, lower values give more weight to top ranks)
    
    Returns:
        List of (document, rrf_score) tuples, sorted by score descending
    """
    # Step 1: Collect results from all indexes
    all_results = []
    for index in self._indexes:
        results = index.search(query_text, k=k)
        all_results.append(results)
    
    # Step 2: Build RRF scores
    doc_scores = {}
    
    for idx, results in enumerate(all_results):
        for rank, (doc, _) in enumerate(results, start=1):
            doc_id = id(doc)  # Use object identity as key (assumes all indexes return the same document objects)
            
            if doc_id not in doc_scores:
                doc_scores[doc_id] = {
                    'document': doc,
                    'score': 0.0
                }
            
            # Apply RRF formula: 1 / (k_rrf + rank)
            doc_scores[doc_id]['score'] += 1.0 / (k_rrf + rank)
    
    # Step 3: Sort by RRF score and return top k
    sorted_results = sorted(
        doc_scores.values(),
        key=lambda x: x['score'],
        reverse=True
    )
    
    return [(item['document'], item['score']) for item in sorted_results[:k]]

The algorithm is straightforward:

  1. Query all indexes and collect their ranked results
  2. For each document in each ranking, calculate its RRF contribution
  3. Sum contributions across all rankings
  4. Sort by final RRF score and return top k results

Real-World Performance: A Case Study

Let's revisit a problem we encountered with vector-only search. When searching for "what happened with INC-2023-Q4-011?", the vector index returned:

  1. Section 10: Cybersecurity Analysis - Incident Response Report ✓ (correct)
  2. Section 3: Financial Analysis - Q4 Revenue Breakdown ✗ (wrong - just matched "Q4")
  3. Section 2: Software Engineering - Project Phoenix ✗ (should be #2)

The vector search got confused by the "Q4" in the query and over-weighted the financial section.

With hybrid search using RRF:

  1. Section 10: Cybersecurity Analysis - Incident Response Report ✓
  2. Section 2: Software Engineering - Project Phoenix Stability Enhancements ✓
  3. Section 5: Legal Developments ✓

Much better! The BM25 index correctly identified the exact incident code "INC-2023-Q4-011" and boosted the relevant sections, while the vector index contributed semantic understanding. The fusion process combined their strengths.

Why This Architecture Matters: Extensibility

The real power of this design isn't just in combining two search methods—it's in the extensibility it provides.

Because all indexes implement the same SearchIndex protocol, you can easily add new search methodologies:

# Start with vector + BM25
retriever = Retriever(
    VectorIndex(embedding_model),
    BM25Index()
)

# Later, add a keyword index
retriever = Retriever(
    VectorIndex(embedding_model),
    BM25Index(),
    KeywordIndex()  # New!
)

# Or add domain-specific search
retriever = Retriever(
    VectorIndex(embedding_model),
    BM25Index(),
    GraphBasedIndex(),      # New!
    SpecializedDomainIndex()  # New!
)

The RRF fusion automatically incorporates any new index into the ranking process. Each search implementation stays focused and testable, while the Retriever handles the complexity of combining them.

Potential Extensions

Here are some search methods you could add to this architecture:

  • Keyword/Phrase Index: Exact phrase matching with proximity scoring
  • Graph-Based Search: Leverage document relationships and citations
  • Temporal Index: Weight results by recency or time relevance
  • Domain-Specific Index: Custom scoring for specialized fields (legal, medical, etc.)
  • Multilingual Index: Language-specific search with translation
  • Fuzzy Matching Index: Handle typos and spelling variations

Each new index brings its own strengths, and RRF ensures they all contribute fairly to the final ranking.

Tuning the k_rrf Parameter

The k_rrf parameter in the RRF formula controls how much weight is given to rank position:

  • Lower k_rrf (e.g., 1-10): Top-ranked results get much more weight, differences between ranks are amplified
  • Higher k_rrf (e.g., 60-100): More gradual weighting, lower-ranked results still contribute meaningfully

The default of 60 is a good starting point, but you might tune this based on your use case:

  • Precision-focused: Lower k_rrf to strongly favor top results
  • Recall-focused: Higher k_rrf to consider a broader range of results
  • Balanced: Stick with 60
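The effect is easy to see numerically. With k_rrf = 1, rank 1 is worth 1.5x rank 2; with k_rrf = 60, the gap nearly vanishes (a quick sketch):

```python
def rrf_weight(rank, k_rrf):
    """The RRF contribution of a single rank position."""
    return 1.0 / (k_rrf + rank)

for k_rrf in (1, 60):
    top, second = rrf_weight(1, k_rrf), rrf_weight(2, k_rrf)
    print(f"k_rrf={k_rrf}: rank 1 = {top:.4f}, rank 2 = {second:.4f}, "
          f"ratio = {top / second:.2f}")
```

At k_rrf = 60 the ratio between adjacent ranks is about 1.02, which is why higher values let lower-ranked results keep contributing.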

Best Practices and Considerations

1. Index Consistency

Ensure all indexes contain the same documents. The add_document() method in Retriever handles this automatically.

2. Query Preprocessing

Consider applying the same preprocessing (lowercasing, stemming, etc.) to queries across all indexes for consistency.
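A minimal way to enforce this is a single normalization step applied before any index sees the query (a sketch; the normalize_query name is hypothetical, and language-specific stemming would sit on top of it):

```python
import re

def normalize_query(text: str) -> str:
    """Lowercase and collapse whitespace so every index sees the same query."""
    return re.sub(r"\s+", " ", text.strip().lower())
```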

3. Performance Optimization

  • Run index searches in parallel using threading or async
  • Cache frequently accessed results
  • Consider approximate nearest neighbor (ANN) indexes for large-scale vector search
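Index fan-out is embarrassingly parallel. A sketch using the standard library's ThreadPoolExecutor (assuming each index's search method is thread-safe):

```python
from concurrent.futures import ThreadPoolExecutor

def search_all(indexes, query_text, k=10):
    """Query every index concurrently; return their result lists in index order."""
    with ThreadPoolExecutor(max_workers=len(indexes)) as pool:
        futures = [pool.submit(index.search, query_text, k) for index in indexes]
        return [future.result() for future in futures]
```

The returned lists line up with the input indexes, so the RRF merge step can consume them unchanged.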

4. Monitoring and Evaluation

Track which indexes contribute most to final rankings:

def search_with_attribution(self, query_text: str, k: int = 1):
    """Search and return which indexes contributed to each result"""
    contributions = {}  # id(doc) -> positions of the indexes that ranked it
    for i, index in enumerate(self._indexes):
        for doc, _ in index.search(query_text, k=k):
            contributions.setdefault(id(doc), []).append(i)
    # Merge as in search(), attaching each document's contribution list
    # Useful for understanding and debugging ranking behavior

5. A/B Testing

Compare hybrid search against individual methods:

  • Measure precision@k and recall@k
  • Track user engagement metrics (click-through rate, time on page)
  • Gather qualitative feedback on result relevance
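precision@k itself is worth pinning down precisely. A sketch (here relevant is assumed to be a set of known-relevant document IDs from your evaluation set):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant) / len(top_k)
```

Computing this for the hybrid system and for each index alone makes the comparison concrete.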

Conclusion

Hybrid search isn't just about combining two algorithms—it's about building a flexible, extensible architecture that can evolve with your needs. By:

  1. Maintaining consistent APIs across search implementations
  2. Using rank-based fusion (RRF) instead of score normalization
  3. Designing for extensibility from the start

you create a system that's greater than the sum of its parts. Vector search and BM25 each have their strengths, and RRF lets them complement each other naturally.

The modular design means you can start simple (just vector + BM25) and grow sophisticated (adding specialized indexes as needs arise) without rewriting your core search logic.

Whether you're building a document search system, a recommendation engine, or a question-answering platform, this hybrid approach provides a solid foundation that balances semantic understanding with precise matching—giving your users the best of both worlds.

Ready to implement hybrid search in your application? Start with a simple two-index setup and expand from there. The architecture is designed to grow with you.
