Building a Hybrid Search System: Combining Vector and Lexical Search with Reciprocal Rank Fusion

Anablock
AI Insights & Innovations
April 26, 2026


Introduction

Search is rarely a one-size-fits-all problem. Vector embeddings excel at understanding semantic meaning and context, while traditional lexical methods like BM25 shine at exact keyword matching and term frequency analysis. So why choose one when you can have both?

In this article, we'll explore how to build a hybrid search system that combines the strengths of semantic search (using vector embeddings) and lexical search (using BM25) into a unified pipeline. We'll dive into the architecture, the mathematics behind result fusion, and see real-world examples of how this approach outperforms either method alone.

The Challenge: Different Strengths, Different Weaknesses

Before we jump into the solution, let's understand why we need hybrid search in the first place.

Vector Search (Semantic) excels at:

  • Understanding synonyms and related concepts
  • Capturing contextual meaning
  • Finding conceptually similar content even with different wording

But struggles with:

  • Exact keyword matches (like product codes or IDs)
  • Rare or domain-specific terminology
  • Proper nouns and technical identifiers

BM25 (Lexical) excels at:

  • Precise keyword matching
  • Finding exact phrases and identifiers
  • Handling rare terms effectively

But struggles with:

  • Understanding semantic relationships
  • Dealing with synonyms
  • Capturing conceptual similarity

A hybrid approach lets us leverage both strengths while mitigating their individual weaknesses.

The Multi-Index Architecture

The foundation of our hybrid system is a clean, modular architecture. Both our VectorIndex and BM25Index classes implement the same interface with two core methods:

  • add_document() - Index a new document
  • search() - Query the index and return ranked results
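In Python, this shared contract can be written down as a typing.Protocol. Here is a minimal sketch (the SearchIndex name follows the interface described above; the exact signatures are an assumption based on the methods listed):

```python
from typing import Any, Dict, List, Protocol, Tuple, runtime_checkable


@runtime_checkable
class SearchIndex(Protocol):
    """The contract every index implementation must satisfy."""

    def add_document(self, document: Dict[str, Any]) -> None:
        """Index a new document."""
        ...

    def search(self, query_text: str, k: int = 1) -> List[Tuple[Dict[str, Any], float]]:
        """Return up to k (document, score) pairs, best first."""
        ...
```

Any class that provides these two methods satisfies the protocol, so new index types can be dropped in without touching the rest of the pipeline.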

This API consistency is crucial—it allows us to treat different search implementations as interchangeable components. Here's where the magic happens: we introduce a Retriever class that acts as a coordinator.

from typing import Any, Dict

class Retriever:
    def __init__(self, *indexes: SearchIndex):
        if not indexes:
            raise ValueError("At least one index must be provided")
        self._indexes = list(indexes)
    
    def add_document(self, document: Dict[str, Any]):
        """Add document to all underlying indexes"""
        for index in self._indexes:
            index.add_document(document)
    
    def search(self, query_text: str, k: int = 1, k_rrf: int = 60):
        """Search all indexes and merge results using RRF"""
        # Implementation details below...

The Retriever forwards queries to all registered indexes, collects their results, and intelligently merges them. But how do we merge results from systems that use completely different scoring mechanisms?

Understanding Reciprocal Rank Fusion (RRF)

This is where Reciprocal Rank Fusion comes in. RRF is an elegant algorithm that combines rankings from multiple sources without needing to normalize their scores.

The Problem with Score-Based Merging

You might think: "Why not just combine the scores from each index?" The problem is that scores from different systems aren't comparable:

  • Vector search might return cosine similarities between 0 and 1
  • BM25 returns relevance scores that can range from 0 to infinity
  • Different indexes might have different score distributions

Trying to normalize these scores is complex and error-prone. RRF sidesteps this entirely by focusing on rank position instead of raw scores.

The RRF Formula

The RRF score for a document is calculated as:

RRF_score(d) = Σ(1 / (k + rank_i(d)))

Where:

  • d is the document being scored
  • k is a constant (typically 60, though we'll use 1 for clearer examples)
  • rank_i(d) is the rank position of document d in the i-th ranking
  • The sum is taken across all rankings where the document appears
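The formula translates almost directly into code. Here is a sketch that scores a single document against a set of rankings (rankings are assumed to be plain lists of document IDs, best first):

```python
def rrf_score(doc_id, rankings, k=60):
    """Sum 1 / (k + rank) over every ranking in which doc_id appears.

    rankings: a list of ranked lists of document IDs (position 0 = rank 1).
    """
    score = 0.0
    for ranking in rankings:
        if doc_id in ranking:
            rank = ranking.index(doc_id) + 1  # ranks are 1-based
            score += 1.0 / (k + rank)
    return score
```

A document missing from every ranking simply scores 0 and falls to the bottom.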

A Concrete Example

Let's walk through a real example. Suppose we search for "INC-2023-Q4-011" (an incident identifier) and get these results:

VectorIndex returns:

  1. Section 2 (Software Engineering)
  2. Section 7 (HR Policy)
  3. Section 6 (Marketing Campaign)

BM25Index returns:

  1. Section 6 (Marketing Campaign)
  2. Section 2 (Software Engineering)
  3. Section 7 (HR Policy)

Now we apply RRF with k=1:

Section 2:

  • Vector rank: 1 → 1/(1+1) = 0.500
  • BM25 rank: 2 → 1/(1+2) = 0.333
  • Total: 0.833

Section 7:

  • Vector rank: 2 → 1/(1+2) = 0.333
  • BM25 rank: 3 → 1/(1+3) = 0.250
  • Total: 0.583

Section 6:

  • Vector rank: 3 → 1/(1+3) = 0.250
  • BM25 rank: 1 → 1/(1+1) = 0.500
  • Total: 0.750

Final merged ranking:

  1. Section 2 (0.833) ✓
  2. Section 6 (0.750)
  3. Section 7 (0.583)

Notice how Section 2 rises to the top because it performed well in both indexes. This is the power of RRF—it rewards consensus while still considering results that appear in only one ranking.
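The arithmetic above can be double-checked mechanically. This standalone sketch fuses the two example rankings (section names taken from the example; any hashable ID would do):

```python
def rrf_merge(rankings, k=1):
    """Fuse several ranked lists into one, ordered by summed RRF score."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

vector_results = ["Section 2", "Section 7", "Section 6"]
bm25_results = ["Section 6", "Section 2", "Section 7"]
merged = rrf_merge([vector_results, bm25_results], k=1)
# merged order: Section 2, Section 6, Section 7
```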

Implementation Deep Dive

Here's the complete implementation of the search method with RRF fusion:

def search(self, query_text: str, k: int = 1, k_rrf: int = 60):
    """
    Search all indexes and merge results using Reciprocal Rank Fusion
    
    Args:
        query_text: The search query
        k: Number of results to return
        k_rrf: RRF constant (default 60, lower values give more weight to top ranks)
    
    Returns:
        List of (document, rrf_score) tuples, sorted by score descending
    """
    # Step 1: Collect results from all indexes
    all_results = []
    for index in self._indexes:
        results = index.search(query_text, k=k)
        all_results.append(results)
    
    # Step 2: Build RRF scores
    doc_scores = {}
    
    for idx, results in enumerate(all_results):
        for rank, (doc, _) in enumerate(results, start=1):
            doc_id = id(doc)  # Use object identity as key (assumes all indexes return the same document objects)
            
            if doc_id not in doc_scores:
                doc_scores[doc_id] = {
                    'document': doc,
                    'score': 0.0
                }
            
            # Apply RRF formula: 1 / (k_rrf + rank)
            doc_scores[doc_id]['score'] += 1.0 / (k_rrf + rank)
    
    # Step 3: Sort by RRF score and return top k
    sorted_results = sorted(
        doc_scores.values(),
        key=lambda x: x['score'],
        reverse=True
    )
    
    return [(item['document'], item['score']) for item in sorted_results[:k]]

The algorithm is straightforward:

  1. Query all indexes and collect their ranked results
  2. For each document in each ranking, calculate its RRF contribution
  3. Sum contributions across all rankings
  4. Sort by final RRF score and return top k results

Real-World Performance: A Case Study

Let's revisit a problem we encountered with vector-only search. When searching for "what happened with INC-2023-Q4-011?", the vector index returned:

  1. Section 10: Cybersecurity Analysis - Incident Response Report ✓ (correct)
  2. Section 3: Financial Analysis - Q4 Revenue Breakdown ✗ (wrong - just matched "Q4")
  3. Section 2: Software Engineering - Project Phoenix ✗ (should be #2)

The vector search got confused by the "Q4" in the query and over-weighted the financial section.

With hybrid search using RRF:

  1. Section 10: Cybersecurity Analysis - Incident Response Report ✓
  2. Section 2: Software Engineering - Project Phoenix Stability Enhancements ✓
  3. Section 5: Legal Developments ✓

Much better! The BM25 index correctly identified the exact incident code "INC-2023-Q4-011" and boosted the relevant sections, while the vector index contributed semantic understanding. The fusion process combined their strengths.

Why This Architecture Matters: Extensibility

The real power of this design isn't just in combining two search methods—it's in the extensibility it provides.

Because all indexes implement the same SearchIndex protocol, you can easily add new search methodologies:

# Start with vector + BM25
retriever = Retriever(
    VectorIndex(embedding_model),
    BM25Index()
)

# Later, add a keyword index
retriever = Retriever(
    VectorIndex(embedding_model),
    BM25Index(),
    KeywordIndex()  # New!
)

# Or add domain-specific search
retriever = Retriever(
    VectorIndex(embedding_model),
    BM25Index(),
    GraphBasedIndex(),      # New!
    SpecializedDomainIndex()  # New!
)

The RRF fusion automatically incorporates any new index into the ranking process. Each search implementation stays focused and testable, while the Retriever handles the complexity of combining them.

Potential Extensions

Here are some search methods you could add to this architecture:

  • Keyword/Phrase Index: Exact phrase matching with proximity scoring
  • Graph-Based Search: Leverage document relationships and citations
  • Temporal Index: Weight results by recency or time relevance
  • Domain-Specific Index: Custom scoring for specialized fields (legal, medical, etc.)
  • Multilingual Index: Language-specific search with translation
  • Fuzzy Matching Index: Handle typos and spelling variations

Each new index brings its own strengths, and RRF ensures they all contribute fairly to the final ranking.

Tuning the k_rrf Parameter

The k_rrf parameter in the RRF formula controls how much weight is given to rank position:

  • Lower k_rrf (e.g., 1-10): Top-ranked results get much more weight, differences between ranks are amplified
  • Higher k_rrf (e.g., 60-100): More gradual weighting, lower-ranked results still contribute meaningfully

The default of 60 is a good starting point, but you might tune this based on your use case:

  • Precision-focused: Lower k_rrf to strongly favor top results
  • Recall-focused: Higher k_rrf to consider a broader range of results
  • Balanced: Stick with 60
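The effect is easy to see numerically. With k_rrf = 1, rank 1 is worth 1.5x rank 2; with k_rrf = 60, the gap nearly vanishes (a quick sketch):

```python
def rrf_weight(rank, k_rrf):
    """The RRF contribution of a single rank position."""
    return 1.0 / (k_rrf + rank)

for k_rrf in (1, 60):
    top, second = rrf_weight(1, k_rrf), rrf_weight(2, k_rrf)
    print(f"k_rrf={k_rrf}: rank 1 = {top:.4f}, rank 2 = {second:.4f}, "
          f"ratio = {top / second:.2f}")
```

At k_rrf = 60 the ratio between adjacent ranks is about 1.02, which is why higher values let lower-ranked results keep contributing.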

Best Practices and Considerations

1. Index Consistency

Ensure all indexes contain the same documents. The add_document() method in Retriever handles this automatically.

2. Query Preprocessing

Consider applying the same preprocessing (lowercasing, stemming, etc.) to queries across all indexes for consistency.
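A minimal way to enforce this is a single normalization step applied before any index sees the query (a sketch; the normalize_query name is hypothetical, and language-specific stemming would sit on top of it):

```python
import re

def normalize_query(text: str) -> str:
    """Lowercase and collapse whitespace so every index sees the same query."""
    return re.sub(r"\s+", " ", text.strip().lower())
```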

3. Performance Optimization

  • Run index searches in parallel using threading or async
  • Cache frequently accessed results
  • Consider approximate nearest neighbor (ANN) indexes for large-scale vector search
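Index fan-out is embarrassingly parallel. A sketch using the standard library's ThreadPoolExecutor (assuming each index's search method is thread-safe):

```python
from concurrent.futures import ThreadPoolExecutor

def search_all(indexes, query_text, k=10):
    """Query every index concurrently; return their result lists in index order."""
    with ThreadPoolExecutor(max_workers=len(indexes)) as pool:
        futures = [pool.submit(index.search, query_text, k) for index in indexes]
        return [future.result() for future in futures]
```

The returned lists line up with the input indexes, so the RRF merge step can consume them unchanged.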

4. Monitoring and Evaluation

Track which indexes contribute most to final rankings:

def search_with_attribution(self, query_text: str, k: int = 1):
    """Search and return which indexes contributed to each result"""
    contributions = {}  # id(doc) -> positions of the indexes that ranked it
    for i, index in enumerate(self._indexes):
        for doc, _ in index.search(query_text, k=k):
            contributions.setdefault(id(doc), []).append(i)
    # Merge as in search(), attaching each document's contribution list
    # Useful for understanding and debugging ranking behavior

5. A/B Testing

Compare hybrid search against individual methods:

  • Measure precision@k and recall@k
  • Track user engagement metrics (click-through rate, time on page)
  • Gather qualitative feedback on result relevance
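precision@k itself is worth pinning down precisely. A sketch (here relevant is assumed to be a set of known-relevant document IDs from your evaluation set):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant) / len(top_k)
```

Computing this for the hybrid system and for each index alone makes the comparison concrete.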

Conclusion

Hybrid search isn't just about combining two algorithms—it's about building a flexible, extensible architecture that can evolve with your needs. By:

  1. Maintaining consistent APIs across search implementations
  2. Using rank-based fusion (RRF) instead of score normalization
  3. Designing for extensibility from the start

you create a system that's greater than the sum of its parts. Vector search and BM25 each have their strengths, and RRF lets them complement each other naturally.

The modular design means you can start simple (just vector + BM25) and grow sophisticated (adding specialized indexes as needs arise) without rewriting your core search logic.

Whether you're building a document search system, a recommendation engine, or a question-answering platform, this hybrid approach provides a solid foundation that balances semantic understanding with precise matching—giving your users the best of both worlds.

Ready to implement hybrid search in your application? Start with a simple two-index setup and expand from there. The architecture is designed to grow with you.
