Log Entry #3 · Nov 28

Hybrid retrieval degraded recall on long-tail queries. Time to fix it.

  • Problem: When BM25 and vector rankings were fused with RRF (Reciprocal Rank Fusion), relevant results for long-tail semantic queries dropped out of the top-K.
  • What we tried: Added a cross-encoder reranker at the final step (Cohere Rerank API).
  • Result: +18% retrieval accuracy and bounded latency.
  • Next: Cache identical queries to reduce latency further.
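
For reference, a minimal sketch of the fuse-then-rerank flow described in this entry. It is not the production code: the document IDs and toy scores are made up, `k=60` is the commonly used RRF constant rather than a value taken from this log, and `score_fn` is a hypothetical stand-in for the cross-encoder call (the real Cohere Rerank request is not shown).

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_k=5):
    """Fuse ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    k=60 is an assumed default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:top_k]]


def rerank(query, candidates, score_fn, top_k=3):
    """Reorder candidates by a relevance score and keep the best top_k.

    score_fn(query, doc_id) is a hypothetical placeholder for a
    cross-encoder such as a hosted rerank API.
    """
    ordered = sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)
    return ordered[:top_k]


# Toy rankings: "d9" is the best semantic match but is absent from BM25.
bm25_ranking = ["d1", "d2", "d3", "d4"]
vector_ranking = ["d9", "d2", "d1", "d3"]

# RRF alone with a tight cutoff: "d9" falls out of the top 3.
fused_top3 = rrf_fuse([bm25_ranking, vector_ranking], top_k=3)

# Wider candidate pool from RRF, then rerank down to the final top 3.
toy_scores = {"d9": 0.9, "d1": 0.5, "d2": 0.4, "d3": 0.1, "d4": 0.0}
pool = rrf_fuse([bm25_ranking, vector_ranking], top_k=5)
final_top3 = rerank("long-tail query", pool, lambda q, d: toy_scores[d])
```

In this toy case the document ranked first by the vector retriever is missing from the fused top 3 because BM25 never returned it. Fusing to a wider pool and reranking brings it back.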

Log Entry #2 · Nov 20

Attempted semantic chunking using spaCy boundaries.

  • Problem: Chunking by raw character count was splitting tables and code blocks down the middle.
  • What we tried: Implemented a semantic router that detects headers and keeps blocks intact.
  • Result: Context window density improved significantly.
  • Next: Implement BM25 for keyword lookup.
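
A rough sketch of the keep-blocks-intact idea from this entry. It assumes Markdown-like input and uses plain string checks instead of spaCy; the function name `chunk_blocks` and the `max_chars` limit are illustrative, not the names used in the pipeline.

```python
FENCE = "`" * 3  # Markdown code-fence marker


def chunk_blocks(text, max_chars=1000):
    """Split text into chunks without cutting code fences or tables.

    Blocks are separated by blank lines or headers. Blank lines inside a
    code fence do not split. A block longer than max_chars becomes its own
    chunk instead of being cut.
    """
    blocks, current, in_fence = [], [], False
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a code fence
            current.append(line)
            continue
        is_header = line.startswith("#")
        if not in_fence and (line.strip() == "" or is_header):
            if current:  # a blank line or header closes the open block
                blocks.append("\n".join(current))
                current = []
            if is_header:  # the header starts the next block
                current.append(line)
            continue
        current.append(line)
    if current:
        blocks.append("\n".join(current))

    # Greedily pack whole blocks into chunks of at most max_chars.
    chunks, buf = [], ""
    for block in blocks:
        candidate = block if not buf else buf + "\n\n" + block
        if buf and len(candidate) > max_chars:
            chunks.append(buf)
            buf = block
        else:
            buf = candidate
    if buf:
        chunks.append(buf)
    return chunks


sample = "\n".join([
    "# Setup",
    "Install the package.",
    "",
    FENCE + "python",
    "x = 1",
    "",
    "y = 2",
    FENCE,
    "",
    "| a | b |",
    "|---|---|",
    "| 1 | 2 |",
])
chunks = chunk_blocks(sample, max_chars=40)
```

With the small `max_chars` used here, the header and its paragraph, the code block, and the table each land in a separate chunk, and none of them is split. A character-count splitter would instead cut wherever the limit falls.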

Log Entry #1 · Nov 15

Baseline created.

  • Goal: Set up a simple LangChain pipeline to load the documentation.
  • Result: Pipeline functional. Latency is currently 4.1s (too slow).