Log Entry #3 · Nov 28
Hybrid retrieval degraded recall on long-tail queries. Time to fix it.
- Problem: When the BM25 and vector rankings were fused with Reciprocal Rank Fusion (RRF), relevant results for long-tail semantic queries dropped out of the top-K.
- What we tried: Added a cross-encoder reranker at the final step (Cohere Rerank API).
- Result: +18% retrieval accuracy and bounded latency.
- Next: Cache results for identical queries to reduce latency further.
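The fuse-then-rerank flow from this entry can be sketched roughly as follows. This is a minimal sketch, not our production code. `bm25_search`, `vector_search` and `rerank` are hypothetical callables. In our setup the rerank step is the Cohere Rerank API, but here it is stubbed so the example runs offline. k = 60 is the constant commonly used for RRF. RRF scores each document as the sum of 1 / (k + rank) over the input rankings, so it only ever sees positions, never raw BM25 or cosine scores. The `lru_cache` wrapper shows one possible way to handle the caching item under "Next", and it assumes that only exact-string matches count as identical queries.

```python
from functools import lru_cache
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical signatures for the pluggable stages.
Search = Callable[[str, int], List[str]]                      # (query, n) -> ranked doc ids
Rerank = Callable[[str, List[str]], List[Tuple[str, float]]]  # (query, ids) -> (id, relevance)


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d))."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def retrieve(query: str, bm25_search: Search, vector_search: Search,
             rerank: Rerank, candidates: int = 50, top_k: int = 5) -> List[str]:
    """Fuse both rankings, then let the (cross-encoder) reranker pick the final top-K."""
    fused = rrf_fuse([bm25_search(query, candidates), vector_search(query, candidates)])
    pool = fused[:candidates]  # cap how many docs go to the more expensive reranker
    ranked = sorted(rerank(query, pool), key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]


def make_cached_retriever(bm25_search: Search, vector_search: Search,
                          rerank: Rerank, maxsize: int = 1024) -> Callable[[str], Tuple[str, ...]]:
    """Memoize retrieval for identical query strings (no normalization is applied)."""
    @lru_cache(maxsize=maxsize)
    def cached(query: str) -> Tuple[str, ...]:
        return tuple(retrieve(query, bm25_search, vector_search, rerank))
    return cached
```

If `cached(q)` is called twice with the same string, the search and rerank stages run only once, and `cached.cache_info()` records the second call as a hit. Normalizing queries before lookup, for example by lowercasing or trimming whitespace, would raise the hit rate, but it is a separate design decision.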
Log Entry #2 · Nov 20
Attempted semantic chunking using spaCy boundaries.
- Problem: Chunking by raw character count was splitting tables and code blocks down the middle.
- What we tried: Implemented a semantic router that detects headers and keeps blocks intact.
- Result: Context window density improved significantly.
- Next: Implement BM25 for keyword lookup.
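Here is a rough sketch of the header-aware splitting. It assumes the documentation is Markdown with `#`-style headers and backtick or tilde code fences. It uses plain regex matching rather than spaCy, and the function name and `max_chars` limit are hypothetical. A header starts a new chunk only when it appears outside a fence, so a `# comment` inside a code block never triggers a split. Oversized sections are broken only at blank lines outside fences, which keeps both code blocks and tables intact, since tables contain no blank lines.

```python
import re
from typing import List

HEADER = re.compile(r"^#{1,6}\s")      # Markdown ATX header, e.g. "## Install"
FENCE = re.compile(r"^(`{3,}|~{3,})")  # opening or closing code fence


def chunk_by_headers(text: str, max_chars: int = 2000) -> List[str]:
    """Split Markdown into chunks at headers without cutting code blocks or tables."""
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    in_fence = False

    def flush() -> None:
        nonlocal current, size
        body = "\n".join(current).strip()
        if body:
            chunks.append(body)
        current, size = [], 0

    for line in text.splitlines():
        is_fence = bool(FENCE.match(line))
        if not in_fence and not is_fence and HEADER.match(line):
            flush()  # a header outside any code block starts a new chunk
        elif not in_fence and not line.strip() and size >= max_chars:
            flush()  # oversized section: break at a blank line, never inside a block
        current.append(line)
        size += len(line) + 1
        if is_fence:
            in_fence = not in_fence
    flush()
    return chunks
```

Because the size limit is checked only at blank lines, a single very large code block or table can still exceed `max_chars`. We accept that in exchange for never splitting one.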
Log Entry #1 · Nov 15
Baseline created.
- Goal: Set up a simple LangChain pipeline to load the documentation.
- Result: Pipeline functional. Latency is currently 4.1s (too slow).
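A figure like the 4.1s baseline is only comparable across entries if it is measured the same way each time. The sketch below assumes the pipeline can be invoked as a plain function of the query string. `measure_latency`, `pipeline` and the sample queries are hypothetical and are not part of LangChain. The helper makes warm-up calls first, so one-off costs such as loading an index are excluded, and then reports the median and worst-case time per query.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(pipeline: Callable[[str], object], queries: List[str],
                    warmup: int = 1) -> Dict[str, float]:
    """Time each query end to end; warm-up calls are excluded from the stats."""
    if not queries:
        raise ValueError("need at least one query")
    for query in queries[:warmup]:
        pipeline(query)  # absorb one-off costs (loading, connection setup)
    timings: List[float] = []
    for query in queries:
        start = time.perf_counter()
        pipeline(query)
        timings.append(time.perf_counter() - start)
    return {"p50_s": statistics.median(timings), "max_s": max(timings)}


def fake_pipeline(query: str) -> None:
    """Hypothetical stand-in for the real chain: sleeps 50 ms per query."""
    time.sleep(0.05)


if __name__ == "__main__":
    print(measure_latency(fake_pipeline, ["install steps?", "auth config?", "rate limits?"]))
```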