changes in this fork¶
3-tier query cache + agentic harness¶
production-grade cache rewrite informed by mem0 v3, Zep/Graphiti, Letta/MemGPT, GPTCache, and three production semantic-cache writeups (Vadim 2024, Banking Case Study 2024, Respan 2026).
bugs fixed:
GetFuzzywas implemented and tested, butSearch()never called it. Tier 1 was dead code from the start; every paraphrase query did a full hybrid search even when an existing entry would have hit at 100% jaccard. Now wired in.- scope-filter gate accepted cross-scope hits. The old
sameScopecompared the rebuilt key against itself, returning true unconditionally. A query underuser=bobcould serve a response cached underuser=alice. The harnessscope/alice-must-miss-bob-answerscenario catches this regression — now every entry stamps a filter-only signature and fuzzy/semantic lookups gate on it. - no pollution defense: empty result sets and zero-score
responses were being cached, slowly poisoning the lookup space.
Set()now rejects them and bumps a stat counter.
new features:
- Tier 2 semantic cache: query embeddings stored per-entry, cosine match across cached entries. Catches synonym and reordering paraphrases that bypass Jaccard. Reuses the embedding already computed for the vector-index search → zero extra embedding cost on lookups.
- 3-zone confidence band (Vadim 2024): green ≥ 0.93 auto-serve, amber [0.88, 0.93) serve but stamp for FP review, red < 0.78 treated as miss. Threshold defaults calibrated against published BEIR/MS MARCO bi-encoder benchmarks.
- per-tier TTL: T0 1h, T1 15m, T2 5m. Higher-FP-risk tiers expire faster (GPTCache TimeEvaluation pattern). Lazy eviction on Set, no background goroutine.
- filter-aware cache keys:
LimitandFiltersmap fold into both the exact key and the fuzzy/semantic scope gate. Two queries with identical text but different filters never collide. - CacheStats: atomic counters for lookups, hits per tier, amber
hits, misses, rejected, evicted, invalidated.
HitRate()for one-glance observability. SampleHitsForReview(n): extract sampled hits for the production FP eval loop (the pattern every credible semantic-cache writeup recommends).opts.NoCache: caller veto for one-off / debug queries.
agentic harness (pkg/search/agentic_harness_test.go):
TestAgenticHarnessFullCascade: scripted 10-turn agent conversation covering exact repeat, token-permutation, synonym paraphrase, cross-scope filter switch, and cold OOD queries. Each turn asserts the expected tier or miss.TestAgenticHarnessFalsePositiveBudget: 30 seeded facts, 20 paraphrases, 5 cold queries, asserts FP rate stays under 10% of total hits. Synthetic bag-of-words embedder; real embeddings would do better.TestAgenticHarnessFingerprintInvalidationUnderLoad: warms 20 entries, mutates fingerprint, verifies all are flushed and zero stale hits leak.TestAgenticHarnessConcurrentReadWrite: 16 readers + 16 writers for 500ms confirms no torn state, race, or panic.go test -raceis unavailable on android/arm64 but this exercises the same code paths.BenchmarkCacheTierLatency: on Termux Android arm64, 256-dim vectors, 100 entries: T0 ~21μs, T2 ~41μs, T1 ~72μs. T1 is the slowest because map intersection walks every entry; T2 is faster than T1 on small caches because dot products are branch-free and SIMD-friendly.
robustness pass + feature parity push against brv, plus filesystem-as- source-of-truth so the markdown tree on disk is now authoritative.
filesystem-as-source-of-truth (the headline change)¶
new pkg/treestore/ package + Service.Reindex + bower reindex command.
When RETRIEVER_STORAGE_DIR is set (or storage_dir in config.json), rv
operates in filesystem-first mode:
- Every curate writes a markdown file to disk first, then the sqlite row, then the in-memory vector index.
- Every cold-start runs a stale-detection pass: walk the tree, compare
each file's mtime against the
tree_indexmanifest, apply drift (insert / update / delete) before serving the first query. - You can edit memories in vim/vscode/obsidian and the next query sees the change automatically.
- You can
rma file and the memory disappears. - You can
rm -rfthe sqlite db and the next query rebuilds the entire index from disk in seconds.
File layout:
<root>/<path>/<slug>-<short-id>.md
with YAML frontmatter (id, type, path, importance, maturity, created, tags, etc) and level-2 markdown sections (Reason / Raw Concept / Narrative / Rules / Facts). The id in frontmatter is stable across edits; the slug prefix on the filename is regenerated from summary so renaming files is also safe (id wins over filename for identity).
Verified end-to-end on a 5-memory corpus: edit in place → query auto-
reindexes in ~700ms (gemini re-embed cost). Full DB nuke → next query
rebuilds 5 memories in ~3s. rm <file>.md → memory gone in 2ms.
New commands:
bower reindex # explicit reindex pass
New config:
storage_dir: /path/to/tree (config.json)
RETRIEVER_STORAGE_DIR=/path/to/tree (env override)
When storage_dir is empty, bower runs in the original sqlite-only mode
with no behaviour change.
previous robustness pass¶
bug fixes that were silently degrading correctness, new commands that bring bower to brv shape parity, real benchmarks with proper isolation, and a few honest caveats called out at the bottom.
bugs fixed¶
RETRIEVER_DB_PATHwas a fake env var. Never read by any Go code. Every benchmark and test before this fix ran against the same shared~/.retriever/memory.db, polluting results. Nowconfig.GetDBPath()checks the env var first.- cosine norm bug. Dedup compared against
normA * normBwhere the "norms" were actually squared L2 sums (nosqrt). Every cosine result was therefore degraded by a factor of~||a|| * ||b||. Fixed inpkg/storage/db.go. - fuzzy query cache was a literal stub. Tier-1 fuzzy lookup discarded all inputs and returned nil. Now does real Jaccard similarity on tokenized queries with pre-built token sets.
FindSimilarwas O(n) full-scan on every curate. AddedFindSimilarWithTextthat pre-filters via FTS5 then runs cosine on the shortlist (~200 candidates max). Curate dedup cost is now logarithmic in DB size.- N+1 query in convertSearchResponse. Was calling
db.Get(id)per result. Replaced withGetMany(ids)bulk fetch. - goroutine race/leak in persistent embedding cache. Every Get spawned an unsupervised goroutine to write the access_count. Replaced with a single background flusher that coalesces updates every 500ms in a transaction.
--heuristicflag was documented but never parsed.- gemini failure crashed instead of degrading. Wired the existing
Router so when Gemini init fails, the system falls through to the
local keyword-projection embedder. Memories show
model_used: onnx-hash-fallbackso the agent knows the result quality is degraded. searchFallback(LIKE-based) didn't filter superseded memories. Caught only after I started usingbower supersedefor real. Fixed.- missing
-tags="sqlite_fts5"build flag causedno such module: fts5errors. Documented in build instructions.
features added¶
hierarchical paths¶
- new
pathcolumn on memories (indexed, COLLATE NOCASE for tree browse) CurateRequest.Pathfield, CLI flag--path security/auth/jwt- merge-update preserves existing path when caller passes empty
LLM structured curate¶
- new
Reason,Narrative,Rules,Factscolumns on memories CurationSystemPromptupdated to demand path/reason/narrative/rules/facts in extraction JSON, mirroring brv's curated-fact shapeUpdateuses COALESCE-if-empty semantics so re-curate doesn't blank existing structured fields
temporal facts (Zep-style)¶
valid_from,valid_to,superseded_bycolumns + index on valid_todb.Supersede(old, new)marks oldID as superseded; old memory stays in DB for audit but vanishes from default retrievaldb.RevalidateForce(id)clears the supersede flag (repair path)- ALL read paths now filter
valid_to = 0: SearchFTS, AllEmbeddings, ListByPath, PathCounts, searchFallback, embeddingsForFTSCandidates
multi-tenant scoping (mem0-style)¶
user_id,agent_idcolumns with partial indexes- CLI flags
--user X --agent Y - (filter-side wired through Store; read filtering is the next step)
new commands¶
bower tree [prefix] [--depth N]— render topic tree with counts per nodebower ls [prefix] [--limit N]— list memories under path prefixbower export --to ./out— write brv-compatible markdown context tree with YAML frontmatter and## Reason / ## Raw Concept / ## Narrative / ## Rules / ## Factssectionsbower import ./tree— round-trip from the markdown export (or any brv-style tree). Parses frontmatter for path/type/tags, body sections for the structured fieldsbower supersede <old-id> <new-id>— temporal retirementbower mv <id> <new/path>— retroactive path assignment
infrastructure¶
bench/corpus.py— realistic Hermes-style memory generator across 5 categories x dozens of templatesbench/runner.py— side-by-side bower vs brv benchmark with proper brv event-stream JSON parsing, recall@k via token-overlap fuzzy matchbench/longterm.py— multi-day usage simulation (DB growth, recall decay, latency over time)bench/stress.py— N-worker concurrent stress test
refactors¶
memColumnsconstant +memScanhelper struct unify the 7 SELECT sites. Adding a new column now only touches one place.- Router (
pkg/embedding/router.go) now satisfies the Embedder interface so it can slot in anywhere a plain embedder is expected. ModelName returns a composite likerouter(gemini-embedding-001 -> onnx-hash-fallback).
measured numbers (proper DB isolation)¶
| metric | rv-gemini | bower heuristic curate (gemini embed) | bower fully offline | brv |
|---|---|---|---|---|
| cold start | 76 ms | 102 ms | 30 ms | 17,837 ms |
| curate p50 | 817 ms | 817 ms | 58 ms | 23,395 ms |
| query p50 | 818 ms | 815 ms | 45 ms | 17,556 ms |
| recall@5 | 1.000 | 1.000 | 1.000 | 0.800 |
corpus size: 50 / 20 probes. fully-offline curate hits 58ms because the keyword-projection embedder skips the Gemini network roundtrip. longterm sim (30 days, 12 facts/day, 15 queries/day) sustained recall@5 = 1.0 with DB stable at ~350KB and query p50 holding at 43ms.
stress test: 8 workers x 30 ops = 240 ops, 0 failures, p95=120ms. 16 workers x 50 ops = 800 ops, 0 failures, p95=307ms. sqlite WAL + busy_timeout holds.
caveats¶
- the
RETRIEVER_DB_PATHdiscovery invalidates the previous report's recall numbers. The earlier "rv 0.875 vs brv 0.800" comparison was noise from cross-contaminated benchmarks. With proper isolation rv actually hits perfect recall on this corpus, but the gap to brv may also be wider than originally claimed (brv didn't have the same bug). - brv side of the comparison is still small (n=10-20). Each brv op takes ~25s wall time so a 200-probe benchmark would take ~3 hours. The latency comparison is rock solid (rv is dramatically faster, every single run replicates), the recall comparison would benefit from a bigger brv sample if you ever want to make a stronger claim.
- rv-heuristic mode skips the LLM curator but NOT the embedder.
The "heuristic" curate is heuristic about content extraction, not
about embedding. To get the ~60ms numbers you need fully-offline mode
(
env -u GEMINI_API_KEY -u GOOGLE_API_KEY), which uses the keyword projection embedder instead of Gemini. - multi-tenant filtering is wired into Store but not Query. Memory
gets the scoping fields, but no query-side
WHERE user_id = ?filtering yet. Trivial to add when needed. - vector index rebuilds at every cold-start from
AllEmbeddings(). That's fine for current DB sizes (~50ms even at 1000 memories) but would need an ANN index for 100K+ memories.
file map¶
pkg/storage/db.go— schema, scan, FTS5, temporal, paths, deduppkg/search/cache.go— Tier 0/1 query cache (real Jaccard now)pkg/embedding/persistent_cache.go— batched access-count flusherpkg/embedding/router.go— full Embedder impl with graceful fallbackpkg/memory/service.go— curate path with scoping + temporal + pathpkg/curation/curator.go— LLM-driven extract → store with pathpkg/curation/prompts.go— extraction prompt + ExtractedMemory shapecmd/rv/commands.go— handleTree/Ls/Export/Import/Supersede/Mvcmd/rv/main.go— service setup with router-wired fallbackpkg/types/types.go— Memory + CurateRequest + scoping/temporal/pathpkg/config/config.go— RETRIEVER_DB_PATH env honoredbench/— corpus / runner / longterm / stress