EverOS OSS stores memories at multiple granularities — episodes, atomic facts, user profiles, foresight, agent skills — and selects the right type at query time, not write time. All retrieval methods work across multimodal memories (text, image, audio, doc, PDF). See Multimodal Memory for details.

Retrieval Methods

EverOS OSS supports four retrieval methods with different performance and dependency trade-offs:
Method | Mechanism | Requires | Best for
keyword | BM25 full-text (jieba + tantivy) | Nothing | Exact terms, low latency, no model setup
vector | ANN embedding similarity | Embedding model | Semantic queries, paraphrased language
hybrid | BM25 + vector + hierarchical fusion + LLM rerank | Embedding + LLM | Recommended: best recall and precision
agentic | Multi-round adaptive with LLM sufficiency check | Embedding + LLM + reranker | Complex queries needing multiple retrieval rounds
Use hybrid as your default. It delivers the best results across most query types by combining keyword recall, semantic similarity, and hierarchical granularity selection. Fall back to keyword when you need zero-dependency fast lookup, or agentic for complex multi-part queries.
keyword is the only method that requires no external model — it runs entirely on the embedded LanceDB engine with local jieba tokenization. All other methods require the corresponding model providers to be configured in .env.
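The dependency column above lends itself to a simple fallback rule. The following is a minimal sketch of that idea, not part of EverOS OSS: the function name `pick_method` and the capability labels (`"embedding"`, `"llm"`, `"reranker"`) are hypothetical and only mirror the "Requires" column.

```python
from typing import Dict, Set

# Dependencies per retrieval method, copied from the table above.
# The capability labels are illustrative, not EverOS configuration keys.
REQUIREMENTS: Dict[str, Set[str]] = {
    "keyword": set(),
    "vector": {"embedding"},
    "hybrid": {"embedding", "llm"},
    "agentic": {"embedding", "llm", "reranker"},
}


def pick_method(available: Set[str], complex_query: bool = False) -> str:
    """Pick the strongest method whose dependencies are all available."""
    # agentic is only worth its extra rounds for complex queries.
    if complex_query and REQUIREMENTS["agentic"] <= available:
        return "agentic"
    # Otherwise prefer hybrid, then vector, then the zero-dependency keyword.
    for method in ("hybrid", "vector", "keyword"):
        if REQUIREMENTS[method] <= available:
            return method
    return "keyword"
```

With no providers configured this returns `keyword`; with an embedding model and an LLM it returns `hybrid`, which matches the recommended default.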

HYBRID: Hierarchical Retrieval

hybrid is more than a simple BM25 + vector blend. It runs a hierarchical fusion over episodes and their atomic facts:
  1. BM25 and vector recall run concurrently — each returns a candidate pool
  2. RRF fusion scores each candidate episode
  3. Atomic facts are fetched for every candidate episode
  4. Granularity competition: if a fact’s relevance score exceeds the score of the lowest-scoring episode in the top-N, the fact replaces its parent episode in the results. The parent is evicted; orphaned facts find their parent from the pre-fusion pool
  5. LLM rerank rescores the mixed episode + atomic_fact list by relevance to the query
The result is a ranked mix of episodes and atomic facts — the system automatically returns the right granularity for each query without any configuration.
For case and skill memory (agent track), hybrid uses plain RRF fusion instead of hierarchical fusion, since cases and skills have no child memory types.
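The fusion and granularity-competition steps can be illustrated with a small sketch. This is one possible reading of the description, not the EverOS implementation. Assumptions: the RRF constant `k=60` is the conventional default rather than a documented EverOS value, fact scores are taken to be directly comparable with fused episode scores, only facts whose parent is in the top-N compete, and the final LLM rerank is replaced by a plain sort on score. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per item."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def granularity_competition(
    episode_scores: Dict[str, float],
    facts: List[Tuple[str, str, float]],  # (fact_id, parent_episode_id, score)
    top_n: int,
) -> List[Tuple[str, str, float]]:
    """Let facts that beat the weakest top-N episode replace their parent."""
    top = sorted(episode_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    if not top:
        return []
    floor = top[-1][1]  # score of the lowest-scoring episode in the top-N
    top_ids = {episode_id for episode_id, _ in top}
    results = {episode_id: ("episode", score) for episode_id, score in top}
    for fact_id, parent_id, score in facts:
        if parent_id in top_ids and score > floor:
            results.pop(parent_id, None)  # evict the parent episode once
            results[fact_id] = ("atomic_fact", score)
    ranked = [(mem_id, kind, score) for mem_id, (kind, score) in results.items()]
    # Stand-in for the LLM rerank step: order the mixed list by score.
    return sorted(ranked, key=lambda row: row[2], reverse=True)


# Toy data: episode IDs ranked by each recall channel, plus scored facts.
bm25_hits = ["e1", "e2", "e3"]
vector_hits = ["e1", "e2"]
episode_scores = rrf_fuse([bm25_hits, vector_hits])
facts = [("f1", "e2", 0.05), ("f2", "e1", 0.01)]
results = granularity_competition(episode_scores, facts, top_n=2)
```

In this toy run, fact `f1` outscores the weakest top-2 episode and replaces its parent `e2`, while `f2` scores too low and its parent `e1` stays in the results.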

AGENTIC: Multi-Round Adaptive Retrieval

agentic handles queries where a single retrieval pass is insufficient:
  1. Round 1 — initial query → cross-encoder rerank → top 10 candidates
  2. LLM sufficiency check — is this enough to answer the query?
    • If yes: return immediately
    • If no: LLM generates N supplementary queries based on what is missing
  3. Round 2 — supplementary queries run in parallel (BM25 + vector + RRF), results merged, deduped, and cross-encoder reranked
Use agentic for complex or underspecified queries where the initial recall is likely incomplete.
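The control flow of the two rounds can be sketched as follows. This is a hedged illustration only: `retrieve`, `rerank`, `is_sufficient`, and `expand` are hypothetical callables standing in for the recall pipeline, the cross-encoder, the LLM sufficiency check, and the LLM query generator. The sketch runs supplementary queries sequentially for brevity (the text describes them running in parallel), and it assumes round-2 results are merged with round-1 candidates and truncated to the same top 10.

```python
from typing import Callable, List


def agentic_search(
    query: str,
    retrieve: Callable[[str], List[str]],
    rerank: Callable[[str, List[str]], List[str]],
    is_sufficient: Callable[[str, List[str]], bool],
    expand: Callable[[str, List[str]], List[str]],
    top_k: int = 10,
) -> List[str]:
    # Round 1: initial recall, rerank, keep the top candidates.
    round1 = rerank(query, retrieve(query))[:top_k]

    # Sufficiency check: return immediately if the candidates are enough.
    if is_sufficient(query, round1):
        return round1

    # Round 2: run supplementary queries and merge with round 1.
    merged = list(round1)
    for sub_query in expand(query, round1):
        merged.extend(retrieve(sub_query))

    # Dedupe while preserving first-seen order, then rerank against
    # the original query.
    deduped = list(dict.fromkeys(merged))
    return rerank(query, deduped)[:top_k]
```

Passing the stages in as callables keeps the loop independent of any particular model provider, so each stage can be stubbed when testing the control flow.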

Multi-Granularity Coexistence

All memory types compete in the same retrieval pool. Different queries surface different granularities from the same user’s memory:
Query | Winning type | Why
"When is Jason’s birthday?" | atomic_fact | A specific fact: direct and precise
"What has Jason been working on lately?" | episode | Needs narrative context, not a single data point
"What kind of person is Jason?" | user_profile | Requires an aggregated portrait
"What is Jason planning this week?" | foresight | Forward-looking: only foresight holds this
No configuration is required. Write memory as you normally would — granularity selection happens at retrieval time.

Cross-Track Retrieval

A single search call spans all memory tracks — users/, agents/, and knowledge/ all participate. The retrieval pool assembles candidates from all relevant tracks before ranking. This means an agent answering a user’s question can surface in one call:
  • what this specific user has said or planned (user_profile, episode, foresight)
  • what the agent itself knows how to do (agent_skill)
  • what the system knows about the world (knowledge_entry)

Search Consistency

Search results reflect the index state, which may lag slightly behind recent writes while the Cascade Daemon processes the queue. For time-critical reads immediately after a write, call everos flush to force index sync before searching. See How Memory Works for the full consistency model.
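For a read-after-write path, the flush can be wrapped in a small helper. The sketch below only assumes the `everos flush` command mentioned above; the helper name `flush_then_search` and the `search` callable are hypothetical placeholders for whatever search entry point your application uses.

```python
import subprocess
from typing import Any, Callable


def flush_then_search(
    search: Callable[[str], Any],
    query: str,
    run: Callable[..., Any] = subprocess.run,
) -> Any:
    """Force an index sync, then search, so recent writes are visible."""
    # check=True raises CalledProcessError if the flush command fails,
    # so a failed flush is never followed by a silently stale search.
    run(["everos", "flush"], check=True)
    return search(query)
```

The `run` parameter is injectable so the helper can be exercised in tests without the CLI installed. Reserve this pattern for time-critical reads, since it forces a sync on every call.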