Retrieval

EverOS OSS stores memories at multiple granularities — episodes, atomic facts, user profiles, foresight, agent skills — and selects the right type at query time, not write time. All retrieval methods work across multimodal memories (text, image, audio, doc, PDF). See Multimodal Memory for details.

Retrieval Methods

EverOS OSS supports four retrieval methods with different performance and dependency trade-offs:

Method	Mechanism	Requires	Best for
`keyword`	BM25 full-text (jieba + tantivy)	Nothing	Exact terms, low latency, no model setup
`vector`	ANN embedding similarity	Embedding model	Semantic queries, paraphrased language
`hybrid` ⭐	BM25 + vector + hierarchical fusion + LLM rerank	Embedding + LLM	Recommended — best recall and precision
`agentic`	Multi-round adaptive with LLM sufficiency check	Embedding + LLM + reranker	Complex queries needing multiple retrieval rounds

Use hybrid as your default. It delivers the best results across most query types by combining keyword recall, semantic similarity, and hierarchical granularity selection. Fall back to keyword when you need fast lookup with no model dependencies, and switch to agentic for complex, multi-part queries.
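A minimal sketch of this decision rule in Python, assuming a hypothetical `Providers` record that says which model providers are configured. The requirement sets mirror the Requires column of the table; the `complex_query` flag and the hybrid, then vector, then keyword fallback order are assumptions for illustration, not EverOS behavior.

```python
from dataclasses import dataclass


@dataclass
class Providers:
    """Hypothetical flags for which model providers are configured in .env."""
    embedding: bool = False
    llm: bool = False
    reranker: bool = False


# What each method needs, mirroring the "Requires" column of the table.
REQUIRES: dict[str, set[str]] = {
    "keyword": set(),
    "vector": {"embedding"},
    "hybrid": {"embedding", "llm"},
    "agentic": {"embedding", "llm", "reranker"},
}


def available_methods(p: Providers) -> list[str]:
    have = {name for name in ("embedding", "llm", "reranker") if getattr(p, name)}
    return [method for method, needs in REQUIRES.items() if needs <= have]


def choose_method(p: Providers, complex_query: bool = False) -> str:
    avail = available_methods(p)
    # Complex, multi-part queries go to agentic when all of its models are configured.
    if complex_query and "agentic" in avail:
        return "agentic"
    # Otherwise prefer hybrid, then vector (assumed fallback order), then keyword.
    for method in ("hybrid", "vector", "keyword"):
        if method in avail:
            return method
    return "keyword"  # keyword needs nothing, so this line is not reached in practice
```

With no providers configured, only keyword is available; configuring an embedding model and an LLM makes hybrid the default.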

keyword is the only method that requires no external model — it runs entirely on the embedded LanceDB engine with local jieba tokenization. All other methods require the corresponding model providers to be configured in .env.
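For intuition about what the keyword method computes, here is a textbook Okapi BM25 scorer in plain Python. It is not EverOS code: lowercase whitespace splitting stands in for jieba tokenization, and in-memory term counts stand in for the tantivy index. `k1 = 1.2` and `b = 0.75` are common defaults, not values taken from EverOS.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Stand-in for jieba: lowercase whitespace split (assumption for this sketch).
    return text.lower().split()


class BM25:
    def __init__(self, docs: list[str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tfs = [Counter(d) for d in self.docs]  # term frequencies per document
        self.avgdl = sum(len(d) for d in self.docs) / max(len(self.docs), 1)
        df = Counter(term for d in self.docs for term in set(d))  # document frequencies
        n = len(self.docs)
        # Lucene-style IDF, which stays non-negative even for very common terms.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            # Saturating term frequency, normalized by document length.
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def search(self, query: str, top_k: int = 3) -> list[tuple[int, float]]:
        scored = [(i, self.score(query, i)) for i in range(len(self.docs))]
        hits = [pair for pair in scored if pair[1] > 0]
        return sorted(hits, key=lambda pair: pair[1], reverse=True)[:top_k]
```

Everything here is counting and arithmetic, which is why keyword retrieval needs no embedding or LLM provider.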

HYBRID: Hierarchical Retrieval

hybrid is more than a simple BM25 + vector blend. It runs a hierarchical fusion over episodes and their atomic facts:

BM25 and vector recall run concurrently — each returns a candidate pool
Reciprocal Rank Fusion (RRF) merges the two pools and scores each candidate episode
Atomic facts are fetched for every candidate episode
Granularity competition: if a fact's relevance score exceeds the score of the lowest-scoring episode in the top-N, the fact replaces its parent episode in the results and the parent is evicted. Orphaned facts are matched to their parent episode from the pre-fusion pool
LLM rerank rescores the mixed episode + atomic_fact list by relevance to the query

The result is a ranked mix of episodes and atomic facts — the system automatically returns the right granularity for each query without any configuration.
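The fusion and granularity-competition steps can be sketched as below. This is a simplified illustration, not the EverOS implementation: RRF uses the common constant `k = 60`, each atomic fact is assumed to carry a relevance score on the same scale as the fused episode scores, only facts of top-N episodes compete, orphan handling is omitted, and a plain sort by score stands in for the LLM rerank. `Fact` and `hierarchical_fuse` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    fact_id: str
    parent: str   # id of the episode this atomic fact belongs to
    score: float  # assumed comparable to the fused episode scores


def rrf(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def hierarchical_fuse(
    bm25_hits: list[str],
    vector_hits: list[str],
    facts_by_episode: dict[str, list[Fact]],
    top_n: int = 3,
) -> list[tuple[str, str, float]]:
    fused = rrf([bm25_hits, vector_hits])
    top = sorted(fused, key=fused.get, reverse=True)[:top_n]
    if not top:
        return []
    floor = min(fused[ep] for ep in top)  # lowest-scoring episode in the top-N
    results: list[tuple[str, str, float]] = []
    for ep in top:
        # Facts that beat the floor replace (evict) their parent episode.
        winners = [f for f in facts_by_episode.get(ep, []) if f.score > floor]
        if winners:
            results.extend(("atomic_fact", f.fact_id, f.score) for f in winners)
        else:
            results.append(("episode", ep, fused[ep]))
    # Placeholder for the LLM rerank step: order the mixed list by score.
    return sorted(results, key=lambda r: r[2], reverse=True)
```

In a call with two episodes in the top-N, an episode whose best fact clears the floor shows up as that fact, while the other episode is returned whole.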

For case and skill memory (agent track), hybrid uses LR fusion instead of hierarchical fusion, since cases and skills have no child memory types.

AGENTIC: Multi-Round Adaptive Retrieval

agentic handles queries where a single retrieval pass is insufficient:

Round 1 — initial query → cross-encoder rerank → top 10 candidates
LLM sufficiency check — is this enough to answer the query?
- If yes: return immediately
- If no: LLM generates N supplementary queries based on what is missing
Round 2 — supplementary queries run in parallel (BM25 + vector + RRF), results merged, deduped, and cross-encoder reranked

Use agentic for complex or underspecified queries where the initial recall is likely incomplete.
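The two-round control flow can be sketched with the models abstracted away as plain callables. `search`, `rerank`, `judge`, and `expand` are hypothetical stand-ins for BM25 + vector + RRF recall, the cross-encoder, the LLM sufficiency check, and the LLM query generator; merging the round-1 candidates into the round-2 pool is an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Search = Callable[[str], list[str]]                  # query -> candidate ids
Rerank = Callable[[str, list[str]], list[str]]       # cross-encoder stand-in
Judge = Callable[[str, list[str]], bool]             # LLM sufficiency check stand-in
Expand = Callable[[str, list[str], int], list[str]]  # LLM supplementary-query generator


def agentic_search(
    query: str,
    search: Search,
    rerank: Rerank,
    judge: Judge,
    expand: Expand,
    n_queries: int = 3,
    top_k: int = 10,
) -> list[str]:
    # Round 1: initial recall, cross-encoder rerank, keep the top_k candidates.
    round1 = rerank(query, search(query))[:top_k]
    if judge(query, round1):
        return round1  # sufficient: return immediately
    # Round 2: generate supplementary queries and run them in parallel.
    extra = expand(query, round1, n_queries)
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(search, extra))
    # Merge with round-1 results, dedupe while preserving order, then rerank.
    merged = list(dict.fromkeys(round1 + [doc for batch in batches for doc in batch]))
    return rerank(query, merged)[:top_k]
```

Because the sufficiency check sits between the rounds, simple queries pay for only one retrieval pass and one LLM call.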

Multi-Granularity Coexistence

All memory types compete in the same retrieval pool. Different queries surface different granularities from the same user’s memory:

Query	Winning type	Why
“When is Jason’s birthday?”	`atomic_fact`	A specific fact: direct and precise
“What has Jason been working on lately?”	`episode`	Needs narrative context, not a single data point
“What kind of person is Jason?”	`user_profile`	Requires an aggregated portrait
“What is Jason planning this week?”	`foresight`	Forward-looking; only foresight holds this

No configuration is required. Write memory as you normally would — granularity selection happens at retrieval time.

Cross-Track Retrieval

A single search call spans all memory tracks — users/, agents/, and knowledge/ all participate. The retrieval pool assembles candidates from all relevant tracks before ranking. This means an agent answering a user’s question can surface in one call:

what this specific user has said or planned (user_profile, episode, foresight)
what the agent itself knows how to do (agent_skill)
what the system knows about the world (knowledge_entry)
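A minimal sketch of assembling one pool across tracks and ranking it once. The per-track search callables, the `Hit` record, and the assumption that scores are comparable across tracks are all hypothetical; in EverOS the ranking itself is done by the retrieval methods described above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Hit:
    track: str     # "users", "agents", or "knowledge"
    mem_type: str  # e.g. "episode", "agent_skill", "knowledge_entry"
    text: str
    score: float   # assumed comparable across tracks


def cross_track_search(
    query: str,
    tracks: dict[str, Callable[[str], list[Hit]]],
    top_k: int = 5,
) -> list[Hit]:
    pool: list[Hit] = []
    for search in tracks.values():
        pool.extend(search(query))  # assemble candidates from every track first
    # Rank the combined pool once, so tracks compete for the same top_k slots.
    return sorted(pool, key=lambda h: h.score, reverse=True)[:top_k]
```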

Search Consistency

Search results reflect the index state, which may lag slightly behind recent writes while the Cascade Daemon processes the queue. For time-critical reads immediately after a write, call everos flush to force index sync before searching. See How Memory Works for the full consistency model.
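A toy model of why results can lag and what a flush does, assuming a simple write queue drained by a background worker. `ToyIndexedStore` and its methods are hypothetical illustrations, not the EverOS API or the Cascade Daemon's actual design.

```python
from collections import deque


class ToyIndexedStore:
    """Toy model of write-behind indexing; not EverOS code."""

    def __init__(self) -> None:
        self.queue: deque[str] = deque()  # writes waiting for the background indexer
        self.index: list[str] = []        # what search actually reads

    def write(self, text: str) -> None:
        self.queue.append(text)           # returns before indexing happens

    def daemon_step(self) -> None:
        if self.queue:                    # the background worker indexes one item per tick
            self.index.append(self.queue.popleft())

    def flush(self) -> None:
        while self.queue:                 # force the index to catch up with all writes
            self.daemon_step()

    def search(self, term: str) -> list[str]:
        return [text for text in self.index if term in text]
```

A search issued right after `write` misses the new memory until the worker reaches it; calling `flush` first removes that window.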
