Supported Benchmarks
The evaluation/ directory contains scripts to run standard benchmarks.
LoCoMo
Tests long-context modeling capabilities.
LongMemEval
Evaluates the system’s ability to recall specific details over long conversation histories.
PersonaMem
Focuses on the consistency and accuracy of user profile extraction.
Running Evaluations
To run a specific benchmark:
Ensure you have configured your .env with the necessary API keys before running evaluations.
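As a pre-flight check for the .env requirement, a minimal sketch like the following could confirm that the expected keys are present before an evaluation starts. It is not part of the evaluation/ scripts: the helper names and the key name OPENAI_API_KEY are assumptions made for illustration, so substitute whichever keys your benchmark configuration actually needs.

```python
from pathlib import Path


def load_env_keys(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines from .env-style text, skipping blanks and comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Tolerate shell-style "export KEY=value" lines.
        if key.startswith("export "):
            key = key[len("export "):].strip()
        # Drop surrounding whitespace and matching quote characters.
        values[key] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env_text: str, required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty in the .env text."""
    values = load_env_keys(env_text)
    return [key for key in required if not values.get(key)]


# Hypothetical key list: replace with the keys your setup requires.
REQUIRED_KEYS = ["OPENAI_API_KEY"]

if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    missing = missing_keys(text, REQUIRED_KEYS)
    if missing:
        print("Missing keys in .env:", ", ".join(missing))
    else:
        print("All required keys are set.")
```

Running this from the directory that holds the .env file prints which keys, if any, still need to be filled in. Keeping the parsing in plain functions avoids a dependency on a dotenv library, at the cost of handling only simple KEY=VALUE lines.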