Cognee vs Mazemaker
Cognee describes itself as “AI memory at scale”: an LLM-driven knowledge-graph constructor whose graph is fused with vector search at retrieval time. We will run it on all 500 questions of LongMemEval-S. No numbers are published until the run lands.
Their architecture
Cognee runs an LLM extraction step that builds a knowledge graph (entities + relations) from the corpus. At query time, retrieval combines the graph walk with vector search over chunk embeddings. The graph-construction step is the system’s defining feature — and its main cost surface. Repo: topoteretes/cognee.
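To make "combines the graph walk with vector search" concrete, here is a minimal sketch of fusing two ranked lists. It uses reciprocal rank fusion, a common generic technique; this page does not say how Cognee actually fuses its results, so the function name, the `k=60` constant, and the chunk ids are all illustrative assumptions, not Cognee's API.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into a single ranking.

    Assumption: generic RRF, not Cognee's documented fusion rule.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes more for items it ranks higher.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))

graph_hits = ["c3", "c1", "c7"]   # hypothetical ids from a graph walk
vector_hits = ["c1", "c4", "c3"]  # hypothetical ids from vector search
fused = reciprocal_rank_fusion([graph_hits, vector_hits])
print(fused)
```

Chunks that appear in both lists rise to the top, which is the intuition behind any graph-plus-vector hybrid regardless of the exact fusion rule.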
Methodology — locked
- Dataset: LongMemEval-S, all 500 questions, identical haystack split.
- Retrieval: top-k=10. Cognee uses its KG + vector hybrid; Mazemaker uses hybrid + ColBERT @ 1.5 + (optional) PPR graph traversal on its own auto-built graph.
- Cost surface: tokens spent on Cognee’s LLM-graph-construction stage are reported alongside accuracy. Mazemaker builds its graph mechanically (cosine-similarity edges + dream-cycle consolidation) at zero LLM cost.
- Latency surface: Cognee’s ingest takes on the order of minutes per session; Mazemaker’s takes on the order of milliseconds per turn. The benchmark reports wall-clock ingest cost separately from query latency, since they are different operational concerns.
- Judge: identical, substring_match.
- Reproducibility: shell script in benchmarks/external/cognee_run.sh, committed before the run.
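
As a rough illustration of the substring_match judge named above, a check of this kind might look like the sketch below. The normalization (lowercasing and collapsing whitespace) and the function signature are assumptions; the harness's exact rule is not specified on this page.

```python
from __future__ import annotations

def _normalize(text: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())

def substring_match(answer: str, retrieved_texts: list[str]) -> bool:
    """Return True if the gold answer appears verbatim in any retrieved text."""
    needle = _normalize(answer)
    if not needle:
        return False  # an empty answer never counts as a hit
    return any(needle in _normalize(t) for t in retrieved_texts)

hits = ["User said the flight leaves at  9 AM on Friday.", "Unrelated turn."]
print(substring_match("9 am on friday", hits))
print(substring_match("10 am", hits))
```

Because the same judge is applied to both systems, whatever bias a substring rule carries falls on each side equally.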
Mazemaker reference (LongMemEval-S 500q)
| Config | R@1 | R@5 | R@10 | MRR | p50 latency |
|---|---|---|---|---|---|
| master baseline (hybrid) | 0.8064 | 0.9596 | 0.983 | 0.8733 | — |
| + ColBERT @ 1.5 | 0.8574 | 0.9787 | 0.9894 | 0.9114 | 56.9 ms |
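
For readers unfamiliar with the columns, the sketch below shows one common way to compute R@k and MRR from ranked results. It assumes R@k is a per-question hit rate (1 if any gold item is in the top k) averaged over questions; the page does not state the harness's exact definition, and the data here is a toy example, not benchmark output.

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """1.0 if any gold id appears in the top k results, else 0.0 (assumed definition)."""
    return 1.0 if any(r in gold_ids for r in ranked_ids[:k]) else 0.0

def reciprocal_rank(ranked_ids, gold_ids):
    """1/rank of the first gold hit, or 0.0 if no gold id was retrieved."""
    for rank, r in enumerate(ranked_ids, start=1):
        if r in gold_ids:
            return 1.0 / rank
    return 0.0

def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0

# Toy data: (ranked result ids, set of gold ids) per question.
runs = [
    (["a", "b", "c"], {"b"}),  # first hit at rank 2
    (["x", "y"], {"x"}),       # first hit at rank 1
]
r_at_1 = mean(recall_at_k(r, g, 1) for r, g in runs)
r_at_5 = mean(recall_at_k(r, g, 5) for r, g in runs)
mrr = mean(reciprocal_rank(r, g) for r, g in runs)
print(r_at_1, r_at_5, mrr)
```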
Why this comparison matters
Cognee’s thesis is that LLM-built KGs produce better recall than mechanical embedding graphs. The Mazemaker thesis is that mechanical embedding edges plus idle dream-cycle consolidation produce comparable recall at zero LLM-ingest cost. The benchmark tests whether either thesis survives a 500-question harness with token costs reported alongside accuracy.
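A minimal sketch of what "mechanical embedding edges" can mean in practice: link two memories whenever their embeddings are similar enough, with no LLM call involved. The `build_edges` helper, the 0.8 threshold, and the toy vectors are hypothetical; Mazemaker's real edge rule and its dream-cycle consolidation are not shown here.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_edges(embeddings, threshold=0.8):
    """Return (id_a, id_b, similarity) for every pair at or above the threshold.

    Assumption: a plain pairwise threshold, chosen only for illustration.
    """
    ids = sorted(embeddings)
    edges = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            sim = cosine(embeddings[a], embeddings[b])
            if sim >= threshold:
                edges.append((a, b, sim))
    return edges

# Toy 2-d embeddings; real ones would come from an embedding model.
memories = {"m1": [1.0, 0.0], "m2": [0.9, 0.1], "m3": [0.0, 1.0]}
edges = build_edges(memories)
print(edges)
```

The pairwise loop is quadratic in the number of memories, so a real system would likely use an approximate nearest-neighbour index instead; the point here is only that edge construction needs no LLM tokens.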
What “queued” means
- Harness exists. Methodology is locked.
- When the results JSON lands, this page updates with the verified table, in the same shape as the Hindsight page.
- If Cognee’s numbers reverse our finding, we publish them here, unedited.