A Topological–Relational Paradigm for Post-Vector Data Management — CookiX Reference Implementation on Dynamic Graph Manifolds with Sheaf-Theoretic Composition
Vector databases suffer from three fundamental limitations rooted in the geometry of flat Euclidean space: semantic gap (distance ≠ meaning), precision collapse (concentration of measure in high dimensions), and opacity (no interpretable retrieval path).
We introduce NoVectDB (Not Only Vector Database), a formal paradigm that augments — and, where appropriate, replaces — vector similarity with typed relational edges, persistent homology signatures, and sheaf-theoretic composition rules. The reference engine CookiX demonstrates a 2.4× precision improvement on relational retrieval tasks over leading vector databases while maintaining sub-linear query scaling.
The dominant pattern for grounding large language models in external knowledge is Retrieval-Augmented Generation (RAG), which relies almost exclusively on vector databases: systems that embed text into \(\mathbb{R}^n\) and retrieve neighbours by Euclidean or cosine distance. Despite widespread adoption, this approach has three well-documented failure modes.
As the embedding dimension \(n \to \infty\), all pairwise distances converge to \(\sqrt{2}\) in probability:
\[\Pr\!\left[\left|\|X_i - X_j\| - \sqrt{2}\right| > \varepsilon\right] \leq 2\exp\!\left(-\frac{cn\varepsilon^2}{4}\right)\]The ratio \(d_{\max}/d_{\min} \to 1\). Every document is equally distant from every other — precision collapses.
For any non-symmetric, non-transitive relation \(\rho\), a semantic gap exists: no single metric can faithfully represent it. A metric space is always symmetric (\(d(a,b)=d(b,a)\)) and obeys the triangle inequality — but knowledge relations are not.
The verb "prevents" is directed and cannot be encoded by any inner product or \(\ell_p\) norm.
Vector retrieval returns a scalar distance. There is no path, no justification, no typed connection. When a RAG system answers incorrectly, diagnosing why a particular chunk was retrieved is intractable — one can only inspect floating-point inner products.
score: 0.872
chunk_id: 4f2a…
// why? unknown
umbrella →[prevents] rain →[wets] coat path_score: 0.97
Five concepts: raincoat, rain, water, umbrella, storm. A sentence-transformer maps them to nearby points in \(\mathbb{R}^n\) — all with similar cosine distance to the query "What prevents the coat from getting wet?"
The correct answer — umbrella — requires understanding a directed causal chain. No inner product can encode the verb prevents.
The black node (umbrella) is identified by deterministic typed-edge traversal — not proximity. Vector retrieval cannot distinguish this causal chain from irrelevant neighbours with similar embeddings.
The atomic unit of NoVectDB storage is the Knowledge Object — a quadruple that captures every dimension of meaning:
causes, is_a, part_of, prevents, contradicts, example_of…For two Knowledge Objects \(\mathcal{K}_a, \mathcal{K}_b\) in the Dynamic Graph Manifold \(\mathcal{M}\), the composite distance blends three orthogonal signals:
The coefficients must satisfy \(\alpha + \beta + \gamma = 1\). Drag any slider — the others auto-adjust.
(\(\mathcal{K},\, d_{\text{NoVectDB}}\)) is a quasi-metric space: \(d(K_a,K_a)=0\), \(d(K_a,K_b)\geq 0\), and the triangle inequality holds up to a bounded sheaf consistency error \(\varepsilon\). When the sheaf is globally consistent, \(\varepsilon = 0\) and the space is a true metric space.
Persistent homology studies the shape of data across multiple scales. Given a filtration of simplicial complexes built from the local neighbourhood graph, it tracks which topological features (connected components, loops, voids) are born and die as the scale parameter \(\varepsilon\) grows.
where \(\mathrm{VR}\) is the Vietoris–Rips complex on the \(r\)-hop neighbourhood, and \(\mathrm{Vectorise}\) maps persistence barcodes to stable vectors via persistence landscapes.
As \(\varepsilon\) grows, edges form between nearby nodes. Loops (1-cycles) appear and disappear — their persistence \(= (\text{birth}, \text{death})\) encodes topological structure.
Adding a single edge of weight \(w_{\max}\) changes the signature by at most \(C \cdot w_{\max}\). The topological signature is provably stable under small graph perturbations.
A cellular sheaf on the knowledge graph assigns a vector space (a stalk) to each Knowledge Object and a linear map to each relation edge. These maps encode how meaning transforms as you traverse a relation.
The sheaf Laplacian \(\mathcal{L}_\mathcal{F} = B_\mathcal{F}^\top B_\mathcal{F}\) captures whether local sections agree globally. Its kernel encodes globally consistent interpretations.
\(\dim \ker(\mathcal{L}_\mathcal{F})\) equals the number of independent globally consistent interpretations. This gives NoVectDB a principled answer to the question: "Do these facts agree?"
The composition residual measures how well \(\mathcal{K}_a\)'s semantics arrive at \(\mathcal{K}_b\) via path \(\pi\). Small residual = high semantic compatibility along the relational chain.
Explore the two core mathematical structures of NoVectDB in three dimensions. Drag to orbit, scroll to zoom.
A cellular sheaf assigns a vector space (stalk) to each Knowledge Object and a linear map (restriction) to each edge. The wireframe cube at each node represents its local semantic frame \(\mathcal{F}(v) \cong \mathbb{R}^3\). Hover a node to highlight its stalk and outgoing restriction maps.
NoVectDB retrieval is a deterministic multi-stage process. Click Next Step to walk through Algorithm 1.
where \(|V_{\text{local}}|, |E_{\text{local}}|\) are the sizes of the explored subgraph and \(l\) is the landmark count for PH computation (typically \(l=50\), costing only \(\sim 1.25 \times 10^5\) operations).
Evaluated on 500 relational queries across three task classes over a technical document corpus (industrial pipe specifications, medical ontologies, legal case chains).
| System | Task A Single-hop | Task B Multi-hop | Task C Contradiction | Avg |
|---|---|---|---|---|
| Chroma | 0.72 | 0.31 | 0.28 | 0.437 |
| Pinecone | 0.74 | 0.33 | 0.30 | 0.457 |
| GraphRAG | 0.78 | 0.52 | 0.45 | 0.583 |
| CookiX | 0.91 | 0.82 | 0.76 | 0.830 |
CookiX: 2.4× improvement on Task B · 2.6× on Task C vs vector-only baselines
| System | p₅₀ (ms) | p₉₉ (ms) |
|---|---|---|
| Chroma | 12 | 45 |
| Pinecone | 8 | 32 |
| CookiX | 18 | 67 |
CookiX trades modest latency overhead (PH computation) for substantially higher precision. Graph traversal itself adds negligible cost.
CookiX is the reference implementation of NoVectDB — a document-oriented topological database analogous to how MongoDB realised the NoSQL paradigm. Five core subsystems:
db = cookix.connect("mydb")
db.insert({"text": "...", "edges": [...]})
results = db.query(
"Is A compatible with B?",
k=5, mode="reasoning"
)
How NoVectDB positions against the landscape of existing retrieval paradigms:
| Property | VectorDB | GraphDB | GraphRAG | NoVectDB |
|---|---|---|---|---|
| Typed relations | × | ✓ | partial | ✓ |
| Topological signature | × | × | × | ✓ |
| Sheaf composition | × | × | × | ✓ |
| Interpretable path | × | ✓ | partial | ✓ |
| Precision collapse | yes | N/A | yes | immune |
| Multi-hop reasoning | weak | strong | medium | strong |
| Sub-linear ANN | ✓ | × | ✓ | ✓ |
| Document-oriented | ✓ | × | × | ✓ |
NoVectDB presents a mathematically grounded paradigm for post-vector data management. By combining typed relational edges, persistent homology signatures, and sheaf-theoretic composition, it overcomes the fundamental limitations of flat metric spaces:
Just as NoSQL spawned MongoDB, Cassandra, and Neo4j, NoVectDB could inspire specialised engines for legal reasoning, medical ontologies, and engineering knowledge bases. CookiX is the first, not the last.
"Stop measuring distances.
Start understanding adjacency."
Ahmed Hafdi · NoVectDB / CookiX · February 2026