Graph Embeddings + SurrealDB

AI Memory Architecture for Agentic Systems

Research Report · February 2026 · Gengar

Executive Summary: SurrealDB's multi-model architecture (document + graph + vector) enables a unified AI memory system. By combining graph embeddings with vector similarity search, agents can retrieve context through both semantic similarity and relational traversal—creating a more human-like memory that understands not just what is relevant, but how things connect.

Contents

  1. The Memory Problem
  2. Why SurrealDB
  3. Proposed Architecture
  4. Data Model & Schema
  5. Query Patterns
  6. Integration Strategy
  7. Comparison with Neo4j GraphRAG
  8. Recommendations

1. The Memory Problem

Current AI agent memory systems suffer from dimensional poverty: vector stores capture what a memory is about but flatten away how memories relate, while plain logs and key-value stores preserve order but retrieve nothing by meaning.

The ideal AI memory needs both: semantic similarity, to find memories that are about the same thing, and relational structure, to follow how memories connect across time, topic, and cause.
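To make the two retrieval modes concrete, here is a toy sketch in plain Python (hypothetical data and edges) contrasting pure vector recall with hybrid recall that also follows graph edges:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical memory store: id -> (embedding, content)
memories = {
    "m1": ([1.0, 0.0], "user prefers dark mode"),
    "m2": ([0.9, 0.1], "user asked about UI themes"),
    "m3": ([0.0, 1.0], "dark mode broke the charts"),  # dissimilar vector...
}
# ...but causally linked to m1 by a graph edge
edges = {"m1": ["m3"]}

def vector_recall(query, k=2):
    ranked = sorted(memories, key=lambda m: cosine(memories[m][0], query),
                    reverse=True)
    return ranked[:k]

def hybrid_recall(query, k=2):
    hits = vector_recall(query, k)
    # Expand one hop along graph edges: relational context that
    # similarity alone would miss.
    expanded = list(hits)
    for h in hits:
        for n in edges.get(h, []):
            if n not in expanded:
                expanded.append(n)
    return expanded

print(vector_recall([1.0, 0.0]))  # m3 is invisible to similarity alone
print(hybrid_recall([1.0, 0.0]))  # the graph edge surfaces m3
```

The causally linked memory m3 scores zero on similarity yet is exactly the context an agent needs; only the graph hop recovers it.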

2. Why SurrealDB

SurrealDB is uniquely positioned as a multi-model database that unifies:

| Capability       | Traditional Approach                | SurrealDB Approach          |
|------------------|-------------------------------------|-----------------------------|
| Documents        | MongoDB / PostgreSQL JSONB          | Native, schemaless records  |
| Graph Relations  | Neo4j (+ separate vector DB)        | Built-in RELATE statements  |
| Vector Search    | Pinecone / Weaviate (+ graph DB)    | Native vector indexes       |
| Full-Text Search | Elasticsearch (+ sync layer)        | Integrated FTS indexes      |
| Real-Time Sync   | WebSocket + pub/sub                 | Live queries (built-in)     |
Key Insight: SurrealDB's RELATE statement creates graph edges as first-class citizens, while vector fields enable similarity search—all in one query language (SurrealQL).

3. Proposed Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        AI AGENT (Gengar)                         │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────────────────┐ │
│  │   Working   │   │    Graph    │   │     Vector Encoder      │ │
│  │   Memory    │   │  Traversal  │   │      (Embeddings)       │ │
│  │  (Context)  │   │   Engine    │   │                         │ │
│  └──────┬──────┘   └──────┬──────┘   └───────────┬─────────────┘ │
│         │                 │                      │               │
│         └─────────────────┼──────────────────────┘               │
│                           ▼                                      │
│                ┌─────────────────────┐                           │
│                │  Memory Controller  │                           │
│                │   (Orchestration)   │                           │
│                └──────────┬──────────┘                           │
└───────────────────────────┼──────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                         SURREALDB LAYER                          │
│  ┌─────────────────┐  ┌─────────────────┐  ┌────────────────┐    │
│  │   memory_node   │  │   memory_edge   │  │  vector_index  │    │
│  │  (embeddings)   │  │   (relations)   │  │  (similarity)  │    │
│  │                 │  │                 │  │                │    │
│  │ • content       │  │ • in/out refs   │  │ • cosine       │    │
│  │ • vector[]      │  │ • relation_type │  │ • euclidean    │    │
│  │ • importance    │  │ • strength      │  │ • manhattan    │    │
│  │ • timestamp     │  │ • timestamp     │  │                │    │
│  │ • metadata      │  │ • metadata      │  │                │    │
│  └─────────────────┘  └─────────────────┘  └────────────────┘    │
│                                                                  │
│  Query Patterns:                                                 │
│  • Vector: SELECT * FROM memory_node WHERE embedding <|5|>       │
│  • Graph:  SELECT * FROM memory_node->relates_to->*              │
│  • Hybrid: (vector result) + (traverse graph from result)        │
└─────────────────────────────────────────────────────────────────┘

Memory Lifecycle

  1. Creation: Content + metadata → embedding vector → stored as node
  2. Linking: RELATE creates edges (temporal, semantic, causal)
  3. Decay: Importance scores decay; weak memories fade
  4. Retrieval: Hybrid query (vector similarity + graph traversal)
  5. Reinforcement: Accessed memories gain importance, strengthen connections
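Steps 3 and 5 of the lifecycle reduce to simple importance arithmetic. A minimal sketch, assuming exponential decay; the decay rate and access boost are hypothetical constants that would be tuned empirically:

```python
import math

# Hypothetical constants; real values would be tuned empirically.
DECAY_RATE = 0.05      # per day of inactivity
ACCESS_BOOST = 0.1     # added on each recall
MAX_IMPORTANCE = 1.0

def decay(importance: float, days_since_access: float) -> float:
    """Step 3: importance decays exponentially with idle time."""
    return importance * math.exp(-DECAY_RATE * days_since_access)

def reinforce(importance: float) -> float:
    """Step 5: each access nudges importance up, capped at 1.0."""
    return min(MAX_IMPORTANCE, importance + ACCESS_BOOST)

# A memory untouched for 30 days fades, while one recalled twice strengthens.
faded = decay(0.5, 30)
strengthened = reinforce(reinforce(0.5))
print(round(faded, 3), round(strengthened, 2))
```

A background job would periodically apply `decay` to all nodes and prune those below a threshold, while `reinforce` runs on every retrieval hit.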

4. Data Model & Schema

Memory Nodes (Documents with Vectors)

DEFINE TABLE memory_node SCHEMAFULL;

DEFINE FIELD content ON memory_node TYPE string;
DEFINE FIELD embedding ON memory_node TYPE array<float>;
DEFINE FIELD embedding.* ON memory_node TYPE float;
DEFINE FIELD importance ON memory_node TYPE float DEFAULT 0.5;
DEFINE FIELD created_at ON memory_node TYPE datetime DEFAULT time::now();
DEFINE FIELD accessed_at ON memory_node TYPE datetime DEFAULT time::now();
DEFINE FIELD access_count ON memory_node TYPE int DEFAULT 0;
DEFINE FIELD memory_type ON memory_node TYPE string;
DEFINE FIELD source ON memory_node TYPE string;
DEFINE FIELD metadata ON memory_node TYPE object;

-- Vector index for similarity search
DEFINE INDEX memory_embedding_idx ON memory_node 
    FIELDS embedding 
    MTREE DIMENSION 1536 
    DIST COSINE;
    

Memory Edges (Graph Relations)

-- Temporal: memory A happened before memory B
DEFINE TABLE temporal_rel SCHEMAFULL TYPE RELATION;
DEFINE FIELD in ON temporal_rel TYPE record<memory_node>;
DEFINE FIELD out ON temporal_rel TYPE record<memory_node>;
DEFINE FIELD strength ON temporal_rel TYPE float DEFAULT 1.0;

-- Semantic: memory A is related to memory B
DEFINE TABLE semantic_rel SCHEMAFULL TYPE RELATION;
DEFINE FIELD in ON semantic_rel TYPE record<memory_node>;
DEFINE FIELD out ON semantic_rel TYPE record<memory_node>;
DEFINE FIELD relation_type ON semantic_rel TYPE string;
DEFINE FIELD strength ON semantic_rel TYPE float;

-- Causal: memory A caused/led to memory B
DEFINE TABLE causal_rel SCHEMAFULL TYPE RELATION;
DEFINE FIELD in ON causal_rel TYPE record<memory_node>;
DEFINE FIELD out ON causal_rel TYPE record<memory_node>;
DEFINE FIELD confidence ON causal_rel TYPE float;
    

Full-Text Search Index

DEFINE ANALYZER memory_analyzer 
    TOKENIZERS class, camel 
    FILTERS lowercase, snowball(english);

DEFINE INDEX memory_content_search ON memory_node 
    FIELDS content 
    SEARCH ANALYZER memory_analyzer 
    BM25;
    

5. Query Patterns

Pattern 1: Pure Vector Similarity

-- Find memories semantically similar to query embedding
SELECT *, vector::similarity::cosine(embedding, $query_vector) as similarity
FROM memory_node
WHERE embedding <|10|> $query_vector
ORDER BY similarity DESC;
    

Pattern 2: Graph Traversal

-- Find all memories temporally connected to a specific memory
SELECT * FROM memory_node:specific_id
    ->temporal_rel->memory_node
    ->temporal_rel->memory_node;

-- Find memories related through any edge type (? is the wildcard edge)
SELECT * FROM memory_node:start_id
    ->?->memory_node
    WHERE importance > 0.7;
    

Pattern 3: Hybrid (Vector + Graph)

-- Find similar memories, then traverse their connections
LET $similar = (
    SELECT VALUE id FROM memory_node
    WHERE embedding <|5|> $query_vector
);

-- Expand one hop along semantic edges, then score the union
LET $connected = array::flatten(
    (SELECT VALUE ->semantic_rel->memory_node FROM $similar)
);

SELECT *,
    vector::similarity::cosine(embedding, $query_vector) AS similarity
FROM array::union($similar, $connected)
ORDER BY importance * similarity DESC
LIMIT 20;
    

Pattern 4: Importance-Weighted Recall

-- Boost memories that are both similar AND important
SELECT *, 
    vector::similarity::cosine(embedding, $query_vector) * importance as score
FROM memory_node
WHERE embedding <|20|> $query_vector
ORDER BY score DESC
LIMIT 10;
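The same similarity-times-importance scoring can be mirrored client-side when reranking rows already fetched from the database. A plain-Python sketch with hypothetical candidate rows:

```python
# Hypothetical candidates: (id, similarity, importance), as Pattern 4
# would return them before scoring.
candidates = [
    ("m1", 0.95, 0.2),   # very similar, but low importance
    ("m2", 0.80, 0.9),   # slightly less similar, highly important
    ("m3", 0.60, 0.5),
]

def weighted_recall(rows, limit=10):
    # score = similarity * importance, matching the SurrealQL above
    scored = [(mid, sim * imp) for mid, sim, imp in rows]
    scored.sort(key=lambda r: r[1], reverse=True)
    return [mid for mid, _ in scored[:limit]]

print(weighted_recall(candidates))  # m2 outranks the more-similar m1
```

Note how the important-but-less-similar m2 wins: 0.80 × 0.9 = 0.72 beats 0.95 × 0.2 = 0.19.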
    

Pattern 5: Temporal Context Window

-- Get memories around a specific time, connected by temporal edges
SELECT * FROM memory_node
WHERE (created_at > $start_time AND created_at < $end_time)
    OR id IN (
        SELECT VALUE out FROM temporal_rel 
        WHERE in = $anchor_memory
    )
ORDER BY created_at;
    

6. Integration Strategy

Option A: Direct SurrealDB Integration (Recommended)

Architecture: OpenClaw tool → SurrealDB SDK → SurrealDB instance

Option B: Hybrid (SurrealDB + File Cache)

Architecture: Working memory (files) + Deep memory (SurrealDB)

Python SDK Example

from surrealdb import Surreal
from openai import AsyncOpenAI

class GraphMemory:
    def __init__(self, db_url: str = "ws://localhost:8000"):
        self.db = Surreal(db_url)
        self.openai = AsyncOpenAI()
    
    async def connect(self):
        # The SDK methods are coroutines, so authentication happens
        # here rather than in __init__
        await self.db.connect()
        await self.db.signin({"user": "root", "pass": "root"})
        await self.db.use("agent", "memory")
    
    async def store(self, content: str, memory_type: str = "observation"):
        # Generate embedding
        embedding = await self._embed(content)
        
        # Create memory node; created_at is filled in by the
        # schema default (time::now())
        memory = await self.db.create("memory_node", {
            "content": content,
            "embedding": embedding,
            "memory_type": memory_type,
            "importance": 0.5,
        })
        
        # Link to recent memories
        await self._create_temporal_links(memory[0]["id"])
        return memory[0]
    
    async def recall(self, query: str, limit: int = 10):
        embedding = await self._embed(query)
        
        # Hybrid: vector search, then expand along semantic edges
        # (the final statement's result holds the ranked rows)
        results = await self.db.query("""
            LET $similar = (
                SELECT VALUE id FROM memory_node 
                WHERE embedding <|5|> $embedding
            );
            LET $connected = array::flatten(
                (SELECT VALUE ->semantic_rel->memory_node FROM $similar)
            );
            SELECT *, vector::similarity::cosine(embedding, $embedding) AS sim 
            FROM array::union($similar, $connected)
            ORDER BY importance * sim DESC
            LIMIT $limit;
        """, {"embedding": embedding, "limit": limit})
        
        return results
    
    async def _embed(self, text: str) -> list[float]:
        response = await self.openai.embeddings.create(
            model="text-embedding-3-small",
            input=text,
        )
        return response.data[0].embedding
    

7. Comparison with Neo4j GraphRAG

| Aspect         | Neo4j + GraphRAG Python               | SurrealDB Native            |
|----------------|---------------------------------------|-----------------------------|
| Vector Storage | Separate (Neo4j + Pinecone/Weaviate)  | Built-in                    |
| Query Language | Cypher + Python SDK                   | SurrealQL (SQL-like)        |
| Deployment     | Complex (multi-service)               | Single binary / container   |
| Real-Time      | Polling / custom                      | Live queries (WebSocket)    |
| Maturity       | Enterprise-proven                     | Rapidly evolving (v2.x)     |
| Ecosystem      | Rich (LangChain, etc.)                | Growing                     |
| Self-Hosting   | Requires Aura or self-managed         | Single binary, edge-ready   |

8. Recommendations

Immediate Actions

  1. Deploy SurrealDB — Single container, minimal resource overhead
  2. Design schema — Use the node/edge model above as starting point
  3. Implement vector encoder — OpenAI text-embedding-3-small (1536 dims)
  4. Build memory controller — Python SDK wrapper with caching layer
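For the caching layer in step 4, one cheap win is memoizing embeddings by content hash so identical text is never re-embedded. A minimal sketch; the `fake_embed` stand-in is hypothetical, and a real controller would pass the actual embedding call:

```python
import hashlib

class EmbeddingCache:
    """Avoid re-embedding identical text; wraps any embed function."""
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.cache = {}
        self.hits = 0

    def embed(self, text: str):
        key = hashlib.sha256(text.encode()).hexdigest()
        if key in self.cache:
            self.hits += 1
        else:
            self.cache[key] = self.embed_fn(text)
        return self.cache[key]

# Stand-in embedder for illustration; production code would call
# the real embedding API here.
calls = []
def fake_embed(text):
    calls.append(text)
    return [float(len(text))]

cache = EmbeddingCache(fake_embed)
cache.embed("hello")
cache.embed("hello")           # served from cache; no second call
print(len(calls), cache.hits)  # one underlying call, one cache hit
```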

Migration Path

  1. Phase 1: Keep current file-based memory for working context
  2. Phase 2: Add SurrealDB for long-term memory persistence
  3. Phase 3: Migrate retrieval logic to hybrid (vector + graph)
  4. Phase 4: Add importance decay and memory consolidation

Risk Assessment

| Risk                             | Mitigation                                  |
|----------------------------------|---------------------------------------------|
| SurrealDB v2.x breaking changes  | Pin version, test upgrades in staging       |
| Vector dimension limits          | Use 1536 (OpenAI) or test 768 (local models)|
| Query performance at scale       | Index optimization, query result caching    |
| Embedding generation cost        | Batch processing, local model fallback      |
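The batch-processing mitigation in the last row can be as simple as chunking texts before each embedding call, so one API request embeds many memories. A sketch; the batch size of 64 is an arbitrary assumption:

```python
def batches(items, size=64):
    """Yield fixed-size chunks so one API call embeds many texts."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

texts = [f"memory {i}" for i in range(150)]
sizes = [len(b) for b in batches(texts, size=64)]
print(sizes)  # three embedding calls instead of 150
```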

The Bottom Line

SurrealDB's multi-model approach eliminates the need for separate vector and graph databases. For an AI agent requiring both semantic search and relational reasoning, this reduces operational complexity while enabling sophisticated memory patterns that mirror human cognition.

The trade-off is maturity—Neo4j has years of production use, while SurrealDB is newer but rapidly improving. For a lean core that prioritizes architectural elegance over enterprise legacy, SurrealDB is the sharper tool.