Building My Personal Knowledge Graph: Neo4j, Postgres, and LLM-Powered Extraction

2025-03-16 by Chad Linden


Visualization of my personal knowledge graph showing interconnected research topics

After years of scattered Notion pages and disorganized research notes, I finally built what I've needed: a queryable database of everything I read and learn.

My personal knowledge graph now connects 4,736 articles to 8,291 entities across 12 domains. When I need to recall that security paper from 2023 that mentioned zero-knowledge proofs alongside homomorphic encryption, I can find it in seconds—not hours.

Architecture Decisions

Database Layer

Neo4j handles the graph structure with the following schema:

  • Nodes: Person, Organization, Article, Concept, Product, Event
  • Relationships: WROTE, MENTIONED_IN, WORKS_AT, RELATED_TO, SUPPORTS, OPPOSES
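Once the pipeline starts writing nodes, each entity needs to stay unique per label or the graph fills with duplicates. A minimal sketch of generating one uniqueness constraint per node label (the key properties and constraint names here are illustrative assumptions, not my exact production schema):

```python
# Sketch: one CREATE CONSTRAINT statement per node label.
# The labels mirror the schema above; the key properties
# ("name" for most labels, "url" for Article) are assumptions.

NODE_KEYS = {
    "Person": "name",
    "Organization": "name",
    "Article": "url",
    "Concept": "name",
    "Product": "name",
    "Event": "name",
}

def constraint_statements(node_keys: dict[str, str]) -> list[str]:
    """Build a uniqueness-constraint Cypher statement for each label."""
    return [
        f"CREATE CONSTRAINT {label.lower()}_{prop}_unique IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"
        for label, prop in node_keys.items()
    ]
```

Each statement runs once against the database (e.g. via the official Neo4j Python driver's `session.run`), and is idempotent thanks to `IF NOT EXISTS`.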

PostgreSQL stores:

  • Raw document content with versioning
  • Full-text search capabilities via tsvector columns
  • Metadata including source URL, access date, and credibility scoring

CREATE TABLE documents (
  id UUID PRIMARY KEY,
  content JSONB NOT NULL,
  content_vector tsvector GENERATED ALWAYS AS (to_tsvector('english', content->>'text')) STORED,
  metadata JSONB,
  created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
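With the generated tsvector column in place, full-text search is a single parameterized query. A sketch of how the search SQL could be composed (the ranking expression, metadata key, and result limit are illustrative choices, not my exact production query; `websearch_to_tsquery` is available in Postgres 11+):

```python
# Sketch: compose a parameterized full-text search against the
# documents table above. The caller supplies the search terms as
# the single %s parameter (psycopg-style placeholder).

def search_sql(limit: int = 20) -> str:
    """Return SQL ranking documents against a websearch-style query."""
    return (
        "SELECT id, metadata->>'source_url' AS source, "
        "ts_rank(content_vector, q) AS rank "
        "FROM documents, websearch_to_tsquery('english', %s) AS q "
        "WHERE content_vector @@ q "
        f"ORDER BY rank DESC LIMIT {int(limit)}"
    )
```

`websearch_to_tsquery` accepts plain user input (quoted phrases, `OR`, `-` negation), which is why I'd prefer it over hand-building `tsquery` strings.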

Extraction Pipeline

Six months of experimentation led to this workflow:

  1. Web scraping via Playwright for dynamic content or direct API integrations for sources I frequent
  2. Entity extraction using a fine-tuned Mistral 7B model that, in my side-by-side tests, outperformed OpenAI's models on domain-specific technical concepts
  3. Relationship inference via LangChain custom extractors with prompt templates optimized for technical content
  4. Entity resolution using embedding similarity and custom rules for merging duplicate entities
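The entity-resolution step (step 4) can be sketched with plain cosine similarity: each incoming entity is compared against the entities kept so far, and anything above a threshold is merged into the first-seen canonical name. The vectors and the 0.9 threshold below are illustrative; in practice the embeddings come from the extraction model and the custom rules handle edge cases similarity misses:

```python
# Sketch of embedding-based duplicate merging (step 4).
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def merge_duplicates(entities: dict[str, list[float]],
                     threshold: float = 0.9) -> dict[str, str]:
    """Map each entity name to a canonical name (first seen wins)."""
    canonical: dict[str, str] = {}
    kept: dict[str, list[float]] = {}
    for name, vec in entities.items():
        match = next(
            (k for k, v in kept.items() if cosine(vec, v) >= threshold),
            None,
        )
        if match is None:
            kept[name] = vec
        canonical[name] = match or name
    return canonical
```

A greedy first-seen-wins pass like this is O(n²) in the worst case, which is fine at batch scale on a small VM; an approximate-nearest-neighbor index would be the upgrade path if the entity count grew by another order of magnitude.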

The entire pipeline runs on a $15/month VM with batch processing during off-hours.

Query Interface

I built a simple React app with Cypher query templates for common research tasks:

// Find all articles mentioning both concepts with authors from a specific org
MATCH (c1:Concept {name: "Zero-Knowledge Proofs"})-[:MENTIONED_IN]->(a:Article),
      (c2:Concept {name: "Homomorphic Encryption"})-[:MENTIONED_IN]->(a),
      (p:Person)-[:WROTE]->(a),
      (p)-[:WORKS_AT]->(o:Organization)
WHERE o.name CONTAINS "University"
RETURN a.title, p.name, o.name
ORDER BY a.published_date DESC
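Behind the React app, the templates stay static and Cypher binds user input server-side via `$` parameters, so there's no string interpolation of search terms. A sketch of the template registry (the template name and parameter names are illustrative assumptions):

```python
# Sketch: Cypher query templates keyed by name, with user input
# passed as bound parameters ($c1, $c2, $org) rather than
# interpolated into the query text.

TEMPLATES = {
    "co_mentions_by_org": (
        "MATCH (c1:Concept {name: $c1})-[:MENTIONED_IN]->(a:Article), "
        "(c2:Concept {name: $c2})-[:MENTIONED_IN]->(a), "
        "(p:Person)-[:WROTE]->(a), "
        "(p)-[:WORKS_AT]->(o:Organization) "
        "WHERE o.name CONTAINS $org "
        "RETURN a.title, p.name, o.name "
        "ORDER BY a.published_date DESC"
    ),
}

def build_query(name: str, **params) -> tuple[str, dict]:
    """Return (cypher, params) ready to hand to the Neo4j driver."""
    return TEMPLATES[name], params
```

The returned pair plugs straight into the driver, e.g. `session.run(*build_query("co_mentions_by_org", c1=..., c2=..., org="University"))`.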

The system pays for itself in time saved: roughly 7 hours a week that previously went to re-finding sources. Worth every minute spent building it.