Building My Personal Knowledge Graph: Neo4j, Postgres, and LLM-Powered Extraction


After years of scattered Notion pages and disorganized research notes, I finally built what I've needed: a queryable database of everything I read and learn.
My personal knowledge graph now connects 4,736 articles to 8,291 entities across 12 domains. When I need to recall that security paper from 2023 that mentioned zero-knowledge proofs alongside homomorphic encryption, I can find it in seconds—not hours.
Architecture Decisions
Database Layer
Neo4j handles the graph structure with a schema of:
- Nodes: Person, Organization, Article, Concept, Product, Event
- Relationships: WROTE, MENTIONED_IN, WORKS_AT, RELATED_TO, SUPPORTS, OPPOSES
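A fixed vocabulary like this is easy to enforce before anything is written to the graph. The following is a minimal sketch, not the pipeline's actual code: the `Triple` class, the `validate` function, and the `ALLOWED_ENDPOINTS` pairs are all hypothetical, and the post does not say which labels each relationship may connect.

```python
from dataclasses import dataclass

# Node labels and relationship types copied from the schema above.
NODE_LABELS = {"Person", "Organization", "Article", "Concept", "Product", "Event"}
REL_TYPES = {"WROTE", "MENTIONED_IN", "WORKS_AT", "RELATED_TO", "SUPPORTS", "OPPOSES"}

# Hypothetical endpoint constraints: the post does not specify these,
# so the pairs below are illustrative guesses only.
ALLOWED_ENDPOINTS = {
    "WROTE": {("Person", "Article")},
    "WORKS_AT": {("Person", "Organization")},
}


@dataclass(frozen=True)
class Triple:
    source_label: str
    rel_type: str
    target_label: str


def validate(triple: Triple) -> list[str]:
    """Return a list of problems; an empty list means the triple fits the schema."""
    problems = []
    for label in (triple.source_label, triple.target_label):
        if label not in NODE_LABELS:
            problems.append(f"unknown node label: {label}")
    if triple.rel_type not in REL_TYPES:
        problems.append(f"unknown relationship type: {triple.rel_type}")
    # Only relationship types with declared endpoints are direction-checked.
    allowed = ALLOWED_ENDPOINTS.get(triple.rel_type)
    if allowed and (triple.source_label, triple.target_label) not in allowed:
        problems.append(
            f"{triple.rel_type} cannot connect {triple.source_label} to {triple.target_label}"
        )
    return problems
```

Running extracted triples through a check like this before import would catch relationship names an LLM invents that are not part of the schema.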
PostgreSQL stores:
- Raw document content with versioning
- Full-text search capabilities via tsvector columns
- Metadata including source URL, access date, and credibility scoring
CREATE TABLE documents (
    id UUID PRIMARY KEY,
    content JSONB NOT NULL,
    content_vector tsvector GENERATED ALWAYS AS (to_tsvector('english', content->>'text')) STORED,
    metadata JSONB,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
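To show how the generated content_vector column might be queried, here is a small sketch that builds a ranked full-text search. It assumes a psycopg-style driver with `%s` placeholders and a `source_url` key inside `metadata`; both are assumptions, as are the names `SEARCH_SQL` and `build_search`. `websearch_to_tsquery` and `ts_rank` are standard PostgreSQL functions.

```python
# Ranked full-text search against the documents table defined above.
# The 'source_url' metadata key is an assumed name, not confirmed by the post.
SEARCH_SQL = """
SELECT id,
       metadata->>'source_url' AS source_url,
       ts_rank(content_vector, query) AS rank
FROM documents,
     websearch_to_tsquery('english', %s) AS query
WHERE content_vector @@ query
ORDER BY rank DESC
LIMIT %s
"""


def build_search(terms: str, limit: int = 10) -> tuple[str, tuple[str, int]]:
    """Return the SQL text and its parameters, ready for cursor.execute()."""
    terms = terms.strip()
    if not terms:
        raise ValueError("search terms must not be empty")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Values travel as parameters, never interpolated into the SQL string.
    return SEARCH_SQL, (terms, limit)
```

With a live connection this would be used as `cursor.execute(*build_search("zero-knowledge proofs"))`. A GIN index on content_vector is the usual way to keep such queries fast as the table grows.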
Extraction Pipeline
Six months of experimentation led to this workflow:
- Web scraping via Playwright for dynamic content, or direct API integrations for the sources I read most often
- Entity extraction using a fine-tuned Mistral 7B model that, in my experience, outperforms OpenAI's models at extracting domain-specific technical concepts
- Relationship inference via custom LangChain extractors with prompt templates optimized for technical content
- Entity resolution using embedding similarity plus custom rules for merging duplicate entities
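The entity resolution step can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's real code: the `cosine` and `resolve` helpers, the 0.9 threshold, and the toy two-dimensional vectors are all hypothetical, and the custom merge rules mentioned above are left out.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def resolve(entities: dict[str, list[float]], threshold: float = 0.9) -> list[set[str]]:
    """Group entity names whose embeddings are at least `threshold` similar."""
    names = list(entities)
    parent = {name: name for name in names}  # union-find forest

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    # Pairwise comparison is quadratic in the number of entities.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(entities[a], entities[b]) >= threshold:
                parent[find(b)] = find(a)

    groups: dict[str, set[str]] = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Toy embeddings; real ones would come from an embedding model.
candidates = {
    "ZKP": [1.0, 0.0],
    "Zero-Knowledge Proofs": [0.99, 0.05],
    "Homomorphic Encryption": [0.0, 1.0],
}
merged = resolve(candidates)
```

Here `merged` groups "ZKP" with "Zero-Knowledge Proofs" and leaves "Homomorphic Encryption" on its own. Union-find makes merges transitive, which is convenient but can chain loosely related entities together, so a stricter threshold or rule-based vetoes may be needed in practice.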
The entire pipeline runs on a $15/month VM with batch processing during off-hours.
Query Interface
I built a simple React app with Cypher query templates for common research tasks:
// Find all articles mentioning both concepts with authors from a specific org
MATCH (c1:Concept {name: "Zero-Knowledge Proofs"})-[:MENTIONED_IN]->(a:Article),
      (c2:Concept {name: "Homomorphic Encryption"})-[:MENTIONED_IN]->(a),
      (p:Person)-[:WROTE]->(a),
      (p)-[:WORKS_AT]->(o:Organization)
WHERE o.name CONTAINS "University"
RETURN a.title, p.name, o.name
ORDER BY a.published_date DESC
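One way to turn a query like this into a reusable template is to replace the hard-coded values with Cypher parameters (`$name` syntax). The sketch below is hypothetical: `CO_MENTION_TEMPLATE`, `bind`, and the parameter names are invented for illustration, and it uses the relationship types from the schema list (MENTIONED_IN, WORKS_AT), assuming MENTIONED_IN points from a concept to an article.

```python
import re

# Parameterized version of the co-mention query; parameter names are illustrative.
CO_MENTION_TEMPLATE = """
MATCH (c1:Concept {name: $concept_a})-[:MENTIONED_IN]->(a:Article),
      (c2:Concept {name: $concept_b})-[:MENTIONED_IN]->(a),
      (p:Person)-[:WROTE]->(a),
      (p)-[:WORKS_AT]->(o:Organization)
WHERE o.name CONTAINS $org_fragment
RETURN a.title, p.name, o.name
ORDER BY a.published_date DESC
"""


def bind(template: str, **params: str) -> tuple[str, dict[str, str]]:
    """Check that the supplied parameters match the template's $placeholders exactly."""
    expected = set(re.findall(r"\$(\w+)", template))
    supplied = set(params)
    if expected != supplied:
        missing = sorted(expected - supplied)
        extra = sorted(supplied - expected)
        raise ValueError(f"missing={missing} extra={extra}")
    # The query text is returned unchanged; values stay in the parameter map.
    return template, params
```

The Neo4j drivers accept a query string together with a parameter map, so the pair returned by `bind` can be handed to the driver directly. Keeping values out of the query text avoids quoting bugs and lets the database reuse its query plan.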
The system pays for itself in the time it saves: about 7 hours a week that I used to lose to inefficient research. Worth every minute spent building it.