08.LLM in Obsidian
📅 Sat. 2026-04-11 🕐 13:18 from Gemini 3 Flash
👉 #AI #LLM_Wiki #KnowledgeManagement #RAG #Obsidian
📎 Reference 1: Building Personal Knowledge Bases with LLMs
📎 Reference 2: RAG vs. Long Context Synthesis
1. Overview
1.1. Design Intent & Pain Points
The LLM Wiki pattern addresses the "Goldfish Memory" of standard Retrieval-Augmented Generation (RAG). Traditional RAG treats every query as an isolated event, forcing the model to re-derive insights from raw chunks repeatedly. This leads to high latency, inconsistent synthesis, and a lack of intellectual compounding.
The LLM Wiki shifts the paradigm from just-in-time retrieval to ahead-of-time compilation. By treating an LLM as a persistent "Wiki Maintainer," knowledge is incrementally integrated into a structured Markdown-based repository. This solves the "Fragmented Context" problem where critical connections across disparate documents are often missed by top-k vector searches.
1.2. Key Features
- Persistent Synthesis: New information is not just stored; it is merged into existing entity and topic pages.
- Self-Correcting Architecture: The LLM identifies contradictions between new sources and existing wiki entries during ingestion.
- Compounding Artifacts: Query results that provide deep analysis are saved back into the wiki, creating a feedback loop of intelligence.
- Human-in-the-Loop IDE: Uses Obsidian as the "Integrated Development Environment" for knowledge, allowing humans to navigate the graph while the AI writes the "code" (Markdown).
- Zero-Infrastructure Scaling: Relies on a robust `index.md` and `log.md` rather than complex vector databases for small-to-medium scales.
1.3. Use Cases
- Academic Research: Building a living literature review where papers are cross-referenced by methodology and findings.
- Personal Growth: Integrating journal entries, health data, and psychology notes into a cohesive "User Manual" of the self.
- Corporate Intelligence: Transforming Slack noise and meeting transcripts into a clean, searchable internal handbook.
- Creative Writing: Maintaining complex lore, character arcs, and world-building rules for novelists.
1.4. Competitors & Market Landscape
(1) Traditional RAG Systems (e.g., AnythingLLM, Verba)
- Market: Enterprise search and basic Q&A.
- Technical Gap: These focus on retrieval (finding the needle) rather than synthesis (knitting the needles into a sweater). They lack statefulness.
(2) AI Note-Takers (e.g., Mem.ai, Reflect)
- Market: Productivity-focused individuals.
- Technical Gap: Often "black boxes." The LLM Wiki pattern prioritizes local-first, human-readable Markdown files, avoiding vendor lock-in.
(3) Agentic Frameworks (e.g., LangGraph, CrewAI)
- Market: Developers building autonomous workflows.
- Technical Gap: While they can power a Wiki, they are often too complex for a personal knowledge base. The LLM Wiki pattern is a middle ground: structure without the overhead of a full agent swarm.
2. Concept, Component, & Architecture
2.1. Key Concepts
(1) Incremental Ingestion
The process of reading a single source and updating all relevant nodes in the graph simultaneously, ensuring the "global" view is always current.
(2) Compiling Knowledge
Treating information like source code. Raw sources are the "source files," the Wiki is the compiled "binary," and the Schema supplies the "compiler flags."
(3) Associative Trails
Inspired by the Memex, these are LLM-generated links between pages that represent logical connections (e.g., "See also: X, which contradicts the premise found in Y").
2.2. Core Components
(1) The Raw Vault
An immutable directory of PDFs, Markdown clips, and images. This serves as the "Source of Truth" for auditing LLM claims.
(2) The Semantic Wiki
A collection of interlinked .md files. This is the LLM's workspace. It includes:
- Entity Pages: Specific people, organizations, or projects.
- Concept Pages: High-level themes or theories.
- Synthesis Pages: Comparisons or timelines generated during queries.
(3) The Schema (AGENTS.md)
The instruction set. It defines the "Style Guide" for the LLM, such as "Always use YAML frontmatter for tags" or "Ensure every new page is linked in index.md."
(4) Navigation Files
- index.md: A content-based map for the LLM to locate context without expensive embeddings.
- log.md: A chronological record of system state changes.
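One way to keep `index.md` cheap to regenerate is to derive it from the pages themselves. The sketch below is an illustration, not a prescribed format: it assumes each page's first `# ` heading is its title and that the index is a list of wikilinks grouped by subfolder. The function names `page_title` and `build_index` are hypothetical.

```python
from pathlib import Path


def page_title(path: Path) -> str:
    """Return the first '# ' heading in the file, or the file stem as a fallback."""
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return path.stem


def build_index(wiki_dir: Path) -> str:
    """Build index text: one section per subfolder, one wikilink per page."""
    sections: dict[str, list[str]] = {}
    for page in sorted(wiki_dir.rglob("*.md")):
        # Skip the index files themselves so the index never lists itself.
        if page.name == "index.md" or page.name.startswith("index_"):
            continue
        group = page.parent.relative_to(wiki_dir).as_posix()
        if group == ".":
            group = "root"
        sections.setdefault(group, []).append(f"- [[{page.stem}]]: {page_title(page)}")
    lines = ["# Index"]
    for group in sorted(sections):
        lines.append("")
        lines.append(f"## {group}")
        lines.extend(sections[group])
    return "\n".join(lines) + "\n"
```

The function returns text rather than writing a file, so the result can be diffed against the current `index.md` before anything is overwritten.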
2.3. Architecture & Design
Mermaid
graph TD
A[Raw Sources] -->|Ingest Pass| B(LLM Agent)
S[Schema/AGENTS.md] -->|Constraints| B
B -->|Update/Create| C{The Wiki}
C -->|Internal Links| C
C -->|Reference| I[index.md]
C -->|Audit| L[log.md]
U[User Query] -->|Context Search| I
I -->|Pathfinding| B
B -->|Synthesized Answer| U
U -->|Save Analysis| C
The design philosophy is Iterative Evolution. The system starts with a simple index and grows into a complex web of knowledge. The architecture favors transparency over magic: every page the LLM writes is a plain Markdown file, so a hallucinated claim is visible in Obsidian and can be checked against the raw sources.
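The "Context Search" step in the diagram can be approximated without embeddings. In the pattern itself the LLM reads `index.md` directly; the sketch below is only a hypothetical pre-filter that ranks index lines by keyword overlap with the query. It assumes index entries look like `- [[page]]: description`, and `select_pages` is an illustrative name.

```python
import re

# Captures the target of [[target]], [[target|alias]] and [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def select_pages(query: str, index_text: str, limit: int = 5) -> list[str]:
    """Rank index entries by how many query words appear in the entry line."""
    # Words of one or two characters are ignored as too generic.
    words = {w for w in re.findall(r"[a-z0-9]+", query.lower()) if len(w) > 2}
    scored = []
    for line in index_text.splitlines():
        match = WIKILINK.search(line)
        if not match:
            continue
        line_words = set(re.findall(r"[a-z0-9]+", line.lower()))
        score = len(words & line_words)
        if score > 0:
            scored.append((score, match.group(1).strip()))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:limit]]
```

The returned page names would then be loaded into the agent's context, which keeps the query path limited to a handful of pre-synthesized pages.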
2.4. Eco-system
- Obsidian: The primary UI. It provides the Graph View and Backlinks, and supports the Dataview community plugin.
- Git: Provides the version control layer. The LLM's "edits" are essentially commits.
- CLI Tools (qmd/grep): Allow the LLM to perform high-speed text processing over thousands of files when the `index.md` becomes too large.
- Web Clippers: Tools like the Obsidian Web Clipper provide the raw data pipeline.
3. Install, Configure, Secure, & Cheatsheet
3.1. Install
(1) Environment Setup
Ensure you have a modern LLM agent interface that can read and write local files (Claude Code, OpenAI Codex, or a custom Python wrapper).
Bash
# Install Obsidian (macOS)
brew install --cask obsidian
# Install Search Utilities
brew install ripgrep fzf
# Recommended: Python environment for custom scripts
python3 -m venv llm_wiki_env
source llm_wiki_env/bin/activate
pip install qmd
3.2. Configure
(1) Folder Structure
Plaintext
.
├── raw/ # Immutable sources
│ ├── articles/
│ └── assets/ # Images/attachments
├── wiki/ # LLM-controlled files
│ ├── entities/
│ ├── concepts/
│ └── index.md
├── log.md # Transaction history
└── AGENTS.md # The Schema
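The layout above can be created in one step. The following is a minimal sketch, assuming the vault root is passed in explicitly; `scaffold_vault` and the two constants are hypothetical names, and the seed files are created empty.

```python
from pathlib import Path

# Directories and seed files taken from the folder structure above.
DIRECTORIES = ["raw/articles", "raw/assets", "wiki/entities", "wiki/concepts"]
SEED_FILES = ["wiki/index.md", "log.md", "AGENTS.md"]


def scaffold_vault(root: Path) -> list[Path]:
    """Create the vault layout under root; existing files are left untouched."""
    created = []
    for rel in DIRECTORIES:
        (root / rel).mkdir(parents=True, exist_ok=True)
    for rel in SEED_FILES:
        target = root / rel
        if not target.exists():
            target.touch()
            created.append(target)
    return created
```

Calling `scaffold_vault(Path("."))` a second time returns an empty list, so the function is safe to rerun on an existing vault.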
(2) Example AGENTS.md (The Schema)
Markdown
# Wiki Maintainer Instructions
- Role: You are a meticulous librarian and analyst.
- Formatting: Use Level 4+ headers. All pages must have YAML frontmatter.
- Linking: Every new page must have at least two inbound links.
- Contradictions: If a new source contradicts `index.md`, flag it in `log.md`.
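Schema rules such as the two-inbound-links requirement can be verified outside the LLM. The sketch below counts inbound wikilinks per page under two assumptions that are not stated in the schema: links use bare page names (`[[page]]`), and file stems are unique across the wiki. The function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Captures the target of [[target]], [[target|alias]] and [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def inbound_link_counts(wiki_dir: Path) -> Counter:
    """Count, for every page stem, how many other pages link to it."""
    pages = {p.stem: p for p in wiki_dir.rglob("*.md")}
    counts = Counter({stem: 0 for stem in pages})
    for stem, path in pages.items():
        text = path.read_text(encoding="utf-8")
        # A set ensures several links from one page count only once.
        targets = {t.strip() for t in WIKILINK.findall(text)}
        for target in targets:
            if target in pages and target != stem:
                counts[target] += 1
    return counts


def under_linked(wiki_dir: Path, minimum: int = 2) -> list[str]:
    """Return page stems with fewer than `minimum` inbound links."""
    counts = inbound_link_counts(wiki_dir)
    return sorted(s for s, n in counts.items() if n < minimum and s != "index")
```

The output of `under_linked` can be handed back to the agent as a concrete to-do list instead of asking it to re-check the whole wiki.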
3.3. Secure
- Local First: Keep the vault on a local drive or a private Git repo. Avoid uploading the entire wiki to public LLM "Projects" unless privacy is secondary.
- API Key Safety: Use environment variables for LLM providers.
- Audit Logs: Regularly grep `log.md` for "Contradiction" or "Error" flags to ensure the LLM isn't drifting into hallucination.
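The audit step can also be scripted so it runs on a schedule. This is a small sketch, assuming the flags appear as the plain words "Contradiction" or "Error" somewhere in a log line; `flagged_entries` is a hypothetical helper, roughly equivalent to a case-insensitive grep with line numbers.

```python
import re
from pathlib import Path

# Whole-word, case-insensitive match on the two flag keywords.
FLAG_PATTERN = re.compile(r"\b(contradiction|error)\b", re.IGNORECASE)


def flagged_entries(log_path: Path) -> list[tuple[int, str]]:
    """Return (line_number, text) for log lines mentioning a flag keyword."""
    hits = []
    with log_path.open(encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if FLAG_PATTERN.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits
```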
3.4. Cheatsheet
(1) Common Operations
- Ingest: `Process raw/new_paper.pdf. Update relevant concept pages.`
- Query: `Based on the wiki, what are the primary risks of [Project X]?`
- Lint: `Check wiki/ for orphan pages and update index.md.`
(2) Useful Shell Aliases
Bash
# View last 5 actions in the wiki
alias wlog='grep "^## \[" log.md | tail -5'
# List pages that contain no outgoing wikilinks (pages lacking inbound links are not detected)
alias worphans='find wiki -name "*.md" -exec grep -L "\[\[" {} \+'
4. Bootcamp & Workshops
4.1. Training Resources
- Obsidian Hub: Official Community Guides - Focus on Graph View and Backlinks.
- Zettelkasten Method: Understanding "Atomic Notes" to help the LLM structure concept pages.
- Prompt Engineering for Agents: Learn to write robust `AGENTS.md` files.
4.2. Troubleshooting (RCA)
(1) Hallucination in Synthesis
- Root Cause: Temperature too high or context window overflow.
- Fix: Reduce LLM temperature. Force the agent to cite specific line numbers from the `raw/` folder.
(2) Broken Links in Obsidian
- Root Cause: LLM renamed a file without updating the index.
- Fix: Run a "Lint" pass: `LLM, find all wikilinks that point to non-existent files and repair them.`
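Detecting broken links does not need the LLM at all; only the repair does. The sketch below lists wikilink targets that match no Markdown file, assuming links resolve by file stem as in Obsidian's default shortest-path mode. `broken_links` is a hypothetical helper, and embeds of non-Markdown attachments (for example images) would be reported as missing unless filtered separately.

```python
import re
from pathlib import Path

# Captures the target of [[target]], [[target|alias]] and [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def broken_links(wiki_dir: Path) -> dict[str, list[str]]:
    """Map each page (relative path) to the wikilink targets that match no file."""
    known = {p.stem for p in wiki_dir.rglob("*.md")}
    report = {}
    for page in sorted(wiki_dir.rglob("*.md")):
        text = page.read_text(encoding="utf-8")
        targets = {t.strip() for t in WIKILINK.findall(text)}
        # Compare on the last path component so [[concepts/rag]] resolves to rag.md.
        missing = sorted(t for t in targets if t.split("/")[-1] not in known)
        if missing:
            report[page.relative_to(wiki_dir).as_posix()] = missing
    return report
```

Passing the resulting report to the agent narrows the "Lint" prompt to the exact pages and targets that need repair.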
(3) Index Bloat
- Root Cause: Too many files for the LLM to read the full `index.md` in one go.
- Fix: Implement a "Sub-index" strategy (e.g., `index_entities.md`, `index_concepts.md`) or switch to `qmd` for search.
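The sub-index strategy can be sketched as a script that writes one `index_<folder>.md` per subfolder and reduces the root `index.md` to a list of pointers. This is an illustration under assumptions: the entry format and the name `write_sub_indexes` are hypothetical, and the function overwrites the existing `index.md`.

```python
from pathlib import Path


def write_sub_indexes(wiki_dir: Path) -> list[Path]:
    """Write one index_<folder>.md per subfolder plus a short root index.md."""
    written = []
    root_lines = ["# Index", ""]
    for folder in sorted(p for p in wiki_dir.iterdir() if p.is_dir()):
        pages = sorted(folder.rglob("*.md"))
        if not pages:
            continue
        sub_index = wiki_dir / f"index_{folder.name}.md"
        body = [f"# Index: {folder.name}", ""]
        body += [f"- [[{page.stem}]]" for page in pages]
        sub_index.write_text("\n".join(body) + "\n", encoding="utf-8")
        written.append(sub_index)
        # The root index only points at sub-indexes, so it stays small.
        root_lines.append(f"- [[{sub_index.stem}]] (pages: {len(pages)})")
    (wiki_dir / "index.md").write_text("\n".join(root_lines) + "\n", encoding="utf-8")
    return written
```

With this split, the agent reads the short root index first and opens only the sub-index relevant to the query.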
4.3. Q&A
(1) "Why not just use NotebookLM?"
NotebookLM is a silo. You can't easily export the interlinked logic to other tools, and you don't own the "file system." The LLM Wiki is platform-agnostic.
(2) "How much does this cost in API tokens?"
Ingestion is the most expensive part because the LLM may read/write 10+ files. However, querying is cheaper because the LLM only reads highly relevant, pre-synthesized wiki pages instead of raw data.
(3) "Can it handle images?"
Yes, via the Obsidian "Download attachments" workflow. Modern multimodal LLMs (GPT-4o, Gemini 1.5 Pro) can "see" these images when prompted to look at the raw/assets/ folder.
5. Implementation Script (Python/Zsh)
5.1. Automated Log Prefixing
Python
import datetime


def append_log(action_type, description):
    """Append one dated entry to log.md in the current directory."""
    date_str = datetime.datetime.now().strftime("%Y-%m-%d")
    entry = f"## [{date_str}] {action_type} | {description}\n"
    with open("log.md", "a", encoding="utf-8") as f:
        f.write(entry)


# Example usage: append_log("ingest", "Analysis of Q1 Financials")
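Because `append_log` writes a fixed `## [date] action | description` format, the log can be read back programmatically. The sketch below parses that format; `LogEntry`, `parse_log_line`, and `entries_by_action` are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    date: str
    action: str
    description: str


# Mirrors the "## [YYYY-MM-DD] action | description" format written by append_log.
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (.+?) \| (.*)$")


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None for lines that are not entries."""
    match = ENTRY.match(line.rstrip("\n"))
    if match is None:
        return None
    return LogEntry(*match.groups())


def entries_by_action(lines, action: str) -> list[LogEntry]:
    """Filter parsed entries by action type, e.g. 'ingest'."""
    parsed = (parse_log_line(line) for line in lines)
    return [e for e in parsed if e is not None and e.action == action]
```

For example, `entries_by_action(open("log.md", encoding="utf-8"), "ingest")` would list every ingestion recorded so far.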
5.2. Content Integrity Check (Zsh)
Bash
#!/bin/zsh
# Quick check for YAML frontmatter presence
for file in wiki/**/*.md; do
  # -x matches the whole line; -e stops grep from parsing '---' as an option
  if ! head -n 1 "$file" | grep -qx -e '---'; then
    echo "Warning: $file missing YAML frontmatter"
  fi
done
6. Advanced Workflow: The "Deep Synthesis" Loop
6.1. The "Query-to-Page" Pipeline
(1) Initial Query
The user asks a complex question: "How does the author's view on decentralization evolve between Chapter 1 and Chapter 10?"
(2) Iterative Retrieval
The LLM looks at index.md, finds the pages for "Chapter 1", "Chapter 10", and the "Decentralization" concept page.
(3) Drafting the Synthesis
The LLM generates a comprehensive response.
(4) Committing to the Wiki
Instead of leaving the answer in the chat, the user instructs: Save this as wiki/concepts/evolution-of-decentralization.md and link it back to the chapter pages.
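The commit step can be sketched as a helper that writes the synthesis page and appends a backlink to each source page. Everything below is an assumption for illustration: the frontmatter fields, the `See also` line, the `wiki/concepts/` destination, and the name `save_synthesis` are not specified by the pattern.

```python
import json
from datetime import date
from pathlib import Path


def save_synthesis(wiki_dir: Path, slug: str, title: str, body: str,
                   sources: list[str]) -> Path:
    """Write a synthesis page and append a backlink to each source page."""
    page = wiki_dir / "concepts" / f"{slug}.md"
    page.parent.mkdir(parents=True, exist_ok=True)
    links = ", ".join(f"[[{name}]]" for name in sources)
    frontmatter = "\n".join([
        "---",
        f"title: {json.dumps(title)}",  # JSON string quoting is also valid YAML
        f"created: {date.today().isoformat()}",
        "type: synthesis",
        "---",
    ])
    page.write_text(f"{frontmatter}\n\n# {title}\n\n{body}\n\nSources: {links}\n",
                    encoding="utf-8")
    for name in sources:
        # Source pages are located by file stem anywhere under wiki_dir.
        for source_page in wiki_dir.rglob(f"{name}.md"):
            with source_page.open("a", encoding="utf-8") as handle:
                handle.write(f"\nSee also: [[{slug}]]\n")
    return page
```

Appending rather than rewriting the source pages keeps the change small and easy to review in a Git diff.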
6.2. Handling Conflicting Information
When the LLM encounters a conflict (e.g., Source A says $X$, Source B says $Y$):
(1) Flagging
The LLM creates a "Conflict" block in the relevant Wiki page.
(2) Resolution Prompting
The LLM asks the user: I noticed a contradiction regarding [Topic]. Source B is more recent (2026) than Source A (2024). Should I deprecate the claims from Source A?
(3) Documentation
The decision is logged in log.md for future context.
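The flagging and documentation steps can be combined in one helper. The sketch below assumes the "Conflict" block is written as an Obsidian callout and that the log uses the same `## [date] action | description` shape as the logging script; both choices, and the name `flag_conflict`, are hypothetical.

```python
from datetime import date
from pathlib import Path


def flag_conflict(page: Path, log: Path, topic: str,
                  source_a: str, claim_a: str,
                  source_b: str, claim_b: str) -> str:
    """Append a conflict callout to a wiki page and record it in the log."""
    block = (
        f"\n> [!warning] Conflict: {topic}\n"
        f"> - {source_a}: {claim_a}\n"
        f"> - {source_b}: {claim_b}\n"
        f"> - Status: unresolved\n"
    )
    with page.open("a", encoding="utf-8") as handle:
        handle.write(block)
    entry = f"## [{date.today().isoformat()}] conflict | {topic} ({source_a} vs {source_b})\n"
    with log.open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return block
```

Once the user decides which source wins, the `Status: unresolved` line is the single place the agent needs to update.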
7. Future Proofing & Scalability
7.1. Moving Beyond Markdown
As the wiki reaches 1,000+ nodes, the "Schema" should instruct the LLM to use more metadata-heavy formats like JSON-LD embedded in Markdown to facilitate better programmatic querying.
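How JSON-LD is embedded is left open above; one possible convention is a fenced block tagged `json-ld` inside each page. The sketch below extracts such blocks under that assumption. The tag, the name `extract_json_ld`, and the choice to skip malformed blocks are all hypothetical.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built programmatically for readability
BLOCK = re.compile(
    re.escape(FENCE) + r"json-ld\s*\n(.*?)\n" + re.escape(FENCE),
    re.DOTALL,
)


def extract_json_ld(markdown: str) -> list[dict]:
    """Return every parseable JSON-LD object embedded in the page."""
    objects = []
    for raw in BLOCK.findall(markdown):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the whole page
        if isinstance(data, dict):
            objects.append(data)
    return objects
```

Objects extracted this way can be filtered by `@type` or other keys without asking the LLM to re-read the prose.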
7.2. Multi-Agent Collaboration
One agent acts as the Reader (ingesting raw sources), while another acts as the Editor (linting the wiki and checking for stylistic consistency).
7.3. Version Control as Memory
By using Git, the user can "Time Travel" the knowledge base. If an LLM's synthesis becomes biased or goes down a wrong path, a simple git checkout restores the collective intelligence to a known good state.
Note: This structure ensures that your knowledge is not a fleeting conversation, but a growing digital garden maintained by a tireless AI gardener.