gptme-codegraph
Structural code retrieval for gptme via tree-sitter — complementary to gptme-rag (text chunks), this retrieves code structure: function/class definitions, call graphs, blast radius, and impact analysis.
Features
- 9 MCP tools:
codegraph_parse,codegraph_index,codegraph_map,codegraph_def,codegraph_callers,codegraph_callees,codegraph_refs,codegraph_blast,codegraph_impact - Multi-language symbol extraction for Python, JavaScript/TypeScript, Rust, Go, Java, C#, Ruby, C, C++, PHP, Kotlin, and Swift
- Cross-file import capture across supported languages, with strongest semantic resolution on Python
- Qualified symbol IDs (
module::Class.method) for unambiguous cross-file references - SQLite index cache — optional persistent cache for large codebases
- Blast/impact semantics split:
blast= dependency closure (what X needs),impact= what breaks if you change X - Repo map / symbol skeletons for token-cheap default codebase context
When to use
Reach for the right retrieval tool by the shape of the question, not by habit:
- codegraph — structural / symbol questions: where is
Xdefined?, who callsX?, what breaks if I changeX?, give me a repo skeleton. Use it when you care about definitions, call graphs, blast radius, or impact. - grep / ripgrep — exact strings and known patterns: a literal identifier, an error message, a config key. Fastest when you already know the text to match.
- semantic search (gptme-rag / semble) — conceptual queries where you don't know the exact tokens: how does auth work here?, where is retry logic?. Matches by meaning over text chunks.
Rule of thumb: exact text → grep; "what does this concept look like" → semantic; "how is this symbol wired" → codegraph.
Install
pip install gptme-codegraph[treesitter,mcp]
Or with uv:
uv add gptme-codegraph[treesitter,mcp]
Usage
CLI
# Extract symbols from a file
gptme-codegraph path/to/file.py parse
gptme-codegraph path/to/file.ts parse
# Who calls a function?
gptme-codegraph path/to/file.py callers my_function
# What does a function call?
gptme-codegraph path/to/file.py callees my_function
# What breaks if you change a function?
gptme-codegraph path/to/file.py impact my_function
# Where is a symbol defined?
gptme-codegraph path/to/file.py def my_function
# Show a repo-map style symbol skeleton for a directory
gptme-codegraph path/to/repo map
Committed repo-map artifact
"Analyze once, commit the graph." Generate a .gptme-codegraph-map.json that
teammates and agents can read for a repo's structural outline without re-running
the tree-sitter pipeline:
# Generate and save the artifact at <repo>/.gptme-codegraph-map.json
gptme-codegraph-commit-map path/to/repo
# Check freshness (exit 0 = fresh, 1 = stale/missing) — for pre-commit/CI gating
gptme-codegraph-commit-map path/to/repo --check
# Regenerate only if stale (use in a pre-commit hook); --force always regenerates
gptme-codegraph-commit-map path/to/repo --refresh
Freshness is keyed off a digest of supported source files (*.py, *.ts,
*.tsx, *.js, *.rs, *.go, *.java, *.cs, *.rb, *.c, *.cpp,
*.php, *.kt, *.kts, *.swift), not HEAD — so an artifact regenerated in a
pre-commit hook stays fresh after the commit that contains it lands. The default
staleness window is 1 day (--stale-after-days N to change it). The artifact is
structural only (paths, class/function names, nesting) — no source, comments, or
values — so it is safe to commit to any repo.
MCP Server
# Start the MCP server (stdio transport)
gptme-codegraph-mcp
Configure in Claude Code:
claude mcp add codegraph -- gptme-codegraph-mcp
Python API
from gptme_codegraph import (
build_call_graph,
build_cross_file_call_graph,
build_index,
extract_symbols,
impact_radius,
)
from pathlib import Path
# Single-file: extract symbols and build call graph
symbols = extract_symbols(Path("src/my_module.py"))
_callees_graph, callers_graph = build_call_graph(symbols)
# Compute impact radius: what breaks if you change this symbol?
radius = impact_radius("my_function", callers_graph, max_depth=5)
print(radius) # {"depth_0": {…}, "depth_1": {…}, …}
# Cross-file: build an index over a whole directory
index = build_index(Path("src/"))
_callees_graph, callers_graph = build_cross_file_call_graph(index, Path("src/"))
radius = impact_radius("my_module::MyClass.my_method", callers_graph, max_depth=5)
print(radius) # {"depth_0": {…}, "depth_1": {…}, …}
Status
Experimental package — Python support is the deepest path today, with broad tree-sitter extraction now wired into the same surface for common web, systems, JVM, scripting, PHP/Kotlin, and Swift codebases. Cross-file resolution remains strongest on Python; non-Python import handling is best-effort rather than fully semantic.
Namespace packages (
import google.cloud.storagewithout__init__.py) are a known v1.1 gap.