Repo Mind Light

Repo Mind Light is follow-up work to Repo Mind. It keeps the same core ambition: giving humans and coding agents a reliable, holistic understanding of a repository rather than isolated search hits. It aims to preserve that quality of understanding in a form that is easier to operationalize.

Context - Distributed Repository Memory

Large repositories are full of information that is individually searchable but hard to assemble into a useful mental model. Source code explains only part of the story. Equally important details often live in pull requests, issues, review threads, and old comments where design intent, operational tradeoffs, and team memory accumulate over time.

Repo Mind Light is built for that broader form of understanding rather than for narrow snippet retrieval alone.

The Problem - Scattered Context

Repository questions often depend on context that is spread across code, discussions, and history. Those questions include:

architectural questions about how the repository is organized
questions about which feature is implemented by which subsystem or component
ownership and contact questions, such as who is most likely to know a part of the system well
historical questions about why something works the way it does and where that reasoning was previously discussed

One concrete example is incident response. When an incident lands, first responders need to get oriented quickly: which pull requests, issues, subsystems, and people are relevant, and what buried history matters before the investigation can even begin. Repo Mind Light helps with that, but incident response is only one example of the broader repository-understanding problem the project addresses.

Solution Approach - Focused Architecture

Dimension	Repo Mind	Repo Mind Light
Core goal	Holistic repository understanding	Holistic repository understanding
Indexed content	Code, docs, issues, PRs, summaries	Issues and PRs indexed locally
Code and docs access	Preprocessed as part of the wider system	Retrieved live through Blackbird
Retrieval modes	Multiple variants	GraphRAG Zero only
Deployment shape	Remote service	Standalone tool or Docker image plus local MCP server
Workflow fit	Integrable anywhere	Integrable anywhere, with a strong fit for Agentic Workflows

Repo Mind Light has a focused architecture built around three pieces:

incremental indexing of GitHub issues and pull requests into local on-disk index files
GraphRAG Zero retrieval over that indexed issue and pull request corpus at query time
live retrieval of source-code chunks and documentation files from GitHub Code Search, also known internally as Blackbird

This design keeps the system grounded in repository-wide context while making it straightforward to run as a standalone tool or embed into a larger workflow.
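
The incremental indexing piece can be illustrated with a minimal sketch. This is not Repo Mind Light's actual code: the `Item` shape, the `refresh_index` helper, and the use of an `updated_at` timestamp to detect changes are all assumptions made for illustration. The idea shown is only that a refresh touches new or changed issues and pull requests and skips everything else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A GitHub issue or pull request (hypothetical shape for this sketch)."""
    number: int
    updated_at: str  # ISO 8601 timestamp, e.g. "2024-05-01T12:00:00Z"
    body: str


def refresh_index(index: dict[int, Item], fetched: list[Item]) -> list[int]:
    """Update `index` in place and return the numbers that were reprocessed.

    An item is reprocessed only if it is new or its `updated_at` differs
    from the copy already stored in the index.
    """
    changed = []
    for item in fetched:
        known = index.get(item.number)
        if known is None or known.updated_at != item.updated_at:
            index[item.number] = item  # a real system would re-embed here
            changed.append(item.number)
    return changed


index = {1: Item(1, "2024-05-01T12:00:00Z", "Crash on startup")}
fetched = [
    Item(1, "2024-05-01T12:00:00Z", "Crash on startup"),   # unchanged
    Item(2, "2024-05-02T09:30:00Z", "Fix startup crash"),  # new
]
changed = refresh_index(index, fetched)
print(changed)  # [2]
```

Running the same refresh a second time reprocesses nothing, which is what keeps repeated runs cheap.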

GraphRAG Zero is the third generation in a family of GraphRAG retrieval approaches from Microsoft Research. In Repo Mind Light it is used in a mode without cluster summarization: retrieval stays grounded in embeddings and chunk retrieval, with graph structure guiding candidate selection rather than relying on precomputed cluster summaries. The current GraphRAG Zero implementation is proprietary.
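
Because GraphRAG Zero is proprietary, its algorithm cannot be shown here. The toy sketch below only illustrates the general idea described above, in which embeddings drive ranking while graph links influence which chunks become candidates. The `retrieve` function, the seed-and-expand strategy, the two-dimensional vectors, and the chunk names are all hypothetical and should not be read as the real implementation.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, embeddings, links, seeds=1, top_k=2):
    """Pick seed chunks by similarity, widen the pool through graph links,
    then rank the widened pool by similarity again."""
    ranked = sorted(embeddings, key=lambda c: cosine(query, embeddings[c]), reverse=True)
    candidates = set(ranked[:seeds])
    for chunk in ranked[:seeds]:
        candidates.update(links.get(chunk, ()))
    return sorted(candidates, key=lambda c: cosine(query, embeddings[c]), reverse=True)[:top_k]


# Hypothetical chunk embeddings for a handful of issues and pull requests.
embeddings = {
    "issue-12": [1.0, 0.0],
    "pr-34": [0.6, 0.8],
    "issue-56": [0.0, 1.0],
    "pr-78": [0.9, 0.1],
}
links = {"issue-12": ["pr-34"]}  # e.g. the pull request that closed the issue

result = retrieve([1.0, 0.0], embeddings, links)
print(result)  # ['issue-12', 'pr-34']
```

In this toy, `pr-34` is returned ahead of the more similar but unlinked `pr-78`, because only chunks connected to the seed enter the candidate pool.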

At query time, Repo Mind Light combines the indexed issue and pull request context with live code and docs retrieval, then exposes that combined understanding through an MCP server that coding agents can call directly.

Repo Mind Light can be used as a standalone repository question-answering system. It can also be embedded naturally into Agentic Workflows, where an agent needs high-quality repository context as part of a larger task. That workflow integration is important, but it is an application of Repo Mind Light rather than its whole purpose.

In workflow settings, the integration looks like this:

a GitHub Actions job restores the most recent local index snapshot from GitHub Actions cache
Repo Mind Light refreshes that index incrementally for the current repository
the workflow starts a local MCP server from the published Docker image
an agent queries that MCP server for repository understanding and contact-finding questions

In practice, those local index files stay small. Even for repositories with thousands of issues and pull requests, the Repo Mind Light index is only around 100 MB, which fits comfortably inside GitHub Actions’ 10 GB cache budget. Because the GitHub Actions cache evicts least recently used entries and index maintenance is incremental, old snapshots fall out naturally, while each refresh only needs to process changed issues and pull requests.
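
The cache arithmetic and eviction behavior can be sketched as follows. The 100 MB and 10 GB figures come from the text above. The `SnapshotCache` class is a hypothetical toy that treats eviction as pure least-recently-used by total size, which is a simplifying assumption about how the real cache behaves, and the demo uses a deliberately tiny budget so that eviction is visible.

```python
from collections import OrderedDict

CACHE_BUDGET_MB = 10_000   # 10 GB budget cited above, in decimal megabytes
INDEX_SIZE_MB = 100        # approximate index size cited above

print(CACHE_BUDGET_MB // INDEX_SIZE_MB)  # 100 snapshots fit before eviction


class SnapshotCache:
    """Toy least-recently-used cache keyed by snapshot name."""

    def __init__(self, budget_mb: int) -> None:
        self.budget_mb = budget_mb
        self.entries = OrderedDict()  # snapshot name -> size in MB

    def save(self, key: str, size_mb: int) -> None:
        self.entries[key] = size_mb
        self.entries.move_to_end(key)
        # Evict from the least recently used end until within budget.
        while sum(self.entries.values()) > self.budget_mb:
            self.entries.popitem(last=False)

    def restore(self, key: str) -> bool:
        if key not in self.entries:
            return False
        self.entries.move_to_end(key)  # a hit marks the entry as recently used
        return True


cache = SnapshotCache(budget_mb=250)  # tiny budget, for demonstration only
for run in ("run-1", "run-2", "run-3"):
    cache.save(run, INDEX_SIZE_MB)
print(list(cache.entries))  # ['run-2', 'run-3']
```

The oldest snapshot drops out once the budget is exceeded, while the most recent ones remain available for the next run to restore.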

This lets repository understanding travel with the workflow instead of depending on a separately managed service, while still preserving the main capability: strong answers grounded in the repository’s code, discussions, and history.

Repo Mind Light is designed to give agents and humans the same kind of broad repository understanding explored in Repo Mind, but in a form that can run standalone or plug directly into operational and agentic workflows.

Evaluation - Internal Workflows And Agents

The most interesting evaluation for Repo Mind Light is not just whether it retrieves relevant context in the abstract, but whether it helps people and agents in live operational tasks.

One current use case is incident response for internal teams at GitHub. When incidents come in, those teams use agentic workflows backed by Repo Mind Light to mine repositories for related issues, likely contacts, code references, and historical discussions that can help them get started faster.

Feedback from those teams has been especially useful because it highlights what they actually value:

related issues that reveal prior investigation and adjacent failures
code references that anchor the problem in the current implementation
old issue comments and discussion threads that contain tribal knowledge that ordinary search often misses

That last category is particularly important. Large repositories accumulate a lot of operational memory in places that are technically searchable but practically hard to find at the right moment. Repo Mind Light is useful when it brings that hidden repository memory forward early enough to shape the response.

The same underlying capability is valuable for coding agents. Repo Mind showed that better repository context improves consistency and reliability, especially on broader multi-file tasks. Repo Mind Light is meant to preserve that same practical benefit for agents while moving to a simpler architecture that is easier to run inside real workflows.

This is where the shape of the system matters. Raw retrieval quality is necessary, but the real test is whether the system fits naturally into a live workflow, where reliability and ease of integration matter just as much.

Availability - Public Access

Repo Mind Light is available as a public Docker image that you can pull as ghcr.io/githubnext/repo-mind-light. The public package page is github.com/orgs/githubnext/packages/container/package/repo-mind-light. Public documentation lives at githubnext.com/projects/repo-mind-light/. That image is the main public way to use the system today.

The source repository remains private because the current GraphRAG Zero implementation is proprietary. The public Docker image exposes the practical interface to Repo Mind Light without exposing that source.
