How can GitHub enable better online collaboration for software development?
At GitHub Next we’ve been exploring this question for some time: In the Collaborative Workspaces concept design, we imagined cloud workspaces that integrate different modes of development (ideation, design, coding, etc.) into a single interface, and provide realtime multiplayer collaboration in all these modes. And in the GitHub Blocks prototype, we made the GitHub repository page customizable, with interactive blocks to tailor it to your team’s development process.
We haven’t stopped thinking about these ideas, and we’re back with another exploration: Realtime GitHub, a prototype of realtime multiplayer collaboration in your repository. With Realtime GitHub, you can share a link (“meet me in this branch!”) and instantly edit repository files together with your team. It’s still GitHub, so you can still work asynchronously—pull changes from another branch and merge your changes back when you’re ready.
While our north star is the fully-integrated cloud environments envisioned in Collaborative Workspaces, we know that we can’t replicate the features of a modern development environment overnight. But we want to build something that we can use for real work, in order to learn and improve our prototype. So we’ve focused so far on workflows involving editing rich-text documents: taking meeting notes, drafting site copy, writing design documents, and so on.
Making software with a team involves a lot of writing, especially in a widely-distributed team like Next. At Next we use Google Docs for most writing tasks. We like the ease and directness of rich-text editing; and we like being able to share documents with a URL, comment on them asynchronously, and collaborate on them in realtime—e.g. using a doc as a collaborative place to take notes and ask questions during a meeting.
But there are some things we don't like:
Google Docs live in their own silo, disconnected from repos and other project materials; it can be hard to find the doc you're looking for, and your data isn't easily accessible for search, backup, or use by other programs.
There's no way to version Docs along with other files, e.g. to update documentation along with code in a PR.
Docs are great for realtime collaboration when you want that, but sometimes you want to digest comments and revise a draft without others looking over your shoulder.
Realtime GitHub provides a rich-text editor like Google Docs (missing a lot of features, but good enough for our use cases), built on the excellent ProseMirror and Tiptap projects. Documents are stored as JSON files in your project's GitHub repo—so they can be edited on a branch and included in PRs, searched along with other files, backed up, or used in other ways.
And Realtime GitHub is collaborative like Google Docs, providing realtime multiplayer editing, cursor and selection presence, and threaded comments (with emoji reactions, a critical feature 😻!). But it also supports GitHub-style async collaboration: a branch in your repository becomes a distinct "room" for multiplayer collaboration; you can work privately or with others, and merge it back to the main branch when you're ready.
There are many approaches to collaborative editing, of which CRDTs are perhaps the best known, with well-engineered implementations like Yjs and Automerge available off the shelf. For Realtime GitHub we've taken a different direction. The vision we're working toward, of fully-integrated cloud environments for development, comes with some unusual requirements:
users collaborate on an entire codebase; we want to maintain a consistent view without requiring clients to load the whole tree up front, or handle updates for parts of the tree the user isn't accessing.
we want to provide asynchronous as well as realtime collaboration; it should be cheap to branch from any edit state to try out an idea, then merge it back when you're ready.
we want to support many different types of collaborative artifacts (code, documents, diagrams, etc.), potentially with different semantics around combining edits.
we want to support straightforward integration with external tools (e.g. build systems, or AI assistants) as participants in a collaboration.
Existing CRDT libraries don't seem to be designed with these requirements in mind; and some particular strengths of CRDTs (decentralization, handling a large number of collaborators) aren't requirements for us. So we've been exploring a different part of the design space to see where it leads.
Our approach takes inspiration from Git, as well as from the ProseMirror collaborative editing design, Replicache, Irmin, and others. The main idea is to think of each client as a Git clone, communicating local changes by pulling, rebasing, and pushing them to the server.
In more detail:
the state of a branch is represented by a commit with a corresponding hash (just as in ordinary Git), and the server maintains the authoritative current hash for the branch.
clients are notified when the server's branch hash changes, and pull the changes.
when a client makes a change, it applies it locally, then submits the change to the server, along with the most recent branch hash the client has (which may no longer match the server's hash).
if the server's branch hash is the same as the client's, the server applies the change, updates the branch hash, and notifies other clients of the new hash.
if the server's branch hash is different from the client's, then another client's change has slipped in; the first client must pull the second client's changes, rebase its own changes, and try again.
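The steps above amount to a compare-and-swap on the branch hash. Here is a minimal sketch of that loop, under loud assumptions: `Server`, `Client`, `commit_hash`, `push`, `pull`, and `submit` are hypothetical names, changes are plain strings appended to a list (so "rebasing" is just re-applying the pending change after the pulled ones), and change notifications are left out. It is not Realtime GitHub's actual implementation.

```python
from __future__ import annotations

import hashlib


def commit_hash(parent: str, change: str) -> str:
    """Stand-in for a Git commit hash: hash of the parent hash plus the change."""
    return hashlib.sha1(f"{parent}\n{change}".encode()).hexdigest()


class Server:
    """Holds the authoritative branch hash and the ordered log of accepted changes."""

    def __init__(self) -> None:
        self.log: list[str] = []
        self.head = commit_hash("", "root")

    def push(self, base: str, change: str) -> tuple[bool, str]:
        # Accept the change only if the client's base hash is still current.
        if base != self.head:
            return False, self.head
        self.log.append(change)
        self.head = commit_hash(self.head, change)
        return True, self.head


class Client:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.base = server.head  # most recent branch hash this client has
        self.seen = 0            # number of server changes already pulled
        self.doc: list[str] = []

    def pull(self) -> None:
        # Apply every change accepted since the last pull, then adopt the new hash.
        self.doc.extend(self.server.log[self.seen:])
        self.seen = len(self.server.log)
        self.base = self.server.head

    def submit(self, change: str) -> int:
        """Push a change, pulling and retrying when rejected. Returns the attempt count."""
        attempts = 0
        while True:
            attempts += 1
            accepted, head = self.server.push(self.base, change)
            if accepted:
                self.doc.append(change)
                self.seen += 1
                self.base = head
                return attempts
            self.pull()  # another client's change slipped in: pull, then try again


server = Server()
a, b = Client(server), Client(server)
a_attempts = a.submit("a1")  # accepted on the first try
b_attempts = b.submit("b1")  # rejected once (stale hash), then accepted
a.pull()
```

After this runs, both clients hold the same document and the same branch hash as the server; client B needed two attempts because client A's change arrived first.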
Git object graphs have some nice properties: Different Git objects (commits, trees, and blobs) have different hashes (with high probability), so hashes can be used as pointers. Git trees are a kind of hash tree, where a change anywhere in the tree produces a different hash for the root. And they are a kind of persistent data structure, where updating the tree produces a new tree with pointers into the old tree to the parts that haven't changed.
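To make the hash-tree and persistent-structure properties concrete, here is a small sketch of a content-addressed store. It is an illustration only: `store`, `put`, and `write` are hypothetical names, objects are hashed as JSON rather than in Git's real object format, and commits are omitted.

```python
from __future__ import annotations

import hashlib
import json

store: dict = {}  # content-addressed object store: hash -> object


def put(obj) -> str:
    """Store a blob (str) or a tree (dict of name -> hash) and return its hash."""
    data = json.dumps(obj, sort_keys=True).encode()
    h = hashlib.sha1(data).hexdigest()
    store[h] = obj
    return h


def write(tree_hash: str, path: list[str], content: str) -> str:
    """Return the hash of a new tree with `content` at `path`.

    Existing objects are never mutated; only the trees along `path` are rewritten.
    Intermediate directories are assumed to exist already.
    """
    tree = dict(store[tree_hash])  # copy the tree before changing one entry
    name, rest = path[0], path[1:]
    if rest:
        tree[name] = write(tree[name], rest, content)
    else:
        tree[name] = put(content)
    return put(tree)


notes = put({"meeting.doc": put("agenda")})
docs = put({"design.doc": put("draft 1")})
root1 = put({"notes": notes, "docs": docs})

root2 = write(root1, ["docs", "design.doc"], "draft 2")
```

Here `root2` differs from `root1` because a blob beneath it changed, while the new root still points at the same `notes` tree as the old one, and the old root remains readable as it was.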
This produces some nice properties for our collaboration approach: We can compare trees by comparing their root hashes, so it's easy to know when a client is out of sync with the server. Clients can cache objects permanently by hash, and updated trees point to unchanged parts; so it's easy for clients to fetch only the changed parts when they're notified of a new branch hash, switch branches, or reconnect after being disconnected. Clients can lazily fetch only the parts of the tree they care about (e.g. the document a user is editing); but since the branch hash identifies the whole tree, clients know how to fetch the correct version of other parts (e.g. if the user switches documents).
In the diagram above, clients A and B are viewing different parts of the tree (A views notes/meeting.doc, B views docs/design.doc); they each have a partial view of the repo on the server. When client B makes a change (shown in darker gray), it writes a new partial tree comprising a new blob, trees, and commit, and tries to update the hash for main on the server. If it's successful (no other client has made a change since main was at ae4), then client A is notified of the updated hash 7a2; but because the new tree 30c still points to 3bd, client A doesn't need to update any further: it isn't viewing the part of the tree that changed.
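The scenario described here can be sketched with a client that fetches objects lazily and caches them by hash. Everything in this sketch is an assumption made for illustration: `server_store`, `put`, and `LazyClient` are hypothetical names, a dictionary lookup stands in for a network request, and commit objects are left out so that a root tree hash plays the role of the branch hash.

```python
from __future__ import annotations

import hashlib
import json

server_store: dict = {}  # hash -> object, as held by the server


def put(obj) -> str:
    """Store a blob (str) or tree (dict of name -> hash) on the server."""
    h = hashlib.sha1(json.dumps(obj, sort_keys=True).encode()).hexdigest()
    server_store[h] = obj
    return h


class LazyClient:
    def __init__(self) -> None:
        self.cache: dict = {}  # objects are immutable, so entries never go stale
        self.fetches = 0

    def get(self, h: str):
        if h not in self.cache:
            self.cache[h] = server_store[h]  # stands in for a network request
            self.fetches += 1
        return self.cache[h]

    def read(self, root: str, path: list[str]):
        """Walk from the root tree down to `path`, fetching only what is missing."""
        obj = self.get(root)
        for name in path:
            obj = self.get(obj[name])
        return obj


notes = put({"meeting.doc": put("agenda")})
root1 = put({"notes": notes, "docs": put({"design.doc": put("draft 1")})})
# Client B edits docs/design.doc, producing a new root that reuses the notes tree.
root2 = put({"notes": notes, "docs": put({"design.doc": put("draft 2")})})

a = LazyClient()
a.read(root1, ["notes", "meeting.doc"])
fetches_initial = a.fetches       # root tree, notes tree, blob
a.read(root2, ["notes", "meeting.doc"])
fetches_after_update = a.fetches  # only the new root tree was fetched
design = a.read(root2, ["docs", "design.doc"])
fetches_after_switch = a.fetches  # docs tree and blob fetched on demand
```

Client A never downloads the docs subtree until it actually opens a file there, and when it does, the root hash it already holds tells it which version to fetch.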
Since our approach is just a way of using Git, it's straightforward to implement branching and merging for async collaboration; and it's straightforward to expose a branch via the filesystem to integrate external tools.
One way Realtime GitHub diverges from Git is how it does merges and rebases: Git knows about states of a codebase, but not the changes that get it from one state to another. When you merge or rebase one branch onto another, Git compares the states of the branches and their common ancestor and reconstructs changes using diff3, which works line-by-line and doesn't consider the syntax or semantics of the file.
For ProseMirror documents, which are stored as JSON, this isn't a good approach—it works at too coarse a grain for realtime edits (e.g. edits to different parts of the same line produce a conflict), and can produce invalid JSON at the syntactic or semantic levels. ProseMirror knows about changes (transactions) and how to combine them (by rebasing) to provide conflict-free multiplayer editing.
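As a rough illustration of what rebasing a change means, here is a plain-text stand-in. This is not ProseMirror's API: `Insert` and `rebase` are hypothetical names, the only operation is inserting text at a character offset, and ties at the same offset are resolved by placing the already-accepted change first.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str

    def apply(self, doc: str) -> str:
        return doc[: self.pos] + self.text + doc[self.pos :]


def rebase(local: Insert, remote: Insert) -> Insert:
    """Rewrite `local` so it can be applied after `remote` with the same intent."""
    if remote.pos <= local.pos:
        # The remote insert landed at or before our position: shift right.
        return Insert(local.pos + len(remote.text), local.text)
    return local


base = "hello world"
remote = Insert(0, "Oh, ")  # accepted by the server first
local = Insert(11, "!")     # made concurrently against `base`

# The client that lost the race applies the remote change, then its rebased change.
merged = rebase(local, remote).apply(remote.apply(base))

# The other order gives the same text, so both clients end up in agreement.
merged_other = rebase(remote, local).apply(local.apply(base))
```

Both edits touch the same line, which is exactly the case the line-by-line approach would report as a conflict; carrying the change itself along makes it possible to combine them.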
So Realtime GitHub uses two strategies:
for fine-grained realtime changes, clients write a ProseMirror transaction; this produces a new Git state on the server, and the transaction is also sent to other clients, which apply it to their local state (rebasing local changes if needed).
(not implemented yet) for coarse-grained changes (e.g. merging one branch into another), we do a Git-style three-way merge on the JSON structure of the document, producing a semantically valid document. Conflicts are marked as special document nodes, which are displayed in the editor UI for manual resolution.
This approach can be tailored to different datatypes depending on their semantics and supported operations.
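Since the coarse-grained strategy is described as not implemented yet, the following is purely a hypothetical sketch of what a three-way merge over JSON structure could look like. The `merge3` function, the `MISSING` sentinel, the shape of the conflict node, and the example document are all assumptions; real ProseMirror documents use ordered `content` arrays, which this sketch treats as atomic values.

```python
MISSING = object()  # marks a key that is absent on one side


def merge3(base, ours, theirs):
    """Three-way merge of JSON-like values; unresolved spots become conflict nodes."""
    if ours == theirs:
        return ours    # both sides agree (including both deleting)
    if ours == base:
        return theirs  # only their side changed
    if theirs == base:
        return ours    # only our side changed
    if all(isinstance(v, dict) for v in (base, ours, theirs)):
        merged = {}
        for key in sorted(set(base) | set(ours) | set(theirs)):
            value = merge3(
                base.get(key, MISSING),
                ours.get(key, MISSING),
                theirs.get(key, MISSING),
            )
            if value is not MISSING:
                merged[key] = value
        return merged
    # Both sides changed the same value differently: record both for manual resolution.
    return {
        "type": "conflict",
        "ours": None if ours is MISSING else ours,
        "theirs": None if theirs is MISSING else theirs,
    }


base = {"title": "Plan", "body": {"intro": "Hello", "details": "TBD"}}
ours = {"title": "Roadmap", "body": {"intro": "Hello", "details": "Ship in May"}}
theirs = {"title": "Plan", "body": {"intro": "Hi all", "details": "Ship in June"}}

merged = merge3(base, ours, theirs)
```

In this example the title and intro merge cleanly because only one side touched each, while `details` becomes a conflict node carrying both versions. The result is still valid JSON, unlike a line-based merge with inline conflict markers.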
We've been dog-fooding Realtime GitHub within Next [I'm using it to write this right now! — @jaked], but there's still a lot more to do to make it a useful and reliable tool for documents.
Beyond that, there are lots of directions to take it in pursuit of the Collaborative Workspaces vision. Here are some ideas and questions we've been mulling:
integrate the Realtime GitHub backend with other surfaces — how would it look as a VS Code plugin? could it support other experiments like Copilot Workspaces?
integrate more deeply with GitHub — would we use documents differently if they could embed data from issues, action runs, etc.?
integrate a drawing tool like tldraw for brainstorms, UI sketches, architecture diagrams, etc. — how could we use a canvas interface to organize development materials?
take ideas from GitHub Blocks to make the UI deeply customizable — how can we expose multiplayer sync as a primitive for custom UI components?
We're excited to find out where Realtime GitHub takes us 🚀!