Bastien Guerry from Software Heritage recently nerd-sniped me with an idea for a git-remote-swh that would let you git clone from a SWHID, pulling source code directly from Software Heritage’s archive by content hash rather than by URL. Building that means writing a git remote helper, which sent me back to the gitremote-helpers docs and down the rabbit hole of how many of these things already exist. I covered remote helpers briefly in my earlier post on extending git functionality, but the protocol deserves a closer look.
A git-remote-swh would need to be an executable on your $PATH so that git invokes it when it sees a URL like swh://. The helper and git talk over stdin/stdout using a text-based line protocol. For git-remote-swh the end goal would be something like:
git clone 'swh://swh:1:rev:676fe44740a14c4f0e09ef4a6dc335864e1727ca;origin=https://github.com/wikimedia/mediawiki'
Or using the double-colon form, which reads a bit cleaner when adding a remote:
git remote add archive 'swh::swh:1:rev:676fe44740a14c4f0e09ef4a6dc335864e1727ca;origin=https://github.com/wikimedia/mediawiki'
The SWHID identifies a specific revision by content hash, and the origin qualifier names the repository that revision came from. The helper would resolve the SWHID against Software Heritage’s archive and, if the revision isn’t archived yet, use the origin qualifier to ask Software Heritage to import it first, so the clone always comes through the archive and can be verified against the content hash. That turns git clone into a content-addressed fetch primitive rather than just a URL fetch, which is an interesting building block for reproducible builds and supply chain verification.
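As a rough sketch of the first step, here is how a helper might pick apart the argument git hands it. Git runs the helper with the remote name and the URL as arguments: with the swh:// form it passes the whole URL, and with the swh:: form it passes only the part after the double colon. The names parse_helper_url and ParsedSwhid are placeholders of mine, and the regex only covers the core SWHID shape plus simple key=value qualifiers.

```python
import re
from typing import Dict, NamedTuple
from urllib.parse import unquote

# Core SWHID shape: swh:1:<type>:<40 hex digits>. Qualifiers follow after ";".
SWHID_RE = re.compile(r"^swh:1:(snp|rel|rev|dir|cnt):([0-9a-f]{40})$")


class ParsedSwhid(NamedTuple):
    object_type: str            # e.g. "rev"
    object_id: str              # 40-character hex hash
    qualifiers: Dict[str, str]  # e.g. {"origin": "https://..."}


def parse_helper_url(url: str) -> ParsedSwhid:
    """Parse the URL argument passed to a hypothetical git-remote-swh."""
    # The swh:// form arrives with its scheme; the swh:: form arrives without it.
    if url.startswith("swh://"):
        url = url[len("swh://"):]
    core, *raw_qualifiers = url.split(";")
    match = SWHID_RE.match(core)
    if match is None:
        raise ValueError(f"not a valid SWHID: {core!r}")
    qualifiers = {}
    for item in raw_qualifiers:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed qualifier: {item!r}")
        # Qualifier values may be percent-encoded, so decode them.
        qualifiers[key] = unquote(value)
    return ParsedSwhid(match.group(1), match.group(2), qualifiers)
```

Both URL forms from the examples above parse to the same result, so the rest of the helper doesn’t need to care which one the user typed.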
Git opens by sending capabilities and the helper responds with what it can do: fetch, push, import, export, connect, or some combination. A SWHID helper would only need to advertise import and answer the list command, since Software Heritage is a read-only archive and its API returns objects individually rather than as packfiles. import lets the helper pull snapshots, revisions, trees, and blobs via the REST API and stream them into git’s fast-import format. That is easier to implement than fetch, where you’d have to reconstruct packfiles yourself for not much gain on a read-only helper. connect establishes a bidirectional pipe where git speaks its native pack protocol as if it were talking to a real git server, but that only makes sense when the remote actually speaks git’s wire protocol.
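The opening exchange could look something like the sketch below. It only handles the capabilities command and treats a blank line as the end of the conversation, which is a simplification. The serve function and the refs/swh/heads/* private namespace in the refspec line are assumptions of mine, not anything Software Heritage or git prescribes.

```python
import io

# What this hypothetical helper advertises. The refspec line tells git which
# private namespace the fast-import stream will write to; the namespace name
# here is an arbitrary choice for illustration.
CAPABILITIES = ["import", "refspec refs/heads/*:refs/swh/heads/*"]


def serve(inp, out) -> None:
    """Read commands line by line and answer the ones we understand."""
    for line in inp:
        command = line.rstrip("\n")
        if command == "":
            break  # simplification: a blank line ends the session
        if command == "capabilities":
            for capability in CAPABILITIES:
                out.write(capability + "\n")
            out.write("\n")  # a blank line terminates the capability list
            out.flush()      # git is waiting on the other end of the pipe
        else:
            raise ValueError(f"unsupported command: {command!r}")
```

A real helper would call serve(sys.stdin, sys.stdout) from its entry point and grow extra branches for list and import.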
After capability negotiation, git sends list to get the remote’s refs, then issues import commands in batches, each batch ending with a blank line. For a SWHID helper, list would resolve the SWHID against Software Heritage’s API and translate the archive’s snapshot into a ref listing, and import would then stream the objects through as fast-import data. Status lines like ok refs/heads/main or error refs/heads/main <reason> are how a helper answers push, which a read-only helper never has to handle.
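To make the list and import steps a little more concrete, here is a sketch of the three pieces involved: formatting the ref listing, collecting a batch of import commands, and rendering a tiny fast-import stream. All function names are hypothetical, the stream holds a single root commit with inline file contents, and nothing here talks to the Software Heritage API. Paths that need quoting and author metadata are left out.

```python
import io
from typing import Dict, List


def format_ref_listing(refs: Dict[str, str]) -> str:
    """Answer `list`: one '<hash> <refname>' line per ref, then a blank line."""
    lines = [f"{object_id} {name}\n" for name, object_id in sorted(refs.items())]
    return "".join(lines) + "\n"


def read_import_batch(first_line: str, inp) -> List[str]:
    """Collect ref names from consecutive `import <ref>` lines up to a blank line."""
    refs = []
    line = first_line
    while line.strip():
        command, _, ref = line.strip().partition(" ")
        if command != "import":
            raise ValueError(f"unexpected command in import batch: {command!r}")
        refs.append(ref)
        line = inp.readline()  # returns "" at end of input, which ends the loop
    return refs


def render_fast_import(ref: str, committer: str, timestamp: int,
                       message: str, files: Dict[str, bytes]) -> bytes:
    """Render one root commit as a fast-import stream with inline blobs."""
    msg = message.encode("utf-8")
    out = [b"feature done\n"]  # makes fast-import insist on the closing `done`
    out.append(f"commit {ref}\n".encode())
    out.append(f"committer {committer} {timestamp} +0000\n".encode())
    # `data <n>` counts bytes, not characters.
    out.append(b"data %d\n" % len(msg) + msg + b"\n")
    for path in sorted(files):
        content = files[path]
        out.append(f"M 100644 inline {path}\n".encode())
        out.append(b"data %d\n" % len(content) + content + b"\n")
    out.append(b"\n")
    out.append(b"done\n")
    return b"".join(out)
```

One caveat with this approach: fast-import recomputes object hashes from the data it is fed, so the helper would have to reproduce every commit field exactly (author, timestamps, parents, message) for the imported revision to hash back to the SWHID it was asked for.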
Writing a remote helper from scratch is more work than writing a git subcommand but less work than building a full git server. Most implementations are a few hundred to a few thousand lines of code, and the hardest part is mapping git’s object model onto whatever storage backend you’re targeting. Software Heritage already stores git objects natively, so a SWHID helper might be one of the easier ones to build.
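Part of why the mapping is easy is that git’s object ids are just SHA-1 hashes over a short header plus the object body, and as I understand it a SWHID for a content object uses the same hash as the corresponding git blob. The sketch below shows that computation. The function names are mine, and I have only checked the blob case; revisions and directories are assumed to follow the same pattern for git-originated history.

```python
import hashlib


def git_object_id(object_type: str, payload: bytes) -> str:
    """Hash an object the way git does: '<type> <size>\\0' followed by the body."""
    header = f"{object_type} {len(payload)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def content_swhid(data: bytes) -> str:
    """Build the SWHID for a file's contents from its git blob hash."""
    return "swh:1:cnt:" + git_object_id("blob", data)
```

For example, content_swhid(b"") gives swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the familiar hash of git’s empty blob. A helper could run the same check on everything it receives before handing it to git.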
Built-in
Git ships with remote helpers for its standard network transports, and they follow the same protocol as everything else below.
- git-remote-http / git-remote-https implement the smart HTTP protocol that most hosted git services use
- git-remote-ftp / git-remote-ftps fetch over FTP, though this is rarely used in practice
- git-remote-ext pipes git’s protocol through an arbitrary command, which makes it a building block for custom transports without writing a full remote helper
Cloud and object storage
- git-remote-dropbox stores git repos in Dropbox using the Dropbox API, and is one of the better documented remote helpers if you’re looking for implementation examples.
- git-remote-s3 from AWS Labs uses S3 as a serverless git server with LFS support. Written in Python. There are several other S3-backed helpers floating around but this is the most complete.
- git-remote-codecommit provides authenticated access to AWS CodeCommit repositories without needing to configure SSH keys or manage HTTPS credentials manually.
- git-remote-rclone pushes and fetches through rclone, so it gets rclone’s 70+ cloud storage providers for free: Google Drive, Azure Blob Storage, Backblaze B2, and the rest.
Encryption
- git-remote-gcrypt encrypts an entire git repository with GPG before pushing it to any standard git remote. The remote stores only encrypted data, so you can use an untrusted host as a private git server with multiple participants sharing access through GPG’s key infrastructure.
- git-remote-encrypted takes a different approach where each git object is individually encrypted before being stored as a file in a separate git repository. The remote looks like a normal git repo full of encrypted blobs.
- git-remote-keybase was part of the Keybase client and stored encrypted git repos on Keybase’s infrastructure using the Keybase identity and key management system. Keybase was acquired by Zoom in 2020 and the service has been winding down since.
Content-addressed storage
- git-remote-ipfs maps git objects onto IPFS, storing repositories in a content-addressed merkle DAG. Written in Go using the IPFS API. Several other IPFS-based remote helpers exist (dhappy/git-remote-ipfs, git-remote-ipld, Git-IPFS-Remote-Bridge) taking slightly different approaches to the same problem.
VCS bridges
- git-remote-hg lets you clone and push to Mercurial repositories transparently using git commands, converting between the two object models on the fly using the fast-import/fast-export capabilities.
- git-remote-bzr does the same for Bazaar repositories. Like git-remote-hg, it was written by Felipe Contreras.
- git-remote-mediawiki treats a MediaWiki instance as a git remote where each wiki page becomes a file. You can clone a wiki, edit pages locally with your text editor, and push changes back. Written in Perl.
P2P and decentralised
- git-remote-gittorrent distributed git over BitTorrent, using a DHT for peer discovery and Bitcoin’s blockchain for user identity. A research prototype from 2015 that demonstrated the concept but never saw wide adoption.
- git-remote-nostr publishes git objects as Nostr events, using the relay network for distribution.
- git-remote-blossom builds on the Blossom protocol, a Nostr-adjacent system for content-addressed blob storage.
- git-remote-ssb stored repositories on Secure Scuttlebutt, a gossip-based peer-to-peer protocol where data replicates through social connections rather than central servers. Dormant since the SSB ecosystem contracted.
Transport wrappers
These don’t provide their own storage or collaboration model. They wrap existing git remotes with a different transport layer, which puts them closer in spirit to the built-in git-remote-ext than to the storage-backed helpers above.
- git-remote-tor routes git traffic through Tor hidden services. Written in Rust.
Blockchain
- git-remote-gitopia pushes repositories to Gitopia, a code collaboration platform built on the Cosmos blockchain where repository metadata and access control live on-chain.
Other storage backends
- git-remote-sqlite stores git objects as rows in a SQLite database, which can then be replicated using tools like Litestream.
- git-remote-restic bridges git and restic backup repositories, inheriting restic’s encryption and support for dozens of storage backends.
- git-remote-couch stores git repos in CouchDB, gaining CouchDB’s replication and conflict resolution for free.
- git-remote-grave pushes repositories into a content-addressable store that deduplicates across multiple repos.
If I’ve missed one, reach out on Mastodon or submit a pull request on GitHub.