<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://nesbitt.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://nesbitt.io/" rel="alternate" type="text/html" /><updated>2026-03-05T12:16:56+00:00</updated><id>https://nesbitt.io/feed.xml</id><title type="html">Andrew Nesbitt</title><subtitle>Package management and open source metadata expert. Building Ecosyste.ms, open datasets and tools for critical open source infrastructure.</subtitle><author><name>Andrew Nesbitt</name><email>andrew@ecosyste.ms</email></author><entry><title type="html">Package Manager Magic Files</title><link href="https://nesbitt.io/2026/03/05/package-manager-magic-files.html" rel="alternate" type="text/html" title="Package Manager Magic Files" /><published>2026-03-05T10:00:00+00:00</published><updated>2026-03-05T10:00:00+00:00</updated><id>https://nesbitt.io/2026/03/05/package-manager-magic-files</id><content type="html" xml:base="https://nesbitt.io/2026/03/05/package-manager-magic-files.html"><![CDATA[<p>A follow-up to my post on <a href="/2026/02/05/git-magic-files.html">git’s magic files</a>. Most package managers have a manifest and a lockfile, and most developers stop there. But across the ecosystems I track on <a href="https://ecosyste.ms">ecosyste.ms</a>, package managers check for dozens of other files beyond the manifest and lockfile, controlling where packages come from, what gets published, how versions resolve, and what code runs during installation. These files tend to be poorly documented, inconsistently named, and useful once you know they exist.</p>

<h3 id="configuration">Configuration</h3>

<p>Registry URLs, auth tokens, proxy settings, cache behavior. Every package manager has a way to configure these, and they almost always live outside the manifest.</p>

<p><a href="https://docs.npmjs.com/cli/v11/configuring-npm/npmrc"><code class="language-plaintext highlighter-rouge">.npmrc</code></a> is an INI-format file that can live at the project root, in your home directory, or globally. npm and pnpm both read it. It controls the registry URL, auth tokens for private registries, proxy settings, and dozens of install behaviors like <code class="language-plaintext highlighter-rouge">legacy-peer-deps</code> and <code class="language-plaintext highlighter-rouge">engine-strict</code>. There’s a footgun here: if an <code class="language-plaintext highlighter-rouge">.npmrc</code> ends up inside a published package tarball, npm will silently apply those settings when someone installs your package in their project. Less well known are the <code class="language-plaintext highlighter-rouge">shell</code>, <code class="language-plaintext highlighter-rouge">script-shell</code>, and <code class="language-plaintext highlighter-rouge">git</code> settings, which point at arbitrary executables that npm will invoke during lifecycle scripts and git operations. <a href="https://snyk.io/blog/exploring-npm-security-vulnerabilities/">Research by Snyk and Cider Security</a> showed these as viable attack vectors: a malicious <code class="language-plaintext highlighter-rouge">.npmrc</code> committed to a repository can redirect script execution without touching <code class="language-plaintext highlighter-rouge">package.json</code> at all.</p>
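<p>A sketch of a project-level <code class="language-plaintext highlighter-rouge">.npmrc</code> (the scope, internal registry hostname, and token variable are illustrative):</p>

<div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code>registry=https://registry.npmjs.org/
@acme:registry=https://npm.internal.example.com/
//npm.internal.example.com/:_authToken=${NPM_TOKEN}
engine-strict=true
</code></pre></div></div>

<p>npm expands <code class="language-plaintext highlighter-rouge">${NPM_TOKEN}</code> from the environment at runtime, which is how you keep real tokens out of the committed file.</p>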

<p><a href="https://yarnpkg.com/configuration/yarnrc"><code class="language-plaintext highlighter-rouge">.yarnrc.yml</code></a> replaced the INI format of Yarn Classic’s <code class="language-plaintext highlighter-rouge">.yarnrc</code>. It configures which linker to use (PnP, pnpm-style, or traditional <code class="language-plaintext highlighter-rouge">node_modules</code>), registry auth, and the <code class="language-plaintext highlighter-rouge">pnpMode</code> setting that controls how strictly Yarn enforces its dependency resolution. The <code class="language-plaintext highlighter-rouge">yarnPath</code> setting is security-sensitive: it points to a JavaScript file that Yarn will execute as its own binary, so a malicious <code class="language-plaintext highlighter-rouge">.yarnrc.yml</code> can hijack the entire package manager.</p>
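<p>A minimal <code class="language-plaintext highlighter-rouge">.yarnrc.yml</code> sketch (the scope and hostname are illustrative; Yarn interpolates <code class="language-plaintext highlighter-rouge">${NPM_TOKEN}</code> from the environment):</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nodeLinker: pnp # or "pnpm" or "node-modules"

npmScopes:
  acme:
    npmRegistryServer: "https://npm.internal.example.com"
    npmAuthToken: "${NPM_TOKEN}"
</code></pre></div></div>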

<p><a href="https://bun.sh/docs/runtime/bunfig"><code class="language-plaintext highlighter-rouge">bunfig.toml</code></a> is Bun’s config file, covering registry config, install behavior, and the test runner all in one TOML file.</p>

<p>pip reads <a href="https://pip.pypa.io/en/stable/topics/configuration/"><code class="language-plaintext highlighter-rouge">pip.conf</code></a> on Unix and <code class="language-plaintext highlighter-rouge">pip.ini</code> on Windows, searching <code class="language-plaintext highlighter-rouge">~/.config/pip/pip.conf</code>, <code class="language-plaintext highlighter-rouge">~/.pip/pip.conf</code>, and <code class="language-plaintext highlighter-rouge">/etc/pip.conf</code>. The <code class="language-plaintext highlighter-rouge">PIP_CONFIG_FILE</code> environment variable can override all of these, or point to <code class="language-plaintext highlighter-rouge">/dev/null</code> to disable config entirely. Malformed config files are silently ignored rather than producing errors, so you can carry broken configuration for months without realizing it.</p>
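<p>A typical <code class="language-plaintext highlighter-rouge">pip.conf</code> pointing at a private index (the hostname is illustrative); section names correspond to pip commands, so <code class="language-plaintext highlighter-rouge">[install]</code> options apply only to <code class="language-plaintext highlighter-rouge">pip install</code>:</p>

<div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[global]
index-url = https://pypi.internal.example.com/simple
timeout = 60

[install]
no-compile = true
</code></pre></div></div>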

<p>uv reads <a href="https://docs.astral.sh/uv/concepts/configuration-files/"><code class="language-plaintext highlighter-rouge">uv.toml</code></a> or the <code class="language-plaintext highlighter-rouge">[tool.uv]</code> section in <code class="language-plaintext highlighter-rouge">pyproject.toml</code>; if both are present, <code class="language-plaintext highlighter-rouge">uv.toml</code> takes precedence.</p>

<p><a href="https://bundler.io/man/bundle-config.1.html"><code class="language-plaintext highlighter-rouge">.bundle/config</code></a> stores Bundler’s per-project config, created by <code class="language-plaintext highlighter-rouge">bundle config set</code>. RubyGems has its own <code class="language-plaintext highlighter-rouge">.gemrc</code> file, which Bundler deliberately ignores because it calls <code class="language-plaintext highlighter-rouge">Gem::Installer</code> directly. The credentials file at <code class="language-plaintext highlighter-rouge">~/.gem/credentials</code> must have <code class="language-plaintext highlighter-rouge">0600</code> permissions or RubyGems refuses to read it.</p>

<p><a href="https://doc.rust-lang.org/cargo/reference/config.html"><code class="language-plaintext highlighter-rouge">.cargo/config.toml</code></a> is the most interesting of the bunch because it’s hierarchical: Cargo walks up the directory tree merging config files as it goes, so you can have workspace-level settings that individual crates inherit. It controls registries, proxy settings, build targets, and command aliases. A backwards-compatibility quirk means Cargo still reads <code class="language-plaintext highlighter-rouge">.cargo/config</code> without the <code class="language-plaintext highlighter-rouge">.toml</code> extension, and if both files exist, the extensionless one wins, which is an easy way to have a stale config file shadow your actual settings.</p>
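<p>A sketch of a workspace-level <code class="language-plaintext highlighter-rouge">.cargo/config.toml</code> (the registry name and index URL are illustrative):</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[registries.internal]
index = "sparse+https://crates.internal.example.com/index/"

[net]
git-fetch-with-cli = true

[alias]
br = "build --release"
</code></pre></div></div>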

<p><a href="https://docs.conda.io/projects/conda/en/stable/user-guide/configuration/use-condarc.html"><code class="language-plaintext highlighter-rouge">.condarc</code></a> is searched at six different paths from <code class="language-plaintext highlighter-rouge">/etc/conda/.condarc</code> through <code class="language-plaintext highlighter-rouge">~/.condarc</code> to <code class="language-plaintext highlighter-rouge">$CONDA_PREFIX/.condarc</code>, plus <code class="language-plaintext highlighter-rouge">.d/</code> directories at each level for drop-in fragments, and you can put one inside a specific conda environment to configure just that environment. Every setting also has a <code class="language-plaintext highlighter-rouge">CONDA_UPPER_SNAKE_CASE</code> environment variable equivalent.</p>
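<p>A minimal <code class="language-plaintext highlighter-rouge">.condarc</code>:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>channels:
  - conda-forge
  - defaults
channel_priority: strict
</code></pre></div></div>

<p>The same settings can come from the environment, e.g. <code class="language-plaintext highlighter-rouge">CONDA_CHANNEL_PRIORITY=strict</code>.</p>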

<p><a href="https://maven.apache.org/settings.html"><code class="language-plaintext highlighter-rouge">~/.m2/settings.xml</code></a> holds Maven’s repositories and credentials, plus <code class="language-plaintext highlighter-rouge">~/.m2/settings-security.xml</code> stores the master password used to decrypt encrypted passwords in the main settings file. Most developers don’t know <code class="language-plaintext highlighter-rouge">settings-security.xml</code> exists. <code class="language-plaintext highlighter-rouge">.mvn/maven.config</code> holds per-project default CLI arguments (since Maven 3.9.0, each arg must be on its own line), and <code class="language-plaintext highlighter-rouge">.mvn/jvm.config</code> sets JVM options.</p>
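<p>A sketch of the <code class="language-plaintext highlighter-rouge">&lt;servers&gt;</code> section of <code class="language-plaintext highlighter-rouge">~/.m2/settings.xml</code> (the server id and encrypted value are illustrative; the braced form is produced by <code class="language-plaintext highlighter-rouge">mvn --encrypt-password</code> and decrypted with the master password from <code class="language-plaintext highlighter-rouge">settings-security.xml</code>):</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;settings&gt;
  &lt;servers&gt;
    &lt;server&gt;
      &lt;id&gt;internal-releases&lt;/id&gt;
      &lt;username&gt;deploy&lt;/username&gt;
      &lt;password&gt;{COQLCE6DU6GtcS5P=}&lt;/password&gt;
    &lt;/server&gt;
  &lt;/servers&gt;
&lt;/settings&gt;
</code></pre></div></div>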

<p><a href="https://docs.gradle.org/current/userguide/build_environment.html"><code class="language-plaintext highlighter-rouge">gradle.properties</code></a> lives at both project and user level. Init scripts in <code class="language-plaintext highlighter-rouge">~/.gradle/init.d/</code> run before every build, which is how enterprises inject internal repository configurations across all projects.</p>

<p><a href="https://getcomposer.org/doc/articles/authentication-for-private-packages.md"><code class="language-plaintext highlighter-rouge">auth.json</code></a> keeps Composer credentials separate from <code class="language-plaintext highlighter-rouge">composer.json</code> (per-project or at <code class="language-plaintext highlighter-rouge">~/.composer/auth.json</code>) so you can gitignore it.</p>

<p><a href="https://learn.microsoft.com/en-us/nuget/reference/nuget-config-file"><code class="language-plaintext highlighter-rouge">nuget.config</code></a> is XML, searched hierarchically from the project directory up to the drive root and then at the user level. As with pip, malformed config files are silently ignored.</p>

<p><a href="https://docs.deno.com/runtime/fundamentals/configuration/"><code class="language-plaintext highlighter-rouge">deno.json</code></a> is both configuration and import map, controlling formatting, linting, test config, lock file behavior, and dependency imports in a single file. If you have a separate <code class="language-plaintext highlighter-rouge">import_map.json</code>, Deno reads that too, though the trend is toward folding everything into <code class="language-plaintext highlighter-rouge">deno.json</code>.</p>

<h3 id="publishing">Publishing</h3>

<p>What gets included or excluded when you publish a package. People accidentally ship secrets and accidentally omit files they need in roughly equal measure.</p>

<p><a href="https://docs.npmjs.com/cli/v11/configuring-npm/package-json#files"><code class="language-plaintext highlighter-rouge">.npmignore</code></a> works like <code class="language-plaintext highlighter-rouge">.gitignore</code> but for <code class="language-plaintext highlighter-rouge">npm pack</code> and <code class="language-plaintext highlighter-rouge">npm publish</code>. If it doesn’t exist, npm falls back to <code class="language-plaintext highlighter-rouge">.gitignore</code>. But if you create an <code class="language-plaintext highlighter-rouge">.npmignore</code>, it completely replaces <code class="language-plaintext highlighter-rouge">.gitignore</code> for packaging purposes; the two are not merged. This means patterns you had in <code class="language-plaintext highlighter-rouge">.gitignore</code> to keep <code class="language-plaintext highlighter-rouge">.env</code> files or credentials out of version control no longer protect you from publishing them.
<code class="language-plaintext highlighter-rouge">npm-shrinkwrap.json</code> is identical in format to <code class="language-plaintext highlighter-rouge">package-lock.json</code> but gets included inside published tarballs. It’s the only npm lock file that travels with a published package, intended for CLI tools and daemons that want locked transitive dependencies for their consumers rather than letting the consumer’s resolver pick versions.</p>
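<p>The safer pattern is the allowlist: a <code class="language-plaintext highlighter-rouge">files</code> field in <code class="language-plaintext highlighter-rouge">package.json</code> ships only what you name, and <code class="language-plaintext highlighter-rouge">npm pack --dry-run</code> lists exactly what would end up in the tarball before you publish. A sketch:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  "files": [
    "dist/",
    "README.md"
  ]
}
</code></pre></div></div>

<p>npm always includes <code class="language-plaintext highlighter-rouge">package.json</code> and the README and license files regardless of this list.</p>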

<p><a href="https://packaging.python.org/en/latest/guides/using-manifest-in/"><code class="language-plaintext highlighter-rouge">MANIFEST.in</code></a> controls what goes into a Python source distribution using directives like <code class="language-plaintext highlighter-rouge">include</code>, <code class="language-plaintext highlighter-rouge">exclude</code>, <code class="language-plaintext highlighter-rouge">recursive-include</code>, <code class="language-plaintext highlighter-rouge">graft</code>, and <code class="language-plaintext highlighter-rouge">prune</code>. It only matters for sdists, not wheels.</p>
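<p>A representative <code class="language-plaintext highlighter-rouge">MANIFEST.in</code> (the paths are illustrative):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>include LICENSE
graft src/mypkg/data
recursive-include docs *.rst
prune tests
global-exclude *.py[cod]
</code></pre></div></div>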

<p><code class="language-plaintext highlighter-rouge">.helmignore</code> controls what gets excluded when packaging a Helm chart, following <code class="language-plaintext highlighter-rouge">.gitignore</code> syntax.</p>

<h3 id="workspaces">Workspaces</h3>

<p>Monorepo topology and inter-package relationships. The JavaScript ecosystem has the most options here, which probably says something about the JavaScript ecosystem.</p>

<p><a href="https://pnpm.io/pnpm-workspace_yaml"><code class="language-plaintext highlighter-rouge">pnpm-workspace.yaml</code></a> defines workspace membership with a <code class="language-plaintext highlighter-rouge">packages:</code> field. Where npm and Yarn put this in a <code class="language-plaintext highlighter-rouge">workspaces</code> field in <code class="language-plaintext highlighter-rouge">package.json</code>, pnpm requires a separate file.</p>
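<p>A typical <code class="language-plaintext highlighter-rouge">pnpm-workspace.yaml</code>, with globs and a negated exclusion (directory names are illustrative):</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>packages:
  - "packages/*"
  - "apps/*"
  - "!**/fixtures/**"
</code></pre></div></div>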

<p><code class="language-plaintext highlighter-rouge">lerna.json</code> handles versioning and publishing across workspace packages, though Lerna’s remaining value is mostly the publishing workflow (changelogs, version bumps). <code class="language-plaintext highlighter-rouge">nx.json</code> and <code class="language-plaintext highlighter-rouge">turbo.json</code> configure task pipelines and caching for Nx and Turborepo monorepo builds.</p>

<p><a href="https://go.dev/ref/mod#workspaces"><code class="language-plaintext highlighter-rouge">go.work</code></a> (added in Go 1.18) lists <code class="language-plaintext highlighter-rouge">use</code> directives pointing to local module directories so you can develop across multiple modules without <code class="language-plaintext highlighter-rouge">replace</code> directives scattered through your <code class="language-plaintext highlighter-rouge">go.mod</code> files. It generates a companion <code class="language-plaintext highlighter-rouge">go.work.sum</code> checksum file.</p>
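<p>A small <code class="language-plaintext highlighter-rouge">go.work</code> sketch (the module directories are illustrative):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>go 1.22

use (
    ./api
    ./shared/logging
)
</code></pre></div></div>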

<p><code class="language-plaintext highlighter-rouge">settings.gradle</code> / <code class="language-plaintext highlighter-rouge">settings.gradle.kts</code> declares all Gradle subprojects with <code class="language-plaintext highlighter-rouge">include</code> statements and is mandatory for multi-project builds. Maven uses <code class="language-plaintext highlighter-rouge">&lt;modules&gt;</code> in a parent <code class="language-plaintext highlighter-rouge">pom.xml</code>.</p>

<h3 id="overrides-and-resolution">Overrides and resolution</h3>

<p>When a transitive dependency has a bug or a security vulnerability and you can’t wait for every package in the chain to release an update, override files let you force a specific version or patch a package in place. Most developers don’t know these mechanisms exist and spend hours working around dependency conflicts that a single config line would fix.</p>

<p>In the JavaScript ecosystem, npm has <a href="https://docs.npmjs.com/cli/v11/configuring-npm/package-json#overrides"><code class="language-plaintext highlighter-rouge">overrides</code></a>, Yarn has <a href="https://yarnpkg.com/configuration/manifest#resolutions"><code class="language-plaintext highlighter-rouge">resolutions</code></a>, and pnpm has <a href="https://pnpm.io/package_json#pnpmoverrides"><code class="language-plaintext highlighter-rouge">pnpm.overrides</code></a>, all fields in <code class="language-plaintext highlighter-rouge">package.json</code> that force specific versions of transitive dependencies. Yarn Berry and pnpm also support patching dependencies in place: Yarn’s <code class="language-plaintext highlighter-rouge">patch:</code> protocol stores diff files in <code class="language-plaintext highlighter-rouge">.yarn/patches/</code>, and pnpm’s <code class="language-plaintext highlighter-rouge">pnpm.patchedDependencies</code> references diffs in a <code class="language-plaintext highlighter-rouge">patches/</code> directory, built into the workflow via <code class="language-plaintext highlighter-rouge">pnpm patch</code> and <code class="language-plaintext highlighter-rouge">pnpm patch-commit</code>.</p>
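<p>In npm’s syntax, a bare key overrides a package everywhere in the tree, while a nested key scopes the override to one dependent (the versions here are illustrative):</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  "overrides": {
    "minimist": "1.2.8",
    "webpack": {
      "glob-parent": "6.0.2"
    }
  }
}
</code></pre></div></div>

<p>Yarn’s <code class="language-plaintext highlighter-rouge">resolutions</code> field takes a similar shape with its own selector syntax.</p>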

<p><a href="https://pnpm.io/pnpmfile"><code class="language-plaintext highlighter-rouge">.pnpmfile.cjs</code></a> goes further than any of these: the <code class="language-plaintext highlighter-rouge">readPackage</code> hook lets you programmatically rewrite any package’s <code class="language-plaintext highlighter-rouge">package.json</code> at install time, and <code class="language-plaintext highlighter-rouge">afterAllResolved</code> can modify the lockfile after resolution. It’s the nuclear option for dependency problems, living next to the lockfile and running before anything gets installed.</p>
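<p>A sketch of a <code class="language-plaintext highlighter-rouge">.pnpmfile.cjs</code> (the package names and version are illustrative):</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// .pnpmfile.cjs — runs at install time, before anything lands in node_modules
function readPackage(pkg, context) {
  if (pkg.name === 'some-unmaintained-lib') {
    // Force a patched transitive dependency
    pkg.dependencies = { ...pkg.dependencies, lodash: '4.17.21' };
    context.log('pinned lodash for ' + pkg.name);
  }
  return pkg;
}

module.exports = {
  hooks: { readPackage },
};
</code></pre></div></div>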

<p><a href="https://pip.pypa.io/en/stable/user_guide/#constraints-files"><code class="language-plaintext highlighter-rouge">constraints.txt</code></a> is used via <code class="language-plaintext highlighter-rouge">pip install -c constraints.txt</code> to pin versions of packages without triggering their installation. It’s been available since pip 7.1, yet almost nobody uses it despite being exactly what large organizations need for base image management and reproducible environments. uv has <code class="language-plaintext highlighter-rouge">override-dependencies</code> in <code class="language-plaintext highlighter-rouge">[tool.uv]</code> for the same purpose with better ergonomics.</p>
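<p>A constraints file looks like a requirements file but only takes effect for packages that something else already requires, applied with <code class="language-plaintext highlighter-rouge">pip install -r requirements.txt -c constraints.txt</code> (the versions here are illustrative):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># constraints.txt
urllib3==2.2.2
certifi==2024.7.4
</code></pre></div></div>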

<p><a href="https://learn.microsoft.com/en-us/nuget/consume-packages/central-package-management"><code class="language-plaintext highlighter-rouge">Directory.Packages.props</code></a> is worth knowing about if you work in .NET. NuGet’s Central Package Management (6.4+) lets you put a single file at the repo root that sets <code class="language-plaintext highlighter-rouge">&lt;PackageVersion&gt;</code> for all projects, so individual <code class="language-plaintext highlighter-rouge">.csproj</code> files use <code class="language-plaintext highlighter-rouge">&lt;PackageReference&gt;</code> without version numbers. It eliminates version drift across large solutions and is one of the better implementations of centralized version management I’ve seen. <code class="language-plaintext highlighter-rouge">Directory.Build.props</code> can inject shared package references into all projects too.</p>
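<p>A minimal <code class="language-plaintext highlighter-rouge">Directory.Packages.props</code> sketch (the package and version are illustrative):</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;Project&gt;
  &lt;PropertyGroup&gt;
    &lt;ManagePackageVersionsCentrally&gt;true&lt;/ManagePackageVersionsCentrally&gt;
  &lt;/PropertyGroup&gt;
  &lt;ItemGroup&gt;
    &lt;PackageVersion Include="Newtonsoft.Json" Version="13.0.3" /&gt;
  &lt;/ItemGroup&gt;
&lt;/Project&gt;
</code></pre></div></div>

<p>Individual projects then declare <code class="language-plaintext highlighter-rouge">&lt;PackageReference Include="Newtonsoft.Json" /&gt;</code> with no version attribute.</p>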

<p><a href="https://docs.gradle.org/current/userguide/version_catalogs.html"><code class="language-plaintext highlighter-rouge">gradle/libs.versions.toml</code></a> is Gradle’s version catalog, with sections for <code class="language-plaintext highlighter-rouge">[versions]</code>, <code class="language-plaintext highlighter-rouge">[libraries]</code>, <code class="language-plaintext highlighter-rouge">[bundles]</code>, and <code class="language-plaintext highlighter-rouge">[plugins]</code>, referenced in build files as typed accessors like <code class="language-plaintext highlighter-rouge">libs.someLibrary</code>.</p>
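<p>A small catalog sketch (the coordinates are illustrative):</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[versions]
okhttp = "4.12.0"

[libraries]
okhttp = { module = "com.squareup.okhttp3:okhttp", version.ref = "okhttp" }

[bundles]
networking = ["okhttp"]
</code></pre></div></div>

<p>Build files then use <code class="language-plaintext highlighter-rouge">implementation(libs.okhttp)</code> or <code class="language-plaintext highlighter-rouge">implementation(libs.bundles.networking)</code>.</p>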

<p><code class="language-plaintext highlighter-rouge">cabal.project</code> supports <code class="language-plaintext highlighter-rouge">constraints:</code> stanzas for pinning transitive Haskell deps, and <code class="language-plaintext highlighter-rouge">cabal.project.freeze</code> locks everything down.</p>

<h3 id="vendoring-and-integrity">Vendoring and integrity</h3>

<p>Beyond lockfiles, some package managers support vendoring all dependency source code into the repository and tracking its integrity.</p>

<p><code class="language-plaintext highlighter-rouge">.cargo-checksum.json</code> lives in each vendored crate directory after running <a href="https://doc.rust-lang.org/cargo/commands/cargo-vendor.html"><code class="language-plaintext highlighter-rouge">cargo vendor</code></a>, containing the SHA256 of the original tarball and per-file checksums. If you need to patch vendored source (which you sometimes do for air-gapped builds), setting <code class="language-plaintext highlighter-rouge">"files": {}</code> in the checksum file disables integrity checking for that crate, which is the known workaround and also completely defeats the purpose of the checksums.</p>

<p><a href="https://go.dev/ref/mod#private-modules"><code class="language-plaintext highlighter-rouge">GOPRIVATE</code></a> (and the finer-grained <code class="language-plaintext highlighter-rouge">GONOPROXY</code> and <code class="language-plaintext highlighter-rouge">GONOSUMDB</code>) are Go environment variables that bypass the module proxy and checksum database for private modules, which is how enterprises use Go modules without leaking internal module paths to Google’s infrastructure. Go’s <code class="language-plaintext highlighter-rouge">vendor/modules.txt</code> (generated by <code class="language-plaintext highlighter-rouge">go mod vendor</code>) lists vendored packages and their module versions, and the Go toolchain verifies it matches <code class="language-plaintext highlighter-rouge">go.mod</code>. If your repo has a <code class="language-plaintext highlighter-rouge">vendor/</code> directory and <code class="language-plaintext highlighter-rouge">go.mod</code> specifies Go 1.14+, vendoring is automatically enabled without any flag, which surprises people who have a stale vendor directory they forgot about.</p>

<p><code class="language-plaintext highlighter-rouge">.yarn/cache/</code> and <code class="language-plaintext highlighter-rouge">.pnp.cjs</code> make up Yarn Berry’s zero-install setup: compressed zip archives of every dependency and the Plug’n’Play loader mapping package names to zip locations, both committed to version control. After <code class="language-plaintext highlighter-rouge">git clone</code>, the project works without running <code class="language-plaintext highlighter-rouge">yarn install</code>, though your repository size will grow substantially.</p>

<p><a href="https://developer.hashicorp.com/terraform/language/files/dependency-lock"><code class="language-plaintext highlighter-rouge">.terraform.lock.hcl</code></a> records Terraform provider version locks with platform-specific hashes, which means a lock file generated on macOS may fail verification on Linux CI unless you’ve run <code class="language-plaintext highlighter-rouge">terraform providers lock</code> for multiple platforms.</p>

<h3 id="hooks-and-scripts">Hooks and scripts</h3>

<p>Lifecycle scripts that run during install, build, or publish. Supply chain attacks often hide here, but so does a lot of useful automation.</p>

<p><a href="https://pnpm.io/pnpmfile"><code class="language-plaintext highlighter-rouge">.pnpmfile.cjs</code></a> isn’t just for overrides. pnpm’s hooks API includes <code class="language-plaintext highlighter-rouge">readPackage</code> for rewriting manifests, <code class="language-plaintext highlighter-rouge">afterAllResolved</code> for modifying the resolved lockfile, and custom fetchers for alternative package fetching logic.</p>

<p><code class="language-plaintext highlighter-rouge">.yarn/plugins/</code> contains committed plugin files that hook into Yarn Berry’s lifecycle. <code class="language-plaintext highlighter-rouge">.yarn/sdks/</code> holds editor integration files generated by <code class="language-plaintext highlighter-rouge">@yarnpkg/sdks</code> to make PnP work with IDEs.</p>

<p><code class="language-plaintext highlighter-rouge">.mvn/extensions.xml</code> loads Maven extensions that hook into the build lifecycle before anything else runs. Gradle’s init scripts in <code class="language-plaintext highlighter-rouge">~/.gradle/init.d/</code> execute before every build and can inject repositories, apply plugins, or configure all projects. Cargo’s <code class="language-plaintext highlighter-rouge">build.rs</code> is a build script that runs before compilation, generating code, linking native libraries, or setting cfg flags. Go’s <code class="language-plaintext highlighter-rouge">//go:generate</code> directives in source files run via <code class="language-plaintext highlighter-rouge">go generate</code> for code generation, though they’re not part of the build itself.</p>

<hr />

<p>I’ll keep updating this post as I find more. If you know of package manager magic files I’ve missed or have corrections, reach out on <a href="https://mastodon.social/@andrewnez">Mastodon</a> or submit a pull request on <a href="https://github.com/andrew/nesbitt.io">GitHub</a>.</p>]]></content><author><name>Andrew Nesbitt</name><email>andrew@ecosyste.ms</email></author><category term="package-managers" /><category term="reference" /><summary type="html"><![CDATA[Package manager magic files and where to find them: .npmrc, MANIFEST.in, Directory.Packages.props, .pnpmfile.cjs, and more.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://nesbitt.io/images/boxes.png" /><media:content medium="image" url="https://nesbitt.io/images/boxes.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Package Managers Need to Cool Down</title><link href="https://nesbitt.io/2026/03/04/package-managers-need-to-cool-down.html" rel="alternate" type="text/html" title="Package Managers Need to Cool Down" /><published>2026-03-04T10:00:00+00:00</published><updated>2026-03-04T10:00:00+00:00</updated><id>https://nesbitt.io/2026/03/04/package-managers-need-to-cool-down</id><content type="html" xml:base="https://nesbitt.io/2026/03/04/package-managers-need-to-cool-down.html"><![CDATA[<p>This post was requested by <a href="https://sethmlarson.dev/">Seth Larson</a>, who asked if I could do a breakdown of dependency cooldowns across package managers. His framing: all tools should support a globally-configurable <code class="language-plaintext highlighter-rouge">exclude-newer-than=&lt;relative duration&gt;</code> like <code class="language-plaintext highlighter-rouge">7d</code>, to bring the response times for autonomous exploitation back into the realm of human intervention.</p>

<p>When an attacker compromises a maintainer’s credentials or takes over a dormant package, they publish a malicious version and wait for automated tooling to pull it into thousands of projects before anyone notices. William Woodruff made the case for <a href="https://blog.yossarian.net/2025/11/21/We-should-all-be-using-dependency-cooldowns">dependency cooldowns</a> in November 2025, then followed up with a <a href="https://blog.yossarian.net/2025/12/13/cooldowns-redux">redux</a> a month later: don’t install a package version until it’s been on the registry for some minimum period, giving the community and security vendors time to flag problems before your build pulls them in. Of the ten supply chain attacks he examined, eight had windows of opportunity under a week, so even a modest cooldown of seven days would have blocked most of them from reaching end users.</p>

<p>The concept goes by different names depending on the tool (<code class="language-plaintext highlighter-rouge">cooldown</code>, <code class="language-plaintext highlighter-rouge">minimumReleaseAge</code>, <code class="language-plaintext highlighter-rouge">stabilityDays</code>, <code class="language-plaintext highlighter-rouge">exclude-newer</code>) and implementations vary in whether they use rolling durations or absolute timestamps, whether they cover transitive dependencies or just direct ones, and whether security updates are exempt. But the adoption over the past year has been remarkably fast.</p>

<h3 id="javascript">JavaScript</h3>

<p>The JavaScript ecosystem moved on this faster than anyone else, with <a href="https://pnpm.io/supply-chain-security">pnpm</a> shipping <code class="language-plaintext highlighter-rouge">minimumReleaseAge</code> (specified in minutes) in version 10.16 in September 2025, covering both direct and transitive dependencies with a <code class="language-plaintext highlighter-rouge">minimumReleaseAgeExclude</code> list for packages you trust enough to skip. <a href="https://github.com/yarnpkg/berry/pull/6901">Yarn</a> shipped <code class="language-plaintext highlighter-rouge">npmMinimalAgeGate</code> in version 4.10.0 the same month (also specified in minutes, with <code class="language-plaintext highlighter-rouge">npmPreapprovedPackages</code> for exemptions), then <a href="https://bun.com/docs/runtime/bunfig">Bun</a> added <code class="language-plaintext highlighter-rouge">minimumReleaseAge</code> in version 1.3 in October 2025 via <code class="language-plaintext highlighter-rouge">bunfig.toml</code>. <a href="https://socket.dev/blog/npm-introduces-minimumreleaseage-and-bulk-oidc-configuration">npm</a> took longer but shipped <code class="language-plaintext highlighter-rouge">min-release-age</code> in version 11.10.0 in February 2026. <a href="https://github.com/denoland/deno/issues/30751">Deno</a> has <code class="language-plaintext highlighter-rouge">--minimum-dependency-age</code> for <code class="language-plaintext highlighter-rouge">deno update</code> and <code class="language-plaintext highlighter-rouge">deno outdated</code>. Five package managers in six months; I can’t think of a precedent for coordinated feature adoption that fast across competing tools.</p>
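<p>A sketch of pnpm’s configuration in <code class="language-plaintext highlighter-rouge">pnpm-workspace.yaml</code> (the value is in minutes; the exclusion is illustrative):</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>minimumReleaseAge: 10080 # 7 days
minimumReleaseAgeExclude:
  - webpack
</code></pre></div></div>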

<h3 id="python">Python</h3>

<p><a href="https://docs.astral.sh/uv/concepts/resolution/">uv</a> has had <code class="language-plaintext highlighter-rouge">--exclude-newer</code> for absolute timestamps since early on and added relative duration support (e.g. <code class="language-plaintext highlighter-rouge">1 week</code>, <code class="language-plaintext highlighter-rouge">30 days</code>) in version 0.9.17 in December 2025, along with per-package overrides via <code class="language-plaintext highlighter-rouge">exclude-newer-package</code>. pip shipped <a href="https://ichard26.github.io/blog/2026/01/whats-new-in-pip-26.0/"><code class="language-plaintext highlighter-rouge">--uploaded-prior-to</code></a> in version 26.0 in January 2026, though it only accepts absolute timestamps and there’s an <a href="https://github.com/pypa/pip/issues/13674">open issue</a> about adding relative duration support.</p>
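<p>With the relative form, a sketch in <code class="language-plaintext highlighter-rouge">pyproject.toml</code>:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[tool.uv]
exclude-newer = "1 week"
</code></pre></div></div>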

<h3 id="ruby">Ruby</h3>

<p>Bundler and RubyGems have no native cooldown support, but <a href="https://gem-coop.github.io/gem.coop/updates/4/">gem.coop</a>, a community-run gem server, launched a cooldowns beta that enforces a 48-hour delay on newly published gems served from a separate endpoint. Pushing the cooldown to the index level rather than the client is interesting because any Bundler user pointed at the gem.coop endpoint gets cooldowns without changing their tooling or workflow at all.</p>

<h3 id="rust-go-php-net">Rust, Go, PHP, .NET</h3>

<p>Cargo has an <a href="https://github.com/rust-lang/rfcs/pull/3923">RFC in progress</a> and the registry-side infrastructure for cooldowns is <a href="https://doc.rust-lang.org/nightly/cargo/CHANGELOG.html#cargo-194-2026-03-05">stabilized in Cargo 1.94</a> (releasing March 5, 2026). Their approach sidesteps the exemption list problem entirely: instead of exempting packages from cooldowns, you explicitly opt in to a new version with <code class="language-plaintext highlighter-rouge">cargo update foo --precise 1.5.10</code>, which records the choice in your lockfile. No exclude list to remember to clean up later. In the meantime there’s also <a href="https://crates.io/crates/cargo-cooldown">cargo-cooldown</a>, a third-party wrapper that enforces a configurable cooldown window on developer machines as a proof-of-concept. Go has an <a href="https://github.com/golang/go/issues/76485">open proposal</a> for <code class="language-plaintext highlighter-rouge">go get</code> and <code class="language-plaintext highlighter-rouge">go mod tidy</code>, Composer has <a href="https://github.com/composer/composer/issues/12552">two</a> <a href="https://github.com/composer/composer/issues/12633">open</a> issues, and NuGet has an <a href="https://github.com/NuGet/Home/issues/14657">open issue</a> though .NET projects using Dependabot already get cooldowns on the update bot side since Dependabot <a href="https://github.blog/changelog/2025-07-29-dependabot-expanded-cooldown-and-package-manager-support/">expanded NuGet support</a> in July 2025.</p>

<h3 id="dependency-update-tools">Dependency update tools</h3>

<p><a href="https://docs.renovatebot.com/key-concepts/minimum-release-age/">Renovate</a> has had <code class="language-plaintext highlighter-rouge">minimumReleaseAge</code> (originally called <code class="language-plaintext highlighter-rouge">stabilityDays</code>) for years, long before the rest of the ecosystem caught on; it adds a “pending” status check to update branches until the configured time has passed. <a href="https://www.mend.io/blog/secure-npm-ecosystem-with-mend-renovate/">Mend Renovate 42</a> went a step further and made a 3-day minimum release age the default for npm packages in its “best practices” config via the <code class="language-plaintext highlighter-rouge">security:minimumReleaseAgeNpm</code> preset, making cooldowns opt-out rather than opt-in for its users. <a href="https://docs.github.com/en/code-security/dependabot/working-with-dependabot/dependabot-options-reference">Dependabot</a> shipped cooldowns in July 2025 with a <code class="language-plaintext highlighter-rouge">cooldown</code> block in <code class="language-plaintext highlighter-rouge">dependabot.yml</code> supporting <code class="language-plaintext highlighter-rouge">default-days</code> and per-semver-level overrides (<code class="language-plaintext highlighter-rouge">semver-major-days</code>, <code class="language-plaintext highlighter-rouge">semver-minor-days</code>, <code class="language-plaintext highlighter-rouge">semver-patch-days</code>), with security updates bypassing the cooldown. <a href="https://docs.snyk.io/scan-with-snyk/pull-requests/snyk-pull-or-merge-requests/upgrade-dependencies-with-automatic-prs-upgrade-prs/upgrade-open-source-dependencies-with-automatic-prs">Snyk</a> takes the most aggressive stance, with a built-in, non-configurable 21-day cooldown on automatic upgrade PRs. <a href="https://www.npmjs.com/package/npm-check-updates">npm-check-updates</a> added a <code class="language-plaintext highlighter-rouge">--cooldown</code> parameter that accepts duration suffixes like <code class="language-plaintext highlighter-rouge">7d</code> or <code class="language-plaintext highlighter-rouge">12h</code>.</p>
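<p>As a concrete sketch, a <code>dependabot.yml</code> using the cooldown keys described above might look like this (the day values are arbitrary examples, not recommendations):</p>

```yaml
version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "weekly"
    cooldown:
      default-days: 7       # applied when no per-level override matches
      semver-major-days: 30 # wait longest on major bumps
      semver-minor-days: 7
      semver-patch-days: 3  # security updates bypass the cooldown entirely
```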

<h3 id="checking-your-config">Checking your config</h3>

<p><a href="https://docs.zizmor.sh/audits/">zizmor</a> added a <code class="language-plaintext highlighter-rouge">dependabot-cooldown</code> audit rule in version 1.15.0 that flags Dependabot configs missing cooldown settings or with insufficient cooldown periods (default threshold: 7 days), with auto-fix support. <a href="https://www.stepsecurity.io/blog/introducing-the-npm-package-cooldown-check">StepSecurity</a> offers a GitHub PR check that fails PRs introducing npm packages released within a configurable cooldown period. <a href="https://docs.openrewrite.org/recipes/github/adddependabotcooldown">OpenRewrite</a> has an <code class="language-plaintext highlighter-rouge">AddDependabotCooldown</code> recipe for automatically adding cooldown sections to Dependabot config files. For GitHub Actions specifically, <a href="https://github.com/suzuki-shunsuke/pinact">pinact</a> added a <code class="language-plaintext highlighter-rouge">--min-age</code> flag, and <a href="https://github.com/j178/prek">prek</a> (a Rust reimplementation of pre-commit) added <code class="language-plaintext highlighter-rouge">--cooldown-days</code>.</p>

<h3 id="still-waiting">Still waiting</h3>

<p>For Go, Bundler, Composer, and pip, cooldown support is still in discussion or only partially landed, which means you’re relying on Dependabot or Renovate to enforce the delay. That covers automated updates, but nothing stops someone from running <code class="language-plaintext highlighter-rouge">bundle update</code> or <code class="language-plaintext highlighter-rouge">go get</code> locally and pulling in a version that’s been on the registry for ten minutes. I couldn’t find any cooldown discussion at all for Maven, Gradle, Swift Package Manager, Dart’s pub, or Elixir’s Hex; if you know of one, let me know and I’ll update this post.</p>

<p>The feature also goes by at least ten different configuration names across the tools that do support it (<code class="language-plaintext highlighter-rouge">cooldown</code>, <code class="language-plaintext highlighter-rouge">minimumReleaseAge</code>, <code class="language-plaintext highlighter-rouge">min-release-age</code>, <code class="language-plaintext highlighter-rouge">npmMinimalAgeGate</code>, <code class="language-plaintext highlighter-rouge">exclude-newer</code>, <code class="language-plaintext highlighter-rouge">stabilityDays</code>, <code class="language-plaintext highlighter-rouge">uploaded-prior-to</code>, <code class="language-plaintext highlighter-rouge">min-age</code>, <code class="language-plaintext highlighter-rouge">cooldown-days</code>, <code class="language-plaintext highlighter-rouge">minimum-dependency-age</code>), which makes writing about it almost as hard as configuring it across a polyglot project.</p>

<h3 id="language-vs-system-package-managers">Language vs. system package managers</h3>

<p>On npm, PyPI, and RubyGems, running <code class="language-plaintext highlighter-rouge">npm publish</code> or <code class="language-plaintext highlighter-rouge">gem push</code> makes a package installable worldwide in seconds, and if Dependabot or Renovate happens to run in that window, the malicious code lands in a project without a human ever seeing it. All of the supply chain attacks William examined exploit this property, where publishing and distribution are the same act and nothing stands between a compromised maintainer account and thousands of downstream projects.</p>

<p>System package managers work differently because they separate those two things. When someone pushes a new version of an upstream library, it doesn’t appear in <code class="language-plaintext highlighter-rouge">apt install</code> or <code class="language-plaintext highlighter-rouge">brew install</code> until a distribution maintainer has reviewed the change, updated the package definition, and pushed it through a build pipeline. Fedora packages go through review and koji builds, Homebrew requires a pull request that passes CI and gets merged by a maintainer. A compromised upstream tarball still has to survive that process before it reaches anyone’s machine, and the people doing the reviews tend to notice when a patch adds an obfuscated postinstall script that curls a remote payload.</p>

<p>Debian goes further. Even if a maintainer account is compromised, uploads land in unstable first, then migrate automatically to testing after 2 to 10 days, depending on urgency and the availability of package tests. Stable only gets updates through a separate release process. That’s effectively a built-in cooldown with human review at multiple stages.</p>

<p>Cooldowns on the language package manager side are trying to retrofit something like that review window onto ecosystems that never had one, giving security researchers a few days to flag a malicious publish before automated tooling pulls it into lockfiles. Asking Homebrew or apt to add the same feature would mean delaying security patches through a process that already has human gatekeepers, which costs more than it saves.</p>

<h3 id="the-timestamp-problem">The timestamp problem</h3>

<p>pip’s <code class="language-plaintext highlighter-rouge">--uploaded-prior-to</code> and npm’s older <code class="language-plaintext highlighter-rouge">--before</code> flag both take absolute timestamps, and the <a href="https://github.com/pypa/pip/issues/13674">discussion about adding relative duration support to pip</a> reveals how these two modes serve different goals that happen to share implementation surface. An absolute timestamp pins your dependency resolution to a moment in time, so running the same install six months from now produces the same result, which is a reproducibility feature. A relative duration like <code class="language-plaintext highlighter-rouge">7 days</code> creates a sliding window that moves forward with you, so you always exclude recently published packages regardless of when you run the build, which is a security feature. uv’s <code class="language-plaintext highlighter-rouge">--exclude-newer</code> accepts both forms, and npm has both <code class="language-plaintext highlighter-rouge">--before</code> for absolute dates and <code class="language-plaintext highlighter-rouge">min-release-age</code> for relative durations. pnpm, Yarn, Bun, and Deno only accept relative durations.</p>
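<p>Stripped of flag names, both modes reduce to comparing a registry upload timestamp against a cutoff; only how the cutoff is computed differs. A minimal sketch (the function names are mine, not any tool’s API):</p>

```python
from datetime import datetime, timedelta, timezone

def cutoff_absolute(timestamp: str) -> datetime:
    # reproducibility: the cutoff is the same no matter when you run the install
    return datetime.fromisoformat(timestamp)

def cutoff_relative(days: int, now: datetime) -> datetime:
    # security: a sliding window that moves forward with the clock
    return now - timedelta(days=days)

uploaded = datetime(2026, 3, 1, tzinfo=timezone.utc)

# An absolute pin gives the same answer on every run:
fixed = cutoff_absolute("2026-03-02T00:00:00+00:00")
print(uploaded <= fixed)  # True, today and six months from now

# A relative window's answer depends on when you ask:
print(uploaded <= cutoff_relative(7, datetime(2026, 3, 5, tzinfo=timezone.utc)))
# False: published four days ago, still inside the 7-day window
print(uploaded <= cutoff_relative(7, datetime(2026, 3, 20, tzinfo=timezone.utc)))
# True: the window has slid past it
```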

<p>The pip thread also gets into the surprisingly fiddly business of parsing duration strings. ISO 8601 durations (<code class="language-plaintext highlighter-rouge">P7D</code>) are unambiguous but nobody wants to type them, human-readable strings like <code class="language-plaintext highlighter-rouge">7 days</code> are friendly but need a parser that pip’s maintainers would rather not write and maintain, and variable-length calendar units like months and years require knowing which month you’re in to convert to a concrete number of days. uv went with ISO 8601 plus friendly strings but excluded months and years entirely, and pip’s maintainers are leaning toward just accepting a bare number of days, which covers nearly every real use case without dragging in leap year arithmetic.</p>
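<p>A toy parser shows why maintainers converge on the narrow grammar: fixed-length units are a few regexes, while months and years would need calendar arithmetic. This sketch (my own, not pip’s or uv’s code) accepts bare days, <code>7d</code>/<code>12h</code> suffixes, and ISO 8601 day durations:</p>

```python
import re
from datetime import timedelta

def parse_cooldown(s: str) -> timedelta:
    if s.isdigit():                          # bare number of days: "7"
        return timedelta(days=int(s))
    m = re.fullmatch(r"(\d+)([dh])", s)      # suffixed: "7d", "12h"
    if m:
        n, unit = int(m.group(1)), m.group(2)
        return timedelta(days=n) if unit == "d" else timedelta(hours=n)
    m = re.fullmatch(r"P(\d+)D", s)          # ISO 8601: "P7D"
    if m:
        return timedelta(days=int(m.group(1)))
    # months and years are rejected on purpose: their length in days
    # depends on where in the calendar you're standing
    raise ValueError(f"unsupported duration: {s!r}")

print(parse_cooldown("7") == parse_cooldown("7d") == parse_cooldown("P7D"))  # True
```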

<p>Even the question of what “seven days ago” means gets complicated when your CI server is in UTC, your developer laptop is in US Pacific time, and the registry timestamp uses whatever timezone PyPI’s servers happen to be configured with. A few hours of timezone drift can determine whether a package published six days and twenty-two hours ago passes the cooldown check or not.</p>]]></content><author><name>Andrew Nesbitt</name><email>andrew@ecosyste.ms</email></author><category term="package-managers" /><category term="security" /><category term="ecosystems" /><category term="deep-dive" /><summary type="html"><![CDATA[A survey of dependency cooldown support across package managers and update tools.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://nesbitt.io/images/boxes.png" /><media:content medium="image" url="https://nesbitt.io/images/boxes.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Package Management is Naming All the Way Down</title><link href="https://nesbitt.io/2026/03/03/package-management-is-naming-all-the-way-down.html" rel="alternate" type="text/html" title="Package Management is Naming All the Way Down" /><published>2026-03-03T10:00:00+00:00</published><updated>2026-03-03T10:00:00+00:00</updated><id>https://nesbitt.io/2026/03/03/package-management-is-naming-all-the-way-down</id><content type="html" xml:base="https://nesbitt.io/2026/03/03/package-management-is-naming-all-the-way-down.html"><![CDATA[<p>Package managers are usually described by what they do: resolve dependencies, download code, build artifacts. But if you look at the structure of the system instead of the process, nearly every part of it is a naming problem, and the whole thing works because we’ve agreed on how to interpret strings at each layer and because a registry sits in the middle translating between them.</p>

<h3 id="registries">Registries</h3>

<p>When you run <code class="language-plaintext highlighter-rouge">gem install rails</code>, the client needs to know where to look. RubyGems defaults to rubygems.org, pip to pypi.org, npm to registry.npmjs.org, and that default is just a URL baked into the client configuration. You can change it, which is exactly what makes <a href="/2025/12/10/slopsquatting-meets-dependency-confusion.html">dependency confusion</a> possible: if your client checks a public registry before a private one and the names overlap, an attacker who registers the right name on the public registry wins.</p>
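<p>One common mitigation on the npm side is to pin a private scope to your internal registry in <code>.npmrc</code>, so the public default is never consulted for those names (the hostnames here are placeholders):</p>

```ini
; .npmrc -- everything else falls through to the public default
registry=https://registry.npmjs.org/
; packages under @myco always resolve against the internal registry
@myco:registry=https://npm.internal.example/
```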

<p>Companies run private registries with different names for the same packages, or the same names for different packages. Nix, Guix, and Spack layer multiple package repositories with their own namespaces on top of each other. Go uses URL-based module paths where the registry name is literally embedded in the package identity. Which registry you’re talking to determines what every other name in the system means, because a registry name is really a lookup context: give it a package name and it hands back a list of versions.</p>

<h3 id="namespaces">Namespaces</h3>

<p>Some registries insert another naming layer between the registry and the package. Packagist requires vendor prefixes (<code class="language-plaintext highlighter-rouge">symfony/console</code>), Maven requires reverse-domain group IDs (<code class="language-plaintext highlighter-rouge">org.apache.commons:commons-lang3</code>), and npm has optional scopes (<code class="language-plaintext highlighter-rouge">@babel/core</code>) that most of the ecosystem’s biggest packages never adopted because they predate the feature. RubyGems and PyPI have flat namespaces where the package name is all there is. Even the separator characters differ: <code class="language-plaintext highlighter-rouge">@scope/name</code> on npm, <code class="language-plaintext highlighter-rouge">vendor/package</code> on Packagist, <code class="language-plaintext highlighter-rouge">group:artifact</code> on Maven, and Cargo’s proposed namespaces use <code class="language-plaintext highlighter-rouge">::</code> because <code class="language-plaintext highlighter-rouge">/</code> was already taken by the feature syntax.</p>

<p>A namespace is really a claim of authority over a family of names, which makes questions like who gets to publish under <code class="language-plaintext highlighter-rouge">@google/</code> or who owns the <code class="language-plaintext highlighter-rouge">serde</code> namespace in Cargo’s proposed <code class="language-plaintext highlighter-rouge">serde::derive</code> scheme into governance problems dressed up as naming problems. They only get harder as registries grow. <a href="/2025/12/21/federated-package-management.html">Zooko’s triangle</a> says you can’t have names that are simultaneously human-readable, decentralized, and secure, and registries exist largely to hold two of those three together. I covered the <a href="/2026/02/14/package-management-namespaces.html">different namespace models</a> in more detail previously.</p>

<h3 id="package-names">Package names</h3>

<p>Once you’ve picked a registry and navigated any namespace, you arrive at a package name, and that name resolves to a list of available versions. <code class="language-plaintext highlighter-rouge">requests</code>, <code class="language-plaintext highlighter-rouge">express</code>, <code class="language-plaintext highlighter-rouge">serde</code>, <code class="language-plaintext highlighter-rouge">rails</code>. These need to be unique within their registry and namespace, memorable enough to type from recall, and stable enough that renaming doesn’t break everything downstream. Name scarcity in flat registries is why you get <code class="language-plaintext highlighter-rouge">python-dateutil</code> because <code class="language-plaintext highlighter-rouge">dateutil</code> was taken. PyPI normalizes hyphens, underscores, dots, and case so <code class="language-plaintext highlighter-rouge">my-package</code>, <code class="language-plaintext highlighter-rouge">my_package</code>, <code class="language-plaintext highlighter-rouge">My.Package</code>, and <code class="language-plaintext highlighter-rouge">MY_PACKAGE</code> all resolve to the same thing, a decision that prevents some squatting but means four different-looking strings in requirements files can point at the same package. npm used to allow uppercase package names and then banned them, so legacy packages like <code class="language-plaintext highlighter-rouge">JSONStream</code> still exist with capital letters that no new package can use. The package called <code class="language-plaintext highlighter-rouge">node</code> on npm isn’t Node.js.</p>
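<p>That normalization rule is small enough to show in full: PEP 503 specifies it as a one-line regex that collapses any run of hyphens, underscores, and dots to a single hyphen, then lowercases the result:</p>

```python
import re

def normalize(name: str) -> str:
    # PEP 503: runs of "-", "_" and "." collapse to one "-", then lowercase
    return re.sub(r"[-_.]+", "-", name).lower()

# the four different-looking strings from above collapse to one name
print({normalize(n) for n in ["my-package", "my_package", "My.Package", "MY_PACKAGE"]})
# {'my-package'}
```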

<p>Sometimes projects bake a major version into the package name itself, like <code class="language-plaintext highlighter-rouge">boto3</code> or <code class="language-plaintext highlighter-rouge">webpack5</code>, effectively creating a new package that has its own version history on top of the version number already embedded in its name. <code class="language-plaintext highlighter-rouge">boto3</code> version <code class="language-plaintext highlighter-rouge">1.34.0</code> is a different thing from a hypothetical <code class="language-plaintext highlighter-rouge">boto4</code> version <code class="language-plaintext highlighter-rouge">1.0.0</code>, even though the underlying project is the same.</p>

<p>Typosquatting exploits the gap between what you meant to type and what the registry resolved; slopsquatting exploits LLM hallucinations of package names that don’t exist yet but could be registered by an attacker. The registry will resolve whatever string you give it, no questions asked.</p>

<h3 id="versions">Versions</h3>

<p>Pick a version from that list and you get a particular snapshot of code, along with its metadata: a list of dependencies, a list of builds, and whatever the maintainer wrote in the changelog. Versions look like numbers but they’re really strings, which becomes obvious as soon as you see <code class="language-plaintext highlighter-rouge">1.0.0-beta.2+build.456</code> or Python’s <code class="language-plaintext highlighter-rouge">1.0a1.post2.dev3</code> or the <a href="/2024/06/24/from-zerover-to-semver-a-comprehensive-list-of-versioning-schemes-in-open-source.html">dozens of versioning schemes</a> people have invented over the years. Prerelease tags, build metadata, epoch prefixes, calver date segments all get bolted onto the version string to carry meaning that a simple three-number tuple can’t express, and every ecosystem parses and sorts these strings differently. Debian prepends an epoch (<code class="language-plaintext highlighter-rouge">2:1.0.0</code>) so that a repackaged version sorts higher than the original even if the version number is lower. Ruby uses <code class="language-plaintext highlighter-rouge">.pre.1</code> where npm uses <code class="language-plaintext highlighter-rouge">-pre.1</code>. Is <code class="language-plaintext highlighter-rouge">1.0.0</code> the same as <code class="language-plaintext highlighter-rouge">v1.0.0</code>? Depends who you ask. <code class="language-plaintext highlighter-rouge">1.2.3</code> is supposed to communicate something about compatibility relative to <code class="language-plaintext highlighter-rouge">1.2.2</code> and <code class="language-plaintext highlighter-rouge">2.0.0</code>, but that communication happens entirely through convention around the name, with no mechanism to enforce it. Elm is the rare exception, where the registry diffs APIs and rejects publishes that break compatibility without a major bump.</p>
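<p>The “versions are really strings” point bites the moment you sort them naively:</p>

```python
versions = ["1.9.0", "1.10.0", "1.2.0"]

# Lexicographic sort compares character by character, so "1.10.0"
# lands before "1.2.0":
print(sorted(versions))  # ['1.10.0', '1.2.0', '1.9.0']

# Sorting on parsed numeric segments restores the intended order, until
# a prerelease tag like "1.10.0-beta.2" makes int() blow up; that's
# where every ecosystem's bespoke parsing and sorting rules come in:
print(sorted(versions, key=lambda v: [int(x) for x in v.split(".")]))
# ['1.2.0', '1.9.0', '1.10.0']
```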

<p>When a maintainer account is compromised, publishing <code class="language-plaintext highlighter-rouge">1.2.4</code> with malicious code looks indistinguishable from a routine patch release, because the version name carries no provenance. And when a version gets yanked or deleted, lockfiles that pinned to that exact name suddenly point at nothing.</p>

<h3 id="dependencies-and-requirements">Dependencies and requirements</h3>

<p>Each version carries a list of dependencies, and each dependency is itself a pair of names: a package name and a version constraint. <code class="language-plaintext highlighter-rouge">requests &gt;= 2.28</code> means “the package named <code class="language-plaintext highlighter-rouge">requests</code>, at a version whose name satisfies <code class="language-plaintext highlighter-rouge">&gt;= 2.28</code>”. So you’re back at the package name layer, looking up another name, getting another list of versions, and the resolver walks this graph recursively trying to find a consistent set of version names that satisfies all the constraints simultaneously. When two packages name the same dependency with incompatible constraints, the resolver has to either find a way through or prove that no path exists.</p>

<p>The same “convention not enforcement” problem from versioning carries over here. The version constraints are a small language for describing sets of version names, and every ecosystem invented its own. <code class="language-plaintext highlighter-rouge">~&gt; 2.0</code> in Ruby, <code class="language-plaintext highlighter-rouge">^2.0</code> in npm, <code class="language-plaintext highlighter-rouge">&gt;=2.0,&lt;3.0</code> in Python all use different syntax with subtly different semantics, especially once you hit edge cases around 0.x versions. A broad constraint like <code class="language-plaintext highlighter-rouge">&gt;=1.0</code> names a large and growing set of versions; a pinned <code class="language-plaintext highlighter-rouge">==1.2.3</code> names exactly one. The choice of constraint syntax determines how much of the version namespace a single declaration covers, and there’s no cross-ecosystem agreement on what the symbols mean.</p>
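<p>To make the “subtly different semantics” concrete, here are toy checkers for Ruby’s pessimistic operator and npm’s caret, ignoring prereleases, wildcards, and every other real-world wrinkle (a sketch, not a resolver):</p>

```python
def parse(v):
    return tuple(int(x) for x in v.split("."))

def pad(t, n=3):
    return t + (0,) * (n - len(t))

def pessimistic(req):
    """Ruby's ~> drops the last segment and bumps the one before it:
    "~> 2.0" allows >=2.0,<3.0 but "~> 2.0.0" only >=2.0.0,<2.1.0."""
    parts = parse(req)
    upper = parts[:-2] + (parts[-2] + 1,)
    return lambda v: pad(parts) <= pad(parse(v)) < pad(upper)

def caret(req):
    """npm's ^ bumps the leftmost nonzero component, which is why
    "^0.2.1" only allows 0.2.x while "^2.0" allows all of 2.x."""
    parts = pad(parse(req))
    i = next((j for j, p in enumerate(parts) if p), len(parts) - 1)
    upper = parts[:i] + (parts[i] + 1,) + (0,) * (len(parts) - i - 1)
    return lambda v: parts <= pad(parse(v)) < upper

print(pessimistic("2.0")("2.5.0"), pessimistic("2.0.0")("2.5.0"))  # True False
print(caret("0.2.1")("0.2.9"), caret("0.2.1")("0.3.0"))            # True False
```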

<p>Some dependencies are themselves hidden behind yet another name. pip has extras (<code class="language-plaintext highlighter-rouge">requests[security]</code>), Cargo has features (<code class="language-plaintext highlighter-rouge">serde/derive</code>), and Bundler has groups (<code class="language-plaintext highlighter-rouge">:development</code>, <code class="language-plaintext highlighter-rouge">:test</code>), all of which are named sets of additional dependencies that only activate when someone asks for them by name. <code class="language-plaintext highlighter-rouge">pip install requests</code> and <code class="language-plaintext highlighter-rouge">pip install requests[security]</code> install different dependency trees from the same package, selected by a string in square brackets that the package author chose.</p>

<p>These constraint languages also compose with the namespace layer. npm’s <code class="language-plaintext highlighter-rouge">@types/node@^18.0.0</code> combines a scope, a package name, and a version constraint into a single expression, while Maven’s <code class="language-plaintext highlighter-rouge">org.apache.commons:commons-lang3:3.12.0</code> encodes group, artifact, and version as three colon-separated names that only make sense when parsed together.</p>

<h3 id="builds-and-platforms">Builds and platforms</h3>

<p>Once the resolver has settled on a version, the client needs to pick the right build artifact, and that means matching platform names. Unlike the earlier naming layers, which are mostly human-coordination problems, platform identity is inherently fuzzy: an M1 Mac running Rosetta is simultaneously two platforms depending on who’s asking, and <code class="language-plaintext highlighter-rouge">manylinux</code> is a compatibility fiction that keeps getting revised as the definition shifts underneath it. PyPI wheels look like <code class="language-plaintext highlighter-rouge">numpy-1.24.0-cp311-cp311-manylinux_2_17_x86_64.whl</code>, packing the package name, version, Python version, ABI tag, and platform into a single filename. RubyGems appends a platform suffix to get <code class="language-plaintext highlighter-rouge">nokogiri-1.15.4-x86_64-linux-gnu.gem</code>, and Conda encodes the channel, platform, and build hash.</p>

<p>If the platform name on the artifact doesn’t match the platform name the client computes for its own environment, the package won’t install, or the wrong binary gets selected silently. And as I wrote about in <a href="/2026/02/17/platform-strings.html">platform strings</a>, the same M1 Mac is <code class="language-plaintext highlighter-rouge">aarch64-apple-darwin</code> to LLVM, <code class="language-plaintext highlighter-rouge">arm64-darwin</code> to RubyGems, <code class="language-plaintext highlighter-rouge">darwin/arm64</code> to Go, and <code class="language-plaintext highlighter-rouge">macosx_11_0_arm64</code> to Python wheels, so every tool that works across ecosystems ends up maintaining a translation table between naming schemes that each made sense in their original context.</p>
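<p>Those four strings for a single machine are exactly what the translation tables hold. A sketch of how such a mapping works, using only the examples above:</p>

```python
# one physical platform, four naming schemes (the examples above)
M1_MAC = {
    "llvm":     "aarch64-apple-darwin",
    "rubygems": "arm64-darwin",
    "go":       "darwin/arm64",
    "wheel":    "macosx_11_0_arm64",
}

def translate(platform: str, source: str, target: str) -> str:
    # a real table covers many machines; this sketch knows exactly one
    if M1_MAC.get(source) != platform:
        raise KeyError(f"unknown {source} platform: {platform}")
    return M1_MAC[target]

print(translate("darwin/arm64", "go", "wheel"))  # macosx_11_0_arm64
```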

<h3 id="source-repositories">Source repositories</h3>

<p>The naming doesn’t stop at the registry. Most packages point back to a source repository, and that’s another stack of names: the host (<code class="language-plaintext highlighter-rouge">github.com</code>), the owner or organization (<code class="language-plaintext highlighter-rouge">rails</code>), the repository name (<code class="language-plaintext highlighter-rouge">rails</code>), branches (<code class="language-plaintext highlighter-rouge">main</code>, <code class="language-plaintext highlighter-rouge">7-1-stable</code>), tags (<code class="language-plaintext highlighter-rouge">v7.1.3</code>), and commits (a SHA that’s finally content-addressed rather than human-chosen). Go and Swift skip the registry layer entirely and use these repository URLs as the package identity, which means the naming conventions of GitHub or whatever host you’re on become part of your dependency graph directly. Monorepos add another wrinkle: Babel’s source lives at <code class="language-plaintext highlighter-rouge">babel/babel</code> on GitHub but publishes dozens of packages under <code class="language-plaintext highlighter-rouge">@babel/*</code>, so the mapping from repo name to package name is one-to-many.</p>

<p>Version tags in git are particularly interesting because they’re the bridge between two naming systems. A maintainer creates a git tag called <code class="language-plaintext highlighter-rouge">v1.2.3</code>, and the registry or build tool maps that to a version name in its own scheme. But there’s no standard for whether the tag should be <code class="language-plaintext highlighter-rouge">v1.2.3</code> or <code class="language-plaintext highlighter-rouge">1.2.3</code> or <code class="language-plaintext highlighter-rouge">release-1.2.3</code>, so tooling has to guess or be configured. And when an organization renames on GitHub, or a project moves from one owner to another, every downstream reference to the old owner/repo pair breaks unless the host maintains redirects, which GitHub does until someone registers the old name, at which point you have the repo-jacking problem.</p>
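<p>The guessing that tooling has to do can be as simple as an optional-prefix regex, sketched here with an illustrative prefix list (real tools usually make this configurable):</p>

```python
import re

# accept "1.2.3", "v1.2.3", or "release-1.2.3"; anything else is no match
TAG_RE = re.compile(r"^(?:v|release-)?(\d+(?:\.\d+)*)$")

def version_from_tag(tag: str):
    m = TAG_RE.match(tag)
    return m.group(1) if m else None

for tag in ["v1.2.3", "1.2.3", "release-1.2.3", "nightly"]:
    print(tag, "->", version_from_tag(tag))
# v1.2.3 -> 1.2.3 / 1.2.3 -> 1.2.3 / release-1.2.3 -> 1.2.3 / nightly -> None
```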

<h3 id="naming-and-trust">Naming and trust</h3>

<p>At each of these layers you’re trusting that a name resolves to what you think it does, that the registry URL points to the right service, that the package name belongs to who you think it does, that a version was published legitimately, that a constraint won’t pull in something unexpected, that a platform-tagged binary was built from the same source as the one for your colleague’s machine. That <a href="/2026/03/02/transitive-trust.html">trust is transitive</a>, flowing through your dependencies’ names and their dependencies’ names in a chain where nobody has full visibility. The registry is the authority that makes most of these names meaningful, which is why the question of who governs registries keeps coming back to the surface.</p>]]></content><author><name>Andrew Nesbitt</name><email>andrew@ecosyste.ms</email></author><category term="package-managers" /><category term="deep-dive" /><summary type="html"><![CDATA[There are two hard problems in computer science, and package managers found at least eight of them.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://nesbitt.io/images/boxes.png" /><media:content medium="image" url="https://nesbitt.io/images/boxes.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Transitive Trust</title><link href="https://nesbitt.io/2026/03/02/transitive-trust.html" rel="alternate" type="text/html" title="Transitive Trust" /><published>2026-03-02T10:00:00+00:00</published><updated>2026-03-02T10:00:00+00:00</updated><id>https://nesbitt.io/2026/03/02/transitive-trust</id><content type="html" xml:base="https://nesbitt.io/2026/03/02/transitive-trust.html"><![CDATA[<p>Ken Thompson’s 1984 Turing Award lecture, <a href="https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf">Reflections on Trusting Trust</a>, described a C compiler modified to insert a backdoor into the <code class="language-plaintext highlighter-rouge">login</code> 
program, then modified again so the compiler would replicate the backdoor in future versions of itself without any trace in the source. The source was clean, the binary was compromised, and the only way to discover the backdoor was to rebuild the entire compiler toolchain from scratch and compare the output, which nobody was going to do.</p>

<p>The explosion of open source was built on this kind of transitive trust between maintainers. A package with 800 transitive dependencies works because each maintainer along the way did a reasonable job of choosing and maintaining their own dependencies, and the maintainers they depended on did the same. Nobody designed this trust network or audited it as a whole. It just grew as people built on each other’s work, and it has held up well enough that we’ve come to take it for granted, even as bad actors have started to map its weak points.</p>

<p>We have decent tools now for scanning our own dependency trees. You can run <code class="language-plaintext highlighter-rouge">npm audit</code> or Dependabot or Snyk against your lockfile and get a report on known vulnerabilities. But when you do that, you’re trusting that the maintainer of each package in your tree is doing the same: running audits, reviewing what they pull in, dropping dependencies whose maintainers have gone quiet, keeping their build tooling current. And you’re trusting that those maintainers are trusting their own dependencies’ maintainers to do the same, all the way down through a chain of people who mostly don’t know each other and have no visibility into each other’s practices.</p>

<p>Every package you install was also built, tested, and published using dependencies you never see: a JavaScript library’s <code class="language-plaintext highlighter-rouge">devDependencies</code>, the build tools that compiled a Rust crate before it was uploaded, the pytest plugins that ran during CI, the GitHub Action that handled publishing. You’re trusting that the maintainer chose those carefully, keeps them updated, and drops them when they go stale, and that the maintainers of those tools are doing the same. A maintainer who never runs <code class="language-plaintext highlighter-rouge">npm audit</code>, who has a three-year-old GitHub Action in their publish workflow, who accepted a PR from a stranger adding a new build dependency without much scrutiny, produces an artifact on the registry that looks identical to one from a maintainer who checks everything meticulously.</p>

<p>The <a href="https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident">event-stream incident</a> is the classic example: the original maintainer handed the project to someone new, that person added a malicious dependency, and nobody upstream noticed. The <a href="https://www.openwall.com/lists/oss-security/2024/03/29/4">xz backdoor</a> was more patient and more frightening. A co-maintainer spent two years making legitimate contributions before planting obfuscated code in the build system and test fixtures, targeting a part of the toolchain that almost nobody reads. And then there’s the <a href="https://about.codecov.io/security-update/">codecov bash uploader compromise</a>, which didn’t target a library at all but a CI tool that thousands of projects were curling into their build pipelines. I suspect most maintainers who used it never read the script once.</p>

<p><a href="https://repos.openssf.org/trusted-publishers-for-all-package-repositories.html">Trusted publishing</a> is an effort to close part of this gap. PyPI, npm, and RubyGems now support publishing flows where packages are built and uploaded directly from CI using short-lived credentials tied to a specific repository and workflow, which creates a verifiable link between the source and the published artifact. But it also means we’re now trusting that each maintainer’s CI configuration is sound, that the GitHub Actions in their workflow are maintained by people who are themselves doing due diligence, that the dev dependencies installed during the build are ones they’ve reviewed. GitHub Actions in particular has <a href="/2025/12/06/github-actions-package-manager/">almost none of the supply chain protections</a> that language package managers have spent years building, so in practice we’ve traded one unverifiable assumption for a different one.</p>

<p>Semver ranges compound this because <code class="language-plaintext highlighter-rouge">npm update</code> or <code class="language-plaintext highlighter-rouge">bundle update</code> or <code class="language-plaintext highlighter-rouge">cargo update</code> will pull in new versions across your entire tree in seconds, and you’re trusting that every maintainer in the chain shipped something good since the last time your lockfile was generated, including versions built against whatever state their toolchains were in at the time.</p>

<p>Large companies deal with this by vendoring and rebuilding everything from source in controlled environments, effectively verifying each level of the trust chain themselves instead of relying on each maintainer to have done it. But even vendoring just moves the boundary. Those controlled builds still run on compilers and operating systems and hardware that somebody else produced, and at some point you stop verifying and start trusting. The honest version of “we’ve audited our supply chain” is “we’ve audited our supply chain down to a depth we felt comfortable with and then stopped”.</p>]]></content><author><name>Andrew Nesbitt</name><email>andrew@ecosyste.ms</email></author><category term="package-managers" /><category term="security" /><category term="ecosystems" /><summary type="html"><![CDATA[You trust your maintainers, who trust their maintainers, but do they trust their maintainers' maintainers?]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://nesbitt.io/images/boxes.png" /><media:content medium="image" url="https://nesbitt.io/images/boxes.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Downstream Testing</title><link href="https://nesbitt.io/2026/03/01/downstream-testing.html" rel="alternate" type="text/html" title="Downstream Testing" /><published>2026-03-01T00:00:00+00:00</published><updated>2026-03-01T00:00:00+00:00</updated><id>https://nesbitt.io/2026/03/01/downstream-testing</id><content type="html" xml:base="https://nesbitt.io/2026/03/01/downstream-testing.html"><![CDATA[<p>The information about how a library is actually used lives in the dependents’ code, not in the library’s own tests or docs. Someone downstream is parsing your error messages with a regex, or relying on the iteration order of a result set you never documented, or depending on a method you consider internal because it wasn’t marked private in a language that doesn’t enforce visibility. 
<a href="https://www.hyrumslaw.com/">Hyrum’s Law</a> says all of these implicit contracts exist once you have enough users, and semver can’t help because a version number declares what the maintainer intended, not what downstream code actually depends on.</p>
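<p>As a toy illustration of such a contract (the function and message here are invented for the sketch), consider a downstream test that regex-parses a library’s error message, something no changelog will ever warn about breaking:</p>

```python
import re

# A hypothetical library function: only the ValueError is documented,
# not the wording of its message.
def parse_port(value):
    if not value.isdigit():
        raise ValueError(f"invalid port: {value!r}")
    return int(value)

# A downstream test that passes today but encodes an implicit contract:
# rewording the message in a patch release breaks it, even though
# semver says nothing about the library changed.
def downstream_test():
    try:
        parse_port("http")
    except ValueError as e:
        assert re.match(r"invalid port: '(\w+)'", str(e))
```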

<p>A <a href="https://hasel.auckland.ac.nz/2023/11/12/understanding-breaking-changes-in-the-wild/">2023 study of Maven</a> found that 11.58% of dependency updates contain breaking changes that impact clients, with nearly half arriving in non-major version bumps. Most library maintainers have no way to validate their version number before publishing, so the feedback loop is reactive: release, wait for bug reports, and hope the breakage wasn’t too widespread before you can cut a patch.</p>

<h3 id="distributions">Distributions</h3>

<p>Debian packages declare test suites following the DEP-8 specification, and when a package is a candidate for migration from unstable to testing, the migration tool Britney triggers <a href="https://wiki.debian.org/autopkgtest">autopkgtest</a> for the package and all of its reverse dependencies. A regression blocks migration, so an Expat update that causes test failures in its dependents sits in unstable until someone resolves them, and a Coq update that broke mathcomp-analysis and mathcomp-finmap did the same. The maintainer finds out who they broke and how before the change reaches anyone who didn’t opt into unstable.</p>
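<p>For a sense of what that declaration looks like, a package opts in with a <code class="language-plaintext highlighter-rouge">debian/tests/control</code> file; the stanza below is a hypothetical example (the test name and dependencies are invented), where <code class="language-plaintext highlighter-rouge">@</code> expands to the binary packages built from the source:</p>

```
Tests: upstream-tests
Depends: @, python3-pytest
Restrictions: allow-stderr
```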

<p>Autopkgtest doesn’t check API compatibility. It runs actual test suites of actual consumers, which encode whatever implicit contracts those consumers have built against, including ones the upstream maintainer has never heard of. If library Y changes the sort order of a hash table in a patch release and package X’s tests assumed that order was stable, migration blocks until someone decides whose assumption was wrong.</p>

<p>Fedora’s recent work with <a href="https://cockpit-project.org/blog/tmt-cross-project-testing.html">tmt, Packit, and Testing Farm</a> runs downstream tests in the PR, before anything is released. The Cockpit project configured it so that opening a PR on their core library automatically runs the test suites of cockpit-podman and other dependents against the proposed change, with results showing up as status checks before merge. As they put it, “it is too late at the distro level anyway: at that point the new upstream release which includes the regression was already done, and the culprit landed possibly weeks ago already.”</p>

<p>When a maintainer discovers breakage in a PR, they’re still inside the change. They remember why they restructured that error path, they know which tests they considered, and the diff is right in front of them. The cost of responding to a downstream failure at this point is a few minutes of thought and maybe a revised approach. When the same breakage surfaces as an issue filed three weeks after release, the maintainer has to reload the context of the change, understand the downstream project’s usage well enough to see why it broke, decide whether to fix forward or revert, cut a new release, and hope that consumers who already pinned away will unpin. The information is the same in both cases (a downstream test failed), but the cost of acting on it scales with the distance from the change that caused it.</p>

<p>Debian’s autopkgtest catches breakage before migration to testing, which is better than catching it after, but the change has already been released upstream by that point. The Fedora approach catches it before the upstream release happens at all, which means the maintainer can fix it before anyone outside their own CI ever encounters it. František Lachman and Cristian Le presented the PTE project, a code-to-distribution testing pipeline along these lines, at <a href="https://fosdem.org/2026/schedule/event/MCNHUF-from-code-to-distribution-testing-pipeline/">FOSDEM</a>. Downstream feedback that arrives while you’re still writing the code changes how you think about the change itself.</p>

<h3 id="language-ecosystems">Language ecosystems</h3>

<p>Distributions can do this because they have structural properties that language ecosystems lack: a single canonical dependency graph, a standardized test interface (DEP-8 in Debian’s case), a shared execution environment where every package builds and runs the same way, and the authority to block a release based on downstream results. npm, PyPI, and RubyGems have fragmented tooling, no standard way to invoke a package’s tests from outside its own repo, heterogeneous execution environments, and no mechanism to gate a publish on anything other than the maintainer’s own judgement. A few language ecosystems have built partial versions of downstream testing anyway, though they tend to belong to compiler teams with the resources to work around these gaps.</p>

<p>Rust’s <a href="https://github.com/rust-lang/crater">crater</a> compiles and tests every crate on crates.io against both the current and proposed compiler, then diffs the results. A recent <a href="https://github.com/rust-lang/rust/pull/142723">PR adding <code class="language-plaintext highlighter-rouge">impl From&lt;f16&gt; for f32</code></a> to the standard library broke 3,143 crates out of 650,587 tested. Adding a trait implementation is unambiguously backwards-compatible by semver’s rules, but it broke type inference in thousands of downstream projects because existing code depended on there being exactly one conversion path between those types. Crater caught it before it shipped, in a run that took five to six days on Linux x86_64. Without it, the Rust team would have discovered the breakage from 3,143 individual bug reports.</p>

<p>Crater also benefits from Rust being compiled: a type inference failure shows up at build time, before any tests run. In Python, Ruby, or JavaScript, the equivalent breakage only surfaces at runtime, so you need downstream test suites that actually exercise the affected code paths, and those code paths need to be covered in the first place. The case for downstream testing is stronger in dynamic ecosystems because there’s no compile step to catch the easy ones, and the signal is harder to get.</p>

<p>Node.js runs <a href="https://github.com/nodejs/citgm">CITGM</a> (Canary in the Goldmine), which tests about 80 curated npm packages against proposed Node versions. A refactor in Node 12 moved <code class="language-plaintext highlighter-rouge">isFile</code> from <code class="language-plaintext highlighter-rouge">Stats.prototype</code> to <code class="language-plaintext highlighter-rouge">StatsBase.prototype</code>, changing nothing about the public API but breaking the esm module because it walked the prototype chain directly. In a separate release, a change to the timing of a <code class="language-plaintext highlighter-rouge">readable</code> event on EOF broke the dicer module, which depended on that event firing synchronously.</p>

<p>All of these were built by teams with dedicated infrastructure budgets and release processes, and an individual library maintainer who publishes a widely-used package on npm or PyPI or RubyGems has nothing comparable, even though they face the same problem at a different scale.</p>

<h3 id="merge-confidence">Merge confidence</h3>

<p>Renovate’s <a href="https://docs.renovatebot.com/merge-confidence/">Merge Confidence</a> aggregates data from millions of update PRs to tell consumers whether an update is safe: how old the release is, what percentage of Renovate users have adopted it, and what percentage of updates result in passing tests. The signal comes from real test results across real projects, but it arrives after the release and flows to consumers, never back to the maintainer who shipped the change.</p>

<p>The algorithm is private, and the underlying dataset of which updates broke which projects’ tests stays behind Mend’s paywall. Dependabot shows a <a href="https://docs.github.com/en/code-security/dependabot/dependabot-security-updates/about-dependabot-security-updates">compatibility score</a> on security update PRs, calculated from CI results across other public repos that made the same update, but only when at least five candidate updates exist, and the data doesn’t flow back to the maintainer either. I’ve started indexing Dependabot PRs at <a href="https://dependabot.ecosyste.ms">dependabot.ecosyste.ms</a> to build an open version of this signal. It doesn’t have CI data yet, but it already tracks merge percentages per update, which gives a rough proxy for how much trouble a particular version bump is causing across the ecosystem.</p>

<h3 id="discovery">Discovery</h3>

<p>Registries track which packages declare dependencies on other packages, but applications that consume libraries are mostly invisible: a Rails app that depends on a gem won’t show up in RubyGems’ reverse dependency list, and a company’s internal service using an npm package won’t appear on npmjs.com. The maintainer’s view of their dependents is limited to whatever the registry can see, which skews heavily toward other libraries and misses the applications, which are where the stranger usage patterns and more surprising implicit contracts show up.</p>

<p><a href="https://ecosyste.ms">ecosyste.ms</a> tracks dependents across both packages and open source repositories, scanning millions of repos on GitHub, GitLab, and other forges for manifest files that declare dependencies. A maintainer can see which applications actually use their library, which is the view you’d need to build a downstream testing system on.</p>

<h3 id="building-it">Building it</h3>

<p>This is something I want to build on top of ecosyste.ms. A maintainer connects the service to their CI, and on every PR or pre-release branch it queries ecosyste.ms for the top N dependents of the package, both libraries and applications, ranked by some combination of dependent count, download volume, and recency of commits. It clones each one, installs the proposed version of the library in place of the current release, and runs their test suite in an isolated environment. The results come back as a report on the PR: which dependents were tested, which ones regressed, what the stack traces look like, which of the maintainer’s changes likely caused each failure.</p>
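<p>A minimal sketch of that loop, with everything hedged: the dependent record shape, the scoring weights, and the per-ecosystem install and test commands below are all assumptions for illustration, not a real ecosyste.ms API contract:</p>

```python
import subprocess

def rank_dependents(dependents, top_n=20):
    """Rank dependent projects by an invented blend of popularity and freshness."""
    def score(d):
        return (d.get("dependent_count", 0)
                + d.get("downloads", 0) / 1000
                + (100 if d.get("recently_active", False) else 0))
    return sorted(dependents, key=score, reverse=True)[:top_n]

def run_downstream_suite(repo_url, proposed_package_path, workdir):
    """Clone one dependent, swap in the proposed build, and run its tests.

    Returns True if the suite passed. The commands assume a Python
    project using pytest; a real service would dispatch per ecosystem.
    """
    subprocess.run(["git", "clone", "--depth", "1", repo_url, workdir], check=True)
    subprocess.run(["pip", "install", proposed_package_path], cwd=workdir, check=True)
    return subprocess.run(["pytest", "-x"], cwd=workdir).returncode == 0
```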

<p>A maintainer looking at that report before tagging a release would see things that are currently invisible to them. They’d see that popular applications parse their error messages with regex and will break if the wording changes, that a widely-used wrapper library calls a method they considered internal and were about to remove, that their optimisation to batch database calls changed the callback order in a way that two downstream projects’ integration tests depend on.</p>

<p>Michał Górny’s <a href="https://mgorny.pl/articles/downstream-testing-python-packages.html">catalogue of problems with downstream testing Python packages</a> lays out the failure modes: test suites that modify installed files assuming they’re in a disposable container, pytest plugins in the environment causing unexpected test collection, tests requiring network access or Docker, timing-dependent assertions, floating-point precision differences across architectures, source distributions that omit test files entirely. Any service trying this across a registry would need to handle all of these gracefully, distinguishing genuine regressions from environmental noise, which is a hard problem that Debian has spent years refining with autopkgtest and still hasn’t fully solved.</p>

<p>Developer tools usually fund themselves by selling an enterprise version, but large companies facing similar coordination problems between internal teams already solved them with monorepos. When all your code lives in one tree, downstream testing is just CI: you run every affected test before merging, no special infrastructure needed. Google, Meta, and Microsoft have invested heavily in making that work, and inside their monorepos the problem is already solved. Nobody’s going to buy an enterprise version of downstream testing when their codebase doesn’t have a “downstream,” which leaves open source maintainers as the only audience for a tool like this, and they can’t fund it.</p>

<p>ecosyste.ms already provides the dependent discovery, source repositories are linked from package metadata, test suites follow ecosystem conventions that are well-understood enough to automate, and container infrastructure makes isolated environments cheap. Crater and autopkgtest have proven the approach works at ecosystem scale. The missing piece is stitching these together into something an individual maintainer can point at their package and get results from, without needing a compiler team’s budget or a distro’s infrastructure.</p>]]></content><author><name>Andrew Nesbitt</name><email>andrew@ecosyste.ms</email></author><category term="package-managers" /><category term="testing" /><category term="ecosystems" /><summary type="html"><![CDATA[Most library maintainers have no way to test against their dependents before releasing.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://nesbitt.io/images/boxes.png" /><media:content medium="image" url="https://nesbitt.io/images/boxes.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">npm Data Subject Access Request</title><link href="https://nesbitt.io/2026/02/28/npm-data-subject-access-request.html" rel="alternate" type="text/html" title="npm Data Subject Access Request" /><published>2026-02-28T10:00:00+00:00</published><updated>2026-02-28T10:00:00+00:00</updated><id>https://nesbitt.io/2026/02/28/npm-data-subject-access-request</id><content type="html" xml:base="https://nesbitt.io/2026/02/28/npm-data-subject-access-request.html"><![CDATA[<p><strong>From:</strong> Data Protection Officer, npm, Inc. (a subsidiary of GitHub, Inc., a subsidiary of Microsoft Corporation)<br />
<strong>To:</strong> [REDACTED]<br />
<strong>Date:</strong> 26 February 2026<br />
<strong>Re:</strong> Data Subject Access Request (Ref: DSAR-2026-0041573)<br />
<strong>Response deadline:</strong> Exceeded (statutory: 30 days)</p>

<p>Dear Data Subject,</p>

<p>Thank you for your request under Article 15 of the General Data Protection Regulation (EU) 2016/679 to access all personal data we hold about you.</p>

<p>We apologize for the delay in responding. Your request was initially routed to our dependency resolution system, which spent 47 days attempting to resolve your identity against our user registry before entering a circular reference with GitHub’s SSO provider. A human has since intervened.</p>

<h3 id="1-categories-of-personal-data-processed">1. Categories of Personal Data Processed</h3>

<ul>
  <li><strong>Identity data</strong>: Name, email address, username, GitHub handle, two-factor authentication status, and 487 unique IP addresses recorded since account creation.</li>
  <li><strong>Package data</strong>: Full publishing history for <code class="language-plaintext highlighter-rouge">buttplug</code> (147 versions) and 1 package published at 2:47 AM containing your <code class="language-plaintext highlighter-rouge">.env</code> file. You un-published it within four minutes, by which time 14 users had installed it.</li>
  <li><strong>Behavioral data</strong>: Every <code class="language-plaintext highlighter-rouge">npm install</code> you have ever run, including timestamps and resolved dependency trees. Every <code class="language-plaintext highlighter-rouge">npm audit</code> you have run (4 times) and every <code class="language-plaintext highlighter-rouge">npm audit</code> you chose not to run (approximately 11,200 times), all of which we log.</li>
  <li><strong>node_modules inventory</strong>: Resolved dependency trees, install manifests, and content hashes collected from your local environment during package installation. This constitutes the largest category at 412 pages (see Appendix J).</li>
</ul>

<h3 id="2-purposes-of-processing">2. Purposes of Processing</h3>

<ul>
  <li><strong>Service provision</strong>: To deliver packages to your machine.</li>
  <li><strong>Dependency graph construction</strong>: To build and maintain a complete graph of every package’s relationship to every other package, and by extension, every developer’s relationship to every other developer, though we have not yet determined a use for it.</li>
  <li><strong>Security</strong>: To detect anomalous publishing behavior. Our system flagged your 2:47 AM publish as anomalous.</li>
  <li><strong>Legitimate interest</strong>: We have a legitimate interest in understanding the full topology of the JavaScript ecosystem. We acknowledge this interest is difficult to distinguish from surveillance.</li>
</ul>

<h3 id="3-recipients-of-personal-data">3. Recipients of Personal Data</h3>

<ul>
  <li><strong>GitHub, Inc.</strong>: Our parent company. They hold your data under a separate privacy policy. You will need to submit a separate DSAR to them. They will redirect you to Microsoft.</li>
  <li><strong>GitHub Dependabot</strong>: Each of the 147 versions of <code class="language-plaintext highlighter-rouge">buttplug</code> you have published generated automated pull requests titled “Bump buttplug” across an estimated 1,247 downstream repositories.</li>
  <li><strong>Microsoft Corporation</strong>: Our parent company’s parent company. Their response to your DSAR will be delivered via Microsoft Teams, which you will need to install.</li>
  <li><strong>Cloudflare, Inc.</strong>: Our CDN provider. They have observed every package you have ever downloaded. They consider this metadata, not personal data.</li>
  <li><strong>The npm public registry</strong>: Your published packages, including their <code class="language-plaintext highlighter-rouge">package.json</code> files, are publicly available. Your <code class="language-plaintext highlighter-rouge">package.json</code> from the 2:47 AM incident contained your home directory path and your OS username. We cannot un-publish this information, as at least one of the 14 downstream consumers has mirrored it to IPFS.</li>
  <li><strong>GitHub Arctic Code Vault</strong>: Your published packages were frozen in February 2020 on archival film in a decommissioned coal mine in Svalbard, Norway.</li>
  <li><strong>An unspecified number of CI/CD pipelines</strong>: Your packages are installed approximately 900 times per week in automated build environments. Each of these environments logs the installation. We do not control these logs, nor, as far as we can determine, does anyone else.</li>
  <li><strong>An unknown number of software bills of materials</strong>: Under Executive Order 14028, federal software suppliers are required to produce SBOMs listing all components. Your package <code class="language-plaintext highlighter-rouge">buttplug</code> is listed as a transitive dependency in an estimated 340 SBOMs submitted as federal records to US government agencies.</li>
</ul>

<h3 id="4-retention-periods">4. Retention Periods</h3>

<ul>
  <li><strong>Account data</strong>: For the lifetime of your account, plus 7 years after deletion, plus the remaining useful life of physical backup media.</li>
  <li><strong>Package data</strong>: Indefinitely. npm’s contract with the ecosystem is that published packages are permanent. Un-publishing is technically possible but discouraged since 2016.</li>
  <li><strong>Behavioral data</strong>: 24 months in our primary database, after which it is moved to cold storage, where it remains queryable.</li>
  <li><strong>node_modules inventories</strong>: We do not have a retention policy for this data because we did not realize we were collecting it.</li>
</ul>

<h3 id="5-your-rights">5. Your Rights</h3>

<ul>
  <li><strong>Right of access</strong>: You are exercising this right now.</li>
  <li><strong>Right to rectification</strong>: You may request correction of inaccurate data. If you would like us to update the OS username in the leaked <code class="language-plaintext highlighter-rouge">package.json</code>, please note that this would require modifying a published package, which would break the integrity hash, which would cause <code class="language-plaintext highlighter-rouge">npm audit</code> to flag it as tampered, which would generate security advisories for the 14 downstream consumers, one of whom has mirrored it to a public Git repository. We advise against rectification at this time.</li>
  <li><strong>Right to erasure</strong>: You may request deletion of your personal data where there is no compelling reason for its continued processing. We believe there is a compelling reason: <code class="language-plaintext highlighter-rouge">buttplug</code> has 1,247 direct dependents, including 3 production banking applications. Deleting your account would remove it from the registry, breaking its dependents, their dependents, and so on until an estimated 0.003% of the JavaScript ecosystem fails to build. Our legal team considers this a compelling reason.</li>
  <li><strong>Right to data portability</strong>: You may request your data in a structured, commonly used, machine-readable format. We have prepared your data as a 2.7 GB JSON file, available for download at a pre-signed URL that expires in 7 days.</li>
  <li><strong>Right to object</strong>: You may object to processing based on legitimate interest. If you object to our construction of the global dependency graph, your objection will be noted in the graph.</li>
</ul>

<h3 id="6-automated-decision-making">6. Automated Decision-Making</h3>

<ul>
  <li><strong>Trust score</strong>: Our system has assigned you a trust score of 72 out of 100, based on account age, publishing frequency, two-factor authentication status, and whether you have ever mass-transferred package ownership to a stranger. The platform average is 64. The scoring methodology is proprietary.</li>
  <li><strong>Bus factor assessment</strong>: Our system has determined that <code class="language-plaintext highlighter-rouge">buttplug</code> has a bus factor of 1: You are driving the bus. This assessment has been shared with downstream maintainers who have opted into critical dependency notifications.</li>
</ul>

<h3 id="7-international-transfers">7. International Transfers</h3>

<ul>
  <li><strong>United States</strong>: Where our servers are located. This transfer is covered by the EU-US Data Privacy Framework, which replaced Privacy Shield, which replaced Safe Harbor.</li>
  <li><strong>47 additional countries</strong>: Your published packages are distributed via a global CDN. We cannot enumerate which edge nodes have cached your <code class="language-plaintext highlighter-rouge">package.json</code> at any given time. The full list of jurisdictions is included in Appendix K.</li>
</ul>

<hr />

<p>If you have questions about this response, please contact our Data Protection Officer at dpo@npmjs.com. Please allow 30 days for a reply. If our response requires querying the dependency graph, please allow 47 additional days.</p>

<p>Yours faithfully,</p>

<p>Data Protection Officer<br />
npm, Inc.<br />
A subsidiary of GitHub, Inc.<br />
A subsidiary of Microsoft Corporation</p>

<p><strong>Enclosures:</strong><br />
Appendix A: Account metadata (3 pages)<br />
Appendix B: Publishing history including retracted packages (7 pages)<br />
Appendix C: Behavioral telemetry (41 pages)<br />
Appendix D: Dependency graph, your packages only (28 pages)<br />
Appendix E: Dependency graph for <code class="language-plaintext highlighter-rouge">buttplug</code>, including transitive dependents (119 pages)<br />
Appendix F: npm audit output (84 pages)<br />
Appendix G: Download logs (31 pages)<br />
Appendix H: IP address history with geolocation (6 pages)<br />
Appendix J: node_modules inventory, deduplicated (412 pages)<br />
Appendix K: List of jurisdictions (2 pages)</p>

<p><em>Total enclosures: 743 pages</em><br />
<em>Format: JSON</em></p>]]></content><author><name>Andrew Nesbitt</name><email>andrew@ecosyste.ms</email></author><category term="package-managers" /><category term="npm" /><category term="satire" /><summary type="html"><![CDATA[A response to a GDPR data subject access request.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://nesbitt.io/images/boxes.png" /><media:content medium="image" url="https://nesbitt.io/images/boxes.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">xkcd 2347</title><link href="https://nesbitt.io/2026/02/27/xkcd-2347.html" rel="alternate" type="text/html" title="xkcd 2347" /><published>2026-02-27T10:00:00+00:00</published><updated>2026-02-27T10:00:00+00:00</updated><id>https://nesbitt.io/2026/02/27/xkcd-2347</id><content type="html" xml:base="https://nesbitt.io/2026/02/27/xkcd-2347.html"><![CDATA[<p>I made an <a href="https://nesbitt.io/xkcd-2347/">interactive version</a> of <a href="https://xkcd.com/2347/">xkcd 2347</a>, the dependency comic, where you can drag blocks out of the tower and watch everything above them collapse.</p>

<p><img src="/images/xkcd.gif" alt="xkcd 2347 interactive game" /></p>

<p><a href="https://brm.io/matter-js/">Matter.js</a> handles the physics and <a href="https://roughjs.com/">Rough.js</a> gives it the hand-drawn xkcd look. Each reload generates a different tower from a seeded PRNG that picks a taper profile, varies the block sizes and row widths, and drifts the whole thing slightly off-center as it goes up. The project names are randomly assembled from parts that sound like real packages – things like <code class="language-plaintext highlighter-rouge">node-flux.js</code> or <code class="language-plaintext highlighter-rouge">libcrypt-fast</code> or <code class="language-plaintext highlighter-rouge">hyper-mux@3.12.7</code> – though about one in five times you’ll get an actual name like left-pad or log4j instead. Reload enough times and you might run into some unusual tower shapes, and the <a href="https://en.wikipedia.org/wiki/Konami_Code">Konami code</a> does what you’d hope.</p>

<p>The info button shows the tower’s seed, which you can share as a <code class="language-plaintext highlighter-rouge">?seed=</code> URL parameter, basically a way to say “look at this disaster” and have someone else see the exact same precarious arrangement.</p>
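<p>The seeding trick is simple to sketch. The real implementation is JavaScript, and the taper and size ranges below are invented, but the property that matters carries over: the same seed always produces the same tower, which is what makes a <code class="language-plaintext highlighter-rouge">?seed=</code> URL shareable:</p>

```python
import random

def generate_tower(seed, rows=10):
    """Deterministically derive block row widths for a tower from a seed."""
    rng = random.Random(seed)
    taper = rng.uniform(0.85, 0.98)  # how quickly rows narrow going up
    width = rng.uniform(8.0, 12.0)   # base row width
    widths = []
    for _ in range(rows):
        jitter = rng.uniform(-0.5, 0.5)  # slight per-row drift
        widths.append(round(width + jitter, 2))
        width *= taper
    return widths
```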

<p>Some ways this could go further:</p>

<ul>
  <li>Upload an SBOM and build the tower from your actual dependency tree, with block sizes based on how many other packages depend on each one</li>
  <li>Pull real dependency data from <a href="https://ecosyste.ms">ecosyste.ms</a> so you can see what your project’s tower looks like before you start pulling blocks out</li>
  <li>Use the phone’s accelerometer to let you tilt and topple the tower</li>
</ul>

<p><a href="https://github.com/andrew/nesbitt.io/tree/master/xkcd-2347">Source on GitHub</a>.</p>]]></content><author><name>Andrew Nesbitt</name><email>andrew@ecosyste.ms</email></author><category term="dependencies" /><category term="open-source" /><summary type="html"><![CDATA[An interactive version of the dependency comic.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://nesbitt.io/images/boxes.png" /><media:content medium="image" url="https://nesbitt.io/images/boxes.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Git in Postgres</title><link href="https://nesbitt.io/2026/02/26/git-in-postgres.html" rel="alternate" type="text/html" title="Git in Postgres" /><published>2026-02-26T10:00:00+00:00</published><updated>2026-02-26T10:00:00+00:00</updated><id>https://nesbitt.io/2026/02/26/git-in-postgres</id><content type="html" xml:base="https://nesbitt.io/2026/02/26/git-in-postgres.html"><![CDATA[<p>In December I wrote about <a href="/2025/12/24/package-managers-keep-using-git-as-a-database.html">package managers using git as a database</a>, and how Cargo’s index, Homebrew’s taps, Go’s module proxy, and CocoaPods’ Specs repo all hit the same wall once their access patterns outgrew what a git repo is designed for.</p>

<p><a href="https://github.com/Homebrew/homebrew-core">homebrew-core</a> has one Ruby file per package formula, and every <code class="language-plaintext highlighter-rouge">brew update</code> used to clone or fetch the whole repository until it got large enough that <a href="https://github.com/Homebrew/brew/pull/9383">GitHub explicitly asked them to stop</a>. Homebrew 4.0 switched to downloading a JSON file over HTTP, because users wanted the current state of a package rather than its commit history. But updating a formula still means opening a pull request against homebrew-core, because git is where the collaboration tooling lives. Instead of using git as a database, what if you used a database as a git?</p>

<p>A git repository is a content-addressable object store where objects go in indexed by the SHA1 of their content, plus a set of named references pointing at specific objects by hash. The on-disk format (loose objects as individual files, packfiles as delta-compressed archives with a separate index, a ref store split between a directory of files and a packed-refs flat file with a locking protocol that breaks on NFS) is an implementation detail. The protocol for synchronising objects and refs between repositories is what actually matters, and since git-the-program is just one implementation of it, you can swap the storage backend without clients noticing.</p>
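<p>The content-addressing scheme is simple enough to reproduce in a few lines; a blob’s id is the SHA1 of a short type-and-size header followed by the raw bytes, and the same scheme, with different type names, covers trees, commits, and tags:</p>

```python
import hashlib

def git_blob_oid(content: bytes) -> str:
    """Compute the object id git assigns to a blob of the given content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# The empty blob gets the well-known id that shows up in every repository:
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```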

<p>The whole data model fits in two tables:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">objects</span> <span class="p">(</span>
    <span class="n">repo_id</span>  <span class="nb">integer</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="n">oid</span>      <span class="n">bytea</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="k">type</span>     <span class="nb">smallint</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="k">size</span>     <span class="nb">integer</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="n">content</span>  <span class="n">bytea</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="p">(</span><span class="n">repo_id</span><span class="p">,</span> <span class="n">oid</span><span class="p">)</span>
<span class="p">);</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">refs</span> <span class="p">(</span>
    <span class="n">repo_id</span>  <span class="nb">integer</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="n">name</span>     <span class="nb">text</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="n">oid</span>      <span class="n">bytea</span><span class="p">,</span>
    <span class="n">symbolic</span> <span class="nb">text</span><span class="p">,</span>
    <span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="p">(</span><span class="n">repo_id</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span>
<span class="p">);</span>
</code></pre></div></div>

<p>An object’s OID is computed the same way git does it, <code class="language-plaintext highlighter-rouge">SHA1("&lt;type&gt; &lt;size&gt;\0&lt;content&gt;")</code>, using pgcrypto’s <code class="language-plaintext highlighter-rouge">digest()</code> function, and refs get compare-and-swap updates through <code class="language-plaintext highlighter-rouge">SELECT FOR UPDATE</code>. A libgit2 backend registers these tables as its storage layer, and if the protocol really is separable from the format, a normal git client should be able to push to and clone from a Postgres database without knowing the difference.</p>
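<p>To make that concrete, here’s the same OID computation in a few lines of Python, using hashlib for illustration where gitgres uses pgcrypto; the bytes going into the hash are identical:</p>

```python
import hashlib

def git_oid(obj_type: str, content: bytes) -> str:
    # Git hashes "<type> <size>\0<content>", not the raw content alone.
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# Agrees with `git hash-object` for the same bytes:
print(git_oid("blob", b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```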

<p>To test this I built <a href="https://github.com/andrew/gitgres">gitgres</a>, about 2,000 lines of C implementing the libgit2 <code class="language-plaintext highlighter-rouge">git_odb_backend</code> and <code class="language-plaintext highlighter-rouge">git_refdb_backend</code> interfaces against Postgres through libpq, plus roughly 200 lines of PL/pgSQL for the storage functions. libgit2 handles pack negotiation, delta resolution, ref advertisement, and the transport protocol while the backend reads and writes against the two tables, and a git remote helper (<code class="language-plaintext highlighter-rouge">git-remote-gitgres</code>) lets you add a Postgres-backed remote to any repo and push or clone with a normal git client that has no idea it’s talking to a database. There’s a Dockerfile in the repo if you want to try it out without building libgit2 and libpq from source.</p>

<p>The objects table contains the same bytes git would store on disk, and a set of SQL functions parse them into tree entries, commit metadata, and parent links that you can join against like any other table.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">r</span><span class="p">.</span><span class="n">name</span> <span class="k">AS</span> <span class="n">repo</span><span class="p">,</span> <span class="k">c</span><span class="p">.</span><span class="n">author_name</span><span class="p">,</span> <span class="k">c</span><span class="p">.</span><span class="n">authored_at</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">title</span> <span class="k">AS</span> <span class="n">issue</span>
<span class="k">FROM</span> <span class="n">commits</span> <span class="k">c</span>
<span class="k">JOIN</span> <span class="n">repositories</span> <span class="n">r</span> <span class="k">ON</span> <span class="n">r</span><span class="p">.</span><span class="n">id</span> <span class="o">=</span> <span class="k">c</span><span class="p">.</span><span class="n">repo_id</span>
<span class="k">JOIN</span> <span class="n">issues</span> <span class="n">i</span> <span class="k">ON</span> <span class="n">i</span><span class="p">.</span><span class="n">repo_id</span> <span class="o">=</span> <span class="k">c</span><span class="p">.</span><span class="n">repo_id</span>
  <span class="k">AND</span> <span class="k">c</span><span class="p">.</span><span class="n">message</span> <span class="k">ILIKE</span> <span class="s1">'%#'</span> <span class="o">||</span> <span class="n">i</span><span class="p">.</span><span class="k">index</span> <span class="o">||</span> <span class="s1">'%'</span>
<span class="k">WHERE</span> <span class="k">c</span><span class="p">.</span><span class="n">authored_at</span> <span class="o">&gt;</span> <span class="n">now</span><span class="p">()</span> <span class="o">-</span> <span class="n">interval</span> <span class="s1">'30 days'</span><span class="p">;</span>
</code></pre></div></div>

<p>That query joins git commit data against Forgejo’s issue tracker, something that currently requires fetching commits through <code class="language-plaintext highlighter-rouge">git log</code>, pattern-matching issue references in application code, and then querying the database for the matching issues. With both sides in Postgres it’s one query.</p>

<h3 id="forgejo">Forgejo</h3>

<p>A self-hosted Forgejo or Gitea instance is really two systems bolted together: a web application backed by Postgres, and a collection of bare git repositories on the filesystem. Anything that needs to show git data in the web UI has to shell out to the binary and parse text, which is why something as straightforward as a blame view requires spawning a subprocess rather than running a query. If the git data lived in the same Postgres instance as everything else, that boundary disappears.</p>

<p>Forgejo stores issues, pull requests, users, permissions, webhooks, branch protection rules, and CI status in Postgres already; git repositories are the one thing left on the filesystem, forcing every deployment to coordinate backups across two systems that scale and fail in different ways. The codebase already shows the strain: Forgejo mirrors branch metadata from git into its own database tables (<code class="language-plaintext highlighter-rouge">models/git/branch.go</code>) so it can query branches without shelling out to git every time.</p>

<p>All git interaction goes through <code class="language-plaintext highlighter-rouge">modules/git</code>, about 15,000 lines of Go that shell out to the <code class="language-plaintext highlighter-rouge">git</code> binary and parse text output. With git data in Postgres, reading an object becomes <code class="language-plaintext highlighter-rouge">SELECT content FROM objects WHERE oid = $1</code> on the database connection Forgejo already holds, and walking commit history is a query against a materialized view rather than spawning <code class="language-plaintext highlighter-rouge">git log</code>.</p>

<p>The deployment collapses to a single Postgres instance where <code class="language-plaintext highlighter-rouge">pg_dump</code> backs up forge metadata, git objects, and user data together, and replicas handle read scaling for the web UI without NFS mounts or a Gitaly-style RPC layer. The path there is a Forgejo fork replacing <code class="language-plaintext highlighter-rouge">modules/git</code> with a package that queries Postgres, where <code class="language-plaintext highlighter-rouge">Repository</code> holds a database connection and repo_id instead of a filesystem path and <code class="language-plaintext highlighter-rouge">Commit</code>, <code class="language-plaintext highlighter-rouge">Tree</code>, <code class="language-plaintext highlighter-rouge">Blob</code> become thin wrappers around query results.</p>

<h3 id="postgres">Postgres</h3>

<p>Postgres has its own primitives for things that forges currently build custom infrastructure around. A trigger on the refs table firing <code class="language-plaintext highlighter-rouge">NOTIFY</code> means any connected client learns about a push the moment it happens, without the custom polling layer forges normally build to detect changes and fire webhooks. Multi-tenant repo isolation becomes a database concern through row-level security on the objects and refs tables, and logical replication lets you selectively stream repositories across Postgres instances, a kind of partial mirroring that filesystem-based git can’t do. Commit graph traversal for ancestry queries and merge-base computation falls to recursive CTEs, and <code class="language-plaintext highlighter-rouge">pg_trgm</code> indexes on blob content give you substring search across all repositories without standing up a separate search index.</p>
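<p>To make the ancestry piece concrete, here’s the graph logic a recursive CTE over parent links would express, sketched in Python against an in-memory parent map (the <code class="language-plaintext highlighter-rouge">parents</code> dict and commit names are invented for illustration):</p>

```python
def ancestors(parents, commit):
    # Everything reachable from a commit, including itself -- the same
    # closure a recursive CTE over a parent-links table computes.
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen

def merge_base(parents, a, b):
    # Among the common ancestors, pick the one furthest from the root.
    def depth(c):
        return 1 + max((depth(p) for p in parents.get(c, [])), default=-1)
    common = ancestors(parents, a) & ancestors(parents, b)
    return max(common, key=depth) if common else None

# A tiny history: c2 and c3 both branch off c1, c4 extends c2.
parents = {"c0": [], "c1": ["c0"], "c2": ["c1"], "c3": ["c1"], "c4": ["c2"]}
print(merge_base(parents, "c4", "c3"))  # c1
```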

<h3 id="diff-merge-blame">Diff, merge, blame</h3>

<p>Content-level diffs, three-way merge, and blame stay in libgit2 rather than being reimplemented in SQL, since libgit2 already has that support and works against the Postgres backends through cgo bindings. The Forgejo fork would be “replace <code class="language-plaintext highlighter-rouge">modules/git</code> with libgit2 backed by Postgres” rather than “replace <code class="language-plaintext highlighter-rouge">modules/git</code> with raw SQL,” because the read-side queries only cover the simple cases and anything involving content comparison or graph algorithms still needs libgit2 doing the work with Postgres as its storage layer. That’s a meaningful dependency to carry, though libgit2 is well-maintained and already used in production by the Rust ecosystem and various GUI clients. SQL implementations of some of this using recursive CTEs would be interesting to try eventually but aren’t needed to get a working forge. The remaining missing piece is the server-side pack protocol: the remote helper covers the client side, but a Forgejo integration also needs a server that speaks <code class="language-plaintext highlighter-rouge">upload-pack</code> and <code class="language-plaintext highlighter-rouge">receive-pack</code> against Postgres, either through libgit2’s transport layer or a Go implementation that queries the objects table directly.</p>

<h3 id="storage">Storage</h3>

<p>Git packfiles use delta compression, storing only the diff when a 10MB file changes by one line, while the objects table stores each version in full. A file modified 100 times takes about 1GB in Postgres versus maybe 50MB in a packfile. Postgres does compress large values via TOAST, but that’s compression of individual objects in isolation, not delta-compression across versions the way packfiles do, so the storage overhead is real. A delta-compression layer that periodically repacks objects within Postgres, or offloads large blobs to S3 the way LFS does, is a natural next step. For most repositories it still won’t matter since the median repo is small and disk is cheap, and GitHub’s Spokes system made a similar trade-off years ago, storing three full uncompressed copies of every repository across data centres because redundancy and operational simplicity beat storage efficiency even at hundreds of exabytes.</p>

<p>gitgres is a neat hack right now, but if open source hosting keeps moving toward federation and decentralization, with ForgeFed, Forgejo’s federation work, and more people running small instances for their communities, the operational simplicity of a single-Postgres deployment matters more than raw storage efficiency. Getting from a handful of large forges to a lot of small ones probably depends on a forge you can stand up with <code class="language-plaintext highlighter-rouge">docker compose up</code> and back up with <code class="language-plaintext highlighter-rouge">pg_dump</code>, and that’s a lot easier when there’s no filesystem of bare repos to manage alongside the database.</p>]]></content><author><name>Andrew Nesbitt</name><email>andrew@ecosyste.ms</email></author><category term="git" /><category term="postgres" /><summary type="html"><![CDATA[Instead of using git as a database, what if you used a database as a git?]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://nesbitt.io/images/boxes.png" /><media:content medium="image" url="https://nesbitt.io/images/boxes.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Two Kinds of Attestation</title><link href="https://nesbitt.io/2026/02/25/two-kinds-of-attestation.html" rel="alternate" type="text/html" title="Two Kinds of Attestation" /><published>2026-02-25T00:00:00+00:00</published><updated>2026-02-25T00:00:00+00:00</updated><id>https://nesbitt.io/2026/02/25/two-kinds-of-attestation</id><content type="html" xml:base="https://nesbitt.io/2026/02/25/two-kinds-of-attestation.html"><![CDATA[<p>The word “attestation” now means two unrelated things in open source, and the people using it in each sense don’t seem to be talking to each other much.</p>

<p><a href="https://github.blog/security/supply-chain-security/introducing-npm-package-provenance/">npm</a> and <a href="https://blog.pypi.org/posts/2024-11-14-pypi-now-supports-digital-attestations/">PyPI</a> have both shipped build provenance attestations using <a href="https://www.sigstore.dev/">Sigstore</a> over the past couple of years. When you publish a package from GitHub Actions with trusted publishing configured, the CI environment signs an <a href="https://in-toto.io/">in-toto</a> attestation binding the artifact to the source repository, commit, and workflow that built it, and the signature goes into a public transparency log that anyone downstream can verify without trusting the registry. PyPI has had this on by default for trusted publishers since late 2024, npm generates provenance automatically, and the cost to publishers is close to zero. I wrote about how this fits into the broader <a href="/2026/02/24/reproducible-builds-in-language-package-managers/">reproducible builds</a> picture recently.</p>

<p>Meanwhile the <a href="https://digital-strategy.ec.europa.eu/en/library/cyber-resilience-act">EU Cyber Resilience Act</a>, which grew out of product safety regulation originally written for things like toasters, introduced “open source stewards” as a legal concept, and Article 25 gives the <a href="https://commission.europa.eu/">European Commission</a> power to create voluntary security attestation programmes for them. At FOSDEM this year, Æva Black <a href="https://fosdem.org/2026/schedule/event/PTHENV-sustaining-foss-with-attestations/">presented work with the Eclipse Foundation</a> on what such a programme might look like. The proposed model has manufacturers funding stewards who issue attestations about the projects they support, with a tiered approach where the light tier asks whether a project has functional tests, a vulnerability reporting contact, and an end-of-life policy. Æva noted a maintainer could fill it out in minutes. So this is a checklist about project hygiene, filled out by a human, attesting to things like whether a CONTRIBUTING.md exists, which has almost nothing in common with a cryptographic proof logged in a transparency ledger except that both are called attestations.</p>

<p>Madalin Neag at OpenSSF <a href="https://openssf.org/blog/2026/01/21/preserving-open-source-sustainability-while-advancing-cybersecurity-compliance/">wrote an excellent piece</a> in January working through the details of how steward attestations relate to the projects they cover, since stewards don’t control technical decisions or releases, and a point-in-time attestation may not reflect the state of a component by the time a manufacturer integrates it. These are the kind of design questions that need working out as the delegated act takes shape.</p>

<p>This isn’t the first time naming has caused confusion at the boundary between open source and compliance. The CRA itself calls anyone who places software on the EU market a “manufacturer,” which is product safety language from the world of toasters and power tools. Daniel Stenberg <a href="https://daniel.haxx.se/blog/2022/01/24/logj4-security-inquiry-response-required/">got a taste of what that framing produces</a> when a company sent him a compliance questionnaire demanding he account for Log4j in curl, treating him as a vendor with SLA obligations for a project that has never used Java.</p>

<p>Both SPDX and CycloneDX have a “supplier” field for each component, and SBOM generators routinely fill it with the maintainer’s name, even though the maintainer has no contractual relationship with the consumer and <a href="https://www.softwaremaxims.com/blog/not-a-supplier">is not a supplier</a> in any commercial sense. These words carry legal connotations that don’t match the relationships they’re describing, and now that they’re codified in standards and regulation they’re difficult to undo.</p>

<p>What I keep thinking about is the maintainer who enables trusted publishing, whose CI generates Sigstore provenance on every release, and who then gets contacted by a foundation about a CRA attestation programme asking them to fill out a form about whether they have a security policy. The cryptographic attestation infrastructure already exists and already generates machine-verifiable supply chain metadata at scale, and continuous signals like <a href="https://docs.oasis-open.org/csaf/csaf/v2.0/csaf-v2.0.html">CSAF</a> advisories and <a href="https://www.cisa.gov/sites/default/files/2023-04/minimum-requirements-for-vex-508c.pdf">VEX</a> documents provide ongoing security posture rather than point-in-time snapshots. The Article 25 delegated act hasn’t been written yet and the Commission is still taking input. It would be nice if the two communities compared notes before then, if only so that maintainers don’t end up navigating two unrelated things with the same name.</p>

<p>Naming is hard, but it matters more than usual when the names carry assumptions about what’s actually in place. “Attested” sounds rigorous whether or not it is, and “supplier” implies a contractual relationship that doesn’t exist. Once these words are in standards and regulations, people downstream build processes around what they think the words mean, and unpicking those assumptions later is much harder than getting the names right in the first place. Toaster regulations at least have the advantage that everyone agrees on what a toaster is.</p>]]></content><author><name>Andrew Nesbitt</name><email>andrew@ecosyste.ms</email></author><category term="security" /><category term="open-source" /><category term="policy" /><summary type="html"><![CDATA[The oldest problem in computer science, but with toasters.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://nesbitt.io/images/boxes.png" /><media:content medium="image" url="https://nesbitt.io/images/boxes.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Reproducible Builds in Language Package Managers</title><link href="https://nesbitt.io/2026/02/24/reproducible-builds-in-language-package-managers.html" rel="alternate" type="text/html" title="Reproducible Builds in Language Package Managers" /><published>2026-02-24T10:00:00+00:00</published><updated>2026-02-24T10:00:00+00:00</updated><id>https://nesbitt.io/2026/02/24/reproducible-builds-in-language-package-managers</id><content type="html" xml:base="https://nesbitt.io/2026/02/24/reproducible-builds-in-language-package-managers.html"><![CDATA[<p>You download a package from a registry and the registry says it was built from a particular git commit, but the tarball or wheel or crate you received is an opaque artifact that someone built on their machine and uploaded. 
Reproducible builds let you check by rebuilding from source yourself and comparing, and if you get the same bytes, the artifact is what it claims to be. Making this work requires controlling both the build environment and the provenance of artifacts, and most language package managers historically controlled neither.</p>

<p>The <a href="https://reproducible-builds.org/">Reproducible Builds</a> project has been working on this since 2013, when Lunar (Jérémy Bobbio) organized a session at DebConf13 and began patching Debian’s build tooling. The Snowden disclosures had made software trust an urgent concern, Bitcoin’s Gitian builder had shown the approach was viable for a single project, and the Tor Project had begun producing deterministic builds of Tor Browser. Lunar wanted to apply the same thinking to an entire operating system.</p>

<p>The first mass rebuild of Debian packages in September 2013 found that 24% were reproducible, and by January 2014, after fixing the lowest-hanging fruit in dpkg and common build helpers, that jumped to 67%. Today Debian’s <a href="https://tests.reproducible-builds.org/">testing infrastructure</a> shows around 96% of packages in trixie building reproducibly under controlled conditions, while <a href="https://reproduce.debian.net/">reproduce.debian.net</a> runs a stricter test by rebuilding the actual binaries that ftp.debian.org distributes rather than clean-room test builds.</p>

<p>The project grew into a cross-distribution effort as Arch Linux, NixOS, GNU Guix, FreeBSD, and others joined over the following years. Summits have been held most years since 2015, most recently in Vienna in October 2025. Chris Lamb, who served as Debian Project Leader from 2017 to 2019, co-authored <a href="https://arxiv.org/abs/2104.06020">an IEEE Software paper</a> on the project that won Best Paper for 2022. Lunar passed away in November 2024. The project’s <a href="https://reproducible-builds.org/reports/">weekly reports</a>, published continuously since 2015, give a sense of the scale of work involved: each one lists patches sent to individual upstream packages fixing timestamps, file ordering, path embedding, locale sensitivity, one package at a time, hundreds of packages a year. Getting from 24% to 96% was not a single architectural fix but a decade of this kind of janitorial patching across the entire Debian archive.</p>

<h3 id="how-verification-works">How verification works</h3>

<p>You build the same source twice in different environments and compare the output, and if the bytes match, nobody tampered with the artifact between source and distribution. In practice this requires recording everything about the build environment, which Debian does with <code class="language-plaintext highlighter-rouge">.buildinfo</code> files capturing exact versions of all build dependencies, architecture, and build flags. A verifier retrieves the source, reconstructs the environment using tools like <code class="language-plaintext highlighter-rouge">debrebuild</code>, builds the package, and compares SHA256 hashes against the official binary.</p>

<p>When hashes don’t match, <a href="https://diffoscope.org/">diffoscope</a> is how you find out why. Originally written by Lunar as <code class="language-plaintext highlighter-rouge">debbindiff</code>, it recursively unpacks archives, decompiles binaries, and shows you exactly where two builds diverge across hundreds of file formats: ZIP, tar, ELF, PE, Mach-O, PDF, SQLite, Java class files, Android APKs. Feed it two JARs that should be identical and it’ll dig through the archive, into individual class files, into the bytecode, and show you that one has a timestamp from Tuesday and the other from Wednesday.</p>
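<p>The simplest divergence diffoscope turns up can be reproduced in a few lines: two archives of byte-identical content that differ only because of the embedded timestamp (a minimal sketch of the failure mode, not of diffoscope itself):</p>

```python
import io
import zipfile

def zip_bytes(date_time):
    # Archive one identical file, varying only the stored timestamp.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        info = zipfile.ZipInfo("hello.txt", date_time=date_time)
        z.writestr(info, "hello\n")
    return buf.getvalue()

a = zip_bytes((2026, 2, 24, 10, 0, 0))
b = zip_bytes((2026, 2, 25, 10, 0, 0))  # same content, built a day later
print(a == b)  # False: identical contents, different bytes
```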

<p>The project also maintains <a href="https://salsa.debian.org/reproducible-builds/strip-nondeterminism"><code class="language-plaintext highlighter-rouge">strip-nondeterminism</code></a> for removing non-deterministic metadata from archives after the fact, and <a href="https://salsa.debian.org/reproducible-builds/reprotest"><code class="language-plaintext highlighter-rouge">reprotest</code></a>, which builds packages under deliberately varied conditions (different timezones, user IDs, locales, hostnames, file ordering) to flush out hidden assumptions.</p>

<h3 id="what-makes-builds-non-reproducible">What makes builds non-reproducible</h3>

<p>Benedetti et al. tested 4,000 packages from each of six ecosystems using <code class="language-plaintext highlighter-rouge">reprotest</code> for their ICSE 2025 paper <a href="http://www.cs.cmu.edu/~ckaestne/pdf/icse25_rb.pdf">“An Empirical Study on Reproducible Packaging in Open-Source Ecosystems”</a>, varying time, timezone, locale, file ordering, umask, and kernel version between builds. Cargo and npm scored 100% reproducible out of the box because both package managers hard-code fixed values in archive metadata, eliminating nondeterminism at the tooling level. PyPI managed 12.2%, limited to packages using the <code class="language-plaintext highlighter-rouge">flit</code> or <code class="language-plaintext highlighter-rouge">hatch</code> build backends which fix archive metadata the same way. Maven came in at 2.1%, and RubyGems at 0%.</p>

<p>The dominant cause across all three failing ecosystems was timestamps embedded in the package archive, responsible for 97.1% of RubyGems failures, 92.4% of Maven failures, and 87.7% of PyPI failures. The standard fix is <code class="language-plaintext highlighter-rouge">SOURCE_DATE_EPOCH</code>, an environment variable defined by the Reproducible Builds project in 2015, containing a Unix timestamp that build tools should use instead of the current time. GCC, Clang, CMake, Sphinx, man-db, dpkg, and many other tools now honour it, but it’s opt-in, so any build tool that doesn’t check the variable just uses the current time.</p>
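<p>Honouring the variable is a small change in a build tool; the convention, sketched in Python, is just a fallback from the environment to the wall clock:</p>

```python
import os
import time

def build_timestamp() -> int:
    # Reproducible Builds convention: prefer SOURCE_DATE_EPOCH (a Unix
    # timestamp in seconds) over the current time.
    sde = os.environ.get("SOURCE_DATE_EPOCH")
    return int(sde) if sde is not None else int(time.time())

os.environ["SOURCE_DATE_EPOCH"] = "1700000000"
print(build_timestamp())  # 1700000000, no matter when the build runs
```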

<p>Most of this turned out to be fixable with infrastructure changes rather than per-package work. Simply configuring <code class="language-plaintext highlighter-rouge">SOURCE_DATE_EPOCH</code> brought Maven from 2.1% to 92.6% and RubyGems from 0% to 97.1%, and small patches to the package manager tools addressing umask handling, file ordering, and locale issues pushed PyPI to 98% and RubyGems to 99.9%. The packages that remained unreproducible were ones running arbitrary code during the build, like <code class="language-plaintext highlighter-rouge">setup.py</code> scripts calling <code class="language-plaintext highlighter-rouge">os.path.expanduser</code> or gemspecs using <code class="language-plaintext highlighter-rouge">Time.now</code> in version strings, which no amount of tooling can fix because the nondeterminism is in the package author’s code.</p>

<p>File ordering causes similar problems because <code class="language-plaintext highlighter-rouge">readdir()</code> returns entries in filesystem-dependent order (hash-based on ext4, lexicographic on APFS, insertion order on tmpfs) and tar and zip tools faithfully preserve whatever order they’re given. The project built <a href="https://salsa.debian.org/reproducible-builds/disorderfs">disorderfs</a>, a FUSE filesystem overlay that deliberately shuffles directory entries to expose ordering bugs during testing. Absolute paths get embedded in compiler debug info and source location macros, so a binary built in <code class="language-plaintext highlighter-rouge">/home/alice/project</code> differs from one built in <code class="language-plaintext highlighter-rouge">/home/bob/project</code>. Archive metadata carries UIDs, GIDs, and permissions. Locale differences change output encoding. Parallel builds produce output in nondeterministic order, and any single unfixed source is enough to make the whole build non-reproducible.</p>
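<p>Both the ordering and the metadata problems are fixable at the archiver level: sort the inputs and pin the per-entry metadata before they reach the archive. A sketch of that idea with Python’s tarfile (plain tar, since a gzip wrapper would embed its own timestamp):</p>

```python
import tarfile

def deterministic_tar(paths, out_path, mtime=0):
    # Sort inputs so filesystem readdir() order can't leak into the
    # archive, and scrub per-entry metadata (mtime, uid/gid, owner names).
    def scrub(info):
        info.mtime = mtime
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        return info

    with tarfile.open(out_path, "w", format=tarfile.USTAR_FORMAT) as tar:
        for path in sorted(paths):
            tar.add(path, arcname=path, filter=scrub)
```

<p>Building the same file set twice, in any input order, now produces byte-identical archives; file modes still come from the filesystem, so a umask difference between machines would still show up.</p>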

<h3 id="go">Go</h3>

<p>Since Go 1.21 in August 2023, the toolchain produces bit-for-bit identical output regardless of the host OS, architecture, or build time, after Russ Cox’s team <a href="https://go.dev/blog/rebuild">eliminated ten distinct sources of nondeterminism</a> including map iteration order, embedded source paths, file metadata in archives, and ARM floating-point mode defaults.</p>

<p>Go runs nightly verification at <a href="https://go.dev/rebuild">go.dev/rebuild</a> using <a href="https://pkg.go.dev/golang.org/x/build/cmd/gorebuild"><code class="language-plaintext highlighter-rouge">gorebuild</code></a>, and Andrew Ayer has <a href="https://www.agwa.name/blog/post/verifying_go_reproducible_builds">independently verified</a> over 2,672 Go toolchain builds with every one matching. The Go Checksum Database at sum.golang.org adds a transparency log so that even if a module author modifies a published version, the ecosystem detects it. Anything that calls into C via cgo reintroduces the host C toolchain as a build input and all the nondeterminism that comes with it, but pure Go code is genuinely reproducible across platforms and over time.</p>

<h3 id="maven">Maven</h3>

<p>Maven’s <a href="https://maven.apache.org/guides/mini/guide-reproducible-builds.html">official guide</a> documents the steps: set <code class="language-plaintext highlighter-rouge">project.build.outputTimestamp</code> in <code class="language-plaintext highlighter-rouge">pom.xml</code>, upgrade all plugins to versions that respect it, verify with <code class="language-plaintext highlighter-rouge">mvn clean verify artifact:compare</code>. Maven 4.0.0-beta-5 enables reproducible mode by default, and <a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74682318">Reproducible Central</a> maintains a list of independently verified releases.</p>

<p>The timestamp only works if every plugin in the chain respects it, though, and many third-party plugins don’t. Different JDK versions produce different bytecode, ZIP entry ordering varies by implementation, and Maven builds are assembled from dozens of plugins that each introduce their own potential nondeterminism. Researchers built <a href="https://arxiv.org/html/2509.08204v1">Chains-Rebuild</a> to canonicalize six root causes of Java build unreproducibility, which gives a sense of how many separate things can go wrong in a single build system.</p>

<h3 id="cargo">Cargo</h3>

<p>Rust’s <a href="https://rust-lang.github.io/rfcs/3127-trim-paths.html">RFC 3127</a> introduced <code class="language-plaintext highlighter-rouge">trim-paths</code>, which remaps absolute filesystem paths out of compiled binaries and is now the default in release builds, replacing paths like <code class="language-plaintext highlighter-rouge">/home/alice/.cargo/registry/src/crates.io-abc123/serde-1.0.200/src/lib.rs</code> with <code class="language-plaintext highlighter-rouge">serde-1.0.200/src/lib.rs</code>. Embedded paths were the most common source of non-reproducibility in Rust binaries, and the <a href="https://docs.rs/cargo-repro"><code class="language-plaintext highlighter-rouge">cargo-repro</code></a> tool lets you rebuild and compare crates byte-for-byte to check for remaining issues.</p>

<p>Procedural macros and build scripts (<code class="language-plaintext highlighter-rouge">build.rs</code>) remain a gap since they can do anything at build time: read environment variables, call system tools, generate code based on the hostname. The <code class="language-plaintext highlighter-rouge">cc</code> crate, used to compile bundled C code, reintroduces the same C-toolchain nondeterminism that cgo does for Go.</p>

<h3 id="pypi">PyPI</h3>

<p>The Benedetti et al. study found only 12.2% of PyPI packages reproducible out of the box, and the split came down to build backend: packages using <code class="language-plaintext highlighter-rouge">flit</code> or <code class="language-plaintext highlighter-rouge">hatch</code> were reproducible because those backends fix archive metadata the way Cargo and npm do, while packages using <code class="language-plaintext highlighter-rouge">setuptools</code> (still the majority) were not. With patches to address umask handling and archive metadata the number reached 98%, with the remaining 2% coming from packages running arbitrary code in <code class="language-plaintext highlighter-rouge">setup.py</code> or <code class="language-plaintext highlighter-rouge">pyproject.toml</code> build hooks.</p>
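<p>The archive-metadata problem is easy to see with Python’s own <code class="language-plaintext highlighter-rouge">gzip</code> module, which embeds a modification time in its header. A sketch (not any backend’s actual code) of why pinning timestamps matters:</p>

```python
import gzip
import io


def compress(data: bytes, mtime: int) -> bytes:
    """Gzip `data` with an explicit header timestamp."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as f:
        f.write(data)
    return buf.getvalue()


payload = b"print('hello')\n"
# The same input compressed at different moments yields different bytes,
# because the gzip header records when compression happened.
assert compress(payload, mtime=1) != compress(payload, mtime=2)
# Pinning the timestamp (e.g. from SOURCE_DATE_EPOCH) restores determinism.
assert compress(payload, mtime=0) == compress(payload, mtime=0)
```

<p>Reproducible backends apply the same idea across the whole sdist and wheel: fixed timestamps, sorted archive entries, and normalized permissions.</p>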

<p>PyPI has also moved further than most registries on attestations through <a href="https://peps.python.org/pep-0740/">PEP 740</a>, shipped in October 2024, which adds support for Sigstore-signed digital attestations uploaded alongside packages. These link each artifact to the OIDC identity that produced it, so combined with trusted publishing, PyPI can record that a package was built in a specific CI workflow from a specific commit with a cryptographic signature binding artifact to source.</p>

<h3 id="rubygems">RubyGems</h3>

<p>RubyGems 3.6.7 made the gem building process <a href="https://blog.rubygems.org/2025/04/25/march-rubygems-updates.html">more reproducible by default</a>, setting a default <code class="language-plaintext highlighter-rouge">SOURCE_DATE_EPOCH</code> value and sorting metadata in gemspecs so that building the same gem twice produces the same <code class="language-plaintext highlighter-rouge">.gem</code> file without special configuration. Individual gems can still have their own nondeterminism: native extensions like nokogiri compile against host system libraries with all the usual C-toolchain variation, and there’s no independent rebuild verification infrastructure for RubyGems.</p>

<h3 id="npm">npm</h3>

<p>The npm registry accepts arbitrary tarballs with no connection to source, no build provenance, and no way to independently rebuild a package and compare it against what’s published. <code class="language-plaintext highlighter-rouge">package-lock.json</code> and <code class="language-plaintext highlighter-rouge">npm ci</code> give you dependency pinning and integrity hashes that confirm the tarball hasn’t changed since publication, but that says nothing about whether it matches any particular source commit.</p>
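<p>The lockfile’s integrity field is a Subresource Integrity string over the tarball bytes, so recomputing it is a few lines; a sketch, not npm’s actual implementation:</p>

```python
import base64
import hashlib


def sri_sha512(tarball: bytes) -> str:
    """Recompute a package-lock.json style "integrity" value:
    the algorithm name, a dash, then the base64-encoded raw digest.
    """
    digest = hashlib.sha512(tarball).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")
```

<p>Matching this against the lockfile proves the tarball is byte-for-byte the one that was published, and nothing more: it carries no information about what source tree, if any, produced those bytes.</p>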

<h3 id="homebrew">Homebrew</h3>

<p>Homebrew distributes prebuilt binaries called bottles, built on GitHub Actions and hosted as GitHub release artifacts. The project has a <a href="https://docs.brew.sh/Reproducible-Builds">reproducible builds page</a> documenting the mechanisms available to formula authors: <code class="language-plaintext highlighter-rouge">SOURCE_DATE_EPOCH</code> is set automatically during builds, build paths are replaced with placeholders like <code class="language-plaintext highlighter-rouge">@@HOMEBREW_PREFIX@@</code> during bottle creation, and helpers like <code class="language-plaintext highlighter-rouge">Utils::Gzip.compress</code> produce deterministic gzip output. There’s no systematic testing of what percentage of bottles actually rebuild identically, though.</p>

<p>Since Homebrew 4.3.0 in May 2024, every bottle comes with a Sigstore-backed attestation linking it to the specific GitHub Actions workflow that built it, meeting SLSA Build Level 2 requirements. Users can verify attestations by setting <code class="language-plaintext highlighter-rouge">HOMEBREW_VERIFY_ATTESTATIONS=1</code>, though verification isn’t yet the default because it currently depends on the <code class="language-plaintext highlighter-rouge">gh</code> CLI and GitHub authentication while the project waits on <a href="https://github.com/sigstore/sigstore-ruby">sigstore-ruby</a> to mature.</p>

<h3 id="trusted-publishing">Trusted publishing</h3>

<p>Traditionally a maintainer authenticates with an API token, builds on their laptop, and uploads. Trusted publishing replaces that with OIDC tokens from CI so that the registry knows the package was built by a specific GitHub Actions workflow in a specific repository, not just uploaded by someone who had the right credentials.</p>
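<p>On the publishing side this is mostly a permissions change. A sketch of a PyPI workflow using the official publish action (project details illustrative; the registry must also be configured to trust this repository and workflow):</p>

```yaml
name: release
on:
  release:
    types: [published]

jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      id-token: write  # lets the job mint an OIDC token identifying this workflow
    steps:
      - uses: actions/checkout@v4
      - run: pipx run build
      # Exchanges the OIDC token for a short-lived upload token;
      # no long-lived API secret is stored in the repository.
      - uses: pypa/gh-action-pypi-publish@release/v1
```
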

<p>PyPI <a href="https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/">launched trusted publishing</a> in April 2023, built by Trail of Bits and funded by Google’s Open Source Security Team. RubyGems.org <a href="https://blog.rubygems.org/2023/12/14/trusted-publishing.html">followed in December 2023</a>, npm shipped provenance attestations via Sigstore in 2023 and <a href="https://github.blog/changelog/2025-07-31-npm-trusted-publishing-with-oidc-is-generally-available/">full trusted publishing in July 2025</a>, crates.io launched in July 2025, and NuGet followed in September 2025. Over 25% of PyPI uploads now use it.</p>

<p>Once provenance tells you that a package was built from commit <code class="language-plaintext highlighter-rouge">abc123</code> of <code class="language-plaintext highlighter-rouge">github.com/foo/bar</code> in a specific workflow, anyone can check out that commit and attempt to rebuild, and if the build is reproducible the rebuilt artifact should match the published one. Most of these trusted publishing flows run on GitHub Actions, though, which itself has <a href="/2025/12/06/github-actions-package-manager/">serious problems as a dependency system</a>: no lockfile, no integrity verification, and mutable tags that can change between runs, meaning the build infrastructure that’s supposed to provide provenance guarantees doesn’t have great provenance properties of its own.</p>

<h3 id="googles-oss-rebuild">Google’s OSS Rebuild</h3>

<p><a href="https://github.com/google/oss-rebuild">OSS Rebuild</a>, announced by Google’s Open Source Security Team in July 2025, takes a pragmatic approach to the fact that most builds aren’t bit-for-bit reproducible yet: it rebuilds packages from source and performs semantic comparison, normalizing known instabilities like timestamps and file ordering before checking whether the meaningful content matches.</p>

<p>At launch it covered thousands of packages across PyPI, npm, and crates.io, using automation and heuristics to infer build definitions from published metadata, rebuilding in containers, and publishing <a href="https://slsa.dev/">SLSA</a> Level 3 provenance attestations signed via Sigstore. The <code class="language-plaintext highlighter-rouge">stabilize</code> CLI tool handles the normalization by stripping timestamps, reordering archive entries, and removing owner metadata from ZIPs, tars, and wheels. Maven Central, Go modules, and container base images are on the roadmap.</p>
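<p>The idea behind that normalization can be sketched in a few lines. A toy version for plain tar archives (the real <code class="language-plaintext highlighter-rouge">stabilize</code> tool handles ZIPs, wheels, and many more instabilities):</p>

```python
import hashlib
import io
import tarfile


def stabilized_digest(tar_bytes: bytes) -> str:
    """Digest a tar archive, ignoring metadata known to vary
    between otherwise-equivalent builds."""
    h = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tf:
        # Entry order varies by tool, so sort by name before hashing.
        for member in sorted(tf.getmembers(), key=lambda m: m.name):
            # Hash only the name and contents; skip mtime, uid/gid,
            # and owner names, which differ across build environments.
            h.update(member.name.encode())
            if member.isfile():
                h.update(tf.extractfile(member).read())
    return h.hexdigest()
```

<p>Two archives built with different timestamps or entry ordering but identical file contents get the same digest, while a real content change still shows up as a mismatch.</p>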

<p>Matthew Suozzo’s <a href="https://fosdem.org/2026/schedule/event/EP8AMW-oss-rebuild-observability/">FOSDEM 2026 talk</a> pushed beyond pure reproducibility into build observability, adding a network proxy for detecting hidden remote dependencies and eBPF-based build tracing to answer not just whether a build can be reproduced but what the build is actually doing at runtime, which is useful independently of whether the output happens to be deterministic.</p>

<h3 id="where-things-stand">Where things stand</h3>

<p>Language package managers are years behind Linux distributions on reproducible builds because Debian controls its build infrastructure and can mandate changes to that environment, while language registries accept uploads from anywhere and historically had no way to know how an artifact was produced. Trusted publishing is shifting that by moving builds from laptops into CI where the registry has visibility into the process, and combined with build provenance and SLSA attestations, this creates conditions where independent verification becomes possible even when the build tooling itself hasn’t caught up. Go got there by making the compiler deterministic, which is the cleanest solution but requires controlling the entire toolchain from the start.</p>]]></content><author><name>Andrew Nesbitt</name><email>andrew@ecosyste.ms</email></author><category term="package-managers" /><category term="security" /><summary type="html"><![CDATA[Verifying that a published package was actually built from the source it claims.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://nesbitt.io/images/boxes.png" /><media:content medium="image" url="https://nesbitt.io/images/boxes.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>