Package Versions in Go
Go packages don’t currently have any standard notion of a “release” or a “version”. As Dave Cheney and others point out, this is a problem. Some folks just had a quick chat about that and I promised a longer follow-up. So here it is.
First, let me qualify this a bit. I’m not claiming this is a fully-realized systemic solution to Go’s package-versioning woes. This is basically my hunch for what a good solution might look like. I am claiming this is a design space worth exploring, and that, supposing this really did lead to something significantly better for Go than the status quo in other modern languages, it’s not too late.
So let’s talk about the Amiga.
I’ve never done systems programming on an Amiga, but I’ve heard descriptions of how its linker/loader handles (or used to handle) library versions. It is a bit simpler than the typical “modern” Unix style you see in GNU ld, but seems very practical to me. (I haven’t been able to find a good online reference for this, so perhaps this design is apocryphal. In that case, I still like it, just forget the part where it came from the Amiga.)
In this system, every library has a name, which is a string like “foo” or “bar”, and a version, which is an integer. This is a marked contrast with most other systems, where the version is a string or some more complicated data structure. The Amiga’s rough equivalent of dlopen has a signature like this:
loadLibrary(name string, minVersion int)
The idea there is to be as simple as possible but no simpler. When the process calls this function, the following happens:
- If no library by that name is found, the call fails.
- If a library by that name is found, but has a version less than minVersion, the call fails.
- Otherwise, it succeeds, and it loads whatever version is installed. If you asked for minVersion=3, it could give you version=3 or version=8, or version=112.
This lets library authors evolve their libraries over time, within constraints, and it lets applications adopt new features by specifying whatever minimum version they need.
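As a rough sketch of those lookup rules in Go (this is not the Amiga's actual API; the `installed` table, the library names, and the error values are all made up for illustration):

```go
package main

import (
	"errors"
	"fmt"
)

// installed is a hypothetical table of the one version of each library
// present on the system. Names and numbers are illustrative only.
var installed = map[string]int{"foo": 8, "bar": 2}

var (
	errNotFound = errors.New("library not found")
	errTooOld   = errors.New("installed version is older than minVersion")
)

// loadLibrary returns the installed version of name, as long as it is at
// least minVersion. It never picks among versions; it only accepts or
// rejects whatever is installed.
func loadLibrary(name string, minVersion int) (int, error) {
	v, ok := installed[name]
	if !ok {
		return 0, errNotFound
	}
	if v < minVersion {
		return 0, errTooOld
	}
	return v, nil
}

func main() {
	for _, req := range []struct {
		name string
		min  int
	}{{"foo", 3}, {"bar", 3}, {"baz", 1}} {
		v, err := loadLibrary(req.name, req.min)
		fmt.Printf("loadLibrary(%q, %d) = %d, %v\n", req.name, req.min, v, err)
	}
}
```

Note that the caller asking for `foo` at version 3 gets version 8 without having to know or care that 8 exists.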
For this to work, there is a strict social contract that library authors have to abide by: don’t break existing users. The rules are: 1) all changes must be compatible, and 2) the version number must be incremented in every release. You can add new entry points, make performance improvements, fix bugs, and even add new features to existing functions as long as existing calls still work (for example, by adding new valid flag values to a function), but you can’t remove anything or change the ABI or the semantics of existing symbols incompatibly. This is essentially the same as the Go 1 compatibility promise.
So what if your Amiga library does need to make a backwards-incompatible change? After all, it’s not always possible to predict how a library will be used in the future. What once was a good design might prove problematic some day. In that case, you pick a new name. Period. If your library is “foo” you can call it “foo2” or “bar” or whatever else you can think of. It is, mechanically, a whole new library now. Clients then must opt in to your new library by changing their loadLibrary call to specify the new name. They will pick up new versions implicitly and automatically, but they’ll only pick up a new library explicitly and consciously.
There’s a straightforward way to carry this over to Go by analogy. We replace a “library name” with an import path (which corresponds to a particular VCS repo), ABI compatibility with source compatibility, and the version integer sequence with the commit sequence on the repo’s default branch. The only thing not already present today is a way to specify a “minimum” version, which would correspond to a particular commit in the repo’s history. This would have to be put into the tooling.
So in Go, trying to “load a library” would equate to trying to import a package, or perhaps to vendor a package. The client would specify the import path and a minimum commit, and the tool would go find whatever commit it can get its hands on that satisfies those constraints. The social contract (“no incompatible changes”) means the tool is allowed to automate upgrades.
In other words, this lets us answer two very useful questions:
- Given two versions a and b of a package, which one is newer?
- Is it safe to upgrade from a to b automatically?
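Here is a minimal sketch of how a tool might answer both questions, assuming the default branch is a purely linear sequence of commits. The `history` type and the commit IDs are hypothetical; no existing Go tool works this way.

```go
package main

import "fmt"

// history is a hypothetical linear commit sequence on a repo's default
// branch, oldest first.
type history []string

// index returns the position of commit in h, or -1 if it is not present.
func (h history) index(commit string) int {
	for i, c := range h {
		if c == commit {
			return i
		}
	}
	return -1
}

// newer reports whether b comes after a in the history. Under the
// "no incompatible changes" contract, this is also the answer to
// "is it safe to upgrade from a to b automatically?".
func (h history) newer(a, b string) (bool, error) {
	ia, ib := h.index(a), h.index(b)
	if ia < 0 || ib < 0 {
		return false, fmt.Errorf("unknown commit in (%q, %q)", a, b)
	}
	return ib > ia, nil
}

// resolve returns the newest available commit, provided the requested
// minimum commit is part of the history.
func (h history) resolve(min string) (string, error) {
	if h.index(min) < 0 {
		return "", fmt.Errorf("minimum commit %q not found", min)
	}
	return h[len(h)-1], nil
}

func main() {
	h := history{"c1", "c2", "c3"}
	fmt.Println(h.newer("c1", "c3"))
	fmt.Println(h.resolve("c2"))
}
```

Because the history already contains the minimum commit, any later commit satisfies the request, so `resolve` simply hands back the tip.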
If it’s important to have an explicit version number, we can take the depth of the commit graph and write it down somewhere. (This doesn’t quite work in the presence of merge commits, but let’s set that aside for the moment.) We can then use this as an identifier to do all sorts of useful things like showing documentation for multiple versions, offering alternative ways to fetch the source (HTTPS tarball rather than VCS), whatever.
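To make the merge-commit wrinkle concrete, here is a small sketch using a made-up in-memory `commit` type. It compares two possible definitions of "depth": following only the first parent, and taking the longest path to the root. Neither is proposed as the answer; the point is only that they disagree once a merge is involved.

```go
package main

import "fmt"

// commit is a hypothetical, minimal model of a VCS commit. For a merge
// commit, parents[0] is the branch that was merged into.
type commit struct {
	id      string
	parents []*commit
}

// firstParentDepth counts commits reachable by following only the first
// parent of each commit.
func firstParentDepth(c *commit) int {
	depth := 0
	for c != nil {
		depth++
		if len(c.parents) == 0 {
			break
		}
		c = c.parents[0]
	}
	return depth
}

// longestDepth counts commits on the longest path from c back to a root.
func longestDepth(c *commit) int {
	best := 0
	for _, p := range c.parents {
		if d := longestDepth(p); d > best {
			best = d
		}
	}
	return best + 1
}

func main() {
	a := &commit{id: "a"}
	b := &commit{id: "b", parents: []*commit{a}}
	c1 := &commit{id: "c1", parents: []*commit{a}}
	c2 := &commit{id: "c2", parents: []*commit{c1}}
	m := &commit{id: "m", parents: []*commit{b, c2}} // merge of b and c2

	fmt.Println(firstParentDepth(m), longestDepth(m))
}
```

For the merge commit `m`, the first-parent path is m, b, a, while the longest path is m, c2, c1, a, so the two definitions give different integers for the same commit.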
Now this might feel unfamiliar and uncomfortable, so I’m sure all sorts of problems are coming to mind. Since you don’t need my help coming up with objections, I’ll just point out a couple of nice properties of this scheme:
It is simple. Sure, we could have an explicit “major version” to document when the interface breaks, but that would be superfluous. The “semver” scheme answers those two questions listed above, but also encodes some more information. As far as I can tell, this extra information serves purely as documentation. It’s possible to express all the same information as in semver using a combination of import path, version integer, and documentation. Those three items are arguably better suited to their tasks than a version string would be, and every repo already has them. Adding more moving parts would be unnecessary complexity.
It aligns incentives better. Changing a package’s name (even a relatively small name change like “foo” to “foo2”) is more costly for the package author than bumping a “major version” number in a version string. This cost is a good thing. Incompatible changes should not be undertaken lightly; they are painful for users. As a package client, I will shed no tears over the increased burden on package maintainers, because this makes it more likely they’ll stick to a compatible interface and semantics. As a package maintainer myself, I am happy to take on this duty, even though it means more work for me to make a breaking change. As long as I stick to being compatible, it means less work for me, because packaging (which involves assigning version numbers or version strings) is both simpler and easier.
Now, there are lots of unanswered questions here. Have we covered all the essential properties of version identifiers, or are there others? Are there nonessential properties that are nevertheless worth having (and missing here)? What are the downsides of a minimal approach?
In case we do want to reify the commit depth as an explicit version, how do we handle merge commits?
Not every package maintainer wants their clients to see versions at the granularity of individual commits; it’s entirely reasonable to want a QA process before making an explicit release. Does that mean we need a separate mechanism? Is it reasonable for the Go tooling to use the same mechanism (VCS commits) and ask package maintainers to adapt their workflow (for example, by doing development on some other branch and making releases by merging or fast-forwarding onto the default branch used by the Go tools)? Or should the tooling look for a release branch and fall back to the default branch if necessary? Or something else entirely?
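The "release branch with fallback" option could look something like the sketch below. The branch name `release` and the `pickBranch` helper are assumptions for illustration; nothing in today's Go tooling defines such a convention.

```go
package main

import "fmt"

// releaseBranch is a hypothetical, conventional name the tooling would
// look for first.
const releaseBranch = "release"

// pickBranch returns the branch whose commits the tool should treat as
// the version sequence: the release branch if the repo has one, and the
// default branch otherwise.
func pickBranch(branches []string, defaultBranch string) string {
	for _, b := range branches {
		if b == releaseBranch {
			return releaseBranch
		}
	}
	return defaultBranch
}

func main() {
	fmt.Println(pickBranch([]string{"main", "release"}, "main"))
	fmt.Println(pickBranch([]string{"main", "dev"}, "main"))
}
```

Maintainers who want a QA step would develop on the default branch and advance `release` only when ready; everyone else would not have to change anything.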
Finally, there are endless variations on the details. Details matter. They can make the difference between love and disuse. If something I’ve written above sounds bad, is it because of a fundamental flaw, or because it would work better after a slight change?
There’s always the option just to fall back to what other languages do. Even if this scheme did work, and was better, maybe it’s not better enough to justify the surprise it would entail. Still, I’d like us to spend some time thinking about this and perhaps experimenting, rather than dismissing it out of hand.
Ok, that’s all I have for now. Go ahead and tear it apart. :)