The Bus Factor: Life for Open-Source Projects After a Developer’s Death
You’ve probably never heard of the late Jim Weirich or his software. But you’ve almost certainly used apps built on his work.
Weirich helped create several key tools for Ruby, the popular programming language used to write the code for sites like Hulu, Kickstarter, Twitter, and countless others. His code was open source, meaning that anyone could use it and modify it. “He was a seminal member of the western world’s Ruby community,” says Justin Searls, a Ruby developer and co-founder of the software company Test Double.
When Weirich died in 2014, Searls noticed that no one was maintaining one of Weirich’s software-testing tools. That meant there would be no one to approve changes if other developers submitted bug fixes, security patches, or other improvements. Any tests that relied on the tool would eventually fail, as the code became outdated and incompatible with newer tech.
The incident highlights a growing concern in the open-source software community. What happens to code after programmers pass away? Much has been written about what happens to social-media accounts after users die. But the fate of code has drawn less attention among programmers. In part, that’s because most companies and governments once relied on commercial software maintained by teams of people. But today, more programs rely on obscure but crucial software like Weirich’s.
Some open-source projects are well known, such as the Linux operating system or Google’s artificial-intelligence framework TensorFlow. But each of these projects depends on smaller libraries of open-source code. And those libraries depend on other libraries. The result is a complex, but largely hidden, web of software dependencies.
That can create big problems, as in 2014 when a security vulnerability known as “Heartbleed” was found in OpenSSL, an open-source program used by nearly every website that processes credit- or debit-card payments. The software comes bundled with most versions of Linux, but was maintained by a small team of volunteers who didn’t have the time or resources to do extensive security audits. Shortly after the Heartbleed fiasco, a security issue was discovered in another common open-source application called Bash that left countless web servers and other devices vulnerable to attack.
There are surely more undiscovered vulnerabilities. Libraries.io, a group that analyzes connections between software projects, has identified more than 2,400 open-source libraries that are used in at least 1,000 other programs but have received little attention from the open-source community.
Security problems are only one part of the issue. If software libraries aren’t kept up to date, they may stop working with newer software. That means an application that depends on an outdated library may not work after a user updates other software. When a developer dies or abandons a project, everyone who depends on that software can be affected. Last year, when programmer Azer Koçulu removed a tiny library called Leftpad from the package registry that distributed it, the deletion created ripple effects that reportedly caused headaches at Facebook, Netflix, and elsewhere.
The Bus Factor
The fewer people with ownership of a piece of software, the greater the risk that it could be orphaned. Developers even have a morbid name for this: the bus factor, meaning the number of people who would have to be hit by a bus before there’s no one left to maintain the project. Libraries.io has identified about 3,000 open-source libraries that are used in many other programs but have only a handful of contributors.
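One rough way to put a number on the bus factor is to count how few contributors account for most of a project's commits. The sketch below is only an illustration under that assumption: the function name `bus_factor`, the 50 percent threshold, and the use of commit counts as a stand-in for knowledge of the code are hypothetical choices, not a method described by Libraries.io or anyone quoted here.

```python
from collections import Counter


def bus_factor(commit_authors, threshold=0.5):
    """Return the smallest number of top contributors whose commits
    together make up more than `threshold` of all commits.

    `commit_authors` is a list with one author name per commit.
    Commit share is only a proxy (an assumption) for who could
    keep maintaining the project.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0  # no commits, so no one to lose

    covered = 0
    # Walk contributors from most to least active.
    for rank, (_, commits) in enumerate(counts.most_common(), start=1):
        covered += commits
        if covered / total > threshold:
            return rank
    return len(counts)


# One person wrote 8 of 10 commits: losing that person is enough.
solo_project = ["alice"] * 8 + ["bob"] * 2
# Four people contributed equally: three must leave before most work is lost.
shared_project = ["alice", "bob", "carol", "dave"]

print(bus_factor(solo_project))
print(bus_factor(shared_project))
```

A low result from a heuristic like this only suggests that a project may be at risk; it says nothing about whether the remaining contributors have the access they would need to publish new releases.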
Orphaned projects are a risk of using open-source software, though commercial software makers can leave users in a similar bind when they stop supporting or updating older programs. In some cases, motivated programmers adopt orphaned open-source code.
That’s what Searls did with one of Weirich’s projects. Weirich’s most popular projects had co-managers by the time of his death. But Searls noticed that one, the testing tool Rspec-Given, hadn’t been handed off, and he wanted to take responsibility for updating it. He ran into a few snags along the way.
Rspec-Given’s code was hosted on the popular code-hosting and collaboration site GitHub, home to 67 million codebases. Weirich’s Rspec-Given page on GitHub was the main place for people to report bugs or to volunteer to help improve the code. But GitHub wouldn’t give Searls control of the page, because Weirich had not named him before he died. So Searls had to create a new copy of the code, and host it elsewhere. He also had to convince the operators of Ruby Gems, a “package-management system” for distributing code, to use his version of Rspec-Given, instead of Weirich’s, so that all users would have access to Searls’ changes. GitHub declined to discuss its policies around transferring control of projects.
That solved potential problems related to Rspec-Given, but it opened Searls’ eyes to the many things that could go wrong. “It’s easy to see open source as a purely technical phenomenon,” Searls says. “But once something takes off and is depended on by hundreds of other people, it becomes a social phenomenon as well.”
The maintainers of most package-management systems have at least an ad-hoc process for transferring control over a library, but that process usually depends on someone noticing that a project has been orphaned and then volunteering to adopt it. “We don’t have an official policy mostly because it hasn’t come up all that often,” says Evan Phoenix of the Ruby Gems project. “We do have an adviser council that is used to decide these types of things case by case.”
Some package managers now monitor their libraries and flag widely used projects that haven’t been updated in a long time. Neil Bowers, who helps maintain a package manager for the programming language Perl, says he sometimes seeks out volunteers to take over orphan projects. Bowers says his group vets both the claim that a project has been abandoned and the people proposing to take it over.
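The kind of monitoring described above could, in principle, be as simple as a filter over a package index. The following is a minimal sketch, not how any real package manager does it: the function `flag_stale`, the record layout, the cutoff of 1,000 dependents, and the two-year age limit are all assumptions made for illustration.

```python
from datetime import date, timedelta


def flag_stale(libraries, today, min_dependents=1000,
               max_age=timedelta(days=730)):
    """Return (name, dependents) pairs for widely used libraries
    with no release in more than `max_age`.

    `libraries` is an iterable of (name, dependents, last_release)
    tuples, where `last_release` is a datetime.date. The thresholds
    are hypothetical defaults.
    """
    flagged = [
        (name, dependents)
        for name, dependents, last_release in libraries
        if dependents >= min_dependents and today - last_release > max_age
    ]
    # Most depended-upon projects first, since they put the most users at risk.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up index entries, for illustration only.
index = [
    ("widely-used-old", 5000, date(2012, 3, 1)),
    ("widely-used-fresh", 8000, date(2016, 11, 15)),
    ("obscure-old", 12, date(2010, 6, 30)),
]

for name, dependents in flag_stale(index, today=date(2017, 1, 1)):
    print(name, dependents)
```

A flag from a check like this would only be a starting point. As Bowers describes, someone still has to confirm that the project is really abandoned and vet whoever offers to take it over.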
A ‘Dead-Man’s Switch’
Taking over Rspec-Given inspired Searls, who was only 30 at the time, to make a will and a succession plan for his own open-source projects. There are other things developers can do to help future-proof their work. They can, for example, transfer the copyrights to a foundation, such as the Apache Software Foundation. But many open-source projects essentially start as hobbies, so programmers may not think to transfer ownership until it is too late.
Searls suggests that GitHub and package managers such as Gems could add something like a “dead man’s switch” to their platforms, which would allow programmers to automatically transfer ownership of a project or an account to someone else if they don’t log in or make changes after a set period of time.
But a transition plan means more than just giving people access to the code. Michael Droettboom, who took over a popular plotting library called Matplotlib after its creator John Hunter died in 2012, points out that successors also need to understand the code. “Sometimes there are parts of the code that only one person understands,” he says. “The knowledge exists only in one person’s head.”
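The logic of such a switch is simple enough to sketch. The example below is purely hypothetical: neither GitHub nor any package manager is known to offer this feature, and the `Project` record, the `apply_dead_mans_switch` function, and the one-year inactivity limit are assumptions invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Project:
    name: str
    owner: str
    successor: Optional[str]        # named in advance by the owner
    last_owner_activity: datetime   # last login or change by the owner


def apply_dead_mans_switch(project, now,
                           inactivity_limit=timedelta(days=365)):
    """Hand the project to its named successor if the owner has been
    inactive for at least `inactivity_limit`. Returns True if
    ownership changed.
    """
    if project.successor is None:
        return False  # nobody was named, so nothing can be transferred
    if now - project.last_owner_activity < inactivity_limit:
        return False  # the owner is still active
    project.owner = project.successor
    project.successor = None
    project.last_owner_activity = now  # restart the clock for the new owner
    return True


# Hypothetical project whose owner was last active two years earlier.
project = Project(
    name="example-library",
    owner="original-author",
    successor="trusted-colleague",
    last_owner_activity=datetime(2015, 1, 1),
)
transferred = apply_dead_mans_switch(project, now=datetime(2017, 1, 1))
print(transferred, project.owner)
```

A real version would presumably need safeguards this sketch leaves out, such as warning the owner before the deadline and confirming that the successor still wants the job.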
That means getting people involved in a project earlier, ideally as soon as it is used by people other than the original developer. That has another advantage, Searls points out, in distributing the work of maintaining a project to help prevent developer burnout.