The multiple repository conundrum in Linux packaging


I'm involved with a number of free software projects, both as a developer and as a maintainer of packages for various distributions such as Debian (which also feeds packages to Ubuntu) and OpenCSW.

I regularly come across the following situations:

Sadly, despite everybody having the best intentions, there is sometimes a chasm separating these two groups of people.

Upstream developers are often busy developing new features and don't have time to work on the intricacies of packaging. I hope that by sharing a few of my own experiences I can help more developers get their software packaged more easily.

Fortunately, a number of great tools like git-buildpackage have emerged to streamline the packaging process. They have also created more confusion, though, for developers who have their own git repositories and don't quite understand how the Debian git repository and patching process relate to their own repository.

The autotools world

Here, I focus on autotools-based software, because this type of software has its own peculiar issues when packaging. In particular, these issues appear when using a version control system to track upstream releases. Some of the concepts also apply to regular Makefile or cmake projects.

Here is a diagram giving an overview:

Let's work through each of the steps in the diagram:

The upstream release

  1. The developer/release manager updates the version number in configure.ac (sometimes called configure.in) and tags the code. (Usually this tag is on a dedicated release branch.)
  2. The developer checks out a copy of the code from the tag into a fresh working directory.
  3. The developer runs the autoreconf/automake tools, usually from a bootstrap script. These tools create a number of new files that don't exist in the project repository. Finally, the developer runs make dist, which puts all the files, including the generated files, into a distribution tarball. It is worth emphasizing this point: the tarball is not just an archive of the files from the repository/tag. It also contains a number of files generated by autotools.
  4. The developer uploads the tarball to a web site such as the SourceForge download page. Usually, a release announcement containing checksums for the tarball is made at this point.
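
To make the point from step 3 concrete, here is a minimal Python sketch that lists the files a release tarball ships but the repository does not track, and computes the kind of checksum that would go into a release announcement. The function names are my own, the choice of SHA-256 is an assumption (the announcement could use any digest), and the sketch assumes the tarball keeps everything under a single top-level directory, as make dist normally does.

```python
import hashlib
import tarfile
from pathlib import PurePosixPath


def tarball_members(tarball_path):
    """Return the regular files in a tarball, without the top-level directory."""
    files = set()
    with tarfile.open(tarball_path) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            parts = PurePosixPath(member.name).parts
            # Assumption: everything lives under one <package>-<version>/ directory.
            if len(parts) > 1:
                files.add("/".join(parts[1:]))
    return files


def generated_files(tarball_path, tracked_files):
    """Files shipped in the tarball that are not tracked in the upstream repository."""
    return sorted(tarball_members(tarball_path) - set(tracked_files))


def sha256_of(path):
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The list of tracked files could come from something like git ls-tree -r --name-only run against the release tag. For a typical autotools project, I would expect generated_files to report things like configure and the Makefile.in files, which is exactly the difference the packaging repository ends up tracking.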

At this point, the upstream developer's work is done and packaging teams from various projects such as Debian take over. Sometimes, the upstream developer also builds the packages and continues on to the next steps himself:

Packaging

  1. The package maintainer downloads the upstream release tarball.
  2. If it is the first download, the maintainer creates a new git repository. If the software has been packaged before, he clones the existing packaging repository. The important point here is that this is not the upstream repository: it is an independent repository for Debian packaging. The maintainer uses the git-import-orig tool to import the upstream tarball into the packaging repository. The tool captures an exact snapshot of the upstream release tarball contents in a branch called upstream. This is where the Debian repository differs fundamentally from the upstream repository: all files from the tarball are tracked in the Debian git repository, even the automatically generated files that were created by autotools and don't exist in the upstream repository.
  3. The maintainer creates or updates the various artifacts for packaging. These files are kept on the master branch, and the tarball contents from the upstream branch are merged into master to create packages.
  4. When the maintainer feels the code is ready, he will check out a clean copy of the repository to build the package from.
  5. The maintainer executes a tool such as git-buildpackage or regular dpkg-buildpackage, which creates the *.deb files.
  6. The files are checked with a tool like lintian and some manual testing/installation. If all is OK, a tag is made in the packaging repository, with a suffix appended to the upstream version number to indicate which iteration of the package it applies to.
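
As an illustration of the tagging in step 6, here is a small Python sketch of how the package version and the tag name can be derived from the upstream version. The helper names are hypothetical, the debian/ tag prefix is an assumption about the repository's tagging convention, and the check on the upstream version is a deliberate simplification of the full Debian version rules.

```python
import re


def debian_version(upstream_version, revision=1):
    """Append the packaging iteration to the upstream version, e.g. 1.2.3 -> 1.2.3-1."""
    # Simplified check: start with a digit, then alphanumerics and . + ~ only.
    if not re.fullmatch(r"[0-9][A-Za-z0-9.+~]*", upstream_version):
        raise ValueError(f"unexpected upstream version: {upstream_version!r}")
    if revision < 1:
        raise ValueError("revision must be 1 or greater")
    return f"{upstream_version}-{revision}"


def packaging_tag(upstream_version, revision=1, prefix="debian/"):
    """Tag name for the packaging repository (the prefix is an assumed convention)."""
    return prefix + debian_version(upstream_version, revision)
```

With these helpers, the second packaging iteration of upstream release 1.2.3 gets the version 1.2.3-2, and the tag would be debian/1.2.3-2 under the assumed prefix. The upstream version itself stays untouched, so it is always clear which tarball a package was built from.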