Linux init-systems

Several value-added init systems have emerged in recent times. Most notably, Upstart has become the default on Ubuntu while systemd is now the default in Fedora.

Let's not forget the crusade to move beyond traditional init was given a big boost back in 2005 by Sun's introduction of SMF on Solaris 10 some years before the main Linux solutions gained prominence.

Debian is currently debating whether to use one of these alternatives as a default. My own position on the matter is not decided, but I do feel that some of the debates are asking the wrong questions. Some of the most interesting issues here are likely to have ramifications for other distributions.

My experience with systemd

I've come across systemd in my efforts to package reSIProcate on Fedora. One of the challenges I came across was trying to both create a /var/run/repro directory and telling systemd to launch the process as a non-root user. I tried using the ExecStartPre directive in the systemd manifest to invoke mkdir. Unfortunately, it runs as the non-root user too and could not create the PID directory. The conventional init script doesn't suffer this problem.

In the end, I have patched the repro binary to start as root and drop privileges itself, but that was only when it became apparent that we need to do that for WebRTC support (listening on port 443). While most daemons have the ability to drop privileges, there will be edge cases like this that need more than just a systemd manifest.

Maintainers who encounter such issues may have to give feedback to the systemd and Upstart upstreams and give them the opportunity to provide generalized solutions. Then again, the most generalized solution would be another scripting language and then we'd be back where we started.

The reason to change and the reason not to

Some people have argued that Debian should change simply because other distributions did so. On its own, this is not a compelling technical argument. The fragmentation of init alternatives is not something that makes upstreams excited either.

It would be interesting to see if Upstart and systemd could agree on a common manifest format - maybe even a format that could be parsed by a wrapper from SysV init. Then Debian wouldn't really have to give up support for kFreeBSD and other non-Linux kernel efforts. A slight alternative to this approach would be using a system like cmake to read a single service XML manifest and spit out permutations readable by each of the different init schemes.
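
As a rough illustration of the "single manifest, several outputs" idea, here is a minimal Python sketch. It assumes a made-up, init-neutral service description (the Service class and its field names are hypothetical, as is the /usr/sbin/repro path) and uses a plain data class rather than XML to keep things short. It renders the same data as a basic systemd unit and as a basic Upstart job. Real services would need far more than these few directives, so treat it as a sketch of the approach, not a proposed format.

```python
from dataclasses import dataclass


@dataclass
class Service:
    # Hypothetical init-neutral description; field names are illustrative only.
    name: str
    description: str
    command: str
    user: str


def to_systemd(svc: Service) -> str:
    """Render the description as a minimal systemd unit file."""
    return "\n".join([
        "[Unit]",
        f"Description={svc.description}",
        "",
        "[Service]",
        f"ExecStart={svc.command}",
        f"User={svc.user}",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])


def to_upstart(svc: Service) -> str:
    """Render the same description as a minimal Upstart job."""
    return "\n".join([
        f'description "{svc.description}"',
        "start on runlevel [2345]",
        "stop on runlevel [!2345]",
        f"setuid {svc.user}",
        f"exec {svc.command}",
        "",
    ])


# Example values are assumptions, not the real reSIProcate packaging.
repro = Service("repro", "repro SIP proxy", "/usr/sbin/repro", "repro")

if __name__ == "__main__":
    print(to_systemd(repro))
    print(to_upstart(repro))
```

A third renderer emitting a SysV-style script could hang off the same Service object, which is where non-Linux kernels would come in.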

It has also been well argued that the simple and well known solution of scripts is much less complicated to understand and maintain and is extremely stable.

Wishlist items for all distributions

While I don't know which solution is the best, one thing I do believe is that if there is a change and if it is going to be disruptive to maintainers and users then it should aim to bring in as many benefits as possible to offset the pain and encourage people to support the migration effort.

Some of the potential road-map items are not only possible for Debian but could also become part of other platforms. Here are some of the issues that came to my mind:

  • Better support for starting multiple instances of a single process
  • Support for distributed solutions, for example, making sure that process A only starts on host 1 after process B on host 2
  • Support for clustering, for example, making sure that at least 2 instances of process A are running across 3 hosts.
  • Automated integration with monitoring and fault recovery systems: for example, can Nagios (or another tool) automatically detect whether a particular process should be running and whether it is running? Given that init-replacements typically need to start processes in the correct order, can they share that dependency information with Nagios to avoid manual/duplicated dependency mappings? Can Nagios restart the process without any manual scripting? Can this all be enabled for all processes on a given system in a plug-and-play manner?
  • Scheduler interaction: can and should any of the above, particularly the dependency mappings, be integrated with scheduling? For example, should some cron jobs be deferred when MySQL is down?
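
To make the dependency-sharing items a little more concrete, here is a minimal Python sketch. It assumes the init system could export its dependency data as a simple mapping from each service to the services it needs first. The mapping, the service names and both helper functions are hypothetical; no existing init system or Nagios interface is implied. The same data yields a valid start order and tells a monitoring tool or scheduler which services and jobs are affected when something is down.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency data: each service maps to what it needs first.
requires = {
    "mysql": set(),
    "repro": {"mysql"},
    "nightly-report": {"mysql"},
}


def start_order(requires):
    """Return one valid start order in which dependencies come first."""
    return list(TopologicalSorter(requires).static_order())


def blocked_by(requires, down):
    """Return services that directly or indirectly need a service that is down."""
    blocked = set()
    changed = True
    while changed:
        changed = False
        for svc, deps in requires.items():
            if svc not in blocked and deps & (down | blocked):
                blocked.add(svc)
                changed = True
    return blocked


if __name__ == "__main__":
    print(start_order(requires))
    # Jobs a scheduler might defer while MySQL is down.
    print(blocked_by(requires, {"mysql"}))
```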

What other potential improvements do people feel to be worthwhile?

Comments

Reading the man pages, it seems that you set PermissionsStartOnly to true, and then your Pre/Post scripts will be run as root.

http://www.freedesktop.org/software/systemd/man/systemd.service.html

I wasn't aware of PermissionsStartOnly but it looks like it may have helped in the case I described.

The way to go with regards to creating the directory is to use tmpfiles.d(5).

I like your idea of clustered dependencies, however I'm not sure it should be tied in with pid 1. A higher-level management tool like nagios/puppet feels like a more natural fit, and I consider close integration between init and cluster management tools to be a real benefit.

As for your observation regarding creating entries in /var/run, I would like to extend the problem to cover selinux as well: the daemon execution context usually does not allow creating entries in /var/run, and it shouldn't allow that. Given that selinux type transitions happen (only) at exec(), the "create and drop capabilities" approach isn't a solution that works with selinux unless the daemon re-execs itself. The current (initscript) approach is to set up the runtime environment in init context, then use restorecon to label it correctly. I would like to see a more formal solution there as well.

I'm certainly not arguing that init should replace everything else in that domain. Rather, it would simply be nice to have some easier integration points. For really small sites with fewer than 10 servers, people don't always set up Puppet. However, they could dramatically increase the quality of their environment if some of these things were just "plug-and-play", e.g. install a new package and the process is automatically monitored and managed by Nagios using some sensible defaults from the package maintainer.

Isn't a tmpfiles.d snippet to create the /var/run/repro directory the most appropriate way with systemd? Is there any reason why that wouldn't work in your case?

With systemd, you shouldn't need to start the daemon as root regardless of what port it listens on either, as the daemon can inherit the socket from systemd.
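
For readers unfamiliar with how that inheritance works, here is a minimal Python sketch of the receiving side. systemd passes already-bound sockets as file descriptors starting at 3 and announces them through the LISTEN_PID and LISTEN_FDS environment variables. The helper names below are made up for illustration, and a real daemon such as repro would more likely use sd_listen_fds() from libsystemd in C or C++. The sketch assumes a matching .socket unit exists alongside the service.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor in systemd's protocol


def inherited_fds(environ=None, pid=None):
    """Return the descriptor numbers passed in by the service manager."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        # LISTEN_PID must name this process, otherwise the fds are not ours.
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        # Variables missing or malformed: not socket-activated.
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def inherited_sockets(environ=None):
    """Wrap each inherited descriptor in a socket object (Python 3.7+)."""
    return [socket.socket(fileno=fd) for fd in inherited_fds(environ)]
```

Because the privileged port is bound before the daemon starts, the daemon itself can run as the unprivileged user from the outset.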

Daniel,
if the ReSIProcate server added the socket activation code for systemd, you would not need the server to start as root to listen for websockets. Systemd itself would be listening on tcp:443, tcp:5081, etc, and when any traffic came in on those ports, systemd would spawn repro.
As for /var/run/repro: the systemd way to handle this would be to package a file into /usr/lib/tmpfiles.d, with a line:
d /var/run/repro 0770 repro-user repro-group

I would also argue that scripts are NOT simple to understand or maintain, and are not "extremely stable".

I was wondering about potential improvements myself, albeit in a wholly different context: puppet. That too is more multi-systems oriented and some of the modules (e.g. those from Example42) provide much of the integration of managing a service (in the broadest sense) and configuring the monitoring for it.

Especially in the more basic configuration space there is much overlap to what distributions traditionally provided (default config files et al.)

> One of the challenges I came across was trying to both create a /var/run/repro directory and telling systemd to launch the process as a non-root user.

All you have to do is to provide a proper /etc/tmpfiles.d/ entry:

* http://www.freedesktop.org/software/systemd/man/tmpfiles.d.html
* http://pkgs.fedoraproject.org/cgit/ejabberd.git/tree/ejabberd.tmpfiles.conf (example)

> In the end, I have patched the repro binary to start as root and drop privileges itself, but that was only when it became apparent that we need to do that for WebRTC support (listening on port 443). While most daemons have the ability to drop privileges, there will be edge cases like this that need more than just a systemd manifest.

Use socket activation for that:

* http://0pointer.de/blog/projects/socket-activation.html

> Better support for starting multiple instances of a single process

Already implemented in systemd. Search for systemd template units:

* http://www.freedesktop.org/software/systemd/man/systemd.unit.html

"One of the challenges I came across was trying to both create a /var/run/repro directory and telling systemd to launch the process as a non-root user. I tried using the ExecStartPre directive in the systemd manifest to invoke mkdir [...] The conventional init script doesn't suffer this problem."
Sure, you found that problem because you were using the wrong tool for the job. See the tmpfiles.d man page for the right way.

In systemd.service(5), there is a cleverly hidden setting called "PermissionsStartOnly=", which means, if true, that only ExecStart= runs as the specified user, while ExecStartPre= runs as root.

See, for instance, the gearman-job-server.service systemd unit.

Hi, the generalised solution for making volatile directories exists: it's called tmpfiles.d. It was first implemented in systemd and then adopted by upstart, so you can count on it being universally available.
Further reading at http://www.freedesktop.org/software/systemd/man/tmpfiles.d.html

You could tell your concerns to Systemd's team. Maybe they already have solutions to some of your wishes

"You could tell your concerns to Systemd's team. Maybe they already have solutions to some of your wishes"

Thanks! This is exactly my main beef with systemd: you gotta wait for the gurus to have a solution for you.

The SystemV approach (I'm very aware of its downsides, mind you) allowed the sysadmin to help herself, in anger. For me, that's the ultimate criterion.

I'd argue that systemd is in a rather better position in this regard than sysinit: it is easier to understand the rather explicit and clearly documented unit files syntax than trying to write a robust and secure init script.

And as this thread seems to prove somewhat, I'd dare to say that it is also already easier to find someone who's knowledgeable enough about systemd than people who know all the subtle intricacies of shell programming. :)

In fact, there is no need to create the directory in the script: you can use tmpfiles.d ( http://www.freedesktop.org/software/systemd/man/tmpfiles.d.html ), or just not fork into the background, since systemd takes care of the rest ( see https://github.com/resiprocate/resiprocate/pull/3 for the PR, tested on F19 ).

Regarding your point on multiple instances, you can take a look at http://0pointer.de/blog/projects/instances.html

I am not sure if systemd timers can do what you want ( with dependencies ) regarding cron.

And to convert systemd to sysV, there is https://github.com/akhilvij/systemd-to-sysvinit-converter ( debian gsoc )

This is called metainit; it was packaged in Debian but didn't take off.