Monitoring with Ganglia: an O'Reilly community book project

I recently had the opportunity to contribute to an O'Reilly community book project, developing the book Monitoring with Ganglia in collaboration with other members of the Ganglia team

The project itself, as a community book, pays no royalties back to the contributors, as we have chosen to donate all proceeds to charity. People who contributed to the book include Robert Alexander, Jeff Buchbinder, Frederiko Costa, Alex Dean, Dave Josephsen, Bernard Li, Matt Massie, Brad Nicholes, Peter Phaal and Vladimir Vuksan and we also had generous assistance from various members of the open source community who assisted in the review process.

Ganglia itself started at University of California, Berkeley as an initiative of Matt Massie, for monitoring HPC cloud infrastructure

My own contact with Ganglia only began in 2008 when I was offered the opportunity to work full-time on the enterprise-wide monitoring systems for a large investment bank. Ganglia had been chosen for this huge project due to it's small footprint, support for many platforms and it's ability to work on a heterogeneous network as well as providing dedicated features for the bank's HPC grid.

This brings me to one important point about Ganglia: it's not just about HPC any more. While it is extremely useful for clusters, grids and clouds, it is also quite suitable for a mixed network of web servers, mail servers, databases and all the other applications you may find in a small business, education or ISP environment.

Instantly up and running with packages

One of the most compelling features, even for small sites with less than 10 nodes, is the ease of installation: install the packages on Debian, Ubuntu, Fedora, OpenCSW and some other platforms, and it just works. Ganglia nodes will find each other over multicast, instantly, no manual configuration changes necessary. On one of the nodes, the web interface must be installed for viewing the statistics. Dare I say it: it is so easy, you hardly even need the book for a small installation.

Where the book is really compelling is if you have hundreds or thousands of nodes, if you want custom charts or custom metrics or anything else beyond just installing the package. If monitoring is more than 10% of your job, the book is probably a must-have.

Excellent open source architecture

Ganglia's simplicity is largely thanks to the way it leverages other open source projects such as Tobi Oetiker's RRDtool and PHP

Anybody familiar with these tools will find Ganglia is particularly easy to work with and customise.

Custom metrics: IO service times

One of my own contributions to the project has been the creation of ganglia-modules-linux, some plugins for Linux-specific metrics and ganglia-modules-solaris providing some similar metrics for Solaris.

These projects on github provide an excellent base for people to fork and implement their own custom metrics in C or C++

The book provides a more detailed account of how to work with the various APIs for Python, C/C++, gmetric (command line/shell scripts) and Java.

The new web interface

For people who had tried earlier versions of Ganglia (and for those people who installed versions < 3.3.0 and still haven't updated), the new web interface is a major improvement and well worth the effort to install.

It is available on the most recent packages (for example, it is in Debian 7 (wheezy) but not in Debian 6.)

It was originally promoted as a standalone project (code-named gweb2) but was adopted as the official Ganglia web interface around the release of Ganglia 3.3.0. This web page provides a useful overview of what has changed and here is the original release announcement.