Pruning Syslog entries from MongoDB

I previously announced the availability of rsyslog+MongoDB+LogAnalyzer in Debian wheezy-backports. This latest rsyslog with MongoDB storage support is also available for Ubuntu and Fedora users in one way or another.

Just one thing was missing: a flexible way to prune the database. LogAnalyzer provides a very basic pruning script that simply purges all records over a certain age. The script hasn't been adapted to work within the package layout. It is written in PHP, which may not be ideal for people who don't actually want LogAnalyzer on their Syslog/MongoDB host.

Now there is a convenient solution: I've just contributed a small Python script for selectively pruning the records.

Thanks to Python syntax and the PyMongo client, it is extremely concise. In fact, here is the full script:

#!/usr/bin/env python3

import syslog
import datetime
from pymongo import MongoClient

# It assumes we use the default database name 'logs' and collection 'syslog'
# in the rsyslog configuration.
#
# Written for a current PyMongo release, where MongoClient, delete_many()
# and count_documents() replace the older Connection, remove() and count().

with MongoClient() as client:
    db = client.logs
    table = db.syslog
    #print("Initial count: %d" % table.count_documents({}))

    # MongoDB stores dates in UTC, so compute the cutoffs from the UTC time
    now = datetime.datetime.now(datetime.timezone.utc)

    # remove ANY record older than 5 weeks, except those from the mail facility
    # (syslog.LOG_MAIL is the shifted facility code, 2 << 3; if your records
    # store the plain facility number, compare with syslog.LOG_MAIL >> 3)
    t = now - datetime.timedelta(weeks=5)
    table.delete_many({"time": {"$lt": t}, "syslog_fac": {"$ne": syslog.LOG_MAIL}})

    # remove any debug record older than 7 days
    t = now - datetime.timedelta(days=7)
    table.delete_many({"time": {"$lt": t}, "syslog_sever": syslog.LOG_DEBUG})

    #print("Final count: %d" % table.count_documents({}))

Just put it in /usr/local/bin, make it executable and run it daily from cron.

Customization

Just adapt the filters passed to table.delete_many as required. The PyMongo tutorial gives a very basic introduction to the query syntax, and the MongoDB query operator reference has the full details for creating more elaborate pruning rules.
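
As a rough illustration of what more elaborate rules might look like, here is a minimal sketch that builds several filters with the $lt, $in and $gt operators and applies them in a loop. The helper names (older_than, build_rules, prune) and the thresholds are made up for this example, and it assumes the same 'time' and 'syslog_sever' fields as the script above, plus PyMongo's delete_many().

```python
import datetime
import syslog


def older_than(now, **age):
    """Return a filter fragment matching records older than the given age."""
    return {"time": {"$lt": now - datetime.timedelta(**age)}}


def build_rules(now):
    """Return (label, filter) pairs; the thresholds are illustrative only."""
    rules = []

    # debug and info records older than 7 days
    query = older_than(now, days=7)
    query["syslog_sever"] = {"$in": [syslog.LOG_DEBUG, syslog.LOG_INFO]}
    rules.append(("debug/info, 7 days", query))

    # anything less severe than warning (a higher number) older than 5 weeks
    query = older_than(now, weeks=5)
    query["syslog_sever"] = {"$gt": syslog.LOG_WARNING}
    rules.append(("below warning, 5 weeks", query))

    # everything older than 26 weeks, whatever the severity
    rules.append(("all, 26 weeks", older_than(now, weeks=26)))
    return rules


def prune(table, rules):
    """Apply each rule with delete_many() and return the total removed."""
    total = 0
    for label, query in rules:
        total += table.delete_many(query).deleted_count
    return total
```

With a real collection this could be called as prune(client.logs.syslog, build_rules(now)). Keeping the rules in a list makes it easy to add or reorder them without touching the loop.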

Potential improvements

  • Indexing the fields used in the queries
  • Logging progress and stats to Syslog
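
Both improvements could be sketched roughly as follows. This is only an outline, not part of the contributed script: the function names ensure_indexes and prune_and_log are hypothetical, the choice of indexes is a guess based on the two queries in the script, and the collection is assumed to be a PyMongo collection providing create_index() and delete_many().

```python
import syslog


def ensure_indexes(table):
    """Create indexes on the fields used by the pruning queries."""
    # Asking for an index that already exists with the same keys is harmless,
    # so this can run on every invocation.
    table.create_index("time")
    # Compound index for the rule that filters on severity and then on time.
    table.create_index([("syslog_sever", 1), ("time", 1)])


def prune_and_log(table, query, label, log=syslog.syslog):
    """Delete the matching records and report the count to Syslog."""
    result = table.delete_many(query)
    log(syslog.LOG_INFO,
        "pruned %d records (%s)" % (result.deleted_count, label))
    return result.deleted_count
```

The log parameter defaults to syslog.syslog and is only there so that the function can be tried out without writing to the system log. Whether the extra indexes pay off depends on the size of the collection, so it is worth measuring before adopting them.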


Using LogAnalyzer with a database backend such as MongoDB is very easy to set up and much faster than working with text-based log files.