What is mod_log_spread?
mod_log_spread is a patch to Apache's mod_log_config that provides an interface to Spread for multicasting access logs. It uses the group communication toolkit Spread, developed at Johns Hopkins University's Center for Networking and Distributed Systems. mod_log_spread was developed to solve the problem of collecting consolidated access logs for large web farms. In particular, the solution needed to scale to hundreds of machines, use a reliable network transport, allow machines to be added or dropped on the fly, and impose minimal performance impact on the web servers. The current version is 1.0.3p3, which fixes a stupid vhost logging bug and provides a complete and flexible log-writing solution.

 
What is wrong with the way things were...?
I wrote mod_log_spread because a popular commercial log-writing application my company purchased was hard to support, non-scalable, and broke frequently. The scalability concerns stemmed from its basic design. The particular product I was saddled with was a (Java-based) packet sniffer. It sniffs for HTTP transactions and recreates them from TCP sessions. This presents immediate scalability concerns. How do you sniff a network pushing 70Mb of traffic with a single non-clustering packet sniffer? You don't. mod_log_spread backs up this assertion by demonstrably recording 10-15% more traffic than the sniffer. Sniffers drop logs; Spread, the underlying protocol behind mod_log_spread, is designed to be unable to drop messages. This particular commercial sniffer is also a single point of failure. mod_log_spread can run two (or any number of) logging hosts simultaneously with no additional network overhead. Furthermore, it is not a black-box product: mod_log_spread is an open-source project.
So why not just write logs locally?
There is a 20-30% performance hit, and you have never known pain until you have tried to manage local logging across 60 machines. Trust me.
What other Spread logging projects are there?
In addition to the Apache module, I also have a patch for the lightweight web server thttpd that allows similar logging capabilities. If you're running Spread on a highly utilized Linux system, you may benefit from setting your Spread daemon to real-time scheduling. Here's a Perl module to make that easy to accomplish.
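To make "real-time scheduling" concrete, here is a minimal sketch of the idea in Python. It is not the Perl module mentioned above, only an illustration of the underlying Linux scheduler call. The choice of the SCHED_FIFO policy, the default priority of 50, and the function names clamp_priority and make_realtime are all assumptions made for this example; none of them come from the mod_log_spread distribution.

```python
import os


def clamp_priority(requested, lowest, highest):
    """Clamp a requested priority into the range [lowest, highest]."""
    return max(lowest, min(requested, highest))


def make_realtime(pid, requested=50):
    """Switch process `pid` to the SCHED_FIFO real-time policy.

    Linux only, and normally requires root (or CAP_SYS_NICE).
    SCHED_FIFO and the default priority are assumptions for this sketch.
    Returns the priority that was actually applied.
    """
    policy = os.SCHED_FIFO
    # Ask the kernel which priorities are valid for this policy.
    lowest = os.sched_get_priority_min(policy)
    highest = os.sched_get_priority_max(policy)
    priority = clamp_priority(requested, lowest, highest)
    os.sched_setscheduler(pid, policy, os.sched_param(priority))
    return priority
```

Calling make_realtime with the process ID of a running Spread daemon, as root, would move that daemon into the real-time scheduling class. A misbehaving real-time process can starve everything else on the machine, so this is worth trying on a test host first.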
Availability
mod_log_spread is available under an Apache-style license. Basically, redistribution is permitted as long as the copyright is preserved and included intact. In addition, the authors of Spread have provided an unlimited use license for using Spread to log web requests. Thanks CNDS!
Note: other uses of Spread may not be covered under this license.
ChangeLog
  • 2000-05-27 Fixed brown-paper-bag parse error.
  • 2000-06-04 Added Perl scheduler interface for Linux to help run Spread in realtime.
  • 2000-06-07 Added Spread tuning docs. Fixed potential issue of blocking indefinitely on SP_multicast if Spread's queue is full.
  • 2000-07-14 Added Theo Schlossnagle's wonderful spreadlogd to the dist as a replacement for log_writer.
  • 2000-09-24 mod_log_spread becomes part of the Backhand Project!
  • 2000-10-14 Added vhost support. Fixed potential hang.
  • 2000-10-18 Completed vhost logging support.
  • 2000-10-19 Fixed bug in $#vhost logging which caused log corruption under certain circumstances.
  • 2000-10-21 Added sample configuration/tutorial to distribution tar ball.
  • 2000-11-03 Fixed Solaris support for spreadlogd. Fixed symbol conflict between mod_log_spread and mod_php which broke $#vhost logging when the two are used in conjunction.

    Download for Apache
    Download spreadlogd
    Download thttpd patch
    Download Perl Linux Scheduler module
    Report Bugs
    Join the Users Mailing List

    Like mod_log_spread? Please mail me and let me know what you think or how you've used it.