NAME

logfile-daemon_mysql.pl - Write squid access log into a mysql database


SYNOPSIS

  mysql -u root -p squid_log < logfile-daemon_mysql-table.sql
  cp logfile-daemon_mysql.pl /path/to/squid/libexec/

then, in squid.conf:

  logformat squid_mysql  %ts.%03tu %6tr %>a %Ss %03Hs %<st %rm %ru %un %Sh %<A %mt
  access_log daemon:/mysql_host/database/table/username/password squid_mysql
  logfile_daemon /path/to/squid/libexec/logfile-daemon_mysql.pl


DESCRIPTION

This module exploits the new logfile daemon support available in squid 2.7 to store access log entries in a MySQL database.


CONFIGURATION

Squid configuration

logformat directive

This script expects the following log format (it's the default 'squid' log format without the two '/' characters):

  logformat squid_mysql  %ts.%03tu %6tr %>a %Ss %03Hs %<st %rm %ru %un %Sh %<A %mt
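
As an illustration of what this format implies, the sketch below splits one log line on whitespace and pairs the twelve values with the column names used in the CREATE TABLE statement under 'Database configuration'. It is not the script's own code: the function name split_log_line and the sample line are assumptions made for this example, and it assumes that no field contains whitespace.

```python
# Column names come from the CREATE TABLE statement in this document,
# in the same order as the fields of the squid_mysql logformat.
COLUMNS = [
    "time_since_epoch", "response_time", "client_src_ip_addr",
    "squid_request_status", "http_status_code", "reply_size",
    "request_method", "request_url", "username",
    "squid_hier_status", "server_ip_addr", "mime_type",
]


def split_log_line(line):
    """Split one whitespace-separated log line into a column -> value dict."""
    values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError(
            "expected %d fields, got %d" % (len(COLUMNS), len(values))
        )
    return dict(zip(COLUMNS, values))


# Hypothetical sample line, made up for this example.
sample = (
    "1203500000.123    250 192.168.1.10 TCP_MISS 200 5120 "
    "GET http://example.com/ - DIRECT 93.184.216.34 text/html"
)
record = split_log_line(sample)
print(record["squid_request_status"], record["reply_size"])
```

A line with a different number of fields raises ValueError, which is one simple way to notice that the logformat directive does not match what the daemon expects.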

access_log directive

The path to the access log file is used to provide the database connection parameters.

  access_log daemon:/mysql_host/database/table/username/password squid_mysql

The 'daemon' prefix is mandatory and tells squid that the logfile_daemon is to be used instead of the normal file logging.

The last parameter, 'squid_mysql' in the example, tells squid which log format to use when writing lines to the log daemon.

mysql_host

Host where the mysql server is running. If left empty, 'localhost' is assumed.

database

Name of the database to connect to. If left empty, 'squid_log' is assumed.

table

Name of the database table where log lines are stored. If left empty, 'access_log' is assumed.

username

Username to use when connecting to the database. If left empty, 'squid' is assumed.

password

Password to use when connecting to the database. If left empty, no password is used.

To leave all fields at their default values, use a single slash:

  access_log daemon:/ squid_mysql

To specify only the database password, which is empty by default, leave all the other parameters empty:

  access_log daemon://///password squid_mysql
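
The defaulting rules above can be sketched as a small parser. This is only an illustration under stated assumptions, not the script's actual implementation: the name parse_log_path is hypothetical, the function is assumed to receive the part of the access_log value that follows 'daemon:', and splitting at most four times (so that a password may itself contain a slash) is a choice made for this example.

```python
# Field names and defaults as documented above.
NAMES = ("mysql_host", "database", "table", "username", "password")
DEFAULTS = ("localhost", "squid_log", "access_log", "squid", "")


def parse_log_path(path):
    """Turn '/host/database/table/username/password' into a dict of settings."""
    # Remove only the first slash; the remaining ones separate the fields.
    if path.startswith("/"):
        path = path[1:]
    # Split at most 4 times so that anything after the fourth separator
    # is kept as the password (an assumption of this sketch).
    parts = path.split("/", 4)
    # Pad missing trailing fields with empty strings.
    parts += [""] * (len(NAMES) - len(parts))
    # An empty field falls back to its documented default.
    return {
        name: value or default
        for name, value, default in zip(NAMES, parts, DEFAULTS)
    }


all_defaults = parse_log_path("/")
password_only = parse_log_path("/////password")
print(all_defaults)
print(password_only)
```

With these rules, '/' yields every default, and '/////password' changes only the password field.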

logfile_daemon directive

This is the current way of telling squid where the logfile daemon resides.

  logfile_daemon /path/to/squid/libexec/logfile-daemon_mysql.pl

The script must be copied to the location specified in the directive.

Database configuration

Let's call the database 'squid_log' and the log table 'access_log'. The username and password for the database connection will both be 'squid'.

Database

Create the database:

  CREATE DATABASE squid_log;

User

Create the user:

  GRANT INSERT,SELECT ON squid_log.* TO 'squid'@'localhost' IDENTIFIED BY 'squid';
  FLUSH PRIVILEGES;

Note that only the INSERT and SELECT privileges are granted to the 'squid' user. This ensures that the logfile daemon script cannot modify or delete existing log entries.

Table

Create the table:

  CREATE TABLE access_log(
    id                   INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
    time_since_epoch     DECIMAL(15,3),
    response_time        INTEGER,
    client_src_ip_addr   CHAR(15),
    squid_request_status VARCHAR(20),
    http_status_code     VARCHAR(10),
    reply_size           INTEGER,
    request_method       VARCHAR(20),
    request_url          VARCHAR(1000),
    username             VARCHAR(20),
    squid_hier_status    VARCHAR(20),
    server_ip_addr       CHAR(15),
    mime_type            VARCHAR(50)
  );

Alternatively, you can use the provided SQL scripts, like this:

  cat logfile-daemon_mysql-table.sql logfile-daemon_mysql-date_day_column.sql logfile-daemon_mysql-indexes.sql | mysql -u root -p squid_log


VERSION INFORMATION

This document refers to logfile-daemon_mysql.pl script version 0.4.

The script has been developed and tested in the following environment:

  squid-2.7.DEVEL0-20080220
  mysql 5.0.26
  perl 5.8.8
  OpenSUSE 10.2


DATA EXTRACTION

Sample queries.

Clients accessing the cache

  SELECT DISTINCT client_src_ip_addr FROM access_log;

Number of requests per day

  SELECT
    DATE(FROM_UNIXTIME(time_since_epoch)) AS date_day,
    COUNT(*) AS num_of_requests
  FROM access_log
  GROUP BY 1
  ORDER BY 1;

Request status count

To obtain the raw count of each request status:

  SELECT squid_request_status, COUNT(*) AS n
  FROM access_log
  GROUP BY squid_request_status
  ORDER BY 2 DESC;

To calculate the percentage of each request status:

  SELECT
    squid_request_status,
    (COUNT(*)/(SELECT COUNT(*) FROM access_log)*100) AS percentage
  FROM access_log
  GROUP BY squid_request_status
  ORDER BY 2 DESC;

To distinguish only between HITs and MISSes:

  SELECT
    'hits',
    (SELECT COUNT(*)
    FROM access_log
    WHERE squid_request_status LIKE '%HIT%')
    /
    (SELECT COUNT(*) FROM access_log)*100
    AS percentage
  UNION
  SELECT
    'misses',
    (SELECT COUNT(*)
    FROM access_log
    WHERE squid_request_status LIKE '%MISS%')
    /
    (SELECT COUNT(*) FROM access_log)*100
    AS percentage;

Response time ranges

  SELECT
    '0..500',
    COUNT(*)/(SELECT COUNT(*) FROM access_log)*100 AS percentage
  FROM access_log
  WHERE response_time >= 0 AND response_time < 500
  UNION
  SELECT
    '500..1000',
    COUNT(*)/(SELECT COUNT(*) FROM access_log)*100 AS percentage
  FROM access_log
  WHERE response_time >= 500 AND response_time < 1000
  UNION
  SELECT
    '1000..2000',
    COUNT(*)/(SELECT COUNT(*) FROM access_log)*100 AS percentage
  FROM access_log
  WHERE response_time >= 1000 AND response_time < 2000
  UNION
  SELECT
    '>= 2000',
    COUNT(*)/(SELECT COUNT(*) FROM access_log)*100 AS percentage
  FROM access_log
  WHERE response_time >= 2000;

Traffic by mime type

  SELECT
    mime_type,
    SUM(reply_size) AS total_bytes
  FROM access_log
  GROUP BY mime_type
  ORDER BY 2 DESC;

Traffic by client

  SELECT
    client_src_ip_addr,
    SUM(reply_size) AS total_bytes
  FROM access_log
  GROUP BY 1
  ORDER BY 2 DESC;

Speed issues

The MyISAM storage engine is often faster than InnoDB for simple insert-and-read workloads, so although it supports neither transactions nor referential integrity, it might be more appropriate in this scenario. To use it, append 'ENGINE=MYISAM' to the end of the CREATE TABLE statement above.

Indexes should be created according to the queries that are run most frequently. The CREATE TABLE statement above creates only the implicit index on the primary key column.


TODO

Table cleanup

This script currently implements only the L (i.e. 'append a line to the log') command, so log lines are never purged from the table. This approach has an obvious scalability problem.
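
The behaviour described above can be pictured with the following sketch. It is not the script's code: it assumes that squid sends one command per line and that the first character of the line identifies the command, the name handle_command is hypothetical, and the 'R' line in the sample input merely stands in for some other command that is not implemented. A plain list replaces the database.

```python
import io


def handle_command(line, sink):
    """Handle one line received from squid; return True if it was stored."""
    if line.startswith("L"):
        # 'L' means "append a line to the log": keep everything after the
        # command letter, without the trailing newline.
        sink.append(line[1:].rstrip("\n"))
        return True
    # Every other command is ignored, so nothing is ever removed.
    return False


# Simulated input: two log lines with an unhandled command in between.
stream = io.StringIO("Lfirst entry\nR\nLsecond entry\n")
stored = []
results = [handle_command(raw, stored) for raw in stream]
print(stored)
print(results)
```

Because only the 'L' branch changes anything, the stored data can only grow, which is the scalability problem noted above.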

One solution would be to implement, for example, the 'rotate log' command so that it calculates some summary values, stores them in a 'summary table', and then deletes the lines used to calculate those values.

Similar cleanup code could be implemented in an external script and run periodically, independently of squid's log commands.
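
A minimal sketch of such a cleanup step follows. It makes several assumptions that are not part of this script: Python's built-in sqlite3 module stands in for MySQL so that the example is self-contained, the summary table name access_log_summary and its columns are hypothetical, and the cutoff is expected to fall on a day boundary so that each day is summarized only once.

```python
import sqlite3


def summarize_and_purge(conn, cutoff_epoch):
    """Summarize rows older than cutoff_epoch per day, then delete them."""
    with conn:  # commit both statements together, or roll back on error
        conn.execute(
            "INSERT INTO access_log_summary "
            "(date_day, num_of_requests, total_bytes) "
            "SELECT DATE(time_since_epoch, 'unixepoch'), "
            "COUNT(*), SUM(reply_size) "
            "FROM access_log WHERE time_since_epoch < ? "
            "GROUP BY DATE(time_since_epoch, 'unixepoch')",
            (cutoff_epoch,),
        )
        conn.execute(
            "DELETE FROM access_log WHERE time_since_epoch < ?",
            (cutoff_epoch,),
        )


# In-memory stand-in holding only the columns needed by this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access_log ("
    "id INTEGER PRIMARY KEY, time_since_epoch REAL, reply_size INTEGER)"
)
conn.execute(
    "CREATE TABLE access_log_summary ("
    "date_day TEXT, num_of_requests INTEGER, total_bytes INTEGER)"
)
day = 86400  # seconds per day
conn.executemany(
    "INSERT INTO access_log (time_since_epoch, reply_size) VALUES (?, ?)",
    [(day + 10, 100), (day + 20, 200), (2 * day + 5, 50)],
)

summarize_and_purge(conn, 2 * day)
summary = conn.execute(
    "SELECT date_day, num_of_requests, total_bytes FROM access_log_summary"
).fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM access_log").fetchone()[0]
print(summary, remaining)
```

Against MySQL, the date expression would instead be DATE(FROM_UNIXTIME(time_since_epoch)), as in the sample queries above, and the connecting user would need the DELETE privilege, which the 'squid' user described in this document does not have.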

Testing

This script has only been tested in low-volume scenarios (single client, fewer than 10 req/s). Tests in high-volume environments could reveal performance bottlenecks and bugs.


AUTHOR

Marcello Romani, marcello.romani@libero.it


COPYRIGHT AND LICENSE

Copyright (C) 2008 by Marcello Romani

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.