logfile-daemon_mysql.pl
- Write squid access log into a mysql database
mysql -u root -p squid_log < logfile_daemon-mysql.sql
cp logfile-daemon_mysql.pl /path/to/squid/libexec/
then, in squid.conf:
logformat squid_mysql %ts.%03tu %6tr %>a %Ss %03Hs %<st %rm %ru %un %Sh %<A %mt
access_log daemon:/mysql_host/database/table/username/password squid_mysql
logfile_daemon /path/to/squid/libexec/logfile-daemon_mysql.pl
This module exploits the new logfile daemon support available in squid 2.7 to store access log entries in a MySQL database.
This script expects the following log format (it's the default 'squid' log format without the two '/' characters):
logformat squid_mysql %ts.%03tu %6tr %>a %Ss %03Hs %<st %rm %ru %un %Sh %<A %mt
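As an illustration of how the twelve fields of this format line up with the table columns described later in this document, here is a minimal Python sketch. It is not the code of the Perl script itself; the function name parse_log_line and the choice to split on whitespace are assumptions made for the example.

```python
# Column names taken from the CREATE TABLE statement in this document,
# in the same order as the fields of the 'squid_mysql' log format.
COLUMNS = (
    "time_since_epoch", "response_time", "client_src_ip_addr",
    "squid_request_status", "http_status_code", "reply_size",
    "request_method", "request_url", "username",
    "squid_hier_status", "server_ip_addr", "mime_type",
)

def parse_log_line(line):
    """Split one log line into a dict keyed by column name (hypothetical helper).

    Assumes fields are separated by whitespace and contain none themselves;
    %6tr is space-padded, which str.split() absorbs.
    """
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError("expected %d fields, got %d" % (len(COLUMNS), len(fields)))
    return dict(zip(COLUMNS, fields))
```

All values are kept as strings here; converting them to numbers is left to the database.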
The path to the access log file is used to provide the database connection parameters.
access_log daemon:/mysql_host/database/table/username/password squid_mysql
The 'daemon' prefix is mandatory and tells squid that the logfile_daemon is to be used instead of the normal file logging.
The last parameter, 'squid_mysql' in the example, tells squid which log format to use when writing lines to the log daemon.
mysql_host: Host where the mysql server is running. If left empty, 'localhost' is assumed.
database: Name of the database to connect to. If left empty, 'squid_log' is assumed.
table: Name of the database table where log lines are stored. If left empty, 'access_log' is assumed.
username: Username to use when connecting to the database. If left empty, 'squid' is assumed.
password: Password to use when connecting to the database. If left empty, no password is used.
To leave all fields to their default values, you can use a single slash:
access_log daemon:/ squid_mysql
To specify only the database password, which by default is empty, you must leave all the other parameters unspecified by using empty strings:
access_log daemon://///password squid_mysql
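The defaulting rules above can be sketched in a few lines of Python. This is only an illustration, not the script's actual parsing code: the function name parse_log_path is hypothetical, and it assumes the daemon receives the part of the access_log path that follows the 'daemon:' prefix.

```python
# Defaults as stated in this document; an empty password means "no password".
DEFAULTS = (
    ("host", "localhost"),
    ("database", "squid_log"),
    ("table", "access_log"),
    ("user", "squid"),
    ("password", ""),
)

def parse_log_path(path):
    """Map '/host/database/table/user/password' to a dict, filling defaults."""
    # Drop the leading slash, then split into at most five parts so that a
    # password containing '/' stays in one piece (a choice made for this sketch).
    body = path[1:] if path.startswith("/") else path
    parts = body.split("/", 4)
    params = {}
    for i, (name, default) in enumerate(DEFAULTS):
        value = parts[i] if i < len(parts) else ""
        params[name] = value or default
    return params
```

With this sketch, parse_log_path("/") yields all the defaults, and parse_log_path("/////password") changes only the password.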
This is the current way of telling squid where the logfile daemon resides.
logfile_daemon /path/to/squid/libexec/logfile-daemon_mysql.pl
The script must be copied to the location specified in the directive.
Let's call the database 'squid_log' and the log table 'access_log'. The username and password for the db connection will both be 'squid'.
Create the database:
CREATE DATABASE squid_log;
Create the user:
GRANT INSERT,SELECT ON squid_log.* TO 'squid'@'localhost' IDENTIFIED BY 'squid';
FLUSH PRIVILEGES;
Note that only INSERT and SELECT privileges are granted to the 'squid' user. This ensures that the logfile daemon script cannot modify or delete existing log entries.
Create the table:
CREATE TABLE access_log(
    id                   INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
    time_since_epoch     DECIMAL(15,3),
    response_time        INTEGER,
    client_src_ip_addr   CHAR(15),
    squid_request_status VARCHAR(20),
    http_status_code     VARCHAR(10),
    reply_size           INTEGER,
    request_method       VARCHAR(20),
    request_url          VARCHAR(1000),
    username             VARCHAR(20),
    squid_hier_status    VARCHAR(20),
    server_ip_addr       CHAR(15),
    mime_type            VARCHAR(50)
);
Alternatively, you can also use the provided sql scripts, like this:
cat logfile-daemon_mysql-table.sql logfile-daemon_mysql-date_day_column.sql logfile-daemon_mysql-indexes.sql | mysql -u root -p squid_log
This document refers to logfile-daemon_mysql.pl script version 0.4.
The script has been developed and tested in the following environment:
To list the clients that have accessed the cache:
SELECT DISTINCT client_src_ip_addr FROM access_log;
To count the number of requests per day:
SELECT DATE(FROM_UNIXTIME(time_since_epoch)) AS date_day, COUNT(*) AS num_of_requests FROM access_log GROUP BY 1 ORDER BY 1;
To obtain the raw count of each request status:
SELECT squid_request_status, COUNT(*) AS n FROM access_log GROUP BY squid_request_status ORDER BY 2 DESC;
To calculate the percentage of each request status:
SELECT squid_request_status, (COUNT(*)/(SELECT COUNT(*) FROM access_log)*100) AS percentage FROM access_log GROUP BY squid_request_status ORDER BY 2 DESC;
To distinguish only between HITs and MISSes:
SELECT 'hits',
    (SELECT COUNT(*) FROM access_log WHERE squid_request_status LIKE '%HIT%')
    / (SELECT COUNT(*) FROM access_log)*100 AS percentage
UNION
SELECT 'misses',
    (SELECT COUNT(*) FROM access_log WHERE squid_request_status LIKE '%MISS%')
    / (SELECT COUNT(*) FROM access_log)*100 AS percentage;
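To make the arithmetic of the hit/miss query explicit, the same calculation can be sketched in Python over a plain list of request status strings. The function name hit_miss_percentages is hypothetical, and the substring test stands in for the LIKE '%HIT%' and LIKE '%MISS%' patterns (it is case-sensitive, which should not matter for squid's upper-case status codes).

```python
def hit_miss_percentages(statuses):
    """Return the share of statuses containing 'HIT' and 'MISS', in percent."""
    total = len(statuses)
    if total == 0:
        # Avoid dividing by zero on an empty log.
        return {"hits": 0.0, "misses": 0.0}
    hits = sum(1 for s in statuses if "HIT" in s)
    misses = sum(1 for s in statuses if "MISS" in s)
    return {"hits": hits / total * 100, "misses": misses / total * 100}
```

Note that the two percentages need not add up to 100, since some statuses (for example a denied request) contain neither substring.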
To calculate the percentage of requests falling in each response time range:
SELECT '0..500', COUNT(*)/(SELECT COUNT(*) FROM access_log)*100 AS percentage
    FROM access_log WHERE response_time >= 0 AND response_time < 500
UNION
SELECT '500..1000', COUNT(*)/(SELECT COUNT(*) FROM access_log)*100 AS percentage
    FROM access_log WHERE response_time >= 500 AND response_time < 1000
UNION
SELECT '1000..2000', COUNT(*)/(SELECT COUNT(*) FROM access_log)*100 AS percentage
    FROM access_log WHERE response_time >= 1000 AND response_time < 2000
UNION
SELECT '>= 2000', COUNT(*)/(SELECT COUNT(*) FROM access_log)*100 AS percentage
    FROM access_log WHERE response_time >= 2000;
To obtain the total traffic for each mime type:
SELECT mime_type, SUM(reply_size) AS total_bytes FROM access_log GROUP BY mime_type ORDER BY 2 DESC;
To obtain the total traffic for each client:
SELECT client_src_ip_addr, SUM(reply_size) AS total_bytes FROM access_log GROUP BY 1 ORDER BY 2 DESC;
The MyISAM storage engine is generally considered faster than InnoDB, so although it supports neither transactions nor referential integrity, it might be more appropriate in this scenario. You might want to append 'ENGINE=MYISAM' at the end of the table creation code in the above SQL script.
Indexes should be created according to the queries that are most frequently run. The DDL script only creates an implicit index for the primary key column.
This script currently implements only the 'L' (i.e. 'append a line to the log') command, therefore the log lines are never purged from the table. This approach has an obvious scalability problem.
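The idea of handling only the 'L' command can be sketched as follows. This is a Python illustration, not the Perl script's code: it assumes each command arrives as one line whose first character is the command code, and the function name handle_command and the rows list are hypothetical.

```python
def handle_command(line, rows):
    """Process one line received from squid.

    Appends the payload to `rows` and returns True for an 'L' command;
    every other command is silently ignored and False is returned.
    """
    line = line.rstrip("\n")
    if line.startswith("L"):
        # Everything after the command letter is the formatted log line.
        rows.append(line[1:])
        return True
    return False
```

In a real daemon the appended payload would be inserted into the database rather than collected in a list.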
One solution would be to implement e.g. the 'rotate log' command in a way that would calculate some summary values, put them in a 'summary table' and then delete the lines used to calculate those values.
Similar cleanup code could be implemented in an external script and run periodically independently from squid log commands.
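A possible shape for such an external cleanup script is sketched below in Python. It is only an illustration under several assumptions: it uses the standard sqlite3 module as a stand-in for MySQL so that it can run anywhere, the daily_summary table and the function name summarize_and_purge are hypothetical, and against the real database it would need an account with DELETE privilege, which the 'squid' user described above does not have.

```python
import sqlite3

def summarize_and_purge(conn, cutoff_epoch):
    """Roll rows older than cutoff_epoch into daily_summary, then delete them."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS daily_summary ("
        " date_day TEXT PRIMARY KEY,"
        " num_of_requests INTEGER,"
        " total_bytes INTEGER)"
    )
    # Per-day request count and traffic for the rows about to be purged.
    cur.execute(
        "SELECT date(time_since_epoch, 'unixepoch'), COUNT(*),"
        " COALESCE(SUM(reply_size), 0)"
        " FROM access_log WHERE time_since_epoch < ? GROUP BY 1",
        (cutoff_epoch,),
    )
    for day, count, total in cur.fetchall():
        # Add to an existing summary row, or create it on first sight of the day.
        cur.execute(
            "UPDATE daily_summary SET num_of_requests = num_of_requests + ?,"
            " total_bytes = total_bytes + ? WHERE date_day = ?",
            (count, total, day),
        )
        if cur.rowcount == 0:
            cur.execute(
                "INSERT INTO daily_summary VALUES (?, ?, ?)",
                (day, count, total),
            )
    cur.execute("DELETE FROM access_log WHERE time_since_epoch < ?", (cutoff_epoch,))
    conn.commit()
```

Such a function could be run periodically, for example from cron, with a cutoff of a few days in the past.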
This script has only been tested in low-volume scenarios (single client, fewer than 10 req/s). Tests in high-volume environments could reveal performance bottlenecks and bugs.
Marcello Romani, marcello.romani@libero.it
Copyright (C) 2008 by Marcello Romani
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.