Squid with MySQL logging

Starting with Squid 2.7 an interesting directive was added: access_log daemon. With this directive Squid's access log can be handed off to an external program.
Thanks to the contributions (a Perl script and SQL queries) by Marcello Romani, it is possible to send Squid's logs to a MySQL database.
Here is the script, to be saved as /etc/squid/logfile-daemon_mysql.pl:
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use English qw( -no_match_vars );
# utility routine to print messages on stderr (so they appear in cache log)
# without using warn, which would clutter the log with source line numbers
sub log_info {
    my $msg = shift;
    print STDERR "$msg\n";
}
# the first argument to this script is the log file path
my $log_file = shift;
# we use logfile to pass database access information to this script
# sample configuration:
# access_log daemon:/host/database/table/username/password squid
# to leave a parameter unspecified, e.g. the database host, use a double slash:
# access_log daemon://database/table/username/password squid
my ( $host, $database, $table, $user, $pass ) = $log_file =~ / \/(.*?)\/(.*?)\/(.*?)\/(.*?)\/(.*?) \z /xms;
if ( !$host ) {
    $host = 'localhost';
    log_info("Database host not specified. Using $host.");
}
if ( !$database ) {
    $database = 'squid_log';
    log_info("Database name not specified. Using $database.");
}
if ( !$table ) {
    $table = 'access_log';
    log_info("Table parameter not specified. Using $table.");
}
if ( !$user ) {
    $user = 'squid';
    log_info("User parameter not specified. Using $user.");
}
if ( !$pass ) {
    log_info('No password specified. Connecting with NO password.');
}

# fields that we should have in the table
my @fields = qw(
id
time_since_epoch
response_time
client_src_ip_addr
squid_request_status
http_status_code
reply_size
request_method
request_url
username
squid_hier_status
server_ip_addr
mime_type
);
my $dsn;
my $dbh;
my $sth;
# perform db connection
$dsn = "DBI:mysql:database=$database";
eval {
    warn "Connecting... dsn='$dsn', username='$user', password='...'";
    $dbh = DBI->connect($dsn, $user, $pass, { AutoCommit => 1, RaiseError => 1, PrintError => 1 });
};
if ($EVAL_ERROR) {
    die "Cannot connect to database: $DBI::errstr";
}

# a simple test to assure the specified table exists
eval {
    my $q = 'SELECT ' . join(',',@fields) . " FROM $table LIMIT 1";
    my $sth = $dbh->prepare($q);
    $sth->execute;
};
if ($EVAL_ERROR) {
    die "Cannot SELECT from $table: $DBI::errstr";
}

# for better performance, prepare the statement at startup
eval {
    my $q = "INSERT INTO $table (" . join(',',@fields) . ") VALUES(NULL" . ',?' x (scalar(@fields)-1) . ')';
    #$sth = $dbh->prepare("INSERT INTO $table VALUES(NULL,?,?,?,?,?,?,?,?,?,?,?,?)");
    $sth = $dbh->prepare($q);
};
if ($EVAL_ERROR) {
    die "Error while preparing sql statement: $EVAL_ERROR";
}

# main loop
while (my $line = <>) {
    chomp $line;
    my $cmd = substr($line, 0, 1);  # extract command byte
    substr($line, 0, 1, ' ');       # replace command byte with a blank
    if ( $cmd eq 'L' ) {
        my @values = split / \s+ /xms, $line;
        shift @values; # the first blank generates an empty bind value that has to be removed
        eval { # we catch db errors to avoid crashing squid in case something goes wrong...
            $sth->execute(@values) or die $sth->errstr;
        };
        if ( $EVAL_ERROR ) { # leave a trace of the error in the logs
            warn $EVAL_ERROR . " values=(" . join(', ', @values) . ')';
        }
    }
}

$dbh->disconnect();
The script must then be made executable, e.g. with chmod +x /etc/squid/logfile-daemon_mysql.pl.
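Optionally, the helper can be tested by hand before wiring it into Squid. The sketch below is only an illustration: it assumes the squid_log database and access_log table created in the next section (with the default 'squid'/'squid' account, and the DBI and DBD::mysql Perl modules installed), and feeds the script one made-up log line on standard input. The leading "L" is the command byte of Squid's logfile daemon protocol; the connection parameters are passed in the same /host/database/table/username/password form Squid would pass.

# one-shot test: insert a single fake access_log row (sample values only)
echo "L$(date +%s).000 42 192.168.1.10 TCP_MISS 200 10240 GET http://example.com/ - DIRECT 93.184.216.34 text/html" \
    | /etc/squid/logfile-daemon_mysql.pl //squid_log/access_log/squid/squid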
Now we need to create the database structure that will hold the logs. Create a file called logfile-daemon_mysql-table.sql:
-- setup statements
--
-- CREATE DATABASE squid_log;
-- GRANT INSERT,SELECT ON squid_log.* TO 'squid'@'localhost' IDENTIFIED BY 'squid';
--
-- DDL statements
--
-- USE squid_log;
--
-- This table is based on squid's default 'squid' logformat, with minor modifications
-- (the two slashes are removed)
-- original:
-- logformat squid %ts.%03tu %6tr %>a %Ss/%03Hs %<st %rm %ru %un %Sh/%<A %mt
-- modified:
-- logformat squid %ts.%03tu %6tr %>a %Ss %03Hs %<st %rm %ru %un %Sh %<A %mt
-- changes: the '/' separators in %Ss/%03Hs and %Sh/%<A are replaced with spaces
--
CREATE TABLE access_log(
id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
time_since_epoch DECIMAL(15,3),
response_time INTEGER,
client_src_ip_addr CHAR(15),
squid_request_status VARCHAR(50),
http_status_code VARCHAR(10),
reply_size INTEGER,
request_method VARCHAR(20),
request_url VARCHAR(1000),
username VARCHAR(40),
squid_hier_status VARCHAR(20),
server_ip_addr CHAR(15),
mime_type VARCHAR(50)
);
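Before loading this file, the database and the account used by the helper must exist. A minimal sketch of that one-time setup, run as a MySQL administrative user and using the database name, account and password ('squid'/'squid') from the commented setup statements above:

# create the database and the account the logging helper will connect with
# (same names and password as in the commented setup statements above)
mysql -u root -p <<'EOF'
CREATE DATABASE squid_log;
GRANT INSERT,SELECT ON squid_log.* TO 'squid'@'localhost' IDENTIFIED BY 'squid';
EOF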
Now create the table in MySQL with:
mysql> source logfile-daemon_mysql-table.sql
Here is the file that provides the statistics views; save it as logfile-daemon_mysql-views.sql:
--
-- Sample views
--

-- ip address of hosts accessing the cache
CREATE OR REPLACE VIEW cache_clients AS
SELECT DISTINCT client_src_ip_addr FROM access_log ORDER BY 1;

-- traffic by client
CREATE OR REPLACE VIEW traffic_by_client AS
SELECT
client_src_ip_addr,
SUM(reply_size) AS total_bytes,
SUM(reply_size)/1024 AS total_kilobytes,
SUM(reply_size)/1048576 AS total_megabytes
FROM access_log
GROUP BY 1
ORDER BY 1;

-- most active clients
-- same as before, but sorted by traffic;
-- show only the 10 most active clients
CREATE OR REPLACE VIEW most_active_clients AS
SELECT
client_src_ip_addr,
SUM(reply_size) AS total_bytes,
SUM(reply_size)/1024 AS total_kilobytes,
SUM(reply_size)/1048576 AS total_megabytes
FROM access_log
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;
-- traffic per day
CREATE OR REPLACE VIEW traffic_per_day AS
SELECT
DATE(FROM_UNIXTIME(time_since_epoch)) AS date_day,
SUM(reply_size) AS total_bytes,
SUM(reply_size)/1024 AS total_kilobytes,
SUM(reply_size)/1048576 AS total_megabytes
FROM access_log
GROUP BY 1
ORDER BY 1;

-- traffic by client per day
CREATE OR REPLACE VIEW traffic_per_day_per_client AS
SELECT
DATE(FROM_UNIXTIME(time_since_epoch)) AS date_day,
client_src_ip_addr,
SUM(reply_size) AS total_bytes,
SUM(reply_size)/1024 AS total_kilobytes,
SUM(reply_size)/1048576 AS total_megabytes
FROM access_log
GROUP BY 1,2
ORDER BY 1,2 DESC;

-- traffic per month per client
CREATE OR REPLACE VIEW traffic_per_month_per_client AS
SELECT
YEAR(FROM_UNIXTIME(time_since_epoch)) AS date_year,
MONTH(FROM_UNIXTIME(time_since_epoch)) AS date_month,
client_src_ip_addr,
SUM(reply_size) AS total_bytes,
SUM(reply_size)/1024 AS total_kilobytes,
SUM(reply_size)/1048576 AS total_megabytes
FROM access_log
GROUP BY 1,2,3
ORDER BY 1,2,3;

-- list of clients with some stats
CREATE OR REPLACE VIEW cache_clients_with_infos AS
SELECT
a.client_src_ip_addr,
COUNT(*) AS total_requests,
(COUNT(*)/(SELECT COUNT(*) FROM access_log))*100 AS requests_perc,
SUM(a.reply_size) AS total_traffic,
(SUM(a.reply_size)/(SELECT SUM(reply_size) FROM access_log))*100 AS traffic_perc,
(SELECT COUNT(*) FROM access_log a1 WHERE a1.client_src_ip_addr=a.client_src_ip_addr AND squid_request_status LIKE '%HIT%')
/
(SELECT COUNT(*) FROM access_log)
* 100 AS hit_perc,
(SELECT COUNT(*) FROM access_log a1 WHERE a1.client_src_ip_addr=a.client_src_ip_addr AND squid_request_status LIKE '%MISS%')
/
(SELECT COUNT(*) FROM access_log)
* 100 AS miss_perc,
MIN(DATE(FROM_UNIXTIME(time_since_epoch))) AS first_access_date,
MIN(TIME(FROM_UNIXTIME(time_since_epoch))) AS first_access_time,
MAX(DATE(FROM_UNIXTIME(time_since_epoch))) AS last_access_date,
MAX(TIME(FROM_UNIXTIME(time_since_epoch))) AS last_access_time
FROM access_log a
GROUP BY 1
ORDER BY 1;
-- number of requests per day
CREATE OR REPLACE VIEW requests_per_day AS
SELECT
DATE(FROM_UNIXTIME(time_since_epoch)) AS date_day,
COUNT(*) AS num_of_requests
FROM access_log
GROUP BY 1
ORDER BY 1;

-- number of requests per minute
CREATE OR REPLACE VIEW requests_per_minute AS
SELECT
DATE(FROM_UNIXTIME(time_since_epoch)) AS date_day,
HOUR(FROM_UNIXTIME(time_since_epoch)) AS date_hour,
MINUTE(FROM_UNIXTIME(time_since_epoch)) AS date_minute,
COUNT(*) AS num_of_requests
FROM access_log
GROUP BY 1,2,3
ORDER BY 1,2,3;

-- number of requests per day of each cache client
CREATE OR REPLACE VIEW requests_per_day_per_client AS
SELECT
DATE(FROM_UNIXTIME(time_since_epoch)) AS date_day,
client_src_ip_addr,
COUNT(*) AS num_of_requests
FROM access_log
GROUP BY 1,2
ORDER BY 1,2;

-- percentage of each request status
CREATE OR REPLACE VIEW requests_status_perc AS
SELECT
squid_request_status,
(COUNT(*)/(SELECT COUNT(*) FROM access_log)*100) AS percentage
FROM access_log
GROUP BY squid_request_status
ORDER BY 2 DESC;

-- request hits and misses, in percentage
CREATE OR REPLACE VIEW hits_misses_perc AS
SELECT
'hits',
(SELECT COUNT(*) FROM access_log WHERE squid_request_status LIKE '%HIT%')
/
(SELECT COUNT(*) FROM access_log)*100
AS percentage
UNION
SELECT
'misses',
(SELECT COUNT(*) FROM access_log WHERE squid_request_status LIKE '%MISS%')
/
(SELECT COUNT(*) FROM access_log)*100
AS percentage;

-- response times
CREATE OR REPLACE VIEW response_time_ranges AS
SELECT
'0..500',
COUNT(*) / (SELECT COUNT(*) FROM access_log)*100 AS percentage
FROM access_log
WHERE response_time >= 0 AND response_time < 500
UNION
SELECT
'500..1000',
COUNT(*) / (SELECT COUNT(*) FROM access_log)*100 AS percentage
FROM access_log
WHERE response_time >= 500 AND response_time < 1000
UNION
SELECT
'1000..2000',
COUNT(*) / (SELECT COUNT(*) FROM access_log)*100 AS percentage
FROM access_log
WHERE response_time >= 1000 AND response_time < 2000
UNION
SELECT
'>= 2000',
COUNT(*) / (SELECT COUNT(*) FROM access_log)*100 AS percentage
FROM access_log
WHERE response_time >= 2000;

-- response time graph
CREATE OR REPLACE VIEW response_time_graph AS
SELECT
response_time,
COUNT(*) AS num_req
FROM access_log
GROUP BY 1
ORDER BY 1;

-- traffic by mime type
CREATE OR REPLACE VIEW traffic_by_mime_type AS
SELECT
mime_type,
SUM(reply_size) as total_bytes
FROM access_log
GROUP BY mime_type
ORDER BY 2 DESC;

-- last 10 queries
CREATE OR REPLACE VIEW last_10_queries AS
SELECT *
FROM access_log
WHERE
id > (SELECT MAX(id) FROM access_log) - 10
ORDER BY id DESC;

-- id of the last query of each client
-- this view is required by the "last n queries by ip" view
CREATE OR REPLACE VIEW last_query_by_client AS
SELECT
client_src_ip_addr,
MAX(id) AS last_query_id
FROM access_log
GROUP BY client_src_ip_addr;
-- last 10 queries of each client
-- NOTE: this query is conceptually wrong because it assumes that no holes exist
-- in the values of column 'id'.
-- This can be false if e.g. some access_log entries get deleted...
CREATE OR REPLACE VIEW last_10_queries_by_client AS
SELECT *
FROM access_log a
WHERE
id > (
SELECT l.last_query_id
FROM last_query_by_client l
WHERE l.client_src_ip_addr = a.client_src_ip_addr
) - 10
ORDER BY a.client_src_ip_addr, a.id DESC;
-- number of HIT requests per day
CREATE OR REPLACE VIEW hits_per_day AS
SELECT
DATE(FROM_UNIXTIME(time_since_epoch)) AS date_day,
COUNT(*) AS num_hits
FROM access_log
WHERE squid_request_status LIKE '%HIT%'
GROUP BY 1;

-- HIT requests per day, percent (100% = total number of requests that day)
CREATE OR REPLACE VIEW hits_per_day_perc AS
SELECT
r.date_day,
h.num_hits/r.num_of_requests*100 AS hits_per_day_perc
FROM requests_per_day r
JOIN
hits_per_day h
ON r.date_day = h.date_day;
-- request methods (count)
CREATE OR REPLACE VIEW request_methods AS
SELECT
request_method,
COUNT(*)
FROM access_log
GROUP BY 1
ORDER BY 1;

-- request methods by percent
CREATE OR REPLACE VIEW request_methods_perc AS
SELECT
request_method,
COUNT(*) / (SELECT COUNT(*) FROM access_log) * 100 AS perc
FROM access_log
GROUP BY 1
ORDER BY 2 DESC;
-- slowest queries
CREATE OR REPLACE VIEW slowest_requests AS
SELECT *
FROM access_log
ORDER BY response_time DESC
LIMIT 10;
CREATE OR REPLACE VIEW slowest_request_by_method AS
SELECT *
FROM access_log
GROUP BY request_method
ORDER BY request_method, response_time DESC;
-- requests with the biggest reply size
CREATE OR REPLACE VIEW biggest_requests AS
SELECT *
FROM access_log
ORDER BY reply_size DESC
LIMIT 10;

-- list each day which has at least one request, with some statistics
CREATE OR REPLACE VIEW days_with_infos AS
SELECT
DATE(FROM_UNIXTIME(time_since_epoch)) AS date_day,
MIN(TIME(FROM_UNIXTIME(time_since_epoch))) AS first_req_time,
MAX(TIME(FROM_UNIXTIME(time_since_epoch))) AS last_req_time,
COUNT(*) AS number_of_requests,
SUM(reply_size) AS total_traffic_bytes,
SUM(reply_size) / 1048576 AS total_traffic_megabytes,
COUNT(DISTINCT client_src_ip_addr) AS number_of_clients,
AVG(response_time) AS avg_response_time,
MAX(response_time) AS max_response_time,
-- client that has made the highest number of requests that day
(
SELECT client_src_ip_addr
FROM requests_per_day_per_client r
WHERE r.date_day = DATE(FROM_UNIXTIME(a.time_since_epoch))
ORDER BY r.num_of_requests DESC LIMIT 1
) AS most_active_client_r,
-- the number of requests that client actually made
(
SELECT r.num_of_requests
FROM requests_per_day_per_client r
WHERE r.date_day = DATE(FROM_UNIXTIME(a.time_since_epoch))
ORDER BY r.num_of_requests DESC LIMIT 1
) AS most_active_client_r_nr,
-- same info but as percentage on the total number of requests that day
-- we have to repeat the whole query because we cannot reference aliases
-- defined in previous columns
-- a date_day column with an index on it would help here; a view would probably help too...
(
(
SELECT r.num_of_requests
FROM requests_per_day_per_client r
WHERE r.date_day = DATE(FROM_UNIXTIME(a.time_since_epoch))
ORDER BY 1 DESC LIMIT 1
) / (
SELECT COUNT(*)
FROM access_log a1
WHERE DATE(FROM_UNIXTIME(a.time_since_epoch)) = DATE(FROM_UNIXTIME(a1.time_since_epoch))
) * 100
) AS most_active_client_r_pc,
-- client that has generated the highest traffic that day
(
SELECT t.client_src_ip_addr
FROM traffic_per_day_per_client t
WHERE t.date_day = DATE(FROM_UNIXTIME(a.time_since_epoch))
ORDER BY t.total_bytes DESC LIMIT 1
) AS most_active_client_t,
-- the actual traffic generated by that client
(
SELECT t.total_bytes
FROM traffic_per_day_per_client t
WHERE t.date_day = DATE(FROM_UNIXTIME(a.time_since_epoch))
ORDER BY t.total_bytes DESC LIMIT 1
) AS most_active_client_t_b,
-- same info expressed in megabytes
(
SELECT t.total_bytes
FROM traffic_per_day_per_client t
WHERE t.date_day = DATE(FROM_UNIXTIME(a.time_since_epoch))
ORDER BY t.total_bytes DESC LIMIT 1
) / 1048576 AS most_active_client_t_mb,
-- same info in percentage on the total traffic that day
-- see previous comments
(
(
SELECT t.total_bytes
FROM traffic_per_day_per_client t
WHERE t.date_day = DATE(FROM_UNIXTIME(a.time_since_epoch))
ORDER BY t.total_bytes DESC LIMIT 1
) / (
SELECT SUM(reply_size)
FROM access_log a1
WHERE DATE(FROM_UNIXTIME(a.time_since_epoch)) = DATE(FROM_UNIXTIME(a1.time_since_epoch))
) * 100
) AS most_active_client_t_pc
FROM access_log a
GROUP BY 1
ORDER BY 1;

-- requests in the last minute (relative to the most recent log entry)
CREATE OR REPLACE VIEW requests_in_last_minute AS
SELECT *
FROM access_log
WHERE time_since_epoch >= ( (SELECT MAX(time_since_epoch) FROM access_log) - 60 );
CREATE OR REPLACE VIEW avg_req_per_minute AS
SELECT COUNT(*) FROM requests_in_last_minute;
At this point, create the views in MySQL with:
mysql> source logfile-daemon_mysql-views.sql
Now it is time to configure Squid to use the script. Edit squid.conf and add the following lines (checking that they are not already present in your configuration file):
logformat squid_mysql %ts.%03tu %6tr %>a %Ss %03Hs %<st %rm %ru %un %Sh %<A %mt
logfile_daemon /etc/squid/logfile-daemon_mysql.pl
access_log daemon:/ squid_mysql
To activate the new setup, restart Squid (e.g. /etc/init.d/squid restart).
To keep the MySQL tables from growing indefinitely, some form of log rotation must be put in place.
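Once requests start flowing through the proxy, you can check that rows are being inserted and use the views for quick reports. A few examples from the shell, assuming the 'squid'@'localhost' account created earlier; the cache.log path depends on your installation:

# connection or SQL errors reported by the helper end up in Squid's cache log
tail -n 50 /var/log/squid/cache.log

# confirm that log lines are reaching the database
mysql -u squid -psquid squid_log -e "SELECT COUNT(*) FROM access_log;"

# a couple of the reporting views defined above
mysql -u squid -psquid squid_log -e "SELECT * FROM most_active_clients;"
mysql -u squid -psquid squid_log -e "SELECT * FROM hits_misses_perc;"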

One option is to take a compressed dump (mysqldump) of the month's data so it can be archived. It is also possible to automatically delete previous months from the database by uncommenting one line. Here is a sample script:

#!/bin/bash
# monthly_squid_log_dump.sh
# checks whether today is the last day of the month
#
# add the following line to the crontab:
# 59 23 * * * /etc/scripts/monthly_squid_log_dump.sh 1>/dev/null 2>&1

TODAY=`/bin/date +%d`
TOMORROW=`/bin/date +%d -d "1 day"`

# See if tomorrow's day is less than today's
if [ $TOMORROW -lt $TODAY ]; then
    #echo "It is the last day of the month"
    /usr/bin/mysqldump -paaabbb squid_log --where="from_unixtime(time_since_epoch) >= '`/bin/date +%Y-%m-`01 00:00:00' AND from_unixtime(time_since_epoch) <= '`/bin/date +%Y-%m-%d` 23:59:59'" > /tmp/`/bin/date +%Y%m`_squid_log.sql
    RETVAL=$?
    if [ $RETVAL -eq 0 ]; then
        /bin/gzip /tmp/`/bin/date +%Y%m`_squid_log.sql
        /bin/mv /tmp/`/bin/date +%Y%m`_squid_log.sql.gz /var/backups/squid/
        # The following line deletes previous months' data from the db. Use at your own risk
        #/usr/bin/mysql -uusername -ppassword squid_log -e "DELETE FROM access_log WHERE from_unixtime(time_since_epoch) <= '`/bin/date +%Y-%m-`01 00:00:00'"
    else
        /bin/rm /tmp/`/bin/date +%Y%m`_squid_log.sql -rf
    fi
    exit 0
fi
#echo "It is not the last day of the month"
#exit 1
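Should an archived month be needed again, the compressed dump can be loaded back into MySQL. A sketch, assuming the backup directory used by the script above; the 200801 file name is only an example of the YYYYMM_squid_log.sql.gz pattern it produces, and the squid_log_200801 database name is just an illustration:

# re-import an archived month into a separate database for analysis
# (the dump contains DROP/CREATE TABLE statements, so do not load it
# straight into the live squid_log database)
mysqladmin -u root -p create squid_log_200801
zcat /var/backups/squid/200801_squid_log.sql.gz | mysql -u root -p squid_log_200801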

Downloadable files

to be continued...