This Readme describes analog 1.9beta2. For the latest version of analog, see the analog home page.
This program analyses logfiles from WWW servers. It should work on any Unix system. It is designed to be fast and to produce attractive statistics. For more details, see the
For examples of the output see
This program is free, and may be freely distributed and modified provided full credit is given to Stephen Turner (sret1@cam.ac.uk), and that this condition is retained. (I should, however, be grateful if you would let me know what modifications you have made.) No warranty of any sort is given or implied for this program or its use. This is a beta test version, and some bugs can be expected.
RANDOM (saves time for long reports).
FROM
100 or more days ago.
+1, +c, +f, +F, +G and +H removed or given new meanings.
+d, +D, +h, +i, +m, +o, +r, +S and +W changed.
BACKGROUND and NUMLOOKUP removed.
FILEINCLUDE and FILEEXCLUDE must now be used in place of FILEONLY and FILEIGNORE; similarly for HOST options.
ALIAS matching changed.
DOMMINREQS and DOMMINBYTES instead of DOMFLOOR; similarly for the other reports.
REQUESTS and BYTES instead of BYREQUESTS and BYBYTES in the SORTBYs.
-G.
-v option now gives the version number.
http:// not translated to http:/
HOSTIGNORE, HOSTONLY, SUBDOMAIN and alphabetical sorting.
BASEURL command allowing statistics to be displayed on other servers.
TO.
/../, /./ and // translated.
FROM and TO commands more powerful.
DEBUG and BACKGROUND introduced.
SEPCHAR and REPORTORDER.
WITHARGS and WITHOUTARGS.
+-A and +-x. (Config.: ALL and GENERAL).
ISPAGE and ISNOTPAGE.
-v.
WEEKBEGINSON.
FROM and TO commands introduced.
-u option.
Next you must move the images that came with the analog program (in the directory images) into the IMAGEDIR specified in analhead.h.
When you have done that, compile the program by typing
make
(It may take a while as the program is rather big.) If that doesn't work, have a look in the Makefile to see if there's anything that you need to change to suit your configuration, and try again.
Then just type
analog
to run the program. To send the output to a particular file instead of to the screen, type, e.g.,
analog > outfile.html
(This assumes that . is in your $PATH, but it should be.)
Many options can be set in the file analhead.h. These can be changed before compiling the program. They are explained in that file, so they will not be documented again here.
Otherwise, analog takes its options from configuration files. Many of the configuration commands also have abbreviations as commandline arguments. So, for example, the configuration command
DAILY OFF
tells analog not to include a daily summary in the output. But this can also be specified by the command
analog -d
because the -d option is an abbreviation for DAILY OFF. In fact any configuration command can be specified on the commandline by means of the +C option; you could write
analog +C"DAILY OFF"
(This is most useful for running analog from a script or cron job.)
To specify a configuration file, you use the commandline argument +g followed by the name of the file. For example,
analog +gextra.conf
tells analog to read configuration commands from the file extra.conf. (Note that there is no space between +g and the filename; this is true of all commandline arguments.) You can also specify standard input as the configuration file by the option +g-.
The configuration file can contain several commands on separate lines; any text after a hash (#) on a line is ignored as a comment. So the following is an example of a configuration file.
DAILY OFF      # We don't want a daily summary
FULLDAILY ON   # We want a full daily report instead
An argument to a command can be placed in single or double quotes, and it must be if the argument contains a hash or a space.
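The quoting and comment rules above can be illustrated with a small C sketch that splits one configuration line into words. It is not analog's own parser, and the name `tokenize_config_line` is hypothetical. Two details are my own assumptions: a `#` immediately after a word also starts a comment, and an unterminated quote runs to the end of the line.

```c
#include <ctype.h>
#include <string.h>

/* Splits one configuration line in place into at most maxtok words.
   Single or double quotes group words containing spaces or '#';
   an unquoted '#' starts a comment. Returns the number of words. */
int tokenize_config_line(char *line, char **tok, int maxtok)
{
    int n = 0;
    char *p = line;

    while (*p != '\0') {
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0' || *p == '#' || n == maxtok)
            break;                      /* end of line, comment, or no room */
        if (*p == '"' || *p == '\'') {
            char quote = *p++;
            tok[n++] = p;               /* word starts after the opening quote */
            while (*p != '\0' && *p != quote)
                p++;
            if (*p == '\0')
                break;                  /* unterminated quote: keep rest of line */
            *p++ = '\0';                /* overwrite the closing quote */
        } else {
            tok[n++] = p;
            while (*p != '\0' && !isspace((unsigned char)*p) && *p != '#')
                p++;
            if (*p == '#') {
                *p = '\0';              /* comment directly after a word */
                break;
            }
            if (*p == '\0')
                break;
            *p++ = '\0';
        }
    }
    return n;
}
```

For example, `MARKCHAR '#'` yields the two words `MARKCHAR` and `#`. Without the quotes, the `#` would start a comment and be lost.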
Commandline arguments are read in the order in which they occur, and configuration files are read when the +g argument is reached. If commands conflict, later commands override earlier ones.
There are also two special configuration files which can be specified in analhead.h. The default configuration file is run before all other configuration files. You can put in there configuration commands that you normally want to include but which you can override. You can stop analog running the default configuration file by the commandline option -G.
The mandatory configuration file is run after all other configuration commands have been read, and overrides them all. If the mandatory configuration file cannot be found, the program exits immediately. This can be used by system administrators to stop users analysing certain files or producing certain reports, for example. (Note, however, that the only sure way to stop users analysing a logfile is to deny them read access to it. Otherwise there is nothing to stop them analysing it with another copy of analog or another program.)
If this is all a bit confusing, just run
analog -v [other options]
That will tell you what the values of all the variables will be, based on analhead.h, the configuration options and the commandline options.
We shall now look at all the configuration commands and their commandline equivalents under the following headings. There is a summary list of all of them in the reference section.
The general summary can be turned off by the command
GENERAL OFF
(or the commandline argument -x) or on by GENERAL ON (or +x). If the general summary is off, all the `Go To' links in the output are also omitted.
The figures in parentheses refer to the last 7 days. They can be turned on and off with
LASTSEVEN ON   # or OFF
or with the commandline arguments +7 and -7.
Counting hosts is something which can take a lot of memory (we have to remember the name of every host that has accessed our server). If memory is a problem, you can turn the host counting off with the commandline option -s or the configuration command
COUNTHOSTS OFF
Alternatively, you can do an approximate host count in a fixed (pre-specified) amount of memory. You do this by using +ss or
COUNTHOSTS APPROX
and you can specify the amount of memory to be used by
APPROXHOSTSIZE 100000   # or whatever number, in bytes
About 3 bytes per host seems to give a very good estimate. Even 1 byte per host will give a fair estimate. If statistics for the last 7 days are on, twice this amount of space will be used.
Each unit represents 4 000 requests, or part thereof.
   month:  #reqs:
--------  ------
Nov 1995: 119865:
Dec 1995: 121214:
Jan 1996: 144960:
The above display is of a monthly report. In this category, we also have the weekly report (one line for each week), daily summary (one line for Sundays, one for Mondays etc.), daily report (one line for each day ever), hourly summary (one line for midnight, one for 1am etc.) and hourly report (one line for each hour ever).
The following configuration commands show how to turn these reports on and off.
MONTHLY ON
WEEKLY ON
DAILY ON
FULLDAILY OFF
HOURLY ON
FULLHOURLY OFF
You can also use the corresponding commandline arguments +m, +W, +d, -D, +h, -H (use + to turn the corresponding reports on, - to turn them off).
You should use these reports sensitively. If your output is 200k long, people won't be able to download it. In particular, you probably don't want a daily report very often, and you certainly don't want an hourly report unless you have restricted the analysis to just a couple of days.
The graphs above are designed to produce coloured bars on graphical browsers and ASCII graphs on non-graphical browsers. They don't use tables or image-stretching properties, so should work on any browser. However, you can produce plain ASCII graphs instead by the command
GRAPHICAL OFF   # or ON to turn it back on again
This has the advantage of producing smaller output which does not require any images to be downloaded.
The graphs rely on having the images distributed with analog available in the directory IMAGEDIR specified in analhead.h; or you can override that choice with a command like
IMAGEDIR /Images/
You can change the character used in the graphs on non-graphical terminals by means of a command such as
MARKCHAR '#' # put in quotes so that it isn't a comment
The graphs can be plotted by bytes transferred instead of by requests. This can be done by means of commands like
MONTHGRAPH B   # by bytes
WEEKGRAPH R    # by requests
There are also commands DAYGRAPH, FULLDAYGRAPH, HOURGRAPH and FULLHOURGRAPH. Alternatively, you can add the letter after the relevant commandline argument; for example, +hB to turn on the hourly summary with a graph plotted by bytes.
You can display the graphs backwards (with most recent requests at the top) by means of commands like
MONTHLYBACK ON   # or OFF
There are also the commands WEEKLYBACK, FULLDAILYBACK and FULLHOURLYBACK. The hourly summary and daily summary cannot be displayed backwards. I find it confusing to have some of the reports going backwards and some forwards, so you can also use
ALLBACK ON   # or OFF
to change all four of the reports to backwards or forwards together.
You can specify which columns appear in the various reports, and in which order. The above example showed the number of requests being given. You can also have the percentage of the requests, the number of bytes, and the percentage of the bytes. For example, the command
MONTHCOLS RBbr
tells analog to include in the monthly report columns for number of requests (R), number of bytes (B), percentage of bytes (b), and percentage of requests (r) in that order. The other commands are WEEKCOLS, DAYCOLS, FULLDAYCOLS, HOURCOLS and FULLHOURCOLS.
For some reports, analog needs to know where weeks begin and end. You can specify
WEEKBEGINSON WEDNESDAY
to change it to Wednesday, for example. (I guess Sunday or Monday is more likely.)
In the graphs, analog will choose the value of the unit automatically based on the length of the largest bar and the width of the page. You can specify the page width with, for example,
PAGEWIDTH 70
or the commandline option +w70. (I find about 65 works well.)
Occasionally you may want to specify the value of the unit yourself; for the monthly report, for example,
MONTHLYUNIT 1000
Setting it to 0 makes analog choose it automatically again. Of course, the other reports have WEEKLYUNIT, DAILYUNIT, FULLDAILYUNIT, HOURLYUNIT and FULLHOURLYUNIT.
Domain report
   #reqs :   %bytes : domain
-------- -------- ------
  103125 :  46.58% : .uk (United Kingdom)
(  64982):( 35.45%):   cam.ac.uk (University of Cambridge)
(  47138):( 20.55%):   statslab.cam.ac.uk
   49290 :  12.49% : .edu (USA Educational)
Host report
#reqs: %bytes: host
-----  ------  ----
   10:  0.03%: zlsm03.arcs.ac.at
   11:  0.04%: iki10.boku.ac.at
  158:  0.15%: talus.maths.su.oz.au
Directory report
 #reqs:  %bytes: directory
------  ------  ---------
237985: 35.40%: /~sret1/
 18596: 17.60%: /~rrw1/
  3574: 11.89%: /~richard/
Request report
#reqs: %bytes: filename
-----  ------  --------
33980: 23.66%: /~sret1/backgammon/main.html
21162:  2.69%: /~sret1/backgammon/bitmaps/board.xbm
12690:  0.86%: /
Referer report
#reqs: refering URL
-----  ------------
  260: http://webcrawler.com/cgi-bin/WebQuery
  239: http://www.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/HTTP/Servers/Log_Analysis_Tools/
  185: http://guide-p.infoseek.com/WW/NS/Titles?qt=backgammon&col=WW
  149: http://www.yahoo.com/Recreation/Games/Board_Games/Backgammon/
Browser summary
#reqs: browser
-----  -------
16797: Netscape
 1532: Mosaic
  693: IWENG
  492: Lynx
Browser report
#reqs: browser
-----  -------
 3105: Mozilla/1.22 (Windows; I; 16bit)
 2785: Mozilla/1.1N (Windows; I; 16bit)
  458: IWENG/1.2.003
These reports can be turned on and off with commands like
DOMAIN ON
FULLHOSTS OFF
DIRECTORY ON
REQUEST ON
REFERER OFF
BROWSER ON
FULLBROWSER OFF
or with the commandline arguments +o, -S, +i, +r or +R (see below), -f, +b and -B. (As in the date reports, use + to turn the corresponding reports on, - to turn them off.)
Another similarity with the date reports is that you can tell analog which columns to print on each report with the commands DOMCOLS, HOSTCOLS, DIRCOLS, REQCOLS, REFCOLS, BROWCOLS and FULLBROWCOLS. Again, each command is followed by letters indicating which columns are wanted and in which order. For example,
DOMCOLS RrBb # no. of reqs, %age reqs, no. of bytes, %age bytes
Each of these reports can be sorted in four different ways: by bytes, by requests, alphabetically or randomly (i.e., unsorted). (The only advantage of the last is that it saves the time spent sorting very long reports.) The commands to change this look like
DOMSORTBY BYTES   # or REQUESTS or ALPHABETICAL or RANDOM
The commands for the other reports are HOSTSORTBY, DIRSORTBY, REQSORTBY, REFSORTBY, BROWSORTBY and FULLBROWSORTBY.
You can also add a letter b, r, a or x after the relevant commandline option; for example, +Sa for a host report sorted alphabetically.
It is important to be able to specify how many entries you want printed in each report. This is done by means of two variables for each report, one specifying the minimum number of bytes if the sorting is by bytes, and the other specifying the minimum number of requests if the sorting is by any of the other three methods. The following configuration commands illustrate the possible usages.
DOMMINREQS 20            # all items with at least 20 requests
HOSTMINREQS -20          # the first 20 items
                         # NB: useless if alphabetical or random sort
REQMINREQS 0.01%         # all items with at least 0.01% of the requests
DIRMINBYTES 100000       # all items with at least 100000 bytes
REFMINBYTES 100k         # all items with at least 100 kbytes
                         # (10M etc. also work)
BROWMINBYTES -40         # Top 40 if sorting is by bytes
FULLBROWMINBYTES 0.005%  # all with at least 0.005% of the traffic
You can also specify the amount on the commandline by adding it after the sort method. For example, +Sr-50 turns on a host report, sorted by requests, with only the top 50 items included, and +ib20k gives a directory report, sorted by bytes, including all directories with at least 20 kilobytes transferred.
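The three forms of MINREQS/MINBYTES value (an absolute floor, a negative "top N", and a percentage) can be captured in a small C sketch. The names `floor_spec`, `parse_floor` and `keep_item` are hypothetical. The sketch assumes that `k` and `M` mean 1024 and 1024 × 1024, which this Readme does not state.

```c
#include <stdlib.h>

/* The three kinds of MINREQS/MINBYTES value described above. */
typedef enum { FLOOR_AT_LEAST, FLOOR_TOP_N, FLOOR_PERCENT } floor_kind;

typedef struct {
    floor_kind kind;
    double value;   /* a count, a rank limit, or a percentage */
} floor_spec;

/* Parses "20", "-20", "0.01%", "100k" or "10M". Returns 1 on success.
   ASSUMPTION: k and M are taken as 1024 and 1024*1024. */
int parse_floor(const char *s, floor_spec *f)
{
    char *end;
    double v = strtod(s, &end);

    if (end == s)
        return 0;                          /* no number at all */
    if (*end == '%') {
        if (v < 0)
            return 0;
        f->kind = FLOOR_PERCENT;
        f->value = v;
        end++;
    } else if (v < 0) {
        f->kind = FLOOR_TOP_N;             /* "-20" means the first 20 items */
        f->value = -v;
    } else {
        f->kind = FLOOR_AT_LEAST;
        f->value = v;
        if (*end == 'k') {
            f->value *= 1024.0;
            end++;
        } else if (*end == 'M') {
            f->value *= 1024.0 * 1024.0;
            end++;
        }
    }
    return *end == '\0';                   /* reject trailing junk */
}

/* Should the item at position `rank` (0 = first after sorting), with
   `amount` requests or bytes out of `total`, be printed? */
int keep_item(const floor_spec *f, double amount, double total, long rank)
{
    switch (f->kind) {
    case FLOOR_TOP_N:
        return rank < f->value;
    case FLOOR_PERCENT:
        return total > 0 && 100.0 * amount / total >= f->value;
    default:
        return amount >= f->value;
    }
}
```

A "top N" floor only means something once the items are sorted, which is why the Readme notes that it is useless with alphabetical or random sorting.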
We now describe features unique to a particular one of the reports. First the domain report.
Subdomains can be specified for each domain. The syntax of the command is
SUBDOMAIN subdomain subdomain_name
If the subdomain name has spaces in it, it must be enclosed in quotes. The subdomain name can be omitted, indicating a nameless subdomain. For example, to produce the example above, I would include the following lines in the configuration file
SUBDOMAIN cam.ac.uk 'University of Cambridge'
SUBDOMAIN statslab.cam.ac.uk
Numerical subdomains (which have the most significant part on the left) can also occur. They will look like
131 The Ever-Popular 131 domain
131.111   # Nameless
Subdomains containing wildcards can also occur. The following are examples:
SUBDOMAIN *.edu       # mit.edu, umn.edu etc.
SUBDOMAIN 131.111.*   # 131.111.1, 131.111.2 etc.
SUBDOMAIN %           # all top-level numerical domains, from 1 to 255
The variables SUBDOMMINREQS and SUBDOMMINBYTES can be specified in the same way as above, except that they can't be negative. If you ask for wild subdomains, you will probably want to set the minimum requests and minimum bytes quite high. However, you cannot alter the sort order; within a domain, subdomains will always be output in alphabetical order.
There is a command NOTSUBDOMAIN to erase a previously requested subdomain. For example, you can write
NOTSUBDOMAIN *.edu
NOTSUBDOMAIN cam.ac.uk
However, if you request, for example, *.edu, then NOTSUBDOMAIN mit.edu will not override it.
The domain report relies on having a domains file available, listing which geographical locations correspond to which domains. Which file is to be used as the domains file can be specified by the command
DOMAINSFILE domainsfile
The correct format of the domains file is explained in a separate section.
There is little to say about the host report, except to note that alphabetical sorting is by domain as most significant part. This report can be very long and slow to sort, and should be used with a high floor if at all.
The directory report has one further variable, which is the level (or depth) of the directory report. The example above is a level 1 report; a level 3 report might look like
 #reqs:  %bytes: directory
------  ------  ---------
 43772: 72.06%: /~sret1/backgammon/
173426: 19.93%: /~sret1/backgammon/bitmaps/
 11298:  4.14%: /~sret1/
This can be specified by the commandline option +l3 or the configuration command
DIRLEVEL 3
Note that the figures for each directory do not include those for the subdirectories of that directory, except where the directory is at the deepest level. So in the above example, /~sret1/backgammon/bitmaps/dice/d1.xbm would be reckoned in the directory /~sret1/backgammon/bitmaps/ (which is at the deepest level) but not in the other two directories.
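The level rule can be sketched in C as a function that reduces a request to its directory, cut off at a given depth. This is a hypothetical helper (`dir_at_level`), not analog's code. It assumes filenames always start with `/`.

```c
#include <stdio.h>
#include <string.h>

/* Writes to `out` the directory of `file`, cut off after `level`
   components below the root, and returns its length. */
size_t dir_at_level(const char *file, int level, char *out, size_t outsize)
{
    const char *last = strrchr(file, '/');
    size_t end, i;
    int slashes = 0;

    if (outsize == 0)
        return 0;
    if (last == NULL) {                     /* no directory part at all */
        out[0] = '\0';
        return 0;
    }
    end = (size_t)(last - file) + 1;        /* keep up to the final '/' */
    for (i = 0; i < end; i++) {
        /* the (level+1)th slash closes the level-th directory */
        if (file[i] == '/' && ++slashes == level + 1) {
            end = i + 1;
            break;
        }
    }
    snprintf(out, outsize, "%.*s", (int)end, file);
    return end;
}
```

At level 3, `/~sret1/backgammon/bitmaps/dice/d1.xbm` is credited to `/~sret1/backgammon/bitmaps/`, as in the example above. A file directly in `/~sret1/` stays in `/~sret1/`.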
We mentioned above that the request report has two commandline arguments, +r and +R. The difference is that if the commandline option +r is used, only pages will be displayed in the report. If you want to list all files, including, for example, graphics, then you should use +R instead. Alternatively the configuration command
REQTYPE PAGES   # or ALL
will control whether pages or all files are listed.
There are three possible modes of linking in the request report; you can link to none of the files, or pages only, or all files. The commandline options for these are -k, +k and +kk respectively; or you can use the configuration command
PAGELINKS OFF   # or ON, or ALL
There is also a related command BASEURL to specify a URL to prepend to the links. For example, if
BASEURL http://www.statslab.cam.ac.uk
were specified, then /~sret1/analog/ would be linked to http://www.statslab.cam.ac.uk/~sret1/analog/. This is useful if you want to display the statistics on a different server from the one they belong to. (See below for combining logfiles from two different servers.)
You can also specify in the configuration file what should be counted as a `page' in the requests report. At the beginning, the following are `pages': *.html, *.htm, *.shtml, *.shtm, *.html3, *.ht3 and directories (*/). The command
ISPAGE filename
will specify that some other file is a `page'. You can give a list of filenames, separated by commas (without spaces). For example,
ISPAGE *.ps,*.ps.gz
would mean that PostScript files and gzipped PostScript files are to be regarded as pages. You can also use
ISNOTPAGE filename
to specify that something which would otherwise be a page is not to be regarded as a page.
The referer report, browser summary and browser report have no special commands, although the relevant logfiles must be present on the system (see below for how to specify where they are). It is important to note, however, that they are notoriously inaccurate. For the referer report, many browsers do not pass this information to the server and many pass it wrongly (sending the URL of the previous page even when your page was not reached by selecting a link from that page). For the browser reports, some browsers even lie deliberately about what sort of browser they are, or let users configure the browser name. Furthermore, there is no fixed format for browser information. (NB: I have combined all Mosaics as a special case). In addition, graphical browsers automatically generate more requests than non-graphical browsers by loading the graphics, so it is not a very good guide to browser usage. Interpret them with extreme caution.
#occs: error type
-----  ----------
19360: Send timed out
11286: Send aborted
 7962: File does not exist
The status code report lists how many of each type of status code occurred in your logfile:
#occs: no. description
-----  ---------------
35564: 200 OK
  173: 301 Document moved
    3: 302 Document found elsewhere
 5732: 304 Not modified since last retrieval
They are turned on and off by commands like
STATUS ON
ERROR OFF
or by the commandline arguments +c and +e. There is a command ERRMINOCCS which says how many occurrences of an error there must be before it appears on the error report. For example
ERRMINOCCS 20
The first thing to know is how to specify a different logfile to analyse. A default one should have been specified in analhead.h, but you can also specify one by just putting its name on the commandline; so, for example, the command
analog logfile.log
will use that logfile for its report. Analog will read the common log format (which most servers write) as well as the old NCSA format and the NCSA combined log format (which includes referer and agent information). Detection of which format each line of the logfile is in is automatic. You can also write
analog -
to use standard input as the logfile. (This is useful in constructing pipes.) You can also specify which logfile to use in the configuration file by means of a command like
LOGFILE logfile.log   # or stdin for standard input
You can specify several logfiles on one configuration line by separating their names with commas (no spaces). For example
LOGFILE log1,log2,log3
Sometimes it is necessary to combine logfiles from two different servers, without getting filenames that happen to be the same on both servers confused. To do this you can use a second argument to the LOGFILE command, specifying a prefix for each filename. For example
LOGFILE log1,log2 http://www.a.com   # These logfiles from a.com
LOGFILE log3 http://www.b.com        # This one from b.com
If you use this, the directory report will need specifying to a deeper level.
Logfiles specified in the user's configuration files and commandline options replace any specified in the default configuration file, and are in turn overridden by any in the mandatory configuration file. In addition you can use none as the name of the logfile to overwrite the specification of all previous logfiles.
Analog can also read the NCSA/Apache referer log, agent log and error log formats. Logfiles of these types can be specified by commands like
REFLOG referer_log
BROWLOG agent_log.old,agent_log
ERRLOG error_log
The same comments about which logfiles replace which apply as in the last paragraph.
Analog can uncompress compressed logfiles. You need to tell it how to uncompress each type of file by supplying a command that sends the uncompressed file to standard output (rather than uncompressing it into a file). The first argument can be a list of file types, separated by commas. For example, depending on what commands are on your system, you can use
UNCOMPRESS *.gz "gunzip -c"   # or
UNCOMPRESS *.gz,*.Z gzcat
This would be a suitable command to include in the default configuration file.
There are various commands which instruct the program to analyse only part of the logfile.
First, you can instruct the program only to take into account certain files. This is done by means of the FILEINCLUDE and FILEEXCLUDE commands. Each command can have a list of filenames, separated by commas (no spaces). One asterisk and any number of question marks can appear in each of the filenames specified, as wildcards. Each file is included or excluded as each new command is reached. Unspecified files are included if the first command found was an exclusion, and excluded if the first command found was an inclusion. For example, the configuration
FILEINCLUDE /~sret1/*
FILEEXCLUDE /~sret1/backgammon/*,/~sret1/analog/*
FILEINCLUDE /~sret1/backgammon/*.gif
would instruct the program to examine only my files, excluding my backgammon and analog files, but including gifs in my backgammon directory. On the other hand,
FILEEXCLUDE /~sret1/*
would analyse all files except mine. Remember you can always run analog -v to see what the options you have specified represent.
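The include/exclude logic above comes down to "the last matching rule wins, with the default set opposite to the first rule". A minimal C sketch follows. The names `wildmatch`, `file_rule` and `is_included` are hypothetical. The matcher accepts any number of asterisks, which is more than analog allows, and `*` is assumed to match across `/`. Comma-separated lists are assumed to have been split into separate rules already. The same logic would apply to the HOST and REF include/exclude commands.

```c
#include <stddef.h>

/* Shell-style match: '*' matches any run of characters (including '/'),
   '?' matches exactly one character. */
static int wildmatch(const char *pat, const char *s)
{
    if (*pat == '\0')
        return *s == '\0';
    if (*pat == '*') {
        for (;; s++) {
            if (wildmatch(pat + 1, s))
                return 1;
            if (*s == '\0')
                return 0;
        }
    }
    if (*s != '\0' && (*pat == '?' || *pat == *s))
        return wildmatch(pat + 1, s + 1);
    return 0;
}

typedef enum { RULE_INCLUDE, RULE_EXCLUDE } rule_type;

typedef struct {
    rule_type type;
    const char *pattern;   /* one pattern; comma lists assumed already split */
} file_rule;

/* Applies the rules in order: the default is set by the first rule,
   then each matching rule includes or excludes the file. */
int is_included(const file_rule *rules, size_t n, const char *file)
{
    int included;
    size_t i;

    if (n == 0)
        return 1;                                  /* no rules: keep everything */
    included = (rules[0].type == RULE_EXCLUDE);    /* opposite of the first rule */
    for (i = 0; i < n; i++)
        if (wildmatch(rules[i].pattern, file))
            included = (rules[i].type == RULE_INCLUDE);
    return included;
}
```

With the three-line example configuration, `/~sret1/backgammon/d1.gif` is included, excluded and then included again, so it ends up counted. `/~rrw1/index.html` matches nothing and falls back to the default, which is exclusion.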
You can exclude all gifs with FILEEXCLUDE *.gif but this may not be what you want to do. This will exclude them from all the reports, and the bytes transferred due to them will not be counted. More likely, you just want to exclude them from the request report while still including them in the other reports, which you can do by means of REQTYPE PAGES.
There are similar commands HOSTINCLUDE and HOSTEXCLUDE to analyse only the requests from certain sites. For example,
HOSTEXCLUDE emu.pmms.cam.ac.uk
HOSTEXCLUDE *.statslab.cam.ac.uk
would ignore accesses from emu and from the whole of the statslab.
There are also commands REFINCLUDE and REFEXCLUDE for referers. You probably want to ignore referers from your own site. For example, I use
REFEXCLUDE http://www.statslab.cam.ac.uk/*
This would be a suitable command to put in your default configuration file.
Finally, there are commands to analyse only a subset of the dates in the logfile. The simplest usage is FROM yymmdd and TO yymmdd. So, for example, to analyse only requests in July 1995 I would use the configuration
FROM 950701
TO 950731
Also each of the pairs of digits can be preceded by - and the month and date can be preceded by + to represent time relative to the current date. This allows constructions like
FROM -01-00+01     # from tomorrow last year
TO -00-0131        # to the end of last month (OK even if last month
                   # didn't have 31 days)
FROM -00-00-112
TO -00-00-01       # statistics for the last 16 weeks
There are commandline abbreviations +F and +T for these commands; for example +T-00-00-01 looks at statistics until the end of yesterday. -F and -T turn off the from and to, as do FROM OFF and TO OFF.
If a TO command is given, the figures for the last 7 days refer to the time until then.
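Relative dates like these can be worked out with the standard C library: add the offsets to today's date and let `mktime()` normalise the result. The sketch below uses a hypothetical `offset_date_yymmdd`. It handles only offsets, as in `FROM -00-00-112` or `FROM -01-00+01`. It does not reproduce the clamping that makes `TO -00-0131` mean the last day of last month, because `mktime()` would roll 31 February over into March.

```c
#include <time.h>

/* Shifts a base date (e.g. today) by whole years, months and days and
   returns the result as yymmdd, or -1 if mktime() fails. mktime()
   normalises out-of-range fields, so a day offset of -112 simply rolls
   back through the previous months. */
long offset_date_yymmdd(int year, int month, int day, int dy, int dm, int dd)
{
    struct tm t = {0};

    t.tm_year = year - 1900 + dy;
    t.tm_mon = month - 1 + dm;
    t.tm_mday = day + dd;
    t.tm_hour = 12;        /* midday keeps daylight-saving changes harmless */
    t.tm_isdst = -1;       /* let mktime() work out DST */
    if (mktime(&t) == (time_t)-1)
        return -1;
    return (long)(t.tm_year % 100) * 10000L + (t.tm_mon + 1) * 100L + t.tm_mday;
}
```

Taking 10 January 1996 as "today", an offset of -112 days gives 950920 (20 September 1995). One year back plus one day gives 950111.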
FILEALIAS file1 file2
says that whenever file1 occurs in the logfile, it is to be replaced by file2. Analog already understands that /dir/index.html is the same as /dir/ and translates `escaped' entities (e.g., %7E is the same as ~), so these don't need to be specified separately. It also understands that .. means `parent directory,' . means `this directory' and // is the same as /, and translates those filenames to their canonical forms.
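The canonicalisation just described can be sketched in C as follows. This is an illustration only. `canonicalise` is a hypothetical name. The sketch assumes the filename starts with `/` and fits in 1 KB, and it treats `..` at the root as staying at the root. It makes no claim to match analog's handling of unusual cases.

```c
#include <ctype.h>
#include <string.h>

/* Value of a hex digit, or -1 if c is not one. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Writes the canonical form of filename `in` to `out`.
   Returns 0 if a buffer is too small. */
int canonicalise(const char *in, char *out, size_t outsize)
{
    static const char idx[] = "/index.html";
    const size_t idxlen = sizeof idx - 1;
    char dec[1024];
    const char *p;
    size_t n = 0, len = 1, i;
    int h1, h2, last_was_file = 0;

    /* 1. Decode escaped entities such as %7E -> '~'. */
    for (i = 0; in[i] != '\0'; i++) {
        if (n + 1 >= sizeof dec)
            return 0;
        if (in[i] == '%' && (h1 = hexval((unsigned char)in[i + 1])) >= 0
            && (h2 = hexval((unsigned char)in[i + 2])) >= 0) {
            dec[n++] = (char)(h1 * 16 + h2);
            i += 2;
        } else {
            dec[n++] = in[i];
        }
    }
    dec[n] = '\0';

    /* 2. Rebuild the path segment by segment as "/seg/seg/...". */
    if (outsize < 2)
        return 0;
    out[0] = '/';
    for (p = dec; *p != '\0';) {
        const char *seg;
        size_t seglen;

        while (*p == '/')
            p++;                             /* "//" is the same as "/" */
        if (*p == '\0')
            break;
        seg = p;
        while (*p != '\0' && *p != '/')
            p++;
        seglen = (size_t)(p - seg);
        last_was_file = 0;
        if (seglen == 1 && seg[0] == '.')
            continue;                        /* "." is this directory */
        if (seglen == 2 && seg[0] == '.' && seg[1] == '.') {
            if (len > 1) {                   /* ".." drops the last "seg/" */
                len--;
                while (out[len - 1] != '/')
                    len--;
            }
            continue;
        }
        if (len + seglen + 1 >= outsize)
            return 0;
        memcpy(out + len, seg, seglen);
        len += seglen;
        out[len++] = '/';
        last_was_file = (*p == '\0');        /* no '/' follows: a file name */
    }
    if (last_was_file)
        len--;                               /* drop the '/' after a file name */
    out[len] = '\0';

    /* 3. "/dir/index.html" is the same as "/dir/". */
    if (len >= idxlen && strcmp(out + len - idxlen, idx) == 0)
        out[len - idxlen + 1] = '\0';
    return 1;
}
```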
Wildcards can occur in the aliases. For example, after
FILEALIAS /~sret1/*.gif /images/*g.gif
FILEALIAS /~sret2/a?c* /sa/*
/~sret1/a.gif would be translated to /images/ag.gif and /~sret2/abcd.txt would become /sa/d.txt.
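One way to implement the single-asterisk substitution takes three steps. Split the pattern at its `*`. Check the fixed text on either side, where `?` matches any one character. Then paste whatever the `*` covered into the replacement. The C sketch below does that. `alias_apply` is a hypothetical name. As in the second example, characters matched by `?` are not carried over.

```c
#include <stdio.h>
#include <string.h>

/* Does s match pat over n characters, where '?' in pat matches any
   single character? */
static int match_fixed(const char *pat, const char *s, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (pat[i] != '?' && pat[i] != s[i])
            return 0;
    return 1;
}

/* If `file` matches `from` (at most one '*', any number of '?'), write
   `to` into out with its '*' replaced by the text the '*' matched, and
   return 1. Return 0 if there is no match or out is too small. */
int alias_apply(const char *from, const char *to, const char *file,
                char *out, size_t outsize)
{
    const char *star = strchr(from, '*');
    const char *tstar = strchr(to, '*');
    const char *mid = file;
    size_t flen = strlen(file), midlen = 0;
    int written;

    if (star == NULL) {
        if (strlen(from) != flen || !match_fixed(from, file, flen))
            return 0;
    } else {
        size_t pre = (size_t)(star - from);   /* pattern text before '*' */
        size_t suf = strlen(star + 1);        /* pattern text after '*' */
        if (flen < pre + suf
            || !match_fixed(from, file, pre)
            || !match_fixed(star + 1, file + flen - suf, suf))
            return 0;
        mid = file + pre;                     /* the part the '*' matched */
        midlen = flen - pre - suf;
    }
    if (tstar == NULL)
        written = snprintf(out, outsize, "%s", to);
    else
        written = snprintf(out, outsize, "%.*s%.*s%s",
                           (int)(tstar - to), to, (int)midlen, mid, tstar + 1);
    return written >= 0 && (size_t)written < outsize;
}
```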
There are also the commands HOSTALIAS and REFALIAS (for referers) which work in the same way. HOSTALIAS is particularly useful if your server records local hostnames in the logfile instead of full internet names. Also, if a host has two names, they can be combined in this way. So, for example, I might find it convenient to use
HOSTALIAS lion lion.statslab.cam.ac.uk
HOSTALIAS www lion.statslab.cam.ac.uk
HOSTALIAS www.statslab.cam.ac.uk lion.statslab.cam.ac.uk
A pair of related commands is WITHARGS and WITHOUTARGS. Normally any arguments given as part of a URL (after a question mark) are ignored. However, if a configuration command like
WITHARGS /cgi-bin/prog.cgi
is given, then the arguments to that file will form part of the filename. So /cgi-bin/prog.cgi?a and /cgi-bin/prog.cgi?b will be regarded as separate files, whereas without that command they would both have been translated to /cgi-bin/prog.cgi. Note that the filename with the arguments still has to fit inside the maximum length of a filename.
Asterisks and lists of files can again occur, and there is also a parallel command WITHOUTARGS; for example,
WITHARGS /cgi-bin/*
WITHOUTARGS /cgi-bin/spam.cgi
would read the arguments for all files in /cgi-bin/ except spam.cgi.
Commands REFWITHARGS and REFWITHOUTARGS work in the same way for referers, except that in this case the default is to include all the arguments (so that you can see what people are requesting from search engines).
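The WITHARGS/WITHOUTARGS rule can be sketched in C. The last matching command decides, and the default differs for files (strip the arguments) and referers (keep them). The names `args_rule` and `apply_args_rules` are hypothetical. The sketch assumes the pattern is matched against the URL without its arguments, that `*` matches any run of characters, and that the path fits in 1 KB.

```c
#include <stdio.h>
#include <string.h>

/* Shell-style match: '*' matches any run of characters, '?' one character. */
static int wildmatch(const char *pat, const char *s)
{
    if (*pat == '\0')
        return *s == '\0';
    if (*pat == '*') {
        for (;; s++) {
            if (wildmatch(pat + 1, s))
                return 1;
            if (*s == '\0')
                return 0;
        }
    }
    if (*s != '\0' && (*pat == '?' || *pat == *s))
        return wildmatch(pat + 1, s + 1);
    return 0;
}

typedef struct {
    int with_args;          /* 1 for WITHARGS, 0 for WITHOUTARGS */
    const char *pattern;
} args_rule;

/* Copies url to out, keeping its "?..." part only if the last rule
   whose pattern matches the path says so (else default_with_args). */
void apply_args_rules(const args_rule *rules, size_t n, int default_with_args,
                      const char *url, char *out, size_t outsize)
{
    char path[1024];
    size_t pathlen = strcspn(url, "?");
    int keep = default_with_args;
    size_t i;

    snprintf(path, sizeof path, "%.*s", (int)pathlen, url);
    for (i = 0; i < n; i++)
        if (wildmatch(rules[i].pattern, path))
            keep = rules[i].with_args;
    snprintf(out, outsize, "%.*s", keep ? (int)strlen(url) : (int)pathlen, url);
}
```

For filenames you would pass a default of 0, and for referers a default of 1, matching the behaviour described above.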
The ability to look up numerical IP addresses and translate them to hostnames has been removed in this version of analog because it didn't work well and caused problems on some systems. I recommend instead pre-processing the logfile with the program logresolve.c (which is distributed with the Apache server).
To produce a cache file instead of the normal output, use the command
OUTPUT CACHE
To read data from a cache file, use, e.g.,
CACHEFILE cache.out
(This will still read the ordinary logfile as well.) You can also use the commandline argument +Ucache.out.
To use this feature and avoid losing entries or double counting them, I suggest you follow the following procedure.
Although it should now be safe to throw away the old logfile, I can take no responsibility if something goes wrong. This is beta test software and is expected to contain bugs. Also if you are going to use this feature please make sure you understand what information is and is not recorded in the cache file. You may find that the cache file is not the right feature for you.
Compressing logfiles (with gzip -9) is very efficient owing to the large number of repeated strings. That in itself may solve your filespace problems.
+a or -a, or the configuration command
OUTPUT ASCII   # or HTML
If you choose ASCII output, some of the other options are ignored, but it should be obvious which ones they will be.
You can select the file for the output to be sent to in the configuration file or on the commandline. So instead of
analog > outfile.html
you can use the configuration command
OUTFILE outfile.html
or the commandline option +Ooutfile.html.
There is a configuration command REPORTORDER which specifies which order the reports should occur in. The usage is a line like
REPORTORDER hHDdWmoSirfbBec
This says that the reports should occur in the order hourly summary (h), hourly report (H), daily report (D), daily summary (d), weekly report (W), monthly report (m), domain report (o), host report (S), directory report (i), request report (r), referer report (f), browser summary (b), browser report (B), error report (e) and status code report (c). It is important to include all the above fifteen letters exactly once each.
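A configuration checker could verify the "all fifteen letters exactly once" rule as follows. This is a small hypothetical C helper, not part of analog.

```c
#include <string.h>

/* The fifteen report letters accepted by REPORTORDER. */
static const char REPORT_LETTERS[] = "hHDdWmoSirfbBec";

/* Returns 1 if `order` uses every report letter exactly once. */
int valid_reportorder(const char *order)
{
    int seen[15] = {0};
    size_t n = 0;
    const char *p;

    for (p = order; *p != '\0'; p++, n++) {
        const char *hit = strchr(REPORT_LETTERS, *p);
        if (hit == NULL)
            return 0;                            /* unknown letter */
        if (seen[hit - REPORT_LETTERS]++)
            return 0;                            /* repeated letter */
    }
    return n == 15;                              /* nothing missing */
}
```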
There is a command
ALL ON
to include all reports except the hourly report (particular ones can then be omitted with -d or whatever); likewise ALL OFF omits them (and particular ones can then be included). The equivalent commandline arguments are +A and -A. The hourly report and general report are not turned on by ALL ON or +A; they must be turned on separately with +H and +x. Note also that order is important; for example, +i -A +r will include the request report but not the directory report.
The title line of the output page contains three adjustable variables. First, the logo in the top left hand corner can be turned on or off, or any other logo substituted (for example, your organisation's logo). This is accomplished by the command
LOGOURL url   # or none
or by the commandline arguments -p (no logo: mnemonic, p for picture) and +pURL.
The organisation name on the title line can be specified by means of the option -nname; the hostname of your server would also be an appropriate thing to put here. The name can have a link to your server's home page by use of the option -uURL; use -u- if you don't want any link. The equivalent configuration options are
HOSTNAME name   # must be in quotes if it contains spaces
HOSTURL URL
HOSTURL -       # for no link
A header file and footer file can be inserted near the top and bottom of your output. These should be written in HTML or ASCII according to whether your output is HTML or ASCII, and can contain anything you want. Possible uses include providing information about your organisation or about the way the statistics were calculated, linking to related pages, and no doubt many other things. The commands to achieve this are
HEADERFILE filename
FOOTERFILE none   # if you don't want one
There is a command SEPCHAR to say which character should separate each group of three digits in long numbers. For example,
SEPCHAR ,
will give 123,456,789, whereas
SEPCHAR ' '
will give 123 456 789.
You can specify whether analog prints long numbers of bytes as exact numbers (e.g., 5,053,234) or as kilobytes, megabytes etc. (e.g., 4934k) by the command
RAWBYTES ON # for exact, OFF for abbreviated
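The two number formats can be sketched in C. `format_with_sep` follows the SEPCHAR description directly. `abbreviate_bytes` is a hypothetical guess at the abbreviated form: it divides by 1024 until fewer than five digits remain, which reproduces the 4934k example (5 053 234 / 1024 = 4934.8, truncated). Analog's exact rule may differ.

```c
#include <stdio.h>
#include <string.h>

/* SEPCHAR-style formatting: put `sep` between each group of three digits. */
char *format_with_sep(unsigned long n, char sep, char *buf, size_t size)
{
    char digits[32];
    int len = snprintf(digits, sizeof digits, "%lu", n);
    size_t out = 0;
    int i;

    if (size == 0)
        return buf;
    for (i = 0; i < len; i++) {
        if (i > 0 && (len - i) % 3 == 0) {   /* a multiple of 3 digits remain */
            if (out + 1 >= size)
                break;
            buf[out++] = sep;
        }
        if (out + 1 >= size)
            break;
        buf[out++] = digits[i];
    }
    buf[out] = '\0';
    return buf;
}

/* RAWBYTES OFF-style abbreviation. ASSUMPTION: divide by 1024 until fewer
   than five digits remain; analog's exact rule may differ. */
char *abbreviate_bytes(unsigned long bytes, char *buf, size_t size)
{
    static const char suffix[] = " kMGT";
    int i = 0;

    while (bytes >= 10000 && i < 4) {
        bytes /= 1024;
        i++;
    }
    if (i == 0)
        snprintf(buf, size, "%lu", bytes);
    else
        snprintf(buf, size, "%lu%c", bytes, suffix[i]);
    return buf;
}
```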
There is a debugging command, for printing (to stderr) problems with your logfile. There are currently three levels of debugging: 0 for no debugging, 1 for printing corrupt logfile lines (prepended by "C:"), and 2 which also prints hosts for which the domain is unknown (prepended by "U:") and errors which cannot be classified (prepended by "E:"). The command for level n debugging is
DEBUG n
and the equivalent commandline argument is +Vn (V for verbose). You can also use commandline options +V for level 1 and -V for level 0.
Finally, there is an option to turn off warnings. It is
WARNINGS OFF # or ON
The equivalent commandline argument is -q to turn warnings off (q for quiet) and +q to turn them on again.
ad Andorra
ae United Arab Emirates
[...]
There can be arbitrary space between the code and the corresponding location. The codes are converted to lower case. Use ? (or anything starting with ?) for the name if you want the domain to be recognised, but don't want the name to be printed out.
The domains do not need to be in alphabetical order, though humans
may prefer it that way.
Comments can occur in the domains file. They are introduced by the character #.
So you could write, for example,
uk United Kingdom # God save the Queen!
To set up the form interface, go to the directory where the analog source code lives, and follow these steps.
FORMPROG is set to be the URL of the form processing program, which will be wherever cgi-bin programs live on your server; normally in the cgi-bin directory.
make form.
FORMPROG. Make sure it is executable by the server.
If the third step above fails to generate a form, you can generate one yourself by means of the command analog -form +Oanalogform.html.
You might also want to run this command yourself if you want to supply
different default options from normal for the form user: if you run the
command with extra commandline or configuration file options, they will be
respected in the construction of the form.
It is expected that system administrators may want to provide different options on the forms from the default ones. For this reason, the cgi program understands various other options that are not normally on the form. These can be added to the form by hand. For example, you may want to allow a choice of logfiles, perhaps via a <select>. Or you may want form users to use certain default options; these could be specified as <input type=hidden>. Because the form uses GET, not POST, you can also construct links to it. For experts, here is a complete list of form options.
bq   browser summary? 0 for off, 1 for on, 2 for default.
ba   +ve MINBROWREQS
bb   -ve MINBROWREQS
bc   +ve MINBROWBYTES
bd   -ve MINBROWBYTES
bs   BROWSORTBY (0 = REQUESTS, 1 = BYTES, 2 = ALPHABETICAL, 3 = RANDOM)
     Other reports similarly with initial B, f, i, o, r, S in place of b.
cq   status code report?
dq   daily summary?
dg   DAYGRAPH (R or B)
     Other time reports similarly with D, h, H, m, W in place of d.
eq   error report?
fi   FILEIGNORE; list, separated by commas
fr   FROM
fy   FILEONLY; list, separated by commas
hi   HOSTIGNORE; list, separated by commas
ho   HOSTURL
hy   HOSTONLY; list, separated by commas
ie   DIRLEVEL
lb   BROWLOG; list, separated by commas
lc   CACHEFILE; list, separated by commas
le   ERRLOG; list, separated by commas
lf   REFLOG; list, separated by commas
lo   LOGFILE; list, separated by commas
or   HOSTNAME
ou   OUTPUT -- 0 for HTML, 1 for ASCII
rl   REQLINKS -- f for ALL (files), p for PAGES, n for OFF (none)
rt   REQTYPE -- f for ALL, p for PAGES
to   TO
TZ   timezone
xq   general report?
If the form doesn't seem to work, check the following:
setenv QUERY_STRING "xq=1" (C Shell) or export QUERY_STRING="xq=1" (other shells), then run analform from the shell.
<input type=hidden name="TZ" value="">
For the value you should insert your timezone, in standard format. Usually this looks like your winter timezone name, followed by hours west of Greenwich (negative if you are east of Greenwich), followed by your summer timezone name. So the East Coast of the USA should have value="EST5EDT", and Germany value="MEZ-1MESZ".
It is better, although not essential, if when you change the default options for your analog, you remake the form.
Note that you probably want to restrict access to the form and form program to certain users; if it is world readable there could be considerable load on your server as well as potential confidentiality problems. Exactly how to do this depends on which server you are running.
Unfortunately, you cannot tell how many times your file has been read from this. The user may in fact request the file from a proxy server which already has a copy of it, or retrieve it from a local cache. In these cases no connection is made to your server, and no request is scored.
There are three categories of request, which can be seen in the status code report. Completed (or successful) requests are those with codes in the 200s (where the document was returned) or with code 304 (where the document was not sent because it had not been modified since the copy in the user's cache, which could therefore be used). Redirected requests are those with other codes in the 300s. The most common cause of these requests is that the user has incorrectly requested a directory name without the trailing slash. The server replies with a redirection ("you probably mean the following") and the user then makes a second connection to get the correct document (although usually the browser does it automatically without the user's intervention or knowledge). Failed requests are those with codes in the 400s (error in request) or 500s (server error). They come about for a variety of reasons, but the most common are when the requested file is not found or is read-protected.
The total data transferred refers only to successful requests, and does not include the message header, only the actual data. The detailed reports also only include successes.
Corrupt logfile lines are those we can't understand, and unwanted lines are those that refer to files, hosts or dates that we have specifically excluded.
Here is a complete list of all 121 configuration commands. For their usage, see the full documentation.
Here is a summary of all 39 commandline arguments. Again, for their usage, see the full documentation. Many of them can be given a - instead of a + to turn something off.
+7      stats for last 7 days
+a      ASCII output
+A      all reports (except hourly report)
+b      browser summary
+B      browser report
+c      status code report
+C      configuration command
+d      daily summary
+D      daily report
+e      error report
+f      referer report
+form   do a form
+F      from
+g      configuration file
-G      default config file off
+help   help message
+h      hourly summary
+H      hourly report
+i      directory report
+k      pagelinks
+l      dirlevel
+m      monthly report
+n      hostname
+o      domain report
+O      outfile
+p      logo
-q      no warnings
+r      request report, pages only
+R      request report, all files
+s      host count
+ss     approximate host count
+S      host report
+T      to
+u      host url
+U      cache file
+v      printvbles
+V      debug level
+w      pagewidth
+W      weekly report
+x      general summary
CFLAGS in the Makefile to turn on the ANSI option in a compiler like cc.
REQTYPE ALL to list all files in the request report, or ISPAGE to say that this file is a `page.'
index.html).
+ss option, or turning hostname counting off altogether with -s.
If we are doing a `top n' report and two entries tie for nth place, only one will be printed.
The reported `running time' is elapsed real time, not CPU time.
You can sort a report by requests even when you have turned off the request columns. This may confuse your readers.
The behaviour of FILEALIAS a b; FILEALIAS b c is undefined.
Do not alias a file to itself (e.g., FILEALIAS /home.html /home.html) or a host to itself, or it will get lost.
I am happy to help people who have trouble with analog, but please read the FAQ and list of known bugs first. Also, you might be able to diagnose the problem yourself if you run
analog -v [your usual options]
which lists the value of all variables. But if you still can't get it to work, ask me. It helps me to find bugs and to learn where the documentation is unclear. When submitting bug reports, please include the version number (which you can find out by the command analog -v).
The following features are already on the list to be done by version 2.0. Let me know if you have any comments on them.
http://www.statslab.cam.ac.uk/~sret1/analog/proposal.html) and I want more feedback on it first.
I would also welcome discussion on the following issues.
Thanks are also due to all those who helped in the early stages of writing this program. Those who made helpful suggestions during beta testing are numerous, but I must mention particularly Dan Anderson, Martyn Johnson, Joe Ramey, Chris Ritson, Quentin Stafford-Fraser and Dave Stanworth; and above all Gareth McCaughan for lots of programming advice, particularly in making the code faster.
Page last modified: 07-Feb-96