EUNIS97, Grenoble (France) 9-11 September 1997
Ref: 030104
An analysis of the e-mail
traffic of a server called TIGRIS.KLTE.HU
Introduction
One of the earliest uses of the Internet is electronic
mail. Messages are sent between users of computer systems,
and the computer systems are used to hold and transport the
messages. Electronic mail has several advantages: it is fast,
cheap, convenient and so on. The number of Internet users is
increasing exponentially, so more and more people are able to
send and receive e-mail. There is no doubt that academic staff
and university students rely on it heavily. It is part not only
of research work but of everyday life as well, and stopping the
e-mail services would cause unimaginable difficulties in keeping
in touch. Electronic mail is an important component of an office
automation system. One problem deserves attention when writing
an e-mail: the use of special characters of languages other than
English. The Hungarian umlaut Ű, say, has to be encoded before
sending and decoded before reading. Most client software provides
automatic coding services. The most popular client at our
University is Pegasus Mail, which offers uuencoding and uudecoding
among other possibilities. Newer systems support the composition
and delivery of multimedia mail, which can combine text, graphics,
voice, facsimile, and other forms of information in a single
message.
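As a small illustration of the encoding step, the sketch below
encodes a Hungarian word containing ű for 7-bit transport and
decodes it again. It uses Python's standard binascii and quopri
modules. The ISO-8859-2 character set and the sample word are
assumptions made for the example, not details taken from the
Tigris configuration.

```python
import binascii
import quopri

# Sample text and charset are assumptions for illustration only.
text = "tűz"                       # Hungarian for "fire"
raw = text.encode("iso-8859-2")    # 8-bit bytes: b't\xfbz'

# uuencode one line (at most 45 bytes) into printable ASCII and back.
uu_line = binascii.b2a_uu(raw)
assert binascii.a2b_uu(uu_line) == raw

# Quoted-printable, the MIME alternative, escapes only the 8-bit byte.
qp = quopri.encodestring(raw)      # b't=FBz'
assert quopri.decodestring(qp).decode("iso-8859-2") == text
```

Both encodings leave only 7-bit ASCII on the wire; the receiving
client has to know the character set to restore the letter.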
The mail server called Tigris
In 1991 a VAX6000/510 was installed at our University. At that
time it was one of the most powerful servers in the country, with
one VAX processor, 128 MB RAM and a 6 Gigabyte DSSI Winchester
disk; later an extra 6 Gigabyte SCSI Winchester disk was added.
The main job of the server is to provide Internet services for
more than 4000 users, and it also plays the role of mail gateway
for some other servers. It can be reached by Ethernet (10 Mbit/sec)
from the campus, by FDDI ring (100 Mbit/sec) from all other
institutions of higher education in our city, Debrecen, and by a
512 Kbit/sec leased line from Budapest. Besides the protocols
TCP/IP and DECnet, the protocol POP3 has also been implemented
because of the extensive use of the clients Exchange and Pegasus
Mail.
Logging of e-mails
The logging of e-mails gives information about the e-mail
traffic. On the basis of the logs one may monitor the most
popular mailing lists, the activity of users and so on. The number
of outgoing e-mails is not the same as the number of e-mails sent
by the users of the server, because of the mail gateway function.
Note that the e-mail traffic of the server is the sum of its own
users' mail and the e-mail traffic of some other servers. The
e-mail traffic of the University is larger than the traffic of
this server because there are servers that do not use Tigris as
a mail gateway. The public domain message transport system called
Message Exchange 4.2 (MX) is responsible for the mailing function
of Tigris. It supports several protocols, such as SMTP, NJE, UUCP
and so on. As a gateway it carries out protocol conversion when
necessary. It is reliable and has been running for several months
without any problems. In 1996 two protocols were used by Tigris
for transferring mail:
- SMTP: Simple Mail Transfer Protocol for TCP/IP, and
- LOCAL: for local users.
During logging several data items are recorded, depending on
the protocol. The choice of data to be recorded is up to the
postmaster. In our case the data are collected according to the
following table (+ recorded, - not recorded).
| Protocol | SOURCE | HOST | USER | SENT | SIZE | DATE | TIME |
|----------|--------|------|------|------|------|------|------|
| LOCAL    | +      | -    | +    | -    | +    | +    | +    |
| SMTP     | +      | +    | -    | +    | +    | +    | +    |
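The field selection in the table above can be expressed as a small
data structure. The following is only a sketch: the function name
make_log_record and the sample values are hypothetical, and the
actual MX log format is not described in this paper.

```python
# Which fields the log keeps for each protocol, transcribed from the table.
LOGGED_FIELDS = {
    "LOCAL": ("SOURCE", "USER", "SIZE", "DATE", "TIME"),
    "SMTP": ("SOURCE", "HOST", "SENT", "SIZE", "DATE", "TIME"),
}

def make_log_record(protocol, **values):
    """Keep only the fields that are recorded for the given protocol."""
    fields = LOGGED_FIELDS[protocol]
    return {"PROTOCOL": protocol, **{f: values.get(f) for f in fields}}

# Hypothetical message data, invented for the example.
record = make_log_record(
    "LOCAL", SOURCE="alice", HOST="example.org", USER="bob",
    SIZE=1024, DATE="1996-03-01", TIME="10:15:00",
)
```

Here the HOST value is silently dropped, because the LOCAL
protocol does not record it.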
The agents of MX register the data of an e-mail when it leaves
the MX queue. Only outgoing mails are logged, so every mail is
logged exactly once. The logged data are transferred to an Oracle
V7 database and tables are produced by SQL queries. The MX system
was installed in 1994, therefore the time series of the daily
traffic of Tigris is available for three years: 1994, 1995 and
1996. The data are analyzed with MS Excel and SPSS using standard
methods of time series analysis.
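A daily traffic table of the kind described here can be produced
with a single grouped query. The sketch below uses Python's
built-in sqlite3 module as a stand-in for the Oracle database; the
table name mail_log, its columns and the three sample rows are
assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mail_log (protocol TEXT, size INTEGER, date TEXT, time TEXT)"
)
# Invented log rows, for illustration only.
rows = [
    ("SMTP", 2100, "1995-03-01", "08:15:00"),
    ("LOCAL", 800, "1995-03-01", "09:40:00"),
    ("SMTP", 1500, "1995-03-02", "11:05:00"),
]
conn.executemany("INSERT INTO mail_log VALUES (?, ?, ?, ?)", rows)

# One row per day: number of letters and their average size in bytes.
daily = conn.execute(
    "SELECT date, COUNT(*) AS letters, AVG(size) AS avg_size "
    "FROM mail_log GROUP BY date ORDER BY date"
).fetchall()
```

The resulting list of (date, letters, avg_size) tuples is the raw
material for the daily time series.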
The following figure shows the daily traffic in 1995; the number
of letters is plotted against days.
One may notice the increasing number of letters at the beginning
and the decreasing number towards the end of the semesters.
Unfortunately the semesters do not produce a regular period,
because the winter and summer holidays differ in length.
We summarize the e-mail traffic by years and by protocols. In
1994 Bitnet SMTP and DECnet SMTP were also running. It can be
seen that there is not much difference between the average sizes
of the letters. The first step of the statistical analysis was a
preliminary transformation to get rid of outliers. The outliers
are caused by problems with the network, the server and the
mailing system. Each outlier was replaced by a regression method
using the neighboring data.
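One way such a replacement might look is sketched below. Both the
flagging rule (distance from the median measured in median
absolute deviations) and the window of three neighbours on each
side are assumptions; the paper states neither. Each flagged value
is replaced by the value of a least-squares line fitted to its
unflagged neighbours.

```python
from statistics import median

def fit_line(xs, ys):
    """Ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def replace_outliers(series, k=3, threshold=5.0):
    """Replace flagged points by a regression on up to k neighbours per side.

    The flagging rule is an assumption, not the rule used in the paper.
    At least two unflagged neighbours are needed inside the window.
    """
    med = median(series)
    mad = median(abs(v - med) for v in series) or 1.0
    is_outlier = [abs(v - med) / mad > threshold for v in series]
    cleaned = list(series)
    for i, flagged in enumerate(is_outlier):
        if not flagged:
            continue
        lo, hi = max(0, i - k), min(len(series), i + k + 1)
        idx = [j for j in range(lo, hi) if not is_outlier[j]]
        slope, intercept = fit_line(idx, [series[j] for j in idx])
        cleaned[i] = slope * i + intercept
    return cleaned

daily = [300, 310, 305, 20, 315, 320, 325]   # invented counts; 20 is a fault
cleaned = replace_outliers(daily)
```

In this invented series only the fourth value is flagged, and it
is replaced by the value of the line fitted to the other six days.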
The number of letters is increasing, of course, while the maximal
value is decreasing; this is because the stability of the leased
line and the mailing system became better and better. The minimal
value in 1996 was 37, showing that there was practically no fault
in the delivery.
For the detection of periods the estimated
and smoothed spectrum is considered.
The figure above contains the plot of the spectrum for the time
series of the daily traffic in 1995. The values of the spectrum,
i.e. the spectral density, are plotted versus periods in days. It
has a high peak at 7, which means, not surprisingly, that there is
a weekly period.
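The idea of reading a period off the spectrum can be illustrated
with a raw periodogram. The sketch below uses only the Python
standard library and synthetic data with quiet weekends; it omits
the smoothing step mentioned in the text, and the traffic values
are invented.

```python
import cmath
import math

def periodogram(series):
    """Raw periodogram at the Fourier frequencies k/n, k = 1 .. n//2."""
    n = len(series)
    mean = sum(series) / n
    centred = [v - mean for v in series]
    result = []
    for k in range(1, n // 2 + 1):
        dft = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                  for t, v in enumerate(centred))
        result.append((n / k, abs(dft) ** 2 / n))   # (period in days, power)
    return result

# Synthetic daily counts: weekdays busy, weekends quiet (invented data).
days = 364
traffic = [100 if t % 7 in (5, 6) else 400 for t in range(days)]
peak_period, _ = max(periodogram(traffic), key=lambda p: p[1])
```

For this series the largest ordinate lies at a period of 7 days;
smaller peaks appear at the harmonics 3.5 and 2.33 days.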
The table of descriptive statistics for the seven-day period
shows that the working days in each year are significantly
different from the holidays, the minimum p-value is 0.4. The means
of the working-day traffic differ from each other; nevertheless
these differences are not significant. This is the case for the
holidays as well. It was checked by t-test.
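A comparison of this kind can be sketched with a two-sample t
statistic. The paper does not say which variant of the t-test was
used; the example below assumes Welch's version (unequal
variances) and uses invented counts. It computes the statistic and
the degrees of freedom only; a p-value would additionally need the
t distribution, for example from a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and its approximate degrees of freedom."""
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Invented daily letter counts, for illustration only.
working_days = [410, 395, 420, 405, 415]
weekend_days = [120, 135, 110, 125]
t_stat, df = welch_t(working_days, weekend_days)
```

A large absolute value of the statistic relative to its degrees
of freedom indicates that the two groups of days differ.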
The time series of the weekly averages no longer contains the
period, which allows us to carry out further analysis. The
correlation between the time series of 1994, 1995 and 1996 was
calculated. The data of 1994 proved to be independent of both the
series of 1995 and of 1996. Therefore the decision about a trend
in the series of weekly averages was based on the years 1995 and
1996. The question is whether the series contains a trend, i.e.
whether it can be put into the form
[formula not recoverable]
To test the hypothesis b=0 we differenced the series once,
[formula not recoverable]
and used the one-sample t-test for testing
[formula not recoverable]. The estimated value of b is 3.8 with
a two-tailed p-value of 0.94; therefore there is no reason to
assume that there is any trend.
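The trend test can be sketched as follows, assuming a linear trend
model in which the weekly average equals a level a plus a slope b
times the week index plus a stationary remainder. This model form
is an assumption; the text only names the slope b. Under it, the
first differences have mean b, so a one-sample t statistic on the
differences tests b=0. The data below are invented.

```python
from math import sqrt
from statistics import mean, stdev

def weekly_averages(daily):
    """Average consecutive blocks of seven days; a partial last week is dropped."""
    return [mean(daily[i:i + 7]) for i in range(0, len(daily) - 6, 7)]

def trend_t_statistic(series):
    """One-sample t statistic for H0: the mean of the first differences is 0."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    b_hat = mean(diffs)                      # estimate of the slope b
    return b_hat, b_hat / (stdev(diffs) / sqrt(len(diffs)))

# Invented data: 28 days give 4 weekly averages.
daily = ([400, 410, 405, 395, 400, 120, 130] * 2
         + [420, 415, 410, 400, 405, 125, 135] * 2)
weekly = weekly_averages(daily)
b_hat, t_stat = trend_t_statistic(weekly)
```

Averaging over whole weeks removes the seven-day period before the
test is applied. Note that the mean of the differences depends
only on the first and last weekly averages.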
Now we are in a position to predict the weekly average series
for 1997. Denote by [symbol not recoverable] the traffic in 1996
and by [symbol not recoverable] the prediction for 1997. It is
calculated by the formula
[formula not recoverable], t = 1, 2, ..., 52,
where a = 279.5. The measured and the predicted values for the
first 19 weeks are plotted above.
References
- Brillinger, D.R. (1975), Time Series: Data Analysis and Theory,
Holt, Rinehart and Winston, New York.
- Zoltán Gál, Zsolt Korcsolay, György Terdik: "UDNET: An
Informatics network at Universitas of Debrecen", Trends in
Academic Information Systems in Europe Conference, Dusseldorf,
November 1995.
- Zoltán Gál, Ida Rápolti, Katalin Rutkovszky, György Terdik:
"Role of the computer center in migration to Information Society -
A case study at Lajos Kossuth University of Debrecen", EUNIS97.
Center for Informatics and Computing,
Lajos Kossuth University of Debrecen,
4010 Debrecen, Pf. 58, Hungary
E-mail: terdik@cic.klte.hu,
perdosi@cic.klte.hu
Copyright EUNIS 1997 Y.E.