Detection of Compromised Email Accounts used for Spamming in Correlation with Mail User Agent Access Activities Extracted from Metadata

Carlo Schäfer

Friedrich-Schiller-Universität Jena, Jena, Germany

Abstract—Every day over 29 billion spam and phishing messages are sent, and spam accounted for 57.9 percent of global email traffic in September 2014. Spammers commonly use compromised email accounts to send these messages. Previous research has primarily focused on the fast detection of abused accounts to prevent the fraudulent use of servers. State-of-the-art spam detection methods generally need the content of an email to classify it as either spam or a regular message. This content is not available in the new type of encrypted phishing email that has become prevalent since the middle of 2014. The objective of the presented research is to detect this anomaly using Mail User Agent Access Activities, that is, the characteristic behaviour with which emails are sent, without knowledge of the email content.

The proposed method detects the abused account in seconds and therefore reduces the sent spam per compromised account to less than one percent.

Index Terms—compromised email account, hacked, spam, phishing, encrypted phishing, MUAAA, Mail User Agent Access Activities

I. INTRODUCTION

The global spam volume per day in 2013 was over 29 billion messages [1]. 57.9 percent of the global email traffic in September 2014 was spam. Nevertheless, the phishing rate went down from one in 395 messages in May 2014 to one in 2,041 messages in September 2014. The primary information phishers seek is the login credentials for various accounts [2].

These compromised accounts are commonly abused for sending new spam and phishing messages. Therefore they are one of the sources of the huge amount of worldwide spam. Moreover, it is possible to abuse these stolen login credentials on other systems because the username and password are often the same for different services, or a single sign-on is in place.

Why are valid credentials necessary for spamming? From a normal internet access point, it is not possible to send an email directly to a recipient's Simple Mail Transfer Protocol (SMTP) server [3]. Most SMTP servers, which are called Mail Transfer Agents (MTAs), use a DNS-based Blackhole List [4] and do not accept connections from dial-up IP addresses, except from known local users who authenticate with correct credentials.

This fact is one of the reasons phishers and spammers abuse foreign accounts and authenticate on the associated MTA. In Figure 1 the last MTA (foreign domain) does not accept emails directly from a spammer, but it does accept them from our own outgoing MTA, which is a valid server. Another reason is to hide the spammer's identity, because the destination MTA sees only the abused MTA and not the real IP address of the spammer.

To protect one's own server against spamming, compromised accounts have to be detected. The goal is to keep the server from getting a bad reputation [5] with other servers, which happens if spam is sent. With a bad reputation there is a high risk of having IP addresses blocked by the receiving networks. With an abused email account, the email infrastructure therefore cannot guarantee its own customers a high delivery rate to other servers.

The challenge is to detect compromised accounts and prevent spamming. A side effect of this is that spam is blocked at the source of generation and not later by the destination server that should receive the spam.

There are also other state-of-the-art spam detection methods available, but most of them are based on content analysis. In some companies and environments it is forbidden to access email content with technical methods like spam detection programs due to data privacy, or it is technically not possible because an email is encrypted. In such scenarios most spam filters do not work. In this case, the current ways to protect a server with authenticated users are methods like IP or account rate limiting, Country Counting (CC), and Theoretical Geographical Travelling Speed (TGTS) [6]. However, these methods only work if a high number of spam emails or authentications, for example from a botnet, is available.

A. Normal Spam

Normally, spammers will try to send spam as fast as possible before the abused account is detected and blocked. It makes no sense for the spammer to send messages slowly because this gives the server administrator of the abused system too much time to interfere. However, spammers sometimes reduce their sending rate to hide the abuse and send a low number of spam messages per day for weeks or months.

B. Encrypted Phishing

In the middle of 2014 I detected a new kind of phishing message whose content was encrypted [7] and which was used in a real environment to hijack user authentication information.

After a forensic data analysis, I reconstructed the mail flow of these messages. At first a normal phishing email was received which looked like a standard newsletter, but contained an option to unsubscribe by sending a normal reply without private information. The default behaviour of the local mail client was to sign all outgoing emails with the user's own S/MIME [8] certificate, and thus the unsubscribe email was signed as well.

With the public key from this signed reply, the phisher sent an encrypted email, which arrived in the account some hours later. This message could not be detected as a phishing message because the content was encrypted and therefore invisible to state-of-the-art spam detection methods.

The method presented in this paper, however, detects such an abused account independently of the number of messages sent per period and of the content of the messages.

This method intervenes at the authentication step on the owner’s server side. It is more effective to block spam at this point because it is a centralised access point. The spammer or bots that are spamming are far away and cannot be controlled.

Therefore, the spam is blocked at the sender source and not later at the destination server.

In Figure 1 the workflow of the different connection types is illustrated. This workflow is only valid for human accounts (HA) in a closed user base. That means that non-human user accounts like technical accounts (TA), which for example only send notification emails such as error messages or status information, were not observed [6] because they do not have a mailbox or access to one. This account type is generally not regarded as a target for phishing emails because emails sent to it cannot be read or interacted with.

Our method acts on the server side (at the same level as the owner Mail Delivery Agent (MDA) and incoming MTA, see Figure 1) and is not client based. Therefore it is possible to catch all of the abused accounts in a fast centralised way. For this detection method, only the authentication metadata from the MTA and MDA are necessary.

I will now outline the primary components [9] that are needed to send and read an email.

C. Mail Transfer Agent (MTA)

Mail Transfer Agents relay emails from a source to their destination using the Simple Mail Transfer Protocol (SMTP). A Mail Submission Agent (MSA) is nearly the same as an MTA, but it accepts emails on port 587 only from an authenticated Mail User Agent and not from an anonymous MTA [10].

MTAs are the deliverers of spam, so they must protect themselves against this abuse with a variety of methods [4] [6] [11] [12]. The easiest way is to deny relaying from non-local members to foreign domains. Therefore, everybody needs to authenticate and be authorised as a local member.

D. Mail Delivery Agent (MDA)

The Mail Delivery Agent is software that delivers emails to a specific user mailbox. The user can access the email after authentication when using the Post Office Protocol in the current standard version 3 (POP3) [13] or the current standard version 4 of the Internet Message Access Protocol (IMAP4) [14].

[Figure 1: Clients connect over the Internet via IMAP/SMTP or POP3/SMTP, and bots and spammers via SMTP, to our own incoming MTA and our own MDA (mailboxes, IMAP/POP3); our own outgoing MTA relays via SMTP to other MTAs (foreign domain). Available metadata: connection time, authenticated user, IP.]

Fig. 1. Connection workflow

E. Mail User Agent (MUA)

A Mail User Agent is typically called an email client or email reader. After starting the program, the normal behaviour is to first access the MDA and search for new messages, which are displayed after they are received. The user can read and write emails with the MUA. If a new message is sent, the MUA connects to the MTA or MSA with valid credentials and sends the message via SMTP.

II. METHOD

To detect a compromised human account, metadata from the MTA and MDA can be combined because they correlate to the Mail User Agent Access Activities (MUAAA). This means that they have correlation points with the connection time, the authenticated account and the used IP address.

This method calculates the time difference between the authentication on the incoming MTA and the nearest access on the MDA made with the same account and source IP address within a defined time period.
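The calculation described above can be sketched as follows. This is a minimal illustration, not the prototype's code: the function name `closest_mda_delta`, the `window` parameter, and the assumption that the MDA access times have already been filtered to the same account and source IP are all choices made here for the example.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import Optional, Sequence


def closest_mda_delta(
    mta_time: datetime,
    mda_times: Sequence[datetime],
    window: timedelta,
) -> Optional[timedelta]:
    """Signed offset (MDA time minus MTA time) of the nearest MDA access.

    `mda_times` must be sorted ascending and already restricted to the same
    (account, source IP) pair as the MTA authentication. Returns None if no
    MDA access lies within `window` of the MTA authentication.
    """
    i = bisect_left(mda_times, mta_time)
    candidates = []
    if i > 0:
        candidates.append(mda_times[i - 1])  # nearest access before
    if i < len(mda_times):
        candidates.append(mda_times[i])      # nearest access at or after
    if not candidates:
        return None
    nearest = min(candidates, key=lambda t: abs(t - mta_time))
    delta = nearest - mta_time
    return delta if abs(delta) <= window else None
```

A negative result means the nearest MDA access happened before the MTA authentication, a positive one that it happened afterwards.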

To make such an assertion, the different behaviours between a normal MUA and a spammer must be described.


A. Normal use

The normal MUA first connects to the MDA with IMAP or POP3 and searches for new emails (some MUAs keep the IMAP connection open and others close it). Afterwards the user reads the new emails and possibly writes a new one or answers an email. If POP3 is in use, the MUA has commonly already searched for new emails by the time an email is sent, so an access to the MDA is available. MUAs with IMAP normally store a copy of the sent email on the MDA in the sent folder. With this step, a new MDA access is also available.

B. Spammers

Spammers do not imitate this kind of human behaviour. They act like a technical account. That means they only connect to the MTA and send emails, without connecting to the MDA to look for new messages or to store the sent emails in the mailbox. This anomaly is characteristic of an abused human account and makes it possible to label the account as compromised.
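The anomaly can be expressed as a simple rule: an MTA authentication is suspicious if no MDA access with the same account and IP address exists around it. The sketch below is an illustration under assumptions; the `Auth` record, the function name `uncorrelated_mta_auths`, and the `before` and `after` window parameters are hypothetical and not taken from the prototype.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple


class Auth(NamedTuple):
    """One successful authentication (illustrative record layout)."""
    time: datetime
    account: str
    ip: str


def uncorrelated_mta_auths(
    mta: Iterable[Auth],
    mda: Iterable[Auth],
    before: timedelta,
    after: timedelta,
) -> List[Auth]:
    """Return the MTA authentications without any MDA access from the same
    account and IP between `before` earlier and `after` later."""
    mda_times = defaultdict(list)
    for a in mda:
        mda_times[(a.account, a.ip)].append(a.time)

    flagged = []
    for m in mta:
        times = mda_times.get((m.account, m.ip), [])
        if not any(m.time - before <= t <= m.time + after for t in times):
            flagged.append(m)
    return flagged
```

A sender that never touches the MDA, as described for spammers above, ends up with every one of its MTA authentications in the returned list.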

III. EVALUATION AND RESULTS

A. Evaluation Setup

The MUAAA method was implemented in a prototype system that consists of a virtual machine running a currently available Linux distribution with an independent rsyslog service (version 8.7, the latest at the time) and a relational database management system.

During each activity on the MTA and MDA, a syslog entry is locally generated. These syslog entries are forwarded to the prototype system, where a separated syslog service collects and stores the extracted metadata like the connection time, the authenticated account, and the used IP address for every email into a database. These data sets can then be analysed and the abused account can be labelled as compromised.
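How the extracted metadata might be stored can be sketched with a single table. The paper only states that a relational database management system is used; SQLite, the table name `auth_events`, its column names, and the helper functions below are assumptions made for this illustration.

```python
import sqlite3

# Hypothetical schema: one row per successful authentication on MTA or MDA.
SCHEMA = """
CREATE TABLE IF NOT EXISTS auth_events (
    service TEXT    NOT NULL CHECK (service IN ('MTA', 'MDA')),
    ts      INTEGER NOT NULL,  -- connection time, Unix seconds
    account TEXT    NOT NULL,  -- authenticated account
    ip      TEXT    NOT NULL   -- source IP address
);
CREATE INDEX IF NOT EXISTS idx_auth_lookup
    ON auth_events (service, account, ip, ts);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_event(conn: sqlite3.Connection, service: str, ts: int,
                 account: str, ip: str) -> None:
    """Store the metadata of one authentication."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO auth_events (service, ts, account, ip) "
            "VALUES (?, ?, ?, ?)",
            (service, ts, account, ip),
        )


def mda_access_times(conn: sqlite3.Connection, account: str, ip: str) -> list:
    """Return all MDA access times for one account and IP, oldest first."""
    rows = conn.execute(
        "SELECT ts FROM auth_events "
        "WHERE service = 'MDA' AND account = ? AND ip = ? ORDER BY ts",
        (account, ip),
    )
    return [row[0] for row in rows]
```

The index on service, account, IP, and time is chosen so that the lookup of correlated MDA accesses for one MTA authentication does not have to scan the whole table.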

Parallel to this prototype there is a second system with full access to all incoming and outgoing email content. This system consists of modern server hardware with 64 GB of memory and 24 cores, running a currently available Linux distribution. This performance is necessary because the system checks all emails, without deleting or discarding them, using state-of-the-art spam detection methods such as IP or account rate limiting (with a limit of 50 messages per hour), Country Counting (with a limit of two countries per day) [6], and Theoretical Geographical Travelling Speed (with a limit of 1000 km/h) [6]. In addition, amavisd-new [11] and SpamAssassin [12], including all rules and a required score of 4.9, are in use.

These are fairly aggressive parameters, and thus they produce thousands of false positive alerts, which are checked manually to find all abused accounts and then tag them in the database.

This system, with the content-based spam detection method, represents the reference values for this evaluation.

B. Evaluation

In the observed period from December 2013 until October 2014 approximately 4.6 million real MTA and more than 167 million real MDA authentications from our university were analysed, see Figure 2. Half of the MUA accesses to the MDA were made with POP3, the others with IMAP4 (Figure 3).

[Figure 2: MTA authentications (millions, axis 0 to 0.6) and MDA authentications (millions, axis 0 to 20); series: MDA, MTA.]

Fig. 2. Authentications between December 2013 and October 2014

Our university's email infrastructure handles more than two million incoming emails per month for approximately 27,000 active MDA accounts.

With this MUAAA method all available authentication metadata in the period between December 2013 and October 2014 were analysed.

The data sets are limited to accounts used by humans and must be cleaned up because some users use their account and credentials to send emails from other providers. For example, if an email with our locally handled email address as sender is sent from a remote provider, the provider uses the user's credentials to send this message via our infrastructure without access to the MDA. This prevents the abuse of counterfeit sender addresses by the remote provider. Such provider SMTP servers were whitelisted to reduce the number of possible false positives.

If a new authentication on the MTA was successful, the necessary values like the time, the IP address, and the account used were stored in a database. This information was combined with the correlated information from the MDA, which stores the same values after a successful authentication. More precisely, the stored MTA database entries were completed with the MDA access that has the shortest time span to them, if one exists.

If no access to the MDA has happened by then, nothing is modified in the database and the entry waits for the next MDA access with the correlated account and IP address.

C. Normal use

Figure 3 shows the closest connection time to the MDA compared with the sending time of an email.

Each value on the minute axis counts the hits whose time difference falls into that one-minute interval. Positive values denote MDA accesses after the MTA access, and negative values denote MDA accesses before it. For example, an MDA access five seconds after the MTA access is shown in minute zero, and an MDA access five seconds before the MTA access is shown in minute minus one.
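One reading of this binning that is consistent with both examples is to round the signed time difference down to whole minutes. The following sketch assumes exactly that; the function names are illustrative and the paper does not state how the bins were computed.

```python
import math
from collections import Counter
from typing import Iterable


def minute_bucket(delta_seconds: float) -> int:
    """Map a signed time difference (MDA time minus MTA time, in seconds)
    to its minute bin by rounding down."""
    return math.floor(delta_seconds / 60)


def minute_histogram(deltas_seconds: Iterable[float]) -> Counter:
    """Count how many time differences fall into each minute bin."""
    return Counter(minute_bucket(d) for d in deltas_seconds)


print(minute_bucket(5))   # 0: five seconds after the MTA access
print(minute_bucket(-5))  # -1: five seconds before the MTA access
```

Rounding down rather than truncating towards zero is what places the access five seconds before the MTA authentication into minute minus one instead of minute zero.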

Table I shows how large the absolute time difference is between an MTA authentication and the closest MDA access before or after it, and what share of MTA accesses is covered within each difference.

Table II shows the cumulative share of MTA authentications with a correlated MDA access and how much time is needed to reach this share.

[Figure 3: share of accesses in percent (axis 0% to 18%) by time difference in minutes (axis -15 to 15); series: IMAP, POP3.]

Fig. 3. Closest access time difference between the MDA and MTA

TABLE I

ABSOLUTE TIME DIFFERENCE IN MINUTES BETWEEN SENDING AN EMAIL AND THE CORRELATED MUA ACCESS TO THE MDA

Absolute time difference (minutes)   Amount of activities
1                                    45%
3                                    68%
5                                    78%
10                                   87%
15                                   92%

D. Spammer

A total of 264 accounts that were abused for spamming or phishing were detected in the period between December 2013 and October 2014. Almost 25,700 abused MTA authentications were made. None of these authentications had a correlated access to the MDA between 48 hours before and six hours after the authentication.

However, these abused accounts were used normally with no such anomaly before those incidents.

Until May 2014 there was no anomaly detection system available and this data was analysed later with this MUAAA method. Between December 2013 and May 2014, 56 incidents were detected which authenticated 23,293 times. 519,776 spam or phishing emails were sent, which is an average of 22.31 messages per authentication and 9,282 messages per abused account.

Between May 2014 and August 2014 the system detected the anomaly but only informed the administrator. In this period, 40 accounts with 1,038 abused authentications and 33,578 bad emails were detected. This is an average of 32.35 messages per authentication and 839 messages per compromised account.

After August 2014, the system detected abused accounts and intervened by itself. From August 2014 to October 2014, 168 accounts were abused with 598 authentications. Only 11,855 spam emails were sent, which is an average of 19.82 emails per authentication and 71 emails per account.

Table III shows how long it took to block the abused accounts in each period.

TABLE II

CUMULATIVE AMOUNT OF THE AVAILABLE CORRELATED ACCESSES TO THE MDA AFTER SENDING AN EMAIL

Time (minutes) after MTA Auth   Closest MDA access   Previous MDA access
1                               81.97%               99.47%
2                               86.31%               99.50%
3                               89.09%               99.53%
4                               90.72%               99.55%
5                               92.12%               99.56%
10                              94.87%               99.61%
15                              96.31%               99.64%
20                              96.90%               99.67%
25                              97.34%               99.69%
30                              97.67%               99.71%
35                              97.95%               99.73%
40                              98.18%               99.74%
45                              98.38%               99.75%

TABLE III

TIME BETWEEN THE FIRST ABUSED ACCESS AND BLOCKING THE ACCOUNT

Needed time   12/2013-05/2014   05/2014-08/2014   08/2014-10/2014
0 sec         1.8%              0.0%              98.8%
<1 min        0.0%              0.0%              1.2%
<10 min       1.8%              15.0%             0.0%
<1 h          7.1%              10.0%             0.0%
<6 h          3.6%              20.0%             0.0%
<12 h         7.1%              15.0%             0.0%
<1 d          12.5%             27.5%             0.0%
<7 d          21.4%             10.0%             0.0%
<14 d         23.2%             0.0%              0.0%
<31 d         21.4%             2.5%              0.0%
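The per-period averages reported in this subsection follow directly from the stated counts of accounts, authentications, and spam emails. The short sketch below re-derives them; the dictionary layout and the function name are illustrative only.

```python
# Counts as reported in the text for the three observation periods.
periods = {
    "12/2013-05/2014": {"accounts": 56, "authentications": 23_293, "spam": 519_776},
    "05/2014-08/2014": {"accounts": 40, "authentications": 1_038, "spam": 33_578},
    "08/2014-10/2014": {"accounts": 168, "authentications": 598, "spam": 11_855},
}


def averages(period: dict) -> tuple:
    """Return (messages per authentication, messages per account)."""
    per_auth = round(period["spam"] / period["authentications"], 2)
    per_account = round(period["spam"] / period["accounts"])
    return per_auth, per_account


for name, period in periods.items():
    per_auth, per_account = averages(period)
    print(f"{name}: {per_auth} per authentication, {per_account} per account")
```

The three account counts also add up to the 264 abused accounts reported for the whole observation period.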

E. Detection Rate

The detection rate represents the reliability of a system. It is calculated as the number of true positive accounts divided by the sum of the true positive, false positive, and false negative accounts. The best result is 100 percent, which means that no false detection occurred.

True positive accounts are abused accounts that are classified as compromised. False positive accounts are classified as compromised but are not abused, and false negative accounts are not classified as compromised but are abused.

In the period between December 2013 and October 2014, 264 accounts were abused. All of them were detected with the MUAAA method. The number of true positives is 264, the number of false negatives is zero, and the number of false positives is five. The calculated detection rate of the MUAAA method is therefore 98.14 percent, with the state-of-the-art spam detection methods serving as the reference.
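The reported value can be checked against the definition given in this subsection. The function name below is illustrative; the formula and the three counts are taken from the text.

```python
def detection_rate(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Detection rate as defined in the text: TP / (TP + FP + FN)."""
    return true_pos / (true_pos + false_pos + false_neg)


rate = detection_rate(true_pos=264, false_pos=5, false_neg=0)
print(f"{rate:.2%}")  # 98.14%
```

Because the false positives appear in the denominator, the five false positives alone account for the gap to 100 percent.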

IV. DISCUSSION

A. Mail User Agent Access Activities

The MUA connects almost at the same time to the MDA and MTA. Figure 3 and Table I show the closest connection time before or after the MTA authentication. However, for anomaly detection it is only important to get a high hit rate after the MTA authentication to prevent false positives.

From the data set in Table II, it is possible to judge the needed time and the cumulative hit rate for the authentications. That means that directly after the MTA authentication (within one minute), approximately 82 percent of the accounts have an available MDA authentication, five minutes later approximately 92 percent, and 10 minutes later 95 percent.


To reduce this time, it is not necessary to know the closest MDA authentication time; it is sufficient to know whether a connection with the same IP address and credentials was made in the last few hours (see Table II). This yields a hit rate of over 99.47 percent at the MTA authentication time. This implies that an MTA authentication with no available correlated MDA authentication can be classified as abused with an accuracy of 99.47 percent.
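A check that only looks backwards can be made at the moment of the MTA authentication, without waiting for a later MDA access. The sketch below keeps the most recent MDA access per account and IP address in memory. The class name, its methods, and the `lookback` parameter are hypothetical, and the text does not fix the length of "the last few hours".

```python
from datetime import datetime, timedelta
from typing import Dict, Tuple


class PreviousAccessIndex:
    """Remembers the latest MDA access per (account, IP) pair."""

    def __init__(self, lookback: timedelta) -> None:
        self.lookback = lookback
        self._last_seen: Dict[Tuple[str, str], datetime] = {}

    def record_mda_access(self, account: str, ip: str, when: datetime) -> None:
        """Store the access time if it is newer than the one already known."""
        key = (account, ip)
        previous = self._last_seen.get(key)
        if previous is None or when > previous:
            self._last_seen[key] = when

    def is_suspicious(self, account: str, ip: str, when: datetime) -> bool:
        """True if no MDA access from this account and IP is known within
        `lookback` before the MTA authentication at `when`."""
        last = self._last_seen.get((account, ip))
        return last is None or when - last > self.lookback
```

Such an index only grows with the number of distinct account and IP pairs, which is one reason a backward-looking check is attractive for a decision that has to be made immediately.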

Without the MUAAA method, a compromised account sends approximately 9,300 spam messages until it is closed. Only 37.4 percent of these abused accounts are detected in less than a day.

After developing and starting the prototype in the detection mode, the sent spam per account decreased to one-tenth because in less than six hours approximately 60 percent of the compromised accounts were detected. The reason behind this faster detection rate is that the administrator was informed via email about the compromised accounts and checked manually to see if the account had been abused. But at night or on weekends no one handles these reported incidents. That is the reason why nearly 40 percent of the abused accounts were detected after more than a day.

From August 2014 the prototype handled the detected accounts by itself and blocked them. As a result, only 71 spam emails were sent per abused account, and the accounts were detected in less than a minute (Table III). Theoretically, it would be possible to reduce the number of sent emails per abused account to a minimum if the prototype acted directly in the MTA. It would then be feasible to check the account status before the first email is sent. The prototype in this evaluation has a delay of between one and two seconds because the syslog service collects the generated events, forwards them to another server, and stores them in a database. During this delay, further authentications were made.

B. Detection Rate

In the beginning, over 720 false positives made this method unusable. To solve this problem, some remote provider servers, which send emails with the user credentials via our MTA, were whitelisted. After this step, the number of false positives was reduced to five. One of these accounts used an anonymizing network like TOR [15]. Three other accounts were connected via a VPN or mobile internet access with changing endpoints and different IPs. The last false positive had email forwarding to another provider activated but used our own SMTP server for sending emails. The whitelisting can also be automated by generating a profile for every user. In this way, the detection system learns by itself whether an authentication for a specific user typically comes from an MUA or a remote provider. If it is a remote provider, the IP address pool and the top- and second-level domains of the sender MTA are typically static, and one of these attributes can be stored to reduce the false positive rate.

The impressive ability of this method to detect all compromised accounts in less than a minute depends on the fact that, until now, spammers have not imitated human behaviour in any way. A likely reason is that they try to send as much spam as quickly as possible before the compromised account is detected and then closed or limited in its sent messages. To imitate a human account, the correlated MDA must be found for every MTA and a second connection must be established to it, which is time and resource intensive.

To protect against a theoretical imitation that looks like a human account, it is also possible to analyse other typical MUA settings for every user, for example the device type, the connection encryption method, and authentication information like the username, protocol, port, or method.

Most of the MUAs used the IMAP4 ID extension [16]. This extension sends implementation identification information to the server, for example the name and version of the program and the operating system. Depending on the MUA, the vendor, build, and other arguments are also exchanged. With this very specific information, an individual fingerprint of every user's MUA and its settings can be produced and stored in the automatically generated user profile.
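One possible way to turn such identification fields into a fingerprint is to hash a canonical form of them. This is only a sketch of the idea: the function name, the choice of SHA-256, and the example field values are assumptions, and the text does not describe how a fingerprint would actually be built.

```python
import hashlib
import json
from typing import Mapping, Optional


def mua_fingerprint(id_fields: Mapping[str, Optional[str]]) -> str:
    """Hash the client identification fields into a stable fingerprint.

    Field names are lowercased and sorted so that the order and the
    capitalisation in which a client sends them do not matter. Fields
    without a value are ignored.
    """
    canonical = {
        name.lower(): value
        for name, value in id_fields.items()
        if value is not None
    }
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical values for illustration only.
known = mua_fingerprint({"name": "ExampleMail", "version": "1.2", "os": "Linux"})
seen = mua_fingerprint({"OS": "Linux", "Name": "ExampleMail", "Version": "1.2"})
print(known == seen)  # True
```

A design consequence of hashing the full set of fields is that a regular client update changes the fingerprint, so a real profile would probably need to tolerate version changes or compare fields individually.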

This prevents the detection system from becoming ineffective if a spammer imitates a human account by accessing both the MTA and the MDA, because it is very difficult for the spammer to obtain the specific settings of the user's MUA and system environment.

With this MUA fingerprint, it would also be possible to correctly classify four of the five false positives and thus reduce them to a minimum. VPNs and anonymizing networks like TOR are then no longer an obstacle. With the MUA fingerprint, authentication data, and connection settings, the IP address is not necessary to find the correlated MDA access for a specific MTA authentication.

V. CONCLUSION

The handling of emails and their interactions with servers is a very specific behaviour that distinguishes a human user from a spammer or botnet. With this knowledge it is possible to detect abused human email accounts in mere seconds and prevent spamming via one's own server. A side effect is that spam is blocked at the source of generation and not at the destination server.

The presented Mail User Agent Access Activities method detects all types of compromised accounts, such as fast senders, slow senders, and very rare senders that, for example, send encrypted messages or one spam message per day or week. Other methods, like IP rate limiting, Country Counting, or Theoretical Geographical Travelling Speed [6], do not work for these behaviours, especially in the case of one spam message sent per week.

With this method only a small amount of metadata from the authentication is necessary and the content of the email is not needed. Using this method, the new type of encrypted phishing can also be detected, which is not currently a concern, but could become a factor in future.

The MUAAA method has a detection rate of 98.14 percent, has no false negatives, and can detect abused accounts in seconds, in contrast to a combination of other state-of-the-art spam detection methods that need hours or days. The sent spam per abused account decreased from 9,282 to 71 messages, which is a reduction by a factor of about 130. To confirm these results, over 170 million real authentications from the university were collected and analysed between December 2013 and October 2014.

Since August 2014, the university system has automatically detected abused accounts among approximately 27,000 active accounts and intervened by itself.

REFERENCES

[1] P. Wood, B. Nahorney, K. Chandrasekar, S. Wallace, and K. Haley, “Internet Security Threat Report 2014,” Symantec Corporation, vol. 19, p. 81, Apr. 2014.

[2] Symantec Corporation, “Symantec Intelligence Report, September 2014,” Symantec Corporation, p. 20, Oct. 2014.

[3] J. Klensin, “Simple Mail Transfer Protocol.” RFC 5321 (Draft Standard), Oct. 2008.

[4] The Spamhaus Project Ltd, “The Spamhaus Project - PBL.” http://www.spamhaus.org/pbl/, Jan. 2015.

[5] Cisco Systems, Inc., “SenderBase - The world’s largest Email and Web traffic monitoring network.” http://www.senderbase.org/, Apr. 2015.

[6] C. Schäfer, “Detection of Compromised Email Accounts Used by a Spam Botnet with Country Counting and Theoretical Geographical Travelling Speed Extracted from Metadata,” in Software Reliability Engineering Workshops (ISSREW), 2014 IEEE International Symposium on, pp. 329–334, Nov. 2014.

[7] J. Posluns and S. Sjouwerman, Inside the SPAM Cartel: By Spammer-X. Syngress, 1st ed., Nov. 2004.

[8] B. Ramsdell and S. Turner, “Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 Message Specification.” RFC 5751 (Proposed Standard), Jan. 2010.

[9] D. Crocker, “Internet Mail Architecture.” RFC 5598 (Informational), July 2009.

[10] R. Gellens and J. Klensin, “Message Submission for Mail.” RFC 6409 (Internet Standard), Nov. 2011.

[11] M. Martinec, “amavisd-new.” http://www.ijs.si/software/amavisd/, Jan. 2015.

[12] Apache Software Foundation, “Spamassassin: Welcome to spamassassin.” http://spamassassin.apache.org/, Apr. 2015.

[13] J. Myers and M. Rose, “Post Office Protocol - Version 3.” RFC 1939 (Internet Standard), May 1996. Updated by RFCs 1957, 2449, 6186.

[14] M. Crispin, “Internet Message Access Protocol - Version 4rev1.” RFC 3501 (Proposed Standard), Mar. 2003. Updated by RFCs 4466, 4469, 4551, 5032, 5182, 5738, 6186, 6858.

[15] The Tor Project, Inc., “Tor project: Anonymity online.” https://www.torproject.org, Feb. 2015.

[16] T. Showalter, “IMAP4 ID extension.” RFC 2971 (Proposed Standard), Oct. 2000.