Detecting Malicious Web Links and Identifying Their Attack Types

Academic year: 2022

(1)

Detecting Malicious Web Links and Identifying Their Attack Types

Joe Huang Anti-Spam Team

Cellopoint

(2)

Introduction

Considerable effort has gone into detecting malicious URLs

Blacklisting incurs no false positives, yet is effective only for known malicious URLs

With classification, discriminative feature selection is crucial

This paper, Choi et al. (2011), proposes a machine learning approach to detect malicious URLs and identify their attack types

(3)

Framework Overview

1. Data Collection

2. Supervised Learning

3-1. Detection

3-2. Identification

Input: URL

Output: Benign URL, or Malicious URL with {Type}

The process can run as batch learning or in an interleaved manner
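The two-stage flow above (detection first, identification only for URLs flagged as malicious) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `classify_url` and its three callables (`extract`, `is_malicious`, `attack_types`) are hypothetical names standing in for the feature extractor and the two trained models.

```python
from typing import Callable, Sequence, Set, Tuple, Union

FeatureVector = Sequence[float]
Verdict = Union[str, Tuple[str, Set[str]]]


def classify_url(
    url: str,
    extract: Callable[[str], FeatureVector],         # hypothetical feature extractor
    is_malicious: Callable[[FeatureVector], bool],   # hypothetical stage 3-1 model
    attack_types: Callable[[FeatureVector], Set[str]],  # hypothetical stage 3-2 model
) -> Verdict:
    """Return "benign", or ("malicious", {types}) for a flagged URL."""
    features = extract(url)
    # Stage 3-1: binary detection.
    if not is_malicious(features):
        return "benign"
    # Stage 3-2: multi-label identification, run only on detected URLs.
    return ("malicious", attack_types(features))
```

Passing the models in as callables keeps the sketch independent of any particular learning library.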

(4)

Discriminative Features

Lexical

Link popularity

Webpage content

DNS

DNS fluxiness

Network

(5)

Lexical Features

No. Feature Type

1 Domain token count Integer

2 Path token count Integer

3 Average domain token length Real

4 Average path token length Real

5 Longest domain token length Integer

6 Longest path token length Integer

7∼9 Spam, phishing and malware SLD hit ratio Real

10 Brand name presence Binary
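Features 1 to 6 in the table above can be computed directly from the URL string. The sketch below is illustrative only: the slide does not state the token delimiters, so splitting the domain on `.` and the path on `/ . - _ ? = &` is an assumption, and `lexical_features` is a hypothetical helper name.

```python
import re
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict:
    """Token counts and lengths for the domain and path of a URL."""
    parts = urlsplit(url)
    # Assumed delimiters: '.' for the domain, '/.-_?=&' for path and query.
    domain_tokens = [t for t in (parts.hostname or "").split(".") if t]
    path_tokens = [
        t for t in re.split(r"[/.\-_?=&]", parts.path + "?" + parts.query) if t
    ]

    def average(tokens):
        return sum(map(len, tokens)) / len(tokens) if tokens else 0.0

    def longest(tokens):
        return max(map(len, tokens), default=0)

    return {
        "domain_token_count": len(domain_tokens),
        "path_token_count": len(path_tokens),
        "avg_domain_token_length": average(domain_tokens),
        "avg_path_token_length": average(path_tokens),
        "longest_domain_token_length": longest(domain_tokens),
        "longest_path_token_length": longest(path_tokens),
    }
```

The SLD hit ratios and brand name presence (features 7 to 10) need external lists of known malicious domains and brand names, so they are left out here.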

(6)

Link Popularity (LPOP) Features

No. Feature Type

1∼5 5 LPOPs of the URL Integer

6∼10 5 LPOPs of the domain Integer

11 Distinct domain link ratio Real

12 Max domain link ratio Real

13∼15 Spam, phishing and malware link ratio Real

LPOPs are collected from five search engines: AltaVista, AllTheWeb, Google, Yahoo! and Ask
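The slide names the two domain link ratios (features 11 and 12) without defining them. The sketch below assumes the natural reading: the distinct domain link ratio is the number of unique linking domains divided by the total number of incoming links, and the max domain link ratio is the largest number of links from any single domain divided by that total. `domain_link_ratios` is a hypothetical helper, and the list of incoming links is assumed to come from a search engine query.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_link_ratios(incoming_links):
    """Return (distinct_domain_link_ratio, max_domain_link_ratio)."""
    domains = [urlsplit(link).hostname or "" for link in incoming_links]
    total = len(domains)
    if total == 0:
        return 0.0, 0.0
    counts = Counter(domains)
    # Assumed definitions; the slide only names the two ratios.
    distinct_ratio = len(counts) / total
    max_ratio = max(counts.values()) / total
    return distinct_ratio, max_ratio
```

Under these definitions, a link farm that sends many links from a few domains would show a low distinct ratio and a high max ratio.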

(7)

Webpage Content Features

No. Feature Type

1 HTML tag count Integer

2 Iframe count Integer

3 Zero size iframe count Integer

4 Line count Integer

5 Hyperlink count Integer

6∼12 Count of each suspicious JavaScript function Integer

13 Total count of suspicious JavaScript functions Integer
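The counts in the table above can be gathered in a single pass over the page source. The following is a rough sketch using Python's standard `html.parser`. The slide does not list the seven suspicious JavaScript functions, so the `SUSPICIOUS_JS` tuple is a placeholder assumption. Treating an iframe as zero-size when its `width` or `height` attribute is `"0"` is also an assumption.

```python
import re
from html.parser import HTMLParser

# Placeholder list; the slide does not enumerate the suspicious functions.
SUSPICIOUS_JS = ("escape", "eval", "unescape", "exec")


class ContentCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tag_count = 0
        self.iframe_count = 0
        self.zero_size_iframe_count = 0
        self.hyperlink_count = 0

    def handle_starttag(self, tag, attrs):
        self.tag_count += 1
        attr = dict(attrs)
        if tag == "iframe":
            self.iframe_count += 1
            # Assumed rule: width or height of "0" means zero size.
            if attr.get("width") == "0" or attr.get("height") == "0":
                self.zero_size_iframe_count += 1
        elif tag == "a" and "href" in attr:
            self.hyperlink_count += 1


def content_features(html: str) -> dict:
    parser = ContentCounter()
    parser.feed(html)
    # Count call sites such as "eval(" anywhere in the page source.
    js_counts = {
        name: len(re.findall(r"\b%s\s*\(" % name, html)) for name in SUSPICIOUS_JS
    }
    return {
        "html_tag_count": parser.tag_count,
        "iframe_count": parser.iframe_count,
        "zero_size_iframe_count": parser.zero_size_iframe_count,
        "line_count": len(html.splitlines()),
        "hyperlink_count": parser.hyperlink_count,
        "js_counts": js_counts,
        "js_total": sum(js_counts.values()),
    }
```

Counting by regular expression is crude: it will miss obfuscated calls, which is the evasion the deck raises later under JavaScript obfuscation.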

(8)

DNS Features

No. Feature Type

1 Resolved IP count Integer

2 Name server count Integer

3 Name server IP count Integer

4 Malicious ASN ratio of resolved IPs Real

5 Malicious ASN ratio of name server IPs Real

(9)

DNS Fluxiness Features

No. Feature Type

1∼2 ϕ of N_IP, N_AS Real

3∼5 ϕ of N_NS, N_NSIP and N_NSAS Real

ϕ = N / N_single
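As a worked example of the fluxiness ratio, the sketch below assumes the usual reading of the formula: the numerator counts the unique values (for instance IP addresses) seen across repeated DNS lookups, and the denominator counts the values returned by a single lookup (here, the first). The numerator's subscript is not recoverable from the slide, so this reading is an assumption, and `fluxiness` is a hypothetical helper name.

```python
def fluxiness(lookups):
    """lookups: list of sets, one set of returned values per DNS lookup."""
    if not lookups or not lookups[0]:
        return 0.0
    # Assumed numerator: unique values over all repeated lookups.
    n_all = len(set().union(*lookups))
    # Assumed denominator: values returned by a single (first) lookup.
    n_single = len(lookups[0])
    return n_all / n_single
```

A domain whose answers never change yields ϕ = 1, while a fast-flux domain that rotates addresses between lookups yields ϕ > 1. The same function applies to ASNs, name servers, name server IPs and name server ASNs.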

(10)

Network Features (NET)

No. Feature Type

1 Redirection count Integer

2 Downloaded bytes from content-length Real

3 Actual downloaded bytes Real

4 Domain lookup time Real

5 Average download speed Real

(11)

Data sets

Benign URLs: DMOZ and Yahoo!

Spam URLs: jwSpamSpy and webspam

Phishing URLs: PhishTank

Malware URLs: DNS-BH

Multi-label: McAfee SiteAdvisor and Web of Trust (WOT)

(12)

Multi-label Data

Label Attribute L_SAd L_WOT L_Both

λ1 spam 6020 6432 5835

λ2 phishing 1119 1067 899

λ3 malware 9478 8664 8105

λ1,2 spam, phishing 4076 4261 3860

λ1,3 spam, malware 2391 2541 2183

λ2,3 phishing, malware 4729 4801 4225

λ1,2,3 spam, phishing, malware 2219 2170 2080

(13)

Results - Detection Accuracy

(14)

Results - Link Popularity Feature Analysis

(15)

Link Classification

Unpopular legitimate link

LPOPs might be ineffective for links of low LPOPs

Malicious URL detection result: accuracy of 91.2%

Popularity manipulated link

LPOPs can be manipulated

Detection result: accuracy of 90.03%

(16)

Error Analysis

False positives

Disreputable URL (LPOP, LEX and DNS errors)

Contentless URL

Brand name URL

Abnormal token URL

False negatives

Hosted by popular social networking sites

(17)

Attack Type Identification Metrics

Assume there is an evaluation data set of multi-label examples (x_i, Y_i)

Micro-averaged and macro-averaged metrics

Ranking-based metrics
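To make the difference between micro- and macro-averaging concrete, the sketch below computes both for precision and recall over sets of true and predicted labels. It uses the standard definitions: micro-averaging pools the per-label counts before dividing, and macro-averaging averages the per-label scores. The slide does not say which metrics the paper reports, and `micro_macro_precision_recall` is a hypothetical helper name.

```python
def micro_macro_precision_recall(true_sets, pred_sets, labels):
    """true_sets[i] is Y_i; pred_sets[i] is the predicted label set for x_i."""
    tp = {label: 0 for label in labels}
    fp = {label: 0 for label in labels}
    fn = {label: 0 for label in labels}
    for truth, pred in zip(true_sets, pred_sets):
        for label in labels:
            if label in pred and label in truth:
                tp[label] += 1
            elif label in pred:
                fp[label] += 1
            elif label in truth:
                fn[label] += 1

    def ratio(num, den):
        return num / den if den else 0.0

    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    return {
        # Micro: pool counts over all labels, then divide.
        "micro_precision": ratio(total_tp, total_tp + total_fp),
        "micro_recall": ratio(total_tp, total_tp + total_fn),
        # Macro: compute per-label scores, then take the unweighted mean.
        "macro_precision": sum(ratio(tp[l], tp[l] + fp[l]) for l in labels) / len(labels),
        "macro_recall": sum(ratio(tp[l], tp[l] + fn[l]) for l in labels) / len(labels),
    }
```

Macro-averaging weights every label equally, so a rare label such as phishing affects it as much as a common one. Micro-averaging is dominated by the frequent labels.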

(18)

Attack Type Identification Results

(19)

Evadability Analysis

Robust against known evasions (redirection, link manipulation, fast-flux hosting)

URL obfuscation

JavaScript obfuscation

Social network site

(20)

Conclusion

A framework for detecting and identifying malicious URLs

Discriminative features

Evadability issue for further improvement

(21)

Reference

H. Choi, B. Zhu, and H. Lee. Detecting malicious web links and identifying their attack types. In USENIX, 2011.
