
In this section, we evaluate the performance of our proxy filtering algorithm and abnormal domain detection in terms of accuracy.

5.2.1 Accuracy of Proxy Filtering

Evaluation Metrics

To evaluate our proxy filtering algorithm, we compute precision, defined as follows.

• Precision : Fraction of correctly detected IP addresses among all IP addresses returned by our approach

In our experiment, our approach returns a list of IP addresses together with their ports.

We then verify each entry with an nmap (Network Mapper) scan. Given an IP address and a port, nmap discovers the services on the scanned host, the software product running each service, the exact version number of that product, and so on. By manually inspecting the software product reported by the scan, we can confirm whether a scanned IP address is a proxy server.

Table 5.2 shows an example of the information returned by the nmap scans during our experiment. Software products such as "AkamaiGHost", "nginx http proxy", "Squid webproxy", and "Sandpiper Footprint http load balancer" are labeled proxy-relevant. If a target IP class is running one of the proxy-relevant software products in the list, the IP class is confirmed as a proxy server.
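The labeling rule above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the experiments (which relied on manual analysis): the function names `is_proxy` and `precision` and the sample scan results are hypothetical, and only the four product names come from the text.

```python
# Product names labeled proxy-relevant in the text above.
PROXY_PRODUCTS = {
    "AkamaiGHost",
    "nginx http proxy",
    "Squid webproxy",
    "Sandpiper Footprint http load balancer",
}


def is_proxy(product):
    """Return True if the reported product is labeled proxy-relevant.

    `product` is the software product string reported by a scan, or
    None when the scan returned no product information. A missing
    product is treated as "not confirmed" (an assumption of this sketch).
    """
    return product is not None and product.strip() in PROXY_PRODUCTS


def precision(products):
    """Fraction of returned entries that are confirmed as proxies."""
    if not products:
        return 0.0
    return sum(is_proxy(p) for p in products) / len(products)


# Hypothetical scan results for four returned (IP, port) entries.
scan_results = ["Squid webproxy", "Apache httpd", None, "AkamaiGHost"]
print(precision(scan_results))
```

Exact string matching is used here for simplicity; real scan output may need looser matching on the product field.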

Results

Figure 5.1 and Figure 5.2 show precision as a function of the length of the returned list (up to 100) for proxy filtering with and without the smoothing factor on the two data sets. We set the smoothing factor (smooth) to 0.1. The precision of filtering with the smoothing factor is nearly twice that of filtering without it, so we conclude that introducing the smoothing factor makes proxy filtering more accurate. This is because, once the smoothing factor is taken into account, proxy scores are no longer dominated by low-frequency adoption of IP classes.

In most cases, smoothed proxy filtering achieves a precision of around 70% on the NCTU dormitory data set (Figure 5.1). During evaluation, some scanning tests failed, which might be owing

Table 5.2: A list of software products collected by performing nmap scanning tests.

| Software Product | Proxy or Not |
|---|---|
| 3Com switch webadmin | No |
| AkamaiGHost | Yes |
| AOLserver httpd | No |
| Apache httpd | No |
| Apache Tomcat/Coyote JSP engine | No |
| Caucho Resin JSP engine | No |
| Google httpd | No |
| hp color LaserJet 4650 | No |
| HP Jetdirect httpd | No |
| HP LaserJet | No |
| IBM HTTP Server | No |
| Icecast http statistics plugin | No |
| Jetty httpd | No |
| lighttpd | No |
| Microsoft IIS webserver | No |
| Microsoft Windows Media Server | No |
| Netscape Enterprise httpd | No |
| Netscreen administrative web server | No |
| nginx http proxy | Yes |
| Sandpiper Footprint http load balancer | Yes |
| Squid webproxy | Yes |
| SunONE WebServer | No |
| Urchin RSS aggregator | No |
| Zeus httpd | No |

Figure 5.1: Precision of proxy filtering on NCTU dormitory data set.

to downtime of the scanned hosts. In that case, even though an IP class behaves like a proxy server in the collected raw traffic logs, it is counted as a false alarm in our experiments, which lowers the measured detection accuracy. Overall, 298 IP classes (around 24.9%) in the collected proxy pool lack software product information.

5.2.2 Accuracy of Mutual Association Discovery

To evaluate the proposed mutual association discovery algorithm, we collect the IP classes of well-known corporations on the Internet. To demonstrate that mutual associations are effective for distinguishing domains, we then examine the mutual associations of IP classes within a company (e.g., Google) and between two different companies. In the intra-company experiment, we collect the IP classes adopted by each company and compute the mutual association of each pair of IP classes belonging to that company. In the inter-company experiment, for each pair of companies, we compute the mutual association of each pair formed by taking one IP class from each company's set of IP classes. In both cases, we collect all computed mutual association scores and plot their distribution.
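The pair enumeration in the two experiments can be sketched as follows. This is an illustration under stated assumptions: the mutual association function is passed in as a parameter `score` because its definition lies outside this section, and the company names, IP classes, and `toy_score` function are hypothetical.

```python
from itertools import combinations, product


def intra_company_scores(company_classes, score):
    """Scores for every unordered pair of IP classes within each company."""
    scores = []
    for classes in company_classes.values():
        for a, b in combinations(sorted(classes), 2):
            scores.append(score(a, b))
    return scores


def inter_company_scores(company_classes, score):
    """Scores for every pair that takes one IP class from each of two companies."""
    scores = []
    for c1, c2 in combinations(sorted(company_classes), 2):
        pairs = product(sorted(company_classes[c1]), sorted(company_classes[c2]))
        for a, b in pairs:
            scores.append(score(a, b))
    return scores


# Hypothetical data: two companies and a toy stand-in for the real score.
company_classes = {
    "CompanyA": {"10.0.1", "10.0.2", "10.0.3"},
    "CompanyB": {"172.16.1", "172.16.2"},
}


def toy_score(a, b):
    # Placeholder only: 1 if the first octets match, else 0.
    return 1 if a.split(".")[0] == b.split(".")[0] else 0


intra = intra_company_scores(company_classes, toy_score)
inter = inter_company_scores(company_classes, toy_score)
print(len(intra), len(inter))
```

With three IP classes in one company and two in the other, this yields four intra-company pairs and six inter-company pairs.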

A public AS (autonomous system) has a globally unique number, an ASN, associated with it. This number is used both in the exchange of exterior routing information (between neighboring ASes) and as an identifier of the AS itself. Accordingly, we take the ASN as the ground truth for evaluating the mutual association algorithm. Detailed statistics of the ground truth are summarized in Table 5.3.

As shown in Table 5.3, only a small fraction of the IP classes under each company appear in our dataset. Hence,

Figure 5.2: Precision of proxy filtering on Trend Inc. data set.

a group of IP classes under an ASN may not share similar characteristics. For example, some may serve as PCs while others serve as web servers. Although the association between IP classes and a company can be learned via the ASN, we still cannot tell which pairs of IP classes are mutually associated and which are not.

Table 5.3: Ground truth statistics.

| Company | #AS | #Prefix | #IP Class | #IP Class Appearing in Our Dataset |
|---|---|---|---|---|
| Google | 5 | 27 | 875 | 127 |
| Facebook | 1 | 4 | 40 | 10 |
| Yahoo | 34 | 177 | 5874 | 406 |

Evaluation Metrics

We examine the differences in mutual association scores by plotting their distributions for IP classes within and across companies and within and across ASNs.

• IntraComp : Distribution of mutual association scores for IP classes from the same company

• InterComp : Distribution of mutual association scores for IP classes from different companies

• IntraASN : Distribution of mutual association scores for IP classes from the same ASN

• InterASN : Distribution of mutual association scores for IP classes from different ASNs
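One way to summarize each of the four distributions is an empirical cumulative distribution. The sketch below assumes that form purely for illustration, since the text does not state the plot type; the function name `empirical_cdf` and the sample scores and thresholds are hypothetical.

```python
def empirical_cdf(scores, thresholds):
    """Fraction of scores less than or equal to each threshold."""
    if not scores:
        return [0.0 for _ in thresholds]
    n = len(scores)
    return [sum(s <= t for s in scores) / n for t in thresholds]


# Hypothetical scores for one group (e.g. IntraComp).
sample_scores = [0, 200, 200, 1000]
cdf = empirical_cdf(sample_scores, thresholds=[0, 200, 1000])
print(cdf)
```

Computing the same curve for IntraComp, InterComp, IntraASN, and InterASN on shared thresholds makes the four groups directly comparable.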

Results

Figure 5.3 and Figure 5.4 compare the distributions of mutual association scores on the Trend and NCTU data sets, respectively. As expected, the mutual associations among IP classes under the same ASN are higher than those among IP classes under the same company. These results support the effectiveness of the proposed mutual associations.

Figure 5.3: Evaluation on mutual association scores (Trend). [Plot axes: Percentage, Mutual Association Score (K); series: IntraComp, InterComp, IntraAS, InterAS.]

Figure 5.4: Evaluation on mutual association scores (NCTU). [Plot axes: Percentage, Mutual Association Score (K); series: IntraComp, InterComp, IntraAS, InterAS.]

5.2.3 Accuracy of Abnormal Domain Detection

To evaluate the proposed framework for abnormal domain detection, we conducted experiments on FFSN detection. We use the domain corpus from the website [17], which includes both benign and suspected domains. The "lookups-benign" corpus contains 34,647 benign domains; the "lookups-ndss-ff" corpus contains 94 suspected fast-flux domains detected by [4]. The "lookups-ndss-ff" corpus serves as the ground truth for evaluating the proposed model. Specifically, we compute bridging scores for all unique domains in the collected corpus based on the constructed profiles, which yields a list of domains ranked by bridging score. We then complete the evaluation of FFSN detection by computing precision and recall for different lengths of the returned list.

Evaluation Metrics

We compute the following metrics to evaluate the effectiveness of FFSN detection.

• Domain-Precision : Fraction of correctly detected domains among all domains detected by our approach.

• Domain-Recall : Fraction of the domains in the "lookups-ndss-ff" corpus that are correctly detected by our approach.
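The two metrics can be computed for every length of the returned list as in the following sketch. It assumes the ranked list contains unique domains ordered by descending bridging score; the function name `precision_recall_at_k` and the example domains are hypothetical.

```python
def precision_recall_at_k(ranked_domains, ground_truth):
    """Return a (precision, recall) pair for each prefix length k = 1..n."""
    truth = set(ground_truth)
    points = []
    hits = 0
    for k, domain in enumerate(ranked_domains, start=1):
        if domain in truth:
            hits += 1
        recall = hits / len(truth) if truth else 0.0
        points.append((hits / k, recall))
    return points


# Hypothetical ranking: three fast-flux domains and one benign domain.
ranked = ["ff1.example", "ff2.example", "benign.example", "ff3.example"]
truth = {"ff1.example", "ff2.example", "ff3.example"}
points = precision_recall_at_k(ranked, truth)
print(points[-1])
```

Plotting the resulting pairs gives a recall-precision curve for the ranking.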

Results

Figure 5.5 shows the quality of the ranking as a recall-precision graph for FFSN detection. We construct two sets of profiles, based on the Trend Inc. and NCTU dormitory data sets respectively. All of the top 89 domains returned by our model are confirmed as FFSN domains, in agreement with [4].

The top 95 returned domains contain one benign domain, "runescape.com", ranked 90th with bs("runescape.com") = 66. Table 5.4 lists the IP classes adopted by "runescape.com". It is regarded as suspect because the mutual association of every pair of its IP classes, learned from the similarity of their neighboring domains, is zero in both data sets, which results in a large bridging score. Note that a zero mutual association indicates that the adjacent domains of the two IP classes are disjoint, which means the two IP classes never co-occur in any traffic log.
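The link between disjoint neighboring domains and a zero score can be illustrated with a set-overlap measure. The mutual association formula itself is not reproduced in this section, so the sketch below uses Jaccard similarity purely as a stand-in; the function name and the domain sets are hypothetical.

```python
def jaccard(neighbors_a, neighbors_b):
    """Jaccard similarity between two sets of neighboring domains.

    Stand-in only: the mutual association score in the text is defined
    elsewhere and may differ from this measure.
    """
    a, b = set(neighbors_a), set(neighbors_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Two IP classes whose neighboring domains never overlap.
class_1_domains = {"a.example", "b.example"}
class_2_domains = {"c.example"}
print(jaccard(class_1_domains, class_2_domains))
```

Any similarity based on shared neighboring domains behaves the same way in this case: with no domain in common, the overlap term is zero and so is the score.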

Table 5.4: The IP classes adopted by "runescape.com".

64.37.71 64.90.181 65.39.250 66.151.43 69.22.158 69.31.109 80.64.4 82.133.85 85.133.44 168.75.179 209.249.24 216.180.254

Figure 5.5: Recall-precision graph of the abnormal domain detection model for FFSN corpus.
