Related work - Applying Knowledge Acquisition and Data Warehousing Techniques to Analyze Network Behavior

2.1:DDoS ontology and classification

Because network attacks are becoming more frequent and more varied, some studies focus on modeling attack behaviors according to features identified by analyzing the attacks. DDoS attacks are classified in [14][15], where the classification criteria are based on attack tools. Network attacks in general are surveyed and discussed in [10]. DDoS attacks are discussed in great detail in [22], which proposes a detailed taxonomy for classifying them.

Although these studies propose many criteria for classifying network attack behaviors, they do not clearly present the relation between the criteria and raw network data, and most of the criteria remain conceptual. Mappings between the criteria and raw data are therefore needed before the criteria can be used to analyze attacks in raw network data. In other words, there is no systematic approach for matching or transforming features of raw data into the attributes defined for classification, so the classification criteria may not be directly usable in the analysis of raw network data. To address this problem, a Knowledge Acquisition for Behavior Model Construction (KABMC) algorithm is proposed in this research. KABMC is used to acquire from experts, and to model, the relation between raw network data and network behaviors. The acquired network behavior model can easily be applied in a data analysis framework such as a data warehouse with OLAP.

2.2:Repertory grid

Among the theories behind knowledge acquisition tools, Repertory Grid is a well-known knowledge acquisition and representation technique based on Kelly’s work on Personal Construct Theory (G. A. Kelly, 1955) [13]. Kelly held that people create their own explanations for the things they experience; these explanations are called constructs. Constructs are then used to anticipate or judge future events. Kelly therefore formulated Personal Construct Psychology, which holds that everyone has many constructs with which to anticipate what will happen in the future. Repertory Grid is a tool for eliciting the constructs in a person’s mind.

The Repertory Grid is a matrix in which the rows represent the constructs found, the columns represent the elements, and each cell holds a number indicating the position of an element within a construct. Suppose we want to build a Repertory Grid for a patient. The psychotherapist would first ask the patient to select about seven elements, whose nature depends on what the patient or therapist is trying to discover, for instance “two specific friends, two work-mates, two people you dislike, your mother and yourself”, or something of that sort. Three of the elements would then be selected at random, and the therapist would ask:

"In relation to… (whatever is of interest), in which way are two of these people alike but different from the third?" The answer is sure to indicate one of the extreme points of one of the patient’s constructs. The patient might say, for instance, that Fred and Sarah are very communicative whereas John is not. Further questioning would reveal the other end of the construct and the positions of the three people between the extremes.

Repeating the procedure with different sets of three elements ends up revealing several constructs that the patient might not have been fully aware of. Repertory Grid can also be used to acquire domain knowledge from experts in many domains. In short, knowledge acquisition using Repertory Grid consists of asking experts to rate each object. However, Repertory Grid only elicits the constructs for the elements already selected; adding new elements is not considered in the traditional Repertory Grid. The idea of incremental update therefore does not appear in the traditional Repertory Grid.
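As a minimal sketch of the grid structure described above, the matrix can be held as a mapping from constructs (rows) to per-element ratings (columns). The names Fred, Sarah, and John come from the example in the text; the 1-to-5 rating scale, the pole label "reserved", the concrete ratings, and the helper `odd_one_out` are assumptions made purely for illustration.

```python
from itertools import combinations

# Elements (columns), taken from the example in the text.
ELEMENTS = ["Fred", "Sarah", "John"]

# Constructs (rows). Each construct is a pair of poles with one rating per element.
# Assumed scale: 1 = first pole, 5 = second pole. The ratings are made up.
GRID = {
    ("communicative", "reserved"): {"Fred": 1, "Sarah": 2, "John": 5},
}

def odd_one_out(ratings, triad):
    """Return the element of the triad whose rating is farthest from the other two."""
    def distance_to_others(name):
        return sum(abs(ratings[name] - ratings[other]) for other in triad if other != name)
    return max(triad, key=distance_to_others)

for construct, ratings in GRID.items():
    for triad in combinations(ELEMENTS, 3):
        print(construct, "->", odd_one_out(ratings, triad))
```

With these ratings the odd element is John, mirroring the answer that Fred and Sarah are alike and John differs.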

A psychological theory is also applied in our research: the self-regulation concept of Piaget’s Cognitive Development Theory is applied in the knowledge acquisition process. Piaget held that people enhance their knowledge through self-regulation, which consists of two processes called assimilation and accommodation. Piaget’s theory is well known and fundamental in cognitive psychology. It states that human cognitive development is based on a system of schemas, where a schema is a unit of the human cognitive system. A person’s cognitive system is formed after birth through interaction with the surrounding world. Assimilation involves putting information into an existing schema without changing the schema. Accommodation is the process of changing an existing schema so that new, incompatible information fits our understanding. In accommodation, our understanding or problem-solving ability is improved.

Comparing Repertory Grid with the knowledge acquisition process based on self-regulation in this thesis reveals several differences. To model network behaviors, features must be modeled precisely so that machines can identify the behaviors automatically and easily. Repertory Grid is not suitable for this purpose because its attribute values are ratings that represent degrees of difference. For example, consider an attribute named “port”, a common attribute for modeling the service type of a network behavior. In Repertory Grid, the two port values 21 and 25 might be treated as degrees between “port opened” and “port closed”. This does not make sense, because the specific port values 21 and 25 indicate entirely different services, FTP and SMTP, respectively.

Hence, in order to model network behaviors for network analysis, the attribute values used in our knowledge acquisition algorithm are specific values. In addition, the two techniques differ in their original purpose: Repertory Grid is used to elicit the constructs in experts’ minds, whereas self-regulation is used in knowledge development.
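To illustrate why specific values matter, the following hypothetical sketch describes each behavior by exact attribute values and matches a raw record against those descriptions. The port numbers 21 and 25 and their services come from the text; the record layout, the `src` field, and the function name `identify` are assumptions and are not part of the proposed algorithm.

```python
# Each behavior is described by exact attribute values, not by ratings.
BEHAVIORS = {
    "FTP": {"port": 21},
    "SMTP": {"port": 25},
}

def identify(record):
    """Return the behaviors whose every attribute value equals the record's value."""
    return [
        name
        for name, attributes in BEHAVIORS.items()
        if all(record.get(key) == value for key, value in attributes.items())
    ]

# A made-up raw record; only the "port" field is used by the descriptions above.
print(identify({"src": "10.0.0.5", "port": 25}))  # ['SMTP']
```

Because the comparison is an equality test, port 21 and port 25 are never placed on a common scale; a record either matches a behavior or it does not.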

Furthermore, knowledge development by self-regulation is an incremental update approach, whereas the idea of incremental update does not appear in the traditional Repertory Grid, which only elicits the constructs for the selected elements and does not consider the addition of new elements. Because Repertory Grid is well known and has been applied in many domains, variants exist that can perform incremental updates. However, when a new element is added to the Repertory Grid, a new attribute may have to be added to distinguish ambiguous elements, and experts then need to rate all elements on the added attribute. In our knowledge acquisition algorithm, only the two ambiguous elements need to be distinguished by adding new attribute values, because the same attribute may be unsuitable or unnecessary for differentiating the other elements.

In terms of tool design, Repertory Grid is more sophisticated than our knowledge acquisition tool. For modeling network behaviors, however, attributes with specific values are better suited to identifying the features of each network behavior. In addition, incremental update is needed because many attack behaviors must be monitored and new attack behaviors appear frequently. By applying the self-regulation concepts of assimilation and accommodation, the knowledge maintained by our knowledge acquisition tool can easily be updated incrementally.
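The incremental update idea can be sketched roughly as follows. This is an illustrative reading of assimilation and accommodation, not the KABMC algorithm itself: the function `add_behavior`, the `ask_expert` callback, the attribute `request_rate`, and the sample behaviors are all hypothetical. A new behavior that is distinguishable from every existing one is simply added (assimilation); if it is ambiguous with an existing behavior, only those two entries receive a new attribute value (accommodation).

```python
def add_behavior(model, name, attributes, ask_expert):
    """Add one behavior to the model and report which process was used.

    model      : dict mapping behavior name -> dict of specific attribute values
    ask_expert : callable(new_name, old_name) -> (attribute, new_value, old_value),
                 a stand-in for the question put to a human expert
    """
    # An existing behavior with identical attribute values is ambiguous with the new one.
    clash = next((other for other, attrs in model.items() if attrs == attributes), None)
    model[name] = dict(attributes)
    if clash is None:
        return "assimilation"  # fits the existing model without changing it
    # Accommodation: extend only the two ambiguous behaviors.
    attribute, new_value, old_value = ask_expert(name, clash)
    model[name][attribute] = new_value
    model[clash][attribute] = old_value
    return "accommodation"

def scripted_expert(new_name, old_name):
    # Made-up answer standing in for a human expert.
    return ("request_rate", "high", "normal")

model = {"FTP": {"port": 21}, "web access": {"port": 80}}
print(add_behavior(model, "SMTP", {"port": 25}, scripted_expert))        # assimilation
print(add_behavior(model, "HTTP flood", {"port": 80}, scripted_expert))  # accommodation
print(model["FTP"])  # {'port': 21}
```

Note that the FTP and SMTP entries are left untouched by the second call, which is the contrast with re-rating every element on a newly added attribute.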

2.3:Traditional analysis approaches for network intrusion

As the cost of information processing and Internet access falls, more and more organizations are becoming vulnerable to a wide variety of cyber threats. According to a recent survey by CERT/CC (Computer Emergency Response Team/Coordination Center), the rate of cyber attacks has more than doubled every year in recent times. It has become increasingly important to make our information systems, especially those used for critical functions in the military and commercial sectors, resistant to and tolerant of such attacks.

Intrusion detection includes identifying a set of malicious actions that compromise the integrity, confidentiality, and availability of information resources.

Traditional methods for intrusion detection are based on extensive knowledge of signatures of known attacks, where monitored events are matched against the signatures to detect intrusions. These methods extract features from various audit streams, and detect intrusions by comparing the feature values to a set of attack signatures provided by human experts. The signature database has to be manually revised for each new type of intrusion that is discovered. A significant limitation of signature-based methods is that it is hard to detect emerging cyber threats, since by their very nature these threats may be launched using previously unknown attacks.
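The matching step described above can be sketched as follows. The signature fields, the event layout, and the sample signature `sig-001` are all hypothetical; real systems use far richer rule languages.

```python
# Hypothetical signature database: each signature lists feature values
# that must all appear in a monitored event.
SIGNATURES = [
    {
        "name": "sig-001",
        "features": {"protocol": "tcp", "dst_port": 21, "payload_tag": "known-exploit"},
    },
]

def detect(event, signatures=SIGNATURES):
    """Return the names of all signatures whose features all match the event."""
    return [
        sig["name"]
        for sig in signatures
        if all(event.get(key) == value for key, value in sig["features"].items())
    ]

known = {"protocol": "tcp", "dst_port": 21, "payload_tag": "known-exploit"}
novel = {"protocol": "tcp", "dst_port": 21, "payload_tag": "never-seen-before"}
print(detect(known))  # ['sig-001']
print(detect(novel))  # []
```

The second call returns an empty list, which illustrates the limitation: an event that differs from every stored signature goes undetected until an expert adds a new signature by hand.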

These limitations have led to an increasing interest in intrusion detection techniques based upon data mining.

Previous researchers have developed systematic approaches to analyzing network traffic [1], [8], [20], [23], but the format of the network traffic is usually predefined and hard to change. Continuous Query systems [12], [26] share many of the concerns of acquiring and filtering continuous streams of data from the database field, but do not have the ability to easily add new functions over that data.

2.4:Using OLAP for log analysis

OLAP can organize and present data in various formats in order to accommodate the diverse needs of different analysis approaches. An OLAP server provides several operations for analyzing a multidimensional data cube:

(1) Roll-up: The roll-up operation collapses the dimension hierarchy along a particular dimension(s) so as to present the remaining dimensions at a coarser level of granularity.

(2) Drill-down: In contrast, the drill-down function allows users to obtain a more detailed view of a given dimension.

(3) Slice: Here, the objective is to extract a slice of the original cube corresponding to a single value of a given dimension. No aggregation is required with this option. Instead, the server allows the user to focus on the desired values.

(4) Dice: A related operation is the dice. In this case, users can define a sub-cube of the original space. In other words, by specifying value ranges on one or more dimensions, the user can highlight meaningful blocks of aggregated data.

(5) Pivot: The pivot is a simple but effective operation that rotates the axes of the cube, allowing OLAP users to view cube values in more natural and intuitive ways.
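The five operations listed above can be sketched on a tiny in-memory fact table. This is a plain-Python illustration rather than the interface of any OLAP server; the dimensions (`day`, `hour`, `service`), the measure `bytes`, the sample rows, and all function names are assumptions.

```python
from collections import defaultdict

# Fact table: each row holds dimension values and one measure ("bytes").
# All values are made up for illustration.
FACTS = [
    {"day": "Mon", "hour": 9, "service": "FTP", "bytes": 100},
    {"day": "Mon", "hour": 10, "service": "SMTP", "bytes": 50},
    {"day": "Tue", "hour": 9, "service": "FTP", "bytes": 70},
    {"day": "Tue", "hour": 9, "service": "SMTP", "bytes": 30},
]

def aggregate(facts, dims, measure="bytes"):
    """Sum the measure grouped by the given dimensions."""
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)

def slice_cube(facts, dim, value):
    """Slice: keep the rows with a single value on one dimension."""
    return [row for row in facts if row[dim] == value]

def dice_cube(facts, allowed):
    """Dice: keep the rows whose value is allowed on every listed dimension."""
    return [row for row in facts if all(row[d] in vals for d, vals in allowed.items())]

def pivot(table):
    """Pivot: swap the two axes of a two-dimensional aggregate."""
    return {(b, a): total for (a, b), total in table.items()}

detailed = aggregate(FACTS, ["day", "hour", "service"])  # drill-down: finest view
by_day_service = aggregate(FACTS, ["day", "service"])    # roll-up: hours collapsed
by_day = aggregate(FACTS, ["day"])                       # roll-up: services collapsed too
print(by_day)  # {('Mon',): 150, ('Tue',): 100}

ftp_only = slice_cube(FACTS, "service", "FTP")
monday_morning = dice_cube(FACTS, {"day": {"Mon"}, "hour": {9, 10}})
by_service_day = pivot(by_day_service)
```

Here roll-up and drill-down are the same grouping function applied with fewer or more dimensions, which reflects that the two operations move in opposite directions along the same hierarchy.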

A specific implementation of OLAP (On-Line Analytical Processing) technology for log analysis is discussed in [17]. The OLAP architecture is flexible in analyzing data; however, only a single data source is used in this architecture. The data source is limited to the Windows NT system log, and the concept hierarchies are predefined. The diversity of data sources and the quality of the concept hierarchies affect the analysis capability.

A Network Intrusion Monitoring System architecture based on OLAP is proposed in [27] to integrate multiple network traffic data sources. Various systematic analysis approaches can be applied through the OLAP server using operations such as drill-down, roll-up, and slice, and OLAP Mining (OLAM) is then used to increase the diversity of network analysis results. Through the Network Intrusion Monitoring System (NIMS), multiple data sources can be integrated to increase the diversity of analysis approaches. The integrated data can be analyzed along different dimensions and at different concept levels to obtain more information.

Because the analysis process is carried out manually by administrators, the analysis results depend heavily on the administrators’ experience. If domain knowledge could be embedded in the framework to assist the analysis process, the administrators’ effort could be reduced. Hence, in this thesis, knowledge of network behaviors is first extracted and added to the original NIMS to support the analysis of suspicious network behaviors. This also reduces the effort required of junior administrators.

Chapter 3: The Framework of Network Monitoring and
