
Chapter 3 Methodology


detection algorithm is getting slower. Improving it with parallel computation alone is still inadequate according to our experimental results. We need both parallel and distributed computation to speed up the outlier detection algorithm.

3.2 Parallel Computation

The most important consideration in developing parallel and distributed computation for our outlier detection algorithm is to avoid concurrent access to the same variable. Therefore, we apply parallelization only to the parts whose results are not influenced by the order in which the data are processed. We implement the algorithm in Scala. Scala's parallel collections facilitate parallel programming, so we can speed up the algorithm in a simple way.
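As a minimal sketch of this approach (the data and identifiers below are hypothetical and only illustrate the mechanism, not our actual implementation), an ordinary Scala collection becomes a parallel one by calling `.par`, after which bulk operations such as `map` and `sum` are executed by multiple threads:

```scala
// Minimal sketch of Scala parallel collections (hypothetical data, not our real input).
// On Scala 2.13+ this needs the scala-parallel-collections module and
// `import scala.collection.parallel.CollectionConverters._`; on earlier versions
// .par is part of the standard library.
object ParallelCollectionDemo {
  def main(args: Array[String]): Unit = {
    val inputs: Vector[Double] = Vector.tabulate(1000000)(_.toDouble)

    // .par creates a parallel view; map is then executed by several threads,
    // and the result does not depend on the order in which elements are processed.
    val squared = inputs.par.map(x => x * x)

    println(squared.sum)
  }
}
```

Because each element is transformed independently, the result is identical to the sequential version; this is exactly the property we rely on when deciding which parts of the algorithm to parallelize.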

The most time-consuming part of our outlier detection algorithm is BPNN, so we improve BPNN first. The parallel backpropagation algorithm and its notation definitions are presented in Table 2 and Table 1, respectively. The purpose of BPNN is to learn an optimal SLFN with minimum error. The SLFN is defined in equations (1) and (2), and the error is defined in equation (3).

$$a_i(x, w^H) \equiv \tanh\left(w^H_{i0} + \sum_{j=1}^{m} w^H_{ij}\, x_j\right)$$
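For readability, we also restate the output function and the error in a form consistent with the notation of Table 1; this restatement is our assumption about equations (2) and (3), whose authoritative definitions appear earlier (the exact scaling constant of the error may differ there):

$$g(a^c, w^O_l) \equiv w^O_{l0} + \sum_{i=1}^{h} w^O_{li}\, a_i(x^c, w^H), \qquad E(w) \equiv \sum_{c=1}^{N} \sum_{l=1}^{q} \left(y^c_l - g(a^c, w^O_l)\right)^2 .$$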


Notation : Definition

$\tanh(x) \equiv \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ : Hyperbolic tangent.
$T$ : The transposition.
$N$ : Total number of input behaviors.
$x^{c} \equiv (x^{c}_{1}, x^{c}_{2}, \ldots, x^{c}_{m})^{T}$ : Explanatory variables of the $c$-th input behavior.
$y^{c}$ : The vector of desired outputs corresponding to the $c$-th input behavior with explanatory variables $x^{c}$.
$m$ : The number of explanatory variables $x_{j}$'s.
$H$ : The hidden layer.
$h$ : The number of adopted hidden nodes.
$w^{H}_{i0}$ : The threshold of the $i$-th hidden node.
$w^{H}_{ij}$ : The weight between the $j$-th explanatory variable $x_{j}$ and the $i$-th hidden node.
$O$ : The output layer.
$q$ : The number of adopted output nodes.
$w^{O}_{l0}$ : The threshold of the $l$-th output node.
$w^{O}_{li}$ : The weight between the $i$-th hidden node and the $l$-th output node.

Table 1. Notation definitions

1. Via a pipeline with p1 threads (or VMs), execute the following forward operation for each training datum in parallel.

1.1 Via a pipeline with p1.1 threads (or VMs), calculate and record $a_i(x^c, w^H)\ \forall i$, and then calculate and record $g(a^c, w^O_l)\ \forall l$.

1.2 Use a pipeline with p1.2 threads (or VMs) to calculate the net input value of each (hidden or output) node.

2. Via a pipeline with p2 threads (or VMs) and based upon $g(a^c, w^O_l)\ \forall l$, calculate and record $E(w)$.

3. If $E(w)$ is less than the predetermined value (say, $\varepsilon_1$), then STOP.

4. Via a pipeline with p4 threads (or VMs), execute the following backward operation regarding all training data in parallel.

4.1 Via a pipeline with p4.1 threads (or VMs) and based upon the values of $a_i(x^c, w^H)\ \forall i$ and $g(a^c, w^O_l)\ \forall l$:

4.1.1 Calculate and record the value of $\partial E(w)/\partial w^O_{l0}$ in terms of each training datum.

4.1.2 Calculate and record the value of $\partial E(w)/\partial w^O_{li}$ in terms of each training datum.

4.1.3 Calculate and record the value of $\partial E(w)/\partial w^H_{i0}$ in terms of each training datum. Use a pipeline with p4.1.3 threads (or VMs) to calculate the summation.

4.1.4 Calculate and record the value of $\partial E(w)/\partial w^H_{ij}$ in terms of each training datum. Use a pipeline with p4.1.4 threads (or VMs) to calculate the summation.

5. Via a pipeline with p5 threads (or VMs) and based upon the values obtained from Step 4.1, calculate and record the partial derivatives of $E(w)$ with respect to all thresholds and weights.

6. Via a pipeline with p6 threads (or VMs) and based upon the values obtained from Step 5, calculate the magnitude of the gradient vector. If the magnitude of the gradient vector is less than the predetermined value (say, $\varepsilon_2$), then STOP; otherwise, calculate and record the normalized values of the partial derivatives of $E(w)$ with respect to all thresholds and weights.

7. Via a pipeline with p7 threads (or VMs) and the current $\eta$, update the thresholds and weights.

8. Via a pipeline with p8 threads (or VMs), execute the following forward operation for each training datum in parallel.

8.1 Via a pipeline with p8.1 threads (or VMs), calculate and record $a_i(x^c, new\_w^H)\ \forall i$ and then calculate and record $g(a^c, new\_w^O_l)\ \forall l$.

8.2 Use a pipeline with p8.2 threads (or VMs) to calculate the net input value of each (hidden or output) node.

9. Via a pipeline with p9 threads (or VMs) and based upon $g(a^c, new\_w^O_l)\ \forall l$, calculate and record $new\_E(w)$. If $new\_E(w)$ is less than $E(w)$, then $\eta \leftarrow \eta \times 1.1$, $w^H_{i0} \leftarrow new\_w^H_{i0}$, $w^H_{ij} \leftarrow new\_w^H_{ij}$, $w^O_{l0} \leftarrow new\_w^O_{l0}$, $w^O_{li} \leftarrow new\_w^O_{li}$, $E(w) \leftarrow new\_E(w)$, and GOTO Step 3; otherwise, $\eta \leftarrow \eta \times 0.8$ and GOTO Step 7.

Table 2. The parallel backpropagation algorithm

As we mentioned in the previous chapter, BPNN can be divided into two phases: forward and backward. Each forward phase involves three steps (steps 1 to 3). In step 1, BPNN forwards each input datum and calculates the activations of all hidden nodes and output nodes. In step 2, BPNN calculates $E(w)$ by comparing the desired output and the actual output of all output nodes. Finally, if $E(w)$ is less than the predetermined value, BPNN stops; otherwise, BPNN goes on to the backward phase. Each backward phase involves steps 4 to 9. In steps 4 and 5, BPNN calculates and records the gradient vector. BPNN then calculates the magnitude of the gradient vector based upon the values obtained from step 5. If the magnitude of the gradient vector is less than the predetermined value $\varepsilon_2$, then BPNN stops; otherwise, BPNN calculates and records the normalized values of the partial derivatives of $E(w)$ obtained from step 5.
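As a sketch of how step 2 maps onto Scala parallel collections (the names below, such as `TrainingCase` and `forward`, are hypothetical placeholders rather than our actual code), each training datum contributes its squared error independently, so the per-datum terms can be computed in parallel and then summed:

```scala
// Sketch of step 2: compute E(w) over all training data in parallel.
// TrainingCase and forward are hypothetical placeholders.
case class TrainingCase(x: Vector[Double], y: Vector[Double])

def squaredError(actual: Vector[Double], desired: Vector[Double]): Double =
  actual.zip(desired).map { case (g, y) => (y - g) * (y - g) }.sum

def errorE(data: Vector[TrainingCase],
           forward: Vector[Double] => Vector[Double]): Double =
  data.par                                      // per-datum terms are independent
    .map(c => squaredError(forward(c.x), c.y))  // squared error of one datum
    .sum                                        // parallel reduction of the partial sums
```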

After completing the calculations of the partial derivatives and the magnitude of the gradient vector, BPNN updates the thresholds and weights in step 7. In step 8, BPNN forwards each input datum again and calculates $new\_E(w)$ based upon the activations of all hidden nodes and output nodes. If $new\_E(w)$ is larger than $E(w)$, BPNN reduces the learning rate $\eta$ and goes back to step 7; otherwise, it enlarges $\eta$ and goes back to step 3.
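The learning-rate adaptation of steps 7 to 9 can be sketched as follows. This is a simplified outline under our own naming (the `Weights` type and the `gradientStep` and `errorE` helpers are hypothetical placeholders); the actual implementation updates all thresholds and weights of the SLFN:

```scala
// Sketch of steps 7-9: tentative weight update with an adaptive learning rate.
// Weights, gradientStep and errorE are hypothetical placeholders, not the real SLFN code.
final case class Weights(hidden: Vector[Vector[Double]], output: Vector[Vector[Double]])

def adaptiveStep(w: Weights, currentE: Double, eta0: Double,
                 gradientStep: (Weights, Double) => Weights, // step 7: move along the normalized gradient
                 errorE: Weights => Double                   // steps 8-9: forward pass and new error
                ): (Weights, Double, Double) = {
  var eta = eta0
  var accepted: Option[(Weights, Double, Double)] = None
  while (accepted.isEmpty) {
    val newW = gradientStep(w, eta)
    val newE = errorE(newW)
    if (newE < currentE) accepted = Some((newW, newE, eta * 1.1)) // accept and enlarge eta
    else eta = eta * 0.8                                          // reject, shrink eta, retry step 7
  }
  accepted.get // step 6's gradient-magnitude check keeps the gradient from vanishing here
}
```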

Since BPNN needs $a_i(x^c, w^H)$ to calculate $g(a^c, w^O_l)$, we can only apply parallelization to the calculation of $a_i(x^c, w^H)$ and of $g(a^c, w^O_l)$ separately. All input data and all weights of the neurons are placed in parallel collections, so that the calculations of $a_i(x^c, w^H)$ and $g(a^c, w^O_l)$ are each executed in parallel. Each layer of the whole neural network is also placed in a parallel collection; therefore, the calculations in step 4 are executed in parallel as well.
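The sketch below illustrates this forward pass for one input behavior (the parameter names and weight layout are hypothetical placeholders, not our actual data structures): the hidden activations $a_i$ are computed in parallel across the hidden nodes, and only afterwards are the output values $g$ computed in parallel across the output nodes, reflecting the dependency just described.

```scala
import scala.math.tanh

// Sketch of the forward pass for one input behavior x.
// hiddenW(i) = (threshold w^H_{i0}, weights w^H_{ij});
// outputW(l) = (threshold w^O_{l0}, weights w^O_{li}).
// The names and the weight layout are illustrative placeholders.
def forward(x: Vector[Double],
            hiddenW: Vector[(Double, Vector[Double])],
            outputW: Vector[(Double, Vector[Double])]): Vector[Double] = {
  // a_i(x, w^H): hidden nodes are mutually independent, so compute them in parallel.
  val a: Vector[Double] = hiddenW.par.map { case (w0, wi) =>
    tanh(w0 + wi.zip(x).map { case (w, xj) => w * xj }.sum)
  }.toVector

  // g(a, w^O_l): needs all a_i first, then the output nodes in parallel.
  outputW.par.map { case (w0, wl) =>
    w0 + wl.zip(a).map { case (w, ai) => w * ai }.sum
  }.toVector
}
```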

After improving the performance of our backpropagation algorithm, we then ameliorate the other parts of the outlier detection algorithm. Another time-consuming part is adding extra hidden nodes; this algorithm is shown in Table 3.

1. Via a pipeline with p1 threads (or VMs), execute the following operation to determine $\alpha$ in parallel.

1.1 Set $\beta_1 \leftarrow 1$ and let $k \leftarrow 2$.

1.2 Via a pipeline with p1.2 threads (or VMs), check $\sum_{j=1}^{k}\beta_j\,(x^c_j - x^{\tilde c}_j) \neq 0\ \ \forall\, c \in I(n-\kappa) - \tilde{C}$, where $\beta_j$, $j = 1, \ldots, k-1$, are already fixed, and record $\beta_k$, the smallest integer greater than or equal to 1 that satisfies the condition.

1.3 $k \leftarrow k + 1$. If $k \leq m$, GOTO Step 1.2.

3. Determine the weights and thresholds of the new hidden nodes in different VMs.

3.1 Let two new hidden nodes $h+1$ and $h+2$ with $w^H_{h+1,0} \leftarrow \zeta - \lambda\,\alpha^T x^c$, …

3.2 Apply the parallel backpropagation algorithm to adjust $w$ (the thresholds and weights) of the SLFN.

3.3 If an acceptable result is obtained, record $\lambda$.

4. Add the two new hidden nodes $h+1$ and $h+2$ to the existing SLFN with the minimum $\lambda$, set $h \leftarrow h + 2$, and STOP.

Table 3. Adding extra hidden nodes (part of the outlier detection algorithm)

Both $\alpha$ and $\lambda$ are important parameters for determining the weights and thresholds of the newly added hidden nodes. Through step 1, we determine an $m$-vector $\alpha$ of length one. In step 3, we find an appropriate positive $\lambda$ in parallel and use BPNN to retrain the SLFN so that the squared residuals of the input behaviors are less than the envelope width. The formula in step 3.1 for determining the new weights and thresholds comes from [35], in which Tsaih and Cheng give an ample demonstration.

Determining the weights and thresholds of the new hidden nodes is also very time-consuming. We improve it with parallel programming and focus on speeding up the determination of $\alpha$ and $\lambda$ for the newly added hidden nodes. As mentioned before, all input data are placed in a parallel collection, so the calculation in step 1.2 is executed in parallel until the smallest integer $\beta_k$ that is greater than or equal to 1 is found. As for step 3.2, we have already described how to improve BPNN.
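As an illustration of how step 1.2 can exploit a parallel collection (a sketch under the assumption that the non-zero condition is checked against every remaining input behavior; `diffs`, `beta`, and `findBetaK` are hypothetical names, not our actual code):

```scala
// Sketch of step 1.2: find the smallest integer beta_k >= 1 such that the
// linear combination is non-zero for every remaining input behavior.
// diffs(c)(j) is assumed to hold the j-th coordinate difference for behavior c.
def findBetaK(diffs: Vector[Vector[Double]], beta: Vector[Double], k: Int): Int = {
  def violates(candidate: Int): Boolean =
    diffs.par.exists { d =>          // parallel check over all remaining behaviors
      val s = (0 until k - 1).map(j => beta(j) * d(j)).sum + candidate * d(k - 1)
      s == 0.0                       // the condition requires this sum to be non-zero
    }
  Iterator.from(1).find(c => !violates(c)).get  // smallest integer >= 1 that works
}
```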

In summary, there are several points to note about applying the parallel collections provided by Scala. First, creating a parallel collection takes some time, so we should not overuse them and should create only the necessary ones. Second, no calculation performed on a parallel collection may depend on the order in which the data are processed. Finally, we create all required parallel collections at the beginning of the program and do not change them afterwards. In short, parallel collections must not be used indiscriminately; they must be used with care. Accordingly, we apply parallel collections to the input data, the neurons of each layer, and the weights of each neuron. This simple approach lets us speed up our program easily. According to our experiments, however, although the parallel outlier detection algorithm outperforms the sequential one, it is still inadequate. Developing the distributed outlier detection algorithm is therefore imperative.
