Performance Measurement - Clustered Architecture for High-Speed IPsec Gateway

Because many single machines and products can already reach the wire speed of Fast Ethernet, and because migration to high-speed environments is the trend for the next few years, testing cluster technologies in a 100 Mbps environment is no longer meaningful. Consequently, we chose Gigabit Ethernet as our testing environment [17] [18] [19]. Although work at Princeton University has shown that compression improves performance when encryption is employed in a high-speed environment [20], our main purpose is to test and demonstrate scalability, so IPComp [21] is not used in our tests.

The test bed consists of six machines, each with one AMD XP1800+ CPU, 256 MB RAM, two Intel PRO/1000 XT Gigabit NICs on a 66 MHz/64-bit PCI bus, and one SafeNet SafeXcel 140-PCI encryption acceleration card. One 3Com SuperStack 3 4900 12-port Gigabit switch with one 4-port 1000BASE-TX module is configured as 2 or 4 VLANs for testing.

The operating system of these machines is Red Hat Linux 7.2 with the 2.4.18 kernel.

FreeS/WAN 1.97 is used as the IPsec implementation on this clustered IPsec gateway.

A Spirent Communications SmartBits-200 with two GX-1420B Gigabit modules acts as the traffic analyzer.

4.1 Fixed-Size Traffic

The larger the packets, the less overhead the IP stack causes, so the overhead differs with the size of the packets fed in. We therefore first generate a set of fixed-size streams from the SmartBits-200 to test our IPsec gateway in this ideal testing environment. Because we have only six machines for testing, three of them must act as the sender-side gateway and the other three as the receiver-side gateway.

After testing with one to three nodes, we found that performance was bounded by the sender side. This indicates that performance is bounded by encryption operations; decryption can always keep up with encryption. We therefore changed the setup so that all six machines act as the sender-side IPsec gateway.

We measure performance after the nodes encrypt the packets and deliver them to the router.

Table 4-1 shows the performance results using a simple round-robin dispatching scheme.

We obtained these results by setting the SmartBits-200 to send timed-burst packets at a specified Databits/sec rate for 30 seconds. If no packet is lost, we assume our IPsec gateway can handle that traffic. Performance is reported as the Databits/sec value configured in SmartWindows, after subtracting the 14-byte Ethernet header.
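The accounting described above can be sketched as follows. This is only an illustration: the function names, and the assumption that the configured frame size includes the 14-byte Ethernet header, are ours and are not stated by the measurement setup.

```python
ETHERNET_HEADER_BYTES = 14  # header size subtracted in the text


def net_databits_per_sec(configured_bps: float, frame_bytes: int) -> float:
    """Scale a configured Databits/sec rate down to the share that remains
    after removing the Ethernet header from every frame.

    Assumption (hypothetical): frame_bytes counts the Ethernet header.
    """
    if frame_bytes <= ETHERNET_HEADER_BYTES:
        raise ValueError("frame must be larger than the Ethernet header")
    payload_fraction = (frame_bytes - ETHERNET_HEADER_BYTES) / frame_bytes
    return configured_bps * payload_fraction


def run_is_sustained(frames_sent: int, frames_received: int) -> bool:
    """A 30-second run counts as sustained only if no frame was lost."""
    return frames_received == frames_sent


if __name__ == "__main__":
    # Example values chosen for illustration, not taken from the tables.
    print(net_databits_per_sec(100e6, 1000))
    print(run_is_sustained(1_000_000, 1_000_000))
```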

Table 4-1. Throughput for fixed-size traffic (modes: ESP/3DES-MD5, ESP/DES-MD5, AH/MD5)

Figure 4-1. Throughput for fixed-size traffic

We also compare our results with ideal linear scale-up in Figure 4-2. It shows that the performance of this flat architecture scales almost linearly. We take the 1446 bytes/packet 3DES/MD5 test data for further evaluation using the proposed overhead estimating formula.

Table 4-2 shows the results of this evaluation.

Figure 4-2. Comparison with ideal linear scale-up (performance for fixed-size frames, in frames/sec; series: total, per node, ideal linear)

Table 4-2. Overhead evaluation for 1446 bytes/packet, ESP/3DES-MD5 mode

Assuming R(1446) takes 1 unit of time and solving the formula for different numbers of cluster nodes, we obtain an average P of 7665.3 packets/sec (standard deviation = 8.57) and an average Q(1446) of 0.0172 packets (standard deviation = 0.0011). This means that, without receiving packets and updating the SA database, one machine can encapsulate and transmit 7665.3 packets per second on average. As the number of incoming packets increases, the machine spends more of its time handling them, which costs about 0.0172 packets of throughput per incoming packet.

Similarly, if we assume R(512) takes 1 unit of time, the average P is 12018.3 packets/sec (standard deviation = 6.97) and the average Q(512) is 0.0134 packets (standard deviation = 0.0006); if R(1024) takes 1 unit of time, the average P is 9187.8 packets/sec (standard deviation = 4.02) and the average Q(1024) is 0.0252 packets (standard deviation = 0.0004).

Because the standard deviations of P and Q are so small, the results support the proposed overhead estimating formula. This flat cluster architecture achieves high performance, and its performance scales nearly linearly as cluster nodes are added. By our computation, when the number of cluster nodes is larger than 14, the cluster could handle 1000.3 Mbps, more than the wire speed of Gigabit Ethernet.
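The overhead estimating formula itself is not reproduced in this section. One reading that is consistent with the reported numbers, offered here purely as an assumption, is that each node loses Q packets/sec of throughput for every packet entering the cluster, so that C_N = P - Q * N * C_N, that is, C_N = P / (1 + N * Q). The sketch below evaluates this assumed model with the reported P and Q for 1446-byte packets and compares it with the measured frames/sec of Table 4-2. The function names are hypothetical.

```python
P_1446 = 7665.3   # packets/sec per node without incoming-packet overhead (from the text)
Q_1446 = 0.0172   # throughput lost per incoming packet (from the text)

# Measured total frames/sec for 1..6 nodes (Table 4-2).
MEASURED_TOTAL = {1: 7535.6, 2: 14817.4, 3: 21914.4,
                  4: 28801.8, 5: 35112.4, 6: 41500.7}


def per_node_rate(n: int, p: float = P_1446, q: float = Q_1446) -> float:
    """Assumed model: C_n = P - Q * n * C_n, solved for C_n."""
    return p / (1.0 + n * q)


def total_mbps(n: int, packet_bytes: int = 1446) -> float:
    """Cluster throughput in Mbps, counting all packet_bytes of each packet
    (an assumption about how the Mbps figure is derived)."""
    return n * per_node_rate(n) * packet_bytes * 8 / 1e6


if __name__ == "__main__":
    for n, measured in MEASURED_TOTAL.items():
        predicted = n * per_node_rate(n)
        error = (predicted - measured) / measured * 100
        print(f"N={n}: measured {measured:8.1f}  model {predicted:8.1f}  ({error:+.2f} %)")
    print(f"N=14: about {total_mbps(14):.0f} Mbps under the assumed model")
```

Under these assumptions the model stays within about 1 % of the measurements for one to six nodes and reaches roughly 1000 Mbps at around 14 nodes, which is in line with the figure quoted above.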

# of Nodes (N)   Frames/sec   Percentage   Cn,1446   Cn,1446 / C1,1446
1                7535.6       100.00 %     7535.6    100.00 %
2                14817.4      196.63 %     7408.7    98.32 %
3                21914.4      290.81 %     7304.8    96.94 %
4                28801.8      382.21 %     7200.5    95.55 %
5                35112.4      465.96 %     7022.5    93.19 %
6                41500.7      550.73 %     6916.8    91.79 %

The degradation increases geometrically rather than linearly. Compared with linear scale-up, it amounts to about 3.37%, 9.19%, 17.79%, 34.04%, and 49.27% when the number of cluster nodes is 2, 3, 4, 5, and 6, respectively. Our interpretation is that adding cluster nodes lets the IPsec gateway handle more incoming packets but also adds overhead on each node. As a result, the degradation grows geometrically, not linearly as we had expected before the experiment.
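The percentages in Table 4-2 and the degradation figures above follow directly from the measured frames/sec. A short sketch that recomputes them is given below; the helper names are ours, and differences in the last digit can arise because the published values are rounded.

```python
from typing import NamedTuple

# Measured total frames/sec for 1..6 nodes (Table 4-2).
FRAMES_PER_SEC = {1: 7535.6, 2: 14817.4, 3: 21914.4,
                  4: 28801.8, 5: 35112.4, 6: 41500.7}


class ScalingRow(NamedTuple):
    nodes: int
    total_pct: float      # total throughput relative to one node, in %
    per_node: float       # frames/sec handled by each node
    per_node_pct: float   # per-node throughput relative to one node, in %
    shortfall_pct: float  # gap to ideal linear scale-up (N * 100 %)


def scaling_rows(frames: dict) -> list:
    single = frames[1]
    rows = []
    for n in sorted(frames):
        total_pct = frames[n] / single * 100
        per_node = frames[n] / n
        rows.append(ScalingRow(n, total_pct, per_node,
                               per_node / single * 100,
                               n * 100 - total_pct))
    return rows


if __name__ == "__main__":
    for row in scaling_rows(FRAMES_PER_SEC):
        print(f"N={row.nodes}: {row.total_pct:7.2f} %  {row.per_node:7.1f}  "
              f"{row.per_node_pct:6.2f} %  shortfall {row.shortfall_pct:5.2f} %")
```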

4.2 Real Traffic

In the following experiments, the throughput of the packet-based and session-based schemes is evaluated using real traffic collected from a campus backbone router. The traffic has the following characteristics: 1,328,780,468 bytes in total, in 1,118,665 packets and 18,229 sessions; an average of 1187.8 bytes per packet (standard deviation = 524.623); and an average of 72615.0 bytes per session (standard deviation = 460311.287). We collect only the IP headers of these packets and regenerate the packets with the SmartBits-200 at the data rate we want. In both the session-based and packet-based schemes, sessions and packets are assigned to cluster nodes in round-robin (RR) fashion and in shortest-queue-first (SQF) fashion. The throughput of each dispatching scheme is presented in Table 4-3.
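The four combinations compared here can be pictured with a small dispatcher sketch. It is not the gateway's implementation: the class name, the use of queued bytes as the queue length, and the session table are all assumptions made for illustration, and queues are never drained in this simplified version.

```python
class Dispatcher:
    """Hypothetical dispatcher: RR or SQF policy, packet- or session-based."""

    def __init__(self, n_nodes: int, policy: str = "rr", session_based: bool = False):
        if policy not in ("rr", "sqf"):
            raise ValueError("policy must be 'rr' or 'sqf'")
        self.n_nodes = n_nodes
        self.policy = policy
        self.session_based = session_based
        self.queued_bytes = [0] * n_nodes  # assumed measure of queue length
        self.sessions = {}                 # session id -> node (session-based only)
        self._next = 0                     # next node for round-robin

    def _pick(self) -> int:
        if self.policy == "rr":
            node = self._next
            self._next = (self._next + 1) % self.n_nodes
            return node
        # SQF: scan every queue for the shortest one; this scan is the
        # search overhead that the policy pays on each decision.
        return min(range(self.n_nodes), key=self.queued_bytes.__getitem__)

    def dispatch(self, session_id: str, size: int) -> int:
        if self.session_based:
            # All packets of a session stay on the node chosen for its first packet.
            node = self.sessions.get(session_id)
            if node is None:
                node = self._pick()
                self.sessions[session_id] = node
        else:
            node = self._pick()
        self.queued_bytes[node] += size
        return node


if __name__ == "__main__":
    d = Dispatcher(3, policy="rr")
    for sid, size in [("a", 1500), ("b", 64), ("a", 1500), ("c", 64)]:
        print(sid, size, "-> node", d.dispatch(sid, size))
    print("queued bytes per node:", d.queued_bytes)
```

With packets of very different sizes, the packet-based RR variant can leave nodes with unequal byte loads even though each node receives a similar number of packets, while SQF evens out the load at the cost of the scan in `_pick`.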

Table 4-3. Throughput using real traffic

Figure 4-3. Throughput using real traffic

The experimental results show that the packet-based schemes scale almost linearly as cluster nodes are added. The two fashions of the packet-based scheme give similar results; we believe this is because the imbalance caused by different-size packets in RR fashion and the overhead of searching for the shortest queue in SQF fashion cause similar performance degradation.

For session-based schemes, not only unbalanced sessions would cause them to have
