In this chapter, we present background knowledge about application replay.
First, Section 2-1 uses an HTTP proxy example to explain why existing tools fail. Section 2-2 presents a taxonomy of application level devices and introduces their characteristics. Section 2-3 describes the factors that affect application replay accuracy. Finally, Section 2-4 discusses related work on replay approaches in detail.
2-1 Replay Failure of Existing Tools
Figure 1 (a) shows a regular HTTP scenario without a proxy. The web client first establishes a connection to the web server. Then, the web client sends an HTTP request over the connection. Finally, the web server sends an HTTP response back to the web client. Figure 1 (b) shows an HTTP scenario with a proxy.
First, the web client establishes a connection to the proxy server instead of the web server. The web client then sends an HTTP request to the proxy server over that connection. On receiving the request, the web proxy establishes another connection to the web server according to the URL in the HTTP request and forwards the HTTP request to the web server. In step 3, the web server sends the HTTP response to the web proxy. Finally, the web proxy forwards the HTTP response to the web client.
Figure 1(a): An HTTP scenario without a proxy
Figure 1(b): An HTTP scenario with the existence of a proxy
Figure 1(c): A failed replay scenario for a web proxy
As Figure 1 (a) and Figure 1 (b) show, the two scenarios differ in both the number of actions and their targets. If we replay the network trace collected in Figure 1 (a) directly to a proxy, the replay will not behave as expected. Figure 1 (c) shows such a failed replay scenario. In Figure 1 (c), the web proxy physically sits between the replayed web client and the replayed web server, which are the replay interfaces directly connected to the web proxy. Suppose the web proxy listens for incoming connections on port 3128. In the first step, the replayed web client tries to establish a connection to port 80 of the web server, and this attempt fails because the destination IP address and port number do not match the proxy's listening socket. Therefore, no connection is established between the replayed web client and the web proxy, and the web proxy never receives the HTTP request. In step III, the replayed web server replays the HTTP response to the web proxy. Similarly, since no connection has been established between the web proxy and the replayed web server, the HTTP response cannot be delivered.
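The mismatch in Figure 1 (c) can be made concrete with a small sketch for a proxy that listens on its own address, as in the figure. It assumes a hypothetical flow model (`Addr`, `Leg`) and hypothetical helper names, and the addresses are documentation examples rather than values from our experiments. It shows that a raw replay reuses the captured destination, and how one captured client-server connection would have to be mapped onto the two connections a proxy creates.

```python
from dataclasses import dataclass

Addr = tuple[str, int]  # (IP address, TCP port)


@dataclass(frozen=True)
class Leg:
    """One TCP connection that must exist during a proxy replay (hypothetical model)."""
    initiator: str  # party that opens the connection
    dst: Addr       # address and port the initiator connects to


def direct_replay_reaches_proxy(captured_dst: Addr, proxy_listen: Addr) -> bool:
    # A raw replay reuses the destination recorded in the trace, so it reaches
    # the proxy only if that destination happens to be the proxy's listening socket.
    return captured_dst == proxy_listen


def split_for_proxy(captured_server: Addr, proxy_listen: Addr) -> list[Leg]:
    """Map one captured client-server connection onto the two legs a proxy creates."""
    return [
        # Leg 1: the replayed client must connect to the proxy, not to the
        # server address recorded in the trace.
        Leg(initiator="replayed client", dst=proxy_listen),
        # Leg 2: the proxy opens this connection, so the replayed server must
        # listen on the captured server address and wait for the proxy.
        Leg(initiator="proxy", dst=captured_server),
    ]


server = ("203.0.113.10", 80)
proxy = ("192.0.2.1", 3128)
print(direct_replay_reaches_proxy(server, proxy))  # False
for leg in split_for_proxy(server, proxy):
    print(leg)
```

The key design point is the reversal on the server side: in the trace the server only responds on a connection the client opened, but behind a proxy the replayed server must accept a connection initiated by a third party.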
2-2 Taxonomy of Application Level Devices
There are many types of application proxies. They differ in the protocols they handle, the features they target, and the network architecture in which they are deployed. Table 1 lists the network behaviors of each type of application level device.
Understanding the network behaviors of these devices before designing a replay tool is important because each behavior affects replay accuracy.
Table 1: Taxonomy of Application Level Devices
Application level devices can also be classified into non-transparent devices and transparent devices. Non-transparent devices are servers whose network interfaces have public IP addresses. Transparent devices are firewall or gateway products that act as routers between the LAN and the WAN.
Figure 2 shows the difference between two HTTP proxy deployments: the proxy in Figure 2(a) is a non-transparent proxy, while the proxy in Figure 2(b) is a transparent proxy. The main difference between the two scenarios is whether the client knows that the proxy exists. In Figure 2(a), since the client knows of the proxy, it establishes a connection directly to the proxy and adds the HTTP “Proxy-Connection” header field to the request.
Figure 2(a): An HTTP scenario with a non-transparent proxy
Figure 2(b): An HTTP scenario with a transparent proxy
In contrast, the client in Figure 2(b) does not know that a transparent proxy exists. Hence, the client establishes a connection directly to the server and then sends an HTTP request. When the client establishes this connection, the transparent proxy intercepts it and then establishes another connection to the server. Table 2 compares non-transparent proxies and transparent proxies.
Table 2: A Comparison between non-transparent proxy and transparent proxy
Feature                           Non-transparent proxy   Transparent proxy
Client needs proxy configuration  YES                     NO
Product                           Proxy server            Application-level firewall, application-level gateway
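The two client behaviors compared in Table 2 can be illustrated with a minimal sketch. It is not the ProxyReplay implementation: `build_request` is a hypothetical helper, and the `keep-alive` value of `Proxy-Connection` is an assumption. It relies on a standard HTTP/1.1 rule: a request sent to a non-transparent proxy carries the absolute URI in its request line, while a request sent to the origin server (and therefore seen by a transparent proxy) carries only the path. The TCP destination also differs, being the proxy's address in the first case and the server's address in the second.

```python
from urllib.parse import urlsplit


def build_request(url: str, via_explicit_proxy: bool) -> bytes:
    """Build a minimal GET request as a client would send it (hypothetical helper)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    if via_explicit_proxy:
        # Non-transparent proxy: the request line carries the absolute URI so the
        # proxy can tell which origin server to contact.
        target = f"{parts.scheme}://{parts.netloc}{path}"
        extra = ["Proxy-Connection: keep-alive"]  # header value is an assumption
    else:
        # Transparent proxy: the client believes it is talking to the server
        # directly, so it sends only the path.
        target = path
        extra = []
    lines = [f"GET {target} HTTP/1.1", f"Host: {parts.netloc}", *extra, "", ""]
    return "\r\n".join(lines).encode("ascii")


print(build_request("http://example.com/index.html", via_explicit_proxy=True))
print(build_request("http://example.com/index.html", via_explicit_proxy=False))
```

A trace captured without a proxy contains only the second form, which is one reason such a trace cannot be fed unchanged to a non-transparent proxy.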
2-3 Issues Affecting Accuracy of Application Traffic Replay
The main factors affecting the accuracy of application replay are the type of application device and the completeness of network traces. We divide the device-type problem into protocol dependency and functional dependency. Protocol dependency refers to mimicking the procedures of different protocols, and functional dependency refers to handling the different network behaviors of devices. As Table 1 shows, each type of device exhibits different network behaviors, and the replay tool must handle each behavior whenever it occurs; otherwise, the replay may fail because the behavior is handled improperly. Figure 3 shows an HTTP scenario in which the content filtering behavior is not handled. The replayed web client replays an HTTP request after establishing a connection with the web proxy. The web proxy then filters out the request and sends a 403 Forbidden response back to the replayed web client.
If the replay tool does not handle content filtering, the replayed web server still replays the captured HTTP response to the web proxy. Since the proxy never forwarded the request, this response does not answer any request the proxy sent, and the proxy drops it.
Figure 3: An HTTP Scenario without Handling Content Filtering
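One way a replayer could avoid the failure in Figure 3 is to let the replayed server answer only the requests that the proxy actually forwarded. The sketch below assumes hypothetical per-request state (`Exchange`) and hypothetical helper names; it shows the decision logic only and omits all socket handling. It would also classify a cache hit as "filtered", which is harmless here because in both cases the captured server response must not be sent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Exchange:
    """Per-request state a proxy-aware replayer might keep (hypothetical)."""
    captured_response: bytes          # server response recorded in the trace
    forwarded_upstream: bool = False  # set when the replayed server receives the request
    client_status: Optional[int] = None


def parse_status(status_line: bytes) -> int:
    # b"HTTP/1.1 403 Forbidden" -> 403
    return int(status_line.split(b" ", 2)[1])


def on_client_response(ex: Exchange, status_line: bytes) -> str:
    """Classify the proxy's answer to a replayed request."""
    ex.client_status = parse_status(status_line)
    if ex.forwarded_upstream:
        return "forwarded"
    # The proxy answered without contacting the server (e.g. content
    # filtering), so the captured server response has no request to answer.
    return "filtered"


def server_response_to_send(ex: Exchange) -> Optional[bytes]:
    # Replay the captured response only if the proxy actually forwarded the request.
    return ex.captured_response if ex.forwarded_upstream else None


ex = Exchange(captured_response=b"HTTP/1.1 200 OK\r\n\r\n")
print(on_client_response(ex, b"HTTP/1.1 403 Forbidden"))  # filtered
print(server_response_to_send(ex))  # None
```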
Another factor that leads to replay failure is an incomplete network trace; we classify this as the error resistance problem. A trace can be incomplete for several reasons. First, it is hard to capture traces in a heavily loaded environment because of limited I/O hardware performance. Second, a connection may have been established before the capture began, or may still be alive after the capture ends. Third, packet loss caused by network congestion also makes the trace incomplete. In conclusion, a replay mechanism must be designed for each type of application level device, and it must be robust to incomplete network traces.
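The three causes of incompleteness leave recognizable marks in a trace. The sketch below works on a hypothetical, simplified segment record for one direction of a TCP connection and reports a missing SYN, a missing FIN/RST, and gaps in the payload sequence numbers. It ignores sequence-number wraparound and cannot tell capture loss from network loss; `Segment` and `completeness_issues` are illustrative names, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Simplified TCP segment record for one direction (hypothetical trace format)."""
    flags: str  # subset of "SFAR", e.g. "S", "SA", "A", "FA"
    seq: int    # sequence number of the first payload byte
    length: int # payload length in bytes


def completeness_issues(segments: list[Segment]) -> list[str]:
    """Return the reasons a one-direction segment list looks incomplete."""
    issues = []
    if not any("S" in s.flags for s in segments):
        issues.append("missing handshake: connection began before capture")
    if not any("F" in s.flags or "R" in s.flags for s in segments):
        issues.append("missing teardown: connection still alive when capture ended")
    # A data segment that starts beyond the next expected byte means payload
    # bytes between them were never captured.
    data = sorted((s for s in segments if s.length > 0), key=lambda s: s.seq)
    expected = None
    for s in data:
        if expected is not None and s.seq > expected:
            issues.append(f"gap of {s.seq - expected} bytes before seq {s.seq}")
        end = s.seq + s.length
        expected = end if expected is None else max(expected, end)
    return issues


segments = [Segment("S", 100, 0), Segment("A", 101, 10), Segment("A", 121, 5), Segment("FA", 126, 0)]
print(completeness_issues(segments))  # ['gap of 10 bytes before seq 121']
```

Retransmitted or overlapping segments do not trigger a gap because `expected` only moves forward.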
2-4 Related Work
As mentioned in Chapter 1, several tools have been developed for traffic replay.
TCPReplay [1] replays packets based only on their timestamps and does not interact with the DUT it sends traffic to. In contrast, Tomahawk [3] replays subsequent packets depending on the responses from the DUTs. Neither tool maintains any network protocol state during replay. Stateful traffic replayers, such as the NATReplay tool, maintain the mapping between private IP addresses and public IP addresses so that traffic can be replayed through NAT devices. Many studies focus on stateful traffic replay at the network and transport layers. TCPopera [5] extends TCPReplay and follows TCP/IP stack behavior by defining data dependencies between messages. SocketReplay [2] mimics the TCP/IP stack and improves selective replay. Furthermore, SocketReplay increases replay accuracy by recovering from packet loss in the trace.
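The timestamp-driven behavior attributed to TCPReplay above amounts to an open-loop schedule. The sketch below computes when each packet would be sent relative to the start of replay, with an optional speed multiplier; `send_offsets` is a hypothetical name, not a Tcpreplay API. Because the schedule depends only on the trace, a response-driven tool such as Tomahawk cannot be described this way: its next transmission depends on what the DUT returns.

```python
def send_offsets(timestamps: list[float], speed: float = 1.0) -> list[float]:
    """Delay of each packet from the start of replay, preserving inter-packet gaps.

    Open-loop: the result depends only on the captured timestamps, never on
    responses from the device under test.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not timestamps:
        return []
    t0 = timestamps[0]
    return [(t - t0) / speed for t in timestamps]


print(send_offsets([10.0, 10.5, 12.0], speed=2.0))  # [0.0, 0.25, 1.0]
```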
Several studies also focus on application layer stateful replay.
Monkey [6] emulates web clients that replay web traffic to web servers and uses Dummynet to emulate the network environment. RolePlayer [7] proposes a machine learning method to identify state-specific protocol fields, such as IP addresses and host names, and then modifies these fields accordingly to replay application traces.
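A naive version of the field rewriting described for RolePlayer is literal substitution of endpoint-specific values. The sketch below is not RolePlayer's method, which locates such fields automatically; here the mapping from captured values to replay-time values is assumed to be given, and `rewrite_fields` is a hypothetical name. A single regular-expression alternation with the longest keys first avoids rewriting the output of an earlier substitution.

```python
import re


def rewrite_fields(payload: bytes, mapping: dict[bytes, bytes]) -> bytes:
    """Replace captured endpoint-specific values with replay-time values in one pass."""
    if not mapping:
        return payload
    # Longest keys first, so b"10.0.0.12" is preferred over its prefix b"10.0.0.1".
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(b"|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], payload)


payload = b"GET / HTTP/1.1\r\nHost: 10.0.0.12\r\n\r\n"
mapping = {b"10.0.0.1": b"192.0.2.1", b"10.0.0.12": b"192.0.2.12"}
print(rewrite_fields(payload, mapping))  # b'GET / HTTP/1.1\r\nHost: 192.0.2.12\r\n\r\n'
```

Literal substitution can still match inside a longer token that is not in the mapping (e.g. `10.0.0.1` inside `10.0.0.15`), and when a replacement changes the payload length, fields such as `Content-Length` and the TCP sequence numbers must be adjusted as well.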
Replayer [8] applies binary analysis and program verification techniques to the application-level replay problem. In other words, through a formal analysis of the protocol state, the system can guarantee an accurate response to each received request.
Saperlipopette [9] captures web access patterns and replays them to verify the functionality of web caching systems. However, RolePlayer and Replayer can only replay traffic to end-point devices, not to intermediate proxy servers. The reason is that different proxy servers have different behaviors, as mentioned in Section 2-3, and the behavior of proxies is entirely different from that of end-point devices.
Saperlipopette can replay traffic through a web proxy. However, it benchmarks web caching systems using one connection at a time and cannot replay connections concurrently. Moreover, RolePlayer, Replayer, and Saperlipopette cannot replay incomplete connections caused by capture loss. In this work, we develop a tool called ProxyReplay to solve these problems. Table 3 compares existing network testing tools.
Table 3: Comparison of network testing tools, including trace-based and model-based approaches
Name            Tool type    Complete PCAP  Stateful          Proxy replay
TCPReplay       Trace-based  Not required   No                No
NATReplay       Trace-based  Required       Partial           No
SocketReplay    Trace-based  Not required   Layer-4 stateful  No
Monkey          Trace-based  Not required   Layer-7 stateful  No
RolePlayer      Trace-based  Required       Layer-7 stateful  No
Replayer        Trace-based  Required       Layer-7 stateful  No
Saperlipopette  Trace-based  Required       Layer-7 stateful  Yes
ProxyReplay     Trace-based  Not required   Layer-7 stateful  Yes