
From Figure 14 in section 5.1.2, we learned that the overhead from signal handling is proportional to the number of non-stack write operations.

Inevitably, not all programs write to the stack at a higher rate than to the heap or other areas. For instance, if a program uses many global variables to record its state, then the modifications to those global variables will generate heavy overhead.

6.3.1 Write Target Analysis

We make a small modification to Beagle2 to classify the write target addresses while running a real program. There are three categories: stack, heap, and other. The 'other' category covers writes to areas such as global data, library buffers, or other mapped regions. We run the modified version of Beagle2 to record the target classification results. The analysis chart is shown in Figure 21.

First we inspect the result of running the WsMp3d webserver introduced in section 5.2. Although heap writes account for only 4.22% of all writes, writes to the 'other' area account for 34.72%, and both kinds trigger segmentation faults; only stack writes do not. With around 40% of all writes going to non-stack areas, the signal handler is invoked frequently. Recall from Figure 15 that performance at a 40% non-stack write ratio is indeed poor.

Finally, we look at the analysis result when running a 10x10 matrix multiplication. Almost all of the write targets reference variables on the stack, so segmentation faults rarely occur. The larger the matrix, the higher the percentage of stack writes, and the overhead generated from signal handling becomes negligible. This is the optimal situation for asynchronous checking with the DupWrite method.

6.3.2 Merged Checking

Another issue is that programs usually write many bytes sequentially to buffers. Writing a 32-byte string to a buffer will generate 8 segmentation faults, assuming that the compiler or the library function called divides the 32-byte write operation into 8 32-bit word write operations. Thus it is inefficient to check each sequential write operation individually. If we can predict the consecutive memory reference behavior and derive the affected range from it, then the checks can be merged together and the overhead can be reduced substantially.

Figure 21: Write Target Analysis

Merged checking is an optimization technique mentioned in LIFT by Qin et al. [31]. It scans the whole program to perform memory reference analysis. By building a data dependency graph (DD graph) and examining it, unnecessary checks on the same or consecutive reference areas can be eliminated. We could certainly apply this technique to reduce redundant checking, but then we would lose the clean, straightforward, low-overhead instrumentation of our Trap Address Encoding and DupWrite methods, because querying which write operations ought to be instrumented and which should not would be time consuming.

6.3.3 Hybrid Solution

A hybrid solution may solve the problem. Currently the DupWrite method instruments all memory write accesses, regardless of whether the write operation occurs in the user's program, a libc function, or another external library's functions. In the hybrid method, we do not instrument the duplicated write in external library functions (including libc functions), and instead use a function wrapper to validate the write range beforehand.

This method is relatively easy to implement, and there is no need to analyze data dependencies in advance. In the instrumentor, we examine the instruction currently being instrumented: if the write instruction's code address is not located in the program's .text section, we do not instrument it. This is the method we used for the no-library instrumentation optimization back in Beagle. Figure 2 in chapter 1 shows the performance impact of no-library instrumentation.

Without the instrumentation, write operations that happen in external libraries will not generate segmentation faults. The asynchronous mechanism is therefore not deployed on external libraries, and we need active checking to make sure those operations do not overwrite the protected targets.

A possible way to achieve this is to wrap critical and widely used library functions, for example strcpy() and memset(), with function wrappers that perform a bound check before the function is actually called, verifying that the write region does not overlap the protected targets. This approach is very similar to Libsafe, proposed by Baratloo et al. [32], which also performs bound checking before the wrapped function executes.

It would take considerable effort to implement the hybrid optimization because there are many library functions to be wrapped for bound checking. Although we do not implement this method, we redo the throughput evaluation of WsMp3d from section 5.2.1, this time without instrumenting the library functions, to show the possible performance enhancement from the hybrid solution. The evaluation result is shown in Figure 22.

Figure 22: Beagle2 No-Library Instrumentation Optimization

Beagle2’s original throughput is about 24.5 MB/s. After removing the library write instrumentation, the transmission rate is approximately twice faster, to 51.34 MB/s. Beagle-all is the throughput when running Beagle with instrumentation in library functions, so we can see that actually Beagle2 is only a bit slower than Beagle. The extra overhead between Beagle2 and Beagle comes from the segmentation faults generated from writing global variables in user program. The performance of Beagle2 will be slower if bound checking is implemented, but it will still be faster than entering signal handlers all the time.

Furthermore, we discover that when running the webserver and polling files from it continuously, the throughput becomes lower and lower over time. Running memcheck, provided by Valgrind, reports many memory leak spots in the webserver; Appendix 3 contains the memcheck output confirming that the memory leakage problem in the WsMp3d webserver exists. Beagle spends more and more time checking the unfreed garbage on the heap, which is the reason for the slowdown. On the other hand, the asynchronous method we use in Beagle2 does not suffer from memory leakage, because the garbage chunks will never be modified. This is another point in favor of asynchronous checking.

Table 4: Trap Address Conflict
