Improved IPC Design In Embedded System Via Context Switching


Huang Guo Jen R98922005 (yellowpuppy@gmail.com) Chang Kai Fu R98922086 (r98922086@csie.ntu.edu.tw)

NTU CSIE, first-year Master's students


ABSTRACT

Interprocess communication (IPC) is a common mechanism in time-sharing operating systems, and most tasks rely on it to cooperate and complete their jobs. IPC facilities are tools that an operating system cannot do without, so making IPC efficient is an important problem.

This matters especially for a real-time embedded system, where the correctness of a computation depends not only on its logical correctness but also on the time at which the result is produced. The volume of interprocess communication may be large, which can strongly affect system performance. Context switching is involved in interprocess communication, and it can be time-consuming enough to degrade system performance. Reducing context-switch overhead is therefore an effective way to improve IPC performance. It can be achieved in two ways. One is to adopt a library-based IPC design, and the other is to share a single address space among all processes, including the kernel. The first removes the need for a context switch on each interprocess communication, and the second reduces the time spent on each context switch.

Combining both techniques may yield a substantial performance improvement. Hence, this report mainly consists of two parts, one for each technique.

I. LIBRARY-BASED IPC SYSTEM

1. System Structure And Implementation

(In the following, we use RT-IPC to stand for this IPC method.)

RT-IPC is library-based and POSIX compatible. It uses data structures in user space and in the file system. Each IPC object is associated


including the magic number, free list, lock and waiting queue. The figure below is the structure of a user process.

Fig.1 RT-IPC-PCB

The RT-IPC field is not embedded in the PCB (process control block), so RT-IPC is independent of the kernel design. Within the structure, the magic number is checked first to make sure that the code is working on the correct structure. A descriptor table, des_tbl[], maintains the information associated with each descriptor in the process space. This table is used for both regular files and IPC objects, that is, semaphores and message queues.
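The magic-number check and the shared descriptor table can be sketched in C as follows. This is only an illustration under stated assumptions: apart from des_tbl[], every name here (rt_ipc_area, des_entry, rt_ipc_lookup, the magic value, and the table size) is hypothetical and not taken from the RT-IPC sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RT_IPC_MAGIC   0x52544950u   /* hypothetical tag value */
#define RT_IPC_MAX_DES 16            /* hypothetical table size */

/* Kinds of object a descriptor may refer to, per the text above. */
enum des_kind { DES_FREE = 0, DES_FILE, DES_SEMAPHORE, DES_MSG_QUEUE };

struct des_entry {
    enum des_kind kind;   /* DES_FREE marks an unused slot */
    void *object;         /* file or IPC object behind the descriptor */
};

/* Per-process RT-IPC area, kept outside the kernel's PCB. */
struct rt_ipc_area {
    uint32_t magic;
    struct des_entry des_tbl[RT_IPC_MAX_DES];
};

void rt_ipc_init(struct rt_ipc_area *area)
{
    assert(area != NULL);
    area->magic = RT_IPC_MAGIC;
    for (size_t i = 0; i < RT_IPC_MAX_DES; i++) {
        area->des_tbl[i].kind = DES_FREE;
        area->des_tbl[i].object = NULL;
    }
}

/* Bind an object to the lowest free descriptor; returns it, or -1. */
int rt_ipc_alloc_des(struct rt_ipc_area *area, enum des_kind kind, void *object)
{
    if (area == NULL || area->magic != RT_IPC_MAGIC || kind == DES_FREE)
        return -1;
    for (int d = 0; d < RT_IPC_MAX_DES; d++) {
        if (area->des_tbl[d].kind == DES_FREE) {
            area->des_tbl[d].kind = kind;
            area->des_tbl[d].object = object;
            return d;
        }
    }
    return -1;
}

/* Check the magic number first, then resolve the descriptor. */
struct des_entry *rt_ipc_lookup(struct rt_ipc_area *area, int des)
{
    if (area == NULL || area->magic != RT_IPC_MAGIC)
        return NULL;
    if (des < 0 || des >= RT_IPC_MAX_DES || area->des_tbl[des].kind == DES_FREE)
        return NULL;
    return &area->des_tbl[des];
}
```

Because files, semaphores, and message queues share one table, a caller can pass any descriptor to rt_ipc_lookup() and dispatch on the returned kind.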

In the RT-IPC design, the IPC functions and system services such as the file system, memory management, device drivers, and the naming system are implemented as library functions. When a process calls an IPC library function, it first disables interrupts before accessing the library function and enables them again afterwards. On the Motorola MC68HC12 microprocessor the overhead is only two clock cycles (the SEI and CLI instructions, which set and clear the interrupt mask).

This library-based system has three layers, as shown in the figure below. The first layer is the kernel, which consists of interrupt service routines (ISRs), kernel semaphores, and the scheduler. The second layer consists of system services implemented as library functions. The third layer consists of user processes.
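The calling pattern described above, where a library function brackets its critical section with an interrupt disable and enable instead of trapping into a server process, might look like the following C sketch. It is a hosted simulation under stated assumptions: the flag irq_masked stands in for the processor's interrupt mask bit, and the names irq_disable, irq_enable, rt_sem, rt_sem_trywait, and rt_sem_post are hypothetical. On real MC68HC12 hardware the two helpers would reduce to the SEI and CLI instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the CPU interrupt mask bit (assumption for a hosted build). */
static volatile bool irq_masked = false;

static inline void irq_disable(void) { irq_masked = true;  }  /* SEI on the HC12 */
static inline void irq_enable(void)  { irq_masked = false; }  /* CLI on the HC12 */

/* A counting semaphore living in user space. */
struct rt_sem { int count; };

/* Library call: take the semaphore if available, without any context switch.
 * Returns 0 on success, -1 if the count is already zero. */
int rt_sem_trywait(struct rt_sem *s)
{
    int rc = -1;
    assert(s != NULL);
    irq_disable();              /* enter critical section */
    if (s->count > 0) {
        s->count--;
        rc = 0;
    }
    irq_enable();               /* leave critical section */
    return rc;
}

/* Library call: release the semaphore. */
void rt_sem_post(struct rt_sem *s)
{
    assert(s != NULL);
    irq_disable();
    s->count++;
    irq_enable();
}
```

The sketch enables interrupts unconditionally on exit, so it does not support nested critical sections; a version that must nest would save and restore the previous mask state instead.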


Fig. 2

A mutual exclusion mechanism is needed to prevent simultaneous access to shared structures, but it may result in priority inversion, that is, a higher-priority process waiting for a lower-priority process that owns the resource to release it. RT-IPC uses a test-and-set mechanism.
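A test-and-set lock of the kind mentioned above can be sketched with the C11 atomic_flag type. This is a generic illustration, not the RT-IPC implementation; the names tas_lock, tas_lock_init, tas_try_lock, and tas_unlock are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* A lock built on a single atomically tested-and-set flag. */
struct tas_lock { atomic_flag held; };

void tas_lock_init(struct tas_lock *l)
{
    assert(l != NULL);
    atomic_flag_clear(&l->held);
}

/* Atomically set the flag and report whether this caller acquired the lock.
 * test_and_set returns the previous value: false means the lock was free. */
bool tas_try_lock(struct tas_lock *l)
{
    assert(l != NULL);
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

void tas_unlock(struct tas_lock *l)
{
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A caller that fails tas_try_lock() can retry or yield. Spinning at high priority while a lower-priority owner cannot run is exactly the priority-inversion hazard described above, so a real-time design has to bound or avoid that wait.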

2. Performance Improvement

In a message-based architecture, system services run as processes. When a user process requests a system service, the kernel must switch between the system process and the user process, back and forth, so each request incurs extra context-switch overhead. On the MC68HC12 processor, this costs 52 cycles, compared to 4 cycles in a library-based system.

[Fig. 2 layers: Kernel (ISR, Semaphore, Scheduler); Library Functions (IPC, Naming System, File System, Memory Management, Device Drivers); User Processes]

II. SHARED SINGLE ADDRESS SPACE


context-switch, the whole content of the virtual cache must be flushed (written back and invalidated) because the content is tied to the address space of an individual process. Memory accesses right after the context switch then miss on every cache entry, which imposes a heavy load. The same problem affects the Translation Look-aside Buffer (TLB), which keeps entries only for the currently running process.

2. Experiment Environment

In this report, we analyze the virtually addressed cache architecture of the ARM9 and compare the context-switching time of uClinux and Linux (the same kernel version, 2.6.7). uClinux is a derivative of the Linux kernel intended for MMU-less processors. It provides a single shared address space for all processes, while the Linux kernel provides a separate virtual address space for each process using the hardware MMU. The ARM9 processor features virtually indexed caches and a TLB without address space tags. The structure of the cache and TLB of the MMU-based ARM processor is shown in Fig. 3.

Fig. 3. The cache and TLB architecture of the ARM processor.

Flushing the cache takes about 1k to 18k CPU cycles, depending on the cache size, and the side operations needed to refill the cache lines and the TLB take up to about 54k CPU cycles. For example, a 200 MHz ARM9 processor takes about 270 μs, which is a heavy burden for many real-time applications that need response delays under several tens of μs.
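The 270 μs figure follows directly from the cycle count and the clock rate. As a worked check, take the upper bound of about 54k cycles quoted above as N and the 200 MHz clock as f (the symbols N, f, and t are introduced here only for illustration):

```latex
t = \frac{N}{f}
  = \frac{54 \times 10^{3}\ \text{cycles}}{200 \times 10^{6}\ \text{cycles/s}}
  = 2.7 \times 10^{-4}\ \text{s}
  = 270\ \mu\text{s}
```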

3. The benchmark programs

“lmbench” is a well-known benchmark suite for performance testing on UNIX-based operating systems. In this report, “lat_ctx,” “lat_fifo” and “bw_fifo” are used with some modifications. “lat_ctx” measures the time required for context switching. It creates N processes and a series of N pipes, forming a “pipe-ring” that links all the processes. Each process accesses its own independent k KB of memory, and a “token” is passed through the next pipe to the neighboring process, which produces a series of synchronized context switches whose cycle delay is measured. “lat_fifo” measures the time required to send and receive a token between two processes. “bw_pipe” measures the bandwidth of sending and receiving through a pipe.
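The pipe-ring idea can be illustrated with a short POSIX C sketch. It is not the lmbench source: the function ring_pass_token and its helpers are hypothetical, and the sketch leaves out both the per-process k KB memory access and the timing, showing only how a token circulating through N pipes forces one process switch per hop.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RING_MAX 32   /* hypothetical upper bound on ring size */

/* Move exactly one int through a pipe; return 0 on success. */
static int get_token(int fd, int *v)
{
    return read(fd, v, sizeof *v) == (ssize_t)sizeof *v ? 0 : -1;
}

static int put_token(int fd, int v)
{
    return write(fd, &v, sizeof v) == (ssize_t)sizeof v ? 0 : -1;
}

/* Process i reads from pipe i and writes to pipe (i+1) mod nprocs.
 * The caller acts as process 0. Every hop increments the token, so the
 * result is laps * nprocs on success and -1 on failure. */
int ring_pass_token(int nprocs, int laps)
{
    int fds[RING_MAX][2];
    int npipes = 0, nchildren = 0, rc = 0, token = 0;

    assert(nprocs >= 1 && nprocs <= RING_MAX && laps >= 0);

    while (npipes < nprocs) {
        if (pipe(fds[npipes]) != 0) { rc = -1; break; }
        npipes++;
    }

    for (int i = 1; rc == 0 && i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0) { rc = -1; break; }
        if (pid == 0) {                       /* child: ring member i */
            int in = fds[i][0], out = fds[(i + 1) % nprocs][1];
            for (int j = 0; j < nprocs; j++) {
                if (fds[j][0] != in)  close(fds[j][0]);
                if (fds[j][1] != out) close(fds[j][1]);
            }
            for (int lap = 0; lap < laps; lap++) {
                int t;
                if (get_token(in, &t) != 0 || put_token(out, t + 1) != 0)
                    _exit(1);
            }
            _exit(0);
        }
        nchildren++;
    }

    /* Parent: inject the token, then wait for it to come around. */
    for (int lap = 0; rc == 0 && lap < laps; lap++) {
        if (put_token(fds[1 % nprocs][1], token + 1) != 0 ||
            get_token(fds[0][0], &token) != 0)
            rc = -1;
    }

    for (int j = 0; j < npipes; j++) { close(fds[j][0]); close(fds[j][1]); }
    for (int i = 0; i < nchildren; i++) {
        int st;
        if (wait(&st) < 0 || !WIFEXITED(st) || WEXITSTATUS(st) != 0)
            rc = -1;
    }
    return rc == 0 ? token : -1;
}
```

Calling ring_pass_token(4, 10), for example, sends the token ten times around a ring of four processes. A latency measurement would time that call and divide by the number of hops, after subtracting the pipe overhead measured separately.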

4. Performance Comparison

Fig. 4. The FIFO structure of the modified “lat_ctx”


III. CONCLUSION

This report presents two major implementation techniques for improving the performance of existing IPC systems. The proposed library-based IPC system, which incorporates the features described above, can provide an open, efficient, and POSIX-compliant system. System services are implemented not as processes but as a set of library functions, and the calling process stays in the running state. Consequently, overall real-time system performance improves because process switches are avoided.

The second part compared the context-switching time and IPC performance of uClinux and Linux on the same hardware platform with an ARM9 core. Across the series of benchmark programs, uClinux showed much lower context-switching delay than Linux. The difference comes from the virtually addressed cache architecture combined with the Linux kernel's per-process virtual address spaces, which require the whole cache to be invalidated and so impose a fixed cache-miss load on every context switch. uClinux, which uses a single address space, keeps the cache effective across context switches and dramatically reduces the delay. uClinux also showed much better IPC performance.

With the proposals above, IPC performance can be improved by reducing both the number and the overhead of context switches, which are a major factor in the performance of real-time embedded systems. Although the report suggests large improvements, concerns about development, reliability, and protection remain to be worked out. Library-based system calls lack modularity and may cause

Table 1. THE RESULTS OF THE IPC PERFORMANCE OF LINUX AND UCLINUX


support memory protection. Hence, despite the attractive improvements these techniques can bring, many tradeoffs still need to be evaluated carefully against the needs of each individual system to achieve the best performance.

REFERENCES

1. Hyok-Sung Choi, Hee-Chul Yun, “Context Switching and IPC Performance Comparison between uClinux and Linux on the ARM9 based Processor” (Software Platform Lab, Digital Media R&D Center, Samsung Electronics)

2. Hosein Marzi, Larry Hughes, Yanting Lin, “Embedded Systems with Improved Interprocess Communication Design”, 2009 7th IEEE International Conference on Industrial Informatics (INDIN 2009)