Synopsis - Using Experimental Analysis to Accelerate the Boot Time of the Embedded Linux 2.6.14 Kernel

Chapter 1 Introduction

1.3 Synopsis

This thesis is organized as follows. Chapters 2 and 3 survey related work and background. In Chapter 4, we examine boot time measurement tools, measure the boot time using several software tools together with an oscilloscope and a logic analyzer, and analyze the measurement results to find long-running operations.

In Chapter 5, we carry out experiments that optimize the long-running operations identified in Chapter 4 to achieve a faster boot. Finally, conclusions and future work are given in Chapter 6.

Chapter 2 Related Work

Many existing techniques improve the boot time of Linux, each targeting a different part of the full boot process. They include alternative file system structures for flash storage devices [1] [2], special methods of executing the kernel [3] [10], and process-control initialization utilities [4].

2.1 Snapshot Technique for NOR Flash

This technique stores snapshots in variable-size areas managed by linked lists and sequentially records the locations of the stored snapshots in prearranged areas, using an ordered tree data structure.

As Figure 2.1 shows, the first block of flash memory is reserved as a root block, which sequentially stores pointers to snapshot header blocks. During the mount_root operation, the last stored pointer can be found quickly using sequential or binary search. The binary search divides the root block into two sub-blocks and reads the pointer at their boundary. If that pointer is null, the search selects the left sub-block; otherwise, it selects the right one. The procedure is repeated on the selected sub-block until the last stored pointer is found. Since the block size $B_{size}$ is typically 128 KB in NOR flash and a pointer to a block occupies $P_{size} = 2$ B, this search runs in $O(\lg(B_{size}/P_{size}))$ time, i.e. about $\lg(2^{17}/2) = 16$ pointer reads, which is better than a sequential search.

In summary, this technique reads only $\lg(B_{size}/P_{size}) \times P_{size} + \lg(B_{size}/H_{size}) \times H_{size}$ (= 92) bytes in the average case to find the location of the last stored snapshot, providing an almost instant lookup.
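As a worked check of this byte count, we substitute the values given in the text: $B_{size} = 128$ KB $= 2^{17}$ B and $P_{size} = 2$ B. The text does not state $H_{size}$. Assuming the second term is a binary search over fixed-size entries in a snapshot header block, the stated total of 92 bytes is consistent with $H_{size} = 4$ B. This value is our inference, not one given by the source.

```latex
\begin{aligned}
\lg\!\left(\frac{B_{size}}{P_{size}}\right) \times P_{size}
  &= \lg\!\left(\frac{2^{17}}{2}\right) \times 2 = 16 \times 2 = 32 \text{ bytes} \\
\lg\!\left(\frac{B_{size}}{H_{size}}\right) \times H_{size}
  &= \lg\!\left(\frac{2^{17}}{2^{2}}\right) \times 4 = 15 \times 4 = 60 \text{ bytes}
  \quad (\text{assuming } H_{size} = 4 \text{ B}) \\
\text{total} &= 32 + 60 = 92 \text{ bytes}
\end{aligned}
```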

However, the authors have not released the source code, so we could not try this technique.

Figure 2.1 Snapshot Management of Snapshot Technique for NOR Flash

2.2 Erase Block Summary

Erase Block Summary (EBS) is a JFFS2 improvement that speeds up the mount process.

EBS stores extra summary information at the end of every (closed) erase block. This information is generated automatically during file system write operations. To make it possible to determine the size of the summary node, an 8-byte summary marker node (jffs2_sum_marker) is placed at the end of each erase block. At mount time, jffs2_scan_eraseblock() reads the last 8 bytes of the erase block during the scan. If it finds a valid sum_marker node, it loads the summary node pointed to by the relative offset stored in sum_marker. All information needed at mount time is stored in this node, so scanning the full erase block is unnecessary. This can yield a large speedup, especially on NAND devices. If sum_marker is not found (or is invalid), the normal scan process is applied.

Note that EBS exists only within the JFFS2 image; that is, summaries are present only in the used parts of the flash and not in the unused parts. Therefore, the effect of EBS is limited.

2.3 Kernel Execute-In-Place

Execute-In-Place (XIP) allows the kernel to run directly from non-volatile storage that the CPU can address, such as NOR flash. This saves RAM because the kernel's text section is not copied from flash to RAM; read-write sections, such as the data section and the stack, are still copied to RAM. Because it must run directly from flash, an XIP kernel is not compressed, so it occupies more storage. The flash address used both to link the kernel object files and to store the image is configuration dependent, so the physical flash address where the kernel image will reside must be known in advance.

For OMAP-based platforms, kernel XIP works only on the OMAP Innovator board using rrload. At present, U-Boot does not support kernel XIP on ARM-based platforms.

2.4 InitNG

InitNG, created by Jimmy Wennlund, is a full replacement for the old and in many ways deprecated sysvinit tool (init). It is designed to significantly increase the speed of booting a UNIX-compatible system by starting processes asynchronously. At boot, the kernel invokes initng as the first process (pid = 1). Initng first parses the configuration files in /etc/initng for critical information such as runlevel and service data. After that, each service required by the default runlevel is started as soon as its dependencies are met, allowing services to run in parallel. This asynchronous execution can dramatically improve boot time by making better use of system resources (especially on multiprocessor systems).

The latest version of InitNG is 0.6.7, which still does not support ARM-based platforms.

2.5 Summary

In Chapter 2, we introduced several recent techniques for improving boot time. Some techniques are available only on PPC; some have not yet been ported to OMAP-based platforms; and others work only with specific peripherals and applications.

Chapter 3 Background

The market offers a great many hardware and software choices for developing mobile devices and high-end consumer electronics. The most important task is to choose a combination that meets the product's functional requirements and yields a high return on investment.

3.1 OSK5912 OMAP Starter Kit

The OMAP5912 multiprocessor platform is available in the OSK5912 OMAP Starter Kit from Spectrum Digital. Its dual-core architecture provides the benefits of both DSP and reduced instruction set computer (RISC) technologies [5].

The MPU core is the ARM926EJ-S RISC processor. The ARM926EJ-S is a 32-bit processor core that executes 32-bit or 16-bit instructions and processes 32-bit, 16-bit, or 8-bit data. The core uses pipelining so that all parts of the processor and memory system can operate continuously. The MPU core also incorporates data and program memory management units (MMUs) with translation look-aside buffers. To minimize external memory access time, the ARM926EJ-S includes an instruction cache, a data cache, and a write buffer. In general, these are transparent to program execution.

The DSP core of the OMAP5912 device is based on the TMS320C55x DSP generation CPU core. The C55x DSP architecture achieves high performance and low power through increased parallelism and a strong focus on reducing power dissipation. The CPU supports an internal bus structure composed of one program bus, three data read buses, two data write buses, and additional buses dedicated to peripheral and DMA activity. These buses make it possible to perform up to three data reads and two data writes in a single cycle. In parallel, the DMA controller can perform up to two data transfers per cycle, independent of CPU activity. A central 40-bit arithmetic/logic unit (ALU) is supported by an additional 16-bit ALU. Using the two ALUs makes it possible to optimize parallel activity and power consumption. The OMAP5912 DSP core also includes a 24K-byte instruction cache to minimize external memory accesses, improving data throughput and conserving system power.

The TMS320C55x DSP core within the OMAP5912 device uses three powerful hardware accelerator modules that assist the DSP core in implementing specific algorithms commonly used in video compression applications, such as MPEG4 encoders/decoders. These are the DCT/iDCT Accelerator, the Motion Estimation Accelerator, and the Pixel Interpolation Accelerator. These accelerators allow such algorithms to be implemented with fewer DSP instruction cycles and less power than implementations using only the DSP core. The hardware accelerators are used through functions from the TMS320C55x Image/Video Processing Library available from Texas Instruments.

The OMAP5912OSK platform also provides rich user interfaces, high processing performance, and long battery life through the flexibility of a fully integrated mixed-processor solution. Therefore, the OMAP5912OSK can meet the requirements of the following applications:

- Applications Processing Devices
- Mobile Communications
  - WLAN 802.11x
  - Bluetooth
  - GSM, GPRS, EDGE
  - CDMA
- Video and Image Processing (MPEG4, JPEG, Windows Media Video, etc.)
- Advanced Speech Applications (text-to-speech, speech recognition)
- Audio Processing (MPEG-1 Audio Layer 3 [MP3], AMR, WMA, AAC, and other GSM speech codecs)
- Graphics and Video Acceleration
- Generalized Web Access
- Data Processing

Because of these diverse features and applications, we chose the OMAP5912OSK as our development platform.

3.2 U-Boot

In an embedded system, the role of the boot loader is more complicated than on a PC, since such systems have no BIOS to perform the initial system configuration. The low-level initialization of the microprocessor, memory controller, and other board-specific hardware must be performed before a Linux kernel image can execute. At a minimum, an embedded boot loader provides the following features:

1. Initializing the hardware, especially the memory controller.

2. Providing boot parameters for the Linux kernel.

3. Starting the Linux kernel.

Additionally, most boot loaders also provide convenience features that simplify development:

1. Reading and writing arbitrary memory locations.

2. Uploading new binary images to the board's RAM via a serial line or Ethernet.

3. Copying binary images from RAM to FLASH memory.

Das U-Boot is a GPL'ed cross-platform boot loader shepherded by Wolfgang Denk [6] that provides all of the features listed above. It also provides out-of-the-box support for hundreds of embedded boards and a wide variety of CPUs, including PowerPC, ARM, XScale, MIPS, Coldfire, NIOS, Microblaze, and x86. Its easy configuration strikes the right balance between a rich feature set and a small binary footprint. Therefore, U-Boot 1.1.3, which supports Linux kernel 2.6, is the best choice of boot loader for our implementation platform.

3.3 Embedded Linux

There are many embedded operating systems. They are designed to be very compact and efficient, forgoing many functions that non-embedded operating systems provide but that the specialized applications they run may not need. Embedded operating systems include eCos, Embedded Linux, FreeDOS, FreeRTOS, LynxOS RTOS, NetBSD, OpenBSD, Inferno, OSE, OS-9, QNX, VxWorks, Windows CE, and Windows XP Embedded. Among them, Embedded Linux refers to the use of the open source Linux operating system in embedded systems such as cell phones, PDAs, media player handsets, and other consumer electronics devices.

In the past, embedded development was mostly performed using proprietary code written in assembly language. Developers had to write all of the hardware drivers and interfaces from scratch. It turned out that the Linux kernel, combined with a small set of other free software utilities, could fit into the limited hardware of an embedded device; a typical embedded Linux installation takes about 2 megabytes. Therefore, we use the embedded Linux kernel 2.6.14 (linux-2.6.14-omap2) [7] [8] as our embedded operating system.

3.4 BusyBox

BusyBox [9] combines tiny versions of many common UNIX utilities into a single small executable. It provides replacements for most of the GNU utilities, including archival utilities, coreutils, console utilities, editors, find utilities, and init utilities. The utilities in BusyBox generally have fewer options than their full-featured GNU cousins; however, the options that are included provide the expected functionality and behave very much like their GNU counterparts. BusyBox provides a fairly complete environment for any small or embedded system.

BusyBox was written with size optimization and limited resources in mind. It is also extremely modular, so commands (or features) can easily be included or excluded at compile time. This makes it easy to customize for specific embedded systems. To create a working system, developers need only add some device nodes in /dev, a few configuration files in /etc, and a Linux kernel. We use BusyBox 1.01 to replace the large file system of a typical PC running Linux.

3.5 Summary

In Chapter 3, we described the background of our development platform: the powerful OMAP5912OSK, the versatile U-Boot, open source embedded Linux, and the tiny BusyBox.

Chapter 4 Boot Time Analysis

Before reducing the boot time, we must first understand the boot sequence. We then measure the boot time and analyze the timing results, and finally improve the original embedded operating system into a fast-booting system.

4.1 Boot Sequence

We can summarize the initial boot sequence of Linux kernel as follows [10] [11]:

1. The boot loader places the kernel at the proper address in memory. The boot loader is external to the Linux source code and is usually the first code executed after the system is powered on. Finally, the boot loader jumps to the Linux kernel.

2. Architecture-specific assembly code in Linux kernel performs very low-level tasks, such as initializing memory and setting up CPU registers so that C code can run flawlessly. This includes selecting a stack area and setting the stack pointer accordingly. The amount of such code varies from platform to platform; it can range from a few dozen lines up to a few thousand lines.

3. Function start_kernel is called. It acquires the kernel lock, prints the banner, and calls function setup_arch to configure the system according to the platform's architecture.

4. Architecture-specific C-language code completes low-level initialization, including initializing the interrupt vectors, and retrieves a command line for start_kernel to use.

5. start_kernel parses the command line and calls the handlers associated with the keywords it identifies.

6. start_kernel initializes basic facilities and forks the init thread.

7. init, the first user-space process, performs process-control initialization, runs the initialization scripts, and starts daemons. Finally, it starts the getty processes that display the login prompt.

4.2 Boot Time Measurement Tools

The usual way to examine a program is to start where execution begins. For Linux, however, it is hard to tell where execution begins; it depends on how "begins" is defined. Therefore, we need measurement tools to help us measure the boot time.

4.2.1 Kernel Function Trace

Kernel Function Trace (KFT) [10] [12] is a kernel function tracing system, which uses the “-finstrument-functions” capability of the gcc compiler to add instrumentation callouts to every function entry and exit. The KFT system provides for capturing these callouts and generating a trace of events, with timing details. KFT is excellent at providing a good timing overview of kernel procedures, allowing you to see where time is spent in functions and sub-routines in the kernel.

In the STATIC_RUN mode of operation, the configuration for a KFT run is compiled statically into the kernel. This mode is useful for getting a trace of kernel operation during system boot (before user space is running).

The KFT configuration lets you specify how to automatically start and stop a trace, whether to include interrupts as part of the trace, and whether to filter the trace data by various criteria (for minimum function duration, only certain listed functions, etc.) KFT trace data is retrieved by reading from /proc/kft_data after the trace is complete.

Figure 4.1 Numeric Trace Data of KFT

Entry  Delta  Function             Caller
0      7813   jffs2_do_read_inode  jffs2_read_inode+0x64

Figure 4.2 Symbolic Trace Data of KFT

KFT supplies two useful log analysis tools: addr2sym converts numeric trace data (see Figure 4.1) into symbolic trace data (see Figure 4.2), and kd processes and analyzes the data in a KFT trace. Using both tools, one can produce a log listing the name, call count, total execution time, and average execution time of each kernel routine. In addition, "kd -c" produces a log of the nested call trace of kernel routines (see Figure 4.3).

Entry  Delta  PID  Trace
-----  -----  ---  ----------------
0      -1     1    run_init_process

Figure 4.3 Nested Trace of Kernel Routines

4.2.2 Printk Times

Printk times [13] is a simple technique that adds some code to the standard kernel printk routine to output timing data with each message. Although crude, it can be used to get an overview of the areas of kernel initialization that take a relatively long time, and thus to identify areas of the Linux kernel that need work.

With printk times turned on, the system emits the timing data as a floating point number of seconds (to microsecond resolution) for the time at which the printk started.

A companion utility program can show the time between successive messages, or the times relative to a specific message. This makes it easier to see the timing of specific segments of kernel code during boot.

4.2.3 initcall-times patch

Matt Mackall provided an initcall-times [13] patch that measures the initialization time of each driver during do_initcalls. This tool is dedicated to examining the initialization time of buses and drivers. It times only the initcalls and is enabled by adding "initcall_debug" to the kernel command line. After boot, the initialization records can be read with dmesg and filtered with grep to find time-consuming initializations (see Figure 4.4).

Calling initcall 0xc000ea6c: ptrace_break_init+0x0/0x2c()
initcall elapsed 0.000000s - ptrace_break_init+0x0/0x2c()
Calling initcall 0xc000f8d4: consistent_init+0x0/0xb4()
initcall elapsed 0.000061s - consistent_init+0x0/0xb4()
Calling initcall 0xc0013a30: helper_init+0x0/0x48()
initcall elapsed 0.000427s - helper_init+0x0/0x48()
Calling initcall 0xc0013b88: ksysfs_init+0x0/0x44()
initcall elapsed 0.000122s - ksysfs_init+0x0/0x44()
Calling initcall 0xc0015958: filelock_init+0x0/0x54()
initcall elapsed 0.000091s - filelock_init+0x0/0x54()
Calling initcall 0xc0016320: init_script_binfmt+0x0/0x1c()
initcall elapsed 0.000000s - init_script_binfmt+0x0/0x1c()

Figure 4.4 Initcall Log in Kernel Ring Buffer

4.2.4 Expect

Wolfgang Denk provides an expect [14] script that performs start-to-finish timing by timestamping every line output through kermit [15]. Each timestamp refers to the newline character, i.e. to the end of the line. Because this expect script measures time on the host, it depends on the host clock rather than the target clock, so the measurement does not affect the target. A special parameter called "start_string" can be set to reset the timestamp (see Figure 4.5).

5.837 Starting kernel ...
5.837
7.717 Uncompressing Linux... done, booting the kernel.
8.794 Linux version 2.6.14-omap2 ([email protected]) (gcc version 3.3.2) #2 Tue Jul 18 16:06:26 CST 2006
0.008 CPU: ARM926EJ-Sid(wb) [41069263] revision 3 (ARMv5TEJ)
0.019 Machine: TI-OSK
0.070 Memory policy: ECC disabled, Data cache writeback
0.071 OMAP1611b revision 2 handled as 16xx id: 5b058f7948960a0f

Figure 4.5 The Timestamp Resetting

4.3 Measurement Tools Analysis

We must check the accuracy of the different tools on the OMAP5912OSK. To obtain the exact boot time, we use an oscilloscope to measure the RS232_TX signal, which carries the console output. We can then compare the boot time before and after enabling a specific tool, and cross-check the results against the oscilloscope records.

4.3.1 Kernel Function Trace

Since KFT adds instrumentation callouts to every function entry and exit, it greatly increases the demand on system performance. The usefulness of KFT is therefore limited by the platform: if the platform is not fast enough, KFT causes huge overhead while recording.

In that case the timing result of KFT is not correct, because it includes not only the original execution time but also this overhead.

In Table 4.1, we observe that the boot time roughly doubles because the performance of the OMAP5912 cannot meet the requirements of KFT. Most of the boot time is spent in the routine schedule, which reschedules tasks while MPU usage is almost 100%.

Table 4.1 KFT Active from start_kernel to to_userspace
