
Chapter 2 RELATED WORK

2.3 Linux Kernel Initialization

This section gives an overview of the boot process of an Intel x86-based personal computer, which determines how the initialization routine provided by the operating system comes to run. Generally, the boot process has three stages. When a PC is powered on, the BIOS (Basic Input-Output System) runs first, followed by a boot loader and finally the operating system initialization routine.

The BIOS is the first code executed by the processor at boot-up. When power is first applied to the computer, the RESET pin on the processor is triggered. This causes the processor to read from memory location 0xFFFFFFF0 and begin executing the code located there. This address is mapped to the Read-Only Memory (ROM) containing the BIOS. The BIOS must poll the hardware and set up an environment capable of booting the operating system.

Once the BIOS loads the first sector of the boot device into RAM, the boot loader begins execution.

After the chosen boot loader [31] [32] has run, it loads the Linux kernel image as shown in Figure 2, typically named vmlinuz-[version number] for a compressed kernel image and vmlinux-[version number] for an uncompressed image. A compressed kernel image has the Linux boot loader, found in /arch/i386/boot/bootsect.S, located at the very beginning of the image. After the boot loader executes this assembly code, the code performs assembly-level initialization: it reinitializes all hardware, switches the CPU from real mode to protected mode, fills the bss segment of the kernel with zeros, and finally jumps to the assembly function startup_32(). The code then runs high-level initialization until process 1 is executing, at which point the Linux kernel initialization is complete.

Figure 2: System Boot-up Memory Map

Chapter 3

MULTIMEDIA DEVICE

In designing embedded systems, especially consumer electronics, boot time is a major challenge. Because streaming multimedia applications have become popular in consumer devices, a multimedia device is used as the target application for the boot time analysis.

3.1 Product Specification

There are many multimedia devices on the market. Before deciding which platform to use for the product, the PRD (Product Requirement Document) from the customer is the major reference. For the multimedia application, there are some minimum hardware and software requirements [23]:

1. Processor Requirement: The processor is the brain of the product, and a powerful processor is necessary to support more and more advanced features. From the software point of view, ARM-based processors are more popular and have more resources available. The product therefore needs the ARM version 5TE (v5TE) architecture, which supports the DSP-enhanced instruction set.

2. Display Requirement: LCD support is important for multimedia. For the most popular multimedia formats, we require an LCD module with at least 262144 colors (18 bits) and 640x480 resolution.

3. Memory Requirement: Given the LCD requirement and the related software features of a multimedia device, the product's minimum system memory requirement is 64MB and its minimum flash memory requirement is 32MB.

4. Storage Requirement: The most cost-effective memory card with a small form factor is the MMC/SDIO memory card, so MMC/SDIO card support is also required.

5. Networked Interfaces Requirement: For internet access or remote multimedia display, the product needs a 10 or 10/100 Ethernet port or 802.11b/g WLAN support.

6. Peripheral Requirement: IrDA and USB are the minimum required peripheral interfaces.

7. Operating System and Other Software Feature Requirements: The OS could be Linux. Other software features are not closely related to the boot time analysis, so they are not listed here.
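The display figures in requirement 2 can be checked with a short calculation. The following is a minimal sketch: the resolution and color depth come from the requirement above, while the 32-bit-per-pixel padded layout is only an assumption about how a frame might be stored, not something the PRD specifies.

```python
# Display figures taken from the display requirement above.
WIDTH, HEIGHT = 640, 480
BITS_PER_PIXEL = 18

# 18 bits per pixel gives 2**18 distinct colors.
colors = 2 ** BITS_PER_PIXEL

# Tightly packed frame: exactly 18 bits for every pixel, no padding.
packed_bytes = WIDTH * HEIGHT * BITS_PER_PIXEL // 8

# Assumption (not from the PRD): each pixel is padded to a 32-bit word.
padded_bytes = WIDTH * HEIGHT * 4

print(colors)        # 262144
print(packed_bytes)  # 691200
print(padded_bytes)  # 1228800
```

Even in the padded layout, one frame takes a little over 1 MB, so a frame buffer of this size is a small share of the 64MB system memory minimum.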

3.2 Platforms

According to the product specification of the multimedia device, there are two kinds of platform to choose from. One is the dual-core architecture, in which a main processor handles general computing and a DSP core is dedicated to multimedia data processing. The other is the single-core architecture, in which one powerful processor handles all computing and relies on software codecs to process the multimedia data.

For these two kinds of platform, we select two platforms that meet the minimum requirements of the multimedia device for boot time analysis: the TI OMAP5912 and the Intel PXA270. The specifications of the related evaluation boards are described below:

3.2.1 OMAP5912

• Board

The TI OMAP5912osk (OMAP Starter Kit) is a highly integrated evaluation board, designed to meet the application processing needs of next-generation embedded devices. The dual-core architecture of the OMAP5912 provides the benefits of both DSP and reduced instruction set computer (RISC) technologies. It incorporates a TMS320C55x DSP core and a high-performance ARM926EJ-S ARM core. The board provides a rich set of peripherals that can meet the requirements of different products.

• Processor

– The DSP core of the OMAP5912 device is based on the TMS320C55x DSP generation CPU processor core. The C55x DSP architecture achieves high performance and low power through increased parallelism and a total focus on reducing power dissipation. The CPU supports an internal bus structure composed of one program bus, three data read buses, two data write buses, and additional buses dedicated to peripheral and DMA activity. These buses provide the ability to perform up to three data reads and two data writes in a single cycle. In parallel, the DMA controller can perform up to two data transfers per cycle independent of the CPU activity.

– The ARM926EJ-S processor is a member of the ARM9 family of general-purpose microprocessors. It is targeted at multi-tasking applications where full memory management, high performance, low die size, and low power are all important. The ARM926EJ-S processor supports the 32-bit ARM and 16-bit Thumb instruction sets, which lets the user trade off between high performance and high code density. The ARM926EJ-S processor implements ARM architecture version 5TEJ.

3.2.2 PXA270

• Board

The MT Creator PXA270 is a highly integrated evaluation board. It includes the integrated system-on-a-chip microprocessor for high-performance, dynamic, low-power portable handheld and handset devices as well as embedded platforms. The processor incorporates the Intel XScale technology. The processor also provides Intel Wireless MMX media enhancement technology, which supports integer instructions to accelerate audio and video processing. In addition, it incorporates Wireless Intel Speedstep Technology, which provides sophisticated power management capabilities enabling excellent MIPS/mW performance. The board also provides a rich set of peripherals that can meet the requirements of different products.

• Processor

The Intel XScale core is an ARM V5TE compliant microprocessor. It has been designed for high performance and low power, leading the industry in mW/MIPS. The core is not intended to be delivered as a stand-alone product but as a building block for an ASSP (Application Specific Standard Product) for embedded markets such as handheld devices, networking, storage, remote access servers, etc.

3.3 Comparison

Based on the device specifications of the OMAP5912 and the PXA270, Table 1 and Table 2 compare the two platforms.

In Table 1, both processors comply with the ARM version 5TE (v5TE) architecture. The XScale has larger caches, a 7-stage pipeline and a higher processor frequency.

In Table 2, the OMAP5912 has an additional DSP core for multimedia data, whereas the PXA270 supports Intel Wireless MMX for additional media instructions.

The OMAP5912 provides hardware accelerators for cryptography, whereas the PXA270 handles cryptography in software. Both platforms meet the minimum memory requirements (32MB flash and 64MB SDRAM). Although the OMAP5912 uses a 16-bit memory bus, it supports Mobile DDR RAM, which samples data on both the rising and the falling edge of the memory clock and therefore transfers 16 bits of data twice per clock cycle.
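The effect of double data rate on a narrow bus can be illustrated with a small calculation. This is a sketch of peak transfer width per memory clock only; it assumes the PXA270's 32-bit SDRAM is single data rate, and it ignores memory clock frequency, latency and refresh, none of which are given here. The helper name bits_per_clock is hypothetical.

```python
def bits_per_clock(bus_width_bits: int, transfers_per_clock: int) -> int:
    """Peak number of data bits moved per memory clock cycle."""
    return bus_width_bits * transfers_per_clock

# OMAP5912: 16-bit bus, data sampled on both clock edges (DDR).
omap5912_bits = bits_per_clock(16, 2)

# PXA270: 32-bit bus, assumed here to transfer once per clock (SDR).
pxa270_bits = bits_per_clock(32, 1)

print(omap5912_bits, pxa270_bits)  # 32 32
```

Under these assumptions the two memory interfaces move the same number of bits per clock cycle, so the narrower OMAP5912 bus is not by itself a disadvantage.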

The other peripherals of both platforms all meet the product specification of the multimedia device. The difference in the network interface is that the OMAP5912 supports 10 Mbps and the PXA270 supports 10/100 Mbps.

Table 1: Comparison of ARM926EJ-S and XScale

| | ARM926EJ-S Core | XScale Core |
|---|---|---|
| ISA | ARM V5TEJ | ARM V5TE |
| Pipeline | 5 stage pipeline | 7 stage pipeline |
| Cache | 16 KB i-cache, 8 KB d-cache | 32 KB i-cache, 32 KB d-cache |
| Clock/Freq. | Max. 192 MHz | Max. 520 MHz |

Table 2: Comparison of OMAP5912 and PXA270

| | TI OMAP5912 | Intel PXA270 |
|---|---|---|
| Multimedia | TMS320C55x DSP Core | Intel Wireless MMX |
| Memory | 16-bit Mobile DDR SDRAM (max. 64MB), 16-bit Flash (max. 256MB) | 32-bit SDRAM (max. 1GB), 16-bit Flash (max. 384MB) |
| USB | USB 1.1 Client | USB 1.1 Client |
| LCD | 16-/18-bit LCD Controller | 18-bit LCD Controller |
| Card Slot | SD/MMC | SD/MMC/MS |
| Keypad | Keypad I/F | Keypad I/F |
| LAN | 10 Mbps | 10/100 Mbps |

3.4 Analysis

3.4.1 Processor Architecture

Referring to Figure 3 [22] and Figure 4 [15], the ARM926EJ-S core has a 5-stage pipeline architecture, and the XScale core has a 7-stage super-pipeline architecture. The XScale core also has larger caches and a higher clock frequency. Compared on processor architecture alone, the XScale core therefore offers better performance than the ARM926EJ-S.

Figure 3: ARM926EJ-S block diagram

Figure 4: XScale block diagram

3.4.2 Platform Architecture

The OMAP5912 is a dual-core architecture, so it uses a multi-level bus architecture. According to the block diagram of the OMAP5912 in Figure 5, there are many different buses: the MPU bus, DSP bus, DMA bus, MPU public/private peripheral buses and DSP public/private peripheral buses. This architecture gives better performance when the subsystems access different components, because a multi-level bus reduces the resource conflicts and interference that arise when all accesses share the same bus.

Figure 5: OMAP5912 block diagram

The PXA270 is a single-core architecture, so its bus architecture is simpler than that of the OMAP5912. As Figure 6 shows, it has a system bus and a peripheral bus. There are six clients on the system bus: the core, the DMA controller, the LCD controller, the USB host controller, and the two memory controllers (internal and external). Most peripherals sit on the peripheral bus, which is connected to the DMA controller. Even though the PXA270 supports programmable weights for system bus arbitration, the single bus is still the bottleneck of the architecture.

The PXA270's memory controller can also be used to connect an external ASIC. When the ASIC and the PXA270 communicate heavily, this affects the performance of the memory controller.

Figure 6: PXA270 block diagram

3.4.3 Bottlenecks

From the boot time point of view, we would like to analyze which portion will be the bottleneck of each platform.

The OMAP5912 has a more flexible bus architecture, but the relationships between the TC (Traffic Controller) clock and the MPU clock, and between the flash clock and the TC clock, will be the bottleneck of boot time. The maximum MPU clock is 192 MHz, and the maximum TC clock is half of the maximum MPU clock, that is, 96 MHz. The maximum flash clock is 48 MHz, and it is always 1/6 of the TC clock. The MPU TIPB (TI peripheral bus), both public and private, also has a maximum strobe frequency of 48 MHz.
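The clock chain described above can be worked through numerically. This is a minimal sketch that takes the stated ratios at face value and reads the 48 MHz flash figure as an upper limit; that reading is an interpretation, not something the text spells out. The constant names are hypothetical.

```python
# Figures stated in the text.
MPU_MAX_MHZ = 192        # maximum MPU clock
TC_DIVIDER = 2           # TC clock is half of the MPU clock
FLASH_DIVIDER = 6        # flash clock stated as 1/6 of the TC clock
FLASH_CEILING_MHZ = 48   # stated maximum flash clock (read as a ceiling)

# Derived clocks at the maximum MPU frequency.
tc_mhz = MPU_MAX_MHZ // TC_DIVIDER
flash_mhz = tc_mhz // FLASH_DIVIDER

# How many MPU cycles elapse during one flash clock cycle.
mpu_cycles_per_flash_cycle = MPU_MAX_MHZ // flash_mhz

print(tc_mhz)                       # 96
print(flash_mhz)                    # 16
print(mpu_cycles_per_flash_cycle)   # 12
```

Under this reading the MPU runs a dozen cycles for every flash clock cycle, which is consistent with flash access being named as a boot-time bottleneck.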

The PXA270 uses one system bus for many subsystems, so the system bus will be a bottleneck. The peripheral bus will also be a bottleneck: too many peripherals are connected to it, and these peripherals take more time during hardware initialization. The memory controller is also used to connect an external ASIC, which will become a bottleneck for memory access when the external ASIC and the PXA270 communicate heavily.

Chapter 4 LINUX BOOT-UP

A computer system is a complex machine [16], and the operating system is an elaborate tool that orchestrates hardware complexities to present a simple and standardized environment to the end user.

Linux is currently one of the most popular operating systems because of its open source policy. For embedded systems, Embedded Linux, which is derived from open source Linux, is a cost-effective operating system. We first introduce the normal Linux boot-up steps on PCs and embedded systems, and then discuss the boot factors and boot sequence of Embedded Linux.

4.1 Overview

The PC is in more widespread use than other platforms, so the Linux boot-up steps on the x86 PC are introduced first. So that the computer can be used once the power is turned on, the processor begins execution from the system's firmware, which is called the Basic Input-Output System (BIOS). The functions of the BIOS are the Power-on Self Test (POST), system configuration set-up, and execution of code from the boot device. The boot-loader located on the boot device is then loaded by BIOS services.

The boot-loader's major functions are basic hardware initialization and uncompression/execution of the kernel image. After the boot-loader transfers control to the kernel [17], the kernel initializes the whole system and then executes the user space program from the file-system. The system is then ready for the user.

An embedded system is always resource constrained. There is no BIOS on an embedded system; it is replaced by power-on strap pins or the internal boot ROM of the processor. Because storage and memory are also limited, the boot-loader, kernel subsystems and file-system are modified to suit embedded systems. The Linux boot-up steps on an embedded system run from boot-loader execution, through the kernel image being copied to RAM, uncompressed and executed, to finally loading programs from the root file-system in user space.

4.2 Boot Sequences

The boot sequence can be divided into four stages: Hardware Initialization, Boot-loader, Kernel and User Space (Figure 7). For more detailed analysis, each stage is subdivided into phases:

Figure 7: Boot Sequence main stages

4.2.1 Hardware Initialization

There are two phases in the hardware initialization stage (Figure 8):

• Phase 1: The time for the MPU reset

It is measured from Power-On (Vin stable) to the signal MPU_Reset being de-asserted.

In signal terms, it runs from Vin becoming stable (MPU_nReset low) to the signal MPU_nReset going high. When the power input of the MPU is stable and the oscillator input of the MPU is also stable, the MPU exits reset mode.

• Phase 2: The time for the MPU initialization and the Peripheral reset

It is measured from the signal MPU_Reset being de-asserted to the signal MPU_RST_OUT being de-asserted.

In signal terms, it runs from the signal MPU_nReset going high to the signal MPU_nRST_OUT going high. The MPU exits reset mode and then performs simple hardware configuration by reading the power-on strap pins or the internal boot ROM. After the configuration is finished, the nRST_OUT signal to the other peripherals is de-asserted.

The MPU then reads the first instruction.

Figure 8: Hardware Initialization
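The Phase 2 measurement above amounts to finding the rising edges of two active-low signals and subtracting their times. The following is a minimal sketch of that idea on sampled logic traces. The sample values and the 10 µs sample period are made-up placeholders, not measurements from either platform, and first_rising_edge is a hypothetical helper.

```python
from typing import Optional, Sequence

def first_rising_edge(samples: Sequence[int], period_us: float) -> Optional[float]:
    """Return the time in microseconds of the first 0 -> 1 transition, or None."""
    for i in range(1, len(samples)):
        if samples[i - 1] == 0 and samples[i] == 1:
            return i * period_us
    return None

# Placeholder traces (NOT measured data), one sample every 10 us.
PERIOD_US = 10.0
mpu_nreset = [0, 0, 0, 1, 1, 1, 1, 1]    # goes high at sample 3
mpu_nrst_out = [0, 0, 0, 0, 0, 0, 1, 1]  # goes high at sample 6

t_reset = first_rising_edge(mpu_nreset, PERIOD_US)
t_rst_out = first_rising_edge(mpu_nrst_out, PERIOD_US)

# Phase 2: from MPU_nReset going high to MPU_nRST_OUT going high.
phase2_us = t_rst_out - t_reset
print(phase2_us)  # 30.0
```

Phase 1 would be obtained the same way, using the time at which Vin becomes stable as the starting point.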

4.2.2 Boot-loader Stage

There are four or five phases in the boot-loader stage (Figure 9). When the kernel image is uncompressed in the kernel stage, there are only four phases. When the kernel image is uncompressed by the boot-loader, there are five phases in the boot-loader stage:

• Phase 3:

It is measured from the signal MPU_RST_OUT being de-asserted to the function env_relocate_spec() finishing.

In signal terms, it runs from the signal MPU_RST_OUT going high to the last rising edge of the signal Flash_CS before env_relocate_spec() finishes. The MPU reads the first instruction and performs simple hardware initialization; U-boot then starts and prepares to execute the first function that accesses flash. Then the environment parameters of U-boot are relocated.

• Phase 4:

It is measured from the function env_relocate_spec() finishing to the start of the kernel image checksum verification.

• Phase 5:

It is measured from the start of the kernel image checksum verification to the end of copying the image to RAM.

In signal terms, it runs from the RS232_TX signal marking the start of the image checksum verification to the RS232_TX signal marking the end of copying the image to RAM. U-boot verifies the checksum of the kernel image. If the checksum is correct, U-boot copies the kernel image from the flash to system memory.
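The verify-then-copy step of Phase 5 can be sketched as follows. This is an illustration rather than U-boot's actual code: it assumes a CRC-32 checksum (the text does not name the algorithm), models flash and RAM as byte buffers, and uses the hypothetical name verify_and_copy.

```python
import zlib

def verify_and_copy(flash_image: bytes, expected_crc: int,
                    ram: bytearray, load_offset: int) -> bool:
    """Copy the image into ram only if its CRC-32 matches expected_crc."""
    # Assumption: the stored checksum is a CRC-32 over the image data.
    actual_crc = zlib.crc32(flash_image) & 0xFFFFFFFF
    if actual_crc != expected_crc:
        return False  # checksum mismatch: do not copy
    end = load_offset + len(flash_image)
    if load_offset < 0 or end > len(ram):
        raise ValueError("image does not fit in RAM at this offset")
    ram[load_offset:end] = flash_image
    return True

# Placeholder image and RAM buffer, for illustration only.
image = b"placeholder kernel image bytes"
good_crc = zlib.crc32(image) & 0xFFFFFFFF
ram = bytearray(64)
ok = verify_and_copy(image, good_crc, ram, 16)
print(ok)  # True
```

Both the verification and the copy read the whole image from flash, which is why this phase is sensitive to the flash clock.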

4.2.2.1 Case 1-Kernel Uncompress Image

• Phase 6:

It is measured from the end of copying the image to RAM to the function cleanup_before_linux() finishing.

In signal terms, it runs from the RS232_TX signal marking the end of copying the image to RAM to the RS232_TX signal marking the end of the function cleanup_before_linux(). The boot-loader transfers control of the system to the Linux kernel.

4.2.2.2 Case 2-Boot-loader Uncompress Image

• Phase 6’:

It is measured from the end of copying the image to RAM to the end of uncompressing the kernel image.

In signal terms, it runs from the RS232_TX signal marking the end of copying the image to RAM to the RS232_TX signal marking the end of uncompressing the kernel image. The boot-loader uncompresses the kernel image.

• Phase 7’:

It is measured from the end of uncompressing the kernel image to the function cleanup_before_linux() finishing.

In signal terms, it runs from the RS232_TX signal marking the end of uncompressing the kernel image to the RS232_TX signal marking the end of the function cleanup_before_linux(). The boot-loader transfers control of the system to the uncompressed Linux kernel.

Figure 9: Boot-loader stage

4.2.3 Kernel Stage

There are three or four phases in the kernel stage (Figure 10). When the kernel image is uncompressed in the boot-loader stage, there are only three phases. When the kernel image is uncompressed in the kernel stage, there are four phases in the kernel stage:

4.2.3.1 Case 1-Kernel Uncompress Image

• Phase 7:

It is measured from the function cleanup_before_linux() finishing to the start of uncompressing the kernel.

In signal terms, it runs from the RS232_TX signal marking the end of the function cleanup_before_linux() to the RS232_TX signal marking the start of uncompressing the kernel. The Linux kernel gets control and prepares to uncompress the kernel.

• Phase 8:

It is measured from the start of uncompressing the kernel to the end of uncompressing the kernel.

In signal terms, it runs from the RS232_TX signal marking the start of uncompressing the kernel to the RS232_TX signal marking the end of uncompressing the kernel. The Linux kernel image is uncompressed and prepares to start the kernel.

4.2.3.2 Case 2-Boot-loader Uncompress Image

• Phase 8’:

It is measured from the function cleanup_before_linux() finishing to just before start_kernel.

In signal terms, it runs from the RS232_TX signal marking the end of the function cleanup_before_linux() to just before start_kernel. The Linux kernel gets control and prepares to start the kernel.

• Phase 9:

It is measured from the end of uncompressing the kernel to just before the file-system initialization/build.

In signal terms, it runs from the RS232_TX signal marking the end of uncompressing the kernel to the RS232_TX signal marking the start of the file-system build/fill super. The Linux kernel is uncompressed and executes the routine start_kernel; the kernel does not access the flash until the routine mount_root.

• Phase 10:

It is measured from just before the file-system initialization/build to just before /sbin/init is invoked.

In signal terms, it runs from the RS232_TX signal marking the start of the file-system build/fill super to the RS232_TX signal marking the end of the file-system build/fill super. The root file-system is built by the kernel.

Figure 10: Kernel stage

4.2.4 User Space Stage

There are two phases in the user space stage (Figure 11):

• Phase 11:

It is measured from the invocation of /sbin/init to just before the RC script starts.

In signal terms, it runs from the RS232_TX signal marking the invocation of init to the RS232_TX signal marking the start of the RC script. The Linux kernel invokes the sysvinit tool /sbin/init; init_main then starts in user space and prepares to run the RC script.

• Phase 12:

It is measured from the start of the RC script to the end of the shell prompt output.

In signal terms, it starts at the RS232_TX signal marking the start of the RC script. The RC script starts several daemons; the RC script then finishes and the shell prompt is enabled.

Figure 11: User Space stage
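Taken together, the phases of Section 4.2 are intervals between consecutive measurement boundaries, and each stage is the sum of its phases. The following minimal sketch shows that bookkeeping for Case 1 (the kernel uncompresses its own image). The boundary labels paraphrase the text above, and the timestamps are placeholders invented for illustration, not measured results.

```python
# Measurement boundaries for Case 1, in order; phase n runs from
# BOUNDARIES[n - 1] to BOUNDARIES[n].
BOUNDARIES = [
    "power-on (Vin stable)",
    "MPU_Reset de-asserted",
    "MPU_RST_OUT de-asserted",
    "env_relocate_spec() finished",
    "kernel image checksum verify start",
    "copy image to RAM finished",
    "cleanup_before_linux() finished",
    "uncompress kernel start",
    "uncompress kernel finished",
    "file-system build start",
    "invoke /sbin/init",
    "RC script start",
    "shell prompt output finished",
]

# Phases belonging to each stage, as described in Sections 4.2.1 to 4.2.4.
STAGES = {
    "Hardware Initialization": range(1, 3),
    "Boot-loader": range(3, 7),
    "Kernel": range(7, 11),
    "User Space": range(11, 13),
}

def phase_durations(timestamps_ms):
    """Map phase number (1..12) to its duration in milliseconds."""
    if len(timestamps_ms) != len(BOUNDARIES):
        raise ValueError("expected one timestamp per boundary")
    return {n: timestamps_ms[n] - timestamps_ms[n - 1]
            for n in range(1, len(BOUNDARIES))}

def stage_totals(durations):
    """Sum the phase durations belonging to each stage."""
    return {stage: sum(durations[n] for n in phases)
            for stage, phases in STAGES.items()}

# Placeholder timestamps in ms (NOT measured data).
example_ms = [0, 10, 20, 120, 150, 650, 700, 720, 1720, 2720, 3220, 3520, 5520]
durations = phase_durations(example_ms)
totals = stage_totals(durations)
print(totals)
```

For Case 2, the boundary list would swap in the Phase 6’, 7’ and 8’ boundaries, with five phases in the boot-loader stage and three in the kernel stage.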

4.3 Impact Factors

In this thesis we investigate the Embedded Linux boot process and identify the factors relevant to reducing boot time. From the platform comparison and the analysis of the Linux boot procedure, we infer that both hardware factors, including processor frequency, memory and I/O access speed, and
