1.1. Motivation
Portable devices such as mobile phones and PDAs are becoming increasingly popular. Because these devices are used for entertainment as well as business, rich audio and video functionality is essential. The Third Generation Partnership Project (3GPP) working group defines a framework for streaming multimedia presentations, either through the unicast Packet-switched Streaming Service (PSS), which is based on the Real-time Transport Protocol (RTP) [1], or through the multicast Multimedia Broadcast/Multicast Service (MBMS), which is based on the Secure Real-time Transport Protocol (SRTP) [2]. Many embedded multimedia devices are built on heterogeneous multiprocessors. For example, a dual-core platform may contain a microprocessor unit (MPU) and a digital signal processor (DSP). Typically, the MPU core handles control tasks while the DSP core handles low-level, computationally intensive tasks. However, newer generations of MPU cores are powerful enough to handle computationally expensive jobs, so assigning such jobs to the MPU core when it is not busy can improve overall performance.
In this thesis, we propose the design of a streaming player that plays streaming video over the RTSP/RTP/RTCP protocols [3][4] on the TI OMAP5912 platform. The streaming library and the system control module run on the MPU core, while the MPEG-4 Simple Profile video decoding tasks are dynamically assigned to both cores according to the load of each core. The experimental results show that dynamically partitioning the video decoding job between the heterogeneous cores improves performance, which makes the proposed dynamic partition system promising for practical applications.
1.2. Introduction to the OMAP 5912 Starter Kit
The Texas Instruments OMAP 5912 includes an MPU subsystem, a DSP subsystem, and a system direct memory access (SDMA) controller. It is designed for multimedia applications such as decoding MPEG-4/H.263 video, MP3/AAC audio, and JPEG images. The MPU subsystem, which performs most of the operations on the chip, is based on the ARM926EJ. The DSP subsystem, based on the TMS320C5510, is responsible for data-intensive computing tasks. Both the MPU core and the DSP core have a maximum clock frequency of 192 MHz. The OSK 5912 is a development board that integrates the OMAP 5912 chip with peripherals such as 32 MB of DDR SDRAM, 32 MB of flash ROM, an RS-232 serial port, and a 10 Mbps Ethernet port.
Figure 1.1 shows the OSK 5912 development board and the Q-VGA display module. The Q-VGA display module is connected to the OSK 5912 board for displaying video frames.
Figure 1.1 - The OSK 5912 development board and Q-VGA display module.
1.2.1. The Memory Map of OSK 5912
Table 1.1 shows the MPU global memory map. The DSP core has a 32K x 16-bit (64 Kbytes) on-chip dual-access RAM (DARAM) and a 48K x 16-bit (96 Kbytes) on-chip single-access RAM (SARAM). Table 1.2 gives the DSP global memory map [5]. Note that the MPU core uses byte addressing while the DSP core uses word addressing.
| Device name / signal | Start address | End address | Size | Data access | Type |
|---|---|---|---|---|---|
| External Memory Interface Slow (EMIFS) | | | | | |
| CS0 | 0x0000 0000 | 0x03FF FFFF | 64MB | | |
| Boot ROM | 0x0000 0000 | 0x0000 FFFF | 64KB | 32-bit | Ex/R |
| Reserved boot ROM | 0x0001 0000 | 0x0003 FFFF | 192KB | 32-bit | Ex/R |
| Reserved | 0x0004 0000 | 0x01FF FFFF | | | |
| NOR flash | 0x0200 0000 | 0x03FF FFFF | 32MB | 8/16/32-bit | Ex/R/W |
| CS1 | 0x0400 0000 | 0x07FF FFFF | 64MB | | |
| NOR flash | 0x0400 0000 | 0x07FF FFFF | 64MB | 8/16/32-bit | Ex/R/W |
| CS2 | 0x0800 0000 | 0x0BFF FFFF | 64MB | | |
| NOR flash | 0x0800 0000 | 0x0BFF FFFF | 64MB | 8/16/32-bit | Ex/R/W |
| CS3 | 0x0C00 0000 | 0x0FFF FFFF | 64MB | | |
| NOR flash | 0x0C00 0000 | 0x0FFF FFFF | 64MB | 8/16/32-bit | Ex/R/W |
| External Memory Interface Fast (EMIFF) | | | | | |
| SDRAM external | 0x1000 0000 | 0x13FF FFFF | 64MB | 16-bit | Ex/R/W |
| Reserved | 0x1400 0000 | 0x1FFF FFFF | | | |
| L3 OCP T1 | | | | | |
| Frame buffer | 0x2000 0000 | 0x2003 E7FF | 250KB | 32-bit | Ex/R/W |
| Reserved | 0x2003 E800 | 0x2007 D7FF | | | |
| Reserved | 0xF000 0000 | 0xFFFA FFFF | | | |
| OMAP5912 peripherals | 0xFFFB 0000 | 0xFFFE FFFF | | | |
| Reserved | 0xFFFF 0000 | 0xFFFF FFFF | | | |
Table 1.1 - The MPU global memory map.
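To make the relation between the start address, end address, and size columns of Table 1.1 concrete, the following minimal C sketch stores a few rows of the map and looks up the region that contains a given MPU byte address. The names `mem_region`, `mpu_map`, `region_size`, and `find_region` are hypothetical and used for illustration only; they are not part of any TI API. Only rows whose ranges do not overlap are included.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the MPU global memory map (hypothetical helper type). */
struct mem_region {
    const char *name;
    uint32_t start; /* first byte address of the region */
    uint32_t end;   /* last byte address of the region (inclusive) */
};

/* A subset of Table 1.1; the address values are copied from the table. */
static const struct mem_region mpu_map[] = {
    { "Boot ROM",             0x00000000u, 0x0000FFFFu },
    { "Reserved boot ROM",    0x00010000u, 0x0003FFFFu },
    { "NOR flash (CS0)",      0x02000000u, 0x03FFFFFFu },
    { "SDRAM external",       0x10000000u, 0x13FFFFFFu },
    { "Frame buffer",         0x20000000u, 0x2003E7FFu },
    { "OMAP5912 peripherals", 0xFFFB0000u, 0xFFFEFFFFu },
};

/* Size in bytes of an inclusive address range: end - start + 1. */
static uint64_t region_size(const struct mem_region *r)
{
    assert(r != NULL && r->end >= r->start);
    return (uint64_t)r->end - r->start + 1u;
}

/* Linear search for the region containing addr; NULL if none is listed. */
static const struct mem_region *find_region(uint32_t addr)
{
    size_t n = sizeof mpu_map / sizeof mpu_map[0];
    for (size_t i = 0; i < n; i++) {
        if (addr >= mpu_map[i].start && addr <= mpu_map[i].end)
            return &mpu_map[i];
    }
    return NULL;
}
```

For example, `region_size` applied to the frame buffer row gives 0x3E800 bytes, which is the 250KB listed in the table, and `find_region(0x00040000)` returns `NULL` because that address lies in a reserved range that the sketch does not list.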
| Byte address range | Word address range | Internal memory | External memory |
|---|---|---|---|
| 0x00 0000 - 0x00 FFFF | 0x00 0000 - 0x00 7FFF | DARAM (64Kbytes) | |
| 0x01 0000 - 0x02 7FFF | 0x00 8000 - 0x01 3FFF | SARAM (96Kbytes) | |
| 0x02 8000 - 0x04 FFFF | 0x01 4000 - 0x02 7FFF | Reserved | |
| 0x05 0000 - 0xFF 7FFF | 0x02 8000 - 0x7F BFFF | | Managed by DSP MMU |
| 0xFF 8000 - 0xFF FFFF | 0x7F C000 - 0x7F FFFF | PDROM (MPNMC=0) | Managed by DSP MMU (MPNMC=1) |
Table 1.2 - The DSP global memory map.
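The two address columns of Table 1.2 differ by a factor of two because one DSP word is 16 bits, that is, two bytes. The following C sketch illustrates the conversion under that assumption; the function names are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Highest DSP byte address listed in Table 1.2. */
#define DSP_MAX_BYTE_ADDR 0xFFFFFFu

/* Word address of the 16-bit word that contains the given byte address. */
static uint32_t dsp_byte_to_word(uint32_t byte_addr)
{
    assert(byte_addr <= DSP_MAX_BYTE_ADDR);
    return byte_addr >> 1;
}

/* Byte address of the first byte of a 16-bit word. */
static uint32_t dsp_word_to_first_byte(uint32_t word_addr)
{
    assert(word_addr <= (DSP_MAX_BYTE_ADDR >> 1));
    return word_addr << 1;
}

/* Byte address of the last byte of a 16-bit word. */
static uint32_t dsp_word_to_last_byte(uint32_t word_addr)
{
    return dsp_word_to_first_byte(word_addr) | 1u;
}
```

For instance, the SARAM starts at byte address 0x01 0000, and `dsp_byte_to_word(0x010000)` yields the word address 0x00 8000 shown in the table. Conversely, the DARAM ends at word address 0x00 7FFF, whose last byte is 0x00 FFFF.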
1.2.2. Inter-processor communication on OMAP 5912
Three mechanisms are available for inter-processor communication (IPC) between the MPU and the DSP on the OMAP 5912: mailbox registers, the MPU interface, and shared memory.
There are four sets of mailbox registers. Two of them let the MPU core send messages and issue interrupts to the DSP core, and the other two let the DSP core signal the MPU core. Each set consists of one 16-bit command register, one 16-bit data register, and one 1-bit flag register. Writing to the command register raises an interrupt on the other processor and sets the corresponding flag register. The interrupted processor can then read the command and data registers and clear the flag.
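The mailbox handshake described above can be sketched in C as a software model. This is only an illustration under stated assumptions: the structure `mailbox` and the functions `mailbox_send` and `mailbox_receive` are hypothetical, the registers are ordinary variables rather than the real memory-mapped hardware registers, the interrupt is modeled by a counter, and refusing to send while the flag is still set is a policy chosen for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of one mailbox register set (not the real hardware layout). */
struct mailbox {
    uint16_t command;   /* 16-bit command register */
    uint16_t data;      /* 16-bit data register */
    bool     flag;      /* 1-bit flag: a message is pending */
    unsigned irq_count; /* models interrupts raised on the receiving core */
};

/*
 * Sender side. The data register is written first because, in this model,
 * the command write is what raises the interrupt and sets the flag.
 * Returns false if the previous message has not been consumed yet
 * (an assumption of this sketch).
 */
static bool mailbox_send(struct mailbox *mb, uint16_t cmd, uint16_t data)
{
    assert(mb != NULL);
    if (mb->flag)
        return false;
    mb->data = data;
    mb->command = cmd;
    mb->flag = true;
    mb->irq_count++;
    return true;
}

/*
 * Receiver side, called from the modeled interrupt handler: read the
 * command and data registers, then clear the flag.
 * Returns false if no message is pending.
 */
static bool mailbox_receive(struct mailbox *mb, uint16_t *cmd, uint16_t *data)
{
    assert(mb != NULL && cmd != NULL && data != NULL);
    if (!mb->flag)
        return false;
    *cmd = mb->command;
    *data = mb->data;
    mb->flag = false;
    return true;
}
```

In this model, each direction of communication would use its own `struct mailbox` instance, mirroring the separate MPU-to-DSP and DSP-to-MPU register sets.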
The second mechanism is the MPU interface (MPUI). The MPUI enables the MPU core and the system DMA controller to access the memory-mapped resources of the DSP core and its peripherals, such as the SARAM, the DARAM, and the control registers.
The third mechanism is memory shared between the MPU and the DSP through the traffic controller. The MPU core and the DSP core can share access to the on-board SRAM and SDRAM if the DSP memory management unit (MMU) is enabled and configured properly. More details are given in section 4.2.2.
1.3. Scope of the Thesis
The rest of the thesis is organized as follows. Chapter 2 introduces previous work related to the design of the proposed streaming player and the dynamic task partition approach. Chapter 3 presents the architecture and details of the proposed streaming library. Chapter 4 describes the implementation details of the dynamic task partition system in the streaming player. Chapter 5 presents the experimental results, and chapter 6 gives the conclusion and some discussion.