
STUDENT GESTURE RECOGNITION SYSTEM IN CLASSROOM 2.0


2. System Flowchart

A flowchart of the student gesture recognition system is shown in Figure 2. Once the video sequence frames have been input into the system, the motion pixels and the candidate foreground pixels are detected. These two cues help identify the foreground pixels. In parallel, the main lines, which indicate the horizontal lines of the rows of chairs, are located. Using the locations of the main lines as constraints, the foreground regions are grown from the identified foreground pixels, which serve as seeds. These foreground regions are then combined to segment the foreground objects, which are assumed to represent individual students. Finally, a gesture recognition technique is applied to identify various student gestures.

Figure 1. Example of three input sequences.

2.1 Main Line Location

Figure 3 shows the flowchart of the main line location process, and Figure 4 shows an example.

Once the video sequence frames have been input into the system, the system detects edges using Sobel's approach. Figure 4(a) shows one of the input frames, and the edge detection result is shown in Figure 4(b). Subsequently, the system extracts the horizontal edges by applying a morphological opening operation with a 5x1 horizontal kernel; the result is shown in Figure 4(c). The horizontal edges are then projected along the horizontal direction to count the edge pixels in each row; these counts can be regarded as the importance of the corresponding main line candidates.

The system extracts the main line candidates from the bottom of the frame based on this degree of importance. The red lines in Figure 4(d) indicate the main line candidates. The system clusters these candidates into classes and calculates the average location of each class to identify the real positions of the main lines. Only the three main lines located at the bottom of the frame are extracted and preserved for the following process, as depicted in Figure 4(e). Finally, the system calculates the locations of the remaining main lines using a geometric series, as depicted by the green lines in Figure 4(f).
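
As a minimal sketch of this process, assuming OpenCV and NumPy, the main line location might be implemented as follows. The binarization threshold, the number of candidates, the cluster distance, and the geometric-series ratio are illustrative assumptions, not values from the paper.

import cv2
import numpy as np

def locate_main_lines(frame, num_lines=3, ratio=0.8):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Sobel edge detection; the vertical gradient responds to horizontal edges.
    edges = (np.abs(cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)) > 50).astype(np.uint8)

    # Morphological opening with a horizontal 5x1 kernel keeps horizontal edges.
    horizontal = cv2.morphologyEx(edges, cv2.MORPH_OPEN, np.ones((1, 5), np.uint8))

    # Project along the horizontal direction: the edge-pixel count of each row
    # is the "importance" of that row as a main line.
    importance = horizontal.sum(axis=1)

    # Take the most important rows in the bottom half as candidates, and merge
    # nearby candidates by keeping one row per cluster (crude clustering).
    h = frame.shape[0]
    candidates = np.argsort(importance[h // 2:])[::-1][:20] + h // 2
    lines = []
    for y in sorted(candidates, reverse=True):
        if all(abs(y - l) > 10 for l in lines):  # assumed cluster distance
            lines.append(int(y))
        if len(lines) == num_lines:
            break

    # Extrapolate the remaining main lines upward with a geometric series:
    # the spacing between consecutive lines shrinks by a constant ratio.
    spacing = float(lines[-2] - lines[-1]) if len(lines) >= 2 else 20.0
    y = float(lines[-1])
    while spacing > 2:
        spacing *= ratio
        y -= spacing
        if y < 0:
            break
        lines.append(int(y))
    return lines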

2.2 Motion Detection

The system detects motion by subtracting the intensity values of the pixels in the t-th frame from those of the corresponding pixels in the (t-1)-th frame and taking the absolute values of the results. Let the intensity values of a pixel p at times t-1 and t be $I_{t-1}(p)$ and $I_t(p)$, respectively. The magnitude of the motion of this pixel can then be defined as

$$M(p) = \left| I_t(p) - I_{t-1}(p) \right|.$$
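
As a minimal sketch, assuming grayscale frames stored as 8-bit NumPy arrays, the frame differencing can be written as follows; the motion threshold is an illustrative assumption.

import numpy as np

def motion_magnitude(frame_prev, frame_curr):
    # M(p) = |I_t(p) - I_{t-1}(p)| for every pixel p; widen the dtype first
    # so that the subtraction of 8-bit values cannot wrap around.
    return np.abs(frame_curr.astype(np.int16) - frame_prev.astype(np.int16))

def motion_pixels(frame_prev, frame_curr, threshold=15):
    # Pixels whose motion magnitude exceeds the (assumed) threshold are
    # treated as motion pixels.
    return motion_magnitude(frame_prev, frame_curr) > threshold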

Figure 2. Flowchart of the student gesture recognition system.

Figure 3. The flowchart of the main line location process.

Figure 5 shows an example of the motion detection process. Figures 5(a) and 5(b) show the (t-1)-th frame and the t-th frame, respectively, and Figure 5(c) shows the motion detection result.

2.3 Foreground Pixel Extraction and Identification

The input frames are represented in the RGB color model. For a given pixel p, let its R, G, and B values be $R_p$, $G_p$, and $B_p$, respectively. The translation equation to calculate the hue value h of the pixel in the HSI model is given by:

$$h = \begin{cases} \theta, & B_p \le G_p \\ 360^{\circ} - \theta, & B_p > G_p \end{cases}, \qquad \theta = \cos^{-1}\left\{ \frac{\frac{1}{2}\left[(R_p - G_p) + (R_p - B_p)\right]}{\left[(R_p - G_p)^2 + (R_p - B_p)(G_p - B_p)\right]^{1/2}} \right\}.$$

Moreover, the translation equation to calculate the Cr value of the pixel in the YCrCb model is given by:

$$Cr = 0.5\,R_p - 0.4187\,G_p - 0.0813\,B_p + 128.$$
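
As a minimal sketch, assuming 8-bit RGB input stored as a NumPy array and the standard conversions shown above, the two per-pixel values can be computed in vectorized form; the epsilon guard against division by zero is an implementation detail, not part of the paper.

import numpy as np

def hue(rgb):
    # Hue of each pixel from 8-bit RGB, following the HSI conversion above.
    r, g, b = (rgb[..., i].astype(np.float64) for i in range(3))
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-8  # guard divide-by-zero
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    return np.where(b > g, 360.0 - theta, theta)  # hue in [0, 360)

def cr(rgb):
    # BT.601 chroma-red component, offset into the 8-bit range.
    r, g, b = (rgb[..., i].astype(np.float64) for i in range(3))
    return 0.5 * r - 0.4187 * g - 0.0813 * b + 128.0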

Figure 4. An example of the main line location process.

Figure 5. An example of motion detection.

The system computes the hue and Cr values of every pixel p in the frame to form a two-dimensional Hue-Cr histogram. Figure 6(c) shows the Hue-Cr histogram of the image in Figure 6(a). It is assumed that the background occupies larger regions than the foreground in the classroom. Therefore, the system normalizes and sorts the pixel counts of the histogram bins. After normalization, the pixels falling in the most frequent bins, accounting for the top 40% of pixels, are classified as background pixels, and the pixels falling in the rarest bins, accounting for the bottom 5%, are classified as candidate foreground pixels.
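
A minimal sketch of this classification follows, assuming hue values in [0, 360) and Cr values in [0, 256); the bin sizes are illustrative assumptions, while the 40% and 5% cut-offs follow the text.

import numpy as np

def classify_by_histogram(hue_img, cr_img, hue_bins=36, cr_bins=32):
    # Quantize each pixel into a (hue, Cr) bin and build the 2-D histogram.
    hq = np.clip((hue_img / 360.0 * hue_bins).astype(int), 0, hue_bins - 1)
    cq = np.clip((cr_img / 256.0 * cr_bins).astype(int), 0, cr_bins - 1)
    bins = hq * cr_bins + cq
    hist = np.bincount(bins.ravel(), minlength=hue_bins * cr_bins)

    # Sort bins by frequency and take cumulative pixel fractions: pixels in the
    # most frequent bins (top 40%) become background, pixels in the rarest bins
    # (bottom 5%) become candidate foreground.
    order = np.argsort(hist)[::-1]
    frac = np.cumsum(hist[order]) / hist.sum()
    background_bins = order[: np.searchsorted(frac, 0.40) + 1]
    candidate_bins = order[np.searchsorted(frac, 0.95) + 1:]

    background = np.isin(bins, background_bins)
    candidates = np.isin(bins, candidate_bins)
    return background, candidates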

Subsequently, the system identifies the foreground pixels using motion, color, and template information. Given a pixel p, let M(p) denote the normalized magnitude of the motion, C(p) the normalized value of the location of the pixel p in the Hue-Cr histogram, and $F_{t-1}(p)$ the foreground pixel probability of the pixel p at time t-1. The foreground pixel probability $F_t(p)$ of the pixel p at time t is then calculated by combining these three cues. If $F_t(p)$ exceeds a given threshold, the pixel is marked as a foreground pixel, shown in yellow in Figure 6(b). On the other hand, the top 40% of pixels are marked as background pixels, shown in blue in Figure 6(b).
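
The exact combination rule is not reproduced above. As one plausible sketch, a weighted sum of the three normalized cues with a per-frame update could look like the following; the weighted-sum form, the weights, and the threshold are assumptions for illustration, not the paper's formula.

import numpy as np

def update_foreground_probability(M, C, F_prev, w=(0.4, 0.4, 0.2)):
    # Assumed fusion: F_t(p) = w1*M(p) + w2*C(p) + w3*F_{t-1}(p). C is assumed
    # here to be oriented so that larger values mean more foreground-like
    # (i.e., rarer in the Hue-Cr histogram).
    w1, w2, w3 = w
    return w1 * M + w2 * C + w3 * F_prev

def foreground_mask(F_t, threshold=0.5):
    # Pixels whose probability exceeds the (assumed) threshold are marked as
    # foreground, shown in yellow in Figure 6(b).
    return F_t > threshold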

2.4 Region Growing

The region growing algorithm consists of two functions, Main and RegionGrowing. The Main function selects the foreground pixels whose y_axis locations lie between the maximum and minimum main lines, and uses these selected pixels as the seeds from which the foreground regions are grown.

Function Main() {
    Let r = 0;
    For (each pixel x in the set of foreground pixels) {
        If (the y_axis location of x is between the maximum and minimum main lines) {
            If (x does not belong to any labelled region) {
                Label the region number of x as r;
                RegionGrowing(x); r = r + 1;
            }
        }
    }
}

Let Ny denote the set of neighbours of a pixel y. The RegionGrowing function first selects a pixel y from the SSL_queue, whose region number is ry. All neighbouring pixels whose properties are similar to those of pixel y will be classified into the same region. Otherwise, the neighbouring pixels of y will be set as the boundary pixels of region ry. Here, T1 and T2 denote the thresholds used to check the pixel properties.

Figure 6. An example of foreground pixel extraction.

Function RegionGrowing(x) {
    Push x into the SSL_queue;
    While (SSL_queue is not empty) {
        Output a pixel y, which belongs to region ry, from the SSL_queue;
        For (each pixel n in the neighbours Ny of y) {
            If (the property differences between n and y are smaller than T1 and T2)
                { Label the region number of n as ry; Push n into the SSL_queue; }
            Else
                { Set n as a boundary pixel of region ry; }
        }
    }
}
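
For concreteness, a runnable sketch of this algorithm in Python is given below, assuming a single per-pixel property array and 4-connected neighbours; collapsing the two thresholds T1 and T2 into one similarity test on that array is an illustrative simplification.

from collections import deque
import numpy as np

def region_growing(seeds, prop, T1=10.0):
    # seeds: iterable of (y, x) foreground seed pixels between the maximum and
    # minimum main lines; prop: 2-D array holding the per-pixel property.
    h, w = prop.shape
    labels = np.full((h, w), -1, dtype=int)  # -1 means "no region yet"
    boundary = np.zeros((h, w), dtype=bool)
    r = 0
    for seed in seeds:
        if labels[seed] != -1:
            continue  # the seed already belongs to a labelled region
        labels[seed] = r
        ssl = deque([seed])  # plays the role of the SSL_queue
        while ssl:
            y, x = ssl.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if not (0 <= ny < h and 0 <= nx < w) or labels[ny, nx] != -1:
                    continue
                if abs(float(prop[ny, nx]) - float(prop[y, x])) < T1:
                    labels[ny, nx] = r       # similar property: same region
                    ssl.append((ny, nx))
                else:
                    boundary[ny, nx] = True  # dissimilar: boundary of region r
        r += 1
    return labels, boundary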

Figure 7 illustrates the results of the region growing algorithm. The input frame is shown in Figure 7(a), and Figure 7(b) shows the distributions of the foreground pixels (yellow) and the background pixels (blue). The result of the region growing algorithm is shown in Figure 7(c). Notice that the foreground regions are successfully bounded by the background pixels.

2.5 Object Segmentation

The system segments the objects by combining regions. Using a graph to represent an object, each region can be regarded as a node of the graph. If two regions are adjacent, a link is added to connect the two corresponding nodes in the graph; the weight of this link represents the strength of the connection between these two nodes. Let two linked nodes be denoted by $n_k$ and $n_{k'}$, respectively. The weight of the link is defined in terms of the distance between the centers of these two nodes. Here, we assume that the height of a seated student is not greater than twice the distance between two adjacent main lines. Therefore, if the distance between the centers is greater than twice the distance between two adjacent main lines, the weight is set to zero.

Figure 7. An example of region growing.


Moreover, the system increases the weight of a link if it connects two nodes that have common neighbors. Figure 8(a) shows an example where the nodes $n_k$ and $n_{k'}$ have two common neighbors; the weight of their link is therefore assigned a value depending on the number of common neighbors, since there is a high probability that these two nodes belong to the same object. Conversely, the system decreases the weight of a link if it connects two nodes that have no common neighbors. Figure 8(b) shows an example where the nodes $n_k$ and $n_{k'}$ have no common neighbors; the weight of their link is therefore decreased to a constant value, since there is a low probability that these two nodes belong to the same object.
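
A hedged sketch of the region-combination step described in the last two paragraphs follows, assuming region centers and adjacency sets as inputs. The inverse-distance weight, the boost factor, and the 0.5 penalty are illustrative assumptions; only the zero weight beyond twice the main line spacing and the common-neighbour adjustment follow the text.

from itertools import combinations

def build_region_graph(centers, adjacency, line_spacing):
    # centers: {region_id: (y, x)}; adjacency: {region_id: set of adjacent ids}.
    weights = {}
    for a, b in combinations(sorted(centers), 2):
        if b not in adjacency[a]:
            continue  # only adjacent regions are linked
        (ya, xa), (yb, xb) = centers[a], centers[b]
        dist = ((ya - yb) ** 2 + (xa - xb) ** 2) ** 0.5
        if dist > 2 * line_spacing:
            # A seated student is assumed to span less than twice the distance
            # between two adjacent main lines, so such links get zero weight.
            weights[(a, b)] = 0.0
            continue
        w = 1.0 / (1.0 + dist)  # assumed: closer centers, stronger connection
        common = (adjacency[a] & adjacency[b]) - {a, b}
        if common:
            w *= 1.0 + len(common)  # boost links whose nodes share neighbours
        else:
            w *= 0.5                # assumed penalty for no common neighbours
        weights[(a, b)] = w
    return weights

Objects could then be segmented by, for example, merging nodes whose link weights exceed a threshold and taking the resulting connected components as objects.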

Figure 9 shows the results of object segmentation. Figure 9(a) shows the original input frame, Figure 9(b) shows the result of region growing, and Figure 9(c) shows the result of object segmentation. It can be seen that the system keeps the objects with sufficiently large areas, while the smaller objects are ignored.
