
Chapter 3 Intelligent Human Detection


An example of HOG is shown in Fig-3.9.

Fig-3.9 the example of HOG

After the HOG descriptors are computed, they serve as training patterns for the human recognition system, which determines whether a human is present in the image.

3.3 Human Shape Recognition

In this section, the system judges whether the ROI contains a human based on the extracted features. To achieve a higher detection rate and tolerance to variation, this thesis adopts the Leeds Sports Pose (LSP) dataset [36], which contains 2000 pose images, mostly of athletes, gathered from Flickr. Some examples of the original LSP dataset are shown in Fig-3.10. Human shape recognition based on the LSP dataset can be divided into two stages: pre-processing and shape recognition. Pre-processing consists of data normalization and classifier construction; shape recognition is then performed with the resulting classifier.

Fig-3.10 some examples of original dataset

In pre-processing, the LSP images are first converted to grayscale and then normalized to a size of 64×128 pixels. Fig-3.11 shows some examples of the normalized dataset. Every example in Fig-3.11 has the same size regardless of its original size, so an image may be shrunk or enlarged during normalization.

Fig-3.11 some examples of normalized dataset
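A minimal sketch of this normalization step is given below. It assumes RGB input arrays, ITU-R BT.601 luma weights for the grayscale conversion, nearest-neighbour resampling, and a width of 64 and height of 128 (the usual orientation of a HOG detection window). The thesis does not state any of these choices, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers; the thesis does not specify the conversion or interpolation used.
BT601_WEIGHTS = np.array([0.299, 0.587, 0.114])  # luma weights for R, G, B

def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB image to an H x W grayscale float image."""
    return rgb[..., :3].astype(np.float64) @ BT601_WEIGHTS

def resize_nearest(img: np.ndarray, out_w: int = 64, out_h: int = 128) -> np.ndarray:
    """Resize a 2-D image to out_h rows x out_w columns by nearest-neighbour sampling.

    Large images are shrunk and small images are enlarged, so every sample
    ends up with the same size regardless of its original dimensions.
    """
    in_h, in_w = img.shape
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return img[rows[:, None], cols[None, :]]

def normalize_sample(rgb: np.ndarray) -> np.ndarray:
    """Grayscale conversion followed by resizing to 64 x 128 (width x height)."""
    return resize_nearest(to_gray(rgb))

if __name__ == "__main__":
    big = np.random.default_rng(0).integers(0, 256, size=(250, 180, 3))
    small = np.zeros((40, 30, 3), dtype=np.uint8)
    print(normalize_sample(big).shape, normalize_sample(small).shape)  # (128, 64) (128, 64)
```

Nearest-neighbour sampling keeps the sketch short. Bilinear interpolation would give smoother gradients for the HOG stage that follows.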

After all data are normalized, features are extracted from the normalized dataset with HOG descriptors, and each feature vector has a length of 3780. In other words, an input to the recognition system has 3780 dimensions. After the features of the dataset are extracted, the recognition approach treats them as positive training data and establishes a classifier according to the class of each training sample. This thesis uses two approaches: a support vector machine (SVM) and an artificial neural network (ANN). Because both approaches are supervised learning methods, training data of both humans and non-humans are required. Therefore, the HOG features of the LSP dataset serve as positive training data, and 1328 randomly collected non-human images serve as the negative dataset. The two approaches are introduced below, and their performances are compared in Chapter 4.
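The 3780-dimensional length can be checked with a short calculation. The sketch below assumes the common HOG configuration for a 64×128 window: 8×8-pixel cells, blocks of 2×2 cells sliding by one cell, and 9 orientation bins. These parameters are assumptions and are not taken from this section. The sketch also stacks the positive and negative features into one labelled training matrix. It uses +1 for human and -1 for non-human, which is a labelling convention assumed here. The function names are hypothetical.

```python
import numpy as np

def hog_length(win_w=64, win_h=128, cell=8, block_cells=2, stride_cells=1, bins=9):
    """Length of a HOG descriptor for one detection window (assumed parameters)."""
    cells_x, cells_y = win_w // cell, win_h // cell            # 8 x 16 cells
    blocks_x = (cells_x - block_cells) // stride_cells + 1     # 7 block positions across
    blocks_y = (cells_y - block_cells) // stride_cells + 1     # 15 block positions down
    per_block = block_cells * block_cells * bins               # 2*2*9 = 36 values per block
    return blocks_x * blocks_y * per_block

def build_training_set(pos_feats, neg_feats):
    """Stack human (positive) and non-human (negative) HOG features with +1 / -1 labels."""
    X = np.vstack([pos_feats, neg_feats])
    y = np.concatenate([np.ones(len(pos_feats)), -np.ones(len(neg_feats))])
    return X, y

print(hog_length())  # 3780

# With the thesis's counts the matrix would be (2000 + 1328) x 3780 = 3328 x 3780;
# small placeholder arrays are used here to keep the example light.
X, y = build_training_set(np.zeros((4, hog_length())), np.ones((3, hog_length())))
print(X.shape)  # (7, 3780)
```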

Approach 1: Support vector machine

The training data can be illustrated schematically as in Fig-3.12. In the figure, the rectangles represent negative training data and the circles represent positive training data of the SVM. Each symbol denotes one feature vector, which can be written as

$$\mathrm{vec}_{\mathrm{feature}} = \left[\, bin_1 \;\; bin_2 \;\; \cdots \;\; bin_{3780} \,\right]^T \qquad (3.11)$$

where $bin_i$ ($i = 1, 2, \ldots, 3780$) is the i-th value obtained from the HOG descriptor computation.

Fig-3.12 the distribution of training data of SVM

After the HOG descriptors are computed, equations (2.13) to (2.18) can be used to find the maximum margin in Fig-3.12 and the optimal hyperplane.

This optimal hyperplane serves as the SVM classifier for human shape recognition. The resulting classifier is shown in Fig-3.13.

Fig-3.13 the classifier for human shape recognition of SVM
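Equations (2.13) to (2.18) are not reproduced in this section, so the sketch below is not the thesis's solver. It swaps in a common alternative: a soft-margin linear SVM trained by batch subgradient descent on the regularized hinge loss. This method also yields a separating hyperplane w·x + b = 0. The function names, the regularization weight `lam`, the learning rate and the toy two-dimensional data are all assumptions made for illustration. In the thesis, the inputs would be 3780-dimensional HOG vectors.

```python
import numpy as np

def train_linear_svm(X, y, lam=1e-3, lr=0.1, epochs=500):
    """Minimize lam/2*||w||^2 + mean(max(0, 1 - y*(X@w + b))) by subgradient descent.

    X: (n, d) feature matrix; y: (n,) labels in {+1, -1}.
    Returns the hyperplane parameters (w, b).
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                  # samples inside the margin or misclassified
        y_act, X_act = y[active], X[active]
        grad_w = lam * w - (y_act[:, None] * X_act).sum(axis=0) / n
        grad_b = -y_act.sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """+1 (human) on the positive side of the hyperplane, -1 (non-human) otherwise."""
    return np.where(X @ w + b >= 0, 1, -1)

rng = np.random.default_rng(0)
pos = rng.normal(2.0, 0.5, size=(50, 2))    # toy "human" features
neg = rng.normal(-2.0, 0.5, size=(50, 2))   # toy "non-human" features
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(50), -np.ones(50)])
w, b = train_linear_svm(X, y)
print("training accuracy:", (predict(X, w, b) == y).mean())
```

A dedicated quadratic-programming or SMO solver would reach the exact maximum-margin hyperplane. Subgradient descent only approximates it, but it fits in a few lines.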

Approach 2: Artificial neural network

The second approach uses a neural network to classify the whole dataset, including the positive and negative training data. The concept of the ANN is similar to that of the SVM, since the ANN is also trained as a two-class (dichotomous) classifier. The weights of the neural network are adjusted through the learning process introduced in Section 2.2. After learning, a human can be recognized according to the output value of the neural network. The network is described in detail below.

The structure of the neural network is shown in Fig-3.14. It contains an input layer with 3780 neurons, a first hidden layer with 420 neurons, a second hidden layer with 30 neurons, and an output layer with a single neuron. The values of the feature vector are sent into the neural network as inputs. The 3780 neurons of the input layer are connected to the 420 neurons of the first hidden layer with weighting $W^1(p,q)$. The 420 neurons of the first hidden layer are connected to the 30 neurons of the second hidden layer with weighting $W^2(q,r)$. Here $p = 1, 2, \ldots, 3780$, $q = 1, 2, \ldots, 420$ and $r = 1, 2, \ldots, 30$. Therefore, there exist a weighting array $W^1(p,q)$ of dimension 3780×420 and a weighting array $W^2(q,r)$ of dimension 420×30. In addition, the q-th neuron of the first hidden layer has an extra bias $b^1(q)$, and the r-th neuron of the second hidden layer has an extra bias $b^2(r)$. Finally, the r-th neuron of the second hidden layer is connected to the output neuron with weighting $W^3(r)$, r = 1, 2, …, 30, and a bias $b^3$ is added to the output neuron.

Fig-3.14 the structure of neural networks
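The sizes of the weighting arrays and bias vectors follow directly from the layer sizes. The short calculation below lists each array's shape and counts the trainable parameters of the 3780-420-30-1 network. The function names are hypothetical, and $W^3(r)$ is represented here as a 30×1 array.

```python
def parameter_shapes(sizes):
    """(weight shape, bias length) for each pair of consecutive layers."""
    return [((n_in, n_out), n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def count_parameters(sizes):
    """Total number of weights plus biases in a fully connected network."""
    return sum(n_in * n_out + n_out for (n_in, n_out), _ in parameter_shapes(sizes))

layers = [3780, 420, 30, 1]
for (w_shape, b_len), name in zip(parameter_shapes(layers), ["1", "2", "3"]):
    print(f"W^{name}: {w_shape}, b^{name}: {b_len}")
print("total parameters:", count_parameters(layers))  # 1600681
```

Almost all of the roughly 1.6 million parameters sit in $W^1$ (3780 × 420 = 1,587,600 weights). This is why the first hidden layer dominates both memory use and training time.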

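Let the activation function of the first hidden layer be the log-sigmoid transfer function $f(v) = 1/(1+e^{-v})$. The output of the q-th neuron of the first hidden layer is then

$$y^1(q) = f\left(\sum_{p=1}^{3780} W^1(p,q)\, bin_p + b^1(q)\right), \quad q = 1, 2, \ldots, 420$$

Let the activation function of the second hidden layer also be the log-sigmoid transfer function. The output of the r-th neuron of the second hidden layer is then

$$y^2(r) = f\left(\sum_{q=1}^{420} W^2(q,r)\, y^1(q) + b^2(r)\right), \quad r = 1, 2, \ldots, 30$$

Finally, let the activation function of the output layer be the linear transfer function, so that the output is expressed as

$$y^3 = \sum_{r=1}^{30} W^3(r)\, y^2(r) + b^3$$

The process of the above operations is shown in Fig-3.15. The network is trained so that its output approaches 1 for a human and 0 for a non-human. The trained network therefore serves as the classifier for human shape recognition.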

Fig-3.15 the process of ANN
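Below is a minimal NumPy sketch of the three-layer computation described above, with two log-sigmoid hidden layers and a linear output neuron, together with one possible training loop. The learning procedure of Section 2.2 is not reproduced here. The training step therefore assumes plain batch gradient descent on the mean squared error, with targets 1 (human) and 0 (non-human). The initialization scale, learning rate, epoch count and all function names are also assumptions. To keep the usage example fast, it runs on a small toy network rather than the full 3780-420-30-1 model.

```python
import numpy as np

def logsig(v):
    """Log-sigmoid transfer function f(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + np.exp(-v))

def init_params(sizes=(3780, 420, 30, 1), seed=0):
    """Random weights scaled by 1/sqrt(fan-in) and zero biases (assumed initialization)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """X: (n, inputs) rows of bin_p. Returns hidden outputs y1, y2 and output y3 of shape (n,)."""
    (W1, b1), (W2, b2), (W3, b3) = params
    y1 = logsig(X @ W1 + b1)       # first hidden layer
    y2 = logsig(y1 @ W2 + b2)      # second hidden layer
    y3 = (y2 @ W3 + b3)[:, 0]      # linear output neuron
    return y1, y2, y3

def train(params, X, t, lr=0.5, epochs=2000):
    """Batch gradient descent on 0.5 * mean((y3 - t)^2); updates the arrays in params in place."""
    (W1, b1), (W2, b2), (W3, b3) = params
    n = len(X)
    for _ in range(epochs):
        y1, y2, y3 = forward(params, X)
        d3 = ((y3 - t) / n)[:, None]           # error at the output, (n, 1)
        d2 = (d3 @ W3.T) * y2 * (1.0 - y2)     # back through the second log-sigmoid layer
        d1 = (d2 @ W2.T) * y1 * (1.0 - y1)     # back through the first log-sigmoid layer
        W3 -= lr * (y2.T @ d3); b3 -= lr * d3.sum(axis=0)
        W2 -= lr * (y1.T @ d2); b2 -= lr * d2.sum(axis=0)
        W1 -= lr * (X.T @ d1);  b1 -= lr * d1.sum(axis=0)
    return params

# Toy usage: 2 inputs instead of 3780, two separable clusters with targets 1 and 0.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.5, (50, 2)), rng.normal(-2.0, 0.5, (50, 2))])
t = np.concatenate([np.ones(50), np.zeros(50)])
params = train(init_params(sizes=(2, 8, 4, 1)), X, t)
_, _, out = forward(params, X)
print("training accuracy:", ((out >= 0.5) == (t == 1)).mean())
```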

In shape recognition, the feature vector of a test image is fed into the SVM or ANN classifier, which assigns it to one of only two classes: human or non-human. In this way, the system judges whether the ROI contains a human.
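The final decision can be sketched as two small functions, one per classifier. The SVM function uses the sign of w·x + b. The ANN is trained toward 1 for a human and 0 for a non-human, and this section does not state its decision threshold, so a threshold of 0.5 is assumed here. The function names are hypothetical.

```python
import numpy as np

def classify_with_svm(feature, w, b):
    """Human if the HOG feature lies on the positive side of the hyperplane w.x + b = 0."""
    return "human" if float(np.dot(w, feature) + b) >= 0.0 else "non-human"

def classify_with_ann(output, threshold=0.5):
    """Human if the ANN output y^3 reaches the (assumed) threshold."""
    return "human" if output >= threshold else "non-human"

print(classify_with_svm(np.array([1.0, 2.0]), np.array([1.0, 1.0]), -1.0))  # human
print(classify_with_ann(0.3))  # non-human
```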

From the document 基於色彩及深度影像之即時人形偵測系統設計 (Design of a Real-Time Human Detection System Based on Color and Depth Images), pp. 46-52