Face Recognition based on OpenCV and GPU Hung-rei Chen


Face Recognition based on OpenCV and GPU

Hung-rei Chen^∗ Computer Science National Taiwan University

PASLab

Edison Chen^† Computer Science National Taiwan University

ECLab

Abstract

In this paper, we implement a high-performance real-time face recognition system using GPUs (Graphics Processing Units).

We adopt the algorithm of [Feifei Lee and Ohmi 2009], which uses APIDQ (Adjacent Pixel Intensity Difference Quantization) to generate histograms, and we take advantage of the OpenCV (Open Source Computer Vision) library to process face images and build the databases. The databases are created from the AT&T database and from our own faces; to make them easy to access, we also define some methods to manage them. We discuss issues in providing (admittedly imperfect) parallel data processing with CUDA programming. Experimental results show that our system achieves a speedup of about 4 times and recognizes many more faces in the same execution time.

CR Categories: I.5.4 [Computer Vision]: Face recognition

Keywords: GPU, CUDA, OpenCV, real-time face recognition

1 Introduction

Face recognition has been an active research area in recent years, especially since the September 11 terrorist attacks on the United States. Within this field, real-time face recognition places much stricter requirements on recognition time, especially as the number of entries in the database and the camera resolution increase.

Nowadays, GPUs are being developed to provide parallel computing for faster computation. In our work, we combine the APIDQ algorithm described in [Feifei Lee and Ohmi 2009] with the GPU to build a real-time face recognition system. We construct this system mainly on CUDA (Compute Unified Device Architecture) and OpenCV, and we obtain a speedup compared to the CPU version.

In our experiments, we ran face recognition on the CPU with a database of 95000 faces. With one face in the camera frame and an image size of 640 x 480, the frame rate drops to 1 to 2 FPS and the average recognition time is 520 ms. The video is no longer smooth: many frames are lost and it is very hard to see clearly.
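As a quick check of the figures above, the snippet below converts the reported 520 ms average recognition time into an upper bound on the frame rate. It assumes, for illustration only, that recognition of the single face is the only per-frame cost.

```python
# Reported average CPU recognition time for one face (from the text above).
recognition_ms = 520.0

# Upper bound on frames per second if recognition were the only per-frame cost.
max_fps = 1000.0 / recognition_ms
print(f"{max_fps:.2f} FPS")  # prints "1.92 FPS"
```

This bound of about 1.9 FPS is consistent with the observed drop to 1 to 2 FPS.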

∗e-mail:ismail2071@gmail.com

†e-mail:bushwei@hotmail.com

First, we use OpenCV for face detection; OpenCV can tag the faces in the video. The detected faces are then processed with the APIDQ algorithm, which uses the histogram counted over the whole face as the feature vector to identify a person; the location information of the face is not used. The histogram data is then either registered in the database or matched against it.

OpenCV (Open Source Computer Vision) is a library of programming functions for real-time computer vision. The latest released version is 2.2.9, which adds CUDA support. More and more OpenCV functionality is being supported by CUDA, and the performance of the CUDA version is optimized by using shared memory on the GPU.

This paper uses CUDA for face detection and database matching, and we discuss how to use CUDA's computing power and how to allocate memory. We developed both serial CPU code and parallel GPU code to compare the execution time in each case and to measure the speedup achieved by the GPU over the CPU. In our experiments, the highest speedup achieved was 6x: face detection was about 6x faster, and recognition was about 3.5x faster per face.

The rest of the paper is organized as follows. Section 2 describes how CUDA is used in our system and the architecture for using the GPU; Section 3 describes our test machine; Section 4 presents the experimental results; and Section 5 concludes the paper.

2 Method

We construct our face recognition system mainly on CUDA and OpenCV. For the face recognition algorithm, we implement APIDQ [Feifei Lee and Ohmi 2009], a statistics-based approach in which the human face is treated as a 2D pattern of intensity variation. Figure 1 shows the flow chart of our system. As we can see, the whole process comprises "Face detection", "APIDQ" and "Database matching". We focus on "APIDQ" and "Database matching", since face detection is done by OpenCV and already runs 6x faster on our test machine.

2.1 APIDQ

APIDQ is a statistics-based face recognition approach. In APIDQ, the intensity differences of horizontally adjacent pixels (dIx) and of vertically adjacent pixels (dIy) are first calculated by simple subtraction, as shown in formulas (1) and (2).

dIx(i, j) = I(i + 1, j) − I(i, j) (1)

dIy(i, j) = I(i, j + 1) − I(i, j) (2)

The (dIx, dIy) coordinates are then converted to polar coordinates and quantized according to r and θ. Here we choose a simple approach: we set 8 levels in r and 12 levels in θ, for a total of 76 bins. Note that we use 1, 2, 4, ..., 256 to quantize the r coordinate. After the histogram is generated, we check what the user wants, i.e., learning or matching.

Figure 1: system flow chart
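The steps above can be sketched as follows. This is a minimal CPU-side illustration, not the authors' implementation. The function name `apidq_histogram`, the row-major `image[j][i]` layout, and the bin layout are assumptions: r level k covers 2^k <= r < 2^(k+1), with r < 1 folded into level 0 and everything from 128 upward folded into the last level, and θ is split into 12 equal sectors. Such a uniform 8 x 12 grid yields 96 bins rather than the 76 reported in the text, so the exact bin layout used in the paper evidently differs and is not reproduced here.

```python
import math

def apidq_histogram(image, r_levels=8, theta_levels=12):
    """Count quantized (r, theta) pairs over a 2D grid of pixel intensities."""
    height, width = len(image), len(image[0])
    hist = [0] * (r_levels * theta_levels)
    for j in range(height - 1):        # j: vertical index (row)
        for i in range(width - 1):     # i: horizontal index (column)
            dix = image[j][i + 1] - image[j][i]   # formula (1)
            diy = image[j + 1][i] - image[j][i]   # formula (2)
            r = math.hypot(dix, diy)
            theta = math.atan2(diy, dix) % (2 * math.pi)
            # Assumed r quantization with power-of-two boundaries 2, 4, 8, ...
            r_bin = 0
            while r_bin < r_levels - 1 and r >= 2 ** (r_bin + 1):
                r_bin += 1
            # Assumed theta quantization: equal-width angular sectors.
            t_bin = min(int(theta / (2 * math.pi) * theta_levels),
                        theta_levels - 1)
            hist[r_bin * theta_levels + t_bin] += 1
    return hist

# A 2 x 2 image yields exactly one (dIx, dIy) pair.
example = apidq_histogram([[0, 4], [0, 0]])
```

In the example, dIx = 4 and dIy = 0, so the single count lands in r level 2 and θ sector 0.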

We tested our APIDQ implementation with the AT&T database [], as in the paper we referenced [Feifei Lee and Ohmi 2009]. It contains 40 persons with 10 pictures each. We chose 240 pictures for learning and 160 for testing. Since we focus on real-time operation and on the number of faces that can be recognized, here we simply test the recognition result. The recognition accuracy is good on the AT&T database. However, the AT&T database contains normalized pictures with variations in lighting, pose, and expression. Such good conditions rarely occur in real-time face recognition, so real-time results may not be as good as those on AT&T. The next section describes how we handle histogram matching.

2.2 Database matching

After generating the histogram, we either match it against the database or add it to the database. Matching against every face in the database is unavoidable, since we implement only a simple histogram database, and we think it is hard to apply something like sorting to large histogram data with multiple features. As the number of faces in the database grows, the time needed for matching grows as well. We may then miss the time constraint, or have to limit the number of faces that can be recognized. This problem is crucial in our real-time system.

We introduce the GPU here. We copy a large amount of database data and compare it with the histogram we just generated. If we compared the pictures one at a time, we would lose the parallel computing ability of the GPU, since each thread would handle just one feature of a histogram. Without shared memory, the GPU's performance advantage may not be obvious. Instead, we let each thread handle a block of features across faces. To do this, we first duplicate the input histogram as many times as there are faces in the database. We first tried the simplest approach, "iterative duplication", but it involves too much data movement, which lowers the overall matching performance. Instead, we copy the data exponentially, as in Figure 2. This significantly decreases the total number of data copy commands, so the overall performance improves by approximately 6x compared to the iterative method.
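The difference between the two duplication strategies can be illustrated with a host-only sketch. The function names are hypothetical, and each slice assignment merely stands in for one copy command; on a GPU these would be device-side copies, which this sketch does not model.

```python
def duplicate_iterative(hist, copies):
    """Fill the buffer with one copy command per duplicate."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    n = len(hist)
    buf = [0] * (n * copies)
    commands = 0
    for k in range(copies):
        buf[k * n:(k + 1) * n] = hist
        commands += 1
    return buf, commands

def duplicate_doubling(hist, copies):
    """Fill the buffer by repeatedly copying the already-filled prefix."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    n = len(hist)
    buf = [0] * (n * copies)
    buf[:n] = hist                 # initial copy of the input histogram
    commands, filled = 1, 1
    while filled < copies:
        chunk = min(filled, copies - filled)   # double, or fill the remainder
        buf[filled * n:(filled + chunk) * n] = buf[:chunk * n]
        commands += 1
        filled += chunk
    return buf, commands

_, iterative_commands = duplicate_iterative([1, 2, 3], 100000)
_, doubling_commands = duplicate_doubling([1, 2, 3], 100000)
print(iterative_commands, doubling_commands)  # prints "100000 18"
```

For 100000 duplicates, the doubling scheme needs 18 copy commands instead of 100000, although each command moves progressively more data. How much of the reported speedup comes from the reduced command count depends on the hardware and is not captured by this sketch.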

Figure 2: binary-tree-like data copying

We subtract the duplicated input histogram from every database face concurrently. Using shared memory, each thread loads part of the data into shared memory and subtracts the corresponding features. Our database matching uses the "Manhattan distance" to measure the similarity between faces. Finally, we find the minimum value and its location with OpenCV, which relies heavily on reduction techniques.

In our tests, the optimized version is 6x faster than the iterative version and 3 to 4x faster than the CPU version.
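For reference, the matching step can be written sequentially as below. This is a sketch of the Manhattan-distance search only, with hypothetical function names; it does not reproduce the shared-memory kernel or the OpenCV reduction described above, which compute the same result in parallel.

```python
def manhattan_distance(a, b):
    """Sum of absolute differences between two equal-length histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))

def best_match(query, database):
    """Return (index, distance) of the database histogram closest to query."""
    best_index, best_distance = -1, None
    for index, entry in enumerate(database):
        d = manhattan_distance(query, entry)
        if best_distance is None or d < best_distance:
            best_index, best_distance = index, d
    return best_index, best_distance

# Toy 3-bin histograms; real histograms have one entry per APIDQ bin.
database = [[0, 0, 4], [1, 2, 3], [3, 3, 3]]
print(best_match([1, 2, 2], database))  # prints "(1, 1)"
```

The distances to the three entries are 5, 1 and 4, so the second entry (index 1) is the best match.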

3 Experiment setup

We implemented the system on both the CPU and the GPU, and we emphasize that the two versions use the same algorithm. The main difference is that the GPU version involves data communication between the CPU and the GPU. Hence we optimized the data communication to minimize the copying overhead. Table 1 summarizes the machine we tested on.

CPU       Intel(R) Celeron(R) E3200 @ 2.40GHz
GPU       GeForce 9600 GT
Memory    2GB
Database  10K, 50K and 100K faces

Table 1: Experiment machine

4 Result

In this section, we briefly describe the performance comparison between the CPU and the GPU. We present the database matching performance and the overall system performance. Table 2 gives the average time needed for database matching with 10000, 50000 and 100000 faces in the database. As we can see, our GPU version outperforms the CPU version by about 2 to 4x. When the number of faces is small, the performance gap between the CPU and GPU versions narrows, and we note that when the number of faces is very small, the CPU version may outperform the GPU version.

The number of faces in the database does affect overall system performance, including the number of faces that can be recognized and the frames per second. When the database contains 100000 faces, the CPU version needs 83.3 ms to recognize a face, while the GPU version can recognize about 3 faces in the same time.
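The per-size speedups implied by Table 2 can be computed directly from the reported timings. The dictionary names are illustrative; the values are copied from Table 2.

```python
# Average database matching time per face, in milliseconds (Table 2).
cpu_ms = {10000: 10.097, 50000: 39.16, 100000: 83.3}
gpu_ms = {10000: 5.7911, 50000: 16.797, 100000: 26.513}

# Speedup of the GPU version over the CPU version for each database size.
speedup = {n: cpu_ms[n] / gpu_ms[n] for n in cpu_ms}
for n, s in speedup.items():
    print(f"{n:>6} faces: {s:.2f}x")
# prints:
#  10000 faces: 1.74x
#  50000 faces: 2.33x
# 100000 faces: 3.14x
```

The ratio at 100000 faces, about 3.14, is what allows the GPU version to recognize about 3 faces in the 83.3 ms the CPU version needs for one.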

Figure 3 shows the overall performance in frames per second (FPS). The horizontal axis denotes the number of faces in the database. As we can see, with 10000 faces the FPS of the GPU version is 6x that of the CPU version. We have already shown the time needed to recognize a face. Database matching alone does not widen the gap between the GPU and the CPU, yet the FPS improvement is highest at 10000 faces. This is because OpenCV optimizes face detection well: face detection on the GPU is 6x faster than on the CPU, so as the number of faces in the database increases, the overall speedup moves closer to 2 to 4x.

       10000    50000    100000
CPU    10.097   39.16    83.3
GPU    5.7911   16.797   26.513

Table 2: database matching performance for one face in CPU and GPU (in milliseconds)

Figure 3: speedup comparison for our CPU and GPU version

5 Conclusion

Face recognition has been a hot topic in recent years. Real-time face recognition sets a hard constraint on both the number of faces that can be recognized at a time and the frames per second. In our work, we constructed a face recognition system based on the algorithm described in [Feifei Lee and Ohmi 2009], with the aid of CUDA and OpenCV. It performs well compared to the version implemented with the CPU and OpenCV. We also applied some optimizations to the GPU version, since we found that memory has a large impact on performance.

Finally, we conclude by noting that memory bandwidth is a critical issue that directly affects overall performance. The database matching phase issues a large number of memory requests, so it is important to optimize this part further.

Acknowledgements

To Prof. Wei-chao Chen.

References

FEIFEI LEE, KOJI KOTANI, Q. C., AND OHMI, T. 2009. Face recognition using adjacent pixel intensity difference quantization histogram. vol. 9, 147–153.