Recognizing Adult Image Groups Using Dynamically Optimized Parameter Settings

Hung-Ming Sun

Department of Computer Science and Information Engineering, Kainan University E-mail: sunhm@mail.knu.edu.tw

Abstract

The recognition accuracy of adult image groups depends on the performance of the adult image recognizer and on the final decision rule. Earlier methods for recognizing adult image groups focus only on the decision rule and do not tune the performance of the adult image recognizer. The proposed method considers the two factors together and determines the parameter settings that achieve the best recognition accuracy for image groups. Experimental results show that the proposed method attains higher recognition accuracy than the earlier methods.

Keywords: adult image recognition; image group classification; web site classification; neural network


1. Introduction

As Internet usage grows and a large amount of adult content (text, images, videos, etc.) is available on World Wide Web sites, how to prevent children from accessing such offensive material has become an active research topic. A number of methods for detecting adult images have been proposed in the literature. Forsyth et al. [1] used a skin filter to mark skin-like pixels based on color and texture properties. The edge points of the segmented skin areas are connected to form ribbons by means of their proposed algorithm. The ribbons are then assembled to form limbs and segments, and finally a set of geometric constraints is used to determine whether a human figure is present. The method of Yang et al. [2] divided an image into 4x4 sub-regions and used a region-growing algorithm to extract the regions of interest (ROIs). In the ROIs, a set of points is selected and refined to construct the contour of a body trunk. Thirty-four features are extracted from the processing result, and a nearest center classifier is employed to determine whether the image is an adult image.

Jones et al. [3] built a skin color model based on manually labeled skin pixels in 4,675 images and a non-skin color model based on 8,965 non-skin images. With the two models, a Bayes classifier is used to detect skin pixels by calculating the skin likelihood ratio of each pixel. After the skin detection process, seven features are extracted and input into a neural network to classify the image. Bosson et al. [4] also used the skin likelihood ratio to select skin pixels and took the skin pixels as seeds for a region-growing algorithm. Eleven features were tested to investigate their effect on identifying adult images. They also tested four different classification techniques, and the experimental results show that the multi-layer perceptron (MLP) neural network and the k-nearest neighbor classifier perform better than the generalized linear model and the support vector machine. Lee et al. [5] used a learning-based chromatic distribution-matching scheme and a coarseness feature to segment skin regions. The Adaboost method is employed to classify the image as adult or benign according to the features extracted from the skin regions. A similar approach is also used in the work of Zheng et al. [6].

Shih et al. [7] and Wang et al. [8] used similar schemes to identify adult images based on image retrieval techniques. First, an image database containing both adult and benign images is pre-organized. Then, for an unknown input image, similar images are retrieved from the image database. If the proportion of adult images among the retrieved images exceeds a certain ratio, the input image is recognized as an adult image. The system of Zheng et al. [9] used a maximum entropy model to detect skin. Nine features are extracted based on ellipse fitting of the skin regions, and an MLP neural network is employed as the classifier. The work of Kim et al. [10] utilized MPEG-7 descriptors and a neural network classifier to recognize adult images.

All of the above methods deal with single images. In practice, adult images usually appear in groups rather than alone, for example in web sites, computer file folders, or email attachments. Hence, an alternative way of recognizing adult images at a certain location is to treat them as a whole rather than considering them separately. In this paper, we present a method of recognizing adult image groups, which aims at achieving optimal recognition rates for the entire image group rather than for individual images.


a new method is proposed. Section 3 describes our approach to discovering optimal parameter settings that achieve the best recognition accuracy for image groups under different conditions. The experimental results include a comparison with the earlier methods. Finally, the conclusion and discussion are given in Section 4.

2. Recognizing Adult Image Groups based on an Adult Image Recognizer

Image group classification has been discussed in several papers related to pornographic web site detection. Hu et al. [11] used a conditional probability formula to calculate the probability that the images collected from a web site (called an image group) are adult images. Suppose that the image group contains $n$ images and that $n_1$ images are classified as adult and $n_2$ images are classified as benign after the image recognition step. This event is defined as $r = (n_1, n_2)$, and its conditional probability $\mathrm{Prob}(r \mid s)$, where $s$ means the image group is adult, can be calculated by

$$\mathrm{Prob}(r \mid s) = (1 - e_1)^{n_1}\, e_1^{n_2} \tag{1}$$

where $e_1$ is the probability that an adult image is mistaken for benign. In a similar manner, the conditional probability $\mathrm{Prob}(r \mid \bar{s})$ can be calculated by

$$\mathrm{Prob}(r \mid \bar{s}) = e_2^{n_1}\, (1 - e_2)^{n_2} \tag{2}$$

where $e_2$ is the probability that a benign image is mistaken for adult and $\bar{s}$ means the image group is benign. According to the Bayes rule

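$$\mathrm{Prob}(s \mid r) = \frac{\mathrm{Prob}(r \mid s)\, \mathrm{Prob}(s)}{\mathrm{Prob}(r)} \tag{3}$$

$$\mathrm{Prob}(\bar{s} \mid r) = \frac{\mathrm{Prob}(r \mid \bar{s})\, \mathrm{Prob}(\bar{s})}{\mathrm{Prob}(r)} \tag{4}$$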

the decision factor f can be calculated as below

$$f = \frac{\mathrm{Prob}(s \mid r)}{\mathrm{Prob}(\bar{s} \mid r)} = \frac{(1 - e_1)^{n_1}\, e_1^{n_2}\, \mathrm{Prob}(s)}{e_2^{n_1}\, (1 - e_2)^{n_2}\, \mathrm{Prob}(\bar{s})} \tag{5}$$
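The decision factor of Hu et al. can be evaluated numerically as in the minimal sketch below. It works with the logarithm of f to avoid underflow for large groups. The function name, the default prior of 0.5, and the example error rates are assumptions made for illustration; they are not values from the paper.

```python
import math

def log_decision_factor(n1, n2, e1, e2, prior_adult=0.5):
    """Return log f, where f = Prob(s | r) / Prob(not s | r).

    n1, n2: number of images classified as adult / benign.
    e1: probability that an adult image is mistaken for benign (0 < e1 < 1).
    e2: probability that a benign image is mistaken for adult (0 < e2 < 1).
    prior_adult: Prob(s); an assumed value, not taken from the paper.
    """
    # Numerator: (1 - e1)^n1 * e1^n2 * Prob(s), in log form.
    log_adult = n1 * math.log(1 - e1) + n2 * math.log(e1) + math.log(prior_adult)
    # Denominator: e2^n1 * (1 - e2)^n2 * Prob(not s), in log form.
    log_benign = n1 * math.log(e2) + n2 * math.log(1 - e2) + math.log(1 - prior_adult)
    return log_adult - log_benign

# Hypothetical error rates e1 = 0.2 and e2 = 0.1 with equal priors.
for n1, n2 in [(3, 2), (1, 4)]:
    log_f = log_decision_factor(n1, n2, e1=0.2, e2=0.1)
    print(n1, n2, "adult" if log_f > 0 else "benign")
```

A positive return value corresponds to f > 1. Prob(r) cancels in the ratio, so it never needs to be computed.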

If $f$ is larger than 1, the image group is considered to be adult; otherwise, it is considered to be benign. In this method, the recall rate and false detection rate of adult image groups depend on $e_1$ and $e_2$. If we change the parameter setting in the adult image recognizer, the values of $e_1$ and $e_2$ will also change, and the recall rate and false


To capture the relationship between the recognition accuracy and the parameter setting, the recognition rate $R_s$ of an adult image recognizer can be expressed by a set of triples

$$R_s = (p_{s,i},\, q_{s,i},\, T_i) \tag{6}$$

where $p_{s,i}$ and $q_{s,i}$ are the adult-image recall rate and the benign-image recall rate, respectively, achieved with parameter setting $T_i$, $i = 1, 2, 3$, and so on. The definition of $T_i$ depends on the implementation of the adult image recognizer. For instance, $T_i$ can be a threshold value used in the final decision rule, or it can be a threshold value used to check the output of the neural network classifier.

For an image group containing $n$ images, we apply the adult image recognizer to each image in the group and use the following criterion for classification:

if the number of images classified as adult is greater than or equal to $m$, $m \le n$, the image group is classified as an adult image group; otherwise, it is classified as a benign image group.
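The criterion can be written in a few lines. The sketch below assumes that the recognizer returns one score per image in [0, 1] and that an image counts as adult when its score is at least T; the direction of that comparison and the function name are assumptions, since the text does not fix them.

```python
def classify_group(scores, T, m):
    """Classify an image group from per-image recognizer scores.

    scores: one recognizer output per image (assumed to lie in [0, 1]).
    T: threshold applied to each score (assumed: score >= T means adult).
    m: minimum number of detected adult images, with 1 <= m <= len(scores).
    """
    detected = sum(1 for s in scores if s >= T)
    return "adult" if detected >= m else "benign"

# Hypothetical scores for a group of n = 5 images.
scores = [0.90, 0.70, 0.20, 0.65, 0.10]
print(classify_group(scores, T=0.6, m=3))
print(classify_group(scores, T=0.6, m=4))
```

With these made-up scores, three images reach the threshold, so the group is labeled adult for m = 3 and benign for m = 4.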

Although the above criterion has only one threshold value $m$, the parameter setting $T$ of the adult image recognizer is an implicit factor that influences the classification of the image group, because the number of detected adult images depends on $T$. Therefore, the recognition rate $R_g$ of adult image groups can be expressed by a set of quadruples

$$R_g = (p_{g,i},\, q_{g,i},\, T_i,\, m_i) \tag{7}$$

where $p_{g,i}$ and $q_{g,i}$ are the adult-image-group recall rate and the benign-image-group recall rate achieved by the adult image recognizer with parameter setting $T_i$, and $m_i$ is the threshold value used in the aforementioned criterion. According to the above discussion, $p_{g,i}$ and $q_{g,i}$ will change if we modify the parameter setting $T_i$ of the adult image recognizer. The $m_i$ value also influences $p_{g,i}$ and $q_{g,i}$. Therefore, the following question should be considered:

how to find the best combination of $T$ and $m$ that achieves the optimal adult-image-group recall rate $p_g$ and benign-image-group recall rate $q_g$ for a given $n$.

The question can be answered by conducting experiments. We create a class of adult image groups and a class of benign image groups, and apply the adult image recognizer to all the image groups with different values of the parameter setting $T$ and the threshold $m$. The combinations of $T$ and $m$ that produce the highest recognition rates are stored in a lookup table together with the corresponding $p_g$ and $q_g$. The details of the experiment are described in the following section.
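The experiment amounts to a grid search over T and m. The sketch below illustrates the idea on synthetic recognizer scores drawn from two beta distributions; those distributions, the group counts, and all function names are assumptions for illustration, and real scores would come from running the recognizer on the image groups.

```python
import random

def group_is_adult(scores, T, m):
    """Decision criterion: at least m images with score >= T (assumed direction)."""
    return sum(s >= T for s in scores) >= m

def sweep(adult_groups, benign_groups, thresholds, n):
    """Return one quadruple (p_g, q_g, T, m) per combination of T and m."""
    table = []
    for m in range(1, n + 1):
        for T in thresholds:
            # p_g: fraction of adult groups recognized as adult.
            p_g = sum(group_is_adult(g, T, m) for g in adult_groups) / len(adult_groups)
            # q_g: fraction of benign groups recognized as benign.
            q_g = sum(not group_is_adult(g, T, m) for g in benign_groups) / len(benign_groups)
            table.append((p_g, q_g, T, m))
    return table

# Synthetic scores standing in for recognizer outputs (not real data).
rng = random.Random(0)
n = 5
adult_groups = [[rng.betavariate(5, 2) for _ in range(n)] for _ in range(200)]
benign_groups = [[rng.betavariate(2, 5) for _ in range(n)] for _ in range(200)]
thresholds = [round(0.05 * k, 2) for k in range(1, 20)]

table = sweep(adult_groups, benign_groups, thresholds, n)
print(len(table))  # 19 thresholds x 5 values of m = 95 rows
```

For a fixed T, raising m can only lower p_g and raise q_g, which is why the choice of m trades recall against false detections.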

3. Experimental Results

In this section, we demonstrate how to obtain the best combinations of $T$ and $m$ for the case of $n = 5$. The result is also compared with the methods of Hu et al. [11] and Arentz et al. [12]. First, we prepare two classes of image groups, $C_{\mathrm{adult},5}$ and $C_{\mathrm{benign},5}$. The class $C_{\mathrm{adult},5}$ consists of 1,000 image groups and every image group contains 5 adult


adult images. The class $C_{\mathrm{benign},5}$ also consists of 1,000 image groups, each of which contains 5 benign images selected arbitrarily from a benign image database containing 2,090 benign images.

We apply an adult image recognizer to $C_{\mathrm{adult},5}$ and $C_{\mathrm{benign},5}$ to detect adult image groups using the criterion proposed in Section 2. This scanning process is executed repeatedly with different combinations of $T$ and $m$ values. Our adult image recognizer is based on an MLP neural network, and the features used include the percentage of pixels detected as skin, the percentage of pixels in the largest skin region, the number of skin regions, the number of colors in the image, and the average skin score of the skin pixels [3, 4]. Our MLP neural network consists of four layers: the input layer contains five nodes, the second and third layers both contain thirty nodes, and the output layer contains two nodes. The output values of the two output nodes are averaged to obtain the final output of the neural network. Fig. 1 shows the ROC curve of our adult image recognizer. The parameter setting $T$ corresponds to the threshold value used to examine the output of the MLP neural network. In the experiment, we set $T = 0.05$ initially and increase it by 0.05 each time until $T = 0.95$. The $m$ value starts at 1 and increases by 1 until $m = n$. For a given $m$ value, the various $T$ values result in different recognition rates after scanning $C_{\mathrm{adult},5}$ and $C_{\mathrm{benign},5}$. These recognition rates can be plotted as a ROC curve, denoted $ROC_m$; that is,

$$ROC_m = \{(p_{g,i},\, 1 - q_{g,i},\, T_i,\, m_i) \mid T_i \in \{0.05, 0.1, 0.15, \ldots, 0.95\},\ m_i = m\} \tag{8}$$

where $1 - q_{g,i}$ is used instead of $q_{g,i}$ because the ROC curve plots recall rates against false detection rates. Fig. 2 shows the $ROC_m$ curves obtained in the experiment.
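As a rough illustration of how the measured quadruples can be organized into one curve per m, the sketch below groups rows by m and converts each benign-group recall rate into a false detection rate. The helper name and the four example rows are hypothetical; only the threshold grid follows the text.

```python
from collections import defaultdict

def build_roc_curves(quadruples):
    """Group (p_g, q_g, T, m) rows by m.

    Each curve is a list of (false_detection_rate, recall, T) points,
    sorted by false detection rate, where false_detection_rate = 1 - q_g.
    """
    curves = defaultdict(list)
    for p_g, q_g, T, m in quadruples:
        curves[m].append((1.0 - q_g, p_g, T))
    return {m: sorted(points) for m, points in curves.items()}

# Threshold grid described in the text: 0.05, 0.10, ..., 0.95.
thresholds = [round(0.05 * k, 2) for k in range(1, 20)]

# Hypothetical measurements, invented for illustration only.
rows = [
    (0.99, 0.80, 0.30, 1),
    (0.95, 0.90, 0.60, 1),
    (0.90, 0.97, 0.30, 3),
    (0.80, 0.99, 0.60, 3),
]
curves = build_roc_curves(rows)
for m, points in curves.items():
    print(m, points)
```

Keeping T alongside each point means the parameter setting that produced a given operating point can be recovered later.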


Fig. 2. $ROC_m$ curves of adult-image-group recognition ($n = 5$).


Table 1. Parameter settings $T$ and $m$ and their corresponding recognition rates $p_g$ and $q_g$ in the $ROC_{opt}$ in Fig. 3.

The $ROC_m$ curves in Fig. 2 show that different settings of $T$ and $m$ influence the adult-image-group recognition rates considerably. Based on these $ROC_m$ curves, we can identify which combinations of $T$ and $m$ produce the best recognition rates. For instance, at the same false detection rate, indicated by the dashed line in Fig. 2, we should select the $T$ and $m$ that produce the highest recall rate, i.e., select point A rather than point B in Fig. 2. Based on this criterion, we can construct an optimal ROC curve, denoted $ROC_{opt}$, from $ROC_m$, $m = 1, 2, \ldots, 5$, as follows:

$$ROC_{opt} = \max_{0 \le q_g \le 1} \{ROC_m \mid m = 1, 2, \ldots, 5\}. \tag{9}$$
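One simple way to approximate this selection on discrete measurements is to keep only the points that no other point beats in both false detection rate and recall. The sketch below does that; note that it replaces the interpolation step used in the paper with a plain dominance filter, and the function name and data points are hypothetical.

```python
def roc_opt(points):
    """Return the upper envelope of a set of ROC points.

    points: iterable of (false_detection_rate, recall, T, m).
    A point is kept only if its recall is higher than that of every
    point with an equal or lower false detection rate.
    """
    envelope = []
    best_recall = -1.0
    # Visit points by increasing false detection rate; at equal rates,
    # visit the highest recall first.
    for point in sorted(points, key=lambda p: (p[0], -p[1])):
        if point[1] > best_recall:
            envelope.append(point)
            best_recall = point[1]
    return envelope

# Hypothetical points pooled from several ROC_m curves.
points = [
    (0.02, 0.90, 0.65, 4),
    (0.02, 0.85, 0.70, 3),
    (0.05, 0.88, 0.50, 5),
    (0.05, 0.97, 0.55, 3),
    (0.10, 0.96, 0.40, 4),
]
for point in roc_opt(points):
    print(point)
```

At a false detection rate of 0.02, the point with recall 0.90 is kept and the one with recall 0.85 is dropped, which mirrors choosing point A over point B.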

Fig. 3 shows the $ROC_{opt}$ derived from the $ROC_m$ in Fig. 2. The original recognition rate data in $ROC_m$ are not continuous, so an interpolation algorithm is used to produce more densely spaced points for the construction of $ROC_{opt}$. Our system automatically collects the $T$ and $m$ in $ROC_{opt}$ and records them in a table together with the associated recognition rates $p_g$ and $q_g$, in the form of (7). Table 1 shows the output table after processing the $ROC_m$ in Fig. 2 (not all entries are displayed due to space limitations). The entry (0.93, 0.98, 0.60, 4) means that 93% of adult image groups and 98% of benign image groups can be correctly recognized by using parameter setting $T = 0.60$ and $m = 4$. Given an unknown image group containing 5 images, we can first specify the desired adult-image-group recall rate and then look up the table to find suitable $T$ and $m$ values for setting the adult image recognizer and the decision rule. The unknown image group is then scanned and classified accordingly. The above discussion is based on $n = 5$. Other values of $n$ can be handled in a similar manner. Fig. 4 shows the $ROC_m$ for $n = 3$ and the derived $ROC_{opt}$. Fig. 5 shows the $ROC_m$ for $n = 7$ and the derived $ROC_{opt}$.
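The table lookup can be sketched as follows. The selection rule used here (among rows meeting the desired adult-image-group recall rate, take the one with the highest benign-image-group recall rate) is an assumption about how a suitable row would be chosen. Only the row (0.93, 0.98, 0.60, 4) comes from the text; the other two rows are invented for illustration.

```python
def lookup(table, min_recall):
    """Pick (p_g, q_g, T, m) with p_g >= min_recall and the highest q_g.

    Returns None when no row reaches the requested recall rate.
    """
    candidates = [row for row in table if row[0] >= min_recall]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row[1])

table = [
    (0.85, 0.99, 0.70, 4),  # hypothetical row
    (0.93, 0.98, 0.60, 4),  # entry quoted in the text
    (0.97, 0.94, 0.50, 3),  # hypothetical row
]

row = lookup(table, min_recall=0.90)
if row is not None:
    p_g, q_g, T, m = row
    print(f"use T = {T}, m = {m} (p_g = {p_g}, q_g = {q_g})")
```

The returned T configures the recognizer and the returned m configures the group decision rule.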

To compare our method with Hu’s and Arentz’s methods, another testing database is organized, which consists of 1,000 adult image groups (each containing 5 adult images) and 1,000 benign image groups (each containing 5 benign images). It should be noted that Hu’s, Arentz’s, and our methods can be applied with any adult image recognizer. Our method needs to reconstruct the lookup table for each different image recognizer, and Hu’s method needs prior knowledge of the recognition rates of the image recognizer. Although our adult image recognizer is used for testing the three methods, this does not give our method any advantage over the other two methods.

Fig. 4. (a) $ROC_m$ of adult-image-group recognition for $n = 3$; (b) the derived $ROC_{opt}$.

Fig. 5. (a) $ROC_m$ of adult-image-group recognition for $n = 7$; (b) the derived $ROC_{opt}$.


Fig. 6. ROC of adult-image-group recognition for Hu’s, Arentz’s, and our methods under n = 5.

shown in Fig. 6. The recognition rates of Hu’s method scatter irregularly below the ROC of our method. The locus of the recognition rates of Arentz’s method is more curve-like but still lies below the ROC of our method. The test result shows that our method attains higher recognition accuracy than the other two methods. Tests for other $n$ values have also been made, and our method still performs better than Hu’s and Arentz’s methods. Fig. 7 illustrates the cases of $n = 3$ and $n = 7$. Comparing Figs. 6 and 7, we find that recognition accuracy improves as $n$ increases. This is reasonable because every image provides a piece of information that can help the classification; as the number of images increases, the amount of information also increases, which helps to make a correct classification. In addition, our method produces smoother ROC curves than Hu’s and Arentz’s methods, which makes adjusting the system performance easier.

4. Conclusion and Discussion

Fig. 7. ROC of adult-image-group recognition for Hu’s, Arentz’s, and our methods for $n = 3$ and $n = 7$.

lookup table, we can specify the desired recall rate or false detection rate and the system will perform accordingly. Earlier methods do not allow such dynamic adaptation. The experimental results also demonstrate that this method attains higher recognition accuracy than earlier methods. One disadvantage of the proposed method is that producing the lookup table may take a large amount of time, because we have to run the adult image recognizer over the testing database many times with various $T$ and $m$. In our experiment, generating the $ROC_m$ curves in Fig. 2 takes about three hours on a P4/3.0G computer. If we want to construct a lookup table covering $n = 2, 3, \ldots, 10$, it will take over fifty hours (the larger the $n$ value, the more time it takes). To overcome this problem, we are developing a probability procedure to calculate the lookup table from the recognition rates of the adult image recognizer. With this procedure, the lookup table could be produced in a few minutes.
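The probability procedure itself is not described in the text. Purely as an illustration of how group-level rates might be computed from image-level rates, the sketch below assumes that the images in a group are classified independently, so that the number of detected adult images follows a binomial distribution. This independence assumption and the function name are not from the paper, and the author's actual procedure may differ.

```python
from math import comb

def group_rates(p_s, q_s, n, m):
    """Estimate (p_g, q_g) from image-level recall rates under independence.

    p_s: adult-image recall rate of the recognizer.
    q_s: benign-image recall rate of the recognizer.
    n: number of images in a group; m: decision threshold (1 <= m <= n).
    """
    # p_g: probability that at least m of n adult images are detected.
    p_g = sum(comb(n, k) * p_s**k * (1 - p_s)**(n - k) for k in range(m, n + 1))
    # q_g: probability that fewer than m of n benign images are
    # mistaken for adult (each with probability 1 - q_s).
    fp = 1 - q_s
    q_g = sum(comb(n, k) * fp**k * (1 - fp)**(n - k) for k in range(m))
    return p_g, q_g

# Hypothetical image-level rates for a single parameter setting T.
for m in range(1, 6):
    p_g, q_g = group_rates(p_s=0.8, q_s=0.9, n=5, m=m)
    print(m, round(p_g, 4), round(q_g, 4))
```

Under this assumption each table entry costs only a handful of arithmetic operations, so no rescanning of the image database would be needed.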

References

[1] D. A. Forsyth and M. M. Fleck, “Automatic detection of human nudes,” Int. J. of Computer Vision, vol. 32, no. 1, pp. 63-77, 1999.

[2] J. Yang, Z. Fu, T. Tan, and W. Hu, “A novel approach to detecting adult images,” Proc. 17th Int. Conf. Pattern Recognition, pp. 479-482, 2004.

[3] M. J. Jones and J. M. Rehg, “Statistical color models with application to skin detection,” Int. J. of Computer Vision, vol. 46, no. 1, pp. 81-96, 2002.

[4] A. Bosson, G. C. Cawley, Y. Chan, and R. Harvey, “Non-retrieval: blocking pornographic images,” Proc. Int. Conf. Challenge of Image and Video Retrieval, pp. 50-60, 2002.

[5] J. S. Lee, Y. M. Kuo, P. C. Chung, and E. L. Chen, “Naked image detection based on adaptive and extensible skin color model,” Pattern Recognition, vol. 40, pp. 2261-2270, 2007.

[6] Q.-F. Zheng, W. Zeng, G. Wen, and W.-Q. Wang, “Shape-based adult image detection,” Proc. Int. Conf. Image and Graphics, pp. 150-153, 2004.

[7] J.-L. Shih, C.-H. Lee, and C.-S. Yang, “An adult image identification system employing image retrieval technique,” Pattern Recognition Letters, vol. 28, pp. 2367-2374, 2007.

[8] J. Z. Wang, G. Wiederhold, and O. Firschein, “System for screening objectionable images using Daubechies’ wavelets and color histograms,” Proc. Int. Workshop Interactive Distributed Multimedia Systems and Telecommunication Services, pp. 20-30, 1997.

[9] H. Zheng, M. Daoudi, and B. Jedynak, “Blocking adult images based on statistical skin detection,” Electronic Letters on Computer Vision and Image Analysis, vol. 4, no. 2, pp. 1-14, 2004.

[10] W. Kim, H.-K. Lee, J. Park, and L. Yoon, “Multi class adult image classification using neural networks,” Conf. Computational Studies of Intelligence, pp. 222-226, 2005.


[12] W. A. Arentz and B. Olstad, “Classifying offensive sites based on image content,”