Academic year: 2021


8. Conclusion and Future Work

8.1. Conclusion

This thesis proposes a framework for region-based image retrieval. Region-based image retrieval is a special type of content-based image retrieval. It approaches image understanding through a region-based representation so that retrieval can be more accurate. Generally speaking, a good region-based image retrieval system must address three major issues: (i) how to represent images in terms of segmented regions and their extracted visual features, (ii) how to compare and match images based on that representation, and (iii) how to interactively estimate the user intention from user feedback. In particular, the semantic gap between the low-level visual features of images and the high-level concepts of human intention is one of the most challenging issues in region-based image retrieval. Many researchers have addressed different aspects of region-based image retrieval. Unfortunately, the problem is still far from solved.

Our work aims to handle the semantic gap through these three issues in region-based image retrieval. A literature review in Chapter 2 presents the state-of-the-art approaches to each of them. First, in Chapter 3, we propose color-size features, which integrate both the color and the region-size information of images. Chapters 4 and 5 then present two types of region-based image representation: one using visual-word-based image features and one using semantic-based image features. The former is built from low-level features and can be regarded as middle-level information about an image. The latter is generated from the results of image annotation. Moreover, in Chapter 7 we propose an interactive approach that estimates the user intention from the positive examples in relevance feedback. It then performs image matching and ranking using a similarity measure between two images.
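The following is one possible sketch of a color-size feature of the kind summarized above. It is a minimal illustration under stated assumptions and not the definition given in Chapter 3. It assumes the following:

- Each segmented region is given as a quantized color index plus its pixel area.
- Region size is bucketed by hypothetical thresholds `size_edges` on the fraction of image area.
- Two images are compared with histogram intersection, a common similarity choice swapped in here for illustration.

```python
from typing import List, Tuple

# Hypothetical region record: (quantized color index, region area in pixels).
Region = Tuple[int, int]


def color_size_histogram(regions: List[Region], image_area: int,
                         n_colors: int, size_edges: List[float]) -> List[List[float]]:
    """Sketch of a color-size feature: a 2-D histogram indexed by
    (color bin, size bin). Each region votes with its relative area."""
    n_sizes = len(size_edges) + 1
    hist = [[0.0] * n_sizes for _ in range(n_colors)]
    for color, area in regions:
        frac = area / image_area  # relative region size in [0, 1]
        # size_edges is assumed sorted ascending; the bin index is the
        # number of thresholds the region size reaches.
        size_bin = sum(1 for edge in size_edges if frac >= edge)
        hist[color][size_bin] += frac  # weight the vote by region area
    return hist


def histogram_intersection(h1: List[List[float]], h2: List[List[float]]) -> float:
    # Sum of bin-wise minima. Equals 1.0 for identical histograms whose bins sum to 1.
    return sum(min(a, b) for r1, r2 in zip(h1, h2) for a, b in zip(r1, r2))


if __name__ == "__main__":
    h = color_size_histogram([(0, 500), (2, 3000), (0, 6500)], 10000, 3, [0.1, 0.5])
    print(h)
    print(histogram_intersection(h, h))
```

With this layout, a small red region and a large red region fall into different bins. Adding size to a plain color histogram is intended to capture exactly this distinction.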

In this thesis, we address the semantic gap in two ways. The first is to construct a scheme for region-based image representation. The visual-word-based image feature provides representative units for the user intention in the visual feature space. The semantic-based image feature uncovers semantic content through image annotation. The second is to estimate, within a query session, what the user is actually looking for. We design an interactive approach that estimates the user intention using both of these image representations.
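A standard bag-of-visual-words sketch can illustrate the visual-word-based image feature. It rests on two assumptions that this chapter does not state:

- A codebook of visual words has already been learned, for instance by k-means over training region features.
- Each segmented region is described by a single low-level feature vector.

The function names are hypothetical.

```python
from typing import List, Sequence


def nearest_codeword(x: Sequence[float], codebook: List[Sequence[float]]) -> int:
    # Index of the codeword with the smallest squared Euclidean distance to x.
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, codebook[k])))


def visual_word_histogram(region_features: List[Sequence[float]],
                          codebook: List[Sequence[float]]) -> List[float]:
    """Map every region to its nearest visual word and return the
    L1-normalized word-frequency histogram of the image."""
    hist = [0.0] * len(codebook)
    for feature in region_features:
        hist[nearest_codeword(feature, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [10.0, 10.0]]
    regions = [[1.0, 1.0], [9.0, 8.0], [0.0, 2.0], [11.0, 10.0]]
    print(visual_word_histogram(regions, codebook))
```

Quantizing regions into visual words is what makes this representation "middle-level". The histogram no longer stores raw low-level features, only counts of recurring visual patterns.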

8.2. Future Work

The framework for region-based image retrieval proposed in this thesis can be applied to practical image retrieval tasks. For example, TRECVID and CLEF are two well-known evaluation campaigns for image indexing and retrieval. TRECVID aims to promote progress in content-based retrieval from digital video via open, metrics-based evaluation [TREC], and we plan to participate in TRECVID next year. The Cross-Language Evaluation Forum (CLEF) offers a series of evaluation tracks that test different aspects of cross-language information retrieval system development [CLEF].

Moreover, although our work may reduce the semantic gap in image retrieval, it does not fully eliminate it. Several issues could be extended in the future, and we discuss them below.

The most straightforward way to bridge the semantic gap is to extract semantic features of image content, which is precisely the aim of image annotation. Hence, we can design a more accurate image annotation method to extract the semantic content of images. Moreover, our proposed annotation method is scalable. We could therefore design an interactive approach, similar to relevance feedback in image retrieval, to refine the annotation.
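The following sketch shows how annotation output could be turned into a semantic-based image feature and then refined interactively. It is a hypothetical illustration, not the thesis's annotation method. It assumes the following:

- The annotator gives each region a probability per concept.
- The image-level score of a concept is its maximum over regions.
- User feedback simply pins confirmed concepts to 1.0 and rejected ones to 0.0.

```python
from typing import Dict, List, Set


def semantic_feature(region_scores: List[Dict[str, float]],
                     concepts: List[str]) -> List[float]:
    # region_scores: one dict per segmented region mapping concept -> annotation
    # probability (hypothetical input format). The image-level score of a concept
    # is taken as its maximum over regions ("some region shows this concept").
    # Averaging over regions would be an equally plausible assumption.
    return [max((r.get(c, 0.0) for r in region_scores), default=0.0)
            for c in concepts]


def refine_with_feedback(feature: List[float], concepts: List[str],
                         confirmed: Set[str], rejected: Set[str]) -> List[float]:
    # Hypothetical interactive refinement: labels the user confirms are set to
    # 1.0 and labels the user rejects are set to 0.0. All other scores are left
    # as the annotator produced them.
    refined = list(feature)
    for i, concept in enumerate(concepts):
        if concept in confirmed:
            refined[i] = 1.0
        elif concept in rejected:
            refined[i] = 0.0
    return refined


if __name__ == "__main__":
    concepts = ["sky", "grass", "car"]
    regions = [{"sky": 0.9, "grass": 0.1}, {"grass": 0.7, "car": 0.2}]
    f = semantic_feature(regions, concepts)
    print(f)
    print(refine_with_feedback(f, concepts, confirmed={"car"}, rejected={"grass"}))
```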

In image retrieval, relevance feedback plays an important role in estimating the user intention. Our proposed approach considers both likelihood and confusion measures at the same time, but it does not consider the negative examples in user feedback. Negative examples do not share consistent semantic content, because the only thing they have in common is that they are irrelevant to the user's request in a query session. Nevertheless, negative examples can be used to filter out irrelevant images that are somewhat similar to the user's request.
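The negative-example filtering proposed above could take roughly the following form. This is a hedged sketch and does not implement the thesis's likelihood and confusion measures. It works in two steps:

- The user intention is approximated by the mean of the positive examples. Each feature dimension is weighted inversely to the spread of the positives along it, a classic reweighting heuristic.
- A candidate is discarded when it lies closer to some negative example than to that estimated query point.

All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def estimate_intention(positives: List[Vec],
                       eps: float = 1e-6) -> Tuple[List[float], List[float]]:
    """Return (query center, per-dimension weights) from positive examples."""
    n = len(positives)
    dims = len(positives[0])
    center = [sum(p[d] for p in positives) / n for d in range(dims)]
    weights = []
    for d in range(dims):
        var = sum((p[d] - center[d]) ** 2 for p in positives) / n
        # Dimensions on which the positives agree (low spread) get large weights.
        weights.append(1.0 / (math.sqrt(var) + eps))
    return center, weights


def weighted_distance(a: Vec, b: Vec, w: Sequence[float]) -> float:
    return math.sqrt(sum(wi * (x - y) ** 2 for x, y, wi in zip(a, b, w)))


def rank_with_feedback(candidates: List[Vec], positives: List[Vec],
                       negatives: List[Vec]) -> List[int]:
    """Rank candidate indices by distance to the estimated intention, after
    dropping candidates that sit closer to a negative example."""
    center, w = estimate_intention(positives)
    kept = []
    for idx, cand in enumerate(candidates):
        d_pos = weighted_distance(cand, center, w)
        d_neg = min((weighted_distance(cand, neg, w) for neg in negatives),
                    default=math.inf)
        if d_neg < d_pos:
            continue  # closer to a known-irrelevant image: filter it out
        kept.append((d_pos, idx))
    return [idx for _, idx in sorted(kept)]


if __name__ == "__main__":
    positives = [[1.0, 0.0], [1.2, 5.0]]
    negatives = [[3.0, 2.5]]
    candidates = [[1.1, 4.0], [2.9, 2.5], [1.5, 2.5]]
    print(rank_with_feedback(candidates, positives, negatives))
```

The filter leaves the positive-example model untouched. Negatives only remove images and are never averaged into a coherent "anti-query", which fits the observation that negative examples share no consistent semantics.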

Many researchers have tried to bridge the semantic gap, but we have not found any work that fully solves this problem. The main reason is semantic variability: different users perceive different concepts in the same target. Hence, some form of knowledge representation should be incorporated into the solution. If the retrieval task is limited to a specific application domain, prior knowledge should help image analysis and understanding. For example, ontologies have been actively applied to image retrieval [Jin et al. 04] [Srikanth et al. 05]. Building long-term user models that describe the user intention is also promising, but difficult.
