An information fusion approach to integrate image annotation and text mining methods for geographic knowledge discovery

Chung-Hong Lee ⇑, Shih-Hao Wang

Department of Electrical Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan

Article info

Keywords: Geographic knowledge discovery; Image annotation; Text mining; Information fusion; Machine learning

Abstract

Due to the steady increase in the number of heterogeneous types of location information on the internet, it is hard to organize a complete overview of the geospatial information needed for knowledge acquisition tasks related to specific geographic locations. Text- and photo-type geographical datasets contain numerous location data, such as location-based tourism information, and therefore define high-dimensional spaces of highly correlated attributes. In this work, we use text- and photo-type location information with a novel information fusion approach that exploits effective image annotation and location-based text mining to enhance the identification of geographic locations and spatial cognition. We describe our feature extraction methods for annotating images, and our use of text mining to analyze images and texts simultaneously, in order to carry out geospatial text mining and image classification tasks. Photo-images and textual documents are then projected into a unified feature space to generate a co-constructed semantic space for information fusion. We also employ text mining approaches to classify documents into categories based on their geospatial features, with the aim of discovering relationships between documents and geographical zones. The experimental results show that the proposed method can effectively enhance location-based knowledge discovery.

1. Introduction

1.1. Motivation

Web content that carries location information is becoming an important resource for location search and for discovering more about the history and stories of a place. Mining people-centric location information promises new personalized information services, including local news summarized from the messages of nearby microblog (e.g. Twitter) users, the targeting of regional advertisements, the spreading of business information to local customers, and novel location-based applications. Also, popular web services such as Flickr for image collection, YouTube for video contribution, and numerous multimedia blogs offer rich photo- and text-based geospatial information that lets users quickly find geographic location information. According to recent studies, many user queries on the internet express a geographic information need, and an estimated up to one fifth of all web pages contain location references (Himmelstein, 2005; Kamvar & Baluja, 2006; McCurley, 2001; Sanderson & Kohler, 2004). One example is the collection of geo-referenced ("geotagged") images on Flickr: images (photos) whose precise location was automatically captured by a camera or a location-aware device. Such multimedia content with location information can enhance and enrich the study of specific geographical activities for business operations and field surveys. In reality, however, a great deal of multimedia location information (e.g. landmark photos) still lacks a clear geo-reference or spatial mapping. As a result, in this work we developed a method to identify collected images (photos) and texts related to specific geographic locations for geospatial mapping and knowledge discovery. An example illustrating the relationship of geographic documents (including images and texts) associated with specific locations is shown in Fig. 1.

1.2. Research objectives

It is believed that rich information about locations and landmarks can be learned automatically, and organized into a location-based knowledge resource, by exploiting the geographic features of multimedia content collected from the web. The application scenario for the attempted solution in this work is a location-based

⇑ Corresponding author.

E-mail addresses: [email protected] (C.-H. Lee), [email protected]. edu.tw (S.-H. Wang).

Expert Systems with Applications 39 (2012) 8954–8967

Journal homepage: www.elsevier.com/locate/eswa

information system for mobile or pedestrian users. We aim to identify location references in texts and images at the fine granularity of individual buildings or tourist attractions, a level that is directly applicable to mobile users and to retrieval and analysis tasks at this geographical granularity. Most of the available location information is represented as texts, photos (images), and videos. In this work, we concentrate only on the fusion of texts and photos (images) for geospatial mapping and knowledge discovery tasks. However, extracting geographic information from images and texts is challenging. The major difficulty is that geographic objects, unrelated characters, and cluttered backgrounds may occur in the same image. In addition, the illumination and clarity of an image, and the angles of the geographic objects in it, may also affect the performance of geographic information extraction. Also, the semantics of geospatial documents (including texts and images) used for extracting geographic features are often vague and difficult to identify.

To solve the aforementioned issues, in this work we have developed a combined method to deal with the difficulties of fusing different multimedia types (i.e. texts and images) of geospatial information. Our proposed framework consists mainly of three modules: image annotation, text processing, and information fusion. The image annotation module extracts representative geographic features from the collected photo images. Bag-of-keypoints methods are used to construct a visual vocabulary and generate feature vectors to support image classification and annotation. Following standard information retrieval (IR) practice for text preprocessing and Vector Space Model (VSM) operations, the text processing module extracts geographic features from the collected geospatial documents. The information fusion module integrates the extracted features of geospatial images and texts. Our experimental results have demonstrated the effectiveness of the proposed approach.

The remainder of the paper is organized as follows: Section 2 discusses related research work and techniques. Section 3 presents our approach and system framework. Section 4 describes the proposed image annotation method. Section 5 presents the proposed fusion approach for dealing with geographic texts and photo-images. Section 6 shows our experimental results, and finally we conclude this paper in Section 7.

2. Related work

Combining the geospatial features from image attributes and texts associated with specific locations reveals interesting properties of location knowledge and serves as a powerful way of discovering geospatial knowledge. However, to date little research has been reported on hybrid methods that mine the combination of photos and texts for geospatial knowledge discovery. Several application fields are related to our system development, including image feature detection/extraction, image annotation/classification, geospatial/geographic data mining, and geographic information retrieval.

In our research, we used an image feature detection approach called SIFT to extract scale-invariant features. In our previous project, we implemented an annotation method using local scale-invariant features and bag-of-keypoints techniques to perform geographic image annotation (Lee, Yang, & Wang, 2010a, 2011). The images were processed to produce keypoints mainly by the Scale-invariant Feature Transform (SIFT) technique. SIFT (Lowe, 1999) was developed to generate image features for constructing feature vectors that are invariant to image translation, scaling, rotation, and illumination change. With this approach, robust object recognition can be achieved in cluttered, partially occluded images. The extracted features are used to solve the problem of recognizing images under various viewpoints, contrasts, and luminance.

For image feature detection, in this work we used the bag-of-keypoints concept (Csurka, Dance, Fan, Willamowski, & Bray, 2004) to re-construct image features through a simple flow whose steps are similar to the preprocessing steps of text mining. General classification approaches can use these re-constructed feature vectors to perform image annotation. This concept makes image annotation easier to perform and improves annotation precision. After image annotation, images of scenic spots are expressed by the corresponding geographic nouns as the image source, so that multimedia data can be discovered according to their implicit geographic information. In order to extract geographic information from various data types, we generated a co-constructed semantic space to project images and documents into the same feature space simultaneously, and then used text mining to find the relatedness between multimedia data and locations.

Fig. 1. An example of a geospatial mapping of landmark introductory documents (including images and texts).

In the area of geospatial/geographic data mining, most research has focused on discovering geographic knowledge. Uryupina (2003) presented an approach to learning geographical gazetteers automatically from the internet. Using bootstrapping techniques, new gazetteers are learned starting from a small set of preclassified instances. This approach is helpful for the Named Entity Recognition task. Ding, Stepinski, Parmar, Jiang, and Eick (2009) proposed a supervised clustering approach for discovering feature-based hot spots. The method relies on supervised clustering to produce a list of hot-spot regions. Using a fitness function, the dataset is subdivided optimally and the resulting clusters are ranked by their interestingness; the top-ranked clusters are the ones most strongly related to hot spots. Goldberg, Wilson, and Knoblock (2009) described a methodology for automatically generating highly complete and detailed regional gazetteers from internet sources, addressing the problems of gazetteer incompleteness and of measuring gazetteer accuracy. Using information extraction and integration techniques, geographic features and their associated footprints are obtained from widely available online data, and such data can then be used to create a gazetteer for nearly any area. McCurley (2001) investigated several approaches to discovering geographic context for web pages, and described a navigational tool for browsing the web by geographic proximity. Ourioupina (2002) described an algorithm for knowledge extraction in the geographic domain. Text mining is applied to the internet in order to classify places into different location types and to determine the type of a given place name. In experiments, the approach was able to create gazetteers for Named Entity Recognition tools automatically. Zong, Wu, Sun, Lim, and Goh (2005) gave spatial semantics to web pages by assigning place names. The assignment task is divided into three parts: place name extraction, place name disambiguation, and place name assignment. The approach works well for geo/geo ambiguities. Doerr and Papagelis (2007) presented a statistical model to address missing gazetteer information, multiple matches, and false positive matches when integrating place names with actual locations. The model is based on a statistical analysis of the place-name mapping process and uses no other background data. The approach has been applied to a real-world case study.

In terms of image-text information fusion, most previous research has relied on unsupervised approaches (Jiang & Tao, 2006, 2009; Nguyen, Woon, & Tan, 2008; Xing, Yan, & Hauptmann, 2005). Such methods generally fuse images and documents from the same data source. In contrast, most of our core approaches in this work are supervised learning approaches for dealing with the collected geospatial datasets.

In summary, research in the domain of geospatial knowledge discovery has mostly focused on analyzing data collected from GPS devices, satellites, and geographic information systems. Unfortunately, little attention has been paid to developing methods for extracting implicit geographic information from existing multimedia data sets to enhance geospatial knowledge discovery. In this work, we therefore propose an approach to discover relatedness between multimedia data sets (i.e. photo-images and texts) and geographic locations for analyzing implicit geospatial relationships among various types of multimedia data sets. Our solution combines techniques of image annotation, text processing, information fusion, and text mining. A key issue in using text mining approaches to learn the associations between multimedia information sources and geospatial locations is the lack of large training data sets. Learning from small training data sets poses the new challenge of handling implicit relationships. In this work, we overcome the above difficulties, and we found that our method is fairly reliable and able to consistently offer satisfactory performance in enhancing geospatial knowledge discovery.

3. Approach and system framework

As mentioned above, there are three main components in our system: image annotation, information fusion, and location estimation. The system framework is illustrated in Fig. 2.

3.1. System overview

As shown in Fig. 2, we first collected photo-images and text documents related to various geographic locations, and then annotated the images with geographic nouns. Unfortunately, photo-images of a specific geographic spot may be taken from various shooting angles and under different brightness, which often leads to incorrect geographic information being extracted. To solve this issue, we employed the concept of bag-of-keypoints to extract local scale-invariant features, and used these features to identify the correct image objects as our geographic information.

To compute over cross-media data (i.e., photo-images and texts), we subsequently converted images into textual documents by using geographic nouns, and used a co-constructed semantic space for vector computation. The concept of the co-constructed semantic space is described in detail later. In fact, most geographic documents contain parts with vague image semantics, which makes it hard to analyze the geographic information implicit in the documents. To solve this problem, we used clustering approaches to filter the documents, ensuring that the selected documents are of sufficient quality to serve as training samples for the subsequent experiments on classifying multimedia data according to their geographic information.

3.2. Dataset collection

The system development started with a collection of geospatial text documents and photo-images related to a number of scenic spots, used for information fusion and geospatial knowledge discovery. We then employed a classic text retrieval method, the vector space model (VSM), to represent text documents and images as vectors. Each collected multimedia item includes several geospatially related proper nouns. For our experiments, geographical nouns and proper nouns were collected to construct a lexicon of geospatial named entities such as Taipei 101, Love River, and Palace Museum. After the geographic multimedia data sets were collected, geographic nouns were extracted to establish the lexicon, which provides the source for annotating images and constructing the semantic space. Furthermore, collected images and documents are categorized by location using our approach.
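The lexicon-based vector representation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tf-idf weighting, the substring matching, and the function names `term_frequencies` and `tfidf_vectors` are assumptions, and only the three lexicon entries come from the text.

```python
import math

# Hypothetical lexicon of geographic nouns; the three names are the examples
# given in the text, everything else in this sketch is an assumption.
LEXICON = ["taipei 101", "love river", "palace museum"]


def term_frequencies(text, lexicon=LEXICON):
    """Count occurrences of each lexicon entry (case-insensitive substring match)."""
    lowered = text.lower()
    return [lowered.count(term) for term in lexicon]


def tfidf_vectors(documents, lexicon=LEXICON):
    """Return one tf-idf weighted vector per document over the lexicon."""
    tf = [term_frequencies(doc, lexicon) for doc in documents]
    n_docs = len(documents)
    # Document frequency: number of documents in which each term appears.
    df = [sum(1 for row in tf if row[j] > 0) for j in range(len(lexicon))]
    # Smoothed idf, so a term found in every document keeps a nonzero weight.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    return [[row[j] * idf[j] for j in range(len(lexicon))] for row in tf]


docs = [
    "Taipei 101 towers over the city; the Taipei 101 observatory is popular.",
    "A boat tour on the Love River at night.",
    "The Palace Museum is a short bus ride from Taipei 101.",
]
vectors = tfidf_vectors(docs)
```

Each row of `vectors` has one component per lexicon entry, so documents that mention the same places end up close together in this space.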

3.3. Generating a co-constructed semantic vector space

A problem associated with computing cross-media (i.e., visual-to-text) data vectors is the incompatibility of the joined data types. To solve this issue, we investigate an intermediate layer of semantic vector spaces, the so-called co-constructed semantic vector space, in which the text and image data vectors from the textual feature space and the visterm space are represented in a unified form in a common space for further computation. This requires a transformation from the visterm space to the textual space. Hence, after performing the image annotation task, we converted the annotated images to textual documents using geographic nouns. After that, we used the co-constructed semantic vector space to project geographic documents and images into the same semantic space simultaneously. A classic text retrieval method, the vector space model (VSM), is employed to represent text documents and images as vectors. Each collected multimedia item includes several geospatially related proper nouns. Then, we employed the Latent Semantic Indexing (LSI) method to construct the semantic space with reduced dimensions, and used the resulting matrix for document-to-zone mapping. The process is described in detail in the next section.
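As a rough illustration of the LSI step, the sketch below places text documents and image pseudo-documents in one term-document matrix and reduces it to two dimensions. It is a sketch under stated assumptions: the term list and counts are invented toy data, and a plain power iteration on the Gram matrix stands in for a library SVD routine, which is what a real implementation would more likely use.

```python
import math


def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def top_eigenpairs(symmetric, k, iterations=200):
    """Top-k eigenpairs of a symmetric PSD matrix by power iteration with deflation."""
    n = len(symmetric)
    work = [row[:] for row in symmetric]  # deflation modifies a copy
    pairs = []
    for _ in range(k):
        start_norm = math.sqrt(sum((1.0 + i) ** 2 for i in range(n)))
        v = [(1.0 + i) / start_norm for i in range(n)]  # fixed start vector
        for _ in range(iterations):
            w = matvec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, matvec(work, v)))
        pairs.append((eigenvalue, v))
        # Deflate: remove the component that was just found.
        for i in range(n):
            for j in range(n):
                work[i][j] -= eigenvalue * v[i] * v[j]
    return pairs


def lsi_document_coordinates(term_doc, k):
    """term_doc[i][j] is the weight of term i in document j; returns k-dim document vectors."""
    n_docs = len(term_doc[0])
    # Gram matrix A^T A; its eigenvectors are the right singular vectors of A.
    gram = [[sum(row[a] * row[b] for row in term_doc) for b in range(n_docs)]
            for a in range(n_docs)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * vec[j] for lam, vec in pairs]
            for j in range(n_docs)]


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


# Rows: hypothetical lexicon terms. Columns 0 and 2 are text documents,
# columns 1 and 3 are annotated images converted to pseudo-documents.
TERMS = ["love river", "kaohsiung", "palace museum", "taipei"]
TERM_DOC = [
    [2, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
]

coords = lsi_document_coordinates(TERM_DOC, k=2)
sim_same_zone = cosine(coords[0], coords[1])   # text and image about Love River
sim_other_zone = cosine(coords[0], coords[2])  # Love River text vs. museum text
```

In the reduced space, the text document and the image pseudo-document about the same place are nearly parallel, while documents about different places are nearly orthogonal, which is the property a document-to-zone mapping can build on.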

3.4. Identifying associations of datasets and geographic locations

One of the core goals of this work is to study the relation between location and content in a number of photo and text collections. The method of using a semantic space to support the fusion of photo-image and text information sources is described in the previous section. In this section, we address our approach to discovering the relation between the collected datasets and geographic locations. This process has two main components: a document-to-zone mapping module and a framing maximize zones algorithm. In order to provide precise training samples for classifying multimedia data, we first employed a clustering approach to extract documents with clear semantics.

After categorizing documents according to their geospatial features, the classified documents can be divided into two parts, namely the correctly classified documents and the incorrectly classified documents. Each classified document is assigned to a geographic location based on its geospatial features; that is, each geospatial document is mapped to a corresponding location. The correctly classified documents are therefore mapped to the proper geographic location, and the incorrectly classified documents are mapped to some other geographic location. We therefore employed Support Vector Machines (SVM) (Boser, Guyon, & Vapnik, 1992; Cortes & Vapnik, 1995) to frame decision boundaries (Gunn, 1998; Sebald & Bucklew, 2000) in the feature vector space of the test documents. These boundaries are used in our work as maximize zones, in order to describe relationships between data sets and locations. The details of the developed image-annotation approach, and of the techniques for fusing geographic texts and photos for location estimation, are given in Sections 4 and 5.
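To make the idea of framing a decision boundary between zones concrete, the sketch below trains a linear soft-margin SVM by batch subgradient descent on the hinge loss. This is an illustrative stand-in only: the paper does not state the kernel, solver, or parameters, and the two-dimensional document vectors, the zone labels, and the hyperparameters are all hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(samples, labels, reg=0.01, learning_rate=0.1, epochs=200):
    """Minimise reg/2 * ||w||^2 + mean hinge loss; labels must be +1 or -1."""
    dim = len(samples[0])
    n = len(samples)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [reg * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(samples, labels):
            # Only samples inside the margin contribute to the subgradient.
            if y * (dot(w, x) + b) < 1.0:
                for d in range(dim):
                    grad_w[d] -= y * x[d] / n
                grad_b -= y / n
        w = [wi - learning_rate * g for wi, g in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


def predict_zone(w, b, x):
    """Return +1 for the target zone and -1 for any other zone."""
    return 1 if dot(w, x) + b >= 0.0 else -1


# Hypothetical 2-D document vectors (e.g. coordinates in a reduced semantic space).
zone_docs = [[2.0, 0.2], [1.8, 0.1], [2.2, 0.4]]    # documents about the target zone
other_docs = [[0.1, 2.0], [0.3, 1.7], [0.2, 2.3]]   # documents about other zones
samples = zone_docs + other_docs
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(samples, labels)
```

The hyperplane defined by `w` and `b` separates the vectors of one zone from the rest; a call such as `predict_zone(w, b, [2.0, 0.3])` then assigns a new document vector to one side of that boundary.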

4. Proposed image annotation method

4.1. Annotation of photo-images about specific scenic spots

As shown in Fig. 3, we first collected photo-images related to various geographic locations, and then extracted geographic information from the collected images. In this paper, a scenic spot means a geographic location that is a popular place or landmark. However, geographic spots in photo-images may appear at various shooting angles and brightness levels, which often leads to incorrect image semantics being extracted. At the same time, most image annotation approaches rely on image segmentation, which makes it hard to integrate images and text documents that contain similar semantics. To solve this issue, we employed the bag-of-keypoints approach to extract local scale-invariant features and used them to discover the correct geographic image objects. Some of the working results have been implemented and reported in our previous work (Lee et al., 2010a, 2010b, 2011).

The concept of bag-of-keypoints (bok) (Csurka et al., 2004) is a vocabulary of keypoints extracted from images. Like the representation of documents, the bok follows a dictionary-based approach: just as each document is regarded as a set of keywords, that is, as containing some of the proper nouns in the dictionary, each image is regarded as a set of entries from the keypoint vocabulary. In this study, the steps of system implementation are as follows:

1. Feature detection and description of geographic images.

2. Extracting similar keypoints and constructing a visual dictionary by the concept of bag-of-keypoints.

3. Generating geographic image feature vectors and annotating images as a source for geospatial text mining.

For feature detection and description of geographic images, the Difference of Gaussian (DoG) and Scale-invariant Feature Transform (SIFT) (Lowe, 1999, 2004) methods were employed to extract and describe geographic information. They are described as follows.

Fig. 2. System framework.

4.1.1. Feature detection and description of geographic images

Local scale-invariant features are widely used in object recognition and image registration applications. As affine-invariant keypoints, such features are stable under lighting and viewpoint changes, and allow keypoints to be matched accurately between different images. In this work, image features are used to represent the keypoints of a geographic object and serve as starting points for computer vision algorithms.

In the stage of extracting geographic image features, an extraction approach called Difference of Gaussian (DoG) is employed. This approach tries to retain part of the high-frequency features and extract local scale-invariant features. There are two steps in this approach: (i) blurring an image with a Gaussian kernel function at two different scales, as shown in Eq. (1) (Lowe, 2004); and (ii) subtracting one blurred image from the other, which keeps the band of frequencies lying between the two. By using two Gaussian curves, local scale-invariant features are discovered as geographic image features:

$$D(x, y, \sigma) = \bigl(G(x, y, k\sigma) - G(x, y, \sigma)\bigr) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma) \tag{1}$$

where $L(x, y, \sigma)$ is the scale space of an image, which is constructed by convolving the image $I(x, y)$ with the Gaussian kernel function $G(x, y, \sigma)$, and $D(x, y, \sigma)$ is the smoothed image with the scale space $k$ and smoothing $\sigma$.
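Eq. (1) can be illustrated with a small pure-Python sketch that blurs an image at the two scales $\sigma$ and $k\sigma$ and subtracts the results. The kernel truncation at three standard deviations, the border clamping, the default $k = \sqrt{2}$, and the toy 9 × 9 image are assumptions made for the example, not details given in the paper.

```python
import math


def gaussian_kernel_1d(sigma):
    """Normalised 1-D Gaussian kernel truncated at about three standard deviations."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_rows(image, kernel):
    """Convolve every row with the kernel, clamping coordinates at the border."""
    radius = len(kernel) // 2
    width = len(image[0])
    result = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                xx = min(max(x + k - radius, 0), width - 1)
                acc += weight * row[xx]
            new_row.append(acc)
        result.append(new_row)
    return result


def transpose(image):
    return [list(column) for column in zip(*image)]


def gaussian_blur(image, sigma):
    """L(x, y, sigma): separable Gaussian blur (rows, then columns)."""
    kernel = gaussian_kernel_1d(sigma)
    blurred_rows = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(blurred_rows), kernel))


def difference_of_gaussians(image, sigma, k=math.sqrt(2)):
    """D(x, y, sigma) = L(x, y, k * sigma) - L(x, y, sigma), as in Eq. (1)."""
    fine = gaussian_blur(image, sigma)
    coarse = gaussian_blur(image, k * sigma)
    return [[c - f for c, f in zip(coarse_row, fine_row)]
            for coarse_row, fine_row in zip(coarse, fine)]


# Toy 9 x 9 image with a single bright pixel in the centre.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
dog = difference_of_gaussians(image, sigma=1.0)
```

A flat image gives a DoG response of zero everywhere, while an isolated bright point gives a strong response at its location; keypoint detection looks for such extrema across positions and scales.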

After completing feature extraction, we employed the Scale-invariant Feature Transform (SIFT) (Lowe, 1999, 2004) to describe the features. SIFT descriptors are multi-image representations of an image neighborhood. By computing Gaussian derivatives at 8 orientation planes over a 4 × 4 grid of spatial locations, a 128-dimension vector (8 × 4 × 4) is generated for each geographic image keypoint.
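The 8 × 4 × 4 = 128 layout of the descriptor can be made concrete with a simplified sketch. It assumes a 16 × 16 grayscale patch around a keypoint and uses plain finite-difference gradients; it omits the Gaussian weighting, interpolation between bins, and rotation normalisation of the full SIFT method, and the function name `sift_like_descriptor` is hypothetical.

```python
import math


def sift_like_descriptor(patch):
    """Build a 128-dim vector from a 16x16 patch: 4x4 cells x 8 orientation bins."""
    size, cells, bins = 16, 4, 8
    cell_size = size // cells
    histogram = [0.0] * (cells * cells * bins)
    for y in range(size):
        for x in range(size):
            # Finite-difference gradients with clamped borders.
            dx = patch[y][min(x + 1, size - 1)] - patch[y][max(x - 1, 0)]
            dy = patch[min(y + 1, size - 1)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(dx, dy)
            if magnitude == 0.0:
                continue
            angle = math.atan2(dy, dx) % (2.0 * math.pi)
            orientation_bin = int(angle / (2.0 * math.pi) * bins) % bins
            cell = (y // cell_size) * cells + (x // cell_size)
            # Each pixel votes for its orientation bin, weighted by gradient magnitude.
            histogram[cell * bins + orientation_bin] += magnitude
    norm = math.sqrt(sum(h * h for h in histogram))
    if norm == 0.0:
        return histogram
    return [h / norm for h in histogram]


# Toy patch: brightness increases from left to right, so every gradient points along +x.
ramp_patch = [[float(x) for x in range(16)] for _ in range(16)]
descriptor = sift_like_descriptor(ramp_patch)
```

For the ramp patch, all of the gradient energy falls into the first orientation bin of each of the 16 cells, which shows how the descriptor encodes both where in the neighborhood and in which direction the intensity changes.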

4.1.2. Constructing a visual vocabulary and generating feature vectors

Once feature extraction and description for the geographic images were done, we constructed a visual vocabulary and represented image features in the form of feature vectors. As mentioned above, geographic image keypoints are described by the SIFT method. Keypoints generated by SIFT are numeric vectors, so it is hard to find exactly matching keypoints for image annotation. Instead, similar keypoints can be aggregated by clustering. In this work we therefore used the Affinity Propagation (AP) method (Frey & Dueck, 2007) to cluster the geographic image features, grouping the keypoints of all images into a large number of clusters. As shown in Fig. 4, with AP clustering similar keypoints are aggregated in the same cluster, and the cluster centers are used to construct a visual vocabulary. For each image, the number of keypoints falling in each cluster is counted, and these counts form the image's feature vector.
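The counting step that turns an image's keypoints into a feature vector can be sketched as below. The cluster centres are assumed to be already available (in the paper they come from AP clustering, which is not reimplemented here), and the two-dimensional toy descriptors stand in for 128-dimensional SIFT vectors; all names are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster centre) closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: squared_distance(descriptor, vocabulary[i]))


def bag_of_keypoints(descriptors, vocabulary):
    """Histogram counting how many keypoints of one image fall on each visual word."""
    histogram = [0] * len(vocabulary)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, vocabulary)] += 1
    return histogram


# Hypothetical visual vocabulary: three cluster centres (2-D for readability).
vocabulary = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]

# Hypothetical keypoint descriptors extracted from a single image.
image_descriptors = [[1.0, 1.0], [9.0, 1.0], [0.5, 0.2], [1.0, 9.0], [8.0, -1.0]]

feature_vector = bag_of_keypoints(image_descriptors, vocabulary)  # [2, 2, 1]
```

Every image is mapped to a vector of the same length as the vocabulary, regardless of how many keypoints it contains, which is what allows a standard classifier to be trained on the result.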

4.1.3. Image annotation by geographic nouns

In this step, we annotated geographic images with various geographic nouns. Once the feature vectors had been produced, we divided the images into two parts: labeled images and unlabeled images. To annotate geographic images across multiple classes, we used a classification technique called Fuzzy ARTMAP (FAM) (Carpenter, Grossberg, Markuzon, Reynolds, & Rosen, 1992). After training on the labeled images, each unlabeled image is classified and labeled with geographic nouns. Correctly classified images are sorted out as the geographic image source for applications of geospatial data/text mining.
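The train-then-label flow can be illustrated with a heavily simplified fuzzy ARTMAP sketch. It keeps the core ingredients (complement coding, the choice function, the vigilance test, and match tracking) but ties each category directly to one label and predicts with the best-scoring category. The class name, the parameter values, and the toy two-dimensional feature vectors are assumptions; real inputs would be bag-of-keypoints vectors scaled to [0, 1].

```python
class SimplifiedFuzzyArtmap:
    """Simplified fuzzy ARTMAP: fuzzy ART categories, each tied to one class label."""

    def __init__(self, vigilance=0.75, alpha=0.001, beta=1.0):
        self.base_vigilance = vigilance
        self.alpha = alpha   # choice parameter
        self.beta = beta     # learning rate (1.0 means fast learning)
        self.weights = []    # one weight vector per committed category
        self.labels = []     # class label attached to each category

    @staticmethod
    def _complement_code(x):
        # Inputs must lie in [0, 1]; the coded vector always sums to len(x).
        return list(x) + [1.0 - v for v in x]

    @staticmethod
    def _fuzzy_and_norm(a, w):
        return sum(min(ai, wi) for ai, wi in zip(a, w))

    def _ranked_categories(self, coded):
        scores = []
        for j, w in enumerate(self.weights):
            choice = self._fuzzy_and_norm(coded, w) / (self.alpha + sum(w))
            scores.append((choice, j))
        return [j for _, j in sorted(scores, key=lambda s: (-s[0], s[1]))]

    def train_one(self, x, label):
        coded = self._complement_code(x)
        vigilance = self.base_vigilance
        for j in self._ranked_categories(coded):
            match = self._fuzzy_and_norm(coded, self.weights[j]) / sum(coded)
            if match < vigilance:
                continue
            if self.labels[j] == label:
                self.weights[j] = [self.beta * min(c, w) + (1.0 - self.beta) * w
                                   for c, w in zip(coded, self.weights[j])]
                return
            # Match tracking: wrong label, so raise vigilance just above this match.
            vigilance = match + 1e-6
        # No committed category fits: create a new one for this sample.
        self.weights.append(coded)
        self.labels.append(label)

    def fit(self, samples, labels):
        for x, label in zip(samples, labels):
            self.train_one(x, label)
        return self

    def predict(self, x):
        ranked = self._ranked_categories(self._complement_code(x))
        return self.labels[ranked[0]] if ranked else None


# Hypothetical labeled image feature vectors, already scaled to [0, 1].
labeled = [[0.10, 0.20], [0.15, 0.25], [0.90, 0.80], [0.85, 0.90]]
nouns = ["Love River", "Love River", "Taipei 101", "Taipei 101"]
model = SimplifiedFuzzyArtmap().fit(labeled, nouns)
annotation = model.predict([0.12, 0.22])
```

After training on the labeled vectors, `predict` assigns a geographic noun to an unlabeled image vector, mirroring the way unlabeled images are annotated in this step.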

5. Feature fusion of geographic texts and photo-images for location estimation

5.1. Fusing visual and textual features from content of photos and texts

We next turn to the task of estimating which locations the collected photos and geographic texts relate to, based on the processed visual features and textual content. As both images and texts co-exist in the corpus, feature vectors representing the semantics of image content and textual feature vectors should be computable against each other. To resolve the semantic incompatibility between visual and textual features, once image annotation was complete we transformed each annotated image into a text-based document represented by selected geographic nouns for information fusion. In addition, as mentioned previously, a common semantic vector space is established so that geographic text documents and images are projected into the same semantic space simultaneously for vector computation and text mining tasks. In this way, we overcome the challenging issues of visual–textual transformation in our problem domain.

Fig. 3. The framework of image annotation of scenic spots.

Fig. 4. The illustration of constructing the visual vocabulary.


fact that a considerable volume of geographic documents and multimedia data without any geo-tag showing their geolocation information is generated on the internet each day, our method is no doubt a more realistic solution in the application domain.

In our experiments, an appropriate cluster number helps find more useful features in the visual vocabulary, and also enables the FAM process to achieve excellent overall performance in the image annotation stage. However, clustering with a large cluster number may require a longer execution time. Meanwhile, it is not easy to determine an optimum cluster number that strikes a good balance between image annotation accuracy and the time cost of constructing the visual vocabulary. By continuously modeling different snapshots of the data and tuning our experiments over time, we have verified the impact of various cluster numbers and parameters on the experimental results.

In our approach, geographic nouns are extracted from documents. In fact, a geographic location is often represented by several proper nouns. For example, the terms 'Massachusetts Institute of Technology' and 'MIT' describe the same place. This can make it difficult to discriminate the semantics of related documents. A more effective method for named-entity recognition will be included in our future work.

7. Conclusion

Location information on the internet is a precious asset for knowledge acquisition in daily life. Map- and text-based location references have become critical information needs for internet users. Searching for locations with local information is particularly useful for visitors with little experience or knowledge of the areas they are traveling to. In this work, we have developed a combined method to deal with the difficulties of fusing text- and image-type location information for geospatial knowledge discovery. We have also implemented a framework capable of detecting implicit relations among the collected datasets without requiring a large number of training samples. According to the experimental results, our approach is sensible for achieving this objective. However, some important points need to be addressed further:

In our experimental image data, the geographic objects represented in the collected images are associated with specific locations. Unfortunately, in some cases, a large geographic landmark may contain various geographic objects. As a result, various geographic objects occurring in the same image set might be regarded as a single geographic spot sample. This increases the difficulty of annotating images in such cases.

Our work presents a feasible, low-cost way to support real-world applications that have limited resources in fulfilling tasks of geographical knowledge discovery from the internet. By utilizing a small set of pre-classified instances, our approach proved to be a useful solution for location-based multimedia data mining. Given that a considerable volume of geographic documents and multimedia data is generated on the internet each day, our method represents some initial progress in the application domain. We will focus on expanding dataset collections in our future work on the topic.

For future work, we will try to add geographical coordinates (e.g. longitude and latitude) to our system to enhance the document-to-zone mapping functions, and to involve other geospatial features such as trajectories in the system development.

References

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the 5th annual workshop on computational learning theory (pp. 144–152). Pittsburgh, PA, United States: ACM.

Carpenter, G. A., Grossberg, S., Markuzon, N., Reynolds, J. H., & Rosen, D. B. (1992). Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. New York, NY, United States: Institute of Electrical and Electronics Engineers.

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.

Csurka, G., Dance, C. R., Fan, L., Willamowski, J., & Bray, C. (2004). Visual categorization with bags of keypoints. In Workshop on statistical learning in computer vision (ECCV-2004) (pp. 1–22).

Doerr, M., & Papagelis, M. (2007). A method for estimating the precision of placename matching. IEEE Transactions on Knowledge and Data Engineering, 19, 1089–1101.

Ding, W., Stepinski, T. F., Parmar, R., Jiang, D., & Eick, C. F. (2009). Discovery of feature-based hot spots using supervised clustering. Computers & Geosciences, 35, 1508–1516.

Frey, B. J., & Dueck, D. (2007). Clustering by passing messages between data points. Washington, DC, United States: American Association for the Advancement of Science.

Gunn, S. R. (1998). Support vector machines for classification and regression. Faculty of Engineering, Science and Mathematics School of Electronics and Computer Science.

Goldberg, D. W., Wilson, J. P., & Knoblock, C. A. (2009). Extracting geographic features from the internet to automatically build detailed regional gazetteers. International Journal of Geographical Information Science, 23, 93–128.

Himmelstein, M. (2005). Local search: The internet is the yellow pages. Computer, 38, 26–34.

Jiang, T., & Tao, A. H. (2009). Learning image-text associations. IEEE Transactions on Knowledge and Data Engineering, 21, 161–177.

Jiang, T., & Tao, A. H. (2006). Discovering image-text associations for cross-media web information fusion. In Knowledge discovery in databases: PKDD (pp. 561–568).

Kamvar, M., & Baluja, S. (2006). A large scale study of wireless search behavior: Google mobile search. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 701–709). Canada: ACM.

Kohonen, T. (1998). The self-organizing map. Neurocomputing, 21, 1–6.

Lee, C. H., Yang, H. C., & Wang, S. H. (2010a). An image annotation approach based on bag-of-keypoints for geospatial location search. In The 25th international conference on computers and their applications (CATA-2010) (pp. 37–42).

Lee, C. H., Yang, H. C., & Wang, S. H. (2010b). A location based text mining method using ANN for geospatial KDD process. In Proceedings of the 7th international symposium on neural networks (ISNN 2010), Part II, LNCS (Vol. 6064, pp. 292–301).

Lee, C. H., Yang, H. C., & Wang, S. H. (2011). An image annotation approach using location references to enhance geographic knowledge discovery. Expert Systems with Applications, 38, 13792–13802.

Lowe, D. G. (1999). Object recognition from local scale-invariant features. In Proceedings of the international conference on computer vision (Vol. 2, p. 1150). IEEE Computer Society.

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91–110.

McCurley, K. S. (2001). Geospatial mapping and navigation of the web. In Proceedings of the 10th international conference on World Wide Web (pp. 221–229). Hong Kong, Hong Kong: ACM.

Nguyen, L. D., Woon, K. Y., & Tan, A. H. (2008). A self-organizing neural model for multimedia information fusion. In 11th International conference on information fusion (pp. 1–7).

Ourioupina, O. (2002). Extracting geographical knowledge from the internet. In ICDM-AM international workshop on active mining ACDM-AM.

Sanderson, M., & Kohler, J. (2004). Analyzing geographic queries. In Proceedings of the workshop on geographic information retrieval, 27th annual international ACM SIGIR conference (pp. 25–29).

Sebald, D. J., & Bucklew, J. A. (2000). Support vector machine techniques for nonlinear equalization. New York, NY, United States: Institute of Electrical and Electronics Engineers.

Uryupina, O. (2003). Semi-supervised learning of geographical gazetteers from the internet. In Proceedings of the HLT-NAACL 2003 workshop on analysis of geographic references (Vol. 1, pp. 18–25). Association for Computational Linguistics.

Wang, K., Zhang, J., Li, D., Zhang, X., & Guo, T. (2007). Adaptive affinity propagation clustering. Acta Automatica Sinica, 33.

Xing, E. P., Yan, R., & Hauptmann, A. G. (2005). Mining associated text and images with dual-wing harmoniums. In Proceedings of the 21st conference on uncertainty in artificial intelligence (UAI-2005).

Zong, W., Wu, D., Sun, A., Lim, E. P., & Goh, D. (2005). On assigning place names to geography related web pages. In Proceedings of the 5th ACM/IEEE-CS joint conference on digital libraries (pp. 354–362). Denver, CO, USA: ACM.