An Analysis of Photo Editors’ Query Formulations for Image Retrieval

Tsai-Youn Hung

Researcher, National Museum of Taiwan Literature

E-mail: joyhung@nmtl.gov.tw

【Abstract】

The objective of this study is to examine the characteristics of queries raised by searchers in the process of collecting image information. The study involves 30 photo editors using the photo archive database system of the Associated Press (AP) to retrieve specific, general, and subjective photos. The results show that searchers usually raise a great number of queries per search and that each query contains few terms. Generally, searchers try various combinations of terms on a trial-and-error basis to find relevant photos. Formulating and reformulating queries proves most difficult when searchers look for subjective photos, and as a result they employ more complex querying strategies. The study suggests that a sensitive and responsive thesaurus system be established, whose display of synonymous and hierarchical terms may help searchers construct their queries more correctly and more efficiently. In addition, an image system that integrates current theories of human linguistic and visual processing is also desirable, as it may offer better support for image retrieval by allowing searchers to input textual and visual queries simultaneously.

Keywords

Image retrieval; Query formulation; Query analysis

4:1=80(May ’12)13-36 ISSN 1023-2125

Images are very useful and important information. In many fields, image materials are the primary subjects and are more important than texts. As more and more images have been digitized and made available for access, the problem of image retrieval has attracted attention. One of the major problems in image retrieval is how to index and retrieve visual materials. A number of nationally recognized indexing and retrieval systems have been designed to address the problem. Basically, they can be categorized into text-based and content-based approaches. The text-based approach refers to retrieval from text-based indexing of images, which uses a controlled vocabulary or natural language text, whereas the content-based approach refers to retrieval based on image attributes such as color, texture, and shape.

These two approaches have their weaknesses. Text-based indexing suffers from a lack of inter-indexer consistency, because describing image content is highly subjective. A picture is composed of pixels and has many possible roles. It cannot be translated neatly and simply into words, because words are not native elements of pictures. This means that the same image might mean something different to different people, or even to the same person at a different time. Content-based indexing can only provide a relatively low level of interpretation of the image. There is a gap between low-level visual descriptions and a user’s semantic expectation (Santini, 2001). In both approaches, users use queries to express their information needs. Queries are important variables in the exploration of visual information seeking behavior. Much of the previous research has focused on query studies in the art history domain. Journalism is also a domain in which people use many digital photos, and image searching behavior in this domain should be studied. In this article, the researcher investigates photo editors’ image searching behavior by analyzing their query formulations in an interactive text-based image retrieval environment.

Related Literature

Queries from Non-Digitized Collections

There has been a significant amount of literature examining how searchers pose queries in image retrieval since the early 1990s (Enser, 2008). The first and most extensive study of users’ queries was conducted by Enser and McGregor (1993) at the Hulton Deutsch Collection. Analyzing 2,722 requests, they classified users’ queries into four categories: nonunique, unrefined; nonunique, refined; unique, unrefined; and unique, refined. The analysis shows that unique subjects accounted for most requests, and that requests were refined mostly by time. Hastings (1994, 1995) investigated art historians’ queries by using Caribbean paintings. She found that identification, subject, text, style, artist, category comparison, and color are the major features of their queries. Ornager (1997) presented the types of journalists’ queries in newspaper image archives in Denmark and classified the queries into five types: specific inquirer, general inquirer, story teller inquirer, story giver (handing over the story to the staff to let them choose the photos), and fill in space inquirer (only caring about the size of the photo to fill an empty space on the page). Keister (1994) analyzed queries submitted to the National Library of Medicine Collection. She suggested that terms describing concrete image elements are important. In contrast, Jörgensen (1996) found that the number of abstract descriptions is almost equal to the number of object descriptions. Armitage and Enser (1997) collected queries from seven picture libraries and categorized them into four classes: image content, identification/attribution/provenance checking, accessibility of image/artist work, and miscellaneous. McCay-Peet and Toms (2009) interviewed 30 journalists and historians regarding how they used images and the types of image attributes used to find appropriate images for their work. The findings indicate that participants tended to use images more as illustration than as information. Person/animal/object, viewer response, event/action, and visual elements are four key attributes in selecting an image for use. These studies shed light on the characteristics of users’ queries and core image attributes; however, they were conducted in non-digitized collections.

Queries from Online Databases

In the late 1990s, a great deal of research analyzed users’ online search logs to study their visual information needs and query characteristics. Turner (1995) compared the most popular user-assigned terms with indexer-assigned terms by using the stockshot database of the National Film Board of Canada. He found a high correlation between the terms assigned by users and those assigned by indexers, and pointed out that, because of the nature of the database, the users overwhelmingly supplied of-ness terms rather than about-ness terms. Chen (2001) investigated 29 participants’ image queries by comparing the features of the queries to those identified in previous studies by Enser and McGregor (1993), Jörgensen (1995) and Fidel (1997). The findings show that participants mainly used unique terms and some used nonunique terms, but seldom used color, shape, and texture terms as refiners. Choi and Rasmussen (2003) investigated 38 faculty and graduate students of American history using the Library of Congress American Memory photo archives. The findings indicate that subjects’ search terms along with format terms were popular in pursuing visual information for historians, and that terms with abstract attributes were seldom used. C. Jörgensen and P. Jörgensen (2003) analyzed a search log from a commercial image provider over a one-month period. The results indicate that nouns accounted for almost half of the searches. This contrasts with earlier studies suggesting that unique-term searches are frequently used. The data of these studies were collected from online image retrieval environments. They reveal the structure of online search queries and provide guidelines to indexers on which image attributes should be indexed as descriptors.

Queries from Web

With the tremendous growth in the quantity of digital images on the Web, studies of Web queries have received a great deal of attention from researchers. Goodrum and Spink (2001) examined Web image queries from users of the Excite search engine. The results reveal that users input relatively few terms to specify their image information needs on the Web. These image queries contained a large number of unique terms, and most terms were used infrequently. Pu (2003) investigated the difference between Web image and textual queries. The findings show that image queries may have higher specificity and contain more refined queries, and that queries were refined more by interpretive attributes than by reactive and perceptual attributes. Pu (2008) also found that failed queries had a higher specificity than successful queries. Westman and Oittinen (2006) studied journalists’ and archivists’ image searching behavior and found that most queries and requests dealt with specific entities. Roughly 40% of requests were for a specific. More recently, Jansen (2008) examined the structure and formation of image queries collected from a Web search engine transaction log by mapping them to the classification schemes of Enser and McGregor (1993), Jörgensen (1995), and Chen (2001) for image searching. The findings show that the features and attributes of these image queries differ from those of image queries in other information retrieval systems. Tjondronegoro, Spink and Jansen (2009) used Dogpile 2006 Web transaction logs to analyze the duration and structure of Web search queries and the most popular multimedia Web searching terms. They found that more than 50% of searching episodes last less than one minute, that most image search queries use natural language with a query length of one to four terms, and that the most popular query category is people. The findings of all these studies can provide suggestions for improving image indexing and retrieval systems to meet the information needs of Web users. Nevertheless, using Web logs to analyze users’ queries has a limitation: users’ real-time interaction with search engines cannot be accurately captured.

Research Question

The data of previous research on journalists’ image seeking behavior were collected mostly by observing and interviewing journalists at work. The journalists’ search interaction with database systems was not examined in detail. The researcher designed a study to examine the micro-dynamics of the image searching process in an attempt to provide a holistic view of photo editors’ image searching behavior in search moves (Hung, 2009), relevance judgments, and querying. This article presents part of the results of the study: photo editors’ querying behavior. The research question was developed as follows:

How do photo editors formulate their queries for searching images? Are the query formulation patterns different when they search for specific, general, and subjective images?

Method

Subjects

Fifteen news photo editors and 15 non-news photo editors were recruited from newspaper and magazine companies in the New Jersey, New York, and Philadelphia areas. Prior to the search, a pre-search questionnaire was distributed to each subject to collect background information. Of the 30 subjects, 17 were male and 13 were female. The average age of subjects was 41.63 (Table 1). Seven-point Likert-scale questions were asked to measure subjects’ frequency of using image databases (1 indicates never, 7 indicates frequently) and level of search expertise (1 indicates poor, 7 indicates excellent). The results of the questionnaire show that most subjects used image databases very frequently (M=6.73). Overall, the subjects had substantial experience in using image databases (M=5.67). The average number of years working was 12.83 (Table 2).

Table 1

Age & Gender of Subjects (N=30)

Subject     Average age    Male          Female
Non-news    44.60          4             11
News        38.67          13            2
Total       41.63          17 (56.7%)    13 (43.3%)

Table 2

Working and Image Searching Background of Subjects

Subjects    Work experience (years)    Frequency of use (1-7)    Search expertise (1-7)
Non-news    15.41                      7.00                      5.93
News        10.25                      6.47                      5.40
Average     12.83                      6.73                      5.67

Database System

This study used the photo archive database system of the Associated Press (AP). The AP is the largest and oldest news organization in the world, serving as a source of news, photos, graphics, audio and video for more than one billion people a day. The AP photo archive is a carefully selected collection of photographs from the vast holdings of the AP. The photo archive has been widely used in the fields of history, journalism, political science, and art. It contains over 700,000 photos and dates back over 150 years. Each picture is accompanied by a 50-to-75-word caption that fully describes the person or event in its surrounding context, and is indexed by location, photographer, date taken, date submitted, and subject keywords. Some photos receive additional picture-oriented indexing to represent the look and feel of the photograph itself, which refers to emotional sensations like excitement, fear, pride, etc. Searchers can use this database to search not only for factual photos but also for subjective and emotional photos. (See Appendix A for the AP photo database interface.)

Setting

The experiments of this study were conducted in 2005 at participants’ workplaces. An appointment was made with each participant before the researcher arrived at the participant’s workplace. The researcher brought a laptop and connected to the AP archive database system through the Internet connection of the participants’ workplaces. CyberCam, a screen capture software system, was used to capture every search move of the participants on the screen.

Tasks

Three tasks were created that asked the subjects to select specific, general, and subjective photos, respectively, based on the topics provided. Each subject selected up to five photos for each task. The definitions of the three types of photos were based on Shatford’s (1986) image analysis (Table 3). Shatford (1986) was the first researcher to test Panofsky’s (1939) theory, which describes three levels of meaning in a work of art, and suggested that images convey information relating to what an image actually presents and what an image is about. Shatford proposed that subjects in an image can be categorized as specific of, generic of, and about, which are equivalent to the three categories of photos selected in this study.

Table 3

Shatford’s Image Analysis

Subjective    Finding images which have emotional or abstract concepts.
Specific      Finding images of an individually named person, group, thing, event, location, or action.
General       Finding images of a kind of person, group, thing, event, place, condition, or action.

Topics

Under each task, a topic was provided. Topics were selected based on the distribution of photos across the database and on the subject of the topic. This study investigates subjects’ image searching behavior, and it is believed that the distribution of relevant photos could affect an individual’s searching behavior, such as relevance judgment, time spent on searching, and search moves. The subject of the topic was also considered as a selection factor, because unfamiliar subject matter could likewise affect searching behavior. To avoid this, topics were selected from non-technical subject areas that would be accessible with general knowledge.

Based on these considerations, three topics were chosen from the most downloaded photo topics of 2004 among users of the AP archive database. They are US presidential election, an individually named event, for the specific search; Terrorist, a kind of group, for the general search; and Peace, an abstract concept, for the subjective search.

Text Queries

Under each topic, a text query related to the topic was provided. The three text queries were selected from the LexisNexis Academic Database, which covers the full text of more than 50 major English-language newspapers from the U.S. and around the world, more than 400 magazines and journals, and over 600 newsletters. The criterion for selecting these text queries was that they could not be news of one specific day. Event news would lead searchers to use specific terms, e.g., date and location, to find the exact photos taken at the events. This study investigates subjects’ image search for “aboutness” matches, not exact matches.

Sources of Data

A combined approach was used for data collection. Data were collected through verbal thinking-aloud, transaction logs, and interviews. Participants’ thinking-aloud comments were audiotaped during their searching. The computer screens and transition moves were recorded by the CyberCam software. After the completion of each search, a post-search interview was conducted and audiotaped to elicit participants’ comments on the three searches.

Results

The selection of search terms is the focus of this study. The results reported here investigated the rules 30 searchers used when they selected search terms, such as the number of queries, terms used, the source of search terms, and the modification of querying strategies such as the use of refiners and Boolean operators. The highlights of the findings are as follows.

Queries

The 30 searchers made a total of 532 queries, 179 queries in the specific search, 170 queries in the general search and 183 in the subjective search. The searchers made an average of 5.97 queries in the specific search; 5.67 queries in the general search; and 6.10 queries in the subjective search. There was no significant difference in the means of query numbers in the three searches. However, there was great difference between the minimum and maximum number in each search, especially in the specific search, from a minimum 1 to maximum 22.
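The per-task means above follow directly from the reported totals. The short Python sketch below recomputes them from the figures given in the text; the variable and function names are illustrative assumptions, not part of the original study.

```python
# Query totals per task as reported in the text (30 searchers took part).
query_counts = {"specific": 179, "general": 170, "subjective": 183}
NUM_SEARCHERS = 30


def mean_queries_per_searcher(counts, n_searchers):
    """Return the mean number of queries per searcher for each task."""
    return {task: round(total / n_searchers, 2) for task, total in counts.items()}


total_queries = sum(query_counts.values())
means = mean_queries_per_searcher(query_counts, NUM_SEARCHERS)

print(total_queries)  # 532
print(means)          # {'specific': 5.97, 'general': 5.67, 'subjective': 6.1}
```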

The average query numbers were higher than those found in previous studies. Although the searchers issued more queries, they reformulated them on an experimental basis. They had little idea where to go and simply tried every possible term to change their queries. For example, in the specific search, a subject typed presidential election, US presidential election, and US and election; in the general search, a subject typed Iraq bombing, Iraq bombing suicide, Iraq and bombing and suicide, Israel and bombing and suicide, and bombing and suicide; in the subjective search, a subject typed peace and shalom, shalom, peace, and Hebrew and peace. These queries show that the searchers were willing to change their queries but seldom tried narrower or broader terms.

Search Terms

The average number of terms per query was 2.60 in the specific search, 1.87 in the general search, and 1.74 in the subjective search. This indicates that when searching for images, the searchers tended to use few terms, with the most terms being used in the specific search.

Table 4 displays the average number of search terms used by each searcher in each search task. No power searchers were found in these three searches. In general, the 30 searchers used relatively few terms to specify their image information needs, compared to the mean of 7-15 terms per query used to find textual documents in structured databases (Spink & Saracevic, 1997).
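As an illustration of how terms per query might be tallied, the Python sketch below counts the words in each query and averages them. The article does not state its counting rule, so two assumptions are made here: every whitespace-separated word is one term, and the Boolean operators AND, OR, and NOT are not counted as terms. The sample queries are the specific-search examples quoted earlier in the Results.

```python
# Assumption: Boolean operators are connectors, not search terms.
BOOLEAN_OPERATORS = {"and", "or", "not"}


def count_terms(query):
    """Count whitespace-separated words in a query, skipping Boolean operators."""
    return sum(1 for word in query.lower().split() if word not in BOOLEAN_OPERATORS)


def mean_terms_per_query(queries):
    """Average number of terms over a list of query strings."""
    return sum(count_terms(q) for q in queries) / len(queries)


# Example queries quoted in the article for one subject's specific search.
specific_queries = [
    "presidential election",
    "US presidential election",
    "US and election",
]

print(round(mean_terms_per_query(specific_queries), 2))  # 2.33
```

A different counting rule, for instance treating a phrase as a single term, would give lower averages.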

Table 4

Average Number of Search Terms by Each Searcher among Search Tasks

Searcher    Specific task    General task    Subjective task
1           1.33             1.86            2.29
2           1.60             2.00            1.43
3           3.00             1.50            1.45
4           3.67             3.00            1.33
5           2.71             1.00            2.25
6           2.86             2.18            1.50
7           3.00             2.25            2.20
8           2.33             1.89            1.60
9           2.55             3.12            2.22
10          2.20             1.75            2.11
11          3.44             1.00            2.13
12          2.50             1.50            1.00
13          3.00             2.33            1.75
14          2.11             2.00            2.00
15          2.00             1.75            1.33
16          2.09             1.33            2.00
17          2.38             1.25            1.40
18          2.00             1.00            1.44
19          2.00             1.45            1.50
20          2.00             1.83            2.14
21          2.75             2.20            3.36
22          2.92             2.60            1.40
23          2.33             2.33            2.00
24          2.55             2.50            2.25
25          2.50             2.33            1.14
26          2.25             1.40            1.00
27          3.25             1.50            2.00
28          4.50             2.00            1.00
29          4.00             1.50            2.00
30          2.30             1.75            1.00
Average     2.60             1.87            1.74

Refiners

Some search queries contained terms that represent specific attributes, such as time, location, and action. The searchers used them to refine broad terms into more specific requests to increase precision. For example, the searchers used “Iraq,” “Palestine,” and “Israel” as location refiners to refine a general term into a more specific visual request. Table 5 shows that searchers in the specific search used the most refiners to modify their queries, accounting for 55% of the total number of refiners used, while the searchers in the subjective search used the fewest.

Table 5

Number of Refiners Used by Search Tasks

                                             Specific task    General task    Subjective task
Number of refiners used to modify queries    126 (55%)        64 (27.9%)      39 (17%)

Boolean Operators

Of the 30 searchers, 6 used Boolean operators in one of the three searches, 6 used them in two of the three searches, and 7 used them in all three searches. One third of the searchers did not use any Boolean operators during their searching. The AND, OR, and NOT operators were all used in this study, but the OR and NOT operators were each used only once. The AND operator was used much more frequently than the OR and NOT operators. The searchers used the operators most frequently when conducting the specific search, a total of 61 times. In the general and subjective searches, the searchers used the operators only 37 and 35 times, respectively.
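Operator use of this kind can be tallied from logged query strings. The Python sketch below is a minimal illustration, assuming that operators appear as separate words in the query and are matched case-insensitively; the function name is hypothetical, and the sample queries are the general-search examples quoted earlier in the Results.

```python
from collections import Counter


def count_boolean_operators(queries):
    """Tally AND, OR, and NOT across a list of query strings (case-insensitive)."""
    tally = Counter()
    for query in queries:
        for word in query.upper().split():
            if word in ("AND", "OR", "NOT"):
                tally[word] += 1
    return tally


# General-search queries quoted in the article for one subject.
general_queries = [
    "Iraq bombing",
    "Iraq bombing suicide",
    "Iraq and bombing and suicide",
    "Israel and bombing and suicide",
    "bombing and suicide",
]

operator_tally = count_boolean_operators(general_queries)
print(operator_tally["AND"], operator_tally["OR"], operator_tally["NOT"])  # 5 0 0
```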

Term Occurrences

Some individual searchers repeatedly used the same terms in their queries. Thus, some terms were frequently used because of some individual searchers’ heavy use. After eliminating duplicate terms used by the same searcher, a list of unique terms was developed. There were 69 unique terms in the specific search, 82 unique terms in the general search, and 103 unique terms in the subjective search. It indicates that there were differences in the use of vocabulary among the three searches. The largest vocabulary was used in the subjective search.

Appendix B displays the unique terms that occurred in each of the searches. In the specific search, 4 terms were used more than 10 times, while in the general and subjective searches only 1 term was used more than 10 times. A rank-frequency distribution of the unique terms is shown in Figure 1, where frequency represents the number of searchers. At the beginning, the lines for the specific and general searches fall gently, while the line for the subjective search drops very rapidly. All three lines end with a long tail. The graph shows that many unique terms were used by only one searcher, sometimes only once across the three searches, especially in the subjective search.

[Figure 1: line graph; x-axis: Rank (1 to 24); y-axis: Frequency (0 to 35); series: Specific, General, Subjective]

Figure 1 Distribution of Terms Used by Search Tasks
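The rank-frequency procedure described above (remove each searcher's repeated terms, count how many searchers used each term, then order the terms by that count) can be sketched in Python as follows. The function name and the toy data are hypothetical and serve only as illustration; they are not the study's data.

```python
from collections import Counter


def rank_frequency(terms_by_searcher):
    """Return (rank, term, number_of_searchers) tuples, most widely used term first.

    Each searcher's terms are de-duplicated first, so the frequency counts
    searchers rather than raw occurrences.
    """
    searcher_counts = Counter()
    for terms in terms_by_searcher.values():
        searcher_counts.update({term.lower() for term in terms})
    # Sort by descending frequency; ties are broken alphabetically.
    ordered = sorted(searcher_counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, term, freq) for rank, (term, freq) in enumerate(ordered, start=1)]


# Hypothetical toy data: terms entered by three searchers in one task.
toy_terms = {
    "searcher_1": ["peace", "peace", "shalom"],
    "searcher_2": ["peace", "calm"],
    "searcher_3": ["peace", "bliss", "calm"],
}

distribution = rank_frequency(toy_terms)
for row in distribution:
    print(row)
```

Even in this toy example the distribution has the shape reported in the article: one widely shared term followed by a tail of terms used by a single searcher.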

Term Sources

The following rules were used to determine the sources of search terms. If a search term appears in the text query, it counts as a term selected from the text; if a search term comes from a selected photo or its accompanying textual information after the searcher viewed the photo, it counts as a term selected from the database; if a search term is selected from neither the text query nor the database, it counts as a term generated by the searcher, e.g., from the searcher’s knowledge or from the searcher’s interaction with the database system during the search process.
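The three coding rules can be expressed as a small decision function. The Python sketch below is one possible reading of those rules, not the coding instrument actually used in the study; the function name, its parameters, and the sample term sets are hypothetical.

```python
def classify_term_source(term, text_query_terms, viewed_database_terms):
    """Classify a search term as 'text query', 'database', or 'searcher'.

    text_query_terms: lowercase terms that appear in the text query for the task.
    viewed_database_terms: lowercase terms from photos or captions that the
        searcher had already viewed when the term was entered.
    """
    normalized = term.lower()
    if normalized in text_query_terms:
        return "text query"
    if normalized in viewed_database_terms:
        return "database"
    # Neither source matched: the term is attributed to the searcher.
    return "searcher"


# Hypothetical term sets, for illustration only.
text_terms = {"presidential", "election"}
viewed_terms = {"ballot"}

print(classify_term_source("Election", text_terms, viewed_terms))  # text query
print(classify_term_source("ballot", text_terms, viewed_terms))    # database
print(classify_term_source("campaign", text_terms, viewed_terms))  # searcher
```

Checking the text query first mirrors the order in which the rules are stated; a term found in both the text query and a viewed caption is therefore attributed to the text query under this sketch.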

The total number of search terms used was 412 in the specific search, 293 in the general search, and 318 in the subjective search, for a grand total of 1,023. The distribution of these 1,023 search terms across the three sources is presented in Table 6. In the specific search, 68.69% of terms came from the text query, which was much higher than in the general and subjective searches, with 22.87% and 35.85% respectively. Searchers were responsible for 30.34% of the terms in the specific search, but for 74.74% of the terms in the general search and 50.94% in the subjective search, which were large percentages. This indicates that when conducting the specific search, searchers often extracted terms from the text query, while in the general and subjective searches they had to develop search terms on their own. As for the database as a source, only 0.97% of terms came from the database in the specific search, and 2.39% and 13.21% in the general and subjective searches respectively.

Table 6

Frequency and Percentage of Term Sources Used by Search Tasks

Source Specific task General task Subjective task

Text query 283 (68.69%) 67 (22.87%) 114 (35.85%)

Searcher 125 (30.34%) 219 (74.74%) 162 (50.94%)

Database 4 (0.97%) 7 (2.39%) 42 (13.21%)

Total 412 293 318
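The percentages in Table 6 can be recomputed from the raw counts, which is a useful consistency check. The Python sketch below does this; the counts are taken from the table, while the names are illustrative assumptions. The text-query share in the specific search works out to 283/412, or about 68.69%.

```python
# Term counts by source and task, taken from Table 6.
term_sources = {
    "specific": {"text query": 283, "searcher": 125, "database": 4},
    "general": {"text query": 67, "searcher": 219, "database": 7},
    "subjective": {"text query": 114, "searcher": 162, "database": 42},
}


def source_shares(counts):
    """Return each source's share of a task's terms as a percentage (2 decimals)."""
    total = sum(counts.values())
    return {source: round(100 * n / total, 2) for source, n in counts.items()}


shares = {task: source_shares(counts) for task, counts in term_sources.items()}
grand_total = sum(sum(counts.values()) for counts in term_sources.values())

print(grand_total)  # 1023
for task, task_shares in shares.items():
    print(task, task_shares)
```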

Others

The average time spent was 11 minutes 54 seconds in the specific search, 10 minutes 53 seconds in the general search, and 10 minutes 49 seconds in the subjective search. There was no significant difference in the mean time spent across the three searches. However, there was a great difference between the minimum and maximum times, such as from 4 minutes to 17 minutes. As mentioned in the previous section, no power searcher was found in this study. The great difference was due to browsing. Browsing is the key move in image searching, and most of the searchers spent a large portion of their time on browsing. On average, the searchers chose 3.93 photos in the specific search, 3.53 photos in the general search, and 3.17 photos in the subjective search.

Fifteen news photo editors and 15 non-news photo editors were recruited to participate in this study. This article also analyzes whether there are behavioral differences between the news and non-news photo editors. The analysis indicates that the news photo editors used more queries across the three searches and used more Boolean operators, especially in the specific and subjective searches. In the specific search, the news photo editors used operators 41 times, accounting for 67.21% of the operators used, and in the subjective search they used them 22 times, accounting for 62.86%. In addition, the news photo editors were more likely to use terms from the text query, and generated more search terms themselves in the subjective search. There was no significant difference between the news and non-news photo editors in the use of search terms and refiners. According to the analysis, the non-news photo editors seemed more conservative than the news photo editors when searching for image information. However, this assumption needs further study to verify.

Discussion

The searchers input an average of 6 queries (5.97 for the specific search, 5.67 for the general search, and 6.10 for the subjective search) during an image search. Compared with the studies by Goodrum and Spink (2001) (3.36 queries), Goodrum, Bejune and Siochi (2003) (2 queries), and Jansen, Spink and Saracevic (2000) (2.8 queries in textual retrieval), the searchers in this study employed a higher number of search queries. But the queries were short. On average, a query contained approximately 2 terms. The searchers used fewer search terms in the general and subjective searches than in the specific search (2.60 in the specific search, 1.87 in the general search, and 1.74 in the subjective search). The average number of search terms in this study was smaller than the 2.4 terms for Web queries in Jansen et al.’s study (2000), the 3.74 terms for Web image queries in Goodrum and Spink’s study (2001), and the average of 4.87 search terms selected by participants in Choi and Rasmussen’s study (2003). Since all the searchers in this study had considerable experience in using image databases, especially the AP archive database, and there was no time limit on these experimental searches, the reason that the searchers used a greater number of queries with few terms in each could be an inability to translate image information needs into textual queries and the difficulty of coming up with conceptual terms for non-specific topics.

Even though the searchers used an average of 6 queries, most searchers repeatedly used the same terms to construct their queries. They tried different combinations of the same terms on a trial-and-error basis to find any relevant photos. The results also show that when querying, the searchers mostly employed operational tactics, which modify a retrieved set without changing the conceptual meaning it represents. The finding supports the studies of Markkula and Sormunen (1998) and C. Jörgensen and P. Jörgensen (2003). In both studies, the authors pointed out that different approaches or tactics do not appear to be carefully thought out but seem rather to be tested experimentally. This could also be due to the low precision of the database system or its lack of support for query reformulation.

Boolean operators were not used often. Among them, OR and NOT were seldom used, with AND being used by far the most. Once the searchers started to use the AND operator, they repeatedly used it to combine old terms with new terms. The recall and precision of the retrieved sets did not seem to be factors in deciding whether to use an operator. The use of Boolean operators found in this study supports the results of Wildemuth, Jacob, Fullington, De Bliek and Friedman (1991) and Sutcliffe, Ennis and Watkinson (2000), suggesting that searchers, in general, use mostly AND operators and very seldom other Boolean operators. This result also echoes Jansen et al.’s (2000) study, which found that searchers do not use the operators properly.

The frequency distribution of unique terms in the queries was highly skewed. A few terms were used repeatedly and many terms were used only once. This illustrates the sayings that a picture can mean different things to different people and that a picture is worth a thousand words. If a picture is worth a thousand words to one searcher, then it could be worth 30,000 words to the 30 searchers in the study. The only way to reduce the wide variation is to implement thesauri in image systems.

Much research investigating image queries (Chen, 2001; Enser & McGregor, 1993; Hastings, 1994, 1995) has consistently demonstrated the occurrence of refiners such as time, location, format, and medium in image queries. The result of this study is consistent with these findings in that searchers add refiners to modify their queries. However, this study found that only in the specific search did the searchers use time and location very frequently. Many searchers in this study commented that it was much more difficult to come up with proper terms to represent terrorist and peace when conducting the general and subjective searches. This was because the searchers had to translate the concepts into textual terms by themselves. The study shows that approximately 70% of terms came from the text query in the specific search; in the general and subjective searches, however, most terms came from the searchers themselves. The result indicates that the selection of search terms and the construction of queries are highly interactive when searching for general and subjective images.

Most searchers expressed that the subjective search was the most difficult and complex querying task. They had difficulty defining the meaning of peace. The strategies they most often employed were to use related terms such as meditation, calm, tranquility, bliss, and Interfaith to expand recall, and to convert general and subjective concepts into more tangible queries, i.e., a person’s name, place, or event; for example, Dalai Lama, Pope, Mother Teresa, the Middle East, and Israel. Prior research has confirmed the importance of object identification as a heuristic for both indexing and querying image retrieval systems (Armitage & Enser, 1997; Chen, 2001; Fidel, 1997; Hastings, 1994, 1995; Jörgensen, 1998). The result of this study suggests that in addition to objects, other attributes such as abstract concepts, symbols, and emotions need to be indexed to enhance the retrieval process so that searchers can achieve more meaningful and relevant matches. Layne (1994) suggested that ideally access should be provided to all possible generic identities as well as to the specific identity of a person, object, or event. Greisdorf and O’Connor (2002) also commented that without mechanisms to convey metaphoric potential, the description of the image may lack retrieval strength. One of the searchers in this study made a similar comment:

“The only way you could really improve the database is that anytime a photo goes in not only are the who, what, where, why, how all that realistic put in, but also the conceptual or symbolic quality of the photo, because this is a lot more sophisticated search all about. So you have to have someone look at those selected photos and say bliss or uncertainty. They’ve got to be in the caption. Somehow each information has to be put in…I kind of knew in my head what I am looking for, but the look was visual. It wasn’t anything you can put in words; for example, to find a perfect Pope picture, and there is no way you can type in the term “perfect Pope’s picture.””

The result also supports Vakkari’s (1998) suggestion that task complexity can influence the user’s searching behavior in terms of query formulation. Vakkari suggested that the more complex the task, the more ill-structured it is. The poorer a searcher’s conceptual structure of a task, the less clearly the searcher can express what type of information is useful and which concepts and relations belong in a request and query.

Conclusion

There are a number of important and unresolved fundamental questions in image retrieval. One of them is how searchers represent their non-textual information needs. This study reveals that query formulation and especially query reformulation are difficult tasks for searchers, particularly in the subjective image search. In this study, the searchers tried many possible terms, but they received zero hits. This might imply that there was a vocabulary gap between the indexers and the searchers. To overcome this problem, the indexers need to include more diverse terms and terms related to concepts, moods, feelings, and so on, not just indexing terms taken from the captions and other descriptions associated with the photos. Yet human indexing is costly, labor-intensive, and inconsistent between indexers and searchers, and among indexers themselves as well. Lately many researchers have explored different approaches using machine-assisted techniques (Escalante et al., 2008; Feng, 2008; Meij, Bron, Hollink, Huurnink, & de Rijke, 2009; Wang et al., 2006). For example, to help searchers match the terms used in image systems, system designers could develop a responsive and possibly user-sensitive thesaurus system that displays synonymous or hierarchical terms to help searchers construct image queries more correctly and efficiently. Nonetheless, by relying too much on the thesaurus to display hierarchical terms, the searchers may retrieve images that are not relevant to their information needs because of differences in the searchers’ subjective interpretations. To help the searchers find relevant images more effectively, a relevance feedback mechanism could be implemented to let the searchers tell the system which items are relevant and which are not. This can be more effective than using query terms alone to tell the system what they want. Additionally, there has been a problem in representing image information needs with textual queries. An ideal image system that integrates current theories of human linguistic and visual processing and permits searchers to input both textual and visual queries might be able to offer better support for those who search for images.
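The thesaurus-assisted querying suggested above can be illustrated with a minimal sketch. It is not the AP system or any system evaluated in this study. The thesaurus entries, the relation names, and the functions `suggest_terms` and `expand_query` are hypothetical; the example terms are borrowed from those the searchers tried for the peace task, and their grouping into relations is an assumption made only for illustration. The searcher, not the system, chooses which suggestions to add, which reflects the caution about over-relying on thesaurus displays.

```python
from typing import Dict, Iterable, List

# Hypothetical mini-thesaurus. The groupings below are illustrative
# assumptions, not an actual controlled vocabulary.
THESAURUS: Dict[str, Dict[str, List[str]]] = {
    "peace": {
        "synonyms": ["calm", "tranquility", "bliss"],
        "related": ["meditation", "interfaith", "Dalai Lama"],
        "narrower": ["inner peace", "peace treaty"],
    },
}


def suggest_terms(term: str,
                  thesaurus: Dict[str, Dict[str, List[str]]] = THESAURUS
                  ) -> Dict[str, List[str]]:
    """Return suggested terms grouped by relation, or {} if the term is unknown."""
    entry = thesaurus.get(term.lower(), {})
    # Copy the lists so callers cannot modify the thesaurus by accident.
    return {relation: list(terms) for relation, terms in entry.items()}


def expand_query(terms: Iterable[str], selected: Iterable[str]) -> List[str]:
    """Append the suggestions the searcher selected to the original terms.

    Duplicates are dropped case-insensitively and the original order is kept.
    """
    expanded: List[str] = []
    seen = set()
    for term in list(terms) + list(selected):
        key = term.lower()
        if key not in seen:
            seen.add(key)
            expanded.append(term)
    return expanded


if __name__ == "__main__":
    suggestions = suggest_terms("peace")
    for relation, candidates in suggestions.items():
        print(relation, candidates)
    # The searcher picks only the suggestions that match the intended meaning.
    print(expand_query(["peace"], ["calm", "Dalai Lama"]))
```

In such a design the expanded term list would be passed to the retrieval system as alternative terms; how the terms are combined (for example with OR) depends on the query syntax of the particular system.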


The analysis of the searchers’ queries in this study indicates that searchers had difficulties in operationalizing search tasks into possible search terms. It shows that using textual search terms to represent non-textual information needs seems to involve high-level cognitive processes. This could be the fundamental problem that holds up progress in image retrieval. Future research on which cognitive processes are involved in searchers’ image searching is needed.

This study examined photo editors’ image querying behavior and has identified some common and discrepant attributes. The results contribute to the field of information science by bringing insight into searchers’ image searching behavior in the field of journalism. However, there are some limitations in this study. First, the researcher was dependent on 30 volunteers from the New Jersey, New York City, and Philadelphia areas, so non-random sampling was used. Therefore, it might be problematic to generalize these results to a larger population. Second, the results of this study came from a homogeneous group of subjects: photo editors of newspapers and magazines. Perspectives vary across user groups, and images of different kinds or from different disciplines have their own particular attributes that different users may perceive differently. This study focuses on the journalism field; thus the findings might not apply to a different user group or to different types of images. Third, only one database system and three text queries were used in this study. It is risky to generalize from such a small number of instances. Further studies on other groups, databases, and tasks are recommended.

(Received on 7 August 2011)

Acknowledgements

I am grateful to Mr. Chuck Zoeller, director of the Associated Press photo library, for his assistance in providing the AP archive database system for this study, and thank the photo editors in the New Jersey, New York, and Philadelphia areas, who were kind enough to participate in this study.

References

Armitage, L., & Enser, P. (1997). Analysis of user need in image archives. Journal of Information Science, 23 (4), 287-299.

Chen, H.-L. (2001). An analysis of image queries in the field of art history. Journal of the American Society for Information Science and Technology, 52 (3), 260-273.

Choi, Y., & Rasmussen, E. (2003). Searching for images: The analysis of users’ queries for image retrieval in American History. Journal of the American Society for Information Science and Technology, 54 (6), 498-511.

Enser, P., & McGregor, C. (1993). Analysis of visual information retrieval queries (British Library Research and Development Department Report, No. 6104). London: British Library.

Escalante, H. J., Hernández, C., López, A., Marín, H., Montes, M., Morales, E., et al. (2008). Towards annotation-based query and document expansion for image retrieval. In Advances in Multilingual and Multimodal Information Retrieval: 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007 (C. Peters, V. Jijkoun, T. Mandl, H. Müller, D. W. Oard, A. Peñas, et al., Eds., pp. 546-553). (Lecture notes in computer science, No. 5152). Berlin, Heidelberg: Springer-Verlag Berlin Heidelberg.

Feng, S. (2008). Statistical models for text query-based image retrieval. Unpublished doctoral dissertation, University of Massachusetts Amherst, Massachusetts.

Fidel, R. (1997). Image retrieval task: Implications for the design and evaluation of image databases. The New Review of Hypermedia and Multimedia, 3, 181-199.

Goodrum, A., & Spink, A. (2001). Image searching on the Excite Web search engine. Information Processing & Management, 37, 295-311.

Goodrum, A., Bejune, M., & Siochi, A. (2003). A state transition analysis of image search patterns on the Web. In Image and Video Retrieval: Second International Conference, CIVR 2003 (E. M. Bakker, T. S. Huang, M. S. Lew, N. Sebe, & X. Zhou, Eds., pp. 281-290). (Lecture notes in computer science, No. 2728). Berlin, New York: Springer.

Greisdorf, H., & O’Connor, B. (2002). Modelling what users see when they look at images: A cognitive viewpoint. Journal of Documentation, 58 (1), 6-29.

Hastings, S. K. (1994). An exploratory study of intellectual access to digitized art images. Unpublished doctoral dissertation, Florida State University.

Hastings, S. K. (1995). Query categories in a study of intellectual access to digitized art images. In T. Kinney (Ed.), ASIS’95: Proceedings of the 58th ASIS annual meeting, 32 (pp. 3-8). Medford, NJ: Information Today.

Hung, T.-Y. (2009). Photo editors’ search tactics and moves for image retrieval. Journal of Library and Information Science, 35 (1), 23-36.

Jansen, M. J., Spink, A., & Saracevic, T. (2000). Real life, real users, and real needs: A study of analysis of user queries on the Web. Information Processing and Management, 36 (2), 207-227.

Jansen, B. J. (2008). Searching for digital images on the Web. Journal of Documentation, 64 (1), 81-101.

Jörgensen, C. (1995). Image attributes: An investigation (Indexing systems, retrieval systems, computerized). Unpublished doctoral dissertation, Syracuse University, New York State.

Jörgensen, C. (1996). Indexing images: Testing an image description template. In S. Hardin (Ed.), ASIS '96: Proceedings of the 59th Annual Meeting of the American Society for Information Science, 33 (pp. 209-213). Medford, NJ: Information Today.

Jörgensen, C., & Jörgensen, P. (2003). Image querying by image professionals. In R. J. Todd (Ed.), ASIST 2003: Proceedings of the 66th ASIST Annual Meeting, 40 (pp. 349-356). Medford, NJ: Information Today.

Keister, L. (1994). User types and queries: Impact on image access systems. In R. Fidel et al. (Eds.), Challenges in Indexing Electronic Text and Images (pp. 7-22). Medford, NJ: Learned Information for the American Society for Information Science.

Layne, S. S. (1994). Some issues in the indexing of images. Journal of the American Society for Information Science, 45 (8), 583-588.

Markkula, M., & Sormunen, E. (1998). Searching for photos - journalists' practices in pictorial IR. In J. P. Eakins, D. J. Harper, & J. Jose (Eds.), The Challenge of Image Retrieval, Workshop and Symposium on Image Retrieval held at the University of Northumbria Newcastle upon Tyne, UK. Retrieved August 10, 2011, from British Computer Society Web Site: http://www.bcs.org/content/conWebDoc/4437

McCay-Peet, L., & Toms, E. (2009). Image use within the work task model: Image as information and illustration. Journal of the American Society for Information Science and Technology, 60 (12), 2416-2429.

Meij, E., Bron, M., Hollink, L., Huurnink, B., & de Rijke, M. (2009). Learning semantic query suggestions. In The Semantic Web: 8th International Semantic Web Conference, ISWC 2009 (A. Bernstein, D. R. Karger, T. Heath, L. Feigenbaum, D. Maynard, E. Motta, & K. Thirunarayan, Eds., pp. 424-440). (Lecture notes in computer science, No. 5823). Berlin, Heidelberg: Springer-Verlag Berlin Heidelberg.

Ornager, S. (1997). Image retrieval: Theoretical analysis and empirical user studies on accessing information in images. Proceedings of the 60th Annual Meeting of the American Society for Information Science, 34, 202-211.

Panofsky, E. (1939). Studies in iconology: Humanistic themes in the art of the Renaissance. New York: Oxford University Press.

Pu, H.-T. (2003). An analysis of Web image queries for search. Proceedings of the 66th Annual Meeting of the American Society for Information Science and Technology, 40 (pp. 340-348). Medford, NJ: Information Today.

Pu, H.-T. (2008). An analysis of failed queries for Web image retrieval. Journal of Information Science, 34 (3), 275-289.

Santini, S. (2001). Exploring image databases context-based retrieval. San Diego: Academic Press.

Shatford, S. (1986). Analyzing the subject of a picture: A theoretical approach. Cataloging & Classification Quarterly, 6 (3), 39-62.

Spink, A., & Saracevic, T. (1997). Interactive information retrieval sources and effectiveness of search terms during mediated online searching. Journal of American Society for Information Science, 48 (8), 741-761.

Sutcliffe, A., Enni, M., & Watkinson, S. (2000). Empirical studies on end-user information searching. Journal of American Society for Information Science, 51 (13), 1211-1231.

Tjondronegror, D., Spink, A., & Jansen, B. J. (2009). A study and comparison of multimedia Web searching: 1997-2006.

Turner, J. M. (1995). Comparing user-assigned terms with indexer-assigned terms for storage and retrieval of moving images: Research results. Proceedings of the 58th Annual Meeting of the American Society for Information Science, 32, 9-12.

Vakkari, P. (1998). Growth of theories on information seeking: An analysis of growth of a theoretical research program on relation between task complexity and information seeking. Information Processing and Management, 34 (3/4), 361-382.

Wang, J. Z., Grieb, K., Zhang, Y., Chen, C.-C., Chen, Y. & Li, J. (2006). Machine annotation and retrieval for digital imagery of historical materials. International Journal on Digital Libraries, 6(1), 18-29.

Westman, S., & Oittinen, P. (2006). Image retrieval by end-users and intermediaries in a journalistic work context. In P. Borlund, J. W. Schneider, M. Lalmas, A. Tombros, J. Feather, D. Kelly, et al., (Eds.), Proceedings of the 1st International Conference on Information Interaction in Context (pp. 102-110). New York, NY: ACM.

Wildemuth, B., Jacob, E. K., Fullington, A., De Bliek, R., & Friedman, C. P. (1991). A detailed analysis of end-user search behaviors. In J.-M. Griffiths (Ed.), Proceedings of the 54th Annual Meeting of the American Society for Information Science, 28, 302-312. Medford, NJ: Learned Information.


Appendix A

A Snapshot of the AP Photo Database Interface

The database opens to a simple search template:

Enter your search criteria in the “What”, “When”, and/or “Where” fields. You can use one search field or a combination of the three fields. Click Search or press the Enter key.

Command Bar


Appendix B

Table B1 Unique Terms Used by Searchers in the Specific Task

Term                No. of searchers  Percent  Term                No. of searchers  Percent
Bush                22                12.2     Decision            1                 0.6
Kerry               20                11.1     Democratic          1                 0.6
Election            19                10.6     Demonstrater        1                 0.6
Presidential        12                6.7      Demonstrator        1                 0.6
Debate              8                 4.4      Farm                1                 0.6
Campaign            7                 3.9      Fear                1                 0.6
George Bush         5                 2.8      Florida             1                 0.6
John Kerry          4                 2.2      Get                 1                 0.6
War                 4                 2.2      Night               1                 0.6
2004                3                 1.7      Out                 1                 0.6
Argue               3                 1.7      Pro-Bush            1                 0.6
Convention          3                 1.7      Protest             1                 0.6
Iraq                3                 1.7      Protestor           1                 0.6
Topix               3                 1.7      Question            1                 0.6
Cheney              2                 1.1      Reax                1                 0.6
Polls               2                 1.1      Republican          1                 0.6
Reaction            2                 1.1      Result              1                 0.6
Security            2                 1.1      Shout               1                 0.6
Supporter           2                 1.1      Sign                1                 0.6
US                  2                 1.1      Smear               1                 0.6
Victory             2                 1.1      Soldier             1                 0.6
Voter               2                 1.1      Speech              1                 0.6
Voting              2                 1.1      States              1                 0.6
2005                1                 0.6      Support             1                 0.6
America             1                 0.6      Teresa              1                 0.6
American            1                 0.6      Terror              1                 0.6
Anti-Kerry          1                 0.6      Uncertain           1                 0.6
Battleground        1                 0.6      Uncertainty         1                 0.6
Bukaty              1                 0.6      Undecided           1                 0.6
Campaigning         1                 0.6      United              1                 0.6
Canvas              1                 0.6      Vote                1                 0.6
Clash               1                 0.6      Watch               1                 0.6
Concede             1                 0.6      Win                 1                 0.6
Confident           1                 0.6      Yell                1                 0.6
Crowd               1                 0.6
Total               180

Table B2 Unique Terms Used by Searchers in the General Task

Term                No. of searchers  Percent  Term                No. of searchers  Percent
Terrorist           15                9.9      Dejong              1                 0.7
Bombing             9                 5.9      Demonstration       1                 0.7
Terrorism           8                 5.3      Explosion           1                 0.7
Bin Laden           6                 3.9      Faces               1                 0.7
Freedom             5                 3.3      Fidel Castro        1                 0.7
Suicide             5                 3.3      Fire                1                 0.7
Terror              5                 3.3      Funeral             1                 0.7
Attack              4                 2.6      Guard               1                 0.7
Fighter             4                 2.6      Hanging             1                 0.7
Iraq                4                 2.6      Insurgent           1                 0.7
Osama               4                 2.6      Ireland             1                 0.7
World Trade Center  4                 2.6      Islamic             1                 0.7
Al qaeda            3                 2.0      Jenin               1                 0.7
Bomb                3                 2.0      Khalid Mohammed     1                 0.7
Bomber              2                 1.3      Killed              1                 0.7
Bus                 2                 1.3      Lederhandler        1                 0.7
Hamas               2                 1.3      Market              1                 0.7
Hero                2                 1.3      Marty               1                 0.7
Israel              2                 1.3      Middle East         1                 0.7
Palestinian         2                 1.3      Mohammed            1                 0.7
Training            2                 1.3      Mourning            1                 0.7
Airplane            1                 0.7      Northern            1                 0.7
Airport             1                 0.7      Plane               1                 0.7
Alert               1                 0.7      Poster              1                 0.7
Anniversary         1                 0.7      Poverty             1                 0.7
Aatta               1                 0.7      Prisoner            1                 0.7
Blast               1                 0.7      Profile             1                 0.7
Body                1                 0.7      Pulitzer            1                 0.7
Bomb                1                 0.7      Saddam Hussein      1                 0.7
Camp                1                 0.7      Search              1                 0.7
Car                 1                 0.7      Sharon              1                 0.7
Carry               1                 0.7      Silhouette          1                 0.7
Celebrate           1                 0.7      Solider             1                 0.7
Chao                1                 0.7      Supporter           1                 0.7
Che Guevara         1                 0.7      Taliban             1                 0.7
Chechnya            1                 0.7      Train               1                 0.7
Civilian            1                 0.7      Unknown             1                 0.7
Combatant           1                 0.7      Victims             1                 0.7
Custody             1                 0.7      Wall Street         1                 0.7
Deface              1                 0.7      Yasser Arafat       1                 0.7
Total               152

Table B3 Unique Terms Used by Searchers in the Subjective Task

Term                No. of searchers  Percent  Term                No. of searchers  Percent
Peace               29                15.3     Freedom             1                 0.5
Shalom              6                 3.2      Girl                1                 0.5
Israeli             5                 2.6      Hadj                1                 0.5
Peaceful            5                 2.6      Interfaith          1                 0.5
Dove                4                 2.1      Iraq                1                 0.5
Hand                4                 2.1      Isreal              1                 0.5
Prayer              4                 2.1      Jewish              1                 0.5
Calm                3                 1.6      Lay                 1                 0.5
Inner               3                 1.6      Love                1                 0.5
Palestinian         3                 1.6      March               1                 0.5
Protest             3                 1.6      Market              1                 0.5
Vigil               3                 1.6      Meditate            1                 0.5
World               3                 1.6      Meditative          1                 0.5
Yoga                3                 1.6      Mideast             1                 0.5
Arafat              2                 1.1      Monk                1                 0.5
Clinton             2                 1.1      Mountain            1                 0.5
Compassion          2                 1.1      Muslim              1                 0.5
Conflict            2                 1.1      Nablus              1                 0.5
Cooperation         2                 1.1      Namaste             1                 0.5
Dalai Lama          2                 1.1      Pensive             1                 0.5
Hebrew              2                 1.1      People              1                 0.5
Holding             2                 1.1      Playing             1                 0.5
Israel              2                 1.1      Prare               1                 0.5
Mother Teresa       2                 1.1      Promot              1                 0.5
Non-violence        2                 1.1      Protester           1                 0.5
Palestine           2                 1.1      Quiet               1                 0.5
Pray                2                 1.1      Rally               1                 0.5
Resolution          2                 1.1      Religion            1                 0.5
Rifle               2                 1.1      Resort              1                 0.5
Treaty              2                 1.1      Sadat               1                 0.5
Area                1                 0.5      Scenic              1                 0.5
Beach               1                 0.5      September 11        1                 0.5
Bliss               1                 0.5      Serene              1                 0.5
Boy                 1                 0.5      Shake               1                 0.5
Boy                 1                 0.5      Shake               1                 0.5
Buddhish            1                 0.5      Shaking             1                 0.5
Bush                1                 0.5      Sign                1                 0.5
Camp David Accord   1                 0.5      Signing             1                 0.5
Celebrate           1                 0.5      Smiling             1                 0.5
Central Park        1                 0.5      Soldier             1                 0.5
Children            1                 0.5      Symbol              1                 0.5
Close               1                 0.5      Tank                1                 0.5
Day                 1                 0.5      Teen                1                 0.5
Delay               1                 0.5      Tranquility         1                 0.5
Demonstration       1                 0.5      Understanding       1                 0.5
Detainee            1                 0.5      Unity               1                 0.5
Devote              1                 0.5      Up                  1                 0.5
Down                1                 0.5      Vietnam             1                 0.5
Effort              1                 0.5      War                 1                 0.5
End                 1                 0.5      Weapons             1                 0.5
Enemy               1                 0.5      Women               1                 0.5
Flower              1                 0.5
Total               189
