Chapter 3 The Leading Relationship between Conferences and Journals

3.2 Data Selection

The following paragraphs present the criteria for the data selection. The domain is first selected and then described by keywords. Suitable databases and search engines are then chosen. Finally, a suitable descriptor for searching topics within a paper is identified. Tables 3-1, 3-2 and 3-3 and Fig. 3-3 show the contents of the dataset.

3.2.1 Select the Domain

A domain was selected to set the focus of the research discussion. As mentioned previously, researchers in this area are keenly interested in new titles, since fresh issues signal the discovery of new fields of research. Innovative research titles represent problems that are yet to be discovered or solved, and generally induce animated debate in the discipline of computer science. Computer science was chosen as the research domain because it changes rapidly and because this study draws on many of its techniques. Additionally, this domain has an accepted common practice that “the issues discussed in conference papers usually lead the issues in journal papers.”

This principle does not hold in all domains, however, which is a further reason for selecting computer science as the main domain.

Data mining and information retrieval were selected as sub-domains within the discipline of computer science. Papers have been written on a great many topics in computer science, so sub-domains had to be chosen to narrow the field of study. Data mining, document processing and document classification are often associated with information retrieval. The two chosen sub-domains, data mining and information retrieval, are both strongly relevant to document processing.

3.2.2 Use the Keywords to Represent the Domain

Ten keywords commonly associated with the two sub-domains were selected as the basis for searching papers. The keywords are listed as follows.

1. natural language processing
2. machine learning
3. data mining
7. text retrieval
8. web search
9. document mining

These keywords were used to search for papers related to the selected sub-domains. Fig. 3-2 shows the relationships among the domain, sub-domains and keywords.

Fig. 3-2 The Relationship Between Domain, Sub-Domains and Keywords.

3.2.3 Choose Databases and Search Engines

The appropriate databases and search engines were then chosen to locate papers that meet the criteria. Databases were selected in two groups.

• Conference paper databases and search engines

The ACM Digital Library and IEEE Computer Society were selected as databases for searching conference papers. These two databases are widely adopted by researchers in computer science. Their academic communities include seminars, journals of various domains and collections of resources from other databases.

The ACM (Association for Computing Machinery) Digital Library, called ACM hereafter, is a digital textual database with the following features.

(1) 21 different ACM Journals & Magazines dated from the year 1991 to present.

(2) ACM Conference Proceedings dated from the year 1991 to present.

(3) ACM Bibliographic Citation dated from the year 1985 to present.

(4) Special Interest Group Newsletter.

ACM is also the largest and oldest educational and scientific computing society. It has offered a platform for exchanging information, innovation and discovery since 1947. Users of ACM include members of the computer science community such as professors, technicians and students in industry, academia and public services in over 100 countries.

[Fig. 3-2 diagram labels: Domain, Sub-Domain, Keywords; arrows labeled "Select" and "Represent"]

The IEEE Computer Society Digital Library, referred to as IEEE hereafter, is a database that is updated weekly, with a complete collection of 22 expert electronic journals dating from 1988. It also holds full-detail records of 1,200 conferences dating from 1995. Together, these resources contain over 100,000 academic articles related to computer information.

• Journal paper databases and search engines

Journal papers were searched from ScienceDirect Onsite and ProQuest Databases, as well as from ACM and IEEE, which publish both conference papers and journal papers. ScienceDirect Onsite, called SDOS hereafter, holds electronic journals published by Elsevier in the Taiwan Region Edition. These journals cover various subjects including technology, medicine, economics and business administration.

ProQuest Databases, hereafter called ProQuest, provides indices, abstracts and full text for journals, doctoral dissertations and news articles in all academic areas, ranging from accounting, business, psychology, education, biomedicine and theology to technology. The specific categories of journal papers in ProQuest are listed below.

1. ABI/INFORM Archive Complete
2. ABI/INFORM Dateline
3. ABI/INFORM GLOBAL
4. ABI/INFORM Trade & Industry
5. Academic Research Library
6. Accounting & Tax
7. Banking Information Source
8. EIU Views Wire
9. ERIC
10. Hoover's Company Records
11. ProQuest Asian Business and Reference
12. ProQuest Biology Journals
13. ProQuest Dissertations and Theses
14. ProQuest Education Journals
15. ProQuest European Business
16. ProQuest Newspapers
17. ProQuest Psychology Journals
18. ProQuest Religion
19. ProQuest Science Journals
20. ProQuest Telecommunication
21. U.S. National Newspaper Abstract

The journals explored in this experiment are not limited to the publications of ACM and IEEE. The assumption should hold even when the scope of the study is expanded to other journal databases. ScienceDirect Onsite and ProQuest Databases are two major databases for searching journal papers in computer science.

3.2.4 Pick the Descriptor of the Paper

The titles determine the search matches. We assume that researchers use words in their titles that are relevant to the content of their papers. In other words, the title of a paper is assumed to be strongly representative of its content. Consequently, if a paper’s title is irrelevant to its content, or contains words that do not fully portray the essence of the paper, the research findings would be jeopardized.
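As a minimal sketch of what title-based matching could look like, the following Python fragment checks whether a title contains any of the keywords as a whole phrase. It is an illustration under stated assumptions, not the search procedure of the databases used in this study: the function names (`normalize`, `matching_keywords`, `select_papers`) and the normalization rule are hypothetical, and the keyword list contains only the keywords legible in Section 3.2.2.

```python
import re

# Keywords legible in the list of Section 3.2.2 (the list there is incomplete).
KEYWORDS = [
    "natural language processing",
    "machine learning",
    "data mining",
    "text retrieval",
    "web search",
    "document mining",
]


def normalize(text):
    """Lower-case the text and join its alphanumeric tokens with single spaces.

    Assumption for illustration: hyphens and punctuation are treated as spaces,
    so "Data-Mining" and "data mining" compare equal.
    """
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def matching_keywords(title, keywords=KEYWORDS):
    """Return the keywords that occur as whole-word phrases in the title."""
    padded = f" {normalize(title)} "
    return [kw for kw in keywords if f" {normalize(kw)} " in padded]


def select_papers(titles, keywords=KEYWORDS):
    """Keep only the titles that match at least one keyword."""
    return [t for t in titles if matching_keywords(t, keywords)]
```

A title such as "A Survey of Data-Mining Techniques for Web Search" would match both "data mining" and "web search" under this rule, whereas a title that uses none of the keyword phrases would be dropped. This also makes the stated risk concrete: a paper whose title does not reflect its content cannot be found this way.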

We believe that titles are the benchmarks for illustrating the new trends in research papers. The standard structure of a research paper expresses its content through a few important parts of the paper. Four descriptors, namely the title, the abstract, the keywords and the full text, were compared on four criteria: the number of words used, the completeness with which the paper's topic is covered, the precision with which the vocabulary depicts the concepts, and the presence of concealed concepts representing new trends. The four descriptors are compared as follows.

• Title: Researchers treat the title of a paper as the condensed description of the entire text. They have to capture the essence of the paper concisely within a limited number of words. Words are sometimes coined by the researchers themselves, so new trends may be concealed within titles.

• Abstract: Researchers and committee members reviewing a paper use its abstract to grasp the content roughly within a short period of time. Abstracts can illustrate the content of research papers far more explicitly than titles, but usually contain ten times as many words. Consequently, the share of the paper's content that each word delivers is diluted by the number of words used.

• Keywords: Keywords have the highest density of knowledge (the content delivered per word used), but cannot describe a new trend. Researchers identify the keywords after finishing the abstract of the paper, and the paper can then be searched by its keywords. In other words, keywords are the expressions most widely adopted by researchers for a particular concept within the same domain. Researchers in a domain need a long time and much effort to reach the consensus that allows a concept to be translated into a keyword. Keywords are therefore understood to achieve a high density of knowledge, filtered and crystallized through various researchers into a single accumulated consensus on a concept, which enables them to express the paper far more precisely than titles. However, keywords can rarely identify new trends, because they relate to well-known concepts that have already reached consensus.

Therefore, this study concludes that, for the purpose of identifying new trends, keywords do not describe the content of a paper as well as its title does.

• The Full Text: The full text includes every concept the researcher used concerning the subject, yet each word embodies very little substance. The full text obviously covers the content completely, but it is a compilation of an immense load of information that far exceeds that of titles, abstracts and keywords. The degree to which each phrase can express the concepts of the paper is therefore small, and using the full text to describe the content of a paper would waste resources and time.

We therefore conclude that only phrases with a high density of knowledge should be employed to represent the full content of a paper. The titles, abstracts and keywords of research papers all have this quality. In particular, titles and keywords are composed of short strings of words that express the concepts discussed in the research paper. This study aims to discover whether conference papers represent the new trend for academic papers, and therefore must identify concepts that represent these new trends. Keywords cannot fulfill this requirement, as explained previously. Therefore, the titles of research papers are adopted as descriptors to express the full content of a paper.
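
The notion of density of knowledge used in this section can be written as a simple ratio. The symbols below are introduced here purely for illustration and are not part of the original analysis: $c(d)$ stands for the content a descriptor $d$ delivers, $n(d)$ for its number of words, and $\rho(d)$ for its density. The factor of ten is the rough figure quoted earlier for abstracts relative to titles.

```latex
\rho(d) = \frac{c(d)}{n(d)}, \qquad
\frac{\rho(\text{abstract})}{\rho(\text{title})}
  = \frac{c(\text{abstract})}{c(\text{title})} \cdot
    \frac{n(\text{title})}{n(\text{abstract})}
  \approx \frac{1}{10} \cdot \frac{c(\text{abstract})}{c(\text{title})}
```

Under this reading, an abstract would be denser than a title only if it conveyed more than ten times as much content, which is one way of stating the dilution argument made for abstracts and, more strongly, for the full text.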