
(1)

Knowledge Organization Tools (4)

Thesaurus (索引典)

藍文欽

Lanw@ccms.ntu.edu.tw

05/15/2003

(2)

Prelude

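A thesaurus is one of the tools of vocabulary control.

A thesaurus is the authority list for indexing terms and search vocabulary.

A thesaurus lets you start from a known concept and find the appropriate term that represents it. [concept → term]

By selecting standardized terms, a thesaurus brings material on the same concept together (grouping).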

(3)

Introduction

• “Thesaurus” originally meant treasury or collection → the word is usually applied to dictionaries of synonyms.

“A book of words and their synonyms” (Merriam-Webster’s Dictionary)

“A book of words that are put in groups together according to connections between their meanings rather than in an alphabetical list.” (Longman Dictionary of Contemporary English)

e.g., Roget’s Thesaurus of English Words and Phrases

• 1957: H. P. Luhn was the first to use “thesaurus” for a dictionary of subject indexing terms (索引典), treating it as a tool for vocabulary control. (Another account holds that Brownson formally introduced the term in 1957.)

(4)

Definition

“The vocabulary of a controlled indexing language, formally organized so that the a priori relationships between concepts (for example as “broader” and “narrower”) are made explicit.”

(Source: Guidelines for the establishment and development of

(5)

Definition (cont.)

“A thesaurus may be defined either in terms of its function or its structure. In terms of function, a thesaurus is a terminological control device used in translating from the natural language of documents, indexers or users into a more constrained “system language” (documentation language, information language). In terms of structure, a thesaurus is a controlled and dynamic vocabulary of semantically and generically related terms which covers a specific domain of knowledge.”

(Source: Unesco. The UNISIST Guidelines for the Establishment and

(6)

Definition (cont.)

• “A compilation of words and phrases showing synonymous, hierarchical, and other relationships and dependencies, the function of which is to provide a standardized vocabulary for information storage and retrieval.”

• “A controlled vocabulary arranged in a known order in which equivalence, homographic, hierarchical, and associative relationships among terms are clearly displayed and identified by standardized relationship indicators, which must be employed reciprocally.”

(Source: Guidelines for the Construction, Format, and Management of Monolingual Thesauri, ANSI/NISO Z39.19-1993)

(7)

Definition (cont.)

A thesaurus in the field of information storage and retrieval is a list of terms and/or of other signs (or symbols) indicating relationships among these elements, provided that the following criteria hold:

(a) the list contains a significant proportion of non-preferred terms and/or of preferred terms not used as descriptors;

(b) terminological control is intended.

(Source: Dagobert Soergel. Indexing Languages and Thesauri: Construction and Maintenance. Los Angeles: Melville,

(8)

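Definition (cont.)

“In the context of information storage and retrieval, a thesaurus collects the words or phrases that can express concepts of knowledge and arranges them in a specific structure. This vocabulary controls synonyms, distinguishes homographs, and shows the hierarchical and semantic relationships among related terms, so that indexers analyzing and processing materials and readers searching for them can choose consistent, controlled terms. In other words, it provides standardized terminology for information storage and retrieval.”

(Source: 蔡明月, 線上資訊檢索:理論與應用 [Online Information Retrieval: Theory and Application]. Taipei: 台灣學生, 1991, p. 177.)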

(9)

Brief History

• 1959 – the Engineering Information Center of E. I. du Pont de Nemours developed the first true thesaurus

• 1960 – the Armed Services Technical Information Agency (ASTIA) produced the Thesaurus of ASTIA Descriptors

• 1961 – the American Institute of Chemical Engineers (AIChE) published the Chemical Engineering Thesaurus

• 1964 – the Engineers Joint Council (EJC) published the Thesaurus of Engineering Terms

• 1967 – Thesaurus of Engineering and Scientific Terms

(10)

Brief History (cont.)

• 1967 – the Committee on Scientific and Technical Information (COSATI) published the first set of guidelines for thesaurus construction

• 1970 – Unesco Guidelines for the Establishment and Development of Monolingual Scientific and Technical Thesaurus

• 1974 – ANSI (American National Standards Institute) Z39.19 [a US national standard for thesaurus construction]

• 1974 – the first international standard for thesaurus c

(11)

Purposes and Use of Thesauri

• “Its purposes are to promote consistency in the indexing of documents, predominantly for postcoordinated information storage and retrieval systems, and to facilitate searching by linking entry terms with descriptors” (ANSI Z39.19-1993, p. 38)

• Four principal purposes are served by a thesaurus:

a) Translation. To provide a means for translating the natural language of authors, indexers, and users into a controlled vocabulary used for indexing and retrieval.

b) Consistency. To promote consistency in the assignment of index terms.

c) Indication of Relationships. To indicate semantic relationships among terms.

d) Retrieval. To serve as a searching aid in retrieval of documents.

(12)

Vocabulary Control

The need to control the formation and use of terms stems mainly from two basic features of natural language:

Synonyms – different terms representing the same concept

Polysemes – a word with multiple meanings [in spoken language, polysemes are homonyms; in written language, they are homographs – terms with the same spelling representing different concepts. Only the latter is relevant to thesauri.]

(13)

Vocabulary Control (cont.)

Vocabulary control in a thesaurus is achieved through three principal means:

a) the delineation of the scope, or meaning, of descriptors → Scope Note (SN)

b) the linking of synonymous and nearly (quasi) synonymous terms through equivalence relationship → USE and UF

c) the disambiguation of homographs → Qualifier

(Source: ANSI Z39.19-1993, p. 1)
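The three means above can be pictured with a small data structure. The following Python sketch is only an illustration under stated assumptions: the `Descriptor` class, the `lookup` function, and the "Mercury" sample terms are hypothetical and do not come from the standard or from any real thesaurus.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    label: str                      # preferred term, with qualifier if needed
    scope_note: str = ""            # SN: delineates the descriptor's meaning
    used_for: list = field(default_factory=list)  # UF: lead-in terms


# Hypothetical sample vocabulary, for illustration only.
VOCABULARY = [
    Descriptor("Mercury (metal)", "The chemical element Hg.", ["Quicksilver"]),
    Descriptor("Mercury (planet)", "The planet nearest the Sun."),
]


def base_form(label):
    """Strip a trailing parenthetical qualifier: 'Mercury (metal)' -> 'mercury'."""
    return label.split(" (")[0].lower()


def lookup(term, vocabulary=VOCABULARY):
    """Return the descriptor labels a searcher's term leads to."""
    key = term.lower()
    # (b) Equivalence: a lead-in term points to its descriptor (USE).
    for d in vocabulary:
        if key in (t.lower() for t in d.used_for):
            return [d.label]
    # (c) Homographs: an unqualified word may match several qualified descriptors.
    return [d.label for d in vocabulary
            if key == d.label.lower() or key == base_form(d.label)]
```

With this sample data, a lead-in term resolves to one descriptor, while the bare word "Mercury" returns both qualified descriptors, so the searcher has to choose between them, ideally by reading the scope notes.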

(14)

Structure and Relationships

• An intrinsic feature of a thesaurus is its ability to distinguish and display the structural relationships between the terms it contains.

• There are two broad types of relationships within a thesaurus:

  - Micro level – the semantic links between individual terms

  - Macro level – how the terms and their inter-relationships relate to the overall structure of the subject field

(Source: J. Aitchison, A. Gilchrist, & D. Bawden. Thesaurus Construction and Use: A Practical Manual. 3rd ed. London: Aslib, 1997. P. 4

(15)

Basic Thesaural Relationships

Three basic inter-term relationships:

Equivalence: the relationship between preferred and non-preferred terms where two or more terms are regarded, for indexing purposes, as referring to the same concept

Hierarchical: this relationship shows levels of superordination and subordination. The superordinate term represents a class or whole, and the subordinate terms refer to its members or parts

Associative: the relationship is found between terms which are closely related conceptually but not hierarchically and are not members of an equivalence set.

(The descriptions of the relationships on this and the following slides draw mainly on Aitchison, Gilchrist, & Bawden, 1997, Section F.)

(16)

Equivalence Relationships

Descriptors – Preferred terms

Lead-in terms (Entry terms) – Non-preferred terms

Lead-in term
  USE DESCRIPTOR

DESCRIPTOR
  UF Lead-in term

Example:

耗子 (a colloquial word for "rat") USE 老鼠 ("rat", the preferred term)
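Because USE and UF are two directions of the same link, one can be derived from the other and the pair can be checked for consistency. A minimal Python sketch, assuming the references are stored as plain dictionaries; the function names are hypothetical, and the "Footpaths" pair is invented for illustration (the 耗子/老鼠 pair is the example above, and the slides mention pavements/sidewalks as synonyms without saying which is preferred):

```python
from collections import defaultdict

# Lead-in term -> descriptor (USE references).
USE = {"耗子": "老鼠", "Sidewalks": "Pavements", "Footpaths": "Pavements"}


def derive_uf(use):
    """Build the reciprocal UF lists (descriptor -> sorted lead-in terms)."""
    uf = defaultdict(list)
    for lead_in, descriptor in use.items():
        uf[descriptor].append(lead_in)
    return {descriptor: sorted(terms) for descriptor, terms in uf.items()}


def reciprocity_problems(use, uf):
    """List every USE reference without a matching UF, and vice versa."""
    problems = []
    for lead_in, descriptor in use.items():
        if lead_in not in uf.get(descriptor, []):
            problems.append(f"{descriptor} lacks UF {lead_in}")
    for descriptor, terms in uf.items():
        for lead_in in terms:
            if use.get(lead_in) != descriptor:
                problems.append(f"{lead_in} lacks USE {descriptor}")
    return problems
```

Deriving one side from the other guarantees reciprocity by construction; the checker is only needed when both sides are edited by hand.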

(17)

Equivalence Relationships (cont.)

Synonyms – terms are virtually interchangeable or regarded as the same

• Popular names and scientific names

• Common nouns or scientific names, and trade names

• Standard names and slang

• Terms originating from different cultures sharing a common language (e.g., pavements/sidewalks)

• Competing names for emerging concepts (e.g., the various Chinese translations of “metadata”)

• Current or favored term versus outdated or deprecated t

(18)

Equivalence Relationships (cont.)

Lexical variants – different word forms for the same expression, such as spelling, grammatical variation, irregular plurals, direct versus indirect order, and abbreviated formats

• Variant spellings
  e.g., moslems/muslims; mouse/mice; colour/color

• Direct and indirect form
  e.g., academic library vs. library, academic

• Abbreviations and full names

(19)

Equivalence Relationships (cont.)

Quasi-synonyms, or near-synonyms – terms whose meanings are generally regarded as different in ordinary usage but which are treated as synonyms for indexing purposes.

• Terms having a significant overlap
  e.g., urban areas/cities
  gifted people/geniuses

• Antonyms or terms representing different viewpoints of the same property continuum
  e.g., dryness/wetness

(20)

Equivalence Relationships (cont.)

Upward posting (generic posting) – This is a technique which treats narrower terms as if they are equivalent to, rather than a species of, their broader terms. The effect is to reduce the size of the vocabulary.

SOCIAL CLASS
UF Elite
   Middle class
   Working class
   ……

Elite
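Upward posting can be sketched as a lookup that replaces a narrower term with its broader descriptor at indexing time. This Python sketch rests on an assumption: the slide's entry is cut off after "Elite", so the mapping of each narrower term to "Social class" is inferred from the UF list, and the function name and the "Education" term are hypothetical.

```python
# Narrower term -> broader descriptor it is posted up to (assumed from the UF list).
UPWARD_POSTING = {
    "Elite": "Social class",
    "Middle class": "Social class",
    "Working class": "Social class",
}


def post_upward(index_terms, posting=UPWARD_POSTING):
    """Replace posted-up terms by their descriptor, dropping duplicates in order."""
    result = []
    for term in index_terms:
        descriptor = posting.get(term, term)
        if descriptor not in result:
            result.append(descriptor)
    return result
```

A document indexed with "Elite" and "Working class" ends up under the single descriptor "Social class", which is how the technique shrinks the vocabulary at the cost of specificity.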

(21)

Hierarchical Relationships

The relationship is reciprocal and is set out in a thesaurus using the following conventions:

BT (Broader Term)
NT (Narrower Term)

e.g.,

Public Libraries
BT Libraries

Libraries
NT Academic Libraries
   Children’s Libraries
   Public Libraries
   ……
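One practical use of the BT/NT structure is expanding a search on a broad descriptor to everything beneath it. The Python sketch below assumes BT references are stored as a dictionary of lists; the function names and the "University Libraries" level are hypothetical additions used to show a second level of the hierarchy.

```python
from collections import defaultdict

# Term -> its broader terms (BT). The first three come from the example above.
BT = {
    "Academic Libraries": ["Libraries"],
    "Children's Libraries": ["Libraries"],
    "Public Libraries": ["Libraries"],
    "University Libraries": ["Academic Libraries"],  # hypothetical extra level
}


def derive_nt(bt):
    """Build the reciprocal NT lists (term -> sorted narrower terms)."""
    nt = defaultdict(list)
    for term, broader_terms in bt.items():
        for broader in broader_terms:
            nt[broader].append(term)
    return {term: sorted(narrower) for term, narrower in nt.items()}


def all_narrower(term, nt):
    """Collect every term below `term`, depth-first, each listed once."""
    seen = []
    stack = list(reversed(nt.get(term, [])))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(reversed(nt.get(current, [])))
    return seen
```

A search on "Libraries" could then be run against the descriptor plus `all_narrower("Libraries", derive_nt(BT))`, so that documents indexed only under a narrower term are still retrieved.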

(22)

Hierarchical Relationships (cont.)

Generic/species relationship – identifies the link between a class or category and its members or species (e.g., Bird / Robin)

Whole/part relationship

• Systems and organs of the body (e.g., digestive system / stomach)

• Geographical location (e.g., Taipei / Ta-an District)

• Discipline or field of study (e.g., Chemistry / Organic chemistry)

• Hierarchical social structure (e.g., army and its rank

(23)

Hierarchical Relationships (cont.)

Instance relationship – the link between a general category of things and events, expressed by a common noun, and an individual instance of that category, the instance then forming a class of one which is represented by a proper name (e.g., SEAS / Pacific Ocean)

Polyhierarchical relationships – when a term has two or more superordinate terms, the relationship between the term and those superordinate terms is said to be polyhierarchical.

NURSES
NT Nurse administrators

HEALTH ADMINISTRATORS
NT Nurse administrators

NURSE ADMINISTRATORS
BT Health administrators
   Nurses
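In data terms, a polyhierarchy simply means that a term may list more than one BT. The Python sketch below assumes BT references are kept as a dictionary of lists; the function names and the two upper-level terms ("Administrators", "Health personnel") are hypothetical and only serve to show that every broader path is followed.

```python
# Term -> its broader terms (BT). Only the first entry comes from the example above.
BT = {
    "Nurse administrators": ["Health administrators", "Nurses"],
    "Health administrators": ["Administrators"],   # hypothetical
    "Nurses": ["Health personnel"],                # hypothetical
}


def polyhierarchical_terms(bt):
    """Terms that sit under two or more broader terms."""
    return sorted(term for term, broader in bt.items() if len(broader) > 1)


def all_broader(term, bt):
    """Every term reachable by following BT references along all paths."""
    found = set()
    stack = list(bt.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(bt.get(current, []))
    return found
```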

(24)

Associative Relationships

The relation is reciprocal, and is distinguished by the abbreviation “RT” (Related Term)

e.g.,

TEACHING
RT Teaching aids

TEACHING AIDS
RT Teaching

(25)

Associative Relationships (cont.)

Two types of associative relationship:

• Terms belonging to the same category (e.g., motorcycle / bicycle)

• Terms belonging to different categories

  - Whole-part (e.g., buildings / doors)

  - A discipline and the objects studied (e.g., ethnography / primitive societies)

  - An operation or process and the agent or instrument (e.g., motor racing / racing cars)

  - An occupation and the person in that occupation (e.g., accountancy / accountants)

  - An action and the product of the action (e.g., publishing /

(26)

Associative Relationships (cont.)

• Terms belonging to different categories (cont.)

  - An action and its patient (e.g., data analysis / data)

  - Concepts related to their properties (e.g., women / femininity)

  - Concepts linked by causal dependence (e.g., injury / accidents)

  - A thing or action and its counter-agent (e.g., pests / pesticides)

  - A raw material and its product (e.g., leather / leather garments)

  - An action and a property associated with it (e.g., precision measurement / accuracy)

  - A concept and its opposite (e.g., single people / married peop

(27)

A Sample Thesaurus Entry – from Thesaurus of ERIC Descriptors

COMPETENCY BASED EDUCATION
Mar. 1980   CIJE: 884   RIE: 2881   GC: 330
SN Educational system that emphasizes the specification, learning, and demonstrating of those competencies (knowledge, skills, behaviors) that are of central importance to a given task, activity, or career.
UF Consequence Based Education
   Criterion Referenced Education
   Output Oriented Education
NT Competency Based Teacher Education
BT Education
RT Academic Standards
   Accountability
   Back to Basics
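An entry of this kind maps naturally onto a record with one field per relationship indicator. The Python sketch below is a hypothetical representation, not the format ERIC itself uses; the `Entry` class and `render` function are assumptions, and only the part of the sample entry shown above is encoded.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    descriptor: str
    scope_note: str = ""
    uf: list = field(default_factory=list)   # lead-in terms
    nt: list = field(default_factory=list)   # narrower terms
    bt: list = field(default_factory=list)   # broader terms
    rt: list = field(default_factory=list)   # related terms


def render(entry):
    """Lay the entry out in the conventional alphabetical-display form."""
    lines = [entry.descriptor.upper()]
    if entry.scope_note:
        lines.append(f"SN  {entry.scope_note}")
    for tag, terms in (("UF", entry.uf), ("NT", entry.nt),
                       ("BT", entry.bt), ("RT", entry.rt)):
        for i, term in enumerate(terms):
            prefix = tag if i == 0 else "  "   # print the tag once per group
            lines.append(f"{prefix}  {term}")
    return "\n".join(lines)


sample = Entry(
    descriptor="Competency Based Education",
    uf=["Consequence Based Education", "Criterion Referenced Education",
        "Output Oriented Education"],
    nt=["Competency Based Teacher Education"],
    bt=["Education"],
    rt=["Academic Standards", "Accountability", "Back to Basics"],
)
```

Calling `print(render(sample))` reproduces the layout of the entry, with each indicator printed once and its remaining terms indented beneath it.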

(28)

Display

Alphabetical

Classified

Hierarchical

Permuted Keyword Index

Graphical
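Of the display types listed, the permuted keyword index is the easiest to generate mechanically: every significant word of a multi-word descriptor becomes an access point. A minimal Python sketch, assuming a simple whitespace split and a small hypothetical stopword list; the function name and sample descriptors are illustrative only.

```python
STOPWORDS = frozenset({"of", "and", "the", "for", "in"})


def permuted_index(descriptors, stopwords=STOPWORDS):
    """Return sorted (keyword, descriptor) pairs, one per significant word."""
    rows = []
    for descriptor in descriptors:
        for word in descriptor.split():
            if word.lower() not in stopwords:
                rows.append((word.lower(), descriptor))
    return sorted(rows)
```

For the descriptors "Academic Libraries", "Public Libraries" and "Library Education", the index lists both library descriptors under the keyword "libraries", so a user who looks up the second word of a term still finds it.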

(29)

Planning and Design of Thesauri – Two Check Points

Is a thesaurus necessary?

If it is, which of the following would be the more suitable approach?

Buying

Compiling

Adapting

A very useful Web site for information about thesaurus construction and use, prepared by Willpower Information: http://www.willpower.demon.co.uk/thesbibl.htm

(30)

Planning and Design of Thesauri

Information System Considerations

Subject field

Type of literature/data

Quantity of literature/data

Language considerations

System users

Questions, searchers, profiles

Resources available

(31)

How to Build a Thesaurus – The Top-Down Method

Convene a group of subject experts to decide on the scope and broad categories of terms to be included.

Use existing dictionaries and thesauri to decide on the terms and their relationships.

Review and organize the preliminary term set: decide on preferred terms and make USE references from the variants and synonyms; and build hierarchical and associative relationships among the preferred terms.

Produce a draft thesaurus, test index, and revise.

(32)

How to Build a Thesaurus – The Bottom-up Method

• Develop a group of subject experts to serve as advisors; work with them to determine the scope if it is not already set.

• If there is a set of representative already-indexed documents, use the index terms from this set as your preliminary term list.

• If not, index a set of representative documents using free language (i.e., no vocabulary control), and take this term set as your preliminary list.

• Build your thesaurus by reviewing and organizing these terms, using a variety of resources as aids, as in the top-down method.

• Refer to your subject experts on terms whose meaning or usage is unclear, and for advice on which variant or synonym to prefer (or on whether two terms really are synonyms in the field).

• Produce a draft thesaurus, test index, and revise.
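The term-gathering step of the bottom-up method can be approximated by counting how many documents use each free-language index term. The Python sketch below is one possible way to build the preliminary list, not a prescribed procedure; the function name, the `min_count` threshold, and the sample data are all assumptions.

```python
from collections import Counter


def candidate_terms(indexed_docs, min_count=2):
    """Rank free-language index terms by the number of documents using them."""
    counts = Counter()
    for terms in indexed_docs:
        # Normalize case and spacing; count each term once per document.
        counts.update({term.strip().lower() for term in terms})
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(term, n) for term, n in ranked if n >= min_count]


# Hypothetical free-language indexing of three documents.
docs = [
    ["Thesauri", "Indexing"],
    ["thesauri", "Abstracts"],
    ["Indexing", "Thesaurus"],
]
```

Here `candidate_terms(docs)` keeps "indexing" and "thesauri" and drops the terms used only once. Note that "thesaurus" and "thesauri" are counted separately: deciding that they are lexical variants of one descriptor is exactly the review step that follows, with the subject experts' help.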

(33)

Procedures Involved in Thesaurus Construction

• Collecting terms

• Modifying and inventing terms

• Choosing preferred terms and standardizing the form of words

• Establishing semantic relationships

• Thesaurus arrangement and display

• Testing and revising

• Thesaurus maintenance

The American Society of Indexers provides a list of thesaurus management software: http://www.asindexing.org/site/thessoft.shtml

(34)

Standard

The UNISIST Guidelines for the Establishment and Development of Monolingual Thesauri. 2nd rev. ed. (Paris: UNESCO, 1981)

Guidelines for the establishment and development of monolingual thesauri, ISO 2788:1986
(http://www.nlc-bnc.ca/iso/tc46sc9/standard/2788e.htm)

Guidelines for the establishment and development of multilingual thesauri, ISO 5964:1985

(35)

Standard (cont.)

Guidelines for the Construction, Format, and Management of Monolingual Thesauri, ANSI/NISO Z39.19-1993 (R1998)
(http://www.niso.org/standards/resources/Z39-19.html)

Guidelines for Forming Language Equivalents: A Model Based on the Art & Architecture Thesaurus, prepared by International Terminology Working Group, 1999
(http://www.chin.gc.ca/Resources/Publications/Guidelines/English/index.html)

(36)

Examples

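Agricultural Science and Technology Thesaurus (農業科技索引典)

Water Resources Thesaurus (水資源索引典)

Legislative Information System Subject Thesaurus (立法資訊系統主題索引典)
(http://lis.ly.gov.tw/lghtml/crshelp/search.htm)

Food Science and Technology Thesaurus (食品科技索引典)

Science and Technology Thesaurus (科技索引典)

Chinese Education Term Bank (中文教育類詞庫)
(http://140.122.127.251/ttscgi/ttsweb1?@0:0:1:ericthe::http|//140.122.127.251/edd/edd.htm@@0.57560553)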

(37)

Examples (cont.)

Unesco Thesaurus: A Structured List of Descriptors for Indexing and Retrieving Literature in the Fields of Education, Science, Social and Human Science, Culture, Communication and Information.

The Unesco: IBE Education Thesaurus

Thesaurus of ERIC Descriptors

Thesaurus of Sociological Research Terminology

(38)

Examples (cont.)

Art & Architecture Thesaurus
(http://www.getty.edu/research/tools/vocabulary/aat/index.html)

Thesaurus for Graphic Materials I: Subject Terms (TGM I)
(http://www.loc.gov/rr/print/tgm1/)

Thesaurus for Graphic Materials II: Genre and Physical Characteristic Terms (TGM II)
(http://www.loc.gov/rr/print/tgm2/)

British Museum Materials Thesaurus
(http://www.mda.org.uk/bmmat/matintro.htm)

Vocabulary of Basic Terms for Cataloguing Costum

(39)

Examples (cont.)

British Museum Object Names Thesaurus

Union List of Artist Names (ULAN)
(http://www.getty.edu/research/tools/vocabulary/ulan/index.html)

Thesaurus of Geographic Names (TGN)
(http://www.getty.edu/research/tools/vocabulary/tgn/)

Thesaurus of Monument Types

mda Archaeological Objects Thesaurus

Building Materials Thesaurus

(40)

Examples (cont.)

Macrothesaurus for Information Processing in the Field of Economic and Social Development

Social Science and Business Microthesaurus: A Hierarchical List of Indexing Terms Used by NTIS

Political Science Thesaurus

SPINES Thesaurus: A Controlled and Structured Vocabulary of Science and Technology for Policy Making

(41)

Examples (cont.)

Thesaurus of Engineering and Scientific Terms (TEST)

INSPEC Thesaurus

NASA Thesaurus

Thesaurus of Computing Terms

Thesaurus of Scientific, Technical and Engineering Terms

International Road Research Documentation (IRRD) Thesaurus

(42)

Examples (cont.)

ASIS Thesaurus of Information Science and Librarianship

Thesaurus of Information Science Terminology

Zoological Record Online Thesaurus

Food: Multilingual Thesaurus

Thesaurus of Agricultural Terms

Medical Subject Headings (MeSH)

The ISDD Thesaurus. Keywords Relating to
