
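This study takes Alzheimer's disease as its topic and answers the questions posed about a single test document. We first form a hypothesis for each question, use TF-IDF to assign term weights and determine the importance of each sentence, and then give each hypothesis a score and select the correct answer from among them. We also apply term expansion: we compute a vector for every word in the background knowledge base and for the question as a whole, use similarity to decide how strongly each word relates to the question, and add a fixed number of words to the original question to supplement its semantics. The best experimental result, an accuracy of 0.5, was obtained by combining answer validation with the use of vocabulary and the selection of important relevant sentences, together with Global Analysis for expanding the question terms. This shows that every method used in the experiment plays an important role.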
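The hypothesis scoring summarized above can be sketched roughly as follows. This is a minimal illustration, not the implementation used in this thesis. It assumes whitespace tokenization, treats each sentence of the test document as one document when computing IDF, forms a hypothesis by appending a candidate answer to the question, and scores the hypothesis by its best TF-IDF-weighted term overlap with any single sentence. The function names (`tokenize`, `idf_weights`, `score_hypothesis`, `pick_answer`) and the example sentences are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def idf_weights(sentences):
    """Inverse document frequency, treating each sentence as one document."""
    n = len(sentences)
    df = Counter()
    for sent in sentences:
        df.update(set(tokenize(sent)))
    # Assumed smoothing: log(N / df) + 1, so terms in every sentence keep weight 1.
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def score_hypothesis(hypothesis, sentences, idf):
    """Best TF-IDF-weighted overlap between the hypothesis and any sentence."""
    hyp_tf = Counter(tokenize(hypothesis))
    best = 0.0
    for sent in sentences:
        sent_terms = set(tokenize(sent))
        overlap = sum(
            tf * idf.get(term, 0.0)
            for term, tf in hyp_tf.items()
            if term in sent_terms
        )
        best = max(best, overlap)
    return best


def pick_answer(question, candidates, sentences):
    """Score one hypothesis per candidate and return the top candidate."""
    idf = idf_weights(sentences)
    scores = {
        cand: score_hypothesis(f"{question} {cand}", sentences, idf)
        for cand in candidates
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical, hand-written sentences for illustration only.
    doc = [
        "valproate enhances the expression of clu",
        "astrocytes secrete clusterin",
        "amyloid accumulates in the brain",
    ]
    answer, scores = pick_answer(
        "which inhibitor enhances the expression of clu",
        ["astrocytes", "valproate", "amyloid"],
        doc,
    )
    print(answer, scores)
```

Taking the maximum over sentences is only one possible aggregation; summing over the top-ranked relevant sentences would be an equally plausible reading of the approach.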

For future work, I believe the following approaches could be tried to improve on this thesis:

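(1) Consider syntactic structure. As the analysis of individual test questions in Chapter 4, Section 3, Part 2 shows, syntactic analysis is important and is comparatively lacking in our study. Adding syntactic judgment of sentences should be a major direction for the future development of this research.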

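(2) Strengthen the method for identifying important relevant sentences so that truly relevant sentences are obtained (e.g., use the proportion of relevant sentences obtained for each question to decide how many of the top-ranked sentences to select as important relevant evidence).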

(3) Add anaphora resolution. Correctly resolving the anaphora in the sentences of the test document would help each question obtain the correct relevant sentences.

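(4) Expand terms using a larger background knowledge base. In the Global Analysis method, the larger the background knowledge base, the more vector features each word has, and in theory the more clearly word senses can be distinguished.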

(5) [...] (e.g., use the test document in place of the Query words-Query vector computation ……).

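(6) Add OMIM domain terms in a more accurately related way, so that the relevant sentences found are more correct (e.g., change how the relations among OMIM Concepts are established, or use other gene-gene, protein-protein, or gene-protein relation databases for expansion ……).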

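(7) Consider the question type, so as to further determine which category of answer and relevant sentences corresponds to the question, and discard the parts unrelated to that question type.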

(8) Try different IR models (e.g., probabilistic models, BM25 ……).
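As one way to explore item (8), relevant sentences could be ranked with an Okapi BM25 scoring function instead of plain TF-IDF. The sketch below is an illustrative assumption rather than part of this thesis. It treats each sentence as a document, uses whitespace tokenization, and takes `k1 = 1.2` and `b = 0.75` as commonly used default values, not tuned ones. The IDF uses the smoothed form log((N - df + 0.5) / (df + 0.5) + 1), a common variant that stays non-negative. The function name `bm25_scores` is hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, sentences, k1=1.2, b=0.75):
    """Return one BM25 score per sentence for the given query.

    Assumes `sentences` is non-empty and contains at least one token overall.
    """
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n

    # Document frequency: number of sentences containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    query_terms = set(query.lower().split())
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer sentences are penalized via b.
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Hypothetical sentences for illustration only.
    sents = [
        "somatostatin regulates ide expression",
        "microglia degrade amyloid",
        "the brain contains astrocytes",
    ]
    print(bm25_scores("what regulates ide", sents))
```

Because BM25 saturates term frequency and normalizes by length, it may behave differently from TF-IDF on the long sentences typical of biomedical articles; whether it actually improves accuracy on this task would need to be tested.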

References

Ask Jeeves. Available from http://www.ask.com.

Attardi, Giuseppe, Atzori, Luca and Simi, Maria (2012). Index Expansion for Machine Reading and Question Answering. QA4MRE Pilot Task – Machine Reading of Biomedical Texts about Alzheimer's Disease at CLEF 2012.

Bhaskar, Pinaki, Pakray, Partha, Banerjee, Somnath, Banerjee, Samadrita, Bandyopadhyay, Sivaji and Gelbukh, Alexander (2012). Question Answering System for QA4MRE@CLEF 2012. Main Task of Question Answering for Machine Reading Evaluation at CLEF 2012.

Bhattacharya, Sanmitra and Toldo, Luca (2012). Question Answering for Alzheimer Disease Using Information Retrieval. QA4MRE Pilot Task – Machine Reading of Biomedical Texts about Alzheimer's Disease at CLEF 2012.

Cao, Ling, Qiu, Xipeng and Huang, Xuanjing (2011). Deep Question Answering for Single Document with Lexical Chains. Main Task of Question Answering for Machine Reading Evaluation at CLEF 2011.

CLEF2012. Available from http://clef2012.org/

Fellbaum, Christiane (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

GDep Parser. Available from http://people.ict.usc.edu/~sagae/parser/gdep/index.html

LA-PDFText. Available from http://code.google.com/p/lapdftext/

Miller, G.A. (1995). WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11:39-41.

Morante, Roser, Krallinger, Martin, Valencia, Alfonso and Daelemans, Walter (2012). Machine Reading of Biomedical Texts about Alzheimer's Disease. QA4MRE Pilot Task – Machine Reading of Biomedical Texts about Alzheimer's Disease at CLEF 2012.

Online Mendelian Inheritance in Man. Available from http://omim.org/

Pakray, Partha, Bhaskar, Pinaki, Banerjee, Somnath, Pal, BidhanChandra, Bandyopadhyay, Sivaji and Gelbukh, Alexander (2011). A Hybrid Question Answering System based on Information Retrieval and Answer Validation. Main Task of Question Answering for Machine Reading Evaluation at CLEF 2011.

Phan, Xuan-Hieu (2006). CRF Chunker: CRF English Phrase Chunker. PACLIC.

Porter, M.F. (1980). An algorithm for suffix stripping. In Program, 14(3), pp.130-137.

Porter's Stemmer. Available from http://tartarus.org/martin/PorterStemmer/

QA4MRE. Available from http://celct.fbk.eu/ResPubliQA/

Qiu, Yonggang and Frei, H.P. (1993). Concept Based Query Expansion. In Proceedings of ACM SIGIR International Conference on Research and Development in Information Retrieval, pp.160-169.

Ramakrishnan, C., Patnia, A., Hovy, E. and Burns, G. (2012). Layout-Aware Text Extraction from Full-text PDF of Scientific Articles. Source Code for Biology and Medicine 7(1): 7.

Sagae, K. and Tsujii, J. (2007). Dependency parsing and domain adaptation with LR models and parser ensembles. Proceedings of the CoNLL 2007 Shared Task. Joint Conferences on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL'07). Prague, Czech Republic.

Stop Word List. Available from http://www.lextek.com/manuals/onix/stopwords1.html

Wren, Jonathan D. (2011). Question answering systems in biology and medicine – the time is now. Bioinformatics, 27(14): 2025-2026.

Zhou, Guangyou, Cai, Li, Zhao, Jun and Liu, Kang (2011). Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp.653-662.

Appendix

Test questions for reading test set 1 (R1, test document ID 22506010):

1. Which technique was used to determine the cellular locations of the CLU1 and CLU2 gene products?

(a) intracellular and secreted (b) ER (c) intracellular localization (d) Golgi apparatus (e) immunofluorescence experiments

2. What compartments inside the cell contain clusterin proteins?

(a) ER and the Golgi apparatus (b) epitope tag (c) antibody (d) secretory pathway (e) secreted

3. Which of the two CLU isoforms is the main one expressed in the choroid plexus?

(a) fetal tissue (b) CLU1 (c) clusterin (d) CLU2 (e) cerebrospinal fluid

4. Which clusterin single nucleotide polymorphism has been linked to a reduction in the risk for developing Alzheimer's disease?

(a) CLU2 (b) SNPs (c) rs11136000T (d) clusterin (e) CLU

5. Which CLU protein isoforms in the brain have been characterized?

(a) CLU1 and CLU2 (b) clusterin (c) rs11136000 (d) secreted proteins (e) AD

6. Which CLU isoform has a consistently higher gene expression?

(a) CLU2 (b) ribosomal protein L13A (c) CLU1 (d) allele (e) PNGase

7. Which hormone can control the expression of CLU isoforms?

(a) real-time PCR (b) cDNA (c) AD (d) rs11136000 (e) androgen

8. What effect do androgens have on CLU2 gene expression?

(a) association (b) repression (c) inhibition (d) activation (e) expression

9. Which particular histone deacetylase inhibitor has been shown to enhance the expression of CLU?

(a) astrocytes (b) CLU2 (c) therapeutic agents (d) valproate (e) amyloid

10. How many residues does the CLU2 protein sequence have?

(a) 449

1. Which entity does allosterically regulate insulin degrading enzyme activity?

(a) AD (b) Ab (c) somatostatin (d) microglia cells (e) IDE

2. Which peptide hormone is the positive transcriptional regulator of IDE?

(a) Ab (b) somatostatin (c) BV-2 (d) AD (e) mRNA

3. In which cell line was the gene expression regulation of IDE characterized?

(a) mouse (b) astrocytes (c) microglia (d) BV-2 (e) beta-amyloid

4. Which method was applied to measure the quantity of IDE mRNA in the gene regulation experiments described in the paper?

(a) ELISA (b) siRNA (c) RealTime PCR (d) Western Blotting (e) IgG antibody

5. What regulates the production of neprilysin?

(a) somatostatin (b) NEP (c) enzyme (d) matrix metalloproteinase (e) microglia

6. What kind of glial cell is able to phagocyte b-amyloid?

(a) neprilysin (b) siRNA (c) brain (d) culture medium (e) microglia

7. What is the major protease produced by microglia responsible for degrading A?

(a) Ab (b) cells (c) IDE (d) extracellular (e) beta-amyloid

8. What substance exhibited a similar effect on IDE secretion as achieved by somatostatin?

(a) octreotide (b) analogue (c) endogenous modulator (d) substrate (e) beta-amyloid

9. What are the sst receptors that are expressed on rat astrocytes?

(a) SSTR-2, SSTR-3 and SSTR-4 (b) SSTR-1, SSTR-2 and SSTR-4 (c) somatostatin (d) microglia (e) rat

10. What method was used to inhibit the expression of IDE?

(a) Western blot (b) microglia (c) positive modulation (d) siRNA (e) culture medium

Test questions for reading test set 3 (R3, test document ID 22523685):

1. What cell type in AD brains shows mitochondrial defects?

(a) astrocytes (b) epithelial (c) fibroblasts (d) membrane (e) cytosol

2. In which anatomical structure in the brain does amyloid-beta aggregate?

(a) receptor (b) receptor (c) choroid plexus (d) fibroblasts (e) mitochondrial

3. How many persons worldwide are estimated to have a medical condition related to neurodegeneration?

(a) LPR2 (b) mitochondria (c) 60% (d) more than 10 million (e) 70 years old

4. Which protein is able to block nitric oxide creation?

(a) amyloid (b) gelsolin (c) NO (d) mitochondrial proteins (e) cytotoxicity

5. Which is the best-characterized factor that increases chances of getting AD?

(a) damage (b) Swedish (c) pathogenesis (d) age (e) stress

6. With which particular protein does amyloid-beta interact?

(a) extracellular domain (b) disease-related proteins (c) mitochondria (d) receptor (e) gelsolin

7. The aggregation of which peptide has been widely observed in brains of Alzheimer patients?

(a) AD (b) amyloid-beta (c) extracellular (d) mtDNA (e) secretase

8. What specific animal model can be used to study the role of amyloid-beta in apoptosis of choroid Plexus cells?

(a) patients with AD (b) animal AD models (c) brain (d) APP/Ps mice (e) mouse

9. Where does amyloid-beta assemble into oligomeric structures?

(a) fractions (b) monomeric amyloid (c) membranes (d) synaptic terminals (e) lipid

10. When does oxidative stress happen in AD patients?

(a) transgenic mouse (b) predominantly in synaptic mitochondria (c) choroid plexus (d) before amyloid-beta accumulation (e) postmortem

Test questions for reading test set 4 (R4, test document ID 22529981):

1. What effect can be observed when γ-secretase is blocked?

(a) APP-CTF accumulation (b) PSEN1 mutations (c) cell-based data (d) APH1 variants (e) transition-state analogue

2. When APH1 genes are overexpressed in MEF KO what happens with the Aβ?

(a) They are purified (b) They are anterior (c) They are shorter (d) They are longer (e) They are supported

3. In which gene are mutations associated to many cases of early-onset familial forms of Alzheimer's disease?

4. What experimental technique was used specifically to purify the γ-secretase complex?

(a) plasmids (b) affinity chromatography (c) lysate (d) cell lines (e) knockout experiments

5. What peptide is able to control the expression of the ApoE gene?

(a) AD (b) APP-CTFs (c) c-secretase (d) cholesterol (e) AICD

6. Which amino acid is critical for the activity of the PS1 protein?

(a) aspartate (b) C-terminal (c) 42-residue (d) 99 (e) DDAA

7. What experimental technique was used to determine the structure of γ-secretase?

(a) densitometry (b) EM (c) ELISA (d) immunostaining (e) purification

8. What candidate drug that blocks the γ-secretase is now tested in clinical trials?

(a) LRP1 (b) biochemical (c) PSEN1 (d) AD (e) Semagacestat

9. What mutation of the PS1 protein causes γ-secretase activity almost to disappear?

(a) P436Q (b) L166P (c) wild-type (d) AICD (e) C100-His

10. How many mutations relevant for familial forms of Alzheimer's disease have been detected for the PSEN1 gene?

(a) 13 (b) 42 (c) P436Q (d) 185 (e) PSEN2
