
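This thesis takes Alzheimer's disease as its subject: after reading a document, the question answering system answers the questions posed about that test document. The study first proposes a summarization method that summarizes the related articles, extracts their important information, and uses the result as a new dataset for the question answering system. It also proposes an information distance method that computes information-distance weights between the question and each candidate answer and combines them with TF-IDF weighting and term expansion to select the correct answer. The study thus explores applying a summarization system and the information distance method to a biomedical question answering system. The experimental results show that, because the documents in the background knowledge base are only weakly related to the topics of the questions in the corresponding test set, most of the information in those documents is unimportant, so summarizing the background knowledge base effectively extracts the important information. For the information distance method, increasing the number of Question Focus terms effectively improves accuracy.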
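As an illustration of the TF-IDF weighting mentioned above, the following is a minimal sketch, not the implementation used in this thesis. It assumes sentences are already tokenized and stemmed, uses raw term frequency with a logarithmic IDF, and ranks sentences against a question by cosine similarity. The function names `idf_table`, `tfidf`, `cosine`, and `rank_sentences` are hypothetical.

```python
import math
from collections import Counter


def idf_table(sentences):
    """Map each term to log(N / df) over the given tokenized sentences."""
    n = len(sentences)
    df = Counter(term for sent in sentences for term in set(sent))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(tokens, idf):
    """TF-IDF vector as a dict; terms unseen in the collection are ignored."""
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_sentences(question, sentences):
    """Return (score, index) pairs, best-matching sentence first."""
    idf = idf_table(sentences)
    q_vec = tfidf(question, idf)
    scored = [(cosine(q_vec, tfidf(s, idf)), i) for i, s in enumerate(sentences)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Toy example with stemmed tokens (illustrative data only).
sentences = [
    ["androgen", "regul", "clu", "express"],
    ["clusterin", "is", "secret"],
    ["clu", "isoform", "brain"],
]
ranked = rank_sentences(["androgen", "clu"], sentences)
```

In this toy example the rare term "androgen" receives a higher weight than the more common "clu", so the first sentence ranks above the third, and the sentence sharing no term with the question scores zero.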

For future work, I believe the following approaches are worth trying in further experiments and may bring improvements:

(1) Add anaphora resolution: correctly resolving the anaphora in the sentences of the test document would help a question retrieve the correct related sentences.

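(2) Take the question type into account, in order to determine which type of question the answers and related sentences should be matched against, and to discard the parts unrelated to that question type.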

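(3) Within the summarization method for the background knowledge base, generate expansion terms that correspond to the summarized background knowledge base.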

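(4) Combine the two methods of this thesis, so that the important sentences extracted by the summarization system make the information-distance weight calculation more accurate.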

(5) Expand terms with a larger background knowledge base or with different term expansion methods.

(6) Try different methods for reducing English words to their base forms, such as GDep parser.

(7) Try a different Summarization System for summarizing the articles, such as MEAD (see footnote 12).

(8) Try different Information Retrieval Models, such as probabilistic models and BM25 (Robertson and Zaragoza, 2009).
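To make item (8) concrete, the following is a minimal sketch of BM25 scoring, not a method evaluated in this thesis. It assumes tokenized documents, the conventional default parameters k1 = 1.2 and b = 0.75, and an IDF variant with 1 added inside the logarithm so that weights stay non-negative. The function name `bm25_scores` is hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every tokenized document in docs against the tokenized query."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n  # average document length
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Assumed IDF variant: the added 1 keeps the weight non-negative.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy example with stemmed tokens (illustrative data only).
docs = [
    ["clu1", "express", "brain"],
    ["ide", "degrad", "amyloid"],
    ["clu1", "clu1", "isoform"],
]
scores = bm25_scores(["clu1"], docs)
```

Compared with plain TF-IDF, the k1 term makes repeated occurrences of a query term count with diminishing returns, and b controls how strongly long documents are penalized.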

12 http://www.summarization.com/mead/


References

Ask Jeeves. Available from http://www.ask.com

Bhaskar, Pinaki, Pakray, Partha, Banerjee, Somnath, Banerjee, Samadrita, Bandyopadhyay, Sivaji and Gelbukh, Alexander (2012). Question Answering System for QA4MRE@CLEF 2012. CLEF 2012 Workshop on Question Answering For Machine Reading Evaluation (QA4MRE). CLEF 2012 Labs and Workshop - Working Notes Papers.

Bhattacharya, Sanmitra and Toldo, Luca (2012). Question Answering for Alzheimer Disease Using Information Retrieval. CLEF 2012 Evaluation Labs and Workshop - Working Notes Papers.

Cao, Ling, Qiu, Xipeng and Huang, Xuanjing (2011). Deep Question Answering for Single Document with Lexical Chains. Main Task of Question Answering for Machine Reading Evaluation at CLEF 2011.

CLEF2013. Available from http://www.clef2013.org/

Erkan, Günes and Radev, Dragomir R. (2004). LexRank: Graph-based Lexical Centrality as Salience in Text Summarization. Journal of Artificial Intelligence Research, 22, pp. 457-479.


GDep parser. Available from http://people.ict.usc.edu/~sagae/parser/gdep/index.html

Google search engine. Available from http://www.google.com

Hou, Wen-Juan and Tsai, Bing-Han (2014). An Answer Validation Concept Based Approach for Question Answering in Biomedical Domain. Modern Advances in Applied Intelligent Systems: 27th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2014, Kaohsiung, Taiwan, June 3-6, 2014, Proceedings, Part I. Moonis Ali et al. (Eds.), LNAI 8481, pp. 148-159, Springer International Publishing Switzerland, July 2014.

LA-PDFText. Available from http://code.google.com/p/lapdftext/

Li, Fangtao, Zhang, Xian and Zhu, Xiaoyan (2008). Answer Validation by Information Distance Calculation. Coling 2008: Proceedings of the 2nd Workshop on Information Retrieval for Question Answering, pp. 42-49.

Li, Ming and Vitanyi, Paul (2008). An Introduction to Kolmogorov Complexity and Its Applications. Third Edition, Springer Verlag.

Manning, Christopher D., Raghavan, Prabhakar and Schütze, Hinrich (2008). Introduction to Information Retrieval. Cambridge University Press.

MEAD. Available from http://www.summarization.com/mead/


Morante, Roser, Krallinger, Martin, Valencia, Alfonso and Daelemans, Walter. Machine Reading of Biomedical Texts about Alzheimer’s Disease. QA4MRE Pilot Task – Machine Reading of Biomedical Texts about Alzheimer’s Disease at CLEF 2012.

Pakray, Partha, Bhaskar, Pinaki, Banerjee, Somnath, Pal, Bidhan Chandra, Bandyopadhyay, Sivaji and Gelbukh, Alexander (2011). A Hybrid Question Answering System based on Information Retrieval and Answer Validation. Main Task of Question Answering for Machine Reading Evaluation at CLEF 2011.

Porter, M.F. (1980). An Algorithm for Suffix Stripping. Program, 14(3), pp. 130-137.

Porter Stemmer. Available from http://tartarus.org/martin/PorterStemmer/

QA4MRE. Available from http://nlp.uned.es/clef-qa/

Qiu, Yonggang and Frei, H.P. (1993). Concept Based Query Expansion. Proceedings of ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 160-169.

Ramakrishnan, C., Patnia, A., Hovy, E. and Burns, G. (2012). Layout-Aware Text Extraction from Full-text PDF of Scientific Articles. Source Code for Biology and Medicine, 7(1), p. 7.

Robertson, Stephen and Zaragoza, Hugo (2009). The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends in Information Retrieval, 3(4), pp. 333–389.

Stopword List. Available from http://www.lextek.com/manuals/onix/stopwords1.html

Strohman, T., Metzler, D., Turtle, H. and Croft, W.B. (2005). Indri: a Language-Model Based Search Engine for Complex Queries. Proceedings of the International Conference on Intelligence Analysis.

Wren, Jonathan D. (2011). Question Answering Systems in Biology and Medicine—the Time is Now. Bioinformatics, 27(14), pp. 2025-2026.

Yahoo search engine. Available from http://tw.yahoo.com

Zhou, Guangyou, Cai, Li, Zhao, Jun and Liu, Kang (2011). Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp. 653-662.

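Tsai, Bing-Han (蔡秉翰) (2013). A Biomedical Question Answering System Based on Answer Validation Methods (in Chinese). Master's thesis, Graduate Institute of Computer Science and Information Engineering, National Taiwan Normal University.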


Appendix A

Test questions for reading test set 1 (R1, test document ID 22506010):

1. Which technique was used to determine the cellular locations of the CLU1 and CLU2 gene products?

(a) intracellular and secreted (b) ER

(c) intracellular localization (d) Golgi apparatus

(e) immunofluorescence experiments

2. What compartments inside the cell contain clusterin proteins?

(a) ER and the Golgi apparatus (b) epitope tag

(c) antibody

(d) secretory pathway (e) secreted

3. Which of the two CLU isoforms is the main one expressed in the choroid plexus?

(a) fetal tissue (b) CLU1 (c) clusterin (d) CLU2

(e) cerebrospinal fluid

4. Which clusterin single nucleotide polymorphism has been linked to a reduction in the risk for developing Alzheimer's disease?

(a) CLU2 (b) SNPs

(c) rs11136000T (d) clusterin (e) CLU

5. Which CLU protein isoforms in the brain have been characterized?

(a) CLU1 and CLU2 (b) clusterin

(c) rs11136000 (d) secreted proteins (e) AD

6. Which CLU isoform has a consistently higher gene expression?

(a) CLU2

(b) ribosomal protein L13A


(c) CLU1 (d) allele (e) PNGase

7. Which hormone can control the expression of CLU isoforms?

(a) real-time PCR (b) cDNA

(c) AD

(d) rs11136000 (e) androgen

8. What effect do androgens have on CLU2 gene expression?

(a) association (b) repression (c) inhibition (d) activation (e) expression

9. Which particular histone deacetylase inhibitor has been shown to enhance the expression of CLU?

(a) astrocytes (b) CLU2

(c) therapeutic agents (d) valproate

(e) amyloid

10. How many residues does the CLU2 protein sequence have?

(a) 449

1. Which entity does allosterically regulate insulin degrading enzyme activity?

(a) AD (b) Ab

(c) somatostatin (d) microglia cells (e) IDE

2. Which peptide hormone is the positive transcriptional regulator of IDE?

(a) Ab

(b) somatostatin (c) BV-2

(d) AD (e) mRNA


3. In which cell line was the gene expression regulation of IDE characterized?

(a) mouse (b) astrocytes (c) microglia (d) BV-2

(e) beta-amyloid

4. Which method was applied to measure the quantity of IDE mRNA in the gene regulation experiments described in the paper?

(a) ELISA (b) siRNA

(c) RealTime PCR (d) Western Blotting (e) IgG antibody

5. What regulates the production of neprilysin?

(a) somatostatin (b) NEP

(c) enzyme

(d) matrix metalloproteinase (e) microglia

6. What kind of glial cell is able to phagocyte b-amyloid?

(a) neprilysin (b) siRNA (c) brain

(d) culture medium (e) microglia

7. What is the major protease produced by microglia responsible for degrading Aβ?

(a) Ab (b) cells (c) IDE

(d) extracellular (e) beta-amyloid

8. What substance exhibited a similar effect on IDE secretion as achieved by somatostatin?

(a) octreotide (b) analogue

(c) endogenous modulator (d) substrate

(e) beta-amyloid

9. What are the sst receptors that are expressed on rat astrocytes?

(a) SSTR-2, SSTR-3 and SSTR-4 (b) SSTR-1, SSTR-2 and SSTR-4 (c) somatostatin

(d) microglia (e) rat


10. What method was used to inhibit the expression of IDE?

(a) Western blot (b) microglia

(c) positive modulation (d) siRNA

(e) culture medium

Test questions for reading test set 3 (R3, test document ID 22523685):

1. What cell type in AD brains shows mitochondrial defects?

(a) astrocytes (b) epithelial (c) fibroblasts (d) membrane (e) cytosol

2. In which anatomical structure in the brain does amyloid-beta aggregate?

(a) receptor (b) receptor (c) choroid plexus (d) fibroblasts (e) mitochondrial

3. How many persons worldwide are estimated to have a medical condition related to neurodegeneration?

4. Which protein is able to block nitric oxide creation?

(a) amyloid (b) gelsolin (c) NO

(d) mitochondrial proteins (e) cytotoxicity

5. Which is the best-characterized factor that increases chances of getting AD?

(a) damage (b) Swedish (c) pathogenesis (d) age

(e) stress

6. With which particular protein does amyloid-beta interact?

(a) extracellular domain (b) disease-related proteins


(c) mitochondria (d) receptor (e) gelsolin

7. The aggregation of which peptide has been widely observed in brains of Alzheimer patients?

8. What specific animal model can be used to study the role of amyloid-beta in apoptosis of choroid Plexus cells?

(a) patients with AD (b) animal AD models (c) brain

(d) APP/Ps mice (e) mouse

9. Where does amyloid-beta assemble into oligomeric structures?

(a) fractions

(b) monomeric amyloid (c) membranes

(d) synaptic terminals (e) lipid

10. When does oxidative stress happen in AD patients?

(a) transgenic mouse

(b) predominantly in synaptic mitochondria (c) choroid plexus

(d) before amyloid-beta accumulation (e) postmortem

Test questions for reading test set 4 (R4, test document ID 22529981):

1. What effect can be observed when γ-secretase is blocked?

(a) APP-CTF accumulation (b) PSEN1 mutations (c) cell-based data (d) APH1 variants

(e) transition-state analogue

2. When APH1 genes are overexpressed in MEF KO what happens with the Aβ?

(a) They are purified (b) They are anterior (c) They are shorter (d) They are longer (e) They are supported


3. In which gene are mutations associated to many cases of early-onset familial forms of Alzheimer's disease?

(a) FAD (b) I-CLiPs (c) NCT (d) PSEN1 (e) D257A

4. What experimental technique was used specifically to purify the γ-secretase complex?

(a) plasmids

(b) affinity chromatography (c) lysate

(d) cell lines

(e) knockout experiments

5. What peptide is able to control the expression of the ApoE gene?

(a) AD

(b) APP-CTFs (c) c-secretase (d) cholesterol (e) AICD

6. Which amino acid is critical for the activity of the PS1 protein?

(a) aspartate (b) C-terminal (c) 42-residue (d) 99

(e) DDAA

7. What experimental technique was used to determine the structure of γ-secretase?

(a) densitometry (b) EM

(c) ELISA

(d) immunostaining (e) purification

8. What candidate drug that blocks the γ-secretase is now tested in clinical trials?

(a) LRP1 (b) biochemical (c) PSEN1 (d) AD

(e) Semagacestat

9. What mutation of the PS1 protein causes γ-secretase activity almost to disappear?

(a) P436Q (b) L166P (c) wild-type (d) AICD


(e) C100-His

10. How many mutations relevant for familial forms of Alzheimer's disease have been detected for the PSEN1 gene?

(a) 13 (b) 42 (c) P436Q (d) 185 (e) PSEN2


Appendix B

The test document of reading test set 1 (R1, test document ID 22506010) after stemming with Porter’s Stemmer.

plo on: genet of clusterin isoform express and alzheim's diseas risk

open access

research articl

genet of clusterin isoform express and alzheim's diseas risk

i-fang ling1, jiraganya bhongsatiern1, jame f. simpson1, david w. fardo2, steven estu1*

1 depart of physiolog and sander-brown center on ag, univers of kentucki, lexington, kentucki, unit state of america, 2 depart of biostatist, univers of kentucki, lexington, kentucki, unit state of america

abstract top

the minor allel of rs11136000 within clu is strongli associ with reduc alzheim's diseas (ad) risk. the mechan underli thi associ is unclear. here, we report that clu1 and clu2 ar the two primari clu isoform in human brain; clu1 and clu2 share exon 2–9 but differ in exon 1 and proxim promot. the express of both clu1 and clu2 wa increas in individu


with signific ad neuropatholog. howev, onli clu1 wa associ with the rs11136000

genotyp, with the minor “protect” rs11136000t allel be associ with increas clu1 express.

sinc clu1 and clu2 ar predict to encod intracellular and secret protein, respect, we compar their express; for both clu1 and clu2 transfect cell, clusterin is present in the secretori pathwai, accumul in the extracellular media, and is similar in size to clusterin in human brain. overal, we interpret these result as indic that the ad-protect minor rs11136000t allel is associ with increas clu1 express. sinc clu1 and clu2 appear to produc similar protein and ar increas in ad, the ad-protect afford by the rs11136000t allel mai reflect increas solubl clusterin throughout life.

citat: ling i-f, bhongsatiern j, simpson jf, fardo dw, estu s (2012) genet of clusterin isoform express and alzheim's diseas risk. plo on 7(4): e33923.

doi:10.1371/journal.pone.0033923

editor: tsuneya ikezu, boston univers school of medicin, unit state of america

receiv: juli 20, 2011; accept: februari 21, 2012; publish: april 10, 2012

copyright: © 2012 ling et al. thi is an open-access articl distribut under the term of the creativ common attribut licens, which permit unrestrict us, distribut, and reproduct in ani medium, provid the origin author and sourc ar credit.

fund: thi work wa fund by the nih (p01ag030128 and p30ag028383). the funder had no role in studi design, data collect and analysi, decis to publish, or prepar of the

manuscript.


compet interest: the author have declar that no compet interest exist.

* e-mail: steve.estu@uki.edu

introduct top

clusterin (clu, apoj) ha been implic in diseas rang from cancer to alzheim's diseas (ad) (review in [1], [2], [3], [4]). although the primari role of clusterin in ad is unclear, clu is implic in ad by sever line of evid, includ (i) clu mrna and clusterin protein is increas in ad [5], [6], (ii) clusterin is a compon of plaqu [4], [5], [7], (iii) clusterin modul ad-relat pathwai such as inflamm and apoptosi [1], [8], [9] and (iv) clusterin act as an amyloid-beta (aß) chaperon to alter aß aggreg and/or clearanc ([10], [11], review in [4], [12], [13], [14]). the physiolog relev of clu to ad wa confirm recent when clu singl nucleotid polymorph (snp)s were associ with ad risk [15], [16], [17], [18], [19]. overal, clu genet variat is essenti unequivoc associ with ad given the robust statist power of the initi genom-wide associ studi and subsequ replic studi [15], [16], [17], [18], [19].

how clu snp modul clusterin to alter ad risk is unknown.

two clu isoform, clu1 and clu2, have been report that consist of nine exon and differ onli in their first exon and associ proxim promot; clu1 is predict to encod a nuclear protein and clu2 a secret protein (review in [20]). addit report isoform includ a clu isoform that lack exon 5 and a clu isoform that lack exon two, which encod the leader sequenc, result in anoth nuclear clusterin [21], [22]. here, we investig the hypothesi that clu isoform ar differenti modul by ad statu and ad-associ snp. we identifi clu1 and


clu2 as the major clu isoform in human brain. quantit express studi show that both clu1 and clu2 ar increas in ad but onli clu1 is associ with rs11136000. lastli, although clu1 and clu2 ar predict to produc intracellular and secret protein, respect, immunofluoresc and western blot studi indic that clu1 and clu2 both produc secret protein that ar similar to those detect in the human brain. overal, we interpret our result as suggest that snp-mediat increas in secret, solubl clusterin express mai act to reduc ad risk.

method top

ethic statement

the work describ here wa perform with approv from the univers of kentucki institut review board.

cell cultur

sh-sy5y (human neuroblastoma) and hepg2 (human hepatocellular carcinoma) cell were maintain in dulbecco's modifi eagl's medium (dmem) supplement with 10% fetal bovin serum, 50 u/ml penicillin and 50 µg/ml streptomycin at 37°c in a humidifi 5%

co2 - 95% air atmospher.

clu express plasmid

express plasmid encod clu1 and clu2 were gener from sh-sy5y cellular mrna that wa revers transcrib by us the primer 5′-taggtgcaaaagcaacat-3′ which correspond to


sequenc just after the clu stop codon. clu1 and clu2 cdna were then amplifi by pcr with forward primer 5′-tgagtcatgcaggtttgcag-3′ (clu1) and 5′-atgatgaagactctgctgctg-3′

(clu2) us in combin with the common revers primer 5′-ctcctcccggtgctttttg-3′. pcr fragment were ligat into pcdna3.1/v5-hi-topo t/a clone vector (invitrogen, carlsbad, ca).

clone encod clu1 and clu2 were detect by pcr screen and clone integr confirm by sequenc.

human autopsi tissu

de-identifi human brain specimen were provid by the univers of kentucki ad center neuropatholog core [23], [24]. ad and non-ad design follow niari neuropatholog

guidelin, which includ indic of neurit senil plaqu and neurofibrillari tangl, and provid a likelihood stage of ad neuropatholog diagnosi [25], [26]. individu with “low” ad neuropatholog were cognit intact prior to death and had no or low likelihood of ad by niari criteria; their averag ag at death wa 81.8±10.2 (mean ±sd, n = 17). individu with

“high” ad neuropatholog repres a combin of dement individu with high likelihood of ad by niari criteria (n = 27) and cognit intact individu that were found to have moder or high ad neuropatholog at death (n = 7); their averag ag at death wa 81.9±6.2 (mean ±sd, n = 34). the averag post-mortem interv (pmi) for low ad neuropatholog individu wa 3.0±0.8 hour (mean ± sd, n = 17) while the pmi for high ad neuropatholog individu wa similar (3.2±0.8 hour (n = 34)). choroid plexu sampl were from six individu with an averag ag at death of 80.0± 3.3 year and pmi of 2.9±1.1 hour. fetal tissu rna sampl were obtain commerci (stratagen, santa clara, ca) and have been describ previous [27].

pcr amplif


total rna wa extract from human brain specimen and convert to cdna in 1 µ g aliquot with random hexam and revers transcriptas (superscript iii, invitrogen), essenti as we describ previous [24], [28], [29], [30]. pcr primer were design such that the splice of each intern clu exon as well as clu1 and clu2 were evalu (tabl 1). in initi screen, cdna pool from five high ad neuropatholog and five low ad neuropatholog sampl were subject to pcr-amplif (platinum taq, invitrogen) by us each primer pair and a pcr profil consist of initi denatur for 5 minut at 95°c, follow by 27–32 cycl of 94°c for 30 s, 60°c for 30 s, and 72°c for 1 min, and final extens at 72°c for 7 min (perkin elmer 9600). pcr product were separ by polyacrylamid gel electrophoresi, stain with sybr gold and visual by us a fluoresc imag (fuji fla-2000). the ident of the pcr product wa confirm by direct sequenc (davi sequenc, davi, ca).

thumbnailt 1. pcr primer for evalu splice variat.

doi:10.1371/journal.pone.0033923.t001

real-time pcr

the express level of clu1 and clu2 wa quantifi by real-time pcr. each isoform wa specif amplifi by us a sens primer correspond to sequenc within their respect exon 1, i.e., 5′

-gcgagcagagcgctataaat-3′ for clu1 and 5′-agatggattcggtgtgaagg-3′ for clu2′,

