
a Graduate School of Medicine, Kobe University, Kobe, Japan


4.1. GSVML structure

The outlined structure of GSVML is shown in Figure 2.

GSVML consists of three data criteria: variation data, direct annotation, and indirect annotation.

Figure 2. Outlined structure of GSVML. (quoted from [25])
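The three-criterion layout can be illustrated with a minimal Python sketch. The tag names used here (gsvml, variation_data, direct_annotation, indirect_annotation) are assumptions chosen for readability and are not taken from the GSVML DTD or XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names; the real GSVML schema may name these differently.
CRITERIA = ("variation_data", "direct_annotation", "indirect_annotation")

# Build an empty skeleton document with one child element per data criterion.
root = ET.Element("gsvml")
for criterion in CRITERIA:
    ET.SubElement(root, criterion)

# Serialize the skeleton so it can be inspected or written to a file.
skeleton = ET.tostring(root, encoding="unicode")
print(skeleton)
```

The skeleton only shows the top-level grouping; the content of each criterion is described in the paragraphs that follow.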

The variation data criterion describes the variation data itself, such as allele, type, position, length, and region (Figure 3).

Some elements in the variation data criterion, namely variation type, location, variation attribute, and source, are required, while the other elements are optional. The database reference structure is defined as dbref, a user-defined complex type. The associated gene element is coupled with the location element and the epidemiology element.

Figure 3. Variation data criterion of GSVML (quoted from [25])
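The split between required and optional elements in the variation data criterion can be sketched as a simple presence check. This is an illustration under stated assumptions: the tag spellings and the sample values are hypothetical, and a real deployment would validate against the GSVML DTD or XML schema instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag spellings for the four required elements named in the text.
REQUIRED_CHILDREN = ("variation_type", "location", "variation_attribute", "source")


def build_variation(**fields: str) -> ET.Element:
    """Build a variation_data element with one child per keyword argument."""
    variation = ET.Element("variation_data")
    for tag, text in fields.items():
        ET.SubElement(variation, tag).text = text
    return variation


def missing_required(variation: ET.Element) -> list:
    """Return the required child tags that are absent, in declaration order."""
    return [tag for tag in REQUIRED_CHILDREN if variation.find(tag) is None]


# A record with all required elements plus one optional element (allele).
complete = build_variation(
    variation_type="SNP",
    location="chr1:12345",
    variation_attribute="A/G",
    source="example-lab",
    allele="A",
)

# A record that carries only one required element and one optional element.
incomplete = build_variation(variation_type="SNP", allele="A")

print(missing_required(complete))
print(missing_required(incomplete))
```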

The direct annotation criterion describes data attached directly to the variation data: whole genome sequence, Mendelian segregate, homozygote detect, somatic mutation, experiment analysis, epidemiology, and miscellaneous (Figure 4). All of the direct annotation elements are optional. The experiment analysis element consists of two categories of elements, identify and characterize. The variation-identify element has child elements that describe the experimental background used to identify the variation data.

The variation-identify element also has elements for publication and submitter, and recursive reference is allowed between the publication elements and the submitter elements. The variation-characterize element has child elements that describe clinical or genetic statistics. All kinds of statistical methodologies are allowed, while typical elements such as p-value, linkage disequilibrium index, descendent index, and maximum lod score are defined as separate elements. The epidemiology category gathers the statistical elements from all GSVML elements and describes the statistical data. This category includes the associated gene from the variation data, the disease epidemiology from the indirect annotation, and population and frequency from the direct annotation. Each element of the disease epidemiology element is defined in the indirect annotation criterion.

Figure 4. Direct annotation criterion of GSVML (quoted from [25])

The indirect annotation criterion describes the explanatory, higher-level information about the variation data: the omics data, the clinical information, and the environmental data (Figure 5). Personal information is defined with a personal description element and a database reference. GSVML accommodates all ways of describing personal information, although in most cases the personal information is encrypted or replaced by an identifier number.

The phenotype element and the omics element accept a broad data type so that their content can be described in any data format. The clinical annotation category has a disease element, a clinical observation element, and a database reference. The disease element consists of a minimum set of disease descriptions, disease epidemiology, and a database reference. Some elements in the disease description category are coupled with their expression probability. These probability elements are referred to in the epidemiology data section. The database reference allows descriptions that use other ontological systems such as SNOMED-CT. The clinical observation element follows the SOAP description, while almost all of its elements are reused from the disease elements. The family history element is defined under the clinical observation element. The family history element for each family member is coupled with personal information, phenotype, and clinical annotation. To describe the characteristics of each member, recursive description is allowed.
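The recursive family history description can be sketched with a self-referencing data structure. The class name, field names, and sample values below are hypothetical; they only mirror the coupling of each family member with personal information, phenotype, and clinical annotation, and the fact that a member's own description may again contain a family history.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FamilyMember:
    """One family member entry; all field names are illustrative assumptions."""
    relation: str
    personal_info: Optional[str] = None
    phenotype: Optional[str] = None
    clinical_annotation: Optional[str] = None
    # Recursive part: each member may carry a family history of their own.
    family_history: List["FamilyMember"] = field(default_factory=list)


def count_members(member: FamilyMember) -> int:
    """Count this member plus everyone reachable through nested family histories."""
    return 1 + sum(count_members(relative) for relative in member.family_history)


# Proband -> mother -> maternal grandmother, nested two levels deep.
grandmother = FamilyMember(relation="maternal grandmother", phenotype="affected")
mother = FamilyMember(relation="mother", phenotype="carrier", family_history=[grandmother])
proband = FamilyMember(relation="self", personal_info="ID-0001", family_history=[mother])

print(count_members(proband))
```

Nesting the same structure inside itself keeps a single definition for every generation, which matches the recursive description allowed by the text.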

These data criteria have relations to each other internally.

Figure 5. Indirect annotation criterion of GSVML (quoted from [25])

5. Discussion

In the current context, annotative information around genomic sequence variation data is increasing and is beginning to fill the information gaps. The variation data themselves are also increasing and have resulted in various databases. This trend is typical of SNP data.

The pitfall in handling genomic sequence variation data is the lack of sequence-variation-centric data exchange formats. Historically, many markup languages and programs have been developed to handle genomic information. However, there have been no SNP-centric or sequence-variation-centric markup languages so far. GSVML can be regarded as the first genomic-sequence-variation-centric markup language.

From the application side, GSVML is human-health centric. Considering that the SNP is a highly researched polymorphism and has a great impact, especially in the human health domain, GSVML has strong potential to become a dedicated markup language for human healthcare. On the other hand, targeting practical human health applications means handling direct and indirect annotative information. Here, direct annotation indicates general information such as SNP-associated genes, and indirect annotation indicates all omics information and clinical information. This additional information is needed to understand the situation of each patient. For this reason, the development of GSVML needs harmonization with clinical standardization organizations such as Health Level Seven (HL7) and the International Organization for Standardization (ISO). The development of GSVML was carried out in collaboration with the Health Level Seven Clinical Genomics SIG. The standardization effort for GSVML in ISO is in progress; improvements raised during the ISO standardization process will be fed back, and the design will be changed where needed in the next version of GSVML. This "to and fro" process between the design work and the standardization process will continue, to reflect future demands.

GSVML is intended for exchanging messages related to human health. In developing and standardizing GSVML for this application domain, we kept an eye on patient safety, clinical efficiency, and medical costs. For patient safety, the preservation and the confidentiality of patient information are important. The sharable data format and the standardization activity of GSVML can contribute to data preservation in this domain internationally, and public key infrastructure will also need this kind of sharable data format. For clinical efficiency, simplicity and ease of understanding are important. The structure of GSVML is classified hierarchically from the viewpoint of end users such as clinicians or researchers. The information model of GSVML adopts element-based definitions to simplify its use. For medical costs, ease of installation is important. Providing GSVML with a DTD and an XML schema makes installation easier in the current context. GSVML is designed to adopt a classification that end users can understand and simple information technology.

GSVML can be used for exchanging clinical variation data among facilities that use various data formats. In the greater framework of clinical data standardization, GSVML will play the part of describing the variation data and its necessary information. In version 1, we deliberately validate the annotative information, such as clinical information or omics information, only loosely, so as to accept the various ways in which end users represent their descriptions.

Many efforts to standardize the data formats of this annotative information are under way. Once these efforts reach a stable stage, more detailed validation of the annotative information will be our future work.

6. Conclusion

GSVML meets a demand for genomic sequence variation data exchange. It is a sharable data exchange format for exchanging genomic sequence variation data and the associated annotative information among facilities that use various data formats.

The envisioned applications of GSVML are in the human health domain, and GSVML is required to harmonize with clinical information and omics information as annotations of variation data. GSVML can enhance the utilization of genomic sequence variation data internationally by providing a sharable platform for data exchange.

7. Acknowledgements

We thank the Health Level Seven Clinical Genomics Special Interest Group and the International Organization for Standardization TC 215 Working Group for their valuable advice.

8. References

[1]. Holden AL. 2002. The SNP consortium: summary of a private consortium effort to develop an applied map of the human genome. Biotechniques Suppl: 22-24, 26.

[2]. Elias Zerhouni, "Medicine. The NIH Roadmap." Science. 2003 Oct 3;302(5642):63-72.

[3]. Cognitive Science Princeton University, "Overview for Markup Language," internet article of http://www.cogsci.princeton.edu/cgi-bin/webwn2.0?stage=1&word=Markup+Language, 1998

[4]. International Organization for Standardization, ISO 8879: Information processing --Text and office systems -- Standard Generalized Markup Language (SGML), ([Geneva]: ISO, 1986)

[5]. T. Berners-Lee and Dan Connolly, "HyperText Markup Language Specification -- 2.0", RFC 1866. Proposed Standard, Nov. 1995.

[6]. W3C recommendation, "Extensible Markup Language (XML) 1.0 (Second Edition)", internet article of http://www.w3c.org/TR/2000/REC-xml-20001006, 1998

[7]. W3C recommendation, "XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition) A Reformulation of HTML 4 in XML 1.0", internet article of http://www.w3.org/TR/xhtml1/, Jan. 2000

[8]. W3C recommendation, "WAP Forum - W3C Cooperation White Paper ", internet article of http://www.w3.org/TR/1998/NOTE-WAP-19981030, 1998

[9]. W3C recommendation, "Simple Object Access Protocol (SOAP) 1.1", internet article of http://www.w3.org/TR/2000/NOTE-SOAP-20000508/ , 2000

[10]. Flying Boat Mobile Communications, "Glossary of Terms relevant to Mobile Communications," internet article of http://homepages.nildram.co.uk/~jidlaw/pages/glossary.html, 2004.11.

[11]. Laurent SS. Biggar RJ. 1999 "Inside SMLDTDs: Scientific and Technical." Berkeley, CA: McGraw-Hill.

[12]. Hucka M, Finney A, Sauro HM, Bolouri H, et al. "The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models." Bioinformatics. 2003 Mar 1;19(4):524-31.

[13]. Hedley, W.J., Nelson, M.R., Bllivant, D.P. and Nielson, P.F. "A short introduction to CellML." Phil. Trans. Roy. Soc. London A, 359, 1073-1089, 2001.

[14]. Goddard NH, Hucka M, Howell F, Cornelis H, Shankar K, Beeman D. "Towards NeuroML: model description methods for collaborative modelling in neuroscience." Philos Trans R Soc Lond B Biol Sci. 2001 Aug 29;356(1412):1209-28.

[15]. Freimuth RR, Stormo GD, McLeod HL. "PolyMAPr: programs for polymorphism database mining, annotation, and functional analysis." Hum Mutat. 2005 Feb;25(2):110-7.

[16]. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. 2001. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29: 308-311.

[17]. Buetow KH, Edmonson MN, Cassidy AB. 1999. Reliable identification of large numbers of candidate SNPs from public EST data. Nat Genet 21: 323-325.

[18]. Ohnishi Y, Tanaka T, Yamada R, Suematsu K, Minami M, Fujii K, Hoki N, Kodama K, Nagata S, Hayashi T, Kinoshita N, Sato H, Kuzuya T, Takeda H, Hori M, Nakamura Y. 2000. Identification of 187 single nucleotide polymorphisms (SNPs) among 41 candidate genes for ischemic heart disease in the Japanese population. Hum Genet 106: 288-292.

[19]. Cheung KH, Miller PL, Kidd JR, Kidd KK, Osier MV, Pakstis AJ. "ALFRED: a Web-accessible allele frequency database." Pac Symp Biocomput 2000:639-50.

[20]. International Organization for Standardization Technical Committee 215, http://www.iso.org/iso/en/stdsdevelopment/tc/

[21]. Health Level Seven Clinical Genomics Special Interest Group, internet article of http://www.hl7.org/Special/committees/, Since Sep. 2002

[22]. HL7 information model of genotype (HL7 POCG_DM000023), http://www.hl7.org/special/Committees/clingenomics/docs.cfm

[23]. Yoshida T. "[SNP project in the Millennium Genome Project, Japan]" Gan To Kagaku Ryoho. 2002 Jun;29(6):963-7.

[24]. Waugh A, Gendron P, Altman R, Brown JW, Case D, Gautheret D, et al. "RNAML: a standard syntax for exchanging RNA information." RNA. 2002 Jun;8(6):707-17.

[25]. International Organization for Standardization, TC215. Genomic sequence variation markup language. Available from: http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=43182&scopelist=PROGRAMME. <Accessed July 11, 2006>
