CHAPTER 2 CONSTRUCTION OF SMALL NON-CODING RNAS INFORMATION

2.2 RELATED WORKS

Several existing resources provide data relevant to each of these areas of research. RegulonDB is a database that integrates biological knowledge of the mechanisms that regulate transcription initiation in Escherichia coli, as well as knowledge of the organization of genes and regulatory signals into operons in the chromosome [21]. EcoCyc [22] and RegulonDB are both curated by the same group at UNAM, so the two databases contain the same data on transcriptional regulation of gene expression. Curation of the data takes place within EcoCyc, and the results are periodically propagated to RegulonDB. ASAP [23] was developed to store, update, and distribute genome sequences together with the associated annotations and functional characterization data. NONCODE [24] is an integrated knowledge database dedicated to non-coding RNAs. The databases above provide gene annotations that include a number of sRNA genes. In addition, Storz et al. used northern analysis to document a total of 79 small RNAs in E. coli in 2005 [7].

Interaction information between sRNA genes and their regulators, or between sRNA genes and their targets, is provided by three resources: RegulonDB; NPInter [25], a database covering eight categories of functional interactions between noncoding RNAs (except tRNAs and rRNAs) and protein-related biomacromolecules (proteins, mRNAs and genomic DNAs) in six model organisms; and sRNATarBase [26], which contains experimental data on sRNA–target interactions manually collected from peer-reviewed papers. Gene Expression Omnibus (GEO) [27] is a repository of high-throughput gene expression data from hybridization arrays, chips and microarrays. Some of its experiments address the roles of sRNAs, and some expression profiles include known sRNAs under a variety of conditions. The resources used by sRNAMap are summarized in Table 1 and described in detail below.


Table 1 Summary of sRNAMap resources.

Resource | Description | URL | Reference
RegulonDB | Gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation. | http://regulondb.ccg.unam.mx/index.jsp | [21]
EcoCyc | A scientific database for the bacterium Escherichia coli K-12 MG1655. | http://www.ecocyc.org/ | [22]
ASAP | A systematic annotation package for community analysis of genomes. | http://www.genome.wisc.edu/tools/asap.htm | [23]
NONCODE | An integrated knowledge database of non-coding RNAs. | http://www.noncode.org/ | [24]
NPInter | The noncoding RNAs and protein related biomacromolecules interaction database. | http://bioinfo.ibp.ac.cn/NPInter/ | [25]
sRNATarBase | A comprehensive database for bacterial sRNA targets verified by experiments. | http://ccb.bmi.ac.cn/srnatarbase/index.php | [26]
GEO | A gene expression/molecular abundance repository supporting MIAME compliant data submissions, and a curated, online resource for gene expression data browsing, query and retrieval. | http://www.ncbi.nlm.nih.gov/geo/ | [27]


RegulonDB

RegulonDB [21] (http://regulondb.ccg.unam.mx/) is the primary reference database of the best-known regulatory network of any free-living organism, that of Escherichia coli K-12. The major conceptual change over the preceding three years is an expanded biological context: transcriptional regulation is now treated as part of a unit that begins with the signal, continues with signal transduction to the core of regulation, and ends with modified expression of the target genes responsible for the response. These units are called genetic sensory response units, or Gensor Units. Their high-level curation has been initiated, with graphic maps and superreactions linked to other databases; additional connectivity uses expandable submaps. RegulonDB has summaries for every transcription factor (TF) and for TF-binding sites with internal symmetry. Several DNA-binding motifs and their sizes have been redefined and relocated. In addition to data from the literature, the database incorporates the curators' own information on transcription start sites (TSSs) and transcriptional units (TUs), obtained with high-throughput whole-genome sequencing technologies. A new portable drawing tool for genomic features is also available, as are new ways to download the data, including web services, files for several relational database management systems, and text files that include the BioPAX format.

EcoCyc

EcoCyc [22] (http://EcoCyc.org) is a comprehensive model organism database for Escherichia coli K-12 MG1655. From the scientific literature, EcoCyc captures the functions of individual E. coli gene products; their regulation at the transcriptional, post-transcriptional and protein level; and their organization into operons, complexes and pathways. EcoCyc users can search and browse the information in multiple ways.

Recent improvements to the EcoCyc Web interface include combined gene/protein pages and a Regulation Summary Diagram displaying a graphical overview of all known regulatory inputs to gene expression and protein activity. The graphical representation of signal transduction pathways has been updated, and the cellular and regulatory overviews were enhanced with new functionality. A specialized undergraduate teaching resource using EcoCyc is being developed.

ASAP

ASAP [23] (http://www.genome.wisc.edu/tools/asap.htm) is a relational database and web interface developed to store, update and distribute genome sequence data and functional characterization. ASAP facilitates ongoing community annotation of genomes and tracking of information as genome projects move from preliminary data collection through post-sequencing functional analysis. The ASAP database includes multiple genome sequences at various stages of analysis, corresponding experimental data and access to collections of related genome resources. ASAP supports three levels of users: public viewers, annotators and curators. Public viewers can currently browse updated annotation information for Escherichia coli K-12 strain MG1655, genome-wide transcript profiles from more than 50 microarray experiments and an extensive collection of mutant strains and associated phenotypic data. Annotators worldwide are currently using ASAP to participate in a community annotation project for the Erwinia chrysanthemi strain 3937 genome. Curation of the E. chrysanthemi genome annotation as well as those of additional published enterobacterial genomes is underway and will be publicly accessible in the near future.

NONCODE

NONCODE [24] (http://www.noncode.org/) is an integrated knowledge database designed for the analysis of non-coding RNAs (ncRNAs). In the three years between the first release of NONCODE and version 2.0, the number of known ncRNAs grew rapidly, and there is growing recognition that ncRNAs play important regulatory roles in most organisms. In the updated version (NONCODE v2.0), the number of collected ncRNAs has reached 206,226, including a wide range of microRNAs, Piwi-interacting RNAs and mRNA-like ncRNAs. The improvements to the database include not only new and updated ncRNA data sets, but also a BLAST alignment search service and access through a custom UCSC Genome Browser.

NPInter

NPInter [25] (http://bioinfo.ibp.ac.cn/NPInter/) is a database that documents experimentally determined functional interactions between noncoding RNAs (ncRNAs) and protein related biomacromolecules (PRMs) (proteins, mRNAs or genomic DNAs).

NPInter is intended to provide the scientific community with a comprehensive and integrated tool for efficient browsing and extraction of information on interactions between ncRNAs and PRMs. Beyond cataloguing the details of these interactions, NPInter is useful for understanding ncRNA function: it adds a very important functional element, ncRNAs, to the biomolecule interaction network and builds a bridge between the coding and the noncoding kingdoms.

sRNATarBase

sRNATarBase [26] (http://ccb.bmi.ac.cn/srnatarbase/index.php) is a comprehensive database for bacterial sRNA targets verified by experiments. The database holds 138 sRNA-target interactions and 252 noninteraction entries, which were manually collected from peer-reviewed papers. The detailed information for each entry, such as supporting experimental protocols, BLAST-based phylogenetic analysis of sRNA-mRNA target interaction in closely related bacteria, predicted secondary structures for both sRNAs and their targets, and available binding regions, is provided as accurately as possible. This database also provides hyperlinks to other databases including GenBank, SWISS-PROT, and MPIDB.

GEO

The GEO [27] (http://www.ncbi.nlm.nih.gov/geo/) repository at the National Center for Biotechnology Information (NCBI) archives and freely disseminates microarray and other forms of high-throughput data generated by the scientific community. The database has a minimum information about a microarray experiment (MIAME)-compliant infrastructure that captures fully annotated raw and processed data. Several data deposit options and formats are supported, including web forms, spreadsheets, XML and Simple Omnibus Format in Text (SOFT). In addition to data storage, a collection of user-friendly web-based interfaces and applications is available to help users effectively explore, visualize and download the thousands of experiments and tens of millions of gene expression patterns stored in GEO. The reference paper [27] summarizes the GEO database structure and user facilities, and describes enhancements to database design, performance, submission format options, and data query and retrieval utilities.
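To make the SOFT format mentioned above more concrete, the following is a minimal Python sketch of how a SOFT-formatted text could be read. It relies only on the general line conventions of SOFT: lines beginning with "^" open a new entity, lines beginning with "!" carry attributes (including the table begin/end markers), lines beginning with "#" describe data columns, and table rows are tab-delimited. The function name parse_soft, the returned dictionary layout and the sample record are illustrative assumptions, not part of GEO or of sRNAMap.

```python
from collections import defaultdict


def parse_soft(text):
    """Parse SOFT-formatted text into a list of entity dictionaries.

    Hypothetical helper for illustration. Each entity has the keys
    'type', 'id', 'attributes', 'columns' and 'rows'.
    """
    entities = []
    current = None      # entity currently being filled
    in_table = False    # True between *_table_begin and *_table_end
    header = None       # column names of the current data table

    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("^"):
            # Entity indicator line, e.g. "^SAMPLE = some_id"
            kind, _, ident = line[1:].partition("=")
            current = {
                "type": kind.strip(),
                "id": ident.strip(),
                "attributes": defaultdict(list),
                "columns": {},
                "rows": [],
            }
            entities.append(current)
            in_table, header = False, None
        elif current is None:
            continue  # ignore anything before the first entity
        elif line.startswith("!"):
            key, _, value = line[1:].partition("=")
            key = key.strip()
            if key.lower().endswith("_table_begin"):
                in_table, header = True, None
            elif key.lower().endswith("_table_end"):
                in_table = False
            else:
                # An attribute label may repeat, so values are kept in a list
                current["attributes"][key].append(value.strip())
        elif line.startswith("#"):
            # Column description line, e.g. "#VALUE = normalized signal"
            name, _, desc = line[1:].partition("=")
            current["columns"][name.strip()] = desc.strip()
        elif in_table:
            fields = line.split("\t")
            if header is None:
                header = fields  # first table line holds the column names
            else:
                current["rows"].append(dict(zip(header, fields)))
    return entities


# Made-up sample record; identifiers and values are placeholders.
EXAMPLE = "\n".join([
    "^SAMPLE = example_sample_1",
    "!Sample_title = hypothetical sRNA expression sample",
    "#ID_REF = probe identifier",
    "#VALUE = normalized signal",
    "!sample_table_begin",
    "ID_REF\tVALUE",
    "probe_a\t1.5",
    "probe_b\t0.25",
    "!sample_table_end",
])

entities = parse_soft(EXAMPLE)
```

Values are kept as strings so that the sketch makes no assumption about how a given platform encodes its measurements; a caller interested in the expression of known sRNAs would convert the VALUE column and filter rows by probe identifier.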
