
Chapter 3 miRTar - an integrated system for identifying miRNA-target interactions

3.4 Materials and methods

miRTar is a web-based system that runs on an Apache web server under the Linux operating system. Figure 19 outlines the idea behind miRTar: an analytical platform that lets researchers examine the regulatory relationships between miRNAs and genes under all possible scenarios. After data are submitted to the system, miRTar identifies the miRNA target sites using TargetScan, miRanda, PITA, and RNAhybrid, searching the 3' UTR, 5' UTR and coding regions. The potential miRNA-target interactions between miRNAs and genes are thereby constructed. For a set of genes that may be regulated by a single miRNA, a p-value based on gene set enrichment analysis (GSEA) is calculated to estimate the overrepresentation of these genes in the KEGG pathways and thus to infer the biological function of the miRNA. Additionally, miRTar reports whether miRNA target sites lie within exons that are alternatively spliced (AS) or constitutively spliced (CS).

Figure 19. Concept that underlies miRTar.

3.4.1 Data collection

Figure 20 depicts the system flow of miRTar. miRTar draws on several well-known resources. The miRNA sequences are obtained from the miRBase database Release 15 (74). Gene information and relevant annotations are based on the ASTD database Release 1.1 (133) and the GenBank database Release 167 (134). The splice variants of transcripts are obtained from ASTD (133), the UniGene database Release 217 (135) and GenBank (134). The biological pathways are extracted from the KEGG/PATHWAY database Release 53.0 (136). Table 6 lists all versions and data types obtained from these databases.

Figure 20. System flow of miRTar.

Table 6. Data statistics and data obtained from databases.

Data source     Version   Data descriptions                              Data amount
miRBase (74)    V. 15     MicroRNA information (name, sequences, ...)    1,100
KEGG (136)      V. 53     The pathway maps                               195
ASTD (133)      V. 1.1    Gene annotation                                16,715
                          mRNA sequences                                 93,467
                          Protein information                            34,545
                          Alternative splicing events                    78,165
GenBank (134)   V. 167    Gene annotation                                32,123
                          Genomic sequences                              32,123
                          Protein sequences                              125,259
UniGene (135)   V. 217    mRNA sequences                                 137,654
                          Protein information (mRNA gi to protein gi)    125,259

3.4.2 Identifying miRNA target sites in human

First, TargetScanS is used to detect perfect Watson-Crick seed pairing, at least six nucleotides long, against all mRNA transcripts. Four seed types are considered: 8mer, 7mer-m8, 7mer-A1 and 6mer, as defined by Bartel's group (65). Requiring a perfect seed match considerably reduces the number of false-positive predictions, especially for the conserved seed types (65,66,137). The latest version of miRanda (138) is also used to identify miRNA target sites. Notably, the terminal miRNA nucleotides (the first and the last two) no longer contribute to the miRanda score (139). The cutoff for the minimum free energy (MFE) of the miRNA:target duplex is set to -12 kcal/mol and the cutoff for the miRanda score is set to 120. Hence, miRTar reports miRNA targets whose MFE is lower than -12 kcal/mol and whose score exceeds 120. In addition, RNAhybrid and PITA, which were developed to identify miRNA target sites in the 3' UTR, are used to identify target sites within the 3' UTR. To reduce the false-positive predictions generated by multiple miRNA target prediction tools, miRTar applies several criteria concerning both evolutionary conservation and structural context. These criteria are described below.

A. Target site in conserved region. Target sites that are conserved across species are likely to be biologically functional and are therefore treated as potential miRNA target sites. The UCSC PhastCons conservation score (140) is used to filter out non-conserved predictions. Human alignment data were downloaded from the UCSC Genome Browser (92). The lower bound on the PhastCons conservation score at a predicted human target site is set to 0.5.

B. Target site in accessible regions. Conventional target prediction tools consider the complementarity between the miRNA and its target sequence, the conservation of the target sites, and the kinetics and thermodynamics of the miRNA/target duplex. Although these properties are important in identifying miRNA target sites, the sequence context that surrounds a target site reportedly affects the binding affinity and the regulation by the miRNA. Harlan et al. (71) hypothesized that single-stranded miRNAs can only bind to stretches of free mRNA at potential target sites. Dang et al. (72) posited that potential miRNA target sites are more likely to be real if they are in more accessible regions. RNAplfold can exactly determine the local base-pairing probabilities and the accessibilities of mRNA transcripts, which thus do not have to be computed from a Boltzmann-weighted sample of structures.
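
The four seed types named at the start of this section can be illustrated with a minimal sketch. It is not the TargetScanS implementation; it only encodes the commonly used definitions (6mer: pairing to miRNA positions 2-7; 7mer-m8: positions 2-8; 7mer-A1: positions 2-7 plus an A opposite position 1; 8mer: positions 2-8 plus that A). The function names and the 8-nt window layout are assumptions made for illustration.

```python
# Illustrative only: classify a canonical seed match at one candidate site.
# Both sequences are RNA strings written 5'->3'.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the Watson-Crick reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_type(mirna: str, site: str):
    """Classify an 8-nt target window against a miRNA.

    Assumed layout: site[7] faces miRNA position 1 and site[0] faces
    miRNA position 8 (the duplex is antiparallel).
    Returns "8mer", "7mer-m8", "7mer-A1", "6mer" or None.
    """
    if len(mirna) < 8 or len(site) != 8:
        raise ValueError("need a miRNA of >= 8 nt and an 8-nt target window")
    # Target bases that pair perfectly with miRNA positions 2-7 and 2-8.
    match_2_7 = reverse_complement(mirna[1:7])  # 6 nt, faces site[1:7]
    match_2_8 = reverse_complement(mirna[1:8])  # 7 nt, faces site[0:7]
    has_a1 = site[7] == "A"                     # A opposite miRNA position 1
    if site[0:7] == match_2_8:
        return "8mer" if has_a1 else "7mer-m8"
    if site[1:7] == match_2_7:
        return "7mer-A1" if has_a1 else "6mer"
    return None
```

For example, with the let-7a sequence UGAGGUAGUAGGUUGUAUAGUU, the window CUACCUCA is classified as an 8mer, and replacing its final A with G gives a 7mer-m8. G:U wobble pairs are deliberately rejected, in line with the perfect Watson-Crick requirement stated above.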

3.4.3 Exon/Intron boundary recognition

Recognition of the boundaries between exons and introns in gene transcripts has been studied for several years, and numerous tools have been developed to align cDNAs against genomic sequences. In this work, the cDNA sequences are obtained from UniGene and the genomic sequences are obtained from GenBank (134). Three tools, SIM4 (142), Splign (143) and Spidey (144), are used to recognize these boundaries. An exon/intron boundary on a transcript is accepted only when it is confirmed by at least two tools. In total, around one million exons from 150,000 transcripts in about 30,000 genes were recognized.
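
The two-out-of-three confirmation rule can be sketched as a simple vote. This is an illustration under stated assumptions, not the pipeline used by miRTar: it assumes each tool's output has already been parsed into (start, end) exon coordinates on the same genomic sequence, and it treats "confirmed" as exact coordinate agreement. The function name and the input format are hypothetical.

```python
from collections import Counter


def consensus_exons(predictions, min_support=2):
    """Keep exons reported with identical coordinates by enough tools.

    predictions: dict mapping a tool name to an iterable of (start, end)
                 exon coordinates on the genomic sequence.
    min_support: minimum number of tools that must agree (2 in the text).
    """
    support = Counter()
    for exons in predictions.values():
        for exon in set(exons):  # each tool votes at most once per exon
            support[exon] += 1
    return sorted(exon for exon, votes in support.items() if votes >= min_support)
```

A tolerance of a few nucleotides around each boundary could be added if exact agreement proves too strict; the source does not say which convention was used.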

3.4.4 Identifying different types of alternatively spliced exons

The five well-defined types of alternatively spliced exons are skipped (cassette) exons, alternative 5' splice sites, alternative 3' splice sites, mutually exclusive exons and retained introns (130). In this work, to identify the different exon types, the transcripts collected from UniGene were aligned pairwise. First, each mRNA sequence was converted into a bit string of ones and zeros. Then, the logical operations (XOR, AND, OR) described in SpliceInfo (145) were performed. In addition, alternatively spliced exons annotated in ASTD (133) can be downloaded from the ASTD website. Of the five types, cassette exons are the most numerous in both data sources, and alternative 5' splice sites outnumber alternative 3' splice sites. Mutually exclusive exons are the least numerous (Table 7).

Table 7. Statistics of the various types of alternatively spliced exons in the two data sources.

Types of alternatively spliced exons    ASTD      UniGene
No. of cassette exons                   34,435    9,361,222
No. of alternative 5' splice sites      6,469     1,030,325
No. of alternative 3' splice sites      3,720     913,112
No. of mutually exclusive exons         3,384     9,401
No. of intron retentions                9,639     75,481
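
The bit-string comparison used in this section can be sketched as follows. This is a simplified illustration, not the SpliceInfo code: it assumes that each transcript is encoded over the genomic coordinates of one locus, with a 1 wherever a position is exonic, and that exons are given as half-open (start, end) intervals. All function names are hypothetical.

```python
def exons_to_bits(exons, length):
    """Encode exons as an integer bit string: bit i is 1 when genomic
    position i (0-based) is exonic. Exons are half-open (start, end)."""
    bits = 0
    for start, end in exons:
        if not 0 <= start < end <= length:
            raise ValueError("exon outside the locus")
        bits |= ((1 << (end - start)) - 1) << start
    return bits


def runs_of_ones(bits):
    """Return half-open (start, end) intervals covering each run of 1 bits."""
    runs, pos = [], 0
    while bits:
        if bits & 1:
            start = pos
            while bits & 1:
                bits >>= 1
                pos += 1
            runs.append((start, pos))
        else:
            bits >>= 1
            pos += 1
    return runs


def compare_transcripts(exons_a, exons_b, length):
    """Compare two transcripts of the same locus with AND, XOR and OR."""
    a = exons_to_bits(exons_a, length)
    b = exons_to_bits(exons_b, length)
    return {
        "shared": runs_of_ones(a & b),     # AND: exonic in both transcripts
        "differing": runs_of_ones(a ^ b),  # XOR: exonic in exactly one
        "union": runs_of_ones(a | b),      # OR: exonic in at least one
    }
```

A differing run that coincides with a whole exon of one transcript and falls inside an intron of the other is a candidate cassette exon. A differing run that extends one end of a shared exon is a candidate alternative 5' or 3' splice site, depending on strand and on which end is extended. The full classification rules are left to the cited method.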

3.4.5 Alternative splicing effects to miRNA regulation

Following the prediction of miRNA target sites against all human transcripts, the alternative splicing information was used to elucidate the miRNA-target interactions that are affected by alternative splicing. Two data sets of alternatively spliced exons were used to study how alternative splicing controls miRNA-target interactions. The first data set was obtained from ASTD (133), and the second was derived from the gene annotation in UniGene (135) and GenBank (134).

Table 8 presents the percentages of putative miRNA target sites located in each region of the transcripts collected by miRTar. Since the CDS is on average longer than the 5' UTR and the 3' UTR, miRNA target sites are generally more likely to occur within the CDS than within either UTR. Moreover, Table 5 gives the distributions of miRNA target sites within the different types of alternatively spliced exons. The miRNA target sites are identified more often in cassette exons than in the other types of alternatively spliced exons, a distribution similar to the proportions of alternatively spliced exons given in Table 7. Accordingly, when miRNA target sites are located in the alternatively spliced exons of a specific gene, various potential regulatory relationships between the miRNA and the gene can be further investigated. If a miRNA targets an alternatively spliced exon, the target site can be conditionally spliced out and is then absent from some transcripts of the gene. Therefore, alternative splicing can cause incomplete suppression of a gene by a miRNA and can affect how miRNAs regulate diverse protein functions.

Table 8. Statistics of miRNA target site locations

Region    Transcripts from ASTD    Transcripts from UniGene
5' UTR    10.46 %                  9.32 %
CDS       67.12 %                  65.83 %
3' UTR    22.41 %                  24.85 %

* miRNA target prediction parameters: MFE <= -12 kcal/mol; Score > 120.
MFE: minimum free energy of the duplex; Score: alignment score of the duplex.
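
The kind of tally reported in Table 8 can be sketched in a few lines: apply the two prediction cutoffs, then compute the share of surviving target sites per transcript region. The `Site` record and the function names are hypothetical, and the inclusive MFE bound follows the table footnote; this is not the code that produced the table.

```python
from collections import Counter
from dataclasses import dataclass

MFE_CUTOFF = -12.0    # kcal/mol, sites need MFE <= cutoff
SCORE_CUTOFF = 120.0  # sites need an alignment score > cutoff


@dataclass(frozen=True)
class Site:
    region: str   # "5'UTR", "CDS" or "3'UTR"
    mfe: float    # minimum free energy of the duplex, kcal/mol
    score: float  # alignment score of the duplex


def passes_cutoffs(site):
    """True when a predicted site satisfies both prediction cutoffs."""
    return site.mfe <= MFE_CUTOFF and site.score > SCORE_CUTOFF


def region_percentages(sites):
    """Percentage of cutoff-passing sites that fall in each region."""
    kept = [s for s in sites if passes_cutoffs(s)]
    if not kept:
        return {}
    counts = Counter(s.region for s in kept)
    return {region: 100.0 * n / len(kept) for region, n in counts.items()}
```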

3.4.6 GSEA for miRNA-regulated genes

After the prediction of miRNA targets, miRTar performs a gene set enrichment analysis (GSEA) for the miRNA-regulated genes in the KEGG pathway maps. It allows users to observe conveniently the biological pathway in which the miRNA-regulated genes participate and to determine the regulatory networks of miRNA-regulated genes.

As shown in Figure 21, the first step of the analysis is to determine the enrichment of the target genes of a specific miRNA in the various KEGG pathway maps. These maps are ranked by the p-values of the miRNA target genes in each biological pathway.

The “Title [ID]” column provides the names of the KEGG pathway maps in which the miRNA target genes are involved; the “matched genes” column presents the number of miRNA target genes in each map; and the “gene in pathway” column presents all of the genes in each map.

Figure 21. Analysis to identify miRNA target genes in KEGG pathway maps.
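
The enrichment step can be illustrated with a one-sided hypergeometric test, a common choice for over-representation analysis. The source does not state which test miRTar uses, so the test, the function names and the argument names below are all assumptions made for this sketch.

```python
from math import comb


def enrichment_p_value(population, pathway_genes, targets, matched):
    """P(X >= matched) when `targets` genes are drawn without replacement
    from `population` genes, of which `pathway_genes` are in the pathway."""
    upper = min(pathway_genes, targets)
    if not 0 <= matched <= upper:
        raise ValueError("matched must lie between 0 and min(pathway_genes, targets)")
    total = comb(population, targets)
    tail = sum(
        comb(pathway_genes, k) * comb(population - pathway_genes, targets - k)
        for k in range(matched, upper + 1)
    )
    return tail / total


def rank_pathways(rows, population, targets):
    """Sort (title, matched, pathway_genes) rows by ascending p-value."""
    scored = [
        (title, matched, size, enrichment_p_value(population, size, targets, matched))
        for title, matched, size in rows
    ]
    return sorted(scored, key=lambda row: row[3])
```

Here `matched` corresponds to the "matched genes" column and `pathway_genes` to the number of genes in the "gene in pathway" column. When many pathway maps are tested at once, a multiple-testing correction would normally be applied to the resulting p-values.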

Figure 22 shows the second step of the analysis. The miRNA target genes are marked in "slate blue" in the KEGG pathway map, and traffic-light colors represent the states of the miRNA target regions (3' UTR, 5' UTR and CDS). By changing the state, users can focus on the miRNA target region of interest within a biological pathway.

Figure 22. miRNA target genes in KEGG pathway map.

3.4.7 The approximate runtime of miRTar

Users can identify miRNA targets in a set of genes by submitting multiple miRNA sequences. The execution time for ten randomly selected miRNAs against a gene set (a FASTA file of around 20 MB) was measured on a PC server with eight CPU cores. The target genes were predicted in 8.38 s per miRNA on average, indicating that the proposed method can be used to identify miRNA targets throughout the genome.
