
Chapter 2 Related Works

2.2 miRNA Target Prediction Web Server

2.2.3 miTarget

Figure 2.9 Web page of miTarget.

Most existing miRNA target prediction programs identify targets by considering the complementarity between a miRNA and its target and the thermodynamics of the miRNA/target duplex. In contrast with those programs, miTarget [27] uses a support vector machine (SVM) classifier for miRNA target prediction.

The SVM features were designed based on the RNA secondary structure prediction results produced by the RNAfold program in the Vienna RNA Package [28, 29] and were categorized into three groups: structural features, thermodynamic features and position-based features. The general scheme of miRNA:mRNA interactions is shown in Fig. 2.10. Finally, 41 features were chosen to train the SVM model. Table 2.3 lists the top 15 contributing features.

Figure 2.10 General scheme of miRNA:mRNA interactions.

Table 2.3 The top 15 contributing features.

Rank  Feature  Score

2.3 miRNA Target Prediction Software

At present, various computational methods have been developed for identifying miRNA targets (Table 2.1). Because predicting miRNA targets is challenging, the methods fall into several categories. The most widely used approach focuses on the complementarity between a miRNA and its targets, and some methods require strict complementarity to the seed region of the miRNA [15, 16]. Besides sequence complementarity, other methods are based on thermodynamics and binding structure [18, 30, 31]. SVMs have also been used to predict miRNA targets [27].

Table 2.4 Methods and resources of miRNA target prediction programs.

Tool             Type of method                     Method availability  Data availability  Refs
miRanda          Complementarity                    Download             Yes                [17]
miRanda miRBase  Complementarity                    Online search        Yes                [22]
TargetScan       Seed complementarity               Online search        Yes                [15]
TargetScanS      Seed complementarity               Online search        Yes                [16]
DIANA microT     Thermodynamics                     Download             Yes                [31]
PicTar           Thermodynamics                                          Yes                [30]
RNAhybrid        Thermodynamics, statistical model  Download                                [18]

Each of the three prediction tools integrated in this work, miRanda, RNAhybrid and TargetScan, is described in detail below.

2.3.1 miRanda

miRanda [17] was the second published method for predicting miRNA targets. It identifies potential miRNA binding sites by searching the target sequences for regions of high complementarity using a weighted dynamic programming algorithm (Fig 2.3). The scoring matrix of this algorithm rewards pairing at the 5’ end of the miRNA more than pairing at the 3’ end, so binding sites with a perfect or nearly perfect match to the seed region of the miRNA receive a better score.

The resulting binding sites are then evaluated thermodynamically, using the Vienna RNA folding package [28, 29].

Figure 2.11 System flow of miRanda.

2.3.2 RNAhybrid

RNAhybrid [18] recognizes regions in the 3’-UTRs that have the potential to form a thermodynamically favorable duplex with a specific miRNA. The core algorithm of RNAhybrid is an extension of RNA secondary structure prediction. Instead of folding a single sequence back onto itself as MFold does, RNAhybrid determines the most favorable hybridization site between a miRNA and its potential target using an artificial linker.

Intramolecular hybridizations, that is, base pairings between target nucleotides or between miRNA nucleotides, are not allowed. Because the time complexity of the algorithm is linear in the target length, many long sequences can be searched in a short time. RNAhybrid is available at http://bibiserv.techfak.uni-bielefeld.de/rnahybrid/.

Figure 2.12 Web page of RNAhybrid.

2.3.3 TargetScan

TargetScan [15] was the first method applied to human miRNA target prediction that used the mouse, rat and fish genomes for conservation analysis.

Unlike methods that search for complementary sites in general, TargetScan requires perfect complementarity to the seed region, which is positions 2-8 of a miRNA numbered from the 5’ end. This requirement reduces false positives at the very beginning of the prediction process.

Moreover, TargetScan also considers the thermodynamic stability of each potential binding site using RNAFold from the Vienna Package [32].
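The seed requirement can be illustrated with a short Python sketch. It is not TargetScan's implementation: the function names (`seed_site`, `find_seed_matches`) are hypothetical, the miRNA is assumed to be given as an uppercase RNA string, and only Watson-Crick complementarity to miRNA positions 2-8 is checked, without conservation or thermodynamic filtering.

```python
def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna))


def seed_site(mirna: str) -> str:
    """Target-side sequence that pairs perfectly with miRNA positions 2-8 (1-based)."""
    seed = mirna[1:8]  # positions 2..8 counted from the 5' end
    return reverse_complement(seed)


def find_seed_matches(mirna: str, utr: str) -> list[int]:
    """Return the 0-based start positions in the 3'-UTR that match the seed site."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")  # accept DNA-style or RNA-style input
    hits = []
    start = utr.find(site)
    while start != -1:
        hits.append(start)
        start = utr.find(site, start + 1)  # allow overlapping matches
    return hits
```

A caller would pass one miRNA and one 3’-UTR at a time and treat every returned position as a candidate site for the later filters.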

Figure 2.13 Web page of TargetScan.

2.3.4 MirTarget

MirTarget [33] is an algorithm for detecting miRNA targets. It combines parameters relevant to miRNA target recognition and heuristically assigns different weights to these parameters according to their relative importance. In the first step, the miRNA seed sequence (positions 2–8) is scanned against all human 3’-UTR sequences to identify perfect complementarity using a computer hashing technique. Then the level of cross-species conservation of the seed pairing is examined.

MirTarget evaluates orthologous sequences from five organisms, and a gene candidate is rejected if perfect seed pairing is not found in the orthologs from at least three organisms. The stability of the miRNA/target site duplex is evaluated by the binding free energy (ΔG), which is computed using RNAFold [29]. A candidate target site is rejected if its ΔG value is higher than -13 kcal/mol. If a candidate site passes these screening filters, local sequence alignment is performed to extend the alignment between the miRNA and the 22 bases downstream of the seed-binding site in the 3’-UTR. Bases surrounding the seed sequence are important for target recognition [16], so limited seed extension is evaluated for pairing to miRNA positions 1, 9 and 10. The longest stretch of perfect matches (including positions 2–8) is considered the extended seed for raw score calculation. Different weights are assigned in the following order of relative importance: seed conservation > limited seed extension > duplex binding stability > terminal base match. A score is recorded if it is no less than the threshold value of 30.
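The two screening filters described above (conservation in at least three of five organisms, and ΔG no higher than -13 kcal/mol) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `CandidateSite` record and `passes_screening` function are hypothetical names, and the conservation count and ΔG value are assumed to have been computed beforehand (for example by an external folding program); the weighted scoring step is not modeled.

```python
from dataclasses import dataclass

MIN_CONSERVED_ORTHOLOGS = 3   # of the five organisms examined
MAX_DELTA_G = -13.0           # kcal/mol; sites with a higher value are rejected


@dataclass
class CandidateSite:
    gene: str
    orthologs_with_seed: int  # organisms whose ortholog keeps the perfect seed pairing
    delta_g: float            # precomputed duplex binding free energy in kcal/mol


def passes_screening(site: CandidateSite) -> bool:
    """Apply the conservation filter, then the duplex stability filter."""
    if site.orthologs_with_seed < MIN_CONSERVED_ORTHOLOGS:
        return False
    if site.delta_g > MAX_DELTA_G:
        return False
    return True
```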

Figure 2.14 The simple flowchart for MirTarget. (Wang, X, 2006)

Chapter 3 Materials and Method

3.1 Materials

In the systematic method for identifying miRNA targets proposed in this work, we integrated several biological data sources and computational programs. Table 3.1 and Table 3.2 list the biological data sources and the prediction programs integrated in this work, respectively.

Table 3.1 Resources of biological data.

Category                 Data Source  Version  Link                                       Ref.
Genome Sequence          Ensembl      49       http://www.ensembl.org/index.html          [34]
Known miRNA Sequence     miRBase      11.0     http://microrna.sanger.ac.uk/sequences/    [21]
Gene Expression Profile  NCBI GEO     -        http://www.ncbi.nlm.nih.gov/projects/geo/  [35]

Table 3.2 Resources of computational tools.

Category                          Tool Name   Version  Ref.
miRNA Target Prediction           miRanda     v 1.9    [17]
                                  RNAhybrid   v 2.1    [18]
                                  TargetScan  v 1.0b   [15]
Target Accessibility Calculation  Sfold                [36]

3.1.1 miRNA sequences

miRBase::Sequences provides miRNA sequence data, annotation, references and links to other resources for all published miRNAs. The latest version (release 11.0) of the database contains 6396 entries representing hairpin precursor miRNAs, expressing 6211 miRNA products from 72 species, a rapid growth of over 2000 sequences in the past two years.

Figure 3.1 The growth of miRBase from 2002 to 2008.

In this work, we extracted 678 human miRNAs from miRBase::Sequences (release 11.0).

3.1.2 Target genes

Several previous studies indicated that miRNA target sites are conserved across species. In target prediction, restricting the search to target sites conserved across multiple species is likely to reduce false positives and to increase prediction efficiency [15, 17, 37]. Thus, in this work we retrieved 15,314 3’-UTRs of 7,907 human genes from the UCSC Genome Browser [38].

3.1.3 Sfold

Figure 3.2 Web page of Sfold.

Sfold is an RNA secondary structure prediction tool based on a statistical algorithm. In addition, Sfold can be employed to predict the accessible target regions for RNA-targeting nucleic acids.

The core algorithm of Sfold can be separated into two steps. In the forward step, it computes the equilibrium partition functions for all substrings of an RNA sequence. In the backward step, it uses a recursive sampling algorithm to draw secondary structures.

For the prediction of accessible sites for targeting by antisense oligonucleotides, Sfold uses a probability profiling approach based on the sampling algorithm [39]. On a profile for width W, the probability that W consecutive bases are all unpaired is plotted against the first base of the segment. A target site is considered accessible if there is at least one peak > 0.5, moderately accessible for a peak with probability between 0.3 and 0.6, and of low potential for a site with probability < 0.3 of being single-stranded. The Sfold 2.0 application server is available at http://sfold.wadsworth.org/.
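The probability profile described above can be estimated from a set of sampled structures. The following Python sketch assumes the sample is available as dot-bracket strings of equal length, where `.` marks an unpaired base; this input format and the function name `unpaired_profile` are assumptions for illustration and do not reflect Sfold's own interface.

```python
def unpaired_profile(structures: list[str], width: int) -> list[float]:
    """For each start position, estimate the probability that `width`
    consecutive bases are all unpaired, as the fraction of sampled
    structures in which that window contains only '.' characters."""
    if not structures:
        raise ValueError("at least one sampled structure is required")
    length = len(structures[0])
    if any(len(s) != length for s in structures):
        raise ValueError("all sampled structures must have the same length")
    if width < 1 or width > length:
        raise ValueError("width must be between 1 and the sequence length")

    profile = []
    for start in range(length - width + 1):
        window_open = sum(
            1 for s in structures
            if all(ch == "." for ch in s[start:start + width])
        )
        profile.append(window_open / len(structures))
    return profile
```

The peaks of the returned list can then be compared with the accessibility thresholds quoted in the text.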

3.1.4 Expression profiles of miRNA and target genes

In this work, we integrated two data sets of miRNA expression profiles that were obtained by different experimental methods, Q-PCR and a bead-based miRNA array [40], respectively.

Table 3.3 Details of expression profiles.

Category     Author     Method                       Description                                        Ref.
miRNA                   Q-PCR                        224 human miRNAs in 18 major normal human tissues
miRNA        Lu et al.  miRNA-bead array             217 mammalian miRNAs from 334 human samples        [40]
Target Gene  Su et al.  gene expression array-based  Coding genes in 79 human tissues                   [41]

In the Q-PCR data set, 224 human miRNAs were profiled in 18 major normal human tissues with a real-time PCR-based 220-plex miRNA expression profiling method to determine the tissue specificity of human miRNAs. In the Lu study, a systematic expression analysis of 217 mammalian miRNAs across 334 human samples was performed with a bead-based flow cytometric miRNA expression profiling method.

In addition to the miRNA expression profiles, we collected the gene expression profiles of coding genes in 79 human tissues. These data were obtained from NCBI GEO (GEO accession: GDS596).

Figure 3.3 Cluster analysis of GDS596.

Since a miRNA downregulates its target genes, the expression profiles of a miRNA and its target genes are typically negatively correlated. The Pearson correlation coefficient is computed from the expression profiles of each miRNA and its target gene (coding gene). There are 13 overlapping human tissues between the Q-PCR data set of miRNA expression profiles and the GDS596 data set of target gene expression profiles. The 13 overlapping tissues are listed in Table 3.4.

Table 3.4 The 13 overlapping human tissues.

Index  Tissue  Index  Tissue    Index  Tissue    Index  Tissue
1      Brain   5      Lung      9      Prostate  13     Trachea
2      Heart   6      Muscle    10     Testis
3      Kidney  7      Ovary     11     Thymus
4      Liver   8      Placenta  12     Thyroid
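The correlation between a miRNA profile and a target gene profile over the shared tissues can be computed as in the following Python sketch. It assumes that both profiles are given as equally long lists ordered by the same tissue index; the function name `pearson` is chosen for illustration, and the sketch implements the standard Pearson formula rather than any particular code used in this work.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / (spread_x * spread_y)
```

A value close to -1 across the 13 tissues would be consistent with the expected negative relationship between a miRNA and its target.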

3.2 System flow

Fig. 3.4 shows the flowchart of the systematic method for identifying miRNA targets proposed in this work.

Figure 3.4 System flow.

The inputs are a specific miRNA and its overexpression profiles.

First, we identify the downregulated genes by analyzing the miRNA overexpression profiles. This step narrows the search scope of targets and makes the prediction process more efficient. To support the input miRNAs and targets, the sequences of known miRNAs and targets were retrieved from miRBase (release 11.0, April 2008) [21] and Ensembl (release 49, March 2008) [34], respectively.

To accelerate the search for miRNA targets in the prepared target sequences, we applied a filtering strategy based on dynamic programming, named iScan. iScan is a local sequence alignment program that uses a simple sum-of-pairs (SP) scoring function. For the pairs G:C, A:T and G:U, iScan assigns scores of 6, 4 and 2, respectively; penalties of -3 and -5 are assigned for a mismatched pair and a gap, respectively. After this filtering step, only the fragments whose alignment score against a specific miRNA sequence exceeds the cutoff value are retained. These retained fragments are the candidate miRNA targets and are used as the search database.

Table 3.5 Score of each type of pairs.

       G:C  A:T  G:U  mismatch  gap
Score  6    2    4    -3        -5
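A local alignment filter of this kind can be sketched in Python with a Smith-Waterman style recurrence over base pairs. The sketch is not the iScan source code: the function names and the `cutoff` argument are hypothetical, U is written as T so that the G:U wobble pair becomes G:T, and the default pair scores follow the order given in the prose (G:C 6, A:T 4, G:U 2), which is an assumption because Table 3.5 lists the A:T and G:U values the other way round. The scores can be overridden through the `scores` argument.

```python
PAIR_SCORES = {
    frozenset("GC"): 6,
    frozenset("AT"): 4,  # assumed value, following the prose
    frozenset("GT"): 2,  # G:U wobble pair with U written as T (assumed value)
}
MISMATCH = -3
GAP = -5


def best_local_score(mirna: str, target: str, scores=PAIR_SCORES) -> int:
    """Best local pairing score between a miRNA and a target fragment.

    The miRNA is reversed so that its 3' end faces the 5' end of the
    target, which models an antiparallel duplex."""
    m = mirna.upper().replace("U", "T")[::-1]
    t = target.upper().replace("U", "T")
    previous = [0] * (len(t) + 1)
    best = 0
    for i in range(1, len(m) + 1):
        current = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            pair = scores.get(frozenset((m[i - 1], t[j - 1])), MISMATCH)
            current[j] = max(
                0,                      # start a new local alignment
                previous[j - 1] + pair, # pair the two bases
                previous[j] + GAP,      # gap in the target
                current[j - 1] + GAP,   # gap in the miRNA
            )
            best = max(best, current[j])
        previous = current
    return best


def passes_filter(mirna: str, fragment: str, cutoff: int) -> bool:
    """Keep a fragment only if its best local score exceeds the cutoff."""
    return best_local_score(mirna, fragment) > cutoff
```

Only the score is tracked here because the filter needs a yes/no decision; recovering the alignment itself would require keeping the full matrix and a traceback.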

Subsequent to the filtering process, three computational prediction tools, miRanda, TargetScan and RNAhybrid, are applied for identifying miRNA targets.

To increase the accuracy of miRNA target prediction, we set four criteria for filtering the potential miRNA targets predicted by the three computational programs described above. The first criterion is that the target site is predicted by at least two of miRanda, TargetScan and RNAhybrid. The second is that the target gene contains multiple target sites.

The third is that the target site is located in an accessible region as calculated by Sfold. The last is that the target site is located near either end of the target 3’-UTR.

All of these criteria were derived from observations of the experimentally determined miRNA target sites retrieved from TarBase, and they are elaborated in the following section of this chapter. The results that remain after filtering by these four criteria are the potential targets of the specific miRNA.

The prediction algorithm of our method is named MRT. Besides the basic information on the relationship between a miRNA and its targets, we also provide the expression data of both the miRNA and its targets to support the prediction results.

3.3 Filtering process of miRNA target prediction

To reduce false positives while retaining the most likely miRNA targets, we set four criteria by examining the experimental data retrieved from TarBase and surveying previous research. These criteria are described below.

Table 3.6 Four criteria of filtering process.

Description                                               Number  Percentage
Target site was predicted by at least two tools           28      35%
Target gene contains multiple target sites                45      56.25%
Target site locates in 5’ end or 3’ end of target 3’-UTR  55      68.75%
Target site locates in accessible regions                 10      1.25%

3.3.1 Criterion 1: Target site was predicted by at least two tools.

In this work, three commonly used computational prediction programs, miRanda, RNAhybrid and TargetScan, were applied to identify miRNA targets. This criterion retains the candidate miRNA targets that were predicted by at least two tools (Fig. 3.5).
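The agreement rule can be expressed as a simple vote count, as in the Python sketch below. It assumes that each tool's predictions have already been reduced to comparable site identifiers, here hypothetical `(gene, start)` tuples; in practice, sites reported by different tools would first need to be matched by overlapping coordinates, which this sketch does not attempt.

```python
from collections import Counter


def consensus_sites(predictions: dict[str, set], min_tools: int = 2) -> set:
    """Return the sites predicted by at least `min_tools` of the given tools.

    `predictions` maps a tool name to the set of site identifiers it reported."""
    votes = Counter(site for sites in predictions.values() for site in sites)
    return {site for site, count in votes.items() if count >= min_tools}
```

Because each tool contributes a set, a site is counted at most once per tool even if the tool reports it several times.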

Figure 3.5 Criteria of identifying miRNA targets.

3.3.2 Criterion 2: Target gene contains multiple target sites.

Previous research indicated that one gene can contain several miRNA target sites. Thus, this criterion keeps the miRNA targets that contain more than two target sites. Among the 80 experimentally supported records we retrieved from TarBase, there are 48 unique genes, and 15 of them contain multiple target sites. For example, the C. elegans miRNA let-7 binds to nine and eight sites in NRAS and KRAS, respectively [42]. In addition, HOXA7, a member of the homeobox (HOX) clusters, is regulated by miR-196 through four binding sites [43]. Thus, after this filtering step, only the genes that contain multiple target sites are kept.
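Counting sites per gene is enough to apply this criterion. The Python sketch below uses hypothetical `(gene, start)` tuples for predicted sites and leaves the minimum number of sites as a required argument, since the exact threshold is a choice of the caller; the function name is also an assumption.

```python
from collections import Counter


def genes_with_multiple_sites(sites: list[tuple[str, int]], *, min_sites: int) -> set[str]:
    """Return the genes that carry at least `min_sites` predicted target sites."""
    per_gene = Counter(gene for gene, _start in sites)
    return {gene for gene, count in per_gene.items() if count >= min_sites}
```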

3.3.3 Criterion 3: Target site locates in 5’ end or 3’ end of target 3’-UTR.

Previous studies indicated that the function of a target binding site is related to its location in the 3’-UTR. Effective target sites preferentially reside near both ends of the 3’-UTR [44, 45].

Examining the experimental data from TarBase, we divided each 3’-UTR into three equal parts (Fig 3.6A) and found that about 68.75% of the target sites are located in the two terminal parts. To be stricter, we divided each 3’-UTR into four equal parts (Fig 3.6B), and 48.75% of the target sites still reside in the two terminal quarters. Thus, this criterion keeps the potential target sites located near either end of the target 3’-UTR.
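The position test can be sketched in Python as follows. The sketch assumes 0-based site coordinates relative to the start of the 3’-UTR and classifies a site by its midpoint; both the midpoint rule and the function name `in_terminal_region` are assumptions for illustration, since the text does not state how a site spanning a boundary is assigned.

```python
def in_terminal_region(site_start: int, site_end: int, utr_length: int, parts: int = 3) -> bool:
    """True if the site midpoint lies in the first or the last of `parts`
    equal segments of the 3'-UTR (parts=3 for thirds, parts=4 for quarters)."""
    if utr_length <= 0 or parts < 2:
        raise ValueError("utr_length must be positive and parts at least 2")
    midpoint = (site_start + site_end) / 2
    segment = utr_length / parts
    return midpoint < segment or midpoint >= utr_length - segment
```

Calling the function with `parts=4` reproduces the stricter quarter-based division.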

Figure 3.6 Criterion 3 of identifying miRNA targets.

3.3.4 Criterion 4: Target site locates in accessible regions.

The structural elements in RNA secondary structure include helix, hairpin loop, bulge loop, interior loop and multi-branched loop. These elements make the RNA secondary structure more complicated.

Several studies suggested that the structure of a miRNA target affects miRNA binding. The sequence context surrounding a miRNA target site influences the binding affinity of the miRNA/target duplex. Kertesz et al. [46] indicated that secondary structure contributes to target recognition, because there is an energetic cost to free base-pairing interactions within the mRNA in order to make the target accessible for miRNA binding (Fig. 3.7). Long et al. [47] proposed an accessibility model of miRNA target sites for predicting miRNA targets and successfully interpreted published in vivo data on C. elegans reporter genes that contain modified lin-41 3’-UTR sequences.

Figure 3.7 Energetic cost to free base-pairing interactions (Long, D., et al. 2007).

Figure 3.8 Criterion 4 of identifying miRNA targets.

In this work, target sites located in accessible regions are considered more likely to be real miRNA binding sites, as shown in Fig. 3.8. The accessibility of the target sequence is calculated by Sfold.

Chapter 4 Results

4.1 Case study: miR-124

In this work, we used miR-124 as an example. miR-124 is highly expressed in brain and kidney[40]. miR-124a was first identified by cloning studies in mouse[48] and its expression was later verified in human embryonic stem cells[40, 49]. There are 183 known miR-124 targets in TarBase.

Figure 4.1 Bead-array miRNA expression profile of miR-124.

We downloaded the miR-124 overexpression profiles of one published study from the NCBI GEO database [35] (accession GSE6207). In the Wang study [33], miR-124 and a negative control miRNA were transfected into the HepG2 cell line using the Reverse Transfection protocol recommended by Ambion. The changes in global gene expression profiles were evaluated by microarray experiments at 4, 8, 16, 24, 32, 72 and 120 h post transfection using the Affymetrix human U133Plus2 chip.

To narrow down the candidate target database, we analyzed the expression profiles to identify the downregulated genes before applying the computational prediction programs. Array signals were normalized using R, a project for statistical computing. A gene was defined as downregulated if its expression was reduced by at least 50% compared with the negative control (fold change < -1).
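The downregulation rule can be sketched in Python as follows. The sketch interprets the threshold as a log2 ratio of at most -1 between the transfected sample and the negative control, which corresponds to a reduction of at least 50%; this interpretation, the dictionary inputs keyed by gene symbol and the function name are assumptions, and the signals are assumed to be already normalized.

```python
import math


def downregulated_genes(treated: dict[str, float],
                        control: dict[str, float],
                        threshold: float = -1.0) -> list[str]:
    """Return the genes whose log2(treated / control) is at or below `threshold`."""
    hits = []
    for gene, control_signal in control.items():
        treated_signal = treated.get(gene)
        # Skip genes that are missing or have non-positive signals (log undefined).
        if treated_signal is None or treated_signal <= 0 or control_signal <= 0:
            continue
        if math.log2(treated_signal / control_signal) <= threshold:
            hits.append(gene)
    return sorted(hits)
```

Running the function once per time point gives the per-time-point counts of downregulated genes.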

Figure 4.2 The amount of downregulated genes at each time point.

Examining the expression data, only a small number of genes were downregulated by miR-124 at the early stage (4 and 8 hours). The number of downregulated targets increased rapidly from 16 to 72 hours, and the 72-hour time point has the most downregulated genes. However, the increase in downregulated targets slows down at the later time points. The number of downregulated genes at each time point is shown in Fig. 4.2.

In this work, 744 genes were considered candidate targets, 46 of which are recorded in TarBase as experimentally supported target genes of miR-124. After going through the system flow described above, 227 of these candidate genes, containing 709 target sites, were predicted as potential targets of miR-124.

Figure 4.3 The number of target sites satisfying the four criteria.

As shown in Fig. 4.3, a large number of target sites satisfied criterion 2 (target gene contains multiple target sites) and criterion 3 (target site locates in 5’ end or 3’ end of target 3’-UTR). In contrast, only a small percentage of the predicted target sites satisfied criterion 1 (target site was predicted by at least two tools) and criterion 4 (target site locates in accessible regions).

As described above, the candidate targets include 46 experimentally tested miR-124 target genes. Of these, 39 were predicted as potential targets by the systematic method. Furthermore, three genes that satisfied all four criteria described above are also known targets of miR-124.

Table 4.1 The 39 experimentally supported targets of hsa-miR-124 predicted by MRT.

Gene      Type                     Indirect Support        Paper
ACAA2     Downregulation/Cleavage  Microarray assay        Lim et al, 2005
ARAF1     Downregulation/Cleavage  Microarray assay        Lim et al, 2005
ATP6V0E   Downregulation/Cleavage  Microarray assay        Lim et al, 2005
FN5       Downregulation/Cleavage  Microarray assay        Lim et al, 2005
C14orf24  Downregulation/Cleavage  Microarray assay        Lim et al, 2005
FLJ20364  Downregulation/Cleavage  Microarray assay        Lim et al, 2005
CD164     Downregulation/Cleavage  Microarray assay        Lim et al, 2005
RAM2      Downregulation/Cleavage  Microarray assay        Lim et al, 2005
CDK4      Downregulation/Cleavage  Microarray assay        Lim et al, 2005
CHSY1     Downregulation/Cleavage  Microarray assay        Lim et al, 2005
ELOVL1    Downregulation/Cleavage  Microarray assay        Lim et al, 2005
ELOVL5    Downregulation/Cleavage  Real-time RT-PCR assay  Wang et al, 2006
F11R      Downregulation/Cleavage  Microarray assay        Lim et al, 2005
G3BP      Downregulation/Cleavage  Microarray assay        Lim et al, 2005
HADHSC    Downregulation/Cleavage  Microarray assay        Lim et al, 2005
ITGB1     Downregulation/Cleavage  Microarray assay        Lim et al, 2005
LASS2     Downregulation/Cleavage  Microarray assay        Lim et al, 2005
LITAF     Downregulation/Cleavage  Microarray assay        Lim et al, 2005
LRRC1     Downregulation/Cleavage  Microarray assay        Lim et al, 2005
NEK6      Downregulation/Cleavage  Microarray assay        Lim et al, 2005
NME4      Downregulation/Cleavage  Microarray assay        Lim et al, 2005
PLOD3     Downregulation/Cleavage  Microarray assay        Lim et al, 2005
POLR3G    Downregulation/Cleavage  Microarray assay        Lim et al, 2005
PTBP1     Downregulation/Cleavage  Microarray assay        Lim et al, 2005
PTPN12    Downregulation/Cleavage  Microarray assay        Lim et al, 2005
RYK       Downregulation/Cleavage  Microarray assay        Lim et al, 2005
SLC15A4   Downregulation/Cleavage  Microarray assay        Lim et al, 2005
SUCLG2    Downregulation/Cleavage  Real-time RT-PCR assay  Wang et al, 2006
SURF4     Downregulation/Cleavage  Real-time RT-PCR assay  Wang et al, 2006
SYPL      Downregulation/Cleavage  Microarray assay        Lim et al, 2005
TEAD1     Downregulation/Cleavage  Microarray assay        Lim et al, 2005
TEAD1     Downregulation/
