

3.1.2 Experimental approach for quantifying the binding of proteins to specific DNA

An electrophoretic mobility shift assay (EMSA), also referred to as mobility shift electrophoresis, a gel shift assay, gel mobility shift assay, band shift assay, or gel retardation assay, is a common affinity electrophoresis technique used to study protein:DNA and protein:RNA interactions [47]. The procedure can determine whether a protein or mixture of proteins is capable of binding to a given DNA or RNA sequence, and can sometimes indicate whether more than one protein molecule is involved in the binding complex, based on differences in electrophoretic mobility in polyacrylamide gels. An overview of EMSA is shown in Figure S5.

3.2 Related works

Identification of TF binding sites is a field that has evolved rapidly in recent years. Computational approaches, mostly sequence-based, are generally designed to solve two related problems: (1) for a given set of sequences that harbor the binding sites of a particular TF, find the location of the sites; and (2) for a given set of known binding sites, develop a representation of their binding signature and use it to scan new sequences for additional binding sites [48]. Table 9 is an overview of methods involved in TFBS discovery.


Table 9 Bioinformatics tools for discovery of TFBSs/motifs (Sacha, 2009).

TFBS/motif discovery tool | Description | URL | Reference
MEME | Motif discovery algorithm | http://meme.sdsc.edu/ | [49]
AlignACE | Motif discovery algorithm | http://atlas.med.harvard.edu/ | [50]
MDScan/BioProspector | Motif discovery algorithm | http://seqmotifs.stanford.edu/ | [51]
Consensus | Motif discovery algorithm | ftp://www.genetics.wustl.edu/pub/stormo/Consensus/ | [52]
PhyloCon | Motif discovery algorithm | http://ural.wustl.edu/~twang/PhyloCon/ | [52]
PhyloGibbs | Motif discovery algorithm | http://www.phylogibbs.unibas.ch | [53]
RSAT | Large series of regulatory analysis tools, containing oligonucleotide and dyad analysis | http://rsat.ulb.ac.be/rsat/ | [54]
SCOPE | Ensemble motif discovery tool | http://genie.dartmouth.edu/scope/ | [55]
MotifVoter | Ensemble motif discovery tool | http://compbio.ddns.comp.nus.edu.sg/~edward/MotifVoter2/ | [56]
rVista | Phylogenetic footprinting tool | http://rvista.dcode.org/ | [57]
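Problem (2) described before the table, scanning new sequences with a known binding signature, can be illustrated with a short sketch. It assumes the signature is a position probability matrix scored as a log-odds ratio against a uniform background. The matrix values, the threshold, and the function names are illustrative assumptions and are not taken from any of the tools listed.

```python
import math

# Hypothetical position probability matrix for a 4-bp motif (illustrative values only).
PPM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
# Assumed uniform background; a real analysis would estimate this from the genome.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_score(window, ppm=PPM, background=BACKGROUND):
    """Sum of log2(p_motif / p_background) over the positions of the window."""
    return sum(math.log2(col[base] / background[base])
               for col, base in zip(ppm, window))


def scan(sequence, threshold, ppm=PPM, background=BACKGROUND):
    """Return (start, window, score) for each window scoring at or above threshold.

    Only the characters A, C, G and T are supported in this sketch.
    """
    width = len(ppm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = log_odds_score(window, ppm, background)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    for start, window, score in scan("TTAGCTAA", threshold=4.0):
        print(start, window, round(score, 2))
```

In practice the threshold is a trade-off: a lower value recovers weaker sites at the cost of more false positives.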


MEME

MEME [49] (http://meme.sdsc.edu/) is a tool for discovering motifs in a group of related DNA or protein sequences. A motif is a sequence pattern that occurs repeatedly in a group of related protein or DNA sequences. MEME represents motifs as position-dependent letter-probability matrices which describe the probability of each possible letter at each position in the pattern. Individual MEME motifs do not contain gaps. Patterns with variable-length gaps are split by MEME into two or more separate motifs.
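To make the matrix representation concrete, the following minimal sketch builds a position-dependent letter-probability matrix from a set of already aligned, ungapped sites. This is not MEME's discovery algorithm, which has to locate the sites in unaligned sequences; it only shows the form of the result. The example sites, the function name, and the optional pseudocount are assumptions made for illustration.

```python
from collections import Counter

ALPHABET = "ACGT"


def letter_probability_matrix(sites, pseudocount=0.0):
    """Return one {letter: probability} dict per motif position.

    `sites` must be equal-length, ungapped strings over A, C, G, T.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length (no gaps)")
    total = len(sites) + pseudocount * len(ALPHABET)
    matrix = []
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        matrix.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return matrix


if __name__ == "__main__":
    # Made-up aligned sites, not real binding-site data.
    example_sites = ["TGTGA", "TGTGA", "TGCGA", "AGTGA"]
    for pos, column in enumerate(letter_probability_matrix(example_sites), start=1):
        print(pos, column)
```

Each row of the output is one motif position, and the four probabilities in a row sum to 1.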

AlignACE

AlignACE [50] (http://atlas.med.harvard.edu/) is a motif-finding algorithm. Whole-genome mRNA quantitation can be used to identify the genes that are most responsive to environmental or genotypic change. By searching for mutually similar DNA elements among the upstream non-coding DNA sequences of these genes, AlignACE identifies candidate regulatory motifs and the corresponding candidate sets of coregulated genes.

MDScan

MDScan [51] (http://robotics.stanford.edu/~xsliu/MDscan/) is a fast and accurate motif-finding algorithm with applications to chromatin immunoprecipitation microarray (ChIP-array) experiments. Motif Discovery scan (MDscan) examines the ChIP-array-selected sequences and searches for DNA sequence motifs representing the protein-DNA interaction sites. MDscan combines the advantages of two widely adopted motif search strategies, word enumeration and position-specific weight matrix updating, and incorporates the ChIP-array ranking information to accelerate searches and enhance their success rates.


BioProspector

BioProspector [51] (http://robotics.stanford.edu/~xsliu/BioProspector/) discovers conserved DNA motifs in the upstream regulatory regions of co-expressed genes. BioProspector, a C program using a Gibbs sampling strategy, examines the upstream regions of genes in the same gene expression pattern group and looks for regulatory sequence motifs. It uses zero- to third-order Markov background models whose parameters are either given by the user or estimated from a specified sequence file.
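As an illustration of what estimating a Markov background model from a sequence file involves, the sketch below counts how often each base follows each context of `order` preceding bases and normalizes the counts into conditional probabilities. It is a simplified stand-in written for this text, not BioProspector's code; the function name and the pseudocount parameter are assumptions.

```python
from collections import Counter, defaultdict

ALPHABET = "ACGT"


def estimate_markov_background(sequences, order, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) from the given sequences.

    Returns {context: {base: probability}}. With order=0 the only context is
    the empty string and the model reduces to plain base frequencies.
    Sequences are assumed to contain only A, C, G and T.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            context = seq[i - order:i]
            counts[context][seq[i]] += 1
    model = {}
    for context, following in counts.items():
        total = sum(following[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
        model[context] = {b: (following[b] + pseudocount) / total for b in ALPHABET}
    return model


if __name__ == "__main__":
    # Toy input; a real run would read upstream sequences from a file.
    background = estimate_markov_background(["ACGTACGTTTGACA"], order=2)
    for context in sorted(background):
        print(context, background[context])
```

Higher orders capture more sequence context but need more data, since a third-order model already has 64 contexts to estimate.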

PhyloCon

PhyloCon [52] (http://ural.wustl.edu/~twang/PhyloCon/), short for Phylogenetic Consensus, is an algorithm for regulatory motif identification. PhyloCon takes into account both conservation among orthologous genes and co-regulation of genes within a species. The algorithm first aligns conserved regions of orthologous sequences into multiple sequence alignments, or profiles, and then compares profiles representing non-orthologous sequences. Motifs emerge as common regions in these profiles.

PhyloGibbs

PhyloGibbs [53] (http://www.phylogibbs.unibas.ch) is an algorithm for discovering regulatory sites in a collection of DNA sequences, including multiple alignments of orthologous sequences from related organisms. Many existing approaches search either for sequence motifs that are overrepresented in the input data or for sequence segments that are more evolutionarily conserved than expected. PhyloGibbs combines these two approaches and identifies significant sequence motifs by taking both over-representation and conservation signals into account.


RSAT

RSAT [54] (http://rsat.ulb.ac.be/rsat/), the Regulatory Sequence Analysis Tools, is a suite of programs for analyzing regulatory sequences. The suite includes programs for sequence retrieval, pattern discovery, phylogenetic footprint detection, pattern matching, genome scanning and feature map drawing. Random controls can be performed with random gene selections or by generating random sequences according to a variety of background models (Bernoulli, Markov).
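The idea of a random-sequence control under a Bernoulli background model, in which every position is drawn independently from fixed base probabilities, can be sketched as follows. This does not reproduce any RSAT program; the function name, the base probabilities, and the seed are assumptions chosen for illustration.

```python
import random


def random_bernoulli_sequence(length, probs, rng):
    """Draw `length` bases independently according to `probs` ({base: probability})."""
    bases = list(probs)
    weights = [probs[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the control set is reproducible
    # Hypothetical AT-rich background, not measured from any genome.
    at_rich = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
    controls = [random_bernoulli_sequence(60, at_rich, rng) for _ in range(3)]
    for seq in controls:
        print(seq)
```

Running the same motif search on such control sequences gives a rough sense of how many hits arise by chance alone.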

SCOPE

SCOPE [55] (http://genie.dartmouth.edu/scope/) is an ensemble of programs aimed at identifying novel cis-regulatory elements from groups of upstream sequences. The SCOPE motif finder is designed to identify candidate regulatory DNA motifs from sets of genes that are coordinately regulated. It uses an ensemble of three programs behind the scenes to identify different kinds of motifs: BEAM identifies nondegenerate motifs (e.g. ACGTGC), PRISM identifies degenerate motifs (e.g. AWCGRYH), and SPACER identifies bipartite motifs (e.g. ACCNNNNNNNNNGTT).
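The three example motifs are written with IUPAC nucleotide codes, where W stands for A or T, R for A or G, Y for C or T, H for any base except G, and N for any base. The sketch below shows one way to read such notation by translating it into a regular expression and locating matches. It illustrates the notation only and is not how BEAM, PRISM, or SPACER search for motifs; the function names and test sequences are assumptions.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile an IUPAC motif into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[ch] for ch in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(motif, sequence):
    """Return (start, matched_text) for every occurrence of the motif."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_sites("ACGTGC", "TTACGTGCAA"))    # nondegenerate motif
    print(find_sites("AWCGRYH", "GGATCGACAGG"))  # degenerate motif
    bipartite = "ACCNNNNNNNNNGTT"                # two half-sites separated by a spacer
    print(find_sites(bipartite, "TT" + bipartite.replace("N", "A") + "CC"))
```

A bipartite motif is simply two short conserved half-sites joined by a run of N positions, so the same translation handles all three motif classes.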

MotifVoter

MotifVoter [56] (http://compbio.ddns.comp.nus.edu.sg/~edward/MotifVoter2/) is a variance-based ensemble method for the discovery of binding sites. Although existing ensemble methods overall perform better than stand-alone motif finders, the improvement gained is not substantial, which is the limitation MotifVoter aims to address.

rVista

rVista [57] (http://rvista.dcode.org/) is a tool for the evolutionary analysis of transcription factor binding sites. It offers a powerful approach for eliminating the TFBS predictions least likely to be biologically relevant. The rVISTA tool combines TFBS predictions, sequence comparisons and cluster analysis to identify noncoding DNA regions that are evolutionarily conserved and present in a specific configuration within genomic sequences.
