Systems Biology

Full text

(1)

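(2) Systems biology is an academic field that seeks to integrate information from different levels in order to understand how biological systems function. By studying the relationships and interactions among the parts of a biological system (for example, the gene and protein networks involved in cell signaling, metabolic pathways, organelles, cells, physiological systems, and organisms), systems biology hopes ultimately to build an understandable model of the whole system. The 2nd International Conference on Systems Biology (ICSB 2001) explained "systems biology" as follows: systems biology is a comprehensive, quantitative study of the entire processes of an organism, rather than of any single part of it. The aim is to build models and to verify by experiment the behavior of the organism that they predict. Put simply, this approach uses techniques from information science and micro-electromechanical engineering to study biological problems, with the hope of eventually using computational results to predict the behavior of cells, organs, systems, and even whole organisms. Systems biology research can be divided into four parts: computation, analysis, technology, and genomics. The four form a continuous cycle that establishes a new research model (Kitano, 2002), and the tools developed from this model are used to solve the research problems that biologists face.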

(3) The Central Dogma. DNA. Transcription. RNA. Translation. Protein. ChIP-on-chip. Array CGH. Expression array. Tissue array. Protein chip.

(4) Research. Population. Organism. Development. Data organization: Basic Model. Resource. Taxonomy. Genome Project. Genome Resources. Phenotype. Genetics. Map Viewer. OMIM, OMIA. Chromosomes. Scaffolds. Gene. mRNA. Proprotein. Mature peptide. Structure. Genome. Trace. Gene. Nucleotide. Protein. RefSeq. MMDB, Cn3D. CDD. Domains. Books. Function. Disease. Phenotype. OMIM, OMIA. PubMed. PubMed Central.

(5) High throughput. High-throughput screening (HTS) is a method for scientific experimentation especially used in drug discovery and relevant to the fields of biology and chemistry. Petri dish. 96-well plate. 384-well plate.

(6) High Throughput Technologies: The Future of Molecular Diagnosis. High Throughput Technologies (HTTs) were developed to produce huge amounts of information from genome projects, but they have clear potential in mass screening and diagnostics of infectious diseases. The application of HTTs may revolutionize diagnostic techniques by replacing multiple individual assays. Genomics. Microarray, EST, SAGE. mRNA. Gene. Proteomics, 3D modeling. Protein. Gene products. Nucleotide sequencing. Amino acid sequencing.

(7) [Figure: from gene to biological system.] Gene: upstream regulatory element, promoter, upstream activating sequence, transcription activator, upstream repressing sequence, transcription repressor, transcription regulator. Expression regulation: sequencing of cDNA libraries (Treatments 1, 2, 3; genes A, B, C) into a common database; serial analysis of gene expression; microarray expression; two-dimensional electrophoresis; candidate gene. Protein products. Functional protein: structural determination, activity profiling, quantitative protein profiling, post-translational modification analysis (inferred activity), protein linkage map (catalog and dynamic), subcellular localization. Biological system: database and text mining, data integration, annotation, model, similar model, system simulation.

(8)

(9) Genomics.

(10) Genome Sequencing Project. Sequences. Library construction. Genome sequencing. Comparison: transcriptome (transcribed DNA sequence), proteome (peptide sequence), genome (related genomic sequence). Genome assembly. Identification. Gap closure. Functions. Gene prediction. Gene annotation. Genome analysis. ORFeome-based functional genomics. RNAi phenotypes. Gene knockout. Expression microarray.

(11) Let's go for a genome project.

(12) Genome Sequencing Project Flowchart. [Figure: flowchart with steps numbered 1 to 6.]

(13) Sequencing the Plasmodium falciparum genome. Isolated individual chromosomes by PFGE. Fragmented the DNA and cloned it into E. coli. Generated thousands of sequences from randomly selected fragments. Assembled the random sequences to form "contigs"; hundreds of fragments per chromosome were obtained. Ordered the fragments and closed the gaps. Identified the genes, compared them to genes in other organisms, and annotated the genome sequence. Karyotype (chromosome: size): 13-14: 3.2, 3.4 Mb; 12: 2.4 Mb; 11: 2.3 Mb; 10: 2.1 Mb; 5-9: 1.6-1.8 Mb; 4: 1.4 Mb; 3: 1.2 Mb; 2: 1.0 Mb; 1: 0.8 Mb.

(14) Karyotype. Genome size prediction. Methodology. Coverage decision. [Figure: unigene number (10^3 to 10^5) plotted against genome size (1 to 1000 Mb) for protozoa: Trichomonas vaginalis, Trypanosoma cruzi, Entamoeba histolytica, Leishmania major, Trypanosoma brucei, Plasmodium yoelii yoelii, Theileria parva, Plasmodium falciparum, Theileria annulata, Cryptosporidium parvum.]

(15) Karyotype. Genome size prediction. Methodology. Coverage decision. Plasmodium falciparum: ~30 million base pairs (30 Mb); 80% (A+T); 14 chromosomes; DNA "unstable" in E. coli; no large-insert DNA clones suitable for sequencing; too large for whole genome shotgun; the whole chromosome shotgun strategy was selected. Karyotype (chromosome: size): 13-14: 3.2, 3.4 Mb; 12: 2.4 Mb; 11: 2.3 Mb; 10: 2.1 Mb; 5-9: 1.6-1.8 Mb; 4: 1.4 Mb; 3: 1.2 Mb; 2: 1.0 Mb; 1: 0.8 Mb.

(16) Genome size and sequencing strategies. Genome size (log Mb, axis from 0 to 4): H. sapiens (3,000 Mb), D. melanogaster (170 Mb), C. elegans (100 Mb), P. falciparum (30 Mb), S. cerevisiae (14 Mb), E. coli (4 Mb). Strategies: whole genome shotgun (WGS); clone-by-clone; whole chromosome shotgun (WCS); whole genome shotgun (WGS) with clone 'skims'.
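The slide places genomes on a log10(Mb) axis. A minimal sketch of that conversion, using only the sizes listed on the slide; the names `GENOME_SIZE_MB` and `log_genome_size` are illustrative and not from the original deck:

```python
import math

# Genome sizes in megabases (Mb), as listed on the slide.
GENOME_SIZE_MB = {
    "H. sapiens": 3000,
    "D. melanogaster": 170,
    "C. elegans": 100,
    "P. falciparum": 30,
    "S. cerevisiae": 14,
    "E. coli": 4,
}


def log_genome_size(size_mb):
    """Return log10 of a genome size given in Mb (the slide's axis unit)."""
    return math.log10(size_mb)


# Position of each genome on the log axis, rounded to two decimals.
LOG_SIZES = {name: round(log_genome_size(mb), 2) for name, mb in GENOME_SIZE_MB.items()}

for name, value in LOG_SIZES.items():
    print(f"{name:16s} {value:.2f}")
```

On this scale P. falciparum sits between S. cerevisiae and C. elegans, about two orders of magnitude below H. sapiens.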

(17) A History of Genome Sequencing. 1981: Sanger et al. sequence Lambda (50 Kbp) by the shotgun method. Cloning: BACs permit 100-250 Kbp inserts. Technology: cycle sequencing (linear PCR) permits efficient sequencing of both insert ends; capillaries improve accuracy and efficiency. 1998: 3% of the human genome has been sequenced using a BAC-based hierarchical plan. Common wisdom is that the shotgun approach does not scale beyond BACs, save for simple bacterial sequences. 2001: 97% of the chromatin of the human genome has been determined. Mouse, Drosophila, rice, Fugu, and Anopheles have all been sequenced with a whole genome shotgun approach.

(18) Whole Genome Shotgun. Whole Chromosome Shotgun. BAC clone Shotgun. Chromosome isolation. Full shotgun sequencing: genomic DNA; shearing/sonication; subclone and sequence; shotgun reads; assembly; contigs; finishing reads; finishing; complete sequence. Whole genome shotgun: create clone library; sequence the two ends of each clone; assemble the overlapping reads into contigs; assemble the contigs into supercontigs; align the supercontigs to the genome; genome finishing.

(19) Whole Genome Shotgun. Whole Chromosome Shotgun. BAC clone Shotgun. Chromosome isolation. Whole genome shotgun sequencing strategy.

(20) Building Supercontigs (Scaffold). Gene Prediction (GENSCAN). Supercontig creation and gap filling. (A) A supercontig is constructed by successively linking pairs of contigs that share at least two forward-reverse links. Here, 3 contigs are joined into one supercontig. The layout now consists of a number of supercontigs with interleaved gaps. Most gaps belong to regions marked as repeat contigs; some correspond to regions with insufficient shotgun reads. (B) Arachne attempts to fill gaps by using paths of contigs. The first gap in the supercontig shown here is filled with one contig, and the second gap is filled by a path consisting of two contigs.
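The linking rule in (A) can be illustrated with a small sketch. This is not Arachne's implementation: it only groups contigs that share at least two forward-reverse links, using a union-find structure, and it ignores contig order, orientation, and gap sizes. The function name `build_supercontigs` and the contig names are hypothetical.

```python
from collections import Counter


def build_supercontigs(contigs, links, min_links=2):
    """Group contigs into supercontigs.

    contigs: iterable of contig names.
    links: iterable of (contig_a, contig_b) pairs, one per forward-reverse
           read pair whose two ends fall in different contigs.
    Two contigs are joined when they share at least `min_links` such pairs.
    """
    parent = {c: c for c in contigs}

    def find(c):
        # Follow parent pointers to the group representative (with path halving).
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    # Count links per unordered contig pair.
    support = Counter(tuple(sorted(pair)) for pair in links)
    for (a, b), count in support.items():
        if count >= min_links:
            parent[find(a)] = find(b)

    groups = {}
    for c in parent:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical example: c1-c2 and c2-c3 each have two links, c3-c4 only one.
links = [("c1", "c2"), ("c2", "c1"), ("c2", "c3"), ("c2", "c3"), ("c3", "c4")]
supercontigs = build_supercontigs(["c1", "c2", "c3", "c4"], links)
print(supercontigs)
```

As on the slide, three contigs end up in one supercontig, while the contig supported by a single link stays separate.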

(21) Building Supercontigs (Scaffold). Eukaryotic annotation. Gene Prediction (GENSCAN). Annotation Station/Manatee: http://manatee.sourceforge.net/. Project DB. Annotation DB. Gene finders. Alignments of genomic sequence to proteins and ESTs. Gene models. BLAST, PFAM/TIGRFAM, SignalP/TMHMM. Functional assignments.

(22) Transcriptomics.

(23) MICROARRAYS: Chipping Away at the Mysteries of Science and Medicine. "I think you should be more explicit here in step two." Modified with permission from a cartoon by Sidney Harris and from an image provided by Patrick Brown. Leung et al. Genome Biology 2001 2:reports4021.1 doi:10.1186/gb-2001-2-9-reports4021.

(24) Array. Micro. In computer science, an array is a data structure consisting of a group of elements that are accessed by indexing. In most programming languages each element has the same data type and the array occupies a contiguous area of storage. Most programming languages have a built-in array data type. Some programming languages support array programming (e.g., APL, newer versions of Fortran), which generalises operations and functions to work transparently over arrays as they do with scalars, instead of requiring looping over array members. Multi-dimensional arrays are accessed using more than one index: one for each dimension. Arrays can be classified as fixed-sized arrays (sometimes known as static arrays), whose size cannot change once their storage has been allocated, or dynamic arrays, which can be resized. Micro is an English prefix of Greek origin that refers to an object as being smaller than an object or scale of focus, in contrast with macro.

(25) Comparison of various biological assemblies and technological devices.

(26) Microarrays are miniaturized biological devices consisting of molecules, for example DNA or protein, named the "probes", that are arranged in an orderly fashion at a microscopic scale on a solid support such as a membrane or a glass microscope slide.

(27) What problems can it solve? 1. Differing expression of genes over time, between tissues, and between disease states. 2. Identification of complex genetic diseases. 3. Drug discovery and toxicology studies. 4. Mutation/polymorphism detection (SNPs). 5. Pathogen analysis.

(28) Why Use DNA Microarrays for Expression Analysis? 1. Conventional expression analysis only allows the study of the expression of a single gene in a single experiment. 2. The highly parallel nature of microarrays allows the simultaneous study of the expression of thousands or even tens of thousands of different genes in a single experiment. 3. Microarrays allow researchers to undertake global expression analysis that is not feasible with conventional techniques. www.reactome.org.

(29) Evaluate global gene expression. 1. Differential display. 2. Suppression subtractive hybridization (SSH). 3. Sequencing of expressed sequence tags (EST) and serial analysis of gene expression (SAGE and LongSAGE). 4. Hybridization to microarrays.

(30) Differential display. Several significant limitations of the original protocol were: the requirement for large quantities of mRNA; a bias against the identification of rare transcripts; it is not a quantitative method and has a high rate of false positives, or gene fragments that seem to be differentially expressed only as an artifact. Advantages: it can compare multiple experimental samples simultaneously; it can identify genes that are either up- or down-regulated in one sample relative to another.

(31) Subtractive hybridization. Subtractive hybridization (or subtraction) is a method for rapid isolation of differentially distributed nucleic acids (differentially expressed, differentially present, or differentially arranged in two or more different sources: different cells, cell populations or cell types, different tissues, disease, or development stages). The suppression subtractive hybridization (SSH) method is designed to selectively amplify differentially expressed transcripts while suppressing the amplification of abundant transcripts, thus eliminating the need to separate single- and double-stranded molecules. Limitations: it applies only to pair-wise treatment comparisons; it does not provide a quantitative measure of expression differences.

(32) EST and SAGE. Expressed sequence tags (ESTs) are generated by randomly picking clones from a cDNA library and performing a single sequencing reaction to produce 300 to 500 bp of sequence per clone. Differences in gene expression may be identified by counting the number of times a particular sequence appears in EST libraries from different sources. Serial analysis of gene expression (SAGE) is a technique for the analysis of gene expression that is essentially an accelerated version of EST sequencing. SAGE: 10-14 bp tags, anchored at NlaIII sites and released by a type IIS restriction endonuclease (typically BsmFI). SuperSAGE: tags of more than 25 bp, created by the type III restriction enzyme EcoP15I. Advantages: SAGE data are quantitative and cumulative. Accurate, quantitative transcript profiles describing the abundance of all genes expressed in a cell or tissue are generated by SAGE, if sufficient sequencing is completed.
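The counting idea behind EST and SAGE profiling can be sketched in a few lines: count how often each tag occurs in each library, convert counts to relative frequencies so that libraries of different depth are comparable, and compare. The tag sequences, library contents, and function names below are made up for illustration only.

```python
from collections import Counter


def tag_frequencies(tags):
    """Return {tag: fraction of the library} for a list of sequenced tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def frequency_ratio(tag, library_a, library_b):
    """Ratio of a tag's relative frequency in library_b to that in library_a."""
    freq_a = tag_frequencies(library_a)
    freq_b = tag_frequencies(library_b)
    return freq_b[tag] / freq_a[tag]


# Hypothetical 10-bp tags from two libraries of different sequencing depth.
normal = ["GATTACAGGA"] * 2 + ["CCGTAGCTAA"] * 8    # 10 tags in total
tumor = ["GATTACAGGA"] * 12 + ["CCGTAGCTAA"] * 8    # 20 tags in total

ratio = frequency_ratio("GATTACAGGA", normal, tumor)
print(f"GATTACAGGA is about {ratio:.1f}x more frequent in the second library")
```

A real analysis would also need a statistical test, since small tag counts are noisy; this sketch only shows why the data are described as quantitative and cumulative.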

(33) Microarray hybridization. The analysis of array data may be divided into three components: 1) identification and quantification of hybridization intensities, 2) visualization of data, and 3) clustering techniques. Based on the assumption that genes with related functions are coregulated, clustering of microarray data becomes a powerful method to assign putative functional classifications to novel genes.
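To make the co-regulation assumption concrete, here is a minimal sketch that groups genes whose expression profiles are strongly correlated. It uses Pearson correlation with a greedy threshold rule, which is a simplification chosen for brevity and not the clustering method of any particular package. The gene names and expression values are invented, and profiles are assumed to be non-constant.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def cluster_by_correlation(profiles, threshold=0.9):
    """Greedy grouping: a gene joins the first cluster containing a member
    whose profile correlates with it at or above `threshold`."""
    clusters = []
    for gene, profile in profiles.items():
        for cluster in clusters:
            if any(pearson(profile, profiles[m]) >= threshold for m in cluster):
                cluster.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters


# Hypothetical expression values across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "novel1": [1.1, 2.0, 3.2, 3.9],
}
clusters = cluster_by_correlation(profiles)
print(clusters)
```

Here the uncharacterized `novel1` falls into the same group as `geneA` and `geneB`, which is the kind of evidence used to propose a putative function for a novel gene.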

(34) Microarrays are generally classified in several ways: 1. Type of target DNA (immobilized nucleic acid molecule) used in the array fabrication. 2. Type of substrate to which the target DNA is printed or spotted. 3. Methodology used to present the target DNA on the substrate. 4. Density of probes on the array. 5. Type of labeling for hybridization. High throughput technology.

(35) 1. Type of target DNA (immobilized nucleic acid molecule) used in the array fabrication: cDNA, oligonucleotide, RNA, protein, antibody, tissue. DNA. Transcription. RNA. Translation. Protein. ChIP-on-chip. Array CGH. Expression array. Tissue array. Protein chip.

(36) Construction of oligonucleotide arrays. Oligonucleotides are synthesized in situ on the silicon chip. The light flash is produced by photolithography, using a mask to allow light to strike only the required features on the surface of the chip. In each step, a flash of light "deprotects" the oligonucleotides at the desired locations on the chip; then "protected" nucleotides of one of the four types (A, C, G or T) are added so that a single nucleotide can add to the desired chains. There are four types of masks, according to the added nucleotide.
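The mask-and-add cycle can be simulated in a few lines. The sketch below assumes the nucleotides are offered in a fixed repeating order A, C, G, T, and that a mask exposes exactly those features whose next required base is the one being added. The feature names, probe sequences, and function names are hypothetical, and real mask design is considerably more involved.

```python
def synthesis_schedule(probes, order="ACGT"):
    """Return the list of (base, mask) steps needed to build all probes.

    probes: {feature_name: probe sequence over A/C/G/T}.
    mask:   the features deprotected by the light flash in that step.
    """
    for seq in probes.values():
        if set(seq) - set(order):
            raise ValueError(f"unexpected base in probe {seq!r}")

    progress = {feature: 0 for feature in probes}   # bases added so far
    schedule = []
    while any(progress[f] < len(seq) for f, seq in probes.items()):
        for base in order:
            # Expose only features whose next required base is `base`.
            mask = [f for f, seq in probes.items()
                    if progress[f] < len(seq) and seq[progress[f]] == base]
            if mask:
                schedule.append((base, mask))
                for f in mask:
                    progress[f] += 1
    return schedule


def rebuild(schedule, features):
    """Replay a schedule to check which sequence grows on each feature."""
    grown = {f: "" for f in features}
    for base, mask in schedule:
        for f in mask:
            grown[f] += base
    return grown


probes = {"f1": "ACG", "f2": "CAT", "f3": "GGA"}
schedule = synthesis_schedule(probes)
for step, (base, mask) in enumerate(schedule, start=1):
    print(f"step {step}: add {base} at {mask}")
```

Replaying the schedule with `rebuild` reproduces the requested probes, which is a simple way to check that every mask exposed the right features.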

(37) 4. Density of probes on the array. Low density: 100s (example: protein antibodies). Medium density: 1,000s to 10s of 1,000s and more (example: cDNA). High density: 100s to 1,000s of 1,000s (example: short oligonucleotide). Isotope: nylon, cDNA (300-900 nt). Two-colour: cDNA or oligo (80 nt), 500-11,000 elements. Affymetrix: silicon, oligo (20 nt), 22,000 elements.

(38) Digital Image Analogy. Northern blot. Customized. Commercial. Expression. CGH (LOH). ChIP, CGR, Exp Tiling, Methylation, .... Low density: low content. High density: high content. Short probes: limited contrast, limited channels. Long, isothermal probes: high contrast, multiple channels.

(39) 5. Type of labeling for hybridization. Single channel using one color, and double channel using two colors (Cy3, Cy5). Scan and detect with confocal laser system. Green channel. Red channel. Overlay images and normalize. Image process and analyze.

(40) Central Dogma and Existing Microarray Types. 1. aCGH 2. SNP array 3. Tiling array 4. ROMA 5. Barcode array 6. cDNA array 7. qPCR array 8. RNA array 9. Protein array 10. ChIP-on-chip 11. Antibody array 12. Organic compound 13. Small molecule 14. ELISA array 15. Cell array 16. Tissue array 17. Lab-on-a-chip. Groups shown alongside the central dogma: cell array, tissue array, ELISA array; protein array, antibody array, small molecule array, organic compound; expression array, qPCR array; ChIP-on-chip, aCGH, SNP array, tiling array, ROMA, barcode array.

(41) 1. aCGH. Array comparative genomic hybridization (also called CMA, chromosomal microarray analysis, microarray-based comparative genomic hybridization, array CGH, aCGH, or virtual karyotype) detects genomic copy number variations at a higher resolution than chromosome-based comparative genomic hybridization (CGH). It is a molecular-cytogenetic method for the analysis of copy number changes (gains/losses) in the DNA content of a given subject's DNA, often in tumor cells.

(42) 2. SNP array. A single nucleotide polymorphism (SNP, pronounced "snip") is a DNA sequence variation occurring when a single nucleotide (A, T, C, or G) in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual). For example, two sequenced DNA fragments from different individuals, AAGCCTA and AAGCTTA, contain a difference in a single nucleotide. In molecular biology and bioinformatics, a SNP array is a type of DNA microarray which is used to detect polymorphisms within a population.

(43) 3. Tiling array. Tiling arrays differ in the nature of the probes. Short fragments are designed to cover the entire genome or contigs of the genome. Depending on the probe lengths and spacing, different degrees of resolution can be achieved. The number of features on a single array can range from 10,000 to greater than 6,000,000, with each feature containing millions of copies of one probe. Tiling arrays can produce an unbiased look at gene expression because previously unidentified genes can still be incorporated. Beyond individual gene expression analysis, tiling arrays can be used in transcriptome mapping, ChIP-chip, MeDIP-chip, DNase-chip and array CGH studies, among others.

(44) 4. cDNA array. A DNA microarray (also commonly known as a gene chip, genome chip, DNA chip, or gene array) is a collection of microscopic DNA spots, commonly representing single genes, arrayed on a solid surface by covalent attachment to a chemical matrix. DNA arrays are different from other types of microarray only in that they either measure DNA or use DNA as part of their detection system. Qualitative or quantitative measurements with DNA microarrays utilize the selective nature of DNA-DNA or DNA-RNA hybridization under high-stringency conditions and fluorophore-based detection. DNA arrays are commonly used for expression profiling, i.e., monitoring expression levels of thousands of genes simultaneously, or for comparative genomic hybridization. Isotope: nylon, cDNA (300-900 nt). Two-colour: cDNA or oligo (80 nt), 500-11,000 elements. Affymetrix: silicon, oligo (20 nt), 22,000 elements.

(45) 5. qPCR array. Quantitative polymerase chain reaction (qPCR) is a modification of the polymerase chain reaction used to rapidly measure the quantity of DNA, complementary DNA or ribonucleic acid present in a sample. Like other forms of polymerase chain reaction, the process is used to amplify DNA samples via the temperature-mediated enzyme DNA polymerase. qPCR arrays are designed to profile the expression of a panel of genes relevant to a specific pathway or disease state.

(46) 6. RNA array. In genetics, microRNAs (miRNAs) are single-stranded RNA molecules of 21-23 nucleotides in length which regulate gene expression. miRNAs are encoded by genes from whose DNA they are transcribed, but miRNAs are not translated into protein (they are non-coding RNA); instead, each primary transcript (a pri-miRNA) is processed into a short stem-loop structure called a pre-miRNA and finally into a functional miRNA. Mature miRNA molecules are partially complementary to one or more messenger RNA (mRNA) molecules, and their main function is to down-regulate gene expression. A microRNA array is an array designed to detect miRNA expression profiles.

(47) 7. Protein array. A protein microarray is a piece of glass on which different protein molecules have been affixed at separate locations in an ordered manner, thus forming a microscopic array. These are used to identify protein-protein interactions, to identify the substrates of protein kinases, or to identify the targets of biologically active small molecules. The most common protein microarray is the antibody microarray, where antibodies are spotted onto the protein chip and are used as capture molecules to detect proteins from cell lysate solutions. Related microarray technologies include DNA microarrays, antibody microarrays, tissue microarrays and chemical compound microarrays.

(48) 8. ChIP-on-chip. ChIP-on-chip (also known as ChIP-chip) is a technique that combines chromatin immunoprecipitation ("ChIP") with microarray technology ("chip"). Like regular ChIP, ChIP-on-chip is used to investigate interactions between proteins and DNA in vivo. Specifically, it allows the identification of binding sites of DNA-binding proteins on a genome-wide basis. One of the long-term goals ChIP-on-chip was designed for is to establish a catalogue of (selected) organisms that lists all protein-DNA interactions under various physiological conditions.

(49) 9. Antibody array. An antibody microarray is a specific form of protein microarray: a collection of capture antibodies is spotted and fixed on a solid surface, such as glass, plastic or a silicon chip, for the purpose of detecting antigens. Antibody microarrays are often used for detecting protein expression from cell lysates in general research, and special biomarkers from serum or urine in diagnostic applications.

(50) 10. Organic compound. A chemical compound microarray is a collection of organic chemical compounds spotted on a solid surface, such as glass or plastic. In chemical genetics research, they are routinely used to search for proteins that bind to specific chemical compounds, and in general drug discovery research they provide a multiplex way to search for potential drugs for therapeutic targets. There are three forms of chemical compound microarray, distinguished by fabrication method: covalently immobilizing the compounds (usually called a Small Molecule Microarray); spotting and drying organic compounds on the solid surface without immobilization (commercial name Micro Arrayed Compound Screening, µARCS); and spotting organic compounds in a homogeneous solution without immobilization or drying.

(51) 11. ELISA array. Enzyme-Linked ImmunoSorbent Assay, or ELISA, is a biochemical technique used mainly in immunology to detect the presence of an antibody or an antigen in a sample. ELISA has been used as a diagnostic tool in medicine and plant pathology, as well as a quality control check in various industries. ELISA array (or array ELISA) is a new technology capable of simultaneously identifying multiple biomarkers to generate a biochemical profile. Once the ELISA protocol is completed, the array is imaged using either a CCD imaging system or X-ray film to capture the chemiluminescent signal. The pixel intensity of each spot is directly correlated to the concentration.

(52) 12. Cell array. Cell array is a common term for different techniques which are used in genomic-level cell biological testing. The live cell array, a microscope slide-based high content analysis tool, enables multi-parametric imaging-based assays on thousands of intact individual cells, including non-adhering blood and bone marrow cells. Cells can be observed in their own identified location, tracking individual, real-time responses to intervention. Multiple functional assays can be performed on a living cell, followed by post-fixation studies on the same cell to maximize usage of cell samples. LIVECELL ARRAY microscope slide. The Individual Cell Array with 15 (left) and 100-micron..

(53) 13. Tissue array. Tissue microarrays are produced by re-locating tissue from conventional histologic paraffin blocks so that tissue from multiple patients or blocks can be seen on the same slide. This is done by using a needle to biopsy a standard histologic section and placing the core into an array on a recipient paraffin block. Figure labels: whole tissue, tissue cocktail, tissue sausage, tissue array.

(54) 14. Lab-on-a-chip. Lab-on-a-chip (LOC) is a term for devices that integrate (multiple) laboratory functions on a single chip of only millimeters to a few square centimeters in size and that are capable of handling extremely small fluid volumes, down to less than picoliters.

(55) Make a choice: pick the one you like and start on it.

(56) The 6 steps of a microarray experiment. 1. Manufacturing of the microarray. 2. Experimental design and choice of reference: what to compare to what? 3. Target preparation (labeling) and hybridization. 4. Image acquisition (scanning) and quantification (signal intensity to numbers). 5. Database building, filtering and normalization. 6. Statistical analysis and data mining. Microarrayer. Experimental design. Hybridization. Washing and drying. Scanning. Application.

(57) Experimental methodology. 1. Experimental Design. 2. Microarray Experiments. 3. Image Processing. 4. Data Normalization. 5. Data Analysis.

(58) 3. Target preparation (labeling) and hybridization.

(59) 4. Image acquisition (scanning) and quantification (signal intensity to numbers). Samples. Platform. Raw data (image, 16-bit .tif file): (1) image acquisition, (2) spot location. Preprocessed data (.gal file): (3) computation of spot intensities. Data analysis (different levels): (4) data reporting.

(60) Image Processing. Resolution: standard 10 µm (100,000 atoms wide); a 100 µm spot on the chip = 10 pixels in diameter. Image format: TIFF, 16 bit (64K grey levels); a 1 cm x 1 cm image at 16 bit = 2 MB (uncompressed). Separate image for each fluorescent sample: channel 1, channel 2. Green channel. Red channel. Scan and detect with confocal laser system. Laser detection: 1) a laser beam excites each spot of DNA; 2) the amount of fluorescence is detected; 3) different lasers are used for different wavelengths. Overlay images and normalize. Image process and analyze.
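The figures on this slide follow from simple arithmetic, reproduced below as a check. It assumes square pixels of 10 µm and one 16-bit value per pixel per channel; the constant and function names are illustrative only.

```python
PIXEL_SIZE_UM = 10        # scanner resolution: 10 micrometres per pixel
SPOT_DIAMETER_UM = 100    # spot diameter on the chip
BITS_PER_PIXEL = 16       # 16-bit TIFF
UM_PER_CM = 10_000


def pixels_across(length_um, pixel_um=PIXEL_SIZE_UM):
    """Number of pixels needed to span a length given in micrometres."""
    return length_um // pixel_um


def image_size_bytes(width_um, height_um, bits=BITS_PER_PIXEL):
    """Uncompressed size of a single-channel image, in bytes."""
    return pixels_across(width_um) * pixels_across(height_um) * bits // 8


spot_pixels = pixels_across(SPOT_DIAMETER_UM)          # spot diameter in pixels
grey_levels = 2 ** BITS_PER_PIXEL                      # intensity levels per pixel
image_bytes = image_size_bytes(UM_PER_CM, UM_PER_CM)   # 1 cm x 1 cm scan

print(f"spot diameter: {spot_pixels} pixels")
print(f"grey levels:   {grey_levels}")
print(f"image size:    {image_bytes / 1e6:.1f} MB per channel")
```

Because each fluorescent channel is stored as a separate image, a two-color scan of the same area takes twice this amount.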

(61) Image Processing. Addressing or gridding: locate centers, assigning coordinates to each of the spots. Segmentation or spot picking: classifying pixels either as foreground or as background signal. Intensity extraction (for each spot): foreground fluorescence intensity pairs (R, G), background intensities, quality measures. Information extraction: for each spot of the array, calculates signal intensity pairs, background and quality measures. Figure panels: raw (combined) image; gridded; spots picked and flagged; intensity.
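Segmentation and intensity extraction for one spot in one channel can be sketched as follows. The fixed intensity threshold used for segmentation is an assumption made for brevity; real image-analysis software typically uses shape-based or adaptive methods. The pixel values and function names are invented for illustration.

```python
from statistics import median


def segment_spot(window, threshold):
    """Split the pixels of a spot window into foreground and background."""
    pixels = [value for row in window for value in row]
    foreground = [v for v in pixels if v >= threshold]
    background = [v for v in pixels if v < threshold]
    return foreground, background


def extract_intensity(window, threshold):
    """Return median foreground, median background, their difference,
    and a simple quality flag for one channel of one spot."""
    foreground, background = segment_spot(window, threshold)
    if not foreground or not background:
        # Nothing to measure: flag the spot instead of guessing a value.
        return {"foreground": None, "background": None, "signal": None, "flagged": True}
    fg, bg = median(foreground), median(background)
    return {"foreground": fg, "background": bg, "signal": fg - bg, "flagged": False}


# Hypothetical 4 x 4 pixel window around one gridded spot (one channel).
red_window = [
    [100, 110, 105, 100],
    [105, 900, 950, 110],
    [100, 920, 940, 105],
    [110, 100, 105, 100],
]
red = extract_intensity(red_window, threshold=500)
print(red)
```

Running the same function on the green-channel window of the same spot gives the second member of the (R, G) intensity pair.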

(62) Sources of Systematic Errors (bias): different incorporation efficiency of dyes; different amounts of mRNA; experimenter/protocol issues (comparing chips processed by different labs); different scanning parameters; batch bias. Hybridization of the same sample to 2 chips/channels: ideally, the scatter plot coincides with the x=y diagonal. Due to random errors, we expect to see a 'cloud' around the x=y diagonal. In practice, there are both random and systematic measurement errors (bias); due to biases, scatter plots are not centered around the x=y diagonal.
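One common way to remove a global dye or scanner bias is to center the log ratios on their median, so that a self-versus-self hybridization falls back onto the x=y diagonal. The sketch below shows that idea only; the slides do not say which normalization method was used, and the intensities are invented.

```python
import math
from statistics import median


def log_ratios(red, green):
    """Per-spot log2(R/G) for paired, background-corrected intensities."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def median_center(ratios):
    """Subtract the median log ratio so the bulk of spots sits at zero."""
    center = median(ratios)
    return [value - center for value in ratios]


# Hypothetical self-versus-self hybridization in which the red channel
# reads roughly twice as bright as the green channel (a systematic bias).
green = [100.0, 210.0, 390.0, 800.0, 1650.0]
red = [200.0, 400.0, 800.0, 1600.0, 3200.0]

raw = log_ratios(red, green)
normalized = median_center(raw)

print("median log2 ratio before:", round(median(raw), 3))
print("median log2 ratio after: ", round(median(normalized), 3))
```

Before centering, the median log ratio is close to 1 (a two-fold bias); afterwards only the random scatter around zero remains.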

(63) 6. Statistical analysis and data mining. Guidelines for the statistical analysis of microarray experiments. Allison DB et al. (2005) Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 7: 55–65. doi:10.1038/nrg1749.

(64) Single-Test Platform of Microarray & Knowledge Discovery. Training data collection. Feature integration. Feature selection.

(65) Higher Level Microarray data analysis
– Clustering and pattern detection
– Data mining and visualization
– Controls and normalization of results
– Statistical validation
– Linkage between gene expression data and gene sequence/function/metabolic pathways databases
– Discovery of common sequences in co-regulated genes
– Meta-studies using data from multiple experiments
Figure: genes for class distinction (n=271) versus diagnostic ALL BM samples (n=327); subtypes E2A-PBX1, MLL, T-ALL, hyperdiploid >50, BCR-ABL, novel, TEL-AML1; color scale from -3 to 3, σ = std deviation from mean.

(66) Proteomics.

(67)

(68) “The dynamic range of protein abundance comprises up to ten orders of magnitude and cannot be covered by a single analytical technique without fractionation, depletion or concentration.”

(69) The Lancet (2000) 356:1749.

(70) Complete human prion protein sequence: MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGG GGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTN MKHMAGAAAAGAVVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVY YRPMDEYSNQNNFVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCIT QYERESQAYYQRGSSMVLFSSPPVILLISFLIFLIVG. Complete human prion nucleotide sequence: atggcgaaccttggctgctggatgctggttctctttgtggccacatggagtgac ctgggcctctgcaagaagcgcccgaagcctggaggatggaacactgggggcagc cgatacccggggcagggcagccctggaggcaaccgctacccacctcagggcggt ggtggctgggggcagcctcatggtggtggctgggggcagcctcatggtggtggc tgggggcagccccatggtggtggctggggacagcctcatggtggtggctggggt caaggaggtggcacccacagtcagtggaacaagccgagtaagccaaaaaccaac atgaagcacatggctggtgctgcagcagctggggcagtggtggggggccttggc ggctacatgctgggaagtgccatgagcaggcccatcatacatttcggcagtgac tatgaggaccgttactatcgtgaaaacatgcaccgttaccccaaccaagtgtac tacaggcccatggatgagtacagcaaccagaacaactttgtgcacgactgcgtc aatatcacaatcaagcagcacacggtcaccacaaccaccaagggggagaacttc accgagaccgacgttaagatgatggagcgcgtggttgagcagatgtgtatcacc cagtacgagagggaatctcaggcctattaccagagaggatcgagcatggtcctc ttctcctctccacctgtgatcctcctgatctctttcctcatcttcctgatagtg ggatga.

(71) Current Proteomics Technologies
Nature Genetics 33, 311-323 (2003), Figure 4. Time line indicating the convergence of different technologies and resources into a proteomic process. Advances in mass spectrometry and the generation of large quantities of nucleotide sequence information, combined with computational algorithms that could correlate the two, led to the emergence of proteomics as a field.
Proteome profiling/separation
• 2D SDS PAGE (two-dimensional sodium dodecylsulphate polyacrylamide gel electrophoresis)
• 2-D LC/LC (LC = liquid chromatography)
• 2-D LC/MS (MS = mass spectrometry)
Protein identification
• Peptide mass fingerprint
• Tandem mass spectrometry (MS/MS)
Quantitative proteomics
• ICAT (isotope-coded affinity tag)

(72)

(73) Two-dimensional gel electrophoresis (2D-PAGE) of cell lysates generates global patterns of protein expression.
Annotation: large-scale visualization of differential protein expression. Mass spectrometry: peptide mass fingerprinting for protein identification.
– High-resolution 2D-PAGE first developed in 1975 (O'Farrell and Klose)
– Combination with biological mass spectrometry (1990s)
– Availability of genome sequences in databases
→ central role in proteomic studies

(74) P. H. O'Farrell. High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem., May 1975; 250: 4007-4021. A technique has been developed for the separation of proteins by two-dimensional polyacrylamide gel electrophoresis. Due to its resolution and sensitivity, this technique is a powerful tool for the analysis and detection of proteins from complex biological sources.

(75) B.W. Gibson and K. Biemann. Strategy for the Mass Spectrometric Verification and Correction of the Primary Structures of Proteins Deduced from Their DNA Sequences. PNAS 1984 81: 1956-1960. Fast atom bombardment mass spectrometry has been used to confirm and correct regions from the amino acid sequences of three large proteins, glutaminyl- and glycyl-tRNA synthetase from Escherichia coli and methionyl-tRNA synthetase from yeast, whose primary structures had been deduced from the base sequences of their corresponding genes. The strategy is based on a comparison of the molecular weights of the tryptic peptides predicted from all three reading frames of the gene sequences with those determined mass spectrometrically.

(76) Tools for Protein Identification
Enzyme digestion. To get smaller fragments: trypsin (99%), LysC, others.
Mass spectrometry. Two ionization techniques:
1. Matrix Assisted Laser Desorption/Ionization (MALDI)
2. Electrospray Ionization (ESI)
Both with many types of mass analyzers: TOF, Quadrupole (Q), Q-TOF, FT ICR MS, Q-ITOF, etc.
ESI: protein MW can be calculated from a protein's charge distribution. MALDI/TOF: whole protein detected.

(77) Edman P, Begg G. A protein sequenator. Eur J Biochem. 1967 Mar;1(1):80-91. Edman degradation, developed by Pehr Victor Edman, is a method of sequencing amino acids in a peptide. In this method, the amino-terminal residue is labeled and cleaved from the peptide without disrupting the peptide bonds between other amino acid residues. ABI Procise 492.

(78) Two major techniques for sequencing a polypeptide (Edman degradation versus mass spectrometry). Edman degradation for N-terminal analysis of peptides involves treatment of a peptide with phenyl isothiocyanate (PITC), C6H5–N=C=S, followed by treatment with trifluoroacetic acid. The phenylthiohydantoin (PTH) derivative is identified chromatographically by comparison of its elution time with the known elution times of PTH derivatives of all 20 common amino acids. Figure 1: A simple analogy to illustrate the two most commonly used proteomic strategies. Clin. Sci. (2005) 109, 421-430.

(79) Protein expression profiling: ~1000 proteins routinely detectable in a 2D-gel → global changes in the proteome readily detectable.
• Identify specific proteins in a cell that undergo changes in abundance, localization, or modification in response to a specific biological condition
• Often combined with complementary techniques (protein biochemistry, molecular biology and cell physiology)
• Posttranscriptional control mechanisms can influence protein expression
• Posttranslational modifications of a protein such as phosphorylation, glycosylation, processing of signal sequences or degradation can be visualized
Figure: SYPRO Ruby stained gel (axes: pI, MW).

(80) Workflow: Sample preparation; Isoelectrofocusing (1st dimension); Equilibration incl. reduction, alkylation; SDS-PAGE (2nd dimension); Staining; Imaging; Spot detection and matching; Normalization and quantification; Analysis; Cutting of selected spots; Trypsin digestion; Identification with mass spectroscopy; Database comparison.
Sample preparation: subproteome depletion / enrichment.
• Detergents: solubilize membrane proteins, separation from lipids
• Reductants: reduce S-S bonds
• Denaturing agents: disrupt protein-protein interactions, unfold proteins
• Enzymes: digest contaminating molecules (nucleic acids etc.)
• Protease inhibitors

(81) Isoelectric point (pI) and immobilized pH gradients.
Isoelectric focusing is an electrophoretic method that separates proteins according to their isoelectric points (pI). Proteins are amphoteric molecules; they carry either positive, negative, or zero net charge, depending on the pH of their surroundings. The net charge of a protein is the sum of all the negative and positive charges of its amino acid side chains and amino- and carboxyl-termini. The isoelectric point (pI) is the specific pH at which the net charge of the protein is zero. Proteins are positively charged at pH values below their pI and negatively charged at pH values above their pI. If the net charge of a protein is plotted versus the pH of its environment, the resulting curve intersects the x-axis at the isoelectric point.
Immobilized pH gradients are formed using two solutions, one containing a relatively acidic mixture of acrylamido buffers and the other containing a relatively basic buffer mixture. The concentrations of the various buffers in the two solutions define the range and shape of the pH gradient produced. Both solutions contain acrylamide monomers and catalysts. During polymerization, the acrylamide portion of the buffers copolymerizes with the acrylamide and bisacrylamide monomers to form a polyacrylamide gel.
FIGURE: Immobilized pH gradient polyacrylamide gel matrix showing attached buffering groups.
FIGURE: Plot of the net charge of a protein versus the pH of its environment. The point of intersection of the curve at the x-axis represents the isoelectric point of the protein.
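The net-charge curve described above can be approximated with the Henderson-Hasselbalch equation, and the pI found as the pH where that curve crosses zero. The following is a minimal sketch, not a validated pI predictor: the pKa values are assumed textbook approximations (published tables differ), and the model ignores structure and neighbouring-group effects.

```python
# Assumed approximate pKa values; real tables vary by source.
PKA_POS = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEG = {"Cterm": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Sum of the fractional charges of the termini and ionizable side chains."""
    pos = ["Nterm"] + [aa for aa in seq if aa in PKA_POS]
    neg = ["Cterm"] + [aa for aa in seq if aa in PKA_NEG]
    charge = sum(1.0 / (1.0 + 10 ** (ph - PKA_POS[g])) for g in pos)
    charge -= sum(1.0 / (1.0 + 10 ** (PKA_NEG[g] - ph)) for g in neg)
    return charge


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0,
                      tol: float = 1e-4) -> float:
    """Bisection on pH: the net charge falls monotonically as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged, so the pI lies higher
        else:
            hi = mid
    return (lo + hi) / 2


peptide = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical test sequence
pi_estimate = isoelectric_point(peptide)
```

Bisection works here because the charge is positive below the pI and negative above it, exactly as the slide states.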

(82) Ettan IPGphor 3 IEF System. Wide and Narrow pH Gradients
• Wide gradients are applied for: the entire protein spectrum
• Narrow gradients are applied for: increased resolution; increased loading capacity, to detect and analyze more proteins
At the isoelectric point the protein has no net charge and therefore no longer migrates in the electric field.

(83) Proteins enter the SDS-polyacrylamide gel and are resolved according to their molecular mass. Post-electrophoretic staining of the proteins with: Coomassie, silver, fluorescent stains (SYPRO Ruby). Protein expression profiling: ~1000 proteins routinely detectable in a 2D-gel → global changes in the proteome readily detectable. Posttranscriptional control mechanisms can influence protein expression. Posttranslational modifications of a protein such as phosphorylation, glycosylation, processing of signal sequences or degradation can be visualized.
Second-dimension separation with Ettan DALT electrophoresis system. First-dimension separation: Ettan IPGphor IEF system, 24-cm Immobiline DryStrip gel, pH 3-10 NL. Second-dimension separation: Ettan DALT electrophoresis system, DALT Gel 12.5 (26 x 20 cm).

(84) • Dynamic range detectable on 2D-gels: 10^4; protein expression levels of a cell can vary between 10^5 (yeast) and even 10^10 (humans) → enrichment or pre-fractionation strategies needed to reach less abundant proteins
• Resolution of 2D-gels has its limits → use narrow pH range gels and combine
• Protein extraction and solubility during IEF can be a problem for poorly water-soluble proteins, e.g. membrane proteins or nuclear proteins
• Challenges for further development in gel-based proteomics: improve sample preparation to be able to analyze extreme proteins (extremely basic or acidic, extremely small or big, extremely hydrophobic), sensitivity, dynamic range, automation

(85) Multiplexed Proteomics Technology. Three stains in one gel: SYPRO® Ruby [total protein], Pro-Q® Diamond [phosphoprotein], Pro-Q® Emerald [glycoprotein].
• Identify changes in expression, phosphorylation and glycosylation
• Compare multiple samples with high accuracy and reproducibility
• Obtain more data from precious samples

(86) General workflow of proteomics analysis (peptide mass fingerprinting method); 2-D based analysis experimental flow.
• Sample: biofluid, tissue, cell
• Sample preparation: µ-dissection (laser capture microdissection), cell sorting, lysate, sub-cellular fractionation
• Protein separation: 1D/2D gels, LC, microfluidic chips, MDLC
• Post-separation chemistry: staining/imaging, digestion
• Mass analysis (mass spectrometry): MALDI-TOF PMF, ESI-MS/MS
• Data analysis (bioinformatics): reduction of spectral information to protein ID and characterization, database searching
(A) The unknown protein is excised from a gel and converted to peptides by the action of a specific protease. The mass of the peptides produced is then measured in a mass spectrometer. (B) The mass spectrum of the unknown protein is searched against theoretical mass spectra produced by computer-generated cleavage of proteins in the database.
Figure: example FTMS full-scan mass spectrum, relative abundance versus m/z (region m/z 671 to 679).

(87) • Built upon the classical gel approach to protein quantification (gel densitometry)
• Separate samples are treated with unique fluorophore tags (binding covalently with lysine ε-amino groups)
• Samples are combined and run on the same 2D gel (ΔMW of proteins is negligible)
• Quantitative analysis is based on relative intensities of fluorescing labels at specific spots (relative quantitation) or to a labeled standard (absolute quantitation)
• Allows use of an internal standard in each gel, which reduces gel-to-gel variation and reduces the number of gels to be run
Workflow: pooled internal standard labelled with Cy2; protein extract 1 labelled with Cy3; protein extract 2 labelled with Cy5; mix labelled extracts; 2-D separation; Typhoon Variable Mode Imager; DeCyder Differential Analysis Software.
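The role of the Cy2 internal standard can be illustrated with a small calculation: dividing each sample spot by the Cy2 spot from the same gel gives a value that can be compared across gels. This is a simplified sketch with hypothetical spot volumes; it is not a description of how the DeCyder software computes its statistics.

```python
def standardized_abundance(sample_volume: float, cy2_volume: float) -> float:
    """Spot volume of a sample channel relative to the pooled Cy2 standard."""
    if cy2_volume <= 0:
        raise ValueError("internal standard volume must be positive")
    return sample_volume / cy2_volume


# Hypothetical spot volumes for the same protein on two gels.
# Gel B picked up half the signal overall, but the ratio is unchanged.
gel_a = standardized_abundance(sample_volume=2000.0, cy2_volume=1000.0)
gel_b = standardized_abundance(sample_volume=1000.0, cy2_volume=500.0)
```

Because the same pooled standard is run on every gel, a gel-wide change in signal cancels in the ratio, which is the gel-to-gel variation the slide refers to.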

(88) Figure labels: http://expasy.org/melanie/; pre-vaccine, post-vaccine; ImageQuant ECL; Coomassie, silver, SYPRO Ruby; Cy5, Cy3, Cy5/Cy3; manual, robotics; Proteineer SP.

(89) http://expasy.org/images/cartoon/2dgels.gif. Expression clustering. HCL GDM.

(90) 1. Destain
2. Shrink with acetonitrile
3. Digest
4. Extract peptides and wash
5. Capture peptides on C18
6. Elute in MS-compatible buffer
Trypsin: good activity both in gel digestion and in solution. (Cleaves at lysine [K] and arginine [R], unless either is followed by proline [P] in C-terminal direction.)
Other enzymes with more or less specific cleavage: Chymotrypsin (F, W, Y, L, M); Lys-C (K); Arg-C (R); Asp-N (D, N-terminal); V8-bicarb (E); V8-biphosph (E, D); {CNBr (M)}.
Proteineer DP & MAP II/8 MALDI AutoPrep System.
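The trypsin cleavage rule quoted above (cut after K or R unless the next residue is P) is easy to express in code. A minimal sketch, assuming complete digestion with no missed cleavages; the function name and test sequence are illustrative.

```python
def trypsin_digest(seq: str) -> list[str]:
    """Cut C-terminal to K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide without K/R
    return peptides


# hypothetical sequence: the R-P bond is left intact
fragments = trypsin_digest("AAKRPGGRCK")
```

Real digests also contain peptides with missed cleavages, so search engines usually allow one or two of them.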

(91) Biological pre-fractionation of sample (organs, tissues, cell types, subcellular components). Protein extract. Chromatographic fractionation; 1-DE; 2-DE. Label with amino acid targeted affinity reagents (such as ICAT®). Enzymatic or chemical digest. Multidimensional LC. Chromatography. Mass spectrometry (ESI, MALDI, …). Validation, clinical trial, …. Biomarker application (therapy, diagnosis, …).
A glance at the typical sensitivity and mass ranges allowed by different ionization techniques provides a clear answer to the question of which are most useful; electron ionization (EI), atmospheric pressure chemical ionization (APCI) and desorption/ionization on silicon (DIOS) are somewhat limiting in terms of upper mass range, while electrospray ionization (ESI), nanoelectrospray ionization (nanoESI), and matrix-assisted laser desorption ionization (MALDI) have a high practical mass range.

(92) Mass Spectrometry of Small Molecules: Magnetic-Sector Instruments
• Mass spectrometry (MS) measures the mass and molecular weight (MW) of a molecule.
• Provides structural information by finding the masses of fragments produced when molecules break apart.
• Three basic parts of mass spectrometers: ionization source, analyzer, detector (fed by the sample input).
Bruker Autoflex Automatic MALDI-TOF MS.

(93) Protein: MPSESSYKVHRPAKSGGS. Peptides: MPSESSYK, VHR, PAK, SGGS. Spectrum.
• VALEACVQAR: mass = 1059.25 Da
• VALEACVQAR + H+: mass = 1060.25 Da, m/z = 1060.25 / 1 = 1060.25
• H+ + VALEACVQAR + H+: mass = 1061.25 Da, m/z = 1061.25 / 2 = 530.63
MS spectrum.
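The arithmetic on this slide generalizes to any charge state: add one proton mass per charge and divide by the charge. A minimal sketch; the slide rounds the proton mass to 1.00 Da, whereas the default below uses the more precise value of about 1.007276 Da.

```python
PROTON = 1.007276  # proton mass in Da


def mz(neutral_mass: float, z: int, proton: float = PROTON) -> float:
    """m/z of a peptide carrying z protons (positive-ion mode)."""
    if z < 1:
        raise ValueError("charge state must be at least 1")
    return (neutral_mass + z * proton) / z


# Reproduce the slide's numbers with the proton mass rounded to 1.00 Da
singly = mz(1059.25, 1, proton=1.0)
doubly = mz(1059.25, 2, proton=1.0)
```

The same peptide therefore appears at a different m/z for each charge state, which is why a spectrum reports mass-to-charge ratio rather than mass.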

(94) http://nobelprize.org.

(95) Mass Spectrometry
• Ion source (ESI, MALDI): produces the ions from the sample (vaporization/ionization)
• Analyzer (PSD, ion trap, Q-TOF, TOF/TOF, FT-ICR): resolves ions based on their mass/charge (m/z) ratio
• Detector: detection of mass-separated ions
Strengths: precise molecular weight; fragmentation; automated; database identification and prediction.
Weaknesses: best for a few molecules at a time; best for small molecules; mass-to-charge ratio, not mass; intensity ≠ abundance.
Single stage MS. Tandem MS.

(96) Peptide Mass Fingerprinting
A protein identification technique that correlates experimental data with theoretical data.
Experimental MS (protein, proteolytic digestion):
1. Excise band from gel
2. Tryptic digestion of gel fragment
3. Supernatant transferred to fresh Eppendorf tube
4. Sample transferred to target plate
Theoretical MS (protein sequence from database, computer search):
1. In silico digest
2. Mass computation
3. Compare computer-generated masses with observed spectrum
Protein sequence from database: proteasome activator protein PA26 [Trypanosoma brucei]
  1 MPPKRAALIQ NLRDSYTETS SFAVIEEWAA
 31 GTLQEIEGIA KAAAEAHGTI RNSTYGRAQA
 61 EKSPEQLLGV LQRYQDLCHN VYCQAETIRT
 91 VIAIRIPEHK EEDNLGVAVQ HAVLKIIDEL
121 EIKTLGSGEK SGSGGAPTPI GMYALREYLS
151 ARSTVEDKLL GSVDAESGKT KGGSQSPSLL
181 LELRQIDADF MLKVELATTH LSTMVRAVIN
211 AYLLNWKKLI QPRTGSDHMV S
In silico digestion: TGTDHMVS, AALIQNLR, IIDELEIK, AAAEAHGTIR, LLGSVDAESGK, QIDADFMLK, SPEQLLGVLQR, AVINAYLLNWK, GGSQSPSLLLELR, VELATTHLSTMVR, SGSGGAPTPIGMYALR, EEDNLGVAVQHAVLK, YQDLCHNVYCQAETIR, DSYTETSSFAVIEEWAAGTLQEIEGIAK.
Spectrum axis: mass (m/z), 1000 to 2500.
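The "mass computation" and "compare" steps can be sketched as follows, assuming standard monoisotopic residue masses, singly protonated peptides ([M+H]+), and a fixed absolute tolerance. The observed peak list and the 0.2 Da tolerance are hypothetical, and real search engines score matches statistically rather than simply counting them.

```python
# Standard monoisotopic residue masses (Da)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def mh_plus(peptide: str) -> float:
    """Monoisotopic m/z of the singly protonated peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def match_peaks(peptides: list[str], observed: list[float],
                tol: float = 0.2) -> list[tuple[str, float]]:
    """Pair each theoretical peptide with observed peaks within tol Da."""
    hits = []
    for pep in peptides:
        theoretical = mh_plus(pep)
        for peak in observed:
            if abs(peak - theoretical) <= tol:
                hits.append((pep, peak))
    return hits


# two peptides from the in silico digest and a hypothetical peak list
candidates = ["AALIQNLR", "IIDELEIK"]
peaks = [mh_plus("AALIQNLR") + 0.05, 500.0]
hits = match_peaks(candidates, peaks)
```

The more peptides of one database protein match peaks in the spectrum, the stronger the evidence for that protein.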

(97) Some Principles of Mass Spectrometry Some Principles of Mass Spectrometry .

(98) MS or MS/MS
Peptide mass fingerprint (MS):
• Fast, simple analysis
• High sensitivity
• Need database of protein sequences
• Sequence must be present in database
• Not good for mixtures
Peptide fragmentation search (MS/MS):
• Easily automated for high throughput
• Can get matches from marginal data
• Can be slow
• MS/MS is peptide identification
Figures: in-gel (spot 16) digest MS spectrum (labelled peaks include 940.544, 1046.496, 1304.669, 1352.746, 1598.881, 1696.950, 1975.947, 2211.126 and 2912.537 m/z; spot picked; low peak 1697 amu selected); MS/MS from 1669 amu: TDQFHDAVLVNAALR, with annotated a, b and y fragment ions; Mascot score for UltraFlex TOF/TOF of 1669.93.
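The b and y labels in an MS/MS spectrum correspond to N-terminal and C-terminal fragments of the peptide. The sketch below computes singly charged b and y ion m/z values for the peptide named on this slide, assuming standard monoisotopic residue masses and ignoring a ions and neutral losses; it illustrates the ion series only and is not how Mascot scores spectra.

```python
# Standard monoisotopic residue masses (Da)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b ions (b1..b(n-1)) and y ions (y1..y(n-1))."""
    masses = [MONO[aa] for aa in peptide]
    n = len(masses)
    # b_i: first i residues plus a proton
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    # y_i: last i residues plus water plus a proton
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


b_series, y_series = fragment_ions("TDQFHDAVLVNAALR")
```

Consecutive ions in one series differ by exactly one residue mass, which is what lets the sequence be read off the spectrum.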

(99) LC/MS/MS of Peptide Mixtures. LC. MS (MW profile). MS/MS (AA identity).

(100) Advantages of LC/MS over 2D PAGE
• Easier automation: no robots, no gel handling
• Better separation power: monolithic columns, multi-dimensional chromatography
• Simpler coupling to MS
Disadvantages of LC/MS
• Proteins are chopped up
• Quantification difficult
• Huge amount of data (~10 GB/run)
• Data hard to manage/interpret

(101) LC/MS/TOF: 3D view (m/z, intensity, time). MALDI-TOF: 2D view (m/z, intensity).

(102) Peptide Sequencing by LC/MS/MS
STEP 1 {LC} [separation]: on-line separation
STEP 2 {MS} [ionization]
STEP 3 {MS/MS} [peptide identification]
STEP 4 {MS/MS/MS} [post-translational modification]: triple mass spectrum (MS3)
Figure: annotated fragment spectra with b, c, y and z ions, including H2O and NH3 neutral losses; example peptide MGLAAAVFTK with fragments GLAAAVFTK+, LAAAVFTK+, AAVFTK+, FTK+ and K+.

(103) Shotgun Proteomics
• Separation of peptides – extensive chromatographic separation (one or multiple dimensional separations)
• Data acquisition – data-dependent acquisition (automated acquisition of MS/MS spectra from as many precursor ions as possible)
• Data analysis – automated interpretation of the MS/MS spectra (database search)
Figure labels: protein sequences; digestion; peptides; separation.

(104) Features: 2720. IDs: 363. CIDs: 1633 Finally: ID/CID: 22%, ID/feature: 13%.

(105) MALDI Imaging Mass Spectrometry (IMS)
1. The analysis of a sample surface for its molecular content. The sample consists of a thinly sliced section of tissue/organ/whole animal.
2. A 2-dimensional array of MALDI spectra is obtained over the surface of the sample; each spectrum has a location component.
3. An ion intensity map can then be produced for any mass that is detected over the scanned area.
Workflow: slice tissue sections; mount sample; spray coat with matrix; MALDI/TOF/TOF; data processing and analysis.
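Step 3 above amounts to picking one m/z window and reading its intensity at every pixel. A minimal sketch, assuming each pixel's spectrum is already available as a list of (m/z, intensity) peaks keyed by its (row, column) position; the data layout, tolerance and values are hypothetical.

```python
Peak = tuple[float, float]  # (m/z, intensity)


def ion_image(spectra: dict[tuple[int, int], list[Peak]],
              target_mz: float, tol: float = 0.5) -> list[list[float]]:
    """Build a 2D intensity map for one m/z from position-tagged spectra."""
    n_rows = max(r for r, _ in spectra) + 1
    n_cols = max(c for _, c in spectra) + 1
    image = [[0.0] * n_cols for _ in range(n_rows)]
    for (r, c), peaks in spectra.items():
        # sum every peak that falls inside the m/z window at this pixel
        image[r][c] = sum(i for m, i in peaks if abs(m - target_mz) <= tol)
    return image


# hypothetical 2 x 2 scan
scan = {
    (0, 0): [(1000.2, 5.0), (1500.0, 1.0)],
    (0, 1): [(1000.4, 2.0)],
    (1, 0): [],
    (1, 1): [(1500.1, 7.0)],
}
marker_map = ion_image(scan, target_mz=1000.0)
```

Repeating the call with a different `target_mz` gives a separate map for each marker, which can then be overlaid on the optical image.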

(106) Workflow: mark ROIs (regions of interest) on the optical image; compare spectra from multiple ROIs; compare the average intensity of a detected protein in specific regions.
Figure panels: optical image; Marker A MS spectrum and MS image; Marker B MS spectrum and MS image; housekeeping protein; merged image (image overlay); relative intensity.

(107) Proteomics includes not only the identification and quantification of proteins, but also the determination of their localization, modifications, interactions, activities, and, ultimately, their function. Initially encompassing just two-dimensional (2D) gel electrophoresis for protein separation and identification, proteomics now refers to any procedure that characterizes large sets of proteins. Science 16 February 2001: Vol. 291, no. 5507, pp. 1221-1224. PROTEOMICS: Proteomics in Genomeland. Stanley Fields.

(108) Metabolomics.

(109) Metabolomics. Protein-protein interaction network. Jyh-wei Shin.

(110) Potential benefits of genomics, proteomics and metabolomics for the patients.

(111) Interest in Systems Biology?
1. Understand the structure of the system (regulatory and biochemical networks)
2. Understand the dynamics of the system (construct a model with predictive capabilities)
3. Understand the control methods
Human genome completed.

(112) A Broad Definition of Bioinformatics.
• Informatics: its carrier is a set of digital codes and a language. In its manifestation in the space-time continuum, it has utility (e.g. to decrease entropy of an open system).
• Bioinformatics: the essence of life is information (i.e. from digital code to emerging properties of biosystems). Bioinformatics is the study of information content of life.