Integrating Genomes
劉恕維 洪士堯 李宗燁 林修竹 田耕豪 廖奎達
D. R. Zerbino, B. Paten, and D. Haussler, Science, 13 April 2012: 179-182.
Introduction
Obtaining genome sequence
Modeling the evolution of genotype
From genotype to phenotype
Looking ahead to application
Outline
Introduction
劉恕維
1970 Walter Fiers
History – The Pioneer
History – The technique
History – Hardware
[Chart: comparison with Moore's law]
Genotype & Phenotype
http://big5.ifeng.com/gate/big5/baby.ifeng.com/yuer/special/detail_2011_01/27/4481382_0.shtml
Autism
Genotype & Phenotype
Genotype → Phenotype (ABO blood type)
AA, AO → A
BB, BO → B
AB → AB
OO → O
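The blood-group mapping above can be written as a small lookup. A minimal Python sketch, assuming the standard ABO rules (A and B codominant, both dominant over O); the function name `abo_phenotype` is hypothetical.

```python
def abo_phenotype(genotype: str) -> str:
    """Map a two-allele ABO genotype such as "AO" to its blood-type phenotype."""
    if len(genotype) != 2 or not set(genotype.upper()) <= {"A", "B", "O"}:
        raise ValueError(f"not an ABO genotype: {genotype!r}")
    alleles = set(genotype.upper())
    if alleles == {"A", "B"}:
        return "AB"   # A and B are codominant: both are expressed
    if "A" in alleles:
        return "A"    # AA or AO: A is dominant over O
    if "B" in alleles:
        return "B"    # BB or BO: B is dominant over O
    return "O"        # OO: only the recessive allele is present


for g in ["AA", "AO", "BB", "BO", "AB", "OO"]:
    print(g, "->", abo_phenotype(g))
```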
Genome evolution
Model molecular phenotype as a consequence of genotype
Predict organismal phenotype
Research challenges
Obtaining Genomic Sequences
洪士堯
Process of reconstructing an entire genome from relatively short random DNA fragments, called reads.
Detect read overlaps and thereby progressively reconstitute most of the genome sequence.
Genome assembly
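Overlap detection and progressive reconstruction can be sketched with a greedy strategy: repeatedly merge the two reads with the longest suffix-prefix overlap. This is a minimal illustration assuming error-free reads; real assemblers use overlap or de Bruijn graphs, and the names `overlap`, `greedy_assemble` and `min_overlap` are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap; return the contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads that are fully contained in another read
    contigs = [r for r in contigs if not any(r != s and r in s for s in contigs)]
    while True:
        best = (0, -1, -1)  # (overlap length, index of left read, index of right read)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs  # no remaining overlaps: stop
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is exactly where repeats cause trouble: if two genomic locations share a long identical stretch, the largest overlap may join reads from different copies.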
PCR: Polymerase Chain Reaction.
Procedure :
Denaturation step
Annealing step
Extension/elongation step
PCR
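Each denaturation/annealing/extension cycle can at most double the number of target molecules, so amplification is exponential. A short sketch of that arithmetic, assuming a constant per-cycle efficiency `efficiency` between 0 and 1 (a simplification: real reactions plateau as reagents run out); the function name is hypothetical.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copies after `cycles` rounds: N = N0 * (1 + E) ** n."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(pcr_copies(10, 30))        # ideal doubling: 10 * 2**30 copies
print(pcr_copies(10, 30, 0.9))   # 90% efficiency in every cycle
```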
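DNA replication in the presence of both dNTPs and ddNTPs stops a growing strand wherever a ddNTP is incorporated, so the reaction produces fragments ending at each base.
In the presence of 5% ddTTPs and 95% dTTPs, Taq polymerase incorporates a terminating ddTTP at any given 'T' in only a fraction of strands, yielding fragments that end at every 'T'
position in the growing DNA.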
Chain-termination method
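With a ddTTP fraction p, each 'T' is a chance for the strand to stop. Assuming each incorporation is independent and ddTTP is used in proportion to p (ignoring the polymerase's different preference for dd- versus d-nucleotides), a strand stops at the k-th 'T' with probability p(1 - p)^(k - 1). A sketch computing that fragment-length profile; positions are counted on the newly synthesized strand, and the function name is hypothetical.

```python
def termination_profile(new_strand: str, dd_fraction: float = 0.05) -> dict[int, float]:
    """Probability that synthesis stops at each 'T' of the new strand.

    Returns {fragment_length: probability}; lengths are 1-based positions of T.
    """
    profile = {}
    survive = 1.0  # probability the strand is still growing when it reaches this base
    for pos, base in enumerate(new_strand, start=1):
        if base == "T":
            profile[pos] = survive * dd_fraction  # ddTTP incorporated: chain terminated
            survive *= 1.0 - dd_fraction          # dTTP incorporated: keeps growing
    return profile


profile = termination_profile("GATTACATGT")
for length, prob in profile.items():
    print(f"fragment of length {length}: {prob:.4f}")
print("full-length (no termination):", 1 - sum(profile.values()))
```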
Gel electrophoresis separates DNA by fragment size. The larger the DNA piece, the slower it progresses through the gel matrix toward the positive electrode (anode).
Chain-termination method (cont.)
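Because the gel orders fragments by size, the sequence can be read by sorting every terminated fragment by length and noting which ddNTP reaction (lane) it came from. A sketch assuming one lane per ddNTP and exactly known, error-free fragment lengths; the input format and function name are hypothetical.

```python
def read_gel(lanes: dict[str, list[int]]) -> str:
    """Reconstruct the new strand from fragment lengths observed in each ddNTP lane.

    lanes maps a base ("A", "C", "G", "T") to the lengths of fragments ending in it.
    The shortest fragment travels farthest, so sorting by length reads 5' -> 3'.
    """
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    if [length for length, _ in bands] != list(range(1, len(bands) + 1)):
        raise ValueError("expected exactly one band at every length 1..n")
    return "".join(base for _, base in bands)


lanes = {"A": [2, 5, 7], "C": [6], "G": [1, 9], "T": [3, 4, 8, 10]}
print(read_gel(lanes))  # GATTACATGT
```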
Genomes commonly contain large redundant regions (repeats).
Regions where the statistical distribution of bases is significantly biased (low-complexity DNA).
Problems
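Low-complexity DNA can be flagged with a simple heuristic: the Shannon entropy of the base composition in a sliding window is near 2 bits for balanced sequence and near 0 for runs such as AAAA. A sketch; the window size and the 1.0-bit threshold are arbitrary illustrative choices, and dedicated maskers such as DUST use a different score.

```python
import math
from collections import Counter


def entropy(seq: str) -> float:
    """Shannon entropy (bits) of the base composition of seq."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def low_complexity_windows(seq: str, window: int = 12, threshold: float = 1.0) -> list[int]:
    """Start positions of windows whose base-composition entropy is below threshold."""
    return [
        i for i in range(len(seq) - window + 1)
        if entropy(seq[i:i + window]) < threshold
    ]


seq = "ACGTTGCAAGCT" + "A" * 12 + "GATCCGTA"
print(low_complexity_windows(seq))
```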
After the first complete genome of a species is available:
New genomes from that species or closely related species are generally not assembled de novo.
Instead, they are assembled using the reference genome as a template.
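Using the reference as a template starts by placing each read on it. A minimal seed-and-verify sketch, assuming short reads, substitutions only (no indels) and a seed taken from the read's first k bases; `k`, `max_mismatches` and the function names are hypothetical, and production mappers use compressed indexes and gapped alignment.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read: str, reference: str, index: dict[str, list[int]], k: int,
             max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Return (position, mismatches) for placements seeded by the read's first k-mer."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


reference = "TTGACCGATAGGCATCGA"
index = build_index(reference, 4)
print(map_read("CCGATTGG", reference, index, 4))  # [(4, 1)]
```

Because only the first k-mer is used as a seed, a read with an error in its first k bases is missed; real tools use several seeds per read.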
Modeling the evolution of genotype
李宗燁
Alignment and Assembly
Phylogenetic analysis
Evolutionary relationships between DNA
Modeling the Evolution of Genotype
Genomes are compared by alignment
Large scale
Indicate changes in segment order and copy number
Small scale
Indicate specific base substitutions
Alignment and Assembly
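Small-scale comparison at the level of individual bases is usually done with dynamic programming. A minimal global (Needleman-Wunsch) alignment sketch, assuming illustrative scores (match +1, mismatch -1, linear gap -2) that are not taken from the paper.

```python
def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(global_alignment("GATTACA", "GATCA"))
```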
Alignment
Alignment and Assembly
Assembly
Primary challenge
Distinguish spurious sequence similarities from those due to common ancestry
Alignment and Assembly
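One way to separate spurious similarity from similarity due to common ancestry is to ask how often an equally good score arises by chance. A sketch of an empirical permutation test, assuming a simple ungapped local score (best-scoring diagonal segment, match +1 / mismatch -1) and a hypothetical number of shuffles; search tools typically rely on analytic statistics instead, for example the Karlin-Altschul statistics used by BLAST.

```python
import random


def best_ungapped_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Highest-scoring ungapped local alignment over all diagonals (Kadane per diagonal)."""
    best = 0
    for offset in range(-(len(b) - 1), len(a)):  # a[i] is paired with b[i - offset]
        running = 0
        for i in range(max(0, offset), min(len(a), len(b) + offset)):
            step = match if a[i] == b[i - offset] else mismatch
            running = max(0, running + step)      # restart when the segment turns negative
            best = max(best, running)
    return best


def empirical_p_value(a: str, b: str, shuffles: int = 1000, seed: int = 0) -> float:
    """Fraction of shuffled versions of b that score at least as well as b itself."""
    rng = random.Random(seed)
    observed = best_ungapped_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(shuffles):
        rng.shuffle(letters)  # keeps base composition, destroys order
        if best_ungapped_score(a, "".join(letters)) >= observed:
            hits += 1
    return (hits + 1) / (shuffles + 1)  # add-one so the estimate is never exactly 0


x = "ATGCGTACGTTAGCCGATCGATTACGGCTA"
print(empirical_p_value(x, x, shuffles=200))
```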
Regions of genomes
Subject to purifying selection
Similarity of sequence is conserved
Orthologous protein-coding regions
Reliably aligned across great evolutionary distances
Between vertebrates and invertebrates
Alignment and Assembly
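Protein-coding regions stay alignable over long distances partly because many DNA changes, especially at third codon positions, leave the encoded amino acid unchanged. A sketch comparing identity at the DNA and protein level with the standard genetic code; the two sequences are made up for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated with the first base varying slowest (T, C, A, G)
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate complete codons of dna ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(x: str, y: str) -> float:
    """Fraction of identical positions over the shorter sequence."""
    return sum(a == b for a, b in zip(x, y)) / min(len(x), len(y))


seq1 = "ATGGCTCTGAAAGGT"  # Met-Ala-Leu-Lys-Gly
seq2 = "ATGGCCTTAAAGGGC"  # same protein, five synonymous base differences
print(f"DNA identity:     {identity(seq1, seq2):.2f}")
print(f"protein identity: {identity(translate(seq1), translate(seq2)):.2f}")
```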
Regions of genomes
Therefore, it is common to distinguish alignments of subregions from full-genome alignments
Local alignment
Used between conserved functional regions of more distantly related genomes
Full-genome alignments
Alignment and Assembly
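Local alignment differs from global alignment mainly in letting the score restart at zero, so only the best-matching subregion is reported. A minimal Smith-Waterman sketch, assuming illustrative scores (match +2, mismatch -1, linear gap -2) and returning only the best score and where it ends; the function name is hypothetical.

```python
def local_alignment(a: str, b: str, match: int = 2, mismatch: int = -1,
                    gap: int = -2) -> tuple[int, int, int]:
    """Smith-Waterman: return (best score, end index in a, end index in b), 1-based ends."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Floor at 0: a poor prefix never drags down a good local match
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best[0]:
                best = (curr[j], i, j)
        prev = curr
    return best


# The shared core "GATTACA" is found despite unrelated flanking sequence
print(local_alignment("TTTTTGATTACATTTTT", "CCGATTACACC"))  # (14, 12, 9)
```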
Multiple alignment, applied to
more than two species
or to multiple gene copies within a species,
is NP-hard.
Considerable effort has been devoted to heuristic methods.
Phylogenetic analysis
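A common objective for multiple alignment is the sum-of-pairs score. Scoring a given alignment is easy; finding the best one exactly needs a dynamic-programming table with one axis per sequence (and 2^k - 1 possible moves into each cell), which grows exponentially with the number of sequences k. A sketch of both points, assuming illustrative scores (match +1, mismatch -1, gap -2, gap against gap 0); the function names are hypothetical.

```python
from itertools import combinations
from math import prod


def sum_of_pairs(rows: list[str], match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum-of-pairs score of an existing multiple alignment (rows of equal length)."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for x, y in combinations(rows, 2):
        for a, b in zip(x, y):
            if a == "-" and b == "-":
                continue  # gap aligned to gap contributes nothing
            if a == "-" or b == "-":
                total += gap
            else:
                total += match if a == b else mismatch
    return total


def exact_dp_cells(lengths: list[int]) -> int:
    """Cells in the exact k-dimensional DP table: product of (L_i + 1)."""
    return prod(n + 1 for n in lengths)


print(sum_of_pairs(["ACG-T", "AC-GT", "ACGGT"]))  # 3
for k in (2, 4, 8):
    print(k, "sequences of length 100:", exact_dp_cells([100] * k), "cells")
```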
Complicated by homologous recombination
Creates DNA molecules whose parts have different evolutionary histories
Phylogenetic analysis
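Recombination means that different segments of one sequence can have different closest relatives, so no single tree describes the whole molecule. A toy illustration with made-up sequences, in which R is assumed to be a recombinant whose first half comes from A and second half from B; distances are proportions of differing sites (p-distance), and the function names are hypothetical.

```python
def p_distance(x: str, y: str) -> float:
    """Proportion of aligned sites that differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def closest_relative(query: str, others: dict[str, str], start: int, end: int) -> str:
    """Name of the sequence nearest to query within the segment [start, end)."""
    return min(others, key=lambda name: p_distance(query[start:end], others[name][start:end]))


A = "ACGTTGCAAT" + "GCCGTAAGCT"
B = "ACCTTGAAAT" + "GCCATAAGGT"   # differs from A at two sites in each half
R = A[:10] + B[10:]              # hypothetical recombinant: left half from A, right from B
others = {"A": A, "B": B}

print("left half closest to:", closest_relative(R, others, 0, 10))    # A
print("right half closest to:", closest_relative(R, others, 10, 20))  # B
print("whole sequence:", p_distance(R, A), p_distance(R, B))          # a tie
```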
Balanced structural rearrangements
Change the order and the orientation of the bases in the genome
Substitutions
Segmental duplications/gains/losses
Alter the number of copies of homologous bases
Short indels
Evolutionary relationships between DNA
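The classes of change listed above can be made concrete as operations on a sequence. A sketch on plain strings with hypothetical function names; an inversion is modeled as replacing a segment by its reverse complement, which changes both the order and the orientation of its bases.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def substitute(seq: str, pos: int, base: str) -> str:
    """Single-base substitution."""
    return seq[:pos] + base + seq[pos + 1:]


def invert(seq: str, start: int, end: int) -> str:
    """Balanced rearrangement: reverse-complement seq[start:end] in place."""
    return seq[:start] + seq[start:end].translate(COMPLEMENT)[::-1] + seq[end:]


def duplicate(seq: str, start: int, end: int) -> str:
    """Segmental duplication: insert a tandem copy of seq[start:end]."""
    return seq[:end] + seq[start:end] + seq[end:]


def delete(seq: str, start: int, end: int) -> str:
    """Segmental loss or short deletion."""
    return seq[:start] + seq[end:]


def insert(seq: str, pos: int, fragment: str) -> str:
    """Short insertion."""
    return seq[:pos] + fragment + seq[pos:]


genome = "AAACCCGGGTTT"
print(substitute(genome, 0, "G"))  # GAACCCGGGTTT
print(invert(genome, 0, 6))        # GGGTTTGGGTTT
print(duplicate(genome, 3, 6))     # AAACCCCCCGGGTTT
print(delete(genome, 3, 6))        # AAAGGGTTT
print(insert(genome, 3, "T"))      # AAATCCCGGGTTT
```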
Construction of a mathematically and algorithmically tractable unified theory
remains a major challenge for the field.
Evolutionary relationships between DNA
FROM GENOTYPE TO PHENOTYPE
Gregor Mendel
• Austrian monk who experimented with pea plants
• He noticed that not all peas are the same:
• Green vs. yellow
• Tall vs. short
• Round vs. wrinkled
• He discovered that the outcome of crossing peas depended on the plants' genes rather than only on their outward appearance
Phenotype vs. Genotype
Phenotype: the physical appearance of a plant or animal that results from its genetic makeup (genotype)
Genotype: the genetic constitution (makeup) of an organism
www.ansi.okstate.edu/breeds/swine/
The Punnett Square
A way for determining the genotype and phenotype of offspring
Capital letters are assigned to dominant alleles and lower-case letters to recessive alleles
Using the Punnett Square
Purebred (homozygous) dominant – both alleles are dominant.
Example – Dominant tall – TT
Purebred (homozygous) recessive – both alleles are recessive.
Example – Recessive short – tt
Hybrid (heterozygous) – one dominant and one recessive allele (e.g., Tt)
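A Punnett square enumerates every pairing of one allele from each parent. A minimal sketch for one gene with complete dominance, using the tall (T) / short (t) example above; the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def punnett(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes from all combinations of parental alleles."""
    # sorted() puts the capital (dominant) allele first, so "tT" becomes "Tt"
    return Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))


def phenotype(genotype: str, dominant: str = "Tall", recessive: str = "Short") -> str:
    """Complete dominance: one dominant allele is enough to show the dominant trait."""
    return dominant if any(allele.isupper() for allele in genotype) else recessive


offspring = punnett("Tt", "Tt")                        # hybrid x hybrid
print(offspring)                                       # 1 TT, 2 Tt, 1 tt
print(Counter(phenotype(g) for g in offspring.elements()))  # 3 Tall, 1 Short
```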
Massive Increase in Sequencing Speed
New Methods of Exploring
Cross Species
History and diversity of life
Climate, competitors, disease
More Studies will be Derived From Experimental Data
Single Species
Human Genome Studies
Number of genes: single vs. multiple
Frequency of genetic defects: rare (<1%) vs. common (>1%)
Association studies are critical to the study of complex diseases
Association
Genotype a subset of SNPs (tag SNPs) chosen on the basis of linkage disequilibrium (LD) patterns.
Select tags that provide as much information as possible about the surrounding region, based on their association (LD) with untagged SNPs.
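Tag selection can be sketched from phased haplotypes: measure pairwise LD as r² and greedily pick the SNP that captures the most still-untagged SNPs. A sketch assuming 0/1-coded haplotypes, no monomorphic SNPs, and the commonly used but arbitrary cutoff r² ≥ 0.8; this is a generic greedy scheme with hypothetical names, not any specific published tool.

```python
def r_squared(snp_a: list[int], snp_b: list[int]) -> float:
    """LD r^2 between two biallelic SNPs observed on the same haplotypes (0/1 coded)."""
    n = len(snp_a)
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    p_ab = sum(a & b for a, b in zip(snp_a, snp_b)) / n
    d = p_ab - p_a * p_b                          # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("monomorphic SNP: r^2 is undefined")
    return d * d / denom


def greedy_tags(snps: list[list[int]], threshold: float = 0.8) -> list[int]:
    """Pick tag SNP indices until every SNP has r^2 >= threshold with some tag."""
    untagged = set(range(len(snps)))
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs (itself included)
        best = max(range(len(snps)), key=lambda i: sum(
            r_squared(snps[i], snps[j]) >= threshold for j in untagged))
        tags.append(best)
        untagged -= {j for j in untagged if r_squared(snps[best], snps[j]) >= threshold}
    return tags


snps = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],  # perfect LD with SNP 0
    [1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 0, 1],  # perfect LD with SNP 2 (opposite allele)
]
print(greedy_tags(snps))  # [0, 2]
```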
Genome-Wide Association (GWA) addresses some of these issues.
GWA has multiple advantages
Discovery
Studies not limited to current biological knowledge
Quantitative
Better characterize complex, quantitative traits
GWA has multiple advantages
Discovery
Studies not limited to current biological knowledge
Coronary Heart Disease (CHD)
Type 2 Diabetes
Recent GWA studies discovered:
Associated regions containing no annotated genes
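Because a GWA scan tests every genotyped SNP without assuming which genes matter, it can point to regions with no annotated genes. A sketch of one common per-SNP test for a case-control study, the allelic chi-square test on a 2×2 table of allele counts (1 degree of freedom), with made-up counts; a genome-wide scan repeats it at every SNP and applies a stringent threshold (commonly p < 5 × 10^-8).

```python
import math


def allelic_chi_square(case_counts: tuple[int, int],
                       control_counts: tuple[int, int]) -> tuple[float, float]:
    """Pearson chi-square (1 df) on allele counts; returns (statistic, p-value).

    Each argument is (count of allele 1, count of allele 2) in that group.
    """
    table = [case_counts, control_counts]
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / total
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


stat, p = allelic_chi_square((1200, 800), (1000, 1000))
print(f"chi2 = {stat:.1f}, p = {p:.2e}")
```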
GWA has multiple advantages
Cardiac Arrhythmias
Quantitative
Better characterize complex, quantitative traits
Identification of polymorphisms accounting for variance in a quantitative trait
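For a quantitative trait, a typical per-SNP test regresses the trait on genotype dosage (0, 1 or 2 copies of an allele); the R² of that fit is the share of trait variance the polymorphism accounts for. A sketch assuming an additive model with no covariates; the function name and the example values are hypothetical.

```python
def additive_fit(dosages: list[int], trait: list[float]) -> tuple[float, float, float]:
    """Least-squares fit trait ~ a + b * dosage; returns (intercept, slope, R^2)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    syy = sum((y - mean_y) ** 2 for y in trait)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy * sxy / (sxx * syy)  # fraction of trait variance explained by the SNP
    return intercept, slope, r2


dosages = [0, 0, 1, 1, 2, 2]
trait = [400.0, 404.0, 410.0, 406.0, 416.0, 420.0]  # made-up trait values
intercept, slope, r2 = additive_fit(dosages, trait)
print(f"effect per allele: {slope:.1f}, variance explained: {r2:.2f}")
```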
Going forward
Association Studies
Cannot provide unambiguous identification of causal genes
But can highlight pathways and mechanisms of particular interest,
leading to a systems-level understanding of genetics and disease
And Better Medicine!
Databases
• ENCODE
• Roadmap Epigenomics
• modENCODE
• Ensembl
• UCSC Genome Browser
Epigenetics, RNA, Protein
Often cannot be measured directly
Inferred using mathematical models
Markov Models
Factor Graphs
Bayesian Networks
Markov Random Fields
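As one example of inferring an unmeasured state from data, a hidden Markov model can decode a sequence of hidden chromatin states from observed marks. A minimal Viterbi sketch with two hypothetical states ("active", "inactive") and made-up probabilities; models used in practice have more states and learn their parameters from data.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path, computed in log space to avoid underflow."""
    # best[s] = (log-probability of the best path ending in state s, that path)
    best = {s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), [s])
            for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            score, path = max(
                (best[prev][0] + math.log(trans_p[prev][s]), best[prev][1])
                for prev in states)
            new_best[s] = (score + math.log(emit_p[s][obs]), path + [s])
        best = new_best
    return max(best.values())[1]


states = ("active", "inactive")
start_p = {"active": 0.5, "inactive": 0.5}
trans_p = {"active": {"active": 0.9, "inactive": 0.1},
           "inactive": {"active": 0.1, "inactive": 0.9}}
emit_p = {"active": {"mark": 0.8, "none": 0.2},
          "inactive": {"mark": 0.1, "none": 0.9}}

observed = ["mark", "mark", "none", "mark", "none", "none", "none"]
print(viterbi(observed, states, start_p, trans_p, emit_p))
```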
Classification and Regression models
Classification of epigenetic, transcriptional, and proteomic states to predict phenotype from genotype
MDS
Clustering analysis
Cox Regression
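Clustering analysis groups samples with similar molecular profiles. A minimal k-means (Lloyd's algorithm) sketch on plain Python lists, assuming Euclidean distance, a fixed number of clusters k and a fixed random seed; the expression values are invented for illustration.

```python
import math
import random


def kmeans(points: list[list[float]], k: int, iterations: int = 100, seed: int = 0):
    """Lloyd's algorithm: returns (centroids, cluster label for each point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # start from k of the samples
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Expression of two genes in six samples (invented values)
samples = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
centroids, labels = kmeans(samples, k=2)
print(labels)
```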
Looking ahead to application
田耕豪 廖奎達
Medicine
a. Cancer
b. Vaccine
c. Stem cell
Agriculture
Applications
Cancer
Genomic modifications are the source of nearly all cancers.
Applications
Acute Myeloid Leukemia (AML)
A hematologic malignancy caused by abnormal proliferation of myeloid hematopoietic blast cells
Applications
Coding mutations identified in eight primary tumor–relapse pairs
Applications
High-throughput genomics data
Vaccine design
Treatment of disease
Infectious disease
Autoimmune diseases
Applications
Vaccine
Applications
Vaccine
Every year in February…
Application
Stem-cell
Genomic variants
Epigenetic state
Expression pattern
Induced pluripotent stem (iPS) cells and lineage-specific directly reprogrammed cells
Application
Next step…
Integrating advances from different research fields
Combining the above into mathematical models
Building comprehensive and computable models