Will Computers Crash Genomics

SCIENCE VOL 331 11 FEBRUARY 2011

R01945014 黃博強 R01945037 林彥伯 R01945039 蘇醒宇 R01945043 吳卓翰  R01945046 蘇煒迪 R01945017 陳維

(2)

Introduction

(3)
(4)

Old Genome Informatics

(5)

The Evolution of DNA Sequencing

(6)

New Genome Informatics

(7)

Dizzy with data

(8)

Dizzy with data

• Human Genome Project

– Planned for 15 years

• Celera Genomics

– Shotgun Sequencing Method

(9)

Shotgun Sequencing Method

(10)

Assemble fragments

(11)

Assemble fragments
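
The fragment-assembly step of shotgun sequencing can be illustrated with a toy sketch. The code below is a minimal greedy overlap merger, not the assembler used by Celera or any production tool; the function names, the `min_len` threshold, and the three example reads are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    max_len = min(len(a), len(b))
    for size in range(max_len, min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_len:
                        best_len, best_i, best_j = size, i, j
        if best_len == 0:
            break  # nothing overlaps any more; leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical reads sampled from the made-up sequence ATGGCGTACGTTA.
reads = ["ATGGCGT", "GCGTACG", "TACGTTA"]
print(greedy_assemble(reads))
```

Real genomes contain repeats and sequencing errors, so practical assemblers rely on more robust graph-based methods; the quadratic pairwise comparison here also hints at why assembly becomes a computing problem as read counts grow.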

(12)

Dizzy with data

• After 2005

– Sequence generation

– Ability to handle the data

• “Next-generation” machines

– Cheaper

– Faster

• Computer

– Memory

– Processing

(13)

Dizzy with data

• Genome projects

– More

• Third-generation machines

– Smaller

(22)

Storage Issues

(23)

Cost vs. Data

3.2 billion base pairs × 1,000 × 10,000 = USD 32,000,000

USD 3,200

(24)

Problems Facing Bioinformatics

• Data storage

• Data transfer

(25)

Data Storage

• The bioinformatics field tends to archive all raw sequence data.

More than 90 GB

(26)

Data Transfer

• Want to analyze a genome?

More than 594 GB
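
To see why moving a data set of this size is a bottleneck, a rough transfer-time estimate helps. The sketch below takes the 594 GB figure from the slide; the link speeds are hypothetical, and the calculation assumes decimal units (1 GB = 8,000 megabits) and an ideal link with no protocol overhead.

```python
def transfer_hours(size_gb: float, link_mbit_per_s: float) -> float:
    """Hours needed to move `size_gb` gigabytes over a link of the given speed."""
    megabits = size_gb * 8_000  # 1 GB = 8,000 megabits (decimal units)
    seconds = megabits / link_mbit_per_s
    return seconds / 3_600


# Hypothetical link speeds in Mbit/s; only the 594 GB figure is from the slide.
for speed in (10, 100, 1_000):
    print(f"{speed:>5} Mbit/s: {transfer_hours(594, speed):.1f} hours")
```

Under these assumptions a 100 Mbit/s link needs a little over 13 hours for a single data set, and real-world throughput is usually lower than the nominal link speed.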

(27)

Solving the problem (storage)

• Discard the original image files, and only keep the sequence data.

• If necessary, just re-sequence the sample.

(28)

Solving the problem (storage)

• Put the data in an off-site facility.

$0.095 per GB-month of data stored (Singapore)

$0.100 per GB-month of data stored (Tokyo)

$0.500 - $1.000 per GB of data stored
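
As a rough worked example, the per-GB-month prices above can be turned into a monthly bill. The sketch below uses only the two regional prices quoted on this slide and the 594 GB data-set size mentioned on the Data Transfer slide; the function and dictionary names are illustrative, and charges for requests or outbound traffic are ignored.

```python
# Per-GB-month prices quoted on the slide (US dollars).
PRICE_PER_GB_MONTH = {"Singapore": 0.095, "Tokyo": 0.100}


def monthly_cost(size_gb: float, region: str) -> float:
    """Monthly storage cost in US dollars for `size_gb` gigabytes."""
    return size_gb * PRICE_PER_GB_MONTH[region]


for region in PRICE_PER_GB_MONTH:
    # 594 GB is the data-set size mentioned on the Data Transfer slide.
    print(f"{region}: ${monthly_cost(594, region):.2f} per month")
```

At these prices, storing one such data set costs on the order of tens of dollars per month, so the cost concern comes from the number of data sets rather than from any single one.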

(29)

Solving the problem (transfer)

• Put one copy of the data in a common cloud that everyone uses.

• Encouraged by the genomics community

– NCBI

• has put a copy of the data from the pilot project of the 1000 Genomes effort into off-site storage.

– Ensembl, the EBI sequence database

• its data are automatically funneled into a cloud environment as part of a test of the strategy.

(30)

Worries about security

• Data involving the health of human subjects, which is being linked more and more to genome information

• The Health Information Protection Regulations came into force on July 22, 2005.

The Health Information Protection Act is designed to improve the privacy of people’s health information while ensuring adequate sharing of information is possible to provide health services.

(31)

Going To the Cloud

• The National Human Genome Research Institute (NHGRI) hosted several meetings on cloud computing and on informatics and analysis in 2010.

• “One thing that is clear is that as computation becomes more and more necessary throughout biomedical research, the way these [infrastructure] resources are funded will have to change to be more efficient,” says James Taylor, a bioinformaticist at Emory University.

(32)

Exponential Growth of Data

(33)

• The primary goal of bioinformatics is to increase the understanding of biological processes

• But “We live in the post-genomic era, when DNA sequence data is growing exponentially”

Miami University (Ohio) computational biologist Iddo Friedberg
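
What "growing exponentially" implies can be sketched with a doubling-time model. The slides give no growth rate, so the starting size and the 12-month doubling period below are hypothetical placeholders, not measured values.

```python
def projected_size(initial: float, doubling_months: float, months: float) -> float:
    """Size after `months` if the data doubles every `doubling_months` months."""
    return initial * 2 ** (months / doubling_months)


# Hypothetical example: 1 TB today, doubling every 12 months.
for years in (1, 3, 5, 10):
    size_tb = projected_size(1.0, 12, 12 * years)
    print(f"after {years:>2} years: {size_tb:,.0f} TB")
```

The point of the model is that a fixed doubling period multiplies storage needs roughly a thousandfold per ten doublings, whatever the actual period turns out to be.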

(34)

NCBI Data Growth

(35)

EMBL Data Growth

(36)

Major Areas of Research

• Sequence analysis

• Genome annotation

• Analysis of gene expression

• Analysis of protein expression

• Analysis of mutations in cancer

• Protein structure prediction

• Comparative genomics

• Modeling biological systems

• High-throughput image analysis

• Protein-protein docking

(37)

• Sequence analysis

– most primitive operation in computational biology

• Genome annotation

– the process of marking the genes and other biological features in a DNA sequence

• Analysis of gene expression

– The expression of many genes can be determined by measuring mRNA levels
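
As a concrete instance of the sequence-analysis primitives listed above, the sketch below computes two elementary quantities on a DNA string: GC content and the reverse complement. The function names and the example sequence are made up for illustration, and the code assumes input restricted to the letters A, C, G, and T.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


example = "ATGCGC"  # hypothetical sequence
print(gc_content(example))
print(reverse_complement(example))
```

Both operations are linear in the sequence length, which is why even such simple steps become expensive when applied to billions of base pairs.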

(38)

• Analysis of protein expression

– Gene expression is measured in many ways including mRNA and protein expression

• Analysis of mutations in cancer

– to identify previously unknown point mutations in a variety of genes in cancer

• Protein structure prediction

– important for drug design and the design of novel enzymes

(39)

• Comparative genomics

– the study of the relationship of genome structure and function across different biological species

• Modeling biological systems

– a significant task of systems biology and mathematical biology

(40)

• High-throughput image analysis

– Computational technologies are used to accelerate or fully automate the processing, quantification and analysis of large amounts of image data

• Protein-protein docking

– predict possible protein-protein interactions based on 3D shapes

(41)

Obstacles in Computing Technology

(42)

Two Ways to Achieve Higher Computing Ability

• One Computer Computing Ability

• Cloud Computing

(43)

One Computer Computing Ability

• TSMC 20 nm manufacturing process

• No direct correlation between bus-observed data and internal CPU activity

• Multi-core processor: record and replay (R&R) system

Intel Corporation: Virtues and Obstacles of Hardware-assisted Multi-processor Execution Replay (2010)

(44)

Cloud Computing

• Availability of a Service

• Data Lock-in

• Data Confidentiality and Auditability

• Data Transfer Bottlenecks

• Performance Unpredictability

• Scaling Quickly

“10 Obstacles To Cloud Computing” By UC Berkeley & How GoGrid Hurdles Them

(45)

Cloud Computing

(46)

Conclusion

• Development takes time, effort and money.

• Computers are still developing fast, but not as fast as biological data is growing.

(47)

Thanks for your attention!


• Instead of moving the data back and forth between the cloud and the client for computation, we should compute directly on the cloud (a public system) to save data transfer time.