Will Computers Crash Genomics?


(1)

Will Computers Crash Genomics?

-Elizabeth Pennisi

Science 11 February 2011: 666-668.

Group members: 吳宜瑾, 何宜靜, 林芳伃, 魏裕明, 范剛瑋, 陳柏融

2012/06/04

(2)

Outline

• Introduction

• Sequencing

• Storage

• Cloud Computing

• Application

• Conclusion

(3)

Introduction

(4)

Structure of Old Genome Informatics

Lincoln D Stein, Genome Biology 2010, 11:207 (5 May 2010)

(5)

Sequencing vs. Storage

Lincoln D Stein, Genome Biology 2010, 11:207 (5 May 2010)

(6)

Structure of New Genome Informatics

Lincoln D Stein, Genome Biology 2010, 11:207 (5 May 2010)

(7)

Sequencing

(8)

First Generation

• Sanger sequencing

Ref: Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol 2008;26(10):1135–1145.

(9)

Second-generation DNA sequencing

• Cyclic-array method

Ref: Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol 2008;26(10):1135–1145.

(10)

Third-generation

• Pacific Biosciences

(11)

Ref: 李思元, 莊以光. The evolution and development of DNA sequencing technology (DNA 定序技術之演進與發展).

(12)

Ref: Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol 2008;26(10):1135–1145.

(13)

Cost and Growth of Bases

• The decline in sequencing costs (red line) has led to a surge in stored DNA data.

(14)

New generation

[Figure: sequencing output per run (Mbp/run, axis 0 to 500) by year, 2006 to 2010. Labeled platforms: Roche 454 GS20 (pyrosequencing), GS-FLX, Illumina GAII and Applied Biosystems (AB) SOLiD systems (5-fold data output improvement), Illumina GAII (20 Gbp).]

• Rough estimate: 5 months/per

• Mbp: million base pairs

(15)

Growth of GenBank

[Figure: growth of GenBank over time. Annotations: project start at NIH; Human Genome Project; microarray, SAGE; protein 3-D structure; single nucleotide polymorphism; high-resolution images.]

(16)

Moore’s law

• Moore’s law: the number of transistors that can be placed inexpensively on an integrated circuit doubles approximately every two years. The period is often quoted as "18 months."
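The gap between the two quoted doubling periods compounds quickly. The sketch below works out the growth factor for each; the function name and the 10-year horizon are illustrative assumptions, not figures from the slides.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplicative growth after `years` when a quantity doubles every period."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    horizon_years = 10  # assumed horizon, chosen only for illustration
    for label, period in (("two years", 2.0), ("18 months", 1.5)):
        factor = growth_factor(horizon_years, period)
        print(f"doubling every {label}: x{factor:.1f} after {horizon_years} years")
```

The same function can be reused with any other doubling period, for example one estimated for sequencing output, to compare growth rates on equal terms.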

(17)

Moore’s law

• Moore’s law: the number of transistors that can be placed inexpensively on an integrated circuit doubles approximately every two years. The period is often quoted as "18 months."

Photograph: Courtesy of Seagate Technology

Year    Drive                  Capacity    Notes
1956    RAMAC 305              5 MB        $10,000/megabyte
1979    Piccolo                64 MB       six 8-inch platters
1997    Deskstar 16GP Titan    16.8 GB     3.5-inch platters
2006    Barracuda 7200.10      750 GB      3.5-inch platters

(18)

Cloud computing

(19)

James Taylor

Emory University in Atlanta

Anton Nekrutenko

Penn State, University Park

The goal was “to make collaborations between experimental and computational researchers easier and more efficient.”

(20)

Genomic tools and Database

• Galaxy is a software package that can be downloaded to a personal computer or accessed on Penn State’s computers via any Internet-connected machine.

• The public portal for Galaxy works well, but, as a shared resource, it can get bogged down, says Taylor.

(21)

Cloud computing

• It involves renting off-site computing memory to store data and running one’s own software on another facility’s computers.

• Amazon Web Services and Microsoft are among the heavyweights running cloud-computing facilities.

• Open Cloud Consortium

(22)

Galaxy on Cloud

• Virtual computer

• Worked with Penn State colleague Kateryna Makova, who wanted to look at how the genomes of mitochondria vary from cell to cell in an individual.

• Generating 1.8 gigabases of DNA sequence, about 1/10 of the human genome scale.

(23)

Cloud computing

• Biology of Genomes meeting in Cold Spring Harbor, New York.

• Upload and analyze data on the cloud: a cost-effective solution.

• Michael Schatz, CSHL.

• Ben Langmead, a computer scientist at the Johns Hopkins Bloomberg School of Public Health in Baltimore, Maryland.

(24)

• Identify common sites of DNA variation known as single-nucleotide polymorphisms (SNPs)

• Myrna: a program that determines the differential expression of genes from RNA sequence data and is designed for parallel computing.
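As a toy illustration of what "identifying sites of variation" means, the sketch below compares sequences that are assumed to be already aligned and of equal length against a reference, and reports the positions where they differ. The function name and the example sequences are hypothetical; real SNP pipelines start from short reads and use alignment and statistical calling, which this sketch does not attempt.

```python
def candidate_snp_sites(reference: str, samples: list[str]) -> dict[int, set[str]]:
    """Map each 0-based position to the set of non-reference bases seen there.

    Assumes every sample is pre-aligned to the reference (same length, no gaps).
    """
    sites: dict[int, set[str]] = {}
    for sample in samples:
        if len(sample) != len(reference):
            raise ValueError("sequences must be pre-aligned to the same length")
        for pos, (ref_base, base) in enumerate(zip(reference, sample)):
            if base != ref_base:
                sites.setdefault(pos, set()).add(base)
    return sites


if __name__ == "__main__":
    reference = "ACGTACGT"  # made-up sequences for illustration only
    samples = ["ACGTACGT", "ACGAACGT", "ACGAACCT"]
    for pos, bases in sorted(candidate_snp_sites(reference, samples).items()):
        print(pos, reference[pos], "->", sorted(bases))
```

Each sample is processed independently of the others, which is what makes this kind of comparison easy to spread across many machines.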

(25)

Cloud computing

• “Cloud computing may represent the democratization of computation,” says Schatz.

• But cloud computing is not mature.

– Sneakernet (limited speed)

– Connections among the cloud processors can be fairly slow

(26)

Application

(27)

Commercial platform: Amazon Web Services

(28)

Commercial platform: Amazon Web Services

• 1000 Genomes Project: detailed human genome dataset

• Ensembl: genome sets for human and 50 other species

• GenBank: NIH genetic sequence database

(29)

Amazon Web Services: cost calculator

(30)

Academic Cloud Platform

• Open Cloud Consortium

– Group of American universities and industrial companies (IBM, Google)

– US National Science Foundation

• Academic clouds will be a better long-term solution

– High data read and write speeds

(31)

Galaxy

• Pennsylvania State University

• Web-based genome analysis tool

– Accessible: no need for programming experience.

– Reproducible: Galaxy records information so that any user can repeat the complete analysis

– Transparent: allows users to share and publish genome analyses

(32)

Galaxy Wiki

http://wiki.g2.bx.psu.edu/

(33)

Downside of moving to cloud

• Restricted-access databases: need to be encrypted on public clouds.

• Network bandwidth:

Institution type                                     Transfer rate            Time to upload data
Typical research institution                         5-10 megabytes/second    About a week
Major universities or large research institutions    1.25 gigabytes/second    Under a day
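The upload times follow from dividing dataset size by transfer rate. The sketch below does that arithmetic; the 5-terabyte dataset size is an assumption chosen for illustration (the slide does not state one), and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400
MB = 10**6   # bytes in a megabyte (decimal units)
GB = 10**9
TB = 10**12


def upload_days(dataset_bytes: float, rate_bytes_per_second: float) -> float:
    """Days needed to move a dataset at a sustained transfer rate."""
    return dataset_bytes / rate_bytes_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    dataset = 5 * TB  # assumed dataset size, not taken from the slides
    for label, rate in (
        ("5 MB/s", 5 * MB),
        ("10 MB/s", 10 * MB),
        ("1.25 GB/s", 1.25 * GB),
    ):
        print(f"{label}: {upload_days(dataset, rate):.2f} days")
```

Sustained real-world rates are usually lower than the nominal link speed, so these figures should be read as lower bounds on upload time.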

(34)

Conclusion

(35)

Data

• Sequencing technology advances

• Data costs decrease

• “Data tsunami”

(36)

Hardware

• Computer memory and processing

• Database

• Easier and more efficient

(37)

Cloud Computing

• Cloud computing: work is divided into many separate tasks handled by multiple processors

– Galaxy software
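The idea of dividing work into separate tasks can be sketched on a single machine. In the example below a sequence is cut into chunks, each chunk is counted independently, and the partial results are combined. This is a minimal sketch, not how Galaxy or any cloud service is implemented: the function names are hypothetical, GC content is just a convenient stand-in workload, and a thread pool stands in for the separate processors that a cloud would provide.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_count(chunk: str) -> int:
    """Count G and C bases in one chunk (one independent task)."""
    return sum(1 for base in chunk if base in "GC")


def split(sequence: str, n_tasks: int) -> list[str]:
    """Cut the sequence into at most n_tasks contiguous chunks."""
    size = max(1, -(-len(sequence) // n_tasks))  # ceiling division
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def parallel_gc_fraction(sequence: str, n_tasks: int = 4) -> float:
    """Compute the GC fraction by combining per-chunk counts."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    chunks = split(sequence, n_tasks)
    # Threads stand in for separate processors in this single-machine sketch.
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        counts = list(pool.map(gc_count, chunks))
    return sum(counts) / len(sequence)


if __name__ == "__main__":
    print(parallel_gc_fraction("GGCCAATT" * 100))
```

Because each chunk is processed without reference to the others, the same split-then-combine structure carries over when the workers are separate machines rather than threads.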

(38)

Disadvantage

• Cloud computing

– hot and sexy, but it’s not the answer to everything

• Issue

– data storage and transfer

• Others

– internet security

(39)

Issue of Cloud Computing

• Storage costs are dropping much more slowly than the costs of generating sequence data.

-> Spending on data storage grows exponentially

• Raw data from next-generation machines (high-resolution images) should be replaced in storage by processed sequence data.

-> A more efficient and economical way to store data.

(40)

Issue of Cloud Computing

• Putting the data in an off-site facility could relieve some of the pressure, and it is more economical than keeping it on a local system.

For example:

Amazon Web Services at 14 cents per GB per month

Local system at 50-100 cents per GB per month
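To see what these per-gigabyte prices mean for a whole dataset, the sketch below multiplies them out. The prices are the ones quoted on the slide; the 1,000 GB dataset size and the function name are assumptions made for illustration.

```python
def monthly_cost_dollars(size_gb: float, cents_per_gb_month: float) -> float:
    """Monthly storage cost in dollars for a dataset of the given size."""
    return size_gb * cents_per_gb_month / 100


if __name__ == "__main__":
    size_gb = 1000  # assumed dataset size, not taken from the slides
    cloud = monthly_cost_dollars(size_gb, 14)
    local_low = monthly_cost_dollars(size_gb, 50)
    local_high = monthly_cost_dollars(size_gb, 100)
    print(f"cloud: ${cloud:.2f} per month")
    print(f"local: ${local_low:.2f} to ${local_high:.2f} per month")
```

This covers storage only; data transfer and compute charges are billed separately and are not modeled here.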

(41)

Issue of Cloud Computing

• Instead of uploading and downloading the data between cloud and client for computing, we should compute directly on the cloud (public system) to save data transfer time.

• Safety of genome information on the cloud.

(42)

Advantage

• Cloud computing

– Open source

– Portable

– Convenient

– Low cost

• Contributes to society
