A Fault-tolerant Method for HLA Typing with PacBio Data

Academic year: 2022

(1)

A Fault-tolerant Method for HLA Typing with PacBio Data

Speaker: Chia-Jung Chang

Advisors: Dr. Pei-Lung Chen and Prof. Kun-Mao Chao

(2)

Outline

Introduction

Simulation

Methods

Experiments

Discussion

Conclusion

(3)

Introduction

HLA genes

PacBio Sequencing Technology

HLA genotyping

(4)

Classical HLA Genes

Erlich et al., Immunity (2001)

Mackay et al., N Engl J Med (2000)

(5)

HLA Database

HLA Class I

Gene A B C E F G  

Alleles 2,579 3,285 2,133 15 22 50  

Proteins 1,833 2,459 1,507 6 4 16  

Nulls 121 109 63 0 0 2  

HLA Class II

Gene DRA DRB DQA1 DQB1 DPA1 DPB1 DMA DMB DOA DOB

Alleles 7 1,512 51 509 37 248 7 13 12 13

Proteins 2 1,118 32 337 19 205 4 7 3 5

Nulls 0 33 1 13 0 6 0 0 1 0

(6)

Regions of interest

Exons 2,3:

HLA-A, -B, -C

Exon 2

HLA-DRB1, -DQB1, -DPB1

Others

(7)

A Glimpse

(8)

Comparison of NGS Technologies

From the University of Pennsylvania and The Children’s Hospital of Philadelphia

(9)

PacBio SMRT Sequencing

Developed by Pacific Biosciences

Single Molecule Real Time sequencing

(10)

PacBio SMRT Sequencing

(11)

Time for PacBio

(12)

Read Length

(13)

PacBio - Error Rate

(14)

PacBio - Error Profile

(15)

Sequencing Protocols

(16)

Two Types of Reads

From PacBio Technical Note

(17)

Targeted Sequencing

Sequencing specific areas of interest

vs. whole genome sequencing

Benefits

Compound Mutations and Haplotype Phasing

Repeat Expansions

Full-Length Transcripts and Splice Variants

Minor Variants and Quasispecies

SNP Detection and Validation

(18)

Barcode Technology

48 pairs of 16bp barcodes attached to targets

e.g. 48 samples can be sequenced in parallel

Barcode 5' Barcode 3'

Primer Primer

(19)

HLA Genotyping

HLA matching before organ transplantation

Serological (antibody based) approaches

Resolution is not enough

DNA-based

Sanger as the gold standard

NGS

Illumina

Roche 454

Ion Torrent

PacBio

(20)

Why Not and Why PacBio?

Why not PacBio?

High error rate

Sample identification error when multiplexing

Why PacBio?

Long enough to sequence exon 2 and exon 3 of class I HLA genes at the same time, which can solve the ambiguous allele combination problem
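
A toy sketch of the ambiguity, assuming two hypothetical variant labels per exon (X/Y in exon 2, P/Q in exon 3; the labels are illustrative and do not come from the HLA database). When each exon is called separately, two different allele pairs explain the same calls. A read that spans both exons observes the phase directly.

```python
from itertools import combinations_with_replacement, product


def consistent_pairs(exon2_calls, exon3_calls):
    """List every unordered haplotype pair that explains unphased per-exon calls.

    A haplotype is a tuple (exon 2 variant, exon 3 variant).
    """
    haplotypes = sorted(product(exon2_calls, exon3_calls))
    pairs = []
    for h1, h2 in combinations_with_replacement(haplotypes, 2):
        same_exon2 = {h1[0], h2[0]} == set(exon2_calls)
        same_exon3 = {h1[1], h2[1]} == set(exon3_calls)
        if same_exon2 and same_exon3:
            pairs.append((h1, h2))
    return pairs


def phased_pair(spanning_reads):
    """Distinct haplotypes seen in reads that cover both exons (error-free toy reads)."""
    return tuple(sorted(set(spanning_reads)))


if __name__ == "__main__":
    # Short reads: exon 2 shows X and Y, exon 3 shows P and Q.
    for pair in consistent_pairs({"X", "Y"}, {"P", "Q"}):
        print("possible:", pair)
    # Long reads spanning both exons resolve which pairing is real.
    print("phased:", phased_pair([("X", "Q"), ("Y", "P"), ("X", "Q")]))
```

With separate exon calls, both pairings remain possible; the spanning reads select one of them.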

(21)

Why CCS instead of CLR?

Both are used to detect variants

CLR has more reads for consensus

How to identify samples?

Align barcode

CLR might lead to more barcode calling error
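
A minimal sketch of barcode calling by alignment, assuming the leading 16 bases of a read are compared with each known barcode by edit distance. The barcode sequences, the `max_dist` threshold, and the tie rule are assumptions for illustration and are not taken from the PacBio pipeline.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def call_barcode(read_prefix, barcodes, max_dist=3):
    """Return the name of the closest barcode, or None if the call is unsafe.

    A call is rejected when the best distance exceeds max_dist or when two
    barcodes tie for the best distance.
    """
    scored = sorted((edit_distance(read_prefix, seq), name)
                    for name, seq in barcodes.items())
    best_dist, best_name = scored[0]
    if best_dist > max_dist:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None
    return best_name


if __name__ == "__main__":
    # Hypothetical 16 bp barcodes, not real PacBio barcode sequences.
    barcodes = {"bc1": "ACGTACGTACGTACGT", "bc2": "TTGGCCAATTGGCCAA"}
    print(call_barcode("ACGTACGTACGAACGT", barcodes))  # one substitution from bc1
    print(call_barcode("GGGGGGGGGGGGGGGG", barcodes))  # too far from both
```

The more errors a read prefix carries, the more often it falls outside the threshold or lands nearer a wrong barcode, which is one way to picture why noisier reads produce more barcode calling errors.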

(22)

An illustration of the problem

(23)

An illustration of the problem

(24)

Simulation

The target sequence for each allele

The samples in a multiplexing sequencing experiment

The pool of the reads in an experiment

Noise reads

(25)

The Target Sequence

• HLA database only contains CDS sequences for most of the alleles

(26)

Three HLA Loci and Their Corresponding Reference Alleles

Locus A B DRB1

reference A*01:01:01:01 B*07:02:01 DRB1*01:01:01

start 380 400 5400

length 1100 950 600

#unique alleles 2335 3075 1388

(27)

Samples in an Experiment

Type 1 Type 2 Type 3

#samples 12 24 48

#reads/allele 40 20 10

Alleles of a sample

Taiwan Minnan population

http://www.allelefrequencies.net

30% of homozygous samples
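
A minimal sketch of how one sample's genotype could be drawn, assuming alleles are sampled in proportion to population frequency and 30% of samples are forced to be homozygous. The function name and the frequency values are hypothetical placeholders, not figures from allelefrequencies.net.

```python
import random


def simulate_sample(allele_freqs, homozygous_rate=0.3, rng=None):
    """Draw one genotype as a sorted pair of allele names.

    allele_freqs maps allele name to relative frequency (need not sum to 1).
    With probability homozygous_rate both alleles are the same; otherwise the
    second allele is drawn from the remaining alleles.
    """
    rng = rng or random
    names = list(allele_freqs)
    weights = [allele_freqs[n] for n in names]
    first = rng.choices(names, weights=weights, k=1)[0]
    if rng.random() < homozygous_rate:
        return (first, first)
    others = [n for n in names if n != first]
    other_weights = [allele_freqs[n] for n in others]
    second = rng.choices(others, weights=other_weights, k=1)[0]
    return tuple(sorted((first, second)))


if __name__ == "__main__":
    # Placeholder frequencies for illustration only.
    freqs = {"A*11:01": 0.35, "A*24:02": 0.25, "A*02:01": 0.25, "A*33:03": 0.15}
    rng = random.Random(2022)
    pool = [simulate_sample(freqs, rng=rng) for _ in range(12)]  # a Type 1 experiment
    for genotype in pool:
        print(genotype)
```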

(28)

The Pool of Reads

Produced by PBSIM

Ono, Y., Asai, K., Hamada, M.: PBSIM: PacBio reads simulator–toward accurate genome assembly. Bioinformatics 29(1) (January 2013) 119–121

CCS reads

length-mean=450

length-sd=170

accuracy-mean=0.98

accuracy-sd=0.02

(29)

Simulation of Correct Reads and Noise Reads

(30)

Pre-processing

(31)

Bayes’ Theorem (BayesTyping0)

Denote the reads as r1 ... rn and a pair of alleles as ai, aj.

(32)

Bayes’ Theorem (cont’d)

(33)

Bayes’ Theorem (cont’d)
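
One common way to apply Bayes' theorem to this setting, stated as an assumption rather than as the author's exact formulation: P(ai, aj | r1 ... rn) is proportional to P(ai, aj) times the product over k of P(rk | ai, aj), where P(rk | ai, aj) = 1/2 P(rk | ai) + 1/2 P(rk | aj). The sketch below scores every allele pair this way. The substitution-only error model, the 2% error rate, the flat prior, and the assumption that reads are already aligned to the start of the allele are all simplifications introduced here.

```python
import math
from itertools import combinations_with_replacement


def read_log_likelihood(read, allele, error_rate=0.02):
    """log P(read | allele) under a hypothetical substitution-only error model."""
    mismatches = sum(1 for r, a in zip(read, allele) if r != a)
    matches = min(len(read), len(allele)) - mismatches
    return (matches * math.log(1 - error_rate)
            + mismatches * math.log(error_rate / 3))


def pair_log_posterior(reads, seq_i, seq_j, log_prior=0.0):
    """Unnormalised log posterior of an allele pair given all reads."""
    total = log_prior
    for read in reads:
        li = read_log_likelihood(read, seq_i)
        lj = read_log_likelihood(read, seq_j)
        top = max(li, lj)
        # log(0.5 * exp(li) + 0.5 * exp(lj)), computed without underflow
        total += top + math.log(0.5 * math.exp(li - top) + 0.5 * math.exp(lj - top))
    return total


def best_pair(reads, alleles):
    """Return the allele-name pair with the highest score (flat prior)."""
    names = sorted(alleles)
    return max(combinations_with_replacement(names, 2),
               key=lambda p: pair_log_posterior(reads, alleles[p[0]], alleles[p[1]]))


if __name__ == "__main__":
    # Toy 10 bp "alleles"; real targets are hundreds of bases long.
    alleles = {"a1": "ACGTACGTAC", "a2": "ACGTTCGTAC", "a3": "ACCTACGTAC"}
    reads = ["ACGTACGTAC"] * 4 + ["ACGTACGTAA"] + ["ACGTTCGTAC"] * 5
    print(best_pair(reads, alleles))
```

Because each read is averaged over the two alleles of a pair, a homozygous pair scores higher than a heterozygous one when all reads come from a single allele.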

(34)

To Tolerate Noise Reads (BayesTyping1)

Assume there are m noise reads
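
One plausible reading of this assumption, offered as a sketch and not as the author's formulation: for each candidate allele pair, the m reads that the pair explains worst are treated as noise and left out of the score. The error model, the 2% error rate, and the function names below are hypothetical; m must be smaller than the number of reads.

```python
import math
from itertools import combinations_with_replacement


def read_loglik(read, allele, error_rate=0.02):
    """log P(read | allele) under a hypothetical substitution-only error model."""
    mismatches = sum(r != a for r, a in zip(read, allele))
    matches = min(len(read), len(allele)) - mismatches
    return (matches * math.log(1 - error_rate)
            + mismatches * math.log(error_rate / 3))


def trimmed_pair_score(reads, seq_i, seq_j, m):
    """Sum of per-read log-likelihoods after dropping the m worst-fitting reads."""
    per_read = []
    for read in reads:
        li, lj = read_loglik(read, seq_i), read_loglik(read, seq_j)
        top = max(li, lj)
        per_read.append(top + math.log(0.5 * math.exp(li - top)
                                       + 0.5 * math.exp(lj - top)))
    per_read.sort()
    return sum(per_read[m:])


def best_pair_with_noise(reads, alleles, m):
    """Return the best-scoring allele-name pair when m reads may be noise."""
    names = sorted(alleles)
    return max(combinations_with_replacement(names, 2),
               key=lambda p: trimmed_pair_score(reads, alleles[p[0]], alleles[p[1]], m))


if __name__ == "__main__":
    alleles = {"a1": "ACGTACGTAC", "a2": "ACGTTCGTAC", "a3": "ACCTACGTAC"}
    # Six reads from a homozygous a1 sample plus two reads leaked from another sample.
    reads = ["ACGTACGTAC"] * 6 + ["ACCTACGTAC"] * 2
    print(best_pair_with_noise(reads, alleles, m=0))
    print(best_pair_with_noise(reads, alleles, m=2))
```

In this toy pool the two leaked reads pull the m = 0 call toward a false heterozygous pair, while m = 2 recovers the homozygous genotype.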

(35)

Experiments

For Type 1 experiments (40 reads/allele), when typing HLA-A, NGSengine correctly predicted only 274 pairs of alleles (22.83%).

In contrast, BayesTyping0 correctly predicted 1193 pairs of alleles (99.42%).

Type 1 Type 2 Type 3

#samples 12 24 48

#reads/allele 40 20 10

(36)

Experiments without noise reads

A B DRB1

Type 1 99.92% 99.92% 100%

Type 2 99.50% 99.21% 100%

Type 3 97.63% 96.87% 99.98%

(37)

HLA-A

(38)

HLA-B

(39)

HLA-DRB1

(40)

Type 2 HLA with Different m

(41)

Noise Reads from Pools Containing Different Numbers of Samples

(42)

Homozygous and Heterozygous Samples

• Fisher’s exact test
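
A minimal sketch of a two-sided Fisher's exact test on a 2×2 table, assuming the comparison is between correctly and incorrectly typed samples in the homozygous and heterozygous groups. The counts in the example are hypothetical placeholders, not results from the experiments.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    # Rows: homozygous, heterozygous. Columns: typed correctly, typed incorrectly.
    # Placeholder counts for illustration only.
    p = fisher_exact_two_sided(350, 10, 820, 20)
    print(f"p-value = {p:.4f}")
```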

(43)

Conclusion

BayesTyping1 can tolerate, to some degree, sequencing errors introduced by the PacBio sequencing technology and noise reads introduced by false barcode identifications.

It is better to multiplex 12 or 24 samples instead of 48 samples to maintain high accuracy.

(44)

Thanks for your attention!

Q & A
