Next-generation sequencing technology

2. Background

2.1 Next-generation sequencing technology

Next-generation sequencing (NGS, also called deep sequencing) is a sequencing approach that produces millions of sequencing reads in a single run. In the general workflow, samples are prepared by ligating specific adaptor oligonucleotides to the 5’ and 3’ ends of DNA fragments and are then loaded onto the next-generation sequencer. Read length ranges from 25 to 1000 nt depending on the sequencing platform, and the number of reads per run also differs between sequencers. NGS differs from Sanger sequencing in the amount of sequence produced per run (NGS: 100 MB-30 GB; Sanger: 0.1 MB), the read length (NGS: 25-1000 nt; Sanger: 700-900 nt) and the cost per GB (NGS: $2K-84K; Sanger: >$2500K). Five next-generation sequencing platforms are currently available: the Roche 454 FLX sequencer, the Illumina Genome Analyzer, the Applied Biosystems SOLiD sequencer (ABI SOLiD), the HeliScope Single Molecule sequencer (Helicos tSMS) [80-82] and Pacific Biosciences [83].

Roche 454

Roche 454, commercially introduced in 2004, was the first next-generation DNA sequencer. It produces sequencing reads based on the principle of pyrosequencing. Figure S1 shows the workflow of the Roche 454 sequencer. The library is constructed by ligating 454-specific adaptors to DNA fragments, and the ligated fragments are then amplified by bead-based emulsion polymerase chain reaction (em-PCR). After em-PCR, the amplified beads are loaded into the picotiter plate (PTP), a solid surface whose individual wells each hold a single amplified bead together with enzyme beads. In the PTP, all amplified beads are sequenced by pyrosequencing reactions: nucleotides are flowed sequentially, and each incorporation is detected through the light generated on release of pyrophosphate. The Roche 454 sequencer can produce 700 Mb of sequence per run in 23 hours, with read lengths of 700-1000 nt [80-82].
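The flow-based readout described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, a repeating flow order of T, A, C, G and light signals already normalized so that a value near 1.0 corresponds to one incorporated nucleotide and a value near 2.0 to a homopolymer of two; the function name and the example signals are hypothetical and are not taken from the Roche 454 software.

```python
def call_bases_from_flowgram(flow_signals, flow_order="TACG"):
    """Convert one normalized light signal per nucleotide flow into a read.

    flow_signals: list of floats, one per flow (hypothetical example data).
    flow_order:   assumed repeating order in which nucleotides are flowed.
    """
    read = []
    for i, signal in enumerate(flow_signals):
        base = flow_order[i % len(flow_order)]
        # The light intensity grows with the number of identical bases
        # incorporated in this flow, so round to the nearest integer.
        n_incorporated = max(0, int(round(signal)))
        read.append(base * n_incorporated)
    return "".join(read)


# Eight flows (two rounds of T, A, C, G) with made-up intensities.
example_signals = [1.02, 0.05, 2.10, 0.00, 0.10, 0.97, 0.03, 1.10]
print(call_bases_from_flowgram(example_signals))  # prints "TCCAG"
```

Because homopolymer length is inferred from signal intensity rather than from separate cycles, small intensity errors in a sketch like this translate directly into inserted or deleted bases.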

Illumina Genome Analyzer

The Illumina Genome Analyzer, which is based on the concept of sequencing by synthesis (SBS), became available in 2006. Figure S2 shows its workflow. Before the sample is loaded into the flow cell, the DNA is fragmented and adaptors are ligated to the 5’ and 3’ ends. The ligated DNA fragments are amplified by bridge amplification, an isothermal process that amplifies each fragment into a cluster. So that every cluster is sequenced in the same direction, one strand of each amplified cluster is selectively removed. The flow cell is then transferred to the Genome Analyzer, where each single-stranded cluster is sequenced by SBS reactions (imaging, removing the fluorescent group and deblocking the 3’ end for the next cycle). Only a single base is identified in each chemistry cycle, so the read length is determined by the number of cycles of nucleotide incorporation, imaging and cleavage. The Illumina Genome Analyzer can produce 95 GB of 100-150 nt sequencing reads per run in 2 days [80-82].
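The relation between the number of chemistry cycles and the read length can be sketched as a toy simulation in Python. This is a simplified illustration, not the instrument's base-calling pipeline: the function name is hypothetical, the template is assumed to be given in the order in which the polymerase reads it, and imaging is reduced to recording the incorporated base.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template, n_cycles):
    """Simulate reading one cluster, one base per chemistry cycle.

    template: template bases in the order the polymerase encounters them.
    n_cycles: number of incorporation/imaging/cleavage cycles that are run.
    """
    read = []
    for cycle in range(min(n_cycles, len(template))):
        # Incorporation: a single blocked, labelled nucleotide that is
        # complementary to the current template base is added.
        incorporated = COMPLEMENT[template[cycle]]
        # Imaging: the fluorescent label identifies the incorporated base.
        read.append(incorporated)
        # Cleavage: the label is removed and the 3' end is deblocked,
        # which allows the next cycle (no extra state is modelled here).
    return "".join(read)


print(sequence_by_synthesis("ACGTACGT", 4))  # prints "TGCA"
```

The read never grows by more than one base per cycle, so the number of cycles run sets the read length, up to the length of the template.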

ABI SOLiD

The Applied Biosystems SOLiD sequencer (ABI SOLiD) was commercially released in October 2007. Like Roche 454, it uses em-PCR to amplify DNA fragments on beads. ABI SOLiD uses a unique sequencing process catalyzed by DNA ligase (Figure S3a). The ligation-based sequencing process begins with the annealing of a universal primer that is perfectly complementary to the 5’ end adaptor. Then, a set of 16 random 8-mer probes is added. The first and second nucleotides at the 3’ end of each probe are complementary to the template sequence, and the probe is labelled with one of four fluorescent dyes. After ligation, the fluorescent signal is imaged. Nucleotides 6-8 of the probe are then removed and the random probe set is added again, so that successive ligations interrogate every fifth dinucleotide. After several rounds of ligation, every dinucleotide has been imaged and the sequencing read can be determined (Figure S3b). This ligation-based approach is called 2-base encoding. The SOLiD sequencer can produce 10-20 GB of sequencing reads (25-75 nt) per run in 2-4.5 days [80-82].
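How four dyes can represent 16 dinucleotides may be easier to see in code. The Python sketch below assumes the commonly described colour-space convention in which the bases are numbered A=0, C=1, G=2, T=3 and the colour of a dinucleotide is the bitwise XOR of its two base numbers. This numbering and the function names are assumptions for illustration and are not given in the text above. Decoding needs one known starting base, which in practice comes from the adaptor.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_color_space(sequence):
    """Return one colour (0-3) per overlapping dinucleotide of `sequence`."""
    return [
        BASE_TO_CODE[first] ^ BASE_TO_CODE[second]
        for first, second in zip(sequence, sequence[1:])
    ]


def decode_color_space(first_base, colors):
    """Rebuild the base sequence from a known first base and the colours."""
    bases = [first_base]
    for color in colors:
        previous = BASE_TO_CODE[bases[-1]]
        # XOR is its own inverse: previous ^ color gives the next base code.
        bases.append(CODE_TO_BASE[previous ^ color])
    return "".join(bases)


colors = encode_color_space("ATGGC")
print(colors)                           # prints [3, 1, 0, 3]
print(decode_color_space("A", colors))  # prints "ATGGC"
```

Under this convention each base is covered by two adjacent colours, so a true single-base difference changes two consecutive colours, whereas an isolated colour change is more likely a measurement error.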

Helicos’ tSMS

Helicos’ tSMS sequencing platform became available in 2008. It is the first third-generation ("next-next-generation") sequencing platform. The tSMS technology differs from Roche 454, Illumina and SOLiD in that it does not amplify the templates before sequencing. Library preparation also differs from that of the other three sequencing technologies: it only requires adding a poly-A tail and a fluorescent label.

The tailed templates are then hybridized to poly-T oligonucleotides on the flow-cell surface and detected through their fluorescent label. The sequencing process of tSMS is called terminator SBS: the terminator nucleotides rely on steric hindrance to prevent the incorporation of more than one nucleotide per cycle. After the incorporated nucleotide has been identified, the fluorescent label is removed and the next fluorescent nucleotide is added, one per cycle. tSMS currently produces 21-28 GB of sequencing reads (25-50 nt) per run in 8-9 days [81].

Pacific Biosciences

Pacific Biosciences is the second third-generation sequencing platform (announced in 2010). Like Helicos’ tSMS, it requires no amplification before sequencing. Its sequencing method is called single-molecule real-time (SMRT) sequencing: DNA synthesis on single DNA molecules is observed directly, in real time, using zero-mode waveguide (ZMW) technology. The first commercial SMRT array contained ~75000 ZMWs, each holding a DNA polymerase loaded with a DNA sample. SMRT read lengths exceed 1000 nt (the maximum is more than 10000 nt), and a run takes only several hours [83].

Comparison of next-generation sequencing technologies

Table 2 compares the performance of the current next-generation sequencing platforms. Among the five platforms, tSMS and Pacific Biosciences are third-generation platforms that do not amplify the template. Their sequencing error rate is therefore lower than that of the other three platforms (the error rate of Pacific Biosciences could not be obtained from the official site, but it should be very low given the sequencing method). The reason is that, on amplification-based platforms, some copies of the template being sequenced fail to incorporate a nucleotide at the corresponding cycle. Because Pacific Biosciences produces reads longer than 1000 nt, it is more suitable than the other sequencing platforms for de novo assembly and SNP identification.

Table 2. Performance comparison of next-generation sequencing technologies

| | Roche 454 (FLX-Titanium) | Illumina Genome Analyzer (IIx) | ABI SOLiD | Helicos tSMS | Pacific Biosciences |
| --- | --- | --- | --- | --- | --- |
| Method of amplification | Bead-based/emulsion PCR | Bridge amplification | Bead-based/emulsion PCR | N/A | N/A |
| Sequencing chemistry | Pyrosequencing | Polymerase-based sequencing-by-synth
