PLOS ONE
Check for updates
fi OPEN ACCESS
Citation: Ignatov KB, Blagodatskikh KA, Shcherbo DS, Kramarova TV, Monakhova YA, Kramarov VM (2019) Fragmentation Through Polymerization (FTP): A new method to fragment DNA for next-generation sequencing. PLoS ONE 14(4): e0210374. https://doi.org/10.1371/iournal. pone.0210374
Editor: Ruslan Kalendář, University of Helsinki, FINLAND
Received: December 18,2018
Accepted: March 16,2019
Published: April 1,2019
Copyright: © 2019 Ignatov et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: The raw data generated in this study have been deposited in the National Centre for Biotechnology Information (NCBI) Sequence Read Archive under BioProject accession number PRJNA509202 (https://www. ncbi.nlm.nih.gov/sra/PRJNA509202).
Funding: The work was supported by the Ministry of Education and Science of Russian Federation, grant 14.579.21.0012 (ID # RFMEFI57914X0012) and by the grant 0112-2016-0006 from the
RESEARCH ARTICLE
Fragmentation Through Polymerization (FTP): A new method to fragment DNA for next-generation sequencing
Konstantin B. Ignatov1'2*, Konstantin A. Blagodatskikh3'4, Dmitry S. Shcherbo4, Tatiana V. Kramarova©5, Yulia A. Monakhova1'6, Vladimir M. Kramarov1'2
1 All-Russia Institute of Agricultural Biotechnology, Russian Academy of Sciences, Moscow, Russia,
2 Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia, 3 Evrogen JSC, Moscow, Russia, 4 Pirogov Russian National Research Medical University, Moscow, Russia, 5 The Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm, Sweden, 6 Syntol JSC, Moscow, Russia
* iqnatovkb@bk.ru
Abstract
Fragmentation of DNA is the very important first step in preparing nucleic acids for next-generation sequencing. Here we report a novel Fragmentation Through Polymerization (FTP) technique, which is a simple, robust, and low-cost enzymatic method of fragmentation. This method generates double-stranded DNA fragments that are suitable for direct use in NGS library construction and allows the elimination of the additional step of reparation of DNA ends.
Introduction
Next Generation Sequencing (NGS) has become one of the most widely used techniques in genomic research and genetic diagnostics. Fragmentation of DNA is the first main step in preparing a sequencing library for NGS. The well-known NGS technologies—like Illumina or Ion Torrent—generate a plethora of reads with lengths under 600-1000 bases. For library preparation, purified DNA samples are sheared into shorter fragments, then platform-specific adapters are ligated to the molecules to provid primer-binding sites for further amplification and sequencing. The high level of NGS resolution is achieved by multiple representations through different reads for every DNA region despite their sequence and context. In other words, the sequences of the fragments must overlap. Thus, the quality of NGS is largely dependent on the randomness of DNA fragmentation and the overlap of the resulting library fragments. This makes the fragmentation step critical in the process of library construction.
There are three typical approaches to shorten long DNA for library preparation: physical (by using acoustic sonication or by hydrodynamic shearing), enzymatic (based on the usage of endonucleases or transposase) and chemical shearing (by hydrolyzing DNA through heating it with divalent metal cations) [1, 2].
Acoustic shearing with Covaris ultrasonicators (Covaris, Woburn, MA, USA) is currently the gold standard for fragmentation at random nucleotide locations for an NGS library
PLOS ONE I https://doi.orq/10.1371/iournal.pone.0210374  April 1,2019
1/12
•$PLOS ONE
Fragmentation Through Polymerization (FTP): A new method to fragment DNA
Russian State Budget. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Syntol JSC provided support in the form of salary for Y.A. M. and reagents for NGS-analysis, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of this author are articulated in the 'author contributions' section. Evrogen JSC provided equipment for NGS-analysis and provided support in the form of salary for author K.A.B., but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the 'author contributions' section. Bioron GmbH provided enzymes and reagents (including SD DNA polymerase), but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: KA. Blagodatskikh is employed by Evrogen JSC. YA. Monakhova is employed by Syntol JSC. T.V. Kramarova is currently employed by Anocca AB. SD polymerase used in this project is the subject of a patent application (US20160145588, EP2981609). This does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.
construction; this process is very important for a high-quality NGS library sample preparation. Unfortunately, it can be financially inaccessible for many laboratories [3]. An additional disadvantage of acoustic shearing is that it can be a source of oxidative damage to DNA that may result in sequencing artifacts [4].
Enzymatic methods and acoustic shearing have similar levels of efficiency, but enzymatic methods do not need expensive equipment [2]. Commercially available Fragmentase (New England Biolabs, Ipswich MA, USA) and Nextera tagmentation (Illumina, San Diego, CA, USA) are the most popular enzymatic techniques. Nextera uses a transposase to simultaneously fragment and insert adapters into dsDNA [5]. Fragmentase contains two enzymes: one randomly nicks dsDNA and the other cuts the strand opposite to the nicks [2]. Enzymatic digestion is simple and very efficient, but it may introduce an enzymatic bias, such as insertions and deletions (indels) [2, 6]. These biases are associated with DNA sequence content and may produce a non-random fragmentation [6].
DNA fragments obtained by physical fragmentation or by the Fragmentase method require a repair of DNA ends for the ligation with adapters during subsequent NGS library construction [1, 2]. To improve the protocol for NGS library generation and reduce the end repair stage, we have developed a new enzymatic method for DNA fragmentation: Fragmentation Through Polymerization (FTP). Our FTP method is based on the use of two enzymes: a nonspecific endonuclease, which randomly nicks dsDNA (DNase I), and a thermostable DNA polymerase with strong strand-displacement activity (SD DNA polymerase) [7]. At the first stage of FTP, DNase I introduces nicks into the dsDNA, and at the second stage, SD DNA polymerase elongates the 3'-ends of the nicks in a strand-displacement manner. As a result, FTP generates multiple double-stranded DNA fragments with extended overlapping sequences at the ends (Fig 1). Additionally, the SD polymerase causes 3'-A-overhangs, which make the fragments suitable for direct ligation with T-tailed DNA adapters without a requiring DNA end repair.
A random fragmentation process is an important feature for high-quality NGS library sample preparation. It is known that DNA cleaving is not an entirely random process because cleaving/nicking enzymes—including DNase I—are sequence-dependent [8, 9], and physical methods for fragmentation are partly sequence-specific as well [10,11]. Like other enzymatic methods, FTP utilizes DNase I as a nicking enzyme. In contrast to other digesting techniques, the fragments obtained by FTP from a long DNA molecule have overlapping sequences at the ends (Fig 1) that may help to overcome the problem with sequence-dependent DNA-nicking by DNase I.
Here we describe the detailed FTP method of DNA fragmentation and compare it with the well-known and widely used Fragmentase technique (New England Biolabs). Systematic comparison of Fragmentase with other fragmentation methods has been described earlier [2].
Materials and methods
Enzymes and reagents
Lyophilized DNase I (deoxyribonuclease I from Bovine pancreas) was obtained from Sigma-Aldrich (St Louis, MO, USA) and dissolved in the storage buffer (50% glycerol, 100 mM NaCl, 0.2 mg/ml BSA, 1 mM EDTA, 0.2 mM DTT, 20 mM Tris-HCl, pH = 8.0) up to 1 mg/ml.
SD DNA polymerase (50 U/u.1) and the reaction buffer were supplied by Bioron GmbH, (Ludwigshafen, Germany). E.coli BL21(DE3) gDNA was supplied by Evrogen JSC (Moscow, Russia). dNTPs were obtained from Bioline Limited (London, UK).
NEBNext dsDNA Fragmentase and the NEBNext Ultra II DNA Library Prep kit were supplied by New England Biolabs, Inc. (Ipswich, MA, USA).
PLOS ONE I https://doi.orq/10.1371/iournal.pone.0210374  April 1,2019
2/12
•$PLOS ONE
Fragmentation Through Polymerization (FTP): A new method to fragment DNA
LongdsDNA 3/^^^^^^^^^^^^^^^^^^^^^^^^^™
^\   Nicking by DNAse I
Strand displacement DNA polymerization with SD polymerase
0
Disjointed dsDNA fragments with overlapping sequences
Fragmented dsDNA
Fig 1. A general overview of the dsDNA Fragmentation Through Polymerization (FTP) method. The FTP method is based on two enzymatic reactions: a DNA nicking reaction with DNase I and a strand-displacement DNA polymerization with SD DNA polymerase. As a result, multiple double-stranded DNA fragments with overlapping sequences are generated. De novo synthesized DNA is indicated in grey, and SD polymerase is indicated in red.
https://doi.orq/1Q.1371/iojrnal.pone.021Q374.qQQ1
dsDNA Fragmentation Through Polymerization (FTP)
For fragmentation, 200 ng of gDNA of the E. coli strain BL21(DE3) were added to the following reaction mixture: IX reaction buffer for SD polymerase (Bioron GmbH), 3.5 mM MgCl2, 0.25 mM dNTPs (each), DNase I 1 ng/ul, SD DNA polymerase 1.5 U/ul The total volume of the reaction was 25 ul The reaction mixture was completed at 4°C (wet ice). The fragmentation of gDNA was carried out by two-step incubation: 20 minutes at 30 °C and then 20 minutes at 70 °C. For incubation, we used a thermal cycler with a heated lid. The reaction was stopped by cooling down the mixture to 10°C. The mixture was diluted 1:1 with sterile water, and fragmented DNA was purified with SPRI beads.
DNA fragmentation with NEBNext dsDNA Fragmentase
gDNA of the E. coli strain BL21(DE3) was digested using NEBNext dsDNA Fragmentase (New England Biolabs, Inc.) according to the manufacturer's protocol. Briefly, 200 ng of gDNA were added to the following reaction mixture (total volume 25 ul):lX Fragmentase Reaction Buffer v2,10 mM MgCl, and IX dsDNA Fragmentase. The mixture was incubated at 37°C for 20 minutes. The digestion was stopped by adding EDTA up to 100 mM. The mixture was diluted 1:1 with sterile water and fragmented DNA was purified with SPRI beads.
Preparation of NGS libraries
We prepared four NGS libraries from four different samples of Fragmentase-digested gDNA and four NGS libraries from four different samples of FTP-digested gDNA. NGS libraries were
PLOS ONE I https://doi.orq/10.1371/iournal.pone.0210374  April 1,2019
3/12
•$PLOS ONE
Fragmentation Through Polymerization (FTP): A new method to fragment DNA
generated using NEBNext Ultra II DNA Library Prep kit (New England Biolabs, Inc.) according to the manufacturer's instructions. The conventional procedure for Fragmentase-digested DNA included repair of DNA ends with the NEBNext Ultra II End Prep Enzyme Mix, addition of adapters to the DNA fragments using NEBNext Ultra II Ligation Master Mix, and amplification of the adaptor-ligated DNA fragments with the NEBNext Ultra II Q5 Master Mix. The input amount of each DNA sample was 200 ng. The library indexing and amplification were performed for 5 PCR cycles as described in the kit's manual.
NGS libraries from FTP digested gDNA were constructed using the NEBNext Ultra II DNA Library Prep Kit procedure, excluding the DNA end repair stage. The input amount of each DNA sample was 200 ng. The library indexing and amplification were performed for 5 PCR cycles with the NEBNext Ultra II Q5 Master Mix.
After the amplification stage, all libraries were quantified with a Quant-iT PicoGreen dsDNA Assay Kit (Molecular Probes, Inc., Eugene, OR, USA) and with the Agilent 2200 TapeStation Instrument with a D1000 Tape System (Agilent Technologies, Waldbronn, Germany), pooled (500 ng of each), and purified with AMPure XP beads.
NGS and bioinformatic analysis
The pooled libraries were sequenced with the Illumina MiSeq Instrument (Illumina, California, USA) with a 300 Cycles MiSeq Sequencing Kit v2—paired-end mode—resulting in 12xl06 reads. Each of the reads was approximately 150 nt long. The FASTQ files generated on the instrument were uploaded to the NCBI SRArchive under project ID: PRJNA509202.
The FASTQ files were quality controlled using FASTQC vO.11.4 (Babraham bioinformatics, Cambridge, UK). PHRED scores were calculated with FASTQC vO.l 1.4. Adapters were trimmed with FLEXBAR v.2.5 [12]. Filtered reads with a minimum length of 30 bp were subsequently aligned to the E.coli BL21(DE3) genome (NCBI Reference Sequence: NC_012971.2) using BOWTIE2 software v2.3.4 [13]. For coverage uniformity evaluations, the Lorenz curves were built with the htSeqTools R package version 1.30.0 (https://rdrr.io/bioc/htSeqTools/) and the GC bias plots were obtained with the CollectGcBiasMetrics (Picard tools) software (https://software.broadinstitute.Org/gatk/documentation/tooldocs/4.0.l.0/picard analysis CollectGcBiasMetrics.php). Random samples of reads were generated using Seqtk software (https://github.com/lh3/seqtk). De novo assembly of contigs was carried out with the SPAdes tool v3.10.1 (http://cab.spbu.ru/software/spades/). Statistics were calculated using QUAST software v5 [14, 15] (http://quast.sourceforge.net/).
Results and discussion
Digestion of gDNA with the FTP method
We compared two enzymatic methods of dsDNA fragmentation for NGS library construction: digestion with Fragmentase from New England Biolabs and FTP. The FTP method consists of two enzymatic reactions: random DNA nicking and elongation in a strand-displacement manner of the 3' ends of the nicked DNA. As a result, multiple double-stranded DNA fragments with overlapping sequences at the ends are generated. The general overview of the FTP method is outlined in Fig 1.
We carried out FTP in a one-tube format as described above. Mesophilic DNase I and thermophilic SD DNA polymerase were added to the reaction mixture that contained the gDNA of the E. coli strain BL21(DE3). The reaction was incubated at 30°C for 20 minutes, plus an additional 20 minutes at 70 °C. DNase I has an optimum performance temperature between 30 °C and 40 °C. During the first stage of incubation at 30 °C, DNase I introduced nicks into the dsDNA. In order to optimally obtain average-sized fragments, we tested different DNase I
PLOS ONE I https://doi.orq/10.1371/iournal.pone.0210374  April 1,2019
4/12
•$PLOS ONE
Fragmentation Through Polymerization (FTP): A new method to fragment DNA
concentrations and incubation times (SI Fig). During the second stage, the DNase I was heat-inactivated and the SD polymerase was activated by increasing the reaction temperature to 70 °C. The SD polymerase is a Taq DNA polymerase mutant that has a strong 5'-3' strand displacement and 5'-3' polymerase activities [7]. It does not have 5'-3' and 3'-5' exonuclease activities. Unlike natural enzymes with strong strand displacement activity, such as Phi29 or Bst polymerase that are stable and active below 70 °C, SD polymerase is stable up to 93 °C and has its optimum level of enzymatic activity at 70-75 °C. Additionally, the enzyme does 3'-A-overhangs, which make the product of its polymerization suitable for ligation with T-tailed DNA adaptors. These properties of SD DNA polymerase make it very suitable for the FTP technique.
In summary, DNase I generated 3' ends by nicking dsDNA at 30 °C, followed by SD polymerase using these ends for strand displacement DNA polymerization at 70 °C, which resulted in disjointed dsDNA fragments (Fig 1). As the result, the A-tailed dsDNA fragments with overlapping sequences and with an average size of about 500 bp (in a range from 150 to 1500 bp) were obtained from the intact gDNA. Agarose-gel electrophoresis of gDNA fragmented by FTP is demonstrated in Fig 2. As seen in this figure, both DNase I and SD polymerase are required for the DNA fragmentation and complete separation of the fragments (Fig 2, lanes 4 and 5).
Fragmentase and other methods of fragmentation—with the exception of Illumina's Nex-tera tagmentation—generate DNA fragments by introducing nicks and counter nicks in DNA strands that disassociate at 8-12 nucleotides downstream or upstream from the nick site. Thus, the generated fragments need repair of DNA ends for the subsequent NGS library construction [1, 2]. Unlike in other methods, in FTP the DNA fragments are separated by strand-displacement DNA polymerization and not by counter nicks. SD polymerase also carries out A-tailing of the ends. As a result of FTP, double-stranded DNA fragments have ends that are suitable for direct NGS library construction and the additional step of DNA end repair is no longer necessary.
NGS library constructions from Fragmentase and FTP -digested gDNA
Two techniques—FTP and standard Fragmentase—were used to digest the gDNA of the E. coli strain BL21(DE3). The fragmented DNA samples were then used for the construction of NGS libraries with NEBNext Ultra II DNA Library Prep Kit from New England Biolabs. Four libraries were prepared from the DNA samples digested with Fragmentase by the standard protocol, which included the stage of DNA end repair.
Another four libraries were prepared using the same NEBNext kit, but the DNA samples for these libraries were generated with the FTP method without the stage of DNA end repair. It is worth noting that when the DNA fragments are obtained by physical fragmentation or from the Fragmentase method, the repair of the DNA ends is necessary for the library's construction [1, 2]. The FTP method does not require this step; therefore, the procedure of NGS library preparation is simpler. As mentioned above, FTP generates A-tailed DNA fragments which are suitable for direct ligation with T-tailed adaptors. As a result, the preparatory time for NGS library creation has decreased by 70 minutes—from 180 minutes (the preparation with the end repair stage) to 110 minutes (without the stage of end repair).
The DNA amount in each library was quantified with the Quant-iT PicoGreen dsDNA Assay Kit and with the Agilent 2200 TapeStation. All NGS libraries generated with both the Fragmentase and the FTP method contained similar amounts of ds DNA (800 ± 50 ng) and had similar mean insert sizes of the libraries in a range from 400 to 500 bp. This result shows that the yield of the NGS libraries generated with the FTP method is comparable to the yield obtained with the Fragmentase technique.
PLOS ONE I https://doi.orq/10.1371/iournal.pone.0210374  April 1,2019
5/12
PLOS ONE
Fragmentation Through Polymerization (FTP): A new method to fragment DNA
SD pol	-	+	^1 +	+
DNAse 1	-	-	+ +	+
\- 1.5 Kb \- 1.0 Kb
\- 0.5 Kb
\- 0.1 Kb
M1
1
M2
Fig 2. Agarose-gel electrophoresis of gDNA fragmented by the FTP method. gDNA of E. coli BL21 was incubated as described in Materials and Methods: without enzymes (lane 1), with SD polymerase (lane 2), with DNase I (lane 3), and with both DNase I and SD polymerase (lane 4 and 5). Ml: 1 kb DNA Ladder; M2: 100 bp DNA Ladder.
https://doi.orq/1Q.1371/iojrnal.pone.Q21Q374.qQQ2
Assessment of NGS libraries generated from Fragmentase and FTP -digested gDNA
The NGS libraries of E. coli BL21(DE3) gDNA were sequenced at 48x depth with an Illumina MiSeq Instrument. The raw data (about 220 Mb for each DNA sample) generated in this study have been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive under BioProject accession number PRJNA509202 (https://www.ncbi.nlm.nih. gov/sra/PRTNA509202).
Different fragmentation and NGS library preparation protocols could potentially affect the quality of the reads. We therefore estimated the quality of reads as described in [2] for comparison of different fragmentation methods. PHRED quality scores for each base provide a sequencing error estimate and are a good tool to assess the quality of sequences and to
PLOS ONE | https://doi.orq/10.1371/iournal.pone.0210374  April 1,2019
6/12
•$PLOS ONE
Fragmentation Through Polymerization (FTP): A new method to fragment DNA
compare the reliability of different sequencing runs on the same instrument [16]. We did not detect any significant differences in the quality scores obtained from the Fragmentase and FTP NGS libraries (S2 Fig).
The randomization of DNA digestion for both fragmentation methods was compared by nucleotide composition plots which show the mean base composition for every read cycle of NGS and indicate—at the beginning of the reads—the quality of the random fragmentation (Fig 3A). The difference between the mean base composition for every read cycle and the average base composition in the reads was estimated using the chi-squared test (Fig 3B).
The deviations of the plots from the average base composition in the first three positions of the reads (Fig 3A) and the increased chi-square value at the first positions of the reads (Fig 3B) indicate that the sites of DNA fragmentation for both enzymatic methods are partly associated with DNA sequence contents. This is no surprise because all methods of fragmentation are partly sequence-specific [2, 6,10, JJJ. We expected a lower randomization of FTP DNA digestion in comparison with Fragmentase because DNase I—used in FTP—is a sequence-dependent enzyme [8, 9]. However, the FTP method provided the better randomization of the fragmentation sites than Fragmentase (Fig 3). Perhaps the generation of overlapping sequences at the ends of FTP fragments (Fig 1) counterbalances the sequence-dependent DNA nicking by DNase I.
For the efficient and complete extraction of information from the NGS assay, the full and uniform representation of the whole genome sequence in the NGS library is essential. Among other factors, this heavily depends on the level of randomization during the fragmentation step of the library preparation. To assess the representation of the sequences in the FTP and Fragmentase libraries, we visualized the read coverage uniformity over the genome (Fig 4A) and GC coverage bias (Fig 4B) for both methods. As the reference sequence, the E.coli BL21(DE3) genome sequence (NCBI Ref Seq: NC_012971.2) was used.
To evaluate the read coverage uniformity throughout the genome, Lorenz curves were used. A Lorenz curve shows the cumulative fraction of reads as a function of the cumulative
Base pair position from 3' end of the sequence reads Base pair position from 3' end of the sequence reads
Fig 3. Comparison of the mean nucleotide compositions in the reads of FTP- and Fragmentase-generated NGS libraries. (A) Bias plots showing mean (by replication) percentage of observed bases at each position of reads for Fragmentase (solid lines) and FTP (dotted lines) methods of fragmentation. (B) % values of observed bases at each position of reads (calculated by Pearson %2 test for given probabilities) for Fragmentase (red) and FTP (blue) methods. Given probabilities are mean probabilities for each nucleotide from position 11 to position 149. Smaller %2 values indicate that the observed probability of bases at the given position is closer to the mean probabilities at the non-bias region and less bias is observed.
https://doi.orq/1Q.1371/iojrnal.pone.021Q374.qQQ3
PLOS ONE | https://doi.orq/10.1371/iournal.pone.0210374  April 1,2019
7/12
•$PLOS ONE
Fragmentation Through Polymerization (FTP): A new method to fragment DNA
I Fragmentase ■ FTP
Cumulative fraction of reads GC content per 100 bp window (%)
Fig 4. Coverage uniformity evaluation. Cumulative read coverage was visualized as Lorenz curves (A) and GC bias of the coverage was estimated as normalized coverage over GC content for both Fragmentase (red curves) and FTP (blue curves) methods. (A) The Lorenz curves show the cumulative fraction of the genome as a function of the cumulative fraction of the reads. Perfectly uniform coverage would result in a diagonal line (black). Fragmentase and FTP methods exhibit the same deviations from the diagonal as a result of biased coverage. (B) The GC bias plots show the normalized coverage as a function of GC content. The black horizontal line (normalized coverage = 1) represents an ideally uniform coverage and any divergence from it indicates either oversequencing (normalized coverage > 1) or underrepresentation (normalized coverage < 1) of the sequences of particular GC content. Both methods give similar uniformity, while FTP provides better coverage for GC reach (> 55% GC content) sequences.
https://doi.orq/1Q.1371/iojrnal.pone.021Q374.qQQ4
fraction of the genome. The plotted curves (Fig 4A) demonstrate that both the Fragmentase and FTP methods exhibit the same uniformity.
GC coverage bias plots allow the evaluation of the read coverage depending on GC content. A normalized (relative) coverage in the plots is a relative measure of sequence coverage by the reads at a particular GC content. The plot visualizes the normalized coverage across the entire GC spectrum by grouping all 100-base sliding windows across the genome by their GC content and reporting the average normalized coverage for each GC content percentage. A normalized coverage of 1 indicates that a particular base is covered at the expected average rate. A relative coverage above 1 indicates higher than expected coverage and below 1 indicates lower than expected coverage. The obtained GC bias plots (Fig 4B) demonstrate similar uniform coverage depending on GC content, while FTP provides better uniform coverage for GC reach sequences.
There are several key characteristics of NGS that depend on the quality of the library: genome coverage, identity with a reference sequence, the rate of errors, and the number of unmappable sequences. These characteristics were estimated for different sequencing depths of the NGS libraries. For the simulation of different depths, random samples of NGS reads were generated. To compare the genome coverage (the total number of aligned bases in the reference divided by the genome size), we used the genome sequence NCBI Ref Seq: NC_012971.2 as the reference with the assumption that this represented 100% coverage. For the computation of genome coverage, a base in the reference genome is counted as aligned if there is at least one contig with at least one alignment to this base. Contigs from repeat regions may map to multiple places and thus may be counted multiple times in this quantity. Unmappable sequences were calculated as a rate of unmappable reads. A large fraction of these reads
PLOS ONE I https://doi.orq/10.1371/iournal.pone.0210374  April 1,2019
8/12
•$PLOS ONE
Fragmentation Through Polymerization (FTP): A new method to fragment DNA
Table 1. Key averaged NGS characteristics of Fragmentase- and FTP- generated libraries.
Sequencing depth (number of reads)	Method of DNA fragmentation	Genome coverage (%)	Ref. Seq. identity (%)	Mismatch errors (per 100 kb)	Indel errors (per 100 kb)	Unmappable reads (%)
32 x depth (1 Ox 105 reads)	Fragmentase	98.226	99.999	1.01	0.24	3.07
	FTP	98.224	99.999	1.02	0.14	3.91
16x depth (5xl05 reads)	Fragmentase	98.193	99.999	1.05	0.13	3.09
	FTP	98.200	99.999	1.17	0.16	3.92
8x depth (2.5xl05 reads)	Fragmentase	98.042	99.996	3.70	0.22	3.17
	FTP	98.068	99.996	4.02	0.24	3.90
3x depth (lxlO5 reads)	Fragmentase	91.100	99.974	25.23	0.70	3.13
	FTP	90.908	99.971	27.70	1.21	3.90
The mean NGS statistics per library were calculated from the data of the four independent libraries for the each method. All metrics were obtained for different depths of E. coli BL21 genome sequencing. We found no significant differences between Fragmentase- and FTP- generated NGS libraries.
https://doi.orq/1Q.1371/iojrnal.pone.021Q374.tQQ1
would reduce the efficiency and the apparent coverage of the genome sequencing. The rate of indels was estimated as the average number of single nucleotide insertions or deletions per 100,000 aligned bases, and the rate of mismatches was estimated as the average number of mismatches per 100,000 aligned bases. The resulting average data from the NGS analyses are shown in Table 1. The statistics for the Fragmentase and FTP NGS libraries were calculated from the data of the four independent libraries for each fragmentation method. The detailed data for each NGS library are shown in the Supporting information (SI Table). The obtained characteristics are identical or very similar for the assembled sequences from the libraries generated by the different methods (Table 1). The FTP method gives a greater proportion of unmappable reads compared to Fragmentase, but the difference is less than 1% of all reads in the library. It can be explained by the assumption that FTP generates additional non-specific sequences during the polymerization stage of the fragmentation. Potentially, FTP may increase the level of mismatches, because SD polymerase does not have proofreading activity. In practice, we did not see any significant difference between the methods. Proportions of FTP/Frag-mentase mismatches are equal 1 for deep sequencing and 1.08 for shallow (3x depth) sequencing.
To evaluate the de novo genome assembly of the Fragmentase and FTP libraries, we used QUAST software (quality assessment tool for genome assemblies) [15]. We compared the following assembling metrics:
• Number of contigs: the total number of contigs in the assembly.
• Largest contig: the length of the largest contig in the assembly.
• Total length: the total number of bases in the assembly.
• N50 and N75: the contig length such that using equal or longer length contigs produces at least 50% and 75% (respectively) of the bases of the assembly length [15, 17,181.
• NG50 and NG75: the contig length such that using equal or longer length contigs produces at least 50% and 75% (respectively) of the length of the reference genome, rather than 50% and 75% of the assembly length [15,17,18].
The assembly metrics were calculated for different sequencing depths of the libraries obtained with the Fragmentase and FTP methods. The mean statistics calculated from the data
PLOS ONE I https://doi.orq/10.1371/iournal.pone.0210374  April 1,2019
9/12
•$PLOS ONE
Fragmentation Through Polymerization (FTP): A new method to fragment DNA
Table 2. The averaged assembly metrics of the NGS libraries obtained by Fragmentase and FTP methods.
Sequencing depth (number of reads)	Method of DNA fragmentation	Number of contigs	Largest contig (bp)	Total length (bp)	N50	NG50	N75	NG75
32 x depth (10x10s reads)	Fragmentase	182	272230	4485104	81481	80813	41851	40741
	FTP	195	265892	4484951	81981	80479	43990	41542
16x depth (5xl05 reads)	Fragmentase	204	217222	4483651	69766	69259	37530	36159
	FTP	196	194626	4484098	70654	69018	39773	39008
8x depth (2.5xl05 reads)	Fragmentase	304	134506	4478279	45010	44221	26078	24944
	FTP	274	133551	4479908	41368	40106	21611	19769
3x depth (lxlO5 reads)	Fragmentase	2414	14250	4178082	2886	2689	1753	1476
	FTP	2500	15256	4178040	2666	2456	1628	1348
The mean assembly statistics were calculated from the data of the four independent libraries for each method and for the different depths of the E. coli BL21 genome sequencing. No significant differences between Fragmentase- and FTP- generated NGS libraries were found.
https://doi.orq/1Q.1371/iojrnal.pone.Q21Q374.tQQ2
of the four independent libraries for each fragmentation method are shown in Table 2. The metrics for each NGS library are shown in the Supporting information (S2 Table). Our results demonstrate that the characteristics of the genome assembly of the libraries obtained by the novel FTP method are similar to those obtained by the Fragmentase method (Table 2). Fragmentase gives slightly better N50 and N75 metrics for 3x and 8x sequencing depths than FTP, but the difference is not significant because proportions of N50, NG50, N75, NG75 at 3x sequencing depth between Fragmentase and FTP are equal to 1.07-1.09 (close to 1). For deep NGS sequencing (16x and 32x depths), FTP gives the same or slightly better N50 and N75 metrics when compared to Fragmentase.
In summary, the Fragmentation Through Polymerization method is a novel, robust, and simple method of DNA fragmentation which is suitable for NGS. In comparison with Fragmentase, it provides very similar characteristics for NGS libraries. Potential disadvantages of FTP are associated with biases of the enzymes used in the method, such as non-random DNA fragmentation and mismatch errors. These characteristics of FTP were compared with the Fragmentase method. The experimental data demonstrate that FTP yields higher quality random fragmentations (Fig 3) and better coverage of GC reach contents (Fig 4B) than Fragmentase. Levels of mismatch errors are similar for both methods. FTP generates a greater number of unmappable reads than Fragmentase, but the difference is less than 1% of all reads in the library.
The main advantage of the FTP method lies in the simplification of NGS library preparation by eliminating the DNA end repair and A-tailing stage from the protocol. In the result, the work time of the procedure can be decreased from 180 minutes to 110 minutes (the repair/A-tailing stage takes 70 minutes according to the manual). Additionally, it can reduce the price of the library preparation. For example, the current price of the NEBNext Ultra II DNA Library Prep kit for 24 reactions is 535 Euros; the price of the NEBNext Ultra II End Repair/dA-Tailing Module for 24 reactions is 262 Euros. Thus, the elimination of this module from the kit can decrease the primary cost of NGS library preparation.
Based on our data we hope that the FTP method can become a helpful tool for NGS.
Supporting information
SI Table. Key NGS characteristics of individual libraries generated by Fragmentase (A) and FTP (B) methods. All metrics were obtained for different depths of E. coli BL21 genome sequencing. (DOC)
PLOS ONE I https://doi.orq/10.1371/iournal.pone.0210374  April 1,2019
10/12
•$PLOS ONE
Fragmentation Through Polymerization (FTP): A new method to fragment DNA
S2 Table. The assembly metrics of the individual NGS libraries obtained by Fragmentase and FTP methods. All metrics were obtained for different depths of E. coli BL21 genome sequencing. (DOC)
51 Fig. Optimization of FTP conditions to generate DNA fragments with an optimal average size. (A) FTP reactions were performed as described in the Materials and Methods with the different concentrations of DNase I in the reaction mixtures. The obtained DNA fragments were analyzed by agarose-gel electrophoresis. The mixtures contained the following concentrations of DNase I: 1 ng/ul (line 1); 1.5 ng/ul (line 2); 1.875 ng/ul (line 3); 2.25 ng/ul (line 4). M: 100 bp DNA Ladder. Concentration 1 ng/ul of DNase I (line 1) provided the targeted average size (400-600 bp) of the fragments.
(B) FTP reactions were performed as described in the Materials and Methods with the different times of incubation at 30°C. The following times were used for the incubation: 10 min. (line 1); 20 min. (line 2); 45 min. (line 3). M: 100 bp DNA Ladder. The incubation at 30°C for 20 minutes (line 2) provided the targeted average size of the fragments (400-600 bp). (TIF)
52 Fig. Comparison of the sequence qualities scores (PHRED) at the 38-ends of the sequences that have been generated from the NGS libraries constructed with the Fragmentase (red) and FTP (blue) methods of DNA fragmentation. No differences were found between the libraries.
(TIF)
Acknowledgments
We thank Syntol JSC (Moscow, Russia), Evrogen JSC (Moscow, Russia) and Bioron GmbH (Ludwigshafen, Germany) for support of this project.
Author Contributions
Conceptualization: Konstantin B. Ignatov, Konstantin A. Blagodatskikh, Dmitry S. Shcherbo, Vladimir M. Kramarov.
Data curation: Konstantin B. Ignatov, Konstantin A. Blagodatskikh, Dmitry S. Shcherbo, Vladimir M. Kramarov.
Formal analysis: Konstantin B. Ignatov, Dmitry S. Shcherbo, Vladimir M. Kramarov. Funding acquisition: Konstantin B. Ignatov, Vladimir M. Kramarov.
Investigation: Konstantin B. Ignatov, Konstantin A. Blagodatskikh, Dmitry S. Shcherbo, Yulia A. Monakhova.
Methodology: Konstantin B. Ignatov, Konstantin A. Blagodatskikh, Dmitry S. Shcherbo. Resources: Konstantin B. Ignatov.
Supervision: Konstantin B. Ignatov, Tatiana V. Kramarova.
Validation: Konstantin B. Ignatov, Tatiana V. Kramarova, Vladimir M. Kramarov. Visualization: Konstantin B. Ignatov.
Writing - original draft: Konstantin B. Ignatov, Konstantin A. Blagodatskikh, Dmitry S. Shcherbo, Tatiana V. Kramarova.
PLOS ONE I https://doi.orq/10.1371/iournal.pone.0210374  April 1,2019
11/12
•$PLOS ONE
Fragmentation Through Polymerization (FTP): A new method to fragment DNA
Writing - review & editing: Konstantin B. Ignatov, Tatiana V. Kramarova.
References
1. Head SR, Komori HK, LaMere SA, WhisenantT, Van Nieuwerburgh F, Salomon DR, et al. Library construction for next-generation sequencing: overviews and challenges. BioTechniques. 2014; 56: 61-64, 66, 68, passim. https://doi.Org/10.2144/000114133 PMID: 24502796
2. Knierim E, Lucke B, Schwarz JM, Schuelke M, Seelow D. Systematic comparison of three methods for fragmentation of long-range PCR products for next generation sequencing. PLoS ONE. 2011; 6: e28240. https://doi.org/10.1371/iournal.pone.0028240 PMID: 22140562
3. Kasoji SK, Pattenden SG, Male EP, Jayakody CN, Tsuruta JK, Mieczkowski PA, et al. Cavitation Enhancing Nanodroplets Mediate Efficient DNA Fragmentation in a Bench Top Ultrasonic Water Bath. PLoS ONE. 2015; 10: e0133014. https://doi.org/10.1371/iournal.pone.0133014 PMID: 26186461
4. Costello M, Pugh T, Fennell T, Stewart C, Lichtenstein L, Meldrim J, et al. Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation. Nucleic Acids Research. 2013; 41: e67. https://doi.org/10.1093/ nar/qks1443 PMID: 23303777
5. Marine R, Poison SW, Ravel J, Hatfull G, Russell D, Sullivan M, et al. Evaluation of atransposase protocol for rapid generation of shotgun high-throughput sequencing libraries from nanogram quantities of DNA. Appl Environ Microbiol. 2011; 77: 8071-8079. https://doi.Org/10.1128/AEM.05610-11 PMID: 21948828
6. Lan JH, Yin Y, Reed EF, Moua K, Thomas K, Zhang Q. Impact of three lllumina library construction methods on GC bias and HLA genotype calling. Human Immunology. 2015; 76: 2-3. https://doi.org/10. 1016/i.humimm.2014.12.016 PMID: 25543015
7. Ignatov KB, Barsova EV, Fradkov AF, Blagodatskikh KA, Kramarova TV, Kramarov VM. A strong strand displacement activity of thermostable DNA polymerase markedly improves the results of DNA amplification. BioTechniques. 2014; 57: 81-87. https://doi.org/10.2144/000114198 PMID: 25109293
8. Brukner I, Jurukovski V, Savic A. Sequence-dependent structural variations of DNA revealed by DNase I. Nucleic Acids Research. 1990; 18: 891-894. https://doi.Org/10.1093/nar/18.4.891 PMID: 2179873
9. Brukner I, Sanchez R, Suck D, Pongor S. Sequence-dependent bending propensity of DNA as revealed by DNase I: parameters for trinucleotides. The EMBO Journal. 1995; 14:1812-1818. https://doi.org/10. 1002/i.1460-2075.1995.tb07169.x PMID: 7737131
10. Grokhovsky SL, N'icheva IA, Nechipurenko DY, Golovkin MV, Panchenko LA, Polozov RV, Nechipur-enko YD. Sequence-Specific Ultrasonic Cleavage of DNA. Biophysical Journal. 2011; 100:117-125. https://doi.Org/10.1016/i.bpi.2010.10.052 PMID: 21190663
11. Poptsova MS, N'icheva IA, Nechipurenko DY, Panchenko LA, Khodikov MV, Oparina NY, Grokhovsky SL. Non-random DNA fragmentation in next-generation sequencing. Scientific Reports. 2014; 4: 4532. https://doi.Org/10.1038/srep04532 PMID: 24681819
12. Dodt M, Roehr JT, Ahmed R, Dieterich C. FLEXBAR-Flexible Barcode and Adapter Processing for Next-Generation Sequencing Platforms. Biology (Basel). 2012; 1: 895-905. https://doi.org/10.3390/ bioloqyl030895 PMID: 24832523
13. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie2. Nat Methods. 2012; 9: 357-359. https://doi.Org/10.1038/nmeth. 1923 PMID: 22388286
14. Mikheenko A, Prjibelski A, Saveliev V, Antipov D, Gurevich A. Versatile genome assembly evaluation with QUAST-LG. Bioinformatics. 2018; 34: i142—H50. https://doi.Org/10.1093/bioinformatics/bty266 PMID: 29949969
15. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013; 29:1072-1075. https://doi.org/10.1093/bioinformatics/btt086 PMID: 23422339
16. Richterich P. Estimation of errors in "raw" DNA sequences: a validation study. Genome Res. 1998; 8: 251-259. PMID: 9521928
17. Earl D, Bradnam K, St John J, Darling A, Lin D, Fass J, et al. Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome Res. 2011; 21: 2224-2241. https://doi.org/10. 1101/qr.126599.111 PMID: 21926179
18. Miller JR, Koren S, Sutton G. Assembly algorithms for next-generation sequencing data. Genomics. 2010; 95: 315-327. https://doi.orq/10.1016/i.yqeno.2010.03.001 PMID: 20211242
PLOS ONE | https://doi.org/10.1371/iournal.pone.0210374  April 1,2019
12/12