The X-games of PCR

This is not your Mom’s PCR. These new kids on the block are making PCR extremely fast. PCR (polymerase chain reaction) earned its inventor a Nobel Prize for allowing molecular research to advance much more rapidly (for an interesting read on the quirky Laureate who gave up science to go surfing, see Wikipedia). It has become the workhorse of most molecular diagnostic assays, usually in the form of real-time PCR. It is used for purposes ranging from detecting bacteria and viruses, identity testing for forensics and bone marrow engraftment, and cancer mutation analysis, to the sequencing-by-synthesis chemistry used by Illumina for massively parallel sequencing.

This technique is still limited by its need for highly trained technologists to perform DNA extraction, time-consuming processing, and the run time of real-time PCR itself. Overall, this process takes about 5-8 hours. While this is much faster than in the past, it is still too slow for use at the point of care (POC).

But why would DNA testing need to be POC? The term sounds like an oxymoron in a field where many results have a 2-month turnaround time. Yet there are circumstances where rapid molecular testing would impact patient care. For instance, a doctor evaluating a patient in the office for a sexually transmitted infection would want to know immediately whether the patient has gonorrhea or chlamydia in order to prescribe the proper antibiotics. Similarly, POC molecular testing could be applied in a bioterrorism incident to test samples for an infectious agent. It would also benefit low-resource areas internationally, where HIV testing could be used to manage anti-retroviral therapy in patients many miles from a laboratory.

For PCR to be useful as a test in the POC setting, it would have to provide a result within 10-15 minutes and be performed as a waived test. Two recent examples that demonstrate how this is possible were highlighted at the recent American Association for Clinical Chemistry conference, which I just got back from: Extreme PCR (1) and Laser PCR (2).

Extreme PCR refers to a technique of rapidly cycling the temperature of PCR reactions. The reaction occurs in a thin slide that distributes the reagents and temperature evenly and is clear enough to permit easy reading of fluorescence measurements (Figure 1). The DNA polymerase and the primers that amplify the target DNA are added at much higher concentrations than normal (about 20x).

Figure 1. Thin reaction chamber for ultra-fast PCR.

This flies in the face of traditional PCR chemistry dogma: specificity should plummet, and non-target DNA could be amplified instead of target DNA, creating false positives. However, let’s think about what actually happens in a non-specific reaction. Primers are designed to match one region of DNA that is unique within the whole genome. The genome is so large, though, that some other segment may look very similar, differing in just 1 or 2 of the ~20 bases that a primer matches. A primer can bind to this alternate region, but less efficiently: the binding is weaker and takes more time to occur.

Therefore, by shrinking the cycling time to just a few seconds, only the most specific interactions have time to take place, and non-specific binding is outrun (Figure 2)!
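To build intuition for why faster cycling rescues specificity, here is a minimal sketch in Python (a toy first-order binding model of my own, not taken from the cited papers; the rate constants are assumptions) comparing a matched primer with a slower-binding mismatched site:

```python
import math

def fraction_bound(k_on_per_s: float, anneal_time_s: float) -> float:
    """First-order approximation: fraction of sites bound after a given anneal time."""
    return 1 - math.exp(-k_on_per_s * anneal_time_s)

K_SPECIFIC = 2.0    # assumed on-rate for a perfectly matched primer (1/s)
K_MISMATCH = 0.05   # assumed, much slower on-rate for a 1-2 base mismatched site (1/s)

for anneal in (20.0, 1.0):  # conventional vs. extreme-cycling anneal times (s)
    spec = fraction_bound(K_SPECIFIC, anneal)
    mis = fraction_bound(K_MISMATCH, anneal)
    print(f"anneal {anneal:>4}s: specific {spec:.2f}, mismatch {mis:.3f}, "
          f"ratio {spec / mis:.1f}x")
```

With a long anneal, both sites approach saturation and discrimination is poor; with a one-second anneal, the matched primer is mostly bound while the mismatched site barely is, which is the kinetic argument the Extreme PCR authors make.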

Figure 2. Fluorescence from a dye that binds double-stranded DNA, increasing here within seconds (high points occur when the reaction cools and dsDNA anneals; low points occur when heating to high temperature denatures it).

Laser PCR has not been reported to use increased reagent concentrations like Extreme PCR (the formulation may be proprietary), but its developers boast a very innovative method for quickly heating and cooling PCR reactions. GNA Biosciences uses gold nanoparticles with many DNA adapters attached (watch the video below for a great visual explanation!).

These adapters are short sequences of DNA that bring the target DNA and primers together to amplify the target sequence. Then, as the name implies, a laser zaps the gold particles and heats them in a very localized area, which releases the DNA strands. The released DNA binds another gold particle, replicates, rinses, and repeats. The laser energy thus heats only the gold in a small area, allowing quick heating and cooling within a matter of seconds.

These new PCR methods are very interesting and could have a big impact on how molecular pathology advances are brought to the patient. On a scientific note, I hope you found them as fascinating as I did!

References

  1. Myrick JT, Pryor RJ, Palais RA, Ison SJ, Sanford L, Dwight ZL, et al. Integrated extreme real-time PCR and high-speed melting analysis in 52 to 87 seconds. Clin Chem 2019;65:263–71.
  2. CLN Stat. A Celebration of Innovation. AACC’s first disruptive technology award to recognize three breakthrough diagnostics. https://www.aacc.org/publications/cln/cln-stat/2018/july/10/a-celebration-of-innovation
  3. Makrigiorgos GM. Extreme PCR meets high-speed melting: a step closer to molecular diagnostics “while you wait.” Clin Chem 2019.

-Jeff SoRelle, MD is a Chief Resident of Pathology at the University of Texas Southwestern Medical Center in Dallas, TX. His clinical research interests include understanding how the lab intersects with transgender healthcare and improving genetic variant interpretation.

Next Generation Sequencing: Types of Variants

We have reviewed the next generation sequencing wet bench process, data review, and troubleshooting from start to finish.  I’d like to take a more in-depth look at the types of variants that can be detected by the targeted amplicon NGS panels our lab performs: single nucleotide variants, multi-allelic variants, multi-nucleotide variants, insertions (including duplications), deletions, and complex indels.  In our lab, we review every significant variant and variant of unknown significance in IGV to confirm the call was made correctly by the variant caller, given the difficult nature of some of these variants.  I have included screenshots of the IGV windows for each of these types of variants, to show what we see when we review.

Single Nucleotide Variants (SNV)

The most common (and straightforward) type of variant is a single nucleotide variant – one base pair is changed to another, such as KRAS c.35G>A, p.G12D (shown below in reverse):

Multi-allelic Variants

A multi-allelic variant has more than one change at a single base pair (see below – NRAS c.35G>A, p.G12D, and c.35G>C, p.G12A – shown below in reverse).  This may be the rarest type of variant – in our lab, we have seen this type in maybe only a handful of cases over the last four years.  It could indicate several clones, or different variants occurring over a period of time.

Multi-nucleotide Variants (MNV)

Multi-nucleotide variants are variants that involve more than one nucleotide at a time, at adjacent positions.  A common example is BRAF p.V600K (see below – in reverse), which can occur in melanoma.  Two adjacent nucleotides are changed on the same allele.  These variants demonstrate one advantage NGS has over dideoxy (Sanger) sequencing.  In dideoxy sequencing, we can see the two base pair change, but we cannot be certain the changes occur on the same allele.  This is an important distinction because if they occurred on the same allele, they probably occurred at the same time, whereas if they are on different alleles, they were probably two separate events.  The distinction matters for nomenclature as well – if they are on the same allele, it is listed as one event, as shown below (c.1798_1799delGTinsAA, p.V600K), as opposed to two separate mutations (c.1798G>A, p.V600M and c.1799T>A, p.V600E).  As you can see in the IGV window below, both changes happen on one strand.
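To illustrate how two phased, adjacent substitutions collapse into a single delins name, here is a small hypothetical sketch (the helper function is my own illustration, not from any variant caller):

```python
def merge_phased_snvs(pos1, ref1, alt1, pos2, ref2, alt2):
    """Merge two adjacent SNVs known to lie on the same allele into one delins.

    Positions are cDNA coordinates; adjacency means pos2 == pos1 + 1.
    """
    if pos2 != pos1 + 1:
        raise ValueError("SNVs are not adjacent")
    # Phased adjacent changes are described as a single deletion-insertion event.
    return f"c.{pos1}_{pos2}del{ref1}{ref2}ins{alt1}{alt2}"

# BRAF p.V600K from the post: c.1798G>A and c.1799T>A on the same allele.
print(merge_phased_snvs(1798, "G", "A", 1799, "T", "A"))
# -> c.1798_1799delGTinsAA
```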

Insertions/Duplications

Insertions are an addition of nucleotides to the original sequence.  Duplications are a specific type of insertion in which a region of the gene is copied and inserted right after the original copy.  These can be in-frame or frameshift.  If the insertion is a multiple of three base pairs, it shifts the original sequence downstream, but the amino acids downstream are not affected, so the reading frame stays the same.  If it is not a multiple of three base pairs, the reading frame is changed, causing all of the downstream amino acids to change – a frameshift.  A common example of a frameshift insertion is the 4bp insertion in NPM1 (c.863_864insCTTG, p.W288fs) that occurs in AML.  In IGV, these are displayed as a purple hash that shows the inserted sequence when you hover over it.
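The in-frame versus frameshift rule comes down to divisibility by three; a quick sketch (my own illustration, not lab software):

```python
def insertion_effect(inserted_bases: str) -> str:
    """Classify an insertion by whether its length preserves the reading frame."""
    return "in-frame" if len(inserted_bases) % 3 == 0 else "frameshift"

# NPM1 c.863_864insCTTG from the post: a 4 bp insertion.
print(insertion_effect("CTTG"))  # -> frameshift (4 % 3 != 0)
print(insertion_effect("GAT"))   # hypothetical 3 bp insertion -> in-frame
```

The same test applies to deletions and to the net length change of a complex indel.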

Deletions

Deletions, on the other hand, occur when base pairs are removed from the sequence.  These can be in-frame or frameshift as well.  An example is the 52bp deletion (c.1099_1150del, p.L367fs) found in the CALR gene in cases of primary myelofibrosis or essential thrombocythemia.

Complex Indels

Lastly, NGS can detect complex indels.  These, again, are a type of variant that we could not fully resolve using dideoxy sequencing – we would be able to detect the changes, but not whether they occurred on the same strand, which would indicate that they occurred at the same time.  The first example is a deletion followed by a single nucleotide change – since both occur on the same strand, they most likely occurred together, so they are called as one complex deletion/insertion event (KIT c.1253_1256delACGAinsC, p.Y418_D419delinsS).  First the ACGA was deleted, then a C was inserted.

The last example involves multiple nucleotide changes all in the same vicinity (IGV is in reverse for this specimen as well).  Using HGVS nomenclature, as in all the previous examples, this would be named RUNX1 c.327_332delCAAGACinsTGGGGT, p.K110_T111delinsGV.


-Sharleen Rapp, BS, MB (ASCP)CM is a Molecular Diagnostics Coordinator in the Molecular Diagnostics Laboratory at Nebraska Medicine. 

Genetic Results: Set in Stone or Written in Sand?

This month, I’m switching gears to another interest of mine: molecular pathology. I am currently in a fellowship in Molecular Genetic Pathology, which exposes me to unique, thought-provoking cases.

Advances in genomic sequencing have allowed multiple genes to be analyzed in a single laboratory test. These so-called gene panels have increased diagnostic yield compared to serial gene sequencing in syndromic and non-syndromic diseases with multiple genetic etiologies. However, interpretation of genetic information is complicated and evolving, which has led to wide variation in how results are reported. A genetic test result can be positive (pathogenic or likely pathogenic), negative (benign or likely benign), or uncertain (variant of uncertain significance, VUS). A VUS may simply be part of what makes each individual unique, and there isn’t enough evidence to say whether it is pathogenic or benign. Many results come back this way, which can be frustrating for patients to hear and for genetic counselors and clinicians to explain.

Initial approaches to excluding benign variants, which relied on sequencing 100 “normal people” to determine the frequency of common variants in the population, were fraught with bias. The initial “normal population” consisted mostly of individuals of white European descent. Not surprisingly, this lack of genetic diversity in control populations led to errors in interpretation.

Fortunately, several publicly available databases now exist to help determine whether gene variants are damaging. The first important piece comes from population sequencing efforts. These projects performed whole exome sequencing of hundreds or thousands of individuals to catalog variant frequencies in a more genetically diverse population. If a variant occurs in a normal, healthy population at a frequency >1%, then it likely doesn’t cause a severe congenital disease, since such a disease would prevent that genetic variant from being passed on.
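As a rough sketch of this filtering logic (the 1% threshold follows the rule of thumb above; the variant records are illustrative, with one frequency borrowed from the EFHC1 case discussed below):

```python
AF_THRESHOLD = 0.01  # assumed 1% population allele frequency cutoff

# Hypothetical variant records with population allele frequencies (e.g., from gnomAD).
variants = [
    {"name": "EFHC1 c.229C>A p.P77T", "pop_af": 0.025},   # 2.5% in African Americans (ExAC)
    {"name": "GENE_X c.100A>G",       "pop_af": 0.0001},  # hypothetical rare variant
]

for v in variants:
    verdict = ("likely benign (too common for severe congenital disease)"
               if v["pop_af"] > AF_THRESHOLD
               else "rare; assess other evidence")
    print(f'{v["name"]}: AF={v["pop_af"]:.4f} -> {verdict}')
```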

The Exome Aggregation Consortium (ExAC) database (1), which has been rolled into the larger gnomAD (genome aggregation database), now contains sequencing information on about 120,000 individuals (Figure 1). The smaller ESP (Exome Sequencing Project), run by the NHLBI division of the NIH, sequenced patients with different cardiovascular and pulmonary diseases.

Figure 1. Number and percent of various ethnicities present in 4 major population sequencing projects.

While there is ethnic diversity present in this database, the 1000 Genomes Project (2) furthered these efforts by sampling all over the world to gather genetic information from ethnically and geographically distinct sub-populations (Figure 2).

Figure 2. Geographic map of populations sequenced by the 1000 Genomes Project.

With these databases, we can effectively rule out rare polymorphisms as benign when they are found in several healthy individuals, and especially when found in the homozygous state in a healthy individual. Previously, it was common for a person from an ethnic minority to carry variants not seen in predominantly European cohorts; in many cases, this led to uncertain test results.

One way to deal with these VUSs is for a lab to periodically review its test results in light of new knowledge. The CAP has a checklist item (3) that requires a lab to have a policy about reassessing variants and the actions to be taken. However, this item doesn’t require a lab to communicate the results to a physician and doesn’t specify how often to reanalyze variants. Until last year, there weren’t any studies indicating how often variant reanalysis should occur. Variant reanalysis had only been studied in the limited context of whole exome sequencing for rare diseases, as a way to improve diagnostic yield (4). That work did not address how often frequent VUSs were downgraded to benign or upgraded to pathogenic.

One example of how reclassification can occur is the case of a young African American boy with epilepsy who, in 2014, received a genomic test covering a panel of genes known to be involved in epilepsy. Two heterozygous VUSs were reported in EFHC1 (c.229C>A p.P77T and c.662G>A p.R221H), a gene that causes an autosomal dominant epilepsy syndrome when one allele is damaged. However, these variants could later be reclassified as benign by looking at population databases: the ExAC database showed an allele frequency of 2.5% in African Americans, and the 1000 Genomes database showed an 8.8% frequency in the GWD (Gambian in Western Divisions) subpopulation.

This case demonstrates the importance of reanalyzing genetic test results as medical knowledge continues to evolve. Recent studies looking at reclassification rates in epilepsy (5) and inherited cancer syndromes (6) have been published in JAMA journals and demonstrate that reclassification of variants is common. It is thus important for laboratories to periodically review previously reported variants to provide optimal quality results and patient care. I will elaborate on this further in the next blog post.

References:

  1. Lek M, Karczewski KJ, Minikel EV, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285-291.
  2. The 1000 Genomes Project Consortium, Auton A, Brooks LD, et al. A global reference for human genetic variation. Nature. 2015;526:68-74.
  3. Sequence Variants – Interpretation and Reporting, MOL.36155. 2015 College of American Pathologists (CAP) Laboratory Accreditation Program Checklist.
  4. Costain G, Jobling R, Walker S. Periodic reanalysis of whole-genome sequencing data enhances the diagnostic advantage over standard clinical genetic testing. Eur J Hum Genet. 2018.
  5. SoRelle JA, Thodeson DM, Arnold S, Gotway G, Park JY. Clinical Utility of Reinterpreting Previously Reported Genomic Epilepsy Test Results for Pediatric Patients. JAMA Pediatr. 2018 Nov 5:e182302.  
  6. Mersch J, Brown N, Pirzadeh-Miller, Mundt E, Cox HC, Brown K, Aston M, Esterling L, Manley S, Ross T. Prevalence of Variant Reclassification Following Hereditary Cancer Genetic Testing. JAMA. 2018 Sep 25;320(12):1266-1274.

-Jeff SoRelle, MD is a Molecular Genetic Pathology fellow at the University of Texas Southwestern Medical Center in Dallas, TX. His clinical research interests include understanding how the lab intersects with transgender healthcare and advancing quality in molecular diagnostics.

This work was produced with the guidance and support of:

Dr. Jason Park, MD, PhD, Associate Professor of Pathology, UT Southwestern Medical Center

Dr. Drew Thodeson, MD, Child Neurologist and Pediatric Epileptologist

Evaluating and Analyzing Next Generation Sequencing Specimen Results

Welcome back – in my previous blog we discussed how a run is evaluated on the Ion Torrent instrument. This quarter’s blog will review the individual specimen results from that run.

First off, we take a look at how many reads per specimen have been sequenced and how those reads performed over the targeted areas. For the AmpliSeq Cancer Hotspot Panel v2 that we run, a total of 207 amplicons are created and sequenced. To assess the depth of coverage over these amplicons, we need to think about the biology of the tumor cells and the limit of detection of the assay. We feel confident that we can detect a 5% variant allele frequency for single nucleotide changes, and a 10% variant allele frequency for insertions or deletions. To be confident that we are not missing variants, we require the specimen to have a tumor percentage greater than 20%. This is because, for a given tumor, a mutation can be assumed to be heterozygous – only one of the two alleles will carry the variant. This automatically halves the possible allele frequency from any given tissue. If a colon specimen submitted for testing has a tumor percentage of 40%, any heterozygous variant can be assumed to have a variant allele frequency of no more than about 20%. For this reason, we also require the sequenced amplicons to have at least 500x coverage – each position must be sequenced at least 500 times, so that a 5% mutation appears in roughly 25 reads and we can feel confident it is an actual change, as opposed to background noise.
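To make that arithmetic concrete, here is a small sketch (my own restatement of the numbers above, not our reporting software):

```python
def expected_vaf(tumor_fraction: float) -> float:
    """Expected variant allele frequency for a heterozygous somatic variant."""
    return tumor_fraction / 2  # only one of the two alleles carries the variant

def expected_variant_reads(vaf: float, depth: int) -> float:
    """Expected number of variant-supporting reads at a given sequencing depth."""
    return vaf * depth

# The 40% tumor colon specimen from the post -> ~20% VAF for a het variant.
print(f"Expected VAF: {expected_vaf(0.40):.0%}")

# At the assay's 5% limit of detection and the required 500x coverage:
print(f"Variant reads at LOD: {expected_variant_reads(0.05, 500):.0f}")  # ~25 reads
```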

Next, we look at the On Target percentage and the Uniformity percentage (over 95% is expected for each). The On Target value tells us what fraction of the reads actually cover the 207 amplicons in the panel. Uniformity tells us how even the number of reads is across all 207 amplicons – were they all equally represented, or did a subset have more coverage than the others? This information can actually lead us to further testing – if a subset of amplicons has more coverage than the rest, and they are all from one gene, this may indicate gene amplification. In these cases, the clinician is alerted and additional testing can confirm the amplification.

All of this coverage information is provided by one of the “plugins” we run after the basecalling and alignment are finished:

The most useful (and interesting!) information is gathered from the variant calling plugin. This plugin compares the specimen sequences with the reference sequence and reports the differences – the “variants.” Many of the variants detected are single nucleotide polymorphisms (variants present in greater than 1% of the population). Others are known artifacts of the sequencing itself. These are all analyzed and categorized during validation of the assay and can then be filtered out when analyzing clinical data. After filtering out the known SNPs and artifacts, the somatic changes can be evaluated. Generally, the panel will detect 15-20 variants, but after filtering only 1-4 will be somatic changes. Each detected change is reviewed using a program called IGV, shown below. We compare the sequence to confirm that what the plugin reports looks correct in the actual reads from the sequencer. See the screenshots below of a subset of variants called, then filtered, and analyzed in IGV. While the plugin is exceptionally good at variant calling, no program is perfect, and visualizing the data is still necessary to confirm there is nothing else going on in the sequenced region. The fastq file from the run is also processed through a secondary software package to compare results. The variants for each specimen are assessed for variant allele frequency, coverage, and quality in both software packages.

VariantCaller Output

Filtered Calls: white cells indicate SNPs; blue cells indicate possible somatic calls

IGV Output for KRAS and STK11 calls:


Lastly, the results are brought into yet another software package to be reported. This software allows the pathologists to assign significance to the variants. It also pulls in any treatment information linked to the variants and then allows the pathologist to pick any applicable clinical trials, in order to assist the clinician as much as possible. In future blogs, we will take a look at cases like this to see interesting findings in oncology cases.


-Sharleen Rapp, BS, MB (ASCP)CM is a Molecular Diagnostics Coordinator in the Molecular Diagnostics Laboratory at Nebraska Medicine. 

Data Analysis for NGS by Ion Torrent – Part One – How Did the Run Perform?

Here comes the fun part.  It’s taken a day for library prep and an overnight run for the clonal amplification; the next day includes loading the chip with the ISPs and then running the chip on the sequencer.  After the chip has run, the data is pushed from the sequencer (the PGM) to the server connected to it.  This aspect of NGS surprised me – the size of the files is amazing – for one 316 chip, the file that includes all of the raw data averages about 100GB.  To deal with this amount of data, the server attached to the sequencer holds 12TB, and even then we need a procedure for removing files from the server to keep space for future runs.

Anyway, the raw data is pushed to the server and the data analysis begins.  The Torrent Suite software first analyzes the ISP info, as shown in the graphic below.  It gives a “heat map” of the chip (the football shape) in which red means the wells in that area were full of ISPs, yellow means there are fewer ISPs, and blue means there are none.  You can see below a small area of blue within the football shape – this area did not have any ISPs in it.  However, 92% of the wells on this chip were filled, which is about the maximum loading a chip can achieve.

dataana1

These ISPs are then broken down into categories.  First, how many of the wells had ISPs in them – here, 92.5% of the 6,337,389 wells contained ISPs.  Of those ISPs, 99.8% have product on them that can be sequenced (Live ISPs).  Of the Live ISPs, 0.4% contain control Test Fragments and 99.6% contain actual patient library amplicons.  The Test Fragments are spiked in prior to sequencing and act as a control to evaluate how the sequencing run performed.  Lastly, the ISPs containing patient library amplicons are analyzed.  ISPs that carry more than one amplicon (say, an amplicon of EGFR exon 19 and another specimen’s amplicon of KRAS exon 2) give mixed signals and cannot be analyzed, so they are thrown out of the data analysis into a bin called “polyclonal.”  Low-quality ISPs are also thrown out – anything that did not pass the quality thresholds – as are ISPs that contain only adapter dimers.  For a run of AmpliSeq Cancer Hotspot Panel v2 specimens, most of which come from FFPE tissue that is low quality to start with, a run containing over 50% Final Library ISPs is actually a very good run.  The 316v2 chips are rated to sequence 1 million reads (each ISP yields one read), and on this example run, over 3 million reads were sequenced, so this was a successful run.
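As a rough restatement of how those nested percentages multiply out to usable reads (the fractions are from this example run; the ~55% usable figure is an assumption consistent with the “over 50% Final Library ISPs” benchmark above):

```python
TOTAL_WELLS = 6_337_389

# Nested fractions reported by Torrent Suite for this example run.
loading = 0.925   # wells containing ISPs
live = 0.998      # loaded ISPs with sequenceable product
library = 0.996   # live ISPs carrying patient library (vs. control test fragments)
usable = 0.55     # assumed fraction surviving the polyclonal/low-quality/dimer filters

final_reads = TOTAL_WELLS * loading * live * library * usable
print(f"Usable library reads: {final_reads:,.0f}")  # ~3.2 million on this run
```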

After the ISPs are analyzed and the high-quality ones are kept, the analysis goes on.  The Torrent Suite software then calls the bases from the raw flow data.  These bases are aligned to a reference – in our case hg19, a commonly used human genome reference – and quality scores are assigned at this point.  A Phred-based quality score is used for NGS, shown in the table below.

dataana2.png

dataana3

dataana4
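Since the quality table above is an image, the underlying relationship is worth restating: a Phred score Q encodes the probability p of an incorrect base call as Q = -10·log10(p). A quick sketch:

```python
import math

def phred_from_error(p: float) -> float:
    """Phred quality score for a given probability of an incorrect base call."""
    return -10 * math.log10(p)

def error_from_phred(q: float) -> float:
    """Probability of an incorrect base call for a given Phred score."""
    return 10 ** (-q / 10)

for q in (10, 20, 30):
    p = error_from_phred(q)
    print(f"Q{q}: error probability {p:.4f} (1 in {round(1 / p)})")
# Q10 -> 1 in 10; Q20 -> 1 in 100; Q30 -> 1 in 1,000
```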

Lastly, the reads are put into bins based on the barcode used for each patient specimen – remember the small part of the adapter that was added in library prep so that the specimens could be mixed together?  The software reads that barcode sequence and assigns each read accordingly.  The specimens should all have approximately the same number of reads, since they were normalized to the same concentration at the end of library prep, but there may be some variability due to specimen quality, as you can see below.

dataana5

In next quarter’s post, we will dive into the individual specimen results!

 


-Sharleen Rapp, BS, MB (ASCP)CM is a Molecular Diagnostics Coordinator in the Molecular Diagnostics Laboratory at Nebraska Medicine. 

Next Generation Sequencing – Ion Torrent Semiconductor Sequencing

We’ve finally made it to the sequencing step of the NGS workflow. In this post, we will discuss the technology and process behind the Ion Torrent sequencing step. Next time, we will review the Illumina sequencing process.

When we left off, the final product of the clonal amplification had been prepared – Ion Sphere Particles (ISPs) covered in single-stranded amplicons (hopefully all of the same amplicon). Next, control Ion Sphere Particles are added to the mix, along with sequencing primer, which is complementary to one of the adapter sequences added back in library preparation. The primer is annealed to each of the amplicons on every ISP. This mixture of control ISPs and specimen ISPs is then loaded onto the chip. The size of the chip is determined by the number of bases needing to be sequenced. There are three different chips for the Personal Genome Machine (PGM) – 314, 316, 318 – and five for the GeneStudio S5 system (510, 520, 530, 540, 550), offering enough coverage for a single sample of a hotspot panel all the way up to enough coverage for exome sequencing of a specimen. Each chip has a top layer covered in tiny wells, each just large enough to fit a single ISP. The ISP solution is loaded onto the chip, then flowed over it by centrifuging in different directions, in order to get as many ISPs into wells as possible. The chip is then ready for sequencing.

Each well of the chip can be thought of as the smallest pH meter in the world. So before sequencing can start, the instrument must be prepped (initialized) so that all of the reagents added to the chip are in the correct pH range. On the PGM, this takes approximately an hour and requires some hands-on steps and high-quality 18MΩ water. On the GeneStudio S5, the reagents are added, the initialization is begun and, as long as everything works correctly, no other hands-on time is required.

After the initialization is complete, the chip is loaded onto the instrument. The sequencing run is started and proceeds according to the plan prepared before the run. Thermo Fisher’s Ion Torrent uses semiconductor sequencing technology. Nucleotides are flowed over the chip one at a time. If a nucleotide is incorporated, a hydrogen ion is released. This release of hydrogen decreases the pH of the liquid surrounding the ISP. The pH change is detected by the sensing layer beneath the well, converted to a voltage change, and recorded by the software as that nucleotide. Let’s say two nucleotides in a row are incorporated (two G’s complementary to two C’s) – double the hydrogen is released, resulting in double the signal, so the software records two G’s in a row. The benefit of this type of technology is speed – each nucleotide flow takes only 15 seconds, so a 200bp fragment can be sequenced in less than 3 hours.

ion-torrent
Image courtesy of http://www.genomics.cn/en/
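Here is a toy sketch of the flow-space logic described above (my own simplification, not Torrent Suite; the flow order is assumed for illustration, and real instruments see noisy analog signals rather than clean integers):

```python
# Nucleotides are flowed in a fixed cyclic order; the signal at each flow is
# proportional to how many bases of that nucleotide were incorporated at once.
FLOW_ORDER = "TACG"  # assumed cyclic flow order for illustration

def call_bases(signals: list[int]) -> str:
    """Convert per-flow incorporation counts into a base sequence."""
    seq = []
    for i, count in enumerate(signals):
        base = FLOW_ORDER[i % len(FLOW_ORDER)]
        seq.append(base * count)  # a double signal -> a two-base homopolymer
    return "".join(seq)

# Flows: T=1, A=0, C=2 (homopolymer!), G=1, T=0, A=1
print(call_bases([1, 0, 2, 1, 0, 1]))  # -> TCCGA
```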

 

 


-Sharleen Rapp, BS, MB (ASCP)CM is a Molecular Diagnostics Coordinator in the Molecular Diagnostics Laboratory at Nebraska Medicine. 

Template Preparation, Clonal Amplification, and Cluster Generation – Oh My! – Step Two in an NGS Setup

Hello again – let’s continue our discussion of Next Generation, or Massively Parallel, Sequencing and how it is performed.  Over the last two blogs we have seen why NGS is being used in a Molecular Diagnostics Lab and how library preparation is executed.  Specifically, we reviewed how Ion Torrent and MiSeq libraries may be prepared for DNA amplicon sequencing.  The final product of this work is a collection of amplicons that have been amplified, barcoded, tagged with the appropriate platform adapters and purified.  These are what compose a specimen’s “library.”

The next step in NGS preparation is template preparation.  The main goal of this step is to create multiple copies of the same amplicon in close proximity so that when it is sequenced, it creates a strong enough signal to be detected.  This occurs for each amplicon in the specimen’s library.  Again, this technique is platform specific, so each has a different way to achieve this goal.

Ion Torrent “Template Preparation by Emulsion PCR” or “Clonal Amplification”

In the Ion Torrent method of template preparation, the multiple copies are created on an Ion Sphere Particle, or ISP.  This is essentially a bead with primers covering its surface.  Eventually this ISP will be deposited in a well on a chip and be sequenced.  For the ISP to create a signal strong enough to be detected by the instrument, it must carry many copies of the fragment all over its surface.

At the beginning of the clonal amplification step, a specific concentration of the combined libraries is added to the instrument, along with all the components of a standard PCR (buffer, dNTPs, polymerase), plus the Ion Sphere Particles, which provide the primer, and oil.  The primers on the ISP are complementary to one of the adapters added during library preparation, so that only the universal primer is necessary on the ISPs, instead of each individual gene-specific primer.  Through a series of steps, what is ideally produced is a droplet of oil containing one ISP, one sample amplicon, and the components of the master mix.  This droplet, along with millions of other ISPs in droplets of oil, undergoes cycles of PCR, with the primers on the ISP priming the specimen’s amplicon.  The amplicons replicate all over the ISP, and as a final step, NaOH is added to separate the strands.  The strands that are not anchored to the ISP by the universal primer are lost, leaving each ISP single-stranded and ready for priming in the sequencing step.

tempprep1

 

One thing to consider is the concentration of the combined libraries added at the beginning of template preparation.  If the concentration is too low, not enough amplicons will be amplified on the ISPs, and the end result will be too little data.  Conversely, if the concentration is too high, more than one sample amplicon may end up in a droplet of oil, and more than one fragment gets amplified on the ISP.  Such an ISP is called “polyclonal,” and its data will be thrown out.  Optimizing the concentration takes a few runs, and the optimal concentration can differ for each instrument in the lab.
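Droplet occupancy is commonly modeled with Poisson statistics; this hedged sketch (a textbook statistical model, not vendor software; the loading means are assumptions) shows how raising the library concentration inflates the polyclonal fraction:

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a droplet contains exactly k library molecules."""
    return math.exp(-mean) * mean**k / math.factorial(k)

for mean_per_droplet in (0.1, 0.5, 1.0):  # assumed average library molecules/droplet
    p0 = poisson_pmf(0, mean_per_droplet)   # empty droplets
    p1 = poisson_pmf(1, mean_per_droplet)   # ideal: exactly one molecule
    polyclonal = 1 - p0 - p1                # two or more molecules
    frac_of_templated = polyclonal / (1 - p0)  # among droplets with any template
    print(f"mean {mean_per_droplet}: "
          f"{frac_of_templated:.1%} of templated ISPs polyclonal")
# mean 0.1 -> ~5% polyclonal; mean 0.5 -> ~23%; mean 1.0 -> ~42%
```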

Illumina MiSeq “Cluster Generation by Bridge Amplification”

Illumina’s method of template preparation is termed cluster generation by bridge amplification and actually takes place on the MiSeq, one step before sequencing.  The multiple copies are created in close proximity to each other, just as with clonal amplification, but instead of using a separate ISP for each amplicon, a separate location on the flow cell is used.  A flow cell is essentially a glass slide with universal primers anchored all over it.  These universal primers are, again, complementary to the adapters added during library preparation.  The combined libraries are flowed over the slide at the beginning of the run and anneal to the universal primers.  Each fragment then folds over and anneals to the second universal primer, and the strand is replicated.  After replication, the strands are denatured, creating two single strands.  These then replicate again, producing a cluster of the same fragment in a localized area on the slide.  This occurs for each specimen’s amplicons all over the slide.  At the end of the cluster generation step, the reverse strands are all cleaved off, leaving only the single-stranded forward strands ready for sequencing.

tempprep2
(https://www.illumina.com/science/technology/next-generation-sequencing/sequencing-technology.html)

tempprep3
Top View of Flow Cell After Cluster Generation – Each color represents one amplicon of one specimen

Concentration is just as important in this setup as in the Ion Torrent setup.  If the concentration is too high, the clusters generated will be too close together on the flow cell, and the sequencing signals from neighboring clusters will overlap.  The data generated from these areas cannot be discerned, so it is thrown out.

Join me next quarter for the next installment – sequencing!

 


-Sharleen Rapp, BS, MB (ASCP)CM is a Molecular Diagnostics Coordinator in the Molecular Diagnostics Laboratory at Nebraska Medicine. 

Library Preparation – The First Step in an NGS Setup

Welcome back! Last quarter we discussed why Next Generation, or Massively Parallel, Sequencing is the next big thing in the world of Molecular Diagnostics. The sensitivity, the depth of coverage and the ability to interrogate many different areas of the genome at the same time were just a few of the benefits of these types of assays. Next, I would like to describe a couple different methods of library preparation, which is the first step necessary to run an NGS assay.

First of all, let’s define “library.” I find this is the most common question technologists new to this technology ask. Essentially, a library is a specimen’s collection of amplicons produced by the assay that have been barcoded, tagged with the appropriate platform adapters, and purified. These will serve as the input for the next part of the NGS workflow, clonal amplification (the topic of next quarter’s blog!).  How these libraries are prepared differs depending on platform (i.e., Ion Torrent vs. MiSeq), starting material (RNA vs. DNA), and type of assay (targeted amplicon vs. exome).

Before we begin the library prep discussion, a note about the input specimen. The DNA must be quantitated using a method more specific than spectrophotometry – one specific for double-stranded DNA. Spectrophotometry overestimates the amount of double-stranded DNA in the specimen, which leads to over-dilution and, consequently, a lower quantity of final library. Real-time PCR and fluorometry with a double-stranded DNA-specific kit are two examples of assays that will give accurate concentrations of double-stranded DNA.

Our lab has begun using NGS for some of our oncology assays, so I will focus on the two types we perform currently, but keep in mind, there are many other types of assays and platforms.

library1.png
Image 1: Ion Torrent amplicon library preparation. Source: Ion AmpliSeq™ Library Preparation User Guide – MAN0006735, Rev. 10 September 2012.

The assay we use for our Ion Torrent platform is a PCR amplicon-based assay. The first step is to amplify the 207 regions across 50 genes that contain hotspot areas for a number of different cancer types. This all occurs in one well for each specimen. Once those areas are amplified, the next step is to partially digest the primer sequences in order to prepare the ends of the amplicons for the adapters necessary for the sequencing step. As shown in the figure above, two different combinations of adapters may be used. The top one, listed as the A adapter (red) and the P1 adapter (green), would be used if only one specimen were to be sequenced on the run. The A and P1 adapters provide universal priming sites so that every amplicon of every sample can be primed with the same primers, rather than having to use gene-specific primers each time. The second possibility is listed below that, with the same P1 adapter (green) and a Barcode Adapter labeled X (red and blue) – it still contains the A adapter necessary for sequencing (red), but it also contains a short oligonucleotide sequence called a “barcode” (blue) that will be recognized during the analysis step. For example, barcode 101’s sequence is CTAAGGTAAC – this will be assigned to specimen 1 in the run, and all of the amplicons for that specimen will be tagged with this sequence. Specimen 2 will have the barcode 102 (TAAGGAGAAC) tag on all of its amplicons. During analysis, the barcodes are identified and all of the reads with the 101 sequence are binned together, as are all of the reads with the 102 sequence (see the sketch below). This allows many specimens to be run at the same time, increasing the efficiency of NGS even more. Lastly, the tagged amplicons are purified and normalized to the same concentration.
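As a simplified sketch of that binning step (the two barcode sequences are the ones quoted above; the exact-match lookup is my own simplification, since real pipelines tolerate sequencing errors in the barcode):

```python
BARCODES = {
    "CTAAGGTAAC": "specimen_1",  # barcode 101
    "TAAGGAGAAC": "specimen_2",  # barcode 102
}
BARCODE_LEN = 10

def demultiplex(reads):
    """Bin reads by their leading barcode, trimming the barcode off each read."""
    bins = {name: [] for name in BARCODES.values()}
    bins["unassigned"] = []
    for read in reads:
        specimen = BARCODES.get(read[:BARCODE_LEN], "unassigned")
        bins[specimen].append(read[BARCODE_LEN:])
    return bins

# Hypothetical reads: barcode followed by a made-up insert sequence.
reads = ["CTAAGGTAACGGAGCTGGTGGCGTAG", "TAAGGAGAACTTGGAGCTGGTGGCGT"]
print({name: len(binned) for name, binned in demultiplex(reads).items()})
# -> {'specimen_1': 1, 'specimen_2': 1, 'unassigned': 0}
```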

library2
Image 2: MiSeq amplicon library preparation. Image source: https://www.illumina.com/content/dam/illumina-marketing/documents/applications/ngs-library-prep/for-all-you-seq-dna.pdf

The assay we use for our MiSeq platform is a hybridization-based, PCR amplicon assay. The first step is to hybridize probes to 568 regions across 54 genes that contain hotspots for a number of different cancer types. This occurs in one well for each specimen. Once the probes have hybridized, the unbound probes are washed away using a size-selection filter plate. Next, the area between the probes is extended and ligated so that each of the 568 amplicons is created. These are then amplified in a PCR step using primers that are complementary to a universal priming site on the probes but also contain adapters plus the two indices required for paired-end sequencing (the Ion Torrent platform uses single-end sequencing – this will be discussed in the sequencing portion of an upcoming blog!). As in the previous method, after PCR these tagged amplicons are purified and normalized to the same concentration in preparation for the next step – clonal amplification.

Stay tuned for next quarter’s post – clonal amplification!

 


-Sharleen Rapp, BS, MB (ASCP)CM is a Molecular Diagnostics Coordinator in the Molecular Diagnostics Laboratory at Nebraska Medicine. 

Massively Parallel – the Next Generation of Sequencing

Sounds like a good title for a sci-fi novel, right?  What is the big deal about Next Generation Sequencing (NGS)?  Otherwise known as massively parallel sequencing or high throughput sequencing, NGS has become a technique used by many molecular labs to interrogate multiple areas of the genome in a short amount of time with high confidence in the results.  Throughout the next few blogs, we’ll discuss why NGS has become the next big thing in the world of molecular.  We’ll go through the steps of setting up the specimens to prepare them to be sequenced (library preparation), what types of platforms are available and what technologies they use to sequence.  Lastly, we’ll go through some of the challenges with this type of technology.

Let’s start with a review of dideoxy sequencing, otherwise known as Sanger sequencing, which has been the gold standard since its inception in 1977.  A typical setup in our lab begins with a standard PCR to amplify a region of the genome we are interested in, say PIK3CA exon 21, specifically amino acid 1047, a histidine (CAT).  The setup would include primers complementary to an area around exon 21, a 10x buffer, MgCl2, a deoxynucleotide mix (dNTPs), and Taq polymerase.  After amplification, the resulting products would be purified with exonuclease and shrimp alkaline phosphatase (SAP).  Next, another PCR would be set up using the purified products as the sample and a mix similar to the original amplification, but with the addition of a low concentration of fluorescently labeled dideoxynucleotides.  These bases have no 3'-OH group, so when one is incorporated into the product, extension ceases on that strand.  Because they are present at a lower concentration, their incorporation is random and will eventually occur at each base in the strand.  The resulting products are then run and analyzed on a capillary electrophoresis instrument that detects the fluorescent label on the dideoxynucleotide at the end of each fragment.  Shown below is an example of the data output:

NGS1

The bases are shown as peaks as they are read across the laser.  The base in question, in the middle of the picture, is an adenine (A) in the “normal” sequence, as seen in green.  In this case, there is also a thymine (T) detected at that same location, as seen in red.  This indicates that some of the DNA in this tumor sample has mutated from an A to a T at this location.  This causes a change from a histidine to a leucine (p.His1047Leu) and is a common mutation in colorectal cancers.

So all of this looks great, right?  Why do we need another method when this one has been used for so long and works so well?  There are a few reasons:

  1. The limit of detection of dideoxy sequencing is only about 20% variant allele frequency.  This means lower-level mutations could be missed.  NGS can get down to 5% or even lower in some instances.
  2. The above picture shows the sequencing in the forward direction as well as the reverse direction.  This area then has 2x coverage – we can see the mutation in both reads.  If we could get higher coverage of this area – sequencing it multiple times and seeing that data – we could feel more confident that this mutation is real.  In our lab, we require that each area have 500x coverage so that we feel sure we have not missed anything.  The picture below displays the same sequenced area as in the dideoxy sequencing above.  This is a typical readout from an NGS assay and, as you can see, this base has a total of 4,192 reads, so it has been sequenced over four thousand times.  In 1,195 of those reads, a T was detected, not an A.  We can feel very confident in these results because of how many times the area was covered (a quick calculation of the variant allele fraction follows the image below).
  3. The steps above detailed amplifying only this one area, but with colorectal cancer specimens, we want to know the status of the KRAS, BRAF, NRAS, and HRAS genes as well as other exons in PIK3CA.  Covering all of these by dideoxy sequencing takes a lot of time and effort.  NGS can cover these areas in these five genes as well as multiple other areas (our assay looks at 207 areas total) all in the same workflow.

NGS2
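Here is the quick variant allele fraction arithmetic from that readout (read counts taken from the example above; the calculation itself is my own restatement):

```python
total_reads = 4192
t_reads = 1195  # reads supporting the T variant at this position

vaf = t_reads / total_reads
print(f"VAF: {vaf:.1%}")  # ~28.5%, well above a 5% limit of detection
```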

Join me for the next installment to discover the first steps in NGS workflow!

 


-Sharleen Rapp, BS, MB (ASCP)CM is a Molecular Diagnostics Coordinator in the Molecular Diagnostics Laboratory at Nebraska Medicine. 

Association for Molecular Pathology – A Bunch of Party Loving Pathologists…

I was privileged to attend this year’s Association for Molecular Pathology (AMP) meeting in Charlotte, North Carolina, at the beginning of November. I really enjoy this meeting – it is relevant to everything our lab does, with sessions offered on topics in Hematopathology, Infectious Diseases, Solid Tumors, Inherited Diseases, and, just recently added, Bioinformatics.

It is exciting to meet and discuss with others in this field, especially other laboratory technologists. AMP has done a wonderful job of including those of us who perform the bench work, offering discounted memberships, as well as learning opportunities on their website, and even an award especially for technologists’ exemplary posters/abstracts presented at the annual meeting.

This year’s meeting offered the previously mentioned topics, but an emerging trend was evident – testing cell-free DNA (cfDNA) obtained from sources other than tissue biopsies, such as plasma or urine. This quarter’s post will deal with the reason behind this and the technology for testing such specimens, specifically plasma.

Cell-free DNA has become an attractive source for tumor testing recently. This source can be tested when a tissue biopsy is just not possible, such as when a patient has progressed to the point that surgery is not recommended. Here is the biology behind why this can work as a source of tumor DNA:

1-17-fig1

Figure 1. http://www.intechopen.com/books/methylation-from-dna-rna-and-histones-to-diseases-and-treatment/circulating-methylated-dna-as-biomarkers-for-cancer-detection

The sources of DNA in a sample of whole blood (as shown in Figure 1) are:

  • white blood cells
  • degraded white blood cells (cfDNA)
  • degraded tumor cells (cfDNA)
  • circulating tumor cells (CTCs).

Because of the biology of tumor cells, they have a higher turnover than most other cells in the body. As a result, tumor cells contribute a disproportionately large fraction of the cfDNA in the plasma. We can take advantage of this with a so-called “liquid biopsy” – with 10 cc’s of whole blood, we can attempt to capture about 10ng of cfDNA and test it for possible resistance mutations to the therapies the patient may be on.

Many of the posters and several of the sessions at the AMP meeting dealt with cfDNA. Several pre-analytical steps were stressed in order to have success with this type of specimen.

  1. The whole blood needs to be collected, as any other blood specimen should be, with care taken not to lyse any of the cells during collection.
  2. The collection tube type varies depending on how much time it will take to centrifuge the specimen to obtain the plasma. If it can be spun within two to four hours, a simple EDTA tube is sufficient. If it cannot be spun within a short time, then another tube with special preservatives is required. A Streck tube has been the tube of choice in these situations, but others are becoming available on the market as the demand increases. These specific tubes offer a greater amount of time to capture the cfDNA without white blood cell lysis becoming an issue. This is important, because as the white blood cells lyse, the plasma is flooded with the patient’s normal cfDNA that will dilute out the tumor cfDNA fraction, making it even more difficult to detect.
  3. Centrifugation procedures must be altered. The brake should not be applied when stopping the centrifuge because braking can cause the white blood cells to be sheared, which will, again, flood the plasma sample with normal cfDNA. An initial spin should be performed to obtain the plasma, then an additional spin should be performed before extraction of the DNA.

There are multiple kits available on the market for extraction of cfDNA. Once the DNA is extracted, it is suggested to measure the cfDNA fraction with a method that displays the size of the fragments, such as a Bioanalyzer. Cell-free DNA is about 160-170bp in size and, with the readout from an instrument such as the Bioanalyzer, one can see the size of the DNA, quantitate it, and observe any contamination from genomic DNA (shown by a peak much larger than 170bp in size).

Many types of testing are being performed on this cfDNA fraction, such as real-time PCR, digital droplet PCR, and next generation sequencing. Whichever platform is used, a validation must be performed to ensure a very low limit of detection (as low as 0.1% or even 0.01%) because, many times, the tumor cfDNA allele fraction will be very low due to dilution by the normal cfDNA in the plasma.
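The input arithmetic shows why such a low limit of detection is needed (a rough sketch; 3.3 pg per haploid genome is a standard approximation, and the 0.1% tumor fraction is illustrative):

```python
PG_PER_HAPLOID_GENOME = 3.3  # approximate mass of one haploid human genome (pg)

input_ng = 10.0              # cfDNA captured from ~10 cc of whole blood (from the post)
tumor_fraction = 0.001       # assumed 0.1% tumor allele fraction

genome_equivalents = input_ng * 1000 / PG_PER_HAPLOID_GENOME
mutant_copies = genome_equivalents * tumor_fraction
print(f"~{genome_equivalents:,.0f} genome equivalents; "
      f"only ~{mutant_copies:.0f} mutant copies to detect")
# ~3,030 genome equivalents -> only ~3 mutant copies, which is why assays must
# be validated at these tiny fractions and why white cell lysis (which adds
# normal cfDNA) is so damaging.
```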

This method of testing non-invasive specimens is an amazing way to help spare possibly very sick patients from having to undergo a risky surgery. It is yet another new technique in the ever-changing world of Molecular Diagnostics!

 


-Sharleen Rapp, BS, MB (ASCP)CM is a Molecular Diagnostics Coordinator in the Molecular Diagnostics Laboratory at Nebraska Medicine.