Next Generation Sequencing: Types of Variants

We have reviewed from start to finish the next generation sequencing wet bench process, data review and troubleshooting.  I’d like to take a more in-depth look at the types of variants that can be detected by the targeted amplicon NGS panels that our lab performs:  single nucleotide variants, multi-allelic variants, multi-nucleotide variants, insertions (including duplications), deletions and complex indels.  In our lab, we review every significant variant and variant of unknown significance in IGV to confirm the call is made correctly in the variant caller due to the difficult nature of some of these variants.  I have included screenshots of the IGV windows of each of these types of variants, to show what we see when we review.

Single Nucleotide Variants (SNV)

The most common (and straightforward) type of variant is a single nucleotide variant – one base pair is changed to another, such as KRAS c.35G>A, p.G12D (shown below in reverse):

Multi-allelic Variants

A multi-allelic variant has more than one change at a single base pair (see below – NRAS c.35G>A, p.G12D, and c.35G>C, p.G12A – shown in reverse).  This may be the rarest type of variant – in our lab, we have seen it in only a handful of cases over the last four years.  This could be an indication of several clones, or of different variants arising over a period of time.

Multi-nucleotide Variants (MNV)

Multi-nucleotide variants change more than one nucleotide at a time, and the changed nucleotides are adjacent.  A common example is BRAF p.V600K (see below – in reverse), which can occur in melanoma.  Two adjacent nucleotides are changed in the same allele.  These variants demonstrate one advantage NGS has over dideoxy (Sanger) sequencing.  In dideoxy sequencing, we can see the two base pair change, but we cannot be certain the changes are occurring on the same allele.  This is an important distinction because if they occurred on the same allele, they probably occurred at the same time, whereas if they are on different alleles, they were probably two separate events.  It is important to know for nomenclature as well – if they are on the same allele, the change is listed as one event, as shown below (c.1798_1799delGTinsAA, p.V600K), as opposed to two separate mutations (c.1798G>A, p.V600M and c.1799T>A, p.V600E).  As you can see in the IGV window below, both changes appear together in the same reads, which means they are on the same allele.
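To see why the same-allele question matters for the protein change, here is a minimal Python sketch that applies the nucleotide changes named above to BRAF codon 600. It assumes the reference codon is GTG spanning c.1798 to c.1800, and it uses a deliberately partial codon table; the helper name `apply_changes` is hypothetical and not part of any variant caller.

```python
# Partial codon table: only the codons needed for this illustration.
CODON_TO_AA = {"GTG": "V", "AAG": "K", "ATG": "M", "GAG": "E"}

REF_CODON = "GTG"    # BRAF codon 600 (assumed to span c.1798-c.1800)
CODON_START = 1798   # coding position of the first base of the codon


def apply_changes(codon, changes, start=CODON_START):
    """Apply {coding_position: new_base} substitutions to a single codon."""
    bases = list(codon)
    for position, new_base in changes.items():
        bases[position - start] = new_base
    return "".join(bases)


# Both changes on the same allele: one event, c.1798_1799delGTinsAA
same_allele = apply_changes(REF_CODON, {1798: "A", 1799: "A"})

# Changes on different alleles: two separate events
allele_one = apply_changes(REF_CODON, {1798: "A"})  # c.1798G>A
allele_two = apply_changes(REF_CODON, {1799: "A"})  # c.1799T>A

for label, codon in [("same allele", same_allele),
                     ("c.1798G>A alone", allele_one),
                     ("c.1799T>A alone", allele_two)]:
    print(f"{label}: {codon} -> p.V600{CODON_TO_AA[codon]}")
```

Run as written, the same-allele case gives AAG (p.V600K), while the two single changes give ATG (p.V600M) and GAG (p.V600E), matching the nomenclature in the paragraph above.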

Insertions/Duplications

Insertions are an addition of nucleotides to the original sequence.  Duplications are a specific type of insertion where a region of the gene is copied and inserted right after the original copy.  These can be in-frame or frameshift.  If the number of inserted base pairs is a multiple of three, the insertion will move the original sequence down, but the amino acids downstream will not be affected, so the frame stays the same.  If it is not a multiple of three, the frame will be changed, causing all of the downstream amino acids to be changed, so it causes a frameshift.  A common example of a frameshift insertion is the 4bp insertion in NPM1 (c.863_864insCTTG, p.W288fs) that occurs in AML.  In IGV, these are displayed by a purple hash that will show the sequence when you hover over it.
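The in-frame versus frameshift rule comes down to whether the net change in length is a multiple of three. A small Python sketch of that check follows; the function name `frame_effect` and its return labels are illustrative choices, not output from any particular variant caller.

```python
def frame_effect(inserted_bases: int = 0, deleted_bases: int = 0) -> str:
    """Classify a length change in coding sequence by its effect on the reading frame."""
    net_change = inserted_bases - deleted_bases
    if net_change == 0:
        return "no length change"
    # A net change that is a multiple of three keeps the downstream frame intact.
    return "in-frame" if net_change % 3 == 0 else "frameshift"


print(frame_effect(inserted_bases=4))   # the 4bp NPM1 insertion: frameshift
print(frame_effect(inserted_bases=3))   # a hypothetical 3bp insertion: in-frame
print(frame_effect(deleted_bases=6))    # a hypothetical 6bp deletion: in-frame
```

The same function covers changes that delete and insert bases at once, since only the net length change decides the frame.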

Deletions

Deletions, on the other hand, remove base pairs from the sequence.  These can be in-frame or frameshift, as well.  An example is the 52bp deletion (c.1099_1150del, p.L367fs) found in the CALR gene in cases of primary myelofibrosis or essential thrombocythemia.

Complex Indels

Lastly, NGS can detect complex indels.  These, again, are a type of variant that we could not distinguish for sure using dideoxy sequencing.  We would be able to detect the changes, but not whether they were on the same strand, which is what indicates that the changes occurred at the same time.  The first example is a deletion followed by a single nucleotide change – since these both occur on the same strand, they most likely occurred together, so they are called one complex deletion/insertion event (KIT c.1253_1256delACGAinsC, p.Y418_D419delinsS).  First the ACGA was deleted, then a C was inserted.

The last example involves multiple nucleotide changes all in the same vicinity (IGV is in reverse for this specimen as well).  Using HGVS nomenclature as in all the previous examples, this would be named RUNX1 c.327_332delCAAGACinsTGGGGT, p.K110_T111delinsGV.


-Sharleen Rapp, BS, MB (ASCP)CM is a Molecular Diagnostics Coordinator in the Molecular Diagnostics Laboratory at Nebraska Medicine. 

Genetic Results: Set in Stone or Written in Sand?

This month, I’m switching gears to another interest of mine: Molecular Pathology. I am currently in fellowship for Molecular Genetic Pathology, which exposes me to unique, thought-provoking cases.

Advances in genomic sequencing have allowed multiple genes to be analyzed in a single laboratory test. These so-called gene panels have increased diagnostic yield when compared to serial gene sequencing in syndromic and non-syndromic diseases with multiple genetic etiologies. However, interpretation of genetic information is complicated and evolving. This has led to wide variation in how results are reported. A genetic test result can be positive (pathogenic or likely pathogenic), negative (benign or likely benign) or uncertain (variant of uncertain significance, or VUS). A VUS may just be part of what makes each individual unique; there is not enough evidence to say whether it is pathogenic or benign. Many results come back like this, which can be frustrating for patients to hear and for genetic counselors and clinicians to explain.

Initial approaches to excluding benign variants, which sequenced 100 “normal people” to determine the frequency of common variants in the population, were fraught with bias. The “normal population” initially was constructed mostly of individuals of white European descent. Not surprisingly, lack of genetic diversity in control populations led to errors in interpretation.

Fortunately, there are now several publicly available databases that help determine whether gene variants are damaging. The first important piece comes from population sequencing efforts. These projects performed whole exome sequencing of hundreds or thousands of individuals to find variants that might be rare in a more genetically diverse population. If a variant occurs in a normal, healthy population at a frequency >1%, then it likely doesn’t cause a severe congenital disease, because such a disease would in turn prevent that genetic variant from being passed on.

The Exome Aggregation Consortium (ExAC)1, which has been rolled into the larger gnomAD (Genome Aggregation Database), now contains sequencing information on 120,000 individuals (Figure 1). The smaller ESP (Exome Sequencing Project) was a project of the NHLBI at the NIH and sequenced patients with different cardiovascular and pulmonary diseases.

Figure 1. Number and percent of various ethnicities present in 4 major population sequencing projects.

While there is ethnic diversity present in this database, the 1000 Genomes Project2 furthered efforts by searching all over the world to get genetic information from 26 ethnically and geographically distinct sub-populations (Figure 2).

Figure 2. Geographic map of populations sequenced by the 1000 Genomes Project.

With use of these databases, we can effectively classify rare polymorphisms as benign when they are present in several healthy individuals, and especially when they are present in the homozygous state in a healthy individual. Before, it was common for a person of an ethnic minority to have variants not seen in predominantly European cohorts. In many cases, this led to uncertain test results.

One way to deal with these VUSs is for a lab to periodically review its test results in light of new knowledge. The CAP has a checklist3 item that requires a lab to have a policy about reassessing variants and actions taken. However, this item doesn’t require a lab to communicate the results to a physician and doesn’t specify how often to reanalyze variants. Before last year, there weren’t even any studies that indicated how often variant reanalysis should occur. Variant reanalysis had only been studied in the limited context of whole exome sequencing for rare diseases to improve the diagnostic yield4. However, this did not address how often frequent VUSs were downgraded to benign or upgraded to pathogenic.

One example of how reclassification can occur is illustrated in the case of a young African American boy with epilepsy who, in 2014, received a genomic test covering a panel of genes known to be involved in epilepsy. Two heterozygous VUSs were reported back for EFHC1 (EFHC1 c.229C>A, p.P77T and EFHC1 c.662G>A, p.R221H), a gene that causes an autosomal dominant epilepsy syndrome when one allele is damaged. However, this variant could later be reclassified as benign by looking at population databases. The ExAC database showed an allele frequency of 2.5% in African Americans and the 1000 Genomes database showed an 8.8% frequency in the GWD subpopulation (Gambian Western Divisions).
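The population-frequency reasoning in this case can be written as a simple threshold check. The Python sketch below uses the >1% rule of thumb mentioned earlier and the two frequencies quoted above; the function name and dictionary layout are hypothetical, and real variant classification weighs more evidence than allele frequency alone.

```python
COMMON_VARIANT_THRESHOLD = 0.01  # the >1% population frequency rule of thumb


def populations_above_threshold(allele_frequencies, threshold=COMMON_VARIANT_THRESHOLD):
    """Return the populations in which the variant is more common than the threshold."""
    return {
        population: frequency
        for population, frequency in allele_frequencies.items()
        if frequency > threshold
    }


# Allele frequencies quoted in the case above
observed = {
    "ExAC African American": 0.025,
    "1000 Genomes GWD": 0.088,
}

common_in = populations_above_threshold(observed)
if common_in:
    print("Frequency argues against a severe dominant disease variant:")
    for population, frequency in common_in.items():
        print(f"  {population}: {frequency:.1%}")
```

Checking each sub-population separately, rather than one pooled frequency, is the point of the example: a variant can look rare in a predominantly European cohort and still be common in the patient's own ancestry group.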

This case demonstrates the importance of reanalyzing genetic test results as medical knowledge continues to evolve. Recent studies looking at reclassification rates in epilepsy5 and inherited cancer syndromes6 have been published in JAMA journals and demonstrate that reclassification of variants is common. It is thus important for laboratories to periodically review previously reported variants to provide optimal quality results and patient care. I will elaborate on this further in the next blog post.

References:

  1. Lek M, Karczewski KJ, Minikel EV, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285-291.
  2. The 1000 Genomes Project Consortium, Auton A, Brooks LD, et al. A global reference for human genetic variation. Nature. 2015;526:68-74.
  3. Sequence Variants – Interpretation and Reporting, MOL.36155. 2015 College of American Pathologists (CAP) Laboratory Accreditation Program Checklist.
  4. Costain G, Jobling R, Walker S. Periodic reanalysis of whole-genome sequencing data enhances the diagnostic advantage over standard clinical genetic testing. Eur J Hum Genet. 2018.
  5. SoRelle JA, Thodeson DM, Arnold S, Gotway G, Park JY. Clinical Utility of Reinterpreting Previously Reported Genomic Epilepsy Test Results for Pediatric Patients. JAMA Pediatr. 2018 Nov 5:e182302.  
  6. Mersch J, Brown N, Pirzadeh-Miller, Mundt E, Cox HC, Brown K, Aston M, Esterling L, Manley S, Ross T. Prevalence of Variant Reclassification Following Hereditary Cancer Genetic Testing. JAMA. 2018 Sep 25;320(12):1266-1274.

-Jeff SoRelle, MD is a Molecular Genetic Pathology fellow at the University of Texas Southwestern Medical Center in Dallas, TX. His clinical research interests include understanding how the lab intersects with transgender healthcare and advancing quality in molecular diagnostics.

This work was produced with the guidance and support of:

Dr. Jason Park, MD, PhD, Associate Professor of Pathology, UT Southwestern Medical Center

Dr. Drew Thodeson, MD, Child Neurologist and Pediatric Epileptologist

Evaluating and Analyzing Next Generation Sequencing Specimen Results

Welcome back – in my previous blog we discussed how a run is evaluated on the Ion Torrent instrument. This quarter’s blog will review the individual specimen results from that run.

First off, we take a look at how many reads per specimen have been sequenced and how those reads performed over the areas that are targeted. For the AmpliSeq Cancer Hotspot Panel v2 that we run, there are a total of 207 amplicons that are created and sequenced. To assess the depth of coverage over these amplicons, we need to think about the biology of the tumor cells and the limit of detection of the assay. We feel confident that we can detect 5% variant allele frequency for single nucleotide changes, and 10% variant allele frequency for insertions or deletions. In order to be confident that we are not missing variants, we require that the specimen have a tumor percentage greater than 20%. This is because, for a given tumor, it can be assumed that if it is mutated, it will be only heterozygous – only one of the two alleles will have the variant. This automatically halves the possible allele frequencies from any given tissue. If a colon specimen that we are given to test has a tumor percentage of 40%, it can be assumed that any variant will have a variant allele frequency of no more than 20%. Because of this, we also require the amplicons that are sequenced to have at least 500x coverage – they need to be sequenced at least 500 times so that if we have a 5% mutation, we will see it in 25 of the reads and we can feel confident this is an actual change, as opposed to background noise.

Next, we look at the On Target percentage and Uniformity percentage (over 95% for each is expected). The On Target value tells us what fraction of the reads actually map to the 207 amplicons that are in the panel. Uniformity informs us of how even the number of reads is over all the 207 amplicons – were they all equally represented or was there a subset that had more coverage than the others? This information can actually lead us to further testing – if there is a subset of amplicons that have more coverage than the rest, and if they are all from one gene, this may indicate gene amplification. In these cases, the clinician is alerted and additional testing can confirm the amplification.

All of this coverage information is provided by one of the “plugins” we run after the basecalling and alignment are finished:

The most useful (and interesting!) information is gathered from the variant calling plugin. This plugin compares the specimen sequences with the reference sequences and reports the differences – the “variants”. Many of the variants that are detected are single nucleotide polymorphisms (variants that are detected in greater than 1% of the population). They could also be known artifacts of the sequencing itself. These are all analyzed and categorized in the validation of the assay and then can be filtered out when analyzing clinical data. After filtering out the known SNPs and artifacts, the somatic changes can then be evaluated. Generally, the panel will detect 15-20 variants, but after filtering only 1-4 variants will be somatic changes. Each change that is detected is reviewed using a program called IGV, shown below. We compare the sequence to confirm that what the plugin is reporting looks correct in the actual reads from the sequencer. See screenshots below of a subset of variants called, then filtered, and analyzed in IGV. While the plugin is exceptionally good at variant calling, no program is perfect and visualizing the data is still necessary to confirm there is not anything else going on in the area that is sequenced. The fastq file from the run is also run through a second software package to compare results. The variants for each specimen are assessed for variant allele frequency, coverage and quality in both packages.

VariantCaller Output

Filtered Calls: White cells mean SNP; blue cells mean possible somatic call

IGV Output for KRAS and STK11 calls:


Lastly, the results are brought into yet another software to be reported. This software will allow the pathologists to assign significance to the variants. It will also pull in any treatment information linked to the variants and then allow the pathologist to pick any applicable clinical trials in order to assist the clinician as much as possible. In future blogs we will take a look at cases like this to see interesting findings of oncology cases.


-Sharleen Rapp, BS, MB (ASCP)CM is a Molecular Diagnostics Coordinator in the Molecular Diagnostics Laboratory at Nebraska Medicine. 

Data Analysis for NGS by Ion Torrent – Part One – How Did the Run Perform?

Here comes the fun part.  It’s taken a day for library prep, an overnight run for the clonal amplification; the next day includes loading the chip with the ISPs and then running the chip on the sequencer.  After the chip has run on the sequencer, the data is pushed from the sequencer (the PGM) to the server connected to the sequencer.  This aspect of NGS surprised me – the size of the files is amazing – for one 316 chip, the file that includes all of the raw data averages about 100GB.  To deal with this amount of data, the server attached to the sequencer is 12TB, and even still we have to have a procedure to deal with removing files off that sequencer to keep space for future runs.

Anyway, the raw data is pushed to the server and the data analysis begins.  The Torrent Suite Software first analyzes the ISP info, as shown in the graphic below.  It gives a “heat map” of the chip (the football shape) in which red means the wells in those areas were full of ISPs.  Yellow means there are fewer ISPs and blue means there are none.  So, you can see below, there is a small area of blue within the football shape – this area did not have any ISPs in it.  92% of the wells on this chip were filled, however, which is about the maximum to which a chip can be loaded.

dataana1

These ISPs are then broken down into categories.  First, how many of the wells had ISPs in them – here, 92.5% of the 6,337,389 wells contained ISPs.  Of those ISPs, 99.8% of them have product on them that can be sequenced (Live ISPs).  Of those Live ISPs, 0.4% of them contain control Test Fragments and 99.6% of them contain actual patient sample library amplicons.  The Test Fragments are spiked in prior to sequencing and act as a control to evaluate how the sequencing run performed.  Lastly, the ISPs that contain patient sample library amplicons are analyzed.  Those ISPs that contain more than one amplicon (say it has an amplicon of EGFR Exon 19 and another specimen’s amplicon of KRAS Exon 2) give mixed signals and cannot be analyzed, so they are thrown out of the data analysis and into a bin called “polyclonal”.  Low quality ISPs are also thrown out – anything that did not pass the thresholds for quality.  And lastly, ISPs that only contain adapter dimers are thrown out.  For a run of AmpliSeq Cancer Hotspot Panel v2 specimens, most of which come from FFPE specimens that are low quality to start with, a run that contains over 50% Final Library ISPs is actually a very good run, interestingly enough.  The 316v2 chips are rated to sequence 1 million reads (each ISP yields one read), and on this example run, over 3 million reads were sequenced, so this is a successful run.
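The breakdown above is a chain of percentages, each applied to the previous category. The Python sketch below walks through it with the numbers from this example run. The function name `isp_funnel` is hypothetical, counts are rounded to whole ISPs for display, and the final filtering step (polyclonal, low quality, adapter dimer) is left out because the run's exact percentages for those bins are not given here.

```python
def isp_funnel(total_wells, loading, live, library):
    """Apply each successive fraction to the previous category and round to whole ISPs."""
    loaded = total_wells * loading    # wells that contain an ISP
    live_isps = loaded * live         # ISPs carrying product that can be sequenced
    library_isps = live_isps * library  # live ISPs carrying patient library, not Test Fragments
    return {
        "loaded": round(loaded),
        "live": round(live_isps),
        "library": round(library_isps),
    }


# Percentages from the example run described above
counts = isp_funnel(total_wells=6_337_389, loading=0.925, live=0.998, library=0.996)
for category, count in counts.items():
    print(f"{category:>8}: {count:,}")

# Share of library ISPs that would need to survive filtering to give 3 million reads
needed = 3_000_000 / counts["library"]
print(f"Final Library share needed for 3 million reads: {needed:.1%}")
```

The last line shows why a run with just over 50% Final Library ISPs can still deliver more than 3 million reads on this chip.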

After the ISPs are analyzed and the high quality ones are kept, the analysis goes on.  The Torrent Suite software then calls the bases based on the raw flow data.  These bases are then aligned to a reference, in our case hg19, a commonly used human genome reference.  Quality scores are assigned at this point.  A Phred-based quality score is used for NGS, shown in the table below.
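For readers who have not worked with Phred scores before, the standard definition is Q = -10 × log10(P), where P is the probability that a base call is wrong. A minimal Python sketch of the conversion in both directions follows; the function names are illustrative.

```python
import math


def phred_to_error_probability(quality: float) -> float:
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-quality / 10)


def error_probability_to_phred(probability: float) -> float:
    """Convert a base-call error probability to a Phred quality score."""
    return -10 * math.log10(probability)


for quality in (10, 20, 30):
    p_error = phred_to_error_probability(quality)
    print(f"Q{quality}: 1 error in {round(1 / p_error):,} bases "
          f"({1 - p_error:.1%} accuracy)")
```

Each step of 10 in the score is a tenfold drop in the expected error rate, so Q20 corresponds to 1 error in 100 bases and Q30 to 1 error in 1,000.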

dataana2.png

dataana3

dataana4

Lastly, the reads are put into bins based on the barcode that was used for each patient specimen – remember the small part of the adapter that was added in library prep so that the specimens could be mixed together?  The software reads that adapter sequence then assigns each read based on those sequences.  The specimens should all have approximately the same number of reads since they were normalized to the same concentration at the end of library prep, but there may be some variability due to specimen quality, as you can see below.

dataana5

In next quarter’s post, we will dive into the individual specimen results!

 


-Sharleen Rapp, BS, MB (ASCP)CM is a Molecular Diagnostics Coordinator in the Molecular Diagnostics Laboratory at Nebraska Medicine. 

Next Generation Sequencing – Ion Torrent Semiconductor Sequencing

We’ve finally made it to the sequencing step of the NGS workflow. In this post we will discuss the technology and process behind the Ion Torrent sequencing step. Next time, we will review the Illumina sequencing process.

When we left off, the final product of the clonal amplification had been prepared – Ion Sphere Particles (ISPs) covered in single stranded amplicons (hopefully all of the same amplicon). Next, control Ion Sphere Particles are added to the mix, along with sequencing primer, which is complementary to one of the adapter sequences added back in library preparation. The primer is annealed to each of the amplicons on every ISP. This mixture of control ISPs and specimen ISPs is then loaded onto the chip. The size of the chip is determined by the number of bases needing to be sequenced. There are three different types of chips for the Personal Genome Machine (PGM) – 314, 316, 318 – and five different types for the GeneStudio S5 system (510, 520, 530, 540, 550), offering enough coverage for a single sample of a hotspot panel, all the way up to enough coverage for a specimen of exome sequencing. Each of the chips contains a top layer covered in tiny wells. Each well is just large enough to fit a single ISP. The ISP solution is loaded onto the chip, then flowed over it by centrifuging it in different directions, in order to get as many ISPs into wells as possible. The chip is then ready for sequencing.

Each well of the chip can be thought of as the smallest pH meter in the world. So before sequencing can be started, the instrument must be prepped (initialized) so that all of the reagents added to the chip are in the correct pH range. On the PGM, this takes approximately an hour and requires some hands-on steps and high quality 18MΩ water. On the GeneStudio S5, the reagents are added and the initialization is begun and, as long as everything works correctly, no other hands-on time is required.

After the initialization is complete, the chip is loaded onto the instrument. The sequencing run is started and runs according to the plan prepared before the run. Thermo Fisher’s Ion Torrent uses semiconductor sequencing technology. Nucleotides are flowed over the chip one at a time. If the nucleotide is incorporated, a hydrogen ion is released. This release of hydrogen decreases the pH of the liquid surrounding the ISP. This pH change is then detected by the sensing layer beneath the well, where it is converted to a voltage change and is picked up by the software and recorded as that nucleotide. Let’s say two nucleotides in a row are incorporated (two G’s complementary to two C’s) – double the hydrogen is released, which results in double the signal, so the software will record two G’s in a row. The benefit of this type of technology is that it is fast – it only takes 15 seconds for each nucleotide flow, so a 200bp fragment can be sequenced in less than 3 hours.
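The flow-by-flow logic can be sketched in a few lines of Python. This is a toy model only: it assumes a simple repeating T, A, C, G flow order (the instrument's actual flow order may differ), ideal noise-free signals, and a signal exactly proportional to the number of bases incorporated. The function names are hypothetical.

```python
def simulate_flows(read_sequence, flow_order="TACG", max_flows=1000):
    """Return (nucleotide, signal) per flow for the strand being synthesized.

    The signal is the number of bases incorporated in that flow: 0 if the
    flowed nucleotide is not next, 2 for two identical bases in a row, etc.
    """
    signals = []
    position = 0
    flow_index = 0
    while position < len(read_sequence) and flow_index < max_flows:
        nucleotide = flow_order[flow_index % len(flow_order)]
        incorporated = 0
        # Keep incorporating while the next base matches the flowed nucleotide.
        while position < len(read_sequence) and read_sequence[position] == nucleotide:
            incorporated += 1
            position += 1
        signals.append((nucleotide, incorporated))
        flow_index += 1
    return signals


def call_bases(signals):
    """Turn per-flow signals back into a base sequence."""
    return "".join(nucleotide * count for nucleotide, count in signals)


def run_time_hours(number_of_flows, seconds_per_flow=15):
    """Sequencing time for a given number of flows at 15 seconds per flow."""
    return number_of_flows * seconds_per_flow / 3600


flows = simulate_flows("TGGA")
print(flows)              # the G flow carries a signal of 2
print(call_bases(flows))  # TGGA
print(run_time_hours(720))  # 720 flows at 15 seconds each is 3.0 hours
```

Note that most flows incorporate nothing, which is why the number of flows is larger than the number of bases read.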

ion-torrent
Image courtesy of http://www.genomics.cn/en/

 

 


-Sharleen Rapp, BS, MB (ASCP)CM is a Molecular Diagnostics Coordinator in the Molecular Diagnostics Laboratory at Nebraska Medicine. 

Template Preparation, Clonal Amplification, and Cluster Generation – Oh My! – Step Two in an NGS Setup

Hello again – let’s continue our discussion of Next Generation, or Massively Parallel, Sequencing and how it is performed.  Over the last two blogs we have seen why NGS is being used in a Molecular Diagnostics Lab and how library preparation is executed.  Specifically, we reviewed how Ion Torrent and MiSeq libraries may be prepared for DNA amplicon sequencing.  The final product of this work is a collection of amplicons that have been amplified, barcoded, tagged with the appropriate platform adapters and purified.  These are what compose a specimen’s “library.”

The next step in NGS preparation is template preparation.  The main goal of this step is to create multiple copies of the same amplicon in close proximity so that when it is sequenced, it creates a strong enough signal to be detected.  This occurs for each amplicon in the specimen’s library.  Again, this technique is platform specific, so each has a different way to achieve this goal.

Ion Torrent “Template Preparation by Emulsion PCR” or “Clonal Amplification”

In the Ion Torrent method of template preparation, the multiple copies are created on an Ion Sphere Particle or ISP.  This looks like a bead with primers all over the surface of it.  Eventually this ISP will be deposited in a well on a chip and be sequenced.  In order for this ISP to create enough of a signal to be detected by the instrument, it must have many copies of the fragment all over the surface of the ISP.

At the beginning of the clonal amplification step, a specific concentration of combined libraries is added to the instrument, along with all the components of a standard PCR (buffer, dNTPs, polymerase) with the addition of the Ion Sphere Particles, which provide the primer, and oil.  The primers on the ISP are complementary to one of the adapters added during library preparation so that only the universal primer is necessary on the ISPs, instead of each individual gene-specific primer.  Through a series of steps, ideally, what is produced is a droplet of oil containing one ISP, one sample’s amplicon, and the components of the master mix.  This, along with millions of other ISPs in droplets of oil, will undergo cycles of PCR, with the primers on the ISP priming the specimen’s amplicon.  These amplicons will replicate all over the ISP, and as a final step, NaOH will be added to separate the strands.  The strands that are not anchored to the ISP by the universal primer will be lost, leaving each ISP single stranded and ready for priming in the sequencing step.

tempprep1

 

One thing to consider is the concentration of the combined libraries that are added at the beginning of the template preparation.  If the concentration is too low, obviously not enough amplicons will be amplified on the ISPs, and the end result will be not enough data.  Conversely, if the concentration is too high, there is a possibility of more than one sample amplicon ending up in the droplet of oil.  In the end, more than one fragment gets amplified on the ISP.  This ISP is called “polyclonal” and the data from it will get thrown out.  Optimizing the concentration takes a few runs and the concentration can be different for each instrument in the lab.

Illumina MiSeq “Cluster Generation by Bridge Amplification”

Illumina’s method of template preparation is termed cluster generation by bridge amplification and actually takes place on the MiSeq, one step before the sequencing step.  The multiple copies are created in close proximity to each other, just as with clonal amplification, but instead of using a separate ISP for each fragment, a separate location on the flow cell is used.  A flow cell is essentially a glass slide that has universal primers anchored all over it.  These universal primers are, again, complementary to the adapters added during the library preparation.  The combined libraries are flowed over the slide at the beginning of the run and they anneal to the universal primer.  The fragment then folds over and anneals to the second universal primer.  This strand is then replicated.  After replication, the strands are denatured, creating two single strands.  These then replicate again, thus producing a cluster of the same fragment in a localized area on the slide.  This occurs for each specimen’s amplicons all over the slide.  At the end of the cluster generation step, the reverse strands are all cleaved off, leaving only the single stranded forward strands ready for sequencing.

tempprep2
(https://www.illumina.com/science/technology/next-generation-sequencing/sequencing-technology.html)

tempprep3
Top View of Flow Cell After Cluster Generation – Each color represents one amplicon of one specimen

Concentration is just as important in this setup as in the Ion Torrent setup.  If the concentration is too high with this assay, the clusters generated will be too close together on the flow cell, thus the sequencing signal from each cluster will overlap.  The data generated from these areas will not be able to be discerned so it will get thrown out.

Join me next quarter for the next installment – sequencing!

 


-Sharleen Rapp, BS, MB (ASCP)CM is a Molecular Diagnostics Coordinator in the Molecular Diagnostics Laboratory at Nebraska Medicine. 

Library Preparation – The First Step in a NGS Setup

Welcome back! Last quarter we discussed why Next Generation, or Massively Parallel, Sequencing is the next big thing in the world of Molecular Diagnostics. The sensitivity, the depth of coverage and the ability to interrogate many different areas of the genome at the same time were just a few of the benefits of these types of assays. Next, I would like to describe a couple different methods of library preparation, which is the first step necessary to run an NGS assay.

First of all, let’s define “Library.” I find this is the most common question technologists new to this technology ask. Essentially, a library is a specimen’s collection of amplicons produced by the assay that have been barcoded, tagged with appropriate platform adapters and purified. These will serve as the input for the next part of the NGS workflow, clonal amplification (the topic of next quarter’s blog!).  How these libraries are prepared differs depending on platform (e.g., Ion Torrent vs. MiSeq), starting material (RNA vs. DNA), and type of assay (targeted amplicon vs. exome).

Before we begin the library prep discussion, a note about the input specimen. The DNA must be quantitated using a method that is more specific than spectrophotometry – it must be specific for double-stranded DNA. Spectrophotometry will lead to an overestimation of the amount of DNA in the specimen, which will lead to over-dilution and, consequently, a lower quantity of final library. Real-time PCR and fluorometry with a double-stranded DNA kit are two examples of assays that will give accurate concentrations of double-stranded DNA.

Our lab has begun using NGS for some of our oncology assays, so I will focus on the two types we perform currently, but keep in mind, there are many other types of assays and platforms.

library1.png
Image 1: Ion Torrent amplicon library preparation. Source: Ion AmpliSeq™ Library Preparation User Guide – MAN0006735, Rev. 10 September 2012.

The assay we use for our Ion Torrent platform is a PCR amplicon based assay. The first step is to amplify the 207 regions across 50 genes that contain hotspot areas for a number of different cancer types. This all occurs in one well for each specimen. Once those areas are amplified, the next step is to partially digest the primer sequences in order to prepare the ends of the amplicons for the adapters necessary for the sequencing step. As shown in the figure above, two different combinations of adapters may be used. The top one, listed as the A adapter (red) and the P1 adapter (green), would be used if only one specimen was to be sequenced on the run. The A and P1 adapters provide universal priming sites so that every amplicon of every sample can be primed with the same primers, rather than having to use gene specific primers each time. The second possibility is listed below that, with the same P1 adapter (green) and a Barcode Adapter labeled X (red and blue) – it still contains the A adapter necessary for sequencing (red), but it also contains a short oligonucleotide sequence called a “barcode” (blue) that will be recognized during the analysis step based on the sequence. For example, Barcode 101’s sequence is CTAAGGTAAC – this will be assigned to specimen 1 in the run and all of the amplicons for that specimen will be tagged with this sequence. Specimen 2 will have the barcode 102 (TAAGGAGAAC) tag on all of its amplicons. During analysis, the barcodes will be identified and all of the reads with the 101 sequence will be binned together and all of the reads with the 102 sequence will be binned together. This allows many specimens to be run at the same time, thus increasing the efficiency of NGS even more. Lastly, the tagged amplicons are purified and normalized to the same concentration.
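The binning step can be illustrated with a short Python sketch using the two barcode sequences given above. It is a simplification: it assumes each read begins directly with its barcode and that the barcode is read without errors, whereas real analysis software also handles key sequences, adapter bases, and sequencing errors. The function name, specimen labels, and example reads are made up for illustration.

```python
from collections import defaultdict

# Barcode sequences from the text, mapped to the specimen they were assigned to
BARCODES = {
    "CTAAGGTAAC": "specimen_1",  # Barcode 101
    "TAAGGAGAAC": "specimen_2",  # Barcode 102
}


def bin_reads_by_barcode(reads, barcodes=BARCODES):
    """Group reads by leading barcode and strip the barcode from each read."""
    bins = defaultdict(list)
    for read in reads:
        for barcode, specimen in barcodes.items():
            if read.startswith(barcode):
                bins[specimen].append(read[len(barcode):])
                break
        else:
            # No known barcode at the start of the read
            bins["unassigned"].append(read)
    return dict(bins)


# Made-up reads: barcode followed by a few bases of amplicon sequence
example_reads = [
    "CTAAGGTAACACGT",
    "TAAGGAGAACTTTT",
    "CTAAGGTAACGGCA",
    "GGGGGGGGGGAAAA",
]
print(bin_reads_by_barcode(example_reads))
```

Reads that start with neither barcode land in an "unassigned" bin instead of being forced into a specimen, which mirrors the idea that only confidently identified reads should be attributed to a patient.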

library2
Image 2: MiSeq amplicon library preparation. Image source: https://www.illumina.com/content/dam/illumina-marketing/documents/applications/ngs-library-prep/for-all-you-seq-dna.pdf

The assay we use for our MiSeq platform is a hybridization followed by PCR amplicon based assay. The first step is to hybridize probes to 568 regions across 54 genes that contain hotspots for a number of different cancer types. This occurs in one well for each specimen. Once the probes have hybridized, the unbound probes are washed away using a size selection filter plate. Next, the area between the probes is extended and ligated so that each of the 568 amplicons is created. These are then amplified in a PCR step using primers that are complementary to a universal priming site on the probes, but also contain adapters plus the two indices required for paired end sequencing (the Ion Torrent platform utilizes single-end sequencing – this will be discussed in the sequencing portion in an upcoming blog!). As in the previous method, after PCR, these tagged amplicons are purified and normalized to the same concentration in preparation for the next step – clonal amplification.

Stay tuned for next quarter’s post – clonal amplification!

 


-Sharleen Rapp, BS, MB (ASCP)CM is a Molecular Diagnostics Coordinator in the Molecular Diagnostics Laboratory at Nebraska Medicine.