
    CODEC enables ‘single duplex’ sequencing

    In 2022, Cryos began an ongoing research collaboration with a group of researchers from University Grossman School of Medicine, New York, and University Hospitals Cleveland Medical Center, Ohio, to validate a method called Concatenating Original Duplex for Error Correction (CODEC). The method can detect unique, low-abundance mutations that are usually lost in the noise of other sequencing techniques.


      CODEC is a gene sequencing method that combines Next Generation Sequencing (NGS) and long-read sequencing technology. Unlike other sequencing techniques, CODEC creates connected copies of the two opposing DNA strands, so that both are copied and read together. This increases the accuracy of the analysis and reduces the number of amplifications needed.


      Addressing the issues of genome sequencing

      Over the past five decades, genome sequencing has been a field of constant development. Scientists have devised various sequencing techniques, including Next Generation Sequencing (NGS) and Duplex Sequencing, to improve the accuracy and accessibility of gene sequencing. However, both methods have drawbacks. Duplex Sequencing, which tags corresponding DNA strands (the ‘single duplex’), provides accurate reads but is expensive. NGS, on the other hand, becomes inaccurate when the sequences under investigation are too long. As a result, Whole Genome Sequencing (WGS) using NGS contains many errors, known as noise. To work around this problem, the genome is often split into smaller pieces, which greatly increases the workload during analysis.


      Another risk of sequencing is that errors are introduced while the DNA is being copied. When a nucleotide substitution is found on only one strand of DNA, it can be hard to determine whether the damage occurred before the analysis or is an artifact of polymerase chain reaction (PCR) amplification. If the general error rate is high, the two can become impossible to distinguish. NGS also struggles to read guanine- and cytosine-rich sequences accurately. Many strand amplifications are needed to produce enough material for an accurate comparison, and mutations, DNA damage, and PCR artifacts are told apart by sorting complementary strands and comparing the rates at which each abnormality occurs. This increases both the cost of the technique and the time it takes.

      To address these issues, a group of researchers from University Grossman School of Medicine, New York, and University Hospitals Cleveland Medical Center, Ohio, collaborated with Cryos to test CODEC. Rather than making separate copies of each strand, CODEC creates a single molecule that contains the whole duplex and can be sequenced as one unit. Each amplification yields new molecules in which the corresponding strands remain attached, so there is no sorting of DNA pairs and no question of which strands were read together. This both reduces the number of amplifications needed and increases the accuracy of the analysis.
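
      The benefit of reading both strands together can be illustrated with a small sketch. This is not the CODEC pipeline; it is a simplified illustration that assumes the two strand reads of one molecule are already available as strings of equal length. The function names (`reverse_complement`, `duplex_consensus`, `call_variants`) and the example sequences are hypothetical. A change seen on both strands is kept as a candidate mutation, while a change seen on only one strand is masked as likely damage or a copying artifact.

```python
# Simplified illustration of duplex error correction (hypothetical helper names,
# not the actual CODEC software).

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_consensus(top_read: str, bottom_read: str) -> str:
    """Combine the reads of both strands of one molecule.

    Both reads are given 5'->3'. The bottom-strand read is converted to
    top-strand orientation; positions where the strands disagree become 'N'.
    """
    bottom_as_top = reverse_complement(bottom_read)
    if len(top_read) != len(bottom_as_top):
        raise ValueError("strand reads must cover the same region")
    return "".join(a if a == b else "N" for a, b in zip(top_read, bottom_as_top))


def call_variants(consensus: str, reference: str) -> list[tuple[int, str, str]]:
    """List (position, reference base, observed base) for confident differences."""
    return [
        (i, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, consensus))
        if alt != "N" and alt != ref
    ]


if __name__ == "__main__":
    reference = "ACGTACGT"
    top = "ACGAACTT"     # T->A at position 3 (real), G->T at position 6 (one strand only)
    bottom = "ACGTTCGT"  # bottom strand carries only the real change
    consensus = duplex_consensus(top, bottom)
    print(consensus)
    print(call_variants(consensus, reference))
```

      In this toy example only the change supported by both strands is reported; the single-strand change is masked rather than counted as a mutation. Because CODEC keeps the two strands in one molecule, this comparison does not first require sorting separate reads into pairs.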

      Methodology

      To test these claims, CODEC was compared with various other WGS methods by using each of them to analyze the same DNA samples. These DNA samples were derived from healthy somatic cells, breast cancer cells, sperm cells, and liquid biopsy samples with increasing microsatellite instability (MSI). The results showed that CODEC has an error rate similar to that of Duplex Sequencing, the most accurate of the other techniques, at a cost per base of DNA approximately 100 times lower. Furthermore, CODEC required 230 times fewer read pairs than Duplex Sequencing for accurate sampling. A similar reduction appeared throughout the comparisons: the number of reads needed to detect mutations accurately fell by a factor of at least 100.

      Results

      CODEC detected unique, low-abundance mutations that are usually lost in NGS noise. Its results agreed 90-98% with those of Mutect2, a widely used tool for calling somatic (cancer-associated) mutations. CODEC also detected samples with low levels of MSI, with error rates 290 times lower than those of standard NGS. These results suggest that CODEC is a highly accurate and cost-effective technique for WGS.
      However, CODEC is not a perfect tool, and some inaccuracy remains. There is still room for improvement, but at present CODEC offers the best value for cost among WGS techniques.
      The findings of this study point to a new direction for the development of gene sequencing techniques, as CODEC offers a cost-effective and time-efficient alternative to traditional sequencing methods. The ability to analyze genetic material accurately has implications across multiple industries, including healthcare and environmental science.