New Transformer-based method accurately predicts DNA methylation from nanopore sequencing

王林
2024-07-19 14:55:29

Editor | Radish peel

DNA methylation plays an important role in various biological processes, including cell differentiation, aging and cancer development. The most important methylation in mammals is 5-methylcytosine (5mC), which occurs primarily in the context of CpG dinucleotides. Sequencing methods such as whole-genome bisulfite sequencing (WGBS) can successfully detect 5-methylcytosine DNA modifications. However, they suffer from the serious drawbacks of short read lengths and possible amplification bias.
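To make the term "CpG context" concrete, here is a minimal sketch that lists the cytosine positions of CpG dinucleotides in a DNA string. The function name and the example sequence are made up for illustration and are not taken from the paper.

```python
def find_cpg_sites(sequence: str) -> list[int]:
    """Return the 0-based position of the C in every CpG dinucleotide."""
    seq = sequence.upper()
    # A CpG site is a cytosine immediately followed by a guanine (5' to 3').
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


# Toy sequence, chosen only for illustration.
print(find_cpg_sites("ACGTTCGGCATCG"))  # [1, 5, 11]
```

Each reported position is a candidate site at which a methylation caller would have to decide whether the cytosine is methylated.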

Researchers at Singapore's A*STAR have developed Rockfish, a deep learning algorithm that significantly improves read-level 5-methylcytosine detection from Oxford Nanopore Technologies (ONT) sequencing.

The study was titled "Rockfish: A transformer-based model for accurate 5-methylcytosine prediction from nanopore sequencing" and was published in "Nature Communications" on July 3, 2024.

Given the need for a highly accurate read-level prediction method, the researchers set out to develop a new, state-of-the-art deep learning method based on the modern Transformer architecture. Their method, Rockfish, relies on the raw nanopore signal, the nucleobase sequence, and alignment information to detect 5mC modifications.

Illustration: Rockfish architecture overview. (Source: Paper)

The researchers trained the model using high-quality human and mouse datasets and tested it on multiple R9.4.1 and R10.4.1 datasets, including:

  1. In-house sequenced R9.4.1 H1 embryonic stem cell (H1ESc) native dataset
  2. R9.4.1 and R10.4.1 neonatal mouse (C57BL/6 neonatal) data
  3. Some publicly available human cancer and blood datasets

Because both R9.4.1 and R10.4.1 versions of the NA12878 and neonatal mouse datasets were used for evaluation, the researchers state the pore version to distinguish them. The remaining datasets were sequenced using only the R9.4.1 pore version.

The researchers extensively evaluated the Rockfish models and compared them with the following tools:

  • Megalodon Remora, Megalodon Rerio and Nanopolish for the R9.4.1 datasets
  • Remora for the R10.4.1 datasets

The comparison includes:

  1. Read-level prediction
  2. Site-level prediction
  3. Site-level correlation with WGBS
  4. Call coverage
  5. Execution time
  6. Resource utilization

Illustration: Read-level evaluation. (Source: Paper)

Single-base accuracy and the F1 measure improved by up to 5 percentage points on the R9.4.1 datasets and by up to 0.82 percentage points on the R10.4.1 datasets.
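For readers unfamiliar with these metrics, the following minimal sketch shows how accuracy and F1 can be computed from read-level calls. It assumes each call is a methylation probability that is thresholded at 0.5; the threshold, the function name, and the toy numbers are assumptions for illustration and do not reproduce the paper's evaluation.

```python
def accuracy_and_f1(labels, predictions):
    """Compute accuracy and F1 for binary labels (1 = methylated, 0 = unmethylated)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy data: ground-truth labels and hypothetical per-read methylation probabilities.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
probabilities = [0.9, 0.8, 0.3, 0.1, 0.6, 0.2, 0.7, 0.4]
calls = [1 if p >= 0.5 else 0 for p in probabilities]  # assumed 0.5 threshold

accuracy, f1 = accuracy_and_f1(labels, calls)
print(accuracy, f1)  # 0.75 0.75
```

A gain expressed in percentage points is an absolute difference: moving from an F1 of 0.90 to 0.95, for example, is an improvement of 5 percentage points.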

In addition, Rockfish shows a high correlation with whole-genome bisulfite sequencing, requires lower read depth, and achieves higher confidence in biologically important regions such as CpG-rich promoters, while remaining computationally efficient.
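A site-level comparison with WGBS can be sketched as follows: read-level calls are aggregated into a methylation frequency per CpG site, and those frequencies are correlated with the WGBS frequencies at the same sites. The data layout, the 0.5 threshold, and all values below are hypothetical; the paper's actual pipeline may differ.

```python
from collections import defaultdict
from math import sqrt


def site_frequencies(read_calls, threshold=0.5):
    """Aggregate (chrom, pos, probability) read-level calls into per-site frequencies."""
    counts = defaultdict(lambda: [0, 0])  # site -> [methylated reads, total reads]
    for chrom, pos, prob in read_calls:
        counts[(chrom, pos)][0] += int(prob >= threshold)
        counts[(chrom, pos)][1] += 1
    return {site: meth / total for site, (meth, total) in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical read-level calls and WGBS frequencies, for illustration only.
read_calls = [
    ("chr1", 100, 0.9), ("chr1", 100, 0.8), ("chr1", 100, 0.2), ("chr1", 100, 0.7),
    ("chr1", 250, 0.1), ("chr1", 250, 0.3),
    ("chr1", 400, 0.95), ("chr1", 400, 0.6),
]
wgbs = {("chr1", 100): 0.8, ("chr1", 250): 0.1, ("chr1", 400): 0.9}

freqs = site_frequencies(read_calls)
shared = sorted(set(freqs) & set(wgbs))  # compare only sites covered by both methods
r = pearson([freqs[s] for s in shared], [wgbs[s] for s in shared])
print(freqs)
print(round(r, 3))
```

The number of reads per site matters here: with only a few reads the per-site frequency is coarse, which is why the read depth a tool needs to reach a given correlation is a meaningful comparison point.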

Its strong performance in human and mouse samples highlights its versatility for studying 5-methylcytosine methylation across different organisms and diseases. Finally, its adaptable architecture ensures compatibility with new pore versions and chemistries, as well as with other modification types.

Illustration: Correlation analysis between ONT-based tools and WGBS. (Source: Paper)

Despite this, Rockfish is currently unable to distinguish between 5mC and 5hmC methylation, due to the lack of high-quality control data sets for other types of modifications. There is still room for improvement in computational efficiency of the model, and efficiency is expected to be improved through architecture and engineering optimization in the future.

Rockfish demonstrated the ability to extract methylation information from ONT raw signals, with its small model performing better and taking shorter run times on all datasets, demonstrating the benefits of additional data and knowledge distillation.
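Knowledge distillation generally means training a smaller "student" model to match the outputs of a larger "teacher" model in addition to the ground-truth labels. The sketch below shows one common textbook formulation for a binary classifier; the weighting `alpha`, the function names, and the numbers are assumptions, and this is not claimed to be the loss used to train Rockfish.

```python
from math import log


def binary_cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target probability and a predicted probability."""
    p = min(max(predicted, eps), 1 - eps)  # clamp to avoid log(0)
    return -(target * log(p) + (1 - target) * log(1 - p))


def distillation_loss(student_prob, teacher_prob, hard_label, alpha=0.5):
    """Weighted sum of a hard-label term and a teacher-matching (soft-label) term."""
    hard_term = binary_cross_entropy(hard_label, student_prob)
    soft_term = binary_cross_entropy(teacher_prob, student_prob)
    return alpha * hard_term + (1 - alpha) * soft_term


# Hypothetical values: a student that agrees with the teacher incurs a lower loss.
close = distillation_loss(student_prob=0.9, teacher_prob=0.95, hard_label=1)
far = distillation_loss(student_prob=0.4, teacher_prob=0.95, hard_label=1)
print(close < far)  # True
```

The soft-label term lets the student learn from the teacher's confidence on each example, which is one reason a distilled small model can remain accurate while running faster.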

5mC modification is linked to a variety of biological phenomena, such as transcriptional regulation, disease, and aging. It is therefore crucial to understand the role of DNA methylation in depth through single-base-resolution detection, which may help in disease prevention, early diagnosis, and treatment strategy selection. Rockfish's architecture makes it easy to extend to the detection of various types of DNA and RNA modifications.

Paper link: https://www.nature.com/articles/s41467-024-49847-0
