MolE: A Transformer Model for Molecular Graph Learning
We introduce MolE, a transformer-based model for molecular graph learning. MolE works directly on molecular graphs, taking atom identifiers as input tokens and graph connectivity as relative position information. Atom identifiers are calculated by hashing several atomic properties into a single integer, and graph connectivity is given as a topological distance matrix.
MolE is a transformer model designed specifically for molecular graphs. It operates directly on the graph by providing atom identifiers and graph connectivity as input tokens and relative position information, respectively. Atom identifiers are calculated by hashing different atomic properties into a single integer. In particular, this hash encodes the following information:
- number of neighboring heavy atoms,
- number of neighboring hydrogen atoms,
- valence minus the number of attached hydrogens,
- atomic charge,
- atomic mass,
- attached bond types,
- and ring membership.
Atom identifiers (also known as atom environments of radius 0) were computed using the Morgan algorithm as implemented in RDKit.
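As an illustration, such radius-0 identifiers can be recovered with RDKit's classic Morgan fingerprint API by mapping each hashed identifier back to its atom via `bitInfo`. This is a minimal sketch, not MolE's actual code; the example molecule and variable names are illustrative.

```python
# Minimal sketch (not MolE's code): recover the hashed radius-0 Morgan identifier
# for every atom using RDKit's classic fingerprint API.
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin, as an example

bit_info = {}                                        # identifier -> ((atom_idx, radius), ...)
rdMolDescriptors.GetMorganFingerprint(mol, 0, bitInfo=bit_info)

# Invert the map so each atom index gets its hashed radius-0 identifier.
atom_ids = {atom_idx: identifier
            for identifier, envs in bit_info.items()
            for atom_idx, radius in envs if radius == 0}

for atom in mol.GetAtoms():
    print(atom.GetIdx(), atom.GetSymbol(), atom_ids[atom.GetIdx()])
```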
In addition to tokens, MolE also takes graph connectivity information as input, which is an important inductive bias since it encodes the relative positions of atoms in the molecular graph. In this case, the graph connectivity is given as a topological distance matrix \(d\), where \(d_{ij}\) is the length of the shortest path over bonds separating atom \(i\) from atom \(j\).
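For example, this topological distance matrix can be obtained directly from RDKit (the molecule below is only an illustration):

```python
# Minimal sketch: topological distance matrix of a molecule via RDKit,
# where entry (i, j) is the number of bonds on the shortest path between atoms i and j.
from rdkit import Chem

mol = Chem.MolFromSmiles("c1ccccc1O")      # phenol, as an example
dist = Chem.GetDistanceMatrix(mol)          # NumPy array of shape (num_atoms, num_atoms)
print(dist.astype(int))
```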
MolE uses a Transformer as its base architecture, which has also been applied to graphs previously. The performance of transformers can be attributed in large part to the extensive use of the self-attention mechanism. In standard transformers, the input tokens are embedded into queries, keys and values \(Q, K, V \in \mathbb{R}^{N \times d}\), which are used to compute self-attention as:

\[
H_0 = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V,
\]

where \(H_0 \in \mathbb{R}^{N \times d}\) are the output hidden vectors after self-attention, and \(d\) is the dimension of the hidden space.
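As a reference, the following minimal NumPy sketch implements the single-head self-attention formula above; the random \(Q\), \(K\), \(V\) stand in for embedded input tokens and are not MolE's actual parameters.

```python
# Minimal single-head self-attention sketch matching the formula above.
import numpy as np

def self_attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (N, N) pairwise attention logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (N, d) output hidden vectors H_0

N, d = 5, 8                                          # e.g. 5 atoms, hidden size 8
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
H0 = self_attention(Q, K, V)
```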
In order to explicitly carry positional information through each layer of the transformer, MolE uses the disentangled self-attention from DeBERTa:

\[
A_{i,j} = Q_i^{c} {K_j^{c}}^{\top} + Q_i^{c} {K_{i,j}^{p}}^{\top} + K_j^{c} {Q_{i,j}^{p}}^{\top},
\qquad
H = \mathrm{softmax}\!\left(\frac{A}{\sqrt{3d}}\right) V^{c},
\]

where \(Q^{c}, K^{c}, V^{c} \in \mathbb{R}^{N \times d}\) are context queries, keys and values that contain token information (used in standard self-attention), and \(Q_{i,j}^{p}, K_{i,j}^{p} \in \mathbb{R}^{N \times d}\) are the position queries and keys that encode the relative position of the \(i\)-th atom with respect to the \(j\)-th atom. The use of disentangled attention makes MolE invariant with respect to the order of the input atoms.
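The sketch below illustrates the three disentangled terms (content-to-content, content-to-position, position-to-content) in NumPy, with relative-position embeddings looked up by topological distance. It is a simplified single-head illustration under these assumptions, not MolE's actual implementation; all tensor names are illustrative.

```python
# Simplified disentangled-attention sketch: position embeddings indexed by the
# topological distance d_ij, three attention terms, scaling by sqrt(3d).
import numpy as np

def disentangled_attention(Qc, Kc, Vc, Qp, Kp, dist):
    N, d = Qc.shape
    c2c = Qc @ Kc.T                                  # content-to-content
    c2p = np.einsum("id,ijd->ij", Qc, Kp[dist])      # content-to-position (Kp indexed by d_ij)
    p2c = np.einsum("jd,ijd->ij", Kc, Qp[dist])      # position-to-content (Qp indexed by d_ij)
    A = (c2c + c2p + p2c) / np.sqrt(3 * d)
    W = np.exp(A - A.max(axis=-1, keepdims=True))
    W /= W.sum(axis=-1, keepdims=True)               # row-wise softmax
    return W @ Vc

N, d, max_dist = 6, 8, 10
rng = np.random.default_rng(0)
Qc, Kc, Vc = (rng.standard_normal((N, d)) for _ in range(3))
Qp, Kp = (rng.standard_normal((max_dist + 1, d)) for _ in range(2))
dist = rng.integers(0, max_dist + 1, size=(N, N))    # stands in for the topological distance matrix
H = disentangled_attention(Qc, Kc, Vc, Qp, Kp, dist)
```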
As mentioned earlier, self-supervised pretraining can effectively transfer information from large unlabeled datasets to smaller labeled datasets. Here we present a two-step pretraining strategy. The first step is a self-supervised approach for learning chemical structure representations. For this we use a BERT-like approach in which each atom is selected for masking with a probability of 15%; of the selected tokens, 80% are replaced by a mask token, 10% are replaced by a random token from the vocabulary, and 10% are left unchanged. Unlike BERT, the prediction task is not to recover the identity of the masked token but to predict the corresponding atom environment (or functional atom environment) of radius 2, meaning all atoms that are separated from the masked atom by two or fewer bonds. It is important to keep in mind that we used different tokenization strategies for inputs (radius 0) and labels (radius 2), and that input tokens do not contain overlapping information from neighboring atoms, to avoid information leakage. This incentivizes the model to aggregate information from neighboring atoms while learning local molecular features. MolE learns via a classification task in which each atom environment of radius 2 has a predefined label, in contrast to the Context Prediction approach, where the task is to match the embedding of atom environments of radius 4 to the embedding of context atoms (i.e., surrounding atoms beyond radius 4) via negative sampling. The second step uses graph-level supervised pretraining with a large labeled dataset. As proposed by Hu et al., combining node- and graph-level pretraining helps to learn local and global features that improve the final prediction performance. More details regarding the pretraining steps can be found in the Methods section.
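A minimal PyTorch sketch of the 80/10/10 masking scheme described above is shown below. The function and argument names (`mask_atom_tokens`, `mask_token_id`, `vocab_size`) are hypothetical, and the labels here are placeholders: in MolE the prediction targets would be radius-2 atom-environment classes rather than the input tokens themselves.

```python
# Sketch of BERT-style 15% masking with an 80/10/10 split (hypothetical names,
# not the actual MolE implementation).
import torch

def mask_atom_tokens(token_ids, vocab_size, mask_token_id, mask_prob=0.15):
    token_ids = token_ids.clone()
    labels = torch.full_like(token_ids, -100)        # -100: positions ignored by the loss

    # Select ~15% of the atoms for prediction.
    selected = torch.rand(token_ids.shape) < mask_prob
    labels[selected] = token_ids[selected]           # placeholder; MolE predicts radius-2 environments

    # Of the selected atoms: 80% -> mask token, 10% -> random token, 10% -> unchanged.
    roll = torch.rand(token_ids.shape)
    token_ids[selected & (roll < 0.8)] = mask_token_id
    replace_random = selected & (roll >= 0.8) & (roll < 0.9)
    random_ids = torch.randint(vocab_size, token_ids.shape)
    token_ids[replace_random] = random_ids[replace_random]
    # The remaining 10% keep their original identifier.

    return token_ids, labels
```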
MolE was pretrained on an ultra-large database of ~842 million molecules from ZINC and ExCAPE-DB using a self-supervised scheme (with an auxiliary loss), followed by supervised pretraining with ~456K molecules (see the Methods section for more details). We assess the quality of the molecular embedding by finetuning MolE on a set of downstream tasks. In this case, we use the 22 ADMET tasks included in the Therapeutic Data Commons (TDC) benchmark. This benchmark is composed of 9 regression and 13 binary classification tasks on datasets that range from hundreds of compounds (e.g., DILI with 475 compounds) to thousands of compounds (such as the CYP inhibition tasks with ~13,000 compounds). An advantage of using this benchmark is that data splits and evaluation metrics are standardized, so results are directly comparable across models.
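As an illustration of this evaluation setup, the TDC ADMET benchmark group can be loaded with the PyTDC package roughly as follows; the task name shown is one of the benchmark's regression tasks, and the download path is arbitrary.

```python
# Minimal sketch of loading a TDC ADMET benchmark task with PyTDC.
from tdc.benchmark_group import admet_group

group = admet_group(path="data/")                # downloads/caches the ADMET group datasets
benchmark = group.get("Caco2_Wang")              # one of the 9 regression tasks
train_val, test = benchmark["train_val"], benchmark["test"]
print(benchmark["name"], len(train_val), len(test))
```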