
Understanding Positional Embeddings in Transformers: From Absolute to Rotary


A deep dive into absolute, relative, and rotary positional embeddings with code examples


Mina Ghashami

Towards Data Science

One of the key components of transformers is positional embeddings. Why? Because the self-attention mechanism in transformers is permutation-invariant: it computes how much attention each token in the input receives from the other tokens in the sequence, but it does not take the order of the tokens into account. In effect, the attention mechanism treats the sequence as a bag of tokens. For this reason, we need another component, the positional embedding, which encodes the order of tokens and influences the token embeddings. But what are the different types of positional embeddings, and how are they implemented?
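To make the permutation-invariance point concrete, here is a minimal sketch (not from the original article) of scaled dot-product self-attention with identity query/key/value projections, an assumption made purely to keep the example short. Shuffling the input tokens simply shuffles the output rows in the same way, so the attention output by itself carries no information about token order:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def self_attention(x):
    # x: (seq_len, d_model); identity Q/K/V projections for simplicity
    d = x.shape[-1]
    scores = x @ x.transpose(0, 1) / d ** 0.5   # (seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)          # attention weights per token
    return weights @ x                           # (seq_len, d_model)

seq_len, d_model = 5, 8
tokens = torch.randn(seq_len, d_model)           # stand-in token embeddings
perm = torch.randperm(seq_len)                   # a random reordering of positions

out_original = self_attention(tokens)
out_permuted = self_attention(tokens[perm])

# Permuting the inputs just permutes the outputs: order is invisible to attention.
print(torch.allclose(out_permuted, out_original[perm], atol=1e-6))  # True
```

Because the output is the same set of vectors regardless of input order, any notion of position has to be injected separately, which is exactly what positional embeddings do.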

In this post, we take a look at three major types of positional embeddings and dive deep into their implementation.

Here is the table of contents for this post:

1. Context and Background

2. Absolute Positional Embedding
