Understanding Positional Embeddings in Transformers: From Absolute to Rotary
A deep dive into absolute, relative, and rotary positional embeddings with code examples
Mina Ghashami
Towards Data Science
One of the key components of transformers is the positional embedding. Why? Because the self-attention mechanism in transformers is permutation-invariant: it computes how much `attention` each token in the input receives from the other tokens in the sequence, but it does not take the order of the tokens into account. In effect, the attention mechanism treats the sequence as a bag of tokens. For this reason, we need another component, the positional embedding, which encodes the order of tokens and influences the token embeddings. But what are the different types of positional embeddings, and how are they implemented?
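To make this concrete, here is a minimal sketch (plain PyTorch, not code from the original post) showing that scaled dot-product self-attention without positional information produces the same output for each token no matter where it sits in the sequence; shuffling the inputs simply shuffles the outputs:

```python
import torch

torch.manual_seed(0)

def self_attention(x):
    # Plain scaled dot-product self-attention with no positional signal.
    # For simplicity, x is used directly as queries, keys, and values;
    # adding learned Q/K/V projections does not change the property shown here.
    d = x.shape[-1]
    scores = x @ x.transpose(-2, -1) / d ** 0.5
    weights = torch.softmax(scores, dim=-1)
    return weights @ x

tokens = torch.randn(5, 8)      # 5 tokens, embedding dimension 8
perm = torch.randperm(5)        # a random reordering of the tokens

out = self_attention(tokens)
out_perm = self_attention(tokens[perm])

# Each token gets exactly the same output vector, just in the new order,
# i.e. attention sees a "bag of tokens" rather than an ordered sequence.
print(torch.allclose(out[perm], out_perm, atol=1e-6))  # True
```

Once positional embeddings are added to (or mixed into) the token embeddings, this symmetry is broken and the model can distinguish "dog bites man" from "man bites dog".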
In this post, we take a look at three major types of positional embeddings and dive deep into their implementation.
Here is the table of contents for this post:
1. Context and Background
2. Absolute Positional Embedding