ReLU replaces softmax in Vision Transformer: DeepMind's new trick rapidly reduces costs

The Transformer architecture is widely used in modern machine learning. Attention is one of its core components, and it contains a softmax that produces a probability distribution over tokens. Softmax is costly because it computes exponentials and sums over the sequence length, which makes it difficult to parallelize.

Google DeepMind proposes a new idea: replace the softmax operation with an alternative that does not necessarily output a probability distribution. They observe that attention using ReLU divided by the sequence length can approach or rival traditional softmax attention when used with Vision Transformers.


Paper link: https://arxiv.org/abs/2309.08586

This result opens up new options for parallelization, because ReLU attention can be parallelized over the sequence length dimension and requires fewer gather operations than traditional attention.

Method

Attention

Attention transforms d-dimensional queries, keys, and values $\{q_i, k_i, v_i\}_{i=1}^{L}$ through a two-step process.

In the first step, the attention weights $\alpha_{ij}$ are obtained by

$$\alpha_{ij} = \phi\left(\frac{1}{\sqrt{d}}\left[q_i^\top k_1, \ldots, q_i^\top k_L\right]\right)_j \qquad (1)$$

where $\phi$ is usually softmax.

In the next step, the attention weights are used to compute the outputs $o_i = \sum_{j=1}^{L} \alpha_{ij} v_j$. This paper explores point-wise alternatives to $\phi$.
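The two-step process can be sketched in plain Python as follows. This is only a minimal illustration, not the paper's implementation: the function names `softmax` and `attention` and the toy inputs are assumptions made for this example, and the function ϕ is passed in as the `phi` argument so that it can be swapped out.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one row of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, phi=softmax):
    """Two-step attention: weights from phi, then a weighted sum of values."""
    d = len(queries[0])
    outputs = []
    for q in queries:
        # Step 1: scaled dot products of q with every key, passed through phi.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = phi(scores)
        # Step 2: o_i = sum_j alpha_ij * v_j, computed per output channel.
        out = [sum(w * v[c] for w, v in zip(weights, values))
               for c in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Toy inputs (hypothetical): 2 queries, 3 keys/values, d = 2.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
outputs = attention(queries, keys, values)
```

Note that the softmax step needs a maximum and a sum over the whole row of scores before any weight can be produced, which is the reduction over sequence length that the article describes as costly.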

ReLU attention

DeepMind observed that $\phi = L^{-1}\,\mathrm{relu}$ is a promising alternative to $\phi = \mathrm{softmax}$ in Eq. 1. They refer to attention with $\phi = L^{-1}\,\mathrm{relu}$ as ReLU attention.

Scaled point-wise attention

The researchers also experimentally explored a wider range of choices of the form $\phi = L^{-\alpha} h$, where α ∈ [0, 1] and h ∈ {relu, relu², gelu, softplus, identity, relu6, sigmoid}.
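As a rough sketch of this family, the snippet below applies a point-wise function h to each scaled score and divides by the sequence length L raised to the power α. The names `ACTIVATIONS` and `scaled_pointwise_weights` are hypothetical, and gelu is written in its exact erf form, which is an assumption about the variant used. ReLU attention corresponds to h = relu and α = 1.

```python
import math

def relu(x):
    return max(0.0, x)

def softplus(x):
    # log(1 + e^x), written to avoid overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def gelu(x):
    # Exact GELU using the Gaussian error function (assumed variant).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

ACTIVATIONS = {
    "relu": relu,
    "relu2": lambda x: relu(x) ** 2,
    "gelu": gelu,
    "softplus": softplus,
    "identity": lambda x: x,
    "relu6": lambda x: min(relu(x), 6.0),
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
}

def scaled_pointwise_weights(scores, h="relu", alpha=1.0):
    """Apply phi = L^(-alpha) * h element-wise to one row of scaled scores."""
    L = len(scores)
    fn = ACTIVATIONS[h]
    return [fn(s) / (L ** alpha) for s in scores]

# ReLU attention weights for one query over L = 4 keys.
weights = scaled_pointwise_weights([2.0, -1.0, 0.5, 4.0])
```

Unlike softmax, each weight here depends only on its own score and on L, so the weights need not sum to 1 and no reduction over the row is required to compute them.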

Sequence length scaling

They also found that scaling by a term involving the sequence length $L$ improves accuracy. Previous work on removing softmax did not use this scaling.

Transformers are currently designed with softmax attention, for which $\sum_{j} \alpha_{ij} = 1$, which implies $\mathbb{E}_j[\alpha_{ij}] = L^{-1}$. Although this is unlikely to be a necessary condition, $\phi = L^{-1}\,\mathrm{relu}$ ensures that $\mathbb{E}_j[\alpha_{ij}]$ is $O(L^{-1})$ at initialization. Preserving this condition may reduce the need to change other hyperparameters when replacing softmax.

At initialization, the elements of $q$ and $k$ are $O(1)$, so $\langle q_i, k_j \rangle / \sqrt{d}$ will also be $O(1)$. Activation functions like ReLU preserve $O(1)$, so a factor of $L^{-1}$ is needed for $\mathbb{E}_j[\alpha_{ij}]$ to be $O(L^{-1})$.
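The scaling argument can be checked numerically with a small experiment. The sketch below is an illustration under stated assumptions rather than anything from the paper: queries and keys are drawn from a standard normal distribution to stand in for an O(1) initialization, and the function name `mean_attention_weight`, the dimension d = 64, and the seed are all hypothetical choices.

```python
import math
import random

def mean_attention_weight(L, d=64, num_queries=8, scale_by_length=True, seed=0):
    """Average ReLU attention weight at a random, O(1) initialization."""
    rng = random.Random(seed)
    keys = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(L)]
    total = 0.0
    for _ in range(num_queries):
        q = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for k in keys:
            # Scaled dot product, O(1) when q and k have O(1) entries.
            score = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            w = max(0.0, score)
            total += w / L if scale_by_length else w
    return total / (num_queries * L)

for L in (64, 256):
    scaled = mean_attention_weight(L)
    unscaled = mean_attention_weight(L, scale_by_length=False)
    # Softmax weights average exactly 1/L, shown for comparison.
    print(L, 1.0 / L, scaled, unscaled)
```

Under these assumptions, the unscaled ReLU weights keep roughly the same mean as L grows, while the weights divided by L shrink in proportion to 1/L, which is the same order as the softmax average.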

Experiments and results

Main results

Figure 1 shows that, for ImageNet-21k training, ReLU attention matches the scaling trend of softmax attention. The x-axis shows the total core hours required for the experiment. A major advantage of ReLU attention is that it can be parallelized over the sequence length dimension, requiring fewer gather operations than softmax attention.


Effect of sequence length scaling

Figure 2 compares the effect of sequence length scaling across various point-wise alternatives to softmax. Specifically, softmax is replaced with $L^{-\alpha} h$, where $L$ is the sequence length and $h$ is relu, relu², gelu, softplus, identity, or a similar function. The x-axis is α. The y-axis is the accuracy of the S/32, S/16, and S/8 Vision Transformer models. The best results are usually obtained when α is close to 1. Since there is no clearly best nonlinearity, they used ReLU in their main experiments because it is faster.


Effect of qk-layernorm

The main experiments used qk-layernorm, in which queries and keys are passed through LayerNorm before the attention weights are computed. DeepMind states that qk-layernorm is used by default because it was found necessary to prevent instability when scaling up model size. Figure 3 shows the impact of removing qk-layernorm. The result indicates that qk-layernorm has little impact on these models, but this may change at larger model sizes.
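A minimal sketch of the idea, assuming a LayerNorm without the learned scale and bias and with an illustrative epsilon of 1e-6; the helper names `layer_norm` and `qk_layernorm_scores` are hypothetical and not taken from the paper's code:

```python
import math

def layer_norm(x, eps=1e-6):
    """Normalize one vector to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((a - mean) ** 2 for a in x) / len(x)
    return [(a - mean) / math.sqrt(var + eps) for a in x]

def qk_layernorm_scores(queries, keys):
    """Scaled dot-product scores computed from layer-normalized queries and keys."""
    d = len(queries[0])
    qn = [layer_norm(q) for q in queries]
    kn = [layer_norm(k) for k in keys]
    return [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in kn]
            for q in qn]

# Large-magnitude inputs still yield bounded scores after normalization.
scores = qk_layernorm_scores(
    [[100.0, -50.0, 25.0, 0.0]],
    [[3.0, 1.0, -2.0, 8.0], [100.0, -50.0, 25.0, 0.0]],
)
```

Because each normalized vector has a norm of at most √d, the scaled scores in this sketch are bounded by √d in magnitude regardless of how large the raw queries and keys are, which gives some intuition for why the normalization can help with stability.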


Effect of adding a gate

Previous work on removing softmax added a gating unit and did not scale by sequence length. Specifically, in the gated attention unit, an additional projection produces an output that is combined through element-wise multiplication before the output projection. Figure 4 explores whether the presence of a gate removes the need for sequence length scaling. Overall, DeepMind observes that the best accuracy is still achieved with sequence length scaling, with or without the gate. Also note that for the S/8 model using ReLU, gating increases the core hours required for the experiment by approximately 9.3%.
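The gating step described above can be sketched for a single token as follows. The weight matrices `w_gate` and `w_out`, the helper names, and the toy numbers are hypothetical, and the gate is left without an activation function, which is a simplifying assumption rather than a detail from the source.

```python
def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def gated_output(x, attn_out, w_gate, w_out):
    """Combine an attention output with a gate computed from the token input x."""
    gate = matvec(w_gate, x)                         # extra projection of the input
    gated = [a * g for a, g in zip(attn_out, gate)]  # element-wise multiplication
    return matvec(w_out, gated)                      # output projection applied last

x = [1.0, 2.0]
attn_out = [0.5, -1.0]
w_gate = [[1.0, 0.0], [0.0, 1.0]]  # identity, so the gate equals x
w_out = [[1.0, 1.0], [0.0, 2.0]]
result = gated_output(x, attn_out, w_gate, w_out)
```

The extra projection is additional work per token, which is consistent with the gate adding to the core hours reported for the experiment.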



This article is reproduced from 51CTO.COM.