Neural networks may no longer need activation functions? Layer Normalization also has non-linear expression!
AIxiv is a column on this site for publishing academic and technical content. Over the past few years, it has carried more than 2,000 reports covering top laboratories at major universities and companies around the world, effectively promoting academic exchange and dissemination. If you have excellent work to share, please feel free to contribute or contact us for coverage. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com

The authors of this article are all from the team of Associate Professor Huang Lei at the School of Artificial Intelligence, Beihang University, and the National Key Laboratory of Complex Critical Software Environment. The first author, Ni Yunhao, is a first-year graduate student; the second author, Guo Yuxin, is a third-year graduate student; and the third author, Jia Junlong, is a second-year graduate student. The corresponding author is Associate Professor Huang Lei (homepage: https://huangleibuaa.github.io/).

Neural networks are usually composed of three parts: linear layers, nonlinear layers (activation functions), and normalization layers. The linear layers hold most of the network's parameters; the nonlinear layers give the network its expressive power; and the normalization layers mainly stabilize and accelerate training. Little work has studied the expressive ability of normalization itself. Batch Normalization, for example, reduces to a linear transformation at inference time and introduces no nonlinearity, so researchers have generally assumed that normalization cannot improve a model's expressive power.
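To illustrate why BN is considered linear at inference: once the running statistics are frozen, the whole operation collapses into a per-channel affine map y = a·x + b. A minimal sketch (the statistics and parameters below are made-up values, not from the paper):

```python
import numpy as np

# At inference, BatchNorm uses fixed running statistics, so it reduces to
# an affine (linear) transformation per channel.
running_mean, running_var, eps = 1.5, 4.0, 1e-5
gamma, beta = 2.0, 0.5  # learned scale and shift (hypothetical values)

def batchnorm_inference(x):
    return gamma * (x - running_mean) / np.sqrt(running_var + eps) + beta

# Equivalent affine form y = a * x + b: no nonlinearity is introduced.
a = gamma / np.sqrt(running_var + eps)
b = beta - a * running_mean

x = np.array([0.0, 1.0, 2.0])
assert np.allclose(batchnorm_inference(x), a * x + b)
```

Because the composition of affine maps is affine, stacking BN (at inference) with linear layers adds no expressive power, which is the intuition the paper challenges for LN.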

However, the paper "On the Nonlinearity of Layer Normalization," recently published at ICML 2024 by Associate Professor Huang Lei's team at the School of Artificial Intelligence, Beihang University, points out that Layer Normalization (LN) and its computationally simplified variant RMSNorm do have nonlinear expressive ability, and analyzes LN's universal approximate classification ability in detail.


  • Paper address: https://arxiv.org/abs/2406.01255

The paper mathematically proves the nonlinearity of LN and proposes a simple neural network, LN-Net, composed only of linear layers and LN. In theory, given sufficient depth, LN-Net can correctly classify any given samples with any given labels. This finding breaks the habitual view of normalization layers as linear transformations with no fitting capacity: the nonlinear layer and the normalization layer are no longer disjoint neural network modules.

With the widespread use of Transformers, LN has become a standard, fixed component. This research may provide a new theoretical foundation for future neural network architecture design in this direction, and is of groundbreaking significance in that respect.

The mathematical discovery of LN nonlinearity

To study nonlinearity, the paper does not directly analyze the analytical properties of LN itself, but instead takes a more practical approach: exploring the interaction between LN and the data.

The authors first propose a statistic, SSR (Sum of Squares Ratio), to describe the linear separability of samples across two classes. When the samples undergo a linear transformation, the SSR changes as well, so the minimum SSR over all linear transformations is defined as the LSSR. The paper points out that the smaller the LSSR, the stronger the linear separability of the samples.
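The paper's exact definition of SSR is not reproduced in this article, so the sketch below is a hypothetical reconstruction: it assumes the common form, within-class sum of squares divided by total sum of squares, which matches the stated behavior (smaller value means stronger separability):

```python
import numpy as np

# Hypothetical sketch of an SSR-like statistic for two classes.
# Assumption: SSR = within-class sum of squares / total sum of squares,
# which approaches 0 when the two classes are well separated.
def ssr(x0, x1):
    """x0, x1: (n_i, d) arrays of samples from the two classes."""
    x = np.vstack([x0, x1])
    within = ((x0 - x0.mean(0)) ** 2).sum() + ((x1 - x1.mean(0)) ** 2).sum()
    total = ((x - x.mean(0)) ** 2).sum()
    return within / total

rng = np.random.default_rng(0)
tight = rng.normal(0.0, 0.1, (50, 2))       # class 0: tight cluster at origin
far = rng.normal(10.0, 0.1, (50, 2))        # class 1: tight cluster far away
assert ssr(tight, far) < 0.1                # well separated -> small ratio
```

Minimizing such a ratio over all linear maps of the data would then yield the LSSR; the paper's finding is that inserting LN between two linear maps can push the ratio below even that minimum.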

However, when the linear transformation applied to the samples is replaced by a "linear transformation + LN + linear transformation" structure, the resulting SSR can fall below the LSSR, which verifies LN's nonlinear expression: if LN were linear, then "linear transformation + LN + linear transformation" would also be linear, and the resulting SSR could not be lower than the LSSR.

Arbitrary separability of LN in classification problems

For further study, the authors split LN into two steps: centering and scaling. Centering is mathematically a linear transformation, so the nonlinearity of LN lies mainly in the scaling operation (also called spherical projection in the paper; it is exactly the operation performed by RMSNorm). Taking the simplest linearly inseparable data, XOR, as an example, the authors correctly classify its four points using only linear transformations and spherical projection.
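The two-step decomposition can be checked numerically. This is a minimal sketch of plain LN without learnable affine parameters (not the paper's XOR construction): centering is a fixed linear map, and the scaling step projects the centered vector onto a sphere of radius √d.

```python
import numpy as np

d = 4
x = np.array([1.0, 2.0, 4.0, 8.0])

# Step 1: centering is linear -- it is the matrix C = I - (1/d) * ones.
C = np.eye(d) - np.ones((d, d)) / d
z = C @ x
assert np.allclose(z, x - x.mean())

# Step 2: scaling (the RMSNorm step) divides by the root-mean-square,
# projecting z onto the sphere of radius sqrt(d). This is the nonlinear part.
y = z / np.sqrt((z ** 2).mean())
assert np.isclose(np.linalg.norm(y), np.sqrt(d))

# Composing the two steps reproduces LayerNorm (without affine parameters).
ln = (x - x.mean()) / x.std()
assert np.allclose(y, ln)
```

Since step 1 is a fixed matrix multiplication, any nonlinearity LN contributes must come from the spherical projection in step 2, which is why the paper focuses its analysis there.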


More generally, the authors propose an algorithm that correctly classifies any number of samples using only LN and linear layers, exploring the universal approximation capability of LN-Net.


By constructing the algorithm step by step, the layer-by-layer transformation of the neural network is converted into a sample-merging problem, so the universal classification problem reduces to sample merging. The authors show that for m samples with arbitrary labels, one can construct an LN-Net of depth O(m) that classifies all m samples correctly. This construction also provides new ideas for computing the VC dimension of neural networks: on this basis, the authors infer that an LN-Net with L normalization layers has a VC dimension of at least L + 2.


Enhancing the nonlinearity of LN and practical applications

Having proved the nonlinearity of LN, the authors propose a grouped layer normalization technique (LN-G) to further strengthen LN's nonlinearity for practical use. From the perspective of the Hessian matrix, they mathematically predict that grouping strengthens the nonlinearity of LN, and they preliminarily explore LN-G's expressive ability experimentally.
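A minimal sketch of what such a grouped normalization might look like, assuming LN-G splits the feature dimension into groups and applies LN within each group independently (the group size and the absence of affine parameters here are assumptions, not details from the paper):

```python
import numpy as np

def ln_g(x, num_groups, eps=1e-5):
    """Grouped layer normalization sketch.
    x: (batch, features); features must be divisible by num_groups.
    Each group is centered and scaled independently, so the map applies
    several spherical projections per sample instead of one.
    """
    b, d = x.shape
    g = x.reshape(b, num_groups, d // num_groups)
    mean = g.mean(axis=-1, keepdims=True)
    var = g.var(axis=-1, keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(b, d)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))
y = ln_g(x, num_groups=4)
# every group of 2 features is normalized to (approximately) zero mean
assert np.allclose(y.reshape(2, 4, 2).mean(-1), 0.0, atol=1e-6)
```

Intuitively, each group contributes its own nonlinear spherical projection, which is consistent with the paper's claim that grouping amplifies the nonlinearity available from a single LN.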

The authors note that on the CIFAR-10 random-label dataset, an ordinary linear-layer model reaches no more than 20% accuracy, while a network composed of linear layers and LN-G, without any traditional activation function as a nonlinear unit, achieves 55.85% accuracy.
The authors further explore the classification performance of LN-G in convolutional neural networks without activation functions, experimentally demonstrating that such activation-free networks do have strong fitting ability. In addition, by analogy with applying GN to an entire sample in an MLP (stretching a single sample into a one-dimensional vector and then applying GN), they propose LN-G-Position. Using LN-G-Position on a ResNet without nonlinear layers achieves 86.66% accuracy on CIFAR-10, reflecting the strong expressive power of LN-G-Position.
The authors then conduct an experimental study on Transformers, replacing the original LN with LN-G. The results show that grouped layer normalization effectively improves Transformer performance, demonstrating the theory's feasibility in real networks.

Conclusion and Outlook

In "On the Nonlinearity of Layer Normalization," the authors theoretically prove, for the first time, the universal classification ability of a model containing only linear layers and LN, and give a lower bound on the VC dimension of the model at a given depth. The most important significance is that the analysis of the expressive ability of deep neural networks has taken a big step from traditional models toward the widely used modern networks, which may provide new ideas for future neural network architecture design.
