The first binary neural network with over 80% accuracy on ImageNet: BNext, and the five-year journey of -1 and +1

Two years ago, when MeliusNet came out, Machine Heart published a technical article, "The binary neural network that beats MobileNet for the first time: the three-year arduous journey of -1 and +1", reviewing the development history of BNNs. At the time, XNOR.AI, a startup built on the early BNN work XNOR-Net, had just been acquired by Apple, and many wondered whether this low-power, high-performance binary neural network technology would soon open up broad application prospects.

In the two years since, however, little information has emerged about how Apple, which keeps the technology strictly confidential, actually applies BNNs, and no particularly eye-catching application cases have appeared in academia or industry either. On the other hand, as the number of terminal devices skyrockets, edge AI applications and markets are growing rapidly: an estimated 50 to 125 billion edge devices will be in production by 2030, and the edge computing market is expected to surge to US$60 billion. Several application areas are currently booming: AIoT, the metaverse, and robotic terminal devices. The relevant industries are accelerating deployment, and AI capabilities are already embedded in many of their core technical links, such as the widespread use of AI in 3D reconstruction, video compression, and real-time scene perception for robots. Against this background, the industry's demand for high-energy-efficiency, low-power AI technology, software tools, and hardware acceleration at the edge has become increasingly urgent.

Currently, two main bottlenecks restrict the application of BNNs: first, the inability to effectively close the accuracy gap with traditional 32-bit deep learning models; second, the lack of high-performance algorithm implementations on different hardware, so that speedups reported in machine learning papers often do not translate to the GPU or CPU you are actually using. The second bottleneck probably stems from the first: because BNNs have not reached satisfactory accuracy, they have failed to attract broad attention from practitioners in systems and hardware acceleration, while the machine learning algorithm community can rarely develop high-performance hardware code on its own. Achieving both high accuracy and strong acceleration in BNN applications or accelerators will therefore require collaboration between developers from these two different fields.

Why BNNs are computationally and memory efficient

The first significant advantage of BNNs is memory efficiency: representing each parameter with 1 bit instead of 32 shrinks storage by a factor of 32. For example, Meta's recommendation model DLRM stores its weights and activations as 32-bit floating-point numbers, and the model is approximately 2.2 GB in size; a binarized version, with only a small reduction in accuracy, would occupy roughly 70 MB.
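The storage arithmetic above can be sketched in a few lines (the parameter count below is a hypothetical round figure back-derived from the 2.2 GB size, not taken from the DLRM paper):

```python
BITS_FP32, BITS_BIN = 32, 1

def model_size_bytes(n_params: int, bits: int) -> float:
    """Storage needed for n_params parameters at a given bit width."""
    return n_params * bits / 8

# Hypothetical parameter count back-derived from a ~2.2 GB fp32 model.
n_params = int(2.2 * 1024**3 / 4)

fp32_gb = model_size_bytes(n_params, BITS_FP32) / 1024**3
bin_mb = model_size_bytes(n_params, BITS_BIN) / 1024**2
print(f"fp32: {fp32_gb:.1f} GB -> binary: {bin_mb:.0f} MB (32x smaller)")
# → fp32: 2.2 GB -> binary: 70 MB (32x smaller)
```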

The second significant advantage of BNNs is extremely efficient computation. Using only 1 bit, i.e. two states, to represent each variable means that all operations can be completed with bit operations: with AND gates, XOR gates, and the like, traditional multiply-accumulate operations can be replaced. Bit operations are the basic units of a circuit, and anyone familiar with circuit design knows that shrinking the area of the multiply-accumulate units and reducing off-chip memory access are the most effective ways to cut power consumption; BNNs have unique advantages on both the memory and the computation side. WRPN [1] demonstrated that on customized FPGAs and ASICs, BNNs can achieve 1,000x power savings compared with full precision. The more recent BoolNet [2] demonstrated a BNN structural design that uses almost no floating-point operations and maintains a purely binary information flow, achieving an excellent power-accuracy trade-off in ASIC simulation.
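To see in miniature why this is cheap: a dot product between two {-1, +1} vectors packed into integer bitmasks reduces to a single XOR plus a popcount, replacing n multiply-adds. This is a minimal illustrative sketch, not the kernel used in any of the cited papers:

```python
def pack(vec):
    """Pack a {-1, +1} vector into an integer bitmask (bit set = +1)."""
    bits = 0
    for i, v in enumerate(vec):
        if v == 1:
            bits |= 1 << i
    return bits

def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed {-1, +1} vectors of length n:
    matching bits contribute +1, differing bits -1, so the result
    is n minus twice the popcount of the XOR."""
    return n - 2 * bin(a_bits ^ b_bits).count("1")

a, b = [1, -1, 1, 1], [1, 1, -1, 1]
assert binary_dot(pack(a), pack(b), 4) == sum(x * y for x, y in zip(a, b))
print(binary_dot(pack(a), pack(b), 4))  # → 0
```

On real hardware the popcount maps to a single instruction, which is where the large energy and area savings come from.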

What does the first BNN with 80% accuracy look like?

Nianhui Guo, Haojin Yang, and colleagues at the Hasso Plattner Institute in Germany have proposed the BNext model, the first BNN to achieve over 80% top-1 classification accuracy on the ImageNet dataset:


Figure 1: Performance comparison of SOTA BNNs on ImageNet


Paper address: https://arxiv.org/pdf/2211.12933.pdf

The authors first used loss-landscape visualizations to compare in depth the large difference in optimization friendliness between mainstream BNN models and their 32-bit counterparts (Figure 2), and proposed that the rough loss landscape of BNNs is one of the main obstacles preventing the research community from pushing their performance boundary further.

Based on this hypothesis, the authors use novel structural design to improve the optimization friendliness of BNNs, constructing a binary neural network architecture with a smoother loss landscape to reduce the difficulty of optimizing high-accuracy BNN models. Specifically, the authors point out that model binarization greatly limits the feature patterns available for forward propagation, forcing binary convolutions to extract and process information only within a restricted feature space. The optimization difficulties caused by this restricted feed-forward mode can be effectively alleviated by structural design at two levels: (1) constructing a flexible feature calibration module between adjacent convolutions to improve the model's adaptability to binary representations; (2) exploring efficient bypass structures to alleviate the information bottleneck caused by feature binarization in forward propagation.
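The bottleneck the authors describe is easy to see in miniature: sign() collapses every activation to one bit, and an affine transform applied before binarization (a toy stand-in for the paper's learned calibration, with made-up numbers) changes which features survive it:

```python
import numpy as np

def binarize(x):
    """1-bit quantization: every activation collapses to -1 or +1."""
    return np.where(x >= 0, 1.0, -1.0)

def calibrated_binarize(x, scale, shift):
    """Toy pre-binarization calibration: a learned affine transform
    before sign() shifts the distribution and changes which feature
    patterns survive the 1-bit bottleneck."""
    return binarize(scale * x + shift)

x = np.array([-0.3, 0.1, 0.4, -0.8])
print(binarize(x))                        # [-1.  1.  1. -1.]
print(calibrated_binarize(x, 1.0, 0.35))  # [ 1.  1.  1. -1.]
```

In BNext the calibration parameters are learned end to end, so the network can decide where the sign threshold should fall for each feature.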


Figure 2: Loss-landscape comparison of popular BNN architectures (2D contour view)

Based on the above analysis, the authors propose BNext, the first binary neural network architecture to exceed 80% accuracy on the ImageNet image classification task; the overall network architecture is shown in Figure 4. The authors first design a basic binary processing unit, the Info-Recoupling (Info-RCP) module. To address the information bottleneck between adjacent convolutions, a preliminary calibration of the binary convolution's output distribution is performed by introducing additional Batch Normalization and PReLU layers. The authors then construct a second, dynamic distribution calibration based on an inverted residual structure and a Squeeze-and-Expand branch. As shown in Figure 3, compared with the traditional Real2Binary calibration structure, the additional inverted residual structure fully accounts for the feature gap between a binary unit's input and output, avoiding a suboptimal calibration based entirely on input information. This two-stage dynamic distribution calibration effectively reduces the difficulty of feature extraction in the subsequent adjacent binary convolution layers.
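The Squeeze-and-Expand branch can be sketched as an SE-style per-channel gate. The shapes and weights below are invented for illustration; the actual Info-RCP wiring follows Figure 3 of the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squeeze_expand(x, w_down, w_up):
    """SE-style branch: global-average-pool each channel, squeeze it
    through a small bottleneck, and rescale every channel of x with
    a sigmoid gate -- a dynamic per-channel distribution calibration."""
    pooled = x.mean(axis=(1, 2))                             # squeeze: (C,)
    gate = sigmoid(w_up @ np.maximum(w_down @ pooled, 0.0))  # expand: (C,)
    return x * gate[:, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3, 3))    # (channels, height, width)
w_down = rng.standard_normal((2, 4))  # bottleneck down-projection
w_up = rng.standard_normal((4, 2))    # up-projection back to 4 channels
print(squeeze_expand(x, w_down, w_up).shape)  # (4, 3, 3)
```

Because the gate depends on the input itself, the calibration adapts per sample rather than being fixed after training.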


Figure 3: Comparison of convolution module designs

Second, the authors propose an enhanced binary Basic Block module combined with element-wise attention (ELM-Attention). They build the Basic Block by stacking multiple Info-RCP modules, introducing additional Batch Normalization and continuous residual connections to each Info-RCP module to further alleviate the information bottleneck between modules. Based on their analysis of how bypass structures affect binary model optimization, the authors propose using an element-wise matrix-multiplication branch to calibrate the output distribution of the first 3x3 Info-RCP module in each Basic Block. This additional spatial attention weighting mechanism lets the Basic Block fuse and distribute forward information more flexibly, improving the smoothness of the model's loss landscape. As shown in Figures 2.e and 2.f, the proposed module design significantly improves loss-landscape smoothness.


Figure 4: BNext architecture design. "Processor" represents the Info-RCP module, "BN" the Batch Normalization layer, "C" the basic width of the model, and "N" and "M" the depth scaling parameters of the model's different stages.


Table 1: The BNext series. "Q" represents the quantization settings of the input layer, SE branch, and output layer.

The authors combine the above structural designs with the popular MobileNetV1 baseline model and, by varying the scaling coefficients of model depth and width, construct a BNext model series at four complexity levels (Table 1): BNext-Tiny, BNext-Small, BNext-Middle, and BNext-Large.

Because of their relatively rough loss landscapes, current binary model optimization generally relies on the finer supervision signals provided by techniques such as knowledge distillation to escape widespread suboptimal convergence. The BNext authors are the first to consider the impact that a large gap between the prediction distributions of the teacher and the binary student can have during optimization, and they point out that selecting a teacher solely by model accuracy leads to counter-intuitive student overfitting. To solve this problem, they propose knowledge complexity (KC) as a new teacher-selection metric, which accounts for the correlation between the effectiveness of a teacher's output soft labels and the complexity of the teacher's parameters.


As shown in Figure 5, the authors measured and ranked the knowledge complexity of popular full-precision model families such as ResNet, EfficientNet, and ConvNeXt, and, with BNext-T as the student model, preliminarily verified the effectiveness of the metric; the ranking was then used to select teachers for knowledge distillation in subsequent experiments.


Figure 5: Counter-intuitive overfitting under different teacher choices, and the effect of knowledge complexity

On this basis, the authors further consider the optimization problems caused by the prediction-distribution gap in the early stage of training with a strong teacher, and propose Diversified Consecutive KD. The authors modulate the objective function during optimization by ensembling the knowledge of a strong and a weak teacher in combination. They further introduce a knowledge-boosting strategy, using multiple predefined candidate teachers to switch the weak teacher evenly during training, guiding the combined knowledge complexity from weak to strong in a curriculum-like manner and reducing the optimization interference caused by prediction-distribution differences.
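A hedged sketch of the strong/weak teacher combination follows; the mixing weight `alpha` and the KL direction are my guesses at a plausible form, not the paper's exact loss. The distillation target is a convex mix of the two teachers' soft labels, and ramping `alpha` toward 1 over training moves the combined knowledge from weak to strong in curriculum style:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q, eps=1e-9):
    """KL divergence KL(p || q) between two probability vectors."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def combined_kd_loss(student_logits, weak_logits, strong_logits, alpha):
    """Distill against a convex mix of a weak and a strong teacher;
    alpha ramps from 0 (all weak) to 1 (all strong) during training."""
    target = (1 - alpha) * softmax(weak_logits) + alpha * softmax(strong_logits)
    return kl(target, softmax(student_logits))

s = np.array([0.2, 1.5, -0.3])
print(combined_kd_loss(s, s, s, 0.5))  # → 0.0 (student matches both teachers)
```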


In terms of optimization techniques, the BNext authors consider the gains that data augmentation can bring to modern high-accuracy model optimization and provide the first analysis of how existing popular data augmentation strategies affect binary model optimization. The experimental results show that existing data augmentation methods are not fully suitable for binary models, which offers ideas for designing augmentation strategies specific to binary models in future research.

Based on the proposed architecture design and optimization methods, the authors validate their approach on the large-scale ImageNet-1k image classification task. The experimental results are shown in Figure 6.


Figure 6: Comparison with SOTA BNN methods on ImageNet-1k

Compared with existing methods, BNext-L pushes the performance boundary of binary models on ImageNet-1k to 80.57% for the first time, more than a 10% accuracy improvement over most existing methods. Compared with Google's PokeBNN, BNext-M is 0.7% higher with a similar parameter count; the authors also note that PokeBNN's optimization relies on far greater computing resources, such as a batch size of up to 8192 and 720 epochs of TPU-based training, whereas BNext-L iterates for only 512 epochs with a conventional batch size of 512, reflecting the effectiveness of BNext's structural design and optimization methods. In comparisons against the same baseline models, both BNext-T and BNext-18 show large accuracy gains. Compared with full-precision models such as RegNetY-4G (80.0%), BNext-L demonstrates matching visual representation learning capability while using only a limited parameter budget and computational complexity, leaving rich room for imagination for edge deployment and for downstream vision models built on binary feature extractors.

What next?

The BNext authors mention in the paper that they and their collaborators are actively implementing this high-accuracy BNN architecture on GPUs and verifying its runtime efficiency, and that they plan to extend it to a wider range of hardware platforms in the future. In this editor's view, however, the greater significance of this work may be that it reshapes our imagination of BNNs' application potential, restores the community's confidence in BNNs, and attracts the attention of more geeks in the systems and hardware fields. In the long term, as more and more applications migrate from cloud-centric computing paradigms to decentralized edge computing, the massive number of future edge devices will demand more efficient AI technologies, software frameworks, and hardware computing platforms. Yet today's most mainstream AI models and computing architectures are not designed or optimized for edge scenarios. Until the answer for edge AI is found, BNNs will remain an important option full of technical challenges and enormous potential.

Statement
This article is reproduced from 51CTO.COM.