Are modular machine learning systems enough? Bengio and his students tell you the answer

WBOY

Apr 12, 2023, 10:49 PM

Deep learning researchers draw inspiration from neuroscience and cognitive science. From hidden units and input methods to the design of network connections and architectures, many breakthrough studies imitate strategies used by the brain. In recent years, modularity and attention have frequently been combined in artificial networks, with impressive results.

In fact, cognitive neuroscience research suggests that the cerebral cortex represents knowledge in a modular way, with communication between modules and an attention mechanism for content selection. This is the combination of modularity and attention mentioned above. Recent research has suggested that this mode of communication in the brain may inform the inductive biases of deep networks. The sparsity of dependencies between these high-level variables breaks knowledge down into recombinable fragments that are as independent as possible, making learning more efficient.

Although much recent research relies on such modular architectures, researchers use a large number of tricks and architectural modifications, which makes it challenging to work out which architectural principles are actually useful.

Machine learning systems are gradually revealing the advantages of sparser and more modular architectures. Modular architectures not only generalize well in-distribution, but also bring better out-of-distribution (OoD) generalization, scalability, learning speed, and interpretability. A key to the success of such systems is that the data-generating systems found in real-world settings are believed to consist of sparsely interacting parts, so it helps to give the model a similar inductive bias. However, because these real-world data distributions are complex and unknown, the field has lacked rigorous quantitative evaluations of these systems.

In a recent paper, three researchers from the University of Montreal in Canada, Sarthak Mittal, Yoshua Bengio, and Guillaume Lajoie, use simple, known modular data distributions to carry out a comprehensive evaluation of common modular architectures. The study highlights the benefits of modularity and sparsity and offers insights into the challenges of optimizing modular systems. The first author and corresponding author, Sarthak Mittal, is a master's student of Bengio and Lajoie.

Paper address: https://arxiv.org/pdf/2206.02713.pdf
GitHub Address: https://github.com/sarthmit/Mod_Arch

Specifically, this study extends the analysis of Rosenbaum et al. and proposes a method to evaluate, quantify, and analyze the common components of modular architectures. To this end, the researchers developed a series of benchmarks and metrics designed to probe the effectiveness of modular networks. These reveal valuable insights that help identify not only where current approaches succeed, but also when and how they fail.

The contribution of this study can be summarized as:

The study develops benchmark tasks and metrics based on probabilistic selection rules, and uses them to quantify two important phenomena in modular systems: collapse and specialization.
The study distills commonly used modular inductive biases and evaluates them systematically through a series of models designed to isolate common architectural properties (the Monolithic, Modular, Modular-op, and GT-Modular models).
The study finds that specialization in a modular system can significantly improve model performance when a task contains many underlying rules, but not when it contains only a few.
The study finds that standard modular systems tend to be suboptimal both in their ability to focus on the right information and in their ability to specialize, which suggests that additional inductive biases are needed.

Definition / Terminology

In this paper, the researchers explore how a range of modular systems perform on common tasks formulated through a synthetic data generation process that they call rule data. They define the key components: (1) rules and how rules form tasks, (2) modules and the model architectures they can adopt, and (3) specialization and how models are evaluated. The detailed setup is shown in Figure 1 of the paper.

Rule. To properly understand modular systems and analyze their advantages and disadvantages, the researchers consider a synthetic setup that allows fine-grained control over different task requirements. In particular, the model must learn operations, which they call rules, on the data-generating distributions given in Equations 1-3 of the paper.

Given this distribution, the researchers define a rule as one expert of it: rule r is defined as p_y(· | x, c = r), where c is a categorical variable representing the context and x is the input sequence.

Task. A task is described by a set of rules (data generating distributions) shown in Equation 1-3. Different sets of {p_y(· | x, c)}_c mean different tasks. For a given number of rules, the model is trained on multiple tasks to eliminate any task-specific bias.
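Read together, the rule and task definitions describe a generative process of roughly the following shape. This is a sketch inferred from the prose: the symbols $p_c$ and $p_x$ for the context and input distributions are assumed notation, and only $p_y(\cdot \mid x, c)$ appears in the text.

```latex
\begin{align}
  c &\sim p_c(\cdot)                 && \text{(rule / context selection, categorical)} \\
  x &\sim p_x(\cdot)                 && \text{(input)} \\
  y &\sim p_y(\cdot \mid x, c)       && \text{(output under the selected rule)}
\end{align}
% Rule r is the conditional p_y(. | x, c = r);
% a task is the whole family { p_y(. | x, c) }_c.
```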

Module. A modular system consists of a set of neural network modules, where each module contributes to the overall output. This can be seen through the following functional form.

y = Σ_m p_m · y_m

where y_m denotes the output and p_m the activation of the m-th module.

Model architecture. The model architecture describes which architecture is chosen for each module of a modular system, or for the single module of a monolithic system. In this paper, the researchers consider multi-layer perceptrons (MLP), multi-head attention (MHA), and recurrent neural networks (RNN). Importantly, the rules (or data-generating distributions) are adapted to fit the model architecture, for example MLP-based rules for MLP modules.

Data generation process

Since the researchers' goal is to explore modular systems through synthetic data, they describe in detail the data generation process built on the rule scheme above. Specifically, they use a simple mixture-of-experts (MoE) style data generation process, in the hope that different modules will specialize to different experts among the rules.

They explain the data generation process for three model architectures: MLP, MHA, and RNN. In addition, each task comes in two versions: regression and classification.

MLP. The researchers define a data scheme suited to learning with modular MLP systems. In this synthetic scheme, a data sample consists of two independent numbers and a rule choice, each sampled from some distribution. Different rules produce different linear combinations of the two numbers as output; that is, the choice of linear combination is instantiated dynamically according to the rule, as given in Equations 4-6 of the paper.
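The MLP scheme can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the standard-normal inputs, the uniform choice of rule, the per-rule coefficient pairs, the sign threshold for the classification version, and the helper names `make_rules` and `sample_mlp_example`. The paper's Equations 4-6 give the actual distributions.

```python
import random


def make_rules(num_rules, seed=0):
    """Draw one (alpha, beta) coefficient pair per rule (assumed: standard normal)."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(num_rules)]


def sample_mlp_example(rules, rng, classification=False):
    """Sample two numbers and a rule index, then apply that rule's linear combination."""
    c = rng.randrange(len(rules))                    # rule choice (assumed uniform)
    x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # two independent inputs
    alpha, beta = rules[c]
    y = alpha * x1 + beta * x2                       # regression target
    if classification:
        y = 1 if y > 0 else 0                        # assumed: threshold at zero
    return (x1, x2, c), y
```

Because the rule index `c` is part of the input, a model can in principle route each sample to a module dedicated to that rule, which is exactly the specialization the benchmark is designed to test.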

MHA. Next, the researchers define a data scheme tailored to learning in a modular MHA system. They design a data-generating distribution with the following property: each rule consists of its own notions of search and retrieval, followed by a linear combination of the retrieved information. The process is described mathematically in Equations 7-11 of the paper.
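A rough sketch of what "search, retrieve, then combine linearly" could look like is given below. It is not the paper's construction: the token layout (per-rule query and value pairs), the nearest-neighbour search on absolute distance, the use of two searches per rule, and the names `sample_sequence`, `nearest`, and `target_for` are all assumptions chosen to make the idea concrete.

```python
import random


def sample_sequence(num_tokens, num_rules, rng):
    """Each token carries a rule index plus, per rule, two query and two value scalars."""
    return [
        {
            "c": rng.randrange(num_rules),
            "q": [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(num_rules)],
            "v": [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(num_rules)],
        }
        for _ in range(num_tokens)
    ]


def nearest(tokens, n, rule, slot):
    """Search step: index of the other token whose query (rule, slot) is closest to token n's."""
    target = tokens[n]["q"][rule][slot]
    others = [j for j in range(len(tokens)) if j != n]
    return min(others, key=lambda j: abs(tokens[j]["q"][rule][slot] - target))


def target_for(tokens, n, coeffs):
    """Retrieve one value per search and combine them with the rule's coefficients."""
    r = tokens[n]["c"]
    a, b = coeffs[r]
    j1 = nearest(tokens, n, r, 0)   # first search under rule r
    j2 = nearest(tokens, n, r, 1)   # second search under rule r
    return a * tokens[j1]["v"][r][0] + b * tokens[j2]["v"][r][1]
```

The search step plays the role of attention scores and the retrieval step the role of attention values, which is why a rule of this shape suits attention-based modules.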

RNN. For recurrent systems, the researchers define the rules as a linear dynamical system in which any one of several rules can be triggered at each time step. Mathematically, the process is given in Equations 12-15 of the paper.
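The recurrent case can be sketched as a switching linear dynamical system. The scalar state, the per-rule triple `(a, b, w)` for state transition, input weight, and readout, and the function name `simulate` are simplifying assumptions made here for illustration; the paper's Equations 12-15 define the actual system.

```python
def simulate(rules, inputs, contexts, h0=0.0):
    """Run a switching linear system: at each step the active rule picks the dynamics.

    rules    -- list of (a, b, w) triples, one per rule (assumed scalar parameters)
    inputs   -- sequence of scalar inputs x_t
    contexts -- sequence of rule indices c_t, one per time step
    """
    h = h0
    outputs = []
    for x, c in zip(inputs, contexts):
        a, b, w = rules[c]
        h = a * h + b * x        # state update under the active rule
        outputs.append(w * h)    # readout under the same rule
    return outputs
```

For example, with `rules = [(0.5, 1.0, 1.0), (1.0, 0.0, 2.0)]`, the first rule decays the state and adds the input, while the second holds the state and doubles the readout.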

Model

Some previous work has claimed that end-to-end trained modular systems outperform monolithic systems, especially in out-of-distribution settings. However, there has been no detailed, in-depth analysis of the benefits of these modular systems, or of whether they actually specialize according to the data-generating distribution.

The researchers therefore consider four types of models that allow different degrees of specialization: Monolithic, Modular, Modular-op, and GT-Modular. Table 1 of the paper illustrates these models.

Monolithic. A monolithic system is one large neural network that takes the whole data point (x, c) as input and makes a prediction ŷ from it. No inductive bias towards modularity or sparsity is explicitly baked into the system; it relies entirely on backpropagation to learn whatever functional form is needed to solve the task.

Modular. A modular system consists of many modules, each of which is a neural network of a given architecture type (MLP, MHA, or RNN). Each module m takes the data (x, c) as input and computes an output ŷ_m and a confidence score; the scores are normalized across modules to give the activation probability p_m.
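The combination step of such a system can be sketched as follows. The softmax is an assumption (the text only says the confidence scores are normalized across modules), and the function names are hypothetical; the per-module outputs and scores would come from the module networks, which are omitted here.

```python
import math


def softmax(scores):
    """Normalize raw confidence scores into activation probabilities (assumed softmax)."""
    top = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modular_output(module_outputs, scores):
    """Weight each module's output by its activation probability and sum."""
    p = softmax(scores)
    y = sum(p_m * y_m for p_m, y_m in zip(p, module_outputs))
    return y, p
```

When one score dominates, the system behaves like a single specialized module; when scores are equal, it averages all modules, which is one way specialization can fail to emerge.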

Modular-op. A Modular-op system is very similar to a Modular system, with one difference: instead of defining the activation probability p_m of module m as a function of (x, c), the researchers make the activation depend only on the rule context c.

GT-Modular. Ground-truth modular (GT-Modular) systems serve as an oracle baseline, i.e., a perfectly specialized modular system.
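The difference between the last two variants lies only in how the activations are produced, which the sketch below makes explicit. The table `context_logits` stands in for whatever learned function of c a real Modular-op system would use, and the one-to-one assignment of module c to rule c in the oracle is an assumption consistent with "perfectly specialized"; both function names are hypothetical.

```python
import math


def softmax(scores):
    """Normalize scores into probabilities."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modular_op_activation(c, context_logits):
    """Modular-op: activations depend on the rule context c only, never on x.

    context_logits[c] is a hypothetical learned row of per-module scores.
    """
    return softmax(context_logits[c])


def gt_modular_activation(c, num_modules):
    """GT-Modular oracle: module c is hard-assigned to rule c (assumed one-to-one)."""
    return [1.0 if m == c else 0.0 for m in range(num_modules)]
```

Neither function looks at x, so any remaining gap between Modular-op and GT-Modular reflects how well the learned routing matches the oracle assignment rather than distraction by the input.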

Researchers show that from Monolithic to GT-Modular, models increasingly include inductive biases for modularity and sparsity.

Metrics

To evaluate modular systems reliably, the researchers propose a series of metrics that measure not only the performance benefits of such systems, but also two important phenomena: collapse and specialization.

Performance. The first set of evaluation metrics is based on performance in both in-distribution and out-of-distribution (OoD) settings, and reflects how the different models perform across tasks. For the classification setting, the researchers report the classification error; for the regression setting, they report the loss.

Collapse. The researchers propose two metrics, Collapse-Avg and Collapse-Worst, to quantify how much collapse a modular system suffers (i.e., the extent to which modules are underutilized). Figure 2 of the paper shows an example in which module 3 is not used.
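To make the idea of underutilized modules concrete, here is one way such metrics could be computed from module activations. The formulas are an assumption chosen to match the verbal description (zero when all modules are used equally, larger when some are neglected); the paper gives the exact definitions of Collapse-Avg and Collapse-Worst, which may differ.

```python
def module_usage(activations):
    """Average activation of each module over a batch; each row should sum to 1."""
    n = len(activations)
    num_modules = len(activations[0])
    return [sum(row[m] for row in activations) / n for m in range(num_modules)]


def collapse_worst(usage):
    """Assumed form: 1 - M * min usage. Equals 1 when some module is never used."""
    return 1.0 - len(usage) * min(usage)


def collapse_avg(usage):
    """Assumed form: mean shortfall of each module relative to uniform usage 1/M."""
    m = len(usage)
    return sum(max(0.0, 1.0 - m * u) for u in usage) / m
```

With three modules where the third is never selected, as in the example mentioned above, the usage vector is `[0.5, 0.5, 0.0]`: the worst-case score reaches its maximum, while the average score stays lower because only one of three modules has collapsed.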

Specialization. To complement the collapse metrics, the researchers also propose three metrics that quantify the degree of specialization a modular system achieves: (1) alignment, (2) adaptation, and (3) inverse mutual information.
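As an illustration of the third metric's main ingredient, the sketch below computes the plain mutual information between the rule and the selected module from a joint probability table. How the paper turns this into "inverse" mutual information (for instance by normalizing or inverting it) is not stated here, so that step is deliberately left out; the function name and the table layout are assumptions.

```python
import math


def mutual_information(joint):
    """I(rule; module) in nats, from joint[r][m] = P(rule = r, module = m)."""
    p_rule = [sum(row) for row in joint]
    p_module = [sum(row[m] for row in joint) for m in range(len(joint[0]))]
    mi = 0.0
    for r, row in enumerate(joint):
        for m, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (p_rule[r] * p_module[m]))
    return mi
```

A perfectly specialized system, in which each rule always activates its own module, maximizes this quantity; a system whose module choice ignores the rule drives it to zero.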

Experiment

The accompanying pie charts show that the GT-Modular system is the best model in most cases (left chart), which indicates that specialization is beneficial. They also show that the standard end-to-end trained modular system outperforms the monolithic system, but not by much. Together, the two pie charts demonstrate that current end-to-end trained modular systems do not achieve good specialization and are therefore largely suboptimal.

The study then looks at specific architectural choices and analyzes their performance and trends as the number of rules grows.

Figure 4 shows that while a perfectly specialized system (GT-Modular) brings benefits, a typical end-to-end trained modular system is suboptimal and fails to realize them, especially as the number of rules increases. Furthermore, while such end-to-end modular systems often outperform monolithic systems, the advantage is usually small.

Figure 7 shows the training behavior of the different models, averaged over all other settings; the average covers both classification error and regression loss. As can be seen, good specialization not only leads to better performance, but also speeds up training.

The final figures show the two collapse metrics, Collapse-Avg and Collapse-Worst, and the three specialization metrics (alignment, adaptation, and inverse mutual information) for the different models as the number of rules varies.

Statement

This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to request its removal.
