Nearly ten thousand people watched Hinton’s latest speech: Forward-forward neural network training algorithm, the paper has been made public
The NeurIPS 2022 conference is in full swing. Experts and scholars from many fields are exchanging ideas on topics such as deep learning, computer vision, large-scale machine learning, learning theory, optimization, and sparsity theory.
At the conference, Turing Award winner and deep learning pioneer Geoffrey Hinton was invited to speak after the paper "ImageNet Classification with Deep Convolutional Neural Networks", co-written ten years ago with his graduate students Alex Krizhevsky and Ilya Sutskever, received the Test of Time Award for its "tremendous impact" on the field. Published in 2012, the work marked the first time a deep convolutional neural network won the ImageNet image recognition competition by a wide margin, and it was a key event in launching the third wave of artificial intelligence.
The theme of Hinton’s speech was "The Forward-Forward Algorithm for Training Deep Neural Networks". In the talk, Hinton said, "The machine learning research community has been slow to realize the impact of deep learning on the way computers are built." He believes the machine learning form of artificial intelligence will trigger a revolution in computer systems: a new combination of software and hardware that can put AI "into your toaster".
He continued, "I think we're going to see a completely different kind of computer, and it won't be possible for a few years. But there are good reasons to work on this completely different kind of computer."
All digital computers to date have been built to be "immortal": the hardware is designed to be extremely reliable so that the same software can run everywhere. "We can run the same program on different physical hardware, and the knowledge is immortal."
Hinton said that this design requirement means digital computers have missed out on the "variable, random, unstable, analog, and unreliable" properties of hardware that might be very useful to us.
In Hinton’s view, future computer systems will take a different approach: they will be "neuromorphic" and "mortal". This means every computer will be a tight marriage of neural network software and messy hardware, in the sense of having analog rather than digital components, which can contain an element of uncertainty and evolve over time.
Hinton explained, "The alternative now is that we would give up the separation of hardware and software, but computer scientists really don't like that."
So-called mortal computation means that the knowledge learned by the system is inseparable from the hardware. These mortal computers could be "grown", doing away with expensive chip fabrication plants.
Hinton pointed out that if we do this, we can use extremely low-power analog computation and use memristor weights to perform terascale parallel processing; memristors are a decades-old class of experimental, nonlinear circuit components. Additionally, we could evolve hardware without fully understanding the precise behavior of different pieces of hardware.
However, Hinton also said the new mortal computers will not replace traditional digital computers. "It is not the computer that controls your bank account, and it doesn't know exactly how much money you have."
Such a computer is for putting something else in: it could put something like GPT-3 into your toaster for a dollar, so you could talk to your toaster using only a few watts of power.
In the speech, Hinton spent most of the time on a new neural network method he calls the Forward-Forward (FF) network, which replaces the backpropagation technique used in almost all neural networks. Hinton proposed that by removing backpropagation, forward-forward networks might more plausibly approximate what happens in the brain.
A draft of the paper is posted on Hinton's University of Toronto homepage:
Paper link: https://www.cs.toronto.edu/~hinton/FFA13.pdf
Hinton said that the FF method may be better suited to mortal computing hardware. "To do something like this currently, we have to have a learning program that's going to run in proprietary hardware, and it has to learn to exploit the specific properties of that proprietary hardware, without knowing what all those properties are. But I think the forward-forward algorithm is an option with potential." One obstacle to building new analog computers, he said, is the importance placed on being able to run one piece of software reliably on millions of devices. "Each of these phones has to start out as a baby phone, and it has to learn how to become a phone," Hinton said. "And it's very painful."
Even the most skilled engineers will be reluctant to give up the paradigm of perfect, identical, immortal computers for fear of uncertainty.
Hinton said: "There are still very few people among those interested in analog computing who are willing to give up immortality. That's because of the attachment to consistency and predictability. But if you want the analog hardware to do exactly the same thing every time, sooner or later you run into real problems with all this messy stuff."
Paper content
In the paper, Hinton introduces a new neural network learning procedure and demonstrates experimentally that it works well enough on some small problems. The specific content is as follows.

What are the problems with backpropagation?
The success of deep learning over the past decade has established the effectiveness of performing stochastic gradient descent with large numbers of parameters and large amounts of data. The gradients are usually computed by backpropagation, which has led to interest in whether the brain implements backpropagation or whether there are other ways to obtain the gradients needed to adjust connection weights.

As a model of how the cerebral cortex learns, backpropagation remains implausible, despite considerable efforts to implement it with something like real neurons. There is currently no convincing evidence that the cerebral cortex explicitly propagates error derivatives or stores neural activity for use in a subsequent backward pass. The top-down connections from one cortical area to an area earlier in the visual pathway do not mirror the bottom-up connections, as would be expected if backpropagation were being used in the visual system. Instead, they form loops in which neural activity passes through about half a dozen cortical layers in the two areas and then arrives back where it started.
Backpropagation through time is particularly implausible as a way of learning sequences. To process a stream of sensory input without taking frequent time-outs, the brain needs to pipeline sensory data through different stages of sensory processing, and it needs a learning procedure that can learn "on the fly". Representations in later stages of the pipeline may provide top-down information that influences representations in earlier stages at subsequent time steps, but the perceptual system needs to infer and learn in real time without stopping to perform backpropagation.
Another serious limitation of backpropagation is that it requires complete knowledge of the calculations performed in the forward pass in order to calculate the correct derivatives. If we insert a black box in the forward pass, then backpropagation is no longer possible unless we learn a differentiable model of the black box. As we will see, the black box does not change the learning procedure of the FF algorithm at all, since there is no need to backpropagate through it.
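As a rough illustration of this point (my own sketch, not code from the paper), the snippet below inserts a non-differentiable "black box" into a small PyTorch network's forward pass. The quantization function and layer sizes are arbitrary choices: the point is only that the layer after the box still receives gradients, while everything before it receives none, so backpropagation cannot train it.

```python
import torch
import torch.nn as nn

def black_box(x: torch.Tensor) -> torch.Tensor:
    # Stand-in for unknown hardware in the forward pass: autograd cannot see inside it.
    with torch.no_grad():
        return torch.round(x * 4) / 4  # e.g. some coarse, non-differentiable quantization

layer1 = nn.Linear(10, 32)
layer2 = nn.Linear(32, 2)

x = torch.randn(8, 10)
h = black_box(torch.relu(layer1(x)))  # gradient flow stops here
loss = layer2(h).pow(2).mean()
loss.backward()

print(layer2.weight.grad is not None)  # True: the layer after the box still gets gradients
print(layer1.weight.grad is None)      # True: layers before the box get nothing
```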
In the absence of a perfect model of the forward pass, one might resort to one of the many forms of reinforcement learning. The idea is to apply random perturbations to the weights or neural activities and correlate these perturbations with changes in a payoff function. But reinforcement learning procedures suffer from high variance: it is hard to see the effect of perturbing one variable when many other variables are being perturbed at the same time. To average away the noise caused by all the other perturbations, the learning rate needs to be inversely proportional to the number of variables being perturbed, which means reinforcement learning scales badly and cannot compete with backpropagation for large networks containing millions or billions of parameters.
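To make the scaling argument concrete, here is a small, self-contained sketch (my own construction, using plain NumPy and a toy quadratic loss, not anything from the paper) of perturbation-based learning: all weights are nudged with random noise, and the change in loss is correlated with the noise to form a gradient estimate. With the learning rate scaled down by the number of perturbed variables, progress is visibly slow even on this trivial problem.

```python
import numpy as np

rng = np.random.default_rng(0)
n_params = 1000                      # number of weights perturbed simultaneously
w = rng.normal(size=n_params)

def loss(w):
    return 0.5 * np.sum(w ** 2)      # toy objective with minimum at w = 0

sigma = 0.01                         # perturbation size
lr = 0.1 / n_params                  # learning rate shrunk with the number of variables

print("initial loss:", loss(w))
for step in range(2000):
    noise = rng.normal(scale=sigma, size=n_params)
    delta = loss(w + noise) - loss(w)        # effect of perturbing everything at once
    w -= lr * (delta / sigma ** 2) * noise   # noisy, high-variance gradient estimate
print("loss after 2000 perturbation steps:", loss(w))
```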
The main point of this paper is that neural networks containing unknown nonlinearities do not need to resort to reinforcement learning. The FF algorithm is comparable in speed to backpropagation, but has the advantage that it can be used when the precise details of the forward computation are unknown. It also has the advantage that it can learn while sequential data is being pipelined through the network, without storing neural activities or stopping to propagate error derivatives.
Generally speaking, the FF algorithm is somewhat slower than backpropagation, and on the several toy problems studied in this paper its generalization is slightly worse, so it is unlikely to replace backpropagation in applications where power is not a constraint. The exploration of very large models trained on very large datasets will continue to use backpropagation. The FF algorithm may be superior to backpropagation in two respects: as a model of learning in the cerebral cortex, and as a way of using very low-power analog hardware without resorting to reinforcement learning.
The Forward-Forward algorithm is a greedy multi-layer learning procedure inspired by Boltzmann machines and noise contrastive estimation. The idea is to replace the forward and backward passes of backpropagation with two forward passes that operate in exactly the same way as each other, but on different data and with opposite objectives. The positive pass operates on real data and adjusts the weights to increase the goodness in every hidden layer; the negative pass operates on negative data and adjusts the weights to decrease the goodness in every hidden layer. A minimal sketch of a single layer's training step follows.
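Below is a minimal, illustrative sketch of one Forward-Forward layer in PyTorch. It is not the paper's code: the class structure, threshold value, optimizer, and placeholder data are my own assumptions, and autograd is used only within the layer for convenience. The point it illustrates is that goodness is taken to be the sum of squared activities, a logistic-style loss pushes goodness above a threshold on positive data and below it on negative data, and no gradients flow between layers.

```python
import torch
import torch.nn as nn

class FFLayer(nn.Module):
    """One Forward-Forward layer: trained locally, no backprop from later layers."""
    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.threshold = threshold
        self.opt = torch.optim.SGD(self.linear.parameters(), lr=lr)

    def forward(self, x):
        # Normalize the input length so a layer cannot just pass its own goodness along.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-4)
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)  # goodness on positive (real) data
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)  # goodness on negative data
        # Logistic loss: push positive goodness above the threshold, negative below it.
        loss = torch.nn.functional.softplus(
            torch.cat([self.threshold - g_pos, g_neg - self.threshold])).mean()
        self.opt.zero_grad()
        loss.backward()   # gradients stay inside this layer
        self.opt.step()
        # Hand detached activations to the next layer: no gradients flow between layers.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()

# Usage sketch: train a small stack greedily, one layer at a time per batch.
layers = [FFLayer(784, 500), FFLayer(500, 500)]
x_pos = torch.rand(64, 784)   # placeholder positive data (e.g. image with correct label embedded)
x_neg = torch.rand(64, 784)   # placeholder negative data (e.g. image with wrong label embedded)
for layer in layers:
    x_pos, x_neg = layer.train_step(x_pos, x_neg)
```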
In the paper, Hinton demonstrated the performance of the FF algorithm through experiments on CIFAR-10.
CIFAR-10 has 50,000 training images of 32 x 32 pixels with three color channels per pixel, so each image has 3,072 dimensions. The backgrounds of these images are complex and highly variable, and cannot be modeled well with such limited training data. Generally speaking, fully connected networks with two or three hidden layers trained with backpropagation overfit badly unless the hidden layers are very small, so almost all reported results are for convolutional networks.
Since FF is intended for networks in which weight sharing is not feasible, it was compared with backpropagation networks that use local receptive fields to limit the number of weights without overly limiting the number of hidden units (see the sketch below). The aim is simply to show that, with a large number of hidden units, FF performs comparably to backpropagation on images containing highly variable backgrounds.
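For readers unfamiliar with the term, "local receptive fields without weight sharing" describes a locally connected layer: each output position looks at a small patch of the input but keeps its own private weights, instead of sharing one convolutional filter across positions. The sketch below is my own construction in PyTorch, not the paper's architecture or hyperparameters, and is only meant to show the idea.

```python
import torch
import torch.nn as nn

class LocallyConnected2d(nn.Module):
    """Local receptive fields, but every output position has its own weights."""
    def __init__(self, in_ch, out_ch, in_size, kernel, stride=1):
        super().__init__()
        self.kernel, self.stride = kernel, stride
        self.out_size = (in_size - kernel) // stride + 1
        n_pos = self.out_size * self.out_size
        # One weight matrix per output position: no sharing across positions.
        self.weight = nn.Parameter(
            torch.randn(n_pos, out_ch, in_ch * kernel * kernel) * 0.01)
        self.bias = nn.Parameter(torch.zeros(n_pos, out_ch))

    def forward(self, x):                       # x: (batch, in_ch, H, W)
        patches = nn.functional.unfold(x, self.kernel, stride=self.stride)
        patches = patches.transpose(1, 2)       # (batch, n_pos, in_ch*k*k)
        out = torch.einsum('bpi,poi->bpo', patches, self.weight) + self.bias
        return out.transpose(1, 2).reshape(     # (batch, out_ch, out_size, out_size)
            x.size(0), -1, self.out_size, self.out_size)

# Usage sketch with arbitrary sizes:
layer = LocallyConnected2d(3, 16, in_size=32, kernel=10, stride=6)
y = layer(torch.randn(4, 3, 32, 32))            # y has shape (4, 16, 4, 4)
```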
Table 1 shows the test performance of networks trained with backpropagation and FF, both of which use weight decay to reduce overfitting.
For more research details, please refer to the original paper.