KAN, which replaces MLP, has been extended to convolution by open source projects
Early this month, researchers from MIT and other institutions proposed KAN (Kolmogorov-Arnold Network), a very promising alternative to MLP.
KAN outperforms MLP in terms of accuracy and interpretability, and it can match or beat a much larger MLP while using far fewer parameters. For example, the authors stated that they used KAN to reproduce DeepMind's results with a smaller network and a higher degree of automation: DeepMind's MLP has about 300,000 parameters, while the KAN has only about 200.
KAN rests on a mathematical foundation as strong as MLP's: MLP is based on the universal approximation theorem, while KAN is based on the Kolmogorov-Arnold representation theorem.
As shown in the figure below, KAN has activation functions on edges, while MLP has activation functions on nodes. Overall, KAN appears to be more parameter-efficient than MLP, although each KAN layer carries more parameters than an MLP layer of the same width.

[Figure 1: Schematic comparison of KAN and MLP. KAN is an edge-based structure: each edge carries a learnable activation function, and information propagates by summing the edge outputs at each node. MLP is a node-based structure: each node applies a fixed activation function to a weighted sum of its inputs.]
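To illustrate this structural difference, the sketch below contrasts a single MLP layer with a single KAN layer. It is a toy example with made-up shapes; the small polynomial used in place of the learnable spline is an assumption made for brevity, not how the paper parameterizes ϕ.

import numpy as np

# Toy contrast between an MLP layer and a KAN layer (illustrative sketch only).
# In an MLP, a fixed activation is applied at each node after a linear transform;
# in a KAN, each edge (i -> j) carries its own learnable 1D function phi_ji,
# and each node simply sums the outputs of its incoming edges.

rng = np.random.default_rng(0)
n_in, n_out = 4, 3
x = rng.normal(size=n_in)

# MLP layer: y_j = sigma(sum_i W_ji * x_i)
W = rng.normal(size=(n_out, n_in))
mlp_out = np.tanh(W @ x)

# KAN layer: y_j = sum_i phi_ji(x_i), with a toy learnable phi per edge
# (a small polynomial stands in for the spline-based phi of the paper).
coeffs = rng.normal(size=(n_out, n_in, 3))      # 3 coefficients per edge
powers = x[None, :, None] ** np.arange(3)       # shape (1, n_in, 3)
kan_out = (coeffs * powers).sum(axis=(1, 2))    # sum over edges and coefficients

print(mlp_out.shape, kan_out.shape)  # (3,) (3,)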
Extending this idea to convolutional neural networks, the classic linear transformation of convolution is replaced with a learnable nonlinear activation function applied to each pixel, and KAN convolution (CKAN) has been proposed and open-sourced.

Project address: https://github.com/AntonioTepsich/Convolutional-KANs
KAN Convolution
KAN convolution is very similar to ordinary convolution, but instead of applying a dot product between the kernel and the corresponding pixels in the image, a learnable nonlinear activation function is applied to each element and the results are summed. The kernel of the KAN convolution is equivalent to a KAN linear layer with 4 input neurons and 1 output neuron. For each input x_i, a learnable function ϕ_i is applied, and the resulting pixel of this convolution step is the sum of the ϕ_i(x_i).
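To make the mechanism concrete, here is a minimal, self-contained sketch of a single KAN convolution pass. It is an illustrative toy, not the project's actual implementation: ToyPhi and the polynomial standing in for the spline term are assumptions made for brevity.

import numpy as np

def silu(x):
    """SiLU activation, x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

class ToyPhi:
    """Toy learnable nonlinearity phi(x) = w_b * silu(x) + w_s * p(x),
    where p(x) is a small polynomial standing in for the spline term."""
    def __init__(self, rng):
        self.w_b = rng.normal()
        self.w_s = rng.normal()
        self.coeffs = rng.normal(size=3)  # stand-in for spline coefficients

    def __call__(self, x):
        p = self.coeffs[0] + self.coeffs[1] * x + self.coeffs[2] * x ** 2
        return self.w_b * silu(x) + self.w_s * p

def kan_conv2d(image, phis, k):
    """Valid-mode 2D 'convolution': each of the k*k kernel positions applies
    its own phi to the corresponding pixel, and the results are summed."""
    h, w = image.shape
    out = np.zeros((h - k + 1, w - k + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + k, c:c + k].ravel()
            out[r, c] = sum(phi(x) for phi, x in zip(phis, patch))
    return out

rng = np.random.default_rng(0)
k = 2
phis = [ToyPhi(rng) for _ in range(k * k)]  # one learnable phi per kernel element
image = rng.normal(size=(5, 5))
print(kan_conv2d(image, phis, k).shape)     # (4, 4)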
Parameters in KAN convolution
Assume there is a KxK kernel. For each element of this matrix there is a ϕ whose parameter count is gridsize + 1. Following the KAN formulation, ϕ is defined as:

ϕ(x) = w_b·b(x) + w_s·spline(x), with b(x) = silu(x) = x / (1 + e^(-x))

The base activation function b gives ϕ more expressibility, and the parameter count per element of the linear layer becomes gridsize + 2. Therefore, KAN convolution has a total of K^2(gridsize + 2) parameters, while ordinary convolution has only K^2.
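As a quick sanity check of this count, here is a minimal sketch with illustrative values (K = 3 and gridsize = 5 are assumptions for the example, not values taken from the project):

# Parameter count of a KAN convolution kernel vs. an ordinary convolution kernel,
# using illustrative values K = 3 and gridsize = 5.
K, gridsize = 3, 5
kan_conv_params = K**2 * (gridsize + 2)    # each phi carries gridsize + 2 parameters
plain_conv_params = K**2                   # one weight per kernel element
print(kan_conv_params, plain_conv_params)  # 63 9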
Initial evaluation
The different architectures tested by the authors include:
KAN convolutional layers (CKAN) connected to an MLP at the end
It is worth noting that the authors published these "preliminary" results because they hope to introduce the idea to the community as soon as possible and encourage broader research.
Each element of the list describing the convolutional layers contains the number of convolutions and the corresponding kernel size.
On the 28x28 MNIST dataset, the KANConv & MLP model achieves acceptable accuracy compared to ConvNet (Big), with the difference that KANConv & MLP requires roughly 7 times fewer parameters than the standard ConvNet. Furthermore, KKAN's accuracy is only 0.04 lower than ConvNet Medium's, with almost half the parameters (94k vs. 157k), which shows the potential of this architecture. Experiments on more datasets are still needed before drawing firm conclusions.
In the coming days and weeks, the authors will also thoroughly tune their models and the hyperparameters of the models used for comparison. Some hyperparameter and architecture variations have been tried, but only heuristically, without following any precise approach. Due to computational power and time constraints, they have not yet worked with larger or more complex datasets, and they are working to address this.
In the future, the authors will run experiments on more complex datasets, which means the parameter count of the KANs will grow, since more KAN convolutional layers will need to be implemented.
For now, the authors say they have not seen a significant performance improvement from KAN convolutional networks compared with traditional convolutional networks. Their analysis attributes this to the use of simple datasets and models. Compared with the best architecture tried (ConvNet Big, an unfair comparison given the difference in scale), the advantage of this architecture lies in its much smaller parameter requirements.
Comparing two equal standard convolutional layers against KAN convolutional layers, each followed by the same MLP, shows that the classic approach is slightly better, with a 0.06 accuracy advantage; meanwhile, KAN convolutional layers followed by a KAN linear layer, with almost half the parameter count, come in 0.04 lower in accuracy.
The authors stated that as the complexity of the models and datasets increases, the performance of KAN convolutional networks should improve. At the same time, as the input dimension increases, the number of parameters in the models will also grow faster.