
How to write LoRA code from scratch, here is a tutorial

王林 (forwarded) · 2024-03-20 15:06

LoRA (Low-Rank Adaptation) is a popular technique for fine-tuning large language models (LLMs). It was originally proposed by Microsoft researchers in the paper "LoRA: Low-Rank Adaptation of Large Language Models". LoRA differs from other techniques in that, instead of adjusting all parameters of the neural network, it updates only a small number of low-rank matrices, significantly reducing the computation required to train the model.

Because LoRA's fine-tuning quality is comparable to full-model fine-tuning, many people consider it a go-to fine-tuning method. Since its release, many have been curious about the technique and wanted to write the code themselves to understand the research better. In the past, the lack of proper documentation was an obstacle; now there is a tutorial to help.

The author of this tutorial is Sebastian Raschka, a well-known machine learning and AI researcher. He says that among the various effective LLM fine-tuning methods, LoRA is still his first choice. To that end, Sebastian wrote the blog post "Code LoRA From Scratch", which builds LoRA from the ground up; in his view, this is a good way to learn.


This article introduces low-rank adaptation (LoRA) by writing the code from scratch. In the experiment, Sebastian fine-tuned the DistilBERT model and applied it to a classification task.

A comparison between LoRA and traditional fine-tuning shows that LoRA reached 92.39% test accuracy, outperforming fine-tuning only the last few layers of the model (86.22% test accuracy). This indicates that LoRA has clear advantages for optimizing model performance, improving both generalization and prediction accuracy, and it highlights the value of advanced techniques during model training and tuning. Let's look at how Sebastian implements it.

Writing LoRA from scratch

Expressing a LoRA layer in code is like this:

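A minimal PyTorch sketch of such a LoRA layer (initializing A with small random values scaled by 1/√rank is one common choice):

```python
import torch


class LoRALayer(torch.nn.Module):
    def __init__(self, in_dim, out_dim, rank, alpha):
        super().__init__()
        # A gets small random values, B starts at zero, so the
        # LoRA update is exactly zero at the beginning of training
        std_dev = 1 / torch.sqrt(torch.tensor(rank).float())
        self.A = torch.nn.Parameter(torch.randn(in_dim, rank) * std_dev)
        self.B = torch.nn.Parameter(torch.zeros(rank, out_dim))
        self.alpha = alpha

    def forward(self, x):
        # scaled low-rank update: alpha * (x @ A @ B)
        return self.alpha * (x @ self.A @ self.B)
```

Because B is all zeros, the layer initially contributes nothing, and the wrapped model starts out behaving exactly like the original.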

Here, in_dim is the input dimension of the layer you want to modify with LoRA, and out_dim is that layer's output dimension. The code also adds a hyperparameter, the scaling factor alpha: higher alpha values mean larger adjustments to the model's behavior, lower values the opposite. In addition, matrix A is initialized with small values from a random distribution, and matrix B is initialized with zeros.

It's worth mentioning that LoRA usually comes into play in the linear (feedforward) layers of a neural network. For example, for a simple PyTorch model or module with two linear layers (say, the feedforward module of a Transformer block), the forward method can be expressed as:

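A sketch of such a two-linear-layer module (names like `linear_1` and the dimensions are illustrative):

```python
import torch
import torch.nn.functional as F


class FeedForward(torch.nn.Module):
    """A minimal module with two linear layers, e.g. the
    feedforward part of a Transformer block."""

    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.linear_1 = torch.nn.Linear(in_dim, hidden_dim)
        self.linear_2 = torch.nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        x = self.linear_1(x)
        x = F.relu(x)
        x = self.linear_2(x)
        return x
```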

When using LoRA, LoRA updates are usually added to the output of these linear layers, and the code is as follows:

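A sketch of the same two-layer module with a LoRA update added to each linear layer's output (the compact `LoRALayer` here follows the definition described above):

```python
import torch
import torch.nn.functional as F


class LoRALayer(torch.nn.Module):
    def __init__(self, in_dim, out_dim, rank, alpha):
        super().__init__()
        self.A = torch.nn.Parameter(torch.randn(in_dim, rank) / rank**0.5)
        self.B = torch.nn.Parameter(torch.zeros(rank, out_dim))
        self.alpha = alpha

    def forward(self, x):
        return self.alpha * (x @ self.A @ self.B)


class FeedForwardWithLoRA(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim, rank=4, alpha=1.0):
        super().__init__()
        self.linear_1 = torch.nn.Linear(in_dim, hidden_dim)
        self.linear_2 = torch.nn.Linear(hidden_dim, out_dim)
        self.lora_1 = LoRALayer(in_dim, hidden_dim, rank, alpha)
        self.lora_2 = LoRALayer(hidden_dim, out_dim, rank, alpha)

    def forward(self, x):
        # the LoRA update is added to each linear layer's output
        x = F.relu(self.linear_1(x) + self.lora_1(x))
        x = self.linear_2(x) + self.lora_2(x)
        return x
```

Since B starts at zero, this module initially computes exactly the same outputs as the plain version.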

If you want to implement LoRA by modifying an existing PyTorch model, a simple way is to replace each linear layer with a LinearWithLoRA layer:

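A minimal sketch of such a `LinearWithLoRA` wrapper:

```python
import torch


class LoRALayer(torch.nn.Module):
    def __init__(self, in_dim, out_dim, rank, alpha):
        super().__init__()
        self.A = torch.nn.Parameter(torch.randn(in_dim, rank) / rank**0.5)
        self.B = torch.nn.Parameter(torch.zeros(rank, out_dim))
        self.alpha = alpha

    def forward(self, x):
        return self.alpha * (x @ self.A @ self.B)


class LinearWithLoRA(torch.nn.Module):
    """Wraps an existing nn.Linear and adds the LoRA update
    to its output."""

    def __init__(self, linear, rank, alpha):
        super().__init__()
        self.linear = linear
        self.lora = LoRALayer(
            linear.in_features, linear.out_features, rank, alpha
        )

    def forward(self, x):
        return self.linear(x) + self.lora(x)
```

A replacement is then a single assignment, e.g. `module.linear_1 = LinearWithLoRA(module.linear_1, rank=4, alpha=8)`.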

(The original post summarizes these concepts in a figure.)

To apply LoRA, this article replaces the existing linear layers in the neural network with LinearWithLoRA layers, each combining the original Linear layer with a LoRALayer.

How to get started using LoRA for fine-tuning

LoRA can be used with models such as GPT or image-generation models. For simplicity, this article uses a small BERT variant (DistilBERT) for text classification.

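Assuming the Hugging Face transformers library, loading DistilBERT with a two-class head (as the IMDb sentiment task requires) might look like:

```python
from transformers import AutoModelForSequenceClassification

# distilbert-base-uncased with a fresh 2-label classification head
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
```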

Since this article trains only the new LoRA weights, all existing model parameters are frozen by setting their requires_grad attribute to False:

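A sketch of the freezing step (the model-loading line is repeated so the snippet is self-contained):

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# freeze every parameter; only the LoRA weights added later
# will have requires_grad=True and therefore receive gradients
for param in model.parameters():
    param.requires_grad = False
```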

Next, use print(model) to check the structure of the model:

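Abridged, the printout looks roughly like this (the exact formatting depends on the torch and transformers versions):

```
DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(...)
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(...)
      )
    )
  )
  (pre_classifier): Linear(in_features=768, out_features=768, bias=True)
  (classifier): Linear(in_features=768, out_features=2, bias=True)
  (dropout): Dropout(p=0.2, inplace=False)
)
```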

The output shows that the model consists of 6 transformer layers, which contain linear layers:

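Inside each TransformerBlock, the linear layers appear under the attention and feedforward submodules, roughly as follows (abridged; layer names from the transformers implementation of DistilBERT):

```
(attention): MultiHeadSelfAttention(
  (q_lin): Linear(in_features=768, out_features=768, bias=True)
  (k_lin): Linear(in_features=768, out_features=768, bias=True)
  (v_lin): Linear(in_features=768, out_features=768, bias=True)
  (out_lin): Linear(in_features=768, out_features=768, bias=True)
)
(ffn): FFN(
  (lin1): Linear(in_features=768, out_features=3072, bias=True)
  (lin2): Linear(in_features=3072, out_features=768, bias=True)
)
```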

In addition, the model has two linear output layers:

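The two output layers appear at the top level of the model, along these lines (abridged):

```
(pre_classifier): Linear(in_features=768, out_features=768, bias=True)
(classifier): Linear(in_features=768, out_features=2, bias=True)
```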

LoRA can be selectively enabled for these linear layers by defining the following assignment function and loop:

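A self-contained sketch in the spirit of the post: boolean flags select which linear layers get wrapped, and functools.partial fixes the rank and alpha arguments so each assignment stays a one-liner (the flag names and their defaults here are assumptions):

```python
from functools import partial

import torch
from transformers import AutoModelForSequenceClassification


class LoRALayer(torch.nn.Module):
    def __init__(self, in_dim, out_dim, rank, alpha):
        super().__init__()
        self.A = torch.nn.Parameter(torch.randn(in_dim, rank) / rank**0.5)
        self.B = torch.nn.Parameter(torch.zeros(rank, out_dim))
        self.alpha = alpha

    def forward(self, x):
        return self.alpha * (x @ self.A @ self.B)


class LinearWithLoRA(torch.nn.Module):
    def __init__(self, linear, rank, alpha):
        super().__init__()
        self.linear = linear
        self.lora = LoRALayer(
            linear.in_features, linear.out_features, rank, alpha
        )

    def forward(self, x):
        return self.linear(x) + self.lora(x)


model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
for param in model.parameters():
    param.requires_grad = False  # freeze; only LoRA weights train

# flags selecting which linear layers receive a LoRA adapter
lora_r, lora_alpha = 8, 16
lora_query = True
lora_key = False
lora_value = True
lora_projection = False
lora_mlp = False
lora_head = False

assign_lora = partial(LinearWithLoRA, rank=lora_r, alpha=lora_alpha)

for layer in model.distilbert.transformer.layer:
    if lora_query:
        layer.attention.q_lin = assign_lora(layer.attention.q_lin)
    if lora_key:
        layer.attention.k_lin = assign_lora(layer.attention.k_lin)
    if lora_value:
        layer.attention.v_lin = assign_lora(layer.attention.v_lin)
    if lora_projection:
        layer.attention.out_lin = assign_lora(layer.attention.out_lin)
    if lora_mlp:
        layer.ffn.lin1 = assign_lora(layer.ffn.lin1)
        layer.ffn.lin2 = assign_lora(layer.ffn.lin2)
if lora_head:
    model.pre_classifier = assign_lora(model.pre_classifier)
    model.classifier = assign_lora(model.classifier)
```

The wrapped Linear layers stay frozen; the freshly created A and B matrices are new Parameters, so they are the only weights with requires_grad=True.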

Check the model again using print(model) to verify its updated structure:

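After the replacement, each wrapped layer shows up in the printout roughly as follows (abridged; assuming a LinearWithLoRA class like the one sketched earlier):

```
(q_lin): LinearWithLoRA(
  (linear): Linear(in_features=768, out_features=768, bias=True)
  (lora): LoRALayer()
)
```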

As you can see above, the Linear layer has been successfully replaced by the LinearWithLoRA layer.

If you train the model using the default hyperparameters shown above, it results in the following performance on the IMDb movie review classification dataset:

  • Training accuracy: 92.15%
  • Validation accuracy: 89.98%
  • Test accuracy: 89.44%

In the next section, this article compares these LoRA fine-tuning results with traditional fine-tuning results.

Comparison with traditional fine-tuning methods

In the previous section, LoRA achieved a test accuracy of 89.44% under default settings. How does this compare to traditional fine-tuning methods?

For comparison, this article ran another experiment, again training the DistilBERT model but updating only the last 2 layers during training. This was achieved by freezing all model weights and then unfreezing the two linear output layers:

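A sketch of this setup, assuming the same DistilBERT model as before:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# freeze everything, then unfreeze only the two output layers
for param in model.parameters():
    param.requires_grad = False
for param in model.pre_classifier.parameters():
    param.requires_grad = True
for param in model.classifier.parameters():
    param.requires_grad = True
```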

The classification performance obtained by training only the last two layers is as follows:

  • Training accuracy: 86.68%
  • Validation accuracy: 87.26%
  • Test accuracy: 86.22%

The results show that LoRA performs better than the traditional approach of fine-tuning the last two layers, while using 4 times fewer parameters. Fine-tuning all layers would require updating 450 times more parameters than the LoRA setup, yet it improves test accuracy by only 2%.

Optimize LoRA configuration

The results above were all obtained with LoRA under its default settings, with the following hyperparameters:

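For reference, a plausible set of such defaults, consistent with the setup sketched earlier (rank 8, alpha 16, LoRA on the query and value projections only; treat the exact flag names and values as assumptions):

```python
# default LoRA hyperparameters (illustrative)
lora_r = 8            # rank of the A and B matrices
lora_alpha = 16       # scaling factor
lora_query = True     # wrap attention query projections
lora_key = False
lora_value = True     # wrap attention value projections
lora_projection = False
lora_mlp = False
lora_head = False
```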

Users who want to try different hyperparameter configurations can change these settings and rerun the training.


With the best hyperparameter configuration found in the original post, the results are:

  • Validation accuracy: 92.96%
  • Test accuracy: 92.39%

Notably, even though the LoRA setup has only a small number of trainable parameters (500k vs. 66M), its accuracy is slightly higher than that obtained with full fine-tuning.

Original link: https://lightning.ai/lightning-ai/studios/code-lora-from-scratch?cnotallow=f5fc72b1f6eeeaf74b648b2aa8aaf8b6


Statement:
This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn for deletion.