
Choose GPT-3.5 or fine-tune open source models such as Llama 2? After comprehensive comparison, the answer is

WBOY
2023-10-16

It is well known that fine-tuning GPT-3.5 is very expensive. This article uses experiments to verify whether a manually fine-tuned open source model can approach the performance of GPT-3.5 at a fraction of the cost, and finds that it largely can.

Comparing the results on SQL tasks and functional representation tasks, this article found:

  • On the two data sets (a subset of the Spider data set and the Viggo functional representation data set), GPT-3.5 performs slightly better than Code Llama 34B fine-tuned with LoRA.
  • GPT-3.5 is 4-6 times more expensive to train, and also more expensive to deploy.

One of the conclusions of this experiment is that fine-tuning GPT-3.5 is suitable for initial verification work, but after that, a model like Llama 2 may be the better choice. To summarize:

  • If you want to verify that fine-tuning is the right way to solve a specific task/data set, or want a fully managed environment, then fine-tune GPT-3.5.
  • If you want to save money, get maximum performance from your data set, have more flexibility in training and deployment infrastructure, or want to keep some data private, then fine-tune an open source model like Llama 2.

Next, let's look at how the experiments were carried out.

The following figure shows the performance of Code Llama 34B and GPT-3.5 trained to convergence on SQL tasks and functional representation tasks. The results show that GPT-3.5 achieves better accuracy on both tasks.

[Figure: accuracy of Code Llama 34B and GPT-3.5 on the SQL and functional representation tasks]

In terms of hardware usage, the experiment used an A40 GPU, which costs about $0.475 per hour.

[Figure: GPU and cost details for the experiment]
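To make the cost comparison concrete, here is a rough back-of-the-envelope calculation. The hourly A40 rate comes from the article; the training time and the managed-API pricing below are hypothetical placeholders, not figures from the experiment.

a40_hourly_rate = 0.475          # USD per hour for an A40, as cited above
assumed_training_hours = 4.0     # hypothetical wall-clock time for the LoRA run

lora_training_cost = a40_hourly_rate * assumed_training_hours

# Hypothetical token-based pricing for a managed fine-tuning API
assumed_price_per_1k_tokens = 0.008    # hypothetical USD per 1K training tokens
assumed_training_tokens = 1_000_000    # hypothetical number of training tokens

managed_training_cost = assumed_price_per_1k_tokens * assumed_training_tokens / 1_000

print(f"LoRA on an A40: ${lora_training_cost:.2f}")
print(f"Managed API:    ${managed_training_cost:.2f}")
print(f"Cost ratio:     {managed_training_cost / lora_training_cost:.1f}x")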

In addition, the experiment selected two data sets that are well suited to fine-tuning: a subset of the Spider data set and the Viggo functional representation data set.

To make a fair comparison with the GPT-3.5 model, the experiment performed only minimal hyperparameter tuning on Llama.

Two key choices in the experiments were to use Code Llama 34B and to use LoRA fine-tuning instead of full-parameter fine-tuning.

The experiment largely followed standard guidance for LoRA hyperparameter settings. The LoRA adapter was configured as follows:

[Figure: LoRA adapter configuration]
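For readers who want to reproduce a setup like this, the sketch below shows how a LoRA adapter can be attached to Code Llama 34B with the Hugging Face peft library. The hyperparameter values are common defaults used here for illustration, not the exact settings from the figure above.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the Code Llama 34B base weights (assumes the Hugging Face checkpoint name).
base_model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-34b-hf",
    device_map="auto",
)

# Illustrative LoRA settings; the experiment's actual values appear in the figure above.
lora_config = LoraConfig(
    r=16,                  # rank of the low-rank update matrices
    lora_alpha=16,         # scaling factor for the adapter
    lora_dropout=0.05,     # dropout applied to the adapter inputs
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable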

An example SQL prompt is shown below:

[Figure: example SQL prompt]

The SQL prompt is only partially shown here; for the complete prompt, please see the original blog.

The experiment did not use the complete Spider data set; the data takes the following form:

department : Department_ID [ INT ] primary_key Name [ TEXT ] Creation [ TEXT ] Ranking [ INT ] Budget_in_Billions [ INT ] Num_Employees [ INT ]
head : head_ID [ INT ] primary_key name [ TEXT ] born_state [ TEXT ] age [ INT ]
management : department_ID [ INT ] primary_key management.department_ID = department.Department_ID head_ID [ INT ] management.head_ID = head.head_ID temporary_acting [ TEXT ]

The experiment chose to use the intersection of the sql-create-context data set and the Spider data set. The context provided to the model is a SQL CREATE command, as follows:

CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)
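As an illustration, a prompt for this task might be assembled from the CREATE TABLE context and a natural-language question roughly as in the sketch below; the template and the example question are hypothetical, since only part of the real prompt is shown above.

def build_sql_prompt(schema: str, question: str) -> str:
    # Hypothetical template: the real prompt format is shown only partially above.
    return (
        "You are a text-to-SQL assistant.\n"
        f"### Database schema:\n{schema}\n"
        f"### Question:\n{question}\n"
        "### SQL:\n"
    )

schema = "CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)"
question = "What class is the station with a frequency of 89.9 MHz?"  # hypothetical question
print(build_sql_prompt(schema, question))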

The code and data for the SQL task are available at: https://github.com/samlhuillier/spider-sql-finetune

An example of the functional representation prompt looks like this:

[Figure: example functional representation prompt]

The functional representation prompt is only partially shown here; for the complete prompt, please see the original blog.

The output is as follows:

verify_attribute(name[Little Big Adventure], rating[average], has_multiplayer[no], platforms[PlayStation])
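The article does not spell out how outputs like this are scored, but a simple way to evaluate them is exact-match accuracy against the reference representation, as in the sketch below (an illustrative assumption, not necessarily the metric used in the experiment).

def exact_match_accuracy(predictions, references):
    # Fraction of predictions that match the reference string exactly (after trimming whitespace).
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(references)

preds = ["verify_attribute(name[Little Big Adventure], rating[average], has_multiplayer[no], platforms[PlayStation])"]
refs = ["verify_attribute(name[Little Big Adventure], rating[average], has_multiplayer[no], platforms[PlayStation])"]
print(exact_match_accuracy(preds, refs))  # 1.0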

In the evaluation phase, both experiments converged quickly:

[Figure: evaluation curves showing rapid convergence on both tasks]

The code and data for the functional representation task are available at: https://github.com/samlhuillier/viggo-finetune
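For deployment, a fine-tuned adapter produced by runs like the ones linked above can be loaded on top of the base model for inference. The sketch below assumes a locally saved adapter directory; the adapter path and the prompt are hypothetical.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-34b-hf", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-34b-hf")

# Hypothetical path to a saved LoRA adapter checkpoint.
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")

prompt = "CREATE TABLE head (age INTEGER)\n-- How many heads of departments are older than 56?\n"
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))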

For more information, please view the original blog.

