Choose GPT-3.5 or fine-tune open source models such as Llama 2? After comprehensive comparison, the answer is
It is well known that fine-tuning GPT-3.5 is very expensive. This article uses experiments to verify whether a hand-fine-tuned open source model can approach GPT-3.5's performance at a fraction of the cost. Interestingly, it can.
Comparing results on the SQL task and the functional representation task, the article reached the following conclusion: fine-tuning GPT-3.5 is well suited to initial validation work, but after that, a model like Llama 2 may be the better choice.
Next, let's look at how the experiments were carried out.
The following figure shows the performance of Code Llama 34B and GPT-3.5, each trained to convergence, on the SQL task and the functional representation task. The results show that GPT-3.5 achieves better accuracy on both tasks.
In terms of hardware, the experiments used an A40 GPU, which costs about $0.475 per hour.
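At that rate, the GPU rental cost is simple arithmetic. The sketch below uses the $0.475/hour figure from the article; the training durations are illustrative assumptions, not figures reported by the experiments:

```python
# Back-of-the-envelope GPU rental cost for fine-tuning on a rented A40.
# The hourly rate comes from the article; the hour counts are assumptions.
A40_RATE_PER_HOUR = 0.475

def training_cost(hours: float, rate: float = A40_RATE_PER_HOUR) -> float:
    """Return the GPU rental cost in dollars for a given number of hours."""
    return round(hours * rate, 2)

for task, hours in [("SQL task (assumed)", 8), ("functional representation (assumed)", 4)]:
    print(f"{task}: {hours} h -> ${training_cost(hours)}")
```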
In addition, the experiments chose two data sets well suited to fine-tuning: a subset of the Spider data set and the Viggo functional representation data set.
To make a fair comparison with GPT-3.5, the experiments performed only minimal hyperparameter tuning on Llama.
Two key choices in these experiments were to use Code Llama 34B and LoRA fine-tuning rather than full-parameter fine-tuning.
The experiments largely followed common conventions for LoRA hyperparameters. The LoRA adapter was configured as follows:
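The concrete configuration values are not reproduced in this excerpt. A typical LoRA adapter setup for a causal language model, using the Hugging Face peft library, might look like the following; the rank, alpha, dropout, and target modules shown here are common defaults, not the article's exact settings:

```python
from peft import LoraConfig

# Hypothetical LoRA adapter configuration. The values below are common
# defaults for causal LMs, not the exact settings from the article.
lora_config = LoraConfig(
    r=16,                   # rank of the low-rank update matrices
    lora_alpha=16,          # scaling factor applied to the adapter update
    lora_dropout=0.05,      # dropout applied to adapter inputs
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    bias="none",            # leave bias terms frozen
    task_type="CAUSAL_LM",
)
```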
An example SQL prompt is as follows:
The SQL prompt is shown only in part; see the original blog for the complete prompt.
The experiments did not use the complete Spider data set; the data takes the following form:
department : Department_ID [ INT ] primary_key Name [ TEXT ] Creation [ TEXT ] Ranking [ INT ] Budget_in_Billions [ INT ] Num_Employees [ INT ]
head : head_ID [ INT ] primary_key name [ TEXT ] born_state [ TEXT ] age [ INT ]
management : department_ID [ INT ] primary_key management.department_ID = department.Department_ID head_ID [ INT ] management.head_ID = head.head_ID temporary_acting [ TEXT ]
The experiments used the intersection of the sql-create-context data set and the Spider data set. The context provided to the model is a SQL CREATE command, as follows:
CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)
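As a sketch of how such a CREATE TABLE context and a natural-language question might be combined into a single prompt (the template below is illustrative; the article's exact prompt format is shown only partially above, with the full version in the original blog):

```python
def build_sql_prompt(context: str, question: str) -> str:
    """Assemble a text-to-SQL prompt from a CREATE TABLE context and a question.

    The template is a hypothetical illustration, not the article's exact format.
    """
    return (
        "You are a text-to-SQL model. Given the schema below, "
        "answer the question with a single SQL query.\n\n"
        f"Schema:\n{context}\n\n"
        f"Question: {question}\n"
        "SQL:"
    )

context = (
    "CREATE TABLE table_name_12 (class VARCHAR, "
    "frequency_mhz VARCHAR, city_of_license VARCHAR)"
)
print(build_sql_prompt(context, "Which city of license has a frequency of 89.3 MHz?"))
```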
Code and data for the SQL task: https://github.com/samlhuillier/spider-sql-finetune
An example functional representation prompt looks like this:
The functional representation prompt is shown only in part; see the original blog for the complete prompt.
The output is as follows:
verify_attribute(name[Little Big Adventure], rating[average], has_multiplayer[no], platforms[PlayStation])
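To illustrate how such an output could be checked programmatically, here is a small parser for this functional representation format. The grammar is inferred from the single example above; this is not code from the article:

```python
import re

def parse_functional_representation(text: str):
    """Parse a Viggo-style string such as
    'verify_attribute(name[Little Big Adventure], rating[average])'
    into (function_name, {attribute: value}).

    The format is inferred from the example above, not taken from the article.
    """
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", text)
    if match is None:
        raise ValueError(f"not a functional representation: {text!r}")
    name, body = match.groups()
    # Each attribute looks like key[value]; values may contain spaces.
    attributes = dict(re.findall(r"(\w+)\[([^\]]*)\]", body))
    return name, attributes

name, attrs = parse_functional_representation(
    "verify_attribute(name[Little Big Adventure], rating[average], "
    "has_multiplayer[no], platforms[PlayStation])"
)
print(name, attrs)
```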
In the evaluation phase, both experiments converged quickly.
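The evaluation metric is not spelled out in this excerpt. A common choice for tasks like these is exact-match accuracy after whitespace normalization, which can be sketched as follows (a hypothetical helper, not the article's evaluation code):

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't count."""
    return " ".join(text.lower().split())

def exact_match_accuracy(predictions, references) -> float:
    """Fraction of predictions that match their reference after normalization."""
    assert len(predictions) == len(references) and references
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

preds = ["SELECT name FROM head WHERE age > 56", "SELECT count(*) FROM department"]
refs = ["select name from head where age > 56", "SELECT budget FROM department"]
print(exact_match_accuracy(preds, refs))
```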
Code and data for the functional representation task: https://github.com/samlhuillier/viggo-finetune
For more information, please view the original blog.