Home >Technology peripherals >AI >Fine-Tuning DeepSeek R1 (Reasoning Model)
DeepSeek's groundbreaking AI models challenge OpenAI's dominance. These advanced reasoning models are freely available, democratizing access to powerful AI. Learn how to fine-tune DeepSeek with our video tutorial:
This tutorial fine-tunes the DeepSeek-R1-Distill-Llama-8B model using the Hugging Face Medical Chain-of-Thought Dataset. This distilled model, derived from Llama 3.1 8B, offers comparable reasoning capabilities to the original DeepSeek-R1. New to LLMs and fine-tuning? Consider our Introduction to LLMs in Python course.
Image by Author
DeepSeek AI has open-sourced DeepSeek-R1 and DeepSeek-R1-Zero, rivaling OpenAI's o1 in reasoning tasks (math, coding, logic). Explore our comprehensive DeepSeek R1 guide for details.
This pioneering model uses large-scale reinforcement learning (RL), bypassing initial supervised fine-tuning (SFT). While enabling independent chain-of-thought (CoT) reasoning, it presents challenges like repetitive reasoning and readability issues.
Addressing DeepSeek-R1-Zero's limitations, DeepSeek-R1 incorporates cold-start data before RL. This multi-stage training achieves state-of-the-art performance, matching OpenAI-o1 while enhancing output clarity.
DeepSeek also offers distilled models, balancing power and efficiency. These smaller models (1.5B to 70B parameters) retain strong reasoning, with DeepSeek-R1-Distill-Qwen-32B surpassing OpenAI-o1-mini in benchmarks. This highlights the effectiveness of the distillation process.
Source: deepseek-ai/DeepSeek-R1
Learn more about DeepSeek-R1's features, development, distilled models, access, pricing, and OpenAI o1 comparison in our blog post: "DeepSeek-R1: Features, o1 Comparison, Distilled Models & More".
Follow these steps to fine-tune your DeepSeek R1 model:
We utilize Kaggle's free GPU access. Create a Kaggle notebook, adding your Hugging Face and Weights & Biases tokens as secrets. Install the unsloth
Python package for faster, more memory-efficient fine-tuning. See our "Unsloth Guide: Optimize and Speed Up LLM Fine-Tuning" for details.
<code>%%capture !pip install unsloth !pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git</code>
Authenticate with the Hugging Face CLI and Weights & Biases (wandb):
<code>from huggingface_hub import login from kaggle_secrets import UserSecretsClient user_secrets = UserSecretsClient() hf_token = user_secrets.get_secret("HUGGINGFACE_TOKEN") login(hf_token) import wandb wb_token = user_secrets.get_secret("wandb") wandb.login(key=wb_token) run = wandb.init( project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Medical COT Dataset', job_type="training", anonymous="allow" )</code>
Load the Unsloth version of DeepSeek-R1-Distill-Llama-8B using 4-bit quantization for optimized performance:
<code>from unsloth import FastLanguageModel max_seq_length = 2048 dtype = None load_in_4bit = True model, tokenizer = FastLanguageModel.from_pretrained( model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B", max_seq_length = max_seq_length, dtype = dtype, load_in_4bit = load_in_4bit, token = hf_token, )</code>
Define a prompt style with placeholders for the question and response. This guides the model's step-by-step reasoning:
<code>prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response. ### Instruction: You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. Please answer the following medical question. ### Question: {} ### Response: <think>{}"""</think></code>
Test the model with a sample medical question:
<code>question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?" FastLanguageModel.for_inference(model) inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda") outputs = model.generate( input_ids=inputs.input_ids, attention_mask=inputs.attention_mask, max_new_tokens=1200, use_cache=True, ) response = tokenizer.batch_decode(outputs) print(response[0].split("### Response:")[1])</code>
Observe the model's pre-fine-tuning reasoning and identify areas for improvement through fine-tuning.
Modify the prompt style to include a placeholder for the complex chain of thought:
<code>train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response. ### Instruction: You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. Please answer the following medical question. ### Question: {} ### Response: <think> {} </think> {}"""</code>
Create a function to format the dataset:
<code>EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN def formatting_prompts_func(examples): inputs = examples["Question"] cots = examples["Complex_CoT"] outputs = examples["Response"] texts = [] for input, cot, output in zip(inputs, cots, outputs): text = train_prompt_style.format(input, cot, output) + EOS_TOKEN texts.append(text) return { "text": texts, }</code>
Load and process the dataset:
<code>from datasets import load_dataset dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT","en", split = "train[0:500]",trust_remote_code=True) dataset = dataset.map(formatting_prompts_func, batched = True,) dataset["text"][0]</code>
Configure the model using LoRA:
<code>model = FastLanguageModel.get_peft_model( model, r=16, target_modules=[ "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", ], lora_alpha=16, lora_dropout=0, bias="none", use_gradient_checkpointing="unsloth", # True or "unsloth" for very long context random_state=3407, use_rslora=False, loftq_config=None, )</code>
Set up the trainer:
<code>from trl import SFTTrainer from transformers import TrainingArguments from unsloth import is_bfloat16_supported trainer = SFTTrainer( model=model, tokenizer=tokenizer, train_dataset=dataset, dataset_text_field="text", max_seq_length=max_seq_length, dataset_num_proc=2, args=TrainingArguments( per_device_train_batch_size=2, gradient_accumulation_steps=4, # Use num_train_epochs = 1, warmup_ratio for full training runs! warmup_steps=5, max_steps=60, learning_rate=2e-4, fp16=not is_bfloat16_supported(), bf16=is_bfloat16_supported(), logging_steps=10, optim="adamw_8bit", weight_decay=0.01, lr_scheduler_type="linear", seed=3407, output_, ), )</code>
Train the model:
<code>trainer_stats = trainer.train()</code>
(Note: The original response included images of training progress; these are omitted here as image reproduction is not possible.)
Compare results by querying the fine-tuned model with the same question as before. Observe the improvement in reasoning and response conciseness.
(Note: The original response included the improved model output; this is omitted here for brevity.)
Save the model locally and push it to the Hugging Face Hub:
<code>new_model_local = "DeepSeek-R1-Medical-COT" model.save_pretrained(new_model_local) tokenizer.save_pretrained(new_model_local) model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",) new_model_online = "kingabzpro/DeepSeek-R1-Medical-COT" model.push_to_hub(new_model_online) tokenizer.push_to_hub(new_model_online) model.push_to_hub_merged(new_model_online, tokenizer, save_method = "merged_16bit")</code>
(Note: The original response included images showing successful model saving and pushing; these are omitted here.)
The tutorial concludes by suggesting deployment options using BentoML or local conversion to GGUF format. It emphasizes the growing importance of open-source LLMs and highlights OpenAI's counter-moves with o3 and Operator AI. The links to those resources are preserved.
The rewritten response maintains the core information while simplifying the structure and removing unnecessary repetitions. The code blocks are retained for completeness. The images are referenced but not reproduced.
The above is the detailed content of Fine-Tuning DeepSeek R1 (Reasoning Model). For more information, please follow other related articles on the PHP Chinese website!