
Fine-Tuning DeepSeek R1 (Reasoning Model)

Lisa Kudrow | 2025-03-01

DeepSeek's groundbreaking AI models challenge OpenAI's dominance. These advanced reasoning models are freely available, democratizing access to powerful AI. A companion video tutorial walks through the same fine-tuning workflow.

This tutorial fine-tunes the DeepSeek-R1-Distill-Llama-8B model using the Hugging Face Medical Chain-of-Thought Dataset. This distilled model, derived from Llama 3.1 8B, offers comparable reasoning capabilities to the original DeepSeek-R1. New to LLMs and fine-tuning? Consider our Introduction to LLMs in Python course.

[Image by Author: Fine-Tuning DeepSeek R1 (Reasoning Model)]

Introducing DeepSeek R1 Models

DeepSeek AI has open-sourced DeepSeek-R1 and DeepSeek-R1-Zero, rivaling OpenAI's o1 in reasoning tasks (math, coding, logic). Explore our comprehensive DeepSeek R1 guide for details.

DeepSeek-R1-Zero

This pioneering model uses large-scale reinforcement learning (RL), bypassing initial supervised fine-tuning (SFT). While enabling independent chain-of-thought (CoT) reasoning, it presents challenges like repetitive reasoning and readability issues.

DeepSeek-R1

Addressing DeepSeek-R1-Zero's limitations, DeepSeek-R1 incorporates cold-start data before RL. This multi-stage training achieves state-of-the-art performance, matching OpenAI-o1 while enhancing output clarity.

DeepSeek Distillation

DeepSeek also offers distilled models, balancing power and efficiency. These smaller models (1.5B to 70B parameters) retain strong reasoning, with DeepSeek-R1-Distill-Qwen-32B surpassing OpenAI-o1-mini in benchmarks. This highlights the effectiveness of the distillation process.

[Image source: deepseek-ai/DeepSeek-R1]

Learn more about DeepSeek-R1's features, development, distilled models, access, pricing, and OpenAI o1 comparison in our blog post: "DeepSeek-R1: Features, o1 Comparison, Distilled Models & More".

Fine-Tuning DeepSeek R1: A Practical Guide

Follow these steps to fine-tune your DeepSeek R1 model:

1. Setup

We utilize Kaggle's free GPU access. Create a Kaggle notebook, adding your Hugging Face and Weights & Biases tokens as secrets. Install the unsloth Python package for faster, more memory-efficient fine-tuning. See our "Unsloth Guide: Optimize and Speed Up LLM Fine-Tuning" for details.

<code>%%capture
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git</code>

Authenticate with the Hugging Face CLI and Weights & Biases (wandb):

<code>from huggingface_hub import login
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()

hf_token = user_secrets.get_secret("HUGGINGFACE_TOKEN")
login(hf_token)

import wandb

wb_token = user_secrets.get_secret("wandb")

wandb.login(key=wb_token)
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Medical COT Dataset', 
    job_type="training", 
    anonymous="allow"
)</code>

2. Loading the Model and Tokenizer

Load the Unsloth version of DeepSeek-R1-Distill-Llama-8B using 4-bit quantization for optimized performance:

<code>from unsloth import FastLanguageModel

max_seq_length = 2048 
dtype = None 
load_in_4bit = True


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = hf_token, 
)</code>

3. Pre-Fine-tuning Inference

Define a prompt style with placeholders for the question and response. This guides the model's step-by-step reasoning:

<code>prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 

### Question:
{}

### Response:
<think>{}"""</code>

Test the model with a sample medical question:

<code>question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"


FastLanguageModel.for_inference(model) 
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])</code>

Observe the model's pre-fine-tuning reasoning and identify areas for improvement through fine-tuning.
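To make that comparison easier, you can separate the chain of thought from the final answer. The helper below is a small addition (not part of the original tutorial) and assumes the model closes its reasoning with a </think> tag, mirroring the <think> opener in the prompt template:

<code>def split_reasoning(generated_text):
    # Everything after "### Response:" is the model's answer section
    answer_part = generated_text.split("### Response:")[1]
    if "</think>" in answer_part:
        reasoning, final_answer = answer_part.split("</think>", 1)
        return reasoning.replace("<think>", "").strip(), final_answer.strip()
    # Fallback: the model never closed its reasoning block
    return answer_part.strip(), ""

reasoning, answer = split_reasoning(response[0])
print("Chain of thought:\n", reasoning)
print("\nFinal answer:\n", answer)</code>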

4. Loading and Processing the Dataset

Modify the prompt style to include a placeholder for the complex chain of thought:

<code>train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 

### Question:
{}

### Response:
<think>
{}
</think>
{}"""</code>

Create a function to format the dataset:

<code>EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN


def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }</code>

Load and process the dataset:

<code>from datasets import load_dataset

dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train[0:500]", trust_remote_code=True)
dataset = dataset.map(formatting_prompts_func, batched=True)
dataset["text"][0]</code>

5. Setting up the Model

Configure the model using LoRA:

<code>model = FastLanguageModel.get_peft_model(
    model,
    r=16,  
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,  
    bias="none",  
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,  
    loftq_config=None,
)</code>

Set up the trainer:

<code>from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)</code>

6. Model Training

Train the model:

<code>trainer_stats = trainer.train()</code>

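After training finishes, trainer.train() returns a TrainOutput object whose metrics dictionary summarizes the run (runtime, final loss, throughput). Closing the Weights & Biases run is good housekeeping; this snippet is optional and not part of the original article:

<code># Inspect training metrics (e.g. train_runtime, train_loss)
print(trainer_stats.metrics)

# Close the Weights & Biases run started in the setup step
wandb.finish()</code>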

7. Post-Fine-tuning Inference

Compare results by querying the fine-tuned model with the same question as before. Observe the improvement in reasoning and response conciseness.

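The inference call mirrors the pre-fine-tuning test, reusing the same question and prompt template:

<code># Re-run the same medical question through the fine-tuned model
FastLanguageModel.for_inference(model)
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])</code>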

8. Saving and Pushing the Model

Save the model locally and push it to the Hugging Face Hub:

<code>new_model_local = "DeepSeek-R1-Medical-COT"
model.save_pretrained(new_model_local) 
tokenizer.save_pretrained(new_model_local)

model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",)

new_model_online = "kingabzpro/DeepSeek-R1-Medical-COT"
model.push_to_hub(new_model_online)
tokenizer.push_to_hub(new_model_online)

model.push_to_hub_merged(new_model_online, tokenizer, save_method = "merged_16bit")</code>
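
As a quick sanity check (an addition to the tutorial, assuming the local folder created above), you can reload the merged 16-bit checkpoint with plain transformers to confirm the export works outside Unsloth:

<code>from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model written by save_pretrained_merged above
reloaded_tokenizer = AutoTokenizer.from_pretrained("DeepSeek-R1-Medical-COT")
reloaded_model = AutoModelForCausalLM.from_pretrained(
    "DeepSeek-R1-Medical-COT",
    torch_dtype="auto",
    device_map="auto",
)</code>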


9. Deployment and Conclusion

To deploy the fine-tuned model, consider serving it with BentoML or converting it to GGUF format for local inference (a sketch follows). Open-source LLMs continue to gain ground, while OpenAI has responded with releases such as o3 and Operator AI.
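
If you take the GGUF route, Unsloth provides export helpers. The sketch below is illustrative only: the repository name and the "q4_k_m" quantization method are our assumptions, not values from the original article.

<code># Export the fine-tuned model to GGUF for local inference (e.g. with llama.cpp)
model.save_pretrained_gguf("DeepSeek-R1-Medical-COT-GGUF", tokenizer, quantization_method="q4_k_m")

# Optionally push the GGUF files to the Hugging Face Hub
model.push_to_hub_gguf("kingabzpro/DeepSeek-R1-Medical-COT-GGUF", tokenizer, quantization_method="q4_k_m")</code>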

