Unlocking the Power of Llama 3.2: A Comprehensive Guide to Fine-tuning and Local Deployment
The landscape of large language models (LLMs) is rapidly evolving, with a focus on smaller, more efficient models. Llama 3.2, with its lightweight and vision model variations, exemplifies this trend. This tutorial details how to leverage Llama 3.2's capabilities, specifically the 3B lightweight model, for fine-tuning on a customer support dataset and subsequent local deployment using the Jan application.
Before diving in, beginners are strongly encouraged to complete an AI fundamentals course to grasp the basics of LLMs and generative AI.
Image by Author
Exploring Llama 3.2 Models
Llama 3.2 offers two model families: lightweight and vision. Lightweight models excel at multilingual text generation and tool use, ideal for resource-constrained environments. Vision models, on the other hand, specialize in image reasoning and multimodal tasks.
Lightweight Models
The lightweight family includes 1B and 3B parameter variants. Their compact size allows for on-device processing, ensuring data privacy and fast, cost-effective text generation. These models utilize pruning and knowledge distillation for efficiency and performance. The 3B model surpasses competitors like Gemma 2 and Phi 3.5-mini in tasks such as instruction following and summarization.
Source: Llama 3.2: Revolutionizing edge AI and vision with open, customizable models
Vision Models
The vision models (11B and 90B parameters) are designed for image reasoning, capable of interpreting documents and charts. Their multimodal capabilities stem from integrating pre-trained image encoders with language models. They outperform Claude 3 Haiku and GPT-4o mini in visual understanding tasks.
Source: Llama 3.2: Revolutionizing edge AI and vision with open, customizable models
For deeper insights into Llama 3.2's architecture, benchmarks, and security features (Llama Guard 3), refer to the official Llama 3.2 Guide.
Accessing Llama 3.2 on Kaggle
While Llama 3.2 is open-source, access requires accepting terms and conditions. Here's how to access it via Kaggle:
- Visit llama.com and complete the access form, selecting both the lightweight and vision models.
- Navigate to the Meta | Llama 3.2 model page on Kaggle and submit the form.
- Accept the terms and conditions.
- Await the notebook creation option. Select the Transformers tab, choose your model variant, and create a new notebook.
- Configure the accelerator to "GPU T4 x2".
- Update the `transformers` and `accelerate` packages using `%pip install -U transformers accelerate`.
The subsequent steps involve loading the tokenizer and model with the `transformers` library: specify the local model directory, set `pad_token_id`, create a text generation pipeline, and run inference with custom prompts. Detailed code examples are provided in the accompanying Kaggle notebook. Similar steps apply to the Llama 3.2 Vision models, though their GPU requirements are significantly higher.
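The steps above can be sketched as follows. This is a minimal outline, not the notebook's exact code; the model directory is an assumption based on Kaggle's usual input layout, so check your notebook's "Input" panel for the real path.

```python
# Assumed Kaggle input path for the 3B Instruct variant; adjust to your notebook.
MODEL_DIR = "/kaggle/input/llama-3.2/transformers/3b-instruct/1"

def build_pipeline(model_dir: str = MODEL_DIR):
    """Load the tokenizer and model, then wrap them in a text-generation pipeline."""
    # Heavy imports are deferred so the sketch can be read outside a GPU session.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        torch_dtype=torch.float16,  # half precision fits on the T4 x2 accelerator
        device_map="auto",
    )
    model.config.pad_token_id = tokenizer.eos_token_id  # avoid pad-token warnings
    return pipeline("text-generation", model=model, tokenizer=tokenizer)

# In the notebook:
#   generator = build_pipeline()
#   messages = [{"role": "user", "content": "Explain fine-tuning in one sentence."}]
#   print(generator(messages, max_new_tokens=100)[0]["generated_text"])
```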
Fine-tuning Llama 3.2 3B Instruct
This section guides you through fine-tuning the Llama 3.2 3B Instruct model on a customer support dataset using the `transformers` library and QLoRA for efficient training.
Setup
- Launch a new Kaggle notebook and set environment variables for Hugging Face and Weights & Biases (WandB) access.
- Install the necessary packages: `transformers`, `datasets`, `accelerate`, `peft`, `trl`, `bitsandbytes`, and `wandb`.
- Log in to Hugging Face and WandB using your API keys.
- Define variables for the base model, new model name, and dataset name.
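A sketch of the resulting setup cell is shown below. The secret names ("HF_TOKEN", "WANDB_API_KEY") and the new model name are assumptions about how you stored your keys in Kaggle's Add-ons > Secrets menu; substitute your own.

```python
# Run configuration: base model path (assumed Kaggle layout), a name for the
# fine-tuned model (illustrative), and the Hugging Face dataset identifier.
base_model = "/kaggle/input/llama-3.2/transformers/3b-instruct/1"
new_model = "llama-3.2-3b-it-customer-support"
dataset_name = "bitext/Bitext-customer-support-llm-chatbot-training-dataset"

def login_to_services():
    """Authenticate with Hugging Face and WandB from Kaggle secrets."""
    # Imports deferred so this sketch can be read outside a Kaggle session.
    import wandb
    from huggingface_hub import login
    from kaggle_secrets import UserSecretsClient

    secrets = UserSecretsClient()
    login(token=secrets.get_secret("HF_TOKEN"))       # assumed secret name
    wandb.login(key=secrets.get_secret("WANDB_API_KEY"))  # assumed secret name

# In the notebook:
#   login_to_services()
```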
Loading the Model and Tokenizer
- Determine the appropriate `torch_dtype` and `attn_implementation` based on your GPU's capabilities.
- Load the model with a `BitsAndBytesConfig` for 4-bit quantization to minimize memory usage.
- Load the tokenizer.
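These three steps can be sketched as one helper. The capability check and quantization settings are common QLoRA defaults rather than the tutorial's exact values; on Kaggle's T4 GPUs (no bfloat16 support) the code falls back to float16 and eager attention.

```python
def load_model_and_tokenizer(base_model: str):
    """Load the base model in 4-bit precision plus its tokenizer."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Ampere or newer (compute capability >= 8) supports bfloat16 and flash attention.
    if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8:
        torch_dtype, attn_implementation = torch.bfloat16, "flash_attention_2"
    else:
        torch_dtype, attn_implementation = torch.float16, "eager"

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                   # store weights in 4 bits
        bnb_4bit_quant_type="nf4",           # NormalFloat4 quantization
        bnb_4bit_compute_dtype=torch_dtype,  # compute in half precision
        bnb_4bit_use_double_quant=True,      # also quantize the quantization constants
    )
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        quantization_config=bnb_config,
        device_map="auto",
        attn_implementation=attn_implementation,
    )
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    return model, tokenizer
```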
Loading and Processing the Dataset
- Load the `bitext/Bitext-customer-support-llm-chatbot-training-dataset` dataset.
- Shuffle and select a subset of the data (e.g., 1,000 samples for faster training).
- Create a "text" column by combining system instructions, user queries, and assistant responses into a chat format using the tokenizer's `apply_chat_template` method.
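A sketch of the dataset preparation, under the assumption that the Bitext dataset's user and assistant columns are named "instruction" and "response"; the system prompt wording is illustrative.

```python
SYSTEM_PROMPT = "You are a helpful customer support assistant."  # assumed wording

def prepare_dataset(tokenizer, dataset_name: str, n_samples: int = 1000):
    """Shuffle, subsample, and render each row in the model's chat format."""
    from datasets import load_dataset

    dataset = load_dataset(dataset_name, split="train")
    dataset = dataset.shuffle(seed=42).select(range(n_samples))  # small, fast subset

    def format_chat(row):
        messages = [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": row["instruction"]},
            {"role": "assistant", "content": row["response"]},
        ]
        # Render the conversation as a single string in Llama 3.2's chat format.
        row["text"] = tokenizer.apply_chat_template(messages, tokenize=False)
        return row

    return dataset.map(format_chat)
```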
Setting up the Model
- Identify all linear module names using a helper function.
- Configure LoRA with `LoraConfig` to fine-tune only those modules.
- Set up the `TrainingArguments` with hyperparameters appropriate for efficient training on Kaggle.
- Create an `SFTTrainer` instance, providing the model, dataset, LoRA config, training arguments, and tokenizer.
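The steps above can be sketched as follows. The hyperparameters are illustrative starting points for a short run on Kaggle's T4s, not tuned values, and the `SFTTrainer` argument names follow the `trl` release current when this tutorial was written; newer releases rename some of them.

```python
def find_linear_module_names(model):
    """Collect the names of all 4-bit linear layers so LoRA can target them."""
    import bitsandbytes as bnb

    names = set()
    for name, module in model.named_modules():
        if isinstance(module, bnb.nn.Linear4bit):
            names.add(name.split(".")[-1])
    names.discard("lm_head")  # keep the output head out of the adapter
    return list(names)

def build_trainer(model, tokenizer, dataset, output_dir="llama-3.2-3b-it-customer-support"):
    """Wire the LoRA config, training arguments, and dataset into an SFTTrainer."""
    from peft import LoraConfig
    from transformers import TrainingArguments
    from trl import SFTTrainer

    peft_config = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        task_type="CAUSAL_LM",
        target_modules=find_linear_module_names(model),
    )
    args = TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=2,  # effective batch size of 2 per device
        num_train_epochs=1,
        learning_rate=2e-4,
        fp16=True,                      # match the T4's half-precision support
        logging_steps=10,
        report_to="wandb",              # stream losses to Weights & Biases
    )
    return SFTTrainer(
        model=model,
        args=args,
        train_dataset=dataset,
        peft_config=peft_config,
        tokenizer=tokenizer,
    )
```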
Model Training
Train the model by calling `trainer.train()`. Monitor training and validation loss with WandB.
Model Inference
Test the fine-tuned model with sample prompts from the dataset.
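A minimal sketch of such a spot check, assuming the model and tokenizer from the earlier steps; the helper name and decoding details are illustrative.

```python
def generate_reply(model, tokenizer, user_query: str, max_new_tokens: int = 150):
    """Format a user query as a chat prompt and return the model's reply."""
    messages = [{"role": "user", "content": user_query}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```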
Saving the Model
Save the fine-tuned model locally and push it to the Hugging Face Hub.
Merging and Exporting the Fine-tuned Model
This section details merging the fine-tuned LoRA adapter with the base model and exporting the result to the Hugging Face Hub: load the base model, attach the adapter with `PeftModel.from_pretrained`, merge the weights with `model.merge_and_unload()`, then save and push the merged model to the Hub.
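A sketch of the merge-and-export step; the repository names are placeholders for your own Hugging Face username and model names.

```python
def merge_and_push(base_model: str, adapter_repo: str, merged_repo: str):
    """Fold a LoRA adapter into the base model and upload the merged weights."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Reload the base model in half precision (not 4-bit) so weights can be merged.
    base = AutoModelForCausalLM.from_pretrained(
        base_model, torch_dtype=torch.float16, device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(base_model)

    model = PeftModel.from_pretrained(base, adapter_repo)  # attach the LoRA adapter
    model = model.merge_and_unload()                       # fold adapter into weights

    model.push_to_hub(merged_repo)      # upload the merged model
    tokenizer.push_to_hub(merged_repo)  # upload the tokenizer files alongside it
```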
Converting to GGUF and Local Deployment
Finally, the tutorial explains converting the merged model to the GGUF format using the GGUF My Repo tool on Hugging Face and deploying it locally using the Jan application. This involves downloading the GGUF file, importing it into Jan, and setting up the system prompt and stop tokens for optimal performance.
Conclusion
Fine-tuning smaller LLMs offers a cost-effective and efficient approach to customizing models for specific tasks. This tutorial provides a practical guide to leveraging Llama 3.2's capabilities, from access and fine-tuning to local deployment, empowering users to build and deploy custom AI solutions. Remember to consult the accompanying Kaggle notebooks for detailed code examples.
The above is the detailed content of Fine-tuning Llama 3.2 and Using It Locally: A Step-by-Step Guide. For more information, please follow other related articles on the PHP Chinese website!
