This tutorial demonstrates fine-tuning Google's Gemma 2 model on a patient-doctor conversation dataset and deploying it for offline use. We'll cover model preparation, fine-tuning with LoRA, model merging, quantization, and local deployment with the Jan application.
Understanding Gemma 2
Gemma 2, Google's latest open-source large language model (LLM), offers 9B and 27B parameter versions under a permissive license. Its improved architecture delivers faster inference across a range of hardware, and it integrates seamlessly with Hugging Face Transformers, JAX, PyTorch, and TensorFlow. Enhanced safety features and tools for ethical AI deployment are also included.
Accessing and Running Gemma 2
This section walks through downloading the model and running inference with 4-bit quantization, which is necessary to fit the model in memory on consumer hardware.
- Install packages: Install bitsandbytes, transformers, and accelerate.
- Hugging Face Authentication: Use a Hugging Face token (obtained from your Hugging Face account) to authenticate.
- Load Model and Tokenizer: Load the google/gemma-2-9b-it model using 4-bit quantization and appropriate device mapping.
- Inference: Create a prompt, tokenize it, generate a response, and decode it (see the sketch after this list).
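The listing below is a minimal sketch of the load-and-generate steps, assuming you have already installed the packages and authenticated with Hugging Face; the example prompt, compute dtype, and generation settings are illustrative choices, not values fixed by the tutorial.

```python
# pip install bitsandbytes transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2-9b-it"

# 4-bit NF4 quantization keeps the 9B model within consumer GPU memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU(s) automatically
)

# Example prompt; any instruction-style question works here.
prompt = "What are common causes of persistent headaches?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```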
Fine-tuning Gemma 2 with LoRA
This section guides you through fine-tuning Gemma 2 on a healthcare dataset using LoRA (Low-Rank Adaptation) for efficient training.
- Setup: Install the required packages (transformers, datasets, accelerate, peft, trl, bitsandbytes, wandb). Authenticate with Hugging Face and Weights & Biases.
- Model and Tokenizer Loading: Load Gemma 2 (9B-It) with 4-bit quantization, adjusting the data type and attention implementation based on your GPU capabilities. Configure the LoRA parameters.
- Dataset Loading: Load and preprocess the lavita/ChatDoctor-HealthCareMagic-100k dataset, converting it to a chat format suitable for the model.
- Training: Set the training arguments (adjust hyperparameters as needed) and train the model with the SFTTrainer, monitoring progress in Weights & Biases (a condensed sketch follows this list).
- Evaluation: Finish the Weights & Biases run to generate an evaluation report.
- Saving the Model: Save the fine-tuned LoRA adapter locally and push it to the Hugging Face Hub.
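A condensed sketch of the LoRA and training setup is shown below. It assumes the quantized model and tokenizer from the previous section are already loaded, uses an older trl API in which SFTTrainer accepts dataset_text_field and max_seq_length directly (newer releases move these into SFTConfig), and the dataset column names, hyperparameters, and repository names are assumptions to adjust for your own run.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

# LoRA trains small low-rank update matrices instead of all 9B weights.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Small slice of the dataset for a quick experiment; use the full split for real training.
dataset = load_dataset("lavita/ChatDoctor-HealthCareMagic-100k", split="train[:1000]")

def to_chat_text(row):
    # Wrap each patient question and doctor answer in Gemma's chat template
    # (column names assumed from the dataset card; verify before running).
    messages = [
        {"role": "user", "content": row["input"]},
        {"role": "assistant", "content": row["output"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(to_chat_text)

training_args = TrainingArguments(
    output_dir="gemma-2-9b-chatdoctor",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    num_train_epochs=1,
    logging_steps=10,
    report_to="wandb",  # stream metrics to Weights & Biases
)

trainer = SFTTrainer(
    model=model,                  # the 4-bit base model loaded earlier
    tokenizer=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,      # SFTTrainer wraps the model with the LoRA adapter
    dataset_text_field="text",
    max_seq_length=512,
    args=training_args,
)
trainer.train()

# Save the adapter locally and push it to the Hub (repo name is hypothetical).
trainer.model.save_pretrained("gemma-2-9b-chatdoctor-adapter")
trainer.model.push_to_hub("your-username/gemma-2-9b-chatdoctor-adapter")
```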
Merging the Adapter and Base Model
This step merges the fine-tuned LoRA adapter into the base Gemma 2 model to produce a single, deployable model. The merge is performed on a CPU to stay within memory constraints.
- Setup: Create a new CPU-based notebook, install the necessary packages, and authenticate with Hugging Face.
- Load and Merge: Load the base model and the saved adapter, then merge them using PeftModel.merge_and_unload() (see the sketch after this list).
- Save and Push: Save the merged model and tokenizer locally and push them to the Hugging Face Hub.
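A sketch of the CPU merge might look like the following; the adapter and output repository names are hypothetical placeholders that should match what you pushed in the previous section.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "google/gemma-2-9b-it"
adapter_id = "your-username/gemma-2-9b-chatdoctor-adapter"  # hypothetical Hub repo
merged_dir = "gemma-2-9b-chatdoctor-merged"

# Load the base model in half precision on the CPU; merging cannot be done on 4-bit weights.
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Attach the LoRA adapter, then fold its weights into the base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
merged_model = model.merge_and_unload()

# Save locally, then push the merged model and tokenizer to the Hub.
merged_model.save_pretrained(merged_dir)
tokenizer.save_pretrained(merged_dir)
merged_model.push_to_hub("your-username/" + merged_dir)
tokenizer.push_to_hub("your-username/" + merged_dir)
```

The merged repository produced here is what you point the GGUF conversion Space at in the next step.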
Quantizing with Hugging Face Space
Use the GGUF My Repo Hugging Face Space to easily convert and quantize the model to the GGUF format for optimal local deployment.
Using the Fine-tuned Model Locally with Jan
- Download and install the Jan application.
- Download the quantized model from the Hugging Face Hub.
- Load the model in Jan, adjust parameters (stop sequences, penalties, max tokens, instructions), and interact with the fine-tuned model.
Conclusion
This tutorial provides a comprehensive guide to fine-tuning and deploying Gemma 2. Remember to adjust hyperparameters and settings based on your hardware and dataset. Consider exploring Keras 3 for potentially faster training and inference.