LLM: TensorFlow、Keras、Hugging Face を使用した転移学習

Oct 18, 2024 pm 05:52 PM

Transfer learning is one of the most powerful techniques in deep learning, especially when working with Large Language Models (LLMs). These models, such as Flan-T5, are pre-trained on vast amounts of data, allowing them to generalize across many language tasks. Instead of training a model from scratch, we can fine-tune these pre-trained models for specific tasks, like question-answering.

In this guide, we will walk you through how to perform transfer learning on Flan-T5-large using TensorFlow and Hugging Face. We’ll fine-tune this model on the SQuAD (Stanford Question Answering Dataset), a popular dataset used to train models for answering questions based on a given context.

Key points we’ll cover include:

A detailed introduction to Hugging Face and how it helps in NLP.
Step-by-step explanation of the code, including how to load and fine-tune the Flan-T5-large model.
Freezing the large encoder and decoder layers, and unfreezing only the final layer for efficient fine-tuning.
A brief introduction to the SQuAD dataset and how to process it for our task.
An in-depth explanation of the T5 architecture and how Hugging Face’s AutoModel works.
Ways to improve the fine-tuning process for better performance.

What is Hugging Face?

Hugging Face is a popular platform and library that simplifies working with powerful models in Natural Language Processing (NLP). The key components include:

Model Hub: A repository of pre-trained models that are ready to be fine-tuned on specific tasks.
Transformers Library: Provides tools to load and fine-tune models easily.
Datasets Library: A quick and easy way to load datasets, such as SQuAD, for training.

With Hugging Face, you don't need to build models from scratch. It offers access to a wide variety of pre-trained models, including BERT, GPT-3, and T5, which significantly reduces the time and resources needed to develop NLP solutions. By leveraging these models, you can quickly fine-tune them for specific downstream tasks like question-answering, text classification, and summarization.

What is AutoModel?

Hugging Face provides various model classes, but AutoModel is one of the most flexible and widely used. The AutoModel API abstracts away the complexities of manually selecting and loading models. You don’t need to know the specific class of each model beforehand; AutoModel will load the correct architecture based on the model's name.

For instance, AutoModelForSeq2SeqLM is used for sequence-to-sequence models like T5 or BART, which are typically used for tasks such as translation, summarization, and question-answering. The beauty of AutoModel is that it is model-agnostic—meaning you can swap out models with ease and still use the same code.

Here’s how it works in practice:

 from transformers import TFAutoModelForSeq2SeqLM, AutoTokenizer# Load the pre-trained Flan-T5-large model and tokenizermodel_name = "google/flan-t5-large"model = TFAutoModelForSeq2SeqLM.from_pretrained(model_name) # Load modeltokenizer = AutoTokenizer.from_pretrained(model_name) # Load tokenizer

The AutoModel dynamically loads the correct model architecture based on the model's name (in this case, flan-t5-large). This flexibility makes the development process much smoother and faster because you don’t need to worry about manually specifying each model's architecture.

Understanding the T5 Architecture

To understand how T5 works, let's first break down its architecture. T5 stands for Text-to-Text Transfer Transformer, and it was introduced by Google in 2019. The key idea behind T5 is that every NLP task can be cast as a text-to-text problem, whether it's translation, summarization, or even question-answering.

Key Components of T5:

Encoder-Decoder Architecture: T5 is a sequence-to-sequence (Seq2Seq) model. The encoder processes the input text, while the decoder generates the output.
Task-Agnostic Design: T5 converts every task into a text-to-text problem. For example, for question-answering, the input would be structured as “question: context: ,” and the model is tasked with predicting the answer as text.
Pre-training with Span Corruption: T5 was pre-trained using a method called "span corruption," where random spans of text are replaced with special tokens, and the model is tasked with predicting these spans.

Here’s an example of how T5 might be applied to a question-answering task:

 Input: "question: What is T5? context: T5 is a text-to-text transfer 
transformer developed by Google."Output: "T5 is a text-to-text transfer transformer."

The beauty of T5’s text-to-text framework is its flexibility. You can use the same model architecture for various tasks simply by rephrasing the input. This makes T5 highly versatile and adaptable for a range of NLP tasks.

Why T5 is Perfect for Transfer Learning

T5 has been pre-trained on a massive dataset known as C4 (Colossal Clean Crawled Corpus), which gives it a solid understanding of the structure of language. Through transfer learning, we can fine-tune this pre-trained model to specialize in a specific task, such as question-answering with the SQuAD dataset. By leveraging T5’s pre-trained knowledge, we only need to tweak the final layer to make it perform well on our task, which reduces training time and computational resources.

Loading and Preprocessing the SQuAD Dataset

Now that we have the model, we need data to fine-tune it. We'll use the SQuAD dataset, a collection of question-answer pairs based on passages of text.

 from datasets import load_dataset# Load the SQuAD datasetsquad = load_dataset("squad")
train_data = squad["train"]
valid_data = squad["validation"]

The SQuAD dataset is widely used for training models in question-answering tasks. Each data point in the dataset consists of a context (a passage of text), a question, and the corresponding answer, which is a span of text found within the context.

Preprocessing the Dataset

Before feeding the data into the model, we need to tokenize it. Tokenization converts raw text into numerical values (tokens) that the model can understand. For T5, we must format the input as a combination of the question and context.

 # Preprocessing function to tokenize inputs and outputsdef preprocess_function(examples): # Combine the question and context into a single string
 inputs = ["question: " + q + " context: " + c for q, c in zip(examples["question"], examples["context"])]
 model_inputs = tokenizer(inputs, max_length=512, truncation=True, 
padding="max_length", return_tensors="tf") # Tokenize the answer (label)
 labels = tokenizer(examples["answers"]["text"][0], max_length=64, 
truncation=True, padding="max_length", return_tensors="tf")
 model_inputs["labels"] = labels["input_ids"] return model_inputs# Preprocess the datasettrain_data = train_data.map(preprocess_function, batched=True)
valid_data = valid_data.map(preprocess_function, batched=True)

This function tokenizes both the question-context pairs (the input) and the answers (the output). Tokenization is necessary for transforming raw text into tokenized sequences that the model can process.

Fine-Tuning the Model (Transfer Learning)

Here’s where we perform transfer learning. To make fine-tuning efficient, we freeze the encoder and decoder layers, and unfreeze only the final layer. This strategy ensures that the computationally heavy layers are kept intact while allowing the final layer to specialize in the task of answering questions.

 from tensorflow.keras.optimizers import Adam# Freeze all layers by default (encoder, decoder, embedding layers)for layer in model.layers:
 layer.trainable = False# Unfreeze only the final task-specific layermodel.layers[-1].trainable = True# Compile the model with the correct Hugging Face loss function for TensorFlow
optimizer = Adam(learning_rate=3e-5)
model.compile(optimizer=optimizer, loss=model.hf_compute_loss)# Fine-tune the model on the SQuAD datasetmodel.fit(train_data.shuffle(1000).batch(8), epochs=3, 
validation_data=valid_data.batch(8))

Explanation:

Freezing the encoder and decoder layers: We freeze these layers because they are very large and already pre-trained on vast amounts of data. Fine-tuning them would require significant computational resources and time. By freezing them, we preserve their general language understanding and focus on fine-tuning the final layer.
Unfreezing the final layer: This allows the model to learn task-specific information from the SQuAD dataset. The final layer will be responsible for generating the answer based on the question-context pair.
Fine-tuning: We use a small learning rate and train the model for 3 epochs to adapt it to our dataset.

Evaluating the Model

Once the model is fine-tuned, it’s important to test how well it performs on the validation set.

 # Select a sample from the validation setsample = valid_data[0]# Tokenize the input textinput_text = "question: " + sample["question"] + " context: " + sample["context"]
input_ids = tokenizer(input_text, return_tensors="tf").input_ids# Generate the output (the model's answer)output = model.generate(input_ids)
answer = tokenizer.decode(output[0], skip_special_tokens=True)print(f"Question: {sample['question']}")print(f"Answer: {answer}")

This code takes a sample question-context pair, tokenizes it, and uses the fine-tuned model to generate an answer. The tokenizer decodes the output back into human-readable text.

Ways to Improve Fine-Tuning

Although we’ve covered the basics of fine-tuning, there are several ways you can further improve the performance of your model:

Data Augmentation: Use data augmentation techniques to increase the size of your training data. This could include paraphrasing questions or slightly modifying the context to create more training samples.
Use of Transfer Learning Techniques: Explore other transfer learning techniques like Parameter Efficient Fine-Tuning (PEFT), which allows fine-tuning of smaller subsets of the model’s parameters.
Optimization: Try using more advanced optimizers like AdamW or LAMB for better convergence. Additionally, consider experimenting with different learning rates, batch sizes, and warmup steps.
Experiment with Hyperparameters: You can experiment with hyperparameters like learning rate, number of epochs, and dropout rates. Use a small validation set to tune these hyperparameters.
Leverage TPUs or Multi-GPU Training: If you’re working with a large dataset or model, consider using TPUs (Tensor Processing Units) or multiple GPUs to speed up the training process.

Conclusion

In this guide, we walked through the entire process of fine-tuning a pre-trained LLM (Flan-T5-large) using TensorFlow and Hugging Face. By freezing the computationally expensive encoder and decoder layers and only fine-tuning the final layer, we optimized the training process while still adapting the model to our specific task of question-answering on the SQuAD dataset.

T5’s text-to-text framework makes it highly flexible and adaptable to various NLP tasks, and Hugging Face’s AutoModel abstraction simplifies the process of working with these models. By understanding the architecture and principles behind models like T5, you can apply these techniques to a variety of other NLP tasks, making transfer learning a powerful tool in your machine learning toolkit.

以上がLLM: TensorFlow、Keras、Hugging Face を使用した転移学習の詳細内容です。詳細については、PHP 中国語 Web サイトの他の関連記事を参照してください。

声明

この記事の内容はネチズンが自主的に寄稿したものであり、著作権は原著者に帰属します。このサイトは、それに相当する法的責任を負いません。盗作または侵害の疑いのあるコンテンツを見つけた場合は、admin@php.cn までご連絡ください。

1つのプロンプトは、すべての主要なLLMのセーフガードをバイパスできますApr 25, 2025 am 11:16 AM

HiddenLayerの画期的な研究は、主要な大規模な言語モデル（LLMS）における重大な脆弱性を明らかにしています。彼らの発見は、ほぼすべての主要なLLMSを回避できる「政策の人形劇」と呼ばれる普遍的なバイパス技術を明らかにしています

5つの間違いほとんどの企業が今年持続可能性を備えていますApr 25, 2025 am 11:15 AM

環境責任と廃棄物の削減の推進は、企業の運営方法を根本的に変えています。この変革は、製品開発、製造プロセス、顧客関係、パートナーの選択、および新しいものの採用に影響します

H20チップバンジョルツチャイナ企業ですが、彼らはインパクトのために長い間支えられてきましたApr 25, 2025 am 11:12 AM

高度なAIハードウェアに関する最近の制限は、AI優位のためのエスカレートする地政学的競争を強調し、中国の外国半導体技術への依存を明らかにしています。 2024年、中国は3,850億ドル相当の半導体を大量に輸入しました

OpenaiがChromeを購入すると、AIはブラウザ戦争を支配する場合がありますApr 25, 2025 am 11:11 AM

GoogleからのChromeの強制的な売却の可能性は、ハイテク業界での激しい議論に火をつけました。 Openaiが65％の世界市場シェアを誇る大手ブラウザを取得する見込みは、THの将来について重要な疑問を提起します

AIが小売メディアの成長する痛みをどのように解決できるかApr 25, 2025 am 11:10 AM

全体的な広告の成長を上回っているにもかかわらず、小売メディアの成長は減速しています。この成熟段階は、生態系の断片化、コストの上昇、測定の問題、統合の複雑さなど、課題を提示します。ただし、人工知能

「aiは私たちであり、それは私たち以上のものです」Apr 25, 2025 am 11:09 AM

古いラジオは、ちらつきと不活性なスクリーンのコレクションの中で静的なパチパチと鳴ります。簡単に不安定になっているこの不安定な電子機器の山は、没入型展示会の6つのインスタレーションの1つである「e-waste land」の核心を形成しています。

Google Cloudは、次の2025年にインフラストラクチャについてより深刻になりますApr 25, 2025 am 11:08 AM

Google Cloudの次の2025年：インフラストラクチャ、接続性、およびAIに焦点を当てています Google Cloudの次の2025年の会議では、多くの進歩を紹介しました。特定の発表の詳細な分析については、私の記事を参照してください

Baby Ai Meme、Arcanaの550万ドルのAI映画パイプライン、IRの秘密の支援者が明らかにした話Apr 25, 2025 am 11:07 AM

今週はAIとXR：AIを搭載した創造性の波が、音楽の世代から映画制作まで、メディアとエンターテイメントを席巻しています。見出しに飛び込みましょう。 AIに生成されたコンテンツの影響力の高まり：テクノロジーコンサルタントのShelly Palme

See all articles

ホットAIツール

Undresser.AI Undress

リアルなヌード写真を作成する AI 搭載アプリ

AI Clothes Remover

写真から衣服を削除するオンライン AI ツール。

Undress AI Tool

脱衣画像を無料で

Clothoff.io

AI衣類リムーバー

Video Face Swap

完全無料の AI 顔交換ツールを使用して、あらゆるビデオの顔を簡単に交換できます。

ホットツール

SecLists

SecLists は、セキュリティテスターの究極の相棒です。これは、セキュリティ評価中に頻繁に使用されるさまざまな種類のリストを 1 か所にまとめたものです。 SecLists は、セキュリティテスターが必要とする可能性のあるすべてのリストを便利に提供することで、セキュリティテストをより効率的かつ生産的にするのに役立ちます。リストの種類には、ユーザー名、パスワード、URL、ファジングペイロード、機密データパターン、Web シェルなどが含まれます。テスターはこのリポジトリを新しいテストマシンにプルするだけで、必要なあらゆる種類のリストにアクセスできるようになります。

mPDF

mPDF は、UTF-8 でエンコードされた HTML から PDF ファイルを生成できる PHP ライブラリです。オリジナルの作者である Ian Back は、Web サイトから「オンザフライ」で PDF ファイルを出力し、さまざまな言語を処理するために mPDF を作成しました。 HTML2FPDF などのオリジナルのスクリプトよりも遅く、Unicode フォントを使用すると生成されるファイルが大きくなりますが、CSS スタイルなどをサポートし、多くの機能強化が施されています。 RTL (アラビア語とヘブライ語) や CJK (中国語、日本語、韓国語) を含むほぼすべての言語をサポートします。ネストされたブロックレベル要素 (P、DIV など) をサポートします。

SublimeText3 Linux 新バージョン

SublimeText3 Linux 最新バージョン

メモ帳++7.3.1

使いやすく無料のコードエディター

DVWA

Damn Vulnerable Web App (DVWA) は、非常に脆弱な PHP/MySQL Web アプリケーションです。その主な目的は、セキュリティ専門家が法的環境でスキルとツールをテストするのに役立ち、Web 開発者が Web アプリケーションを保護するプロセスをより深く理解できるようにし、教師/生徒が教室環境で Web アプリケーションを教え/学習できるようにすることです。安全。 DVWA の目標は、シンプルでわかりやすいインターフェイスを通じて、さまざまな難易度で最も一般的な Web 脆弱性のいくつかを実践することです。このソフトウェアは、