Quickly gaining 2,500 stars: Andrej Karpathy rewrites his minGPT library
As a representative of "brute-force aesthetics" in artificial intelligence, GPT has stolen the show. From the 117 million parameters of the original GPT, the family has soared all the way to the 175 billion parameters of GPT-3. With the release of GPT-3, OpenAI opened a commercial API to the community, encouraging everyone to run more experiments with GPT-3. However, using the API requires an application, and your application may well go nowhere.
To let researchers with limited resources experience the fun of playing with large models, former Tesla AI director Andrej Karpathy built a small GPT training library in only about 300 lines of PyTorch code and named it minGPT. minGPT can perform addition and character-level language modeling, and its accuracy is not bad.
Two years later, minGPT has been updated: Karpathy has launched a new version named nanoGPT, a library for training and fine-tuning medium-sized GPTs. In just a few days since launch, it has collected 2.5K stars.
## Project address: https://github.com/karpathy/nanoGPT
In the project introduction, Karpathy writes: "nanoGPT is the simplest, fastest library for training and fine-tuning medium-sized GPTs. It is a rewrite of minGPT, which became so complex that I no longer wanted to use it. nanoGPT is still under active development and is currently working on reproducing GPT-2 on the OpenWebText dataset. The code aims to be simple and readable: train.py is an ~300-line training loop, and model.py is an ~300-line GPT model definition that can optionally load the GPT-2 weights from OpenAI."
To prepare the dataset, the user first needs to tokenize some documents into a simple 1D array of indices:
$ cd data/openwebtext
$ python prepare.py
This generates two files, train.bin and val.bin, each containing a raw sequence of uint16 values representing GPT-2 BPE token ids. The training script then attempts to replicate the smallest version of GPT-2 released by OpenAI, the 124M-parameter model:
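To make the .bin format concrete, here is an illustrative sketch of how such a file can be written and then memory-mapped into (input, target) pairs at training time. The token ids below are made up for the example; they are not the output of a real tokenizer, and this is not the repository's exact code.

```python
import numpy as np

# prepare.py-style output: token ids stored as a flat array of uint16 values.
# These ids are invented for illustration.
token_ids = np.array([464, 2068, 7586, 21831, 18045], dtype=np.uint16)
token_ids.tofile("train.bin")

# At training time the file can be memory-mapped without loading it all,
# then sliced into inputs and next-token targets shifted by one position.
data = np.memmap("train.bin", dtype=np.uint16, mode="r")
block_size = 3  # toy context length
x = data[0:block_size].astype(np.int64)      # inputs
y = data[1:block_size + 1].astype(np.int64)  # targets (inputs shifted by one)

print(x.tolist())  # [464, 2068, 7586]
print(y.tolist())  # [2068, 7586, 21831]
```

Storing plain uint16 keeps the files compact (2 bytes per token, since the GPT-2 vocabulary of 50,257 ids fits in 16 bits) and lets training slice batches straight off disk.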
$ python train.py
To train with PyTorch Distributed Data Parallel (DDP), run the script with torchrun:
$ torchrun --standalone --nproc_per_node=4 train.py
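Conceptually, torchrun launches one process per GPU and sets environment variables such as RANK and WORLD_SIZE so each process knows which slice of the work is its own. The torch-free toy below only illustrates that rank-based sharding idea (nanoGPT itself uses torch.distributed; this is not its code):

```python
import os

def shard_for(rank, world_size, samples):
    """Each process takes every world_size-th sample, offset by its rank."""
    return samples[rank::world_size]

# torchrun sets RANK and WORLD_SIZE per spawned process; default to a
# single standalone process when they are absent.
rank = int(os.environ.get("RANK", 0))
world_size = int(os.environ.get("WORLD_SIZE", 1))

samples = list(range(16))  # pretend these are 16 training-sample indices
shards = [shard_for(r, 4, samples) for r in range(4)]  # what 4 GPUs would see

print(shards[0])  # [0, 4, 8, 12]
print(shard_for(rank, world_size, samples))
```

The four shards cover all 16 samples exactly once, which is why DDP scales throughput roughly linearly with the number of GPUs.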
Once a model is trained, users can sample from it:
$ python sample.py
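Sampling scripts like this typically expose two knobs: a temperature that sharpens or flattens the next-token distribution, and a top-k cutoff that discards unlikely tokens. A numpy sketch of one decoding step, with made-up logits (illustrative, not the repository's code):

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])  # invented scores, 5-token vocab

temperature = 0.8  # < 1.0 sharpens the distribution
top_k = 3          # keep only the 3 highest-scoring tokens

scaled = logits / temperature
# Mask everything below the k-th best score.
kth_best = np.sort(scaled)[-top_k]
scaled = np.where(scaled < kth_best, -np.inf, scaled)

# Softmax over the surviving tokens, then draw one token id.
probs = np.exp(scaled - scaled.max())
probs /= probs.sum()
next_id = int(rng.choice(len(probs), p=probs))
print(next_id)  # one of the top-3 ids: 0, 1, or 2
```

Repeating this step, appending each sampled id to the context, yields the generated text.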
Karpathy reports that one night of training on a single A100 40GB GPU currently brings the training loss to about 3.74, and about 3.60 on 4 GPUs. On an 8 x A100 40GB node, the loss dropped to about 3.1 after 400,000 iterations (~1 day).
To fine-tune GPT on new text, users can look at data/shakespeare and its prepare.py. Unlike OpenWebText, this runs in seconds. Fine-tuning itself takes very little time, e.g. just a few minutes on a single GPU. Here is an example of running fine-tuning:
$ python train.py config/finetune_shakespeare.py
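As a self-contained illustration of what such a prepare step does, here is a character-level variant in the spirit of nanoGPT's char-level Shakespeare dataset. A short string stands in for the downloaded corpus, and the split ratio is an assumption for the example; the repository's own prepare.py scripts differ in detail (the plain shakespeare dataset uses GPT-2 BPE tokens).

```python
import numpy as np

text = "To be, or not to be"  # stand-in for the downloaded corpus

# Build a character-level vocabulary and its two lookup tables.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> id
itos = {i: ch for ch, i in stoi.items()}      # id -> char

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

ids = np.array(encode(text), dtype=np.uint16)
n = int(0.9 * len(ids))        # assumed 90/10 train/val split
ids[:n].tofile("train.bin")
ids[n:].tofile("val.bin")

print(len(chars), len(ids))    # vocabulary size, total token count
```

After this, the same train.py loop can consume train.bin and val.bin exactly as in the OpenWebText case.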
As soon as the project went online, people started trying it out. Readers who want to try it themselves can refer to the original project.
The above is the detailed content of Quickly gaining 2,500 stars, Andrej Karpathy rewrote a minGPT library, published on the PHP Chinese website.
