
How to create a model from my data on Kaggle

DDD
2025-01-26 10:12:09

This tutorial demonstrates how to use the FastAI library to train an image classification model to distinguish between cats and dogs. We'll go step by step, from data preparation to model training and usage.

Step 1: Data preparation

  1. Image search function: First, we define a function that searches for images with the DuckDuckGo search engine. It takes search keywords and a maximum number of images, and returns a list of image URLs.
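<code class="language-python">import os
# Kaggle sets this variable inside its notebooks; it is empty elsewhere
iskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')

if iskaggle:
    # Notebook shell command: upgrade fastai and install the search client
    !pip install -Uqq fastai 'duckduckgo_search>=6.2'

from duckduckgo_search import DDGS
from fastcore.all import *
import time

def search_images(keywords, max_images=200):
    # Each search result is a dict; keep only its 'image' field (the image URL)
    return L(DDGS().images(keywords, max_results=max_images)).itemgot('image')</code>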
  2. Search and download sample images: Let’s search for “dog photos” and “cat photos” and download one sample image of each, starting with a dog.
<code class="language-python">urls = search_images('dog photos', max_images=1)
from fastdownload import download_url
dest = 'dog.jpg'
download_url(urls[0], dest, show_progress=False)
from fastai.vision.all import *
im = Image.open(dest)
im.to_thumb(256,256)</code>

Similarly, we download a picture of a cat:

<code class="language-python">download_url(search_images('cat photos', max_images=1)[0], 'cat.jpg', show_progress=False)
Image.open('cat.jpg').to_thumb(256,256)</code>

  3. Batch download and pre-process images: We download multiple pictures of dogs and cats and save them into the dog_or_not/dog and dog_or_not/cat folders respectively. We also shrink the images (max_size=400) so that later loading is faster.
<code class="language-python">searches = 'dog', 'cat'
path = Path('dog_or_not')

for o in searches:
    dest = (path/o)
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f'{o} photo'))
    time.sleep(5)
    resize_images(path/o, max_size=400, dest=path/o)</code>
  4. Clean invalid images: Delete images that failed to download or are corrupted.
<code class="language-python">failed = verify_images(get_image_files(path))
failed.map(Path.unlink)</code>
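
After cleaning, it can be useful to check how many images are left in each class folder, since a search that returned few usable results would leave the dataset unbalanced. The following is a minimal sketch using only the standard library, not part of the original notebook; the helper name `count_images_per_class` and the list of file extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the files actually downloaded
IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp'}

def count_images_per_class(root):
    """Return {class_folder_name: number_of_image_files} for each subfolder of root."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.rglob('*')
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```

Calling `count_images_per_class('dog_or_not')` after the cleaning step would return a dictionary with one entry for `cat` and one for `dog`.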

Step 2: Model training

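  1. Create DataLoaders: Use a DataBlock to create the DataLoaders that load and process the image data. Here 20% of the images are held out for validation, each label is taken from the image's parent folder name, and every image is squished to 192×192 pixels.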
<code class="language-python">dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='squish')]
).dataloaders(path, bs=32)
dls.show_batch(max_n=6)</code>

  2. Fine-tune the pre-trained model: Use a pre-trained ResNet50 model and fine-tune it on our dataset.
<code class="language-python">learn = vision_learner(dls, resnet50, metrics=error_rate)
learn.fine_tune(3)</code>
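
To make the `splitter` and `get_y` arguments of the DataBlock above more concrete, here is a plain-Python sketch of the two ideas: a seeded random hold-out split and labelling by parent folder. It is an illustration only, not fastai's implementation; the function names `random_split` and `parent_folder_label` are hypothetical, and the specific files it selects will differ from those chosen by `RandomSplitter`.

```python
import random
from pathlib import Path

def parent_folder_label(file_path):
    """Label a file by the name of the folder that contains it."""
    return Path(file_path).parent.name

def random_split(items, valid_pct=0.2, seed=42):
    """Shuffle indices with a fixed seed and hold out valid_pct of them for validation."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    cut = int(valid_pct * len(items))
    valid = [items[i] for i in indices[:cut]]
    train = [items[i] for i in indices[cut:]]
    return train, valid

# Made-up file names that mimic the dog_or_not folder layout
files = [f'dog_or_not/dog/{i}.jpg' for i in range(5)] + \
        [f'dog_or_not/cat/{i}.jpg' for i in range(5)]
train, valid = random_split(files)
print(len(train), len(valid))         # 8 2
print(parent_folder_label(files[0]))  # dog
```

Fixing the seed means the same images land in the validation set on every run, which keeps the reported error rate comparable between experiments.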

Step 3: Model use

  1. Prediction: Use the trained model to classify the sample dog image downloaded earlier.
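<code class="language-python"># predict returns the decoded label, its index, and the per-class probabilities
label,_,probs = learn.predict(PILImage.create('dog.jpg'))
print(f'This is a: {label}.')
# Class labels are sorted alphabetically ('cat', 'dog'), so index 1 is 'dog'
print(f"Probability it's a dog: {probs[1]:.4f}")</code>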

Output:

This is a: dog.
Probability it's a dog: 1.0000
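
The prediction code indexes the probabilities by position, which only works while `dog` happens to be the second class. A more robust option is to look the probability up by class name. The sketch below shows the idea in plain Python; `prob_for` is a hypothetical helper, and the `vocab` and `probs` values are made up for illustration. In a fastai session the class names should be available as `learn.dls.vocab`, but treat that as an assumption to verify against your installed version.

```python
def prob_for(label, vocab, probs):
    """Return the probability assigned to `label`, given class names and matching probabilities."""
    return float(probs[list(vocab).index(label)])

vocab = ['cat', 'dog']    # alphabetical, as sorted class-folder names would be
probs = [0.0003, 0.9997]  # made-up values for illustration
print(f"Probability it's a dog: {prob_for('dog', vocab, probs):.4f}")
```

Looking up by name keeps the code correct if more classes are added later or the folder names change.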

This tutorial shows how to use FastAI to quickly build a simple image classification model. Remember, the accuracy of your model depends on the quality and quantity of your training data.
