Keras Training with Limited Dataset
When training a neural network with Keras on a dataset of 60,000 samples, the progress output may show 1875 rather than 60000, which makes it look as though only a portion of the dataset is being used, even if you followed the official TensorFlow guide. This article explains what that number means and what, if anything, you need to change.
The Reason Behind Partial Dataset Usage
The number "1875" encountered during model fitting does not represent the number of training samples; rather, it denotes the number of batches. By default, Keras uses a batch size of 32 during training. For a dataset with 60,000 samples, this equates to:
60,000 / 32 = 1875
Therefore, Keras divides your dataset into 1875 batches, each containing 32 samples. Each epoch iterates over all 1875 batches, so the entire dataset of 60,000 samples is used; the progress bar simply counts batches rather than individual samples.
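The arithmetic above can be sketched in a few lines of Python. The helper name `steps_per_epoch` is hypothetical and not part of the Keras API; it assumes the final, smaller batch is still counted when the dataset size is not a multiple of the batch size.

```python
import math

def steps_per_epoch(num_samples: int, batch_size: int = 32) -> int:
    """Number of batches shown per epoch (hypothetical helper, not a Keras function).

    Assumes a final partial batch is counted as one step.
    """
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(60_000))      # 1875
print(steps_per_epoch(60_000, 64))  # 938 (937 full batches plus one batch of 32)
```

If the number in your progress bar matches this calculation, every sample is being used.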
Solution
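No fix is required, because the entire dataset is already used in every epoch. If you would rather have each epoch process all 60,000 samples in a single step, you can explicitly set the batch size to the total number of samples:
<code class="python">model.fit(train_images, train_labels, epochs=10, batch_size=60000)</code>
With this setting, the progress bar shows 1 step per epoch instead of 1875. Keras does not see any more data than before; it only performs one weight update per epoch instead of 1875. Such a large batch needs more memory and usually does not improve results, so the default batch size of 32 is a reasonable choice in most cases.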
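To make the effect of `batch_size` concrete, here is a minimal pure-Python sketch that walks over a dataset in batches and counts both the weight updates and the samples visited in one epoch. The functions `iterate_batches` and `epoch_summary` are illustrative names, not Keras functions, and the sketch ignores details such as shuffling.

```python
def iterate_batches(num_samples, batch_size):
    """Yield (start, end) index ranges that cover the dataset exactly once."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)

def epoch_summary(num_samples, batch_size):
    """Return (weight_updates, samples_seen) for a single epoch."""
    updates = 0
    seen = 0
    for start, end in iterate_batches(num_samples, batch_size):
        updates += 1         # one gradient step per batch
        seen += end - start  # samples contained in this batch
    return updates, seen

for bs in (32, 60_000):
    print(bs, epoch_summary(60_000, bs))
# 32 (1875, 60000)
# 60000 (1, 60000)
```

Both batch sizes visit all 60,000 samples per epoch; only the number of weight updates differs.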