MNIST Fashion Classification
The MNIST fashion dataset is a popular dataset containing grayscale 28x28 pixel images of fashion items, such as shirts, shoes, and pants. This post explores the use of this dataset to train two neural network models in the identification of these garments.
Import Statements
The following libraries will be used for this post:
- pickle: for saving the model training histories.
- matplotlib: for data plots and visualizations.
- tensorflow and keras: for training the neural network.
- sklearn: for spectral embedding visualization of the results.
%matplotlib notebook
import os
import pickle
import numpy as np
import matplotlib.pyplot as plt
import tensorflow
from tensorflow import keras
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Conv2D, Flatten, MaxPooling2D
from sklearn.manifold import SpectralEmbedding
Load and Plot Sample Data
The following code loads the MNIST fashion data set and plots a sample of 9 garments.
The MNIST dataset is already partitioned into separate training and validation images and labels. The training images and labels will be used to train the models, while the validation images and labels will be used to determine each model's accuracy and to check that the model has not been overfit.
# Load the MNIST data set
(training_images, training_labels), (test_images, test_labels) = fashion_mnist.load_data()
label_names = np.array(["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"])
# Plot some sample images from data set
plt.figure()
plt.suptitle("MNIST Fashion Dataset Samples", fontsize = 'x-large')
label_indexes = { training_labels[i]: i for i in range(len(training_labels)) }
for i in range(9):
    index = label_indexes[i]
    plt.subplot(3, 3, i + 1)
    plt.title(label_names[training_labels[index]])
    plt.imshow(training_images[index], cmap = 'Greys')
plt.tight_layout()
Preprocessing Data
Prior to training the models, the image data will need to be normalized to the interval [0, 1]. Since each pixel is an 8-bit value between 0 and 255, this is done by dividing by 255, as shown below.
processed_training_images = training_images / 255.0
processed_test_images = test_images / 255.0
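As a minimal sketch of what this division does, the following uses a tiny hypothetical 2x2 array in place of a real image (the values are made up for illustration). It shows that the 8-bit integers are promoted to floats and land in [0, 1].

```python
import numpy as np

# Hypothetical 2x2 "image" of 8-bit pixel values (illustrative only).
raw = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Dividing by a float promotes the uint8 array to float64,
# mapping 0 -> 0.0 and 255 -> 1.0.
scaled = raw / 255.0

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```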
One-hot vectors will also need to be created for each label in the data set:
label_set = np.sort(np.unique(training_labels))
training_one_hots = keras.utils.to_categorical(training_labels, len(label_set))
test_one_hots = keras.utils.to_categorical(test_labels, len(label_set))
print("Sample One-Hots:")
for i in range(9):
    print(f"{training_labels[i]}: {training_one_hots[i]}")
Sample One-Hots:
9: [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
0: [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
0: [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
3: [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
0: [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
2: [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
7: [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
2: [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
5: [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
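To make the encoding concrete, here is a NumPy-only sketch that produces the same kind of vectors by indexing rows of an identity matrix. The helper name `one_hot` is hypothetical and is not part of Keras; it is only meant to illustrate what the encoding looks like.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels)]

# The first three labels from the sample output above.
sample = one_hot([9, 0, 0], 10)
print(sample)
```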
Model 1: ReLU Activations
The first model will consist of a fully connected neural network with two hidden ReLU-activated layers, each followed by a dropout layer to help prevent overfitting.
model1 = Sequential()
model1.add(Dense(512, input_shape = (28 * 28,), activation = "relu"))
model1.add(Dropout(0.15))
model1.add(Dense(512, activation = "relu"))
model1.add(Dropout(0.15))
model1.add(Dense(10, activation = "softmax"))
model1.compile(loss = "categorical_crossentropy", optimizer = "adam", metrics = ["accuracy"])
model1.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 512) 401920
_________________________________________________________________
dropout (Dropout) (None, 512) 0
_________________________________________________________________
dense_1 (Dense) (None, 512) 262656
_________________________________________________________________
dropout_1 (Dropout) (None, 512) 0
_________________________________________________________________
dense_2 (Dense) (None, 10) 5130
=================================================================
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
_________________________________________________________________
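The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The short sketch below reproduces the numbers; the helper name `dense_params` is hypothetical.

```python
def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units

# (inputs, units) for the three Dense layers of Model 1.
layer_sizes = [(28 * 28, 512), (512, 512), (512, 10)]
counts = [dense_params(i, u) for i, u in layer_sizes]

print(counts)       # [401920, 262656, 5130]
print(sum(counts))  # 669706
```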
The training of the model is performed below, over the course of 5 epochs.
# Reshape data for model
training_vectors1 = processed_training_images.reshape(len(processed_training_images), 28 * 28)
test_vectors1 = processed_test_images.reshape(len(processed_test_images), 28 * 28)
model1_path = "mnist_model1.saved_model"
model1_history_path = "mnist_model1.saved_model_history"
if os.path.exists(model1_path) and os.path.exists(model1_history_path):
    # Load trained model
    model1 = keras.models.load_model(model1_path)
    with open(model1_history_path, "rb") as history_file:
        history1 = pickle.load(history_file)
else:
    # Train new model
    tensorflow.random.set_seed(12345)
    model1.fit(training_vectors1, training_one_hots,
               batch_size = 64,
               epochs = 5,
               verbose = 1,
               validation_data = (test_vectors1, test_one_hots))
    history1 = model1.history.history
    model1.save(model1_path)
    with open(model1_history_path, "wb") as history_file:
        pickle.dump(history1, history_file)
print(f"Training Accuracy: {history1['accuracy'][-1]:.4}")
print(f"Validation Accuracy: {history1['val_accuracy'][-1]:.4}")
Training Accuracy: 0.887
Validation Accuracy: 0.8651
To ensure that the model has not been overfit, the accuracies on the training and validation sets are plotted below. The training and validation accuracies are both generally increasing, indicating that the model has not been overfit.
plt.figure()
plt.title("Model 1 Accuracies")
plt.plot(history1["accuracy"], marker = "o", label = "Training Accuracy")
plt.plot(history1["val_accuracy"], marker = "o", label = "Validation Accuracy")
plt.legend()
plt.grid()
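Besides reading the plot, the same check can be made numerically. The sketch below computes the per-epoch gap between training and validation accuracy and the epoch at which validation accuracy peaks. The helper names are hypothetical, and the history values are made up for illustration; they are not the results from this post.

```python
def accuracy_gaps(history):
    """Per-epoch difference between training and validation accuracy."""
    return [t - v for t, v in zip(history["accuracy"], history["val_accuracy"])]

def validation_peak_epoch(history):
    """Zero-based epoch with the highest validation accuracy."""
    values = history["val_accuracy"]
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical values for illustration only; not the results from this post.
example_history = {
    "accuracy": [0.80, 0.85, 0.88, 0.90, 0.92],
    "val_accuracy": [0.82, 0.85, 0.86, 0.855, 0.85],
}

gaps = accuracy_gaps(example_history)
peak = validation_peak_epoch(example_history)
print(gaps)
print(peak)  # 2
```

A gap that keeps widening after the validation peak, as in this made-up example, is the pattern usually associated with overfitting.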
Model 2: Convolutional Network with ReLU Activation
The second model will consist of two convolutional layers and a max pooling layer, followed by a ReLU-activated dense layer. Dropout layers have also been included to help prevent overfitting.
model2 = Sequential()
model2.add(Conv2D(32, (3, 3), input_shape = (28, 28, 1)))
model2.add(Conv2D(32, (3, 3), activation = "relu"))
model2.add(MaxPooling2D(pool_size = (2, 2)))
model2.add(Dropout(0.2))
model2.add(Flatten())
model2.add(Dense(128, activation = "relu"))
model2.add(Dropout(0.2))
model2.add(Dense(10, activation = "softmax"))
model2.compile(loss = "categorical_crossentropy", optimizer = "adam", metrics = ["accuracy"])
model2.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 26, 26, 32) 320
_________________________________________________________________
conv2d_1 (Conv2D) (None, 24, 24, 32) 9248
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 32) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 12, 12, 32) 0
_________________________________________________________________
flatten (Flatten) (None, 4608) 0
_________________________________________________________________
dense_3 (Dense) (None, 128) 589952
_________________________________________________________________
dropout_3 (Dropout) (None, 128) 0
_________________________________________________________________
dense_4 (Dense) (None, 10) 1290
=================================================================
Total params: 600,810
Trainable params: 600,810
Non-trainable params: 0
_________________________________________________________________
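The shapes and parameter counts in this summary can also be derived by hand. The sketch below assumes the Keras defaults used by the code above (unpadded "valid" convolutions with stride 1); the helper names are hypothetical.

```python
def conv2d_params(kernel, in_channels, filters):
    """One kernel x kernel x in_channels weight block plus one bias, per filter."""
    return (kernel * kernel * in_channels + 1) * filters

def valid_conv_size(size, kernel):
    """Output width/height of an unpadded, stride-1 convolution."""
    return size - kernel + 1

size = valid_conv_size(28, 3)    # 26 after the first 3x3 convolution
size = valid_conv_size(size, 3)  # 24 after the second 3x3 convolution
size = size // 2                 # 12 after 2x2 max pooling
flat = size * size * 32          # 4608 values after flattening

counts = [
    conv2d_params(3, 1, 32),   # conv2d
    conv2d_params(3, 32, 32),  # conv2d_1
    flat * 128 + 128,          # dense_3
    128 * 10 + 10,             # dense_4
]
print(flat)         # 4608
print(counts)       # [320, 9248, 589952, 1290]
print(sum(counts))  # 600810
```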
The training of the model is performed below, over the course of 5 epochs.
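# Reshape data for model
# Note: this reshapes the raw images (values 0-255), not the normalized
# processed_* arrays that were used for Model 1.
training_vectors2 = training_images.reshape(len(training_images), 28, 28, 1)
test_vectors2 = test_images.reshape(len(test_images), 28, 28, 1)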
model2_path = "mnist_model2.saved_model"
model2_history_path = "mnist_model2.saved_model_history"
if os.path.exists(model2_path) and os.path.exists(model2_history_path):
    # Load trained model
    model2 = keras.models.load_model(model2_path)
    with open(model2_history_path, "rb") as history_file:
        history2 = pickle.load(history_file)
else:
    # Train new model
    tensorflow.random.set_seed(12345)
    model2.fit(training_vectors2, training_one_hots,
               batch_size = 64,
               epochs = 5,
               verbose = 1,
               validation_data = (test_vectors2, test_one_hots))
    history2 = model2.history.history
    model2.save(model2_path)
    with open(model2_history_path, "wb") as history_file:
        pickle.dump(history2, history_file)
print(f"Training Accuracy: {history2['accuracy'][-1]:.4}")
print(f"Validation Accuracy: {history2['val_accuracy'][-1]:.4}")
Training Accuracy: 0.8839
Validation Accuracy: 0.8705
The accuracies on the training and validation sets are plotted below. A certain degree of overfitting may be present based on the dip in validation accuracy with the increase in training accuracy over epochs.
plt.figure()
plt.title("Model 2 Accuracies")
plt.plot(history2["accuracy"], marker = "o", label = "Training Accuracy")
plt.plot(history2["val_accuracy"], marker = "o", label = "Validation Accuracy")
plt.legend()
plt.grid()
Best Model Selection
The best model will be selected based on which one provides the greatest combined training and validation accuracy in the final epoch. Based on this criterion, the best model is Model 2. While Model 1 had a slightly higher training accuracy (0.887 versus 0.8839), Model 2's higher validation accuracy (0.8705 versus 0.8651) gives it the larger sum.
best_model_index = np.argmax([x["val_accuracy"][-1] + x["accuracy"][-1] for x in (history1, history2)])
best_model = (model1, model2)[best_model_index]
history = (history1, history2)[best_model_index]
test_vectors = (test_vectors1, test_vectors2)[best_model_index]
print(f"Best Model: {best_model_index + 1}")
print(f"Training Accuracy: {history['accuracy'][-1]:.4}")
print(f"Validation Accuracy: {history['val_accuracy'][-1]:.4}")
Best Model: 2
Training Accuracy: 0.8839
Validation Accuracy: 0.8705
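The selection can be double-checked using only the final-epoch accuracies printed earlier in this post. This is a standalone sketch of the same sum-and-argmax criterion; the dictionary layout is an assumption made for illustration.

```python
import numpy as np

# Final-epoch accuracies as printed earlier in this post.
final_accuracies = {
    "Model 1": {"accuracy": 0.887, "val_accuracy": 0.8651},
    "Model 2": {"accuracy": 0.8839, "val_accuracy": 0.8705},
}

names = list(final_accuracies)
# Score each model by training accuracy plus validation accuracy.
scores = [m["accuracy"] + m["val_accuracy"] for m in final_accuracies.values()]
best = names[int(np.argmax(scores))]

print([round(s, 4) for s in scores])  # [1.7521, 1.7544]
print(best)                           # Model 2
```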
Best Model Predictions
A sample of the best model's predictions is plotted below. The incorrect predictions are fairly reasonable. For instance, shoes are incorrectly identified as other types of shoes, and it is understandable how many of the other garments could be misclassified.
# Predict the class labels
predictions = best_model.predict(test_vectors)
predicted_labels = predictions.argmax(axis = -1)
correct_filter = predicted_labels == test_labels
correct_predictions = np.flatnonzero(correct_filter)
incorrect_predictions = np.flatnonzero(~correct_filter)
# Plot sample of correct predictions
plt.figure()
plt.suptitle("Correct Predictions", fontsize = 'x-large')
for i in range(9):
    index = correct_predictions[i]
    plt.subplot(3, 3, i + 1)
    plt.title(f"Class: {label_names[test_labels[index]]}\nPredicted: {label_names[predicted_labels[index]]}")
    plt.imshow(test_images[index], cmap = 'Greys')
plt.tight_layout()
# Plot sample of incorrect predictions
plt.figure()
plt.suptitle("Incorrect Predictions", fontsize = 'x-large')
for i in range(9):
    index = incorrect_predictions[i]
    plt.subplot(3, 3, i + 1)
    plt.title(f"Class: {label_names[test_labels[index]]}\nPredicted: {label_names[predicted_labels[index]]}")
    plt.imshow(test_images[index], cmap = 'Greys')
plt.tight_layout()
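The step from softmax outputs to correct and incorrect indexes may be easier to follow on a tiny example. The probabilities and labels below are hypothetical, chosen only to show how `argmax` and `flatnonzero` combine.

```python
import numpy as np

# Hypothetical softmax outputs for three images over three classes.
probabilities = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
])
true_labels = np.array([0, 2, 0])

predicted = probabilities.argmax(axis=-1)    # most probable class per row
is_correct = predicted == true_labels        # boolean mask, one entry per image
correct_idx = np.flatnonzero(is_correct)     # positions of the hits
incorrect_idx = np.flatnonzero(~is_correct)  # positions of the misses

print(predicted)      # [0 2 1]
print(correct_idx)    # [0 1]
print(incorrect_idx)  # [2]
```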
Label Percent Correct
While the best model has an 87% validation accuracy, that accuracy is unfortunately inconsistent across labels. As seen below, T-shirt/tops (0), Pullovers (2), and Shirts (6) in particular have relatively poor accuracies. Finding a better model or applying normalization algorithms may be desirable to improve accuracy for these cases.
print("Label Percent Correct:")
labels_correct = []
for i in label_set:
    label_filter = i == test_labels
    count = np.sum(label_filter)
    correct = np.sum(correct_filter & label_filter)
    ratio = correct / float(count)
    labels_correct.append(ratio)
    print(f"[{i}] {label_names[i]}: {ratio:.2%}")
plt.figure()
plt.title("Label Percent Correct")
plt.xticks(label_set, rotation = "vertical")
plt.ylim(0.5, 1)
plt.bar([f"{label_names[i]} [{i}]" for i in label_set], labels_correct)
plt.tight_layout()
Label Percent Correct:
[0] T-shirt/top: 78.40%
[1] Trouser: 96.40%
[2] Pullover: 68.10%
[3] Dress: 89.90%
[4] Coat: 84.80%
[5] Sandal: 96.00%
[6] Shirt: 65.10%
[7] Sneaker: 97.40%
[8] Bag: 97.90%
[9] Ankle boot: 96.50%
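The same per-label ratios can be computed without an explicit loop. The following is one possible vectorized sketch using `np.bincount`; the function name `per_class_accuracy` and the small label arrays are hypothetical and only illustrate the idea.

```python
import numpy as np

def per_class_accuracy(true_labels, predicted_labels, num_classes):
    """Fraction of samples of each true class that were predicted correctly."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    # Number of samples per true class.
    totals = np.bincount(true_labels, minlength=num_classes)
    # Number of correctly predicted samples per true class.
    hits = np.bincount(true_labels[true_labels == predicted_labels],
                       minlength=num_classes)
    # Classes that never occur get NaN instead of a division by zero.
    return np.where(totals > 0, hits / np.maximum(totals, 1), np.nan)

# Hypothetical labels for illustration only.
true = [0, 0, 1, 1, 1, 2]
pred = [0, 1, 1, 1, 0, 2]
ratios = per_class_accuracy(true, pred, 3)
print(ratios)
```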
Confusion Matrix
From the below confusion matrix, it can be observed that:
- Pullovers (2) are frequently misidentified as Coats (4). There are 211 such instances.
- Shirts (6) are frequently misidentified as T-shirt/tops (0), Pullovers (2), Dresses (3), and Coats (4), and vice versa. This can be seen in the high numbers along the Shirts column and row.
confusion_matrix = tensorflow.math.confusion_matrix(test_labels, predicted_labels)
print("Confusion Matrix:")
print(confusion_matrix)
Confusion Matrix:
tf.Tensor(
[[784 5 13 23 8 4 157 0 6 0]
[ 5 964 0 17 5 0 5 0 4 0]
[ 13 0 681 9 211 0 84 0 2 0]
[ 16 3 3 899 47 0 28 0 4 0]
[ 2 1 43 21 848 1 82 0 2 0]
[ 0 0 0 0 0 960 0 30 0 10]
[123 3 78 27 100 0 651 0 18 0]
[ 0 0 0 0 0 7 0 974 0 19]
[ 1 1 2 2 5 2 4 4 979 0]
[ 0 0 0 0 0 5 0 30 0 965]], shape=(10, 10), dtype=int32)
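The observations above can be read off the matrix programmatically. The sketch below copies the printed matrix into NumPy (rows are true labels and columns are predicted labels, as in `tensorflow.math.confusion_matrix`) and derives per-class recall, per-class precision, and the largest single confusion. The variable names are chosen for illustration.

```python
import numpy as np

# Confusion matrix as printed above: rows = true label, columns = predicted label.
confusion = np.array([
    [784,   5,  13,  23,   8,   4, 157,   0,   6,   0],
    [  5, 964,   0,  17,   5,   0,   5,   0,   4,   0],
    [ 13,   0, 681,   9, 211,   0,  84,   0,   2,   0],
    [ 16,   3,   3, 899,  47,   0,  28,   0,   4,   0],
    [  2,   1,  43,  21, 848,   1,  82,   0,   2,   0],
    [  0,   0,   0,   0,   0, 960,   0,  30,   0,  10],
    [123,   3,  78,  27, 100,   0, 651,   0,  18,   0],
    [  0,   0,   0,   0,   0,   7,   0, 974,   0,  19],
    [  1,   1,   2,   2,   5,   2,   4,   4, 979,   0],
    [  0,   0,   0,   0,   0,   5,   0,  30,   0, 965],
])

# Recall: correct predictions divided by the number of samples of each true class.
recall = confusion.diagonal() / confusion.sum(axis=1)
# Precision: correct predictions divided by the number of times each class was predicted.
precision = confusion.diagonal() / confusion.sum(axis=0)

# Largest off-diagonal entry: the most common single mistake.
mistakes = confusion.copy()
np.fill_diagonal(mistakes, 0)
true_label, predicted_label = np.unravel_index(mistakes.argmax(), mistakes.shape)

print(np.round(recall, 3))
print(int(true_label), int(predicted_label), int(mistakes.max()))  # 2 4 211
```

The recall values match the per-label percentages printed in the previous section, since each class has exactly 1,000 test images.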
Spectral Embedding
The spectral embedding plot for the data seems to mostly coincide with the results of our model.
spectral_model = SpectralEmbedding(n_neighbors = 5)
projections = spectral_model.fit_transform(test_vectors1)
fig = plt.figure()
fig.suptitle("Spectral Embedding", fontsize="x-large")
ax = fig.add_subplot(111)
scatter = ax.scatter(projections[:,0], projections[:,1],
                     s = 3,
                     c = test_labels,
                     cmap = plt.get_cmap("jet", 10))
color_bar = fig.colorbar(scatter)
fig.tight_layout()