The MNIST fashion dataset is a popular dataset of grayscale 28x28 pixel images of fashion items, such as shirts, shoes, and trousers. This post uses the dataset to train two neural network models to identify these garments.

Import Statements

The following libraries will be used for this post:

  • pickle - for saving the model training histories.
  • numpy - for numerical array operations.
  • matplotlib - for data plots and visualizations.
  • tensorflow and keras - for training the neural network.
  • sklearn - for spectral embedding visualization of the results.
%matplotlib notebook
import os
import pickle
import numpy as np
import matplotlib.pyplot as plt
import tensorflow
from tensorflow import keras
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Conv2D, Flatten, MaxPooling2D
from sklearn.manifold import SpectralEmbedding

Load and Plot Sample Data

The following code loads the MNIST fashion data set and plots a sample of 9 garments.

The MNIST dataset is already partitioned into separate training and validation images and labels. The training images and labels will be used to train the models, while the validation images will be used to measure each model's accuracy and to check that the models have not been overfit.

# Load the MNIST data set
(training_images, training_labels), (test_images, test_labels) = fashion_mnist.load_data()
label_names = np.array(["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"])

# Plot some sample images from data set
plt.figure()
plt.suptitle("MNIST Fashion Dataset Samples", fontsize = 'x-large')
# Map each label value to the index of one sample image with that label
label_indexes = { training_labels[i]: i for i in range(len(training_labels)) }

for i in range(9):
    index = label_indexes[i]
    plt.subplot(3, 3, i + 1)
    plt.title(label_names[training_labels[index]])
    plt.imshow(training_images[index], cmap = 'Greys')
    
plt.tight_layout()
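
As a quick check on the partition described above, the shapes of the training and validation arrays can be printed. The MNIST fashion dataset ships with 60,000 training images and 10,000 test images, each 28x28 pixels.

# Confirm the sizes of the training and validation splits
print(f"Training images: {training_images.shape}, labels: {training_labels.shape}")
print(f"Test images: {test_images.shape}, labels: {test_labels.shape}")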

Preprocessing Data

Prior to training the models, the image data needs to be normalized to the interval [0, 1]. This is done by dividing the pixel values by 255, as shown below.

processed_training_images = training_images / 255.0
processed_test_images = test_images / 255.0
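
A quick sanity check confirms that the raw pixel values span [0, 255] while the processed values fall in [0, 1]:

# Verify the normalization range
print(f"Raw pixel range: [{training_images.min()}, {training_images.max()}]")
print(f"Processed pixel range: [{processed_training_images.min()}, {processed_training_images.max()}]")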

One-hot vectors will also need to be created for each label in the data set:

label_set = np.sort(np.unique(training_labels))
training_one_hots = keras.utils.to_categorical(training_labels, len(label_set))
test_one_hots = keras.utils.to_categorical(test_labels, len(label_set))
print("Sample One-Hots:")

for i in range(9):
    print(f"{training_labels[i]}: {training_one_hots[i]}")
Sample One-Hots:
9: [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
0: [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
0: [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
3: [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
0: [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
2: [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
7: [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
2: [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
5: [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
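
For reference, the same one-hot encoding can be produced by indexing an identity matrix with the integer labels; the sketch below simply verifies that this matches the keras.utils.to_categorical output.

# Equivalent one-hot construction: row i of the identity matrix is the one-hot vector for label i
manual_one_hots = np.eye(len(label_set))[training_labels]
print(np.array_equal(manual_one_hots, training_one_hots))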

Model 1: ReLU Activations

The first model is a deep neural network of dense layers with ReLU activations, with dropout layers included to prevent overfitting.

model1 = Sequential()
model1.add(Dense(512, input_shape = (28 * 28,), activation = "relu"))           
model1.add(Dropout(0.15))
model1.add(Dense(512, activation = "relu"))
model1.add(Dropout(0.15))
model1.add(Dense(10, activation = "softmax"))
model1.compile(loss = "categorical_crossentropy", optimizer = "adam", metrics = ["accuracy"])
model1.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 512)               401920    
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               262656    
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                5130      
=================================================================
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
_________________________________________________________________
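
The parameter counts in the summary can be verified by hand: a dense layer with n inputs and m units has (n + 1) * m parameters, where the extra input accounts for the bias term.

# Dense layer parameters: (inputs + 1 bias) * units
print((28 * 28 + 1) * 512)   # first hidden layer: 401,920
print((512 + 1) * 512)       # second hidden layer: 262,656
print((512 + 1) * 10)        # output layer: 5,130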

The training of the model is performed below, over the course of 5 epochs.

# Reshape data for model
training_vectors1 = processed_training_images.reshape(len(processed_training_images), 28 * 28)
test_vectors1 = processed_test_images.reshape(len(processed_test_images), 28 * 28)

model1_path = "mnist_model1.saved_model"
model1_history_path = "mnist_model1.saved_model_history"

if os.path.exists(model1_path) and os.path.exists(model1_history_path):
    # Load trained model
    model1 = keras.models.load_model(model1_path)
    history1 = pickle.load(open(model1_history_path, "rb"))
else:
    # Train new model
    tensorflow.random.set_seed(12345)
    model1.fit(training_vectors1, training_one_hots,
               batch_size = 64,
               epochs = 5,
               verbose = 1,
               validation_data = (test_vectors1, test_one_hots))
    history1 = model1.history.history
    model1.save(model1_path)
    pickle.dump(history1, open(model1_history_path, "wb"))
    
print(f"Training Accuracy: {history1['accuracy'][-1]:.4}")
print(f"Validation Accuracy: {history1['val_accuracy'][-1]:.4}")
Training Accuracy: 0.887
Validation Accuracy: 0.8651

To check for overfitting, the accuracies on the training and validation sets are plotted below. Both accuracies are generally increasing, indicating that the model has not been overfit.

plt.figure()
plt.title("Model 1 Accuracies")
plt.plot(history1["accuracy"], marker = "o", label = "Training Accuracy")
plt.plot(history1["val_accuracy"], marker = "o", label = "Validation Accuracy")
plt.legend()
plt.grid()
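
Since the Keras history also records the loss values, the same check can be made on the loss curves. A minimal sketch:

# Plot training and validation loss for the same overfitting check
plt.figure()
plt.title("Model 1 Losses")
plt.plot(history1["loss"], marker = "o", label = "Training Loss")
plt.plot(history1["val_loss"], marker = "o", label = "Validation Loss")
plt.legend()
plt.grid()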

Model 2: Convolutional Network with ReLU Activation

The second model consists of convolutional layers followed by a dense ReLU layer. Dropout layers have also been included to prevent overfitting.

model2 = Sequential()
model2.add(Conv2D(32, (3, 3), input_shape = (28, 28, 1)))
model2.add(Conv2D(32, (3, 3), activation = "relu"))
model2.add(MaxPooling2D(pool_size = (2, 2)))
model2.add(Dropout(0.2))
model2.add(Flatten())
model2.add(Dense(128, activation = "relu"))
model2.add(Dropout(0.2))
model2.add(Dense(10, activation = "softmax"))
model2.compile(loss = "categorical_crossentropy", optimizer = "adam", metrics = ["accuracy"])
model2.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 32)        9248      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 32)        0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 12, 12, 32)        0         
_________________________________________________________________
flatten (Flatten)            (None, 4608)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 128)               589952    
_________________________________________________________________
dropout_3 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 10)                1290      
=================================================================
Total params: 600,810
Trainable params: 600,810
Non-trainable params: 0
_________________________________________________________________
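
As with Model 1, the parameter counts can be verified by hand. A Conv2D layer with k x k kernels, c input channels, and f filters has (k * k * c + 1) * f parameters, and the flattened 12 x 12 x 32 feature map feeds the dense layers as before.

# Conv2D parameters: (kernel height * kernel width * input channels + 1 bias) * filters
print((3 * 3 * 1 + 1) * 32)       # first convolution: 320
print((3 * 3 * 32 + 1) * 32)      # second convolution: 9,248
print((12 * 12 * 32 + 1) * 128)   # dense layer after flattening: 589,952
print((128 + 1) * 10)             # output layer: 1,290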

The training of the model is performed below, over the course of 5 epochs.

# Reshape data for model
training_vectors2 = processed_training_images.reshape(len(processed_training_images), 28, 28, 1)
test_vectors2 = processed_test_images.reshape(len(processed_test_images), 28, 28, 1)

model2_path = "mnist_model2.saved_model"
model2_history_path = "mnist_model2.saved_model_history"

if os.path.exists(model2_path) and os.path.exists(model2_history_path):
    # Load trained model
    model2 = keras.models.load_model(model2_path)
    history2 = pickle.load(open(model2_history_path, "rb"))
else:
    # Train new model
    tensorflow.random.set_seed(12345)
    model2.fit(training_vectors2, training_one_hots,
               batch_size = 64,
               epochs = 5,
               verbose = 1,
               validation_data = (test_vectors2, test_one_hots))
    history2 = model2.history.history
    model2.save(model2_path)
    pickle.dump(history2, open(model2_history_path, "wb"))

print(f"Training Accuracy: {history2['accuracy'][-1]:.4}")
print(f"Validation Accuracy: {history2['val_accuracy'][-1]:.4}")
Training Accuracy: 0.8839
Validation Accuracy: 0.8705

The accuracies on the training and validation sets are plotted below. Some degree of overfitting may be present, as the validation accuracy dips while the training accuracy continues to increase over the epochs.

plt.figure()
plt.title("Model 2 Accuracies")
plt.plot(history2["accuracy"], marker = "o", label = "Training Accuracy")
plt.plot(history2["val_accuracy"], marker = "o", label = "Validation Accuracy")
plt.legend()
plt.grid()

Best Model Selection

The best model will be selected based on which one provided the greatest combined training and validation accuracy in the final epoch. Based on this criterion, the best model is Model 2. While Model 1 had a slightly higher training accuracy, Model 2's combined training and validation accuracy was greater.

best_model_index = np.argmax([x["val_accuracy"][-1] + x["accuracy"][-1] for x in (history1, history2)])
best_model = (model1, model2)[best_model_index]
history = (history1, history2)[best_model_index]
test_vectors = (test_vectors1, test_vectors2)[best_model_index]

print(f"Best Model: {best_model_index + 1}")
print(f"Training Accuracy: {history['accuracy'][-1]:.4}")
print(f"Validation Accuracy: {history['val_accuracy'][-1]:.4}")
Best Model: 2
Training Accuracy: 0.8839
Validation Accuracy: 0.8705

Best Model Predictions

A sample of the best model's predictions is plotted below. The incorrect predictions are fairly reasonable: for instance, shoes are misidentified as other types of shoes, and for many of the other garments it is easy to see how the mistake could have been made.

# Predict the class labels
predictions = best_model.predict(test_vectors)
predicted_labels = predictions.argmax(axis = -1)
correct_filter = predicted_labels == test_labels
correct_predictions = np.flatnonzero(correct_filter)
incorrect_predictions = np.flatnonzero(~correct_filter)

# Plot sample of correct predictions
plt.figure()
plt.suptitle("Correct Predictions", fontsize = 'x-large')

for i in range(9):
    index = correct_predictions[i]
    plt.subplot(3, 3, i + 1)
    plt.title(f"Class: {label_names[test_labels[index]]}\nPredicted: {label_names[predicted_labels[index]]}")
    plt.imshow(test_images[index], cmap = 'Greys')
    
plt.tight_layout()

# Plot sample of incorrect predictions
plt.figure()
plt.suptitle("Incorrect Predictions", fontsize = 'x-large')

for i in range(9):
    index = incorrect_predictions[i]
    plt.subplot(3, 3, i + 1)
    plt.title(f"Class: {label_names[test_labels[index]]}\nPredicted: {label_names[predicted_labels[index]]}")
    plt.imshow(test_images[index], cmap = 'Greys')

plt.tight_layout()

Label Percent Correct

While the best model has an 87% validation accuracy, that accuracy is unfortunately inconsistent across labels. As seen below, T-shirt/tops (0), Pullovers (2), and Shirts (6) in particular have relatively poor accuracies. Finding a better model, or applying additional preprocessing, may be desirable to improve the accuracy for these cases.

print("Label Percent Correct:")
labels_correct = []

for i in label_set:
    label_filter = i == test_labels
    count = np.sum(label_filter)
    correct = np.sum(correct_filter & label_filter)
    ratio = correct / float(count)
    labels_correct.append(ratio)
    print(f"[{i}] {label_names[i]}: {ratio:.2%}")
    
plt.figure()
plt.title("Label Percent Correct")
plt.xticks(label_set, rotation = "vertical")
plt.ylim(0.5, 1)
plt.bar([f"{label_names[i]} [{i}]" for i in label_set], labels_correct)
plt.tight_layout()
Label Percent Correct:
[0] T-shirt/top: 78.40%
[1] Trouser: 96.40%
[2] Pullover: 68.10%
[3] Dress: 89.90%
[4] Coat: 84.80%
[5] Sandal: 96.00%
[6] Shirt: 65.10%
[7] Sneaker: 97.40%
[8] Bag: 97.90%
[9] Ankle boot: 96.50%
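
The same per-label accuracies (the recall of each class) could also be obtained with scikit-learn, which additionally reports precision and F1 scores. A minimal sketch, assuming one extra import beyond those listed at the top of the post:

from sklearn.metrics import classification_report

# Per-class precision, recall, and F1 for the best model's predictions
print(classification_report(test_labels, predicted_labels, target_names = label_names))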

Confusion Matrix

From the below confusion matrix, it can be observed that:

  • Pullovers (2) are frequently misidentified as Coats (4); there are 211 such instances.
  • Shirts (6) are frequently misidentified as T-shirt/tops (0), Pullovers (2), Dresses (3), and Coats (4), and vice versa. This can be seen in the high numbers along the Shirts column and row.
confusion_matrix = tensorflow.math.confusion_matrix(test_labels, predicted_labels)
print("Confusion Matrix:")
print(confusion_matrix)
Confusion Matrix:
tf.Tensor(
[[784   5  13  23   8   4 157   0   6   0]
 [  5 964   0  17   5   0   5   0   4   0]
 [ 13   0 681   9 211   0  84   0   2   0]
 [ 16   3   3 899  47   0  28   0   4   0]
 [  2   1  43  21 848   1  82   0   2   0]
 [  0   0   0   0   0 960   0  30   0  10]
 [123   3  78  27 100   0 651   0  18   0]
 [  0   0   0   0   0   7   0 974   0  19]
 [  1   1   2   2   5   2   4   4 979   0]
 [  0   0   0   0   0   5   0  30   0 965]], shape=(10, 10), dtype=int32)
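
Dividing each row of the confusion matrix by its total converts the counts to per-class fractions, which can make the misclassification patterns easier to compare. A brief sketch:

# Normalize each row so entries are fractions of that class's test samples
row_totals = tensorflow.reduce_sum(confusion_matrix, axis = 1, keepdims = True)
normalized_confusion = confusion_matrix / row_totals
print(np.round(normalized_confusion.numpy(), 2))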

Spectral Embedding

The spectral embedding plot for the test data seems to mostly coincide with the model's results, in that the classes the model tends to confuse would be expected to overlap in the embedding, while well-separated classes should be classified more reliably.

spectral_model = SpectralEmbedding(n_neighbors = 5)
projections = spectral_model.fit_transform(test_vectors1)

fig = plt.figure()
fig.suptitle("Spectral Embedding", fontsize="x-large")
ax = fig.add_subplot(111)
scatter = ax.scatter(projections[:,0], projections[:,1],
                     s = 3,
                     c = test_labels,
                     cmap = plt.cm.get_cmap("jet", 10))

color_bar = fig.colorbar(scatter)
fig.tight_layout()
