How to plot confusion matrix for prefetched dataset in Tensorflow
Solution 1:
Disclaimer: this won't work for shuffled datasets, because a dataset that reshuffles on every iteration returns the labels in a different order than the one model.predict saw (see Solution 2 for that case).
You can use tf.concat to concatenate all the label batches of the dataset, like so:
true_categories = tf.concat([y for x, y in test_dataset], axis=0)
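To see what that line does, here is a minimal, self-contained sketch on a toy in-memory dataset (the names features, labels and toy_dataset are illustrative only):
import tensorflow as tf

# toy example: six labelled samples, batched into batches of four
features = tf.random.normal((6, 4))
labels = tf.constant([0, 1, 2, 0, 1, 2])
toy_dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(4)

# iterating the dataset yields (features_batch, labels_batch) pairs, so
# concatenating the label batches along axis 0 gives one flat label tensor
flat_labels = tf.concat([y for x, y in toy_dataset], axis=0)
print(flat_labels)  # tf.Tensor([0 1 2 0 1 2], shape=(6,), dtype=int32)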
For reproducibility, let's say you have a dataset, a neural network, and a training loop:
import tensorflow_datasets as tfds
import tensorflow as tf
from sklearn.metrics import confusion_matrix
data, info = tfds.load('iris', split='train',
                       as_supervised=True,
                       shuffle_files=True,
                       with_info=True)
AUTOTUNE = tf.data.experimental.AUTOTUNE
train_dataset = data.take(120).batch(4).prefetch(buffer_size=AUTOTUNE)
test_dataset = data.skip(120).take(30).batch(4).prefetch(buffer_size=AUTOTUNE)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(info.features['label'].num_classes, activation='softmax')
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
              metrics='accuracy')
history = model.fit(train_dataset, validation_data=test_dataset, epochs=50, verbose=0)
Now that your model has been fitted, you can predict the test set:
y_pred = model.predict(test_dataset)
array([[2.2177568e-05, 3.0841196e-01, 6.9156587e-01],
[4.3539176e-06, 1.2779665e-01, 8.7219906e-01],
[1.0816366e-03, 9.2667454e-01, 7.2243840e-02],
[9.9921310e-01, 7.8686583e-04, 9.8775059e-09]], dtype=float32)
This is going to be an (n_samples, 3) array because we're working with three categories. We want a 1-D (n_samples,) array of class indices for sklearn.metrics.confusion_matrix, so take the argmax along the class axis:
predicted_categories = tf.argmax(y_pred, axis=1)
<tf.Tensor: shape=(30,), dtype=int64, numpy=
array([2, 2, 2, 0, 2, 2, 2, 2, 1, 1, 2, 0, 0, 2, 1, 1, 1, 2, 0, 2, 1, 2,
1, 0, 2, 0, 1, 2, 1, 0], dtype=int64)>
Then, we can collect all the y values from the prefetched dataset:
true_categories = tf.concat([y for x, y in test_dataset], axis=0)
The per-batch label tensors gathered by the list comprehension (before tf.concat flattens them into a single tensor) look like this:
[<tf.Tensor: shape=(4,), dtype=int64, numpy=array([1, 1, 1, 0], dtype=int64)>,
<tf.Tensor: shape=(4,), dtype=int64, numpy=array([2, 2, 2, 2], dtype=int64)>,
<tf.Tensor: shape=(4,), dtype=int64, numpy=array([1, 1, 1, 0], dtype=int64)>,
<tf.Tensor: shape=(4,), dtype=int64, numpy=array([0, 2, 1, 1], dtype=int64)>,
<tf.Tensor: shape=(4,), dtype=int64, numpy=array([1, 2, 0, 2], dtype=int64)>,
<tf.Tensor: shape=(4,), dtype=int64, numpy=array([1, 2, 1, 0], dtype=int64)>,
<tf.Tensor: shape=(4,), dtype=int64, numpy=array([2, 0, 1, 2], dtype=int64)>,
<tf.Tensor: shape=(2,), dtype=int64, numpy=array([1, 0], dtype=int64)>]
Then, you are ready to get the confusion matrix. Note that sklearn's confusion_matrix expects the true labels first and the predictions second, so that rows correspond to true classes:
confusion_matrix(true_categories, predicted_categories)
array([[ 9, 0, 0],
[ 0, 9, 0],
[ 0, 2, 10]], dtype=int64)
(9 + 9 + 10) / 30 = 0.933 is the accuracy score (the sum of the diagonal divided by the number of test samples). It corresponds to model.evaluate(test_dataset):
8/8 [==============================] - 0s 785us/step - loss: 0.1907 - accuracy: 0.9333
The results are also consistent with sklearn.metrics.classification_report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         8
           1       0.82      1.00      0.90         9
           2       1.00      0.85      0.92        13

    accuracy                           0.93        30
   macro avg       0.94      0.95      0.94        30
weighted avg       0.95      0.93      0.93        30
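For reference, a report like the one above can be produced from the same tensors (a sketch, using sklearn's documented (y_true, y_pred) argument order):
from sklearn.metrics import classification_report

# true_categories and predicted_categories are the tensors computed above;
# .numpy() turns them into arrays that scikit-learn accepts directly
print(classification_report(true_categories.numpy(),
                            predicted_categories.numpy()))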
Here's the entire code:
import tensorflow_datasets as tfds
import tensorflow as tf
from sklearn.metrics import confusion_matrix
data, info = tfds.load('iris', split='train',
                       as_supervised=True,
                       shuffle_files=True,
                       with_info=True)
AUTOTUNE = tf.data.experimental.AUTOTUNE
train_dataset = data.take(120).batch(4).prefetch(buffer_size=AUTOTUNE)
test_dataset = data.skip(120).take(30).batch(4).prefetch(buffer_size=AUTOTUNE)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(info.features['label'].num_classes, activation='softmax')
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
              metrics='accuracy')
history = model.fit(train_dataset, validation_data=test_dataset, epochs=50, verbose=0)
y_pred = model.predict(test_dataset)
predicted_categories = tf.argmax(y_pred, axis=1)
true_categories = tf.concat([y for x, y in test_dataset], axis=0)
confusion_matrix(true_categories, predicted_categories)
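Since the goal is to plot the confusion matrix rather than just print the array, here is one way to render it as an image (a sketch assuming scikit-learn >= 0.22 for ConfusionMatrixDisplay; the class names come from the tfds info object):
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# build the matrix with the conventional (y_true, y_pred) order and plot it;
# info.features['label'].names holds the iris class names from tfds
cm = confusion_matrix(true_categories, predicted_categories)
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=info.features['label'].names)
disp.plot(cmap='Blues')
plt.show()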
Solution 2:
This code will also work with a shuffled tf.data.Dataset, because the true labels are collected in the same pass that produces the predictions:
import numpy as np
import tensorflow as tf

y_pred = []  # store predicted labels
y_true = []  # store true labels

# iterate over the dataset (use dataset.unbatch() with repeat)
for image_batch, label_batch in dataset:
    # append the true labels of this batch
    y_true.append(label_batch)
    # compute predictions for this batch
    preds = model.predict(image_batch)
    # append the predicted class indices
    y_pred.append(np.argmax(preds, axis=-1))

# concatenate the per-batch results into flat label tensors
correct_labels = tf.concat([item for item in y_true], axis=0)
predicted_labels = tf.concat([item for item in y_pred], axis=0)
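With the flat tensors from the loop, the confusion matrix is obtained the same way as in Solution 1 (a sketch; model and dataset are assumed to be your trained model and your batched tf.data.Dataset):
from sklearn.metrics import confusion_matrix

# true labels first, predictions second, following sklearn's convention
print(confusion_matrix(correct_labels, predicted_labels))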
Thank you for reading the article.