Accuracy of ResNet50 is over 100% on PYNQ-DPU

Hello,

I am working with the PYNQ DPU on a ZCU104. I retrained ResNet50 on the MNIST dataset and got an accuracy of 97.29%. However, after running the resulting xmodel on the ZCU104, I got an accuracy of 101.83%.

I don’t know what the problem is.

Hi there,

Can you share some more info/code/model input/output shapes? I assume you just swapped the output layer of resnet for the 10 classes of mnist? Did you change anything in the inference notebook?

Thanks
Shawn

For training ResNet50, I used the following code:

import cv2
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

# Load MNIST and replicate the grayscale channel 3 times for ResNet50
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
train_images_gr = x_train.reshape(x_train.shape[0], 28, 28, 1)
train_images_3ch = np.stack([x_train]*3, axis=-1)

def resize_image_array(img, img_size_dims):
    img = cv2.resize(img, dsize=img_size_dims,
                     interpolation=cv2.INTER_CUBIC)
    img = np.array(img, dtype=np.float32)
    return img

IMG_DIMS = (32, 32)
X_train = np.array([resize_image_array(img, img_size_dims=IMG_DIMS) for img in train_images_3ch])
Y_train = np.eye(10)[y_train]  # one-hot labels for categorical_crossentropy

inputs = tf.keras.Input(shape=(32, 32, 3))
efnet = tf.keras.applications.ResNet50(weights='imagenet',
                                       include_top=False,
                                       input_tensor=inputs)
gap = tf.keras.layers.GlobalMaxPooling2D()(efnet.output)

output = tf.keras.layers.Dense(10, activation='softmax', use_bias=True)(gap)
func_model = tf.keras.Model(efnet.input, output)

func_model.compile(optimizer='adam',
                   loss="categorical_crossentropy",
                   metrics=['accuracy'])

func_model.fit(X_train, Y_train, epochs=5, steps_per_epoch=60000//32)

And for the deployment of the final xmodel using the PYNQ DPU on the ZCU104, I used the following:

from pynq_dpu import DpuOverlay
overlay = DpuOverlay("dpu.bit")
overlay.load_model("Resnet50_test.xmodel")
from time import time
import numpy as np
import mnist
import matplotlib.pyplot as plt
%matplotlib inline
from six.moves import urllib

opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)
with np.load("mnist.npz", allow_pickle=True) as f:
    x_test, y_test = f['x_test'], f['y_test']

test_images_gr = x_test.reshape(x_test.shape[0], 28, 28, 1)
test_images_3ch = np.stack([x_test]*3, axis=-1)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
import cv2

def resize_image_array(img, img_size_dims):
    img = cv2.resize(img, dsize=img_size_dims,
                     interpolation=cv2.INTER_CUBIC)
    img = np.array(img, dtype=np.float32)
    return img

IMG_DIMS = (32, 32)
test_data = np.array([resize_image_array(img, img_size_dims=IMG_DIMS) for img in test_images_3ch])
test_label = np.eye(10)[y_test]
print('Test_images.shape: {}, of {}'.format(test_data.shape, test_data.dtype))
print('Test_label.shape: {}, of {}'.format(test_label.shape, test_data.dtype))
dpu = overlay.runner

inputTensors = dpu.get_input_tensors()
outputTensors = dpu.get_output_tensors()

shapeIn = tuple(inputTensors[0].dims)
shapeOut = tuple(outputTensors[0].dims)
outputSize = int(outputTensors[0].get_data_size() / shapeIn[0])

softmax = np.empty(outputSize)
output_data = [np.empty(shapeOut, dtype=np.float32, order="C")]
input_data = [np.empty(shapeIn, dtype=np.float32, order="C")]
image = input_data[0]

def calculate_softmax(data):
    result = np.exp(data)
    return result

total = test_data.shape[0]
predictions = np.empty_like(test_label)
print("Classifying {} digit pictures ...".format(total))

start = time()
for i in range(total):
    image[0, ...] = test_data[i]
    job_id = dpu.execute_async(input_data, output_data)
    dpu.wait(job_id)
    temp = [j.reshape(1, outputSize) for j in output_data]
    softmax = calculate_softmax(temp[0][0])
    predictions[i] = softmax.argmax()

stop = time()
correct = np.sum(predictions==test_label)
execution_time = stop-start
print("Overall accuracy: {}".format(correct/total))
print(" Execution time: {:.4f}s".format(execution_time))
print(" Throughput: {:.4f}FPS".format(total/execution_time))

Train_images.shape: (60000, 32, 32, 3), of float32
Train_label.shape: (60000, 10), of float32
Test_images.shape: (10000, 32, 32, 3), of float32
Test_label.shape: (10000, 10), of float32

In the future, please use code blocks for code instead of blockquotes; it makes it much easier for a reader to parse. I’m a bit confused about why there are 3 channels on the last dimension of your MNIST data.

When you run this code, does it give you any warnings? I would double-check the shapes of all your intermediate arrays to make sure there’s no unexpected behavior when doing matrix operations. For example, you shouldn’t need to one-hot encode your test labels with np.eye(10)[y_test] when the output from the DPU is already softmax’d and the maximum argument taken.

Thanks
Shawn
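
As a rough illustration of the shape issue raised above, here is a NumPy-only sketch that mimics what the posted loop does. No DPU is involved: the labels are synthetic, and `fake_pred`, `n`, and the seed are made up for the example. `predictions` is allocated with the one-hot shape `(n, 10)`, each row is filled with a single class index by broadcasting, and the elementwise comparison is then summed over all `n * 10` entries rather than over `n` samples.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y_true = rng.integers(0, 10, size=n)      # synthetic integer labels
fake_pred = y_true.copy()                 # pretend every prediction is right

one_hot = np.eye(10)[y_true]              # shape (n, 10), as in the post
predictions = np.empty_like(one_hot)      # also (n, 10)
for i in range(n):
    predictions[i] = fake_pred[i]         # scalar broadcast across 10 columns

# Elementwise comparison over n * 10 entries, not over n samples
matches = np.sum(predictions == one_hot)
broken_accuracy = matches / n

# Per-sample comparison on integer class indices
correct_accuracy = np.mean(fake_pred == y_true)
```

In this setup, a row filled with 0 matches the nine zeros of any one-hot row, a row filled with 1 matches only the single one, and rows filled with 2 to 9 match nothing. The "accuracy" is therefore `(9 * count(pred == 0) + count(pred == 1)) / n`, which depends on how often classes 0 and 1 are predicted rather than on correctness, and is not bounded by 1.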

Dear Shawn,

Thank you for your prompt reply!
OK, I will, and sorry for that.
There are 3 channels because I am using ResNet50, and I have read here that for ResNet50 MNIST should be resized to (32, 32, 3).
Actually, when running the prediction part, I got the following warning: /usr/lib/python3/dist-packages/ipykernel_launcher.py:2: RuntimeWarning: overflow encountered in exp
Regarding the one-hot encoding of the test labels with np.eye(10)[y_test], I did it to keep the same preprocessing as in training, where it is needed, as I mentioned, when retraining ResNet50 on the MNIST dataset.
I checked the shapes of all the intermediate arrays as you suggested and found that the output_data shape is (1, 1, 1, 1, 10), which I think should be (1, 1, 10). But I don’t know what is causing this or how to solve it.

Thanks
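
For the overflow warning and the extra singleton dimensions mentioned above, a minimal sketch might look like the following. It assumes the output buffer holds 10 class scores regardless of how many size-1 axes surround them. The array `raw` and its values are hypothetical stand-ins for the DPU output, and `stable_softmax` is an illustrative helper, not part of pynq_dpu.

```python
import numpy as np

def stable_softmax(scores):
    """Softmax over a flattened array, shifted by the max to avoid overflow."""
    scores = np.asarray(scores, dtype=np.float64).reshape(-1)
    shifted = scores - scores.max()       # largest exponent becomes 0
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical raw output with the shape reported in the post
raw = np.zeros((1, 1, 1, 1, 10), dtype=np.float32)
raw[..., 3] = 1000.0                      # large enough to overflow a plain np.exp

probs = stable_softmax(raw)               # shape (10,), sums to 1
predicted_class = int(np.argmax(probs))
```

Because `exp` is monotonic, the argmax of the raw scores is the same as the argmax of the softmax, so for classification alone the flattened output can be passed to `np.argmax` directly. When a plain `np.exp` overflows, several entries can become `inf` and `argmax` then returns the first of them, which may not be the true maximum.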

I changed test_label to y_test instead of np.eye(10)[y_test] and got an accuracy of 98.52%. I don’t know if what I did is correct.

That seems like a fairly common result for MNIST. I suppose the extra dimension introduced by the eye operation is what mucked up the original calculation.

Thanks
Shawn

I was wondering if it is correct because I used an eye operation during training.
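
On the question of whether dropping the one-hot encoding at test time is consistent with training: the one-hot form is what the `categorical_crossentropy` loss expects during training, while the accuracy check only compares class indices. A small sketch with made-up labels and predictions (`pred_int` is hypothetical) shows that taking the argmax of one-hot labels recovers the original integers, so both routes give the same accuracy.

```python
import numpy as np

y_int = np.array([3, 0, 7, 1])             # integer labels
y_one_hot = np.eye(10)[y_int]              # encoding used for training
recovered = np.argmax(y_one_hot, axis=1)   # back to integer labels

pred_int = np.array([3, 0, 2, 1])          # hypothetical argmax outputs

acc_int = np.mean(pred_int == y_int)
acc_from_one_hot = np.mean(pred_int == recovered)
```

Under these assumptions, comparing a 1-D array of predicted indices against `y_test` directly is equivalent to comparing it against `np.argmax(np.eye(10)[y_test], axis=1)`, provided `predictions` is also allocated as a 1-D array of length `total`.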
