Wrong Input Tensor Shape with Custom DPU design

Hello Guys,

I created a “custom” DPU design, which basically just adds some GPIO. The added GPIO works.

But when I load my .xmodel and get the input tensors, I get the wrong dimensions. I recompiled my custom NN models with both of these arch.json files:
/ZCU104/binary_container_1/link/vivado/vpl/prj/prj.gen/sources_1/bd/dpu/ip/dpu_DPUCZDX8G_1_0/arch.json

/ZCU104/binary_container_1/link/vivado/vpl/prj/prj.gen/sources_1/bd/dpu/ip/dpu_DPUCZDX8G_2_0/arch.json

Both yield the same wrong input tensors. I am using the example MNIST notebook and just swapped in my own .xmodel and .bit.
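For reference, the only part of the notebook I change is roughly this (the file names below are placeholders for my own bitstream and model):

from pynq_dpu import DpuOverlay

# load my custom bitstream (.hwh/.xclbin placed next to it) instead of the stock dpu.bit
overlay = DpuOverlay("dpu_custom.bit")
# load my compiled model instead of dpu_mnist_classifier.xmodel
overlay.load_model("fc_small.xmodel")
dpu = overlay.runner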

This is my Error:

ValueError                                Traceback (most recent call last)
<ipython-input-19-adf53e77cd67> in <module>
      2 
      3 for i in range(num_pics):
----> 4     image[0,...] = test_data[i]
      5 
      6     job_id = dpu.execute_async(input_data, output_data)

ValueError: could not broadcast input array from shape (28,28,1) into shape (4,14,14)

The notebook works with dpu_mnist_classifier.xmodel, but not with my own compiled model. What could cause this issue?

Edit: This is only an issue for this model, so far:

from tensorflow.keras.layers import Input, Flatten, Dense, Activation
from tensorflow.keras.models import Model

def create_fc_small(input_shape, output_shape):
    x = x_in = Input(input_shape, name='input_1_m')
    x = Flatten(name='flatten_1_m')(x)
    x = Dense(16, name='dense_1_m')(x)
    x = Activation("relu", name='act_1_m')(x)
    x = Dense(128, name='dense_2_m')(x)
    x = Activation("relu", name='act_2_m')(x)
    x = Dense(output_shape, name='dense_3_m')(x)
    x = Activation("softmax", name='act_3_m')(x)

    model = Model(inputs=[x_in], outputs=[x])

    return model

A CNN works as intended.

Edit2:

It works if I just reshape my data like this: x_test = x_test.reshape(10000, 4, 14, 14), but that can't be intended, right?
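Instead of hardcoding that reshape, deriving the shape from the runner also seems to work around it (a sketch, reusing the notebook's dpu runner, test_data, num_pics and output_data):

import numpy as np

# take whatever input shape the loaded xmodel reports instead of assuming (28, 28, 1)
shape_in = tuple(dpu.get_input_tensors()[0].dims)   # comes out as (1, 4, 14, 14) for this model
input_data = [np.empty(shape_in, dtype=np.float32, order="C")]
image = input_data[0]

for i in range(num_pics):
    # reshape each 28x28x1 test image into whatever layout the model reports
    image[0, ...] = test_data[i].reshape(shape_in[1:])
    job_id = dpu.execute_async(input_data, output_data)
    dpu.wait(job_id)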

Greetings Henning

Hi Henning,

What does your compilation output look like (from vai_c_tensorflow)? I would also sanity-check model.summary() to make sure the shapes are as intended.
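Something along these lines is usually enough to catch a shape mismatch early:

# quick sanity check of the float model before quantizing/compiling
model = create_fc_small(input_shape=(28, 28, 1), output_shape=10)
model.summary()   # expect (None, 28, 28, 1) at the input and (None, 10) after the last Dense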

Thanks
Shawn

How I create the model is shown above. I just found out about Netron, and my graph looks like this. I think the first Flatten layer is messing with the graph.

I also found this image in the Vitis AI docs:


This does not actually belong here, but is it safe for me to assume that uploads go to the DPU and are processed there, while downloads go to the CPU and are processed there? So in my case, only the softmax is processed on the CPU?
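One way I could check where each part ends up running is to dump the subgraphs from the compiled model (a sketch, assuming the xir Python bindings that ship with the Vitis AI runtime are available on the board):

import xir

# list the compiled model's subgraphs and the device each one is assigned to
graph = xir.Graph.deserialize("fc_small.xmodel")
for sg in graph.get_root_subgraph().toposort_child_subgraph():
    device = sg.get_attr("device") if sg.has_attr("device") else "?"
    print(sg.get_name(), "->", device)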

For my CNN:


That graph looks weird as well, because the Flatten layer also seems to be processed twice?

Also, why are Dense layers represented as Conv2D layers?

Hi there,

Is this still related to the problem you originally posted about, the tensor shapes being wrong when running the MNIST notebook? It would be helpful to see the vai_c_tensorflow output in that case.

I believe that under the hood the Vitis AI compiler treats all layers as convolutional layers (it converts dense to conv, since the DPU is a CNN accelerator). For specific Vitis AI questions like this, I would recommend asking the Vitis AI maintainers or reading the docs on the Vitis AI GitHub.
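As a rough illustration of why Netron shows conv nodes (this is just the mathematical equivalence, not necessarily what the compiler does internally): a Dense layer on a flattened HxWxC input computes the same thing as a Conv2D whose kernel spans the whole HxW map with "valid" padding:

import numpy as np
from tensorflow.keras import layers

x = np.random.rand(1, 28, 28, 1).astype("float32")

dense = layers.Dense(16)
y_dense = dense(layers.Flatten()(x))                 # shape (1, 16)

# a Conv2D with a full-size 28x28 kernel and the same weights gives the same result
conv = layers.Conv2D(16, (28, 28), padding="valid")
conv.build((None, 28, 28, 1))
w, b = dense.get_weights()                           # w: (784, 16)
conv.set_weights([w.reshape(28, 28, 1, 16), b])
y_conv = conv(x)                                     # shape (1, 1, 1, 16)

print(np.allclose(y_dense.numpy(), y_conv.numpy().reshape(1, 16), atol=1e-5))   # True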

Thanks
Shawn

Yes, it's still related to that issue.

This is the output of the compiler:

[INFO] Namespace(batchsize=1, inputs_shape=None, layout='NHWC', model_files=['final_vitis/quant_model/quant_fc_small.h5'], model_type='tensorflow2', named_inputs_shape=None, out_filename='/tmp/fc_small_org.xmodel', proto=None)
[INFO] tensorflow2 model: /workspace/final_vitis/quant_model/quant_fc_small.h5
[INFO] keras version: 2.4.0
[INFO] Tensorflow Keras model type: functional
[INFO] parse raw model     :100%|███████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 4662.64it/s]                  
[INFO] infer shape (NHWC)  :100%|███████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 542.95it/s]                 
[INFO] perform level-0 opt :100%|███████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 605.24it/s]                   
[INFO] perform level-1 opt :100%|███████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2730.67it/s]                  
[INFO] generate xmodel     :100%|███████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 7500.99it/s]                
[INFO] dump xmodel: /tmp/fc_small_org.xmodel
[UNILOG][INFO] Target architecture: DPUCZDX8G_ISA0_B4096_MAX_BG2
[UNILOG][INFO] Compile mode: dpu
[UNILOG][INFO] Debug mode: function
[UNILOG][INFO] Target architecture: DPUCZDX8G_ISA0_B4096_MAX_BG2
[UNILOG][INFO] Graph name: functional_1, with op num: 24
[UNILOG][INFO] Begin to compile...
[UNILOG][INFO] Total device subgraph number 4, DPU subgraph number 1
[UNILOG][INFO] Compile done.
[UNILOG][INFO] The meta json is saved to "/workspace/final_vitis/compiled_zcu104/fc_small/meta.json"
[UNILOG][INFO] The compiled xmodel is saved to "/workspace/final_vitis/compiled_zcu104/fc_small//fc_small.xmodel"
[UNILOG][INFO] The compiled xmodel's md5sum is 55f73ce7fb950faf4bde9402a1a4977e, and has been saved to "/workspace/final_vitis/compiled_zcu104/fc_small/md5sum.txt"

and I am compiling like this:

compile(){
    vai_c_tensorflow2 \
        --model             final_vitis/quant_model/"$quant_model" \
        --arch              /opt/vitis_ai/compiler/arch/DPUCZDX8G/ZCU104/arch.json \
        --output_dir        final_vitis/compiled_zcu104/"$model"/ \
        --net_name          "$model"
}

compile

Right… I’m not 100% certain, but I think this may just be a limitation of the DPU: it’s primarily built for CNN acceleration and doesn’t play very nicely with purely dense networks. I vaguely remember similar issues coming up on the Xilinx forums; if you look up keywords like “square shape” or “dense layer first” you might come across some of them.

If you simply add a Conv2D layer before your first Flatten, it should work with no problems.

from tensorflow.keras.layers import Input, Flatten, Dense, Activation, Conv2D
from tensorflow.keras.models import Model

def create_fc_small(input_shape,output_shape):
    x = x_in = Input(input_shape, name='input_1_m')
    x = Conv2D(16, (3,3), name='conv')(x)
    x = Flatten(name="flatten")(x)
    x = Dense(16, name='dense_1_m')(x)
    x = Activation("relu", name='act_1_m')(x)
    x = Dense(128, name='dense_2_m')(x)
    x = Activation("relu",name='act_2_m')(x)
    x = Dense(output_shape, name='dense_3_m')(x)
    x = Activation("softmax", name='act_3_m')(x)
    
    model = Model(inputs=[x_in], outputs=[x])

    return model

model = create_fc_small(input_shape=(28, 28, 1), output_shape=10)

Thanks
Shawn

Okay, thank you! I can't do that, because I am comparing Vitis AI to hls4ml and an embedded GPU, so I want pure Dense networks in my testbench as well. But reshaping my input works just fine, so I will just mention that Vitis AI doesn't work well with pure Dense networks. Thanks for your help! :slight_smile: