Data is not written back from DMA

No, I am following this example.

https://github.com/ZhaoqxCN/PYNQ-CNN-ATTEMPT/blob/master/Minst-CNN/FPGA_CNN.ipynb

Like this

    import numpy as np
    from pynq import allocate

    # The first 7 int16 words form a parameter header, followed by the flattened images.
    input_mat = test_data[0:batch_size]
    input_val = np.append([0, batch_size, 0, input_ch, input_dim, output_ch, output_dim], input_mat.ravel())
    in_buffer = allocate(shape=(input_val.shape[0],), dtype=np.int16)
    out_buffer = allocate(shape=(7 + output_ch * batch_size * output_dim * output_dim,), dtype=np.int16)
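For completeness, a minimal sketch of how the result would be read back after the transfer, assuming the output also carries a 7-word header (matching the `7 +` term in the allocation above); the (batch, channel, height, width) ordering is my assumption, not confirmed by the notebook:

    import numpy as np

    # Skip the assumed 7-word header, then reshape the payload
    # into per-image feature maps.
    payload = np.array(out_buffer[7:])
    result = payload.reshape(batch_size, output_ch, output_dim, output_dim)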

I don’t get the entire picture here.
I need more information about the DMA block in your design.

Here I am using a stream data width of 16 bits because my stream input and output are 16 bits wide.

input_val.shape[0]
Here [0] selects the first (and only) dimension of the flattened array, i.e. the total number of elements to transfer.
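Since np.append flattens everything into a single 1-D array, shape[0] is simply the total element count. For example (assuming test_data holds batch_size images of shape (input_ch, input_dim, input_dim)):

    # 7 header words plus one flattened image per batch element.
    assert input_val.shape[0] == 7 + batch_size * input_ch * input_dim * input_dim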

Nagendra,

I am not sure why you got a complete project from GitHub and still need to redo it yourself.
Just try this and see if there is any difference:
Minst-CNN.xpr.zip (690.6 KB)

Hi,

I just took that code as an example and modified it in HLS according to my algorithm's specifications.
I changed the data type to int32 and it now prints 4096, but the cell keeps running without showing any result.

[Screenshot: FPGA_CNN Jupyter Notebook, 2022-05-24]

Can you open my enclosed project?
The expected DMA settings and the overall approach are different from yours.

BTW, I always evaluate others' work before starting my own, as this gives a better understanding of how things work, what needs attention, and what the limitations are.

Hi

Thanks for your reply. I am able to run the example on different boards, but not on the PYNQ-Z2.
I am using a ZCU111 board, which has a Zynq UltraScale+. I modified the DMA parameters, but the result still doesn't show up.

The Zynq UltraScale+ burst length and the AWUSER/ARUSER widths do not match my DMA; it shows warnings.

Please find the below attachments.

file2.pdf (62.1 KB)

design_1.pdf (175.9 KB)

So Nagendra, did you synthesize the project:
Minst-CNN.xpr.zip (690.6 KB)
and check that the repository is sane as provided?

Yes, I generated the bitstream and verified it with MNIST data.
Note that the repository targets a PYNQ image older than v2.7.

Do I need to modify this?

    self.axi_dma_0.sendchannel.transfer(in_buffer)
    self.axi_dma_0.recvchannel.transfer(out_buffer)
    self.axi_dma_0.sendchannel.wait()
    self.axi_dma_0.recvchannel.wait()

Like this:

    dma = overlay.cnn.axi_dma_0

    dma.sendchannel.transfer(in_buffer)
    dma.recvchannel.transfer(out_buffer)
    dma.sendchannel.wait()
    dma.recvchannel.wait()
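As a side note: when recvchannel.wait() hangs like this, one option is to poll the channel status instead of blocking, as a debugging sketch (using the dma, in_buffer and out_buffer names from above; recvchannel.idle and register_map should be available on the PYNQ DMA driver):

    import time

    dma.sendchannel.transfer(in_buffer)
    dma.recvchannel.transfer(out_buffer)
    dma.sendchannel.wait()

    # Poll instead of blocking forever on recvchannel.wait().
    for _ in range(100):
        if dma.recvchannel.idle:        # DMASR idle bit: transfer finished
            break
        time.sleep(0.1)
    else:
        # Still busy after ~10 s: dump the DMA registers to see where it is stuck.
        print(dma.register_map)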

It looks like you have a mismatch in the data widths of your DMA/IP. I see 128 bits (DMA MM data width) and 64/32/16 bits (streams) at different places in your design.

What are the data widths in your design?
Try to match the data widths of the DMA MM ports to the Zynq interfaces. The DMA can remap your data width to the width of your stream, but you should think about the resulting number of data transfers.
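To make the transfer-count point concrete, a rough back-of-the-envelope calculation (the 4096-sample figure is just an example, using the 128-bit MM and 16-bit stream widths visible in your design):

    # Example: moving 4096 int16 samples through the DMA.
    n_samples    = 4096
    bytes_total  = n_samples * 2              # int16 = 2 bytes each
    mm_beats     = bytes_total // (128 // 8)  # 128-bit MM bus: 16 bytes/beat -> 512 beats
    stream_beats = bytes_total // (16 // 8)   # 16-bit stream: 2 bytes/beat -> 4096 beats
    # Every MM beat fans out into eight stream beats; the counts must stay consistent.
    print(mm_beats, stream_beats)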

Did you try adding an ILA, as I suggested earlier?

You could also try replacing the SmartConnect with an AXI Interconnect. Sometimes there can be issues with SmartConnect.

If you look at it in more detail, the address map has excluded segments.
Also, the default project does not use the high-performance AXI ports.



Hi

My stream data width is 16 bits, and I set a 32-bit data width for the Zynq slave and master ports.

I am using the LPD port instead of the HPD port, but there are still unmapped ports and I can't avoid that. Any suggestions on how to map this port?

I have attached an ILA to my IP. These are the simulation results.

Block diagram

design_1.pdf (186.2 KB)

My stream input and output data widths are 16 bits.

Address map

But recvchannel.wait() still does not complete.

Is this Zynq or Zynq UltraScale+?

I think you need to verify each part step by step:

Part A)
1st: Is the HLS TLAST signal added, and did you run an I/O sanity check of main.cpp to confirm all layers behave normally?
2nd: Were there any violations reported when the HLS run completed?

Part B)
1st: Do a simple dummy-data test on the DMA by replacing the HLS block with a plain FIFO feedforward (did this succeed or not?); see the sketch below.
2nd: Replace the FIFO with the HLS block and check whether it is still normal.

If the basic FIFO cannot capture all the data, there is no point spending time on the HLS block, because the basic communication path is not yet proven.
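A concrete version of the FIFO test in part B could look like this on the Python side (a sketch only: the overlay file name and DMA instance name are assumptions, and the bitstream must contain a plain DMA-to-FIFO-to-DMA loopback):

    import numpy as np
    from pynq import Overlay, allocate

    ol = Overlay("loopback.bit")   # hypothetical bitstream: DMA -> AXIS FIFO -> DMA
    dma = ol.axi_dma_0

    in_buffer = allocate(shape=(1024,), dtype=np.int16)
    out_buffer = allocate(shape=(1024,), dtype=np.int16)
    in_buffer[:] = np.arange(1024, dtype=np.int16)   # known ramp pattern

    dma.sendchannel.transfer(in_buffer)
    dma.recvchannel.transfer(out_buffer)
    dma.sendchannel.wait()
    dma.recvchannel.wait()

    # A healthy loopback returns every sample unchanged.
    print("loopback OK" if np.array_equal(in_buffer, out_buffer) else "data lost or corrupted")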