Data is not written back from DMA

Hi everyone,

When I provide input, it continues to run without producing any results. So I interrupted the kernel and verified that the data was not read back to the PS. What exactly is the problem? Is it a memory map data width issue or something else? If anyone has any ideas on how to solve this problem, please share them with me.

Thank you.

From where the code has been interrupted, it looks like you are waiting to receive all the data you expect from the DMA.

Some possible reasons for this are that “TLAST” is not being asserted in the data stream, or you are expecting to read more data than the DMA is sending.


Hi @cathalmccabe

The code was interrupted at the point marked by the second arrow. I am trying to give input using my CNN dimensions. I am able to load the weights into hardware, but the inference part gets stuck.

Yes, I am trying to send all my data through the DMA. I have checked my data dimensions and specified them in the HLS code, but the output still doesn’t show up.

TLAST is present in my HLS code. I am not sure what the exact issue is, or whether it lies in the HLS code or in the DMA buffer width.

Can you modify your code to send more data (through the sendchannel) before you try to read it through the recvchannel?
i.e. run sendchannel.transfer() a few times.
If you need to, you can remove the recvchannel.wait().
You can check the DMA status with register_map.
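As a rough sketch of what to look for there, the helper below decodes a raw status value into flag names. The bit positions follow my reading of the AXI DMA product guide (PG021), and `decode_dmasr` and `DMASR_FLAGS` are names made up for this example, not part of PYNQ. On the board the raw value would come from the DMA status register (for the receive channel that is S2MM_DMASR, which I believe sits at offset 0x34); a literal is used here so the snippet runs anywhere.

```python
# Bit positions of the AXI DMA status register (MM2S_DMASR / S2MM_DMASR),
# as I understand them from PG021. Treat them as an assumption and check
# them against the product guide for your IP version.
DMASR_FLAGS = {
    0: "Halted",
    1: "Idle",
    3: "SGIncld",
    4: "DMAIntErr",
    5: "DMASlvErr",
    6: "DMADecErr",
    12: "IOC_Irq",
    13: "Dly_Irq",
    14: "Err_Irq",
}


def decode_dmasr(value):
    """Return the names of the flags set in a raw status register value."""
    return [name for bit, name in sorted(DMASR_FLAGS.items())
            if value & (1 << bit)]


# Hypothetical raw value: bits 1 and 12 set.
print(decode_dmasr(0x00001002))  # ['Idle', 'IOC_Irq']
```

A channel that is running but still waiting shows neither Halted nor Idle, so an empty list on the receive side while recvchannel.wait() hangs would be consistent with a packet that has not ended yet. Any of the error flags would point at an addressing or length problem instead.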


Hi @cathalmccabe

The DMA is working fine; I have checked with print statements. But recvchannel.wait() never finishes. It keeps running without showing any result.

I tried sending more data through sendchannel.transfer() and recvchannel.transfer(), and that worked fine, but execution does not get past sendchannel.wait().

To test I would remove your IP and do a loopback test between the DMA channels.
If this is OK then your IP is either not providing sufficient output data, or it is not setting TLAST. Either of these will cause your hardware pipeline to stall.
You can also insert an ILA on the output of your IP to test the output data stream.
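A minimal sketch of such a loopback check, assuming the usual PYNQ channel calls (sendchannel/recvchannel with transfer() and wait()). `loopback_matches` is a made-up helper, and `FakeLoopbackDMA` is only a software stand-in so the snippet can be dry-run off the board. On the board you would pass the real DMA object from your overlay and buffers created with pynq.allocate instead.

```python
def loopback_matches(dma, in_buf, out_buf):
    """Send in_buf through the DMA and compare what comes back in out_buf."""
    # Arm the receive side first so it is ready when data arrives.
    dma.recvchannel.transfer(out_buf)
    dma.sendchannel.transfer(in_buf)
    dma.sendchannel.wait()
    dma.recvchannel.wait()
    return list(out_buf) == list(in_buf)


class _Channel:
    """Stand-in for a DMA channel; completes every transfer immediately."""

    def __init__(self, on_transfer):
        self._on_transfer = on_transfer

    def transfer(self, buf):
        self._on_transfer(buf)

    def wait(self):
        pass  # nothing to wait for in the software stand-in


class FakeLoopbackDMA:
    """Hypothetical stand-in: the send stream is wired straight to receive."""

    def __init__(self):
        self._rx_buf = None
        self.recvchannel = _Channel(self._arm)
        self.sendchannel = _Channel(self._send)

    def _arm(self, buf):
        self._rx_buf = buf

    def _send(self, buf):
        # Copy the sent data into the armed receive buffer.
        self._rx_buf[:len(buf)] = buf


data = list(range(16))
result = [0] * len(data)
print(loopback_matches(FakeLoopbackDMA(), data, result))  # True
```

If the same function returns True on the real DMA with the IP removed, the DMA and buffer setup are fine and the stall is more likely on the IP side.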



Hi Nagendra,
It would be easy to test this with a tutorial like this one, since it reads data back in the way your question describes.
A small adjustment can also be made to fit your application.
All the detailed settings and setup steps are shown in the link.

Hi @cathalmccabe

Thank you for your ideas. I removed the IP and tested the DMA with an AXI stream data FIFO, which worked perfectly. To load the data, I had created a hierarchy for my CNN IP and DMA blocks. Because loading the weights through the DMA worked with the FIFO in place of the IP, I realized that the data was not reaching my IP, i.e. my IP was not communicating properly. I have now designed the block diagram using my CNN IP without the hierarchy. When I ran overlay.ip_dict, I couldn’t see my IP.

I have checked my design, and it has TLAST.

I am wondering why I can’t see my IP block in PYNQ.

Please see the attachments below.

Thank you.

design_1.pdf (194.2 KB)
design_2.pdf (206.4 KB)

Your IP doesn’t have a memory-mapped interface (i.e. one reached through an AXI interconnect), so it can’t be controlled directly from the PS or PYNQ, and it won’t show in the IP dict. Only memory-mapped IP will show there.

As you only have AXI streams, you “control” your IP indirectly by sending and receiving data through the AXI DMAs.

Can you check if you receive any data from your IP (from the receive DMA)?
E.g. if you initialise the memory buffer where you will receive your results to all zeros, when you start the receive DMA, does it write anything back to the first few elements of the buffer?
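A small sketch of that check, written against a plain list so it runs anywhere. `written_elements` is a made-up helper name. On the board you would fill the allocated receive buffer with zeros, start the receive transfer, and then pass the buffer to the helper; depending on the platform you may need to invalidate the buffer's cache before reading it, which is not shown here.

```python
def written_elements(buf, fill=0):
    """Return (index, value) pairs for entries that no longer hold `fill`."""
    return [(i, v) for i, v in enumerate(buf) if v != fill]


# Stand-in for a receive buffer that was zeroed before the transfer and
# then had its first three elements written (values are made up).
out_buff = [0] * 8
out_buff[0:3] = [5, -2, 7]

print(written_elements(out_buff))  # [(0, 5), (1, -2), (2, 7)]
```

If zero is a legitimate result value for your network, pass a different `fill` value and pre-fill the buffer with it, so that a written zero can still be told apart from an untouched element.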

You can put an ILA on the input and output AXI streams of your IP to check the AXI stream signals, and what data is actually being transferred. You would also be able to see if TLAST is actually being set.


Just watch the example video and replace the FIR acc block with your HLS design for the CNN.
I think you are trying to work out some BNN or small CNN design here,
which is directly the same concept as this tutorial, just extended.
Have you tried the HLS test bench before adding the IP to the stream design, and have you simulated the stream interface with the stream verification IP as a sanity check?

design_fir.pdf (191.7 KB)

Hi @cathalmccabe @briansune

Thank you for your suggestions.

I am following this example “PYNQ-CNN-ATTEMPT/FPGA_CNN.ipynb at master · ZhaoqxCN/PYNQ-CNN-ATTEMPT · GitHub”.

  1. I have initialized the input buffer and output buffer to take the data from my dataset and print the result. The calculation part was correct, and it was transmitted through the DMA using my IP.
  2. I tried Set Up Debug after synthesis to see TLAST in Vivado.
  3. I was unable to attach an ILA to the IP in the block diagram because it doesn’t have mapped ports.

Please see the attachments below.

Screenshot from 2022-05-18 16-58-02

I’m not sure what you mean by this. You can connect an ILA in two main ways in IPI. You can add the ILA block manually and configure it and connect the signals you want.
Alternatively, you can automate this (and I would recommend this way). In IPI, right-click on the signal in the diagram, e.g. the whole AXI stream output from your IP. Click “Mark Debug” and you should see a pop-up prompting you to configure the ILA. The default config should be OK. You can increase the depth of the ILA buffers if you want, and configure all signals for data and trigger.


Have you verified the HLS IP with the stream verification IP that Xilinx includes by default, as a sanity check?

Hi @briansune

Yes, I have designed the block diagram, but the simulation was not successful.
design_10.pdf (25.1 KB)
ERROR: [USF-XSim-62] ‘elaborate’ step failed with error(s).

Well, Nagendra, at this stage this is up to you. Search a bit on how to use and debug the verification IP.
The stream protocol should be simple enough to test.
I have enclosed a short video, the best one I have.

Hi @briansune

Thanks for your suggestions.
Sorry, I was checking HLS and PYNQ and became confused as to where I went wrong. I’ll look into that video.

Hi @cathalmccabe @briansune

I have added print statements at sendchannel.wait() and recvchannel.wait(). The buffer size is 4096, and my data read size is 4098. It expects 4096; that’s why it keeps running without a result.

Can you tell me what I should modify to adjust this to 4096?

I have printed example data and the size is 4096.

Example output:
Screenshot 2022-05-24 at 12-39-34 FPGA_CNN - Jupyter Notebook

Screenshot 2022-05-24 at 12-53-37 FPGA_CNN - Jupyter Notebook

Screenshot 2022-05-24 at 12-40-03 FPGA_CNN - Jupyter Notebook

My data:
Screenshot 2022-05-24 at 12-39-07 FPGA_CNN - Jupyter Notebook

How do you allocate the memory?
in_buff = allocate(shape=(n,), dtype=np.int32)
out_buff = allocate(shape=(n,), dtype=np.int32)

Hi @briansune
Thank you for your reply.
I have allocated the same as you mentioned, but I used int16.

in_buff = allocate(shape=(n,), dtype=np.int16)
out_buff = allocate(shape=(n,), dtype=np.int16)

What is the shape n set to?
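The reason for asking: the element count and the dtype together fix how many bytes each transfer moves, and the stream width fixes how many beats that becomes. A rough sketch of the arithmetic follows. `transfer_layout` is a made-up helper, and the 32-bit stream width is an assumption, since the thread does not state the width used in the design.

```python
def transfer_layout(n_elements, bytes_per_element, stream_width_bits):
    """Return (total_bytes, stream_beats) for one buffer."""
    total_bytes = n_elements * bytes_per_element
    bytes_per_beat = stream_width_bits // 8
    if total_bytes % bytes_per_beat:
        # Partial beats are possible on a real stream; treated as an
        # error here to keep the sketch simple.
        raise ValueError("buffer does not fill a whole number of beats")
    return total_bytes, total_bytes // bytes_per_beat


# 4096 elements on an assumed 32-bit stream.
print(transfer_layout(4096, 2, 32))  # int16: (8192, 2048)
print(transfer_layout(4096, 4, 32))  # int32: (16384, 4096)
```

Under these assumptions, 4096 int16 elements occupy only 2048 beats, while 4096 int32 elements occupy 4096. If the HLS IP counts stream words before asserting TLAST, a dtype that does not match the word size it expects could leave the receive side waiting, so it is worth checking n and the dtype against the HLS interface.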