AXI DMA not working on PYNQ-Z1

Hi there,
I am trying to run a toy example for AXI DMA on PYNQ-Z1.
I use a simple AXI4-Stream DATA FIFO with the DMA, so that I receive in the PS what I send.
I tried 4 different topologies, and none of them worked for me:

  • In the first trial (1.pdf (62.5 KB) ), I used the simple DMA mode (Direct Register Mode) without any interrupt. In this case, dma.sendchannel.wait() hangs forever.

  • In the second trial (2.pdf (70.4 KB) ), I used the same topology, but with the DMA in Scatter-Gather mode, again without any interrupt. Same result: dma.sendchannel.wait() hangs forever.

  • In the third trial (3.pdf (98.5 KB) ), I enabled the interrupt on PS, and connected both mm2s_introut and s2mm_introut from AXI DMA to the IRQ_F2P of the PS using a concat block. In this case, I got the following error even before I reach dma.sendchannel.wait()

Traceback (most recent call last):
  File "", line 11, in <module>
    dma = FIFO.axi_dma_0
  File "/usr/local/lib/python3.6/dist-packages/pynq/", line 327, in __getattr__
    return getattr(self._ip_map, key)
  File "/usr/local/lib/python3.6/dist-packages/pynq/", line 585, in __getattr__
    driver = ipdescription['driver'](ipdescription)
  File "/usr/local/lib/python3.6/dist-packages/pynq/lib/", line 190, in __init__
  File "/usr/local/lib/python3.6/dist-packages/pynq/", line 517, in __init__
    setattr(self, interrupt, Interrupt(details['fullpath']))
  File "/usr/local/lib/python3.6/dist-packages/pynq/", line 98, in __init__
  File "/usr/local/lib/python3.6/dist-packages/pynq/", line 159, in get_controller
    ret = _InterruptController(name)
  File "/usr/local/lib/python3.6/dist-packages/pynq/", line 177, in __init__
    self.mmio = MMIO(PL.ip_dict[name]['phys_addr'], 32)
KeyError: ''

  • I solved this error by moving to the fourth trial (4.pdf (78.5 KB) ), in which I use an AXI Interrupt Controller to connect the concat block to IRQ_F2P on the PS. In this case, the same old symptom showed up again: dma.sendchannel.wait() hangs forever!

I use Vivado 2019.2 for system integration.

The following Python code runs on PS:

import time

import numpy
from pynq import Xlnk
from pynq import Overlay
from pynq import MMIO

FIFO = Overlay('fifo_buffer.bit')

dma = FIFO.axi_dma_0

SIZE = 2

# Allocate physically contiguous buffers that the DMA can access
xlnk = Xlnk()
input_buffer = xlnk.cma_array(shape=(SIZE,), dtype=numpy.uint32)
output_buffer = xlnk.cma_array(shape=(SIZE,), dtype=numpy.uint32)
input_range = numpy.arange(SIZE, dtype=numpy.uint32)
for i in range(SIZE):
    input_buffer[i] = i
start_time = time.time()
print('finished copying')

# MM2S channel: read input_buffer from DRAM and push it onto the stream
dma.sendchannel.transfer(input_buffer)
print('finished sendchannel.transfer')

# S2MM channel: take data from the stream and write it to output_buffer
dma.recvchannel.transfer(output_buffer)
print('finished recvchannel.transfer')

dma.sendchannel.wait()
print('finished sendchannel.wait()')

dma.recvchannel.wait()
print('finished recvchannel.wait()')

stop_time = time.time()

hw_exec_time = stop_time-start_time

print('Hardware Execution Time: ', hw_exec_time)
data_out = numpy.copy(output_buffer)



I even tried setting the device to xc7z020clg400-1 and the board to PYNQ-Z1, but no luck.
Any idea how to make this implementation work?

Thanks !


Did you try the example here:



Yes I tried that (as you can see in my main post above), but it didn’t work for me.

Looks like you edited your post since I replied.

You have only connected the stream side of your IP in the diagram in 1.pdf to the FIFO. There is no connection to DRAM.

Your code tells the DMA to read data from the “input_buffer”, located in PS DRAM, and write it to the DMA stream interface M_AXIS_MM2S. The receive channel accepts data from the DMA stream S_AXIS_S2MM and writes it back to output_buffer, in PS DRAM. In your design, however, there is no connection from the DMA to the DRAM.

You need to enable 1 or 2 HP ports on the Zynq PS, and connect both ports M_AXI_* on the DMA to the HP port(s).



(You’re right, I posted by mistake before the post was complete :) )

Okay, that was something the automatic connection did without any notice. In the other trials (2, 3, 4), however, the DMA was correctly connected to the DRAM and still didn’t work.

However, I tried to reconnect the first trial as shown here ( 1_modified.pdf (69.0 KB) ) and it worked !

It looks like the SG mode is the problem (does it need some special handling?).

I also tried to use the interrupt without interrupt controller and without SG ( 5.pdf (71.7 KB) ) and I was getting the same error as the third trial.

Furthermore, I tried to use the interrupt with interrupt controller and without SG ( 6.pdf (77.1 KB) ) and it’s working again.

So now I am sure to some extent that the SG is the reason behind this.
Does anybody know why ?


and thanks @cathalmccabe for the trick

Scatter gather is not currently supported.


But for some reason this is not mentioned here:
It only says “The DMA class supports simple mode only”, and it is not really clear that this refers to Direct Register Mode. I would suggest stating clearly that Scatter/Gather is not currently supported.

As a summary, here are my trials:

The corresponding designs can be found here:
1.pdf (62.5 KB) 1_modified.pdf (69.0 KB) 2.pdf (70.4 KB) 3.pdf (98.5 KB) 4.pdf (78.5 KB) 5.pdf (71.7 KB) 6.pdf (77.1 KB)

Here are my notes:
1- Don’t use Scatter/Gather.
2- If you want to use interrupt, use AXI Interrupt Controller.
3- If you’re making your own custom IP in HLS, the input and output ports should be axis interfaces, while the return port should be s_axilite:

#pragma HLS INTERFACE axis port=input_stream
#pragma HLS INTERFACE axis port=output_stream
#pragma HLS INTERFACE s_axilite port=return bundle=CTRL_BUS

Afterwards, map your CTRL_BUS address with MMIO and only set the ap_start bit, without waiting for ap_idle or ap_done. You must also supply the correct amount of input; otherwise it will hang.



Hi again,
In the DMA documentation it is mentioned that the TLAST signal must be set in order for the transaction to complete.
If I want to use my own IP (generated by Vivado HLS), how do I set the AXIS signals (TLAST, TUSER, TID, TKEEP, TSTRB, TDEST)?
I guess when I send the last data chunk in the stream, I have to set TLAST to 1, otherwise 0. Is that correct?
What about the other signals? In particular, the data rates of the input and output streams are different. For example, for every n inputs, I would like to generate m outputs using some processing function.
How do I set these signals properly, please?
Thanks !


If using HLS, for most IP, you can pass through USER and LAST, and ignore the others.

This is one way to do it; pass through LAST (or set it when you want to set it in a conditional statement)

I’m not sure I fully understand this.
You set the control signals on the output. This can be a different data rate to the input. Does this answer your question?
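
As a rough illustration of the pass-through approach, here is a plain C++ software model (not synthesizable HLS as written). The `AxisBeat` struct, the `std::queue` stand-in for a stream, and the name `passthrough_kernel` are assumptions made for this sketch; in Vivado HLS the stream and side-channel types would come from the HLS headers instead. Each output beat simply carries the USER and LAST values of the input beat it was computed from:

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical stand-in for one AXI4-Stream beat (illustration only).
struct AxisBeat {
    uint32_t data;  // TDATA
    bool     user;  // TUSER
    bool     last;  // TLAST
};

// 1:1 kernel model: one output beat per input beat.
// USER and LAST are copied from input to output, so the packet
// boundary reaches the DMA exactly where the sender placed it.
void passthrough_kernel(std::queue<AxisBeat>& in, std::queue<AxisBeat>& out) {
    assert(!in.empty());  // sketch assumes at least one beat is queued
    bool end_of_packet = false;
    while (!end_of_packet && !in.empty()) {
        AxisBeat beat = in.front();
        in.pop();
        beat.data += 1;             // placeholder for the real processing
        out.push(beat);             // user and last travel with the beat unchanged
        end_of_packet = beat.last;  // stop once the packet boundary is seen
    }
}
```

If the input packet ends with LAST set, the output packet does too, which is what the DMA receive channel needs in order to finish.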


Hi Cathal,
thank you for your answer.
What I meant is that I don’t necessarily write an output for each input, I might only write an output for every set of inputs.
In this case:

  • I don’t know how to pass the signals from input to output (especially since I have more than one sub-function in the top-level function; should I pass them through each sub-function?).
  • If I want to set them myself, how do I do this manually without passing them through? I mean, what value should TUSER have? What value should TKEEP have? And so on.

You can write values to these signals inside conditions in your HLS code.


if (i == 16) {
    last = 1;   // final beat of the packet
} else {
    last = 0;   // every other beat
}

USER is a user-defined signal, so you don’t need to set it to anything. If your IP doesn’t use it, and you know any downstream IP doesn’t use it, then you can ignore it. Otherwise, it is good practice to pass it through.

TKEEP is essentially byte enable. If you are writing HLS, you don’t need to set this. If this is your own IP, and you don’t make use of byte enables you can also ignore this.
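
To make the different-rate case concrete, below is a small plain C++ model (again a sketch, not synthesizable HLS). The `AxisBeat` struct, the function name `reduce_kernel`, and the choice of summing as the "processing" are all hypothetical. It assumes a 32-bit data word, so TKEEP has 4 bits. The kernel emits one output for every `group` inputs, sets LAST only on the final output, leaves USER at 0, and sets KEEP to all ones to mark every byte as valid:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one AXI4-Stream output beat (illustration only).
struct AxisBeat {
    uint32_t data;  // TDATA (32 bits, i.e. 4 bytes)
    uint8_t  keep;  // TKEEP: one bit per data byte
    bool     user;  // TUSER
    bool     last;  // TLAST
};

// Model of an n-to-1 kernel: sums every `group` input words into one output beat.
// The side-channel values are generated on the output instead of being passed through.
std::vector<AxisBeat> reduce_kernel(const std::vector<uint32_t>& in, std::size_t group) {
    assert(group > 0 && in.size() % group == 0);  // input must be whole groups
    const std::size_t n_out = in.size() / group;
    std::vector<AxisBeat> out;
    for (std::size_t o = 0; o < n_out; ++o) {
        uint32_t acc = 0;
        for (std::size_t k = 0; k < group; ++k) {
            acc += in[o * group + k];
        }
        AxisBeat beat;
        beat.data = acc;
        beat.keep = 0xF;               // all four bytes valid (assumption: 32-bit word)
        beat.user = false;             // unused in this sketch
        beat.last = (o == n_out - 1);  // set only on the final output beat
        out.push_back(beat);
    }
    return out;
}
```

The point of the sketch is that LAST is driven by the output count, not the input count, so the receive transfer length on the DMA side has to match the number of output beats.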

