Load data to IP via DMA

I am trying to load data (a NumPy array) into my IP so that it can be reused across several operations. However, when I try to verify that the write was correct by streaming the cached data back out via DMA, the whole process hangs at dma.recvchannel.wait(). My platform is an Ultra96-V2 and the PYNQ version is 2.5.

To stream data into the IP, the IP has a control port that can be configured so that data arriving via DMA is written to BRAM in a streaming manner. The Python code is as follows:

conv_accel.write(0x10, 13)
conv_accel.write(0x0, 0)
conv_accel.write(0x0, 1)
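
For the write path, the host buffer has to match the layout that the IP unpacks. The following is a minimal sketch of one possible packing, not the original code. It assumes that each stream word carries one kernel position with channel j in bits j*PRECISION and upward (inferred from the write_weight fragment in the top function), that PRECISION is 8, and that words are laid out little-endian in memory. The helper names pack_weights and unpack_weights are hypothetical.

```python
def pack_weights(weights, precision):
    """Pack weights[j][i] (channel j, kernel position i) into stream words.

    Returns one bytes object per stream word, little-endian (assumed).
    """
    channels = len(weights)
    taps = len(weights[0])
    mask = (1 << precision) - 1
    word_bytes = (channels * precision + 7) // 8
    words = []
    for i in range(taps):
        word = 0
        for j in range(channels):
            # Channel j occupies bits [j*precision, (j+1)*precision - 1].
            word |= (int(weights[j][i]) & mask) << (j * precision)
        words.append(word.to_bytes(word_bytes, "little"))
    return words


def unpack_weights(words, channels, precision):
    """Inverse of pack_weights, useful for checking read-back data."""
    mask = (1 << precision) - 1
    weights = [[0] * len(words) for _ in range(channels)]
    for i, raw in enumerate(words):
        word = int.from_bytes(raw, "little")
        for j in range(channels):
            weights[j][i] = (word >> (j * precision)) & mask
    return weights


# Example with the 16x9 shape mentioned in the post (PRECISION = 8 is assumed).
example = [[(j * 9 + i) % 256 for i in range(9)] for j in range(16)]
packed = pack_weights(example, 8)
roundtrip = unpack_weights(packed, 16, 8)
```

Comparing the unpacked read-back data against the original array in this way would separate layout mistakes from a DMA or handshake problem.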

I have also monitored the control register of my IP (AXI-Lite). It reads 4 (idle) initially and changes to 1 right after I write 1 to 0x00; after that the status stays at 1 and never changes. This IPython cell does complete execution (it is not stuck on wait()). I noticed that this differs from my earlier designs, where the register started at 4, changed to 6 (done + idle), and finally returned to 4.
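
The status values described above (4 for idle, 6 for done plus idle) match the usual bit layout of the control register that Vivado HLS generates at offset 0x00 for an s_axilite return port. A small decoder, sketched here under that assumption, makes the readings easier to interpret. The function names are hypothetical.

```python
def decode_ap_ctrl(value):
    """Split an HLS control register value into its handshake flags."""
    return {
        "ap_start": bool(value & 0x01),      # bit 0
        "ap_done": bool(value & 0x02),       # bit 1
        "ap_idle": bool(value & 0x04),       # bit 2
        "ap_ready": bool(value & 0x08),      # bit 3
        "auto_restart": bool(value & 0x80),  # bit 7
    }


def describe_ap_ctrl(value):
    """Return the names of the set flags joined with '+', or 'none'."""
    flags = decode_ap_ctrl(value)
    return "+".join(name for name, on in flags.items() if on) or "none"


# On hardware the value would come from something like conv_accel.read(0x00).
for reading in (4, 6, 1):
    print(reading, describe_ap_ctrl(reading))
```

Under this layout, a value that stays at 1 means ap_start is still set and the core is neither idle nor done, which would be consistent with the core still running, for instance blocked on a stream read or write.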

To stream data from the IP back to the PS, the Python code is:

conv_accel.write(0x10, 0)
conv_accel.write(0x0, 1)
dma_recv.wait()  # the process hangs here

Here I configure the IP to stream out the cached data, trigger it by writing 1 to 0x00, and then call the DMA. This IPython cell gets stuck at dma.recvchannel.wait().
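
The read-back snippet above does not show a receive transfer being queued before wait(). If one is not issued elsewhere, a sequence along the following lines may be worth trying. This is a sketch, not a confirmed fix: it assumes the PYNQ DMA channel methods transfer() and wait() and the idle property, reuses the register offsets from the post, and adds a timeout so that a stalled transfer raises an error instead of blocking the notebook. The function name read_back and the timeout_s parameter are hypothetical.

```python
import time


def read_back(ip, dma, out_buffer, timeout_s=2.0):
    """Queue the receive, start the IP in read-back mode, wait with a timeout."""
    # Queue the receive first so the DMA is ready to accept stream data.
    dma.recvchannel.transfer(out_buffer)

    # Offsets and values as used in the post: 0x10 selects the mode,
    # 0x00 is the control register, writing 1 sets ap_start.
    ip.write(0x10, 0)
    ip.write(0x00, 1)

    # Poll instead of blocking forever in wait().
    deadline = time.monotonic() + timeout_s
    while not dma.recvchannel.idle:
        if time.monotonic() > deadline:
            raise TimeoutError("DMA receive did not complete")
        time.sleep(0.001)

    # The channel is idle, so this returns promptly and lets the
    # library finish its own completion handling.
    dma.recvchannel.wait()
    return out_buffer
```

On hardware, out_buffer would be a contiguous buffer (for example from pynq.allocate in PYNQ 2.5) sized to the number of stream words the IP emits before asserting last.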

My top function in C++ is as follows:

// read to Channel * (k_dim*k_dim), 16x9
void write_weight(stream<axis_t> &in){

	for(int i = 0; i < KERNEL_DIM*KERNEL_DIM; i++){
		// read stream data, one column
		axis_t data_in =;
		for(int j = 0; j < IFM_CHANNEL; j++){
#pragma HLS UNROLL
			ap_uint<PRECISION> temp * PRECISION-1, j*PRECISION);
			weights[j][i] = temp;

void DoCompute(stream<axis_t> &in, stream<axis_t> &out, int control){
#pragma HLS INTERFACE s_axilite port=return bundle=control_bus
#pragma HLS INTERFACE s_axilite port=control bundle=control_bus
#pragma HLS INTERFACE axis register both port=out
#pragma HLS INTERFACE axis register both port=in

	if(control == 13){

		for(int j = 0; j<KERNEL_DIM*KERNEL_DIM; j++){
			axis_t data_out;
			for(int i = 0; i < IFM_CHANNEL; i++){*PRECISION-1, i*PRECISION) = weights[i][j];
			data_out.last = (j == KERNEL_DIM*KERNEL_DIM-1? 1:0);

My questions are

  1. Is there an issue in my top function? I based it on the implementation from SpooNN.
  2. I think it would help to monitor the AXI streaming interface of my IP. Is there a way to do this using PYNQ?
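
Regarding the second question, watching the stream signals themselves would normally need an ILA core in the Vivado design, but the DMA status register gives a coarse view from Python. The sketch below decodes a status value assuming the Xilinx AXI DMA register map in direct (non scatter-gather) mode, where the receive-side status register S2MM_DMASR sits at offset 0x34 and the send-side MM2S_DMASR at 0x04. The function name decode_dmasr is hypothetical.

```python
# Bit positions in the AXI DMA status register (direct register mode).
# Scatter-gather status bits are left out of this sketch.
DMASR_BITS = {
    0: "halted",
    1: "idle",
    4: "dma_internal_error",
    5: "dma_slave_error",
    6: "dma_decode_error",
    12: "ioc_irq",
    14: "err_irq",
}


def decode_dmasr(value):
    """Return the names of the status flags set in a DMASR value."""
    return [name for bit, name in sorted(DMASR_BITS.items()) if (value >> bit) & 1]


# On hardware the value might be read with something like dma.read(0x34).
for reading in (0x0001, 0x0000, 0x1002):
    print(hex(reading), decode_dmasr(reading))
```

An empty list means the channel is running and not idle. If that persists after the IP has been started, the DMA is probably still waiting for stream data or for the last signal.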


I have exactly the same problem. It seems that something is broken with the DMA on PYNQ v2.5. I ran a working script on PYNQ 2.4 on the Ultra96, and the same script hangs on DMA receive on v2.5.

I believe I am having the same issue as well. I will try loading v2.4 and report back on whether that fixes it.

I can also confirm this: my issue went away when I dropped to 2.4 and removed the wait calls.
