- PYNQ-Z1 board, Vivado and Vitis HLS 2020.2
I’m following this tutorial:
Lab: Axistream Multiple DMAs (axis)
The HLS code in the tutorial is quite old, so I changed streamAdd.cpp as shown below.
streamAdd.hpp is not needed for this code.
#include "ap_axi_sdata.h"
#include "hls_stream.h"
void sadd(hls::stream< ap_axis<32,0,0,0> > &INPUT1, hls::stream< ap_axis<32,0,0,0> > &INPUT2,
          hls::stream< ap_axis<32,0,0,0> > &OUTPUT, unsigned int length){
//#pragma HLS INTERFACE s_axilite port=return bundle=CTRL
#pragma HLS INTERFACE s_axilite port=length bundle=CTRL
#pragma HLS INTERFACE axis depth=50 port=OUTPUT
#pragma HLS INTERFACE axis depth=50 port=INPUT1
#pragma HLS INTERFACE axis depth=50 port=INPUT2
    ap_axis<32,0,0,0> curl1;
    ap_axis<32,0,0,0> curl2;
    for(unsigned int i=0; i<length; i++){
        INPUT1.read(curl1);
        INPUT2.read(curl2);
        if(curl1.last || curl2.last) break;
        curl1.data = curl1.data + curl2.data;
        OUTPUT.write(curl1);
    }
}
The Vivado settings are the same as in the tutorial, except for the HP port usage.
HP0 is for sadd_dma1 (read INPUT1 and write OUTPUT).
HP2 is for sadd_dma2 (only read INPUT2).
The block diagram is below.
Vivado also showed some warnings (yellow circled exclamation marks) about unused parts of the design, but I ignored them because the block design passed validation and the bitstream was generated.
The PYNQ library used in the tutorial is quite old, so I wrote new code:
import time
from pynq import Overlay
import pynq.lib.dma
from pynq import allocate
import numpy as np
from pynq import MMIO
import random

ol = Overlay("/home/xilinx/jupyter_notebooks/multiple_dma_sadd/sadd.bit")
ol.download() # this downloads your bitstream into FPGA
dma1 = ol.streamAdd.sadd_dma1 # first DMA. Note that we had to access the hierarchy before accessing the DMA
dma2 = ol.streamAdd.sadd_dma2 # second DMA
#sadd_ip = MMIO(0x40000000, 0x10000) # we got this address from
sadd_ip = ol.streamAdd.sadd
length = 8

in_buffer1 = allocate(shape=(length,), dtype=np.int32) # input buffer 1
in_buffer2 = allocate(shape=(length,), dtype=np.int32) # input buffer 2
out_buffer = allocate(shape=(length,), dtype=np.int32) # output buffer
samples = random.sample(range(0, length), length)
np.copyto(in_buffer1, samples)
samples = random.sample(range(0, length), length)
np.copyto(in_buffer2, samples)

sadd_ip.write(0x10, length) #we got this address from Vivado source. Since we didn't do port=return, and we set a constant for ap_start, we only have to write length.
t_start = time.time()
dma1.sendchannel.transfer(in_buffer1)
dma2.sendchannel.transfer(in_buffer2)
dma1.recvchannel.transfer(out_buffer)
dma1.sendchannel.wait()
dma2.sendchannel.wait()
#dma1.recvchannel.transfer(out_buffer)
dma1.recvchannel.wait()
t_stop = time.time()
in_buffer1.close()
in_buffer2.close()
out_buffer.close()
print('Hardware execution time: ', t_stop-t_start)
for i in range(0, length):
    print('{}+{} = {}'.format(in_buffer1[i], in_buffer2[i], out_buffer[i]))
However, the call dma1.recvchannel.wait() never returns, and the notebook is stuck at that line.
I have been trying to figure out what the problem is, but I can’t find it.
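One way to narrow this down without the board is a small software model of the loop in streamAdd.cpp. The sketch below is plain Python, not PYNQ or HLS code. `Beat`, `make_stream`, and `sadd_model` are hypothetical names made up for illustration, and it assumes each DMA send channel sets TLAST only on the final beat of its transfer.

```python
from collections import namedtuple

# One AXI4-Stream beat: payload plus the TLAST side-channel flag.
Beat = namedtuple("Beat", ["data", "last"])

def make_stream(values):
    """Model a DMA send channel (assumption: TLAST only on the final beat)."""
    values = list(values)
    n = len(values)
    return [Beat(v, i == n - 1) for i, v in enumerate(values)]

def sadd_model(input1, input2, length):
    """Software model of the posted sadd loop, including the early break."""
    output = []
    for i in range(length):
        a = input1[i]
        b = input2[i]
        if a.last or b.last:
            break  # leaves the loop before the final beat is written
        # The output beat reuses the first input's side-channel, as in the HLS code.
        output.append(Beat(a.data + b.data, a.last))
    return output

out = sadd_model(make_stream(range(8)), make_stream(range(8)), 8)
print(len(out), any(beat.last for beat in out))  # prints: 7 False
```

Under that assumption, the model produces only 7 output beats for length 8, and none of them has last set. If the hardware behaves the same way, the receive channel would never see the end of the packet, which might be why wait() does not return. This is only a guess until it is checked on the board.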