Problem with xfOpenCV "Standalone_HLS_AXI_Example" overlay

Hi, I’m trying to get started with image processing, so I decided to begin by building an overlay from this example in the xfOpenCV library. I should also mention that I’m working with the Ultra96v1 board and PYNQ v2.5, although I also tried the previous version of PYNQ (v2.4).

The fact is that I’m having problems getting this to work, and since I have no previous experience with PYNQ, HLS, or Vivado IP Integrator, I’ll include all the information I think may be helpful for diagnosing what I’m doing wrong.

First of all, I modified the “xf_ip_accel_app.cpp” file slightly so that it has a proper interface that can be handled from the address space of the Zynq platform. This is my current file:

#include "xf_dilation_config.h"

void dilation_accel(xf::Mat<TYPE, HEIGHT, WIDTH, NPC1> &_src,xf::Mat<TYPE, HEIGHT, WIDTH, NPC1> &_dst, unsigned char kernel[FILTER_SIZE*FILTER_SIZE]);

void ip_accel_app(hls::stream< ap_axiu<8,1,1,1> >& input_stream,hls::stream< ap_axiu<8,1,1,1> >& output_stream,int height,int width, unsigned char kernel[FILTER_SIZE*FILTER_SIZE])
{
#pragma HLS INTERFACE s_axilite port=kernel bundle=CNTRL_BUS
#pragma HLS INTERFACE s_axilite port=width bundle=CNTRL_BUS
#pragma HLS INTERFACE s_axilite port=height bundle=CNTRL_BUS
#pragma HLS INTERFACE axis port=output_stream
#pragma HLS INTERFACE axis port=input_stream
#pragma HLS INTERFACE s_axilite port=return bundle=CNTRL_BUS

	 xf::Mat<TYPE, HEIGHT, WIDTH, NPC1> imgInput1(height,width);
	 xf::Mat<TYPE, HEIGHT, WIDTH, NPC1> imgOutput1(height,width);

#pragma HLS stream variable=imgInput1.data dim=1 depth=1
#pragma HLS stream variable=imgOutput1.data dim=1 depth=1
	#pragma HLS dataflow

	xf::AXIvideo2xfMat(input_stream, imgInput1);

	 dilation_accel(imgInput1,imgOutput1, kernel);

	xf::xfMat2AXIvideo(imgOutput1, output_stream);


}

It simulates, synthesizes, and cosimulates without errors; below is the waveform of the input and output streams. As far as I can tell, at least to my inexperienced eyes, everything is working as expected.

Then I export the design as an IP and create a block design with it around the Zynq UltraScale+ in order to get the bitstream and the .tcl and .hwh files for the overlay. Here is my block design:

The design validates without any critical warnings, and the same goes for synthesis and implementation, except for 56 critical warnings in the implementation step. They refer to the constraints file I added for the Ultra96, so I don’t think they are the problem, because I’ve tried simpler designs (from here, to be precise) with the same constraints file (and the same block design) and they worked without problems.

Once I have all the files required for the overlay, it’s time to go to the Ultra96. First of all, when I try to access the “register_map” property of my IP, I get the following error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-affb02eb4e83> in <module>()
----> 1 dilate_ip.register_map

/usr/local/lib/python3.6/dist-packages/pynq/overlay.py in register_map(self)
    623                 self._register_map = RegisterMap.create_subclass(
    624                         self._register_name,
--> 625                         self._registers)(self.mmio.array)
    626             else:
    627                 raise AttributeError(

/usr/local/lib/python3.6/dist-packages/pynq/registers.py in __init__(self, buffer)
    424                 continue
    425             self._instances[k] = v[0](
--> 426                 address=v[1], width=align_width, buffer=array)
    427 
    428     def _set_value(self, value, name):

/usr/local/lib/python3.6/dist-packages/pynq/registers.py in __init__(self, address, width, debug, buffer)
    101 
    102         self.address = address
--> 103         self.width = width
    104         self.debug = debug
    105 

/usr/local/lib/python3.6/dist-packages/pynq/registers.py in _reordered_setitem(self, value, index)
    232 
    233         """
--> 234         return self.__setitem__(index, value)
    235 
    236     def __str__(self):

/usr/local/lib/python3.6/dist-packages/pynq/registers.py in __setitem__(self, index, value)
    186         """
    187 
--> 188         curr_val = self._buffer[0]
    189         if isinstance(index, int):
    190             if value != 0 and value != 1:

AttributeError: 'Registerwidth' object has no attribute '_buffer'

Then, after setting the parameters of the block (width, height, and kernel) and starting the block with the value 0x81 (I also tried 0x1), the DMA hangs when I wait for the receive channel to finish its job. This is the exact snippet where it freezes:

input_buffer[:] = frameF[:]
print("Input buffer shape: \n" + str(input_buffer.shape) + "\n")

io_stream.recvchannel.stop()
io_stream.sendchannel.stop()

io_stream.sendchannel.start()
io_stream.recvchannel.start()

print("Send channel idle: " + str(io_stream.sendchannel.idle))
print("Receive channel idle: " + str(io_stream.recvchannel.idle))

io_stream.sendchannel.transfer(input_buffer)
io_stream.recvchannel.transfer(output_buffer)
io_stream.sendchannel.wait()
io_stream.recvchannel.wait()

print("Send channel running: " + str(io_stream.sendchannel.running))
print("Receive channel running: " + str(io_stream.recvchannel.running))
print("Send channel idle: " + str(io_stream.sendchannel.idle))
print("Receive channel idle: " + str(io_stream.recvchannel.idle))
print("Send channel first transfer: " + str(io_stream.sendchannel._first_transfer))
print("Receive channel first transfer: " + str(io_stream.recvchannel._first_transfer))
print("\n")

print("Output buffer shape: \n" + str(output_buffer.shape) + "\n")

And that’s my problem. Thank you so much for your attention; I’d be really glad if someone could help me, as I’ve been trying for several days now and it’s starting to seem like an impossible task. I wanted to add more images to the post, but unfortunately I can’t do that with a new account, so if you need any other detail or file in order to figure out the problem, I’ll gladly provide it.

I suspect the .register_map issue is a bug on our part - we aren’t handling the metadata correctly for the register-mapped array.
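In the meantime you can program the core through the raw read/write methods of the default IP driver rather than register_map. Below is a minimal sketch: the register offsets are placeholders (take the real values from the x<top>_hw.h header that HLS generates for the CNTRL_BUS bundle), and dilate_ip is the handle from your notebook.

# Workaround sketch: poke the AXI-Lite registers directly instead of register_map.
# The offsets below are placeholders -- use the values from the HLS-generated header.
CTRL_OFFSET   = 0x00   # ap_ctrl: bit 0 = ap_start, bit 7 = auto_restart
HEIGHT_OFFSET = 0x10   # hypothetical offset for "height"
WIDTH_OFFSET  = 0x18   # hypothetical offset for "width"

dilate_ip.write(HEIGHT_OFFSET, height)
dilate_ip.write(WIDTH_OFFSET, width)
dilate_ip.write(CTRL_OFFSET, 0x81)        # start with auto-restart, as in your snippet
print(hex(dilate_ip.read(CTRL_OFFSET)))   # read back ap_ctrl to check ap_done/ap_idle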

The DMA issue is due to a mismatch between the stream the DMA engine is providing and what the IP is expecting. The standard DMA engine emits a single packet for all of the data, whereas the video blocks expect a packet for each line, as provided by the dedicated Video DMA IP. You can check this by looking at the TLAST signal in the simulation, which should occur once at the end of each line.

You can try replacing the DMA engine in your block diagram with the VDMA - we have a driver inside PYNQ for it. It’s designed for reading and writing a stream of frames rather than single frames going through an IP, so it might not be exactly what you want, but it should be enough to test your design.
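Something along these lines should be enough for a quick test. This is only a sketch: it assumes the VDMA instance is called axi_vdma_0, that its mm2s_introut/s2mm_introut outputs are wired through to the PS (the driver uses them), and the bitstream name is a placeholder.

from pynq import Overlay
from pynq.lib.video import VideoMode

ol = Overlay("dilate_overlay.bit")        # placeholder bitstream name
vdma = ol.axi_vdma_0                      # assumed VDMA instance name

# MM2S (writechannel) feeds the accelerator, S2MM (readchannel) collects its output.
mode = VideoMode(width, height, 8)        # 8 bits per pixel for a grayscale image
vdma.writechannel.mode = mode
vdma.readchannel.mode = mode
vdma.writechannel.start()
vdma.readchannel.start()

inframe = vdma.writechannel.newframe()    # frame buffer owned by the driver
inframe[:] = frameF                       # copy the test image in
vdma.writechannel.writeframe(inframe)     # send one frame through the IP
outframe = vdma.readchannel.readframe()   # blocks until a frame has been captured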

I would recommend not using stream connections at all and instead using the m_axi interface - the SDAccel examples show the basic flow for that kind of design - i.e. xfopencv/xf_threshold_accel.cpp at master · Xilinx/xfopencv · GitHub

Peter


Hi Peter, thank you so much for your answer, I found it really helpful, but I’m still having problems with the design.

First, I tried what you suggested about replacing the DMA with the VDMA, the block diagram being exactly the same as in my first post but with the VDMA instead of the DMA. After bitstream generation, I went to the Ultra96 and got this error when trying to instantiate the VDMA handler:

import numpy as np
from pynq import Overlay
from pynq import xlnk

bs = Overlay("/home/xilinx/jupyter_notebooks/dilation_overlay/dilate_overlay_interr.bit")
bs.download()

mem_manager = xlnk.Xlnk()

bs?

import pynq.lib.dma

width = 160
height = 100

dilate_ip = bs.ip_accel_app_0   # driver for dilate_ip (default driver)
io_stream = bs.axi_vdma_0       # driver for axi_dma (DMA-specific driver)


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>()
      5 
      6 dilate_ip = bs.ip_accel_app_0   # driver for dilate_ip (default driver)
----> 7 io_stream = bs.axi_vdma_0       # driver for axi_dma (DMA-specific driver)
      8 
      9 input_buffer = mem_manager.cma_array(shape=(height,width), dtype=np.uint8)

/usr/local/lib/python3.6/dist-packages/pynq/overlay.py in __getattr__(self, key)
    335         """
    336         if self.is_loaded():
--> 337             return getattr(self._ip_map, key)
    338         else:
    339             raise RuntimeError("Overlay not currently loaded")

/usr/local/lib/python3.6/dist-packages/pynq/overlay.py in __getattr__(self, key)
    735         elif key in self._description['ip']:
    736             ipdescription = self._description['ip'][key]
--> 737             driver = ipdescription['driver'](ipdescription)
    738             setattr(self, key, driver)
    739             return driver

/usr/local/lib/python3.6/dist-packages/pynq/lib/video/dma.py in __init__(self, description, framecount)
    588         super().__init__(description)
    589         self.framecount = framecount
--> 590         self.readchannel = AxiVDMA.S2MMChannel(self, self.s2mm_introut)
    591         self.writechannel = AxiVDMA.MM2SChannel(self, self.mm2s_introut)
    592 

AttributeError: 'AxiVDMA' object has no attribute 's2mm_introut'

As this error seems to be related to the interrupts of the block, I tried a new block design connecting the interrupts through a Concat IP to the pl_ps_irq0 port of the Zynq UltraScale+ MPSoC. I’m obviously not sure if this is the right thing to do, but I saw several designs on the internet with that configuration and thought it might help. It was not a surprise when I got this error on the exact same line as before:


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-...> in <module>()
      5 
      6 dilate_ip = bs.ip_accel_app_0   # driver for dilate_ip (default driver)
----> 7 io_stream = bs.axi_vdma_0       # driver for axi_dma (DMA-specific driver)
      8 
      9 input_buffer = mem_manager.cma_array(shape=(height,width), dtype=np.uint8)

/usr/local/lib/python3.6/dist-packages/pynq/overlay.py in __getattr__(self, key)
    335         """
    336         if self.is_loaded():
--> 337             return getattr(self._ip_map, key)
    338         else:
    339             raise RuntimeError("Overlay not currently loaded")

/usr/local/lib/python3.6/dist-packages/pynq/overlay.py in __getattr__(self, key)
    735         elif key in self._description['ip']:
    736             ipdescription = self._description['ip'][key]
--> 737             driver = ipdescription['driver'](ipdescription)
    738             setattr(self, key, driver)
    739             return driver

/usr/local/lib/python3.6/dist-packages/pynq/lib/video/dma.py in __init__(self, description, framecount)
    586 
    587         """
--> 588         super().__init__(description)
    589         self.framecount = framecount
    590         self.readchannel = AxiVDMA.S2MMChannel(self, self.s2mm_introut)

/usr/local/lib/python3.6/dist-packages/pynq/overlay.py in __init__(self, description)
    600         self._gpio = {}
    601         for interrupt, details in self._interrupts.items():
--> 602             setattr(self, interrupt, Interrupt(details['fullpath']))
    603         for gpio, entry in self._gpio.items():
    604             gpio_number = GPIO.get_gpio_pin(entry['index'])

/usr/local/lib/python3.6/dist-packages/pynq/interrupt.py in __init__(self, pinname)
     96         self.number = PL.interrupt_pins[pinname]['index']
     97         self.parent = weakref.ref(
---> 98             _InterruptController.get_controller(parentname))
     99         self.event = asyncio.Event()
    100         self.waiting = False

/usr/local/lib/python3.6/dist-packages/pynq/interrupt.py in get_controller(name)
    157             if con.name == name:
    158                 return con
--> 159         ret = _InterruptController(name)
    160         _InterruptController._controllers.append(ret)
    161         return ret

/usr/local/lib/python3.6/dist-packages/pynq/interrupt.py in __init__(self, name)
    175         """
    176         self.name = name
--> 177         self.mmio = MMIO(PL.ip_dict[name]['phys_addr'], 32)
    178         self.wait_handles = [[] for _ in range(32)]
    179         self.event_number = 0

KeyError: ''

After running into those two walls, I decided to try changing the interface to m_axi as you suggested, following the SDAccel example. Unfortunately, I wasn’t able to create a proper testbench for my design, so I have to trust that everything is OK. Assuming that, this is the block design I get after letting connection automation run:

I wonder if I should add or change anything. Also, if you know of any resource I could read to figure out how to manage this new design, I would really appreciate it. I’m not sure how I should get the image data from the PS to the IP and vice versa on PYNQ. Until now I was following some tutorials in addition to the xfOpenCV example, but there don’t seem to be any that use this SDAccel-style approach.

Thank you again for your attention and all the valuable information that you have provided to me.

Rubén

That design should be fine as an AXI master design. I can point you to a design I did with the xfOpenCV optical flow function as an example of how to interact with designs that use the m_axi interface pragma. The general idea is that the offset=slave part of the pragma results in a new register in the register map that you program with the address the buffer should be read from. If you look here you can see that the process function writes the physical_address of each of the input and output buffers to the appropriate registers and then starts the accelerator. The rest of the register writes in the setup are just for configuration - the IP top-level function is at ZCU104_VideoDemo/optical_flow.cpp at master · PeterOgden/ZCU104_VideoDemo · GitHub
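On the PS side the interaction boils down to something like the sketch below. The register offsets are placeholders that you would take from the HLS-generated header for your control bundle, and the buffer and IP names simply follow the earlier notebook snippets.

import numpy as np
from pynq import Xlnk

xlnk = Xlnk()
in_buf  = xlnk.cma_array(shape=(height, width), dtype=np.uint8)   # physically contiguous
out_buf = xlnk.cma_array(shape=(height, width), dtype=np.uint8)
in_buf[:] = frameF

# Placeholder offsets -- read the real ones from the x<top>_hw.h header
# generated for the s_axilite control bundle.
CTRL = 0x00   # ap_ctrl: bit 0 = ap_start, bit 1 = ap_done
SRC  = 0x10   # hypothetical offset register for the input pointer (m_axi, offset=slave)
DST  = 0x18   # hypothetical offset register for the output pointer
ROWS = 0x20   # hypothetical
COLS = 0x28   # hypothetical

dilate_ip.write(SRC, in_buf.physical_address)    # tell the IP where to read from
dilate_ip.write(DST, out_buf.physical_address)   # and where to write to
dilate_ip.write(ROWS, height)
dilate_ip.write(COLS, width)
dilate_ip.write(CTRL, 0x01)                      # ap_start

while (dilate_ip.read(CTRL) & 0x2) == 0:         # poll ap_done
    pass
result = np.array(out_buf)                       # copy the output out of CMA memory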

With respect to test benches, when using m_axi it’s vital to set the depth in the interface pragma to match what your test bench is going to use. If the depth is bigger than the test-bench buffer, your program can crash in interesting ways.

Peter


I think I’ll be able to manage with all the help you’ve given so far, so I’m going to mark your reply as the solution.

Thank you so much, your help has been really valuable.