Sobel filter (Vitis vision library) implementation on Pynq-Z board

Hello,
I am trying to implement the Sobel filter using Vitis Vision Library 2021.2, Vitis HLS and Vivado 2021.2 on the Pynq-Z1 board.
Based on the Pynq-Hello example, I modified the Sobel L1 C++ example file to use DMA. The code is shown below. It runs correctly, and the IP and the log file are attached:

#include "hls_stream.h"
#include "common/xf_common.hpp"
#include "common/xf_infra.hpp"
#include "imgproc/xf_sobel.hpp"
#include "xf_config_params.h"
#include "ap_int.h"
///////
#include "xf_sobel_config.h"

///////
#define DATA_WIDTH 24
#define NPIX XF_NPPC1

/*  set the height and width  */
#define WIDTH 3840
#define HEIGHT 2160
#define FILTER_WIDTH 3
#define TYPE XF_8UC3
#define XF_USE_URAM
typedef hls::stream<ap_axiu<DATA_WIDTH,1,1,1>> stream_t;

template <int W, int TYPE, int ROWS, int COLS, int NPPC>
void axis2xfMat (hls::stream<ap_axiu<W, 1, 1, 1> >& AXI_video_strm, xf::cv::Mat<TYPE, ROWS, COLS, NPPC>& img) {
    ap_axiu<W, 1, 1, 1> axi;

    const int m_pix_width = XF_PIXELWIDTH(TYPE, NPPC) * XF_NPIXPERCYCLE(NPPC);

    int rows = img.rows;
    int cols = img.cols >> XF_BITSHIFT(NPPC);

    assert(img.rows <= ROWS);
    assert(img.cols <= COLS);

loop_row_axi2mat:
    for (int i = 0; i < rows; i++) {
    loop_col_axi2mat:
        for (int j = 0; j < cols; j++) {
#pragma HLS loop_flatten off
#pragma HLS pipeline II=1

            AXI_video_strm.read(axi);
            // Row-major index into the Mat: row * cols + col.
            img.write(i*cols + j, axi.data(m_pix_width - 1, 0));
        }
    }
}

template <int W, int TYPE, int ROWS, int COLS, int NPPC>
void xfMat2axis(xf::cv::Mat<TYPE, ROWS, COLS, NPPC>& img, hls::stream<ap_axiu<W, 1, 1, 1> >& dst) {
    ap_axiu<W, 1, 1, 1> axi;

    int rows = img.rows;
    int cols = img.cols >> XF_BITSHIFT(NPPC);

    assert(img.rows <= ROWS);
    assert(img.cols <= COLS);

    const int m_pix_width = XF_PIXELWIDTH(TYPE, NPPC) * XF_NPIXPERCYCLE(NPPC);

loop_row_mat2axi:
    for (int i = 0; i < rows; i++) {
    loop_col_mat2axi:
        for (int j = 0; j < cols; j++) {
#pragma HLS loop_flatten off
#pragma HLS pipeline II = 1

            /*Assert last only in the last pixel*/
            if ((j == cols-1) && (i == rows-1)) {
                axi.last = 1;
            } else {
                axi.last = 0;
            }

            axi.data = 0;
            // Row-major index into the Mat: row * cols + col.
            axi.data(m_pix_width - 1, 0) = img.read(i*cols + j);
            axi.keep = -1;
            dst.write(axi);
        }
    }
}

void sobel_accel(stream_t& img_inp, stream_t& img_out1, stream_t& img_out2, int rows, int cols)
{
// clang-format off
    //#pragma HLS INTERFACE m_axi     port=img_inp  offset=slave bundle=gmem1  depth=__XF_DEPTH
    //#pragma HLS INTERFACE m_axi     port=img_out1  offset=slave bundle=gmem2 depth=__XF_DEPTH_OUT
    //#pragma HLS INTERFACE m_axi     port=img_out2  offset=slave bundle=gmem3 depth=__XF_DEPTH_OUT

	#pragma HLS INTERFACE axis register both port=img_inp
	#pragma HLS INTERFACE axis register both port=img_out1
	#pragma HLS INTERFACE axis register both port=img_out2
	#pragma HLS INTERFACE s_axilite port=rows
	#pragma HLS INTERFACE s_axilite port=cols
  
    //#pragma HLS INTERFACE s_axilite port=rows     bundle=control
    //#pragma HLS INTERFACE s_axilite port=cols     bundle=control
    #pragma HLS INTERFACE s_axilite port=return   //bundle=control

    xf::cv::Mat<TYPE, HEIGHT, WIDTH, NPIX> in_mat(rows, cols);
    xf::cv::Mat<TYPE, HEIGHT, WIDTH, NPIX> _dstgx(rows, cols);
    xf::cv::Mat<TYPE, HEIGHT, WIDTH, NPIX> _dstgy(rows, cols);



    #pragma HLS DATAFLOW


    //printf("Array2xfMat .... !!!\n");
    //xf::cv::Array2xfMat<INPUT_PTR_WIDTH, IN_TYPE, HEIGHT, WIDTH, NPC1>(img_inp, in_mat);
    axis2xfMat<DATA_WIDTH, TYPE, HEIGHT, WIDTH, NPIX>(img_inp, in_mat);
    //printf("Sobel .... !!!\n");
    //xf::cv::Sobel<XF_BORDER_CONSTANT, FILTER_WIDTH, TYPE, TYPE, HEIGHT, WIDTH, NPIX, XF_USE_URAM>(in_mat,_dstgx,_dstgy);
    xf::cv::Sobel<XF_BORDER_CONSTANT, FILTER_WIDTH, TYPE, TYPE, HEIGHT, WIDTH, NPIX, false>(in_mat, _dstgx,_dstgy);
    //printf("xfMat2Array .... !!!\n");
    xfMat2axis<DATA_WIDTH, TYPE, HEIGHT, WIDTH, NPIX>(_dstgx, img_out1);
    xfMat2axis<DATA_WIDTH, TYPE, HEIGHT, WIDTH, NPIX>(_dstgy, img_out2);
    //xf::cv::xfMat2Array<OUTPUT_PTR_WIDTH, OUT_TYPE, HEIGHT, WIDTH, NPIX>(_dstgx, img_out1);
    //xf::cv::xfMat2Array<OUTPUT_PTR_WIDTH, OUT_TYPE, HEIGHT, WIDTH, NPIX>(_dstgy, img_out2);
}

I have a problem with the Vivado design; my TCL file is attached below. I do not know how to connect the generated Y-gradient output image to the DMA.
After copying the generated bit file and hwh file to the board, I ran the attached Python script to get the X-gradient output. It fails with the error below when the kernel runs.
Could someone help me get this implementation working?
I can upload the whole project if needed.
I really appreciate any help you can provide.

RuntimeError                              Traceback (most recent call last)
<ipython-input-13-86fae4039038> in <module>
----> 1 run_kernel()
      2 edge_image = Image.fromarray(out_buffer)

<ipython-input-11-0f054f9b8a08> in run_kernel()
      3     dma.recvchannel.transfer(out_buffer)
      4     sobel.write(0x00,0x81) # start
----> 5     dma.sendchannel.wait()
      6     dma.recvchannel.wait()

/usr/local/share/pynq-venv/lib/python3.8/site-packages/pynq/lib/dma.py in wait(self)
    214                         'DMA Slave Error (cannot access memory map interface)')
    215                 if error & 0x40:
--> 216                     raise RuntimeError(
    217                         'DMA Decode Error (invalid address)')
    218             if self.idle:

RuntimeError: DMA Decode Error (invalid address)


Sobel_pl.ipynb (1002.0 KB)

sobel.tcl (46.3 KB)
solution1.log (177.2 KB)

Can you post the rest of your Python code?
Did you use allocate (pynq.allocate) to allocate memory for your DMA buffers?

Cathal
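
For readers following along, here is a minimal sketch of what that host-side code usually looks like in PYNQ: buffers come from pynq.allocate so that they are physically contiguous, both DMA channels are queued, and only then is the IP started. The function name run_kernel and the 0x81 start value come from the traceback above. The bitstream name, the IP names (axi_dma_0, sobel_accel_0) and the register offsets in the comments are assumptions and must be checked against your own .hwh file and the HLS-generated driver header.

```python
def run_kernel(dma, sobel, in_buffer, out_buffer):
    """Queue both DMA channels, start the Sobel IP, then wait for completion."""
    # Queue the transfers first so the DMA is ready when the IP starts streaming.
    dma.sendchannel.transfer(in_buffer)
    dma.recvchannel.transfer(out_buffer)
    # 0x81 = ap_start | auto_restart in the HLS control register at offset 0x00.
    sobel.write(0x00, 0x81)
    dma.sendchannel.wait()
    dma.recvchannel.wait()


# On the board this would be wired up roughly as follows (all names assumed):
#
#   import numpy as np
#   from pynq import Overlay, allocate
#
#   overlay = Overlay("sobel.bit")
#   dma = overlay.axi_dma_0
#   sobel = overlay.sobel_accel_0
#   in_buffer = allocate(shape=(rows, cols, 3), dtype=np.uint8)
#   out_buffer = allocate(shape=(rows, cols, 3), dtype=np.uint8)
#   in_buffer[:] = image            # image: a (rows, cols, 3) uint8 array
#   sobel.write(0x10, rows)         # assumed offset of the rows register
#   sobel.write(0x18, cols)         # assumed offset of the cols register
#   run_kernel(dma, sobel, in_buffer, out_buffer)
```

Passing the DMA and IP objects in as arguments keeps the function independent of the overlay, so it can be exercised with stand-in objects before going to hardware.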

Hi @Kamal,

I suggest you read the DMA tutorial to understand how to configure the PS and how to connect the DMA in your design. This part is missing: your DMA has no path for data transfers to the PS.

The Sobel filter has two outputs; you should connect both and consume data from them. Otherwise, the IP will not work. You cannot leave one of the manager interfaces unconnected.

Mario

Hi Cathal,
Thank you for your reply.
I attached the Python code to the original post:
Sobel_pl.ipynb

Hi Mario
Thanks a lot for your reply,
I based my PYNQ implementation on these tutorials.
However, I am stuck on how to interface the second output with the DMA. I have three options now:
1- Re-check the tutorial to find out how to interface the Y-gradient output with the DMA and then with the PS.
2- Add the OpenCV square root to the main function to obtain the final image.
3- Change the Sobel filter to produce the final image directly. However, the Vitis Vision Library does not allow changing the Sobel matrix, so using filter2d may be more feasible.
Could you please propose a solution for the first option? What should I add or change in the Vivado block design?
Thanks in advance.
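
Whichever option is chosen, the final edge image is normally the per-pixel magnitude of the two gradients, sqrt(gx^2 + gy^2), saturated to the 8-bit range. The sketch below shows that computation in plain Python on nested lists. The function name gradient_magnitude is illustrative only; on the board the same step would more likely be done with NumPy on the two output buffers, or inside the HLS kernel.

```python
import math


def gradient_magnitude(gx, gy):
    """Combine X and Y gradient images (lists of rows) into one edge image."""
    return [
        # hypot(x, y) = sqrt(x*x + y*y); clamp the result to the 8-bit maximum.
        [min(255, int(round(math.hypot(x, y)))) for x, y in zip(row_x, row_y)]
        for row_x, row_y in zip(gx, gy)
    ]


gx = [[3, 0], [255, 6]]
gy = [[4, 0], [255, 8]]
print(gradient_magnitude(gx, gy))  # [[5, 0], [255, 10]]
```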

Hi,

I think that you should first address the lack of a path between the DMA and the PS (HP ports).
The HP port(s) in the PS are not enabled. Without this connection, your design will never work.

You are missing a key step from the tutorial "Tutorial: PYNQ DMA (Part 1: Hardware design)".

If you do not want to use the second interface of the Sobel filter, you could assert TREADY with a constant.

Mario

Hi Mario
It is true, I completely forgot to re-customize the Pynq processing system. I have now enabled the S-AXI HP0 interface. The new design is shown below.


Could you please explain in more detail how to assert TREADY with a constant?
Thanks in advance

Hi,

Expand the img_out2 interface by clicking on the + next to it.
Then wire a constant to its TREADY pin. Make sure the constant value is set to 1.

Something like this

Mario
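
The reasoning behind this is the AXI4-Stream handshake: a beat is transferred only in a cycle where the source drives TVALID high and the sink drives TREADY high. If nothing ever asserts TREADY on img_out2, the kernel can fill its output buffering and stall, and img_out1 stops as well. The toy simulation below illustrates the effect. It is a simplified model with assumed names (simulate, fifo_depth), not the actual HLS schedule.

```python
from collections import deque


def simulate(beats, out2_ready, fifo_depth=2):
    """Return how many beats reach the img_out1 consumer.

    The kernel emits one gx/gy pair per cycle, but only while both output
    FIFOs have room. out1's sink is always ready; out2's sink accepts data
    only if out2_ready (its TREADY) is True.
    """
    fifo1, fifo2 = deque(), deque()
    produced = delivered1 = 0
    for _ in range(4 * beats):  # more than enough cycles to drain everything
        if fifo1:  # TVALID and TREADY are both high on out1
            fifo1.popleft()
            delivered1 += 1
        if fifo2 and out2_ready:  # a transfer on out2 also needs TREADY
            fifo2.popleft()
        if produced < beats and len(fifo1) < fifo_depth and len(fifo2) < fifo_depth:
            fifo1.append(produced)
            fifo2.append(produced)
            produced += 1
    return delivered1


print(simulate(8, out2_ready=True))   # 8: every beat reaches out1
print(simulate(8, out2_ready=False))  # 2: the kernel stalls once out2's FIFO is full
```

With TREADY tied to 1, the second stream is accepted and discarded every cycle, so the first output can run to completion.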

Mario, you are my hero.
Thanks a lot for your time and your help. This is the result.

Just two more questions:
1- I want to connect the second output to the DMA via another AXI4-Stream Data Width Converter. When I try to customize the DMA to receive two channels, I do not have the option to change it. How can I solve this?

2- Adding a Constant IP to TREADY on the second output solved my issue, but I do not understand how or why. Could you please explain it to me?
Since I am new to IP design and PS-PL co-design, can you suggest some good references (books or others) that explain these control signals?
Thanks again for your help
Kamal

Hi,

Glad that I could help.

  1. You will have to add another DMA. Multichannel support is discontinued, according to PG021 (the AXI DMA product guide).

  2. You will have to familiarise yourself with AXI4. Hint: look at the handshake methodology in AXI4.

You can start with the Zynq book http://www.zynqbook.com/. There are also plenty of resources available.

Mario
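
On the Python side, a second DMA mostly means a second receive buffer and a second receive channel to queue before the IP is started. A minimal sketch, assuming the two DMA instances are passed in as dma_x (send channel plus X-gradient receive channel) and dma_y (Y-gradient receive channel only). These names and the function run_kernel_two_outputs are hypothetical and depend on how the block design is wired.

```python
def run_kernel_two_outputs(dma_x, dma_y, sobel, in_buffer, out_gx, out_gy):
    """Stream one image in and collect both gradient images."""
    # Queue every transfer before starting the IP so no stream is left waiting.
    dma_x.sendchannel.transfer(in_buffer)
    dma_x.recvchannel.transfer(out_gx)
    dma_y.recvchannel.transfer(out_gy)  # second DMA: receive channel only
    sobel.write(0x00, 0x81)             # same start value as in the traceback
    dma_x.sendchannel.wait()
    dma_x.recvchannel.wait()
    dma_y.recvchannel.wait()
```

All three buffers would still need to come from pynq.allocate, as in the single-output case.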

Hi Mario,
Thank you so much for providing the valuable information and your support.