PYNQ: PYTHON PRODUCTIVITY

Vitis Vision Libraries

I used Vitis Vision Library L1 from GitHub to generate a threshold IP core. In the xf_threshold_config.h file, we have to define HEIGHT and WIDTH.


However, in xf_threshold_accel.cpp we also define S_AXILITE interfaces for the number of rows and columns, so that we can write those registers at runtime.

How are they related? Do HEIGHT and WIDTH have to match the rows and columns that we will use at runtime? Before generating the HLS IP, do we have to set these variables to the shape of the image that we will later use in the notebooks?

Hi @pedropinheiro2,

The HEIGHT and WIDTH you define in the code are the maximum resolution your IP will support. When you actually use the IP, you need to set the real size of the image you are processing (through the rows and cols registers).

Mario
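A minimal Python sketch of that split between compile-time maxima and runtime size, assuming you want the notebook to validate an image before writing the rows and cols registers. The function name check_runtime_size is hypothetical, and the default maxima are simply the HEIGHT and WIDTH values from the header posted later in this thread.

```python
# Hypothetical maxima, mirroring HEIGHT and WIDTH in xf_threshold_config.h.
# These are fixed when the HLS IP is synthesized.
MAX_HEIGHT = 2496
MAX_WIDTH = 3360


def check_runtime_size(rows, cols, max_rows=MAX_HEIGHT, max_cols=MAX_WIDTH):
    """Return (rows, cols) if the image fits the synthesized IP, else raise ValueError.

    rows/cols are the values written to the IP's rows and cols registers at
    runtime; they may be smaller than, but never larger than, the maxima.
    """
    if not (0 < rows <= max_rows and 0 < cols <= max_cols):
        raise ValueError(
            f"{rows}x{cols} image does not fit an IP built for {max_rows}x{max_cols}"
        )
    return rows, cols


print(check_runtime_size(480, 640))    # smaller images are fine
print(check_runtime_size(2496, 3360))  # the maximum itself is fine
```

The returned values would then be what the notebook writes to the rows and cols offsets (0x20 and 0x28 in the register map shown later).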

Hi @marioruiz,

Thank you very much for your response. I’ve actually succeeded in implementing the threshold function from the Vitis Vision L1 library on PYNQ, using a memory-mapped approach.
Now I’m trying to implement the 8-pixels-per-clock mode, and I’m running into trouble. Do I need to change some configuration in the Vivado design or the notebook, besides changing the mode when generating the HLS IP? If I change nothing else and write the register to start the IP, the AP_START bit stays at 1 and I cannot run any function again until I reset the board.

Thank you for your support,
Pedro

You’ll need to modify the datapath to provide the 8 pixels per clock.

This can be done with a data width converter.

Thank you.

Should I use one data width converter on each side of the HLS IP core? Do you have an example of its usage that I could learn from and adapt to my problem?

Pedro

That depends on your design. Are you moving the image from/to the PS?

I am reading the image from the SD card into the PS. Then I allocate two buffers, one for input and one for output, using the PYNQ function “allocate”. The HLS IP uses an AXI memory-mapped interface to access the image in DDR. Basically, I am thresholding an 8-bit grayscale image with the Vitis HLS IP and writing the result to the output buffer in memory. At the moment I am using the 1-pixel-per-clock mode, but I would like to see the speed-up from parallelization when using 8 pixels per clock.

Thanks

What board are you using?

If you are using PYNQ-Z2, the datapath from the PS can be configured to be either 32 or 64 bits wide,
which means that (for 24-bit RGB pixels) you can pack at most two pixels per clock.

Configuring the IP to process 8 pixels per clock will only give a speed-up if you are able to feed and consume 8 pixels every clock cycle. This won’t be the case for PYNQ-Z2.

For Zynq UltraScale+ devices, the datapath can be up to 128 bits wide (potentially packing up to 5 pixels).

Mario
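The arithmetic behind these numbers can be made explicit with a small sketch. It assumes the figures refer to 24-bit RGB pixels (that pixel size is what yields 2 pixels on a 64-bit bus and 5 on a 128-bit bus; the post does not state it) and counts only whole pixels per bus beat. pixels_per_beat is a hypothetical helper name.

```python
def pixels_per_beat(bus_width_bits, bits_per_pixel):
    """Whole pixels that fit in one bus beat; a partial pixel is not counted."""
    return bus_width_bits // bits_per_pixel


# 24-bit RGB pixels
print(pixels_per_beat(64, 24))   # 2  (PYNQ-Z2 HP port at 64 bits)
print(pixels_per_beat(128, 24))  # 5  (128-bit port on Zynq UltraScale+)

# 8-bit grayscale pixels
print(pixels_per_beat(64, 8))    # 8
```

The last case is the one that matters for the grayscale threshold discussed next: with 8-bit pixels, a 64-bit port has room for eight of them per beat.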

I am using PYNQ-Z2 board.

When using an 8-bit grayscale image with the PS AXI HP port configured to 64-bit width, couldn’t I achieve 8 pixels per clock?

You should be able to pack 8 pixels into the 64-bit datapath for grayscale images.
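The packing can be illustrated in plain Python. This is a sketch under two assumptions: the buffer lives in little-endian memory (the Zynq ARM cores run little-endian), and the IP takes pixel 0 from the least-significant byte of each 64-bit word. pack_row and unpack_word are hypothetical helpers, not part of PYNQ or the Vitis libraries.

```python
def pack_row(row, npc=8):
    """Group a row of 8-bit pixels into words of npc pixels, pixel 0 in the low byte."""
    if len(row) % npc:
        raise ValueError(f"row length {len(row)} is not a multiple of {npc}")
    # int.from_bytes(..., "little") puts the first byte in bits [7:0],
    # the second in bits [15:8], and so on.
    return [int.from_bytes(row[i:i + npc], "little") for i in range(0, len(row), npc)]


def unpack_word(word, npc=8):
    """Inverse of pack_row for one word: pixel k occupies bits [8k+7 : 8k]."""
    return [(word >> (8 * k)) & 0xFF for k in range(npc)]


row = bytes([10, 20, 30, 40, 50, 60, 70, 80])
words = pack_row(row)
print([hex(w) for w in words])  # ['0x50463c32281e140a']
print(unpack_word(words[0]))    # [10, 20, 30, 40, 50, 60, 70, 80]
```

Under these assumptions, a 64-bit read of eight consecutive bytes from the input buffer already delivers eight pixels in order, so the software side does not need to repack an 8-bit grayscale image.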

Thank you! Now my question is: do I have to change any configuration in the Vivado design or the notebook compared with the 1-pixel-per-clock version? Or is the only necessary change switching the mode in the .cpp file before generating the HLS IP?

I have changed the mode, but I am not getting any results: the AP_START bit of the control register stays stuck at 1.

I have also checked that my HEIGHT and WIDTH are multiples of 8.

Without seeing the IPI block design and the notebook, I can’t really tell.

The Vitis threshold IP was generated from the following .cpp file:

#include "xf_threshold_config.h"

static constexpr int __XF_DEPTH = (HEIGHT * WIDTH * (XF_PIXELWIDTH(XF_8UC1, NPIX)) / 8) / (INPUT_PTR_WIDTH / 8);

void threshold_accel(ap_uint<INPUT_PTR_WIDTH>* img_inp,
                     ap_uint<OUTPUT_PTR_WIDTH>* img_out,
                     unsigned char thresh,
                     unsigned char maxval,
                     int rows,
                     int cols) {
// clang-format off
    #pragma HLS INTERFACE m_axi     port=img_inp  offset=slave bundle=gmem1 depth=__XF_DEPTH
    #pragma HLS INTERFACE m_axi     port=img_out  offset=slave bundle=gmem2 depth=__XF_DEPTH

    #pragma HLS INTERFACE s_axilite port=thresh     bundle=control
    #pragma HLS INTERFACE s_axilite port=maxval     bundle=control
    #pragma HLS INTERFACE s_axilite port=rows     bundle=control
    #pragma HLS INTERFACE s_axilite port=cols     bundle=control
    #pragma HLS INTERFACE s_axilite port=return   bundle=control
    // clang-format on

    const int pROWS = HEIGHT;
    const int pCOLS = WIDTH;
    const int pNPC1 = NPIX;

    xf::cv::Mat<XF_8UC1, HEIGHT, WIDTH, NPIX> in_mat(rows, cols);
    xf::cv::Mat<XF_8UC1, HEIGHT, WIDTH, NPIX> out_mat(rows, cols);

// clang-format off
    #pragma HLS DATAFLOW
    // clang-format on

    xf::cv::Array2xfMat<INPUT_PTR_WIDTH, XF_8UC1, HEIGHT, WIDTH, NPIX>(img_inp, in_mat);

    xf::cv::Threshold<THRESH_TYPE, XF_8UC1, HEIGHT, WIDTH, NPIX>(in_mat, out_mat, thresh, maxval);

    xf::cv::xfMat2Array<OUTPUT_PTR_WIDTH, XF_8UC1, HEIGHT, WIDTH, NPIX>(out_mat, img_out);
}

and the following .h file:

#ifndef _XF_THRESHOLD_CONFIG_H_
#define _XF_THRESHOLD_CONFIG_H_

#include "hls_stream.h"
#include "ap_int.h"

#include "common/xf_common.hpp"
#include "common/xf_utility.hpp"

#include "imgproc/xf_threshold.hpp"
#include "xf_config_params.h"

typedef ap_uint<8> ap_uint8_t;
typedef ap_uint<64> ap_uint64_t;

/*  set the height and width  */
#define HEIGHT 2496
#define WIDTH 3360

#if RO
#define NPIX XF_NPPC8
#endif
#if NO
#define NPIX XF_NPPC1
#endif

#define INPUT_PTR_WIDTH 8
#define OUTPUT_PTR_WIDTH 8

void threshold_accel(ap_uint<INPUT_PTR_WIDTH>* img_inp,
                     ap_uint<OUTPUT_PTR_WIDTH>* img_out,
                     unsigned char thresh,
                     unsigned char maxval,
                     int rows,
                     int cols);

#endif // end of _XF_THRESHOLD_CONFIG_H_

with RO set to 1 and NO set to 0 in the config file (xf_config_params.h).

In the Vivado block design, S_AXI_HP0 is configured in 64-bit mode.

The control-register definitions in the generated driver file xthreshold_accel_hw.h look like this:

// 0x04 : Global Interrupt Enable Register
//        bit 0  - Global Interrupt Enable (Read/Write)
//        others - reserved
// 0x08 : IP Interrupt Enable Register (Read/Write)
//        bit 0  - enable ap_done interrupt (Read/Write)
//        bit 1  - enable ap_ready interrupt (Read/Write)
//        others - reserved
// 0x0c : IP Interrupt Status Register (Read/TOW)
//        bit 0  - ap_done (COR/TOW)
//        bit 1  - ap_ready (COR/TOW)
//        others - reserved
// 0x10 : Data signal of thresh
//        bit 7~0 - thresh[7:0] (Read/Write)
//        others  - reserved
// 0x14 : reserved
// 0x18 : Data signal of maxval
//        bit 7~0 - maxval[7:0] (Read/Write)
//        others  - reserved
// 0x1c : reserved
// 0x20 : Data signal of rows
//        bit 31~0 - rows[31:0] (Read/Write)
// 0x24 : reserved
// 0x28 : Data signal of cols
//        bit 31~0 - cols[31:0] (Read/Write)
// 0x2c : reserved
// (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write, COH = Clear on Handshake)

#define XTHRESHOLD_ACCEL_CONTROL_ADDR_AP_CTRL     0x00
#define XTHRESHOLD_ACCEL_CONTROL_ADDR_GIE         0x04
#define XTHRESHOLD_ACCEL_CONTROL_ADDR_IER         0x08
#define XTHRESHOLD_ACCEL_CONTROL_ADDR_ISR         0x0c
#define XTHRESHOLD_ACCEL_CONTROL_ADDR_THRESH_DATA 0x10
#define XTHRESHOLD_ACCEL_CONTROL_BITS_THRESH_DATA 8
#define XTHRESHOLD_ACCEL_CONTROL_ADDR_MAXVAL_DATA 0x18
#define XTHRESHOLD_ACCEL_CONTROL_BITS_MAXVAL_DATA 8
#define XTHRESHOLD_ACCEL_CONTROL_ADDR_ROWS_DATA   0x20
#define XTHRESHOLD_ACCEL_CONTROL_BITS_ROWS_DATA   32
#define XTHRESHOLD_ACCEL_CONTROL_ADDR_COLS_DATA   0x28
#define XTHRESHOLD_ACCEL_CONTROL_BITS_COLS_DATA   32

// control_r
// 0x00 : reserved
// 0x04 : reserved
// 0x08 : reserved
// 0x0c : reserved
// 0x10 : Data signal of img_inp
//        bit 31~0 - img_inp[31:0] (Read/Write)
// 0x14 : Data signal of img_inp
//        bit 31~0 - img_inp[63:32] (Read/Write)
// 0x18 : reserved
// 0x1c : Data signal of img_out
//        bit 31~0 - img_out[31:0] (Read/Write)
// 0x20 : Data signal of img_out
//        bit 31~0 - img_out[63:32] (Read/Write)
// 0x24 : reserved
// (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write, COH = Clear on Handshake)

#define XTHRESHOLD_ACCEL_CONTROL_R_ADDR_IMG_INP_DATA 0x10
#define XTHRESHOLD_ACCEL_CONTROL_R_BITS_IMG_INP_DATA 64
#define XTHRESHOLD_ACCEL_CONTROL_R_ADDR_IMG_OUT_DATA 0x1c
#define XTHRESHOLD_ACCEL_CONTROL_R_BITS_IMG_OUT_DATA 64
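The control_r map above stores each pointer argument as two 32-bit words (img_inp at 0x10/0x14, img_out at 0x1c/0x20). Writing only the low word works as long as the buffer's physical address fits in 32 bits, as it does on Zynq-7000 devices such as the PYNQ-Z2. A sketch that writes both halves explicitly, assuming regs is any object with a write(offset, value) method (a PYNQ MMIO instance has one); split_addr64 and write_addr64 are hypothetical names.

```python
# Offsets copied from xthreshold_accel_hw.h above.
XTHRESHOLD_ACCEL_CONTROL_R_ADDR_IMG_INP_DATA = 0x10
XTHRESHOLD_ACCEL_CONTROL_R_ADDR_IMG_OUT_DATA = 0x1C


def split_addr64(addr):
    """Split a physical address into (low, high) 32-bit register words."""
    return addr & 0xFFFFFFFF, (addr >> 32) & 0xFFFFFFFF


def write_addr64(regs, offset, addr):
    """Write a 64-bit pointer argument: low word at offset, high word at offset + 4."""
    low, high = split_addr64(addr)
    regs.write(offset, low)
    regs.write(offset + 4, high)
```

For example, write_addr64(regs, XTHRESHOLD_ACCEL_CONTROL_R_ADDR_IMG_INP_DATA, in_buffer.device_address) would replace a single write to 0x10, with regs being the control_r register interface used in the notebook.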

Finally, the notebook looks like this:

from pynq import Overlay, allocate
import cv2 as cv
import numpy as np
import time

overlay = Overlay('/home/xilinx/pynq/overlays/threshold_8pix/threshold.bit',download=True)

img_path = '/home/xilinx/jupyter_notebooks/image.png'

img = cv.imread(img_path,-1)
img = (img/256).astype(np.uint8)
height, width = img.shape

in_buffer = allocate(shape=(height,width), dtype=np.uint8)
out_buffer = allocate(shape=(height,width), dtype=np.uint8)

in_buffer[:] = img

overlay.threshold_accel_0.s_axi_control.write(0x00,2)
overlay.threshold_accel_0.s_axi_control.write(0x10,75)
overlay.threshold_accel_0.s_axi_control.write(0x18,255)
overlay.threshold_accel_0.s_axi_control.write(0x20,height)
overlay.threshold_accel_0.s_axi_control.write(0x28,width)

overlay.threshold_accel_0.s_axi_control_r.write(0x10,in_buffer.device_address)
overlay.threshold_accel_0.s_axi_control_r.write(0x1c,out_buffer.device_address)

overlay.threshold_accel_0.s_axi_control.write(0x00,0x1)

The height and width of the image are exactly the same as the HEIGHT and WIDTH I defined in the .h file used to generate the HLS IP.

When I do this in the 1-pixel-per-clock mode, the threshold IP runs successfully and the control register returns to idle. When I do the same with 8 pixels per clock, AP_START stays at 1, and I have to restart the board before the 1-pixel-per-clock example runs successfully again. It seems as if some denied memory access “crashes” the system in a way that reprogramming the FPGA (downloading the bitstream again) does not fix.

Thanks for the support,
Pedro
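Because the notebook writes AP_START and never checks the result, a hang like this only shows up when the register is read back by hand. A sketch of a start-and-poll helper with a timeout follows. It assumes the standard Vitis HLS ap_ctrl_hs layout of the register at offset 0x00 (bit 0 ap_start, bit 1 ap_done, bit 2 ap_idle, bit 3 ap_ready), which the header excerpt above does not show, and a regs object with read(offset) and write(offset, value) methods, such as a PYNQ MMIO instance. decode_ap_ctrl and start_and_wait are hypothetical names, and the helper only reports a hang; it cannot recover a stalled AXI transaction.

```python
import time

AP_CTRL = 0x00      # XTHRESHOLD_ACCEL_CONTROL_ADDR_AP_CTRL
AP_START = 1 << 0
AP_DONE = 1 << 1    # clear-on-read
AP_IDLE = 1 << 2
AP_READY = 1 << 3


def decode_ap_ctrl(value):
    """Turn the raw AP_CTRL value into named flags."""
    return {
        "ap_start": bool(value & AP_START),
        "ap_done": bool(value & AP_DONE),
        "ap_idle": bool(value & AP_IDLE),
        "ap_ready": bool(value & AP_READY),
    }


def start_and_wait(regs, timeout_s=1.0, poll_s=0.001):
    """Set ap_start, then poll until the core reports done (or idle again).

    Raises TimeoutError with the last decoded status if the core never
    finishes, e.g. when AP_START stays stuck at 1.
    """
    regs.write(AP_CTRL, AP_START)
    deadline = time.monotonic() + timeout_s
    status = regs.read(AP_CTRL)
    while True:
        # ap_done is clear-on-read, so also accept "idle with ap_start cleared"
        # in case another read already consumed ap_done.
        if status & AP_DONE or (status & AP_IDLE and not status & AP_START):
            return decode_ap_ctrl(status)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"IP did not finish: {decode_ap_ctrl(status)}")
        time.sleep(poll_s)
        status = regs.read(AP_CTRL)
```

In the notebook, this would stand in for the final write of 0x1 to offset 0x00, passing the same register interface used for the other control writes.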

You didn’t mention anything about the datapath width. To work at 8 pixels per clock, you should widen both INPUT_PTR_WIDTH and OUTPUT_PTR_WIDTH eight times, from 8 to 64 bits.

Is S_AXI_HP0 configured to be 64-bit wide?

I don’t think any change is needed in the notebook.
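The rule above can be written as a quick consistency check to run before synthesis. This is a sketch, not part of the Vitis tooling: check_datapath is a hypothetical helper, bits_per_pixel=8 corresponds to XF_8UC1, and the default 64-bit port width matches the S_AXI_HP0 setting discussed in this thread. The cols check encodes the assumption that the image width must be a multiple of the pixels per clock, which the author verified for HEIGHT and WIDTH.

```python
def check_datapath(ptr_width, npc, bits_per_pixel=8, port_width=64, cols=None):
    """Return a list of problems with a pointer width / pixels-per-clock choice (empty = OK)."""
    problems = []
    needed = bits_per_pixel * npc  # bits the core consumes per clock at npc pixels/clock
    if ptr_width < needed:
        problems.append(
            f"PTR_WIDTH={ptr_width} cannot carry {npc} x {bits_per_pixel}-bit pixels "
            f"({needed} bits) per beat"
        )
    if ptr_width > port_width:
        problems.append(
            f"PTR_WIDTH={ptr_width} is wider than the {port_width}-bit HP port, "
            "so the port cannot fill it every clock"
        )
    if cols is not None and cols % npc:
        problems.append(f"cols={cols} is not a multiple of {npc}")
    return problems


# Original 8-pixels-per-clock attempt: INPUT/OUTPUT_PTR_WIDTH left at 8
print(check_datapath(ptr_width=8, npc=8, cols=3360))   # one problem reported
# After widening both pointers eight times, to 64 bits
print(check_datapath(ptr_width=64, npc=8, cols=3360))  # []
```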

Thank you so much for the support! I’ve tried your suggestion and it worked.

Pedro
