Added IP before HDMI out IP not working

Hello,

I re-built the base overlay for the PYNQ-Z1 (image version 2.7) and added an HLS IP between the output of the VDMA (MM2S) and the hdmi_out IP (in_stream):

The code for the IP is the following:

const uint32_t APP_COLS = 1920;
const uint32_t APP_ROWS = 1080;
const uint32_t APP_PIXELS_APP = APP_COLS * APP_ROWS;
const vx_uint32 APP_VEC_NUM = 1;

template <class TYPE, const size_t SIZE>
struct vx_image_data {
    TYPE pixel[SIZE];
    uint8 last;
};

typedef vx_image_data<vx_uint32, APP_VEC_NUM> app_u32_image;

void Passthrough(app_u32_image input[APP_PIXELS_APP], app_u32_image output[APP_PIXELS_APP]) {
#pragma HLS interface ap_ctrl_none port=return
#pragma HLS INTERFACE axis port=input
#pragma HLS INTERFACE axis port=output

	// Computation
#pragma HLS DATAFLOW

	app_u32_image tmp;
	for(int i=0; i<APP_PIXELS_APP; i++) {
		tmp = input[i];
		output[i] = tmp;
	}
}

And the jupyter notebook is:

from pynq import Overlay
from pynq.lib.video import *

#overlay = Overlay('base.bit')
overlay = Overlay('/home/xilinx/customOverlays/passthrough.bit')
hdmi_in = overlay.video.hdmi_in
hdmi_out = overlay.video.hdmi_out

hdmi_in.configure(PIXEL_RGBA)
hdmi_out.configure(hdmi_in.mode, PIXEL_RGBA)
hdmi_in.cacheable_frames = False
hdmi_out.cacheable_frames = False
hdmi_in.start()
hdmi_out.start()
mymode = hdmi_in.mode
print("My mode: "+str(mymode))
height = hdmi_in.mode.height
width = hdmi_in.mode.width
bpp = hdmi_in.mode.bits_per_pixel

hdmi_in.tie(hdmi_out)

I have an HDMI camera connected to the HDMI IN and a screen connected to the HDMI OUT. I can see the images from the camera when I load the base overlay, but when I load the passthrough one, there is no HDMI signal and the screen goes into sleep mode.

The HLS IP takes RGBA (32-bit) pixels, which is why I set the hdmi_in and hdmi_out modes to PIXEL_RGBA. Is there an issue with the HLS code itself, or with where the IP is added in the block design?

Thanks for the help.


Hi @aripod,

Using structs and arrays to work with streams is discouraged in Vitis HLS, and (I believe) it should not work. In addition to that, you are not generating the user signal, which indicates the start of a frame.

I would suggest you look at the PYNQ HLS IPs and use hls::stream for streams: PYNQ/boards/ip/hls/pixel_unpack at master · Xilinx/PYNQ · GitHub

Mario

Hi @marioruiz,

Thanks for your help. I added the user signal, but it still doesn’t work. I still don’t get a “valid” HDMI signal and the screen goes into sleep mode.
I am not using Vitis HLS but Vivado HLS 2019.2 to generate the IP, and then Vivado 2020.2 for the overlay.

I added it like:

template <class TYPE, const size_t SIZE> //
struct vx_image_data {                   //
    TYPE pixel[SIZE];
    ap_uint<1> last;
    ap_uint<1> user;
};

I’ve noticed that the out_stream of the hdmi_in IP also has tkeep, but the in_stream of hdmi_out doesn’t. Should the input of my IP also have tkeep, or is that optional?

What if, instead of doing it like that, I add another DMA and build the IP without tuser and tkeep (only tvalid, tdata, tready and tlast)?

Here I added HP3 for axi_dma_1, with everything on the same 142 MHz clock as the video IPs.

What about the data types? I tried that and had an issue because the dtype of the PynqBuffer was always uint8 (see here).


Hi,

You should be using Vitis HLS 2020.2 to match the Vivado version, as well as using hls::stream.

You should also verify that your HLS IP achieves II=1; otherwise, you will not get an output.

There is a two-part tutorial on DMA here.

What about the data types? I tried that and had an issue because the dtype of the PynqBuffer was always uint8 (see here).

You are using PIXEL_RGB in that case, so you will get three 8-bit channels per pixel; that’s why you see uint8.

Mario


Hi @marioruiz,

Thanks again for your help.

As you can see in the last image, the DMA has 32 bits for tdata, and I set PIXEL_RGBA. I understand that I now get four channels of 8 bits each, which is what I see in the PynqBuffer. However, if I allocate buffers for the DMA as in the tutorials, input_buffer = allocate(shape=(5,), dtype=np.uint32) and output_buffer = allocate(shape=(5,), dtype=np.uint32), how can I make the PynqBuffer that receives the HDMI frame also have dtype=np.uint32, so I can stream each frame that comes from the VDMA through the DMA? How should I use the input frame, which is a PynqBuffer from the VDMA, to stream the data to the DMA instead of using input_buffer? As far as I understand, it is a datatype issue, isn’t it? I insist on this because the original IP (with only tlast) works with the DMA using pre-stored images; I now want to use it with a live feed from the HDMI.


Hi,

The buffer you get from the VDMA is always going to have multiple channels for each pixel. So, you want to pack 4x 8-bit into 1x 32-bit elements. In essence, you want to go from 1920x1080x4 uint8 to 1920x1080x1 uint32.

You can try to pack the 4 channels into a single uint32, but it is going to be slow, as it involves shifts and multiplications. You can explore numpy frombuffer and tobytes to reinterpret the pixels.
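A minimal NumPy sketch of that reinterpretation, with a plain array standing in for the VDMA frame (the (1080, 1920, 4) uint8 shape matches what the RGBA pipeline produces; `frame` here is not a real PynqBuffer):

```python
import numpy as np

# Stand-in for the (1080, 1920, 4) uint8 frame returned by the VDMA in RGBA mode
frame = np.zeros((1080, 1920, 4), dtype=np.uint8)
frame[0, 0] = [0x12, 0x34, 0x56, 0x78]  # one RGBA pixel

# Reinterpret the raw bytes as one uint32 per pixel: no per-pixel shifts needed
packed = np.frombuffer(frame.tobytes(), dtype=np.uint32).reshape(1080, 1920)

print(hex(packed[0, 0]))  # little-endian: 0x78563412
```

Note that `tobytes` copies the frame, so this is still slower than avoiding the conversion altogether; it mainly avoids the per-pixel Python-level arithmetic.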

You can try something like this Any method to reduce the calculation time on PS (PYNQ-z2) - #5 by marioruiz

You will get your VDMA buffer, then you will create your DMA buffer:

buf_a = pynq.allocate((1920, 1080), dtype=np.uint32)

Instead of assigning anything to this buffer, you change its physical address to that of the VDMA buffer:

buf_a.physical_address = buf_vdma.physical_address 

Hopefully this will do the trick.

The real question is: why do you want to use a DMA for video? The VDMA is optimized for video.

Mario

Hi @marioruiz ,

Thanks again for your help and the “trick” with the physical address. I tried the frombuffer and tobytes methods, but performance decreased quite a lot, as expected.

That’s a fair question. I wanted to do that because I am using a specific image-processing library and I didn’t want to modify it. The passthrough following your example worked:

#include <ap_fixed.h>
#include <ap_int.h>
#include "hls_stream.h"
#include <ap_axi_sdata.h>

typedef ap_axiu<32,1,0,0> wide_pixel;
typedef hls::stream<wide_pixel> wide_stream;

void Passthrough(wide_stream& stream_in, wide_stream& stream_out) {
#pragma HLS INTERFACE ap_ctrl_none port=return
//#pragma HLS INTERFACE s_axilite register port=mode clock=control
#pragma HLS INTERFACE axis depth=32 port=stream_in register
#pragma HLS INTERFACE axis depth=32 port=stream_out register

	wide_pixel in_pixel;
	wide_pixel out_pixel;
	bool last = false;

	while(!last) {
#pragma HLS pipeline II=1
		stream_in.read(in_pixel);
		out_pixel.data = in_pixel.data;
		out_pixel.last = in_pixel.last;
		out_pixel.user = in_pixel.user;
		last = in_pixel.last;
		stream_out.write(out_pixel);
	}
}

I will now have to create a wrapper around the functions of the image-processing library, based on that passthrough example.

Hi @aripod
Great :grinning:

If you are not accessing the array from software, you may want to write your code in a less hacky way.

buf_a = pynq.allocate((1920, 1080, 4), dtype=np.uint8)
buf_a[:] = buf_vdma[:]

Or you can directly reuse buf_vdma, as pynq.allocate((1920,1080,4), dtype=np.uint8) is equivalent in physical memory to pynq.allocate((1920,1080), dtype=np.uint32).

Mario

Hi @marioruiz,

The less hacky way works perfectly. However, I had to change it to buf_a = pynq.allocate((1080, 1920, 4), dtype=np.uint8).
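That shape change makes sense: NumPy arrays (and the frames PYNQ returns) are indexed as (rows, columns, channels), so a 1920x1080 video mode is a (1080, 1920, 4) array. A quick check with a plain NumPy array standing in for the PynqBuffer, confirming that the uint8 layout carries the same bytes as one 32-bit word per pixel:

```python
import numpy as np

# Plain array standing in for the PynqBuffer: (rows, cols, channels)
buf_a = np.zeros((1080, 1920, 4), dtype=np.uint8)

# Same memory seen as one 32-bit word per pixel
words = buf_a.view(np.uint32)

assert words.shape == (1080, 1920, 1)   # one word per pixel
assert words.nbytes == buf_a.nbytes     # same physical footprint
```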

I will focus on using hls::stream and have everything on hardware, just using hdmi_in.tie(hdmi_out).

Thanks for the help.
