Image Processing Acceleration


I have created a Verilog IP in Vivado that can take in pixel values through an internal FIFO buffer (but I can change this to an AXI buffer if recommended). I want to load this device with pixel values to perform a convolution and, after many convolutions, generate a feature map. I’ve tried doing this from a Jupyter notebook by controlling GPIO, but the process was so slow that I was not satisfied with the performance: several hours to generate one feature map from a 256x256 RGB JPG file.

What is the best way to perform this convolution and generate a feature map if I want to load the data from software? I’ve read that PetaLinux allows you to create software-accelerated programs, but I don’t want to waste too much time going down a rabbit hole.

I would really appreciate any input; I feel very lost.

Hi @monkeyboyfr3sh,

What board are you targeting?

How are you currently moving data from the PS to the PL?
For large chunks of data, it is recommended to use DMA together with the High Performance (HP) AXI ports on the PS, which give the PL direct access to DDR memory.

To do so, you will need either an AXI4 master interface in your IP, or an AXI4-Stream interface plus a DMA IP in the PL.


This example may also be of use:


Hi @marioruiz,

I am targeting the Pynq-Z2; sorry for not mentioning that before.

So far I’ve tried a couple of things. Using the Pynq boot image and a Jupyter notebook, I used GPIO to input the binary value of each pixel; this was the slow process I alluded to. I have also tried implementing it in Vitis, but I ran into a snag when I realized I did not know how to properly load an image during bare-metal development. For reference, I’ve attached a photo of the block design I used for the Vitis implementation. The GPIO ports are outputs/inputs connected to my IP by putting the two into a Verilog wrapper.

I’ve seen DMA recommended, but I’m a bit confused about how to set up my IP to read from memory. You mention an HP port, but I don’t know exactly what that is.

I apologize if I’m asking simple questions that have been answered elsewhere; I seem to be bad at finding this information, but I do appreciate the insight.

AXI is the default interface type used in Vivado, so one way to do this would be to add an AXI interface to your IP. An AXI4-Stream interface for your pixels would be the most straightforward; you can then stream data to your IP using a DMA.
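As a rough illustration of what an AXI4-Stream pixel interface involves, here is a minimal single-stage pipeline sketch. It assumes 32-bit pixel words arriving from a DMA's memory-to-stream channel and results returned on a stream-to-memory channel; the module name `pixel_stream_stage` and the bit-inversion "processing" are hypothetical placeholders for your own convolution datapath. The key point is the handshake: a word moves only on a clock edge where `tvalid` and `tready` are both high, and the output is held stable while the downstream side is not ready.

```verilog
// Hypothetical one-stage AXI4-Stream pipeline (sketch, not a drop-in IP).
// s_axis_* would connect to a DMA MM2S stream, m_axis_* to a DMA S2MM stream.
module pixel_stream_stage #(
    parameter integer DATA_WIDTH = 32
) (
    input  wire                  aclk,
    input  wire                  aresetn,        // active-low, as in AXI
    // AXI4-Stream slave: pixels arriving from the DMA
    input  wire [DATA_WIDTH-1:0] s_axis_tdata,
    input  wire                  s_axis_tvalid,
    output wire                  s_axis_tready,
    input  wire                  s_axis_tlast,
    // AXI4-Stream master: processed pixels going back to the DMA
    output reg  [DATA_WIDTH-1:0] m_axis_tdata,
    output reg                   m_axis_tvalid,
    input  wire                  m_axis_tready,
    output reg                   m_axis_tlast
);
    // Accept a new pixel when the output register is empty,
    // or when its current contents are consumed this cycle.
    assign s_axis_tready = !m_axis_tvalid || m_axis_tready;

    always @(posedge aclk) begin
        if (!aresetn) begin
            m_axis_tvalid <= 1'b0;
            m_axis_tdata  <= {DATA_WIDTH{1'b0}};
            m_axis_tlast  <= 1'b0;
        end else if (s_axis_tready) begin
            // If s_axis_tvalid is low, this simply empties the output register.
            m_axis_tvalid <= s_axis_tvalid;
            // Placeholder processing: invert the pixel bits.
            m_axis_tdata  <= ~s_axis_tdata;
            // Forward the end-of-packet marker so the DMA can finish the transfer.
            m_axis_tlast  <= s_axis_tlast;
        end
    end
endmodule
```

Because the output register only updates when `s_axis_tready` is high, a stalled downstream (`m_axis_tready` low with valid data waiting) automatically back-pressures the DMA instead of dropping pixels.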

You could also build an AXI master interface through which your IP accesses DRAM directly (without needing a DMA). This interface is more complicated, though.

For any of the AXI interfaces, I would recommend using a wizard in Vivado to generate the HDL for the AXI interface(s) you want. I posted in another topic about this recently:

Once you generate the template, you can integrate your Verilog into the generated code.



Wow, thank you! I can’t test this right now, but I will try it in a couple of hours and report back. I’ll also make sure to look at the Hello-World repo you linked. Thanks!


Good afternoon, @cathalmccabe,

I’ve taken a look at the example and the information you provided. I wanted to implement a simple device to test my knowledge, and I’m a bit confused by an issue I ran into.

So here’s the deal,
I created an IP with an AXI4-Lite slave port that takes in a timer value used to blink an LED on my Pynq-Z2. This is a block diagram of my device with the Zynq processor. I was referencing the AMBA AXI signal descriptions here (

Here’s a snippet of the logic inside my IP:

integer count_current_down, count_start_save;
reg used;
reg led_state = 0;

assign LED_signal = led_state;
assign s00_axi_wready = used;

always @(posedge s00_axi_aclk) begin
    // Check if data on the write bus is valid
    if (s00_axi_wvalid) begin
        // Latch the new timer value and drive wready low (via used)
        count_start_save = s00_axi_wdata;
        count_current_down = count_start_save;
        used = 1'b0;
    end else begin
        // Decrement the countdown variable
        if (count_current_down > 0) begin
            count_current_down = count_current_down - 1;
        end
        // If the count hits 0, invert the LED state and reset the count
        else begin
            // Invert LED state
            led_state = !led_state;
            // Set wready true
            used = 1'b1;
            // Reset countdown
            count_current_down = count_start_save;
        end
    end
end

I’ve also attached the IP as a zip file if you prefer.

The issue,
When I implement the block design I showed a picture of, implementation fails (of course, I created an HDL wrapper first).

This is the error I am receiving in Vivado:

[DRC MDRV-1] Multiple Driver Nets: Net design_1_i/LED_Timer_0/inst/UNCONN_OUT has multiple drivers: design_1_i/LED_Timer_0/inst/LED_Timer_v1_0_S00_AXI_inst/axi_wready_reg/Q, and design_1_i/LED_Timer_0/inst/used_reg/Q.

I don’t believe my code is doing what the error describes, so I’m very confused. Did I read the AXI documentation incorrectly?


s00_axi_wready has two drivers: one is S_AXI_WREADY and the other is your variable used.
LED_Timer_v1_0_S00_AXI.v is controlling the AXI4-Lite signals, so your user logic should not “write” to the AXI4-Lite signals. The Vivado error is correct.
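A minimal sketch of the usual way around this, assuming the standard Vivado AXI4-Lite template, where LED_Timer_v1_0_S00_AXI.v handles the whole bus handshake and stores written values in registers such as `slv_reg0`: leave every `s00_axi_*` / `S_AXI_*` signal alone and feed the register value into separate user logic. The module name `led_blinker`, its ports, and the instance name in the comment are hypothetical.

```verilog
// Hypothetical user-logic module: it never drives any AXI signal.
// Example hookup inside the generated S00_AXI module (assumed names):
//   led_blinker blink_inst (.clk(S_AXI_ACLK), .resetn(S_AXI_ARESETN),
//                           .period(slv_reg0), .led(LED_signal));
module led_blinker (
    input  wire        clk,     // e.g. S_AXI_ACLK
    input  wire        resetn,  // e.g. S_AXI_ARESETN (active low)
    input  wire [31:0] period,  // timer value written by software
    output reg         led
);
    reg [31:0] count;

    // The LED toggles once every (period + 1) clock cycles.
    // Comparing with >= (rather than ==) also recovers cleanly if
    // software lowers 'period' below the current count.
    always @(posedge clk) begin
        if (!resetn) begin
            count <= 32'd0;
            led   <= 1'b0;
        end else if (count >= period) begin
            count <= 32'd0;
            led   <= ~led;
        end else begin
            count <= count + 32'd1;
        end
    end
endmodule
```

With this arrangement, LED_signal would need to be added as an output port of LED_Timer_v1_0_S00_AXI and passed up through the top-level IP file, and software would write the period to the register that maps to slv_reg0 (typically offset 0x0).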