Create overlay for PYNQ-Z2 with Verilog

Hi all,

I want to implement a hardware design for Gaussian elimination over GF(2) on the PYNQ-Z2. The design was published in this research: The source code is written in Verilog and is available via the link in the paper.

I successfully ran the testbench, then created an IP using “Create and Package New IP” and followed the tutorial to create an overlay and import it into a Jupyter notebook on the board.

I have no problem generating the bitstream and .hwh files. However, when I look at the overlay in the Jupyter notebook, it shows that the IP block is None:

I noticed that when I click “Validate Design”, it shows a warning that one of the input pins is not connected. I am not sure whether this warning is the cause of the problem.

Does anyone have an idea what’s wrong here? Thanks.

Here is the warning message:

Also the block diagram:

Link to the source code:

You should connect the start, done, mem_data_out and mem_op_out pins to the processor. You could use AXI_GPIO for this.
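
As a rough sketch of what the PS side could look like once start and done are wired to an AXI GPIO: the register offsets below are the standard AXI GPIO ones, but the channel assignment (start on channel 1 bit 0, done on channel 2 bit 0) and the helper name `run_once` are assumptions for illustration, not taken from the actual design. It also assumes the IP accepts a start pulse that lasts many clock cycles, since a software write cannot produce a single-cycle pulse. The helper accepts any object with `read(offset)` and `write(offset, value)`, which is the interface `pynq.MMIO` offers, so it can be tried off-board with a stand-in.

```python
import time

# Standard AXI GPIO register offsets.
GPIO_DATA = 0x0    # channel 1 data
GPIO_TRI = 0x4     # channel 1 direction (0 = output)
GPIO2_DATA = 0x8   # channel 2 data
GPIO2_TRI = 0xC    # channel 2 direction (1 = input)


def run_once(mmio, timeout_s=1.0, poll_s=0.001):
    """Pulse start (assumed: channel 1, bit 0) and poll done
    (assumed: channel 2, bit 0). Returns True if done was seen."""
    mmio.write(GPIO_TRI, 0x0)    # channel 1 drives the IP
    mmio.write(GPIO2_TRI, 0x1)   # channel 2 bit 0 is read from the IP
    mmio.write(GPIO_DATA, 0x1)   # assert start
    mmio.write(GPIO_DATA, 0x0)   # deassert start
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if mmio.read(GPIO2_DATA) & 0x1:
            return True
        time.sleep(poll_s)
    return False


class FakeGpio:
    """Off-board stand-in: raises done as soon as start is written high."""

    def __init__(self):
        self.regs = {GPIO_DATA: 0, GPIO_TRI: 0, GPIO2_DATA: 0, GPIO2_TRI: 0}

    def write(self, offset, value):
        self.regs[offset] = value
        if offset == GPIO_DATA and value & 0x1:
            self.regs[GPIO2_DATA] |= 0x1

    def read(self, offset):
        return self.regs[offset]


if __name__ == "__main__":
    print(run_once(FakeGpio()))
```

On the board, the stand-in would be replaced by a `pynq.MMIO` object created from the base address and range of the GPIO instance in the overlay. This is fine for control signals, but polling a GPIO is slow for bulk data such as mem_data_out.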

As @kuoyaoming93 mentioned, the PS does not see the IP because it is not memory mapped. You should change the interfaces so that they are memory mapped, both for start and for the output data.
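
One way to confirm this from the notebook is to look at what PYNQ actually found in the address map. A minimal sketch follows. `find_ip` is a hypothetical helper, the instance names and addresses in the sample dictionary are made up, and the dictionary only mimics the `phys_addr` and `addr_range` keys that entries of `Overlay.ip_dict` carry.

```python
def find_ip(ip_dict, name_fragment):
    """Return {name: (base address, range)} for addressable IPs whose
    instance name contains name_fragment."""
    return {
        name: (info["phys_addr"], info["addr_range"])
        for name, info in ip_dict.items()
        if name_fragment in name
    }


# Made-up sample shaped like Overlay.ip_dict; on the board, pass
# overlay.ip_dict instead.
sample = {
    "axi_gpio_0": {"phys_addr": 0x41200000, "addr_range": 0x10000},
}

print(find_ip(sample, "gpio"))   # the AXI GPIO has an address
print(find_ip(sample, "gauss"))  # empty: no AXI slave port, so no entry
```

If the custom IP does not show up here, it has no AXI slave interface assigned in the Vivado Address Editor, and the PS has no way to reach it.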

Thanks all for the answers. I’m sorry, I’m quite new to FPGAs. @kuoyaoming93 mentioned that I should use GPIO, but my application is performance critical. Should I use the DMA IP instead?

After looking at the documentation, I’m still not sure how I should connect the ports. I ran “Connection Automation”, but it doesn’t connect all the ports for me:

Are there any references or tutorials I can refer to? The tutorials about building overlays using C++ HLS seem to handle it automatically…


If your application is performance critical you should use DMA. But first, you need to be able to talk with your IP from the PS in some way.

Let’s look at the I/O of your IP (apart from clk and rst). start is your only input, and done, mem_data_out and mem_op_out are your outputs. Where is the input data? These ports do not use any particular interface standard.

Have a look at this overlay tutorial. Notice how the multiply IP uses AXI4-Lite for the control signals and AXI4-Stream for the input and output data. Even though it is for HLS, your RTL IP should look similar in terms of I/O.
You should use AXI4-Stream for your performance-critical input/output, and AXI4-Lite for less critical I/O such as start and done.
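
Once the IP has AXI4-Stream ports and sits behind an AXI DMA, the PS-side transfer typically follows the pattern below. This is only a sketch: `stream_through` is a hypothetical helper, and it relies on the `sendchannel` and `recvchannel` objects with `transfer()` and `wait()` that PYNQ's DMA driver exposes. The stand-in classes exist only so the call order can be exercised without hardware; they do not move any data.

```python
def stream_through(dma, in_buf, out_buf):
    """Send in_buf to the IP and collect its result in out_buf."""
    dma.recvchannel.transfer(out_buf)  # arm the receive side first
    dma.sendchannel.transfer(in_buf)   # then push the input stream
    dma.sendchannel.wait()
    dma.recvchannel.wait()
    return out_buf


class _FakeChannel:
    """Records the calls made on one DMA channel."""

    def __init__(self, log, name):
        self.log = log
        self.name = name

    def transfer(self, buf):
        self.log.append((self.name, "transfer", len(buf)))

    def wait(self):
        self.log.append((self.name, "wait"))


class FakeDma:
    """Off-board stand-in with the same channel attributes as the driver."""

    def __init__(self):
        self.log = []
        self.sendchannel = _FakeChannel(self.log, "send")
        self.recvchannel = _FakeChannel(self.log, "recv")


if __name__ == "__main__":
    dma = FakeDma()
    stream_through(dma, [0] * 8, [0] * 8)
    print(dma.log)
```

On the board, `dma` would be the DMA instance taken from the overlay, and the buffers would be allocated with `pynq.allocate` so that they are physically contiguous. The buffer sizes and word packing depend on how the IP expects its matrix data, which is not specified in this thread.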

Even though the flow is a bit different, you can have a look here

I have not seen the code, but judging by the name, rst seems to be active high, whereas peripheral_aresetn is active low. You should have a look at that as well.