Hello. I'm trying to run a neural network with HLS. This is a mix of an issue for Xilinx and one for here, I think. I'm using hls4ml to compile/synthesize the network, but this part of the issue is more Xilinx/PYNQ related. The board is a PYNQ-Z2 and my NN has an input size of 512 and 10 outputs (for classification).
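For context, this is roughly how I generate the HLS project (just a sketch: model name, paths and the exact API depend on the hls4ml version, and older releases take the same settings from a keras-config.yml file instead):

```python
# Sketch of the conversion step; 'my_model.h5' and 'my-hls-test' are placeholders.
import hls4ml
from tensorflow import keras

model = keras.models.load_model('my_model.h5')  # 512 inputs, 10 outputs

config = hls4ml.utils.config_from_keras_model(model, granularity='model')

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='my-hls-test',
    part='xc7z020clg400-1',   # PYNQ-Z2
    io_type='io_parallel',    # or 'io_serial', the two cases compared below
)

# Writes the HLS project to output_dir; I then open it in Vivado HLS 2019.2,
# run C synthesis and export the IP by hand.
hls_model.write()
```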
When the project is built, an HLS project is created. I can then open it with Vivado HLS (2019.2), run C Synthesis and export the IP. Here come the issues:
- If I compile with io_type=io_parallel, the following IP is generated,
but then this is my utilization estimate:
- If I compile with io_type=io_serial, I get this massive IP block
with 512 inputs, but the utilization estimates are
so it fits the board.
As you can see, io_parallel gives the cleanest IP block interface, but its BRAM usage is way over the board's maximum.
Here come the questions:
- MOST IMPORTANT: Is it possible for the PYNQ to implement the smaller io_parallel design and then load the BRAM contents from the SD card? How can I reduce the BRAM usage? How would I do that from Python? (There is a sketch of what I mean below, after the questions.)
- If io_serial is used so that the BRAM fits the board, how do I wire up the connections so that I can generate the overlay and use it from Python? (See the second sketch below.)
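To clarify what I mean in the first question, this is the kind of thing I imagine doing from Python on the PYNQ. It is purely a sketch: it assumes the weights were exposed through an AXI BRAM controller in the block design, and the bitstream name, base address, weights file and 32-bit packing are all made up; I don't know whether the hls4ml IP allows this at all.

```python
# Hypothetical sketch: copy weights stored on the SD card into a BRAM reachable
# from the PS. Assumes an AXI BRAM controller mapped at WEIGHTS_BASE; none of
# these names/addresses exist in the generated IP today.
import numpy as np
from pynq import Overlay, MMIO

ol = Overlay('nn_parallel.bit')           # placeholder bitstream name

WEIGHTS_BASE = 0x40000000                 # made-up address from the address editor
weights = np.fromfile('/home/xilinx/weights.bin', dtype=np.uint32)  # placeholder file

mmio = MMIO(WEIGHTS_BASE, weights.nbytes)
for i, word in enumerate(weights):
    mmio.write(4 * i, int(word))          # one 32-bit word per write
```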
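And for the second question, this is the kind of Python usage I'm aiming for with the io_serial version. Again a sketch: it assumes the stream interfaces end up connected to an AXI DMA named axi_dma_0 in the overlay, and the float32 buffers are placeholders, since the core actually works on ap_fixed data.

```python
# Hypothetical use of the io_serial IP through an AXI DMA, once an overlay exists.
# 'nn_serial.bit', 'axi_dma_0' and the float32 packing are assumptions.
import numpy as np
from pynq import Overlay, allocate

ol = Overlay('nn_serial.bit')
dma = ol.axi_dma_0

in_buf = allocate(shape=(512,), dtype=np.float32)   # one input frame
out_buf = allocate(shape=(10,), dtype=np.float32)   # class scores

in_buf[:] = np.random.rand(512).astype(np.float32)  # example input

dma.sendchannel.transfer(in_buf)
dma.recvchannel.transfer(out_buf)
dma.sendchannel.wait()
dma.recvchannel.wait()

print(out_buf)
```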