Increasing computational speed for custom IP

Hi all,
I have created my own IP with Vitis HLS 2021.2 and implemented it on PYNQ-Z2 and Ultra96-V2 boards. The computation time drops from around 2.5 ms to 1.2 ms (the C/C++ code has been optimized for each board using pipeline, unroll, and array partition directives). I am now trying to get it faster still, hopefully to around 0.1 ms.
Are there any guidelines for selecting a PYNQ board that would meet this target? Which specifications should I be looking at?

Also, I’d like to know which factors affect the computational speed. For example, does a larger number of LUTs result in faster computation?
Thanks.

PYNQ-Z2 uses a Zynq-7000 (28 nm). Ultra96 uses a Zynq UltraScale+ (16 nm). The ZU+ is faster.
On both devices, you can try building and running your design at progressively higher target clock speeds until you reach the limit.
Factors that affect speed:

  • Critical path
  • How much you can process in parallel
  • Size of design
  • Congestion (size of design relative to size of device)
  • Routing length
  • levels of logic
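
To get a feel for how clock speed and parallelism combine, here is a rough back-of-the-envelope sketch in Python. The 1.2 ms and 0.1 ms figures come from the question; the 100 MHz and 300 MHz clocks are only assumed values for illustration (the actual clock of the design was not stated), and the helper name is made up.

```python
def exec_time_ms(cycles, clock_mhz):
    """Execution time in ms for a kernel that takes `cycles` clock cycles."""
    return cycles / (clock_mhz * 1e3)


current_ms = 1.2          # measured time from the question
target_ms = 0.1           # desired time from the question
clock_mhz = 100.0         # ASSUMPTION: current fabric clock
faster_clock_mhz = 300.0  # ASSUMPTION: clock reached after timing closure

# Cycle count implied by the current time at the assumed clock.
cycles_now = current_ms * clock_mhz * 1e3

# Overall speedup needed to hit the target (about 12x).
speedup = current_ms / target_ms

# Clock alone: the frequency that would be needed with no other change.
clock_needed_mhz = clock_mhz * speedup

# Clock plus parallelism: cycles allowed at the faster clock, and how
# much the cycle count must shrink (pipelining, unrolling, partitioning).
cycles_budget = target_ms * faster_clock_mhz * 1e3
cycle_reduction = cycles_now / cycles_budget

print(f"speedup needed:         {speedup:.1f}x")
print(f"clock needed alone:     {clock_needed_mhz:.0f} MHz")
print(f"cycle reduction needed: {cycle_reduction:.1f}x at {faster_clock_mhz:.0f} MHz")
```

Under these assumed numbers a higher clock alone would not be enough, so the rest of the gain would have to come from doing more work per cycle.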

Cathal