Increasing computational speed for custom IP

Hi all,
I have created my own IP with Vitis HLS 2021.2 and implemented it on PYNQ-Z2 and Ultra96-V2 boards. The computation time dropped from around 2.5 ms to 1.2 ms (the C/C++ code has been optimized for each board using pipeline, unroll, and array partition). I am now trying to get the computation time down further, hopefully to around 0.1 ms.
Are there any guidelines for selecting a PYNQ board that can meet this requirement? Which specifications should I be looking at?

Also, I’d like to know which factors affect the computational speed. For example, does a device with more LUTs give a faster computation?

PYNQ-Z2 uses a Zynq-7000 (28 nm). Ultra96 uses a Zynq UltraScale+ (16 nm). The ZU+ is faster.
For both devices, you can try building and running your design at a higher target clock speed until you reach the limit.
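As a rough way to see how far clock speed alone can take you, here is a minimal Python sketch that converts a cycle count into execution time. The function names and the cycle count are hypothetical, chosen only for illustration; they are not measurements from either board. The real cycle count would come from your HLS synthesis report.

```python
def exec_time_ms(cycles: int, clock_mhz: float) -> float:
    """Execution time in milliseconds for a cycle count at a given clock."""
    # 1 MHz is 1e3 cycles per millisecond.
    return cycles / (clock_mhz * 1e3)


def required_clock_mhz(cycles: int, target_ms: float) -> float:
    """Clock frequency (MHz) needed to finish `cycles` within `target_ms`."""
    return cycles / (target_ms * 1e3)


# Hypothetical design: 120,000 cycles per run (made-up number).
cycles = 120_000
print(exec_time_ms(cycles, 100.0))       # time at a 100 MHz clock
print(required_clock_mhz(cycles, 0.1))   # clock needed for a 0.1 ms target
```

With these made-up numbers the design takes about 1.2 ms at 100 MHz, and reaching 0.1 ms by clock speed alone would need roughly 1200 MHz, which is far higher than programmable logic typically achieves. In a case like that, the cycle count itself has to come down as well.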
Factors that affect speed:

  • Critical path
  • How much you can process in parallel
  • Size of design
  • Congestion (size of design relative to size of device)
  • Routing length
  • Levels of logic
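To make the "how much you can process in parallel" point concrete, the sketch below uses the common approximation for a pipelined loop: total cycles ≈ pipeline depth + II × (iterations − 1), where II is the initiation interval. The function name and all numbers are hypothetical, and the model assumes that unrolling does not change the depth or II and that memory bandwidth (for example via array partitioning) keeps up, which may not hold in a real design.

```python
import math


def pipelined_loop_cycles(trip_count: int, ii: int, depth: int, unroll: int = 1) -> int:
    """Approximate cycle count of a pipelined loop.

    trip_count: loop iterations before unrolling
    ii:         initiation interval (cycles between starting iterations)
    depth:      latency of one iteration in cycles
    unroll:     unroll factor (assumed not to change ii or depth)
    """
    iterations = math.ceil(trip_count / unroll)
    return depth + ii * (iterations - 1)


# Hypothetical loop: 1024 iterations, pipeline depth of 10 cycles.
print(pipelined_loop_cycles(1024, ii=2, depth=10))            # II = 2
print(pipelined_loop_cycles(1024, ii=1, depth=10))            # II = 1
print(pipelined_loop_cycles(1024, ii=1, depth=10, unroll=4))  # II = 1, unrolled by 4
```

In this model, halving the II or unrolling by some factor cuts the cycle count almost proportionally, at the cost of more logic. That extra logic is where device size matters: a larger device does not make a given design faster by itself, but it leaves room for more parallel hardware and less congestion.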