Execution time calculation

I have developed a kernel in HLS using streaming interfaces. Usually the parameters used to compute timing values are the clock period and the latency (in clock cycles):
execution time = clock period * latency

But with streaming, the latencies of consecutive inputs overlap. So how do I find the total execution time in HLS for a streaming kernel? Should I move to a Jupyter notebook to get the exact timing values?

Hi @srinivasan74,

In pipelined designs there are two key concepts:

  • Initiation Interval (II): the number of clock cycles between the start times of consecutive loop iterations. Ideally this should be 1.
  • Latency (L): the number of clock cycles between feeding an input and getting the corresponding output.

If you are only feeding one input, the execution time (in clock cycles) is equal to the Latency.
When you feed multiple inputs, the computation in the different stages overlaps, so the equation to compute execution time is

execution time = Latency + II * number of inputs

Consider a simple example where L = 5 and II = 1, you are feeding 10 input elements, and you get one output for each element.

It is going to take 5 cycles to get the first output, but in the next clock cycle you’ll get another valid output and so on until all elements are processed. Therefore, the execution time is 15 clock cycles.
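As a rough sketch, the arithmetic above can be written out in Python. The function names here are made up for illustration. The first function uses the formula exactly as given in this thread. The second is a slightly stricter count that is often used for pipelined loops, where the first result takes L cycles and each remaining input adds II cycles, so it comes out one II lower. Either way, the value HLS reports may differ by a cycle or two.

```python
def pipeline_cycles(latency, ii, num_inputs):
    """Cycle estimate using the formula from this thread: L + II * N."""
    return latency + ii * num_inputs


def pipeline_cycles_strict(latency, ii, num_inputs):
    """Stricter count: L cycles for the first result, then II cycles for
    each of the remaining inputs, i.e. L + II * (N - 1)."""
    if num_inputs <= 0:
        return 0
    return latency + ii * (num_inputs - 1)


# Values from the example above: L = 5, II = 1, 10 input elements.
estimate = pipeline_cycles(latency=5, ii=1, num_inputs=10)
strict = pipeline_cycles_strict(latency=5, ii=1, num_inputs=10)
print(estimate, strict)
```

For a large number of inputs the two counts are practically the same, since the II * N term dominates.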

If you measure the execution time from the Jupyter notebook, the measurement will include the time to move data from the PS to the PL and back. It really depends on what you want to measure.

Mario

Thanks. Can we measure the data transfer time from PS to PL in a Jupyter notebook using existing Python functions? Can we also compute just the kernel execution time using the same function?

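You could do these measurements from the Jupyter notebook using magics https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-time,
for instance

%%time
<your code goes here>

Note that %%time (the cell magic) must be the first line of the cell and times the whole cell, while %time (the line magic) only times the statement written on the same line.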

When the cell finishes executing you will get a report with the execution time. Note that the underlying OS impacts this measurement, and the granularity is the one the OS provides.
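If you prefer to do the timing inside the code rather than with a magic, a minimal sketch using the standard library's time.perf_counter could look like the following. Both time_call and run_accelerator are hypothetical names: run_accelerator is only a placeholder for whatever code sends data to the PL and waits for the result, and it does not use any PYNQ API.

```python
import time


def time_call(func, *args, repeats=5, **kwargs):
    """Call func `repeats` times; return (best elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best, result


def run_accelerator(data):
    # Hypothetical stand-in for the real transfer + kernel call.
    return [x * 2 for x in data]


elapsed_s, output = time_call(run_accelerator, list(range(1000)))
print(f"best of 5 runs: {elapsed_s * 1e6:.1f} us")
```

Taking the best of several runs reduces the effect of OS scheduling on the measurement, but the result still includes software overhead and data movement, not only the kernel itself.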

Mario

How is the execution time measured for an HLS IP in Vivado HLS (i.e. the time it takes to execute all the HLS functions called from the top-level function)?

Hi @Nikhil_Bhardwaj

Assuming you want to do this from software, this is one way to do it from a cell in a Jupyter notebook (%%time must be the first line of the cell):

%%time
<your code goes here>

You could also run a co-simulation of your HLS code and that should give you the execution time, without data transfer time.

Mario

Is there a way I can figure out the execution time on Vivado HLS or on Vivado IP Integrator?

If the boundaries in the HLS IP are well defined, this should be reported by HLS.

Alternatively, using the Initiation Interval (II) and Latency (L), you can compute the execution time in clock cycles as

execution(cycles) = L + II * <num iterations>
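To turn that cycle count into a time, multiply by the clock period (or divide by the clock frequency). Here is a small sketch, assuming a 100 MHz clock purely for illustration; use the clock your IP actually runs at. With L = 5, II = 1 and 10 iterations this gives 15 cycles at 10 ns each, so 150 ns.

```python
def execution_time_s(latency, ii, num_iterations, clock_hz):
    """Estimated execution time in seconds: (L + II * N) / f_clk."""
    cycles = latency + ii * num_iterations
    return cycles / clock_hz


# Assumed values: L = 5, II = 1, 10 iterations, 100 MHz clock.
t = execution_time_s(latency=5, ii=1, num_iterations=10, clock_hz=100e6)
print(f"{t * 1e9:.0f} ns")
```

This is the kernel time only. It does not account for data transfer between the PS and PL or for stalls when the streams are not fed or drained every cycle.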

Mario

@marioruiz
Sir,

I’m currently working on implementing an image processing algorithm in Vivado HLS.
I need help with reducing the resource utilization, the utilization of LUTs in particular.
The top-level function in HLS has a kernel/window function defined for morphological operations (erosion and dilation), which causes high resource utilization.

I request you to kindly help me optimize my design for fewer LUTs.
(I tried looking on the Xilinx forum, but could not come up with a solution)

Kind Regards.

Hi @Nikhil_Bhardwaj,

Unfortunately, I am not able to help you with this.

My only suggestion is to use the Vitis Vision accelerated library. Its functions are optimized for performance and resource utilization.

https://xilinx.github.io/Vitis_Libraries/vision/2020.2/api-reference.html#dilate

Mario
