PYNQ: PYTHON PRODUCTIVITY

Execution time in PYNQ-Z2

I wrote my kernel code in Vivado HLS and validated it with C/RTL co-simulation. I exported the IP, created the block design, and validated it in Vivado. I then generated the bitstream for the validated block design and verified it on the PYNQ-Z2 board.

When I measured the execution time in the host program using the Python “time” module, I noticed a discrepancy between the execution time on the PYNQ board and the time I calculated from the C/RTL co-sim results. I calculated the kernel execution time after C/RTL co-sim as follows:
Execution time = Latency x Clock Period
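Written as a formula, with L the latency in clock cycles from the co-sim report and T_clk the clock period (f_clk = 1/T_clk is the clock frequency):

```latex
t_{\text{exec}} = L \times T_{\text{clk}} = \frac{L}{f_{\text{clk}}}
```

This counts only the cycles between the kernel starting and finishing; it does not include any time the host program spends starting the IP or polling for completion.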

The timing on the board was calculated as follows:

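import time

# pg_ip: handle to the HLS kernel in the loaded overlay
pg_ip.write(0x00, 0x01)                    # set ap_start (bit 0 of the control register)
start = time.time()
while (pg_ip.read(0x00) & 0x4) != 0x4:     # poll until ap_idle (bit 2) is set
    pass
end = time.time()
pg_ip.write(0x00, 0x00)                    # clear the control register
timing = end - start
print(timing)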
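One possible refinement of the loop above, sketched under a few assumptions: the control register at offset 0x00 follows the usual Vivado HLS s_axilite layout (bit 0 ap_start, bit 2 ap_idle), and the `ip` argument is any object with `read(offset)` and `write(offset, value)` methods, such as the `pg_ip` handle in the question. It starts the clock before writing ap_start, uses `time.perf_counter()` (a monotonic, high-resolution timer) instead of `time.time()`, and gives up after a timeout instead of spinning forever. The function name `time_kernel_run` and the `timeout_s` parameter are hypothetical.

```python
import time

# Assumed Vivado HLS s_axilite control register layout (offset 0x00):
# bit 0 = ap_start, bit 1 = ap_done, bit 2 = ap_idle
CTRL = 0x00
AP_START = 0x1
AP_IDLE = 0x4


def time_kernel_run(ip, timeout_s=10.0):
    """Start the kernel once and return the wall-clock seconds until ap_idle is set.

    `ip` is any object with read(offset) and write(offset, value) methods
    (hypothetical helper; e.g. the pg_ip handle from the question).
    """
    start = time.perf_counter()          # start timing before ap_start is written
    ip.write(CTRL, AP_START)             # kick off the kernel
    while (ip.read(CTRL) & AP_IDLE) != AP_IDLE:
        # Each loop iteration costs one MMIO read plus one timer call.
        if time.perf_counter() - start > timeout_s:
            raise TimeoutError("kernel did not return to idle")
    return time.perf_counter() - start
```

On the board this would be called as `time_kernel_run(pg_ip)`. The result still includes the cost of the MMIO write, every MMIO read in the polling loop, and the timer calls themselves.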

These two timings do not match.

Why are the two times different?

What time did you measure, and how does it compare with what you saw in co-sim?

There will be some overhead from the Python calls compared with your co-sim time.
From memory, I think an MMIO call (e.g. pg_ip.write()) can take ~100 µs.
The call to time() will also add overhead.
If you are accessing memory (DRAM), there can be some additional (small) variability.
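To see how much of the measured time is Python overhead, one rough approach is to time many repetitions of a single call and average. The helper below is a sketch (the name `mean_call_overhead` is hypothetical); its result also includes the cost of the Python `for` loop and of any lambda wrapper, so treat it as an upper bound on the call itself.

```python
import time


def mean_call_overhead(fn, repeats=1000):
    """Return the average wall-clock time of one call to fn(), in seconds.

    Includes the per-iteration cost of the Python loop, so it slightly
    overestimates the cost of fn() alone.
    """
    if repeats <= 0:
        raise ValueError("repeats must be positive")
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats
```

For example, `mean_call_overhead(lambda: pg_ip.read(0x00))` would estimate the cost of one MMIO read on the board, and `mean_call_overhead(time.time)` the cost of one timer call.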

Cathal

Thank you for your reply, Cathal.

Latency from the co-sim report = 662,059 cycles
Clock period achieved post-implementation = 9.726 ns
So I calculated the theoretical kernel execution time as 662,059 × 9.726 × 10^-9 s = 0.006439 s (about 6.44 ms).

The execution time of IP on the board was 0.05239 seconds.

So the time I measure on the board is about 8.14 times the theoretical time calculated from the co-sim latency.
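The arithmetic in this post can be reproduced in a few lines of Python, which also shows the absolute gap between the two times. This only restates the figures above and assumes the IP is actually clocked at the 9.726 ns period reported post-implementation.

```python
latency_cycles = 662_059      # latency from the co-sim report
clock_period_ns = 9.726       # clock period achieved post-implementation
measured_s = 0.05239          # time measured on the board with time.time()

theoretical_s = latency_cycles * clock_period_ns * 1e-9
ratio = measured_s / theoretical_s
gap_s = measured_s - theoretical_s

print(f"theoretical: {theoretical_s:.6f} s")   # theoretical: 0.006439 s
print(f"ratio: {ratio:.2f}")                   # ratio: 8.14
print(f"gap: {gap_s * 1e3:.1f} ms")            # gap: 46.0 ms
```

If the clock actually driving the IP differs from the achieved period (for example, a 100 MHz PL clock gives a 10 ns period), that clock's period is the one to use in the estimate.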

0.05239 seconds isn’t very long, and there will be overhead from the Python calls.
Can you run your kernel for longer? If you want an accurate timing measurement, you can add an ILA and measure between the start and done control signals of your IP.
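Following the suggestion to run the kernel for longer, one way to separate fixed overhead from kernel time is to time the kernel at several workload sizes and fit a straight line, time = overhead + per_unit × size: the slope estimates the kernel's cost per unit of work and the intercept the fixed cost of the Python/MMIO calls. This sketch assumes the kernel's workload can be scaled (for example through an input-length argument), which the original design is not known to support; the function name `fit_overhead` is hypothetical.

```python
def fit_overhead(sizes, times):
    """Ordinary least-squares fit of times = overhead + per_unit * size.

    Returns (overhead_s, per_unit_s). Needs at least two distinct sizes.
    """
    n = len(sizes)
    if n != len(times) or n < 2:
        raise ValueError("need matching sizes/times with at least two points")
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    per_unit = sxy / sxx                 # slope: time per unit of work
    overhead = mean_y - per_unit * mean_x  # intercept: fixed per-measurement cost
    return overhead, per_unit
```

With measured times t1, t2, t3 for sizes 1000, 2000 and 4000, `fit_overhead([1000, 2000, 4000], [t1, t2, t3])` returns the estimated overhead and per-unit time. An ILA still gives the more direct measurement of the kernel itself.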

Cathal