PYNQ: PYTHON PRODUCTIVITY

AXI FULL transfer time

Hello everyone,
I am measuring how long it takes to transfer 24 32-bit values from PS to PL over an AXI FULL interface.
I found that the time varies a lot (as shown in the picture below).


Can anyone tell me why this happens? Thanks a lot!

Your Python code runs on the CPU, under an operating system. Every Python call carries some overhead, and that overhead varies from call to call.
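One way to see this variability is to time the same call many times and compare the minimum, median, and maximum. The sketch below is only an illustration: `transfer()` is a hypothetical stand-in that copies 24 words into a local buffer, so it runs anywhere. On a board you would replace its body with your real transfer call.

```python
import statistics
import struct
import time

N_WORDS = 24  # 24 x 32-bit values, as in the question

# Assumption: a local bytearray stands in for the PL-mapped region.
buffer = bytearray(N_WORDS * 4)
payload = list(range(N_WORDS))


def transfer():
    """Hypothetical stand-in for the real PS-to-PL transfer call."""
    struct.pack_into(f"<{N_WORDS}I", buffer, 0, *payload)


def time_calls(func, repeats=1000):
    """Call func repeatedly and return per-call durations in microseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        func()
        samples.append((time.perf_counter_ns() - start) / 1000.0)
    return samples


samples = time_calls(transfer)
print(f"min    {min(samples):8.2f} us")
print(f"median {statistics.median(samples):8.2f} us")
print(f"max    {max(samples):8.2f} us")
```

Even with no hardware involved, the maximum is typically well above the median, because the interpreter and the OS scheduler add their own jitter.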

If you are using the AXI GP ports, the transfers are not AXI full: bursts are not used, and each access is an individual transaction. The Python calls will dominate the AXI transfer time.

You can use the AXI HP ports for “AXI FULL” interfaces. In this case the PL can “pull” the data from the PS, rather than the PS “pushing” data over AXI GP. DMA is one way to do this, but it needs some Python calls to set up, and again this overhead will likely dominate the transfer of 24 values. You could build an IP with an AXI full master interface that can initiate its own transactions. This would give the highest performance.
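A rough back-of-the-envelope estimate shows why software overhead dominates for only 24 values. Every number in this sketch is an illustrative assumption (a 100 MHz PL clock, 10 cycles per single access, 10 cycles of burst setup, 10 us per Python call), not a measurement from any board.

```python
# All constants are assumed, order-of-magnitude values for illustration only.
BUS_CLOCK_HZ = 100e6        # assumed PL clock
WORDS = 24                  # 24 x 32-bit values, as in the question
PY_CALL_OVERHEAD_S = 10e-6  # assumed cost of one Python-level call


def burst_time_s(words, clock_hz, setup_cycles=10):
    """Idealized burst: fixed setup cost, then one word per clock."""
    return (setup_cycles + words) / clock_hz


def single_access_time_s(words, clock_hz, cycles_per_access=10):
    """Idealized single transactions: each word pays the full access cost."""
    return words * cycles_per_access / clock_hz


bus_burst = burst_time_s(WORDS, BUS_CLOCK_HZ)
bus_single = single_access_time_s(WORDS, BUS_CLOCK_HZ)
python_per_word = WORDS * PY_CALL_OVERHEAD_S  # one Python call per word
python_one_call = PY_CALL_OVERHEAD_S          # one Python call in total

print(f"bus, one burst:           {bus_burst * 1e6:7.2f} us")
print(f"bus, single accesses:     {bus_single * 1e6:7.2f} us")
print(f"Python, one call/word:    {python_per_word * 1e6:7.2f} us")
print(f"Python, one call overall: {python_one_call * 1e6:7.2f} us")
```

Under these assumptions, the time spent on the bus is small next to the time spent in Python calls, which is why a PL-initiated transfer helps most for small payloads.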

If you search the forum, you will find some discussion of this by others. One thread:

Cathal

First, thank you very much for your reply!

I am a little confused. When I created and packaged my IP, I set it up with an AXI FULL interface in slave mode. Do you mean I should set it up in AXI FULL master mode so that it can actually use the AXI FULL interface? So even though I configured it as AXI FULL, it still won’t use burst transfers?

If writing through a numpy array does not go over AXI FULL, then why did that approach speed up my transfers? My old design, which used mmio.write() to transfer one 32-bit value at a time, is a lot slower.
(I used to think that mmio.write() went over AXI LITE and that the numpy array approach went over AXI FULL.)
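As a hedged illustration of the software side of this question, the sketch below compares 24 separate Python-level writes against a single bulk copy into the same memory. It says nothing about which AXI protocol is used on the bus. It only shows how the number of interpreter-level calls changes. The `region` buffer and `write_word()` helper are hypothetical stand-ins so the code runs without a board.

```python
import struct
import timeit

N_WORDS = 24

# Assumption: a local bytearray stands in for the memory-mapped IP region.
region = bytearray(N_WORDS * 4)
data = list(range(1, N_WORDS + 1))
packed = struct.pack(f"<{N_WORDS}I", *data)  # all 24 words, packed once


def write_word(offset, value):
    """Hypothetical per-word write, one Python call per 32-bit value."""
    struct.pack_into("<I", region, offset, value)


def per_word():
    """24 Python-level calls, one per word."""
    for i, value in enumerate(data):
        write_word(4 * i, value)


def bulk():
    """One Python-level operation that copies all 24 words."""
    region[:] = packed


runs = 10000
t_word = timeit.timeit(per_word, number=runs)
t_bulk = timeit.timeit(bulk, number=runs)
print(f"per-word: {t_word / runs * 1e6:.2f} us per transfer")
print(f"bulk:     {t_bulk / runs * 1e6:.2f} us per transfer")
```

If the bulk version is faster here, that is purely because it makes fewer Python calls, which is consistent with the point that Python overhead dominates small transfers.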