Poor copy performance out of PynqBuffer

Hi Folks,
Running PYNQ 3.0.1 on a ZCU111, I’m seeing much slower performance getting data out of a PynqBuffer than I think I should, and I 1) am not sure why and 2) don’t know how to speed it up.

Broadly, I’ve got a PynqBuffer (pynq.buffer.allocate) of about 1 MiB, and operations on it take on the order of the time needed to fill the buffer from the PL side, making any sort of ISR timing much tighter than I’d hoped.

Bottom line: copying data out of a 0.78 MiB PynqBuffer (or into another pynq buffer!) takes about 10x as long as a numpy array copy.

For some specific numbers:

Pynq version 3.0.1
10.00 ms of data @ 100.0% max rate
102400 items, 0.78 MiB
Copy into Pynq buffer: 1.71 ms
Copy Pynq 2 Pynq: 4.21 ms # buffer2[:]=buffer1
Copy Pynq 2 Pynq (.copy()): 6.44 ms # buffer2=buffer1.copy()
Copy Pynq 2 Numpy: 6.48 ms # array[:]=buffer1
Copy Pynq 2 Numpy (np.array()): 6.38 ms # array=np.array(buffer1)
Copy Numpy 2 Numpy: 0.92 ms # array2[:]=array1
Copy Numpy 2 Numpy (.copy()): 0.69 ms # array2=array.copy()
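
For anyone wanting to reproduce numbers like these, here is a minimal timing sketch. The harness behind the figures above isn't shown, so the helper name time_ms, the repeat count, and the best-of-N approach are all assumptions. The demo uses a plain bytearray of the same size (102400 x 8 bytes) so it runs off-board; on the board you would pass in a function doing array[:] = buffer1 instead.

```python
import time


def time_ms(fn, repeats=20):
    """Return the best wall-clock time of fn() in milliseconds.

    Hypothetical helper: taking the minimum of several runs reduces
    noise from the scheduler and from first-touch page faults.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, (time.perf_counter() - start) * 1e3)
    return best


# Off-board stand-in for the 0.78 MiB buffer: 102400 items of 8 bytes.
src = bytearray(102400 * 8)
dst = bytearray(len(src))


def copy_in_place():
    # Same shape of operation as array2[:] = array1 in the listing.
    dst[:] = src


print(f"bytearray copy: {time_ms(copy_in_place):.3f} ms")
```

Whether you report the minimum or the mean changes the numbers, so it is worth saying which one you used when comparing results.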

These two tests involved pip installing python-blosc2 into the pynq venv.
Compress numpy (blosc2.pack_array): 5.37 ms
Compress pynq (blosc2.pack_array): 11.92 ms

These final two tests involve bitshifts on the uint64 values to unpack them into usable form:
Unpack numpy (incl. np.zeros() allocate): 6.40 ms
Unpack pynq (incl. np.zeros() allocate): 36.04 ms
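
To illustrate what a bitshift unpack of this kind looks like, here is a rough sketch. The real field layout isn't given in the post, so the three-field split (20 + 20 + 24 bits) and the name unpack are purely hypothetical. The output shape of three uint32 values per item is also an assumption, chosen only because it matches the 1.17 MiB unpacked size in the 2.7 listing (12 bytes x 102400 items).

```python
import numpy as np

# Hypothetical layout of each uint64 word, low bits first:
#   bits  0..19 -> field a, bits 20..39 -> field b, bits 40..63 -> field c
A_BITS = 20
B_BITS = 20


def unpack(packed):
    """Split packed uint64 words into an (N, 3) uint32 array."""
    packed = np.asarray(packed, dtype=np.uint64)
    out = np.zeros((packed.size, 3), dtype=np.uint32)  # allocation is timed too
    a_mask = np.uint64((1 << A_BITS) - 1)
    b_mask = np.uint64((1 << B_BITS) - 1)
    out[:, 0] = packed & a_mask
    out[:, 1] = (packed >> np.uint64(A_BITS)) & b_mask
    out[:, 2] = packed >> np.uint64(A_BITS + B_BITS)
    return out


word = (5 << 40) | (3 << 20) | 7
print(unpack([word]))
```

Each shift and mask makes a full pass over the source array, so slow reads from the source buffer get multiplied by the number of fields.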

Here is the same test run on PYNQ 2.7; the results are pretty similar:

Pynq version 2.7.0
10.00 ms of data @ 100.0% max rate
102400 items, 1.17 MiB, 0.78 MiB packed
Copy into Pynq: 1.43 ms
Copy Pynq 2 Pynq: 4.61 ms
Copy Pynq 2 Pynq (.copy()): 7.38 ms
Copy Pynq 2 Numpy: 6.58 ms
Copy Pynq 2 Numpy (np.array()): 6.27 ms
Copy Numpy 2 Numpy: 1.44 ms
Copy Numpy 2 Numpy (.copy()): 0.96 ms
Compress numpy (blosc2.pack_array): 7.68 ms
Compress pynq (blosc2.pack_array): 12.71 ms
Unpack numpy (incl. np.zeros() allocate): 6.68 ms
Unpack pynq (incl. np.zeros() allocate): 33.01 ms

I think there was a change in the allocator to be non-cacheable by default when moving from xlnk to xrt.

Try setting the cacheable option to True.

E.g. buffer = pynq.allocate(shape, dtype, cacheable=True)
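
A slightly fuller sketch of that suggestion, with one caveat: once the buffer is cacheable, the CPU can see stale data after the PL has written to it, so the cache should be invalidated before reading. The helper names allocate_cacheable and read_after_dma are made up for illustration. invalidate() is the PynqBuffer method, and the code only calls it when the object has one, so the copy helper also works on plain arrays.

```python
def allocate_cacheable(shape, dtype="u8"):
    """Allocate a cacheable PynqBuffer (only works on a PYNQ board)."""
    import pynq  # deferred import so this file also loads off-board

    return pynq.allocate(shape, dtype=dtype, cacheable=True)


def read_after_dma(buffer, dest):
    """Copy data the PL has written into `buffer` out into `dest`.

    For a cacheable buffer, drop stale cache lines first so the CPU
    reads what the PL actually wrote.
    """
    invalidate = getattr(buffer, "invalidate", None)
    if invalidate is not None:
        invalidate()
    dest[:] = buffer
    return dest


# On the board (assumed usage, not run here):
#   buffer1 = allocate_cacheable((102400,))
#   ... start the DMA / wait for the PL to fill buffer1 ...
#   array = read_after_dma(buffer1, np.zeros(102400, dtype="u8"))
```

Going the other way (CPU writes, PL reads), the matching call is buffer.flush() before starting the transfer.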