PYNQ: PYTHON PRODUCTIVITY FOR ZYNQ

Using AXI DMA on ZCU104

I recently switched my work from the PYNQ-Z2 to the ZCU104.
However, I ran into a problem with my first exercise, which multiplies each element of a given array by 3.
I receive a result, but it is incorrect. Any suggestions would be appreciated.

My input is

PynqBuffer([ 0,  4,  8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60,
            64, 68, 72, 76], dtype=uint32)

The expected output is each element multiplied by 3.
However, the result I obtained is

PynqBuffer([  0,   0,   0,   0,  48,   0,   0,   0,  96,   0,   0,   0,
            144,   0,   0,   0, 192,   0,   0,   0], dtype=uint32)

My block design is

And the DMA configuration

The C++ code is also attached with this post.

array_mul.cpp (414 Bytes) array_mul.h (160 Bytes)

@Jia-Ming_Lin

The issue could be in the SysGen IP and not the DMA. To test, you could replace the IP with an AXI-Stream FIFO hooked up in the same way as the IP and check that the data you receive is identical to the data you sent.
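A loopback check along these lines could look like the sketch below. It only relies on the `sendchannel`/`recvchannel` `transfer()` and `wait()` calls of the PYNQ DMA driver. `FakeLoopbackDMA` is a made-up software stand-in so the sketch runs without a board, and the bitstream and DMA instance names in the comments are hypothetical.

```python
def loopback_check(dma, in_buf, out_buf):
    """Send in_buf through the DMA and return the indices where out_buf differs."""
    dma.sendchannel.transfer(in_buf)
    dma.recvchannel.transfer(out_buf)
    dma.sendchannel.wait()
    dma.recvchannel.wait()
    return [i for i, (a, b) in enumerate(zip(in_buf, out_buf)) if a != b]


class _Channel:
    """Minimal stand-in for a DMA channel (assumption, not the PYNQ class)."""
    def __init__(self, on_transfer):
        self._on_transfer = on_transfer

    def transfer(self, buf):
        self._on_transfer(buf)

    def wait(self):
        pass  # the fake completes transfers immediately


class FakeLoopbackDMA:
    """Software model of a DMA wired to an AXI-Stream FIFO in loopback."""
    def __init__(self):
        self._data = []
        self.sendchannel = _Channel(self._send)
        self.recvchannel = _Channel(self._recv)

    def _send(self, buf):
        self._data = list(buf)

    def _recv(self, buf):
        for i, value in enumerate(self._data[:len(buf)]):
            buf[i] = value


# On the board, the same function would be used with real objects, e.g.:
#   from pynq import Overlay, allocate
#   ol = Overlay("loopback.bit")      # hypothetical bitstream name
#   dma = ol.axi_dma_0                # hypothetical DMA instance name
#   in_buf = allocate(shape=(20,), dtype="u4")
#   out_buf = allocate(shape=(20,), dtype="u4")
in_buf = list(range(0, 80, 4))
out_buf = [0] * len(in_buf)
mismatches = loopback_check(FakeLoopbackDMA(), in_buf, out_buf)
print("mismatching indices:", mismatches)
```

If the list of mismatching indices is empty with the FIFO in place, the DMA path is likely fine and the IP is the more probable culprit.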

Thanks,
-Pat

The issue is possibly a mismatch between software and hardware for the AXI slave ports on the PS8 block. The widths have to be 128 bits in order to be compatible with PYNQ. That should be the default, but it has been known to change for reasons I don’t understand.

Peter

Hi Peter, thank you for the reply.
I did change the data width of AXI HP0 FPD and AXI HP1 FPD from 128 (the default) to 32.
Do you mean the data width should be 128 bits?

Here are my original configurations of the PS-PL interface:

Yes - they should be 128. The setting in the block design has to match what is configured in software when the board boots. For the PYNQ boot files we keep the default of 128, so that’s what the block design should have as well.

Peter

Hi Peter,
I have one more question. I don’t understand why, when my input is

PynqBuffer([ 0,  4,  8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60,
            64, 68, 72, 76], dtype=uint32)

then the output is

PynqBuffer([  0,   0,   0,   0,  48,   0,   0,   0,  96,   0,   0,   0,
            144,   0,   0,   0, 192,   0,   0,   0], dtype=uint32)

Only the elements at positions 1, 5, 9, 13, ..., 4k+1 are multiplied by 3, and all the others are zero under my original settings.

My original settings are:
Data width of slave HP ports on PS: 32 bits
DMA Memory Map Access data width: 32 bits
DMA Streaming data width: 32 bits
Top function I/O data width: 32 bits

Thank you!

One interesting thing about the way AXI works is that even when performing “narrow” transactions (i.e. the number of bytes is less than the bus width), the data will always appear on the wires that would be used if a full-width transaction were performed.

As an example, if I have a 128-bit wide bus and read 4 bytes at address 0x8, the data will be presented on wires 95:64.
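The arithmetic behind that example can be sketched in a few lines of Python. The helper name `active_wires` is made up for illustration; it simply maps the address offset within one bus word to a bit range.

```python
def active_wires(address, num_bytes, bus_width_bits):
    """Return (high, low) wire indices carrying a narrow transfer's data."""
    bus_bytes = bus_width_bits // 8
    first_lane = address % bus_bytes          # byte lane within the bus word
    if first_lane + num_bytes > bus_bytes:
        raise ValueError("transfer crosses a bus-width boundary")
    low = first_lane * 8
    high = low + num_bytes * 8 - 1
    return high, low


# Four consecutive 32-bit words on a 128-bit bus land on four different lanes.
for addr in (0x0, 0x4, 0x8, 0xC):
    high, low = active_wires(addr, 4, 128)
    print(f"address {addr:#x}: wires {high}:{low}")
```

On a 32-bit bus the same four addresses would all map to wires 31:0, which is why the bus width assumed by each side matters.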

There is a hardened data-width conversion block inside the processing system that converts between the internal 128-bit wide bus and the configuration seen in the block diagram. When configured in 128-bit mode (as is the case in the PYNQ image), it will send and receive the data as defined by the AXI standard, as described above. In your design, however, only the lowest 32 bits are actually connected, so only one in four 32-bit words will be correctly sent into and out of the design. The other lines are not tied off, so they will default to 0.
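A simplified software model of this effect reproduces the output seen in the question. This is only a sketch under the assumption that each group of four 32-bit words shares one 128-bit beat and that only the lowest word of each beat reaches the design; the function name is made up, and plain Python lists stand in for the PynqBuffer.

```python
def simulate_narrow_connection(data, bus_words=4, connected_words=1):
    """Model a x3 IP where only the lowest lanes of each bus word are wired."""
    result = []
    for i, value in enumerate(data):
        lane = i % bus_words                  # 32-bit lane within the 128-bit beat
        if lane < connected_words:
            result.append(value * 3)          # word reaches the IP and comes back
        else:
            result.append(0)                  # unconnected lanes read as 0
    return result


inputs = list(range(0, 80, 4))                # 0, 4, 8, ..., 76 as in the question
outputs = simulate_narrow_connection(inputs)
print(outputs)
```

Setting `connected_words=4` in the model corresponds to the matched 128-bit configuration, where every element is multiplied by 3.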

Peter