[solved] AXI DMA Stream receive does not work on PYNQ-Z2 board

PYNQ-Z2 with the v2.6 image. I am unsure of the tool versions, as I have never worked with PYNQ before.
I am using Vitis HLS and Vivado to generate a bitstream/overlay.

I currently have an IP that calculates the output of a function (2 integers) from 2 float inputs. I have a large array of input values that I want to run the IP on, and at the moment I have to write each input and read back each result one at a time, which is very slow. I want to speed it up. I thought about doing SIMD-style execution by writing a whole array, but I figured it would be even faster to use a stream that simply reads in the values and generates outputs on the fly. Also, since the input size is variable (the IP runs once for every pixel in an image, and the image dimensions may vary), a stream should handle that better than a fixed-size array.

I followed along with some tutorials, but I could not get the streams to work. Each time, I had the same problem: everything works fine until I try to retrieve the DMA data, and then the receive channel's wait() function hangs forever. I thought the problem was that my IP was simply not being started, but this doesn’t seem to be the case (in my first example, I even had the ap_start flag tied to a constant 1). After many failed attempts to fix it, I eventually just followed this lab exactly:
https://pp4fpgas.readthedocs.io/en/latest/axidma2.html
And I encountered the same error! Specifically, on the dma_recv.wait() line, the program freezes forever. Looking at the DMA registers, the idle and ready flags are both set to 0.
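For anyone checking the same flags from a raw register dump, here is a minimal sketch of how I understand the status word to decode. It assumes the Xilinx AXI DMA layout in which the receive (S2MM) status register has Halted in bit 0 and Idle in bit 1; the names `decode_dmasr` and `transfer_in_flight` are made up for illustration and are not part of any library.

```cpp
#include <cstdint>

// Assumed bit positions in the AXI DMA channel status register (DMASR).
constexpr uint32_t DMASR_HALTED = 1u << 0;  // channel is stopped
constexpr uint32_t DMASR_IDLE   = 1u << 1;  // channel is running but has no work pending

struct ChannelState {
    bool halted;
    bool idle;
};

// Hypothetical helper: split a raw status word into the two flags of interest.
ChannelState decode_dmasr(uint32_t status) {
    return ChannelState{(status & DMASR_HALTED) != 0,
                        (status & DMASR_IDLE) != 0};
}

// Neither halted nor idle means the channel has started a transfer
// and is still waiting for it to complete.
bool transfer_in_flight(uint32_t status) {
    ChannelState s = decode_dmasr(status);
    return !s.halted && !s.idle;
}
```

Under that reading, a receive channel that is neither halted nor idle has accepted the transfer and is still waiting for the stream to end, which matches a wait() call that never returns.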

Can anybody tell me what needs to be done differently on a PYNQ board to get this to work? I should note that I have no experience with FPGAs or hardware design before this project, so if you can explain at a beginner level, it would be greatly appreciated.

(EDIT: corrected which tutorial I followed. Originally had said Lab: Axistream Single DMA (axis) — pp4fpgas 0.0.1 documentation, but I will need multiple streams, and in any case, had the same issue with that.)


It turns out that when a DMA stream transfer fails to finish, nothing short of restarting the board will get it to work again. That is why, after my own code failed, the tutorial’s code failed as well, until this morning when I tried it again on a freshly restarted board. The tutorial actually does work. Why does it work when my code doesn’t? Because the tutorial’s code has a fixed input length, so it knows when to stop and signal the end of the stream. I simply didn’t notice this because the length is defined in a header and not in the main code. Oops! Fortunately, I expect I can just add the length as a function argument and write it to a port. (I had been told that using ap_axiu would take care of the TLAST signal automatically, so I didn’t have to define a length, but apparently that is incorrect.)
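Here is a rough sketch of the fix I have in mind, in case it helps someone else. To keep it compilable without the HLS headers, it uses plain stand-in types: `Beat` plays the role of an ap_axiu packet and `Stream` the role of an hls::stream, and the kernel name `process` and its doubling computation are placeholders, not my real IP. The point is only that the kernel takes the length as an argument and sets the last flag itself on the final output.

```cpp
#include <cstdint>
#include <queue>

// Stand-in for one AXI4-Stream beat (assumption: in Vitis HLS this would be
// an ap_axiu packet with its own data and last fields).
struct Beat {
    uint32_t data;
    bool last;  // models the TLAST sideband signal
};

// Stand-in for hls::stream<...>, so the sketch builds with a normal compiler.
using Stream = std::queue<Beat>;

// Hypothetical kernel: consumes exactly `length` beats and marks the final
// output beat with last = true so the receiving DMA channel can finish.
void process(Stream &in, Stream &out, int length) {
    for (int i = 0; i < length; ++i) {
        Beat b = in.front();
        in.pop();

        Beat r;
        r.data = b.data * 2;         // placeholder for the real computation
        r.last = (i == length - 1);  // TLAST only on the last beat
        out.push(r);
    }
}
```

Calling `process(in, out, n)` with n beats queued on `in` leaves n beats on `out`, and only the final one has `last` set. In the real design, the length would be written to the IP from the board side before starting the transfer, so it can change with the image dimensions.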
