Hi, I am implementing a 2D FFT on my ZCU104 board and using PYNQ for testing. The image is uint8, 512 by 512. Since it is uint8, I am able to parallelize my design by streaming 256 bits (32 pixels) at a time. The hardware design works; however, I have to reshape the image before copying it to the DMA buffer. This is because it reads a whole column before moving on to the next column, so the row changes on every read. I want it to stay on the same row until all columns in that row have been read.
Right now I am slicing and stacking my image from (512, 512) to (512*16, 32) to make it work as intended. This step is “meh” and adds to the pre-processing. Is there a way to tell the allocate function how the image should be laid out in memory, without the need for slicing and stacking? I have looked around the forums, but I have not found an answer.
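
For reference, here is a minimal sketch of the reshaping step in plain NumPy. It assumes that "slicing and stacking" means cutting the image into 16 vertical strips of 32 columns each and stacking them on top of each other; the actual code may differ. The function names `slice_and_stack` and `strip_layout` are made up for illustration. The second function produces the same (512*16, 32) layout with a single reshape/transpose instead of a Python loop.

```python
import numpy as np

BEAT_PIXELS = 32  # assumption: one 256-bit stream word carries 32 uint8 pixels

# Stand-in test image; the real image would come from the application.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(512, 512), dtype=np.uint8)

def slice_and_stack(image, width=BEAT_PIXELS):
    """Assumed current approach: cut into 32-column strips, stack vertically."""
    strips = [image[:, c:c + width] for c in range(0, image.shape[1], width)]
    return np.vstack(strips)

def strip_layout(image, width=BEAT_PIXELS):
    """Same layout without a Python loop: (512, 512) -> (512*16, 32)."""
    rows, cols = image.shape
    # Split each row into cols // width groups of `width` pixels,
    # move the group axis to the front, then flatten the first two axes.
    return image.reshape(rows, cols // width, width).transpose(1, 0, 2).reshape(-1, width)

stacked = slice_and_stack(img)
reordered = strip_layout(img)

print(reordered.shape)                      # (8192, 32)
print(np.array_equal(stacked, reordered))   # True
```

The final `reshape` after the transpose still makes one copy, so this only removes the Python-level loop, not the reordering itself. The result could then be copied into a DMA buffer of shape (512*16, 32), for example with `buf[:] = reordered`.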