Hi all, I’m working on a hardware accelerator for image convolution. Most of my design is ready to go, but I’ve run into a small issue and I’m curious how you guys would fix it.
Target Device: Pynq Z2
So basically I’ve created an architecture to complete convolutions on hardware using a DMA controller and a custom IP to process the data coming from memory. Example graphic of convolution:
After doing some testing with the DMA, it seems the controller sends each value individually like I want, but it streams the image one row at a time. This is a screenshot where I use the DMA as a FIFO and move values from one memory location to another.
The issue is that my data processing block is expecting my values to come in from the DMA in the following order:
I can do some data pre-processing to make sure the values arrive at my IP in the right order to fill my kernel, but this does not seem ideal. It might be worth mentioning that I was planning to process one color channel at a time, so that's some more pre-processing overhead I might need. Any ideas?
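
For reference, the kind of pre-processing I mean would look roughly like the sketch below. It is plain Python, not tied to any Pynq API. It assumes a 3x3 kernel with stride 1 and no padding, and it assumes the IP wants each window sent in full, with windows sliding left to right and then top to bottom. The names `channel_plane` and `window_order` are made up for illustration.

```python
def channel_plane(rgb_image, channel):
    """Extract one color channel from an image stored as rows of (r, g, b) tuples."""
    return [[pixel[channel] for pixel in row] for row in rgb_image]


def window_order(plane, k=3):
    """Reorder a single-channel image (list of rows) into k x k window order.

    Assumption: stride 1, no padding, windows visited left to right,
    then top to bottom, each window emitted row by row.
    """
    rows, cols = len(plane), len(plane[0])
    stream = []
    for r in range(rows - k + 1):          # top-left row of the window
        for c in range(cols - k + 1):      # top-left column of the window
            for dr in range(k):            # walk the window row by row
                for dc in range(k):
                    stream.append(plane[r + dr][c + dc])
    return stream


# Small usage example: a 4x4 single-channel image holding 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
stream = window_order(image)
print(len(stream))   # 4 windows * 9 values = 36
print(stream[:9])    # first window: [0, 1, 2, 4, 5, 6, 8, 9, 10]
```

The obvious downside is that every interior pixel gets sent up to 9 times, so the DMA moves roughly 9x the data of the original image, on top of the CPU time spent reordering and splitting channels.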