Image Convolution Data Pre-Processing

Hi all, I’m working on a hardware accelerator for image convolution. I have much of my design ready to go but I’m faced with a small issue and I’m curious how you guys would fix it.

Target Device: Pynq Z2

So basically I’ve created an architecture to complete convolutions on hardware using a DMA controller and a custom IP to process the data coming from memory. Example graphic of convolution:
image

After doing some testing with the DMA, it seems the controller sends each value individually like I want, but in row-major order, one full row at a time. This is a screenshot where I use the DMA as a FIFO and move values from one memory location to another.

The issue is that my data processing block is expecting my values to come in from the DMA in the following order:
image

I can do some data pre-processing to make sure the values come into my IP in the right order to fill my kernel, but this does not seem ideal. It might be worth mentioning that I was planning to do one color channel at a time, so that's some more pre-processing overhead I might need. Any ideas?
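For reference, here is a minimal sketch of what that software-side pre-processing could look like. The exact order the IP expects is only shown in the missing figure, so this assumes (and it is only an assumption) that the kernel wants each band of 3 rows delivered column by column, top to bottom. The function names `split_channels` and `reorder_for_kernel` are made up for illustration, and the interleaved R, G, B layout is also assumed.

```python
def split_channels(interleaved, channels=3):
    """Split interleaved pixel data (R, G, B, R, G, B, ...) into one flat list per channel."""
    return [interleaved[c::channels] for c in range(channels)]


def reorder_for_kernel(plane, width, height, k=3):
    """Reorder one row-major channel plane into k-row bands, column by column.

    ASSUMED order (the original figure is not available): for each band of k
    consecutive rows, emit column 0 top to bottom, then column 1, and so on.
    """
    if len(plane) != width * height:
        raise ValueError("plane size does not match width * height")
    out = []
    for top in range(height - k + 1):      # one band per output row (stride 1)
        for col in range(width):           # walk the band column by column
            for dy in range(k):            # k values per column, top to bottom
                out.append(plane[(top + dy) * width + col])
    return out


if __name__ == "__main__":
    # 4x4 single-channel test image with values 0..15 in row-major order.
    img = list(range(16))
    stream = reorder_for_kernel(img, width=4, height=4)
    print(stream[:6])  # [0, 4, 8, 1, 5, 9]
```

Note the cost of this approach: with a stride of 1, most rows end up in k different bands, so the reordered stream is roughly k times larger than the original image.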

Is this an HLS controller? If so, you could modify your code in C.
What about creating a mini IP core that would rotate that matrix?

The controller design is written in RTL, I haven’t quite figured out how the HLS side works yet…

The issue with simply rotating the matrix is that the controller would need to buffer a lot of data because the DMA will send ALL pixels of row 0 for the picture, then ALL of row 1, then ALL of row 2 and so on. So since I’m looking to get a value from the first three rows (and then more after I’ve completed one line) I think I would need to store three rows entirely.
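The buffering described above is usually called a line buffer. Below is a small software model of the idea, not RTL and not anyone's actual design. The name `stream_windows` is hypothetical. One detail worth checking against your own design: for a k x k kernel this model only keeps the last (k - 1) * width + k pixels, so for a 3x3 kernel that is two full rows plus three pixels rather than three full rows.

```python
from collections import deque


def stream_windows(pixels, width, k=3):
    """Yield (row, col, window) for every k x k window of a row-major pixel stream.

    Software model of a line buffer: only the newest (k - 1) * width + k
    pixels are kept, older ones fall out of the deque automatically.
    """
    depth = (k - 1) * width + k
    buf = deque(maxlen=depth)
    for i, p in enumerate(pixels):
        buf.append(p)
        row, col = divmod(i, width)
        # A full window exists once we are at least k - 1 rows and columns in.
        if row >= k - 1 and col >= k - 1:
            newest = len(buf) - 1  # buf[newest] is pixel (row, col)
            window = [
                [buf[newest - (k - 1 - dy) * width - (k - 1 - dx)] for dx in range(k)]
                for dy in range(k)
            ]
            # Report the window by its top-left coordinate.
            yield row - k + 1, col - k + 1, window


if __name__ == "__main__":
    # 4x4 test image with values 0..15, streamed in row-major order like the DMA does.
    for r, c, w in stream_windows(range(16), width=4):
        print(r, c, w)
```

With this kind of buffer inside the IP, the DMA can keep sending plain row-major data and each pixel only has to be transferred once.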

So this is a great occasion to try out how HLS works and, for practice, rotate the incoming data :smiley:
Hmm, I don’t see another way besides preprocessing