PYNQ: PYTHON PRODUCTIVITY

Using DMA for single channel image transfer

Hi, I’ve created a convolution processor, shown in the following block design (BD):

When I use simple 2D arrays in Jupyter notebooks, the convolution completes successfully. But when I use the numpy library to load an image, I think I am doing something wrong, as the DMA transfer stalls and only stops on a “KeyboardInterrupt”. This is the code I am using to put an image into a buffer (I’m processing one data channel at a time at the moment):

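import numpy as np
from PIL import Image
from pynq import allocate
# `dma` is the AXI DMA instance from the loaded overlay (not shown)

# Picture metadata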
image_path = "Images/mario.jpg"
original_image = Image.open(image_path)
width, height = original_image.size
in_buffer = allocate(shape=(height, width, 1),dtype=np.uint32, cacheable=1)
out_buffer = allocate(shape=(height-2, width-2, 3),dtype=np.uint32, cacheable=1)

# Copying original image into buffer
array = abs(np.array(original_image)[:,:,[2]])
in_buffer[:] = array

# Begin hardware processing stream
dma.sendchannel.transfer(in_buffer)
dma.recvchannel.transfer(out_buffer[:,:,[2]])
dma.sendchannel.wait()
dma.recvchannel.wait()

I’ve set up an ILA for my design, and the DMA seems to receive my convolution data but then becomes unready for some reason. Take a look at this starting transaction captured from the ILA output:

1) Convolution Controller receives necessary data and begins processing
2) Convolution puts the first cSum onto the bus, and the first 4 are seemingly received by the DMA
3) The DMA becomes unready and does not come back on until interrupt

Any help would be appreciated! I’ve investigated this as far as I know how, but sadly I am lost.

Edit:
Pynq-Z2 Target Device
I’ve attached the .bit
PL_Convolution.bit (3.9 MB)

Can you try swapping the sendchannel and recvchannel transfers and re-running the ILA? I suspect the halt you are seeing is caused by the time it takes for the DMA to be programmed after data has started arriving. Usually these DMA issues are the result of a late or missing tlast.
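Swapping the order means arming the receive channel before the send channel starts pushing data, so the DMA is already programmed when the first output words arrive. A minimal sketch of that ordering follows; `run_transfer` is a hypothetical helper name, and `dma`, `in_buffer` and `out_buffer` are assumed to be the objects from the original post.

```python
def run_transfer(dma, in_buffer, out_buffer):
    """Arm the receive channel first, then start sending (hypothetical helper)."""
    # Program the receive (S2MM) side before any data is produced.
    dma.recvchannel.transfer(out_buffer)
    # Only now start streaming the input into the convolution core.
    dma.sendchannel.transfer(in_buffer)
    # Block until both directions report completion.
    dma.sendchannel.wait()
    dma.recvchannel.wait()
```

If the receive side still stalls with this ordering, that points back at tlast rather than at programming latency.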

I’d also try a buffer that doesn’t need to be sliced. We don’t have any support for strided DMA, so even if the transfer completes, the data is unlikely to be in the form you want.
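To see why the sliced buffer is a problem, it helps to check what NumPy returns for each kind of index. The sketch below uses a plain NumPy array as a stand-in for the PYNQ buffer (an assumption; a buffer returned by `allocate` may differ in detail). Indexing with a list such as `[2]` produces a separate copy, and indexing with a plain integer produces a non-contiguous strided view. Neither is a contiguous window onto the original memory.

```python
import numpy as np

# Stand-in for out_buffer; the shape is chosen only for illustration.
full = np.zeros((4, 5, 3), dtype=np.uint32)

fancy = full[:, :, [2]]   # list index -> advanced indexing -> copy
plain = full[:, :, 2]     # integer index -> strided view

print(np.shares_memory(full, fancy))   # False: writes never reach `full`
print(np.shares_memory(full, plain))   # True, but...
print(plain.flags["C_CONTIGUOUS"])     # False: elements are 12 bytes apart
print(plain.strides)                   # (60, 12)

# A dedicated single-channel array is contiguous.
single = np.zeros((4, 5), dtype=np.uint32)
print(single.flags["C_CONTIGUOUS"])    # True
```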

Peter

Hi Peter,
So it turns out your idea of using buffers that I don’t slice was the way to go. Since I’m only doing a single channel at a time, I just queue up three transfers like this and then combine them into one image later.

in_buffer[:] = np.array(original_image)[:,:,[0]]
dma.sendchannel.transfer(in_buffer)
dma.recvchannel.transfer(out_buffer0)
dma.sendchannel.wait()
dma.recvchannel.wait()

in_buffer[:] = np.array(original_image)[:,:,[1]]
dma.sendchannel.transfer(in_buffer)
dma.recvchannel.transfer(out_buffer1)
dma.sendchannel.wait()
dma.recvchannel.wait()

in_buffer[:] = np.array(original_image)[:,:,[2]]
dma.sendchannel.transfer(in_buffer)
dma.recvchannel.transfer(out_buffer2)
dma.sendchannel.wait()
dma.recvchannel.wait()

Thank you for your help!
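For readers following along, the three per-channel transfers and the final merge can also be written as a loop. This is a sketch under assumptions: `convolve_channels` and the `run_channel` callback are hypothetical names, `run_channel` stands in for the DMA send/receive round trip shown above, and each output is assumed to be a 2D array of shape (height-2, width-2).

```python
import numpy as np

def convolve_channels(image, run_channel):
    """Process each colour channel separately, then merge the results.

    image:       (H, W, 3) array.
    run_channel: callable taking a contiguous (H, W) uint32 array and
                 returning the processed 2D array (hypothetical stand-in
                 for the DMA round trip).
    """
    outputs = []
    for c in range(image.shape[2]):
        # Copy one channel into its own contiguous array.
        channel = np.ascontiguousarray(image[:, :, c], dtype=np.uint32)
        # Copy the result out so the next pass cannot overwrite it.
        outputs.append(np.array(run_channel(channel)))
    # Stack the per-channel results along a new last axis.
    return np.stack(outputs, axis=-1)

# Software stand-in so the sketch runs without hardware: drop the
# 1-pixel border, matching the (height-2, width-2) output size above.
def fake_hw(channel):
    return channel[1:-1, 1:-1]

img = np.arange(6 * 7 * 3, dtype=np.uint8).reshape(6, 7, 3)
result = convolve_channels(img, fake_hw)
print(result.shape)  # (4, 5, 3)
```

On hardware, `run_channel` would copy the channel into `in_buffer`, run the transfers, and return the receive buffer.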