Hello,
I am trying to use a VDMA to stream inputs of varying sizes to a consumer IP that processes them. Currently, I have to stop the IPs in the design and restart the Python kernel before I can change the video mode of the VDMA's writechannel without corrupting the output of the consumer IP. By corrupting I mean that the values are correct but out of position in the output buffer, as if the whole buffer had wrapped around and shifted by a few locations.
What is the correct way of stopping the VDMA to allow inputs with different dimensions without restarting the kernel?
Set up
- Custom ZU+ SoC
- PYNQ 2.7.0 image
- Xilinx tools 2020.2
- Ubuntu 18.04.6 set up using Vagrant file
VDMA configuration
Example code
import numpy as np
from pynq import Overlay
from pynq.lib.video import VideoMode
# Load overlay
ol = Overlay('my_bitstream.bit')
my_ip = ol.my_ip
vdma = ol.vdma
vdma_in = vdma.writechannel
# Configure VDMA
input_w = 224 # cols
input_h = 224 # rows
vdma_in.mode = VideoMode(input_w, input_h, 24)
# Start IPs
vdma_in.start()
my_ip.start() # self.write(0, 0x81) in the "driver" for the IP
# Send data
input_data = (np.random.rand(input_h, input_w, 3) * 255).astype(np.uint8)
in_frame = vdma_in.newframe()
in_frame[...] = input_data
vdma_in.writeframe(in_frame)
# Get result
out_data = my_ip.results() # returns data from output buffer of my_ip in the "driver"
# Stop IPs
my_ip.stop() # self.write(0, 0) in the "driver" for the IP
vdma_in.stop()
# Delete overlay
del ol
Application
I am trying to benchmark my consumer IP to see how the hardware implementation compares to the software implementation across various input sizes. Some of the benchmarks may take a while, and I would like to be able to run them back to back without stopping the kernel.
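To make the goal concrete, here is a minimal sketch of the back-to-back benchmark loop described above. The names `benchmark` and `software_stub` are hypothetical and are not part of PYNQ; the stub only stands in for whatever would reconfigure the VDMA and run one input of a given size.

```python
import time


def benchmark(process, sizes, iterations=10):
    """Time process(height, width) for each size and return frames per second."""
    results = {}
    for height, width in sizes:
        start = time.perf_counter()
        for _ in range(iterations):
            process(height, width)
        elapsed = time.perf_counter() - start
        # Guard against a zero reading on very coarse timers.
        results[(height, width)] = iterations / elapsed if elapsed > 0 else float("inf")
    return results


def software_stub(height, width):
    """Hypothetical placeholder workload; replace with the real hardware or software call."""
    return sum(range(height * width))


if __name__ == "__main__":
    fps = benchmark(software_stub, [(224, 224), (112, 112)], iterations=5)
    for size, value in fps.items():
        print(size, round(value, 1), "FPS")
```

In the real setup, the callable passed as `process` would be the part that changes the video mode between sizes, which is exactly the step that currently requires a kernel restart.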
Other VDMA questions
- Do I need to create a new input frame using vdma_in.newframe() for every new input I pass to the VDMA? My benchmarked FPS drops by ~30% if I create a new frame for every new input. When running in a loop, I seem to be able to get away with creating a single input frame outside the loop and reusing it. From the pynq.lib.video module documentation, it seems that once I hand ownership of the buffer back to the DMA using vdma_in.writeframe(in_frame), I shouldn't access the buffer (frame) again, as it may be deleted without warning. Even so, I ran this setup for thousands of iterations without any issue.
- When I provide inputs with dimensions that are not multiples of 4, my output becomes inconsistent. Is this a limitation of the VDMA, or is it likely something in my IP?
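For the multiple-of-4 question, one possible workaround is to zero-pad each input up to the next multiple of 4 and crop afterwards. The sketch below uses only NumPy; `pad_to_multiple` and `crop_to_original` are hypothetical helper names, and whether padding is acceptable depends on how the consumer IP treats the extra pixels, which is an assumption here.

```python
import numpy as np


def pad_to_multiple(image, multiple=4):
    """Zero-pad the first two axes (rows, cols) up to the next multiple."""
    height, width = image.shape[:2]
    pad_h = (-height) % multiple
    pad_w = (-width) % multiple
    # Pad only at the bottom and right; leave any channel axis untouched.
    pad_spec = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pad_spec, mode="constant")


def crop_to_original(padded, height, width):
    """Undo pad_to_multiple by slicing back to the original rows and cols."""
    return padded[:height, :width]


if __name__ == "__main__":
    data = (np.random.rand(225, 227, 3) * 255).astype(np.uint8)
    padded = pad_to_multiple(data)
    print(data.shape, "->", padded.shape)
```

If the output is consistent with padded inputs, that would at least narrow the problem down to alignment rather than to the data itself.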
Thank you for your help.
Mario