PYNQ: PYTHON PRODUCTIVITY

Transferring XADC packets of samples through DMA

My goal is to read around 600K XADC samples per second through DMA. I’ve found a few older threads on how to do this with a custom IP, but it baffles me why it wouldn’t be possible with native blocks.

So far, my work is largely based on Adam Taylor’s signal processing tutorial. The basic layout is XADC samples → AXI Subset Converter (to generate TLAST every 256 samples) → AXI DMA. However, this limits the PS to transferring 256 samples per operation, so I then have to gather about 128 operations into a 65KB packet in the PS before further manipulation.
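For reference, the PS-side gathering loop looks roughly like this. This is a sketch, not my exact notebook code: `read_chunk` is a hypothetical callable standing in for the real PYNQ DMA transfer/wait pair, and 16-bit samples are assumed.

```python
import numpy as np

CHUNK = 256             # samples per DMA operation (set by the Subset Converter's TLAST)
N_OPS = 128             # operations gathered per packet
PACKET = CHUNK * N_OPS  # 32768 samples, 65536 bytes at 16 bits/sample

def gather_packet(read_chunk, n_ops=N_OPS, chunk=CHUNK):
    """Collect n_ops DMA transfers of `chunk` samples into one packet.

    `read_chunk` is a placeholder for the real transfer, e.g. a function
    that calls dma.recvchannel.transfer(buf) / dma.recvchannel.wait()
    and returns `chunk` uint16 samples.
    """
    packet = np.empty(n_ops * chunk, dtype=np.uint16)
    for i in range(n_ops):
        packet[i * chunk:(i + 1) * chunk] = read_chunk()
    return packet
```

Each `read_chunk()` call is one DMA receive operation, and it is exactly this per-operation overhead I’d like to eliminate by making the whole 65KB arrive in a single transfer.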

I have been working for about a week trying to find a way to transfer the whole 65KB in one operation, preferably with native IP blocks in Vivado (for now). It seems that if I could just extend the AXI Subset Converter to assert TLAST every 32k samples, it would work perfectly, but I haven’t been able to build a workaround. I’ve tried using the generated TLAST or eoc_out as a clock for a binary counter with a threshold of 128, slicing the ‘128’ bit, tying TLAST high, and replacing the subset converter with a FIFO in packet mode. Most attempts worked on the first PS operation but then failed: A) the DMA errored with not idle/not started, B) subsequent transfers contained only 1 sample, or C) transfer sizes were unpredictable.

It seems that the slower my XADC clock, the worse this bottleneck becomes; I guess the PS is spending more time waiting for the next transfer to be ready. When I reduced my sample rate to 150 KSPS, the PS could only gather about 120 KSPS, with obvious missed samples. I’ve since expanded the FIFOs on the XADC and the DMA, but I still can’t access the memory fast enough to recover every sample.
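For a sense of scale, the raw bandwidth is trivial; the servicing rate per operation is the problem. A quick back-of-the-envelope check (16-bit samples assumed):

```python
SAMPLE_RATE = 600_000      # target samples/s
BYTES_PER_SAMPLE = 2       # 16-bit XADC samples
CHUNK = 256                # samples per DMA operation today
PACKET = 32_768            # samples in the desired single operation

bandwidth = SAMPLE_RATE * BYTES_PER_SAMPLE     # 1.2 MB/s, easy for the DMA itself
ops_per_second = SAMPLE_RATE / CHUNK           # ~2344 transfers/s the PS must service
chunk_period_us = 1e6 * CHUNK / SAMPLE_RATE    # ~427 us between TLASTs today
packet_period_ms = 1e3 * PACKET / SAMPLE_RATE  # ~54.6 ms per 65KB packet
```

So today the PS has to restart the DMA roughly every 0.4 ms, whereas with one big packet it would only need to do so every ~55 ms.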

Thanks


Are you able to write VHDL or HLS? I think you could hack something together from blocks in Vivado, but a small piece of VHDL/HLS may be easier.
The VHDL would just register the input AXI stream signals, pass them to the corresponding AXI stream outputs, and use a counter to set TLAST whenever you want.
Edit: you would need to stall the counter when valid/ready on the output stream channel are not both set. I may have overlooked something else, but it should be straightforward to make this IP.
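A behavioral model of what I mean (in Python rather than VHDL, with a made-up packet length `n`): the counter only advances on cycles where a beat actually transfers, i.e. valid and ready are both set, and TLAST is driven exactly on the nth beat.

```python
def tlast_generator(beats, n=8):
    """Model of the pass-through IP. `beats` is a sequence of
    (valid, ready) pairs, one per clock cycle. Returns the TLAST
    value driven on each cycle. The counter stalls on any cycle
    where the stream does not transfer a beat."""
    count = 0
    tlast = []
    for valid, ready in beats:
        fire = bool(valid) and bool(ready)   # a beat transfers only when both are set
        tlast.append(fire and count == n - 1)
        if fire:
            count = 0 if count == n - 1 else count + 1
    return tlast
```

The key behavior is that a stalled cycle (valid without ready, or vice versa) leaves the count untouched, so TLAST always lands on exactly the nth transferred sample.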

I’ve tried using the generated TLAST or eoc_out as a clock for a binary counter with a threshold at 128

You would need to make sure that the TLAST you derive is only valid for one clock cycle. If you use a counter and take one of its bits, it will stay high for more than one cycle; this may be why that approach, and some of the other methods you described, don’t work. You may just need a utility logic block with an AND gate or similar so TLAST is only set for one clock.
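Concretely, here is a quick model (counter width and threshold are just examples): slicing the ‘128’ bit of a free-running counter stays high for 128 consecutive counts, while a compare-equal gives a single-count pulse.

```python
def bit7(c):
    """Slicing bit 7 (the '128' bit) of the counter value."""
    return (c >> 7) & 1

def eq_pulse(c):
    """Compare-equal against the terminal count: one pulse per 256 counts."""
    return int(c % 256 == 255)

# Over one full 256-count period:
high_bit = sum(bit7(c) for c in range(256))      # bit-slice is high for 128 counts
high_eq = sum(eq_pulse(c) for c in range(256))   # compare is high for exactly 1 count
```

A TLAST that stays high for 128 cycles will make the DMA treat nearly every beat as end-of-packet, which matches the one-sample transfers you saw.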

It seems that the slower my XADC clock, the worse this bottleneck becomes
Are you using the same clock for the XADC as the rest of your PL design, including the DMA? If you set the clock as high as possible, the ratio between sampling the ADC and capturing the data will be higher, which may help reduce data loss.
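As a rough illustration (the clock figures here are examples, not taken from your design), the faster the fabric clock relative to the sample rate, the more cycles the FIFO/DMA path has to drain each sample:

```python
def cycles_per_sample(fabric_hz, sample_rate_hz):
    """Fabric clock cycles available between successive XADC samples."""
    return fabric_hz / sample_rate_hz

slow = cycles_per_sample(25e6, 600e3)    # ~42 cycles of headroom per sample
fast = cycles_per_sample(100e6, 600e3)   # ~167 cycles of headroom per sample
```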
Deeper FIFOs may help but I see you mention you tried this.

Cathal


I’ve written VHDL in school, so I’m sure I could get there with further effort. I appreciate your suggestions!

That’s a good point. You’re absolutely right: I was setting TLAST high for one tick of the input clock, not for one beat of the AXI stream. Duh. From the PS side that would probably look like TLAST stayed high forever.

I realized yesterday that everything was inferring a default clock. Adding a Clock Wizard and running everything else faster was a step in the right direction.
