Reducing RFSoC4x2 MTS Capture Overhead

Hi, I am using the MTS overlay (using PYNQ v3.0.1) on RFSoC4x2 to perform measurements. I am able to successfully perform synchronized captures on the 3 channels, as required in my application. However, performing these captures with calls to ol.internal_capture() takes ~0.5 seconds, whereas the actual capture duration (64 kSamples at 4 GSPS) is only ~16 us. I have an automated script that reconfigures some external hardware between each measurement that takes <1 ms to run, so this 0.5 s capture overhead is the bottleneck in my ability to take measurements quickly.

Is this overhead expected, and if so, where does it come from? Maybe it’s due to saving the capture to memory, and could be reduced by modifying the MTS design to pipeline capturing and saving to several blocks of memory? Maybe it is due to Python/PYNQ and could be eliminated using a more bare metal approach? Or maybe it is due to some limitation of MTS itself?

Any insights into where this extra delay is coming from would be greatly appreciated, as would any suggestions on how to more quickly perform synchronized captures using the RFSoC.

Thank you so much!

Hi @dko

Welcome to the PYNQ community!

I’ve been meaning to look into the MTS repo for a while but haven’t had the time yet, so, unfortunately, I don’t think I’ll be much help on this one.

However, from a quick look at the code I can see a few sleep commands, particularly the one in trigger_capture(), which explicitly waits for 0.5 seconds.

As I said, I’m not familiar with this repo, so I’m not sure why the sleep statements are there, but my guess is that they make sure the capture is complete without the need for interrupts or polling. You could try shortening this sleep value with a bit of experimentation, checking that you’re not missing any data when capturing.
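
If the design exposes some kind of capture-complete flag (an assumption on my part; I have not checked whether the MTS overlay provides one), the fixed sleep could in principle be replaced by polling with a timeout. A minimal sketch, where wait_until and FakeCapture are hypothetical names and FakeCapture only stands in for the real hardware access:

```python
import time


def wait_until(is_done, timeout_s=0.5, poll_interval_s=0.0001):
    """Poll is_done() until it returns True or timeout_s elapses.

    Returns the elapsed time in seconds; raises TimeoutError on timeout.
    """
    start = time.perf_counter()
    while True:
        if is_done():
            return time.perf_counter() - start
        if time.perf_counter() - start >= timeout_s:
            raise TimeoutError("capture did not complete in time")
        time.sleep(poll_interval_s)


class FakeCapture:
    """Hypothetical stand-in for hardware: reports done after duration_s."""

    def __init__(self, duration_s=0.002):
        self._ready_at = time.perf_counter() + duration_s

    def done(self):
        return time.perf_counter() >= self._ready_at


# Returns as soon as the fake capture reports done, well before the 0.5 s cap.
elapsed = wait_until(FakeCapture().done)
print(f"waited {elapsed * 1e3:.2f} ms")
```

The timeout keeps the original 0.5 s as a worst case, so a missing or stuck flag fails loudly instead of hanging the script.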

Additionally, I can see that the internal_capture() function makes an explicit copy of each channel buffer.

With a large number of samples and a high sample rate, this could potentially increase the latency as well. You could try reducing the number of copies by returning the original buffers directly (e.g. triplebuffer[0] = self.adc_capture_chA[:len(triplebuffer[0])]). Again, I’m not familiar with this repo, so I’m not sure of the ramifications of this, so caution here would be advised.
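
To illustrate the trade-off with plain NumPy (PYNQ buffers are NumPy array subclasses): a slice is a view that shares memory with the underlying buffer, so it avoids the copy but also changes when the next capture overwrites that buffer. The array below is only a stand-in for a capture buffer, not the overlay's actual code:

```python
import numpy as np

# Stand-in for one channel's capture buffer (64 kSamples of int16).
capture = np.zeros(65536, dtype=np.int16)
n = 1024

as_copy = np.array(capture[:n])  # allocates new memory and copies n samples
as_view = capture[:n]            # no copy: shares memory with `capture`

copy_shares = np.shares_memory(as_copy, capture)
view_shares = np.shares_memory(as_view, capture)

# Simulate the next capture overwriting the buffer in place.
capture[0] = 123

print(copy_shares, view_shares)   # whether each result aliases the buffer
print(as_copy[0], as_view[0])     # the copy keeps the old sample, the view does not
```

So returning views is only safe if the data is consumed (or copied) before the next capture is triggered.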

If you do find an answer please post back with the solution you found, as I’m sure users would be interested.
