As usual, working on the ZCU208 with the latest version of PYNQ. I’ve gotten fairly far along in my project now, and have reached a point where I need to feed the PL with signals that are calculated on the Python side (or pre-calculated and stored as files) as numerical relativity is not something that’s very easy to do in an FPGA.
My question boils down to the following: What is the timing on loops running in the PS? The Zynq module in the block design is fed by some 99.9MHz clock or so – is this what’s driving the Python? How many cycles of this clock are required to set a memory-mapped register over AXI Lite?
It may be 1.2GHz for the board you are using, but you would need to check this.
The Zynq module in the block design is fed by some 99.9MHz clock
If you check the PS settings, you will see that this clock is the input clock that is used to derive the internal clocks. The PS clock is much higher.
Going back to your question, if you are trying to transfer data using MMIO (AXI Lite read/write), it will be relatively slow, on the order of 100s of ms. Python adds overhead to this. You could speed this up by bypassing PYNQ and using a C/C++ driver, but this will still be relatively slow. This may be OK for sending control data (perhaps infrequently), but not for high-performance data transfer.
For large amounts of data, you would be better moving the data to PS DRAM, and accessing it directly from the PL via the HP ports.
Hm… it doesn’t seem to be exactly what I need. I need to be able to add an offset to a data stream that can be updated at a very consistent rate. I don’t think I need to stream it… at least I hope I don’t. I was hoping to have been done with the PL side of my implementation.
I need to be able to add an offset to a data stream that can be updated at a very consistent rate.
What do you mean by “consistent”?
You could write a new value using MMIO (over AXI Lite) to a register in a loop periodically.
How many cycles of this clock are required to set a memory-mapped register over AXI Lite?
Going back to your original questions, there will be variability in the time between loop iterations. I would not count this in clock cycles, as it is not deterministic and it would be a lot of clock cycles at >1GHz. This could be milliseconds or more depending on what the OS is doing (Ubuntu is not a deterministic/real-time OS). If this variability is OK for your application, then MMIO may be the right way to do this.
I think you need to test and benchmark this yourself.
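One rough way to benchmark this from the Python side is to run a fixed-rate write loop and look at the spread of the intervals between writes. The sketch below is only an illustration under stated assumptions, not PYNQ tooling: `measure_write_jitter` and `write_fn` are hypothetical names, and on the board `write_fn` would wrap a `pynq.MMIO` write at an address specific to your design. It measures timing only as seen by the PS; what the PL actually sees would still need a hardware counter or ILA.

```python
import statistics
import time


def measure_write_jitter(write_fn, period_s=0.125, n_writes=40):
    """Call write_fn(i) every period_s seconds and report interval statistics.

    write_fn is a placeholder for the real register write, e.g. on the board:
        mmio = pynq.MMIO(BASE_ADDR, 0x1000)   # BASE_ADDR is design-specific
        write_fn = lambda v: mmio.write(0x0, v)
    """
    stamps = []
    next_deadline = time.perf_counter()
    for i in range(n_writes):
        # Sleep until an absolute deadline so scheduling errors do not accumulate.
        delay = next_deadline - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        write_fn(i)
        stamps.append(time.perf_counter())  # time just after the write returns
        next_deadline += period_s

    intervals = [b - a for a, b in zip(stamps, stamps[1:])]
    return {
        "mean_s": statistics.mean(intervals),
        "stdev_s": statistics.stdev(intervals),
        "min_s": min(intervals),
        "max_s": max(intervals),
    }


if __name__ == "__main__":
    # Stand-in write so the sketch runs off-board; replace with the MMIO write.
    written = []
    print(measure_write_jitter(written.append, period_s=0.01, n_writes=20))
```

The default `period_s=0.125` corresponds to an 8Hz update. The gap between `min_s` and `max_s` over a long run gives a feel for how much the OS scheduler moves the writes around.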
By consistent, I mean that the offsets themselves constitute a phase modulation containing relevant signals. I could get away with an update rate of 8Hz, I think, as long as that 8Hz is consistent relative to the streaming data clock (in my case, a 128MHz that is distributed from one of the RFDC DAC tiles).
MMIO is what I’ve been using, but I’ve been thinking it over more recently and it really needs to be deterministic.
How would you test and benchmark this? Some ILA implementation and a flag raised every time an MMIO command returns BVALID or something? Then checking the time between them? It’s too many samples for an ILA, and too fast for a typical oscilloscope to measure precisely…
As Cathal pointed out above, using MMIO would make this signal non-deterministic, which, based on your comment, is not what you’re after.
So let’s say you need a deterministic 8Hz update based on an MMIO-loaded value. I would generate a “pulse” signal at 8Hz from the 128MHz clock, and on each pulse fetch a value from a buffer (e.g. a FIFO) loaded by your MMIO object. Assuming the MMIO accesses are faster than this 8Hz, you will need to check whether the buffer is about to overflow before loading each new value over MMIO.
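The PS side of this scheme could look roughly like the sketch below. Everything about the register map is an assumption for illustration: `REG_FIFO_DATA`, `REG_FIFO_COUNT` and `FIFO_DEPTH` are hypothetical and would have to match whatever your PL logic exposes. `FakeFifoMMIO` is only a software stand-in so the loader can be tried off-board; on the board, a `pynq.MMIO` object provides the same `read`/`write` calls.

```python
STREAM_CLK_HZ = 128_000_000   # streaming clock from the thread
UPDATE_HZ = 8                 # desired update rate from the thread

# Clock cycles between pulses for a PL counter: 128e6 / 8 = 16,000,000.
CYCLES_PER_UPDATE = STREAM_CLK_HZ // UPDATE_HZ

# Hypothetical register map for a custom AXI Lite FIFO front end.
REG_FIFO_DATA = 0x00   # write: push one offset value into the FIFO
REG_FIFO_COUNT = 0x04  # read: number of entries currently in the FIFO
FIFO_DEPTH = 16        # assumed FIFO depth


def push_offsets(mmio, values, depth=FIFO_DEPTH, poll=None):
    """Push values into the FIFO, waiting whenever it is full.

    mmio only needs read(offset) and write(offset, value).
    poll is an optional callable run while waiting (e.g. a short sleep).
    """
    for value in values:
        while mmio.read(REG_FIFO_COUNT) >= depth:
            if poll is not None:
                poll()
        mmio.write(REG_FIFO_DATA, value)


class FakeFifoMMIO:
    """Software stand-in for the PL side, for off-board testing only."""

    def __init__(self, depth):
        self.depth = depth
        self.fifo = []
        self.consumed = []

    def read(self, offset):
        assert offset == REG_FIFO_COUNT
        return len(self.fifo)

    def write(self, offset, value):
        assert offset == REG_FIFO_DATA
        if len(self.fifo) >= self.depth:
            raise OverflowError("FIFO overflow")
        self.fifo.append(value)

    def pulse(self):
        """Model one 8Hz pulse: the PL pops the oldest entry."""
        if self.fifo:
            self.consumed.append(self.fifo.pop(0))


if __name__ == "__main__":
    fake = FakeFifoMMIO(depth=4)
    # Here each wait iteration stands in for one 8Hz pulse draining the FIFO.
    push_offsets(fake, range(10), depth=4, poll=fake.pulse)
    print(CYCLES_PER_UPDATE, fake.consumed, fake.fifo)
```

With this split, the jitter of the Python loop only affects how full the FIFO is, not when each value is applied, since the PL pops on its own counter. The FIFO must also never run empty, so it should be pre-filled deeply enough to ride out the worst-case stall on the PS side.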