Timed control signals from PS to PL

redson · October 9, 2024, 2:14pm

Hello all,

As usual, working on the ZCU208 with the latest version of PYNQ. I’ve gotten fairly far along in my project now, and have reached a point where I need to feed the PL with signals that are calculated on the Python side (or pre-calculated and stored as files) as numerical relativity is not something that’s very easy to do in an FPGA.
My question boils down to the following: What is the timing on loops running in the PS? The Zynq module in the block design is fed by some 99.9MHz clock or so – is this what’s driving the Python? How many cycles of this clock are required to set a memory-mapped register over AXI Lite?

cathalmccabe · October 9, 2024, 3:19pm

You can check the PS clock:

from pynq import Clocks

print(f"CPU: {Clocks.cpu_mhz:.6f}MHz")

It may be 1.2GHz for the board you are using, but you would need to check this.

The Zynq module in the block design is fed by some 99.9MHz clock
If you check the PS settings, you will see that this clock is the input clock that is used to derive the internal clocks. The PS clock is much higher.

Going back to your question: if you are trying to transfer data using MMIO (AXI Lite reads/writes), it will be relatively slow, on the order of 100s of ms. Python adds overhead to this. You could speed it up by bypassing PYNQ and using a C/C++ driver, but it will still be relatively slow. This may be OK for sending control data (perhaps infrequently), but not for high-performance data transfer.

For large amounts of data, you would be better off moving the data to PS DRAM and accessing it directly from the PL via the HP ports.

I made some tutorials here that may be useful:
https://github.com/cathalmccabe/PYNQ_tutorials
You should check DMA, and HLS AXI Master. These are ways to access the DRAM from the PL.

Cathal

redson · October 9, 2024, 3:41pm

Hm… it doesn’t seem to be exactly what I need. I need to be able to add an offset to a data stream that can be updated at a very consistent rate. I don’t think I need to stream it… at least I hope I don’t. I was hoping to have been done with the PL side of my implementation.

cathalmccabe · October 9, 2024, 8:36pm

I need to be able to add an offset to a data stream that can be updated at a very consistent rate.
What do you mean by “consistent”?

You could write a new value using MMIO (over AXI Lite) to a register in a loop periodically.

How many cycles of this clock are required to set a memory-mapped register over AXI Lite?

Going back to your original questions, there will be variability in the time between loop iterations. I would not count this in clock cycles, as it is not deterministic and it would be a lot of clock cycles at >1GHz: it could be milliseconds or more depending on what the OS is doing (Ubuntu is not a deterministic/real-time OS). If this variability is OK for your application, then MMIO may be the right way to do this.

I think you need to test and benchmark this yourself.
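One way to get a first look at the software-side variability is to timestamp each write call and look at the spread of the intervals. This is only a rough sketch: `benchmark_writes` and `summarize` are hypothetical helper names, and the `write` argument is a stand-in for whatever actually sets the register (on the board it could be something like `lambda v: mmio.write(0x0, v)` on a `pynq.MMIO` object, where the offset is an assumption). It measures when Python returns from the call, not when the value arrives in the PL.

```python
import statistics
import time


def benchmark_writes(write, n=1000, period_s=0.0):
    """Call write(i) n times; return the intervals between calls in seconds."""
    stamps = []
    for i in range(n):
        write(i)
        stamps.append(time.perf_counter_ns())
        if period_s:
            time.sleep(period_s)  # optional pacing between writes
    return [(b - a) / 1e9 for a, b in zip(stamps, stamps[1:])]


def summarize(intervals):
    """Reduce a list of intervals to min/mean/max/standard deviation."""
    return {
        "min": min(intervals),
        "mean": statistics.fmean(intervals),
        "max": max(intervals),
        "stdev": statistics.pstdev(intervals),
    }


# Stand-in target so the sketch runs without hardware (assumption, not a real register).
sink = []
stats = summarize(benchmark_writes(sink.append, n=1000))
print({name: f"{value * 1e6:.2f} us" for name, value in stats.items()})
```

The gap between `min` and `max` gives a feel for the jitter the OS introduces; a long run under realistic system load is more telling than a short one.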

Cathal

redson · October 9, 2024, 9:29pm

By consistent, I mean that the offsets themselves constitute a phase modulation containing relevant signals. I could get away with an update rate of 8Hz, I think, as long as that 8Hz is consistent relative to the streaming data clock (in my case, a 128MHz clock that is distributed from one of the RFDC DAC tiles).
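For reference, the relationship between the two rates is simple arithmetic. The snippet below assumes the clock is exactly 128 MHz and the update rate exactly 8 Hz; the variable names are only illustrative.

```python
STREAM_CLK_HZ = 128_000_000  # assumed exact streaming clock frequency
UPDATE_HZ = 8                # assumed exact offset update rate

# Number of streaming-clock cycles between consecutive offset updates.
cycles_per_update, remainder = divmod(STREAM_CLK_HZ, UPDATE_HZ)

# Width of a PL counter that counts 0 .. cycles_per_update - 1.
counter_bits = (cycles_per_update - 1).bit_length()

print(cycles_per_update, remainder, counter_bits)  # 16000000 0 24
```

So an update that is locked to the data clock means one new offset every 16,000,000 cycles, which a 24-bit counter in the PL can track.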

MMIO is what I’ve been using, but I’ve been thinking it over more recently and it really needs to be deterministic.

How would you test and benchmark this? Some ILA implementation and a flag raised every time an MMIO command returns BVALID or something? Then checking the time between them? It’s too many samples for an ILA, and too fast for a typical oscilloscope to measure precisely…

patocarr · October 16, 2024, 12:32am

Hi @redson,

As Cathal pointed out above, using MMIO would make this signal non-deterministic, whereas, based on your comment, determinism is what you’re after.
So let’s say you need a deterministic 8Hz update based on an MMIO-loaded value. I would generate a “pulse” signal at 8Hz from the 128MHz clock, and on each pulse fetch a value from a buffer (i.e. a FIFO) loaded by your MMIO object. Assuming the MMIO accesses are faster than this 8Hz rate, you will need to check whether the buffer is about to overflow before loading each new value over MMIO.
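The idea can be modelled in a few lines of Python to see how the pieces interact. This is a behavioural sketch only, not HDL and not anyone's actual design: `OffsetFifoModel` and its methods are hypothetical names, the FIFO depth is arbitrary, and holding the last offset on underflow is an assumed policy.

```python
from collections import deque


class OffsetFifoModel:
    """Toy model: a PL counter pops one offset from a FIFO every N clock cycles."""

    def __init__(self, depth, cycles_per_update):
        self.depth = depth
        self.cycles_per_update = cycles_per_update
        self.fifo = deque()
        self.counter = 0     # models the free-running PL cycle counter
        self.offset = 0      # offset currently applied to the data stream
        self.underflows = 0  # pulses that found the FIFO empty

    def full(self):
        return len(self.fifo) >= self.depth

    def try_push(self, value):
        """PS side: check for space first, then load (as an MMIO write would)."""
        if self.full():
            return False
        self.fifo.append(value)
        return True

    def advance(self, cycles):
        """PL side: advance the clock and pop one value per update pulse."""
        self.counter += cycles
        pulses, self.counter = divmod(self.counter, self.cycles_per_update)
        for _ in range(pulses):
            if self.fifo:
                self.offset = self.fifo.popleft()
            else:
                self.underflows += 1  # assumed policy: hold the last offset
        return pulses


model = OffsetFifoModel(depth=4, cycles_per_update=16_000_000)
accepted = [model.try_push(v) for v in range(6)]  # last two rejected: FIFO full
pulses = model.advance(128_000_000)               # one second at 128 MHz
print(accepted, pulses, model.offset, model.underflows)
```

The point of the structure is that the update instants are set entirely by the PL counter; the PS only has to keep the FIFO from running empty or overflowing, so its timing jitter no longer reaches the signal.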

redson · May 27, 2025, 8:10am

Hi there,

Wish I’d logged in and read this comment 7 months ago, because you’re right on the money with what needs to happen. I’ve implemented it so that an MMIO read checks the FIFO’s programmable empty flag, with the threshold set so that actual empty is still a couple of seconds’ worth of data away. Then the buffer gets refilled via DMA.
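In rough outline, the refill logic looks something like the sketch below. The names here (`refill_if_low`, `prog_empty_mask`, the bit position of the flag) are placeholders rather than the real register map, and the hardware accesses are passed in as callables so the logic can be tried without the board.

```python
def refill_if_low(read_status, send_block, next_block, prog_empty_mask=0x1):
    """Send one block if the programmable-empty flag is set; return True if sent."""
    if read_status() & prog_empty_mask:
        send_block(next_block())
        return True
    return False


# Stand-ins for the hardware so the sketch runs anywhere (all hypothetical).
status_values = iter([0x0, 0x1, 0x0, 0x1])  # successive reads of the status register
blocks = iter([[1, 2, 3], [4, 5, 6]])       # precomputed offset blocks
sent = []                                   # records what a DMA transfer would send

results = [
    refill_if_low(lambda: next(status_values), sent.append, lambda: next(blocks))
    for _ in range(4)
]
print(results, sent)
```

On the board, `read_status` would wrap an `mmio.read(...)` of the status register, and `send_block` would copy the block into a buffer from `pynq.allocate` and start a transfer with `dma.sendchannel.transfer(buf)` followed by `dma.sendchannel.wait()`. Because the flag trips a couple of seconds before the FIFO is actually empty, the polling loop does not need tight timing.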
