PYNQ: PYTHON PRODUCTIVITY FOR ZYNQ

PYNQ to BRAM - weird BRAM addressing

Hi all,

I have an issue when loading a certain data pattern in my PL dual port BRAM, from where I then stream the loaded data to my DAC (RFDC).

Before I start - here are the settings I did in my Vivado Block Design:

  • Board: ZCU111

  • BRAM Controller Settings:
    Data width: 32 bit
    Memory Depth (auto): 1024

  • BRAM:
    Mode: BRAM Controller,
    Type: True Dual Port RAM
    Port A & B: Write/Read width is 32 bit, Write/Read Depth 1024

  • Binary Counter IP:
    Final Count Value: 0x1000 (hex), corresponding to 4096 (dec)
    Increment Value (hex): 1

Well, I would like to generate a 1024-word-long data sequence in Python (PYNQ), which is then stored in my BRAM (hence the BRAM depth of 1024). The data width is 32 bit, and the data from PYNQ is loaded into the BRAM via Port A. From Port B, I would then like to stream the data pattern in cyclic repetition to my RFDC DAC. The address is driven by a binary counter whose final value is set to 4096.
And here is the issue! I can only see my whole data pattern on my oscilloscope if I set the counter to count up to 4096 instead of 1024 (which is what I would normally expect, since my BRAM depth is only 1024…).
If I set the final count to 1024, then I observe only the first quarter of my data pattern on the oscilloscope.
Moreover, if I compare the repetition rate of the pattern shown on the scope with the clock rate used, the pattern rate is a factor of 4096 smaller than my clock.
Thus it seems that my BRAM is actually 4096 deep and holds my pattern…
I cannot explain this behaviour, since in PYNQ I followed all the instructions from other tutorials (e.g. this tutorial https://www.hackster.io/adam-taylor/pynq-controlled-neopixel-led-cube-92a1c1).

I have attached a screenshot of my Jupyter Notebook code where I write my data to the BRAM.
Please note that I used the range command up to 1024 only, which corresponds to the 1024 data words I generated. The address increments in steps of 4 (since this is a 32-bit system). At the beginning I set the memory size to 4096, since the address editor in Vivado tells me that the address range is 4K (1024 words x 4 bytes). Setting mem_size to 4096 allows me to write a 1024-word array into the BRAM. I cannot see any issue in my code here…
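Since the screenshot itself isn't reproduced here, the write loop can be sketched as a pure-Python model (a bytearray standing in for the MMIO window; the pattern values below are placeholders, not my actual data):

```python
# Pure-Python model of the write loop: a bytearray stands in for the
# 4 KB MMIO window of the AXI BRAM controller (1024 words * 4 bytes).
import struct

MEM_SIZE = 4096                    # bytes, as shown in the Vivado address editor
bram = bytearray(MEM_SIZE)         # stand-in for pynq.MMIO(base_addr, 4096)

# Placeholder pattern -- 1024 words of 32 bit each
pattern = [(i * 0x10001) & 0xFFFFFFFF for i in range(1024)]

for i, word in enumerate(pattern):
    byte_offset = i * 4            # 32-bit word -> 4-byte address stride
    struct.pack_into("<I", bram, byte_offset, word)  # like mmio.write(i*4, word)

# The last word sits at byte offset 1023*4 = 4092, the top of the 4 KB window.
```

On the real board the `struct.pack_into` call would be `mmio.write(i * 4, word)` on a `pynq.MMIO` object mapped at the BRAM controller's base address.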

I am very grateful for any advice and help!

Kind regards
Patrick

Are you connecting the address generator to a second AXI port on the BRAM controller or to the BRAM_B port on the block memory directly? Are you able to supply an image of the block diagram?

Peter

Hi Peter,

Thanks for the response!

Here is my Vivado block design:
(As a newcomer I was only allowed to include one picture in my post)

I may have just made some further progress…
I set the increment of my counter to 4 instead of 1.
Thus, I now count to 4096 in steps of 4 (1024 steps in total).
Doing so shows me the whole pattern with the correct pattern repetition rate (1/1024 * clock rate).

I would explain this behaviour as follows:
Normally, the address of a BRAM within the PL is incremented by 1.
However, when using the PS & PYNQ, we need to increment the address in steps of 4.
Therefore, the address range is 4096 bytes. I did this when I wrote the data into my BRAM (see my Jupyter Notebook).
I then assumed that I could read from the BRAM with address increments of 1.
However, I now assume that the address for reading also needs to be incremented by 4, since we did the same when writing into the BRAM.
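In numbers, my assumption works out like this (a quick sanity check in plain Python, independent of the hardware):

```python
# Byte addresses visited when counting 0..4095 in steps of 4:
addrs = list(range(0, 4096, 4))
assert len(addrs) == 1024          # exactly one address per stored word
assert addrs[-1] == 1023 * 4       # last word at byte offset 4092

# Counting 0..1023 in steps of 1 only reaches word index 1023 // 4 = 255,
# i.e. the first quarter of the pattern -- matching what I saw on the scope.
assert 1023 // 4 == 255
```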

Am I right in making this assumption?

Kind regards
Patrick

I think you are on the right track. I’ve been working my way through the user guide for the Block Memory Generator, and it looks like it uses pseudo-byte-level addresses when 32-bit addressing is enabled (as is required when using the AXI BRAM controller). A quick search turned up this reddit thread.

I think the solution is to pad out the two lower address bits with zeros and attach your counter to the higher-order bits - you need the 32-bit addressing for the AXI connection.
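In other words (modelled here in Python rather than HDL, just to show the arithmetic), the counter counts words 0..1023 while the byte address advances in steps of 4:

```python
# Hypothetical mapping from a word counter (0..1023) to the AXI byte address:
def counter_to_axi_addr(count):
    return count << 2              # two zero LSBs = byte stride of 4

assert counter_to_axi_addr(0) == 0
assert counter_to_axi_addr(1) == 4        # next 32-bit word
assert counter_to_axi_addr(1023) == 4092  # top of the 4 KB range
```

In the block design, this corresponds to tying the two lowest address bits of the BRAM port to constant zero and connecting the counter outputs to the bits above them.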

Peter

For your notebook cell 11, it looks like the final data word you wrote is:
mmio.write(1022*4, 0x80018001)

And if you read mmio.read(1023*4), that is outside the range you just wrote. Something else (or your previous code) might have written those locations.

And remember that the address you provide is a byte address. For this to work correctly, the address should always be a multiple of 4; otherwise I am not sure how the system handles unaligned writes.
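To illustrate the alignment rule, a tiny (hypothetical) helper showing which byte addresses land on 32-bit word boundaries:

```python
# Word-aligned byte addresses for a 32-bit wide BRAM window:
def is_word_aligned(addr):
    return addr % 4 == 0

assert is_word_aligned(4088)       # word index 1022
assert is_word_aligned(4092)       # word index 1023, top of the 4 KB range
assert not is_word_aligned(1026)   # unaligned -- behaviour is not well defined
```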