Query on Pynq DMA

MugundhanV · May 16, 2023, 8:03am

Hi,

I would like to understand how data from the DMA gets stored in the DDR RAM in PYNQ. I am trying to save ADC data in bursts (using an RFSoC). Unlike the base overlay example, where the trigger comes from user space, I trigger the ADC to dump 256x8 samples every second. On the PYNQ side, from the Jupyter notebook, I allocate a buffer (sized for the above number of samples), initiate a receive channel transfer, then wait. Code snippet below:

import time
import numpy as np
from pynq import allocate

# cfg and axi_dma0 are assumed to come from the loaded overlay.
transfersize = 256
for i in range(10):
    # Enable the IP so it automatically generates 256x8 samples from the RFSoC.
    cfg.write(0x00, 1)
    buffer_re = allocate(shape=(transfersize * 8,), dtype=np.int16)
    axi_dma0.recvchannel.transfer(buffer_re)
    axi_dma0.recvchannel.wait()
    cfg.write(0x00, 0)
    print(buffer_re)
    time.sleep(1)
    buffer_re.freebuffer()

This works as expected.

The IP that generates the AXI signals can also generate a trigger once per second on its own after I enable it. So I tried this:

transfersize = 256
buffer_re = allocate(shape=(transfersize * 8,), dtype=np.int16)
for i in range(10):
    if i == 0:
        axi_dma0.recvchannel.transfer(buffer_re)
        cfg.write(0x00, 1)
        axi_dma0.recvchannel.wait()
        print(buffer_re)
        time.sleep(1)
    else:
        axi_dma0.recvchannel.transfer(buffer_re)
        axi_dma0.recvchannel.wait()
        print(buffer_re)
    time.sleep(1)
cfg.write(0x00, 0)

This just prints the same values 10 times. The same IP works fine in both modes when I replace the DMA with BRAM. So I had a few doubts:

After the first 256 samples are written, are the next 256 written to the next memory location? If so, should we define an 8 MB buffer directly (as 8 MB is the max buffer length for the DMA) and read the data into user space once 8 MB has been written, with a FIFO mediating the process?
If I want to have a location in the RAM fixed for data from a particular operation, I can add a BRAM and read as AXI stream from that to the DMA controller. But is there a cleaner way for this?

Sincerely,

Mugundhan

marioruiz · May 17, 2023, 7:46am

Hi @MugundhanV,

Welcome to the PYNQ community.

I think that in your last code snippet there’s an extra time.sleep(1) inside the if, which may cause problems.

After the first 256 samples are written, are the next 256 written to the next memory location? If so, should we define an 8 MB buffer directly (as 8 MB is the max buffer length for the DMA) and read the data into user space once 8 MB has been written, with a FIFO mediating the process?

No, the buffer should be overwritten as it is being done in the first code snippet you show.
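A minimal sketch of what this overwrite behaviour means for the notebook code: if a single buffer is reused for every transfer, each frame has to be copied out before the next transfer starts. FakeRecvChannel and capture_frames are hypothetical names introduced here so the example runs without hardware; on a board the channel would be axi_dma0.recvchannel and the buffer would come from pynq.allocate.

```python
import numpy as np


class FakeRecvChannel:
    """Hypothetical stand-in for axi_dma0.recvchannel (no hardware needed)."""

    def __init__(self):
        self._frame = 0
        self._target = None

    def transfer(self, buffer):
        # Remember the destination; a real DMA would start writing into it.
        self._target = buffer

    def wait(self):
        # Pretend the PL delivered a new frame into the same buffer.
        self._target[:] = self._frame
        self._frame += 1


def capture_frames(recv_channel, buffer, n_frames):
    """Reuse one buffer for every transfer and copy each frame out
    before the next transfer overwrites it."""
    frames = np.empty((n_frames, buffer.size), dtype=buffer.dtype)
    for i in range(n_frames):
        recv_channel.transfer(buffer)
        recv_channel.wait()
        frames[i] = buffer  # copies the data out of the reused buffer
    return frames


# Assumption: on hardware this would be allocate(shape=(256 * 8,), dtype=np.int16).
buffer = np.zeros(256 * 8, dtype=np.int16)
frames = capture_frames(FakeRecvChannel(), buffer, n_frames=3)
print(frames[:, 0])
```

After the loop, the reused buffer holds only the last frame, while frames keeps one row per transfer.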

If I want to have a location in the RAM fixed for data from a particular operation, I can add a BRAM and read as AXI stream from that to the DMA controller. But is there a cleaner way for this?

I don’t fully understand this question. Can you please clarify?

Mario

MugundhanV · May 17, 2023, 8:32am

Hi Mario,

Thanks for your reply. I think your first answer also covers my second question. Let me try this again and check how it works!

To check my understanding: if I transmit 256 samples (assuming tvalid and tlast are asserted correctly), will the buffer be overwritten with the new samples?

If I hold tvalid high all the time (to stream data), assert tlast once every 2k samples (to packetize), and have a buffer of, say, 2k samples in the DDR, will this buffer be overwritten every time a new packet comes in?

Sincerely,
Mugundhan

marioruiz · May 17, 2023, 9:00am

To check my understanding: if I transmit 256 samples (assuming tvalid and tlast are asserted correctly), will the buffer be overwritten with the new samples?

You need to assert tlast in the last transaction of the 256 samples.

If I hold tvalid high all the time (to stream data), assert tlast once every 2k samples (to packetize), and have a buffer of, say, 2k samples in the DDR, will this buffer be overwritten every time a new packet comes in?

You should only assert tvalid when you want to transmit data. If you program the DMA to transfer 2K samples to the same physical address, the buffer should be overwritten.
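The tlast rule above can be illustrated with a small software model. This is only a sketch of the packet framing, not HDL and not part of PYNQ; axi_stream_beats is a hypothetical helper that marks the final beat of each packet.

```python
def axi_stream_beats(samples, packet_len):
    """Yield (tdata, tlast) pairs, with tlast set only on the
    last beat of each packet_len-sample packet."""
    for i, sample in enumerate(samples):
        tlast = (i + 1) % packet_len == 0
        yield sample, tlast


# Two back-to-back packets of 256 samples each.
beats = list(axi_stream_beats(range(512), packet_len=256))
tlast_positions = [i for i, (_, last) in enumerate(beats) if last]
print(tlast_positions)
```

In this model, a DMA transfer programmed for 256 samples would line up with exactly one packet, since tlast lands on the 256th beat.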

Mario

MugundhanV · May 22, 2023, 11:55am

Hi Mario,

I have made further progress debugging this issue. I am now able to get new data on every transfer. I have modified the firmware: it is a spectrometer outputting 4 channels of 256-point, 32-bit spectra. I wanted to transfer 256*32 bit samples, but if I allocate the buffer with allocate(shape=(1024,),dtype=np.uint32), I get an internal error. When I increase the buffer size, the internal error disappears, but however large I make the shape, the transfer length saturates at 4688 bytes. While I can work with this by slicing the buffer to get the samples I want, I would like to know why this happens. My DMA configuration is as follows:

Here is my address editor:

This is the same config as in the base design. I see that the buffer length width is set to 26, which means I can have a buffer length of 2**26?
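As a rough check, assuming the 26 refers to the AXI DMA "Width of Buffer Length Register" setting, the length field counts bytes and the largest value a 26-bit field can hold is 2**26 - 1. The variable names below are illustrative only.

```python
LENGTH_WIDTH_BITS = 26  # value quoted in the post

# Largest byte count that fits in a 26-bit length field.
max_transfer_bytes = 2**LENGTH_WIDTH_BITS - 1

# The same limit expressed in 32-bit (4-byte) samples.
max_uint32_samples = max_transfer_bytes // 4

print(max_transfer_bytes, max_uint32_samples)
```

Under that assumption the limit is just under 64 MiB per transfer, so a 1024-sample or 2048-sample uint32 buffer is far below it.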

Sincerely,
Mugundhan

cathalmccabe · May 22, 2023, 12:26pm

Can you post the error?

Cathal

MugundhanV · May 22, 2023, 12:43pm

Hi Cathal,

spec_buffer = allocate(shape=(1024,), dtype=np.uint32)
spec_dma.recvchannel.transfer(spec_buffer)
spec_dma.register_map

results in

RegisterMap {
  MM2S_DMACR = Register(RS=0, Reset=0, Keyhole=0, Cyclic_BD_Enable=0, IOC_IrqEn=0, Dly_IrqEn=0, Err_IrqEn=0, IRQThreshold=0, IRQDelay=0),
  MM2S_DMASR = Register(Halted=0, Idle=0, SGIncld=0, DMAIntErr=0, DMASlvErr=0, DMADecErr=0, SGIntErr=0, SGSlvErr=0, SGDecErr=0, IOC_Irq=0, Dly_Irq=0, Err_Irq=0, IRQThresholdSts=0, IRQDelaySts=0),
  MM2S_CURDESC = Register(Current_Descriptor_Pointer=0),
  MM2S_CURDESC_MSB = Register(Current_Descriptor_Pointer=0),
  MM2S_TAILDESC = Register(Tail_Descriptor_Pointer=0),
  MM2S_TAILDESC_MSB = Register(Tail_Descriptor_Pointer=0),
  MM2S_SA = Register(Source_Address=0),
  MM2S_SA_MSB = Register(Source_Address=0),
  MM2S_LENGTH = Register(Length=0),
  SG_CTL = Register(SG_CACHE=0, SG_USER=0),
  S2MM_DMACR = Register(RS=1, Reset=0, Keyhole=0, Cyclic_BD_Enable=0, IOC_IrqEn=1, Dly_IrqEn=0, Err_IrqEn=0, IRQThreshold=1, IRQDelay=0),
  S2MM_DMASR = Register(Halted=0, Idle=0, SGIncld=0, DMAIntErr=0, DMASlvErr=0, DMADecErr=0, SGIntErr=0, SGSlvErr=0, SGDecErr=0, IOC_Irq=0, Dly_Irq=0, Err_Irq=0, IRQThresholdSts=0, IRQDelaySts=0),
  S2MM_CURDESC = Register(Current_Descriptor_Pointer=0),
  S2MM_CURDESC_MSB = Register(Current_Descriptor_Pointer=0),
  S2MM_TAILDESC = Register(Tail_Descriptor_Pointer=0),
  S2MM_TAILDESC_MSB = Register(Tail_Descriptor_Pointer=0),
  S2MM_DA = Register(Destination_Address=25739264),
  S2MM_DA_MSB = Register(Destination_Address=0),
  S2MM_LENGTH = Register(Length=4096)
}

This is the error I get.

RuntimeError                              Traceback (most recent call last)
Input In [98], in <cell line: 3>()
      1 spec_dma.recvchannel.transfer(spec_buffer)
      2 spec_dma.register_map
----> 3 spec_dma.recvchannel.wait()

File /usr/local/share/pynq-venv/lib/python3.10/site-packages/pynq/lib/dma.py:174, in _SDMAChannel.wait(self)
    172 if self.error:
    173     if error & 0x10:
--> 174         raise RuntimeError("DMA Internal Error (transfer length 0?)")
    175     if error & 0x20:
    176         raise RuntimeError(
    177             "DMA Slave Error (cannot access memory map interface)"
    178         )

RuntimeError: DMA Internal Error (transfer length 0?)

But when I increase the DMA size and change the allocated size to 2048, the register map looks like this:

RegisterMap {
  MM2S_DMACR = Register(RS=0, Reset=0, Keyhole=0, Cyclic_BD_Enable=0, IOC_IrqEn=0, Dly_IrqEn=0, Err_IrqEn=0, IRQThreshold=0, IRQDelay=0),
  MM2S_DMASR = Register(Halted=0, Idle=0, SGIncld=0, DMAIntErr=0, DMASlvErr=0, DMADecErr=0, SGIntErr=0, SGSlvErr=0, SGDecErr=0, IOC_Irq=0, Dly_Irq=0, Err_Irq=0, IRQThresholdSts=0, IRQDelaySts=0),
  MM2S_CURDESC = Register(Current_Descriptor_Pointer=0),
  MM2S_CURDESC_MSB = Register(Current_Descriptor_Pointer=0),
  MM2S_TAILDESC = Register(Tail_Descriptor_Pointer=0),
  MM2S_TAILDESC_MSB = Register(Tail_Descriptor_Pointer=0),
  MM2S_SA = Register(Source_Address=0),
  MM2S_SA_MSB = Register(Source_Address=0),
  MM2S_LENGTH = Register(Length=0),
  SG_CTL = Register(SG_CACHE=0, SG_USER=0),
  S2MM_DMACR = Register(RS=1, Reset=0, Keyhole=0, Cyclic_BD_Enable=0, IOC_IrqEn=1, Dly_IrqEn=0, Err_IrqEn=0, IRQThreshold=1, IRQDelay=0),
  S2MM_DMASR = Register(Halted=0, Idle=1, SGIncld=0, DMAIntErr=0, DMASlvErr=0, DMADecErr=0, SGIntErr=0, SGSlvErr=0, SGDecErr=0, IOC_Irq=1, Dly_Irq=0, Err_Irq=0, IRQThresholdSts=0, IRQDelaySts=0),
  S2MM_CURDESC = Register(Current_Descriptor_Pointer=0),
  S2MM_CURDESC_MSB = Register(Current_Descriptor_Pointer=0),
  S2MM_TAILDESC = Register(Tail_Descriptor_Pointer=0),
  S2MM_TAILDESC_MSB = Register(Tail_Descriptor_Pointer=0),
  S2MM_DA = Register(Destination_Address=2010193920),
  S2MM_DA_MSB = Register(Destination_Address=0),
  S2MM_LENGTH = Register(Length=4688)
}

and I’m able to do multiple reads from the PL without any errors. But beyond this, however much I increase the size I allocate, the S2MM length always saturates at 4688. I have transferred even 65k samples before using the base overlay example, so I’m not able to understand why there is an upper limit to the size…
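One possible reading of these numbers, assuming S2MM_LENGTH holds the number of bytes actually received once a transfer completes (as the AXI DMA product guide describes for direct register mode), is that the IP sends 4688 bytes before asserting tlast. That would be more than the 4096-byte buffer can hold, which may explain the internal error with shape=(1024,). The arithmetic can be checked without hardware; the variable names are illustrative.

```python
import numpy as np

# Stand-in for allocate(shape=(1024,), dtype=np.uint32); plain NumPy has the same size.
requested = np.zeros(1024, dtype=np.uint32)

# S2MM_LENGTH value taken from the second register dump in the post.
reported_length_bytes = 4688

requested_bytes = requested.nbytes                            # bytes programmed for the first attempt
received_words = reported_length_bytes // requested.itemsize  # 32-bit words actually reported
extra_words = received_words - requested.size                 # words beyond the expected 1024

# The larger buffer from the second attempt, shape=(2048,).
larger_bytes = np.zeros(2048, dtype=np.uint32).nbytes

print(requested_bytes, received_words, extra_words, larger_bytes)
```

If this reading is right, the saturation at 4688 is set by where the firmware asserts tlast rather than by the buffer size, so probing the tlast position on the stream would be a natural next check.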

Sincerely,
Mugundhan

MugundhanV · June 1, 2023, 8:09am

Hi,

I was able to get some reproducible results here. It turns out I just had to re-assign the address for each block to get consistent RAM allocations. But the DMA output seems to contain 4 more samples than I want, even though the signals going into the S2MM DMA block are what I expect when I probe them. For example, I have a spectrometer design with 1 MHz channel widths. I feed in a 10 MHz sine and see the output at the 10th channel in the AXI stream input to the DMA block, but when I read it out from the IPython notebook, it always appears as the 14th sample, and the 0-frequency is at channel 4. This has been consistent for any number of times I run the spectrometer. I can slice the appropriate samples and make it work, but it still makes me uncomfortable as I don’t understand why this happens… Any suggestions, pointers on what to check, or obvious points I may be missing would be really helpful!
Sincerely,

Mugundhan
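The slicing workaround described in this post can be sketched as follows. The data is synthetic: a single 256-channel spectrum with a peak at channel 10, preceded by 4 extra words to mimic the observed shift. OFFSET and the array names are assumptions for illustration, and the sketch does not explain where the extra words come from.

```python
import numpy as np

OFFSET = 4        # extra leading samples observed in the post
N_CHANNELS = 256  # spectral points per channel

# Synthetic spectrum as seen on the AXI stream: peak at channel 10.
spectrum = np.zeros(N_CHANNELS, dtype=np.uint32)
spectrum[10] = 1000

# What the notebook reads back: the same data shifted by OFFSET words.
raw = np.concatenate([np.zeros(OFFSET, dtype=np.uint32), spectrum])

# Drop the leading words to realign channel numbering.
aligned = raw[OFFSET:OFFSET + N_CHANNELS]

raw_peak = int(np.argmax(raw))
aligned_peak = int(np.argmax(aligned))
print(raw_peak, aligned_peak)
```

Keeping the offset in one named constant makes it easy to remove once the source of the extra samples is understood.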
