Query on Pynq DMA

Hi,

I would like to understand how the data from the DMA gets stored in the DDR RAM in PYNQ. I am trying to save ADC data in bursts (using an RFSoC). Unlike the base overlay example, where the trigger comes from user space, I trigger the ADC to dump 256x8 samples every second. On the PYNQ side, from the Jupyter notebook, I allocate a buffer (sized for the sample count above), initiate a receive channel transfer, then wait. Code snippet below:

import time
import numpy as np
from pynq import allocate

# cfg and axi_dma0 are IP handles taken from the loaded overlay.
transfersize = 256
for i in range(10):
    # Enable the IP so the RFSoC automatically generates 256x8 samples.
    cfg.write(0x00, 1)
    buffer_re = allocate(shape=(transfersize * 8,), dtype=np.int16)
    axi_dma0.recvchannel.transfer(buffer_re)
    axi_dma0.recvchannel.wait()
    cfg.write(0x00, 0)
    print(buffer_re)
    time.sleep(1)
    buffer_re.freebuffer()

This works as expected.

The IP that generates the AXI signals can also generate a trigger once per second by itself after I enable it, so I tried this:

transfersize = 256
buffer_re = allocate(shape=(transfersize * 8,), dtype=np.int16)
for i in range(10):
    if i == 0:
        axi_dma0.recvchannel.transfer(buffer_re)
        cfg.write(0x00, 1)
        axi_dma0.recvchannel.wait()
        print(buffer_re)
        time.sleep(1)
    else:
        axi_dma0.recvchannel.transfer(buffer_re)
        axi_dma0.recvchannel.wait()
        print(buffer_re)
    time.sleep(1)
cfg.write(0x00, 0)

This just prints the same values 10 times. The same IP works fine in both modes when I replace the DMA with a BRAM. So I have a few questions:

  1. After the first 256 samples are written, are the next 256 written to the next memory location? If so, should we define an 8 MB buffer directly (as 8 MB is the maximum buffer length for the DMA) and read the data into user space after 8 MB has been written, using a FIFO to mediate the process?
  2. If I want a fixed location in RAM for the data from a particular operation, I can add a BRAM and read from it as an AXI stream into the DMA controller. Is there a cleaner way to do this?

Sincerely,

Mugundhan

Hi @MugundhanV,

Welcome to the PYNQ community.

I think that in your last code snippet there’s an extra time.sleep(1) inside the if, which may cause problems.

After the first 256 samples are written, are the next 256 written to the next memory location? If so, should we define an 8 MB buffer directly (as 8 MB is the maximum buffer length for the DMA) and read the data into user space after 8 MB has been written, using a FIFO to mediate the process?

No, the buffer should be overwritten, as happens in the first code snippet you show.

If I want a fixed location in RAM for the data from a particular operation, I can add a BRAM and read from it as an AXI stream into the DMA controller. Is there a cleaner way to do this?

I don’t fully understand this question. Can you please clarify?

Mario
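
A minimal sketch of the reuse pattern described in this reply: the buffer is allocated once, each transfer overwrites it from the start, and the host copies the contents out before arming the next transfer. The helper `capture_bursts` and the `FakeRecvChannel` class are hypothetical names, added only so the sketch runs without hardware. On a board, `channel` would be `axi_dma0.recvchannel` and `buffer` the result of `allocate(...)`.

```python
def capture_bursts(channel, buffer, n_bursts):
    """Receive n_bursts transfers into the same buffer, keeping a copy of each."""
    captures = []
    for _ in range(n_bursts):
        channel.transfer(buffer)        # arm the receive channel for one packet
        channel.wait()                  # block until the packet has been written
        captures.append(list(buffer))   # copy out before the next overwrite
    return captures


class FakeRecvChannel:
    """Hypothetical stand-in for a DMA receive channel (no hardware needed)."""

    def __init__(self):
        self._burst = 0
        self._target = None

    def transfer(self, buffer):
        self._target = buffer

    def wait(self):
        # Overwrite the buffer in place from index 0, like a new packet would.
        for i in range(len(self._target)):
            self._target[i] = self._burst * 1000 + i
        self._burst += 1


buffer = [0] * 8
captures = capture_bursts(FakeRecvChannel(), buffer, 3)
for burst in captures:
    print(burst[:4])
```

Without the copy, every entry in `captures` would refer to the same memory and show only the most recent packet. With a NumPy-backed buffer, `np.array(buffer)` would be the more usual way to take the snapshot.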


Hi Mario,

Thanks for your reply. I think your answer to the first question also answers the second :slight_smile: Let me try this again and check how it works!

Just to confirm: if I transmit 256 samples (assuming tvalid and tlast are asserted correctly), will the buffer be overwritten with the new samples?

If I hold tvalid high all the time (to stream data), assert tlast once every 2k samples (to packetize), and have a buffer of, say, 2k samples in the DDR, will this buffer be overwritten every time a new packet comes in?

Sincerely,
Mugundhan

Just to confirm: if I transmit 256 samples (assuming tvalid and tlast are asserted correctly), will the buffer be overwritten with the new samples?

You need to assert tlast in the last transaction of the 256 samples.

If I hold tvalid high all the time (to stream data), assert tlast once every 2k samples (to packetize), and have a buffer of, say, 2k samples in the DDR, will this buffer be overwritten every time a new packet comes in?

You should only assert tvalid when you want to transmit data. If you program the DMA to transfer 2K samples to the same physical address, the buffer should be overwritten.

Mario
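
The framing described here can be illustrated with a toy software model. It is not the DMA's actual implementation: each beat is a `(data, tlast)` pair, a packet ends at the first beat with `tlast` set, and every packet is written from index 0 of the same buffer. The function `receive_packet`, the constant `PACKET_LEN`, and the choice to raise an error on an over-long packet are assumptions of this sketch.

```python
def receive_packet(beats, buffer):
    """Consume beats up to and including the first one with tlast set.

    Data is written into `buffer` starting at index 0, so each new packet
    overwrites the previous one. Returns the number of samples written.
    """
    count = 0
    for data, tlast in beats:
        if count >= len(buffer):
            raise ValueError("packet is longer than the buffer")
        buffer[count] = data
        count += 1
        if tlast:
            break
    return count


PACKET_LEN = 4  # stands in for the 2k samples discussed above

# A continuous stream in which tlast is asserted on every PACKET_LEN-th beat.
stream = iter((value, (value + 1) % PACKET_LEN == 0) for value in range(12))

buffer = [None] * PACKET_LEN
first = (receive_packet(stream, buffer), list(buffer))
second = (receive_packet(stream, buffer), list(buffer))
print(first)
print(second)
```

The second call reuses the same buffer and finds it filled with the next packet, which mirrors the "same physical address, buffer overwritten" behaviour described in the reply.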

Hi Mario,

I have made further progress in debugging this issue, and I now get new data on every transfer. I have modified the firmware: it is a spectrometer outputting 4 channels of 256-point spectra with 32-bit values. I now want to transfer 256 x 32-bit samples, but if I allocate the buffer with allocate(shape=(1024,), dtype=np.uint32), I get an internal error. When I increase the buffer size, the internal error disappears, but however large I make the shape, the transfer saturates at 4688 bytes. While I can work around this by slicing the buffer to get the samples I want, I would like to know why it happens. My DMA configuration is as follows:

Here is my address editor:

This is the same configuration as in the base design. I see that the buffer length width is set to 26; does that mean I can have a buffer length of 2**26?

Sincerely,
Mugundhan
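
As a rough check on the 2**26 question, the arithmetic can be sketched as follows. It assumes that the length field counts bytes and that an N-bit field can hold at most 2**N - 1; both points should be confirmed against the AXI DMA product guide. The helper names are hypothetical.

```python
def max_transfer_bytes(length_width_bits):
    """Largest byte count representable in a length field of the given width."""
    return (1 << length_width_bits) - 1


def fits_in_one_transfer(n_samples, bytes_per_sample, length_width_bits):
    """True if the whole buffer can be described by a single length value."""
    return n_samples * bytes_per_sample <= max_transfer_bytes(length_width_bits)


limit = max_transfer_bytes(26)
print(limit)
print(fits_in_one_transfer(1024, 4, 26))   # 1024 uint32 samples
print(fits_in_one_transfer(2048, 4, 26))   # 2048 uint32 samples
```

Under these assumptions, both buffer sizes mentioned in this post are far below the limit for a 26-bit field.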

Can you post the error?

Cathal

Hi Cathal,

spec_buffer = allocate(shape=(1024,), dtype=np.uint32)
spec_dma.recvchannel.transfer(spec_buffer)
spec_dma.register_map

results in

RegisterMap {
  MM2S_DMACR = Register(RS=0, Reset=0, Keyhole=0, Cyclic_BD_Enable=0, IOC_IrqEn=0, Dly_IrqEn=0, Err_IrqEn=0, IRQThreshold=0, IRQDelay=0),
  MM2S_DMASR = Register(Halted=0, Idle=0, SGIncld=0, DMAIntErr=0, DMASlvErr=0, DMADecErr=0, SGIntErr=0, SGSlvErr=0, SGDecErr=0, IOC_Irq=0, Dly_Irq=0, Err_Irq=0, IRQThresholdSts=0, IRQDelaySts=0),
  MM2S_CURDESC = Register(Current_Descriptor_Pointer=0),
  MM2S_CURDESC_MSB = Register(Current_Descriptor_Pointer=0),
  MM2S_TAILDESC = Register(Tail_Descriptor_Pointer=0),
  MM2S_TAILDESC_MSB = Register(Tail_Descriptor_Pointer=0),
  MM2S_SA = Register(Source_Address=0),
  MM2S_SA_MSB = Register(Source_Address=0),
  MM2S_LENGTH = Register(Length=0),
  SG_CTL = Register(SG_CACHE=0, SG_USER=0),
  S2MM_DMACR = Register(RS=1, Reset=0, Keyhole=0, Cyclic_BD_Enable=0, IOC_IrqEn=1, Dly_IrqEn=0, Err_IrqEn=0, IRQThreshold=1, IRQDelay=0),
  S2MM_DMASR = Register(Halted=0, Idle=0, SGIncld=0, DMAIntErr=0, DMASlvErr=0, DMADecErr=0, SGIntErr=0, SGSlvErr=0, SGDecErr=0, IOC_Irq=0, Dly_Irq=0, Err_Irq=0, IRQThresholdSts=0, IRQDelaySts=0),
  S2MM_CURDESC = Register(Current_Descriptor_Pointer=0),
  S2MM_CURDESC_MSB = Register(Current_Descriptor_Pointer=0),
  S2MM_TAILDESC = Register(Tail_Descriptor_Pointer=0),
  S2MM_TAILDESC_MSB = Register(Tail_Descriptor_Pointer=0),
  S2MM_DA = Register(Destination_Address=25739264),
  S2MM_DA_MSB = Register(Destination_Address=0),
  S2MM_LENGTH = Register(Length=4096)
}

This is the error I get.

RuntimeError                              Traceback (most recent call last)
Input In [98], in <cell line: 3>()
      1 spec_dma.recvchannel.transfer(spec_buffer)
      2 spec_dma.register_map
----> 3 spec_dma.recvchannel.wait()

File /usr/local/share/pynq-venv/lib/python3.10/site-packages/pynq/lib/dma.py:174, in _SDMAChannel.wait(self)
    172 if self.error:
    173     if error & 0x10:
--> 174         raise RuntimeError("DMA Internal Error (transfer length 0?)")
    175     if error & 0x20:
    176         raise RuntimeError(
    177             "DMA Slave Error (cannot access memory map interface)"
    178         )

RuntimeError: DMA Internal Error (transfer length 0?)
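
The traceback shows the driver testing individual bits of the status word (`0x10` and `0x20`). A small decoder along those lines is sketched below. The field names are taken from the register map dump in this post; apart from the two masks visible in the traceback, the bit positions are written from memory of the AXI DMA status register layout and should be treated as assumptions to verify against the product guide.

```python
# Mask -> field name. Only 0x10 and 0x20 are confirmed by the traceback above;
# the remaining positions are assumptions to check against the AXI DMA guide.
DMASR_FLAGS = {
    0x0001: "Halted",
    0x0002: "Idle",
    0x0008: "SGIncld",
    0x0010: "DMAIntErr",
    0x0020: "DMASlvErr",
    0x0040: "DMADecErr",
    0x0100: "SGIntErr",
    0x0200: "SGSlvErr",
    0x0400: "SGDecErr",
    0x1000: "IOC_Irq",
    0x2000: "Dly_Irq",
    0x4000: "Err_Irq",
}


def decode_dmasr(value):
    """Return the names of all flags set in a raw status word, lowest bit first."""
    return [name for mask, name in sorted(DMASR_FLAGS.items()) if value & mask]


print(decode_dmasr(0x0010))
print(decode_dmasr(0x1002))
```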

But when I increase the DMA size and change the allocated size to 2048, the register map looks like this:

RegisterMap {
  MM2S_DMACR = Register(RS=0, Reset=0, Keyhole=0, Cyclic_BD_Enable=0, IOC_IrqEn=0, Dly_IrqEn=0, Err_IrqEn=0, IRQThreshold=0, IRQDelay=0),
  MM2S_DMASR = Register(Halted=0, Idle=0, SGIncld=0, DMAIntErr=0, DMASlvErr=0, DMADecErr=0, SGIntErr=0, SGSlvErr=0, SGDecErr=0, IOC_Irq=0, Dly_Irq=0, Err_Irq=0, IRQThresholdSts=0, IRQDelaySts=0),
  MM2S_CURDESC = Register(Current_Descriptor_Pointer=0),
  MM2S_CURDESC_MSB = Register(Current_Descriptor_Pointer=0),
  MM2S_TAILDESC = Register(Tail_Descriptor_Pointer=0),
  MM2S_TAILDESC_MSB = Register(Tail_Descriptor_Pointer=0),
  MM2S_SA = Register(Source_Address=0),
  MM2S_SA_MSB = Register(Source_Address=0),
  MM2S_LENGTH = Register(Length=0),
  SG_CTL = Register(SG_CACHE=0, SG_USER=0),
  S2MM_DMACR = Register(RS=1, Reset=0, Keyhole=0, Cyclic_BD_Enable=0, IOC_IrqEn=1, Dly_IrqEn=0, Err_IrqEn=0, IRQThreshold=1, IRQDelay=0),
  S2MM_DMASR = Register(Halted=0, Idle=1, SGIncld=0, DMAIntErr=0, DMASlvErr=0, DMADecErr=0, SGIntErr=0, SGSlvErr=0, SGDecErr=0, IOC_Irq=1, Dly_Irq=0, Err_Irq=0, IRQThresholdSts=0, IRQDelaySts=0),
  S2MM_CURDESC = Register(Current_Descriptor_Pointer=0),
  S2MM_CURDESC_MSB = Register(Current_Descriptor_Pointer=0),
  S2MM_TAILDESC = Register(Tail_Descriptor_Pointer=0),
  S2MM_TAILDESC_MSB = Register(Tail_Descriptor_Pointer=0),
  S2MM_DA = Register(Destination_Address=2010193920),
  S2MM_DA_MSB = Register(Destination_Address=0),
  S2MM_LENGTH = Register(Length=4688)
}

and I’m able to do multiple reads from the PL without any errors. But beyond this, however much I increase the size I allocate, the S2MM length always saturates at 4688. I have transferred even 65k samples before using the base overlay example, so I don’t understand why there is an upper limit on the size…

Sincerely,
Mugundhan
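
The two S2MM_LENGTH values in the dumps can be converted from bytes to 32-bit samples to see how they relate to the expected 1024 samples. That 4096 corresponds to the 1024-element uint32 buffer follows from the first dump. One possible reading of the second value, which nothing in this thread confirms, is that it reflects the bytes that actually arrived before tlast rather than the allocated size. The helper name below is hypothetical.

```python
def bytes_to_samples(n_bytes, bytes_per_sample=4):
    """Convert a byte count to whole samples, rejecting partial samples."""
    samples, remainder = divmod(n_bytes, bytes_per_sample)
    if remainder:
        raise ValueError("byte count is not a whole number of samples")
    return samples


expected_samples = 1024  # shape of the first uint32 buffer in this post
for length in (4096, 4688):
    samples = bytes_to_samples(length)
    print(length, "bytes =", samples, "samples; excess over expected:",
          samples - expected_samples)
```

If that reading holds, probing how many beats the stream sends before tlast would show whether the packet itself is longer than 1024 samples.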

Hi,

I was able to get some reproducible results here. It turns out I just had to re-assign the address for each block to get consistent RAM allocations. However, the DMA output seems to contain 4 more samples than I want, even though the signals going into the S2MM DMA block are what I expect when I probe them. For example, I have a spectrometer design with 1 MHz channel widths. I feed in a 10 MHz sine and see the output at the 10th channel in the AXI stream input to the DMA block, but when I read it out from the IPython notebook, it always appears as the 14th sample, and the 0-frequency is at channel 4. This has been consistent every time I run the spectrometer. I can slice out the appropriate samples and make it work, but it still makes me uncomfortable, as I don’t understand why this happens… Any suggestions or pointers on what to check, or obvious points I may be missing, would be really helpful! :slight_smile:
Sincerely,

Mugundhan
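
The slicing workaround mentioned above can be made explicit by measuring the offset from a known test tone instead of hard-coding it. This sketch uses synthetic data and hypothetical helper names. It assumes the extra samples sit at the front of the buffer, as the description suggests, and it does not explain where they come from.

```python
def peak_index(spectrum):
    """Index of the largest value in the spectrum."""
    return max(range(len(spectrum)), key=lambda i: spectrum[i])


def measure_offset(raw, expected_bin):
    """How many samples the observed peak sits after its expected position."""
    return peak_index(raw) - expected_bin


def realign(raw, offset, n_bins):
    """Drop `offset` leading samples and keep exactly n_bins values."""
    return raw[offset:offset + n_bins]


# Synthetic 256-bin spectrum with a tone in bin 10 (1 MHz bins, 10 MHz sine).
n_bins = 256
spectrum = [0] * n_bins
spectrum[10] = 100

# Model the readout seen in the notebook: 4 extra samples ahead of the spectrum.
raw = [0] * 4 + spectrum

offset = measure_offset(raw, expected_bin=10)
aligned = realign(raw, offset, n_bins)
print(offset, peak_index(aligned))
```

Checking that the measured offset stays the same across several tone frequencies would help distinguish a fixed number of leading samples from some other kind of shift.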