AXI DMA in cyclic bd mode

Hello,

I have been trying for days to get the AXI DMA IP core to run in cyclic bd mode with PYNQ. I have followed the programming sequence from the product guide (PG021). I only want to use the S2MM channel. So I created a buffer with pynq.allocate() and an array of descriptors, with the last descriptor pointing back to the first. Then I used the programming sequence from scatter/gather mode:

A DMA operation for the S2MM channel is set up and started by using the following sequence:

  1. Write the address of the starting descriptor to the Current Descriptor register. If AXI DMA is configured for an address space greater than 32 bits, then also program the MSB 32 bits of the current descriptor.

  2. Start the S2MM channel running by setting the run/stop bit to 1 (S2MM_DMACR.RS =1).

  3. The halted bit (DMASR.Halted) should deassert indicating the S2MM channel is running.

  4. If desired, enable interrupts by writing a 1 to S2MM_DMACR.IOC_IrqEn and S2MM_DMACR.Err_IrqEn.

  5. Write a valid address to the Tail Descriptor register. If AXI DMA is configured for an address space greater than 32 bits, then also program the MSB 32 bits of the tail descriptor.

  6. Writing to the Tail Descriptor register triggers the DMA to start fetching the descriptors from the memory. The fetched descriptors are processed and any data received from the S2MM streaming channel is written to the memory.
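The six steps above can be sketched as a small helper. This is only an illustration under assumptions: `FakeRegs` and `start_s2mm` are hypothetical names, the fake register file models nothing but the Halted bit following Run/Stop, and the offsets are the S2MM offsets used elsewhere in this thread. On real hardware any object with `read(offset)` and `write(offset, value)` could take the place of `FakeRegs`.

```python
import time

# S2MM register offsets and bits as used in this thread (AXI DMA, SG mode).
S2MM_DMACR = 0x30
S2MM_DMASR = 0x34
S2MM_CURDESC = 0x38
S2MM_CURDESC_MSB = 0x3C
S2MM_TAILDESC = 0x40
S2MM_TAILDESC_MSB = 0x44
DMACR_RS = 1 << 0
DMASR_HALTED = 1 << 0


class FakeRegs:
    """Hypothetical stand-in for the DMA register interface (read/write)."""

    def __init__(self):
        self.mem = {S2MM_DMASR: DMASR_HALTED}
        self.log = []  # every write, in order, for inspection

    def write(self, offset, value):
        self.log.append((offset, value))
        self.mem[offset] = value
        if offset == S2MM_DMACR:
            # Model only one behaviour: Halted follows the Run/Stop bit.
            self.mem[S2MM_DMASR] = 0 if value & DMACR_RS else DMASR_HALTED

    def read(self, offset):
        return self.mem.get(offset, 0)


def start_s2mm(regs, first_bd, tail_bd, mode_bits=0, irq_bits=0, timeout_s=0.1):
    """Run the S2MM scatter/gather start sequence against `regs`."""
    # Step 1: current descriptor pointer, lower then upper 32 bits.
    regs.write(S2MM_CURDESC, first_bd & 0xFFFFFFFF)
    regs.write(S2MM_CURDESC_MSB, (first_bd >> 32) & 0xFFFFFFFF)
    # Step 2: set Run/Stop together with any mode bits (e.g. cyclic BD).
    control = mode_bits | DMACR_RS
    regs.write(S2MM_DMACR, control)
    # Step 3: wait until the Halted bit deasserts.
    deadline = time.monotonic() + timeout_s
    while regs.read(S2MM_DMASR) & DMASR_HALTED:
        if time.monotonic() > deadline:
            raise TimeoutError("S2MM channel did not leave the halted state")
    # Step 4: optionally enable interrupts.
    if irq_bits:
        regs.write(S2MM_DMACR, control | irq_bits)
    # Steps 5 and 6: the tail descriptor write triggers descriptor fetching.
    regs.write(S2MM_TAILDESC, tail_bd & 0xFFFFFFFF)
    regs.write(S2MM_TAILDESC_MSB, (tail_bd >> 32) & 0xFFFFFFFF)


regs = FakeRegs()
start_s2mm(regs, first_bd=0x1_0000_0040, tail_bd=0x50, mode_bits=1 << 4)
for offset, value in regs.log:
    print(f"write 0x{offset:02X} <- 0x{value:08X}")
```

Polling with a timeout in step 3 is a design choice of this sketch; the quoted sequence only says that the bit should deassert.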

The only thing that needs to be changed then is that the cyclic bit in the control register must be set and the tail descriptor register must be written with a value that is not part of the BD chain (e.g., 0x50).
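To make the control register values less error-prone than hand-written hex constants, the bit positions can be named. The sketch below uses the S2MM_DMACR layout from the comment table in the code posted later in this thread; the constant names and the `dmacr` helper are hypothetical, not part of PYNQ.

```python
# S2MM_DMACR bit positions (layout taken from the table in this thread).
DMACR_RS = 1 << 0           # Run/Stop
DMACR_RESET = 1 << 2        # Soft reset
DMACR_KEYHOLE = 1 << 3      # Keyhole
DMACR_CYCLIC = 1 << 4       # Cyclic BD enable
DMACR_IOC_IRQ_EN = 1 << 12  # Interrupt on complete enable
DMACR_DLY_IRQ_EN = 1 << 13  # Delay interrupt enable
DMACR_ERR_IRQ_EN = 1 << 14  # Error interrupt enable


def dmacr(*flags, irq_threshold=0, irq_delay=0):
    """Combine flag bits and the two 8-bit IRQ fields into one register value."""
    value = 0
    for flag in flags:
        value |= flag
    value |= (irq_threshold & 0xFF) << 16  # bits 23..16
    value |= (irq_delay & 0xFF) << 24      # bits 31..24
    return value


print(hex(dmacr(DMACR_CYCLIC, DMACR_RESET)))  # reset with cyclic bit set
print(hex(dmacr(DMACR_CYCLIC, DMACR_RS)))     # run with cyclic bit set
```

The two printed values correspond to the `0x14` and `0x11` writes in the posted code.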

When I start the DMA now, I can read the status and control registers cyclically. Everything seems to be working as expected here. Even when I look at the current descriptor register, my buffer is always being written to cyclically, so everything is as expected. However, when I stop the DMA and look at the contents of the buffer, I get some unexpected data:

I use a generator to create a sine wave, which is also displayed correctly in the rear part of the buffer. However, the data in the front part is corrupted. Why is that? Has anyone gotten DMA to work in cyclic bd mode? I have found many incomplete forum posts on this topic. I also use buffer.invalidate() before reading the buffer for cache coherence.
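One thing that may be worth ruling out when inspecting a ring buffer after a stop: the oldest data is generally not at index 0, because the engine keeps wrapping around. The sketch below reorders the samples so that the oldest block comes first. It assumes that the S2MM current descriptor register, read back after the channel has halted, points at the descriptor the engine would have filled next; that is an assumption to verify against PG021, not a diagnosis of the corruption described above. The function and parameter names are hypothetical.

```python
def oldest_first(samples, samples_per_bd, bd_base, bd_size, curdesc):
    """Reorder a ring buffer so the block at `curdesc` (assumed oldest) is first.

    samples        -- list of samples covering the whole ring
    samples_per_bd -- number of samples written per block descriptor
    bd_base        -- device address of the first descriptor
    bd_size        -- size of one descriptor in bytes (0x40 in this thread)
    curdesc        -- descriptor address read back after the stop
    """
    num_bd = len(samples) // samples_per_bd
    bd_index = ((curdesc - bd_base) // bd_size) % num_bd
    split = bd_index * samples_per_bd
    # Everything from the split point onward is older than what precedes it.
    return samples[split:] + samples[:split]


ring = list(range(8))  # 4 descriptors with 2 samples each
print(oldest_first(ring, 2, bd_base=0x1000, bd_size=0x40, curdesc=0x1080))
```

For a NumPy array, `np.roll(samples, -split)` gives the same ordering. The block at the split point may be only partly overwritten, so its contents should be treated with care.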

Hi @Jan

Would it be possible for you to share your code and overlay files?

Hi @joshgoldsmith,

I can provide the configuration of the DMA but not the whole overlay files.

The following shows the programming sequence and start/stop of the dma:

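import asyncio
import logging
import sys

import numpy as np
from pynq import Overlay, allocate

# cfg (used in main() for cfg.OVERLAY) is the poster's own configuration
# module and is not shown here.


class RingbufferDMA: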
    def __init__(self, descr_num: int, descr_data: int, dma) -> None:
        self.descr_num = descr_num
        self.descr_data = descr_data
        self.descr_buf_len_bytes = 2 * self.descr_data
        self.buffer = allocate(
            shape=(self.descr_num * self.descr_data,), dtype=np.int16
        )
        self.descriptors = allocate(shape=(self.descr_num, 16), dtype=np.uint32)
        self.read_idx = 0
        self.dma = dma
        self.bd_size_bytes = 16 * 4  # 16 words * 4 bytes per block descriptor

    def write_descriptors(self) -> None:
        logging.info("Write descriptor array (cyclic BD mode)")

        for i in range(self.descr_num):
            # init all words to zero
            self.descriptors[i, :] = 0

            if i == self.descr_num - 1:
                # in cyclic bd mode the last descriptor points back to the first
                next_bd_addr = self.descriptors.device_address
            else:
                # pointer to next descriptor
                next_bd_addr = self.descriptors.device_address + (
                    (i + 1) * self.bd_size_bytes
                )

            self.descriptors[i, 0] = next_bd_addr & 0xFFFFFFFF
            self.descriptors[i, 1] = next_bd_addr >> 32

            # buffer length
            self.descriptors[i, 6] = self.descr_buf_len_bytes

            # buffer address (64-bit)
            buf_addr = self.buffer.device_address + (i * self.descr_buf_len_bytes)
            self.descriptors[i, 2] = buf_addr & 0xFFFFFFFF
            self.descriptors[i, 3] = buf_addr >> 32

        # cache coherency
        try:
            if not self.descriptors.coherent:
                logging.info("Flushing descriptor array")
                self.descriptors.flush()
        except Exception as e:
            logging.warning(f"flush() not available: {e}")

        try:
            if not self.buffer.coherent:
                logging.info("Flushing buffer array")
                self.buffer.flush()
        except Exception as e:
            logging.warning(f"flush() not available: {e}")

class _PaSlot:
    def __init__(self, dma_ip_voltage, name: str):
        self.name = name
        self.dma_voltage = dma_ip_voltage
        self.num_descr = 16 * 100
        self.data_descr = 2048

        self.buffer_voltage = RingbufferDMA(
            descr_num=self.num_descr, descr_data=self.data_descr, dma=self.dma_voltage
        )
        self.buffer_voltage.write_descriptors()

    def start_dma_voltage(self) -> None:
        self._start_dma(
            self.dma_voltage, self.buffer_voltage.descriptors.device_address
        )

    def _start_dma(self, dma, base_addr) -> None:
        # reset DMA
        # 0x30: S2MM DMA control register ->
        # | IRQDelay  | IRQ Threshold | RSVD | ERR_IrqEN | Dly_IrqEN | IOC_IrqEN | RSVD     | Cyclic BD enable | Keyhole | Reset | RSVD | Run/Stop |
        # | 31 ... 24 | 23 ... 16     | 15   | 14        | 13        | 12        | 11 ... 5 | 4                | 3       | 2     | 1    | 0        |
        # | 0x0       | 0x0           | 0    | 0         | 0         | 0         | 0x0      | 1                | 0       | 1     | 0    | 0        |
        dma.write(0x30, 0x14)

        # set current descriptor
        # 0x38: S2MM current descriptor pointer lower 32 address bits
        # 0x3C: S2MM current descriptor pointer upper 32 address bits
        dma.write(0x38, base_addr & 0xFFFFFFFF)
        dma.write(0x3C, (base_addr >> 32) & 0xFFFFFFFF)

        # start DMA
        # 0x30: S2MM DMA control register ->
        # | IRQDelay  | IRQ Threshold | RSVD | ERR_IrqEN | Dly_IrqEN | IOC_IrqEN | RSVD     | Cyclic BD enable | Keyhole | Reset | RSVD | Run/Stop |
        # | 31 ... 24 | 23 ... 16     | 15   | 14        | 13        | 12        | 11 ... 5 | 4                | 3       | 2     | 1    | 0        |
        # | 0x0       | 0x0           | 0    | 0         | 0         | 0         | 0x0      | 1                | 0       | 0     | 0    | 1        |
        dma.write(0x30, 0x11)

        # set tail descriptor
        # 0x40: S2MM tail descriptor pointer lower 32 address bits
        # 0x44: S2MM tail descriptor pointer upper 32 address bits
        # in cyclic BD design "the tail descriptor register does not serve any purpose and is only
        # used to trigger the DMA engine. [...] Program the tail descriptor register with some value
        # which is not part of the BD chain. Say for example, 0x50."
        dma.write(0x40, 0x50)
        dma.write(0x44, 0x50)

class FpgaManager:
    def __init__(self, overlay: str) -> None:
        try:
            self.overlay = Overlay(overlay)
        except Exception as e:
            logging.error(f"Overlay not loaded properly: {e}")
            sys.exit(1)

        try:
            self.slot1 = _PaSlot(
                self.overlay.DMAs_adc.axi_dma_0,
                "Slot 1",
            )
            self.dma_fft = self.overlay.FFT_Teil.axi_dma_fft
        except Exception as e:
            logging.error(f"Get DMA IPs error: {e}")
            sys.exit(1)


async def main():
    pa_fpga_manager = FpgaManager(overlay=cfg.OVERLAY)

    logging.info("Start cyclic data acquisition")

    pa_fpga_manager.slot1.start_dma_voltage()
  
    await asyncio.sleep(20)

    pa_fpga_manager.slot1.dma_stop()

    await asyncio.sleep(1)

    pa_fpga_manager.slot1.buffer_voltage.buffer.invalidate()

    await asyncio.sleep(1)

    data_voltage = np.copy(pa_fpga_manager.slot1.buffer_voltage.buffer)
    data_voltage.tofile("samples_voltage.bin")

if __name__ == "__main__":
    asyncio.run(main())

This is only the code regarding the dma.
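A descriptor chain like the one built in `write_descriptors` can be checked offline before it is handed to the hardware. The following is a minimal sketch under assumptions: it mirrors the word layout used in the posted code (words 0/1 next descriptor, words 2/3 buffer address, word 6 buffer length), uses plain Python lists instead of `pynq` buffers, and assumes that descriptors must be aligned to 16 words (0x40 bytes), which should be confirmed in PG021. `build_chain` and `check_chain` are hypothetical helpers.

```python
BD_WORDS = 16
BD_SIZE = BD_WORDS * 4  # 0x40 bytes per block descriptor


def build_chain(bd_base, buf_base, num_bd, bytes_per_bd):
    """Build a cyclic chain as a list of 16-word descriptors."""
    chain = []
    for i in range(num_bd):
        words = [0] * BD_WORDS
        # The last descriptor wraps around to the first one.
        next_bd = bd_base + ((i + 1) % num_bd) * BD_SIZE
        buf_addr = buf_base + i * bytes_per_bd
        words[0] = next_bd & 0xFFFFFFFF
        words[1] = next_bd >> 32
        words[2] = buf_addr & 0xFFFFFFFF
        words[3] = buf_addr >> 32
        words[6] = bytes_per_bd
        chain.append(words)
    return chain


def check_chain(chain, bd_base, buf_base, bytes_per_bd):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if bd_base % BD_SIZE:
        problems.append("descriptor base is not 0x40-aligned")
    for i, words in enumerate(chain):
        next_bd = words[0] | (words[1] << 32)
        expected = bd_base + ((i + 1) % len(chain)) * BD_SIZE
        if next_bd != expected:
            problems.append(f"BD {i}: next pointer 0x{next_bd:X} != 0x{expected:X}")
        buf_addr = words[2] | (words[3] << 32)
        if buf_addr != buf_base + i * bytes_per_bd:
            problems.append(f"BD {i}: buffer address is not contiguous")
        if words[6] != bytes_per_bd:
            problems.append(f"BD {i}: unexpected buffer length {words[6]}")
    return problems


# Same dimensions as in the post: 1600 descriptors of 2048 int16 samples each.
chain = build_chain(bd_base=0x1000_0000, buf_base=0x2000_0000,
                    num_bd=1600, bytes_per_bd=4096)
print(check_chain(chain, 0x1000_0000, 0x2000_0000, 4096))
```

The same checks could be run against the real descriptor array by reading `device_address` from the allocated buffers and passing the rows of the array as `chain`.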

The PYNQ DMA driver already supports cyclic BD mode. Although it is undocumented, the functionality is in the driver.

It’s been a while since I’ve used it in this mode but, to use cyclic_bd mode, the code would look something like this:

from pynq import allocate, Overlay
import numpy as np
ol = Overlay('mybitstream.bit')
dma = ol.dma

buf = allocate(shape=(100,), dtype=np.uint32)
data = np.arange(100, dtype=np.uint32)
buf[:] = data

dma.sendchannel.transfer(buf, cyclic=True)

Just to clarify, cyclic_bd mode is only available when the DMA is in Scatter Gather mode.
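Whether the core was built with Scatter Gather can be read from the status register. The sketch below decodes a raw S2MM_DMASR value (offset 0x34) into flag names. The bit positions are given as I understand them from PG021 and should be checked against the guide for your IP version; `decode_dmasr` is a hypothetical helper, not a PYNQ function.

```python
# S2MM_DMASR bit positions (assumed from PG021; verify for your IP version).
S2MM_DMASR_BITS = {
    0: "Halted",
    1: "Idle",
    3: "SGIncld",    # 1 if the Scatter Gather engine is included
    4: "DMAIntErr",
    5: "DMASlvErr",
    6: "DMADecErr",
    8: "SGIntErr",
    9: "SGSlvErr",
    10: "SGDecErr",
    12: "IOC_Irq",
    13: "Dly_Irq",
    14: "Err_Irq",
}


def decode_dmasr(value):
    """Return the names of all known flags set in a raw DMASR value."""
    return [name for bit, name in sorted(S2MM_DMASR_BITS.items())
            if value & (1 << bit)]


status = 0x0009  # example value: halted, SG engine included
print(decode_dmasr(status))
```

With the register interface used earlier in the thread, the raw value would come from something like `dma.read(0x34)`. If `SGIncld` is missing from the result, cyclic BD mode cannot be used with that build of the core.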
