DMA output buffer does not fill up

PYNQ v2.6 Standard image using the PYNQ-Z1 board

We’re trying to get our DMA to send 32-bit ints in packets of 16 from our FIFO to PYNQ. Currently we are only able to receive one number per packet; the rest of the packet seems to be just zeros.

Our data block creates 48-bit numbers and splits them into 32-bit numbers. The AXIPacketInterface is set to send a tLast signal along with the 16th 32-bit number.

The DMA has only the write channel enabled, with Scatter Gather off. The width of the Buffer Length Register is 14 and the address width is 32. The memory-mapped and stream data widths are both set to 32, the max burst size is set to 16, and unaligned transfers are allowed.
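As a quick sanity check on these settings, the arithmetic can be written out. This is only a sketch: it assumes the usual AXI DMA rule that a length register of width N allows at most 2^N - 1 bytes per transfer, and the variable names are made up for illustration.

```python
# Settings quoted above; the names are illustrative, not taken from any API.
LENGTH_REG_WIDTH = 14    # width of Buffer Length Register, in bits
STREAM_WIDTH_BITS = 32   # memory-mapped and stream data width
WORDS_PER_PACKET = 16    # tLast is meant to accompany the 16th word
MAX_BURST_BEATS = 16     # max burst size

bytes_per_word = STREAM_WIDTH_BITS // 8
packet_bytes = WORDS_PER_PACKET * bytes_per_word
# Assumption: the largest length the register can hold is 2**N - 1 bytes.
max_transfer_bytes = 2 ** LENGTH_REG_WIDTH - 1
burst_bytes = MAX_BURST_BEATS * bytes_per_word

print(packet_bytes, max_transfer_bytes, burst_bytes)
```

Under that assumption a 64-byte packet fits easily within a single transfer and a single burst, so the register width and burst size do not look like the limiting factor.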

The FIFO has a depth of 32768 and is built with BRAM. Packet mode is enabled. Independent clocks, ACLKEN conversion mode and ECC are disabled. It has a 4-byte data signal and a tLast signal; the rest of the signals are disabled.

As far as we know, the DMA and FIFO should be set up to handle bursts of 16 packets.

Our program allocates a buffer of 16 ints, points the DMA at that buffer, and saves the buffer contents to a file. This is our code and output:

from pynq import Overlay
from pynq import allocate
import numpy as np
import time

print("Making buffer")
tidBuff = time.time()
output_buffer = allocate(shape = (16,), dtype = '<I')
print("Took ", str(time.time()-tidBuff), " to make buffer.")

print(output_buffer)

cnt = 0

print("Loading overlay")
tidOverlay = time.time()
overlay = Overlay('/home/xilinx/pynq/overlays/aloft6/aloft.bit')

dma = overlay.dataDMA
dma.buffer_max_size = 16
print("Took ", str(time.time()-tidOverlay), " to load overlay.")

tid = time.time()

with open("hello.txt", "w") as f:
    print("Saving to file: ")
    while cnt < 20 :
        dma.recvchannel.transfer(output_buffer)
        print(output_buffer[0:15])
        for x in output_buffer:
            f.write(str(bin(x)))
        cnt += 1
    dma.recvchannel.wait()
print("Took ",time.time()-tid," to save to file.")



*********************__OUTPUT__**************************
Making buffer
Took  0.0035223960876464844  to make buffer.
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Loading overlay
Took  0.796987771987915  to load overlay.
Saving to file: 
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[65536     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0]
[2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[196608      0      0      0      0      0      0      0      0      0
      0      0      0      0      0      0]
[4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[327680      0      0      0      0      0      0      0      0      0
      0      0      0      0      0      0]
[6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[458752      0      0      0      0      0      0      0      0      0
      0      0      0      0      0      0]
[8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[589824      0      0      0      0      0      0      0      0      0
      0      0      0      0      0      0]
[10  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[720896      0      0      0      0      0      0      0      0      0
      0      0      0      0      0      0]
[12  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[851968      0      0      0      0      0      0      0      0      0
      0      0      0      0      0      0]
Took  0.06118488311767578  to save to file.
**************************************************************

As you can see, every buffer contains only one int and the rest is zeros. We’d like to fill every buffer with 16 ints. Is there an obvious mistake we have made here? Any help would be appreciated.
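One way to read the output above is to take the first word of each printed buffer, in order, as one continuous stream of 32-bit words. The sketch below does that under two assumptions: every three words carry two 48-bit values with the most significant bits first, and the all-zero first print is a word that was actually received rather than the untouched initial buffer. The helper name is hypothetical.

```python
# First word of each printed buffer, copied in order from the output above.
first_words = [0, 65536, 2, 0, 196608, 4, 0, 327680, 6, 0, 458752, 8,
               0, 589824, 10, 0, 720896, 12, 0, 851968]

def unpack_48bit_pairs(words):
    """Reassemble groups of three 32-bit words into two 48-bit values.

    Assumption: the first word of each group holds the most significant bits.
    """
    values = []
    # Only complete groups of three words are decoded; leftovers are ignored.
    for i in range(0, len(words) - len(words) % 3, 3):
        w0, w1, w2 = words[i:i + 3]
        combined = (w0 << 64) | (w1 << 32) | w2      # 96 bits in total
        values.append(combined >> 48)                # upper 48 bits
        values.append(combined & ((1 << 48) - 1))    # lower 48 bits
    return values

decoded = unpack_48bit_pairs(first_words)
print(decoded)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

If that reading is right, each transfer delivers exactly one stream word and the next transfer picks up the following word. That would be consistent with packets ending after a single beat rather than after sixteen, so it may be worth checking in simulation or with an ILA when tLast is actually asserted.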


In a previous project I used DMA to calculate the output error for a simple NN,
and it went like this:

# Setup (excerpt from inside the class):
print("Allocating dma...")
self.target = allocate(shape=(768), dtype=np.float32)
self.calc_error = allocate(shape=(768), dtype=np.float32)

print("Allocating neural network outputs...")
self.outputs = allocate(shape=(3, 768), dtype=np.float32)

def output_error(self):
    # Start all three transfers first...
    self.nn_overlay.axi_dma_target.sendchannel.transfer(self.target)
    self.nn_overlay.axi_dma_nn_out.sendchannel.transfer(self.outputs[2])
    self.nn_overlay.axi_dma_out_r.recvchannel.transfer(self.calc_error)
    # ...then wait for each channel to finish before using the buffers.
    self.nn_overlay.axi_dma_nn_out.sendchannel.wait()
    self.nn_overlay.axi_dma_target.sendchannel.wait()
    self.nn_overlay.axi_dma_out_r.recvchannel.wait()
    # free running

I didn’t modify DMA settings, though.
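The pattern in that snippet, where each transfer is started and then waited on before the buffer is touched, can be sketched for a single receive channel as follows. This is a minimal sketch rather than a confirmed fix: `receive_packets` is a hypothetical helper, and it only assumes the channel object offers `transfer(buffer)` and `wait()`, as the PYNQ DMA `recvchannel` does.

```python
def receive_packets(recv_channel, buffer, count):
    """Collect `count` packets, waiting for each transfer before reading.

    `recv_channel` is any object with transfer(buffer) and wait() methods,
    for example the recvchannel of a PYNQ DMA instance.
    """
    packets = []
    for _ in range(count):
        recv_channel.transfer(buffer)   # start one receive transfer
        recv_channel.wait()             # block until that transfer is done
        packets.append(list(buffer))    # copy before the buffer is reused
    return packets

# Hypothetical usage on the board, with the names from the question:
#   packets = receive_packets(overlay.dataDMA.recvchannel, output_buffer, 20)
```

In the question's loop, `wait()` is only called once after all 20 `transfer()` calls, so each print may happen before the buffer has been completely written. Waiting inside the loop rules that out as a cause.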

I think the current PYNQ DMA is broken; it breaks code in all sorts of ways.

Sometimes it corrupts my output_buffer, other times it messes up the bus. I suggest not using the PYNQ DMA…