Getting "RuntimeError: DMA channel not started" occasionally

Hi,

When I run my Python script, it sometimes completes successfully and sometimes fails with a runtime error stating “DMA channel not started”. I don’t understand what this means and would like some clarification.

Here is my register map before the error and after the error:

Before:

RegisterMap {
MM2S_DMACR = Register(RS=1, Reset=0, Keyhole=0, Cyclic_BD_Enable=0, IOC_IrqEn=0, Dly_IrqEn=0, Err_IrqEn=0, IRQThreshold=1, IRQDelay=0),
MM2S_DMASR = Register(Halted=0, Idle=0, SGIncld=0, DMAIntErr=0, DMASlvErr=0, DMADecErr=0, SGIntErr=0, SGSlvErr=0, SGDecErr=0, IOC_Irq=0, Dly_Irq=0, Err_Irq=0, IRQThresholdSts=0, IRQDelaySts=0),
MM2S_CURDESC = Register(Current_Descriptor_Pointer=0),
MM2S_CURDESC_MSB = Register(Current_Descriptor_Pointer=0),
MM2S_TAILDESC = Register(Tail_Descriptor_Pointer=0),
MM2S_TAILDESC_MSB = Register(Tail_Descriptor_Pointer=0),
MM2S_SA = Register(Source_Address=0),
MM2S_SA_MSB = Register(Source_Address=0),
MM2S_LENGTH = Register(Length=0),
SG_CTL = Register(SG_CACHE=0, SG_USER=0),
S2MM_DMACR = Register(RS=1, Reset=0, Keyhole=0, Cyclic_BD_Enable=0, IOC_IrqEn=0, Dly_IrqEn=0, Err_IrqEn=0, IRQThreshold=1, IRQDelay=0),
S2MM_DMASR = Register(Halted=0, Idle=0, SGIncld=0, DMAIntErr=0, DMASlvErr=0, DMADecErr=0, SGIntErr=0, SGSlvErr=0, SGDecErr=0, IOC_Irq=0, Dly_Irq=0, Err_Irq=0, IRQThresholdSts=0, IRQDelaySts=0),
S2MM_CURDESC = Register(Current_Descriptor_Pointer=0),
S2MM_CURDESC_MSB = Register(Current_Descriptor_Pointer=0),
S2MM_TAILDESC = Register(Tail_Descriptor_Pointer=0),
S2MM_TAILDESC_MSB = Register(Tail_Descriptor_Pointer=0),
S2MM_DA = Register(Destination_Address=0),
S2MM_DA_MSB = Register(Destination_Address=0),
S2MM_LENGTH = Register(Length=0)
}

After:

RegisterMap {
MM2S_DMACR = Register(RS=0, Reset=0, Keyhole=0, Cyclic_BD_Enable=0, IOC_IrqEn=0, Dly_IrqEn=0, Err_IrqEn=0, IRQThreshold=1, IRQDelay=0),
MM2S_DMASR = Register(Halted=1, Idle=0, SGIncld=0, DMAIntErr=0, DMASlvErr=0, DMADecErr=1, SGIntErr=0, SGSlvErr=0, SGDecErr=0, IOC_Irq=1, Dly_Irq=0, Err_Irq=1, IRQThresholdSts=0, IRQDelaySts=0),
MM2S_CURDESC = Register(Current_Descriptor_Pointer=0),
MM2S_CURDESC_MSB = Register(Current_Descriptor_Pointer=0),
MM2S_TAILDESC = Register(Tail_Descriptor_Pointer=0),
MM2S_TAILDESC_MSB = Register(Tail_Descriptor_Pointer=0),
MM2S_SA = Register(Source_Address=1966104576),
MM2S_SA_MSB = Register(Source_Address=0),
MM2S_LENGTH = Register(Length=96),
SG_CTL = Register(SG_CACHE=0, SG_USER=0),
S2MM_DMACR = Register(RS=0, Reset=0, Keyhole=0, Cyclic_BD_Enable=0, IOC_IrqEn=0, Dly_IrqEn=0, Err_IrqEn=0, IRQThreshold=1, IRQDelay=0),
S2MM_DMASR = Register(Halted=1, Idle=0, SGIncld=0, DMAIntErr=0, DMASlvErr=0, DMADecErr=0, SGIntErr=0, SGSlvErr=0, SGDecErr=0, IOC_Irq=1, Dly_Irq=0, Err_Irq=0, IRQThresholdSts=0, IRQDelaySts=0),
S2MM_CURDESC = Register(Current_Descriptor_Pointer=0),
S2MM_CURDESC_MSB = Register(Current_Descriptor_Pointer=0),
S2MM_TAILDESC = Register(Tail_Descriptor_Pointer=0),
S2MM_TAILDESC_MSB = Register(Tail_Descriptor_Pointer=0),
S2MM_DA = Register(Destination_Address=0),
S2MM_DA_MSB = Register(Destination_Address=0),
S2MM_LENGTH = Register(Length=0)
}

In the “after” section of the register map, I noticed that both S2MM_DMASR and MM2S_DMASR show Halted=1 and IOC_Irq=1. Only MM2S_DMASR has Err_Irq set to 1. I tried reading the documentation but could not work out why those interrupts were set.

Is there any other way I can find out why Err_Irq is triggered, or why there is a RuntimeError at all? I am new to this, so I would appreciate any advice on how to proceed.

Thank you.

Edit: The Python code uses the DMA to transfer data to an Ultra96, which does some computation before sending back the results. For now, the code sends a set of test cases to verify the correctness of the computation.

Edit2: I am currently running PYNQ v2.7.0 on an Ultra96 board. Below is a snippet of the runtime error I received:

Traceback (most recent call last):
  File "dma.py", line _, in <module>
    main()
  File "dma.py", line _, in main
    ol.axi_dma_0.recvchannel.transfer(output_buffer)
  File "***/dma.py", line _, in transfer
    raise RuntimeError('DMA channel not started')
RuntimeError: DMA channel not started

The traceback usually points to one of ol.axi_dma_0.recvchannel.transfer(), ol.axi_dma_0.sendchannel.transfer(), ol.axi_dma_0.recvchannel.wait() or ol.axi_dma_0.sendchannel.wait().

You are looking in the right area, but you need to check the DMADecErr which is =1. Err_Irq is just the interrupt telling you that there is an error.

MM2S_DMASR = Register(Halted=1, Idle=0, SGIncld=0, DMAIntErr=0, DMASlvErr=0, DMADecErr=1, SGIntErr=0, SGSlvErr=0, SGDecErr=0, IOC_Irq=1, Dly_Irq=0, Err_Irq=1, IRQThresholdSts=0, IRQDelaySts=0),

I think you are already doing this, but posting this link as it may be useful for anyone else who finds this post.
You can check the documentation for the IP to understand the Status Register information that is reported.
https://docs.xilinx.com/r/en-US/pg021_axi_dma/Register-Space in particular the Status Register section.

DMADecErr:

DMA Decode Error. This error occurs if the address request points to an invalid address.
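
As a rough illustration of reading the status register, the sketch below parses one printed `Register(...)` line, such as the MM2S_DMASR line above, and lists which error flags are set. The helper names `parse_register` and `error_flags` are hypothetical and not part of PYNQ; only the field names come from the register map printout.

```python
import re

# Error fields as they appear in the DMASR printout.
# Err_Irq is left out on purpose: it is only the interrupt that reports an error.
ERROR_FIELDS = ("DMAIntErr", "DMASlvErr", "DMADecErr", "SGIntErr", "SGSlvErr", "SGDecErr")


def parse_register(text):
    """Turn 'Register(A=1, B=0)' text into a dict of field name -> int."""
    return {name: int(value) for name, value in re.findall(r"(\w+)=(\d+)", text)}


def error_flags(fields):
    """Return the names of the error fields that are set."""
    return [name for name in ERROR_FIELDS if fields.get(name, 0)]


mm2s_dmasr = parse_register(
    "Register(Halted=1, Idle=0, SGIncld=0, DMAIntErr=0, DMASlvErr=0, DMADecErr=1, "
    "SGIntErr=0, SGSlvErr=0, SGDecErr=0, IOC_Irq=1, Dly_Irq=0, Err_Irq=1, "
    "IRQThresholdSts=0, IRQDelaySts=0)"
)
print(error_flags(mm2s_dmasr))
```

On the board, the same text could be taken from the printed register map instead of being pasted in by hand.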

Your source address is:
MM2S_SA = Register(Source_Address=1966104576)
In hex: 0x75306000

Did you use allocate to create your memory buffer, and did you pass the buffer correctly to the DMA?

https://discuss.pynq.io/t/tutorial-pynq-dma-part-2-using-the-dma-from-pynq/3134

You can check that your input_buffer.physical_address (replace input_buffer with the name you gave your buffer) is the same as the value passed to the DMA.
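
A minimal sketch of that check, assuming the register values are read as plain integers: the 64-bit address is rebuilt from the MSB and low registers and compared with the buffer's physical address. `dma_address` and `address_matches` are hypothetical helper names, not PYNQ functions.

```python
def dma_address(low, msb=0):
    """Combine the MSB and low 32-bit register values into one 64-bit address."""
    return (msb << 32) | low


def address_matches(buffer_physical_address, low, msb=0):
    """True if the address programmed into the DMA equals the buffer's physical address."""
    return dma_address(low, msb) == buffer_physical_address


# MM2S_SA and MM2S_SA_MSB as printed in the "After" register map.
programmed = dma_address(1966104576, 0)
print(hex(programmed))

# On the board, the first argument would be input_buffer.physical_address;
# the literal used here is only a stand-in so the sketch runs anywhere.
print(address_matches(0x75306000, 1966104576, 0))
```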

If this doesn’t resolve your problem, you would need to post more info about your design. Your Python code and your block diagram would be a good start.

Cathal

Hi @cathalmccabe,

Thank you so much for your reply.

Here is the main part of the Python code, which covers the DMA:

import numpy as np
from pynq import Overlay, allocate

# NUM_OF_INPUTS, TEST_DATASET and EXPECTED_OUTPUT are defined elsewhere in the script.

def main(dma):
    input_buffer = allocate(shape=(NUM_OF_INPUTS,), dtype=np.intc)
    output_buffer = allocate(shape=(1,), dtype=np.intc)

    expected_output_index = 0
    count = 0
    data_row_index = 0  # Start index of one row of readings, which contains NUM_OF_INPUTS readings
    while data_row_index < len(TEST_DATASET):
        buffer_index = 0

        # Copy one row of readings into the DMA input buffer
        while buffer_index < NUM_OF_INPUTS:
            input_buffer[buffer_index] = TEST_DATASET[data_row_index + buffer_index]
            buffer_index += 1

        dma.sendchannel.transfer(input_buffer)
        dma.recvchannel.transfer(output_buffer)
        dma.sendchannel.wait()
        dma.recvchannel.wait()

        result = output_buffer[0]
        print(result)
        if result == EXPECTED_OUTPUT[expected_output_index]:
            count += 1

        data_row_index += NUM_OF_INPUTS
        expected_output_index += 1

    accuracy = count / (len(TEST_DATASET) / NUM_OF_INPUTS)
    print(f"Accuracy: {accuracy}")

    del input_buffer, output_buffer

if __name__ == "__main__":
    ol = Overlay('design_1_wrapper.bit')
    dma = ol.axi_dma_0

    print(ol.axi_dma_0.register_map)

    try:
        main(dma)
    except RuntimeError as e:
        print(ol.axi_dma_0.register_map)
        print("RuntimeError: ", e)

Edit: To clarify, whenever the RuntimeError occurs, the address request points to some random address. I did allocate the input and output buffer, and only changed the content of the input buffer to be sent via dma at every round.

Edit2: Below is a screenshot of the address editor from Vivado. I was not sure which address space to map to, so I let Vivado assign it automatically to avoid any critical warnings.

Edit3: My block design is the same as the one shown in the following youtube link you’ve uploaded:

The only difference is that the FIFO is replaced with an IP core that I designed in Vivado HLS, which also has SAXIS, MAXIS, ap_rst and ap_clk ports.

@cathalmccabe
Do let me know if I need to provide any more information. Thank you!

Hi

Does anyone know how to resolve my issue? Would appreciate any advice given. The previous comment did not work for me.

Do you consistently see this issue?
What address width did you set in the DMA, and what is the address range the DMA is memory mapped to? Ultra96 uses Zynq Ultrascale+ which is based on the 64-bit ARM A53, so the DMA should be set to 64-bit.
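
One way to sanity-check an address against a mapped range is sketched below. The base and size used here (0 and 2 GiB) are placeholder assumptions for illustration only; the real values have to come from the Vivado Address Editor for your design. `transfer_in_range` is a hypothetical helper.

```python
def transfer_in_range(address, length, base, size):
    """True if [address, address + length) lies entirely inside [base, base + size)."""
    return base <= address and address + length <= base + size


# Placeholder range: replace with the range shown in the Address Editor.
ASSUMED_BASE = 0x0000_0000
ASSUMED_SIZE = 2 * 1024**3

# MM2S_SA and MM2S_LENGTH from the failing register map earlier in the thread.
print(transfer_in_range(1966104576, 96, ASSUMED_BASE, ASSUMED_SIZE))
```

The check covers the whole transfer rather than just the start address, since a buffer that begins inside the range can still run past its end.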

Cathal

Hi @dieter,

If you replaced the FIFO with an IP you designed in HLS, it is most likely that your IP is not compliant with what the DMA expects, for instance in how it drives tlast and/or tkeep.

Mario

Hi @cathalmccabe

No, sometimes the Python code works and at other times the runtime error is thrown.

Here is the register map and the input_buffer.physical_address during a failure:

Input buffer physical address:
617979904
RegisterMap {
MM2S_DMACR = Register(RS=0, Reset=0, Keyhole=0, Cyclic_BD_Enable=0, IOC_IrqEn=0, Dly_IrqEn=0, Err_IrqEn=0, IRQThreshold=1, IRQDelay=0),
MM2S_DMASR = Register(Halted=1, Idle=0, SGIncld=0, DMAIntErr=0, DMASlvErr=0, DMADecErr=0, SGIntErr=0, SGSlvErr=0, SGDecErr=0, IOC_Irq=1, Dly_Irq=0, Err_Irq=0, IRQThresholdSts=0, IRQDelaySts=0),
MM2S_CURDESC = Register(Current_Descriptor_Pointer=0),
MM2S_CURDESC_MSB = Register(Current_Descriptor_Pointer=0),
MM2S_TAILDESC = Register(Tail_Descriptor_Pointer=0),
MM2S_TAILDESC_MSB = Register(Tail_Descriptor_Pointer=0),
MM2S_SA = Register(Source_Address=617979904),
MM2S_SA_MSB = Register(Source_Address=0),
MM2S_LENGTH = Register(Length=96),
SG_CTL = Register(SG_CACHE=0, SG_USER=0),
S2MM_DMACR = Register(RS=0, Reset=0, Keyhole=0, Cyclic_BD_Enable=0, IOC_IrqEn=0, Dly_IrqEn=0, Err_IrqEn=0, IRQThreshold=1, IRQDelay=0),
S2MM_DMASR = Register(Halted=1, Idle=0, SGIncld=0, DMAIntErr=0, DMASlvErr=0, DMADecErr=1, SGIntErr=0, SGSlvErr=0, SGDecErr=0, IOC_Irq=1, Dly_Irq=0, Err_Irq=1, IRQThresholdSts=0, IRQDelaySts=0),
S2MM_CURDESC = Register(Current_Descriptor_Pointer=0),
S2MM_CURDESC_MSB = Register(Current_Descriptor_Pointer=0),
S2MM_TAILDESC = Register(Tail_Descriptor_Pointer=0),
S2MM_TAILDESC_MSB = Register(Tail_Descriptor_Pointer=0),
S2MM_DA = Register(Destination_Address=1607938048),
S2MM_DA_MSB = Register(Destination_Address=0),
S2MM_LENGTH = Register(Length=4)
}

It seems that the destination address of 1607938048 is not equal to the input buffer’s physical address of 617979904.

Below is a screenshot of my DMA’s configuration. The address width is 64 bits. As for the address range, I am not sure where to look for it.

Hi @marioruiz

The synthesis runs normally and the test bench did not show any errors, so I am not sure what you mean by tlast and/or tkeep. Both signals are set to 1 once I have filled the input and output according to the array length I provided.

@dieter

How do you test the IP? Via the AXIS Verification IP?
Or ILA visual observation?

If you used HLS, both tlast and tkeep need to be carried in a struct. These are the AXI-Stream control signals that tell the DMA which transfer is the last one and which bytes of the bus are valid.

If these signals are not driven correctly, the DMA will not trigger and will stop.

This is what @marioruiz is trying to tell you.

Hi @briansune

Yes it was done via a struct. The tlast/tkeep is set to 0 for every iteration of a read/write. Only in the last iteration when we are reading/writing the last block of data is the tlast/tkeep set to 1.

Edit: Only tlast is used to signal where a write ends. As for the IP testing, I used the test bench flow provided in Vivado HLS, with C++ test code written to exercise the source code.

@dieter

You haven’t answered the question: how did you determine that tlast is asserted?
With an ILA, or just from the notebook message?

I highly suggest using an ILA to probe tlast, tkeep and other critical signals while debugging.
So was HLS simulation not run during development?

ENJOY~

@briansune

Here is a code snippet of the tlast I am presuming you talked about:

for (i = 0; i < NUMBER_OF_OUTPUT; i++) {
    write_output.data = output[i];
    write_output.last = 0;
    if (i == NUMBER_OF_OUTPUT - 1) {
        write_output.last = 1;
    }
    M_AXIS.write(write_output);
}
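
For checking captured or simulated output without hardware, the loop above can be modelled in Python as a sketch. `make_beats` mirrors the loop (last is 1 only on the final word) and `tlast_ok` checks a list of (data, last) pairs; both names are hypothetical and not part of any tool.

```python
def make_beats(output):
    """Model the HLS write loop: one (data, last) pair per word, last=1 only at the end."""
    final = len(output) - 1
    return [(value, 1 if i == final else 0) for i, value in enumerate(output)]


def tlast_ok(beats):
    """True if the packet is non-empty and last is asserted on the final beat only."""
    if not beats:
        return False
    flags = [last for _, last in beats]
    return flags[-1] == 1 and sum(flags) == 1


beats = make_beats([7])  # a single-word packet
print(beats, tlast_ok(beats))
```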

As for how I designed and tested the IP, I followed the link provided by Vivado:

@dieter

Understood, but did you try verifying it with an ILA?

@briansune

No, may I know how to do that? Is ILA the waveform viewer in the simulation tab in vivado?

@dieter

No, ILA stands for Integrated Logic Analyzer; it probes signals in the running hardware.
Add an ILA or System ILA IP to the design and connect it to the nets you would like to probe.
After implementation, connect over JTAG and open the hardware manager; the waveform will be captured according to the trigger settings.
See video:

@dieter,

Please review the AXI4-Stream spec: keep should have all of its bits set in every transaction.

There is an exception to this on the last transaction (i.e. when last is asserted) if you are sending fewer bytes than the bus width supports, but this doesn’t apply in your case.
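
As a small illustration of that rule, the sketch below computes the keep value for one beat, assuming one keep bit per data byte with the valid bytes packed into the low-order positions. `tkeep_mask` is a hypothetical helper, and the 4-byte bus width is an assumption based on the 32-bit integers used in the Python code.

```python
def tkeep_mask(bus_bytes, valid_bytes=None):
    """Keep bits for one beat: one bit per byte, valid bytes in the low-order positions."""
    if valid_bytes is None:
        valid_bytes = bus_bytes  # full-width beat: every bit set
    if not 0 < valid_bytes <= bus_bytes:
        raise ValueError("valid_bytes must be between 1 and bus_bytes")
    return (1 << valid_bytes) - 1


print(bin(tkeep_mask(4)))     # full 4-byte beat
print(bin(tkeep_mask(4, 2)))  # partial last beat carrying 2 bytes
```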

Mario

@marioruiz

I’d like to take back what I mentioned previously, as I checked the code again. I did not use keep, only last is used for write.

Hi @briansune,

I am currently unable to connect to my device (Ultra96) via cable to send the bitstream over, and can only reach it via ssh, so I can’t view the waveform in real time in Vivado. Are there other methods of analysing the input and output?

@dieter

Then the only method left is simulation; either the default Vivado simulator or ModelSim will do the job.
However, you need to manually modify the testbench to use the AXI4-Stream Verification IP.
Or simply write a trigger FSM and read the result back through the Zynq EMIO.
This idea is also known as BIST (built-in self-test).

ENJOY~