RuntimeError: DMA channel not started on ZCU 111

  • PYNQ version & Board name & Tool Version
    PYNQ Version: 3.0.1, Board Name: Xilinx UltraScale+ ZCU 111 Development Board, Tool Version: Vivado 2020.2

  • Full details of the error message you see, or a detailed description of the problem you experience.
    Error message in PYNQ:


RuntimeError                              Traceback (most recent call last)
Input In [11], in <cell line: 250>()
    248 init_rd_plot()
    249 init_det_plot()
--> 250 init_rd_range_slice_plot(v_0_bin)
    251 init_rd_vel_slice_plot(r_bin)
    252 update_plot()

Input In [11], in init_rd_range_slice_plot(v_bin)
    146 global line, fig_rd_slice, line_cfar_ca, line_cfar_os
    147 fig_rd_slice, ax_rd_slice = plt.subplots(figsize=(15, 8))
--> 148 rd_map = generate_range_doppler()
    149 range_slice = rd_map[:, v_bin]
    150 ax_rd_slice.set_xlabel('Range (m)') # Set x-axis label

Input In [11], in generate_range_doppler()
     78 def generate_range_doppler():
     79 #start_time = time.time()
     80 for doppler_bin in range(NUM_SEQS):
---> 81 read_buffer[doppler_bin*NUM_SYMBOLS : (doppler_bin+1)*NUM_SYMBOLS] = get_range_profile()
     82 matrix = np.reshape(read_buffer, (NUM_SEQS, NUM_SYMBOLS)).T
     84 if debug_doppler:

Input In [11], in get_range_profile()
     54 dma_corr.recvchannel.start()
     55 dma_corr.recvchannel.transfer(output_buffer)
---> 56 dma_corr.recvchannel.wait()
     57 return output_buffer

File /usr/local/share/pynq-venv/lib/python3.10/site-packages/pynq/lib/dma.py:169, in _SDMAChannel.wait(self)
    167 """Wait for the transfer to complete"""
    168 if not self.running:
--> 169 raise RuntimeError("DMA channel not started")
    170 while True:
    171 error = self._mmio.read(self._offset + 4)

RuntimeError: DMA channel not started

The problem I'm facing is the "DMA channel not started" error shown above. Related to this, my DMA's t_ready signal goes to 1 and then, for a reason I don't know, goes back to 0. As a result, the data from the FFT block of my design is not being written to the PL DDR4 memory of my board.

  • Steps to reproduce the problem, and if needed: source code and bitstream or any relevant files
    The Python code that I have been working on is below:

from pynq import Overlay, allocate
import numpy as np
import matplotlib.pyplot as plt
import scipy.constants as const

radar = Overlay('/home/xilinx/jupyter_notebooks/radar/Bitstreams/design_1_wrapper.xsa')

NUM_SEQS = 1024
NUM_SYMBOLS = 1024
DELAY = 2
CONVERTER_FREQ_MULTIPLIER = 6 # given by hardware, should not be changed
lo_freq = 13.6e9
f_c = lo_freq * CONVERTER_FREQ_MULTIPLIER
c0 = const.c
B = 307.2e6
T_chip = 1/B
T_seq = T_chip * NUM_SYMBOLS
v = -30
T_seq = NUM_SYMBOLS* T_chip
debug_doppler = False
debug_delay = False
v_0_bin = NUM_SEQS // 2
r_bin = 6
WINDOW = np.hanning(NUM_SEQS)

r_res = c0 / (2*B)
print(f'r_res = {r_res:.2f} m')
r_max = r_res * NUM_SYMBOLS
print(f'r_max = {r_max:.2f} m')
v_res = c0 / (2*f_c*NUM_SEQS*T_seq)
print(f'v_res = {v_res:.2f} m/s')
v_max = v_res * NUM_SEQS / 2
print(f'v_max = {v_max:.2f} m/s')

r_bins = r_res*np.arange(NUM_SYMBOLS)
import utils.cfar as cfar
from matplotlib.ticker import FormatStrFormatter
import sys

SHOW_RD_MAP = False
SHOW_RD_DET_MAP = False
SHOW_VEL_SLICE = True
SHOW_RANGE_SLICE = False

np.set_printoptions(threshold=sys.maxsize)

dma_corr = radar.Correlation.axi_dma_0
dma_wr = radar.doppler.dma_doppler_write
dma_rd = radar.doppler.dma_doppler_read

dma_corr.recvchannel.start()
print("dma_corr.recvchannel.running =", dma_corr.recvchannel.running)
dma_wr.sendchannel.start()
dma_rd.recvchannel.start()

# Allocate buffers explicitly in PL DDR memory

output_buffer = allocate((NUM_SYMBOLS,), dtype=np.uint64, target=radar.ddr4_0)
read_buffer = allocate((NUM_SYMBOLS * NUM_SEQS,), dtype=np.uint64, target=radar.ddr4_0)
send_buffer = allocate((NUM_SYMBOLS, NUM_SEQS), dtype=np.uint64, target=radar.ddr4_0)
recv_buffer = allocate((NUM_SYMBOLS, NUM_SEQS), dtype=np.uint64, target=radar.ddr4_0)

#output_buffer = allocate((NUM_SYMBOLS, ), dtype = np.uint64)
#read_buffer = allocate((NUM_SYMBOLS*NUM_SEQS,), dtype = np.uint64)
#send_buffer = allocate((NUM_SYMBOLS, NUM_SEQS), dtype = np.uint64)
#recv_buffer = allocate((NUM_SYMBOLS, NUM_SEQS), dtype = np.uint64)
#range_doppler_buffer = allocate((262144,), dtype = np.uint64)
#buf32 = allocate((32,), dtype = np.uint64)generate_range_doppler()

def transform_back_to_64bit(matrix_complex):
    # Step 1: Extract real and imaginary parts
    real_part = np.real(matrix_complex)
    imag_part = np.imag(matrix_complex)

    # Step 2: Multiply by 2 to revert the division
    real_part_scaled = np.int32(real_part * 2)
    imag_part_scaled = np.int32(imag_part * 2)

    # Step 3: Ensure the parts are 32-bit integers
    real_part_32bit = real_part_scaled.astype(np.uint32)
    imag_part_32bit = imag_part_scaled.astype(np.uint32)

    # Step 4: Pack the real and imaginary parts into a 64-bit integer
    matrix_64bit = (imag_part_32bit.astype(np.uint64) << 32) | real_part_32bit.astype(np.uint64)

    return matrix_64bit

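def get_range_profile():
    # Start the receive channel if it is not already running, then
    # transfer one range profile into output_buffer and wait for it.
    if not dma_corr.recvchannel.running:
        dma_corr.recvchannel.start()
    dma_corr.recvchannel.transfer(output_buffer)
    dma_corr.recvchannel.wait()
    return output_buffer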

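def add_delay(delay):
    # counter is module-level state; without the global declaration the
    # assignments below would make it a local variable.
    global counter
    if debug_delay:
        if counter < 31:
            counter = counter + delay
        else:
            counter = 0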

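def add_doppler_shift(v):
    # Operates on a module-level `matrix`; the caller must also declare
    # `matrix` as global for the shift to take effect.
    global matrix
    matrix_complex = (np.int32((matrix) & 0xFFFFFFFF)/2) + 1j*(np.int32((matrix >> 32) & 0xFFFFFFFF)/2)
    for seq_index in range(NUM_SEQS):
        t = seq_index * T_seq
        # c0 is the speed of light defined above (the original used an undefined `c`)
        matrix_complex[:,seq_index] = matrix_complex[:,seq_index] * np.exp(1j*2*np.pi*f_c*2*v*t/c0)

    for chip_index in range(NUM_SYMBOLS):
        matrix_complex[chip_index, :] *= WINDOW

    matrix = transform_back_to_64bit(matrix_complex)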

def generate_range_doppler():
    global matrix  # shared with add_doppler_shift()
    #start_time = time.time()
    # Collect NUM_SEQS range profiles from the correlation DMA
    for doppler_bin in range(NUM_SEQS):
        read_buffer[doppler_bin*NUM_SYMBOLS : (doppler_bin+1)*NUM_SYMBOLS] = get_range_profile()
    matrix = np.reshape(read_buffer, (NUM_SEQS, NUM_SYMBOLS)).T

    if debug_doppler:
        add_doppler_shift(v)

    # Stream each range bin through the Doppler FFT and read the result back
    for range_bin in range(NUM_SYMBOLS):
        send_buffer[range_bin] = matrix[range_bin]
        dma_wr.sendchannel.transfer(send_buffer[range_bin])
        dma_rd.recvchannel.transfer(recv_buffer[range_bin])
        dma_wr.sendchannel.wait()
        dma_rd.recvchannel.wait()
    range_doppler_map = (np.int32((recv_buffer) & 0xFFFFFFFF)/2) + 1j*(np.int32((recv_buffer >> 32) & 0xFFFFFFFF)/2)
    range_doppler_map_shift = np.fft.fftshift(range_doppler_map, axes=1)
    #range_Doppler_map_norm = abs(range_doppler_map_shift) / np.max(abs(range_doppler_map_shift))
    range_doppler_map_norm = abs(range_doppler_map_shift)
    range_doppler_map_norm_log = range_doppler_map_norm
    return range_doppler_map_norm

def target_detection(range_doppler):
    r_d_image_cfar = cfar.cfar_ca(range_doppler, guard=2, training=10, pfa=1e-2)
    r_d_image_det = np.where(range_doppler >= r_d_image_cfar, 1, 0)
    return r_d_image_det

def accumulate(matrix, num_cols):
    new_matrix = matrix.reshape(matrix.shape[0], matrix.shape[1]//num_cols, num_cols).sum(axis=2)
    return new_matrix

def init_rd_plot():
    if SHOW_RD_MAP:
        global im_rd, fig_rd
        fig_rd, ax_rd = plt.subplots(figsize=(10, 10))
        rd_map = generate_range_doppler()
        rd_map_log = 20*np.log10(rd_map)
        im_rd = ax_rd.imshow(rd_map_log, cmap='viridis', aspect='auto', origin='lower')
        im_rd.axes.set_xlim(450, 560)
        im_rd.axes.set_ylim(0, 30)
        #ax_rd.yaxis.set_major_formatter(FormatStrFormatter('%.1f'))
        #ax_rd.set_yticks(r_bins)
        #plt.y_ticks(r_bins)
        cbar_rd = plt.colorbar(im_rd)
        #im_rd.set_clim(vmin=-60, vmax=0)
        cbar_rd.set_label('Power (dB)', size=18)
        ax_rd.set_xlabel('Velocity bin')
        ax_rd.set_ylabel('Range bin')
        ax_rd.set_title('Range-Doppler-Map')

def init_det_plot():
    if SHOW_RD_DET_MAP:
        global im_det, fig_det
        fig_det, ax_det = plt.subplots(figsize=(10, 10))
        rd_map = generate_range_doppler()
        det_map = target_detection(rd_map)
        im_det = ax_det.imshow(det_map, cmap='viridis', aspect='auto', origin='lower')
        im_det.axes.set_xlim(450, 560)
        im_det.axes.set_ylim(0, 40)
        cbar_det = plt.colorbar(im_det)
        im_det.set_clim(vmin=0, vmax=1)
        cbar_det.set_label('1 - Target, 0 - No Target', size=18)
        ax_det.set_xlabel('Velocity Bin')
        ax_det.set_ylabel('Range Bin')
        ax_det.set_title('Range-Doppler-Map after CFAR')

def init_rd_range_slice_plot(v_bin):
    if SHOW_VEL_SLICE:
        global line, fig_rd_slice, line_cfar_ca, line_cfar_os
        fig_rd_slice, ax_rd_slice = plt.subplots(figsize=(15, 8))
        rd_map = generate_range_doppler()
        range_slice = rd_map[:, v_bin]
        ax_rd_slice.set_xlabel('Range (m)') # Set x-axis label
        ax_rd_slice.set_ylabel('Relative Power (dB)') # Set y-axis label
        ax_rd_slice.grid(True) # Add grid lines
        ax_rd_slice.set_ylim(120, 180)
        ax_rd_slice.set_xlim(0, 20)
        #ax_rd_slice.set_title('Range slice v=0m/s') # Set title
        ax_rd_slice.relim()
        ax_rd_slice.autoscale_view()
        range_slice_log = 20*np.log10(range_slice)
        line, = ax_rd_slice.plot(r_bins, range_slice_log)
        cfar_det_ca = cfar.cfar_ca(range_slice, guard=1, training=2, pfa=1e-1)
        cfar_log_ca = 20*np.log10(cfar_det_ca)
        #cfar_det_os = cfar.cfar_os(range_slice, n=5, k=14, pfa=1e-3)
        #cfar_log_os = 20*np.log10(cfar_det_os)
        #line_cfar_os, = ax_rd_slice.plot(r_bins, cfar_log_os)
        line_cfar_ca, = ax_rd_slice.plot(r_bins, cfar_log_ca)
        # np.save('./meas/8/meas_1target_mls_L20_600cm.npy', range_slice_log)
        ax_rd_slice.legend(['Signal', 'CFAR-CA', 'CFAR-OS'])

def init_rd_vel_slice_plot(r_bin):
    if SHOW_RANGE_SLICE:
        global line, fig_rd_v_slice, line_cfar_ca, line_cfar_os
        fig_rd_v_slice, ax_rd_v_slice = plt.subplots(figsize=(15, 8))
        rd_map = generate_range_doppler()
        doppler_slice = rd_map[r_bin, :]
        ax_rd_v_slice.set_xlabel('Velocity bin') # Set x-axis label
        ax_rd_v_slice.set_ylabel('Amplitude (dB)') # Set y-axis label
        ax_rd_v_slice.grid(True) # Add grid lines
        #ax_rd_v_slice.set_ylim(-80, 5)
        ax_rd_v_slice.set_xlim(450, 560)
        ax_rd_v_slice.set_title('Doppler slice') # Set title
        ax_rd_v_slice.relim()
        ax_rd_v_slice.autoscale_view()
        doppler_slice_log = 20*np.log10(doppler_slice)
        line, = ax_rd_v_slice.plot(doppler_slice_log)
        cfar_det_ca = cfar.cfar_ca(doppler_slice, guard=2, training=10, pfa=1e-2)
        cfar_log_ca = 20*np.log10(cfar_det_ca)
        #cfar_det_os = cfar.cfar_os(range_slice, n=5, k=14, pfa=1e-3)
        #cfar_log_os = 20*np.log10(cfar_det_os)
        #line_cfar_os, = ax_rd_slice.plot(r_bins, cfar_log_os)
        #line_cfar_ca, = ax_rd_v_slice.plot(v_bins, cfar_log_ca)
        ax_rd_v_slice.legend(['Signal', 'CFAR-CA', 'CFAR-OS'])

def update_rd_plot():
    if SHOW_RD_MAP:
        rd_map = generate_range_doppler()
        rd_map_log = 20*np.log10(rd_map)
        i,j = np.unravel_index(rd_map_log.argmax(), rd_map_log.shape)
        print(f'Max cell: rbin={i},vbin={j}')
        im_rd.set_data(rd_map_log)
        display(fig_rd, clear=True)

def update_detection_plot():
    if SHOW_RD_DET_MAP:
        rd_map = generate_range_doppler()
        detections = target_detection(rd_map)
        im_det.set_data(detections)
        display(fig_det, clear=True)

def update_rd_range_slice_plot(v_bin):
    if SHOW_VEL_SLICE:
        rd_map = generate_range_doppler()
        range_slice = rd_map[:, v_bin]
        range_slice_log = 20*np.log10(range_slice)
        print(np.max(range_slice_log))
        line.set_ydata(range_slice_log)
        cfar_det_ca = cfar.cfar_ca(range_slice, guard=2, training=10, pfa=1e-2)
        cfar_ca_log = 20*np.log10(cfar_det_ca)
        line_cfar_ca.set_ydata(cfar_ca_log)
        #cfar_det_os = cfar.cfar_os(range_slice, n=5, k=14, pfa=1e-3)
        #cfar_os_log = 20*np.log10(cfar_det_os)
        #line_cfar_os.set_ydata(cfar_os_log)
        display(fig_rd_slice, clear=True)

def update_rd_vel_slice_plot(r_bin):
    if SHOW_RANGE_SLICE:
        rd_map = generate_range_doppler()
        doppler_slice = rd_map[r_bin, :]
        doppler_slice_log = 20*np.log10(doppler_slice)
        line.set_ydata(doppler_slice_log)
        cfar_det_ca = cfar.cfar_ca(doppler_slice, guard=2, training=10, pfa=1e-2)
        cfar_ca_log = 20*np.log10(cfar_det_ca)
        #line_cfar_ca.set_ydata(cfar_ca_log)
        #cfar_det_os = cfar.cfar_os(range_slice, n=5, k=14, pfa=1e-3)
        #cfar_os_log = 20*np.log10(cfar_det_os)
        #line_cfar_os.set_ydata(cfar_os_log)
        display(fig_rd_v_slice, clear=True)

def update_plot():
    while True:
        #LoadRFSeq(counter)
        #add_delay(DELAY)
        update_rd_plot()
        update_detection_plot()
        update_rd_range_slice_plot(v_0_bin)
        update_rd_vel_slice_plot(r_bin)

plt.ion()
init_rd_plot()
init_det_plot()
init_rd_range_slice_plot(v_0_bin)
init_rd_vel_slice_plot(r_bin)
update_plot()
plt.ioff()
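
The sample format used in the code above (imaginary part in the upper 32 bits and real part in the lower 32 bits of each uint64 word) can be checked off the board with a small round trip. This is only a minimal sketch: `pack_iq` and `unpack_iq` are illustrative names that are not part of the notebook, and the divide-by-2 scaling is left out.

```python
import numpy as np

def pack_iq(real, imag):
    """Pack signed 32-bit real/imag parts into one uint64 word per sample."""
    # Reinterpret the int32 bit patterns as uint32 so negative values do not
    # sign-extend when widened to 64 bits.
    re = np.asarray(real, dtype=np.int32).view(np.uint32).astype(np.uint64)
    im = np.asarray(imag, dtype=np.int32).view(np.uint32).astype(np.uint64)
    return (im << np.uint64(32)) | re

def unpack_iq(words):
    """Inverse of pack_iq: return (real, imag) as int32 arrays."""
    words = np.asarray(words, dtype=np.uint64)
    re = (words & np.uint64(0xFFFFFFFF)).astype(np.uint32).view(np.int32)
    im = (words >> np.uint64(32)).astype(np.uint32).view(np.int32)
    return re, im

words = pack_iq([-3, 5], [7, -1])
re, im = unpack_iq(words)
print(re, im)
```

If the round trip holds for negative values as well, any remaining problem is more likely in the transfer than in the packing.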

The DMA, FFT IP and MIG settings, along with the address editor details, are shown below:







DMA register values before and after error:

print("=== S2MM Registers BEFORE error ===")
print_dma_registers(dma_s2mm)
=== S2MM Registers BEFORE error ===
Dumping DMA registers from 0x00 to 0x48:
0x00: 0x00000000
0x04: 0x00000000
0x08: 0x00000000
0x0C: 0x00000000
0x10: 0x00000000
0x14: 0x00000000
0x18: 0x00000000
0x1C: 0x00000000
0x20: 0x00000000
0x24: 0x00000000
0x28: 0x00000000
0x2C: 0x00000000
0x30: 0x00010002
0x34: 0x00000001
0x38: 0x00000000
0x3C: 0x00000000
0x40: 0x00000000
0x44: 0x00000000
0x48: 0x00000000
print("=== S2MM Registers AFTER error ===")
print_dma_registers(dma_s2mm)
=== S2MM Registers AFTER error ===
Dumping DMA registers from 0x00 to 0x48:
0x00: 0x00000000
0x04: 0x00000000
0x08: 0x00000000
0x0C: 0x00000000
0x10: 0x00000000
0x14: 0x00000000
0x18: 0x00000000
0x1C: 0x00000000
0x20: 0x00000000
0x24: 0x00000000
0x28: 0x00000000
0x2C: 0x00000000
0x30: 0x00010002
0x34: 0x00005041
0x38: 0x00000000
0x3C: 0x00000000
0x40: 0x00000000
0x44: 0x00000000
0x48: 0x77D6C000
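
For reading the dump above: in the AXI DMA register map (direct register mode), 0x30 is the S2MM control register (S2MM_DMACR), 0x34 is the S2MM status register (S2MM_DMASR) and 0x48 holds the lower 32 bits of the S2MM destination address. Below is a minimal sketch of a decoder for these values. The bit positions are taken from the AXI DMA product guide (PG021) as I understand it and should be checked against it; `decode_dmasr` and `run_stop_set` are made-up helper names.

```python
# Bit positions of the AXI DMA status register (MM2S_DMASR / S2MM_DMASR).
DMASR_BITS = {
    0: "Halted",
    1: "Idle",
    3: "SGIncld",
    4: "DMAIntErr",
    5: "DMASlvErr",
    6: "DMADecErr",
    8: "SGIntErr",
    9: "SGSlvErr",
    10: "SGDecErr",
    12: "IOC_Irq",
    13: "Dly_Irq",
    14: "Err_Irq",
}

def decode_dmasr(value):
    """Return the names of all status flags set in a DMASR value."""
    return [name for bit, name in sorted(DMASR_BITS.items()) if value & (1 << bit)]

def run_stop_set(dmacr):
    """True if the Run/Stop bit (bit 0) of a DMACR value is set."""
    return bool(dmacr & 0x1)

print(decode_dmasr(0x00000001))   # status before the error
print(decode_dmasr(0x00005041))   # status after the error
print(run_stop_set(0x00010002))   # control register in both dumps
```

Read this way, the "after" status 0x00005041 would mean that the channel is halted with a decode error flagged, which points towards an address the DMA could not reach, and the control value 0x00010002 has the Run/Stop bit clear in both dumps. This is only a reading aid, not a diagnosis.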

Hi @Sunil_Dabbiru,

Welcome to the PYNQ community.

I have written a blog post on how to debug DMA issues.

This typically boils down to TLAST not being properly set or PS config.

Mario

Hello Mario,
I have checked, and the issue is not with the TLAST bit received by my DMA, but thank you for your reply. I have observed that my DMA is not able to access the DDR memory because of an address mismatch, and I'm not sure how to solve this.
The address that I'm assigning in the address editor has an offset of 0x05, but the buffer is allocated in a different region, i.e. at 0x77d6e000. This address mismatch is what leads to the runtime error.
Do you have a solution for this?
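
One way to make the mismatch concrete is to compare the address a buffer actually received with the window assigned in the address editor. The following is a minimal sketch in plain Python. `PL_DDR_BASE` and `PL_DDR_SIZE` are placeholder values for illustration only, not the values of this design. On the board, the address would come from the buffer's `device_address` attribute.

```python
# Hypothetical window; replace with the values from the address editor.
PL_DDR_BASE = 0x5_0000_0000
PL_DDR_SIZE = 0x1_0000_0000

def in_window(addr, nbytes, base, size):
    """True if [addr, addr + nbytes) lies completely inside [base, base + size)."""
    return base <= addr and addr + nbytes <= base + size

def needs_more_than_32_bits(addr, nbytes):
    """True if the last byte of the buffer is above the 4 GiB boundary."""
    return addr + nbytes - 1 > 0xFFFF_FFFF

buf_addr = 0x77d6e000      # buffer address reported in this thread
buf_bytes = 1024 * 8       # NUM_SYMBOLS words of uint64

print(in_window(buf_addr, buf_bytes, PL_DDR_BASE, PL_DDR_SIZE))
print(needs_more_than_32_bits(PL_DDR_BASE, buf_bytes))
```

If the first check fails, the buffer was not placed in the PL DDR4 window. If the window lies above 4 GiB, it is also worth verifying that the address width configured on the DMA is wide enough to reach it.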


Your design is quite complicated and the screenshots are incomplete.

You may want to look at the MTS design that does something similar to this.