Kernel does not finish execution after the first call on Alveo U50

Hi,

I successfully went through the Xilinx/Alveo-PYNQ tutorials (introductory examples for using PYNQ with Alveo) with the older shell xilinx_u50_xdma_201920_1 installed on an Alveo U50, using the overlays downloaded from the provided link. However, after I upgraded the card to the most recent shell, xilinx_u50_gen3x16_xdma_201920_3, and built the .xclbin files with the provided makefiles, the .call() function no longer returns for any kernel after the first run. The first run finishes and the outputs are as expected. Since the second run does not finish, I have to interrupt the notebook kernel, which returns

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
/tmp/ipykernel_4283/3157139858.py in <module>
----> 1 vadd.call(in1_vadd, in2_vadd, out, size)
      2 

~/anaconda3/lib/python3.9/site-packages/pynq/overlay.py in _call(self, *args, **kwargs)
    756 
    757     def _call(self, *args, **kwargs):
--> 758         self.start(*args, **kwargs).wait()
    759 
    760     def _start_sw(self, *args, ap_ctrl=1, waitfor=None, **kwargs):

~/anaconda3/lib/python3.9/site-packages/pynq/overlay.py in wait(self)
    633 
    634     def wait(self):
--> 635         while self.target.mmio.read(0) & 0x4 != 0x4:
    636             pass
    637 

~/anaconda3/lib/python3.9/site-packages/pynq/mmio.py in read(self, offset, length, word_order)
    146 
    147         # Read data out
--> 148         lsb = int(self.array[idx])
    149         if length == 8:
    150             if word_order == 'little':

~/anaconda3/lib/python3.9/site-packages/pynq/_3rdparty/tinynumpy.py in __getitem__(self, key)
    666         if not shape:
    667             # Return scalar
--> 668             return self._data[offset]
    669         else:
    670             # Return view

~/anaconda3/lib/python3.9/site-packages/pynq/_3rdparty/tinynumpy.py in __getitem__(self, key)
    452         if self._hook:
    453             return self._struct.unpack(
--> 454                 self._hook.read(offset, self._itemsize))[0];
    455         else:
    456             return self._struct.unpack(self._bytearray[offset:offset+self._itemsize])[0]

~/anaconda3/lib/python3.9/site-packages/pynq/mmio.py in read(self, offset, length)
     45 
     46     def read(self, offset, length):
---> 47         return self.device.read_registers(self.baseaddress + offset, length)
     48 
     49     def write(self, offset, data):

~/anaconda3/lib/python3.9/site-packages/pynq/pl_server/xrt_device.py in read_registers(self, address, length)
    484     def read_registers(self, address, length):
    485         data = (ctypes.c_char * length)()
--> 486         ret = xrt.xclRead(self.handle,
    487                           xrt.xclAddressSpace.XCL_ADDR_KERNEL_CTRL,
    488                           address, data, length)

~/anaconda3/lib/python3.9/site-packages/pynq/_3rdparty/xrt.py in xclRead(handle, space, offset, hostBuf, size)
    720     libc.xclRead.restype = ctypes.c_size_t
    721     libc.xclRead.argtypes = [xclDeviceHandle, ctypes.c_int, ctypes.c_uint64, ctypes.c_void_p, ctypes.c_size_t]
--> 722     return libc.xclRead(handle, space, offset, hostBuf, size)
    723 
    724 def xclExecBuf(handle, cmdBO):

KeyboardInterrupt:

Deallocating the buffers and freeing the FPGA context

%xdel in1_vadd
%xdel in2_vadd
%xdel out
ol.free()

Or redownloading the overlay

ol = pynq.Overlay("intro.xclbin")

does not help either. I have to reboot the machine to get a kernel to finish execution again.

I tried other .xclbin files with different kernels and got the same result each time. Finally, I wrote this simple HLS kernel to check whether the problem is related to buffers

extern "C" {
	void do_nothing(int a) {
		int b = 5+5;
	}
}

compiled it with

v++ -c -t hw --platform xilinx_u50_xdma_201920_1 -k do_nothing simple_krnl.cpp -o do_nothing.xo
v++ -l -t hw --platform xilinx_u50_gen3x16_xdma_201920_3 ./do_nothing.xo -o simple_krnl.xclbin

and ran this code on the host machine

import pynq
ol = pynq.Overlay("simple_krnl.xclbin")
ol.do_nothing_1.call(0)

Again, the third line runs to completion the first time but never returns on later runs unless I reboot the machine.

Any suggestions?

Hi @kimo,

What pynq version are you using? What XRT version are you using? What Vitis version did you use to generate the xclbin file?
How many kernels do you have in the design?

You do not need to reboot the machine if the kernel hangs; use xbutil reset instead: https://xilinx.github.io/XRT/master/html/xbutil.html?highlight=reset#xbutil

Mario


Hi @marioruiz

Pynq version is 2.7.0
XRT version is 2.12.427
Vitis version is 2021.2

intro.xclbin has two kernels and simple_krnl.xclbin has one. None of the kernels work after the first run.

I should also mention that host programs written in C++ with the OpenCL API work without any problem for the same kernels.

Didn’t you get this message?

pynq/pl_server/xrt_device.py:89: UserWarning: xbutil failed to run - unable to determine XRT version
  'xbutil failed to run - unable to determine XRT version')

Yeah, I added the --legacy option in the source code to fix it. I don’t think it is related.

Would you be able to share both xclbin files, and the notebook you are using for intro.xclbin?

You can download them from here:
https://drive.google.com/drive/folders/1R3hja4sOAmvSNmU_-f2UxT9dhuyVYDBC?usp=sharing

You can also use the notebook from the tutorial for intro.xclbin:
1-vector-addition.ipynb in the Xilinx/Alveo-PYNQ repository on GitHub (master branch)

Can you do a final check before I have a look?

Use start_sw instead of call and let me know if this works.

For instance ol.do_nothing_1.start_sw(0)

It doesn’t get stuck on the .call() line now, but the problem persists. If I call .wait() on the returned WaitHandle object, it only finishes execution on the first run. And the outputs of the vadd kernel are correct only the first time it is called.
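
For anyone debugging the same hang, polling with a timeout avoids having to interrupt the notebook kernel. The following is only a minimal sketch: the helper name wait_idle and its read_ctrl parameter are hypothetical and not part of PYNQ, and the 0x4 mask is simply the one used by wait() in the traceback above.

```python
import time

AP_IDLE_MASK = 0x4  # same mask that pynq's wait() polls for in the traceback


def wait_idle(read_ctrl, timeout_s=1.0, poll_s=0.001):
    """Poll a control register until the idle bit is set or the timeout expires.

    read_ctrl is a hypothetical zero-argument callable returning the register
    value as an int. Returns True if idle was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if read_ctrl() & AP_IDLE_MASK:
            return True
        time.sleep(poll_s)
    return False


# Stand-in readers that mimic the two values observed in this thread.
print(wait_idle(lambda: 0x0E, timeout_s=0.1))   # idle bit set
print(wait_idle(lambda: 0x03, timeout_s=0.05))  # idle bit never set
```

On real hardware the callable would wrap the register read that wait() already performs, for example something like lambda: ol.do_nothing_1.mmio.read(0); that usage is an assumption based on the traceback and was not tested here.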

Hi @kimo,

I tested this locally and I am not able to reproduce the problem. I am using a 2019 anaconda environment with Python 3.7.4.

How did you install your anaconda environment? I see that it is using Python 3.9.
What OS are you running on?

When the kernel hangs, can you also report the output of xbutil examine -d <bdf> -r dynamic-regions?

Mario


Hi @marioruiz,

Thank you for your efforts. I removed Python and Anaconda and installed the same versions as yours, but nothing changed. My OS is CentOS 7.

Meanwhile, I discovered that if I add the line #pragma HLS INTERFACE ap_ctrl_hs port=return to the HLS code, the problem goes away. I guess the reason I did not have this problem with the older shell is that ap_ctrl_hs was the default execution model prior to the 2019.1 release.

I added some print statements to see the register writes and reads issued to the kernel. When the interface is ap_ctrl_hs, or when it is ap_ctrl_chain and it is the first run, the kernel execution finishes once 0x0e is read at address 0.

write_reg addr:  0 data:  b'\x01\x00\x00\x00'
read_reg addr:  0 data:  b'\x01\x00\x00\x00'
read_reg addr:  0 data:  b'\x01\x00\x00\x00'
...
read_reg addr:  0 data:  b'\x0e\x00\x00\x00'

When the interface is ap_ctrl_chain and it is the second or a later run, the host constantly reads 0x03 from address 0 and the kernel hangs.

write_reg addr:  0 data:  b'\x01\x00\x00\x00'
read_reg addr:  0 data:  b'\x03\x00\x00\x00'
read_reg addr:  0 data:  b'\x03\x00\x00\x00'
read_reg addr:  0 data:  b'\x03\x00\x00\x00'
...
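
The raw values in these traces are easier to compare once decoded into named bits. The sketch below assumes the kernel uses the usual Vitis HLS control register layout at offset 0 (bit 0 ap_start, bit 1 ap_done, bit 2 ap_idle, bit 3 ap_ready, bit 4 ap_continue, bit 7 auto_restart); the helper name decode_ap_ctrl is hypothetical.

```python
# Assumed layout of the HLS block-level control register at offset 0.
AP_CTRL_BITS = {
    0: "ap_start",
    1: "ap_done",
    2: "ap_idle",
    3: "ap_ready",
    4: "ap_continue",   # only meaningful for ap_ctrl_chain
    7: "auto_restart",
}


def decode_ap_ctrl(raw: bytes) -> list:
    """Return the names of the bits set in a little-endian register dump."""
    value = int.from_bytes(raw, "little")
    return [name for bit, name in AP_CTRL_BITS.items() if value & (1 << bit)]


# Values copied from the traces above.
for raw in (b"\x01\x00\x00\x00", b"\x0e\x00\x00\x00", b"\x03\x00\x00\x00"):
    print(raw.hex(), decode_ap_ctrl(raw))
```

Under that assumed layout, 0x0e decodes to ap_done, ap_idle and ap_ready, while 0x03 decodes to ap_start and ap_done with ap_idle clear, so a loop that polls only the idle bit never exits.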

Even though 0x03 means the idle bit is low, running xbutil examine -d <bdf> -r dynamic-regions while the kernel hangs reports the compute unit as idle:

------------------------------------------------------
1/1 [0000:02:00.1] : xilinx_u50_gen3x16_xdma_201920_3
------------------------------------------------------
Xclbin UUID
  7321E8F4-0B29-7FEB-3B3B-054C534387C4

Compute Units
  PL Compute Units
    Index   Name                                              Base_Address    Usage   Status  
    0       do_nothing:do_nothing_1                           0x1400000       0       (IDLE)  

  PS Compute Units
    Index   Name                                              Base_Address    Usage   Status

Any further suggestions would be appreciated.

Hi,

Can you try an older version of XRT, for example 2.11?

You can also use the .register_map attribute instead of reading and writing address 0x0 directly.

I am not able to reproduce this error locally, so I cannot investigate deeper.

Mario