Writes to PYNQ-allocated Alveo buffer cause firewall crash

Hello!

I have wrapped an RTL kernel in the Vitis flow, and I am interacting with it via PYNQ on a U250 board. The kernel is user-managed; it does not use the standard .call/HLS interface. The code to run the RTL looks roughly like this:

import numpy as np
import pynq

ol = pynq.Overlay("kernel.xclbin")
kernel = ol.kernel

print(f"Control Register: {hex(kernel.register_map.S2MM_DMACR)}")
print(f"Status Register: {hex(kernel.register_map.S2MM_DMASR)}")

# Allocate a 32-element uint64 buffer (256 bytes), zero it, and copy it to the device.
trace = pynq.allocate(32, np.uint64)
trace[:] = 0
trace.sync_to_device()
# I have tried using trace.flush() here.

# Using the Xilinx Simple DMA engine:
kernel.register_map.S2MM_DMACR[0] = 1  # set bit 0 of the control register
kernel.register_map.S2MM_DA = trace.device_address  # destination address
kernel.register_map.S2MM_LENGTH = 32 * 8  # transfer length in bytes

kernel.register_map.start = 1

On the write to start, the RTL kernel issues data to the DMA engine, which writes it into the buffer allocated by PYNQ.

Roughly one run in ten succeeds and returns the correct result. The other nine crash the notebook, after which every register read returns 0xDEADFA11, which seems to indicate that the firewall has been tripped (see https://support.xilinx.com/s/article/71212?language=en_US).
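To tell a tripped firewall apart from an error reported by the DMA engine itself, it can help to decode the raw status value before the notebook dies. The following is a minimal sketch, not something from the thread: the helper name `decode_dmasr` is hypothetical, and the bit positions assume the engine in the kernel follows the S2MM_DMASR layout of the Xilinx AXI DMA.

```python
# Value reported in the post: register reads return this after the crash.
FIREWALL_SENTINEL = 0xDEADFA11

# Assumed bit layout: S2MM_DMASR of the Xilinx AXI DMA.
DMASR_FLAGS = {
    0: "Halted",
    1: "Idle",
    4: "DMAIntErr",
    5: "DMASlvErr",
    6: "DMADecErr",
    12: "IOC_Irq",
    14: "Err_Irq",
}


def decode_dmasr(value):
    """Return the names of the status flags set in a raw 32-bit register read."""
    if value == FIREWALL_SENTINEL:
        # Treated as a marker rather than a real status word.
        return ["FIREWALL_TRIPPED"]
    return [name for bit, name in sorted(DMASR_FLAGS.items()) if value & (1 << bit)]


print(decode_dmasr(0x1002))
print(decode_dmasr(0xDEADFA11))
```

Passing the raw value read from S2MM_DMASR right after the transfer would show whether the engine flagged a slave or decode error before the firewall marker appears.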

If I had to guess, it’s a race condition of some sort. Perhaps the firewall doesn’t know that the buffer is valid to write to? Am I missing a Python call for a user-managed RTL kernel?

Hi @drichmond,

User-managed kernels are relatively new in XRT, and we have not verified them in PYNQ.
There may be considerations around memory allocation flags, or other intricacies, that we have not checked.

https://xilinx.github.io/XRT/master/html/xrt_native_apis.html#allocating-buffers-for-the-ip-inputs-outputs

Mario

Sounds like an intricacy that isn’t verified. Changing the other allocation parameters has no effect.

We’ll try the C++ API and see what happens.

Unfortunately, the C++ API produces the same behavior, so I’ll ask in the community forum and post the solution here.

Hi @drichmond,

Check whether the user-managed kernel is by any chance accessing memory outside the allocated range. That could be the reason the firewall trips.
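One way to test this from the host side is to compare the address range the DMA is programmed with against the allocated buffer. The sketch below is illustrative only: both helper names and the example address are hypothetical. It also assumes the AXI DMA convention that S2MM_DA holds only the lower 32 bits of the destination, with the upper bits in a separate S2MM_DA_MSB register; whether this kernel exposes that register is not stated in the thread.

```python
def split_address(addr):
    """Split a 64-bit device address into (low, high) 32-bit words."""
    return addr & 0xFFFFFFFF, (addr >> 32) & 0xFFFFFFFF


def transfer_in_bounds(buf_addr, buf_bytes, dma_addr, dma_bytes):
    """True if [dma_addr, dma_addr + dma_bytes) lies inside the buffer."""
    return buf_addr <= dma_addr and dma_addr + dma_bytes <= buf_addr + buf_bytes


# Hypothetical device address above 4 GiB, not taken from the thread.
buf_addr = 0x40_0000_1000
buf_bytes = 32 * 8  # 32 uint64 elements

low, high = split_address(buf_addr)
print(hex(low), hex(high))

# The full address is inside the buffer; the truncated 32-bit address is not.
print(transfer_in_bounds(buf_addr, buf_bytes, buf_addr, buf_bytes))
print(transfer_in_bounds(buf_addr, buf_bytes, low, buf_bytes))
```

If trace.device_address does not fit in 32 bits and only the lower word reaches the engine, the write would land outside the buffer, which is the kind of access described above. This is a possibility to rule out, not a confirmed diagnosis.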

Mario

I’ll check, but I’ve had issues with the ILA bringing down the PCIe bus.

Why would it sometimes pass, though? The allocated address is always the same.