PYNQ: PYTHON PRODUCTIVITY

Help debugging chronic PYNQ System Hang

Hi Folks,
Since updating to the 2.5 image we’ve been seeing quasi-regular ZCU111 board hangs, possibly even without an overlay loaded. We’ve also been starting to test new cores much more heavily, so we did change the cpuidle setting so we could use the System ILA in Vivado.

I had been chalking the need to power cycle the board ~2-3 times a week up to us causing deadlocks or not properly connecting/disconnecting from the ILA, but this weekend it hung after a fresh boot without us having loaded any of our software for testing or starting anything other than the hw_server on the host PC.

Two questions:

  1. How can I even start to debug this? When it happens after testing for a while and walking away, I can tell from the JTAG connection in Vivado that some things are still alive, but the Jupyter server isn’t responsive. (This is NOT directly a cpuidle issue, as I set that and can use Python and the ILA concurrently.)

  2. Is there a way to trigger a reboot of the board from Vivado or some Xilinx tool on the Windows host PC? This would move this from a major headache to a minor inconvenience. The “boot from configuration memory” option in the hardware manager doesn’t seem to “just work”, I’d guess because it isn’t configured properly to be aware of the U-Boot loader or some such.

Thanks!

Generally, full-system hangs are the result of AXI transactions hitting the fabric and not being acknowledged for one reason or another, although why that’s happening at boot I don’t know. One thing to try is setting up the AXI watchdog timers on the AXI master connections, which will trigger a slave error rather than a hang, and see if that helps.

The following code will do this for the LPD master.

import pynq

mmio = pynq.MMIO(0xFF416000, 64)

mmio.write(0x18, 3) # Return slave errors when timeouts occur
mmio.write(0x20, 0x1020) # Set and enable prescale of 32 which should be about 10 ms
mmio.write(0x10, 0x3) # Enable transactions tracking
mmio.write(0x14, 0x3) # Enable timeouts

And again for the FPD:

mmio = pynq.MMIO(0xFD610000, 64)

mmio.write(0x18, 7) # Return slave errors when timeouts occur
mmio.write(0x20, 0x1020) # Set and enable prescale of 32 which should be about 10 ms
mmio.write(0x10, 0x7) # Enable transactions tracking
mmio.write(0x14, 0x7) # Enable timeouts
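
The two sequences above differ only in the base address and the port mask, so they could be folded into one helper that is easy to re-run after each boot. This is only a sketch: the constant and function names are made-up labels (not official register names), and the addresses, offsets and values are copied directly from the snippets above rather than checked against the register reference.

```python
# Offsets and values copied from the snippets above. The names are
# hypothetical labels chosen for readability, not official register names.
SLVERR_ON_TIMEOUT = 0x18   # return slave errors when timeouts occur
PRESCALE = 0x20            # prescale value plus its enable bit
TRACKING_ENABLE = 0x10     # enable transaction tracking
TIMEOUT_ENABLE = 0x14      # enable timeouts
PRESCALE_VALUE = 0x1020    # same value used for both LPD and FPD above

# name -> (base address, port mask), as used in the snippets above
TIMEOUT_BLOCKS = {
    "LPD": (0xFF416000, 0x3),
    "FPD": (0xFD610000, 0x7),
}


def enable_axi_timeouts(mmio, port_mask):
    """Apply the four writes, in the same order as above, to one block."""
    mmio.write(SLVERR_ON_TIMEOUT, port_mask)
    mmio.write(PRESCALE, PRESCALE_VALUE)
    mmio.write(TRACKING_ENABLE, port_mask)
    mmio.write(TIMEOUT_ENABLE, port_mask)


def enable_all(mmio_factory):
    """Configure every block; mmio_factory(base, length) returns an object
    with a write(offset, value) method, e.g. pynq.MMIO."""
    for base, port_mask in TIMEOUT_BLOCKS.values():
        enable_axi_timeouts(mmio_factory(base, 64), port_mask)
```

On the board this would be called as `import pynq; enable_all(pynq.MMIO)`. Passing the MMIO class in as an argument keeps the helper importable on a machine without pynq installed.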

You can also enable a system-wide watchdog that’ll reset the board if Linux locks up for some reason. I don’t know of a way of triggering a reboot via JTAG in a nice way. You might be able to write to the watchdog registers via JTAG and use that to reset the board.
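
For the system-wide watchdog, one possible approach is the standard Linux watchdog device interface: a process opens the device node and writes to it periodically, and if the writes stop (because Linux has locked up) the hardware resets the board. The sketch below assumes the image has a watchdog driver enabled and exposes it as /dev/watchdog, which I haven't checked on the 2.5 image; the function name and parameters are made up for illustration.

```python
import time


def pet_watchdog(path="/dev/watchdog", beats=None, interval_s=5.0):
    """Keep a Linux watchdog alive by writing to its device node.

    Opening the node arms the watchdog. With beats=None this loops forever;
    a finite beats value is mainly useful for trying the function out.
    interval_s must be shorter than the watchdog's configured timeout.
    """
    count = 0
    # buffering=0 so each write reaches the driver immediately
    with open(path, "wb", buffering=0) as dev:
        while beats is None or count < beats:
            dev.write(b"\0")      # any write counts as a keepalive
            count += 1
            time.sleep(interval_s)
        # "Magic close": writing 'V' right before closing asks the driver
        # to disarm, if the driver supports it and nowayout is not set.
        dev.write(b"V")
    return count
```

Run as root in the background, e.g. `pet_watchdog()`; if the process is killed without the final 'V' write, the board will reset once the timeout expires, so it is worth testing with a short finite `beats` first.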

Peter