PYNQ-Z2: DPU timeout during YOLO inference causes full board freeze — SSH drops, no ping, only power cycle recovers it. Is this an AXI stall?

Hey everyone,

I’m running a YOLO object detection project on a PYNQ-Z2 and keep hitting a really frustrating issue.

Everything looks fine for about 30–60 seconds. Then the DPU throws a timeout error, RAM spikes to 100%, my SSH session drops, and the board becomes completely unreachable. No soft reset works; I have to physically power cycle it every single time.

I’m streaming frames from an Android phone over TCP into the PYNQ, running inference on the DPU, and displaying results with OpenCV. It runs great until it doesn’t.

Has anyone dealt with this kind of board freeze on PYNQ-Z2 before? Would love to hear how you handled it — whether it’s a settings thing, a memory thing, or just a limitation of the board.

Any tips appreciated! :folded_hands:

Hi @hack110011,

Welcome to the PYNQ community.

From what you are saying, you are running out of memory and the system crashes. You can try to constrain how much memory your DPU uses.
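One thing that can be capped on the application side is how many frames are buffered between the TCP receiver and the inference loop. Below is a minimal sketch in Python (the same idea carries over to C++), assuming the receiver and the inference loop run in separate threads and share a queue. The helper name `put_latest` and the queue size of 2 are illustrative assumptions, not part of any PYNQ or DPU API.

```python
import queue


def put_latest(frames: queue.Queue, frame) -> bool:
    """Enqueue a frame; if the queue is full, drop the oldest frame first.

    Returns True if at least one older frame was dropped.
    (Hypothetical helper, not part of PYNQ or DNNDK.)
    """
    dropped = False
    while True:
        try:
            frames.put_nowait(frame)
            return dropped
        except queue.Full:
            try:
                frames.get_nowait()  # discard the oldest buffered frame
                dropped = True
            except queue.Empty:
                # A consumer emptied the queue in the meantime; retry the put.
                pass


# Assumed buffer depth: at most 2 frames are ever held in memory.
frame_queue = queue.Queue(maxsize=2)
```

The receiver thread would call `put_latest(frame_queue, frame)` for every decoded frame, and the inference loop would call `frame_queue.get()`. If inference falls behind the phone's frame rate, old frames are discarded instead of piling up in RAM.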

Mario

Hi @marioruiz

Thank you for your reply. However, sometimes memory usage is only 10–20% and core utilisation is 50–60%, and the crash still happens. Sometimes I also get a DPU timeout. How can I fix this so that it runs reliably every time? I am running the tiny-yolov3 model. Is there any method to debug this? I tried using UART, but PuTTY also gets stuck and shows no output after the SSH connection drops.

hack110011

Hi @hack110011,

There is not enough information to understand your system. Where is the DPU running?
What DPU are you using? Have you taken any steps to find the cause of the crash?
What PYNQ version are you using?

Mario

Hi @marioruiz

Thank you.

Let’s go into the details of my system.

DPU IP: zcu102-dpu-trd-2019-1
PetaLinux: 2019.2
DNNDK: v3.1

First, I used Vivado 2020.1 to create the IP design and integrate the DPU IP with the Processing System. I shared the IP design earlier. In the Clocking Wizard, I used either 150 MHz or 300 MHz; please take these clock frequencies into account.

I built the PetaLinux image following this reference design: https://andre-araujo.gitbook.io/yolo-on-pynq-z2/dpu-implementation/implementing-the-dpu-on-a-sd-card-image

In PetaLinux, I enabled the DPU driver. Then, I wrote C++ code using the DPU C++ APIs to run the YOLOv3 model.

I trained the model on my own dataset. After training, I obtained the model files and then quantized and compiled the model using DNNDK v3.1. This generated the .elf file and the kernel details such as kernel ID and kernel name.

Now, regarding the crash issue:

The C++ code works fine when running Tiny-YOLOv3. However, when I run the YOLOv3 model, I get a DPU timeout error. Initially, it works, but after trying 4–5 times, I suddenly lose the connection. When I try to reconnect, it does not work. I have to restart the system every time.

I monitored the system using the top command. When the crash happens, I can still see my C++ process running, but it is not utilizing much CPU. Memory usage spikes to around 50–60%, and overall memory usage is around 7–8%. After that, the display gets stuck.
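Since `top` output is lost when the board freezes, one option is to log memory figures to a file on the SD card and force each sample to disk, so the last readings before the freeze can be inspected after the power cycle. This is a minimal sketch, assuming a Linux `/proc/meminfo` is available on the board. The function names, the log path, and the one-second interval are illustrative choices.

```python
import os
import time


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field_name: integer value}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def log_memory(log_path: str, samples: int, interval_s: float = 1.0,
               meminfo_path: str = "/proc/meminfo") -> None:
    """Append one timestamped memory sample per interval to log_path."""
    with open(log_path, "a") as log:
        for _ in range(samples):
            with open(meminfo_path) as f:
                info = parse_meminfo(f.read())
            log.write(
                f"{time.time():.1f} "
                f"MemAvailable={info.get('MemAvailable')} kB "
                f"MemFree={info.get('MemFree')} kB\n"
            )
            # Flush Python's buffer and ask the kernel to write to storage,
            # so the sample is not lost if the board hangs right afterwards.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval_s)
```

Calling something like `log_memory("/home/root/mem.log", samples=600)` (hypothetical path) in a second session before starting inference would leave a trace showing whether free memory was actually falling just before the hang, or whether the freeze happened with plenty of memory left.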

With Tiny-YOLOv3, it also runs fine for 10–15 runs, but sometimes I face the same issue.

I have explained the complete process above, but I still have some doubts. Please help me understand what might be happening.

hack110011

Hi @hack110011,

The issue could be in many places. I think you will have to narrow it down further, probably by adding ILAs (Integrated Logic Analyzers) to the design. However, this issue is outside the scope of this forum, which covers the PYNQ runtime and the PYNQ SD card build.
You may be better off asking this question in the Xilinx forums.

Mario