PYNQ: PYTHON PRODUCTIVITY

Overlay loading problem

It just hangs when I try to load my overlay. The difference between my previous video project and the current project is that an AXI IIC IP has been added, which is connected to PMOD 0.

The errors can be seen from serial terminal as follows:

[   95.022193] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   95.031079] rcu:     1-...0: (1 GPs behind) idle=7fa/1/0x4000000000000000 softirq=6241/6241 fqs=2626
[  158.042279] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  158.051224] rcu:     1-...0: (1 GPs behind) idle=7fa/1/0x4000000000000000 softirq=6241/6241 fqs=10191
[  221.062302] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  221.071278] rcu:     1-...0: (1 GPs behind) idle=7fa/1/0x4000000000000000 softirq=6241/6241 fqs=11346
[  221.083413] rcu: rcu_sched kthread starved for 13349 jiffies! g9709 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[  221.096753] rcu: RCU grace-period kthread stack dump:
[  284.082312] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  284.091324] rcu:     1-...0: (1 GPs behind) idle=7fa/1/0x4000000000000000 softirq=6241/6241 fqs=11346
[  284.103526] rcu: rcu_sched kthread starved for 29104 jiffies! g9709 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=0
[  284.116983] rcu: RCU grace-period kthread stack dump:
[  347.102325] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  347.111434] rcu:     1-...0: (1 GPs behind) idle=7fa/1/0x4000000000000000 softirq=6241/6241 fqs=11346
[  347.123716] rcu: rcu_sched kthread starved for 44859 jiffies! g9709 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=0
[  347.137234] rcu: RCU grace-period kthread stack dump:

I’m not a Linux expert, but I think those messages are informational rather than errors. A quick Google search indicates there can sometimes be a valid reason for these stalls, but if you are seeing a lot of these messages and your Overlay doesn’t work, they are probably pointing to a problem.

One problem with loading overlays arises if an AXI transaction is in progress while you change overlay. The transaction never finishes and can hang the bus. If you have video running, this can happen frequently: video in/out can be constantly writing data to memory, so the chance of reloading the overlay while a transaction is in flight is high.

Could this be happening in your case? If there is no Overlay running before you program yours, is there something in your overlay that could be hanging one of the AXI interfaces?
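As a rough illustration of the point about in-flight transactions, here is a minimal sketch of stopping anything that streams data before reprogramming. The helper name `stop_then_load` and its arguments are hypothetical; it only assumes that each running object exposes a `stop()` method (as PYNQ video objects typically do) and that you pass in your own callable that loads the overlay.

```python
def stop_then_load(active_streams, load_overlay):
    """Stop every running stream, then program the new overlay.

    active_streams: iterable of objects with a stop() method (hypothetical
                    stand-ins for e.g. video in/out objects in your design).
    load_overlay:   zero-argument callable that programs the bitstream,
                    for example: lambda: Overlay("design.bit")
    Returns the value of load_overlay() and the list of stopped objects.
    """
    stopped = []
    for stream in active_streams:
        # Ask the IP to stop issuing memory transactions before the PL
        # underneath it is replaced.
        stream.stop()
        stopped.append(stream)
    # Only reprogram once everything has been asked to stop.
    return load_overlay(), stopped
```

This does not apply if nothing is running before the overlay is loaded, but it is a cheap habit when switching between video designs.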

Cathal

I have misspelled it (video->vivado).
It’s happening when loading the overlay. Nothing is loaded before. It never gets past the line that loads the overlay. I tried printing a line after it, which was never executed. Both the terminal interface and Jupyter froze and I had to restart the board.

What do you mean by this? I couldn’t understand it properly. The previous project has a lot of IPs in the Vivado project, and it was running okay. After adding another AXI IIC, I am facing this problem. But how would AXI transactions start even before the bitstream is loaded?
Thank you for the reply. I will try removing the IP and try again. I will update if the problem persists.

I’m speculating. If, for example, the PS tried to write something to your design (e.g. an MMIO transaction on the AXI GP port) and your design doesn’t respond, that could hang the AXI interface. The PS would wait for a response which it never receives.
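One way to narrow this down without risking a hang is to look at the overlay's address map before touching any registers. The sketch below is only an illustration: `owning_ip` and `describe_overlay` are hypothetical helper names, and it assumes the entries in PYNQ's `ip_dict` carry `phys_addr` and `addr_range` keys. Passing `download=False` to `Overlay` parses the metadata without programming the PL.

```python
def owning_ip(ip_dict, address):
    """Return the name of the IP whose address window contains `address`,
    or None if no mapped IP covers it."""
    for name, info in ip_dict.items():
        base = info.get("phys_addr")
        size = info.get("addr_range")
        if base is None or size is None:
            continue  # entry without a memory-mapped window
        if base <= address < base + size:
            return name
    return None


def describe_overlay(bitfile):
    """List each IP's base address and range without programming the PL.

    Needs to run on a PYNQ board; `bitfile` is the path to your own design.
    """
    from pynq import Overlay  # imported lazily so owning_ip runs anywhere
    overlay = Overlay(bitfile, download=False)
    return {
        name: (hex(info["phys_addr"]), hex(info["addr_range"]))
        for name, info in overlay.ip_dict.items()
        if "phys_addr" in info and "addr_range" in info
    }
```

If an address your software accesses maps to no IP, or to the newly added one, that is a reasonable first suspect for an access that never gets a response.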

I think I would try adding an ILA to the PS-PL interfaces, and check if any transactions are hanging after you load the overlay.
If you only changed one IP between a previously working design and this one, that is an obvious place to start looking for problems.

Cathal

I have loaded the previous bitstream, and it is working. But if I try to load the bitstream with the AXI IIC, the board freezes. I am still a bit confused about how the PS would start a transaction before I give it any command.
I will follow your guidance and check whether there is any transaction.

If there is nothing running on the PS side, then it won’t start a transaction randomly. This is why I gave the example of video IP in the PL running and writing back to memory. This would be the PL triggering a problem on the AXI HP ports.

What SPI slave is your AXI SPI connected to? Do you try to load any driver for the attached slave when you load the overlay? This would be an example of how the PS could trigger this issue, i.e. if the driver tried to communicate with the slave and it was at the wrong address or the controller did not respond.

As I said, I’m speculating. It could be a different problem.

Cathal

Hi,
As a side note, if you experience lots of bus hangs and you are using a Zynq Ultrascale system, there is a watchdog that you can enable.
It’s not good for production, but in debug it saves hours of reboots.

Also, to synthesize what Cathalmccabe said: it happens when you access an AXI address where no slave is answering. You must find which program is doing this.
This could be running software that you forgot to stop before uploading a new bitstream.
Also, at boot there is an LED-blinking program, and if you are fast enough you could upload a bitstream before it completes and cause the same error. So make sure the LEDs are done blinking before you do anything, or disable it with systemctl stop bootpy.
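If you tend to forget to wait, a small check before loading the overlay may help. This is only a sketch: `service_state` is a hypothetical helper that wraps `systemctl is-active`, and how the reported state relates to "the LEDs have finished blinking" depends on how the bootpy unit is defined, which is not stated here, so treat the result as a hint.

```python
import subprocess


def service_state(name, run=subprocess.run):
    """Return the state string printed by `systemctl is-active <name>`
    (for example "active", "activating" or "inactive")."""
    result = run(
        ["systemctl", "is-active", name],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def boot_script_may_be_running(name="bootpy", run=subprocess.run):
    """Assumption: an "active" or "activating" unit may still be touching the PL."""
    return service_state(name, run=run) in ("active", "activating")
```

For example, a notebook could refuse to call `Overlay(...)` while `boot_script_may_be_running()` returns True.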

I am using an AXI IIC (not SPI) which is connected to a PMOD (this is the new one). There is another AXI IIC that is connected to a deserializer through high-speed pins (FMC LPC).
Also, I have an encoder that is connected to the PS IIC through PMOD2.

The smbus library is imported, but nothing is initialized before loading the overlay.

I will try to find out how to use it.

I am just guessing, but it may come from the custom AXI-Lite interfaces, which have caused problems before as well. Removing some registers from them solved the problem previously (though I could not understand how). Now I think that adding the AXI IIC has somehow led back to the previous state.

None that I know of, unless there are default ones apart from those.

It always happens to me :sweat_smile:. I always forget and have to restart the board. Now I wait until all the lights are on, if I remember.