Accelerating the partial reconfiguration process?

Hello everyone,

I think my topic answers itself, but I figured I should ask before drawing my own conclusions.

I’ve been using PYNQ for a while to test some of my DPR HDL designs. Using Vivado 2018.2, I created my custom design following this tutorial with relative success. Although I’m able to perform DPR and communicate via DMA using the Overlay library, reconfiguring a partial design with Overlay.pr_download takes far too long for my needs. Below is a screenshot of the measured execution time.

[Screenshot: exectime]

The target board is the PYNQ Z2, running the PYNQ v2.6 image. The partial bitstream is 668 kB in size, and the associated hardware handoff file is 18 kB. I’m 99% sure PYNQ is using PCAP but can’t find a source on this. I can’t really give a rough estimate of the PCAP bandwidth, since the pr_download function does a bit more than merely loading the bitstream. Nevertheless, I’m guessing partial reconfiguration through PCAP takes most of the measured time.
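Since pr_download does more than push the bitstream, one way to see where the time goes is to time each step separately. Below is a minimal sketch of a timing helper in plain Python. The helper name `time_call` is made up for illustration, and the region and file names in the commented usage are placeholders, not taken from the design above.

```python
import time


def time_call(fn, *args, repeats=5, **kwargs):
    """Call fn(*args, **kwargs) `repeats` times and return the
    shortest wall-clock duration in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


# Intended use on the board (not executed here; names are placeholders):
#   from pynq import Overlay
#   ol = Overlay("base_design.bit")
#   t = time_call(ol.pr_download, "pr_region_name", "partial.bit", repeats=3)
#   print(f"pr_download: {t * 1e3:.1f} ms")
```

Taking the minimum over a few repeats filters out one-off delays such as file caching on the first load, which gives a fairer figure for the reconfiguration itself.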

I was thinking of ways to reduce this reconfiguration time significantly. Unless there are ways to reduce the PCAP PR time that I’m unaware of, I was considering loading the partial bitstream through ICAP, which has higher bandwidth (up to 400 Mbps, if I remember the Xilinx HW ICAP IP documentation correctly), and the literature reports works that made PR via ICAP quite fast (up to 1 Gbps and more).

However, I guess using an external ICAP IP would break the Overlay library, since the latter would no longer be loaded with the hwh files and would end up with undetected IPs in the design. On the other hand, the ip_dict only shows the AXI Lite interface of my accelerator, which is connected to the AXI GP0 port of my PS (the AXI Stream interfaces are connected via DMA), so with proper care maybe it could still work. See the snapshots of my design and the printed ip_dict below.

[Screenshot: accel_bd]

So to clarify my questions:

1/ Are my partial reconfiguration times using pr_download normal?
2/ Can we accelerate the PR process?

Thank you kindly,

Alexis

Edit: using the pynq.Bitstream class I’m able to lower the reconfiguration time to 60 ms, but this comes with the problems mentioned above, i.e. having to be extra careful with library usage for DPR-ed IPs. I’m still looking for ways to lower this if possible.
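For reference, the figures quoted in this post can be turned into an effective throughput with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes 1 kB = 1000 bytes, treats the whole 60 ms as transfer time, and uses the 400 Mbps ICAP figure exactly as recalled above (not verified against the documentation). The function names are made up for illustration.

```python
def throughput_mb_per_s(size_bytes, seconds):
    """Effective throughput in megabytes per second."""
    return size_bytes / seconds / 1e6


def transfer_time_ms(size_bytes, rate_bits_per_s):
    """Ideal transfer time in milliseconds at a given bit rate."""
    return size_bytes * 8 / rate_bits_per_s * 1e3


PARTIAL_BITSTREAM_BYTES = 668_000  # 668 kB, assuming 1 kB = 1000 bytes

# Effective rate of the 60 ms download measured with pynq.Bitstream.
measured = throughput_mb_per_s(PARTIAL_BITSTREAM_BYTES, 0.060)
print(f"measured: {measured:.2f} MB/s ({measured * 8:.1f} Mbps)")

# Ideal time at the 400 Mbps figure quoted from memory above.
ideal = transfer_time_ms(PARTIAL_BITSTREAM_BYTES, 400e6)
print(f"ideal at 400 Mbps: {ideal:.2f} ms")
```

Under these assumptions the 60 ms download corresponds to roughly 11 MB/s, and the same bitstream at the quoted 400 Mbps would take about 13 ms, so the gap being asked about is around a factor of four to five.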
