In the beginning, I put the whole Python program in one single .py file and executed it via python3.6 xxx.py. It loads the overlay .bit file, configures the control registers in the PL, and then enables the hardware in the PL. However, in command-line script mode it takes a very long time to complete.
After some debugging by profiling the execution time of the .py code, I found that loading the overlay .bit file takes 6.x sec, which accounts for 99% of the execution time. For the demo, that's not a good user experience, since everyone is waiting for the result to come out.
Thus, I'm wondering how you usually tackle this problem. Can I split the program into one .py file that loads the overlay and another .py file that configures and enables the hardware in the PL, and then execute the two files in sequence? I only need to load the overlay once, before executing the Python PL driver code.
Would you please share your comments or best practices, if any?
I don't see why you can't. Usually we just put all the code in a notebook; the first cell loads the overlay, while in the later cells you can run the computation code as many times as you want.
While trying to implement this as Python scripts, I ran into a problem. As you can see from the following example code, I split the code into two Python files. The first file is the code labeled Part 1, which loads the overlay; the other is Part 2, which executes the test.
The first line of the Part 2 code instantiates an object with add_ip = overlay.scalar_add, but overlay is not defined in part2.py. How can this be separated into two files?
What's your approach? Thank you.
Assume I split the whole Python program into these two pieces to run in command-line mode. At startup, part1.py is executed to load the PL image; in the test phase, part2.py can be executed separately.
# Part 1 to load the overlay ==> part1.py
from pynq import Overlay
overlay = Overlay('/home/xilinx/tutorial_1.bit')
# Part 2 for test code ==> part2.py
add_ip = overlay.scalar_add
add_ip.write(0x10, 4)
add_ip.write(0x18, 5)
add_ip.read(0x20)
I tried your suggestion and it works. But I found that I also need to add the line "from pynq import Overlay" to part2.py. After profiling, I can confirm that the most time-consuming part of the execution is that import: it takes 6.2 sec to import Overlay and another 0.3 sec to execute the rest of the code on the PYNQ-Z1 FPGA platform.
If that's the case, even after splitting into two separate Python files, the driver code still takes a long time to execute. Do you have any suggestions?
Maybe I should use a Jupyter notebook to load the pynq Overlay class and the FPGA image first, which executes Part 1, and then execute part2.py in the notebook. In that case, can I use the following part2.py without importing the PYNQ Overlay again in the same Jupyter notebook?
# Part 2 for test code ==> part2.py
add_ip = overlay.scalar_add
add_ip.write(0x10, 4)
add_ip.write(0x18, 5)
add_ip.read(0x20)
If you are writing a program, wouldn't you run "Part 2" in a loop?
If you only run through your design once, and only execute on the FPGA once, there will be a relatively large overhead to download the bitstream if your execution time on the FPGA is relatively short.
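The amortization argument can be made concrete with simple arithmetic, using the rough figures quoted in this thread (6.2 s one-time overhead, 0.3 s per run):

```python
# One-time overhead (import + bitstream download) vs. per-run FPGA time,
# using the approximate numbers reported earlier in the thread.
overhead = 6.2   # seconds, paid once
per_run = 0.3    # seconds, paid on every run

for n in (1, 10, 100):
    avg = (overhead + n * per_run) / n
    print(f"{n:3d} runs -> {avg:.2f} s average per run")
```

With a single run the overhead dominates (6.50 s average), but at 100 runs the average drops to 0.36 s, close to the raw FPGA execution time.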
I tried opening a notebook and running part1.py first, then running part2.py in a separate cell.
It seems the execution time of "from pynq import Overlay" drops to 0.00003 sec.
But when I try to execute a shell command and then run part2.py from the notebook as follows, part2.py still takes 6.x sec for "from pynq import Overlay":
ls -altr && python3.6 part2.py
Now I'm investigating how to run part2.py right after a script command in the notebook so that the line "from pynq import Overlay" doesn't take 6.x sec.
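For what it's worth, one way to get this effect without shelling out is to execute part2.py inside the interpreter that already holds the overlay, for example with the standard-library runpy module (in IPython/Jupyter, the magic %run -i part2.py does essentially the same thing). A minimal sketch, using a placeholder object in place of a real overlay:

```python
import os
import runpy
import tempfile

# Placeholder for the overlay created in an earlier cell; on a real
# board this would be the pynq.Overlay instance.
overlay = object()

# Write a tiny part2-style script purely for demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
    f.write("have_overlay = overlay is not None\n")
    path = f.name

# run_path executes the file in this same process, and init_globals
# injects the existing 'overlay', so the script needs no pynq import
# and pays no fresh-interpreter start-up cost.
ns = runpy.run_path(path, init_globals={'overlay': overlay})
print(ns['have_overlay'])
os.unlink(path)
```

By contrast, `ls -altr && python3.6 part2.py` starts a brand-new python3.6 process, which has to redo the import from scratch.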
I would suggest refraining from trying that. The reason you see that 0.0003 sec is that you loaded the overlay in a previous cell, allowing the following cells to use the cached data efficiently. If you run that code in a different interpreter (a shell terminal), your code in Jupyter will not know what has been run in your shell, so it will be slow again.
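The cache in question is Python's module cache: every imported module is kept in sys.modules for the lifetime of the interpreter, so re-importing inside the same process is almost free, while a freshly spawned process must pay the full import cost again. A small illustration, using a stdlib module as a stand-in for pynq:

```python
import sys
import time

t0 = time.perf_counter()
import json            # stand-in for pynq; first import does real work
first_import = time.perf_counter() - t0

t0 = time.perf_counter()
import json            # repeat import: just a sys.modules lookup
repeat_import = time.perf_counter() - t0

print('json' in sys.modules)   # True: the module object is cached
print(f"first: {first_import:.6f}s, repeat: {repeat_import:.6f}s")
```

This is why every cell in one notebook sees a fast import after the first, but a shell-launched python3.6 part2.py never does.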
I still don't get what you are trying to do. Can't you just do everything in a Jupyter notebook, and exclude the overlay download time from the computation? All of our example notebooks do that: the overlay download is only required once, so it is not relevant to the computation, which we run multiple times.
e.g.
from pynq import Overlay
overlay = Overlay('scalar_add.bit')
add_ip = overlay.scalar_add
from time import time
start = time()
add_ip.write(0x10, 4)
add_ip.write(0x18, 5)
add_ip.read(0x20)
stop = time()
execution_time = stop-start
Thank you for your suggestion. The reason I'm trying to do this is that the whole system is a C/C++ executable which triggers the Python driver at the end of its execution. Since the overlay import takes 6.x sec, I would like to minimize that loading time for a better demo experience.
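Given that setup, one common pattern is to start the Python driver once as a long-lived process and have the C/C++ executable send it commands over a pipe or stdin, so the 6.x sec import is paid a single time at startup rather than on every invocation. The sketch below is purely illustrative: the "ADD a b" text protocol and the serve/handle_command names are made up, and the handler simulates what would really be the add_ip.write()/add_ip.read() register accesses on the board.

```python
import io
import sys

def handle_command(line):
    # Hypothetical protocol: "ADD <a> <b>" -> sum. On a real PYNQ board
    # this is where add_ip.write(0x10, a), add_ip.write(0x18, b) and
    # add_ip.read(0x20) would go.
    op, a, b = line.split()
    if op != "ADD":
        raise ValueError(f"unknown command: {op}")
    return int(a) + int(b)

def serve(stream, out=sys.stdout):
    # The expensive 'from pynq import Overlay' and overlay download would
    # happen once, before this loop; each command then runs import-free.
    for line in stream:
        line = line.strip()
        if not line or line == "QUIT":
            break
        print(handle_command(line), file=out, flush=True)

# Demo with an in-memory stream standing in for the C/C++ side's pipe.
serve(io.StringIO("ADD 4 5\nQUIT\n"))
```

The C/C++ side would then popen() this driver once at startup and write commands to its stdin, instead of invoking python3.6 per call and re-paying the import cost each time.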