Can I separate the overlay loading and the Python SW driver into different files?

Hi, dear PYNQ experts,

I use PYNQ-Z1 for application development.

In the beginning, I put the whole Python program in a single .py file and executed it via python3.6 xxx.py. It loads the overlay .bit, configures the control registers in the PL, and then enables the hardware in the PL. However, it takes a very long time to complete when run as a command-line script.

After some debugging by profiling the execution time of the .py code, I found that loading the overlay .bit file takes 6.x sec, which accounts for 99% of the execution time of the .py code. For a demo, that's not a good user experience, since everyone is waiting for the result to come out.
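One way to do that kind of per-step profiling is a small timing context manager built on `time.perf_counter`. This is only a sketch, not the code used in the post: the step names are hypothetical, and the `time.sleep` calls stand in for the real steps (for example `Overlay(...)` and the register writes) so that the snippet runs anywhere.

```python
import time
from contextlib import contextmanager

# Wall-clock duration of each named step, in seconds.
timings = {}

@contextmanager
def timed(step):
    """Record how long the enclosed block takes under the name `step`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step] = time.perf_counter() - start

# Stand-ins for the real steps; on the board these would be e.g.
# `overlay = Overlay(...)` and the add_ip.write()/read() calls.
with timed("load_overlay"):
    time.sleep(0.2)
with timed("configure_and_run"):
    time.sleep(0.01)

total = sum(timings.values())
for step, seconds in timings.items():
    print(f"{step:20s} {seconds:.3f} s ({100 * seconds / total:.0f}%)")
```

Printing each step's share of the total makes it obvious which step dominates.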

So I'm wondering how you usually tackle this problem. Can I split the .py program into one .py program that loads the overlay and another .py program that configures and enables the hardware in the PL, and then execute the two .py programs in sequence? I only need to load the overlay once before executing the Python PL driver code.

Would you please share your comments or best practices, if any?

Thank you

Yes, you could do this.

You could also get the “download” part to run at startup if you want, and only run the “user” portion on demand.

Cathal

I don't see why you can't. Usually we just put all the code in a notebook: the first cell loads the overlay, and in later cells you can run the computation code as many times as you like.

Thank you.

Since I'm not a Jupyter user, I only run Python scripts from the command line. However, I've learned something new from you.

Hi, Cathal,

While trying to implement this as a Python script, I ran into a problem. As you can see from the following example code, I tried to separate the code into two Python files. The first file is the code labeled Part 1, which loads the overlay. The other is Part 2, which executes the test.

The first line of the Part 2 code gets the IP object with "add_ip = overlay.scalar_add". However, overlay is not defined in part2.py. How can I split this into two files?
What's your approach? Thank you

Assume I split the whole Python code into these two pieces to run from the command line. At startup, part1.py is executed to load the PL image. Then, in the test phase, part2.py can be executed separately.

# Part 1: load the overlay ==> part1.py
from pynq import Overlay
overlay = Overlay('/home/xilinx/tutorial_1.bit')

# Part 2: test code ==> part2.py
add_ip = overlay.scalar_add
add_ip.write(0x10, 4)  # write input a
add_ip.write(0x18, 5)  # write input b
add_ip.read(0x20)      # read the result

Good point.

Add this to part 2:
overlay = Overlay('/home/xilinx/tutorial_1.bit', download=False)

This will instantiate the overlay without downloading the bitstream.

Some info here:
https://pynq.readthedocs.io/en/v2.5/pynq_libraries/overlay.html

Cathal

Hi, Cathal,

I tried your suggestion and it works. But I found that I also need to add "from pynq import Overlay" to part2.py. After profiling, I confirmed that the most time-consuming part of the execution is the line "from pynq import Overlay". It takes 6.2 sec to import Overlay and another 0.3 sec to execute the code on the PYNQ-Z1 FPGA platform.

If that's the case, even if I split the code into two separate Python files, it still takes a long time to execute the driver code. Do you have any suggestions?

Maybe I should use a Jupyter notebook to import the pynq Overlay class and load the FPGA image first, which is what part1 does, and then execute part2.py in the notebook. In that case, can I use the following part2.py file without importing the PYNQ Overlay again in the same Jupyter notebook?

# Part 2: test code ==> part2.py
add_ip = overlay.scalar_add
add_ip.write(0x10, 4)
add_ip.write(0x18, 5)
add_ip.read(0x20)

All the best,

You could use a notebook.
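In a notebook, IPython's `%run -i part2.py` runs the script inside the notebook's own namespace, so a part2.py that only refers to `overlay` can pick up the object created in an earlier cell. The sketch below shows the same idea in plain Python with the standard library's `runpy.run_path(..., init_globals=...)`. `FakeScalarAdd` and `FakeOverlay` are hypothetical stand-ins for the real PYNQ objects (assumed to add the values written at 0x10 and 0x18 and return the sum at 0x20), so it runs without a board.

```python
import os
import runpy
import tempfile

# Hypothetical stand-in for overlay.scalar_add, so the sketch runs anywhere.
class FakeScalarAdd:
    def __init__(self):
        self.regs = {}

    def write(self, offset, value):
        self.regs[offset] = value

    def read(self, offset):
        # Assumed behaviour: 0x20 holds the sum of the inputs at 0x10 and 0x18.
        if offset == 0x20:
            return self.regs.get(0x10, 0) + self.regs.get(0x18, 0)
        return self.regs.get(offset, 0)

class FakeOverlay:
    def __init__(self):
        self.scalar_add = FakeScalarAdd()

# part2.py refers to `overlay` but never defines or imports it.
PART2 = """
add_ip = overlay.scalar_add
add_ip.write(0x10, 4)
add_ip.write(0x18, 5)
result = add_ip.read(0x20)
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "part2.py")
    with open(path, "w") as f:
        f.write(PART2)
    # Created once (in a notebook: the cell that runs part 1).
    overlay = FakeOverlay()
    # Run part2.py with `overlay` already defined in its globals.
    namespace = runpy.run_path(path, init_globals={"overlay": overlay})

print(namespace["result"])  # 9 with the fake IP above
```

This only works because both parts run in the same interpreter; a separately launched `python3.6 part2.py` would not see `overlay`.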

If you are writing a program, do you not run “part 2” in a loop?

If you only run through your design once, and only execute on the FPGA once, there will be a relatively large overhead to download the bitstream if your execution time on the FPGA is relatively short.
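To put numbers on that, using the figures reported above (roughly 6.2 s of one-off start-up cost and 0.3 s per run), the average cost per run is (6.2 + 0.3 n) / n for n runs in the same process. A tiny helper makes the trend visible; it is plain arithmetic, not a PYNQ measurement.

```python
def average_cost_per_run(startup_s, per_run_s, runs):
    """Average seconds per run when a one-off startup cost is shared by `runs` runs."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    return (startup_s + per_run_s * runs) / runs

for n in (1, 10, 100):
    print(n, round(average_cost_per_run(6.2, 0.3, n), 3))
# 1 6.5
# 10 0.92
# 100 0.362
```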

Cathal

Thank you. I'm a newbie to Jupyter Notebook.

I opened a notebook and ran part1.py first, then ran part2.py in a separate cell.
The execution time of "from pynq import Overlay" drops to 0.00003 sec.

But when I execute a shell command followed by part2.py in the notebook, as below, part2.py still spends 6.x sec on "from pynq import Overlay".

ls -altr && python3.6 part2.py

Now I'm looking into how to run part2.py right after a script in the notebook without the line "from pynq import Overlay" taking 6.x sec.

Thank you

I would suggest refraining from that. The reason you see that 0.0003 sec is that pynq was already imported and the overlay loaded in a previous cell, so later cells in the same kernel reuse what is already in memory instead of importing it again. If you run that code in a different interpreter (e.g. from the shell), it starts as a new process that knows nothing about what has run in your notebook, so the import will be slow again.
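The caching described above is Python's module cache: the first `import` in a process executes the module and stores it in `sys.modules`, and later imports in the same process just look it up there. A new interpreter (such as `python3.6 part2.py` launched from a shell) has its own cache, which does not contain the module yet, so it pays the full import cost again. The sketch below uses the standard-library `json` module as a stand-in for `pynq`, since importing `pynq` needs a board.

```python
import importlib
import subprocess
import sys
import time

# First import in this process (unless something already imported it):
# the module is executed and cached in sys.modules.
start = time.perf_counter()
first = importlib.import_module("json")
first_s = time.perf_counter() - start

# Second import in the same process: just a lookup in sys.modules.
start = time.perf_counter()
second = importlib.import_module("json")
second_s = time.perf_counter() - start

print("same object:", first is second)    # same object: True
print("cached:", "json" in sys.modules)  # cached: True
print(f"first {first_s:.6f} s, second {second_s:.6f} s")

# A separate interpreter has its own module cache and must import again.
# (-I isolates it from environment variables, -S skips the `site` module.)
child = subprocess.run(
    [sys.executable, "-I", "-S", "-c", "import sys; print('json' in sys.modules)"],
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)
print("cached in new process:", child.stdout.strip())  # cached in new process: False
```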

I still don't quite understand what you are trying to do. Can't you just do everything in the Jupyter notebook and exclude the overlay download time from the computation? That's what all of our example notebooks do: the overlay only needs to be downloaded once, so it is not relevant to the computation, which we run multiple times.

e.g.

# Done once: import pynq and download the bitstream
from pynq import Overlay
overlay = Overlay('scalar_add.bit')
add_ip = overlay.scalar_add

# Time only the computation
from time import time
start = time()
add_ip.write(0x10, 4)
add_ip.write(0x18, 5)
result = add_ip.read(0x20)
stop = time()
execution_time = stop - start
print(execution_time)

@rock

Thank you for your suggestion. The reason I'm trying to do that is that the whole system is a C/C++ executable, which triggers the Python driver at the end of its execution. Since the overlay import takes 6.x sec, I would like to hide that loading time to give a better demo experience.

If so, why don't you just use a Python program that wraps everything up? Like:

import os
from pynq import Overlay

# Load overlay (once, up front)
overlay = Overlay(...)

# Run your C code
os.system(...)

# Run your computation
add_ip = overlay.scalar_add
add_ip.write(...)
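A slightly more concrete version of that wrapper, as a sketch: it swaps `os.system` for `subprocess.run(..., check=True)` so that a failing C/C++ program raises an error instead of being silently ignored. The three steps are passed in as arguments to make the order explicit (load once, run the C/C++ program, then drive the IP). `./my_c_app` in the comments is a hypothetical path, and in the runnable demo a tiny Python command and stand-in functions replace the real executable and the PYNQ calls.

```python
import subprocess
import sys

def run_demo(load_overlay, c_command, compute):
    """Load the overlay first, then run the C/C++ program, then drive the IP.

    load_overlay: zero-argument callable returning the overlay (slow, done once).
    c_command:    argument list for the C/C++ executable.
    compute:      callable taking the overlay and returning the result.
    """
    overlay = load_overlay()
    # check=True raises CalledProcessError if the C/C++ program fails.
    subprocess.run(c_command, check=True)
    return compute(overlay)

# Runnable stand-ins; on the board these might be, for example:
#   load_overlay = lambda: Overlay('/home/xilinx/tutorial_1.bit')
#   c_command    = ['./my_c_app']    # hypothetical path
#   compute      = a function doing the add_ip.write()/read() calls
steps = []

def fake_load():
    steps.append("load")
    return {"name": "fake overlay"}

def fake_compute(overlay):
    steps.append("compute")
    return 4 + 5

result = run_demo(
    fake_load,
    [sys.executable, "-c", "print('C/C++ program finished')"],
    fake_compute,
)
print(steps, result)  # ['load', 'compute'] 9
```

Because the overlay is loaded before the C/C++ program starts, the slow import and download are over by the time the driver code runs.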

@rock

Exactly. That's what I came up with after our discussion here in the forum. Thank you for your time.