Testing Tensorflow 2.5 in Zybo-Z7 running Pynq 2.7

This post summarizes my unsuccessful attempt to correctly execute Tensorflow 2.5 in the Zybo-Z7 running Pynq 2.7.
The stages described in this post come from the following sources:

In addition, the post uses the Pynq 2.7 image for the Zybo-Z7 proposed in Pynq 2.7 for Zybo-Z7, which can be downloaded from Zybo-Z7-2.7.0.img - Google Drive.

Lastly, a demonstration video of the TensorFlow test can be found at: https://www.youtube.com/watch?v=3m7kavySEYE. Note that I haven’t sped up the video, to give you an idea of the execution time. Users interested in just the big picture can fast-forward through the video :stuck_out_tongue:.

1. Installing Python3.7 (50 min approx.)
The Pynq 2.7 release comes with Python 3.8, but the latest TensorFlow wheel available for the
armv7l architecture is compiled for Python 3.7. For this reason, this older Python version must be installed on the Pynq.
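The commands for this step are not listed in the post; as a sketch, one common route is building CPython 3.7 from source on the board. The release number (3.7.13) and the dependency list below are assumptions, not necessarily the exact steps followed here:

```shell
# Assumption: Debian-based Pynq rootfs with network access on the board
sudo apt update
sudo apt install -y build-essential libssl-dev zlib1g-dev libbz2-dev \
  libreadline-dev libsqlite3-dev libffi-dev liblzma-dev wget

# 3.7.13 is an arbitrary 3.7.x release, not necessarily the one used here
VERSION=3.7.13
wget "https://www.python.org/ftp/python/${VERSION}/Python-${VERSION}.tgz"
tar xzf "Python-${VERSION}.tgz"
cd "Python-${VERSION}"

./configure
make -j2              # the Zybo-Z7 has two Cortex-A9 cores
sudo make altinstall  # altinstall leaves the system python3.8 untouched

python3.7 --version
```

Adding `--enable-optimizations` to `./configure` yields a faster interpreter but makes the build considerably longer, which matters on a roughly 50-minute budget.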

2. Create a virtual environment for Python3.7 (5 min approx.)
To avoid unresolved dependencies between packages, and to be able to create an IPython kernel later, it is necessary to create a virtual environment.

  • $ cd /home/xilinx/
  • $ python3.7 -m pip install virtualenv
  • $ python3.7 -m virtualenv env
  • $ source env/bin/activate
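To double-check that the environment is really active (the `(env)` prompt prefix can disappear after shell restarts), a small Python sketch:

```python
import sys

def in_virtualenv():
    """True when the running interpreter belongs to a venv/virtualenv.

    Both venv and virtualenv point sys.prefix at the environment
    directory while keeping the original interpreter's prefix in
    sys.base_prefix (or real_prefix for very old virtualenv versions).
    """
    base = getattr(sys, "real_prefix", None) or sys.base_prefix
    return sys.prefix != base

print(in_virtualenv())
print(sys.executable)  # inside the env this should be /home/xilinx/env/bin/python3.7
```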

3. Install TensorFlow in the virtual environment (> 5h approx.)
Warning: this step can take a lot of time. I’ll note the estimated time for each command.
To get the TensorFlow 2.5 wheel there are two alternatives:
The first one: using the scripts:

OR the second one: download the wheel file from https://drive.google.com/uc?id=1iqylkLsgwHxB_nyZ1H4UmCY3Gy47qlOS and copy it into /home/xilinx/, which is located in the ROOT partition.

Then, use these commands:

  • (env) $ apt update # 2 min approx.
  • (env) $ apt install libhdf5-dev # 2 min approx.
  • (env) $ pip3.7 install --no-binary=h5py h5py # 2 h approx.
  • (env) $ pip3.7 install tensorflow-2.5.0-cp37-none-linuxarmv7l.whl # 2h approx.
  • (env) $ exec $SHELL
  • $ cd /home/xilinx/
  • $ source env/bin/activate
  • (env) $ python3.7 -m pip install matplotlib # 30 min approx.
  • (env) $ python3.7
  • >>> import tensorflow
  • >>> tensorflow.__version__
    The last command should return ‘2.5.0’.
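Before launching a long notebook run, it may also be worth confirming that the wheels actually landed inside the environment without paying TensorFlow’s slow import cost; a small sketch using only the standard library:

```python
import importlib.util

def is_installed(package_name):
    """Return True if the package can be found on the current sys.path
    without actually importing it."""
    return importlib.util.find_spec(package_name) is not None

# Packages installed in the steps above
for pkg in ("tensorflow", "h5py", "matplotlib"):
    print(pkg, "->", "installed" if is_installed(pkg) else "missing")
```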

4. Create the Ipython Kernel (1h approx.)
To use TensorFlow in a Jupyter notebook a kernel must be created.

  • (env) $ python3.7 -m pip install ipykernel # 1h approx.
  • (env) $ python3.7 -m ipykernel install --user --name=tfenv
    After this step the kernel for tensorflow has been created.

5. Testing tensorflow
The Notebook I used for testing Tensorflow can be found in https://gitlab.com/dorfell/fer_sys_dev/-/tree/master/01_hw/Pynq_Zybo-Z7. It is the same notebook used in the demonstration video.
In the last cell when executing “model.fit” the reported errors are:

  • 2022-05-05 22:33:04.821349: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
  • 2022-05-05 22:33:04.882152: W tensorflow/core/platform/profile_utils/cpu_utils.cc:118] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency

As indicated, cpu_utils.cc should have a “condition” for the armv7l architecture, similar to the one added for aarch64 in https://github.com/tensorflow/tensorflow/pull/46643/files/239839b09e02b5766d61041f31027de4eb882c45.
However, as shown in the video, this *.cc file is not available, since the wheel is a pre-compiled package.
After a quick search using ls and grep, the only available files are *.so libraries and *.h header files.
A possible solution would be cross-compiling TensorFlow from source, targeting the Python 3.8 available in the Pynq 2.7 release and, of course, the armv7l architecture.


  1. The total installation time can be more than 5 hours.
  2. Executing the notebook also takes a long time, and in the end it failed at the model.fit step.
  3. This post didn’t aim to test the inference process; maybe the performance is better in forward propagation. In addition, integer inference with TensorFlow Lite should be considered as well.
  4. Is it worth creating a DPU overlay to increase performance? (Maybe something similar to Xilinx DNNDK?)
  5. If training on the embedded system is really needed, other libraries such as PyTorch should be taken into account.

Thanks for reading the whole post :sweat_smile:, I hope you weren’t bored :stuck_out_tongue:


dorfell, once again great work on investigating TensorFlow on PYNQ.
By the way, I would like to know a bit more: after setting up the env, does it, just like conda, create a folder to hold another Python environment?
Also, just to be clear, the install above is done as root, right?
Meanwhile, you created the env to hold TensorFlow 2.5 with Python 3.7; I agree that the latest compiled ARM TensorFlow revision found is for 3.7.
Do you install the PYNQ package via pip3.7 as well, and is there anything that needs attention, or any issues found?

Great work =]


Hi Brian, I hope you’re doing great. First of all, sorry for the late reply :sweat_smile:, and thanks for reading the post.

  • I didn’t use Conda for creating the environment; instead I used the virtualenv package, since I thought it could be simpler. When calling the command $ python3.7 -m virtualenv env (as shown in step 2), a folder called env is created in the current directory. This folder will contain the binaries and the activate and deactivate environment routines.
  • The above-mentioned process was done in the /home/xilinx/ directory, but of course you could create the environment anywhere, including the root. However, it is important to run the source $PATHtoENV/env/bin/activate command to get the environment running. After that, the shell prompt will add (env) to indicate the activated environment.
  • I’m not sure if I understood the question correctly, so here is my humble attempt to answer it. The TensorFlow 2.5 wheel was compiled for Python 3.7 running on a CPU with the 32-bit ARMv7l architecture. For that reason, the installation was done using the command $ pip3.7 install tensorflow-2.5.0-cp37-none-linuxarmv7l.whl. Take into account that all the packages needed by TensorFlow (e.g. h5py, matplotlib, numpy, etc.) are also installed inside the Python 3.7 environment.
  • Issues: I didn’t test all the TensorFlow features. As reported, the model.fit method had a problem. However, you should check your application/model requirements to determine whether it needs that method. One scenario that occurs to me is to load pre-trained weights and run the inference on the board, thus avoiding NN training with TF on the Zybo-Z7 board.
    On the other hand, I could train a CNN model with PyTorch as reported here https://discuss.pynq.io/t/testing-pytorch-1-8-in-zybo-z7-running-pynq-2-7/4181. The installation process is similar and there are some example notebooks.
    Hope this can be helpful for anyone reading the post.
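The load-pre-trained-weights-and-infer scenario can even bypass TensorFlow at run time: export the weights on a PC and run the forward pass with NumPy on the board. A minimal sketch, with hypothetical layer sizes and random stand-in weights (a real run would np.load an exported .npz file):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def predict(x, weights):
    """Forward pass of a hypothetical 784-128-10 fully-connected net.

    `weights` holds W1, b1, W2, b2, e.g. exported on a PC with
    np.savez("weights.npz", W1=..., b1=..., W2=..., b2=...)
    and loaded on the board with np.load("weights.npz").
    """
    h = relu(x @ weights["W1"] + weights["b1"])
    return softmax(h @ weights["W2"] + weights["b2"])

# Random stand-in weights; a real run would use np.load("weights.npz")
rng = np.random.default_rng(0)
weights = {"W1": rng.normal(size=(784, 128)) * 0.05, "b1": np.zeros(128),
           "W2": rng.normal(size=(128, 10)) * 0.05, "b2": np.zeros(10)}
probs = predict(rng.normal(size=(1, 784)), weights)
print(probs.shape)  # (1, 10): one probability per digit class
```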

Conda was just an example of creating an env for a different Python version; this is very common with Conda env control (which is mostly better on PC compared to pip if you ask me, but some people prefer pip).

No, Dorfell, both TensorFlow and TensorFlow Lite are not good for Zynq: the model load crashes with either API. So it is a dead end here, at least for PYNQ 2.7 with Python 3.8. No good, so I would suggest not even trying that route.
For weight extraction, TensorFlow Lite works well, so quantized CNNs or other networks could be tried out; I recommend this approach only. Meanwhile, it also works well in the PYNQ 2.7 / Python 3.8 environment, so this community might try to figure out some way to make it work other than weight extraction.

Enclosing a related tutorial of mine as well, about TensorFlow Lite and an overlay for a CNN AXI4-Stream design.


Hi Brian, I agree with you: at this moment TensorFlow is not a good option for Zynq FPGAs with 32-bit ARM processors. However, for the Zynq UltraScale+ FPGAs with 64-bit ARM processors we could have a better scenario, since it is the same architecture on which other boards (e.g. the Raspberry Pi 4) have correctly executed TF.
I saw your tutorial; great work, thanks for sharing. Just curious: which board are you using?
Do you know something about the DPU on PYNQ project https://github.com/Xilinx/DPU-PYNQ for Zynq devices? I mean, I know they only aimed at Zynq UltraScale+ boards; is there a reason to avoid 32-bit processors?



So sad that I don’t have a Zynq UltraScale+ on hand, as most of my work focused on pure FPGAs in the past.
So the closest option is to use a MicroBlaze with a Kintex UltraScale, but this is not the best way.
My board is a custom one with only the lowest-LE xc7z010, so I don’t even waste time on the CONV and MAXPOOL layers, as MNIST is a really small network. In my past research, a BNN can suffer over 15 to 20% accuracy loss if not properly designed.
So why not just go back to the old-days fully-connected NN and achieve 90% accuracy, which does its job perfectly?

Meanwhile, the Zynq’s ARMv7 can do powerful things: its SIMD is already very good if properly used, but a driver or API needs to be worked out for this, and I cannot say whether a plug-and-use one exists.

I haven’t studied the ARM cores of the UltraScale+ Zynq enough, so I cannot tell much. But the story is getting more deviated, if you ask me. An FPGA is an ASIC subset meant to speed up chip design, low-volume designs, or proof-of-concept purposes. So when designing the application around the powerful hard-core ARM, I personally say it is losing the point.

The only reason I really want to run TensorFlow on the ARM is to have a baseline for an ARM-runtime vs. FPGA-acceleration time comparison; nothing else is worth it. Meanwhile, tflite or tf is needed just to make life easier when loading weights (lazy + less error-prone), since a separate weight export and load introduces more rectification work.

Thank you


Hi Brian, thanks for sharing your thoughts and for taking the time to develop your ideas.
I think this discussion can help clarify the panorama somehow, so people working on NNs in HW can
decide which strategy is more feasible to evaluate.
As always, it is insightful to read you :grin:.
Have a nice day, regards
