Interactive C++ on the Kria-SoM in Jupyter Lab

With xeus-cling and Jupyter Lab, it is possible to write interactive, interpreted, modern C++ that executes on-target and interfaces with programmable logic. This blog demonstrates this and provides instructions for setting up an environment on the Kria-SoM.

Interactive C++ is exciting from a reconfigurable SoC development and deployment perspective. C++ is commonplace in embedded system development, and using C++ notebooks means we can easily integrate with legacy code and drivers. Furthermore, in Jupyter Lab, C++ notebooks can be used alongside Python notebooks, allowing developers to combine PYNQ Python productivity with C++ performance.

This blog will demonstrate three things:

  1. Interactive C++ on the Kria-SoM. We will develop a C++ userspace driver for a simple IP core in an interactive Jupyter notebook.
  2. A Vitis AI demo, where we take a preexisting demo application and run it in an interactive C++ environment with minimal effort.
  3. A setup guide, so that you can also experiment with interactive C++ notebooks on your Kria-SoM.

Interactively creating a C++ IP-core driver.

Let’s say that we have a simple IP core in our system that counts clock cycles. It has a single AXI_S interface (S00_AXI) with a single memory-mapped register. If we read this register, it returns the number of clock cycles counted so far; if we write any value to the register, we clear the counter. The input clock we’re counting is the AXI bus clock, which runs at 100 MHz.

Let’s develop an interactive C++ driver for this IP using Jupyter Lab running on the Kria-SoM. First, let’s open two notebooks side-by-side: a Python notebook, where we can configure the overlay and inspect the hardware, and a C++17 xeus-cling notebook, where we will write the userspace driver code.

You may have to click to play the gif
open_two_notebooks

Next, we can use the Python notebook to load our overlay and inspect the IP. Below, you can see that I imported the Overlay module from the PYNQ Python library which I then use to load our bitstream, containing our simple IP core, onto the programmable logic.

With the overlay loaded, let’s get to work in our interactive C++ notebook creating the userspace driver.

First off, let’s include some header files for printing output and for getting a virtual address from a physical address. Above you can see that I include stdio.h for printing, stdint.h for some data types, and paddr2vaddr.h, a simple header file that I created which obtains a virtual address for a physical one using /dev/mem.
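The contents of paddr2vaddr.h are not shown in this post, so the following is only a minimal sketch of what such a helper might look like, assuming it maps the physical range with mmap on /dev/mem. The function signature, the default length, and the dev parameter (added so the helper can be tried against an ordinary file) are assumptions, not the author's actual header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: map `length` bytes starting at physical address `paddr`
// and return a pointer to the first byte, or nullptr on failure.
// `dev` defaults to /dev/mem, which normally requires root access.
inline volatile uint32_t* paddr2vaddr(uint64_t paddr, size_t length = 4096,
                                      const char* dev = "/dev/mem") {
    assert(length > 0);
    const uint64_t page = static_cast<uint64_t>(sysconf(_SC_PAGESIZE));
    const uint64_t base = paddr & ~(page - 1);  // mmap offsets must be page-aligned
    const uint64_t offset = paddr - base;       // position of paddr inside its page

    int fd = open(dev, O_RDWR | O_SYNC);
    if (fd < 0) return nullptr;

    void* map = mmap(nullptr, length + offset, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, static_cast<off_t>(base));
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (map == MAP_FAILED) return nullptr;

    return reinterpret_cast<volatile uint32_t*>(static_cast<char*>(map) + offset);
}
```

The returned pointer is volatile so that every read and write actually reaches the register rather than being optimised away. This sketch never unmaps the region, which is usually acceptable for a notebook session but not for long-running code.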

Now we can start writing a simple driver class, AXICycleCounter. First, we create a constructor that gets a virtual address from the physical address and stores it in a private data member. We can then write two methods: count, which returns the count by reading from the mapped base address, and reset, which clears the counter by writing to it.
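A rough sketch of such a class is shown below. To keep it self-contained, this version takes an already-mapped register pointer instead of a physical address; in the notebook the constructor would obtain that pointer through the /dev/mem helper. The 32-bit register width and the member name reg_ are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the userspace driver for the cycle-counter IP.
// Assumes a single 32-bit register at the mapped base address.
class AXICycleCounter {
public:
    explicit AXICycleCounter(volatile uint32_t* reg) : reg_(reg) {
        assert(reg_ != nullptr);
    }

    // Reading the register returns the number of cycles counted so far.
    uint32_t count() const { return *reg_; }

    // Writing any value clears the hardware counter; 0 is used here.
    void reset() { *reg_ = 0; }

private:
    volatile uint32_t* reg_;  // virtual address of the IP's register
};
```

Because the class only touches memory through one pointer, it can be exercised against an ordinary variable before pointing it at real hardware.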

You may have to click to play the gif
base_address

We can now instantiate an object for our driver. Here you can see that we instantiate an AXICycleCounter object, called counter, in a new cell. We then go to the Python notebook where we loaded the overlay to grab the base address of our IP, 2148204544 (0x800B0000 in hexadecimal), and use that to initialise our object.

You may have to click to play the gif
using_the_counter

Let’s use our driver to print out the number of clock cycles our IP counts several times, resetting each iteration by calling the counter.reset() method. We can see that the number of cycles is a little higher on the first iteration but is then consistently 31 cycles.

You may have to click to play the gif
adding_the_time_method

Great, we have something working. However, after creating the driver, we might decide to display the result as a time rather than as a number of clock cycles. We know that the AXI clock driving our counter runs at 100 MHz, so let’s use that value to add a new method to our class that returns the time in microseconds. This is where the real magic of interactive C++ happens: we can return to the cell containing our class, add the new method, and rerun the cell, as seen above.
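The conversion itself is simple arithmetic: at 100 MHz each cycle lasts 10 ns, so the 31 cycles measured earlier correspond to 0.31 microseconds. A minimal sketch of the calculation is below, written as a free function so it stands alone; the name cycles_to_us and the clock_hz parameter are illustrative, and in the notebook the same expression would sit inside the class's time_us() method.

```cpp
#include <cassert>
#include <cstdint>

// Convert a cycle count to microseconds for a given clock frequency.
// The 100 MHz default is the AXI clock rate stated in the text.
inline double cycles_to_us(uint32_t cycles, double clock_hz = 100e6) {
    assert(clock_hz > 0.0);
    return static_cast<double>(cycles) * 1e6 / clock_hz;
}
```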

You may have to click to play the gif
printing_us

Then if we change the function call in our loop, from count() to time_us(), to match our newly created method, we can see it now prints the time.

Using xeus-cling, we can provide a lightweight, interactive C++ development experience on the Kria-SoM and other Zynq-style reconfigurable SoC devices. While this was a simple example, we can see the power of this approach, which combines the productivity of Python and PYNQ with the raw performance of C++ host code, all while preserving the interactive development experience supplied by Jupyter. Let’s now move on to a more complicated demo, where multiple libraries need to be linked to form an application.

Vitis AI demo

Another advantage of utilising xeus-cling is that it is possible to take pre-existing C++ code and execute it with minimal effort in our new interactive environment. We just released DPU-PYNQ for the Kria-SoM, so as an example, let’s take a more complex Vitis AI demo and run it directly in a C++ notebook.

The DPU-PYNQ Kria-SoM repo contains examples where the Xilinx DPU programmable logic and application code perform object detection on some example image data. In one of the examples, pybind11 was used to take some pre-existing C++ demo code and integrate it with the Python PYNQ libraries. While this is great, it does involve some effort to set up, requiring some of the interface code to be rewritten to be compatible. However, once we have xeus-cling set up on our device, we can seamlessly use the C++ host code with minimal modification, as we will see shortly, and still use our productive Python PYNQ environment for managing the overlay.

[What we did previously] Running the DPU-PYNQ examples with Python bindings.

You may have to click to play the gif
dpu_pynq

In the DPU-PYNQ repo, the notebook dpu_resnet50_pybind11.ipynb creates Python bindings to interact with the original C++ demo code using pybind11. The notebook does the following:

  1. It starts by loading the overlay using the PYNQ libraries.
  2. It makes a header file common.h.
  3. It uses the %%pybind11 cell magic to create Python bindings around a modified version of the original demo C++ code.
  4. It executes the demo, using the sys_pipes() context manager from the wurlitzer package to capture the output and display it.

Using pybind11 in this way is excellent; we can run the C++ demo code almost unchanged while retaining the productivity of interacting with the overlay in Python. However, there was still considerable friction in getting it to work. Mainly:

  1. Modifications are necessary to the original code, the most significant ones being at the function interface when creating the bindings, where only simple pointer types are allowed. In this case, the pointer to the runner object was cast to a void pointer at the interface and then cast back to the correct type.
  2. Compiling the bindings can take considerable time, in this case about 30 seconds. This extra time disrupts the interactive experience of using a Jupyter notebook.
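To illustrate the first point, the pattern looks roughly like the sketch below. The Runner struct and both function names are hypothetical stand-ins, not the types or functions used in the DPU-PYNQ notebook; the sketch only shows how a typed pointer is hidden behind void* at the interface and recovered on the other side.

```cpp
#include <cassert>

// Hypothetical stand-in for the real runner type used by the demo.
struct Runner {
    int batch_size;
};

// Binding-friendly interface: only a plain void* crosses the boundary.
inline void* erase_runner(Runner* runner) {
    return static_cast<void*>(runner);
}

// On the other side, the handle is cast back to the real type before use.
inline int batch_size_from_handle(void* handle) {
    assert(handle != nullptr);
    Runner* runner = static_cast<Runner*>(handle);
    return runner->batch_size;
}
```

The compiler cannot check that the handle really points at a Runner, which is part of why this kind of interface rewriting adds friction.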

As we will see in a moment, using xeus-cling solves these issues:

  • We can take and run the original source as is. The only difference is that, because cling is a C++ interpreter, we must change how the main function is called, but that’s it.
  • Compiling code in this fashion is fast and feels responsive. Well, certainly a lot faster than waiting for pybind11.

[What we can do now] Running the Vitis AI demo with a C++ notebook

The first thing we can do is open our two notebooks side-by-side: a Python one on the left for loading the overlay and inspecting the input image data, and a C++ one on the right for running the interactive C++ host code.

You may have to click to play the gif
resnet50_python

Above, we can see that we use the PYNQ library to load the DPU overlay, dpu.bit. We then use OpenCV to display all the input images that we will use. This notebook is very similar to how the previous pybind11 DPU-PYNQ notebook started. But, unlike that notebook, we will switch to our C++ notebook to run the host code. However, first, we need to grab the Vitis AI C++ host demo code.

You may have to click to play the gif
vitis_ai_repo

So we can head over to the Vitis-AI repo and get the code for the resnet50 demo, which was the demo that was modified for our DPU-PYNQ example notebook dpu_resnet50_pybind11.ipynb. In particular, we are getting the common header files and the main.cc file. However, we will change the name of main.cc to dpu_cling.h and include it as a header file in our notebook.

The Vitis AI demo makes extensive use of pre-existing libraries, such as OpenCV. To compile it successfully, we need to tell the compiler where these libraries live and which ones to load. Luckily, xeus-cling provides pragmas for supplying this information from within a cell. To add an include directory, we type #pragma cling add_include_path("path"); to add a library path, we use #pragma cling add_library_path("path"); and to load a specific library, we use #pragma cling load("library").
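As a rough illustration, a setup cell using these pragmas might look like the fragment below. The /miniconda3 prefix comes from the setup used in this post; the include and lib subdirectories and the library name are assumptions that depend on what is installed on your device.

```cpp
// Run in a xeus-cling cell before including any headers that need these paths.
// The directories assume a standard conda layout under /miniconda3, and the
// library name is only an example.
#pragma cling add_include_path("/miniconda3/include")
#pragma cling add_library_path("/miniconda3/lib")
#pragma cling load("libopencv_core")
```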

You may have to click to play the gif
resnet50_cpp

Now we are ready to run our interactive C++ host code. In this notebook we run three interactive cells:

  1. The first cell sets up the arguments for the compiler. This cell sets the include and library paths and loads the libraries using the cling pragmas. We are pointing these at a directory /miniconda3 because we use the Conda and Mamba package managers for this setup (See the xeus-cling-Pynq setup section below).
  2. The next cell contains all the header files, including our new header file dpu_cling.h which is a renamed copy of the Vitis AI main.cc file.
  3. Finally, the last cell runs the main function, located in dpu_cling.h. It is a little strange that we can call a main function like this, but we wanted to show you that directly copying and pasting the code is possible. We could, of course, alternatively give the main function another name in dpu_cling.h, or we could copy and paste the body of the main function and execute it within its own cell. The main function takes two command-line arguments; however, we can fake this in our cell to pass in the model name.
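Faking the command-line arguments can be sketched as follows. Here demo_main is a hypothetical stand-in for the entry point in dpu_cling.h, assumed to expect the program name plus a model name, and run_with_args is an illustrative helper, not part of the demo.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for the demo's entry point.
int demo_main(int argc, char* argv[]) {
    if (argc != 2) {
        std::printf("usage: %s <model>\n", argv[0]);
        return 1;
    }
    std::printf("would load model: %s\n", argv[1]);
    return 0;
}

// Build a mutable, null-terminated argv array from strings and call the entry point.
int run_with_args(std::vector<std::string> args) {
    assert(!args.empty());
    std::vector<char*> argv;
    for (std::string& arg : args) argv.push_back(&arg[0]);
    argv.push_back(nullptr);  // argv[argc] is conventionally a null pointer
    return demo_main(static_cast<int>(args.size()), argv.data());
}
```

In a notebook cell, the call would then be something like run_with_args({"resnet50", "model.xmodel"}), where both strings are placeholders for the real program and model names.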

When the last cell is executed, we can see that it prints the same classification for each input image as the original demo.

Hopefully, you’re as excited as we are about the possibilities xeus-cling opens up for interactive reconfigurable SoC development. For example, if we need to develop high-performance host code or integrate with legacy drivers and host code, we now have a way to do so interactively, which improves the developer experience.

If you have a new Kria-SoM and are excited about this, then we would love to help you experiment with it. Below we have some instructions for how to set up a xeus-cling environment on your device.

Setup instructions

For setup we will target the Xilinx Kria KV260 AI-Vision starter kit.

1. Ubuntu base image

Download and install the latest Xilinx Ubuntu image for the Kria-SoM. You can find the setup instructions [here].

2. Miniconda and Mamba package managers

To install LLVM and Clang along with xeus-cling, we will use the Mamba package manager. To get Mamba, we first need to install Miniconda. Conda creates the environment that we will use for setting up the system and tools. To set up Miniconda, SSH into your development board, open a root shell, and type the following:

        wget https://repo.anaconda.com/miniconda/Miniconda3-py38_4.9.2-Linux-aarch64.sh
        chmod +x Miniconda3-py38_4.9.2-Linux-aarch64.sh
        ./Miniconda3-py38_4.9.2-Linux-aarch64.sh -u -b -p /miniconda3

        eval "$(/miniconda3/bin/conda shell.bash hook)"
        conda init

After executing those commands, you should be placed within your new conda environment and should see a message asking you to close and reopen your current shell. Do that before continuing.

Next we will use conda to install mamba:

        conda install -y -c conda-forge mamba

3. Installing xeus-cling and PYNQ

Once conda is set up, we can use the following script to set up PYNQ and xeus-cling. Run the following command in a root shell on your device:

        curl -o- https://raw.githubusercontent.com/STFleming/xeus-on-pynq/main/setup.sh | bash -

4. Run the Jupyter server

From a root shell, launch the Jupyter server:

        jupyter server --allow-root

5. Open the browser

Now open the browser and navigate to the IP address of your device on port 9090, as you usually would, and you should be greeted with the following:

launcher_resize

Nice tutorial.

I did notice that this installation seems to have broken the original pybind11 notebook dpu_resnet50_pybind11.ipynb

Thanks! This is likely because we have grabbed a different version of OpenCV using Mamba, I will take a look into fixing it.

Good afternoon, I didn’t understand the part about what paddr2vaddr.h should contain. I don’t know if you could make it explicit, sorry for the inconvenience.