PYNQ: PYTHON PRODUCTIVITY

Tutorial: using a HLS stream IP with DMA (Part 3: Using the HLS IP from PYNQ)

Introduction

This part of the tutorial will show how to use a HLS IP we created earlier with PYNQ. Part 1 showed how to create the HLS IP and Part 2 showed how to create the hardware design.

The source document for these instructions is a Jupyter Notebook. You can find it on the GitHub repository linked below. The notebook can be copied to your board and the code cells can be executed to test the design created in the previous steps.

Sources

HLS stream tutorial GitHub repository (includes the Jupyter notebook, BIT and HWH used for this part of the tutorial along with the source files to rebuild the Vivado project in the earlier parts of this tutorial)

Hardware design

This overlay consists of an AXI DMA connected through AXI streams to the HLS IP created earlier.

The code for this example is very similar to a previous tutorial showing how to use the DMA. The main difference, or addition, is that the HLS IP has a control interface that can be used to start, stop and autostart (or continuously run) the HLS IP. We can also check if the IP is done (i.e. if an iteration has completed).

Once the HLS IP is running, it will wait for data from the DMA. This means that reading and writing data via the DMA effectively controls the flow of data through the HLS IP.

Copy the design files to the board

Rename and copy the BIT file and HWH file created by Vivado to a folder on your board. I renamed the files to:

  • dma_axis_ip_example.bit
  • dma_axis_ip_example.hwh

If you downloaded this notebook from GitHub you can also copy the notebook to the same folder. Alternatively, you can create a new Jupyter Notebook and copy the code into the new notebook to run it.

Instantiate and download the overlay

from pynq import Overlay

ol = Overlay("./dma_axis_ip_example.bit")

We can check the IPs in this overlay using the IP dictionary (ip_dict).

ol.ip_dict
{'axi_dma': {'addr_range': 65536,
  'device': <pynq.pl_server.device.XlnkDevice at 0xb3adf910>,
  'driver': pynq.lib.dma.DMA,
  'fullpath': 'axi_dma',
  'gpio': {},
  'interrupts': {},
  'mem_id': 'S_AXI_LITE',
  'parameters': {'C_BASEADDR': '0x40400000',
   'C_DLYTMR_RESOLUTION': '125',
   'C_ENABLE_MULTI_CHANNEL': '0',
   'C_FAMILY': 'zynq',
...
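
As a small worked example, the C_BASEADDR and addr_range values in the output above can be combined to find the address window that the DMA's control registers occupy. This is only a sketch of the arithmetic, assuming addr_range is a size in bytes; the variable names are illustrative and are not part of the PYNQ API.

```python
# Values copied from the ip_dict output above.
base = int("0x40400000", 16)   # C_BASEADDR
addr_range = 65536             # addr_range, assumed to be a size in bytes

# Last address inside the window.
last = base + addr_range - 1

print(hex(base), hex(last))    # 0x40400000 0x4040ffff
```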

The HLS IP has the default name from the Vivado project, example_0.

Check help for the HLS IP:

ol.example_0?
Type:        DefaultIP
String form: <pynq.overlay.DefaultIP object at 0xaee96890>
File:        /usr/local/lib/python3.6/dist-packages/pynq/overlay.py
Docstring:
Driver for an IP without a more specific driver

This driver wraps an MMIO device and provides a base class
for more specific drivers written later. It also provides
access to GPIO outputs and interrupts inputs via attributes. More specific
drivers should inherit from `DefaultIP` and include a
`bindto` entry containing all of the IP that the driver
should bind to. Subclasses meeting these requirements will
automatically be registered.

Attributes
----------
mmio : pynq.MMIO
    Underlying MMIO driver for the device
_interrupts : dict
    Subset of the PL.interrupt_pins related to this IP
_gpio : dict
    Subset of the PL.gpio_dict related to this IP

This tells us that PYNQ has no specific driver for this IP (its type is DefaultIP), so it is assigned the default driver. The default driver provides MMIO read/write capability.

Create aliases

Using the labels for the HLS IP and DMA listed above, we can create aliases which will make it easier to write and read the rest of the code in this example.

dma = ol.axi_dma
dma_send = ol.axi_dma.sendchannel
dma_recv = ol.axi_dma.recvchannel

hls_ip = ol.example_0 

Check the status of the HLS IP

hls_ip.register_map
RegisterMap {
  CTRL = Register(AP_START=0, AP_DONE=0, AP_IDLE=1, AP_READY=0, RESERVED_1=0, AUTO_RESTART=0, RESERVED_2=0),
  GIER = Register(Enable=0, RESERVED=0),
  IP_IER = Register(CHAN0_INT_EN=0, CHAN1_INT_EN=0, RESERVED=0),
  IP_ISR = Register(CHAN0_INT_ST=0, CHAN1_INT_ST=0, RESERVED=0)
}

Note that the HLS IP is not started yet (AP_START=0). You can also see the IP is idle (AP_IDLE=1).
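
The flags in the CTRL register are individual bits of one 32-bit word. The sketch below decodes a raw register value into the named flags, using the bit positions implied by the field order in the register map above. The helper decode_ctrl and the CTRL_BITS table are hypothetical names introduced for illustration; they are not part of PYNQ.

```python
# Bit positions of the flags in the HLS control register (CTRL, offset 0x0),
# following the field order shown in the register map above.
CTRL_BITS = {
    "AP_START": 0,
    "AP_DONE": 1,
    "AP_IDLE": 2,
    "AP_READY": 3,
    "AUTO_RESTART": 7,
}


def decode_ctrl(value):
    """Return a dict mapping each named CTRL flag to 0 or 1 (hypothetical helper)."""
    return {name: (value >> bit) & 1 for name, bit in CTRL_BITS.items()}


# 0x04 is the raw value for the idle state printed above (only AP_IDLE set).
idle_flags = decode_ctrl(0x04)
print(idle_flags)
```

On the board, the raw value could be obtained with hls_ip.read(0x0) and passed to the same helper.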

We will start the HLS IP and then start some transfers from the DMA.

We could initiate the DMA transfers first if we preferred. The DMA transfers would stall until the IP is started.

Start the HLS IP

We can start the HLS IP by writing 0x81 to the control register. This will set bit 0 (AP_START) to “1” and bit 7 (AUTO_RESTART) to “1”. AUTO_RESTART means the IP will run continuously. If we don’t set this then after the IP completes one full operation or iteration, it will stop and wait until AP_START is set again. We would have to set this every time we want the IP to process some data.

CONTROL_REGISTER = 0x0
hls_ip.write(CONTROL_REGISTER, 0x81) # 0x81 sets bit 0 (AP_START) and bit 7 (AUTO_RESTART)
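
The value 0x81 written above is simply the two flag bits combined. A minimal sketch of how the value can be built from named bit masks, which makes the intent easier to read than a bare constant; the names AP_START, AUTO_RESTART, start_continuous and start_once are local variables chosen here for illustration.

```python
AP_START = 1 << 0       # bit 0: start the IP
AUTO_RESTART = 1 << 7   # bit 7: restart automatically after each iteration

start_continuous = AP_START | AUTO_RESTART   # run continuously
start_once = AP_START                        # run a single iteration

print(hex(start_continuous))  # 0x81
print(hex(start_once))        # 0x1
```

Writing start_once instead of start_continuous to the control register would start a single iteration, after which AP_START would have to be set again.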

Check the correct bits have been set.

hls_ip.register_map
RegisterMap {
  CTRL = Register(AP_START=1, AP_DONE=0, AP_IDLE=0, AP_READY=0, RESERVED_1=0, AUTO_RESTART=1, RESERVED_2=0),
  GIER = Register(Enable=0, RESERVED=0),
  IP_IER = Register(CHAN0_INT_EN=0, CHAN1_INT_EN=0, RESERVED=0),
  IP_ISR = Register(CHAN0_INT_ST=0, CHAN1_INT_ST=0, RESERVED=0)
}

DMA send

Now we will send some data from DRAM to the HLS IP. Once the HLS IP is started, the steps are the same as the previous DMA tutorial.

Note the array used below is uint32. This was selected to match the data width of the HLS IP and the widths used for the DMA. The DMA will transfer blocks of data and may “reformat” it based on the internal data widths in the hardware. The array we create in the notebook affects how the data is formatted in Python. This is an area that can cause problems so it is worthwhile checking you understand your data formats and data movement in your hardware. If you see the wrong

from pynq import allocate
import numpy as np

data_size = 100
input_buffer = allocate(shape=(data_size,), dtype=np.uint32)
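
To illustrate why the element type matters, the sketch below uses the standard struct module as a stand-in for a buffer in memory. It packs four 32-bit values into bytes and then reads the same bytes back with a matching and a mismatched element width. Little-endian byte order is assumed here; this is plain Python and does not touch the DMA.

```python
import struct

values = [0, 1, 2, 3]

# Pack as little-endian unsigned 32-bit words: 4 bytes per value.
raw = struct.pack("<4I", *values)
print(len(raw))  # 16

# Read the same 16 bytes back with the matching 32-bit element width ...
as_uint32 = list(struct.unpack("<4I", raw))
# ... and with a mismatched 16-bit element width.
as_uint16 = list(struct.unpack("<8H", raw))

print(as_uint32)  # [0, 1, 2, 3]
print(as_uint16)  # [0, 0, 1, 0, 2, 0, 3, 0]
```

The bytes are identical in both cases; only the interpretation changes, which is the kind of mismatch that can make correct hardware output look wrong in Python.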

Initialize the array.

for i in range(data_size):
    input_buffer[i] = i

Start the DMA transfer

dma_send.transfer(input_buffer)

DMA receive

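Read back data from the HLS IP and store it in DRAM. Start by creating the output buffer.

output_buffer = allocate(shape=(data_size,), dtype=np.uint32)
dma_recv.transfer(output_buffer)
dma_send.wait()  # block until the send transfer has completed
dma_recv.wait()  # block until the receive transfer has completed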

Print first few values of buffer

Each value returned by the HLS IP should be the corresponding input value “i” plus 5.

for i in range(10):
    print('0x' + format(output_buffer[i], '02x'))
0x05
0x06
0x07
0x08
0x09
0x0a
0x0b
0x0c
0x0d
0x0e

Verify that the arrays are equal

print("Arrays are equal: {}".format(np.array_equal(input_buffer, output_buffer-5)))
Arrays are equal: True
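
The same check can be written against a software reference model of the IP, which is useful once the kernel does more than add a constant. The sketch below assumes the IP adds 5 to each 32-bit word and that results wrap modulo 2^32; the wrap-around behaviour is an assumption, and reference_model is a hypothetical helper, not part of PYNQ.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits, like a uint32 word


def reference_model(samples, offset=5):
    """Software model of the IP: add `offset` to each 32-bit word (hypothetical helper)."""
    return [(s + offset) & MASK32 for s in samples]


# Expected output for the 100 input values 0..99 used in this example.
expected = reference_model(range(100))
print([hex(v) for v in expected[:3]])  # ['0x5', '0x6', '0x7']
```

On the board, the expected list could be compared with output_buffer in the same way as the array comparison above.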

Free all the memory buffers

Don’t forget to free the memory buffers to avoid memory leaks!

del input_buffer, output_buffer

Summary

In this tutorial you saw how to create a HLS IP with AXI streams, incorporate this IP into a Vivado design and connect the AXI streams to a DMA, and how to use the HLS IP from PYNQ.
The HLS design was very simple, adding 5 to each input value and writing the result to the output. You should be able to see how you can create more advanced HLS kernels. You can also have multiple AXI stream interfaces in your HLS kernel by adding more input or output parameters.

The AXI DMA used in this design supports one AXI stream input and/or one AXI stream output. You can use multiple DMAs if you need to send data to additional AXI stream interfaces.
