Introduction
This part of the tutorial shows how to use the HLS IP we created earlier with PYNQ. Part 1 showed how to create the HLS IP and Part 2 showed how to create the hardware design.
The source document for these instructions is a Jupyter Notebook. You can find it on the GitHub repository linked below. The notebook can be copied to your board and the code cells can be executed to test the design created in the previous steps.
Sources
HLS stream tutorial GitHub repository (includes the Jupyter notebook, BIT and HWH used for this part of the tutorial along with the source files to rebuild the Vivado project in the earlier parts of this tutorial)
Hardware design
This overlay consists of an AXI DMA connected through AXI streams to the HLS IP created earlier.
The code for this example is very similar to a previous tutorial showing how to use the DMA. The main difference, or addition, is that the HLS IP has a control interface that can be used to start, stop and autostart (or continuously run) the HLS IP. We can also check if the IP is done (i.e. if an iteration has completed).
Once the HLS IP is running, it will wait for data from the DMA. This means that reading and writing data via the DMA effectively controls the flow of data through the HLS IP.
Copy the design files to the board
Rename and copy the BIT file and HWH file created by Vivado to a folder on your board. I renamed the files to:
- dma_axis_ip_example.bit
- dma_axis_ip_example.hwh
If you downloaded this notebook from GitHub you can also copy the notebook to the same folder. Alternatively, you can create a new Jupyter Notebook and copy the code into the new notebook to run it.
Instantiate and download the overlay
from pynq import Overlay
ol = Overlay("./dma_axis_ip_example.bit")
We can check the IPs in this overlay using the IP dictionary (ip_dict).
ol.ip_dict
{'axi_dma': {'addr_range': 65536,
'device': <pynq.pl_server.device.XlnkDevice at 0xb3adf910>,
'driver': pynq.lib.dma.DMA,
'fullpath': 'axi_dma',
'gpio': {},
'interrupts': {},
'mem_id': 'S_AXI_LITE',
'parameters': {'C_BASEADDR': '0x40400000',
'C_DLYTMR_RESOLUTION': '125',
'C_ENABLE_MULTI_CHANNEL': '0',
'C_FAMILY': 'zynq',
...
The HLS IP has the default name from the Vivado project, example_0.
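The full ip_dict output is long, so a short summary per IP can be easier to scan. The sketch below is a minimal example that runs without a board: sample_ip_dict is a hand-written stand-in that reuses only keys visible in the output above, the example_0 entry and its addr_range are assumed values for illustration, and summarize_ips is a hypothetical helper, not part of PYNQ.

```python
# Stand-in for ol.ip_dict so the sketch runs off-board.
# The example_0 values are assumptions, not taken from the real overlay.
sample_ip_dict = {
    "axi_dma": {"addr_range": 65536, "driver": "pynq.lib.dma.DMA"},
    "example_0": {"addr_range": 65536, "driver": "pynq.overlay.DefaultIP"},
}


def summarize_ips(ip_dict):
    """Return (name, driver name, address range in hex) for each IP, sorted by name."""
    summary = []
    for name, info in sorted(ip_dict.items()):
        driver = info["driver"]
        # On the board the driver entry is a class; here it is a plain string.
        driver_name = getattr(driver, "__name__", str(driver))
        summary.append((name, driver_name, hex(info["addr_range"])))
    return summary


for row in summarize_ips(sample_ip_dict):
    print(row)
```

On the board, the same helper could be called as summarize_ips(ol.ip_dict).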
Check help for the HLS IP:
ol.example_0?
Type: DefaultIP
String form: <pynq.overlay.DefaultIP object at 0xaee96890>
File: /usr/local/lib/python3.6/dist-packages/pynq/overlay.py
Docstring:
Driver for an IP without a more specific driver
This driver wraps an MMIO device and provides a base class
for more specific drivers written later. It also provides
access to GPIO outputs and interrupts inputs via attributes. More specific
drivers should inherit from `DefaultIP` and include a
`bindto` entry containing all of the IP that the driver
should bind to. Subclasses meeting these requirements will
automatically be registered.
Attributes
----------
mmio : pynq.MMIO
Underlying MMIO driver for the device
_interrupts : dict
Subset of the PL.interrupt_pins related to this IP
_gpio : dict
Subset of the PL.gpio_dict related to this IP
This tells us that PYNQ has no specific driver for this IP (its type is DefaultIP), so it is assigned the default driver. The default driver provides MMIO read/write capability.
Create aliases
Using the labels for the HLS IP and DMA listed above, we can create aliases which will make it easier to write and read the rest of the code in this example.
dma = ol.axi_dma
dma_send = ol.axi_dma.sendchannel
dma_recv = ol.axi_dma.recvchannel
hls_ip = ol.example_0
Check the status of the HLS IP
hls_ip.register_map
RegisterMap {
CTRL = Register(AP_START=0, AP_DONE=0, AP_IDLE=1, AP_READY=0, RESERVED_1=0, AUTO_RESTART=0, RESERVED_2=0),
GIER = Register(Enable=0, RESERVED=0),
IP_IER = Register(CHAN0_INT_EN=0, CHAN1_INT_EN=0, RESERVED=0),
IP_ISR = Register(CHAN0_INT_ST=0, CHAN1_INT_ST=0, RESERVED=0)
}
Note that the HLS IP is not started yet (AP_START=0). You can also see the IP is idle (AP_IDLE=1).
We will start the HLS IP and then start some transfers from the DMA.
We could initiate the DMA transfers first if we preferred. The DMA transfers would stall until the IP is started.
Start the HLS IP
We can start the HLS IP by writing 0x81 to the control register. This will set bit 0 (AP_START) to “1” and bit 7 (AUTO_RESTART) to “1”. AUTO_RESTART means the IP will run continuously. If we don’t set this then after the IP completes one full operation or iteration, it will stop and wait until AP_START is set again. We would have to set this every time we want the IP to process some data.
CONTROL_REGISTER = 0x0
hls_ip.write(CONTROL_REGISTER, 0x81) # 0x81 sets bit 0 (AP_START) and bit 7 (AUTO_RESTART)
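To make the value 0x81 less of a magic number, the control word can be built from named bit masks and decoded back into field names. This is a minimal sketch that runs without a board. The bit positions follow the CTRL field order shown in the register map, and decode_ctrl and CTRL_BITS are hypothetical names introduced here for illustration.

```python
# Bit masks for the HLS control register (field order as in the register map).
AP_START = 1 << 0
AP_DONE = 1 << 1
AP_IDLE = 1 << 2
AP_READY = 1 << 3
AUTO_RESTART = 1 << 7

CTRL_BITS = {
    "AP_START": AP_START,
    "AP_DONE": AP_DONE,
    "AP_IDLE": AP_IDLE,
    "AP_READY": AP_READY,
    "AUTO_RESTART": AUTO_RESTART,
}


def decode_ctrl(value):
    """Map each named control bit to 0 or 1 for a raw register value."""
    return {name: int(bool(value & mask)) for name, mask in CTRL_BITS.items()}


start_continuous = AP_START | AUTO_RESTART
print(hex(start_continuous))          # 0x81
print(decode_ctrl(start_continuous))  # AP_START and AUTO_RESTART are 1
print(decode_ctrl(0x04))              # only AP_IDLE is 1, as before the IP was started
```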
Check the correct bits have been set.
hls_ip.register_map
RegisterMap {
CTRL = Register(AP_START=1, AP_DONE=0, AP_IDLE=0, AP_READY=0, RESERVED_1=0, AUTO_RESTART=1, RESERVED_2=0),
GIER = Register(Enable=0, RESERVED=0),
IP_IER = Register(CHAN0_INT_EN=0, CHAN1_INT_EN=0, RESERVED=0),
IP_ISR = Register(CHAN0_INT_ST=0, CHAN1_INT_ST=0, RESERVED=0)
}
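If AUTO_RESTART is not set, the IP has to be started again for every iteration, and AP_DONE can be polled to see when an iteration has finished. The sketch below shows one possible way to do that with the read and write methods of the default driver. FakeHlsIp is a hypothetical software stand-in so the code runs without a board, and start_once and wait_done are illustrative helper names, not PYNQ functions. The stand-in reports AP_DONE only once, on the assumption that the real IP clears the bit when it is read.

```python
import time

CTRL = 0x00            # control register offset, same as CONTROL_REGISTER
AP_START = 1 << 0
AP_DONE = 1 << 1
AP_IDLE = 1 << 2


class FakeHlsIp:
    """Hypothetical stand-in for the HLS IP, for off-board testing only."""

    def __init__(self, busy_polls=3):
        self._busy_polls = busy_polls
        self._remaining = None   # None means the IP is idle
        self.starts = 0

    def write(self, offset, value):
        if offset == CTRL and value & AP_START:
            self._remaining = self._busy_polls
            self.starts += 1

    def read(self, offset):
        if offset != CTRL or self._remaining is None:
            return AP_IDLE if offset == CTRL else 0
        if self._remaining > 0:
            self._remaining -= 1
            return AP_START          # still busy
        self._remaining = None
        return AP_DONE | AP_IDLE     # reported once, then back to idle


def start_once(ip):
    """Start a single iteration (AP_START only, no AUTO_RESTART)."""
    ip.write(CTRL, AP_START)


def wait_done(ip, timeout_s=1.0, poll_interval_s=0.001):
    """Poll AP_DONE until it is set; return False if the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if ip.read(CTRL) & AP_DONE:
            return True
        time.sleep(poll_interval_s)
    return False


ip = FakeHlsIp()
start_once(ip)
print(wait_done(ip))
```

On the board, hls_ip would take the place of the stand-in. Because the IP waits for stream data, the DMA transfers would need to be started between start_once and wait_done.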
DMA send
Now we will send some data from DRAM to the HLS IP. Once the HLS IP is started, the steps are the same as the previous DMA tutorial.
Note the array used below is uint32. This was selected to match the data width of the HLS IP and the widths used for the DMA. The DMA will transfer blocks of data and may “reformat” it based on the internal data widths in the hardware. The array we create in the notebook affects how the data is formatted in Python. This is an area that can cause problems, so it is worthwhile checking you understand your data formats and data movement in your hardware. If you see the wrong
from pynq import allocate
import numpy as np
data_size = 100
input_buffer = allocate(shape=(data_size,), dtype=np.uint32)
Initialize the array.
for i in range(data_size):
input_buffer[i] = i
Start the DMA transfer
dma_send.transfer(input_buffer)
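The note on data formats earlier in this section can be illustrated without any hardware. The sketch below packs a few 32-bit values into raw bytes and then reinterprets the same bytes at other widths, which is roughly what happens when the array dtype does not match the width the hardware expects. It uses only the Python standard library and assumes little-endian byte order.

```python
import struct

values = [0, 1, 2, 3]

# Four unsigned 32-bit words, little-endian (an assumption for this sketch).
raw = struct.pack("<4I", *values)

as_uint8 = list(raw)                         # 16 separate bytes
as_uint16 = list(struct.unpack("<8H", raw))  # 8 half-words
as_uint32 = list(struct.unpack("<4I", raw))  # the original 4 words

print(len(raw))    # 16
print(as_uint8)    # [0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0]
print(as_uint16)   # [0, 0, 1, 0, 2, 0, 3, 0]
print(as_uint32)   # [0, 1, 2, 3]
```

The bytes are the same in all three cases. Only the interpretation changes, which is why a mismatched dtype can produce values that look wrong even though the transfer itself worked.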
DMA receive
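Read back data from the HLS IP and store it in DRAM. Start by creating the output buffer, then start the receive transfer and wait for both channels to finish.
output_buffer = allocate(shape=(data_size,), dtype=np.uint32)
dma_recv.transfer(output_buffer)
dma_send.wait()  # block until the send channel has finished
dma_recv.wait()  # block until the receive channel has finished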
Print first few values of buffer
Each value returned by the HLS IP should be the corresponding input value “i” plus 5.
for i in range(10):
print('0x' + format(output_buffer[i], '02x'))
0x05
0x06
0x07
0x08
0x09
0x0a
0x0b
0x0c
0x0d
0x0e
Verify that the arrays are equal
print("Arrays are equal: {}".format(np.array_equal(input_buffer, output_buffer-5)))
Arrays are equal: True
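When the arrays do not match, it helps to know where they first differ. The sketch below is a small software reference model of the "add 5" behaviour plus a mismatch finder. expected_output and first_mismatch are hypothetical helper names, and the 32-bit wraparound is an assumption about how the HLS kernel handles overflow.

```python
UINT32_MASK = 0xFFFFFFFF


def expected_output(inputs, offset=5):
    """Reference model: add offset to each input, wrapping at 32 bits (assumed)."""
    return [(x + offset) & UINT32_MASK for x in inputs]


def first_mismatch(expected, actual):
    """Return the index of the first differing element, or None if all match."""
    for index, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return index
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


golden = expected_output(range(10))
print(golden)                            # [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(first_mismatch(golden, golden))    # None
```

On the board, the second argument to first_mismatch could be list(output_buffer), with expected_output(list(input_buffer)) as the first.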
Free all the memory buffers
Don’t forget to free the memory buffers to avoid memory leaks!
del input_buffer, output_buffer
Summary
In this tutorial you saw how to create an HLS IP with AXI streams, incorporate this IP into a Vivado design and connect its AXI streams to a DMA, and use the HLS IP from PYNQ.
The HLS design was very simple, adding 5 to each input value and writing the result to the output. You should be able to see how you can create more advanced HLS kernels. You can also have multiple AXI stream interfaces in your HLS kernel by adding more input or output parameters.
The AXI DMA used in this design supports one AXI stream input and/or one AXI stream output. You can use multiple DMAs if you need to send data to additional AXI stream interfaces.