Connecting to the ILA using HW Server [Part 2]

marioruiz · June 17, 2024, 12:28pm

After designing the Overlay with a System ILA as described in part 1 of this series, we are going to look at how to connect and start analyzing the AXI4-Stream channels.

Prepare the Environment

To get System ILA information, the board needs to be connected via the micro USB cable to your development machine, where Vivado is installed. You will also need the *.ltx file (<proj_path>/<proj_name>.runs/impl_1/*.ltx). This file tells Vivado how the System ILA is configured and which signals are being monitored.
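If you are not sure what the probes file is called, a small helper can list the candidates. This is only a sketch: `find_ltx_files` and its arguments are hypothetical names, and it assumes the default run directory layout quoted above.

```python
from pathlib import Path


def find_ltx_files(proj_path, proj_name, run="impl_1"):
    """Return the *.ltx probe files in the implementation run directory, sorted by name.

    Assumes the layout <proj_path>/<proj_name>.runs/<run>/ described above.
    """
    run_dir = Path(proj_path) / f"{proj_name}.runs" / run
    return sorted(run_dir.glob("*.ltx"))
```

For example, `find_ltx_files("/path/to/project", "dma")` would return the probe files of a project named dma, where both the path and the project name are placeholders for your own.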

NOTE: For Kria boards it is recommended that you use the Debug Bridge instead of the micro USB cable. Check out how to Use an ILA without Physical micro USB cable

Connect to the System ILA

Create a new notebook and execute the following to load the overlay.

from pynq import Overlay, allocate
import numpy as np

ol = Overlay('dma.bit')
dma = ol.axi_dma
dma_send = ol.axi_dma.sendchannel
dma_recv = ol.axi_dma.recvchannel

In your Vivado instance, click Open Hardware Manager under PROGRAM AND DEBUG.

1_open_hw_manager

Click Open target on the green bar that just appeared, then select Auto Connect.

2_open_target

This will automatically connect to a local hw_server that establishes the connection with your overlay running on the board.

If you do not see this, it is highly likely that the *.ltx file was not loaded correctly. Only follow the steps below if the Waveform window is completely black, with no AXI4-Stream information.

In the Hardware window, select your device (xc7z020_1 for PYNQ-Z2). As you can see, it is already programmed. However, if you look at the Hardware Device Properties, you can see that the Probes file field is empty. Click on the ... button and add the *.ltx file.

Once the *.ltx file is added, the hardware manager will automatically refresh the Waveform window and show you the AXI4-Stream channels that are being monitored.

Start Capturing with the System ILA

Now that you are fully connected to the System ILA, we can do some test captures so you can familiarize yourself with it.

The System ILA has many different options; I will only focus on some of the basics. For more information, check out Vivado Hardware Manager Dashboards.

The two buttons that we will use the most are Run Trigger for this ILA core and Run Trigger immediate for this ILA core.

6_waveform_buttons

Go ahead and run a capture. As there is no activity in the AXI4-Stream channels, the waveform shows both the MM2S and S2MM channels inactive.

Now we are ready to set a trigger, in other words, to control when the System ILA starts capturing based on an event in one of the signals of our choosing.

Drag the TVALID signal from the MM2S channel and drop it on the Trigger Setup - hw_ila_1 window, then set the trigger Value to R (0-to-1 transition), i.e., rising edge. I prefer R rather than 1 (logical one) as it ensures that the trigger fires on the transition; for this example both options should work. Note that the Core status is Idle.
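To see why R and 1 can behave differently, here is a toy model in plain Python. It works on a list of sampled 0/1 values and is only an illustration of the idea, not how the ILA comparators are implemented; the function names and sample lists are made up for this sketch.

```python
def first_level_match(samples, level=1):
    """Index of the first sample equal to `level` (like trigger value 1), or None."""
    for i, sample in enumerate(samples):
        if sample == level:
            return i
    return None


def first_rising_edge(samples):
    """Index of the first sample that is 1 right after a 0 (like trigger value R), or None."""
    for i in range(1, len(samples)):
        if samples[i - 1] == 0 and samples[i] == 1:
            return i
    return None


idle_then_active = [0, 0, 0, 1, 1, 1, 0]  # signal is low when the trigger is armed
already_active = [1, 1, 0, 0, 1, 1, 0]    # signal is already high when armed

print(first_level_match(idle_then_active), first_rising_edge(idle_then_active))  # 3 3
print(first_level_match(already_active), first_rising_edge(already_active))      # 0 4
```

When the signal starts low, as TVALID does in this example, both settings fire on the same sample. They only differ when the signal is already high at the moment the trigger is armed.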

Click Run Trigger for this ILA core and note how the Core status changes to Waiting for Trigger. This means that the System ILA is now waiting for the trigger event before it collects the transactions and displays them in the waveform.

11_waiting_for_trigger

Go back to the JupyterLab notebook, allocate the buffers and only use the sendchannel of the DMA.

data_size = 16

input_buffer = allocate(shape=(data_size,), dtype=np.uint32)
output_buffer = allocate(shape=(data_size,), dtype=np.uint32)
input_buffer[:] = np.arange(data_size, dtype=np.uint32)
dma_send.transfer(input_buffer)
dma_send.wait()

After you run this code, the System ILA should have triggered and updated the Waveform window. To simplify the visualization, let us change the radix of the TDATA signals to Unsigned Decimal.

Select both TDATA signals, then right-click on one of them, select Radix and click on Unsigned Decimal.

Let us zoom in around the red T vertical line, which is where the valid signal for the MM2S channel is asserted for the first time. I am highlighting in orange the TVALID and TREADY signals of both channels and in yellow the TLAST. This is a good point in the blog series to point you to the AXI4-Stream Interface, but as a very important note, a valid transaction only happens when both TVALID and TREADY are asserted at the same time. Also, the DMA IP works in what is called packet mode, which means that it is mandatory for the TLAST signal to be asserted to indicate the end of a transfer. The DMA IP also expects the TKEEP signal.

Note: I will leave it up to the reader to become familiar with AXI4-Stream.
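As a quick illustration of the handshake rule and of TLAST closing a packet, the sketch below walks over sampled signal values in plain Python. The function name and the sample lists are hypothetical; it only models the two rules stated above, not the DMA itself.

```python
def accepted_beats(tvalid, tready, tdata, tlast):
    """Group sampled AXI4-Stream signals into packets of accepted TDATA values.

    A beat is accepted only in cycles where TVALID and TREADY are both 1.
    A packet is closed by an accepted beat that has TLAST set.
    Returns (complete packets, accepted beats still waiting for TLAST).
    """
    packets, current = [], []
    for valid, ready, data, last in zip(tvalid, tready, tdata, tlast):
        if valid and ready:
            current.append(data)
            if last:
                packets.append(current)
                current = []
    return packets, current


# One cycle per list position; in cycle 2 the receiver stalls (TREADY = 0),
# so the sender holds the value 11 until it is accepted in cycle 3.
tvalid = [0, 1, 1, 1, 1, 0]
tready = [1, 1, 0, 1, 1, 1]
tdata = [0, 10, 11, 11, 12, 0]
tlast = [0, 0, 0, 0, 1, 0]

print(accepted_beats(tvalid, tready, tdata, tlast))  # ([[10, 11, 12]], [])
```

Note that the stalled value 11 is counted once, because only the cycle where both signals are high is a transfer.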

Now, let’s look at the Waveform. The MM2S channel is active from sample 512 to 528, and you can see that all the transactions are valid as both TVALID and TREADY are asserted. In sample 527, TLAST is asserted, indicating the end of the transfer; this relates to the size of the input_buffer in our PYNQ code. After this, TVALID goes to 0. In each of the samples, you can see how TDATA carries one element of the input_buffer array.

Bringing our attention to the slot_1 or S2MM channel, you can see that the stream goes active for 4 cycles, 515 to 519, and then TREADY goes to 0. This is because the DMA preemptively loads 4 transactions (stream beats).
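The sample numbers above can be checked with a little arithmetic. The helper below is a hypothetical sketch that assumes one beat is accepted on every sample, with no stalls, which is what the waveform described above shows.

```python
def beat_window(first_sample, beats):
    """Return (sample of the last beat, first sample after the burst).

    Assumes back-to-back beats, one per sample, starting at `first_sample`.
    """
    last_sample = first_sample + beats - 1
    return last_sample, last_sample + 1


data_size = 16    # elements in input_buffer
early_beats = 4   # beats the S2MM channel accepts before TREADY drops

print(beat_window(512, data_size))    # (527, 528): TLAST at 527, TVALID low at 528
print(beat_window(515, early_beats))  # (518, 519): TREADY low at 519
print(data_size - early_beats)        # 12 beats left for the receive transfer
```

The last line is why a second capture is needed: 12 of the 16 beats have not yet appeared on the S2MM channel.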

How would you set up the trigger to capture the rest of the S2MM channel transactions when we run the PYNQ code? Stop for a few minutes to think.

I would add the S2MM channel (slot_1) TREADY signal to the Trigger Setup (Value R).

Then I would set the trigger condition to Global OR.

15_trigger_condition_OR
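The effect of combining the two comparators with OR can be sketched in plain Python. This is a toy model with made-up sample lists and helper names; `combine=all` stands in for an AND condition purely for comparison.

```python
def rising_edges(samples):
    """Per-sample flags: True where the signal is 1 right after a 0."""
    return [i > 0 and samples[i - 1] == 0 and samples[i] == 1
            for i in range(len(samples))]


def first_trigger(comparators, combine=any):
    """Index of the first sample where the combined comparators fire, or None.

    `combine=any` models an OR of the comparators, `combine=all` an AND.
    """
    for i, flags in enumerate(zip(*comparators)):
        if combine(flags):
            return i
    return None


mm2s_tvalid = [0, 0, 0, 0, 0, 0]  # send channel stays idle in this run
s2mm_tready = [0, 0, 1, 1, 1, 0]  # receive channel becomes ready

comparators = [rising_edges(mm2s_tvalid), rising_edges(s2mm_tready)]
print(first_trigger(comparators, combine=any))  # 2
print(first_trigger(comparators, combine=all))  # None
```

With an AND of both rising edges the capture would never start here, because only the receive channel is exercised.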

Click Run Trigger for this ILA core to arm the trigger.

Go back to the JupyterLab notebook and, this time, use only the recvchannel of the DMA with the output_buffer allocated earlier.

dma_recv.transfer(output_buffer)
dma_recv.wait()

print(f'Are buffers equal after DMA? {np.array_equal(output_buffer, input_buffer)}')

I will leave it up to the reader to analyze the waveform.

DMA Register Map

PYNQ allows you to read the DMA status via .register_map. This attribute shows all the DMA registers and their values. In JupyterLab, after completing the transfers in both the send and receive channels, run:

dma.register_map

For now, just focus on the MM2S and S2MM length registers. As you can see, both report 64; this value indicates the number of bytes requested to be transferred in each channel. Each array has 16 elements, and the datatype is np.uint32, which takes 4 bytes per element. So, 16 x 4 = 64. Both DMASR registers indicate that the channel is Idle.
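The same numbers can be reproduced off the board with the standard library. The sketch below is only illustrative: `decode_dmasr` is a hypothetical helper, and the bit positions it uses (bit 0 for Halted, bit 1 for Idle) are my reading of the AXI DMA product guide, so please check them against the documentation for your IP version.

```python
import ctypes

data_size = 16
bytes_per_element = ctypes.sizeof(ctypes.c_uint32)  # 4 bytes, same width as np.uint32
transfer_length = data_size * bytes_per_element
print(transfer_length)  # 64, the value reported by both length registers


def decode_dmasr(value):
    """Decode the two lowest status bits of a DMASR value.

    Assumed layout (verify in the AXI DMA product guide): bit 0 = Halted, bit 1 = Idle.
    """
    return {"halted": bool(value & 0x1), "idle": bool(value & 0x2)}


print(decode_dmasr(0x2))  # {'halted': False, 'idle': True}
```

On the board, the buffers returned by allocate are NumPy arrays, so input_buffer.nbytes should give the same 64.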

This concludes the second part of this blog series. See debugging common DMA issues.

Please use the comments section for questions related to the content of this blog. If you have questions about your own design or unrelated topics, please create a new topic in the forum.

rgbblue · April 2, 2025, 1:18am

Hi, why does the DMA preemptively load 4 transactions? Is this the default behaviour of the DMA or is this just by coincidence from the FIFO - i.e is this due to the DMA telling the FIFO its ready or the FIFO telling the DMA that the data is valid - the loopback nature of this DMA tutorial.

If I were to design a custom IP to use with the DMA, will I need to account for this, even when the IP i write will control the TVALID signal?

marioruiz · April 2, 2025, 7:19am

Hi @rgbblue,

Is this the default behaviour of the DMA.

Yes, it is. If you check the DMA documentation you will find this.

this due to the DMA telling the FIFO its ready

The DMA will assert TREADY on the subordinate stream port for 4 transactions even if it is not started.