4kb boundary space for AXI4-LITE interface

Hi everyone.

I am using the PYNQ 2.7 image and the ZCU111 board.

The AXI4-Lite interface accepts only 4096 bytes (4 KB) per transaction. When I give the input, it loads 4098 bytes, i.e. 4096 + 2 bytes. On what basis is it taking 4098 bytes?
If anyone has an idea about this, please give me suggestions.

Thank you.


Hi @Nagendra,

I do not understand your question. Some context is missing here.

The AXI4-LITE interface accepts only 4096 (4KB) to complete the transaction.

What does this mean? Where do you get this number?

When I give the input, it loads 4098, which means 4096 and +2 bytes. So on what basis is it taking the 4098 bytes?

What input?

Please provide a code snippet of what you are trying to do.

Mario

Hi @marioruiz

Thank you for your reply.

This is the Python code I am using.

import time

import numpy as np
from pynq import DefaultHierarchy, allocate


class CnnDriver(DefaultHierarchy):  # hypothetical name: the post omits the class header
    def execute(self, test_data, batch_size, input_ch, input_dim, input_dim1,
                output_ch, output_dim, output_dim1):
        input_mat = test_data[0:batch_size]
        print('finished input_mat')
        print('input_mat = ', input_mat)
        # 9-value header describing the layer, followed by the flattened input data
        header = [0, batch_size, 0, input_ch, input_dim, input_dim1,
                  output_ch, output_dim, output_dim1]
        input_val = np.append(header, input_mat.ravel())
        print('finished input_val')
        print('input_val = ', input_val)
        # Buffer sizes set the DMA transfer lengths (2 bytes per int16 element)
        in_buffer = allocate(shape=input_val.shape, dtype=np.int16)
        out_buffer = allocate(shape=(9 + output_ch * batch_size * output_dim * output_dim1,),
                              dtype=np.int16)
        in_buffer[:] = input_val  # copy and cast to int16 in one vectorised assignment
        print('input buffer = ', input_val.shape)
        print('output buffer = ', out_buffer.shape)
        start_time = time.perf_counter()  # wall-clock time, not CPU time
        print('finished copying')
        self.axi_dma_0.sendchannel.transfer(in_buffer)
        print('finished sendchannel.transfer')
        self.axi_dma_0.recvchannel.transfer(out_buffer)
        print('finished recvchannel.transfer')
        self.axi_dma_0.sendchannel.wait()
        print('finished sendchannel.wait')
        # Blocks until the receive transfer completes; hangs if the IP never sends TLAST
        self.axi_dma_0.recvchannel.wait()
        print('finished recvchannel.wait')
        end_time = time.perf_counter()
        print("Elapsed Test Time: ", end_time - start_time)
        # Drop the 9-value header and reshape to one row per sample
        output_mat = out_buffer[9:].reshape(batch_size, -1).astype(np.float32)
        print('finished output_mat')
        return output_mat

    @staticmethod
    def checkhierarchy(description):
        # Bind this driver to any hierarchy that contains an 'axi_dma_0' IP
        return 'axi_dma_0' in description['ip']

I put a comment to see the error code value here.

The sendchannel input is 4098, but it should be 4096 for the DMA transaction to complete. AXI accesses are limited to a 4 KB boundary per transaction, yet in my case the value is 4098, which is 2 bytes more. Because this happens internally, we are unable to modify it and send only the bytes we require.

[Screenshot: FPGA_CNN Jupyter Notebook, 2022-05-30 08:06:56]

It keeps running at recvchannel.wait() without ever showing a result.

I think this information is enough to understand the issue. However, if you need more information about the Vivado parameters, I am glad to share it.
Thank you.

I don’t think this is a problem with the AXI4-Lite.

You need to make sure that your IP is producing the amount of data you expect in the out_buffer and the sideband signals in the AXI4-Stream are correct. This was already mentioned in the other threads you have open.

A 4096 value in this register indicates that IOC_Irq is asserted.
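For reference, this reading matches the AXI DMA status register (DMASR) layout in Xilinx PG021, where bit 12 is IOC_Irq and bit 1 is Idle. Assuming the 4098 in the screenshot is a raw DMASR value (the screenshot itself is not reproduced here), a small decoder shows that 4098 = 0x1002 means "idle, transfer complete" rather than a 4098-byte transfer. The helper name `decode_dmasr` is hypothetical, and the bit table should be checked against the PG021 version for your IP.

```python
from typing import List

# Bit positions of the AXI DMA status register (MM2S_DMASR at offset 0x04,
# S2MM_DMASR at offset 0x34), as listed in Xilinx PG021.
DMASR_BITS = {
    0: "Halted",
    1: "Idle",
    3: "SGIncld",
    4: "DMAIntErr",
    5: "DMASlvErr",
    6: "DMADecErr",
    8: "SGIntErr",
    9: "SGSlvErr",
    10: "SGDecErr",
    12: "IOC_Irq",
    13: "Dly_Irq",
    14: "Err_Irq",
}


def decode_dmasr(value: int) -> List[str]:
    """Return the names of the status flags set in a raw DMASR value (hypothetical helper)."""
    return [name for bit, name in sorted(DMASR_BITS.items()) if value & (1 << bit)]


print(decode_dmasr(4096))  # ['IOC_Irq']
print(decode_dmasr(4098))  # ['Idle', 'IOC_Irq']
```

Any of the `*Err` names appearing in the output would point at a bus or internal error rather than a size problem.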

Are the interrupts of the DMA connected to the PS?

Did you try without the .wait()? Put a timer instead of that and see if you get data in the output buffer.
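One way to follow this suggestion without risking an endless hang is to poll with a deadline instead of calling `.wait()`. The sketch below is generic: `wait_with_timeout` is a hypothetical helper that takes any zero-argument callable returning True when the transfer is done. On the board, that callable could read the channel's idle state; recent PYNQ releases expose an `idle` property on DMA channels, but treat that as an assumption and check your PYNQ version.

```python
import time
from typing import Callable


def wait_with_timeout(is_done: Callable[[], bool], timeout_s: float = 1.0,
                      poll_s: float = 0.001) -> bool:
    """Poll is_done() until it returns True or timeout_s elapses.

    Returns True if the condition was met and False on timeout, so the caller
    can inspect out_buffer instead of blocking forever like a bare .wait().
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_done():
            return True
        time.sleep(poll_s)
    return bool(is_done())  # one last check at the deadline


# Possible use on the board (assumes the channel exposes an `idle` property):
# if not wait_with_timeout(lambda: dma.recvchannel.idle, timeout_s=2.0):
#     print("receive did not complete; first values:", out_buffer[:16])
```

Printing the start of `out_buffer` on timeout shows whether the IP produced any data at all, which separates "no output" from "output without TLAST".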

Mario

@marioruiz
He has posted this problem several times. What he is trying to build is a CNN design based on a GitHub repository that targets an older PYNQ version.
Meanwhile, a version for PYNQ 2.7 has already been posted in this community Learn post:

Hi @briansune

I have done everything using Vivado 2019.2, and I am in the last stage of completing the work. Changing the version now would be more complicated.

I am able to implement MNIST using the same version and method, so why does my algorithm not work?

@briansune Every time, I run into a different issue related to DMA data loading.

@Nagendra you are making it difficult for anyone to help you.

The 4KB boundary should not be an issue. Yes, crossing a 4KB boundary is a violation of the AXI specification, but you can create designs that violate this and it can work OK.
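To make the rule concrete: the AXI4 specification forbids a single burst from crossing a 4 KB address boundary, and DMA engines typically split a long transfer into bursts that respect it, so a 4098-byte buffer is not illegal in itself. The hypothetical helpers below check whether a byte range crosses a boundary and show how such a range would be split into boundary-safe pieces; they model the rule only, not the AXI DMA's internal logic.

```python
from typing import List, Tuple

BOUNDARY = 4096  # AXI4: a burst must not cross a 4 KB address boundary


def crosses_4k(addr: int, length: int) -> bool:
    """True if the byte range [addr, addr + length) spans a 4 KB boundary."""
    if length <= 0:
        return False
    return addr // BOUNDARY != (addr + length - 1) // BOUNDARY


def split_at_4k(addr: int, length: int) -> List[Tuple[int, int]]:
    """Split [addr, addr + length) into (start, size) chunks that never cross 4 KB."""
    chunks = []
    while length > 0:
        size = min(length, BOUNDARY - addr % BOUNDARY)
        chunks.append((addr, size))
        addr += size
        length -= size
    return chunks


print(crosses_4k(0, 4098), split_at_4k(0, 4098))  # True [(0, 4096), (4096, 2)]
```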

You have posted some code for a Python function without any of the parameters you are using. These parameters determine the data size and data transfer.

When you do a DMA transfer (either a send or receive) the size of the buffer will determine the length of the DMA transaction.

If you are seeing 4098 sent, this is determined by the size of the buffer you allocate. If you only receive 4096, this is also determined by the size of the buffer.
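Following this point, the byte count of each DMA transfer is just the buffer's `nbytes`. Using the element counts from the `execute` method posted above (a 9-value int16 header plus the data), the sketch below computes both transfer lengths, assuming `test_data[0:batch_size]` holds exactly `batch_size * input_ch * input_dim * input_dim1` values. The helper name and the example dimensions are hypothetical; it only needs NumPy, so it can be run off the board.

```python
import numpy as np


def dma_lengths(batch_size, input_ch, input_dim, input_dim1,
                output_ch, output_dim, output_dim1, header=9, dtype=np.int16):
    """Return (send_bytes, recv_bytes) for buffers laid out as in execute()."""
    itemsize = np.dtype(dtype).itemsize  # 2 bytes for int16
    send_elems = header + batch_size * input_ch * input_dim * input_dim1
    recv_elems = header + batch_size * output_ch * output_dim * output_dim1
    return send_elems * itemsize, recv_elems * itemsize


# Hypothetical example: one 1x28x28 image in, 10 class scores out.
print(dma_lengths(1, 1, 28, 28, 10, 1, 1))  # (1586, 38)
```

By the same arithmetic, any int16 buffer of 2049 elements is exactly 4098 bytes.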

The transactions can be symmetric (amount of data sent = amount of data received) or asymmetric, where you send and receive different amounts of data. Your IP determines this and should set TLAST appropriately.
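As an illustrative model (not the DMA's implementation), an output packet of `nbytes` on a `width_bits`-wide AXI4-Stream takes ceil(nbytes / (width_bits / 8)) beats, and TLAST should be high on the final beat only. If the IP emits fewer beats than the receive buffer expects and never raises TLAST, the receive transfer has no end-of-packet to complete on, which is consistent with the `recvchannel.wait()` hang in this thread. The stream width is not given in the thread, so the 64-bit value below is an assumption.

```python
from typing import List


def tlast_pattern(nbytes: int, width_bits: int) -> List[bool]:
    """TLAST value for each beat of an nbytes packet on a width_bits-wide stream."""
    if width_bits % 8:
        raise ValueError("AXI4-Stream TDATA width must be a whole number of bytes")
    bytes_per_beat = width_bits // 8
    beats = (nbytes + bytes_per_beat - 1) // bytes_per_beat  # ceiling division
    return [i == beats - 1 for i in range(beats)]


# e.g. a 38-byte result (19 int16 values) on an assumed 64-bit stream:
pattern = tlast_pattern(38, 64)
print(len(pattern), pattern.index(True))  # 5 4
```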

The forum is here to help, but you will need to debug your code yourself. If you are sending or receiving more data than you expect, you will need to determine why, and check where TLAST is set in the received stream.

Cathal

Cat,

I totally agree with you!
But are you talking about the bus width or the number of payload bytes per transfer?
https://docs.xilinx.com/r/en-US/ug1399-vitis-hls/AXI4-Stream-Interfaces
All I see there is: “The maximum supported port width is 4096 bits, even for aggregated structs or reshaped arrays.”
Remember that the bit width of the bus and the number of bytes transferred are completely different ideas.

My MNIST example uses the earlier GitHub repository design, which targets an older PYNQ revision: PYNQ-CNN-ATTEMPT/Minst-CNN at master · ZhaoqxCN/PYNQ-CNN-ATTEMPT · GitHub
Meanwhile, the Learn example already demonstrates this clearly:
4kb boundary space for AXI4-LITE interface - #5 by briansune
The design requires loading the weights, feeding them forward, returning them and checking that they are identical, as well as running inference with image loading and returning the class prediction, so the transfer sizes are completely arbitrary.
Even the transfer size of the weights, quantized to 8 bits, can easily exceed 4096 bytes for an FC layer, e.g. [10 x 864] [out_ch x in_ch].
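A quick calculation illustrates both points: the [10 x 864] FC weight matrix quantized to 8 bits is 8640 bytes, well over 4096, while the 4096-bit figure quoted from UG1399 is a port width (512 bytes per beat), not a payload limit. The 64-bit stream width used for comparison is only an example.

```python
import numpy as np

# FC-layer weights [out_ch x in_ch] = [10 x 864], quantized to 8 bits
weights = np.zeros((10, 864), dtype=np.int8)
payload_bytes = weights.nbytes          # 10 * 864 * 1 = 8640 bytes

max_port_bits = 4096                    # UG1399 limit on AXI4-Stream port width
max_beat_bytes = max_port_bits // 8     # 512 bytes per beat at that width

# Stream beats needed for the same payload at two widths (ceiling division)
beats_64 = -(-payload_bytes // (64 // 8))         # ceil(8640 / 8)   = 1080
beats_4096 = -(-payload_bytes // max_beat_bytes)  # ceil(8640 / 512) = 17
print(payload_bytes, max_beat_bytes, beats_64, beats_4096)  # 8640 512 1080 17
```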

Hi @briansune

Thank you for the tutorial. I attempted to run your MNIST example on the ZCU111 and ZCU104 boards using a PYNQ 2.7 image, Vivado 2020.2, and Vitis HLS 2020.2.

I’m experiencing the same issue as with my own algorithm: it gets stuck at recvchannel.wait(). This suggests that the problem is not with the HLS IP. My Vivado block diagram is shown; I believe I made all of the necessary connections in Vivado. Is this issue caused by the Zynq UltraScale+? I have no idea how to deal with it: if an error occurred, I could be certain where the problem appears, but here it keeps running without reporting any errors. So I am confused about where I should make changes.

Vivado Parameters

S_AXI_HPC0_FPD = 64 bit data width
M_AXI_HPM0_FPD = 64 bit data width

sanity_test_hls.ipynb (743.1 KB)

Can you post the Jupyter log of a weight load for any one of the layers?

From my repository:

Hi @briansune

I can successfully load the weights and implement them on the ZCU111 board. When it comes to the sanity tests, they keep running without producing a result. So is the issue with the Vivado block diagram?

mnist (1).ipynb (1022.8 KB)

Oh come on, Nagendra, please work things out yourself instead of copying other people's material and following it blindly. Of course it won't pass the sanity test: those tests are not for the CNN revision, they are for the FC revision.
So there is no problem with the PYNQ system. It turns out the fault lies entirely in your own design, and we cannot help further, as this is your design, not ours.

@marioruiz
@cathalmccabe
Problem checked. Conclusion: PYNQ on UltraScale has no issue; it is a design fault.

Hi @briansune

Thank you so much for the clarification.

@Nagendra,

Is this solved? If so, can you please post the solution to help other people as well?

Mario