Why does a 6x6 matrix only print out 32 elements?

Hi dear all,

I am using Vitis HLS 2022.2 and Vivado 2022.2 for my project.
The expected output is a 6x6 matrix; I have verified it in MATLAB and in a C++ simulation.
But when I run the same structure through Vitis and Vivado and load it onto the FPGA board, only the first 32 elements come out, and those 32 match the MATLAB and C++ results.
Why are the other 4 elements missing? I have no idea why.
Thank you for your attention.

Please refer to my results below:
MATLAB:
P_ss =

12.6831 8.6205 4.6556 6.2474 -4.5267 -0.1616
8.6205 13.6159 6.6142 13.6793 -2.9391 0.2384
4.6556 6.6142 14.9192 11.6875 5.6499 0.9135
6.2474 13.6793 11.6875 19.1106 5.3445 1.2266
-4.5267 -2.9391 5.6499 5.3445 17.6676 2.1206
-0.1616 0.2384 0.9135 1.2266 2.1206 0.4714

Jupyter: [screenshot of the Jupyter output]


@shadow346015

Welcome to PYNQ forum =]

Too much background is missing.
We need more info to resolve this issue.

ENJOY~

Hi briansune, thanks for your reply.

here is the readout from the FPGA


@shadow346015

Can you post the DMA block settings?
If possible, a capture of all the block connections too.
What kind of HLS multiplication did you implement, IEEE 754?

ENJOY~

Hi @briansune
here are the settings of the DMA and the block connections


@shadow346015

My best guess: 512 / 32 = 16 elements per beat.
So, simply speaking, 6x6 = 36, and floor(36 / 16) = 2, i.e. 2x16 = 32.
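The arithmetic can be checked quickly (a minimal sketch; the 512-bit bus width and 32-bit float element size come from the discussion above):

```python
# Each DMA beat on a 512-bit bus carries 512 / 32 = 16 single-precision floats.
BUS_WIDTH_BITS = 512
ELEM_BITS = 32  # 32-bit IEEE 754 float

elems_per_beat = BUS_WIDTH_BITS // ELEM_BITS  # 16
total_elems = 6 * 6                           # 36 matrix elements

# With only aligned (whole-beat) transfers, the element count is floored:
full_beats = total_elems // elems_per_beat    # 2
transferred = full_beats * elems_per_beat     # 32
missing = total_elems - transferred           # 4

print(elems_per_beat, full_beats, transferred, missing)  # 16 2 32 4
```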

So what do you think: which DMA block setting could resolve this issue?
Or how would you make the matrix transfer complete?

ENJOY~

Hi @briansune
I am not sure where the calculation above comes from,
but it may match the limit in my settings (almost the same).

[screenshots of the DMA settings]

@shadow346015

OK: the DMA engine uses 512-bit transfers, and 32-bit IEEE 754 gives 16 elements per beat.
So if your DMA engine cannot handle unaligned transfers, a 36-element transfer will not be able to deliver the remaining 4 elements.

Don’t modify the HLS first; observe what happens if you modify the DMA engine or just the Python script.

Your bus is fixed at 512 bits wide.
So, two possible solutions:
A: try 48 elements and see whether the result returns completely. If that succeeds, the prediction is correct.
B: try activating DMA unaligned transfers. If both A and B settle it, then that is surely the root cause.
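Solution A can be sketched on the host side with plain NumPy (a hypothetical sketch; the buffer names are assumptions, and on a real board the buffer would come from `pynq.allocate` and be filled by the DMA receive channel). The idea is to size the buffer as 48 elements, i.e. three full 16-element beats, then slice the 6x6 matrix back out:

```python
import numpy as np

ELEMS_PER_BEAT = 16          # 512-bit bus / 32-bit float
N = 6                        # 6x6 matrix
total = N * N                # 36 elements

# Round the buffer length up to a whole number of beats: 36 -> 48.
padded_len = -(-total // ELEMS_PER_BEAT) * ELEMS_PER_BEAT  # ceiling division

# On a PYNQ board this would be: buf = allocate(shape=(padded_len,), dtype=np.float32)
buf = np.zeros(padded_len, dtype=np.float32)

# ... the DMA receive would fill buf here, e.g.:
# dma.recvchannel.transfer(buf); dma.recvchannel.wait()

# Recover the 6x6 matrix from the first 36 elements; ignore the 12 padding slots.
P_ss = buf[:total].reshape(N, N)

print(padded_len, P_ss.shape)  # 48 (6, 6)
```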

Always solve issues by making reasonable assumptions and cross-examining them.

ENJOY~

Hi @briansune
Thanks for your replying.
The final script was fixed with j_limit = 16 and i_limit = 2, which caused my loop to run only 32 times because i_limit was floored.

But based on this DMA setting, does it mean I can only transfer a 6-digit value? Even when my data type is set to “double”?

@shadow346015

Nope: the DMA bus width and the data format are both free to choose. It is purely based on your design constraints.

If you have activated unaligned transfers, then the only thing to worry about is the transfer cycle count, and in the final data you can mask out whatever MSB information you don't need.
For example, if your data is 18 bits and the bus is 32 bits, then one trick is a 24-bit word: the 18-bit payload plus an ID to separate the payload destinations (so 6 bits are wasted per transfer).
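The 18-bit-in-24-bit packing idea can be sketched like this (a toy example; the field widths follow the post above, and the exact ID layout is an assumption for illustration):

```python
PAYLOAD_BITS = 18
ID_BITS = 6                           # 24 - 18: the "wasted" bits carry a destination ID
WORD_BITS = PAYLOAD_BITS + ID_BITS    # 24-bit packed word inside a 32-bit bus slot

def pack(payload, dest_id):
    """Pack an 18-bit payload and a 6-bit destination ID into one 24-bit word."""
    assert 0 <= payload < (1 << PAYLOAD_BITS)
    assert 0 <= dest_id < (1 << ID_BITS)
    return (dest_id << PAYLOAD_BITS) | payload

def unpack(word):
    """Mask out the MSB information to recover the ID and payload."""
    payload = word & ((1 << PAYLOAD_BITS) - 1)
    dest_id = word >> PAYLOAD_BITS
    return payload, dest_id

w = pack(0x2ABCD, 5)
print(unpack(w))  # (175053, 5) i.e. (0x2ABCD, 5)
```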

Remember that an aligned transfer always makes computation more effective in the digital world. So here are some suggestions on data format:
Use a byte-based information standard, IEEE 754 for example.
Use a LUT of finite data to reduce resolution with far, far fewer bits.
Reduce bus transfers by computing everything inside the PL and passing only the final result to the CPU (the accelerator design methodology).

ENJOY~

@briansune

thanks for your kind suggestions.
They helped me a lot as a freshman in the FPGA universe. :slight_smile:
