The result is different with pynq and HLS simulation

AllenChenChao · October 22, 2021, 7:03pm

Hello
I followed a PYNQ blog post that shows how to use an AXI master interface to compute sqrt. The blog example works when I verify it. After that, I modified the project to add a more complex computation, but the result is wrong on the PYNQ board (it is correct in HLS simulation).
I am not sure where the problem is, so I added my algorithm step by step. After about 20 projects, I found that once I add more computation past a certain point in the algorithm, the result on the PYNQ board becomes wrong (still correct in HLS simulation). I don't know how to deal with it. When I add more code after the label 'B:', the board's result is wrong!

#include "sampen.hpp"
#include <string.h>
#include <math.h>
void axi4_sampen(float *in, float *out, int len)
{
#pragma HLS INTERFACE s_axilite port=return bundle=sqrt
#pragma HLS INTERFACE s_axilite port=len bundle=sqrt
#pragma HLS INTERFACE m_axi depth=50 port=out offset=slave bundle=output
#pragma HLS INTERFACE m_axi depth=50 port=in offset=slave bundle=input
#pragma HLS INTERFACE s_axilite port=in
#pragma HLS INTERFACE s_axilite port=out

        float buff[100];
        float sampen[1];
        float D[100][100];
        int N = len;
        int m = 2;
        float r = 20;
        memcpy(buff, (const float*) in, len * sizeof(float));

        for(int i = 0; i < len; i ++){
        	for(int j = 0; j < len; j ++){
        		if(abs(buff[i] - buff[j]) <= r){
        			D[i][j] = 1;
        		}
        	}
        }
        float count1[1] = {0};
        for(int i = 0; i < len - m + 1; i ++){
        	for(int j = 0; j < len - m + 1; j ++){
        		count1[0] = count1[0] + (D[i][j] and D[i+1][j+1]);
        	}
        }
        count1[0] = count1[0] - len + m - 1;
   B:
        float B[1] = {0};
        B[0] = (float)count1[0]/((len-m+1)*(len-m));

        float count2[1] = {0};
        for(int i = 0; i < len - m ; i ++){
        	for(int j = 0; j < len - m ; j ++){
        		count2[0] = count2[0] + (D[i][j] and D[i+1][j+1] and D[i+2][j+2]);
        	}
        }
        count2[0] = count2[0] - len + m;
        float A[1] = {0};
        A[0] = (float)count2[0]/((len-m)*(len-m-1));
        memcpy(out, (const float*) A, 1 * sizeof(float));
}

The PYNQ board is tested from a Jupyter notebook:

sqrt_ip.write(0x20, length)
sqrt_ip.write(0x10, inpt.physical_address)
sqrt_ip.write(0x18, outpt.physical_address)
sqrt_ip.write(0x00, 1)

Is the algorithm so long that the computation is still running when we read data from the out port? If so, how can we wait after starting the IP by writing 1 to 0x00?

marioruiz · October 26, 2021, 11:20am

Can you share the full notebook?

Are you waiting for the IP to indicate that the computation is done?
This is bit 1 of register 0, asserted when the kernel has completed operation. Cleared on read.
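
The wait described above could look roughly like the sketch below. It assumes the IP object exposes `read(offset)` alongside the `write(offset, value)` already used in this thread. `FakeIP`, `wait_for_done`, and the timeout values are hypothetical names introduced only so the example runs without a board.

```python
import time

AP_CTRL = 0x00   # control register offset, as written to in this thread
AP_START = 0x1   # bit 0: start
AP_DONE = 0x2    # bit 1: done, cleared on read


class FakeIP:
    """Hypothetical stand-in for the overlay IP so the sketch runs off-board."""

    def __init__(self, reads_until_done=3):
        self.reads_until_done = reads_until_done
        self.reads = 0
        self.running = False

    def write(self, offset, value):
        if offset == AP_CTRL and value & AP_START:
            self.running = True
            self.reads = 0

    def read(self, offset):
        if offset != AP_CTRL or not self.running:
            return 0
        self.reads += 1
        if self.reads >= self.reads_until_done:
            self.running = False  # done is reported once, then cleared
            return AP_DONE
        return 0


def wait_for_done(ip, timeout_s=1.0, poll_s=0.001):
    """Poll the control register until the done bit is seen or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if ip.read(AP_CTRL) & AP_DONE:
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError("IP did not report done in time")
        time.sleep(poll_s)


ip = FakeIP()
ip.write(AP_CTRL, 1)   # start, like sqrt_ip.write(0x00, 1)
wait_for_done(ip)      # read the output buffer only after this returns
```

On the board, `sqrt_ip` would be passed to `wait_for_done` in place of `FakeIP`. Because the done bit is cleared on read, the loop reads the register once per iteration and acts on that single value.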

AllenChenChao · October 30, 2021, 1:27pm

AllenChenChao · October 30, 2021, 1:32pm

Hello. Here is the notebook. I changed the code from sqrt to another algorithm and found that if the algorithm is too long (I tried more than 30 times, and I think this is the reason), the result is correct only the first time after I reload the bitstream and wrong after that.
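
One possible cause of a "correct only on the first run" symptom, not confirmed anywhere in this thread, is that the kernel writes 1 into D on a match but never writes 0, so old entries could survive into the next run. The sketch below is a plain Python model of that first loop under the assumption that D keeps its contents between invocations on the board. `fill_matches`, `zeros`, and the second input are hypothetical.

```python
def fill_matches(buff, r, D):
    """Mirror of the kernel's first loop: writes 1 on a match, never writes 0."""
    n = len(buff)
    for i in range(n):
        for j in range(n):
            if abs(buff[i] - buff[j]) <= r:
                D[i][j] = 1
    return sum(D[i][j] for i in range(n) for j in range(n))


def zeros(n):
    """Return an n x n matrix of zeros."""
    return [[0] * n for _ in range(n)]


first = [112, 60, 100, 95, 85, 70, 80, 110, 70]        # input from this thread
second = [0, 100, 200, 300, 400, 500, 600, 700, 800]   # hypothetical second input

# Assumed board behaviour: the same D is reused for both runs.
persistent = zeros(9)
fill_matches(first, 20, persistent)
stale_count = fill_matches(second, 20, persistent)

# Second run starting from zeroed memory instead.
clean_count = fill_matches(second, 20, zeros(9))

print(stale_count, clean_count)
```

If this model applies, the two counts differ because entries set by the first input remain in D. Writing 0 in an else branch of the kernel's first loop would remove the dependence on earlier runs.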

AllenChenChao · October 30, 2021, 1:33pm

Here is part of the notebook's Python code:

length = 9 # length = 40
inpt = Xlnk().cma_array(shape=(length,), dtype=np.float32)
outpt = Xlnk().cma_array(shape=(length,), dtype=np.float32)
#a = [i*i for i in range(length)]
a = [112, 60, 100, 95, 85, 70, 80, 110, 70]
np.copyto(inpt, a)
soft_op = np.sqrt(inpt)
sqrt_ip.write(0x20, length)
sqrt_ip.write(0x10, inpt.physical_address)
sqrt_ip.write(0x18, outpt.physical_address)
sqrt_ip.write(0x00, 1)

print("Hardware Output", "Software Output \n")
for i in range(length):
    print(outpt[i], "\t\t  ", soft_op[i])

anoir_nechi · November 1, 2021, 4:13pm

I think the HLS code is incompatible with the newer versions of Vitis HLS. I tried the same code and it worked with Vivado 2018.2. Today I tried it with Vitis 2020.2 and got the same result as you: all zeros. This is so bad.

cathalmccabe · November 2, 2021, 2:25pm

The last line looks incorrect:
memcpy(out, (const float*) A, 1* sizeof(float));

This is why you only get the first result.

It looks like you should be using len* sizeof(float)

I don’t think your HLS code is equivalent to the np.sqrt() function.

Cathal

AllenChenChao · November 3, 2021, 8:44am

Dear

I changed the sqrt code. The output length in our algorithm is 1, which is why we use 1 in place of len from the sqrt example. That should not be wrong, because I have tried more than 30 projects, adding parts of our algorithm one at a time to find where the fault is. They all work at the beginning, until the fault appears.

Best regards,
Chao


cathalmccabe · November 3, 2021, 9:50am

What is your problem?
From what I can see, you send an array of 9 values. You get one result back and the rest zeros. This is expected as your HLS code only writes 1 value back.

You compare your algorithm against np.sqrt() on 9 values. You get a different result for the first output because your HLS algorithm does not implement the equivalent of the sqrt function. The rest are zeros as expected.
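
A software reference that mirrors the posted kernel, rather than np.sqrt(), would give a like-for-like value to compare with `outpt[0]`. The following is a minimal sketch for m = 2, assuming D starts from zero. `sampen_a_reference` is a hypothetical name, not something from the notebook.

```python
def sampen_a_reference(x, r=20.0):
    """Software mirror of the kernel's A computation for m = 2."""
    m = 2
    n = len(x)
    # D[i][j] is 1 when samples i and j are within r of each other, else 0.
    D = [[1 if abs(x[i] - x[j]) <= r else 0 for j in range(n)] for i in range(n)]
    count2 = 0
    for i in range(n - m):
        for j in range(n - m):
            count2 += D[i][j] and D[i + 1][j + 1] and D[i + 2][j + 2]
    count2 = count2 - n + m  # same correction as the kernel
    return count2 / ((n - m) * (n - m - 1))


a = [112, 60, 100, 95, 85, 70, 80, 110, 70]  # input from this thread
expected = sampen_a_reference(a)
print(expected)  # value to compare against outpt[0]
```

The kernel computes in single precision, so a comparison with a small tolerance is safer than exact equality. Only `outpt[0]` is meaningful, since the kernel copies a single float to the output.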

Have I missed something?

Cathal

AllenChenChao · November 3, 2021, 11:24am

Dear
As we mentioned at the beginning, the result is correct in simulation. But when we run it on the board with the bit file and the Jupyter notebook, the result shows a strange fault: the hardware result is correct only for the first input, and wrong from the second input on.
I verified the sqrt example from the tutorial, and when I shortened my algorithm it was also correct. So I am not sure whether the fault comes from the algorithm being too long or from the specific computation I use in the design.
I want to change the interface further, replacing the default interface of the sqrt example, so that the Jupyter notebook sends data in and reads data out after a time delay.
The specific computation I mean is the calculation of B or A in the HLS C code. I spent more than 10 hours on it that day, so I am not sure of that result; I may have been careless by then.
Best regards
Chao


cathalmccabe · November 4, 2021, 7:10pm

You only get 1 result and the rest zeros because of this line:
memcpy(out, (const float*) A, 1* sizeof(float));
You only memcpy one result.

Can you post a copy of your block diagram/your full code?

Cathal
