Results differ between the PYNQ board and HLS simulation

Hello,
I followed a PYNQ blog post that shows how to use an AXI master interface to compute sqrt. I verified that the blog's example works. I then modified the project to add a more complex computation, but the result is wrong on the PYNQ board (it is correct in HLS simulation).
I am not sure where the problem is, so I added my algorithm step by step. After 20 projects, I found that once I add more computation past a certain point in the algorithm, the result on the PYNQ board becomes wrong (still correct in HLS simulation). I don’t know how to deal with this. When I add more code after the label ‘B:’, the board’s result is wrong!

#include "sampen.hpp"
#include <string.h>
#include <math.h>
void axi4_sampen(float *in, float *out, int len)
{
#pragma HLS INTERFACE s_axilite port=return bundle=sqrt
#pragma HLS INTERFACE s_axilite port=len bundle=sqrt
#pragma HLS INTERFACE m_axi depth=50 port=out offset=slave bundle=output
#pragma HLS INTERFACE m_axi depth=50 port=in offset=slave bundle=input
#pragma HLS INTERFACE s_axilite port=in
#pragma HLS INTERFACE s_axilite port=out

    float buff[100];
    float sampen[1];
    float D[100][100];
    int N = len;
    int m = 2;
    float r = 20;
    memcpy(buff, (const float*) in, len * sizeof(float));

    for(int i = 0; i < len; i ++){
        for(int j = 0; j < len; j ++){
            if(abs(buff[i] - buff[j]) <= r){
                D[i][j] = 1;
            }
        }
    }
    float count1[1] = {0};
    for(int i = 0; i < len - m + 1; i ++){
        for(int j = 0; j < len - m + 1; j ++){
            count1[0] = count1[0] + (D[i][j] and D[i+1][j+1]);
        }
    }
    count1[0] = count1[0] - len + m - 1;
B:
    float B[1] = {0};
    B[0] = (float)count1[0]/((len-m+1)*(len-m));

    float count2[1] = {0};
    for(int i = 0; i < len - m ; i ++){
        for(int j = 0; j < len - m ; j ++){
            count2[0] = count2[0] + (D[i][j] and D[i+1][j+1] and D[i+2][j+2]);
        }
    }
    count2[0] = count2[0] - len + m;
    float A[1] = {0};
    A[0] = (float)count2[0]/((len-m)*(len-m-1));
    memcpy(out, (const float*) A, 1 * sizeof(float));
}

The PYNQ board is tested from a Jupyter notebook:

sqrt_ip.write(0x20, length)
sqrt_ip.write(0x10, inpt.physical_address)
sqrt_ip.write(0x18, outpt.physical_address)
sqrt_ip.write(0x00, 1)

Is the algorithm so long that the computation is still running when we read data from the out port? If so, how can we wait after starting the IP by writing 1 to 0x00?

Can you share the full notebook?

Are you waiting for the IP to indicate that the computation is done?
This is bit 1 of register 0, asserted when the kernel has completed operation. Cleared on read.
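
A minimal sketch of such a wait in the notebook, for illustration only. It assumes `sqrt_ip` offers a `read(offset)` method alongside the `write(offset, value)` calls used above, and that the control register at 0x00 uses the layout described here (bit 0 starts the kernel, bit 1 signals done). The helper name `wait_for_done`, the timeout and the polling interval are hypothetical choices.

```python
import time

AP_CTRL = 0x00   # control register offset, as used in the notebook
AP_START = 0x1   # bit 0: start the kernel
AP_DONE = 0x2    # bit 1: kernel has completed, cleared on read


def wait_for_done(ip, timeout_s=1.0, poll_s=0.001):
    """Poll the control register until the done bit is set.

    Returns True if done was observed, False if the timeout expired.
    `ip` is any object with a read(offset) method (assumption).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if ip.read(AP_CTRL) & AP_DONE:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Intended use in the notebook (not executed here):
#   sqrt_ip.write(AP_CTRL, AP_START)
#   if not wait_for_done(sqrt_ip):
#       raise RuntimeError("kernel did not finish in time")
#   print(outpt[0])
```

Because the done bit is cleared on read, the loop reads the register once per iteration and acts on that single value instead of reading it again.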

Hello. Here is the notebook. I changed the code from sqrt to another algorithm and found that if the algorithm is too long (I tried more than 30 times, and I think this is the reason), the result is correct only the first time after I load the bit file and wrong after that.

Here is part of the notebook's Python code:

length = 9 # length = 40
inpt = Xlnk().cma_array(shape=(length,), dtype=np.float32)
outpt = Xlnk().cma_array(shape=(length,), dtype=np.float32)
#a = [i*i for i in range(length)]
a = [112, 60, 100, 95, 85, 70, 80, 110, 70]
np.copyto(inpt, a)
soft_op = np.sqrt(inpt)
sqrt_ip.write(0x20, length)
sqrt_ip.write(0x10, inpt.physical_address)
sqrt_ip.write(0x18, outpt.physical_address)
sqrt_ip.write(0x00, 1)

print("Hardware Output", "Software Output \n")
for i in range(length):
    print(outpt[i], "\t\t  ", soft_op[i])

I think the HLS code is incompatible with newer versions of Vitis HLS. The same code worked for me with Vivado 2018.2, but when I tried it today with Vitis 2020.2 I got the same result as you: all zeros. This is disappointing.

The last line looks incorrect:
memcpy(out, (const float*) A, 1* sizeof(float));

This is why you only get the first result.

It looks like you should be using len* sizeof(float)

I don’t think your HLS code is equivalent to the np.sqrt() function.

Cathal

Dear Cathal,

I changed the code from the sqrt example. Our algorithm produces a single output value, which is why we replaced len with 1 in the memcpy from the sqrt example. That should not be the error, because I built more than 30 projects, adding parts of our algorithm one at a time to locate the fault. They all worked at first, until the fault appeared.

Best regards,
Chao

What is your problem?
From what I can see, you send an array of 9 values. You get one result back and the rest zeros. This is expected as your HLS code only writes 1 value back.

You compare your algorithm vs np sqrt() on 9 values. You get a different result for the first output because your algorithm is not implementing the equivalent sqrt function in HLS. The rest are zeros as expected.
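
For a like-for-like comparison, the notebook could check the hardware output against a software model of the posted kernel instead of `np.sqrt()`. The following is a sketch under assumptions: the function name `sampen_reference` is hypothetical, it mirrors the loops of the posted `axi4_sampen` code with `m = 2` and `r = 20`, and, unlike the posted C code, it builds `D` with explicit zeros wherever the threshold test fails.

```python
import numpy as np

M = 2      # template length, fixed at 2 as in the posted kernel
R = 20.0   # tolerance, as in the posted kernel


def sampen_reference(x):
    """Return (B, A) computed the same way as the posted HLS loops."""
    x = np.asarray(x, dtype=np.float32)
    n = len(x)
    # D[i, j] is 1 when samples i and j are within R of each other, else 0.
    D = (np.abs(x[:, None] - x[None, :]) <= R).astype(np.int32)

    count1 = 0
    for i in range(n - M + 1):
        for j in range(n - M + 1):
            count1 += int(D[i, j] and D[i + 1, j + 1])
    count1 = count1 - n + M - 1
    B = count1 / ((n - M + 1) * (n - M))

    count2 = 0
    for i in range(n - M):
        for j in range(n - M):
            count2 += int(D[i, j] and D[i + 1, j + 1] and D[i + 2, j + 2])
    count2 = count2 - n + M
    A = count2 / ((n - M) * (n - M - 1))
    return B, A
```

In the notebook, only `outpt[0]` would then be compared with the returned `A`, since the kernel writes a single float.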

Have I missed something?

Cathal

Dear Cathal,
As we mentioned at the beginning, the result is correct in simulation. But when we run it on the board with the bit file and the Jupyter notebook, the result shows a strange fault: the hardware result is correct only for the first input and wrong from the second input on.
I verified that the sqrt example from the tutorial is correct. When I shortened my algorithm, it was also correct. So I am not sure whether the fault comes from the algorithm being too long or from the specific computation I use in the design.
I would like to change the interface further, replacing the default interface of the sqrt example, so that the Jupyter notebook sends data in and reads data out after a time delay.
The specific computation I mean is the calculation of B or A in the HLS C code. I spent more than 10 hours on this that day, so I am not sure of the result; I may have been careless by then.
Best regards
Chao
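
One possible explanation for the symptom described above (correct on the first run, wrong afterwards), offered as an unconfirmed assumption: in the posted kernel, `D` has no initialiser and is only ever written with 1, never with 0. If the memory that holds `D` keeps its contents from one invocation to the next, entries set during an earlier run would still be 1 in later runs. The toy Python model below only illustrates that effect; the class name `StaleDModel` is hypothetical and nothing here has been checked on hardware.

```python
import numpy as np


def ratio_a(D, n, m=2):
    """Compute A from D the same way as the posted HLS loops."""
    count2 = 0
    for i in range(n - m):
        for j in range(n - m):
            count2 += int(D[i, j] and D[i + 1, j + 1] and D[i + 2, j + 2])
    count2 = count2 - n + m
    return count2 / ((n - m) * (n - m - 1))


class StaleDModel:
    """Toy model in which D persists between calls (assumption)."""

    def __init__(self, size=100):
        # Contents right after "loading": all zeros (assumption).
        self.D = np.zeros((size, size), dtype=np.int32)

    def run(self, x, r=20.0):
        n = len(x)
        for i in range(n):
            for j in range(n):
                if abs(x[i] - x[j]) <= r:
                    self.D[i, j] = 1   # mirrors the posted code: no else branch
        return ratio_a(self.D, n)


close = [1, 1, 1, 1, 1]           # every pair within r
far = [0, 100, 200, 300, 400]     # only identical indices within r

model = StaleDModel()
first = model.run(close)          # first run after "loading"
second = model.run(far)           # second run sees stale 1s from the first
fresh = StaleDModel().run(far)    # same input on a freshly zeroed D
```

If this is what happens on the board, writing 0 to `D[i][j]` in an else branch of the threshold test would be one way to rule it out.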

You only get 1 result and the rest zeros because of this line:
memcpy(out, (const float*) A, 1* sizeof(float));
You only memcpy one result.

Can you post a copy of your block diagram/your full code?

Cathal
