Have to "wait" for a while before the correct result appears

Input: A DNA string, length of string
Process: Count A, C, G, T
Implemented: DNA array as M_axi port; A,C,G,T as axilite (register)
My board is PYNQ-Z2

I drive the COUNT accelerator from PYNQ (code shown below). The problem is that for a long string (around 100K characters and up), I need a “wait” (see 2 below) before the correct result comes up. If I remove the “wait”, I read a random value. Is there a wait function (something like a DMA wait) that can be inserted at (1) below so that we know the process has ended?

Any help will be appreciated. Thanks!
---------------driver------------

import numpy as np
from pynq import DefaultIP

class CountDriver(DefaultIP):
    def __init__(self, description):
        super().__init__(description)

    bindto = ['xilinx.com:hls:cdna_axi:1.0']

    def count(self, data):
        # xlnk is the Xlnk() instance created in the calling program
        with xlnk.cma_array(shape=(len(data),), dtype=np.int8) as in_buffer:
            np.copyto(in_buffer, data)
            self.write(0x30, in_buffer.physical_address)  # DNA buffer address
            self.write(0x38, len(data))                   # N, the string length
            self.write(0x00, 0x01)  # start     <------------- (1)
            in_buffer.freebuffer()
            xlnk.xlnk_reset()
        return

--------------- program to use the driver--------

import time

from pynq import Overlay
from pynq import Xlnk
xlnk = Xlnk()

# Load the overlay

overlay = Overlay('/home/xilinx/pynq/overlays/countdna_axi/countdna_axi4.bit')
#count_dna = overlay.cdna_axi

# Run the hardware solution using driver

overlay.cdna_axi.count(dna)
time.sleep(0.1)  # <--- (2) temporary solution
print("Frequency of A = {:,}".format(overlay.cdna_axi.read(0x10)))
print("Frequency of C = {:,}".format(overlay.cdna_axi.read(0x18)))
print("Frequency of G = {:,}".format(overlay.cdna_axi.read(0x20)))
print("Frequency of T = {:,}".format(overlay.cdna_axi.read(0x28)))
total_length = (overlay.cdna_axi.read(0x10) + overlay.cdna_axi.read(0x18)
                + overlay.cdna_axi.read(0x20) + overlay.cdna_axi.read(0x28))

--------HLS code ---------------

#include <string.h>

/* count_t is not defined in the posted snippet; it is declared elsewhere */
/* top function */
void cdna_axi(count_t *a, count_t *c, count_t *g, count_t *t, volatile char *DNA, count_t N) {

#pragma HLS INTERFACE m_axi port=DNA offset=slave depth=500000 bundle=gmemin
#pragma HLS INTERFACE s_axilite port=DNA bundle=cdna
#pragma HLS INTERFACE s_axilite port=N bundle=cdna
#pragma HLS INTERFACE s_axilite port=a bundle=cdna
#pragma HLS INTERFACE s_axilite port=c bundle=cdna
#pragma HLS INTERFACE s_axilite port=g bundle=cdna
#pragma HLS INTERFACE s_axilite port=t bundle=cdna
#pragma HLS INTERFACE s_axilite port=return bundle=cdna
    char buff[500000];
    /* copy the whole string from DDR into local memory before counting */
    memcpy(buff, (const char *) DNA, N * sizeof(char));
    int i;
    int t_A = 0;
    int t_C = 0;
    int t_G = 0;
    int t_T = 0;
    for (i = 0; i < N; ++i) {
        if (buff[i] == 0x41)        /* 'A' */
            t_A++;
        else if (buff[i] == 0x43)   /* 'C' */
            t_C++;
        else if (buff[i] == 0x47)   /* 'G' */
            t_G++;
        else if (buff[i] == 0x54)   /* 'T' */
            t_T++;
    }
    *a = t_A;
    *c = t_C;
    *g = t_G;
    *t = t_T;
}
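To tell a "random value" from a correct one, it can help to have a software reference for the counting loop above. The following is only a minimal sketch in plain Python; `count_bases` is a hypothetical helper name, not part of the original driver or of PYNQ. Like the HLS loop, it ignores any byte that is not A, C, G or T.

```python
from collections import Counter

def count_bases(dna: bytes) -> dict:
    """Software reference: count the A, C, G, T bytes and ignore anything else."""
    counts = Counter(dna)  # iterating over bytes yields integer byte values
    return {base: counts[ord(base)] for base in "ACGT"}

# Small example string; the trailing 'N' is ignored, as in the HLS loop.
expected = count_bases(b"ACGTTGCAAN")
print(expected)
```

The values in `expected` could then be compared with the registers read at 0x10, 0x18, 0x20 and 0x28 once the IP has finished.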

The DMA needs some time to send over that large string. Based on your HLS code, it looks like the IP will need all N data to be ready before producing correct results; otherwise the output data might be wrong.

Thank you for that idea. Following that suggestion, I added code that checks whether the ap_idle bit is already 1 (meaning the IP is idle again, so the process is done). This check runs before I read the result.

while (self.read(0x00) & 0x4) != 0x04:
    pass

With that, the accelerator now works. Thank you.
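For anyone reusing the polling loop above: a bare `while ... pass` spins forever if the IP never returns to idle. Below is a rough sketch of the same check with a timeout. It assumes the control register sits at offset 0x00 with ap_idle in bit 2, as used in this thread. `wait_until_idle` and `FakeIP` are hypothetical names for illustration; `FakeIP` only stands in for the real IP so the sketch runs without a board.

```python
import time

AP_CTRL = 0x00   # control register offset used in this thread
AP_IDLE = 0x04   # bit 2 of the control register (ap_idle)

def wait_until_idle(read_reg, timeout_s=1.0, poll_s=0.0):
    """Poll the control register until ap_idle is set or the timeout expires.

    read_reg is any callable that takes a register offset and returns an int,
    for example the driver's own self.read.
    Returns True if the IP went idle, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        # Mask the value that was read, not the offset
        if read_reg(AP_CTRL) & AP_IDLE:
            return True
        if time.monotonic() >= deadline:
            return False
        if poll_s:
            time.sleep(poll_s)

class FakeIP:
    """Hypothetical stand-in: reports busy for a few reads, then idle."""
    def __init__(self, busy_reads):
        self.busy_reads = busy_reads

    def read(self, offset):
        if self.busy_reads > 0:
            self.busy_reads -= 1
            return 0x00
        return 0x04

print(wait_until_idle(FakeIP(busy_reads=3).read))
```

Inside the driver, the call would presumably look like `wait_until_idle(self.read)` right after the start write at (1), and the buffer would only be freed once it returns True.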