PYNQ: PYTHON PRODUCTIVITY

HLS works, but no result in Python

Hello everyone,
I have written HLS code for which C simulation and co-simulation work fine. I also think my Vivado project is fine, as I have tested it with another IP and it works. HLS --> chachaCore.cpp (4.2 KB) my_chacha_hls.cpp (538 Bytes) my_chacha_hls.hpp (935 Bytes) ps2chacha.cpp (507 Bytes)
The functions QR, LITTLE_INT and MOD_OV are very small and only do arithmetic. I originally declared QR as (uint32&a,uint32&b…) to avoid using a struct, but I was not sure whether that is possible in hardware, so I changed it.
Vivado -->


Python -->
input_buffer = allocate(shape=(1,), dtype='S64')
output_buffer = allocate(shape=(endoff+1,), dtype='S64')

input_buffer[:] = np.array(buffer)
print(input_buffer)

dma.sendchannel.transfer(input_buffer)
dma.recvchannel.transfer(output_buffer)

result.extend(list(output_buffer.copy()))

print(len(result))
print(result)

del input_buffer,output_buffer

RESULT -->
[b'00000000\xbc`t\xbc\xf1\xcf\x8e\t\xda\xb8G\x85\xde\x0cy\xab\xb1\xfdw\x94dd\xf7\xe2T\x05\xfb\xcf\xc95l|\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05']
6
[b'', b'', b'', b'', b'', b'']
The output contains no data, and if I rerun the cell, a "DMA channel not idle" error occurs. Also, if I add .recvchannel.wait(), it hangs there forever.

Any ideas, please?
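
One thing the snippet above leaves out is waiting for both channels before reading the output buffer. A minimal sketch of the transfer sequence, assuming `dma`, `input_buffer` and `output_buffer` are set up as in the snippet (the helper name `run_transfer` is made up for illustration and is not part of PYNQ):

```python
def run_transfer(dma, input_buffer, output_buffer):
    """Start both DMA channels, then block until each one reports completion."""
    dma.sendchannel.transfer(input_buffer)
    dma.recvchannel.transfer(output_buffer)
    # Without these waits the output buffer may be read before it is filled.
    dma.sendchannel.wait()
    dma.recvchannel.wait()
    return list(output_buffer)
```

If `recvchannel.wait()` never returns, the waits are not the problem; the receive channel is most likely still expecting data from the IP.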

@Kwstis,

What is the data width of the HP interface?
What is the address width and stream data width of the DMA?

Maybe you can insert a System ILA and check whether the input stream to the HLS module matches what you are sending from Python.

Mario

I feel a bit silly, as I found that I was sending the wrong bytes from Python. Now it works, but my problem is that I can run it only once. If I want to rerun it, I have to close the notebook and download the bitstream again. If I don't, an error occurs, something like the DMA not being started. Also, although simulation and co-simulation show correct results, the results are wrong when I run my bitstream from Python. I will try the ILA to understand better what is happening. Thank you very much for taking the time to look at it!

You are probably not sending/receiving the right amount of data through the DMA.
E.g. if you send 100 samples, and expect to get back 100 samples, but you only get 99, the DMA will still be waiting for the last sample.
If the DMA is not idle, try reading more data from it and see if you can get it back to IDLE, and it might give you a better idea about what the problem may be.

Are you setting TLAST in your HLS design? This will signal the end of a packet.

Cathal
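
The point about TLAST and sample counts can be illustrated with a toy model. This is a deliberate simplification and not the PYNQ or AXI DMA API: it assumes the receive channel only returns to idle once a beat carrying TLAST arrives within the space the receive buffer provides. The function name and its parameters are hypothetical.

```python
def recv_channel_done(tlast_flags, buffer_beats):
    """Toy model: True if the receive channel would go back to idle.

    tlast_flags:  one bool per stream beat the IP actually sends.
    buffer_beats: number of beats the receive buffer has room for.
    """
    # Only beats that fit in the buffer can end the transfer.
    return any(tlast_flags[:buffer_beats])

# IP sends 100 beats and marks the last one: the transfer completes.
print(recv_channel_done([False] * 99 + [True], 100))   # True
# IP sends only 99 beats and never asserts TLAST: the channel keeps waiting.
print(recv_channel_done([False] * 99, 100))            # False
```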

You are right: even though I set the TLAST signal in my HLS code, for some reason the DMA does not stop. I changed my logic so that I send one item at a time and receive one output. Both simulation and co-simulation show the correct output, but in Python I get something else. Can someone explain how to use the ILA with PYNQ (I have not used it until now)? I cannot think of another way to debug this.
Thanks very much!

System ILA is independent from pynq. You need to generate your overlay with the ILA, download the overlay to the board from pynq and connect to the ILA from Vivado.

I'll try to use the ILA then, thanks. However, I think my problem is in how I concatenate bytes in Python, say buf = var1+var2+var3, until buf is 64 bytes long. Inside my HLS code I take this ap_uint<512> and do things like
x1=buf.range(256,0) etc., but I don't think the bits arrive in the order I send them from Python. In other words, if var3 is 32 bytes, will it end up in x1 in the HLS code, or are the bits not transferred in this order?
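
A small pure-Python sketch of the mapping in question. It assumes that byte 0 of the buffer arrives in bits [7:0] of the 512-bit word, which matches the little-endian behaviour reported later in this thread. The variable contents and the helper `bit_range` are hypothetical, and the ranges used here are [255:0] and [511:256].

```python
# Hypothetical 64-byte buffer built the way the post describes: var1 + var2 + var3.
var1 = bytes([0x11]) * 16
var2 = bytes([0x22]) * 16
var3 = bytes([0x33]) * 32
buf = var1 + var2 + var3

# Assumption: byte 0 of the buffer lands in the least significant bits.
word = int.from_bytes(buf, "little")

def bit_range(value, hi, lo):
    """Return bits [hi:lo] of value, similar in spirit to ap_uint range(hi, lo)."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

low_half = bit_range(word, 255, 0).to_bytes(32, "little")
high_half = bit_range(word, 511, 256).to_bytes(32, "little")

print(low_half == var1 + var2)   # True: the first 32 bytes sit in the low bits
print(high_half == var3)         # True: var3 ends up in bits [511:256]
```

Under that assumption, the last bytes appended in Python occupy the highest bits of the word, not the lowest.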

You can try creating an array of 32-bit elements instead:

input_buffer = allocate(shape=(16,), dtype=np.int32)
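
To see what each of the 16 elements would hold, the sketch below splits a hypothetical 64-byte message into little-endian 32-bit words using only the standard library. The message contents are made up for illustration.

```python
import struct

buf = bytes(range(64))               # hypothetical message: 0x00, 0x01, ..., 0x3f
words = struct.unpack("<16I", buf)   # 16 unsigned little-endian 32-bit words

print(len(words))        # 16
print(hex(words[0]))     # 0x3020100  (bytes 0..3, byte 0 is least significant)
print(hex(words[15]))    # 0x3f3e3d3c (bytes 60..63)
```

Within every word the first byte in memory is the least significant one, so splitting into 32-bit elements does not by itself change the byte order the IP sees.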

Unfortunately, the same thing occurs. I tried the ILA but could not make it work. Can I just run my software from Jupyter or a terminal, watch the results in the hardware manager and wait for the trigger, or do I have to add something to my Python code?

The ILA will operate independently of your Python code.
Make sure you download the correct bitstream, and don’t overwrite it from Python using the Overlay() class. (or if you are loading it from Python, make sure it is the new one with ILA).
Once the design is running, open the hardware manager and set up your trigger.
Then run your Python code. The Python code needs to do something which makes your trigger condition happen.
Once the ILA sees the trigger it will capture.

Cathal

I will try again more carefully. I really appreciate your time, and thanks for every answer so far. I am a bit closer to the answer, though: something is going on with how the data are read and written (like big/little endian between the DMA and my IP).
I am really stuck. Here is what I have: if I give x as input to my HLS code, I get y. If I give x', where x' is x in little-endian, I get y'.
Now from Python I send x and receive z, where z' is the same as y'. What am I missing?

I found that when I send
x = b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\xbc`t\xbc\xf1\xcf\x8e\t\xda\xb8G\x85\xde\x0cy\xab\xb1\xfdw\x94dd\xf7\xe2T\x05\xfb\xcf\xc95l|'
and do x = x+1 six times in my HLS code, I get this back in Python:
y = b'\x06\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\xbc`t\xbc\xf1\xcf\x8e\t\xda\xb8G\x85\xde\x0cy\xab\xb1\xfdw\x94dd\xf7\xe2T\x05\xfb\xcf\xc95l|'
That is why my initial code does not work. But how can I resolve this?
I also tried reversing the data at the beginning and at the end of my HLS code, but I got
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\xbc`t\xbc\xf1\xcf\x8e\t\xda\xb8G\x85\xde\x0cy\xab\xb1\xfdw\x94dd\xf7\xe2T\x05\xfb\xcf\xc95l" ' which again is not what we want.
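
The increment landing in the first byte is what a little-endian interpretation of the buffer predicts. A short sketch with a hypothetical 64-byte input (zeros except 0x03 at index 15, not the exact data above) compares both interpretations:

```python
x = bytes(15) + b"\x03" + bytes(48)   # hypothetical 64-byte input

# Treat the buffer as one 512-bit integer, add 6, and convert back.
as_little = (int.from_bytes(x, "little") + 6).to_bytes(64, "little")
as_big = (int.from_bytes(x, "big") + 6).to_bytes(64, "big")

print(as_little[0], as_little[15])   # 6 3  -> first byte changes, as observed
print(as_big[15], as_big[63])        # 3 6  -> last byte would change instead
```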

Finally, I found that my HLS IP interprets what I send as little-endian and also sends the result back in little-endian. So to get correct results in Python, I have to do
```
import numpy as np
from pynq import allocate

buffi = initial + endoffset + startoffset + nonce + key
# Reverse the byte order of the 64-byte block before sending it
bufnew = int.from_bytes(buffi, "little").to_bytes(64, "big")

input_buffer = allocate(shape=(1,), dtype='S64')
output_buffer = allocate(shape=(endoff + 1,), dtype='S64')
input_buffer[:] = np.array(bufnew)
dma.sendchannel.transfer(input_buffer)
dma.recvchannel.transfer(output_buffer)
dma.sendchannel.wait()
dma.recvchannel.wait()

result.extend(list(output_buffer.copy()))
# Reverse the byte order of each received 64-byte block
bufnew = list()
for i in result:
    bufnew.append(int.from_bytes(i, "little").to_bytes(64, "big"))

print(bufnew)
```
I don't know whether this is due to the ARM cores or to the AXI4 protocol. Also, I recently upgraded my PYNQ version from 2.5 to 2.6 from the terminal. Could something have changed with the upgrade?
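
For a fixed 64-byte block, the little-to-big round trip through an integer used above amounts to reversing the bytes. A minimal sketch with a hypothetical helper `swap_block` and made-up block contents:

```python
def swap_block(block: bytes) -> bytes:
    """Reverse the byte order of one block."""
    return block[::-1]

block = bytes(range(64))   # hypothetical 64-byte block
via_int = int.from_bytes(block, "little").to_bytes(64, "big")

print(swap_block(block) == via_int)             # True: both give the reversed block
print(swap_block(swap_block(block)) == block)   # True: swapping twice restores it
```

Because the swap is its own inverse, the same helper could be applied both before sending and after receiving.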

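The ARM cores here run in little-endian mode, so the DMA moves the data in little-endian order: the byte at the lowest address of the buffer ends up in the least significant byte of the stream word.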