Hey everyone,
I’m following the DMA tutorial, but I’m working with floats. Here is my code:
#include "dma_copy.h"
#include <iostream>
using namespace std;
void copier(STREAMTYPE& a, STREAMTYPE& b) {
#pragma HLS INTERFACE s_axilite port = return bundle = control
#pragma HLS INTERFACE axis port=a
#pragma HLS INTERFACE axis port=b
#pragma HLS PIPELINE II=0
    DTYPE input[N];
    DTYPE output[N];
    while (1) {
        for (int i = 0; i < N; ++i) { // READ
            axis_t temp = a.read();
            input[i] = temp.data;
        }
        for (int i = 0; i < N; ++i) { // PROCESS
            output[i] = input[i] - 5;
        }
        axis_t temp2;
        for (int i = 0; i < N; ++i) { // WRITE
            temp2.data = output[i];
            temp2.last = i == N-1? true : false;
            temp2.keep = -1;
            b.write(temp2);
        }
    }
}
Here are my data structures:
#ifndef _DMA_COPY_H_
#define _DMA_COPY_H_
#include "hls_stream.h"
#include "ap_axi_sdata.h"
#define N 5
typedef float DTYPE;
typedef hls::axis<float, 0, 0, 0> axis_t;
typedef hls::stream<axis_t> STREAMTYPE;
void copier(STREAMTYPE& a, STREAMTYPE& b);
#endif
I used hls::axis<float, 0, 0, 0> after reading this topic. Here is my Python code:
import numpy as np
from pynq import Overlay, allocate
overlay = Overlay("dma_hls.bit")
dma = overlay.axi_dma_0
data_size = 5
input_buffer = allocate(shape=(data_size,), dtype=np.float32)
output_buffer = allocate(shape=(data_size,), dtype=np.float32)
features = [-0.2,2,0.333,4.1234, -0.62689]
for i in range(data_size):
    input_buffer[i] = features[i]
dma.sendchannel.transfer(input_buffer)
dma.recvchannel.transfer(output_buffer)
dma.sendchannel.wait()
dma.recvchannel.wait()
for i in range(data_size):
    print("Output vector", output_buffer[i])
Lastly, here is my block diagram:
When I first executed the Python code, I got a RuntimeError: DMA does not support unaligned transfers; Starting address must be aligned to 6 bytes. I applied this solution.
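For reference, this is how I understand the alignment error: the start address of the buffer has to be a whole multiple of the alignment the DMA expects. Below is a minimal sketch of that check in plain Python. The helper names, the addresses and the alignment value are hypothetical and chosen only for illustration; none of this is taken from the PYNQ source.

```python
def is_aligned(address: int, alignment: int) -> bool:
    """Return True when address is a whole multiple of alignment (in bytes)."""
    if alignment <= 0:
        raise ValueError("alignment must be a positive number of bytes")
    return address % alignment == 0


def next_aligned(address: int, alignment: int) -> int:
    """Round address up to the nearest multiple of alignment."""
    if alignment <= 0:
        raise ValueError("alignment must be a positive number of bytes")
    return ((address + alignment - 1) // alignment) * alignment


# Hypothetical values, for illustration only.
ALIGN = 8
print(is_aligned(0x1000, ALIGN))         # True
print(is_aligned(0x1004, ALIGN))         # False
print(hex(next_aligned(0x1004, ALIGN)))  # 0x1008
```

On the board, the same check could presumably be run against the physical address of input_buffer and output_buffer, but I have not confirmed which attribute exposes that address in my PYNQ version.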
Now the Python code hangs at the ...wait() calls and nothing after them runs. When I remove the lines containing ...wait(), my output vector is still all zeros. I’m not sure why the DMA doesn’t work. Am I missing a step? I’d appreciate any help, thank you :))
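Since wait() blocks, one thing I could try is looking at the state of each channel without waiting. Here is a rough sketch of a helper for that. It assumes the channel object exposes boolean running and idle attributes, which I believe PYNQ's DMA channels do, but please treat that as an assumption. The helper name channel_state is my own, and the stand-in object below only exists so the snippet runs without a board.

```python
from types import SimpleNamespace


def channel_state(channel) -> str:
    """Summarise a DMA channel from its (assumed) running/idle flags."""
    if not channel.running:
        return "halted"  # channel was never started or has stopped
    if channel.idle:
        return "idle"    # started, and no transfer is in flight
    return "busy"        # started, and a transfer has not completed


# Stand-in for a real channel object, for illustration only.
fake_channel = SimpleNamespace(running=True, idle=False)
print(channel_state(fake_channel))  # busy
```

On the board this would be called as channel_state(dma.sendchannel) and channel_state(dma.recvchannel) right after the two transfer() calls, to see which side never finishes.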