PYNQ: PYTHON PRODUCTIVITY

PYNQ + data width converter

I am testing a simple design, which is the following:

I am testing it in PYNQ with the following simple code:

from pynq import Overlay
from pynq import allocate
import matplotlib.pyplot as plt
import numpy as np
import pynq.lib.dma

overlay = Overlay('/home/xilinx/pynq/overlays/multiplier/axis_multiplier.bit')
dma = overlay.axi_dma_0

in_buffer = allocate(shape=(20,), dtype=np.uint32)
out_buffer = allocate(shape=(20,), dtype=np.uint32)

for i in range(20):
    in_buffer[i] = i

dma.sendchannel.transfer(in_buffer)
dma.recvchannel.transfer(out_buffer)
dma.sendchannel.wait()
dma.recvchannel.wait()

out_buffer

Whether this code works depends on how my IP is written.

To make this point clearer, this is the working version of my IP:

#include "ap_axi_sdata.h"
#include "hls_stream.h"

typedef struct {
	ap_int<24> first_value;
	ap_int<24> second_value;
} my_data_struct;

typedef hls::axis<my_data_struct,0,0,0> pkt_t;

void mult_stream(
		hls::stream< pkt_t > &din,
		hls::stream< pkt_t > &dout ) {
	#pragma HLS INTERFACE ap_ctrl_none port=return
	#pragma HLS INTERFACE axis port=din
	#pragma HLS INTERFACE axis port=dout

	pkt_t pkt;
	//pkt_t pkt_out;

	din.read(pkt);
	pkt.data.first_value *= 2;
	pkt.data.second_value *=3;

	//pkt_out.data.first_value = pkt.data.first_value*2;
	//pkt_out.data.second_value = pkt.data.first_value*2;

	dout.write(pkt);
}

Here I just multiply the two values of the struct in place and send the same packet to the output stream. This version works perfectly fine.

However, if I change the code slightly and assign the result of the multiplication to a new (yet identical) variable, my PYNQ code hangs and waits forever:

#include "ap_axi_sdata.h"
#include "hls_stream.h"

typedef struct {
	ap_int<24> first_value;
	ap_int<24> second_value;
} my_data_struct;

typedef hls::axis<my_data_struct,0,0,0> pkt_t;

void mult_stream(
		hls::stream< pkt_t > &din,
		hls::stream< pkt_t > &dout ) {
	#pragma HLS INTERFACE ap_ctrl_none port=return
	#pragma HLS INTERFACE axis port=din
	#pragma HLS INTERFACE axis port=dout

	pkt_t pkt;
	pkt_t pkt_out;

	din.read(pkt);
	//pkt.data.first_value *= 2;
	//pkt.data.second_value *=3;

	// Same code as before; I am just assigning the result to a new variable
	pkt_out.data.first_value = pkt.data.first_value*2;
	pkt_out.data.second_value = pkt.data.first_value*2;

	dout.write(pkt_out);
}

Does anybody know why? Both versions of the IP work fine inside Vitis 2020.2: correct C simulation, correct synthesis, and correct co-simulation.

Hi @mattiasu96,

You are not setting the last signal in the second version. The DMA uses it to finish the transaction, and that may be the reason why the Python code never finishes.

Mario

I changed the IP code to the following:

void mult_stream(
		hls::stream< pkt_t > &din,
		hls::stream< pkt_t > &dout ) {
	#pragma HLS INTERFACE ap_ctrl_none port=return
	#pragma HLS INTERFACE axis port=din
	#pragma HLS INTERFACE axis port=dout

	pkt_t pkt;
	pkt_t pkt_out;

	din.read(pkt);

	pkt_out.data.first_value = pkt.data.first_value*2;
	pkt_out.data.second_value = pkt.data.first_value*2;
	pkt_out.last = pkt.last;

	dout.write(pkt_out);
}

However the PYNQ code still freezes on the DMA waiting. Am I doing something wrong?

Is keep being used in the data mover? If so, you should also set all of its bits to one, e.g. pkt_out.keep = -1, or give it the same value as the input.

I also set that signal, but my PYNQ code still freezes. Here is the updated code:

void mult_stream(
		hls::stream< pkt_t > &din,
		hls::stream< pkt_t > &dout ) {
	#pragma HLS INTERFACE ap_ctrl_none port=return
	#pragma HLS INTERFACE axis port=din
	#pragma HLS INTERFACE axis port=dout

	pkt_t pkt;
	pkt_t pkt_out;

	//int test_variable;
	din.read(pkt);
	//pkt.data.first_value *= 2;
	//test_variable = pkt.data.first_value.to_int();
	//pkt.data.second_value *=3;
	//test_variable = pkt.data.second_value.to_int();

	pkt_out.data.first_value = pkt.data.first_value*2;
	pkt_out.data.second_value = pkt.data.first_value*2;
	pkt_out.last = pkt.last;
	pkt_out.keep = pkt.keep;

	//test_variable = pkt_out.data.first_value.to_int();
	//test_variable = pkt_out.data.second_value.to_int();

	dout.write(pkt_out);
}

However, the first IP code I provided works fine even though I set neither tlast nor tkeep there. I can’t understand why.

In your first example the side-band signals (last/keep/strb) are stored in pkt when it is read, so they are written out again unchanged. In the second case you’re creating a new structure from scratch, so the side-band signals are undefined. Hopefully adding pkt_out.strb = pkt.strb will get it working.
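To illustrate the difference described above, here is a minimal sketch in plain C++. The `Beat` struct is a hypothetical stand-in for the stream packet, not the real `hls::axis` type, and the function names are made up for this example. It only shows that forwarding the packet that was read (or copying it before overwriting the payload) carries the side-band fields along, while a fresh packet does not. In the HLS code the analogous pattern would presumably be `pkt_t pkt_out = pkt;` followed by the data assignments.

```cpp
#include <cassert>
#include <cstdint>

// Plain C++ stand-in for one AXI4-Stream beat. This is NOT hls::axis;
// the field names only mirror the ones used in this thread.
struct Beat {
    std::int32_t first_value  = 0;
    std::int32_t second_value = 0;
    std::uint8_t keep = 0;      // models TKEEP
    std::uint8_t strb = 0;      // models TSTRB
    bool         last = false;  // models TLAST
};

// Style of the working IP: modify the packet that was read and forward it.
Beat process_in_place(Beat pkt) {
    pkt.first_value  *= 2;
    pkt.second_value *= 3;
    return pkt;
}

// Style of the hanging IP: a fresh packet where only the payload is assigned.
// In this model the side-bands stay at their zero defaults; in HLS they are
// simply never set.
Beat process_fresh(const Beat& pkt) {
    Beat pkt_out;
    pkt_out.first_value  = pkt.first_value * 2;
    pkt_out.second_value = pkt.second_value * 3;
    return pkt_out;
}

// Possible fix: copy the whole input first, then overwrite only the payload.
Beat process_copy_then_overwrite(const Beat& pkt) {
    Beat pkt_out = pkt;
    pkt_out.first_value  = pkt.first_value * 2;
    pkt_out.second_value = pkt.second_value * 3;
    return pkt_out;
}

// Small self-check comparing the three styles on a final beat.
void demo_sidebands() {
    Beat in;
    in.first_value = 5;
    in.second_value = 7;
    in.keep = 0x3F;
    in.strb = 0x3F;
    in.last = true;

    assert(process_in_place(in).last);             // TLAST carried along
    assert(!process_fresh(in).last);               // TLAST lost
    assert(process_copy_then_overwrite(in).last);  // TLAST preserved by the copy
}
```

Copying the whole packet first avoids having to remember each side-band field individually, which is where the second version of the IP went wrong.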

Note that these things won’t be picked up by HLS simulation unless you’re explicitly checking for them in the test-bench.
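As a rough idea of what such an explicit check could look like, the sketch below validates the side-bands of a captured output packet. It is plain C++ rather than an actual HLS test bench: `CapturedBeat`, `sidebands_ok`, and the 4-bit keep mask are hypothetical names and values chosen for illustration, and the rule it enforces (keep fully set on every beat, last asserted on the final beat only) is an assumption based on the replies in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one output beat as a test bench might capture it.
struct CapturedBeat {
    std::uint8_t keep;  // TKEEP value seen on this beat
    bool         last;  // TLAST value seen on this beat
};

// Returns true when every beat has all expected keep bits set and TLAST is
// asserted on the final beat only. full_keep is the all-ones mask for the
// stream's byte width and must be supplied by the caller.
bool sidebands_ok(const std::vector<CapturedBeat>& beats, std::uint8_t full_keep) {
    if (beats.empty()) return false;
    for (std::size_t i = 0; i < beats.size(); ++i) {
        const bool is_final = (i + 1 == beats.size());
        if (beats[i].keep != full_keep) return false;
        if (beats[i].last != is_final) return false;
    }
    return true;
}

// Example usage with an assumed 4-byte-wide stream (mask 0x0F).
void demo_check() {
    const std::uint8_t full = 0x0F;
    std::vector<CapturedBeat> good     = {{full, false}, {full, false}, {full, true}};
    std::vector<CapturedBeat> no_last  = {{full, false}, {full, false}, {full, false}};
    std::vector<CapturedBeat> bad_keep = {{full, false}, {0x00, true}};

    assert(sidebands_ok(good, full));
    assert(!sidebands_ok(no_last, full));   // TLAST never asserted
    assert(!sidebands_ok(bad_keep, full));  // keep cleared on the final beat
}
```

A check along these lines would flag the second version of the IP in C simulation, where comparing only the data values lets it pass.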

Peter

I managed to solve the problem, and it seems to be board related. After trying almost everything (including adding the control signals as you suggested), my board still froze and/or returned all zeros at the output, no matter what.

My university supervisor and I decided to test another board, and in the end the same identical code and design work perfectly fine there, even without setting the strb signal.

The code with only last and keep set works fine on the new board, probably because strb is not needed in my case (this might be a wrong assumption; I am not an expert, I am just learning).

Long story short: the design is correct and the code is correct (the updated one with last and keep). For completeness I also set the strb signal, but the actual problem was the board/PYNQ installation, which seems to be corrupted in some way.
