PYNQ: PYTHON PRODUCTIVITY

Output array doesn't show result in PYNQ

I’m new to PYNQ. I’m developing a simple project using a Zynq UltraScale+ (part: xczu3eg-sbva484-1-e).
I wrote a simple function that “emulates” a decision tree and classifies the user. It’s a small piece of code I wrote to gain confidence with the tool (I’m moving towards machine learning on embedded systems).

I wrote my top level function in Vivado HLS which is the following:

// NO SYSTEM CALLS! NO STRING, NO STDIO, ETC.!

typedef struct data{   // Declare PERSON struct type
    int age;
    int sex;// 1 male- 0 female
    //float weight;
}input_data;

// Classes: young_male, old_male, young_female, old_female
// If I pass only the pointer, the code doesn't know the real size of the array.
// In ordinary programming that's no problem: as long as I manage it properly, I don't get a segfault.
// In Vivado HLS, since I need to build real "circuits" and connections, the tool needs to know the size of the incoming array,
// so I have to pass my array with its size specified.

void simple_tree(input_data *input, int result[4]){ //input is a pointer to a data struct
#pragma HLS INTERFACE s_axilite port=input bundle=CTRL_BUS
#pragma HLS INTERFACE s_axilite port=result bundle=CTRL_BUS
#pragma HLS INTERFACE s_axilite port=return bundle=CTRL_BUS


	if(input->age > 50)
	{
		if(input->sex==1)
		{
			result[0]=1; //old male;
		}
		else{
			result[1]=1; //old_female;
		}
	}
	else{
		if(input->sex==1){
			result[2]=1; //young_male;
		}
		else{
			result[3]=1; //young_female;
		}
	}

}

It basically takes the input and then “classifies” my user as commented in the code. I ran the simulation and the co-simulation with my test bench, and the behavior is correct.

This is the hw.h file:

// ==============================================================
// File generated on Wed Sep 16 18:06:47 CEST 2020
// Vivado(TM) HLS - High-Level Synthesis from C, C++ and SystemC v2018.3 (64-bit)
// SW Build 2405991 on Thu Dec  6 23:36:41 MST 2018
// IP Build 2404404 on Fri Dec  7 01:43:56 MST 2018
// Copyright 1986-2018 Xilinx, Inc. All Rights Reserved.
// ==============================================================
// CTRL_BUS
// 0x00 : Control signals
//        bit 0  - ap_start (Read/Write/SC)
//        bit 1  - ap_done (Read/COR)
//        bit 2  - ap_idle (Read)
//        bit 3  - ap_ready (Read)
//        bit 7  - auto_restart (Read/Write)
//        others - reserved
// 0x04 : Global Interrupt Enable Register
//        bit 0  - Global Interrupt Enable (Read/Write)
//        others - reserved
// 0x08 : IP Interrupt Enable Register (Read/Write)
//        bit 0  - Channel 0 (ap_done)
//        others - reserved
// 0x0c : IP Interrupt Status Register (Read/TOW)
//        bit 0  - Channel 0 (ap_done)
//        others - reserved
// 0x10 : Data signal of input_age
//        bit 31~0 - input_age[31:0] (Read/Write)
// 0x14 : reserved
// 0x18 : Data signal of input_sex
//        bit 31~0 - input_sex[31:0] (Read/Write)
// 0x1c : reserved
// 0x20 ~
// 0x2f : Memory 'result' (4 * 32b)
//        Word n : bit [31:0] - result[n]
// (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write, COH = Clear on Handshake)

#define XSIMPLE_TREE_CTRL_BUS_ADDR_AP_CTRL        0x00
#define XSIMPLE_TREE_CTRL_BUS_ADDR_GIE            0x04
#define XSIMPLE_TREE_CTRL_BUS_ADDR_IER            0x08
#define XSIMPLE_TREE_CTRL_BUS_ADDR_ISR            0x0c
#define XSIMPLE_TREE_CTRL_BUS_ADDR_INPUT_AGE_DATA 0x10
#define XSIMPLE_TREE_CTRL_BUS_BITS_INPUT_AGE_DATA 32
#define XSIMPLE_TREE_CTRL_BUS_ADDR_INPUT_SEX_DATA 0x18
#define XSIMPLE_TREE_CTRL_BUS_BITS_INPUT_SEX_DATA 32
#define XSIMPLE_TREE_CTRL_BUS_ADDR_RESULT_BASE    0x20
#define XSIMPLE_TREE_CTRL_BUS_ADDR_RESULT_HIGH    0x2f
#define XSIMPLE_TREE_CTRL_BUS_WIDTH_RESULT        32
#define XSIMPLE_TREE_CTRL_BUS_DEPTH_RESULT        4

I imported my IP into Vivado and generated my block diagram with my UltraScale+ PS.

I imported my .bit files inside PYNQ and created my overlay.

from pynq import Overlay

overlay = Overlay('/home/xilinx/pynq/overlays/simple_tree/simple_tree.bit')
simple_tree_ip = overlay.simple_tree_0 

# 0x10 -> address of input age
# 0x18 -> address of input sex
# 0x20 -> result 

simple_tree_ip.write(0x10, 67)
age=simple_tree_ip.read(0x10)

simple_tree_ip.write(0x18, 1)
sex= simple_tree_ip.read(0x18)

result= simple_tree_ip.read(0x20)

The problem is I can’t read the result. It doesn’t matter which offset I read (0x20 to 0x2C), I always get zero.

I expected (as I did in my C code) to write the values of age and sex as input and have my IP compute the “prediction”, returning the result array, which can take one of the following 4 values: 1000, 0100, 0010, 0001. Unfortunately, I can’t get the result I expected.

What am I doing wrong? Sorry for the basic question, but I’m a newbie.

@mattiasu96,

If you look at the register map, offset 0x0 is the control register.
The way you wrote the C code, you need to start the IP core each time you want to compute the prediction.

After you write age and sex and before reading the result you need to start the IP.

simple_tree_ip.write(0x00, 0x1) # Start simple_tree
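
Putting the steps together, here is a minimal sketch of the whole sequence: write the inputs, set ap_start, poll ap_done, then read the four result words. The offsets come from the hw.h in the question. `FakeSimpleTree` and `classify` are hypothetical names made up for this illustration; the fake class only mimics `read`/`write` so the sequence can be tried without a board. On hardware you would pass `overlay.simple_tree_0` instead, assuming it offers the same `read`/`write` calls used in the question.

```python
# Register offsets taken from the generated hw.h shown in the question.
AP_CTRL = 0x00
INPUT_AGE = 0x10
INPUT_SEX = 0x18
RESULT_BASE = 0x20
RESULT_DEPTH = 4

AP_START = 0x1  # bit 0 of the control register
AP_DONE = 0x2   # bit 1 of the control register, clear on read


class FakeSimpleTree:
    """Hypothetical software stand-in for the IP. NOT part of PYNQ.

    It only mimics read()/write() and the register map above.
    """

    def __init__(self):
        self.regs = {}
        self.done = False

    def write(self, offset, value):
        if offset == AP_CTRL:
            if value & AP_START:
                self._compute()
            return
        self.regs[offset] = value

    def read(self, offset):
        if offset == AP_CTRL:
            value = AP_DONE if self.done else 0
            self.done = False  # ap_done clears when it is read
            return value
        return self.regs.get(offset, 0)

    def _compute(self):
        # Same branches as the HLS function: only one result word is written.
        age = self.regs.get(INPUT_AGE, 0)
        sex = self.regs.get(INPUT_SEX, 0)
        if age > 50:
            index = 0 if sex == 1 else 1   # old male / old female
        else:
            index = 2 if sex == 1 else 3   # young male / young female
        self.regs[RESULT_BASE + 4 * index] = 1
        self.done = True


def classify(ip, age, sex, max_polls=1000):
    """Write inputs, start the core, wait for ap_done, read result[0..3]."""
    ip.write(INPUT_AGE, age)
    ip.write(INPUT_SEX, sex)
    ip.write(AP_CTRL, AP_START)
    for _ in range(max_polls):
        if ip.read(AP_CTRL) & AP_DONE:
            break
    else:
        raise TimeoutError("ap_done was never set")
    return [ip.read(RESULT_BASE + 4 * i) for i in range(RESULT_DEPTH)]


print(classify(FakeSimpleTree(), 67, 1))  # [1, 0, 0, 0]
```

One thing to watch: the HLS function only writes one element of result per run, so on the real core I would expect the other three words to keep whatever they held before. If you run it several times with different inputs, stale 1s may show up unless the function clears the array first.
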

You can also enable auto-restart by setting bit 7 of the control register when you start the IP:

simple_tree_ip.write(0x00, 0x81) # Start simple_tree and enable auto-restart
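
To make the 0x1 and 0x81 values less magic, here is a small sketch that builds and decodes the control word from the bit positions listed in the register map (bit 0 ap_start, bit 1 ap_done, bit 2 ap_idle, bit 3 ap_ready, bit 7 auto_restart). The helper names `control_word` and `decode_status` are hypothetical, not something PYNQ or HLS provides.

```python
# Bit positions from the CTRL_BUS register map at offset 0x00.
AP_START = 1 << 0
AP_DONE = 1 << 1
AP_IDLE = 1 << 2
AP_READY = 1 << 3
AUTO_RESTART = 1 << 7


def control_word(start=True, auto_restart=False):
    """Build the value to write to offset 0x00."""
    value = 0
    if start:
        value |= AP_START
    if auto_restart:
        value |= AUTO_RESTART
    return value


def decode_status(value):
    """Split a value read from offset 0x00 into named flags."""
    return {
        "ap_start": bool(value & AP_START),
        "ap_done": bool(value & AP_DONE),
        "ap_idle": bool(value & AP_IDLE),
        "ap_ready": bool(value & AP_READY),
        "auto_restart": bool(value & AUTO_RESTART),
    }


print(hex(control_word()))                   # 0x1
print(hex(control_word(auto_restart=True)))  # 0x81
```

Keep in mind that ap_done is marked clear-on-read in the register map, so each read of offset 0x00 should be decoded once rather than read again for each flag.
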

Mario

Thank you so much! I didn’t know about that.
Is there a way to avoid manually starting the IP? Your remark “The way you wrote the C code” makes me think there’s a smarter way to build my simple IP.

I should have said: by default, the output of HLS includes the control register.

Having said that, you can include a pragma to remove the control register. This is called a free-running kernel; it is always computing.

#pragma HLS INTERFACE ap_ctrl_none port=return

Your example is simple and the computation takes very little time to complete. However, in compute-heavy kernels you usually want to know when the computation is done, and in that case the control register is useful.
