How to use the ap_fixed data type to communicate with an IP made by Vivado HLS?

pynq package version == 2.5
pynq-z2
Thanks for your help.

Hi @dongzw,
I am sure you noticed I had to close the issue on GitHub due to our internal policy. Glad to see you also asked your question here.

I would ask you to provide a bit more context so we can try to come up with a proper answer for you.
What is the width of your ap_fixed? How do you interact with your HLS core?
Would you be willing to share the HLS code and the Python code you are trying to use? We don’t really need the entirety of the code, perhaps just the HLS core signature and a bit of pseudocode (or at the very least a high-level explanation).

In the meantime, I am also tagging @PeterOgden as he’s probably the guy that can help you out here.

#include "ap_fixed.h"

// 8 total bits, 2 integer bits (including sign) -> 6 fractional bits
typedef ap_fixed<8, 2> data_t;

#define DIM_1 3
#define DIM_2 4
#define DIM_3 5

void top(data_t A[DIM_1][DIM_2], data_t B[DIM_2][DIM_3], data_t out[DIM_1][DIM_3])
{
#pragma HLS INTERFACE s_axilite port=return
#pragma HLS INTERFACE m_axi depth=1024 port=A offset=slave
#pragma HLS INTERFACE m_axi depth=1024 port=B offset=slave
#pragma HLS INTERFACE m_axi depth=1024 port=out offset=slave
    for (int i = 0; i < DIM_1; ++i)
    {
        for (int j = 0; j < DIM_3; ++j)
        {
#pragma HLS PIPELINE
            data_t tmp = 0;
            for (int t = 0; t < DIM_2; ++t)
            {
                tmp += A[i][t] * B[t][j];
            }
            out[i][j] = tmp;
        }
    }
}

The matrix-multiplication code above is my test of how to use the ap_fixed data type. I want to create ap_fixed<N, M> data, where N and M can be any values in my work. I tried code like the following to allocate a buffer for communicating with the PL:

a = pynq.allocate(shape=(50,), dtype='f4')

The dtype argument does not support the ap_fixed type, and I didn’t find an ap_fixed solution in the PYNQ Python package.
I can’t find the right way to use float on the PYNQ Linux side and ap_fixed<N, M> in the HLS code at the same time.
Thanks for your help.

Since your ap_fixed uses 8 bits, with a bit of hackery it can be worked out (I am not aware of an explicit way of managing this, but I may be wrong). What I mean is that there’s definitely a way to transfer the data to the accelerator: as long as each element is 8 bits in the dtype, it should work fine.
The true “problem”, I guess, is to use this data properly once in Python, by which I mean interpreting it as it is supposed to be interpreted.
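For example, here is a minimal sketch of what I mean, assuming the ap_fixed<8, 2> type from your HLS code (2 integer bits including the sign, hence 6 fractional bits; adjust the scale factor for your own type):

import numpy as np
import pynq

# One byte per ap_fixed<8, 2> element; floats are scaled by 2**6 = 64
buf = pynq.allocate(shape=(12,), dtype='i1')
floats = np.linspace(-1.5, 1.5, 12, dtype='f4')
buf[:] = np.round(floats * 64)   # quantize onto the fixed-point grid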
But again, I would wait to hear also @PeterOgden’s opinion.

We don’t have any specific support for ap_fixed within PYNQ but it’s not too difficult to set things up and do the conversion yourself. The native type for the array should be a numpy int or uint whose power-of-2 width is equal to or bigger than the total bits of the fixed-point type – e.g. a 24-bit fixed gets packed into 32-bit ints.
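For instance, a little helper (just a sketch) to pick the container type:

import numpy as np

def container_dtype(total_bits):
    # Smallest power-of-2-width numpy signed int that holds total_bits
    for dt in ('i1', 'i2', 'i4', 'i8'):
        if np.dtype(dt).itemsize * 8 >= total_bits:
            return dt
    raise ValueError('fixed-point types wider than 64 bits are not handled')

container_dtype(24)
> 'i4'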

Once that is in place you can create a helper function to convert an array of floats to your integer type by multiplying by the correct shift and assigning to an int array. The extra sign bits won’t matter in this case:

import numpy as np
fixed_ar = np.ndarray((1024,), 'i4')         # container: 32-bit ints
float_ar = np.arange(-512, 512, dtype='f4')
fixed_ar[:] = float_ar * 256                 # scale by 2**8 for 8 fractional bits
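
The same idea wrapped as the helper function described above (a sketch; note it rounds, where the array assignment above truncates toward zero):

def to_fixed(float_ar, frac_bits, dtype='i4'):
    # Scale by 2**frac_bits, round onto the grid, and pack into ints
    return np.round(float_ar * (1 << frac_bits)).astype(dtype)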

Converting back is more challenging as the array needs to be sign-extended first. If we mask off the top 8 bits of fixed_ar to simulate a returned list of 24-bit ap_fixed values with 8 fractional bits, we can see that dividing by 256 gives the wrong result for negative numbers:

return_ar = fixed_ar & 0xFFFFFF   # keep only the low 24 bits
return_ar[0] / 256                # this element held -512.0
> 65024.0

With some view casting and np.where we can construct a function that will do the down-conversion for us:

def convert(a, total_bits, frac_bits):
    condition = 1 << (total_bits - 1)               # weight of the sign bit
    mask = (~((1 << total_bits) - 1)) & 0xFFFFFFFF  # all bits above total_bits
    # Negative values get their upper bits set before reinterpreting as i4
    return np.where(a < condition, a, (a.view('u4') | mask).view('i4')) / (1 << frac_bits)

convert(return_ar, 24, 8)
> array([-512., -511., -510., ...,  509.,  510.,  511.])

I’m not offering this up as authoritative just an example of a way to do this type of conversion. Hope this gives some ideas.
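
As a quick sanity check of the round trip (floats to 24-bit fixed with 8 fractional bits and back):

orig = np.array([-2.5, -0.25, 0.0, 1.75], dtype='f4')
packed = np.ndarray(orig.shape, 'i4')
packed[:] = orig * 256                  # to fixed point
convert(packed & 0xFFFFFF, 24, 8)       # mask to 24 bits, then back to floats
> array([-2.5 , -0.25,  0.  ,  1.75])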

Quick edit - using some bit-manipulation magic we can get rid of the np.where to improve performance (maybe?)

def convert(a, total_bits, frac_bits):
    mask1 = 1 << (total_bits - 1)  # the sign bit
    mask2 = mask1 - 1              # the magnitude bits below it
    # Subtracting the sign bit's weight performs the sign extension
    return ((a & mask2) - (a & mask1)) / (1 << frac_bits)

For more details on how this works see https://stackoverflow.com/questions/32030412/twos-complement-sign-extension-python
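
On a single value the trick looks like this (using the 24-bit pattern for -640, i.e. -2.5 with 8 fractional bits):

x = 0xFFFD80                         # 24-bit two's-complement pattern for -640
(x & 0x7FFFFF) - (x & 0x800000)      # magnitude bits minus the sign bit's weight
> -640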

Peter


Are there any suggestions for writing nonstandard bit widths? For example, a core with an array of ap_uint<12> specified as an AXILite interface that I want to load with numbers between 0-2047 (in a particular pattern) before running data through the core.

I’m pretty sure I could add a bit slice to the AXI stream in Vivado and then use np.uint16, but that places extra burdens on FPGA resources. The only other things I can think of are adding a custom numpy data type (yikes, I think) or using cma_alloc directly and manually packing the bits in a for loop in Python (slow and messy looking).

My block’s HLS top looks like:

void top(myaxis1_t &in1, myaxis1_t &in2, myaxis2_t &out, ap_uint<12> cfg[256][8]) {
#pragma HLS INTERFACE axis register reverse port=in1
#pragma HLS INTERFACE axis register reverse port=in2
#pragma HLS INTERFACE axis register forward port=out
#pragma HLS INTERFACE s_axilite port=cfg clock=S_AXI_clk name=S_AXI_resmap
#pragma HLS ARRAY_RESHAPE variable=cfg complete dim=2
#pragma HLS INTERFACE ap_ctrl_none port=return
    // Code
}

You could do as you suggest, and I think you see the trade-offs.

I’m pretty sure I could add a bit slice to the AXI stream in vivado and then use np.uint16, but that places extra burdens on FPGA resources.

What do you mean by extra burden on FPGA resources?

Cathal

I’d meant that if I were slicing off 4 bits of data to write the ap_uint<12> values with an AXI slice, then up to that point I’d be allocating wires and registers that were superfluous. That said, I think that concern was born of a misunderstanding about how the AXILite bus handles the writes.

While I couldn’t find ANY documentation that this would happen (or explaining it :confused:), the HLS auto-generated header file eventually helped me tease out how the data is laid out. So, since I couldn’t find any examples of how to do this anywhere:

from pynq import DefaultIP

class Example(DefaultIP):
    """Driver for a core exposing a 256 x 8 array of ap_uint<12> over AXILite.

    Per the HLS auto-generated header, each group of eight 12-bit values
    (96 bits) is packed little-endian into four 32-bit words:
        0x1000 ~ 0x1fff : Memory 'data_V' (256 * 96b)
        Word 4n   : bit [31:0] - data_V[n][31: 0]
        Word 4n+1 : bit [31:0] - data_V[n][63:32]
        Word 4n+2 : bit [31:0] - data_V[n][95:64]
        Word 4n+3 : bit [31:0] - reserved
    """
    resmap_addr = 0x1000
    bindto = ['MazinLab:mkidgen3:bin_to_res:0.4']

    def __init__(self, description):
        super().__init__(description=description)

    @staticmethod
    def _checkgroup(group_ndx):
        if group_ndx < 0 or group_ndx > 255:
            raise ValueError('group_ndx must be in [0,255]')

    def read_group(self, group_ndx):
        """Read the eight 12-bit values of a group from the three data words."""
        self._checkgroup(group_ndx)
        g = 0
        vals = [self.read(self.resmap_addr + 16 * group_ndx + 4 * i) for i in range(3)]
        for i, v in enumerate(vals):
            g |= v << (32 * i)
        return [(g >> (12 * j)) & 0xfff for j in range(8)]

    def write_group(self, group_ndx, group):
        """Pack eight 12-bit values into 96 bits and write them as 12 bytes."""
        self._checkgroup(group_ndx)
        if len(group) != 8:
            raise ValueError('len(group) != 8')
        bits = 0
        for i, g in enumerate(group):
            bits |= (int(g) & 0xfff) << (12 * i)
        data = bits.to_bytes(12, 'little', signed=False)
        self.write(self.resmap_addr + 16 * group_ndx, data)

    def bin(self, res):
        """The mapping for resonator res is 12 bits, stored in group res//8 at
        slot res%8; it may straddle two 32-bit words, so read the whole group."""
        return self.read_group(res // 8)[res % 8]

    @property
    def bins(self):
        return [v for g in range(256) for v in self.read_group(g)]

    @bins.setter
    def bins(self, bins):
        if len(bins) != 2048:
            raise ValueError('len(bins) != 2048')
        if min(bins) < 0 or max(bins) > 4095:
            raise ValueError('Bin values must be in [0,4095]')
        for i in range(256):
            self.write_group(i, bins[i * 8:i * 8 + 8])

Comments are welcome if there is a better way!
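
One idea I’ve sketched but not tested: since the word layout is fixed, the whole table could be packed with vectorized numpy shifts instead of per-group Python loops (pack_bins is a made-up name; word 3 of each group stays reserved, and each row would still be written out over MMIO as in write_group):

import numpy as np

def pack_bins(bins):
    # bins: 2048 values in [0, 4095]; eight 12-bit entries per 96-bit group
    v = np.asarray(bins, dtype=np.uint32).reshape(256, 8)
    words = np.zeros((256, 4), dtype=np.uint32)   # word 3 is reserved
    words[:, 0] = v[:, 0] | (v[:, 1] << 12) | ((v[:, 2] & 0xFF) << 24)
    words[:, 1] = (v[:, 2] >> 8) | (v[:, 3] << 4) | (v[:, 4] << 16) | ((v[:, 5] & 0xF) << 28)
    words[:, 2] = (v[:, 5] >> 4) | (v[:, 6] << 8) | (v[:, 7] << 20)
    return words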

I found this very useful. Could you please elaborate a bit on how the conversion in the code works? I tried it with a 32-bit size and it works perfectly; however, with different dtypes (for example an input array of np.uint16 dtype) it does not seem to work.

I think I have to mess around with the view dtypes in the return, but I am not entirely sure how to fix my problem.
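
My current guess (untested): the np.where version hardcodes the 32-bit 'u4'/'i4' views, and the bit-trick version can silently wrap around when the input dtype is unsigned, since the subtraction then happens in np.uint16. Something like casting to a wide signed type first might sidestep both:

import numpy as np

def convert_any(a, total_bits, frac_bits):
    a = a.astype(np.int64)           # avoid unsigned wrap-around in the subtraction
    mask1 = 1 << (total_bits - 1)    # the sign bit
    mask2 = mask1 - 1                # the magnitude bits below it
    return ((a & mask2) - (a & mask1)) / (1 << frac_bits)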