PYNQ AXI Vitis Core (gmem, allocate)

dimiter · December 1, 2020, 12:14am

I am trying to use a Vitis accumulate core via AXI interface. The block design is as shown below.

When trying to get the output via the following call:

c.sync_from_device()

it’s all zeros.

The IP passes C simulation (CSIM) and co-simulation (COSIM), so this looks like a PYNQ allocate issue, since a custom matrix multiplication core works correctly using the same procedure.

Any ideas?

Attachments: accum.tcl (53.9 KB), AccumulateAXI.ipynb (13.8 KB), xilinx_com_hls_accumulate_accel_1_0.zip (324.6 KB)

#!/usr/bin/env python
# coding: utf-8

# # Accumulate IP in AXI mode

# In[2]:

import datetime
from pynq import Overlay
from pynq import DefaultIP
from pynq import DefaultHierarchy
from pynq import allocate
from pynq import MMIO
from pynq.pl import *
import pynq.lib.dma
import numpy as np
import time

XACCUMULATE_ACCEL_CONTROL_ADDR_AP_CTRL        = 0x00
XACCUMULATE_ACCEL_CONTROL_ADDR_GIE            = 0x04
XACCUMULATE_ACCEL_CONTROL_ADDR_IER            = 0x08
XACCUMULATE_ACCEL_CONTROL_ADDR_ISR            = 0x0c
XACCUMULATE_ACCEL_CONTROL_ADDR_IMG_IN1_V_DATA = 0x10
XACCUMULATE_ACCEL_CONTROL_BITS_IMG_IN1_V_DATA = 32
XACCUMULATE_ACCEL_CONTROL_ADDR_IMG_IN2_V_DATA = 0x18
XACCUMULATE_ACCEL_CONTROL_BITS_IMG_IN2_V_DATA = 32
XACCUMULATE_ACCEL_CONTROL_ADDR_IMG_OUT_V_DATA = 0x20
XACCUMULATE_ACCEL_CONTROL_BITS_IMG_OUT_V_DATA = 32
XACCUMULATE_ACCEL_CONTROL_ADDR_HEIGHT_DATA    = 0x28
XACCUMULATE_ACCEL_CONTROL_BITS_HEIGHT_DATA    = 32
XACCUMULATE_ACCEL_CONTROL_ADDR_WIDTH_DATA     = 0x30
XACCUMULATE_ACCEL_CONTROL_BITS_WIDTH_DATA     = 32


# In[16]:


#------------------------Address Info-------------------
# 0x00 : Control signals
#        bit 0  - ap_start (Read/Write/COH)
#        bit 1  - ap_done (Read/COR)
#        bit 2  - ap_idle (Read)
#        bit 3  - ap_ready (Read)
#        bit 7  - auto_restart (Read/Write)
#        others - reserved
# 0x04 : Global Interrupt Enable Register
#        bit 0  - Global Interrupt Enable (Read/Write)
#        others - reserved
# 0x08 : IP Interrupt Enable Register (Read/Write)
#        bit 0  - Channel 0 (ap_done)
#        bit 1  - Channel 1 (ap_ready)
#        others - reserved
# 0x0c : IP Interrupt Status Register (Read/TOW)
#        bit 0  - Channel 0 (ap_done)
#        bit 1  - Channel 1 (ap_ready)
#        others - reserved
# 0x10 : Data signal of img_in1_V
#        bit 31~0 - img_in1_V[31:0] (Read/Write)
# 0x18 : Data signal of img_in2_V
#        bit 31~0 - img_in2_V[31:0] (Read/Write)
# 0x1c : reserved
# 0x20 : Data signal of img_out_V
#        bit 31~0 - img_out_V[31:0] (Read/Write)
# 0x24 : reserved
# 0x28 : Data signal of height
#        bit 31~0 - height[31:0] (Read/Write)
# 0x2c : reserved
# 0x30 : Data signal of width
#        bit 31~0 - width[31:0] (Read/Write)
# 0x34 : reserved
# (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write, COH = Clear on Handshake)


# In[17]:


ol = Overlay("accum.bit")


# In[18]:


get_ipython().run_line_magic('pinfo', 'ol')


# In[19]:


ip = ol.accumulate_accel_0


# In[20]:


DIM = 128

a = allocate(shape=((DIM, DIM)), dtype=np.uint8, cacheable=True)
b = allocate(shape=((DIM, DIM)), dtype=np.uint8, cacheable=True)
c = allocate(shape=((DIM, DIM)), dtype=np.uint16, cacheable=True)

a[:] = np.ones((DIM,DIM)).astype('int') * 11
b[:] = np.ones((DIM,DIM)).astype('int') * 23
c[:] = np.zeros((DIM,DIM)).astype('int')

ip.write(XACCUMULATE_ACCEL_CONTROL_ADDR_HEIGHT_DATA, DIM) # dst rows
ip.write(XACCUMULATE_ACCEL_CONTROL_ADDR_WIDTH_DATA, DIM)  # dst cols

ip.write(0x00, 4)
fpga_state = ip.read(0x00)

print(fpga_state)

a_p_ptr = a.physical_address
b_p_ptr = b.physical_address
c_p_ptr = c.physical_address

ip.write(0x00, 4)

if fpga_state == 4:
    ip.write(XACCUMULATE_ACCEL_CONTROL_ADDR_IMG_IN1_V_DATA, a_p_ptr)
    ip.write(XACCUMULATE_ACCEL_CONTROL_ADDR_IMG_IN2_V_DATA, b_p_ptr)
    ip.write(XACCUMULATE_ACCEL_CONTROL_ADDR_IMG_OUT_V_DATA, c_p_ptr)
else:
    print("Can't write values, must be in IDLE state")
    raise KeyboardInterrupt



#get_ipython().run_cell_magic('timeit', '', '\nip.write(0x00, 0x81)\nfpga_state = ip.read(0x00)\n\nmax_try = 100\nwhile fpga_state != 6 and fpga_state != 4:\n    fpga_state = ip.read(0x00)\n    max_try = max_try -1\n    if max_try == 0:\n        print("ERROR: Can\'t go ahead")\n        ip.write(0x00, 4)\n        raise KeyboardInterrupt\n        \nip.write(0x00, 4)')

c.sync_from_device()


print(c)
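
For reference, the control-register layout listed in the notebook's address-map comment can be decoded with a few lines of plain Python. This is a minimal sketch, not part of the original notebook: the helper names `decode_ap_ctrl` and `set_flags` are hypothetical, and the bit positions are copied from the 0x00 entry of that comment.

```python
# Bit positions of the AP_CTRL register (offset 0x00), taken from the
# address-map comment in the notebook.
AP_CTRL_BITS = {
    "ap_start": 0,
    "ap_done": 1,
    "ap_idle": 2,
    "ap_ready": 3,
    "auto_restart": 7,
}


def decode_ap_ctrl(value):
    """Return a dict mapping each named control bit to True/False."""
    return {name: bool((value >> bit) & 1) for name, bit in AP_CTRL_BITS.items()}


def set_flags(value):
    """Return the names of the bits that are set in `value`."""
    return [name for name, is_set in decode_ap_ctrl(value).items() if is_set]


# Values that appear in the notebook: 4 is written before the addresses,
# 0x81 is written in the timing cell.
print(set_flags(0x04))
print(set_flags(0x81))
```

Decoding 4 reports only ap_idle, and decoding 0x81 reports ap_start together with auto_restart.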

cathalmccabe · December 1, 2020, 8:44am

It looks like you don’t start the IP.
You write 0x4 to the control register with ip.write(0x00, 4), which tries to write a 1 to bit 2. Bit 2 is the ap_idle bit and is read-only. Try writing a 1 to ap_start and then checking for ap_done.

# 0x00 : Control signals
#        bit 0  - ap_start (Read/Write/COH)
#        bit 1  - ap_done (Read/COR)
#        bit 2  - ap_idle (Read)
#        bit 3  - ap_ready (Read)
#        bit 7  - auto_restart (Read/Write)

Cathal
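
The suggestion above (set ap_start, then wait for ap_done) could look roughly like the sketch below. It assumes only that the IP object offers `read(offset)` and `write(offset, value)`, as `ol.accumulate_accel_0` does in the notebook. `FakeAccel` and `start_and_wait` are hypothetical names; `FakeAccel` merely imitates a core that finishes after a few polls so the snippet runs without a board, and the values it returns are simulated, not measured.

```python
AP_CTRL = 0x00
AP_START = 1 << 0
AP_DONE = 1 << 1
AP_IDLE = 1 << 2


class FakeAccel:
    """Hypothetical stand-in for the IP; reports done after `busy_reads` polls."""

    def __init__(self, busy_reads=3):
        self.busy_reads = busy_reads
        self.remaining = 0
        self.running = False

    def write(self, offset, value):
        # Only a write of ap_start to the control register starts the core.
        if offset == AP_CTRL and value & AP_START:
            self.running = True
            self.remaining = self.busy_reads

    def read(self, offset):
        if not self.running:
            return AP_IDLE
        if self.remaining > 0:
            self.remaining -= 1
            return AP_START          # still busy
        self.running = False
        return AP_DONE | AP_IDLE     # simulated completion value


def start_and_wait(ip, max_polls=1000):
    """Set ap_start, then poll until the ap_done bit is seen or polls run out."""
    ip.write(AP_CTRL, AP_START)
    for _ in range(max_polls):
        # ap_done is clear-on-read, so test the bit on each value as it is read.
        if ip.read(AP_CTRL) & AP_DONE:
            return True
    return False


print(start_and_wait(FakeAccel(busy_reads=3)))
```

On hardware, `ip` would be the overlay's IP object instead of `FakeAccel`, and a `False` return would indicate that the core never signalled completion within the polling budget.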

dimiter · December 1, 2020, 12:05pm

@cathalmccabe

That is done in the last cell, since one can’t write to the core while it’s enabled.
Even if you comment out this line:

ip.write(0x00, 4)

the same thing happens: the output is all zeros. This is the cell where the core gets activated.
The issue is that c is all zeros.

%%timeit

ip.write(0x00, 0x81)
fpga_state = ip.read(0x00)

max_try = 100
while fpga_state != 6 and fpga_state != 4:
    fpga_state = ip.read(0x00)
    max_try = max_try -1
    if max_try == 0:
        print("ERROR: Can't go ahead")
        ip.write(0x00, 4)
        raise KeyboardInterrupt
        
ip.write(0x00, 4)

c.sync_from_device()

print(c)

dimiter · December 14, 2020, 7:08pm

Is AXI gmem supported on PYNQ devices?

I built a couple of other IPs from the Vitis Libraries and they also stall on the receive side.
A custom AXI IP works, however.

cathalmccabe · December 14, 2020, 8:04pm

The “gmem” ports in your design are just AXI Master ports. You connect them as you have in the block diagram in your first post. They will have access to PS DRAM in this configuration.

In the loop, what is the value of fpga_state/control register?

I’m not sure what you are trying to do here. I think you should check the values you expect from the status register.
If you write 0x81, I think bit 7 (auto_restart) should stay set, so you won’t see exactly 0x6 or 0x4.

Cathal
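
The point about bit 7 can be illustrated without hardware. The sketch below compares the exit test from the posted loop (exact equality with 6 or 4) against a masked test on the ap_done and ap_idle bits. The function names are hypothetical, and the register values are illustrative only; which values the core actually returns with auto_restart set is not established in this thread.

```python
AP_DONE = 1 << 1
AP_IDLE = 1 << 2
AUTO_RESTART = 1 << 7


def original_exit(state):
    """Exit test of the posted loop: stop only when state is exactly 6 or 4."""
    return state == 6 or state == 4


def masked_exit(state):
    """Alternative: stop when ap_done or ap_idle is set, ignoring other bits."""
    return bool(state & (AP_DONE | AP_IDLE))


# Illustrative values: the same status with and without auto_restart (bit 7).
for state in (0x04, 0x06, 0x04 | AUTO_RESTART, 0x06 | AUTO_RESTART):
    print(hex(state), original_exit(state), masked_exit(state))
```

For the two values with bit 7 set, the equality test never fires while the masked test does, which would be consistent with the loop running until max_try is exhausted.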

dimiter · December 14, 2020, 8:36pm

I’m trying to read the output from the AXI master.
I expect that when you issue:

print(c)

It will print the sum of the two matrices. However, all I get are zeros.
I don’t see what I am missing.
After assigning the input matrix addresses and starting the IP, I would assume that it runs and writes its output to c.

ip.write(0x00, 0x81)
fpga_state = ip.read(0x00)
## commented out
#max_try = 100
#while fpga_state != 6 and fpga_state != 4:
#    fpga_state = ip.read(0x00)
#    max_try = max_try -1
#    if max_try == 0:
#        print("ERROR: Can't go ahead")
#        ip.write(0x00, 4)
#        raise KeyboardInterrupt

print(fpga_state)
ip.write(0x00, 0x81)
c.sync_from_device()
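
Because the buffers are allocated with cacheable=True, one thing that may be worth ruling out is that the input data is never flushed to DRAM and that c is read back before the core has finished. The thread does not confirm that either is the cause. The sketch below shows one possible ordering under those assumptions: flush inputs with `sync_to_device()`, program the registers, set ap_start once (no auto_restart), poll for ap_done, then call `sync_from_device()`. The function name `run_accumulate` is hypothetical; the register offsets are the ones from the notebook, and `ip`, `a`, `b`, `c` are expected to behave like the notebook's IP object and PYNQ buffers.

```python
# Register offsets copied from the notebook's XACCUMULATE_ACCEL_CONTROL_* constants.
ADDR_AP_CTRL = 0x00
ADDR_IMG_IN1 = 0x10
ADDR_IMG_IN2 = 0x18
ADDR_IMG_OUT = 0x20
ADDR_HEIGHT = 0x28
ADDR_WIDTH = 0x30

AP_START = 0x01
AP_DONE = 0x02


def run_accumulate(ip, a, b, c, max_polls=100000):
    """Run the core once and return the output buffer (hypothetical helper).

    ip      : object with read(offset) and write(offset, value)
    a, b, c : buffers with .shape, .physical_address, sync_to_device()
              and sync_from_device(), as returned by pynq.allocate
    """
    rows, cols = a.shape

    # Flush cached input data to DRAM so the AXI master reads current values.
    a.sync_to_device()
    b.sync_to_device()

    # Program buffer addresses and image dimensions.
    ip.write(ADDR_IMG_IN1, a.physical_address)
    ip.write(ADDR_IMG_IN2, b.physical_address)
    ip.write(ADDR_IMG_OUT, c.physical_address)
    ip.write(ADDR_HEIGHT, rows)
    ip.write(ADDR_WIDTH, cols)

    # Start a single run and wait for the ap_done bit.
    ip.write(ADDR_AP_CTRL, AP_START)
    for _ in range(max_polls):
        if ip.read(ADDR_AP_CTRL) & AP_DONE:
            break
    else:
        raise TimeoutError("ap_done was not observed; the core may be stalled")

    # Invalidate the cached copy of c so the CPU sees what the core wrote.
    c.sync_from_device()
    return c
```

With the notebook's buffers this would be called as `run_accumulate(ip, a, b, c)`. If the core really computes the element-wise sum, as the poster expects, inputs filled with 11 and 23 should give 34 in every element of c; a timeout instead would point at the core stalling on its AXI master ports rather than at the buffer handling.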

arshaan256 · May 14, 2021, 2:58am

Hi, were you able to resolve this issue?
