PYNQ AXI Vitis Core (gmem, allocate)

I am trying to use a Vitis accumulate core via an AXI interface. The block design is shown below.


When trying to get the output via the following call:

c.sync_from_device()

it’s all zeros.

The IP passes CSIM and COSIM, so this looks like a PYNQ allocate issue, since a custom matrix multiplication core works correctly using the same procedure.

Any ideas?

Attachments: accum.tcl (53.9 KB), AccumulateAXI.ipynb (13.8 KB), xilinx_com_hls_accumulate_accel_1_0.zip (324.6 KB)

#!/usr/bin/env python
# coding: utf-8

# # Accumulate IP in AXI mode

# In[2]:

import datetime
from pynq import Overlay
from pynq import DefaultIP
from pynq import DefaultHierarchy
from pynq import allocate
from pynq import MMIO
from pynq.pl import *
import pynq.lib.dma
import numpy as np
import time

XACCUMULATE_ACCEL_CONTROL_ADDR_AP_CTRL        = 0x00
XACCUMULATE_ACCEL_CONTROL_ADDR_GIE            = 0x04
XACCUMULATE_ACCEL_CONTROL_ADDR_IER            = 0x08
XACCUMULATE_ACCEL_CONTROL_ADDR_ISR            = 0x0c
XACCUMULATE_ACCEL_CONTROL_ADDR_IMG_IN1_V_DATA = 0x10
XACCUMULATE_ACCEL_CONTROL_BITS_IMG_IN1_V_DATA = 32
XACCUMULATE_ACCEL_CONTROL_ADDR_IMG_IN2_V_DATA = 0x18
XACCUMULATE_ACCEL_CONTROL_BITS_IMG_IN2_V_DATA = 32
XACCUMULATE_ACCEL_CONTROL_ADDR_IMG_OUT_V_DATA = 0x20
XACCUMULATE_ACCEL_CONTROL_BITS_IMG_OUT_V_DATA = 32
XACCUMULATE_ACCEL_CONTROL_ADDR_HEIGHT_DATA    = 0x28
XACCUMULATE_ACCEL_CONTROL_BITS_HEIGHT_DATA    = 32
XACCUMULATE_ACCEL_CONTROL_ADDR_WIDTH_DATA     = 0x30
XACCUMULATE_ACCEL_CONTROL_BITS_WIDTH_DATA     = 32


# In[16]:


#------------------------Address Info-------------------
# 0x00 : Control signals
#        bit 0  - ap_start (Read/Write/COH)
#        bit 1  - ap_done (Read/COR)
#        bit 2  - ap_idle (Read)
#        bit 3  - ap_ready (Read)
#        bit 7  - auto_restart (Read/Write)
#        others - reserved
# 0x04 : Global Interrupt Enable Register
#        bit 0  - Global Interrupt Enable (Read/Write)
#        others - reserved
# 0x08 : IP Interrupt Enable Register (Read/Write)
#        bit 0  - Channel 0 (ap_done)
#        bit 1  - Channel 1 (ap_ready)
#        others - reserved
# 0x0c : IP Interrupt Status Register (Read/TOW)
#        bit 0  - Channel 0 (ap_done)
#        bit 1  - Channel 1 (ap_ready)
#        others - reserved
# 0x10 : Data signal of img_in1_V
#        bit 31~0 - img_in1_V[31:0] (Read/Write)
# 0x18 : Data signal of img_in2_V
#        bit 31~0 - img_in2_V[31:0] (Read/Write)
# 0x1c : reserved
# 0x20 : Data signal of img_out_V
#        bit 31~0 - img_out_V[31:0] (Read/Write)
# 0x24 : reserved
# 0x28 : Data signal of height
#        bit 31~0 - height[31:0] (Read/Write)
# 0x2c : reserved
# 0x30 : Data signal of width
#        bit 31~0 - width[31:0] (Read/Write)
# 0x34 : reserved
# (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write, COH = Clear on Handshake)


# In[17]:


ol = Overlay("accum.bit")


# In[18]:


get_ipython().run_line_magic('pinfo', 'ol')


# In[19]:


ip = ol.accumulate_accel_0


# In[20]:


DIM = 128

a = allocate(shape=((DIM, DIM)), dtype=np.uint8, cacheable=True)
b = allocate(shape=((DIM, DIM)), dtype=np.uint8, cacheable=True)
c = allocate(shape=((DIM, DIM)), dtype=np.uint16, cacheable=True)

a[:] = np.ones((DIM,DIM)).astype('int') * 11
b[:] = np.ones((DIM,DIM)).astype('int') * 23
c[:] = np.zeros((DIM,DIM)).astype('int')

ip.write(XACCUMULATE_ACCEL_CONTROL_ADDR_HEIGHT_DATA, DIM) # dst rows
ip.write(XACCUMULATE_ACCEL_CONTROL_ADDR_WIDTH_DATA, DIM)  # dst cols

ip.write(0x00, 4)
fpga_state = ip.read(0x00)

print(fpga_state)

a_p_ptr = a.physical_address
b_p_ptr = b.physical_address
c_p_ptr = c.physical_address

ip.write(0x00, 4)

if fpga_state == 4:
    ip.write(XACCUMULATE_ACCEL_CONTROL_ADDR_IMG_IN1_V_DATA, a_p_ptr)
    ip.write(XACCUMULATE_ACCEL_CONTROL_ADDR_IMG_IN2_V_DATA, b_p_ptr)
    ip.write(XACCUMULATE_ACCEL_CONTROL_ADDR_IMG_OUT_V_DATA, c_p_ptr)
else:
    print("Can't write values, must be in IDLE state")
    raise KeyboardInterrupt



#get_ipython().run_cell_magic('timeit', '', '\nip.write(0x00, 0x81)\nfpga_state = ip.read(0x00)\n\nmax_try = 100\nwhile fpga_state != 6 and fpga_state != 4:\n    fpga_state = ip.read(0x00)\n    max_try = max_try -1\n    if max_try == 0:\n        print("ERROR: Can\'t go ahead")\n        ip.write(0x00, 4)\n        raise KeyboardInterrupt\n        \nip.write(0x00, 4)')

c.sync_from_device()


print(c)

It looks like you don’t start the IP.
You write 0x4 to the control register with ip.write(0x00, 4), which tries to write a 1 to bit 2. Bit 2 is the ap_idle bit and is read-only. Try writing a 1 to ap_start (bit 0) and checking for ap_done (bit 1).

# 0x00 : Control signals
#        bit 0  - ap_start (Read/Write/COH)
#        bit 1  - ap_done (Read/COR)
#        bit 2  - ap_idle (Read)
#        bit 3  - ap_ready (Read)
#        bit 7  - auto_restart (Read/Write)
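The suggestion above (write ap_start, then check ap_done) could be sketched roughly as follows. This is only an illustration, not the poster's code: the helper name run_once, the timeout value, and the FakeAccel class are assumptions added here so that the snippet runs without a board. On hardware, `ip` would be the PYNQ IP object (ol.accumulate_accel_0 in the notebook), and the pointer, height and width registers would need to be written before starting.

```python
import time

AP_CTRL = 0x00        # control register offset, from the register map above
AP_START = 1 << 0
AP_DONE = 1 << 1
AP_IDLE = 1 << 2


def run_once(ip, timeout_s=1.0):
    """Start one run and poll until ap_done is observed.

    `ip` is any object with read(offset) and write(offset, value),
    for example a PYNQ DefaultIP such as ol.accumulate_accel_0.
    """
    ip.write(AP_CTRL, AP_START)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = ip.read(AP_CTRL)
        # ap_done is clear-on-read, so test the bit on every value read.
        if status & AP_DONE:
            return status
    raise TimeoutError("ap_done was not observed before the timeout")


class FakeAccel:
    """Hypothetical software stand-in so the sketch runs without a board."""

    def __init__(self, busy_reads=3):
        self.busy_reads = busy_reads
        self.remaining = None   # None means the core was never started

    def write(self, offset, value):
        if offset == AP_CTRL and value & AP_START:
            self.remaining = self.busy_reads

    def read(self, offset):
        if self.remaining is None:
            return AP_IDLE
        if self.remaining > 0:
            self.remaining -= 1
            return AP_START          # still running
        self.remaining = None        # done bit is cleared by this read
        return AP_DONE | AP_IDLE


print(hex(run_once(FakeAccel())))  # 0x6
```

On the board, the call would be run_once(ip) followed by c.sync_from_device(). A timeout here would indicate that the core never finished, which is a different failure from a finished run that leaves c at zero.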

Cathal

@cathalmccabe

That is done in the last cell, as one can’t write to the core while it’s enabled.
Even if you comment out this line:

ip.write(0x00, 4)

the same happens: the output is all zeros. Below is the cell where the core gets activated.
The issue is that c is all zeros.

%%timeit

ip.write(0x00, 0x81)
fpga_state = ip.read(0x00)

max_try = 100
while fpga_state != 6 and fpga_state != 4:
    fpga_state = ip.read(0x00)
    max_try = max_try -1
    if max_try == 0:
        print("ERROR: Can't go ahead")
        ip.write(0x00, 4)
        raise KeyboardInterrupt
        
ip.write(0x00, 4) 

c.sync_from_device()

print(c)

Is AXI gmem supported on PYNQ devices?

I built a couple of other IPs from Vitis Libraries and they also stall on the receive side.
Using custom AXI IP works however.

The “gmem” ports in your design are just AXI Master ports. You connect them as you have in the block diagram in your first post. They will have access to PS DRAM in this configuration.

In the loop, what is the value of fpga_state/control register?

I’m not sure what you are trying to do here. I think you should check the values you expect from the status register.
If you write 0x81, I think bit 7 (auto restart) should stay set, so you won’t see 0x6 or 0x4
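To make the point about the status values concrete, here is a small decoding sketch based only on the bit layout quoted earlier in the thread. The helper name decode_ap_ctrl is made up for illustration. The 0x86 case is what the register would read if auto_restart stays set as suggested above, which has not been confirmed on hardware.

```python
# Bit positions taken from the 0x00 control register description in the thread.
AP_CTRL_BITS = {
    0: "ap_start",
    1: "ap_done",
    2: "ap_idle",
    3: "ap_ready",
    7: "auto_restart",
}


def decode_ap_ctrl(value):
    """Return the names of the documented bits that are set in `value`."""
    return [name for bit, name in sorted(AP_CTRL_BITS.items())
            if value & (1 << bit)]


print(decode_ap_ctrl(0x81))  # ['ap_start', 'auto_restart']
print(decode_ap_ctrl(0x04))  # ['ap_idle']
print(decode_ap_ctrl(0x06))  # ['ap_done', 'ap_idle']
print(decode_ap_ctrl(0x86))  # ['ap_done', 'ap_idle', 'auto_restart']
```

If bit 7 does remain set, an equality test against 6 or 4 cannot succeed, whereas a mask test such as `fpga_state & 0x02` would still detect ap_done.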

Cathal

I’m trying to read the output from the AXI master.
I expect that when you issue:

print(c)

it will print the sum of the two matrices. However, all I get are zeros.
I don’t see what I am missing.
After assigning the input matrix addresses and starting the IP, I would assume that it runs and outputs C.

ip.write(0x00, 0x81)
fpga_state = ip.read(0x00)
## commented out
#max_try = 100
#while fpga_state != 6 and fpga_state != 4:
#    fpga_state = ip.read(0x00)
#    max_try = max_try - 1
#    if max_try == 0:
#        print("ERROR: Can't go ahead")
#        ip.write(0x00, 4)
#        raise KeyboardInterrupt

print(fpga_state)
ip.write(0x00, 0x81)
c.sync_from_device()

Hi, were you able to resolve this issue?