PYNQ Z1 Matrix Multiplication Accelerator Overlay Example

lim200789 · January 24, 2024, 11:42am

Hi there! I'm studying FPGA acceleration with the PYNQ-Z1 board.
As a first project, I tried to build a matrix multiplication accelerator.

Reference YouTube video title: Matrix Multiplication using Xilinx Vivado and Vitis
I don't know why, but the link won't attach, so I'm leaving the name of the video instead.

In this video, the author uses Vivado HLS 2019.2 and a PYNQ-Z2 board.

My environment
PYNQ-Z1
Vivado 2018.3

So my setup (PYNQ-Z1, Vivado 2018.3) differs from the one in the reference video (PYNQ-Z2, Vivado 2019.2) in both the board and the tool version.

I followed the same steps as the video, but the result of the matrix multiplication is always [18. 18. 18. 18. 27. 27. 27. 27. 27. 27. 27] on the first run.
From the second run onward, the output is stuck at [27. 27. 27. 27. 27. 27. 27. 27. 27. 27. 27. 27. 27].

Since the results are wrong even though I configured everything the same way, I suspect that the difference between the PYNQ-Z1 and PYNQ-Z2 boards is causing the problem, but I'd like to get advice from the experts.

What can I do?

import time

import numpy as np
import pynq.lib.dma
from pynq import MMIO, Overlay, allocate

ol = Overlay('/home/xilinx/pynq/overlays/matmul/matmul.bit')  # check your path
ol.download()  # programs the FPGA with the bitstream (Overlay() already does this by default)

# To check the IP blocks, run "ol?" and look for the "pynq.lib.dma.DMA" entry;
# its name is what goes after "ol." on the next line.
# ol?

dma = ol.axi_dma_0  # DMA instance, named as in the block design
sadd_ip = MMIO(0x43C00000, 0x1000)  # base address taken from the Vivado Address Editor

length = 18      # number of input values
length_out = 9   # number of output values

in_buffer = allocate(shape=(length,), dtype=np.float32)       # input buffer
out_buffer = allocate(shape=(length_out,), dtype=np.float32)  # output buffer

samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 8, 7, 6, 5, 4, 3, 2, 1]
np.copyto(in_buffer, samples)  # copy samples to the input buffer

sadd_ip.write(0x10, length)  # register offset taken from the Vivado source
t_start = time.time()
dma.sendchannel.transfer(in_buffer)
dma.recvchannel.transfer(out_buffer)
dma.sendchannel.wait()  # wait for the send channel
dma.recvchannel.wait()  # wait for the receive channel
t_stop = time.time()

# Print the buffers before releasing them; their contents are not valid after close().
print(in_buffer)
print(out_buffer)
print('Transfer time (s):', t_stop - t_start)

in_buffer.close()
out_buffer.close()
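
For comparison with the hardware output, here is a minimal software reference in NumPy. It is only a sketch under assumptions that are not confirmed anywhere above: that the 18 input values are two 3x3 matrices sent one after the other in row-major order, and that the 9 output values are their product in row-major order. The helper name matches_reference is made up for illustration.

```python
import numpy as np

# Same 18 input values as in the PYNQ script.
samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 8, 7, 6, 5, 4, 3, 2, 1]

# Assumption: matrix A (3x3, row-major) is followed by matrix B (3x3, row-major).
a = np.array(samples[:9], dtype=np.float32).reshape(3, 3)
b = np.array(samples[9:], dtype=np.float32).reshape(3, 3)

# Software reference, flattened to match a 9-element output buffer.
expected = (a @ b).flatten()
print(expected)


def matches_reference(hw_result, reference=expected):
    """Hypothetical helper: True if the hardware result equals the reference."""
    hw = np.asarray(hw_result, dtype=np.float32)
    return hw.shape == reference.shape and np.allclose(hw, reference)
```

Under those assumptions the reference product is 30, 24, 18, 84, 69, 54, 138, 114, 90, so calling matches_reference(out_buffer) before the buffers are closed would show whether the accelerator output is correct.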

I have attached the code and the bitstream files I used.

(Second run ~ Last run)

matmul.bit (3.9 MB)
matmul.hwh (301.6 KB)
