Using multiple accelerators simultaneously (multithreading)

Hi,
I am a beginner with FPGAs.
Currently, I am practicing using the FPGA to accelerate computationally heavy workloads.

I have implemented a cross-correlation (FFT-based) accelerator as an HLS IP.
I have also instantiated four of these accelerators on the FPGA in a Vivado block design.

In my application, I drive these accelerators simultaneously from multiple threads.
However, I have noticed that the more threads I use, the slower each accelerator runs.
Is this expected? (Since the accelerators are independent, I would expect their performance to be unaffected.)
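For context, the general pattern of driving the four IPs from Python threads looks roughly like the sketch below (illustrative only: the overlay name, IP names, buffer size, and register offsets are placeholders, not the actual contents of project.zip).

```python
import threading
import numpy as np
from pynq import Overlay, allocate

# NOTE: bitstream name, IP names, buffer size and register offsets below are
# assumptions for illustration; the real values come from the HLS-generated
# driver header and the actual design in project.zip.
ol = Overlay("correlation.bit")
ips = [ol.corr_0, ol.corr_1, ol.corr_2, ol.corr_3]  # four HLS accelerator instances

N = 4096  # assumed transfer size

def run_ip(ip):
    """Drive one accelerator through its AXI-Lite control interface."""
    in_buf = allocate(shape=(N,), dtype=np.float32)
    out_buf = allocate(shape=(N,), dtype=np.float32)
    in_buf[:] = np.random.rand(N).astype(np.float32)
    in_buf.sync_to_device()

    ip.write(0x10, in_buf.physical_address)   # offsets depend on the HLS-generated header
    ip.write(0x18, out_buf.physical_address)
    ip.write(0x00, 1)                         # ap_start
    while (ip.read(0x00) & 0x2) == 0:         # poll ap_done
        pass
    out_buf.sync_from_device()

# One thread per accelerator, all started together.
threads = [threading.Thread(target=run_ip, args=(ip,)) for ip in ips]
for t in threads:
    t.start()
for t in threads:
    t.join()
```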

Additionally, I have observed that different IP objects report the same memory address for their control registers.
Is this normal?
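For reference, this is one way to inspect which physical address each IP driver is mapped to (a sketch; the bitstream and IP names are placeholders):

```python
from pynq import Overlay

ol = Overlay("correlation.bit")  # placeholder bitstream name

# ip_dict is parsed from the .hwh file; each entry records the physical
# base address and address range of that IP's control (AXI-Lite) interface.
for name, desc in ol.ip_dict.items():
    print(name, hex(desc["phys_addr"]), hex(desc["addr_range"]))

# The MMIO object bound to each driver should report the same base address
# as the corresponding ip_dict entry.
print(hex(ol.corr_0.mmio.base_addr), hex(ol.corr_1.mmio.base_addr))
```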

Thank you.

[PYNQ ver.] 3.0.1
[Board] KV260
[Vivado ver.] 2021.1
[HLS ver.] 2021.1

[Reproduce]

  1. Unzip project.zip on the PYNQ board
  2. sudo su
  3. source /etc/profile.d/pynq_venv.sh
  4. Enter the project folder
  5. python ./multi-thread_test.py

[File] project.zip (1.8 MB)

  1. HLS code:
     - /project/hls/
  2. Vivado block design:
     - /project/vivado_block_design.pdf
  3. Python code:
     - /project/multi-thread_test.py
  4. Bitstream:
     - /project/bitstream/correlation/

[Images: Vivado block design, Driver, Runtime comparison, Control signal issue]


Hi @Oscar_Lin,

Welcome to the PYNQ community.

You are probably running into memory contention. The bandwidth to memory is limited, so when you run multiple IPs at the same time they may end up competing for memory access, which slows down the run time.
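One rough way to check this is to time the same workload run sequentially and then in parallel (a sketch, reusing a per-IP call like the run_ip function sketched in the first post):

```python
import threading
import time

# run_ip(ip) is assumed to be the kind of per-accelerator call sketched in
# the first post; any function that runs one IP to completion will do.

def time_sequential(ips):
    start = time.time()
    for ip in ips:
        run_ip(ip)
    return time.time() - start

def time_parallel(ips):
    start = time.time()
    threads = [threading.Thread(target=run_ip, args=(ip,)) for ip in ips]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

# If the parallel time is well above sequential_time / len(ips), the IPs are
# not scaling independently, which points to shared DDR bandwidth as the
# bottleneck rather than the accelerators themselves.
```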

Mario


Hi @marioruiz,

Thank you for your response.

May I ask if there is any method to add a separate memory for each IP to avoid memory contention?
Or are there any documents or tutorials that I can refer to?

Thanks
Oscar


Hi @Oscar_Lin,

I am not sure which board you're using, but typically Zynq MPSoC boards have only one memory module.

Mario

@marioruiz

If this is the case, I guess @Oscar_Lin should use a better design hierarchy.
Each accelerator should have its own memory pool, i.e. PL DDR memory, rather than sharing the DDR memory attached to the PS.
Once a result is finalized, the interface between accelerators can be shared memory.

Sharing memory through the PS is a low-cost but sub-optimal structure.

It is much like Harvard vs. Von Neumann: each has its pros and cons.

Does multithreading imply that the memory is physically split into separate channels? I would say no. In most cases a single memory is shared and accesses are simply arbitrated (scoreboard-style).

So an AXI MIG (memory controller) per accelerator, with the CPU work split across the separate MIGs, is the most efficient and fastest design. Mutexes and fork/join of the data are another story, unless there is a better way to handle them.

ENJOY~