PYNQ Numpy ARM7l SIMD

briansune · May 31, 2022, 7:25am

This is a very interesting question that I would like to ask.
TensorFlow Lite inference for CNNs is also problematic on this board, due to memory overflow or some other issue (I didn't investigate).

So rebuilding the basic CNN function blocks myself is simpler and much more controllable.
Meanwhile, the NumPy documentation at this link mentions SIMD support:
https://numpy.org/devdocs/reference/simd/index.html
Does NumPy on the Xilinx ARMv7l also run with SIMD?
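
One way to look into this on the board itself is sketched below. It is only a rough check, under two assumptions: that the installed NumPy is recent enough to expose the private `__cpu_features__` dictionary (older builds may not have it, in which case the function returns an empty dict), and that the board runs Linux so `/proc/cpuinfo` is readable. The helper names `numpy_simd_features` and `cpuinfo_flags` are made up for this example.

```python
import importlib


def numpy_simd_features():
    """Return the CPU feature table NumPy detected, or {} if unavailable.

    __cpu_features__ is a private attribute, so its location differs
    between NumPy versions and it may be missing on older builds.
    """
    for mod_name in ("numpy._core._multiarray_umath",
                     "numpy.core._multiarray_umath"):
        try:
            mod = importlib.import_module(mod_name)
        except ImportError:
            continue
        features = getattr(mod, "__cpu_features__", None)
        if features is not None:
            return dict(features)
    return {}


def cpuinfo_flags(path="/proc/cpuinfo"):
    """Return the set of CPU feature flags reported by the Linux kernel."""
    flags = set()
    try:
        with open(path) as f:
            for line in f:
                key, _, value = line.partition(":")
                # 32-bit ARM kernels use "Features"; x86 kernels use "flags".
                if key.strip().lower() in ("features", "flags"):
                    flags.update(value.split())
    except OSError:
        pass  # not Linux, or the file is not readable
    return flags


if __name__ == "__main__":
    enabled = sorted(k for k, v in numpy_simd_features().items() if v)
    print("NumPy-detected features:", enabled)
    print("Kernel reports neon:", "neon" in cpuinfo_flags())
```

If `NEON` shows up in the first list, NumPy has at least detected the extension at runtime. Whether a particular operation such as a matrix product actually uses it also depends on the BLAS library NumPy was linked against, which this check does not cover.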

Now consider fully-connected layers:
it turns out NumPy can even do the job faster than the FPGA.

FPGA @ 100MHz Runtime # 0.061389923095703125
ARM Runtime # 0.031568288803100586
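
For anyone who wants to reproduce an ARM-side number like this, a minimal timing sketch follows. The layer sizes (`batch`, `n_in`, `n_out`), the `float32` dtype and the helper names are assumptions picked for illustration; they are not the dimensions behind the measurement above.

```python
import time

import numpy as np


def fc_layer(x, weights, bias):
    """Fully-connected layer: y = x @ W + b."""
    return x @ weights + bias


def median_runtime(fn, repeats=20):
    """Median wall-clock time in seconds of fn() over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]


# Hypothetical layer dimensions, chosen only for this example.
batch, n_in, n_out = 64, 1024, 256

rng = np.random.RandomState(0)
x = rng.standard_normal((batch, n_in)).astype(np.float32)
weights = rng.standard_normal((n_in, n_out)).astype(np.float32)
bias = rng.standard_normal(n_out).astype(np.float32)

y = fc_layer(x, weights, bias)
runtime = median_runtime(lambda: fc_layer(x, weights, bias))
print("output shape:", y.shape, "median runtime [s]:", runtime)
```

Taking the median over several runs reduces the influence of one-off effects such as cache warm-up. For a fair comparison, the FPGA side should be timed the same way, including the buffer transfers.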

If the FPGA goes to 200 MHz, the runtime would be ≈ 0.061389923095703125 / 2 ≈ 0.0307.
Even so, that is hardly an improvement over the ARM SIMD result.
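
The extrapolation can be written out as a small sketch. It assumes the FPGA runtime is purely compute-bound and scales inversely with the clock, which ignores data transfers and driver overhead, so the real 200 MHz figure would likely be somewhat worse. The function name `scaled_runtime` is made up for this example.

```python
def scaled_runtime(runtime, f_measured_hz, f_target_hz):
    """Estimate the runtime at another clock frequency.

    Assumes ideal scaling: runtime is inversely proportional to the clock.
    The result is in the same unit as the input runtime.
    """
    return runtime * f_measured_hz / f_target_hz


fpga_100mhz = 0.061389923095703125  # measured FPGA runtime at 100 MHz
arm = 0.031568288803100586          # measured ARM (NumPy) runtime

fpga_200mhz = scaled_runtime(fpga_100mhz, 100e6, 200e6)
print("estimated FPGA runtime at 200 MHz:", fpga_200mhz)
print("FPGA (estimated) / ARM ratio:", fpga_200mhz / arm)
```

A ratio close to 1 means that doubling the clock would, at best, bring the FPGA level with the ARM core for this layer.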

So, are there any suggestions or a good explanation for this behavior? My guesses so far:

The HLS design might not be fully optimized.
The DSP slices are inherently better suited to pipelined operation, and the parallel addition is the bottleneck.

→ I might be wrong
