PYNQ-Torch: a framework to develop PyTorch accelerators on the PYNQ platform

Dear Manohar,
I ran into an unexpected problem: I am having trouble regenerating the bitstream for the Regression example using the provided Vivado Files; the build errors out during “place_design”.
I am trying to regenerate a bitstream from a minimally modified copy of the project.
The only changes I made were to upgrade IP blocks from Vivado 2018.2 to 2018.3, and to manually add the Vivado_HLS_Files folder as an IP repository in the IP catalog.
The errors mostly say “This design requires more LUT as Logic cells than are available in the target device”.
I attached a screenshot of the error messages. Have you seen these before when generating a bitstream from this Vivado project on a new PC?

Hi Gheorghe,

Sorry for the late response, I got caught up with work. So if you unrolled or pipelined a lot, you may have run out of resources on the FPGA. That is why I used pipelining in some places and unrolling elsewhere; the sketch below shows the trade-off.
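To make that concrete, here is a tiny made-up loop (not the PYNQ-Torch source) showing the two pragmas. UNROLL buys parallelism by duplicating the loop body in hardware; PIPELINE buys throughput while mostly reusing one copy of the logic:

```cpp
// Illustrative only -- not PYNQ-Torch code. UNROLL replicates the loop
// body (more LUTs/DSPs); PIPELINE overlaps iterations on one copy.
#define N 64

void mac_example(const int a[N], const int b[N], int *out) {
    int acc = 0;

mac_loop:
    for (int i = 0; i < N; i++) {
        // A single multiply-add, accepting a new element every cycle.
#pragma HLS PIPELINE II=1
        acc += a[i] * b[i];
        // Unrolling instead would replicate this body, e.g.
        //   #pragma HLS UNROLL factor=8
        // which multiplies the adder/multiplier (and LUT) usage by ~8.
    }
    *out = acc;
}
```

So if place_design runs out of LUTs, lowering the unroll factors is usually the first knob to try.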

So lines 55-59 are taken from the PyTorch source code; basically, I modelled how they do those operations in the backend. I'm assuming the issue is vanishing gradients.

Dear Manoharvhr,

Can I ask you to point me to the PyTorch source code that lines 55-59 are modeled after?

I could not find it in these pages, which contain the source code for the backward() function:

https://pytorch.org/docs/stable/_modules/torch/autograd.html#backward

https://pytorch.org/docs/stable/_modules/torch/tensor.html#Tensor.backward

I ask this because I ran into some unexpected behavior when modifying your backward_lite() driver.

I modified your main.cpp function for backward_lite to calculate gradients for a single 2x7 linear layer instead of the original 1x5 linear layer. The modified main.cpp is attached below.

main.cpp (4.8 KB)
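For reference, this is the gradient math I believe a single linear layer y = Wx + b with a 2x7 weight matrix needs. It is just the textbook backward pass written as plain C++, not your backward_lite code:

```cpp
// Textbook backward pass for one linear layer y = W*x + b, with OUT=2
// outputs and IN=7 inputs. Names and layout are illustrative only.
#define IN  7
#define OUT 2

void linear_backward_ref(const float x[IN],        // layer input
                         const float grad_y[OUT],  // dL/dy from upstream
                         const float w[OUT][IN],   // weights
                         float grad_w[OUT][IN],    // dL/dW
                         float grad_b[OUT],        // dL/db
                         float grad_x[IN]) {       // dL/dx
    for (int i = 0; i < IN; i++)
        grad_x[i] = 0;
    for (int o = 0; o < OUT; o++) {
        grad_b[o] = grad_y[o];                 // dL/db = dL/dy
        for (int i = 0; i < IN; i++) {
            grad_w[o][i] = grad_y[o] * x[i];   // dL/dW = dL/dy * x^T
            grad_x[i] += w[o][i] * grad_y[o];  // dL/dx = W^T * dL/dy
        }
    }
}
```

If lines 55-59 compute something different from this, that is what I would like to understand.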

Following your “Regression” example, I exported this driver to VHDL using Vivado HLS and integrated the resulting IP core into the “Regression” Vivado project, replacing the original backward_lite driver.

I built a bitstream from this project, loaded it onto my board, and replaced loss.backward() in my training function with code that sends data to and from the backward_lite driver. The modified PyTorch project and the relevant bitstream files for the PYNQ-Z1 are attached in a zip file below.

Naive_MLP_collision_PYNQ.zip (503.6 KB)
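For completeness, the only non-obvious arithmetic in the data exchange on my side is converting between floats and raw (32,16) fixed-point words. A minimal sketch, assuming a two's-complement Q16.16 encoding (the function names are mine, and the real transfers happen from Python, so this shows the arithmetic only, not the transport):

```cpp
// Sketch of float <-> (32,16) fixed-point conversion, assuming a
// two's-complement Q16.16 word. Illustrative only; not project code.
#include <cstdint>
#include <cmath>

int32_t to_q16_16(float v) {
    // One fractional LSB is 2^-16; values outside [-2^15, 2^15)
    // do not fit and would overflow the 32-bit word.
    return static_cast<int32_t>(std::lround(v * 65536.0));
}

float from_q16_16(int32_t raw) {
    return static_cast<float>(raw) / 65536.0f;
}
```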

I kept an unmodified copy of the original PyTorch project and ran it on my PYNQ-Z1 board as well, to double-check the results from the backward_lite driver. This file is attached below too.

Naive_MLP_collision.zip (3.6 KB)

Here are some differences I have observed between my original PyTorch project and my accelerated PyTorch project:

  1. In the original project, the maximum value of the calculated gradient starts out at around 10-20. In the PYNQ-accelerated project, however, it hardly ever exceeds 5-6.

  2. In the original project, the maximum gradient value appears to decrease monotonically with each training epoch. In the PYNQ-accelerated project, it grows and shrinks with no pattern I can discern.

  3. Both projects show monotonically decreasing training and testing loss, but the original project reduces both losses more rapidly.

  4. This is not a difference, but it slightly surprised me: The PYNQ-accelerated project has almost the same execution time as the original project.

My first suspicion was that 1 and 2 are caused by fixed-point overflow, but I doubt this because I used a (32,16) fixed-point representation, which supports a large range of values, up to 2^15. It is also possible that I did not write the backward_lite driver correctly for this single 2x7 linear layer.
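One overflow mode I could not rule out: an intermediate product can exceed 2^15 even when the inputs and the final gradient are small, and if it is assigned back into a (32,16) variable, ap_fixed wraps silently by default. A small sketch of what I mean, assuming Xilinx's ap_fixed.h (not code from your project):

```cpp
// Demonstrates silent wraparound with ap_fixed<32,16> (assumes Xilinx's
// ap_fixed.h). Default modes are AP_TRN quantization and AP_WRAP overflow.
#include "ap_fixed.h"
#include <iostream>

int main() {
    ap_fixed<32, 16> a = 300.0;  // both well inside the +/-2^15 range
    ap_fixed<32, 16> b = 200.0;

    // a*b = 60000 exceeds the representable maximum (~32768), so the
    // assignment below wraps silently to a wrong (negative) value.
    ap_fixed<32, 16> wrapped = a * b;

    // With AP_SAT the same assignment clamps to the maximum instead.
    ap_fixed<32, 16, AP_RND, AP_SAT> saturated = a * b;

    std::cout << wrapped.to_float() << " vs "
              << saturated.to_float() << std::endl;
}
```

Could intermediate sums or products inside backward_lite be wrapping like this?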

Do you know what may be causing these differences between my original PyTorch project and my PYNQ-accelerated PyTorch project, and how I might debug and resolve them?

Best,

Gheorghe Schreiber

Hi Gheorghe,

Sorry for the late reply! How have you gotten along with your project? Do you still need me to answer the above or have you figured it out?

Best,
Manohar

Hello @manoharvhr, thank you for starting this project.

We would like to run PyTorch on PYNQ with a Zynq-7000 and ultimately a Zynq UltraScale+ MPSoC, and I have two questions:

  1. Is this project still actively maintained?
  2. Will the method you offer generalize to running PyTorch on boards other than the PYNQ-Z1? So far we have PYNQ up and running on an Avnet MicroZed Zynq-7000, built with these instructions.