How do I map a CNN onto a PYNQ-Z2?

Hello, I am trying to map a CNN model onto a PYNQ-Z2 board, with the weights already trained off the board.

The papers I have mainly referred to are “Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks” (FPGA '15) and “Maximizing CNN Accelerator Efficiency Through Resource Partitioning” (ISCA '17), but I have not been able to reproduce their work from the papers alone.

I want to use loop tiling, loop unrolling, and a systolic array to implement the accelerator, but so far I have not managed to get it working. It is very frustrating working alone. Could anyone help me with this?
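
For concreteness, here is a minimal software sketch of the tiled loop nest I have in mind. It is plain C++ rather than synthesizable HLS code, and everything in it is my own assumption rather than something taken from the papers: the names (ConvShape, conv_reference, conv_tiled), the tile sizes TM and TN, integer data, stride 1, and no padding. The two innermost loops have fixed trip counts, so in an HLS tool they are the ones I would expect to unroll into a TM x TN array of multiply-accumulate units.

```cpp
#include <vector>

// Integer tensors stored as flat arrays (assumption: fixed-point style data).
using Tensor = std::vector<int>;

// Hypothetical layer description: M output channels, N input channels,
// R x C output feature map, K x K kernel, stride 1, no padding.
struct ConvShape {
    int M, N, R, C, K;
};

// Assumed tile sizes for output and input channels.
constexpr int TM = 4;
constexpr int TN = 2;

// Flat index into a [channels][H][W] tensor.
inline int at3(int ch, int y, int x, int H, int W) { return (ch * H + y) * W + x; }
// Flat index into a [M][N][K][K] weight tensor.
inline int at4(int m, int n, int i, int j, int N, int K) { return ((m * N + n) * K + i) * K + j; }

// Untiled six-loop convolution, used only as a correctness reference.
Tensor conv_reference(const Tensor& in, const Tensor& w, const ConvShape& s) {
    const int H = s.R + s.K - 1, W = s.C + s.K - 1;  // input height and width
    Tensor out(s.M * s.R * s.C, 0);
    for (int m = 0; m < s.M; ++m)
        for (int n = 0; n < s.N; ++n)
            for (int r = 0; r < s.R; ++r)
                for (int c = 0; c < s.C; ++c)
                    for (int i = 0; i < s.K; ++i)
                        for (int j = 0; j < s.K; ++j)
                            out[at3(m, r, c, s.R, s.C)] +=
                                w[at4(m, n, i, j, s.N, s.K)] * in[at3(n, r + i, c + j, H, W)];
    return out;
}

// Same computation with the channel loops tiled by TM and TN.
Tensor conv_tiled(const Tensor& in, const Tensor& w, const ConvShape& s) {
    const int H = s.R + s.K - 1, W = s.C + s.K - 1;
    Tensor out(s.M * s.R * s.C, 0);
    for (int m0 = 0; m0 < s.M; m0 += TM)          // tile loop over output channels
        for (int n0 = 0; n0 < s.N; n0 += TN)      // tile loop over input channels
            for (int r = 0; r < s.R; ++r)
                for (int c = 0; c < s.C; ++c)
                    for (int i = 0; i < s.K; ++i)
                        for (int j = 0; j < s.K; ++j)
                            // Fixed trip counts: candidates for full unrolling in HLS.
                            for (int tm = 0; tm < TM; ++tm)
                                for (int tn = 0; tn < TN; ++tn) {
                                    const int m = m0 + tm, n = n0 + tn;
                                    if (m < s.M && n < s.N)  // guard partial edge tiles
                                        out[at3(m, r, c, s.R, s.C)] +=
                                            w[at4(m, n, i, j, s.N, s.K)] *
                                            in[at3(n, r + i, c + j, H, W)];
                                }
    return out;
}
```

In software, conv_tiled gives the same result as conv_reference for integer data. What I cannot work out is how to turn this into on-chip buffers for each tile, and how to connect the unrolled MAC units as a systolic array.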