Deep learning algorithm with Pynq-Z1

Hi, for academic purposes I need to take a simple deep learning algorithm and compare its throughput on the CPU alone (for instance, code written in Python/Jupyter) against a CPU+FPGA deployment. For the FPGA version, I should accelerate part of the code using Vivado HLS.
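
To make the comparison concrete, here is a rough sketch of the kind of CPU baseline I have in mind. It assumes a single fully connected layer with ReLU stands in for the "simple deep learning algorithm". The names dense_relu and measure_throughput and the layer sizes are placeholders I made up for illustration, not part of any library.

```python
import random
import time


def dense_relu(x, weights, biases):
    """One fully connected layer followed by ReLU, in plain Python."""
    out = []
    for row, b in zip(weights, biases):
        acc = b
        for w, xi in zip(row, x):
            acc += w * xi
        # ReLU: clamp negative activations to zero
        out.append(acc if acc > 0.0 else 0.0)
    return out


def measure_throughput(fn, inputs):
    """Return inferences per second for fn applied to each input."""
    start = time.perf_counter()
    for x in inputs:
        fn(x)
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed


random.seed(0)
n_in, n_out = 64, 16  # hypothetical layer sizes
weights = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
biases = [0.0] * n_out
inputs = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(200)]

cpu_rate = measure_throughput(lambda x: dense_relu(x, weights, biases), inputs)
print(f"CPU baseline: {cpu_rate:.1f} inferences/s")
```

Presumably the same measure_throughput helper could then wrap whatever Python call drives the HLS-accelerated layer, so both numbers come from the same timing method.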

I am completely new to machine learning. I know there is plenty of information available, and I am already reading it, but I think an experienced user could point me in the right direction with suitable examples and information. In short, how can I get started on this task as quickly as possible?

I am using a Pynq-Z1 board and a Windows 7 laptop. Specific information would be really appreciated.