I suggest the following:
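1- Make sure that TLAST is actually synthesized, since the DMA relies on it to detect the end of a stream transfer. You can check this in the HLS report, in the generated RTL code, or in the system integrator by expanding the INPUT_r and OUTPUT_r ports.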
2- Try an AXI-Lite interface and drive the control flags from your PS.
3- Take a look at the tutorials mentioned in this post. DMA is very sensitive in PYNQ: either you configure it exactly as described there or it won’t work: