DMA data transfer to image processing HLS IP is not working

I have designed a convolution module using Vivado HLS, and both simulation and co-simulation work fine. However, when I use it as an IP in a system on the PYNQ-Z2 and pass it an image from a Jupyter notebook, the output is either all zeros or wrong.

from pynq import Overlay, Xlnk
import numpy as np
from PIL import Image 
import matplotlib.pyplot as plt

ov = Overlay("/home/xilinx/jupyter_notebooks/HLS_Basics/pynq_axis/conv_axis2.bit")
dma = ov.axi_dma_0

xlnk = Xlnk()
img_path = "/home/xilinx/jupyter_notebooks/Images/lena_gray.bmp"
img = Image.open(img_path)

width, height = img.size

print("Image size: {}x{}".format(width, height))

img_inbuff = xlnk.cma_array(shape=(height,width),dtype = np.uint8, cacheable = 1)
img_outbuff = xlnk.cma_array(shape=(500,500),dtype = np.int8, cacheable = 1)

img_inbuff[:] = np.array(img)


print(img_inbuff)
#img = Image.fromarray(img)
_ = plt.imshow(img_inbuff, cmap = 'gray')

dma.sendchannel.transfer(img_inbuff)
dma.recvchannel.transfer(img_outbuff)

outimg = Image.fromarray(img_outbuff)
print("Image size: {}x{} pixels.".format(500, 500))
_ = plt.imshow(outimg)
print(outimg)

If I call dma.sendchannel.wait() and dma.recvchannel.wait() after the transfers, execution hangs indefinitely.

Can anyone please tell me where I am making the mistake?

Did you start the HLS IP? There is an example in resizer_pl.ipynb in the Xilinx/PYNQ-HelloWorld repository on GitHub (master branch).
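As a minimal sketch of what "starting" means, assuming the convolution core was synthesized with the default block-level control protocol (ap_ctrl_hs) on an AXI4-Lite interface: the core then has a control register at offset 0x00, where bit 0 is ap_start and bit 7 is auto_restart. The helper names below are made up for illustration, and `ip` can be any object with write(offset, value) and read(offset) methods, which is what PYNQ exposes for an IP in the overlay. If the core was built with ap_ctrl_none there is nothing to start and this does not apply.

```python
# Control register bits that Vivado HLS generates for an AXI4-Lite
# block-level interface (ap_ctrl_hs). Offsets of any other arguments
# (e.g. width/height) are design-specific and are not covered here.
AP_CTRL = 0x00
AP_START = 0x01         # bit 0: start the core
AP_IDLE = 0x04          # bit 2: core is idle
AP_AUTO_RESTART = 0x80  # bit 7: restart automatically after each run


def start_hls_ip(ip, auto_restart=True):
    """Set ap_start (and optionally auto_restart) on an HLS IP.

    Returns the value that was written to the control register.
    """
    value = AP_START | (AP_AUTO_RESTART if auto_restart else 0)
    ip.write(AP_CTRL, value)
    return value


def is_idle(ip):
    """Return True if the ap_idle bit is set in the control register."""
    return bool(ip.read(AP_CTRL) & AP_IDLE)
```

On the board this would be called as start_hls_ip(ov.conv_0) before the DMA transfers, where conv_0 is a hypothetical instance name; ov.ip_dict lists the names actually present in the bitstream.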

Also, the hang is often caused by a size mismatch between the send or receive buffer and the amount of data the HLS IP expects. Make sure the height and width you use are consistent with what is specified in the HLS code.
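One way to catch such a mismatch before it turns into a hang is to compare the buffer sizes with what the core consumes and produces, and only then start the transfers. This is a sketch, not the fix for this particular design: frame_bytes and run_dma_transfer are hypothetical helpers, the expected byte counts have to be taken from the HLS source, and requiring an exact match is a deliberately conservative check.

```python
def frame_bytes(height, width, itemsize=1):
    """Number of bytes in one height x width frame of itemsize-byte pixels."""
    return height * width * itemsize


def run_dma_transfer(dma, in_buf, out_buf, in_bytes, out_bytes):
    """Check buffer sizes, then run one send/receive pair and wait for both.

    in_bytes / out_bytes are the byte counts the HLS core consumes and
    produces per run (an assumption supplied by the caller).
    """
    if in_buf.nbytes != in_bytes:
        raise ValueError("send buffer is {} bytes, core expects {}".format(
            in_buf.nbytes, in_bytes))
    if out_buf.nbytes != out_bytes:
        raise ValueError("receive buffer is {} bytes, core produces {}".format(
            out_buf.nbytes, out_bytes))
    # Queue both transfers first, then block until each channel finishes.
    dma.sendchannel.transfer(in_buf)
    dma.recvchannel.transfer(out_buf)
    dma.sendchannel.wait()
    dma.recvchannel.wait()
```

With the buffers from the question, the call would look like run_dma_transfer(dma, img_inbuff, img_outbuff, frame_bytes(height, width), frame_bytes(500, 500)), provided 500x500 really is the output size the HLS code generates for that input.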