Image not passed between PS and PL

I am working on a project that requires a Sobel filter, and I have created a custom IP for it using Vitis HLS. After moving the design to my PYNQ board (v2.7 image), where I use a DMA for memory access, the image is not transferred correctly and I cannot figure out why. I believe the image is being sent but is not being received back properly. Initially dma_recv got stuck at wait(), so I switched from passing a color image to passing a grayscale image directly. Now I can see that output_buffer does not change.

I simulated my C++ code and it worked as expected. For now I have hardcoded the rows and cols values in the code; I will change them back into inputs to the IP later.

This is my HLS code:

void find_SI(hls::stream<ap_axiu<64,1,1,1>>& inp_stream, hls::stream<ap_axiu<64,1,1,1>>& output_stream){

	#pragma HLS INTERFACE axis port=inp_stream
	#pragma HLS INTERFACE axis port=output_stream
	#pragma HLS INTERFACE s_axilite port=return

	int rows = 480;
	int cols = 640;
	float sigma = 0.5f;

	//kernel x and y initialization
	ap_int<4> kernel_x[3][3] = {{-1, 0 ,1} , {-2, 0, 2}, {-1 , 0, 1}};
	ap_int<4> kernel_y[3][3] = {{-1, -2 ,-1} , {0, 0, 0}, {1 , 2, 1}};

	//defining all the required Mat
	xf::cv::Mat<XF_8UC3, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1> in_mat(rows,cols);
	xf::cv::Mat<XF_8UC1, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1> blur_mat(rows,cols);
	xf::cv::Mat<XF_8UC1, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1> gray_mat(rows,cols);
	xf::cv::Mat<XF_8UC1, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1> copy_mat(rows+2, cols+2);
	xf::cv::Mat<XF_8UC1, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1> square_mat(rows,cols);
	xf::cv::Mat<XF_8UC1, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1> temp_mat(rows,cols);

	xf::cv::rgb2gray<XF_8UC3, XF_8UC1, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1>(in_mat, gray_mat);
	xf::cv::GaussianBlur<3, XF_BORDER_CONSTANT, XF_8UC1, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1>(gray_mat, blur_mat, sigma);
	xf::cv::xfMat2AXIvideo(square_mat, output_stream);
}
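
One thing worth checking is whether the byte counts on the two sides agree. The sketch below compares the bytes one frame would occupy on the 64-bit stream with the sizes of the buffers allocated in the Python code (a (height, width, 3) uint8 input and a (height, width) uint8 output). It assumes that the AXI video helpers move one pixel per stream beat at XF_NPPC1; that is an assumption and has not been confirmed against the Vitis Vision source here. `expected_transfer_bytes` is a hypothetical helper name, not part of any library.

```python
def expected_transfer_bytes(rows, cols, stream_bits, pixels_per_beat=1):
    """Bytes the DMA moves for one frame on an AXI4-Stream of the given width.

    pixels_per_beat=1 is an assumption about how the IP packs pixels.
    """
    if stream_bits % 8 != 0:
        raise ValueError("stream width must be a whole number of bytes")
    beats_per_row = -(-cols // pixels_per_beat)  # ceiling division
    return rows * beats_per_row * (stream_bits // 8)


rows, cols = 480, 640            # hardcoded in find_SI
stream_bits = 64                 # ap_axiu<64,1,1,1>

needed = expected_transfer_bytes(rows, cols, stream_bits)
input_bytes = rows * cols * 3    # allocate(shape=(height, width, 3), dtype=np.uint8)
output_bytes = rows * cols       # allocate(shape=(height, width), dtype=np.uint8)

print("bytes per frame on the stream (assumed packing):", needed)
print("input buffer bytes:", input_bytes)
print("output buffer bytes:", output_bytes)
```

If the numbers differ under the packing the IP actually uses, the IP may wait for more input beats than the DMA sends, or the receive channel may see fewer or more bytes than the output buffer holds.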

Python code:

from pynq import Overlay
from pynq import allocate
import numpy as np
import cv2 as cv
ol = Overlay('./edge_detection.bit')
dma = ol.axi_dma_0
dma_send = ol.axi_dma_0.sendchannel
dma_recv = ol.axi_dma_0.recvchannel
hls_ip = ol.sobel_filter_0
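# Control register of the HLS IP's s_axilite interface (offset 0x00)
CONTROL_REGISTER = 0x00
# 0x81 sets ap_start (bit 0) and auto_restart (bit 7)
hls_ip.write(CONTROL_REGISTER, 0x81)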
img = cv.imread('./sobel.png')
# cv.imread returns a 3-channel array by default, so take only the first two dimensions
height, width = img.shape[:2]
input_buffer = allocate(shape=(height, width, 3), dtype=np.uint8)
output_buffer = allocate(shape=(height, width), dtype=np.uint8)
input_buffer[:] = np.array(img)


del input_buffer, output_buffer
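
The transfer step itself is not shown in the code above. For reference, a minimal sketch of the usual PYNQ pattern follows, using the `transfer()` and `wait()` methods of the DMA channel objects. `run_frame` is a hypothetical helper name introduced only for illustration; the channels and buffers are the ones created above.

```python
def run_frame(send_channel, recv_channel, input_buffer, output_buffer):
    """Push one frame through the DMA and block until both directions finish."""
    # Arm the receive side first so it is ready when the IP starts producing data.
    recv_channel.transfer(output_buffer)
    send_channel.transfer(input_buffer)
    send_channel.wait()
    # This call can block if the IP produces fewer bytes than the buffer
    # expects and never asserts TLAST.
    recv_channel.wait()
    return output_buffer
```

On the board this would be called as `run_frame(dma_send, dma_recv, input_buffer, output_buffer)` before the buffers are deleted.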

Can someone please help me find the issue here? I have attached the relevant details.

Kind Regards,