Image not passed between PS and PL

Hi,
I am working on a project that requires a Sobel filter, and I have created a custom IP for it using Vitis HLS. After moving the design onto my PYNQ board (v2.7 image), where I use a DMA for memory access, the image is not transferred correctly between the PS and the PL, and I cannot figure out why. I believe the image is being sent, but it is not being received back properly. Initially dma_recv got stuck at wait(), so I switched from passing a color image to passing a grayscale image directly. With that change, I can see that output_buffer does not change.

I simulated my C++ code and it worked as expected. For now, the rows and cols values are hard-coded in the function; I will change them back into inputs to the IP later.
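
As a software cross-check alongside the C++ simulation, the gradient step can be modelled in a few lines of NumPy. This is only a sketch: correlate3x3 and sobel_reference are hypothetical helper names, the zero border is meant to mirror XF_BORDER_CONSTANT, and clipping to 0..255 is an assumption about how an 8-bit output saturates, so it is not guaranteed to match xf::cv::Sobel bit for bit. The kernels are the same kernel_x and kernel_y values used in the HLS code.

```python
import numpy as np

# Same 3x3 kernels as kernel_x / kernel_y in the HLS code.
KERNEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.int32)
KERNEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.int32)


def correlate3x3(gray, kernel):
    """Correlate a 2-D image with a 3x3 kernel, using a zero (constant) border."""
    rows, cols = gray.shape
    padded = np.pad(gray.astype(np.int32), 1, mode="constant", constant_values=0)
    out = np.zeros((rows, cols), dtype=np.int32)
    for dy in range(3):
        for dx in range(3):
            # Shifted view of the padded image, weighted by one kernel tap.
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out


def sobel_reference(gray):
    """Return (grad_x, grad_y) as uint8 images, clipped to 0..255 (assumed saturation)."""
    grad_x = np.clip(correlate3x3(gray, KERNEL_X), 0, 255).astype(np.uint8)
    grad_y = np.clip(correlate3x3(gray, KERNEL_Y), 0, 255).astype(np.uint8)
    return grad_x, grad_y
```

Running sobel_reference on the grayscale version of the test image gives two arrays that can be compared element-wise against whatever comes back in the output buffer.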

This is my HLS code:

void find_SI(hls::stream<ap_axiu<64,1,1,1>>& inp_stream, hls::stream<ap_axiu<64,1,1,1>>& output_stream){

	#pragma HLS INTERFACE axis port=inp_stream
	#pragma HLS INTERFACE axis port=output_stream
	#pragma HLS INTERFACE s_axilite port=return

	int rows = 480;
	int cols = 640;
	float sigma = 0.5f;

	//kernel x and y initialization
	ap_int<4> kernel_x[3][3] = {{-1, 0 ,1} , {-2, 0, 2}, {-1 , 0, 1}};
	ap_int<4> kernel_y[3][3] = {{-1, -2 ,-1} , {0, 0, 0}, {1 , 2, 1}};

	//defining all the required Mat
	xf::cv::Mat<XF_8UC3, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1> in_mat(rows,cols);
	xf::cv::Mat<XF_8UC1, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1> blur_mat(rows,cols);
	xf::cv::Mat<XF_8UC1, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1> gray_mat(rows,cols);
	xf::cv::Mat<XF_8UC1, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1> copy_mat(rows+2, cols+2);
	xf::cv::Mat<XF_8UC1, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1> square_mat(rows,cols);
	xf::cv::Mat<XF_8UC1, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1> temp_mat(rows,cols);

	#pragma HLS DATAFLOW
	xf::cv::AXIvideo2xfMat(inp_stream,in_mat);
	xf::cv::rgb2gray<XF_8UC3, XF_8UC1, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1>(in_mat, gray_mat);
	xf::cv::GaussianBlur<3, XF_BORDER_CONSTANT, XF_8UC1, MAX_HEIGHT, MAX_WIDTH, XF_NPPC1>(gray_mat, blur_mat, sigma);
	xf::cv::Sobel<XF_BORDER_CONSTANT,XF_FILTER_3X3,XF_8UC1,XF_8UC1,MAX_HEIGHT,MAX_WIDTH,XF_NPPC1,false>(blur_mat,square_mat,square_mat);
	
	xf::cv::xfMat2AXIvideo(square_mat, output_stream);
	return;
}
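
Because the streams are 64 bits wide while the pixels are 24-bit (XF_8UC3) on the input and 8-bit (XF_8UC1) on the output, it may help to write down how much data each DMA channel should move for a 480 x 640 frame. The sketch below is plain arithmetic: transfer_summary is a hypothetical helper, and whether the IP consumes densely packed bytes or one pixel per stream beat is left as an open assumption, so both counts are shown.

```python
ROWS, COLS = 480, 640   # hard-coded frame size from the HLS code
STREAM_BITS = 64        # width of the ap_axiu streams


def transfer_summary(rows, cols, channels, stream_bits=STREAM_BITS):
    """Summarise the size of one frame with `channels` bytes per pixel."""
    pixels = rows * cols
    total_bytes = pixels * channels
    bytes_per_beat = stream_bits // 8
    # Ceiling division: beats needed if bytes are packed densely into the stream.
    packed_beats = -(-total_bytes // bytes_per_beat)
    return {
        "pixels": pixels,
        "bytes": total_bytes,
        "beats_if_packed": packed_beats,
        "beats_if_one_pixel_per_beat": pixels,
    }


if __name__ == "__main__":
    print("input :", transfer_summary(ROWS, COLS, channels=3))
    print("output:", transfer_summary(ROWS, COLS, channels=1))
```

If the two beat counts for a channel differ from what the other side of the stream expects, the transfer may never see the end of the frame, which would be one possible reason for wait() not returning.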

Python code:

from pynq import Overlay
from pynq import allocate
import numpy as np
import cv2 as cv
ol = Overlay('./edge_detection.bit')
dma = ol.axi_dma_0
dma_send = dma.sendchannel
dma_recv = dma.recvchannel
dma_send.start()
dma_recv.start()
hls_ip = ol.sobel_filter_0
CONTROL_REGISTER = 0x0
hls_ip.write(CONTROL_REGISTER, 0x81)
img = cv.imread('./sobel.png')  # loads a 3-channel image by default
height, width = img.shape[:2]
input_buffer = allocate(shape=(height, width, 3), dtype=np.uint8)
output_buffer = allocate(shape=(height, width), dtype=np.uint8)
input_buffer[:] = np.array(img)

dma_send.transfer(input_buffer)
dma_recv.transfer(output_buffer)
dma_send.wait()
dma_recv.wait()

print(output_buffer)
del input_buffer, output_buffer
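
To tell "the DMA wrote nothing" apart from "the DMA wrote values that happen to look unchanged", one option is to fill the output buffer with a known pattern before the transfer and count how much of it survives afterwards. The following is a minimal sketch using a plain NumPy array as a stand-in for the allocated buffer; mark_buffer, written_fraction and the SENTINEL value are hypothetical names chosen for illustration.

```python
import numpy as np

SENTINEL = 0xAB  # arbitrary marker value (assumption: rare in the real output)


def mark_buffer(buf, sentinel=SENTINEL):
    """Overwrite every element of buf with the sentinel value, in place."""
    buf[:] = sentinel


def written_fraction(buf, sentinel=SENTINEL):
    """Fraction of elements that no longer hold the sentinel value.

    A genuine output pixel equal to the sentinel is counted as unwritten,
    so the result is a lower bound on what was actually written.
    """
    return float(np.count_nonzero(buf != sentinel)) / buf.size


if __name__ == "__main__":
    demo = np.zeros((4, 4), dtype=np.uint8)  # stand-in for output_buffer
    mark_buffer(demo)
    demo[0, :] = 0                           # pretend one row was received
    print(written_fraction(demo))
```

In the script above, mark_buffer(output_buffer) would go before dma_recv.transfer(output_buffer), and written_fraction(output_buffer) after dma_recv.wait().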

Could someone please help me find the issue here? I have attached the relevant details.

Kind Regards,
SK