Best practices for accelerating Python image processing on PYNQ?

Hi everyone — I'm experimenting with image-processing workflows on a PYNQ board and trying to figure out whether it's more efficient to offload parts of the pipeline into the programmable logic (PL) or keep most of it in Python on the processing system (PS). I'm particularly interested in real-time tasks like video streaming and object detection, and I'd like to hear what others have done for similar workloads.

For context, I came across this ESP32-CAM object counting project, which streams images over Wi-Fi into a Python/OpenCV pipeline for real-time processing: https://www.theengineeringprojects.com/2025/03/object-counting-project-using-esp32-cam-and-opencv.html I've also seen Raspberry Pi image-processing threads and Arduino forum projects where people push camera frames to a local server or PC application for analysis.

On PYNQ, should I build custom overlays and use DMA to feed frames directly into hardware accelerators, or is keeping the pipeline in Python and using the PL only to accelerate pre/post-processing usually sufficient? Any workflow tips or examples would be super helpful!
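For reference, here's roughly how I'm thinking about the split today. This is a minimal numpy-only sketch that runs entirely on the PS; the stage names and the tiny synthetic frame are just my placeholders, and the commented-out `dma.sendchannel`/`dma.recvchannel` calls mark where I imagine a DMA-fed PL accelerator would slot in:

```python
import numpy as np

def preprocess(frame):
    """Python-side preprocessing: RGB -> grayscale using the usual BT.601 weights."""
    gray = 0.299 * frame[..., 0] + 0.587 * frame[..., 1] + 0.114 * frame[..., 2]
    return gray.astype(np.uint8)

def threshold(gray, t=128):
    """Stand-in for the stage I'd consider pushing into the PL.

    On real hardware this is where the PYNQ DMA pattern would go, e.g.:
        in_buf[:] = gray
        dma.sendchannel.transfer(in_buf)
        dma.recvchannel.transfer(out_buf)
        dma.sendchannel.wait()
        dma.recvchannel.wait()
    with in_buf/out_buf from pynq.allocate() and dma from a custom overlay.
    """
    return (gray > t).astype(np.uint8)

def postprocess(mask):
    """Python-side postprocessing: fraction of 'object' pixels in the mask."""
    return mask.mean()

# Simulated 4x4 RGB frame: left half dark, right half bright.
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[:, 2:, :] = 200

mask = threshold(preprocess(frame))
print(postprocess(mask))  # -> 0.5 (half the pixels are above threshold)
```

Right now everything runs like this on the PS, and my question is essentially whether replacing the middle stage with a DMA round-trip to a custom overlay is worth the extra design effort for streaming-rate workloads.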