Hello,
I’m having trouble meeting timing when synthesizing my design.
The goal: Process two 32-bit vectors in parallel and write the results to DDR.
Design background:
- Each of the two IPs contains a CDMA and a DMA reading/writing 32-bit data.
- The PL clock is set to 250 MHz, the CPU to 450 MHz, and DDR to 250 MHz.
- The interconnects have the crossbar size set to 64-bit width.
- The HP ports are set to 64-bit width.
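As a rough sanity check on the numbers above, here is a small sketch of the peak bandwidth arithmetic. It assumes one data beat per clock and ignores burst overhead, arbitration, and DDR refresh, so real throughput will be lower. The function name `peak_mbytes_per_s` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Peak theoretical bandwidth in MB/s for a bus that is `width_bits` wide
 * and clocked at `clk_mhz`, assuming one data beat per clock cycle.
 * This is an upper bound only, not a measurement. */
static uint32_t peak_mbytes_per_s(uint32_t width_bits, uint32_t clk_mhz)
{
    assert(width_bits % 8u == 0u);   /* bus widths here are whole bytes */
    return (width_bits / 8u) * clk_mhz;
}
```

With these assumptions, `peak_mbytes_per_s(32, 250)` gives 1000 MB/s per 32-bit stream and `peak_mbytes_per_s(64, 250)` gives 2000 MB/s for a 64-bit path, so two 32-bit streams together would already match the peak of a single 64-bit path at 250 MHz.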
Previously, when I was working with a single 32-bit IP, timing was met at the speeds listed above. I’m now attempting to process more data by running two IPs, one on the upper section and one on the lower section of the original data (the split is done in the PS). My hope is to double, or at least greatly improve, throughput.
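To make the PS-side split concrete, here is a minimal sketch of what I mean. It assumes the data is one contiguous buffer of 32-bit words that is cut in the middle. The names `half_region_t` and `split_halves` are hypothetical and not part of any driver API, and the actual DMA setup is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for the region handed to one IP. */
typedef struct {
    const uint32_t *base;   /* first 32-bit word of this section */
    size_t          words;  /* number of 32-bit words in this section */
} half_region_t;

/* Split `total_words` words starting at `buf` into a lower and an upper
 * section. If `total_words` is odd, the upper section gets the extra word. */
static void split_halves(const uint32_t *buf, size_t total_words,
                         half_region_t *lower, half_region_t *upper)
{
    assert(buf != NULL && lower != NULL && upper != NULL);

    size_t half = total_words / 2;

    lower->base  = buf;
    lower->words = half;

    upper->base  = buf + half;
    upper->words = total_words - half;
}
```

Each section would then be passed to its own IP's DMA as a separate source address and length.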
Based on this design, are there any flaws or fundamental issues I’m overlooking?
Thanks
