My synthesized HLS IP BRAM utilization exceeds PYNQ-Z1 available resources

Hello
I have created an IP whose BRAM utilization is almost 400% of what is available on the board. What should I do? Please refer to the synthesis report that I got.
[image: synthesis report]
Please have a look at this code snippet:

void dual_gap_hw(double *A_in, double *AT_in, double *x, double *y, double lmbda, double pobj, double dobj, double gap)
{
#pragma HLS INTERFACE m_axi depth=100 port=A_in offset=slave bundle=A_port
#pragma HLS INTERFACE m_axi depth=100 port=AT_in offset=slave bundle=AT_port
#pragma HLS INTERFACE m_axi depth=100 port=x offset=slave bundle=x_port
#pragma HLS INTERFACE m_axi depth=100 port=y offset=slave bundle=y_port

#pragma HLS INTERFACE s_axilite port=lmbda bundle=lambda_port
#pragma HLS INTERFACE s_axilite port=pobj bundle=pobj_port
#pragma HLS INTERFACE s_axilite port=dobj bundle=dobj_port
#pragma HLS INTERFACE s_axilite port=gap bundle=gap_port
#pragma HLS INTERFACE s_axilite port=return bundle=CONTROL_BUS

	double A_buff[M][N];
	double AT_buff[N][M];
	double x_buff[N];
	double y_buff[M];

	memcpy(A_buff, (const double*) A_in, M * N * sizeof(double));
	memcpy(AT_buff, (const double*) AT_in, M * N * sizeof(double));
	memcpy(x_buff, (const double*) x, N * sizeof(double));
	memcpy(y_buff, (const double*) y, M * sizeof(double));

	duality_gap(A_buff, AT_buff, x_buff, y_buff, lmbda, &pobj, &dobj, &gap);

	memcpy(y, (const double*) y_buff, M * sizeof(double));
}

Is this going to be critical later? Should I ignore it for the moment?

Thank you


400% BRAM utilization means your design needs about four times as much BRAM as the device has.

Getting a bigger device with 4x the BRAM may not be practical…

Each memcpy copies data into on-chip BRAM (using the BRAM like a cache). Do you need to copy all your data to the chip? For example, if your function can process 1/4 of your data at a time, you can loop over it 4 times and copy 1/4 of the data on each pass.
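To make the "1/4 at a time" idea concrete, here is a minimal sketch in plain C. I don't know what duality_gap() does internally, so a matrix-vector product y = A * x stands in for it. The sizes M, N and TILE_ROWS and the function name matvec_tiled are made up for illustration. Whether your real computation can be split by rows like this depends on its access pattern.

```c
#include <assert.h>
#include <string.h>

#define M 8          /* hypothetical row count */
#define N 4          /* hypothetical column count */
#define TILE_ROWS 2  /* rows held on chip at once (1/4 of M here) */

/* Stand-in computation (an assumption, not the real duality_gap):
 * y = A * x, with A processed one tile of rows at a time. */
void matvec_tiled(const double *A_in, const double *x, double *y)
{
    double A_tile[TILE_ROWS][N];  /* 1/4 the size of a full M x N buffer */
    double x_buff[N];

    assert(M % TILE_ROWS == 0);   /* tiles must cover the matrix exactly */

    memcpy(x_buff, x, N * sizeof(double));

    for (int t = 0; t < M / TILE_ROWS; t++) {
        /* Copy only the rows that belong to the current tile. */
        memcpy(A_tile, A_in + (size_t)t * TILE_ROWS * N,
               TILE_ROWS * N * sizeof(double));

        for (int i = 0; i < TILE_ROWS; i++) {
            double acc = 0.0;
            for (int j = 0; j < N; j++)
                acc += A_tile[i][j] * x_buff[j];
            y[t * TILE_ROWS + i] = acc;
        }
    }
}
```

The on-chip buffer for A shrinks from M x N to TILE_ROWS x N doubles, at the cost of one extra copy per tile.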

Do you need all data on-chip?
If you remove the memcpy() calls, then depending on how you built your duality_gap() function, the data will be accessed directly from the (external?) memory. This may give lower performance, but it should remove the on-chip memory usage.
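As a rough sketch of what "no local copies" could look like, the function below reads the matrix straight through the pointer argument instead of buffering it first. Again, the matrix-vector product is only a stand-in for duality_gap(), and M, N and the name matvec_direct are hypothetical. In the real HLS function the m_axi INTERFACE pragmas from the original code would stay on the pointer arguments.

```c
#include <assert.h>

#define M 8  /* hypothetical row count */
#define N 4  /* hypothetical column count */

/* Stand-in computation (an assumption, not the real duality_gap):
 * y = A * x, reading A and x directly through the pointers.
 * No M x N array is declared, so no BRAM is needed for the matrix. */
void matvec_direct(const double *A_in, const double *x, double *y)
{
    assert(A_in && x && y);

    for (int i = 0; i < M; i++) {
        double acc = 0.0;
        for (int j = 0; j < N; j++) {
            /* Each access goes to external memory; sequential
             * addresses like these are the friendliest pattern. */
            acc += A_in[i * N + j] * x[j];
        }
        y[i] = acc;
    }
}
```

Note that x is re-read from external memory for every row here. A small vector like that is usually cheap to keep on chip, so a mixed approach (buffer the vectors, stream the matrix) may be a reasonable middle ground.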

This all depends on your data access and reuse patterns in your duality_gap() function.

You might want to look at line buffers.
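For reference, a line buffer keeps only the last few rows of a streamed 2D input on chip instead of the whole array. The sketch below is a generic illustration, not something taken from your code: it sums each element with the two elements above it, holding just two rows in the buffer. ROWS, COLS and the name vertical_sum3 are hypothetical.

```c
#include <assert.h>

#define ROWS 5  /* hypothetical input height */
#define COLS 4  /* hypothetical input width */

/* For r >= 2: out[r-2][c] = in[r-2][c] + in[r-1][c] + in[r][c].
 * The input is read once, in row-major order; only two rows are stored. */
void vertical_sum3(const double *in, double *out)
{
    double line_buf[2][COLS] = {{0.0}};  /* the two previous rows */

    assert(in && out);

    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            double cur = in[r * COLS + c];

            if (r >= 2)
                out[(r - 2) * COLS + c] =
                    line_buf[0][c] + line_buf[1][c] + cur;

            /* Shift this column: the oldest row drops out. */
            line_buf[0][c] = line_buf[1][c];
            line_buf[1][c] = cur;
        }
    }
}
```

On-chip storage is 2 x COLS values regardless of ROWS, which is why this pattern helps when the computation only needs a sliding window over the data.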

Cathal
