PS->DMA->BRAM data transfer?

Hi,
I need to transfer around 200 kB of data from the PS to BRAMs in the PL.
I initially explored the mmio tutorial shared by @cathalmccabe. (thank you for the very well explained video! )
And while it was working well, I found the transfer speed to be slower than what I need.

It takes roughly 24 ms to transfer 200 kB of data:

I need to reduce that time to 1 ms (that is, go from roughly 8 MB/s to about 200 MB/s).

So I began exploring DMA to transfer data from the PS to the PL. Here is a snip of what I implemented:

In the above diagram, I am using the DMA to transfer data into blk_mem_gen_0 (through port A)
and verifying whether the data is transferred into it by reading from port B with mmio read.

But even though I initiate the DMA send and recv (without errors), I observe that data is not written into blk_mem_gen_0.
Neither is it written into blk_mem_gen_1; I confirmed this by reading from blk_mem_gen_1 as well.

Can someone please help me understand what is going wrong?
How can I transfer data to BRAM using DMA?
Or is there a faster way to transfer data into BRAM (without DMA)?

Any help would be appreciated.

Regards,
AI

You don’t want to do what you are trying in the block diagram you shared. The DMA is intended to read from PS DRAM and stream data to a peripheral (AXI stream).
What are you trying to do with the data?

  • what is your IP doing and what interfaces does it have?
  • what language have you implemented it in - VHDL/Verilog, HLS?

Cathal

Hi @cathalmccabe,
Thank you for replying.

I intend to receive data via Ethernet (with the PS) and from there stream the data into a BRAM (True Dual Port). I have an IP which will read from the BRAM's second port and process the data.

I intend to give my IP an interface to access the BRAM and it will be in Verilog.
Right now I just want to transfer data from the PS to the PL BRAM (~200 kB of data in 1 ms).

You didn’t say how you are designing your IP.

If you are using HLS, you can add an AXI master interface that can access the PS memory directly. This is probably better than trying to copy to BRAM. No separate DMA required.

See this tutorial:

Cathal

Hi @cathalmccabe, Thank you for the suggestion.
I tried the tutorial on an HLS IP with an AXI master interface, and I can see that the data transfer is much faster.
I am also able to connect the HLS IP to a BRAM successfully.
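
For anyone following along, here is a minimal sketch of what such an HLS block might look like: an AXI master port reads from PS memory and the data is written out through a BRAM interface. The function name ps_to_bram, the BRAM_WORDS size and the depth value are assumptions made for illustration; they are not taken from the tutorial, and the sketch has not been run on hardware.

```c
#include <string.h>

/* Assumed BRAM size in 32-bit words; adjust to the real memory. */
#define BRAM_WORDS 1024

/* Hypothetical top function: copies `length` words from PS memory
 * (reached through the AXI master port `src`) into a BRAM port. */
void ps_to_bram(volatile int *src, int bram[BRAM_WORDS], int length) {
#pragma HLS INTERFACE m_axi port=src depth=1024 offset=slave
#pragma HLS INTERFACE bram port=bram
#pragma HLS INTERFACE s_axilite port=length
#pragma HLS INTERFACE s_axilite port=return

  /* Clamp the length so a bad value cannot overrun the BRAM. */
  if (length < 0) length = 0;
  if (length > BRAM_WORDS) length = BRAM_WORDS;

  /* memcpy on an m_axi pointer is inferred as a burst read. */
  memcpy(bram, (const int *)src, length * sizeof(int));
}
```

The same function also runs as plain C, so the copy logic can be checked in C simulation before synthesis.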

May I ask for help with one more thing? I am new to HLS and I don't know how to pass an array of data as an argument to the function.

This is the original code in the example:


void example(volatile int *a, int length, int value){
#pragma HLS INTERFACE m_axi port=a depth=50 offset=slave
#pragma HLS INTERFACE s_axilite port=length
#pragma HLS INTERFACE s_axilite port=value
#pragma HLS INTERFACE s_axilite port=return

  int i;
  int buff[1024];
  
  //memcpy creates a burst access to memory
  //multiple calls of memcpy cannot be pipelined and will be scheduled sequentially
  //memcpy requires a local buffer to store the results of the memory transaction
  memcpy(buff,(const int*)a,length*sizeof(int));
  
  for(i=0; i < length; i++){
    buff[i] = buff[i] + value;
  }
  
  memcpy((int *)a,buff,length*sizeof(int));
}

I want to pass an array (say int value[1024]) instead. Can you please suggest how I can proceed?

This is what I tried:

void example(volatile int *a, int length, int value[1024]){
#pragma HLS INTERFACE m_axi port=a depth=50 offset=slave
#pragma HLS INTERFACE s_axilite port=length
#pragma HLS INTERFACE s_axilite port=value[1024]
#pragma HLS INTERFACE s_axilite port=return

  int i;
  int buff[1024];
  
  //memcpy creates a burst access to memory
  //multiple calls of memcpy cannot be pipelined and will be scheduled sequentially
  //memcpy requires a local buffer to store the results of the memory transaction
  memcpy(buff,(const int*)a,length*sizeof(int));
  
  for(i=0; i < length; i++){
    buff[i] = buff[i] + value[i];
  }
  
  memcpy((int *)a,buff,length*sizeof(int));
}

But this is not working as I intended. It generates int value[1024] as a port of the HLS IP, but I want to assign data to the array within the PS and then have the HLS IP stream the contents of the array to BRAM.
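
For what it is worth, one approach that is commonly used for this (a sketch under assumptions, not a tested design) is to make value a second m_axi pointer instead of an array argument. The array then lives in PS memory, the PS fills it, and only its address is handed to the IP, the same way it is for a. The function name example_array, the MAX_LEN bound, the depth values and the bundle names below are illustrative and do not come from the tutorial.

```c
#include <string.h>

/* Assumed upper bound, matching buff[1024] in the example above. */
#define MAX_LEN 1024

/* Hypothetical variant of example(): `value` is a second AXI master
 * pointer, so the array stays in PS memory and is read by the IP. */
void example_array(volatile int *a, volatile int *value, int length) {
#pragma HLS INTERFACE m_axi port=a depth=1024 offset=slave bundle=gmem0
#pragma HLS INTERFACE m_axi port=value depth=1024 offset=slave bundle=gmem1
#pragma HLS INTERFACE s_axilite port=length
#pragma HLS INTERFACE s_axilite port=return

  int i;
  int buff[MAX_LEN];
  int vbuff[MAX_LEN];

  /* Clamp the length so it cannot overrun the local buffers. */
  if (length < 0) length = 0;
  if (length > MAX_LEN) length = MAX_LEN;

  /* Burst-read both arrays from PS memory into local buffers. */
  memcpy(buff, (const int *)a, length * sizeof(int));
  memcpy(vbuff, (const int *)value, length * sizeof(int));

  /* Element-wise add, replacing the scalar add of the original. */
  for (i = 0; i < length; i++) {
    buff[i] = buff[i] + vbuff[i];
  }

  /* Burst-write the result back to PS memory. */
  memcpy((int *)a, buff, length * sizeof(int));
}
```

On the PS side, both arrays would need to sit in physically contiguous memory (on PYNQ, for example, buffers obtained from pynq.allocate), and their physical addresses would be written to the offset registers that HLS generates for a and value.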
