Hi, I have made the following design: design.pdf (68.6 KB). It is a simple DMA flow around a custom IP block. The custom IP block is a test to see how things work and is very simple: it shifts every word 1 bit to the right. The Verilog code is here: shifter_v1_0.zip (856 Bytes).
Now, when I do a sendchannel.transfer and a recvchannel.transfer from a Jupyter notebook, I can see in the ILA that actual data is flowing, but the output buffer in Python contains only zeros. I am not sure why, or how to debug it. My design holds TVALID high at all times, since the internal buffer can always be read. I was unsure how TLAST works and made it always high as well. Does this perhaps not work with the DMA controller?
I call the DMA with a simple transfer: allocate a buffer, then execute sendchannel.transfer and recvchannel.transfer. I tested it first with a FIFO block and that worked flawlessly. I then replaced the FIFO block with the custom IP.
With other IP blocks you don't need to call the IP itself either, only the DMA transfer. I also have a working FFT pipeline, for example.
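For readers following along, the call sequence described above probably looks something like the sketch below. This is a guess at the notebook code, not the original: the helper name `dma_roundtrip`, the bitstream name, the `axi_dma_0` instance name and the buffer size are all hypothetical, and the `wait()` calls are an assumption since the post does not say whether they were used.

```python
def dma_roundtrip(dma, in_buf, out_buf):
    """Queue one send and one receive transfer, then block until both finish.

    `dma` is expected to look like a PYNQ DMA driver object, i.e. to have
    `sendchannel` and `recvchannel` attributes with transfer() and wait().
    """
    dma.sendchannel.transfer(in_buf)   # memory -> stream (MM2S)
    dma.recvchannel.transfer(out_buf)  # stream -> memory (S2MM)
    dma.sendchannel.wait()
    dma.recvchannel.wait()
    return out_buf

# On the board this would be driven roughly as follows (all names hypothetical):
#   import numpy as np
#   from pynq import Overlay, allocate
#   overlay = Overlay("design.bit")
#   dma = overlay.axi_dma_0
#   in_buf = allocate(shape=(256,), dtype=np.uint32)
#   out_buf = allocate(shape=(256,), dtype=np.uint32)
#   in_buf[:] = np.arange(256, dtype=np.uint32)
#   dma_roundtrip(dma, in_buf, out_buf)
```

Note that the custom IP itself is never touched from Python here; only the DMA is.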
I checked the logic of your Verilog. The assertion of TLAST is very suspicious: since it is always asserted, the transaction stops almost immediately after it starts. Since you have an ILA connected, I would recommend running a C program in the SDK and debugging the AXI interface. The logic does not look correct to me; at the very least I would use an FSM to make sure the handshake signals are generated properly.
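To illustrate why an always-high TLAST ends the transaction early, here is a toy Python model of a receive channel that stops at the first beat carrying TLAST. It is a simplification for illustration only, not the behaviour of the real DMA core in every detail; the function name `s2mm_receive` and the test words are made up.

```python
def s2mm_receive(beats, buffer_words):
    """Toy receive channel: accept beats until TLAST or until the buffer is full.

    beats: iterable of (tdata, tlast) pairs in arrival order.
    Returns (buffer contents, number of words written).
    """
    buffer = [0] * buffer_words
    written = 0
    for data, tlast in beats:
        if written == buffer_words:
            break
        buffer[written] = data
        written += 1
        if tlast:
            break  # packet boundary: the channel stops here
    return buffer, written

words = [0x10, 0x20, 0x30, 0x40]
# TLAST high on every beat, as in the original design
always_last = [(w, True) for w in words]
# TLAST only on the final beat of the packet
proper_last = [(w, i == len(words) - 1) for i, w in enumerate(words)]

print(s2mm_receive(always_last, 4))  # ([16, 0, 0, 0], 1)
print(s2mm_receive(proper_last, 4))  # ([16, 32, 48, 64], 4)
```

With TLAST tied high, only the first beat lands in the buffer and the rest of it keeps its initial zeros.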
I don't really have experience with the C SDK with PYNQ. However, I have now inserted a delay line for the TLAST signal, so it is buffered from the input. My idea is that this aligns the transaction from slave to master and is a little simpler than a full FSM. I think this is also compliant with the AXIS spec that I found here
This results in TLAST not being asserted at the beginning, so I would expect to see at least some data. However, the output buffer is again all zeros.
My adjusted Verilog (without the unchanged module definition):
// internal buffer of one word
reg [TDATA_WIDTH-1 : 0] buffer;
// delay buffer of tlast signal
reg buffer_tlast;

// always be ready to read since our operation is single clock cycle
assign s00_axis_tready = 1'b1;

// slave driver
always @(posedge s00_axis_aclk)
begin
    if (!s00_axis_aresetn)
    begin
        buffer <= {(TDATA_WIDTH){1'b0}};
        buffer_tlast <= 1'b0;
    end
    else
    begin
        if (s00_axis_tvalid)
        begin
            buffer <= s00_axis_tdata;
            buffer_tlast <= s00_axis_tlast;
        end
    end
end

// master driver
// data can always be read from the buffer so always valid
assign m00_axis_tvalid = 1'b1;
assign m00_axis_tstrb = {(TDATA_WIDTH/8){1'b1}};

// register of output driver
reg [TDATA_WIDTH-1 : 0] data_out;
// register of output driver of tlast signal
reg tlast_out;

// assign output drivers to output wires
assign m00_axis_tdata = data_out;
assign m00_axis_tlast = tlast_out;

always @(posedge m00_axis_aclk)
begin
    if (!m00_axis_aresetn)
    begin
        data_out <= {(TDATA_WIDTH){1'b0}};
        tlast_out <= 1'b0;
    end
    else
    begin
        data_out <= buffer >> 1;
        tlast_out <= buffer_tlast;
    end
end
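One way to reason about the logic above without hardware is a cycle-by-cycle Python model of the same registers. The sketch below is only a model of the posted code and has not been checked against the board: it assumes both interfaces share one clock, and the function name `simulate_shifter` and the stimulus values are invented for the example.

```python
def simulate_shifter(inputs, ready_pattern):
    """Cycle model of the posted logic (both sides assumed on the same clock).

    inputs: per-cycle (tvalid, tdata, tlast) seen on the slave side.
    ready_pattern: per-cycle m00_axis_tready from the downstream block.
    Returns the (tdata, tlast) beats the downstream block accepts.
    """
    buffer, buffer_tlast = 0, 0  # reset values
    data_out, tlast_out = 0, 0
    accepted = []
    for (tvalid, tdata, tlast), tready in zip(inputs, ready_pattern):
        # m00_axis_tvalid is tied to 1, so a beat transfers whenever tready is 1
        if tready:
            accepted.append((data_out, tlast_out))
        # nonblocking assignments: next values come from the current registers
        next_data_out, next_tlast_out = buffer >> 1, buffer_tlast
        if tvalid:
            buffer, buffer_tlast = tdata, tlast
        data_out, tlast_out = next_data_out, next_tlast_out
    return accepted

# Four input words arriving from cycle 2 onwards, TLAST on the last one
inputs = [(0, 0, 0), (0, 0, 0), (1, 8, 0), (1, 16, 0),
          (1, 24, 0), (1, 32, 1), (0, 0, 0), (0, 0, 0)]

# Downstream ready from the start: four zero beats arrive before any real data
print(simulate_shifter(inputs, [1] * 8))
# Downstream ready only in the last cycle: everything but the stale last word is lost
print(simulate_shifter(inputs, [0] * 7 + [1]))
```

In this model the first run yields `(0, 0)` four times before `(4, 0)`, so a four-word receive buffer would already be full of zeros, and the second run yields only `(16, 1)`. Both effects come from TVALID being tied high and TREADY being ignored on the master side.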
I’m facing the exact same issue. The difference between the AXI-Stream Data FIFO and the custom IP is that the FIFO's AXI interface has no TLAST port. There is only a TKEEP port, and it is always asserted high. I checked this with the ILA: in that case TREADY is asserted high for the length of the output buffer.
The all-zero issue is caused by the transfer method, which does not program the DMA correctly. The S2MM_LENGTH register is always set to 1 byte in the transfer method, not to the size of the output buffer. For a proper solution, I suggest getting rid of TLAST and replacing it with TKEEP, or programming the DMA registers yourself. I’m using PYNQ 2.6 with a custom board. I don’t know about later versions; maybe they fixed it.
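For the "program the DMA registers yourself" route, a minimal sketch could look like the following. It assumes the AXI DMA is configured in simple (non-scatter-gather) mode with 32-bit addresses and uses the register offsets from the Xilinx AXI DMA product guide (PG021). The helper names and `FakeRegs` are hypothetical; `FakeRegs` is only a stand-in so the snippet runs without a board.

```python
# Register offsets for the Xilinx AXI DMA in direct register (simple) mode.
MM2S_DMACR, MM2S_DMASR, MM2S_SA, MM2S_LENGTH = 0x00, 0x04, 0x18, 0x28
S2MM_DMACR, S2MM_DMASR, S2MM_DA, S2MM_LENGTH = 0x30, 0x34, 0x48, 0x58
RUN_STOP = 1 << 0  # DMACR bit 0: start the channel
IDLE = 1 << 1      # DMASR bit 1: channel has finished its transfer

def start_roundtrip(regs, src_addr, dst_addr, nbytes):
    """Arm the receive channel first, then start the send channel.

    `regs` is any object with read(offset) and write(offset, value).
    Writing a LENGTH register is what actually starts a channel.
    """
    regs.write(S2MM_DMACR, regs.read(S2MM_DMACR) | RUN_STOP)
    regs.write(S2MM_DA, dst_addr & 0xFFFFFFFF)
    regs.write(S2MM_LENGTH, nbytes)
    regs.write(MM2S_DMACR, regs.read(MM2S_DMACR) | RUN_STOP)
    regs.write(MM2S_SA, src_addr & 0xFFFFFFFF)
    regs.write(MM2S_LENGTH, nbytes)

def channel_idle(regs, dmasr_offset):
    """True once the status register of the given channel reports Idle."""
    return bool(regs.read(dmasr_offset) & IDLE)

class FakeRegs:
    """Hypothetical in-memory register file for a dry run without hardware."""
    def __init__(self):
        self.mem = {}
    def read(self, offset):
        return self.mem.get(offset, 0)
    def write(self, offset, value):
        self.mem[offset] = value

regs = FakeRegs()
start_roundtrip(regs, 0x10000000, 0x10001000, 1024)
print(hex(regs.read(S2MM_LENGTH)))  # 0x400
```

On a board, `regs` would presumably be the DMA's MMIO object (for example `overlay.axi_dma_0.mmio`, instance name assumed), the addresses would come from the buffers' `physical_address`, and the buffers would need a `flush()` before and an `invalidate()` after the transfer. I have not tested this on PYNQ 2.6.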