PYNQ: PYTHON PRODUCTIVITY

DMA array transfer order

hdl.zip (10.1 KB)

Hi, I’m having trouble understanding data type representations and how they are transferred. I followed this tutorial to create a custom IP with streaming interfaces. I did not use the Verilog files; I followed the tutorial in VHDL, with this overlay as a result:

Parameters:

  • Zynq:
    S AXI HP0 DATA WIDTH: 64 (with 32 it is even worse: I get 400, 400, 600, 600 and then 0s. Any idea why?)
  • DMA:
    Everything is set to 32 bits: address width, memory map data width, and stream data width. After all, the tutorial uses 32-bit signals.
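As a reference for the width settings above, here is a minimal sketch of how eight 32-bit words sit in memory and how the same bytes look when grouped into 64-bit beats. It assumes little-endian memory (as on the Zynq ARM cores) and uses the test values mentioned in this post. It only illustrates the byte layout; it does not model the DMA or the interconnect's width conversion, and it is not a diagnosis of the ordering problem.

```python
import struct

# Eight 32-bit test words, matching the values discussed in this post.
words = [0, 100, 200, 300, 400, 500, 600, 700]

# Lay the words out in memory as little-endian unsigned 32-bit integers.
raw = struct.pack("<8I", *words)

# Reinterpret the same 32 bytes as four 64-bit beats.
beats = struct.unpack("<4Q", raw)

# Split each beat into its low and high 32-bit halves.
halves = [(beat & 0xFFFFFFFF, beat >> 32) for beat in beats]

for n, (low, high) in enumerate(halves):
    print(f"beat {n}: low word = {low}, high word = {high}")
```

In each 64-bit beat, the word at the lower address occupies bits 31:0 and the next word occupies bits 63:32.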

The problem appears when I test the slave streaming interface from PYNQ: I get the values in a different order.

As you can see, I get 400, 500, 600 … and the last 4 values are 0. My guess is that by the time 700 arrives, TLAST is 1 and the streaming state machine goes to IDLE. But the order of the streaming itself is correct; it is not 400, 500, 600, 700, 0, 100, 200, 300, as one can see when debugging:

The order is correct, and TLAST is high when the whole array is done.

On the other hand, when I run the test for the master streaming interface:

it works smoothly, starting at 150 and streaming the next 8 words to the buffer.

Interestingly, here the streaming order is as in the previous case:

First the last 4 integers, then the first 4 integers, even after TLAST had gone high.

Any idea why this is so?
Any info or article on the subject will be appreciated.

What is your design in the first example, and how are you reading data back from your IP?
It looks like you have the DMA connected to the output of your IP (receive channel), but that you are not doing a DMA back from the IP.
For the ILA image you posted, you are looking at the data from the DMA to your IP which seems OK.
I would suggest you check the output from your IP as it may give you more info about the issue.

Cathal

Hi Cathal, if by ‘design’ you mean the Vivado block diagram, (1) it is the same IP for both scenarios. The tutorial guides one through the creation of an IP with 3 modules: an AXI Lite slave and two streaming modules (a master and a slave).
(2) I’m reading back the data through the AXI Slave Lite module.

From the tutorial, regarding the AXI Stream Slave Module,

The default module instantiation takes in data from a stream and stores it to an 8-deep register. We’ll simply add a path to read back the data streamed to it.

which I’ve written in VHDL as

		-- Users to add ports here
		slaveStreamReadRegister : in std_logic_vector(2 downto 0);
		slaveStreamReadValue : out std_logic_vector(C_S_AXIS_TDATA_WIDTH-1 downto 0);
		-- User ports ends

...

	 signal stream_data_fifo : BYTE_FIFO_TYPE;
	 begin   
	  -- Streaming input data is stored in FIFO
	  process(S_AXIS_ACLK)
	  begin
	    if (rising_edge (S_AXIS_ACLK)) then
	      if (fifo_wren = '1') then
	        stream_data_fifo(write_pointer) <= S_AXIS_TDATA((byte_index*8+7) downto (byte_index*8));
	      end if;  
	    end  if;  
	  end process;
	  -- ADDED THIS LINE.
	  slaveStreamReadValue((byte_index*8+7) downto (byte_index*8)) <= stream_data_fifo(to_integer(unsigned(slaveStreamReadRegister)));
	end generate FIFO_GEN;

Now in the AXI Slave Lite module I assign this slaveStreamReadValue to slv_reg3 (the register at offset 0x0C) as

-- Implement memory mapped register select and write logic generation
...
	process (S_AXI_ACLK)
	variable loc_addr :std_logic_vector(OPT_MEM_ADDR_BITS downto 0); 
	begin
	  if rising_edge(S_AXI_ACLK) then 
	    if S_AXI_ARESETN = '0' then
	      slv_reg0 <= (others => '0');
	      slv_reg1 <= (others => '0');
	      slv_reg2 <= (others => '0');
	      slv_reg3 <= (others => '0');
	    else
	      slv_reg3 <= slaveStreamReadValue; -- THIS IS THE MODIFIED LINE. ALL OTHER REGISTERS WERE ERASED.
	      slv_reg7 <= x"decade90";
	      loc_addr := axi_awaddr(ADDR_LSB + OPT_MEM_ADDR_BITS downto ADDR_LSB);
	      if (slv_reg_wren = '1') then
	        case loc_addr is
	          when b"000" =>
	            for byte_index in 0 to (C_S_AXI_DATA_WIDTH/8-1) loop
	              if ( S_AXI_WSTRB(byte_index) = '1' ) then
	                -- Respective byte enables are asserted as per write strobes                   
	                -- slave registor 0
	                slv_reg0(byte_index*8+7 downto byte_index*8) <= S_AXI_WDATA(byte_index*8+7 downto byte_index*8);
	              end if;
	            end loop;
	          when b"001" =>
	            for byte_index in 0 to (C_S_AXI_DATA_WIDTH/8-1) loop
	              if ( S_AXI_WSTRB(byte_index) = '1' ) then
	                -- Respective byte enables are asserted as per write strobes                   
	                -- slave registor 1
	                slv_reg1(byte_index*8+7 downto byte_index*8) <= S_AXI_WDATA(byte_index*8+7 downto byte_index*8);
	              end if;
	            end loop;
	          when b"010" =>
	            for byte_index in 0 to (C_S_AXI_DATA_WIDTH/8-1) loop
	              if ( S_AXI_WSTRB(byte_index) = '1' ) then
	                -- Respective byte enables are asserted as per write strobes                   
	                -- slave registor 2
	                slv_reg2(byte_index*8+7 downto byte_index*8) <= S_AXI_WDATA(byte_index*8+7 downto byte_index*8);
	              end if;
	            end loop;
	          when others =>
	            slv_reg0 <= slv_reg0;
	            slv_reg1 <= slv_reg1;
	            slv_reg2 <= slv_reg2;
	        end case;
	      end if;
	    end if;
	  end if;                   
	end process; 

So this is why in the first case, for the slave streaming example, I don’t do a DMA back from the IP but rather read from the register at 0x0C.
Does all of this make sense to you?
I’m new to hardware design, so I’m not confident in my code.
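For reference, the readback path described above can be modeled in plain Python. This is only a sketch under assumptions: `StreamSlaveModel` and `stream_in` are hypothetical names, not part of PYNQ or the tutorial, and the model assumes that the low 3 bits of the register at 0x08 drive slaveStreamReadRegister while the register at 0x0C returns slaveStreamReadValue. It ignores clocking, TLAST handling, and register latency.

```python
class StreamSlaveModel:
    """Hypothetical software stand-in for the custom IP's readback path."""

    SELECT_REG = 0x08  # assumed: register that drives slaveStreamReadRegister
    VALUE_REG = 0x0C   # slv_reg3, which mirrors slaveStreamReadValue

    def __init__(self, depth=8):
        self.fifo = [0] * depth  # the 8-deep storage filled by the stream
        self.select = 0

    def stream_in(self, words):
        # Store incoming words in arrival order, up to the storage depth.
        for pointer, word in enumerate(words[:len(self.fifo)]):
            self.fifo[pointer] = word & 0xFFFFFFFF

    def write(self, offset, value):
        # Only the select register is modeled; other writes are ignored.
        if offset == self.SELECT_REG:
            self.select = value % len(self.fifo)

    def read(self, offset):
        # Reading the value register returns the currently selected entry.
        if offset == self.VALUE_REG:
            return self.fifo[self.select]
        return 0


my_ip = StreamSlaveModel()
my_ip.stream_in([0, 100, 200, 300, 400, 500, 600, 700])

readback = []
for i in range(8):
    my_ip.write(0x08, i)
    readback.append(my_ip.read(0x0C))
print(readback)
```

In this idealized model the readback order equals the streaming order, so a different order on the board would have to come from something the model leaves out, such as how the words are stored or how the select value reaches the stream module.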

(3) With regard to checking the output of my IP, I thought that by probing port S00_AXI, which connects the IP to the AXI interconnect, I would be able to track operations such as

for i in range(8):
    my_ip.write(0x08, i)
    actual = my_ip.read(0x0C)
.... 

but the bus is ‘Inactive’ for the whole execution (see the first figure). Which wire should I be probing instead?