I wrote a 3x3 convolution IP core, and the IP itself is correct. I use it to perform calculations. With 4 input channels and 16 output channels, only the first output channel is correct; the other 15 output channels are garbage. What could be the reason for this?
The code is as follows:
def RunConv3(Conv3, feature_in, weight, bias, feature_out, ch_in, ch_out, size, act):
    # Input feature map: four address registers, all given the same buffer
    Conv3.write(0x10, feature_in.physical_address)
    Conv3.write(0x18, feature_in.physical_address)
    Conv3.write(0x20, feature_in.physical_address)
    Conv3.write(0x28, feature_in.physical_address)
    # Weights: four address registers, all given the same buffer
    Conv3.write(0x30, weight.physical_address)
    Conv3.write(0x38, weight.physical_address)
    Conv3.write(0x40, weight.physical_address)
    Conv3.write(0x48, weight.physical_address)
    # Bias: one address register
    Conv3.write(0x50, bias.physical_address)
    # Output feature map: four address registers, all given the same buffer
    Conv3.write(0x58, feature_out.physical_address)
    Conv3.write(0x60, feature_out.physical_address)
    Conv3.write(0x68, feature_out.physical_address)
    Conv3.write(0x70, feature_out.physical_address)
    # Scalar arguments
    Conv3.write(0x78, ch_in)
    Conv3.write(0x80, ch_out)
    Conv3.write(0x88, size)
    Conv3.write(0x90, act)
    # Start the core: set bit 0 of the control register, preserving bit 7
    Conv3.write(0, (Conv3.read(0) & 0x80) | 0x01)
    # Poll the control register until bit 1 is set
    tp = Conv3.read(0)
    while not ((tp >> 1) & 0x1):
        tp = Conv3.read(0)

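import numpy as np
from pynq import allocate

# Conv3 (the IP's MMIO handle) and image (the input buffer) are created elsewhere.
# Weights: ch_out * ch_in * 3 * 3 values
Convblock_w_1 = allocate(shape=(16 * 4 * 3 * 3,), cacheable=0, dtype=np.int16)
# Bias: one value per output channel
Convblock_bais_1 = allocate(shape=(16,), cacheable=0, dtype=np.int16)
# Output: ch_out * size * size values
Convblock_out_1 = allocate(shape=(16 * 432 * 432,), cacheable=0, dtype=np.int16)
RunConv3(Conv3, image, Convblock_w_1, Convblock_bais_1, Convblock_out_1, 4, 16, 432, 1)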
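
To make the intended buffer sizes explicit, here is a minimal sketch that computes the element counts for the weight, bias and output buffers from ch_in, ch_out and size. It assumes the core works on densely packed arrays and that the output keeps the same spatial size as the input; neither is confirmed by the IP's interface. conv3_buffer_sizes is a hypothetical helper name, not part of PYNQ or of the IP driver.

```python
def conv3_buffer_sizes(ch_in, ch_out, size, kernel=3):
    """Return element counts (weights, bias, output) for one layer.

    Assumptions (not confirmed by the IP): buffers are densely packed
    and the output feature map has the same size x size extent as the input.
    """
    weights = ch_out * ch_in * kernel * kernel  # one kernel per (out, in) pair
    bias = ch_out                               # one bias per output channel
    output = ch_out * size * size               # one size x size map per output channel
    return weights, bias, output


# Values used in the call above: ch_in=4, ch_out=16, size=432
print(conv3_buffer_sizes(4, 16, 432))  # (576, 16, 2985984)
```

If the allocated shapes differ from these counts, the core may read or write past the end of a buffer, so this is worth ruling out before looking at the IP itself.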