ValueError: embedded null byte while implementing DPR on ZCU216

Hi,
I have a problem with Dynamic Function eXchange on PYNQ.
PYNQ Version – 2.7.0
Vitis_HLS/Vivado Version – 2023.1
Board: Zynq UltraScale+ RFSoC ZCU216

Below is my code:

from pynq import Overlay, Bitstream, allocate, GPIO, MMIO
import pynq.lib.dma
from pynq.lib.dma import *
import numpy as np

DATA_OFFSET = 0x0000 

overlay = Overlay("dpr.bit") 

decouple = MMIO(0xA000_0000, 0x1000)
decouple_status = MMIO(0xA001_0000, 0x1000)

dma_in1 = overlay.dma_in1
dma_in2 = overlay.dma_in2
dma_out = overlay.dma_out

decouple.write(DATA_OFFSET, 0x0000_0001)
add_pb = Bitstream("add_pblock_partial.bit", None, True)
# print(repr(add_pb.binfile_name))
add_pb.download()
decouple.write(DATA_OFFSET, 0x0000_0000)

in1_data = allocate(shape=(4,), dtype=np.uint32)
in2_data = allocate(shape=(4,), dtype=np.uint32)
out_data = allocate(shape=(4,), dtype=np.uint32)

in1_data[:] = np.array([2, 4, 6, 100], dtype=np.uint32)
in2_data[:] = np.array([1, 2, 3, 70], dtype=np.uint32)


dma_in1.sendchannel.transfer(in1_data)
dma_in2.sendchannel.transfer(in2_data)
dma_in1.sendchannel.wait()
dma_in2.sendchannel.wait()  
dma_out.recvchannel.transfer(out_data)
dma_out.recvchannel.wait()
print(out_data)

and it shows the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-12-8496ce35de51> in <module>
     18 add_pb = Bitstream("add_pblock_partial.bit", None, True)
     19 # print(repr(add_pb.binfile_name))
---> 20 add_pb.download()
     21 decouple.write(DATA_OFFSET, 0x0000_0000)
     22 

/usr/local/share/pynq-venv/lib/python3.8/site-packages/pynq/bitstream.py in download(self, parser)
    185 
    186         """
--> 187         self.device.download(self, parser)
    188 
    189     def remove_dtbo(self):

/usr/local/share/pynq-venv/lib/python3.8/site-packages/pynq/pl_server/embedded_device.py in download(self, bitstream, parser)
    576         if parser is None:
    577             from .xclbin_parser import XclBin
--> 578             parser = XclBin(DEFAULT_XCLBIN)
    579 
    580         if not bitstream.binfile_name:

/usr/local/share/pynq-venv/lib/python3.8/site-packages/pynq/pl_server/xclbin_parser.py in __init__(self, filename, xclbin_data)
    369     def __init__(self, filename="", xclbin_data=None):
    370         self.ip_dict, self.mem_dict, self.clock_dict = \
--> 371             _xclbin_to_dicts(filename, xclbin_data)
    372         self.gpio_dict = {}
    373         self.interrupt_controllers = {}

/usr/local/share/pynq-venv/lib/python3.8/site-packages/pynq/pl_server/xclbin_parser.py in _xclbin_to_dicts(filename, xclbin_data)
    276 def _xclbin_to_dicts(filename, xclbin_data=None):
    277     if xclbin_data is None:
--> 278          with open(filename, 'rb') as f:
    279              xclbin_data = bytearray(f.read())
    280     sections, xclbin_uuid = parse_xclbin_header(xclbin_data)

ValueError: embedded null byte

I’m pretty sure there’s no “null” in my filename:

print(repr("add_pblock_partial.bit"))

'add_pblock_partial.bit'

And this Dynamic Function eXchange implementation works successfully on my PYNQ-Z2, so the bitstream should be correct.

By the way, if I comment out add_pb.download(), the code works properly.

Please help me solve this problem. Thanks!

Hi @H.W,

Welcome to the PYNQ community.

First, I would like to highlight that Vivado 2020.2 is the officially supported version for PYNQ 2.7. At that time, DFX was not the flow yet; it was still called partial reconfiguration.

Is your DFX region part of a hierarchy in the block design? If so, you should be able to download the partial bitstream as described here: Partial Reconfiguration — Python productivity for Zynq (Pynq)
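Something like this, for example (a minimal sketch; "block_0" and the bitstream names below are just placeholders taken from that documentation example, not your design):

from pynq import Overlay

ol = Overlay("full_design.bit")   # full bitstream containing the static region and the hierarchy
# Load a partial bitstream into the DFX region exposed as the hierarchy "block_0"
ol.pr_download("block_0", "block_0_rm_0_partial.bit")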

Mario

Thank you for the reply.

  1. My inspiration actually comes from this link:
    PYNQ 2.7 DFX Partial Reconfiguration under Vivado 2020.2

He used the same PYNQ version and the same partial reconfiguration implementation as I did.
(However, the Vivado version and board are different.)


  2. Regarding your suggestion, which states:
    "In the following example, let us assume there is a hierarchical block called block_0 in the design."

I do have a hierarchy called "pr_sec" for the DPR (see the image above), but I can't find it in the ip_dict in PYNQ:

AttributeError: Could not find IP or hierarchy pr_sec in overlay

Is there anything I missed or doing wrong?

BR,
H.W

I think the Vivado version is important; it should always match what PYNQ officially supports.

What do you see with ip_dict?
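For example, something along these lines shows what PYNQ parsed from the .hwh metadata (using the overlay name from your code above):

from pynq import Overlay

ol = Overlay("dpr.bit")
print(list(ol.ip_dict.keys()))          # IP blocks found in the metadata
print(list(ol.hierarchy_dict.keys()))   # hierarchies; pr_sec should appear here if it was parsed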

The following is the result of running:

overlay?
Type:            Overlay
String form:     <pynq.overlay.Overlay object at 0xffff5f930fa0>
File:            /usr/local/share/pynq-venv/lib/python3.8/site-packages/pynq/overlay.py
Docstring:      
Default documentation for overlay dpr.bit. The following
attributes are available on this overlay:

IP Blocks
----------
decouple             : pynq.lib.axigpio.AxiGPIO
decouple_status      : pynq.lib.axigpio.AxiGPIO
dma_in1              : pynq.lib.dma.DMA
dma_in2              : pynq.lib.dma.DMA
dma_out              : pynq.lib.dma.DMA
zynq_ultra_ps_e_0    : pynq.overlay.DefaultIP

Hierarchies
-----------
None

Interrupts
----------
None

GPIO Outputs
------------
None

Memories
------------

There’s nothing in Hierarchies

Did you say that the same design works for the PYNQ-Z2? Do you see the DFX region in the hierarchies?

No, there’s also nothing in the Hierarchies:

overlay?
Type:            Overlay
String form:     <pynq.overlay.Overlay object at 0xb38bbe38>
File:            /usr/local/share/pynq-venv/lib/python3.10/site-packages/pynq/overlay.py
Docstring:      
Default documentation for overlay dpr_with_multi_slots/dpr_multi_slot.bit. The following
attributes are available on this overlay:

IP Blocks
----------
dma1_in1             : pynq.lib.dma.DMA
dma1_in2             : pynq.lib.dma.DMA
dma1_out             : pynq.lib.dma.DMA
dma2_in1             : pynq.lib.dma.DMA
dma2_in2             : pynq.lib.dma.DMA
dma2_out             : pynq.lib.dma.DMA
dma3_in1             : pynq.lib.dma.DMA
dma3_in2             : pynq.lib.dma.DMA
dma3_out             : pynq.lib.dma.DMA
decouple             : pynq.lib.axigpio.AxiGPIO
decouple_status      : pynq.lib.axigpio.AxiGPIO
processing_system7_0 : pynq.overlay.DefaultIP

Hierarchies
-----------
None

Interrupts
----------
None

GPIO Outputs
------------
None

Memories
------------
PSDDR                : Memory

Project background:
PYNQ version – 3.0.0
Vivado/Vitis version – 2023.2
Board: PYNQ-Z2

This is how I successfully implemented DFX on the PYNQ-Z2:

This is the block design (there are 3 DFX sections/hierarchies):

This is the Design Source:

This is the PYNQ code:

from pynq import Overlay, Bitstream, allocate, GPIO, MMIO
import pynq.lib.dma
from pynq.lib.dma import *
import numpy as np
DATA_OFFSET = 0x0000 

overlay = Overlay("dpr_with_multi_slots/dpr_multi_slot.bit") 

decouple = MMIO(0x4120_0000, 0x1000)
decouple_status = MMIO(0x4121_0000, 0x1000)

# GPIO     test  PASSED
# decouple test  PASSED
# DMA      test  PASSED
# DPR      test  PASSED

dma1_in1 = overlay.dma1_in1
dma1_in2 = overlay.dma1_in2
dma1_out = overlay.dma1_out

dma2_in1 = overlay.dma2_in1
dma2_in2 = overlay.dma2_in2
dma2_out = overlay.dma2_out

dma3_in1 = overlay.dma3_in1
dma3_in2 = overlay.dma3_in2
dma3_out = overlay.dma3_out

# DMAs belong to the static part of the system -> not in the function


def add_s1():
    decouple.write(DATA_OFFSET, 0x00000001)
    add_pb = Bitstream("dpr_with_multi_slots/add_pblock_1_partial.bit", None, True)
    add_pb.download()
    decouple.write(DATA_OFFSET, 0x00000000)

    dma1_in1.sendchannel.transfer(in1_data)
    dma1_in2.sendchannel.transfer(in2_data)
    dma1_in1.sendchannel.wait()
    dma1_in2.sendchannel.wait()  
    dma1_out.recvchannel.transfer(out_data)
    dma1_out.recvchannel.wait()
    print(out_data)
    
def sub_s1():
    decouple.write(DATA_OFFSET, 0x00000001)
    add_pb = Bitstream("dpr_with_multi_slots/sub_pblock_1_partial.bit", None, True)
    add_pb.download()
    decouple.write(DATA_OFFSET, 0x00000000)

    dma1_in1.sendchannel.transfer(in1_data)
    dma1_in2.sendchannel.transfer(in2_data)
    dma1_in1.sendchannel.wait()
    dma1_in2.sendchannel.wait()  
    dma1_out.recvchannel.transfer(out_data)
    dma1_out.recvchannel.wait()
    print(out_data)
    
def mul_s1():
    decouple.write(DATA_OFFSET, 0x00000001)
    add_pb = Bitstream("dpr_with_multi_slots/mul_pblock_1_partial.bit", None, True)
    add_pb.download()
    decouple.write(DATA_OFFSET, 0x00000000)

    dma1_in1.sendchannel.transfer(in1_data)
    dma1_in2.sendchannel.transfer(in2_data)
    dma1_in1.sendchannel.wait()
    dma1_in2.sendchannel.wait()  
    dma1_out.recvchannel.transfer(out_data)
    dma1_out.recvchannel.wait()
    print(out_data)

def div_s1():
    decouple.write(DATA_OFFSET, 0x00000001)
    add_pb = Bitstream("dpr_with_multi_slots/div_pblock_1_partial.bit", None, True)
    add_pb.download()
    decouple.write(DATA_OFFSET, 0x00000000)

    dma1_in1.sendchannel.transfer(in1_data)
    dma1_in2.sendchannel.transfer(in2_data)
    dma1_in1.sendchannel.wait()
    dma1_in2.sendchannel.wait()  
    dma1_out.recvchannel.transfer(out_data)
    dma1_out.recvchannel.wait()
    print(out_data)
    
def add_s2():
    decouple.write(DATA_OFFSET, 0x00000002)
    add_pb = Bitstream("dpr_with_multi_slots/add_pblock_2_partial.bit", None, True)
    add_pb.download()
    decouple.write(DATA_OFFSET, 0x00000000)

    dma2_in1.sendchannel.transfer(in1_data)
    dma2_in2.sendchannel.transfer(in2_data)
    dma2_in1.sendchannel.wait()
    dma2_in2.sendchannel.wait()  
    dma2_out.recvchannel.transfer(out_data)
    dma2_out.recvchannel.wait()
    print(out_data)
    
def sub_s2():
    decouple.write(DATA_OFFSET, 0x00000002)
    add_pb = Bitstream("dpr_with_multi_slots/sub_pblock_2_partial.bit", None, True)
    add_pb.download()
    decouple.write(DATA_OFFSET, 0x00000000)

    dma2_in1.sendchannel.transfer(in1_data)
    dma2_in2.sendchannel.transfer(in2_data)
    dma2_in1.sendchannel.wait()
    dma2_in2.sendchannel.wait()  
    dma2_out.recvchannel.transfer(out_data)
    dma2_out.recvchannel.wait()
    print(out_data)
    
def mul_s2():
    decouple.write(DATA_OFFSET, 0x00000002)
    add_pb = Bitstream("dpr_with_multi_slots/mul_pblock_2_partial.bit", None, True)
    add_pb.download()
    decouple.write(DATA_OFFSET, 0x00000000)

    dma2_in1.sendchannel.transfer(in1_data)
    dma2_in2.sendchannel.transfer(in2_data)
    dma2_in1.sendchannel.wait()
    dma2_in2.sendchannel.wait()  
    dma2_out.recvchannel.transfer(out_data)
    dma2_out.recvchannel.wait()
    print(out_data)

def div_s2():
    decouple.write(DATA_OFFSET, 0x00000002)
    add_pb = Bitstream("dpr_with_multi_slots/div_pblock_2_partial.bit", None, True)
    add_pb.download()
    decouple.write(DATA_OFFSET, 0x00000000)

    dma2_in1.sendchannel.transfer(in1_data)
    dma2_in2.sendchannel.transfer(in2_data)
    dma2_in1.sendchannel.wait()
    dma2_in2.sendchannel.wait()  
    dma2_out.recvchannel.transfer(out_data)
    dma2_out.recvchannel.wait()
    print(out_data)

def add_s3():
    decouple.write(DATA_OFFSET, 0x00000004)
    add_pb = Bitstream("dpr_with_multi_slots/add_pblock_3_partial.bit", None, True)
    add_pb.download()
    decouple.write(DATA_OFFSET, 0x00000000)

    dma3_in1.sendchannel.transfer(in1_data)
    dma3_in2.sendchannel.transfer(in2_data)
    dma3_in1.sendchannel.wait()
    dma3_in2.sendchannel.wait()  
    dma3_out.recvchannel.transfer(out_data)
    dma3_out.recvchannel.wait()
    print(out_data)
    
def sub_s3():
    decouple.write(DATA_OFFSET, 0x00000004)
    add_pb = Bitstream("dpr_with_multi_slots/sub_pblock_3_partial.bit", None, True)
    add_pb.download()
    decouple.write(DATA_OFFSET, 0x00000000)

    dma3_in1.sendchannel.transfer(in1_data)
    dma3_in2.sendchannel.transfer(in2_data)
    dma3_in1.sendchannel.wait()
    dma3_in2.sendchannel.wait()  
    dma3_out.recvchannel.transfer(out_data)
    dma3_out.recvchannel.wait()
    print(out_data)
    
def mul_s3():
    decouple.write(DATA_OFFSET, 0x00000004)
    add_pb = Bitstream("dpr_with_multi_slots/mul_pblock_3_partial.bit", None, True)
    add_pb.download()
    decouple.write(DATA_OFFSET, 0x00000000)

    dma3_in1.sendchannel.transfer(in1_data)
    dma3_in2.sendchannel.transfer(in2_data)
    dma3_in1.sendchannel.wait()
    dma3_in2.sendchannel.wait()  
    dma3_out.recvchannel.transfer(out_data)
    dma3_out.recvchannel.wait()
    print(out_data)

def div_s3():
    decouple.write(DATA_OFFSET, 0x00000004)
    add_pb = Bitstream("dpr_with_multi_slots/div_pblock_3_partial.bit", None, True)
    add_pb.download()
    decouple.write(DATA_OFFSET, 0x00000000)

    dma3_in1.sendchannel.transfer(in1_data)
    dma3_in2.sendchannel.transfer(in2_data)
    dma3_in1.sendchannel.wait()
    dma3_in2.sendchannel.wait()  
    dma3_out.recvchannel.transfer(out_data)
    dma3_out.recvchannel.wait()
    print(out_data)
    
    
in1_data = allocate(shape=(4,), dtype=np.uint32)
in2_data = allocate(shape=(4,), dtype=np.uint32)
out_data = allocate(shape=(4,), dtype=np.uint32)

in1_data[:] = np.array([2, 4, 6, 100], dtype=np.uint32)
in2_data[:] = np.array([1, 2, 3, 70], dtype=np.uint32)
    
add_s1()
add_s2()
add_s3()
sub_s1()
sub_s2()
sub_s3()
mul_s1()
mul_s2()
mul_s3()
div_s1()
div_s2()
div_s3()

add_s1()
sub_s1()
mul_s1()
div_s1()
add_s2()
sub_s2()
mul_s2()
div_s2()
add_s3()
sub_s3()
mul_s3()
div_s3()


in1_data.close()
in2_data.close()
out_data.close()

and the result:

[  3   6   9 170]
[  3   6   9 170]
[  3   6   9 170]
[ 1  2  3 30]
[ 1  2  3 30]
[ 1  2  3 30]
[   2    8   18 7000]
[   2    8   18 7000]
[   2    8   18 7000]
[2 2 2 1]
[2 2 2 1]
[2 2 2 1]
[  3   6   9 170]
[ 1  2  3 30]
[   2    8   18 7000]
[2 2 2 1]
[  3   6   9 170]
[ 1  2  3 30]
[   2    8   18 7000]
[2 2 2 1]
[  3   6   9 170]
[ 1  2  3 30]
[   2    8   18 7000]
[2 2 2 1]

Clearly the DFX region is not being parsed. The latest officially supported Vivado version is 2022.1.

Could you please share the hwh files?

Sure.

dpr_multi_slot.hwh (1.2 MB)

And what does "DFX region" mean?
Is setting the DFX part (in my project, the function IP block inside the hierarchy pr_sec) to HD.RECONFIGURABLE in Vivado enough for DFX, so that that part becomes a DFX region?

Hi @H.W,

It looks like in newer versions of Vivado the hierarchy is not part of the metadata; that's why PYNQ does not recognize it.

Maybe you can try something: right-click on the hierarchy, then select Create Block Design Container...

Rebuild and try again.

I also suggest you check how to do DFX with Block Design Containers. https://docs.amd.com/r/en-US/ug909-vivado-partial-reconfiguration/IP-Integrator-Using-Block-Design-Containers

If this does not work, I strongly suggest you move to Vivado 2022.1 as this is the latest verified version.

Mario

@H.W @marioruiz

Look carefully: in the AXI block, the decoupler is not even mapped to both sides of the HLS IP.
I strongly suggest you study the basics and get a working example with the most basic HLS IP first, like a simple adder, subtractor, or another operator.

Meanwhile, if you are using 2023, why bother with Tcl? Use a BDC (Block Design Container).
But of course, if you get the decoupling wrong, it will not work either.

“Eye wide open”

ENJOY~

Hi,
The function of the DFX decoupler is to prevent the circuits in the pblock from having unpredictable effects on the static circuits while the partial bitstream is being loaded. In my block design, the IP block receives the AXI-Stream bus from the DMA. Since this is a unidirectional channel, there is no need to place a DFX decoupler on the slave interface of the IP block (the master interface of the DMA input IPs). Additionally, this design has already been implemented successfully on my PYNQ-Z2.


Hi @marioruiz @briansune ,
I finally solved the problem! Simply put, there is a bug in the PYNQ 2.7 source code:

First, let's review the error:

ValueError                                Traceback (most recent call last)
<ipython-input-29-3b813cdecb5c> in <module>
     18 decouple.write(DATA_OFFSET, 0x0000_0001)
     19 pb = Bitstream("add_pblock_partial.bit", None, True)
---> 20 pb.download()
     21 decouple.write(DATA_OFFSET, 0x0000_0000)
     22 

/usr/local/share/pynq-venv/lib/python3.8/site-packages/pynq/bitstream.py in download(self, parser)
    185 
    186         """
--> 187         self.device.download(self, parser)
    188 
    189     def remove_dtbo(self):

/usr/local/share/pynq-venv/lib/python3.8/site-packages/pynq/pl_server/embedded_device.py in download(self, bitstream, parser)
    576         if parser is None:
    577             from .xclbin_parser import XclBin
--> 578             parser = XclBin(DEFAULT_XCLBIN)
    579 
    580         if not bitstream.binfile_name:

/usr/local/share/pynq-venv/lib/python3.8/site-packages/pynq/pl_server/xclbin_parser.py in __init__(self, filename, xclbin_data)
    369     def __init__(self, filename="", xclbin_data=None):
    370         self.ip_dict, self.mem_dict, self.clock_dict = \
--> 371             _xclbin_to_dicts(filename, xclbin_data)
    372         self.gpio_dict = {}
    373         self.interrupt_controllers = {}

/usr/local/share/pynq-venv/lib/python3.8/site-packages/pynq/pl_server/xclbin_parser.py in _xclbin_to_dicts(filename, xclbin_data)
    276 def _xclbin_to_dicts(filename, xclbin_data=None):
    277     if xclbin_data is None:
--> 278          with open(filename, 'rb') as f:
    279              xclbin_data = bytearray(f.read())
    280     sections, xclbin_uuid = parse_xclbin_header(xclbin_data)

ValueError: embedded null byte

It turns out that there is an embedded null byte in the "filename", and I found that this filename is actually DEFAULT_XCLBIN.

Somehow, in PYNQ 2.7, the result of the following line of code is not handled correctly:

DEFAULT_XCLBIN = (Path(__file__).parent / 'default.xclbin').read_bytes()

Location:

/PYNQ-image_v2.7/pynq/pl_server/embedded_device.py
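To be precise, the line itself runs fine; the problem is how its result is then used. DEFAULT_XCLBIN holds the raw bytes of default.xclbin (not a path), yet embedded_device.py calls XclBin(DEFAULT_XCLBIN), which passes those bytes as the filename argument, so open() fails on the NUL characters in the data. A minimal sketch of the issue and the one-argument change that avoids it (this is my reading of the traceback, not an official PYNQ patch):

from pynq.pl_server.embedded_device import DEFAULT_XCLBIN
from pynq.pl_server.xclbin_parser import XclBin

# DEFAULT_XCLBIN is the *contents* of default.xclbin (bytes), not a file path.
# Passed positionally it is treated as a filename, and open() on bytes that
# contain NUL characters raises "ValueError: embedded null byte".
# parser = XclBin(DEFAULT_XCLBIN)              # what embedded_device.py does -> fails
parser = XclBin(xclbin_data=DEFAULT_XCLBIN)    # pass it as data instead -> parses correctly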

My solution is to use method overriding to rewrite these classes:

class DebugXclBin(XclBin):
class DebugBitstream(Bitstream):

Now my full_test code is:

from pynq import Overlay, Bitstream, allocate, GPIO, MMIO
import pynq.lib.dma
from pynq.lib.dma import *
import numpy as np

######### Method Overriding #########
from pynq.pl_server.embedded_device import EmbeddedDevice, DEFAULT_XCLBIN, _get_bitstream_handler
from pynq.pl_server.xclbin_parser import XclBin, _xclbin_to_dicts


class DebugXclBin(XclBin):
    def __init__(self, filename="", xclbin_data=None):
        super().__init__(filename, xclbin_data)
        self.xclbin_data = xclbin_data  

class DebugBitstream(Bitstream):
    def download(self, parser=None):
        if parser is None:
            bitfile_path = self.bitfile_name
            
            # Create the bitstream handler for this bitfile
            bitstream_handler = _get_bitstream_handler(bitfile_path)

            # Get the parser (called without the partial flag)
            parser = bitstream_handler.get_parser()
            if parser is None:
                parser = DebugXclBin(xclbin_data=DEFAULT_XCLBIN)
        
        if not hasattr(parser, 'xclbin_data'):
            raise AttributeError("Parser object has no attribute 'xclbin_data'")
        if parser.xclbin_data is None:
            raise ValueError("Parser object has xclbin_data set to None")
        
        super().download(parser)
        
######### Method Overriding #########

DATA_OFFSET = 0x0000 

overlay = Overlay("dpr.bit") 

decouple = MMIO(0xA000_0000, 0x1000)
decouple_status = MMIO(0xA001_0000, 0x1000)

dma_in1 = overlay.dma_in1
dma_in2 = overlay.dma_in2
dma_out = overlay.dma_out

decouple.write(DATA_OFFSET, 0x0000_0001)
pb = DebugBitstream("add_pblock_partial.bit", None, True)
pb.download()
decouple.write(DATA_OFFSET, 0x0000_0000)

in1_data = allocate(shape=(4,), dtype=np.uint32)
in2_data = allocate(shape=(4,), dtype=np.uint32)
out_data = allocate(shape=(4,), dtype=np.uint32)

in1_data[:] = np.array([2, 4, 6, 100], dtype=np.uint32)
in2_data[:] = np.array([1, 2, 3, 70], dtype=np.uint32)

dma_in1.sendchannel.transfer(in1_data)
dma_in2.sendchannel.transfer(in2_data)
dma_in1.sendchannel.wait()
dma_in2.sendchannel.wait()  
dma_out.recvchannel.transfer(out_data)
dma_out.recvchannel.wait()
print(out_data)

decouple.write(DATA_OFFSET, 0x0000_0001)
pb = DebugBitstream("sub_pblock_partial.bit", None, True)
pb.download()
decouple.write(DATA_OFFSET, 0x0000_0000)

in1_data = allocate(shape=(4,), dtype=np.uint32)
in2_data = allocate(shape=(4,), dtype=np.uint32)
out_data = allocate(shape=(4,), dtype=np.uint32)

in1_data[:] = np.array([2, 4, 6, 100], dtype=np.uint32)
in2_data[:] = np.array([1, 2, 3, 70], dtype=np.uint32)

dma_in1.sendchannel.transfer(in1_data)
dma_in2.sendchannel.transfer(in2_data)
dma_in1.sendchannel.wait()
dma_in2.sendchannel.wait()  
dma_out.recvchannel.transfer(out_data)
dma_out.recvchannel.wait()
print(out_data)

The output is:

[  3   6   9 170]
[ 1  2  3 30]

Problem solved!


@H.W

No, I don't think that using an AXI-Stream master means it is fine to build such an IP without decoupling.

During reconfiguration you have no idea how the flops and LUTs are defined.
Decoupling prevents all kinds of flop lock-out and unnecessary switching behavior.

I can only suggest you follow full decoupling rather than partial decoupling.

And it's good to hear you found a bug in the API.

ENJOY~

