Running PYNQ on AWS F1

AlphaApodis · May 17, 2022, 8:25pm

Hi,

I am attempting to run the PYNQ example notebooks (including FINN) on an AWS F1 instance. I followed the ‘Getting Started with PYNQ - Alveo Edition using AWS F1 Instances’ guide, and everything seems to install correctly.

When I run the notebooks, however, I get this error when I try to load the included overlays:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-f7bd789b747b> in <module>
      3 
      4 # program the device
----> 5 ol = pynq.Overlay("intro.xclbin")
      6 vadd = ol.vadd_1
      7 

/usr/local/lib/python3.6/site-packages/pynq/overlay.py in __init__(self, bitfile_name, dtbo, download, ignore_version, device)
    353 
    354         if download:
--> 355             self.download()
    356 
    357         self.__doc__ = _build_docstring(self._ip_map._description,

/usr/local/lib/python3.6/site-packages/pynq/overlay.py in download(self, dtbo)
    417                     Clocks.set_pl_clk(i)
    418 
--> 419         super().download(self.parser)
    420         if dtbo:
    421             super().insert_dtbo(dtbo)

/usr/local/lib/python3.6/site-packages/pynq/bitstream.py in download(self, parser)
    185 
    186         """
--> 187         self.device.download(self, parser)
    188 
    189     def remove_dtbo(self):

/usr/local/lib/python3.6/site-packages/pynq/pl_server/xrt_device.py in download(self, bitstream, parser)
    505                 self.contexts = old_contexts
    506                 raise RuntimeError("Programming Device failed: " +
--> 507                                    _format_xrt_error(err))
    508         finally:
    509             xrt.xclUnlockDevice(self.handle)

RuntimeError: Programming Device failed: EIO (5) Input/output error

The device check in the ‘Welcome to PYNQ for Alveo’ notebook gives this output:
0) xilinx_aws-vu9p-f1_shell-v04261818_201920_2

The PYNQ version I have is 2.6.2.
I have also tried downloading the notebooks with the interactive option, to make sure I was getting them for the correct shell version:
pynq get-notebooks --interactive
Detected shells:
0) - xilinx_aws-vu9p-f1_shell-v04261818_201920_2

But this also resulted in the same error.

I am using AWS F1 Developer kit version 1.4.18+
Tool Version supported 2020.2
Compatible FPGA Developer version v1.10.X (Xilinx Vivado/Vitis 2020.2)
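
For anyone reproducing this outside the notebooks, the device check and the overlay load can be scripted. This is a minimal sketch, not the notebook's own code: `format_devices`, `try_program` and `check_instance` are hypothetical helper names, and it assumes `pynq.Device.devices` and `pynq.Overlay` behave as they do in the traceback above.

```python
def format_devices(devices):
    """Format devices as 'index) name', the way the device check lists them."""
    return [f"{index}) {device.name}" for index, device in enumerate(devices)]


def try_program(xclbin_path, overlay_factory):
    """Try to load an xclbin; return (overlay, None) or (None, error message)."""
    try:
        return overlay_factory(xclbin_path), None
    except RuntimeError as err:
        # The traceback above shows pynq raising RuntimeError when XRT fails
        return None, str(err)


def check_instance(xclbin_path="intro.xclbin"):
    """Run both checks on a machine where pynq and XRT are installed."""
    import pynq  # imported lazily so the helpers above work without pynq

    print("\n".join(format_devices(pynq.Device.devices)))
    overlay, error = try_program(xclbin_path, pynq.Overlay)
    print("loaded" if error is None else f"failed: {error}")
    return overlay
```

Calling `check_instance()` on the F1 instance should print the detected shell and then either "loaded" or the same "Programming Device failed" message, which makes it easier to compare instances.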

marioruiz · May 18, 2022, 8:24am

Hi @AlphaApodis,

Have you sourced the AWS runtime tools?

Can you please try to run the steps described here and report back? pynq-compute-labs · PyPI

Mario

AlphaApodis · May 18, 2022, 2:11pm

Hi @marioruiz ,

Yes, I have sourced the AWS runtime tools. I followed the link and performed the steps, but first I removed my previous install so that it would be fresh.

I had no errors following the steps outlined. After I started jupyter lab I attempted to run FCCM-Lab 1-vadd.ipynb. The first error I got was:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-672ed9753c68> in <module>
----> 1 import pynq

~/anaconda3/lib/python3.7/site-packages/pynq/__init__.py in <module>
     45 from .buffer import allocate
     46 from .pl_server import Device
---> 47 from . import lib
     48 
     49 

~/anaconda3/lib/python3.7/site-packages/pynq/lib/__init__.py in <module>
     29 
     30 
---> 31 from .pynqmicroblaze import PynqMicroblaze
     32 from .pynqmicroblaze import MicroblazeRPC
     33 from .pynqmicroblaze import MicroblazeLibrary

~/anaconda3/lib/python3.7/site-packages/pynq/lib/pynqmicroblaze/__init__.py in <module>
     42 try:
     43     __IPYTHON__
---> 44     from .magic import MicroblazeMagics
     45 except NameError:
     46     pass

~/anaconda3/lib/python3.7/site-packages/pynq/lib/pynqmicroblaze/magic.py in <module>
    102     get_ipython().register_magics(MicroblazeMagics)
    103     display_javascript(js, raw=True)
--> 104     import nest_asyncio
    105     nest_asyncio.apply()
    106 

ModuleNotFoundError: No module named 'nest_asyncio'

So I installed nest_asyncio with ‘pip install nest_asyncio’. I had to do this before when I tried to run the introduction notebook.

I then tried to run ‘import pynq’ in 1-vadd again and got this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-3-672ed9753c68> in <module>
----> 1 import pynq

~/anaconda3/lib/python3.7/site-packages/pynq/__init__.py in <module>
     45 from .buffer import allocate
     46 from .pl_server import Device
---> 47 from . import lib
     48 
     49 

~/anaconda3/lib/python3.7/site-packages/pynq/lib/__init__.py in <module>
     73 from .logictools import FSMGenerator
     74 
---> 75 from . import video
     76 from . import audio
     77 from . import dma

~/anaconda3/lib/python3.7/site-packages/pynq/lib/video/__init__.py in <module>
     38 from . import clocks
     39 
---> 40 if pynq.ps.CPU_ARCH == pynq.ps.ZYNQ_ARCH:
     41     from . import dvi
     42 elif pynq.ps.CPU_ARCH == pynq.ps.ZU_ARCH:

AttributeError: module 'pynq' has no attribute 'ps'

After restarting Jupyter I was able to run ‘import pynq’ from 1-vadd, but ‘ol = pynq.Overlay("intro.xclbin")’ gave me the same error as in the original post.
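
The missing-module error can be caught before `import pynq` is attempted. This is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper, and the list contains only the module named in the traceback above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# nest_asyncio is the module the ModuleNotFoundError reported
required = ["nest_asyncio"]
missing = missing_modules(required)
if missing:
    print("Install before importing pynq: pip install " + " ".join(missing))
else:
    print("All required modules are importable.")
```

Running the check in a fresh kernel avoids retrying the import in a session where the first attempt failed partway, which may be what the restart fixed here.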

I ran the Resources.ipynb to get the examples and also got the same error from 1-vector-addition.ipynb.

The device name that I get from welcome-to-pynq.ipynb is xilinx_aws-vu9p-f1_shell-v04261818_201920_2 .

This may be some relevant info from dmesg

[ 5700.333028] xocl 0000:00:1d.0:  ffff8ffbfaf08098 xocl_axlf_section_header: saw section header: 0
[ 5700.340642] xocl 0000:00:1d.0:  ffff8ffbfaf08098 xocl_axlf_section_header: saw section header: 6
[ 5700.347886] xocl 0000:00:1d.0:  ffff8ffbfaf08098 xocl_axlf_section_header: saw section header: 8
[ 5700.355050] xocl 0000:00:1d.0:  ffff8ffbfaf08098 xocl_axlf_section_header: saw section header: 7
[ 5700.362152] xocl 0000:00:1d.0:  ffff8ffbfaf08098 xocl_axlf_section_header: saw section header: 11
[ 5700.369342] xocl 0000:00:1d.0:  ffff8ffbfaf08098 xocl_axlf_section_header: saw section header: 14
[ 5700.376379] xocl 0000:00:1d.0:  ffff8ffbfaf08098 xocl_axlf_section_header: saw section header: 2
[ 5700.385758] xocl 0000:00:1d.0:  ffff8ffbfaf08098 xocl_axlf_section_header: skip section header 20
[ 5700.392423] [drm] Finding MEM_TOPOLOGY section header
[ 5700.396480] [drm] Section MEM_TOPOLOGY details:
[ 5700.400546] [drm]   offset = 0x2f8
[ 5700.402060] [drm]   size = 0x120
[ 5700.405088] xocl 0000:00:1d.0: icap.u.22020096 ffff8ffbffd78c10 icap_read_from_peer: reading from peer
[ 5700.412893] xocl 0000:00:1d.0: mailbox.u.9437184 ffff8ffbfc378810 mailbox_request: sending request: 10 via SW
[ 5700.420688] xocl 0000:00:1d.0: mailbox.u.9437184 ffff8ffbfc378810 mailbox_read: Software TX msg is too big
[ 5700.447980] xocl 0000:00:1d.0:  ffff8ffbfaf08098 xocl_cleanup_mem_nolock: Taking down DDR : 0
[ 5700.452966] xocl 0000:00:1d.0: icap.u.22020096 ffff8ffbffd78c10 __icap_download_bitstream_axlf: incoming xclbin: ad6945c3-8a91-4275-b2da-9d6770d591d9
on device xclbin: 00000000-0000-0000-0000-000000000000
[ 5700.474050] xocl 0000:00:1d.0: icap.u.22020096 ffff8ffbffd78c10 icap_cache_bitstream_axlf_section: found kind 6(MEM_TOPOLOGY)
[ 5700.485113] xocl 0000:00:1d.0: mailbox.u.9437184 ffff8ffbfc378810 mailbox_request: sending request: 8 via SW
[ 5700.496277] xocl 0000:00:1d.0: mailbox.u.9437184 ffff8ffbfc378810 mailbox_read: Software TX msg is too big
[ 5702.101866] xocl 0000:00:1d.0: icap.u.22020096 ffff8ffbffd78c10 __icap_peer_xclbin_download: peer xclbin download err: -5
[ 5702.113981] xocl 0000:00:1d.0: icap.u.22020096 ffff8ffbffd78c10 icap_download_bitstream_axlf: err: -5
[ 5702.124264] xocl 0000:00:1d.0:  ffff8ffbfaf08098 exec_reset: exec_reset(1) cfg(0)
[ 5702.132727] xocl 0000:00:1d.0:  ffff8ffbfaf08098 exec_reset: exec_reset resets
[ 5702.138358] xocl 0000:00:1d.0:  ffff8ffbfaf08098 exec_reset: exec->xclbin(00000000-0000-0000-0000-000000000000),xclbin(00000000-0000-0000-0000-000000000000)
[ 5702.146356] xocl 0000:00:1d.0:  ffff8ffbfaf08098 xocl_init_mem: Topology count = 7, data_length = 280
[ 5702.151744] xocl 0000:00:1d.0: p2p.u.10485760 ffff8ffbffd7a010 p2p_mem_init: already initialized
[ 5702.160280] xocl 0000:00:1d.0:  ffff8ffbfaf08098 xocl_read_axlf_helper: Failed to download xclbin, err: -5

marioruiz · May 18, 2022, 2:17pm

What XRT version are you using?

Are you able to validate the instance with other xclbin?

AlphaApodis · May 18, 2022, 2:53pm

XRT version info

      XRT Build Version: 2.8.0
    Build Version Branch: 2020.2
      Build Version Hash: 77d5484b5c4daa691a7f78235053fb036829b1e9
 Build Version Hash Date: Fri, 13 Nov 2020 07:51:30 -0800
      Build Version Date: Mon, 14 Dec 2020 02:59:03 +0000
                    XOCL: 2.8.0,77d5484b5c4daa691a7f78235053fb036829b1e9
                 XCLMGMT: unknown

I ran an AWS tutorial a few months ago that ran some simple xclbins.

marioruiz · May 18, 2022, 3:18pm

Can you try to program the xclbin directly with XRT?

I do not recall the exact syntax; it may be something like this:

xbutil program -r -p <one of the xclbin files>

Any chance that you can move the instance to an AMI like 1.11.x? We have been using 1.11 and did not run into any problems.

Mario

AlphaApodis · May 18, 2022, 3:56pm

Using xbutil to program resulted in:

xbutil program -p intro.xclbin 
INFO: Found total 1 card(s), 1 are usable
XRT build version: 2.8.0
Build hash: 77d5484b5c4daa691a7f78235053fb036829b1e9
Build date: 2020-12-14 02:59:03
Git branch: 2020.2
PID: 11314
UID: 1000
[Wed May 18 15:52:48 2022 GMT]
HOST: ip-172-31-10-96.us-gov-west-1.compute.internal
EXE: /opt/xilinx/xrt/bin/unwrapped/xbutil
[XRT] ERROR: See dmesg log for details. err=-5
ERROR: xbutil program failed.

dmesg output:

[11991.950247] xocl 0000:00:1d.0: ffff8ffbfaf08098 _xocl_drvinst_open: OPEN 2
[11991.954769] [drm] creating scheduler client for pid(11693), ret: 0
[11991.971713] xocl 0000:00:1d.0: icap.u.22020096 ffff8ffbffd78c10 icap_read_from_peer: reading from peer
[11991.977667] xocl 0000:00:1d.0: mailbox.u.9437184 ffff8ffbfc378810 mailbox_request: sending request: 10 via SW
[11991.984299] xocl 0000:00:1d.0: mailbox.u.9437184 ffff8ffbfc378810 mailbox_read: Software TX msg is too big
[11992.007948] xocl 0000:00:1d.0: icap.u.22020096 ffff8ffbffd78c10 icap_cached_ocl_frequency: no cached data for 3
[11992.015435] xocl 0000:00:1d.0: ffff8ffbfaf08098 xocl_axlf_section_header: trying to find section header for axlf section 20
[11992.021863] xocl 0000:00:1d.0: ffff8ffbfaf08098 xocl_axlf_section_header: saw section header: 0
[11992.027135] xocl 0000:00:1d.0: ffff8ffbfaf08098 xocl_axlf_section_header: saw section header: 6
[11992.036130] xocl 0000:00:1d.0: ffff8ffbfaf08098 xocl_axlf_section_header: saw section header: 8
[11992.045033] xocl 0000:00:1d.0: ffff8ffbfaf08098 xocl_axlf_section_header: saw section header: 7
[11992.053246] xocl 0000:00:1d.0: ffff8ffbfaf08098 xocl_axlf_section_header: saw section header: 11
[11992.060739] xocl 0000:00:1d.0: ffff8ffbfaf08098 xocl_axlf_section_header: saw section header: 14
[11992.067429] xocl 0000:00:1d.0: ffff8ffbfaf08098 xocl_axlf_section_header: saw section header: 2
[11992.076497] xocl 0000:00:1d.0: ffff8ffbfaf08098 xocl_axlf_section_header: skip section header 20
[11992.085635] [drm] Finding MEM_TOPOLOGY section header
[11992.090505] [drm] Section MEM_TOPOLOGY details:
[11992.095667] [drm] offset = 0x2f8
[11992.097667] [drm] size = 0x120
[11992.101566] xocl 0000:00:1d.0: ffff8ffbfaf08098 xocl_cleanup_mem_nolock: Taking down DDR : 0
[11992.110210] xocl 0000:00:1d.0: icap.u.22020096 ffff8ffbffd78c10 __icap_download_bitstream_axlf: incoming xclbin: ad6945c3-8a91-4275-b2da-9d6770d591d9
on device xclbin: 00000000-0000-0000-0000-000000000000
[11992.128549] xocl 0000:00:1d.0: icap.u.22020096 ffff8ffbffd78c10 icap_cache_bitstream_axlf_section: found kind 6(MEM_TOPOLOGY)
[11992.139104] xocl 0000:00:1d.0: mailbox.u.9437184 ffff8ffbfc378810 mailbox_request: sending request: 8 via SW
[11992.148504] xocl 0000:00:1d.0: mailbox.u.9437184 ffff8ffbfc378810 mailbox_read: Software TX msg is too big
[11993.497905] xocl 0000:00:1d.0: icap.u.22020096 ffff8ffbffd78c10 __icap_peer_xclbin_download: peer xclbin download err: -5
[11993.505053] xocl 0000:00:1d.0: icap.u.22020096 ffff8ffbffd78c10 icap_download_bitstream_axlf: err: -5
[11993.510880] xocl 0000:00:1d.0: ffff8ffbfaf08098 exec_reset: exec_reset(1) cfg(0)
[11993.515788] xocl 0000:00:1d.0: ffff8ffbfaf08098 exec_reset: exec_reset resets
[11993.520438] xocl 0000:00:1d.0: ffff8ffbfaf08098 exec_reset: exec->xclbin(00000000-0000-0000-0000-000000000000),xclbin(00000000-0000-0000-0000-000000000000)
[11993.529316] xocl 0000:00:1d.0: ffff8ffbfaf08098 xocl_init_mem: Topology count = 7, data_length = 280
[11993.535236] xocl 0000:00:1d.0: p2p.u.10485760 ffff8ffbffd7a010 p2p_mem_init: already initialized
[11993.540733] xocl 0000:00:1d.0: ffff8ffbfaf08098 xocl_read_axlf_helper: Failed to download xclbin, err: -5
[11993.551334] [drm] client exits pid(11693)
[11993.554264] xocl 0000:00:1d.0: ffff8ffbfaf08098 xocl_drvinst_close: CLOSE 3


I requested a 1.11.x AMI, but it will take a day or so to get set up.

marioruiz · May 18, 2022, 4:29pm

If XRT can’t download the xclbin, PYNQ won’t either.
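
Because the failure sits below PYNQ, the kernel log is the useful place to look. The sketch below pulls out the xocl lines that end in an error code and translates the code into its errno name. `xocl_errors` is a hypothetical helper, and the pattern is based only on the dmesg lines pasted in this thread, so other XRT versions may format them differently.

```python
import errno
import os
import re

# Matches driver lines such as "... icap_download_bitstream_axlf: err: -5"
ERR_PATTERN = re.compile(r"\bxocl\b.*\berr: (-\d+)\s*$")


def xocl_errors(dmesg_text):
    """Return (line, errno name, description) for each xocl line ending in 'err: -N'."""
    results = []
    for line in dmesg_text.splitlines():
        match = ERR_PATTERN.search(line)
        if match:
            code = -int(match.group(1))  # the kernel logs negative errno values
            name = errno.errorcode.get(code, "UNKNOWN")
            results.append((line.strip(), name, os.strerror(code)))
    return results
```

The input can be the text of `dmesg` saved to a file or captured with `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`; reading the kernel log may require elevated privileges on some systems.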

Do you have any xclbin that works in that instance?

AlphaApodis · May 18, 2022, 5:43pm

I am working on rebuilding some examples. Do you know of any that would be good to test?

marioruiz · May 18, 2022, 6:50pm

You can try with any of the pre-built designs here: Vision Application using Vitis Accelerated Libraries | XUP Vitis Tutorial

AlphaApodis · May 18, 2022, 8:35pm

I will try that and the AMI version you recommended and follow up.

AlphaApodis · May 23, 2022, 5:32pm

Hi @marioruiz ,

I set up the 1.11.x AMI on commercial AWS, and I was able to follow the steps almost exactly; the first notebook loaded the xclbin.

Unfortunately, after setting up the 1.11.x AMI on AWS GovCloud and following the same steps, the xclbin would not load. Following the XUP tutorial you shared also errored out when trying to download the xclbin:

[13016.990548] xocl 0000:00:1d.0: icap.u.23068672 ffff8aa65bcf9c10 icap_download_bitstream_axlf: err: -5
[13016.990556] xocl 0000:00:1d.0:  ffff8aa644f04098 xocl_init_mem: Topology count = 7, data_length = 280
[13016.990562] xocl 0000:00:1d.0: p2p.u.11534336 ffff8aa65bcfa810 p2p_mem_init: already initialized
[13016.990570] xocl 0000:00:1d.0:  ffff8aa644f04098 xocl_read_axlf_helper: Failed to download xclbin, err: -5
[13017.002155] xocl 0000:00:1d.0:  ffff8aa644f04098 xocl_destroy_client: client exits pid(6052)
[13017.002161] xocl 0000:00:1d.0:  ffff8aa644f04098 xocl_drvinst_close: CLOSE 4

This was for the hardware example. The sw_emu part ran correctly.

Is there a way the PYNQ team could reach out to the AWS support team to troubleshoot PYNQ functionality on GovCloud?

marioruiz · May 23, 2022, 5:47pm

Hi @AlphaApodis,

I think this may be a permissions issue with the awsxclbin file in GovCloud. All of these xclbins are hosted in North Virginia, so if you are in a different region or a different group you may not be able to access these files.

You can try to regenerate the designs and generate the awsxclbin file yourself. This should work.
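
To see which AFI a prebuilt awsxclbin refers to, one option is to search the file for the ID. The sketch below rests on an assumption that is not confirmed anywhere in this thread: that the awsxclbin stores the AFI identifier (afi-... or agfi-...) as plain ASCII text. The helper names are hypothetical.

```python
import re

# Assumption: identifiers appear in the file as plain ASCII, e.g. "agfi-<hex digits>"
AFI_ID_PATTERN = re.compile(rb"(?:agfi|afi)-[0-9a-f]+")


def find_afi_ids(data):
    """Return the distinct afi-/agfi- identifiers found in raw bytes, in order."""
    seen = []
    for match in AFI_ID_PATTERN.finditer(data):
        identifier = match.group().decode("ascii")
        if identifier not in seen:
            seen.append(identifier)
    return seen


def find_afi_ids_in_file(path):
    """Read a file in binary mode and scan it for AFI identifiers."""
    with open(path, "rb") as handle:
        return find_afi_ids(handle.read())
```

If `find_afi_ids_in_file("intro.awsxclbin")` (file name assumed) returns nothing, the assumption about the file layout does not hold and the IDs have to come from the repository that ships the design.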

Mario

marioruiz · May 23, 2022, 5:57pm

You can try to check the status of the AFIs with this command:

aws ec2 describe-fpga-images --fpga-image-ids <AFI_ID>

For instance,

aws ec2 describe-fpga-images --fpga-image-ids afi-0609503124177e0ce

The AFI IDs are included in the solution folder of the xup_compute_acceleration repo.

There are some instructions for checking the status here: xup_compute_acceleration/Creating_AFI.md at master · Xilinx/xup_compute_acceleration · GitHub

As I mentioned in the previous answer, it is very likely that you cannot access these files from GovCloud.
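
The same check can be scripted so that several AFI IDs are tested in one go. This is a minimal sketch that shells out to the AWS CLI command above and reads its JSON output; `summarize_fpga_images` and `describe_afi` are hypothetical helper names, and it assumes the CLI is installed and configured with credentials.

```python
import json
import subprocess


def summarize_fpga_images(response_json):
    """Map each AFI ID in a describe-fpga-images response to its state code."""
    response = json.loads(response_json)
    return {
        image["FpgaImageId"]: image.get("State", {}).get("Code", "unknown")
        for image in response.get("FpgaImages", [])
    }


def describe_afi(afi_id, region=None):
    """Run `aws ec2 describe-fpga-images` for one AFI and summarize the result."""
    command = ["aws", "ec2", "describe-fpga-images",
               "--fpga-image-ids", afi_id, "--output", "json"]
    if region:
        command += ["--region", region]
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # The CLI reports the reason (for example, an unknown ID) on stderr
        return {"error": result.stderr.strip()}
    return summarize_fpga_images(result.stdout)
```

For example, `describe_afi("afi-0609503124177e0ce", region="us-gov-west-1")` would run the query against the region shown in the xbutil host name earlier in the thread. An error or an empty result there would be consistent with the AFI not being visible from that region or account.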

Mario

AlphaApodis · May 23, 2022, 6:52pm

It doesn’t look like I can. How do we transfer the AFIs to the GovCloud region? I had this issue a few months ago with the DPU, and Xilinx had to reach out to AWS to correct it: the Vitis AI AMI was available in GovCloud but was unusable because the DPU wasn’t there.

AlphaApodis · May 24, 2022, 6:27pm

I tried a couple of times to regenerate the designs but implementation fails every time. This is the end of the log:

A possible reason is high utilization of BRAMs, DSPs, URAMs, or RPMs. Please check user constraints to make sure design is not over-utilized in the constraint areas (if any) keeping in mind some macros require a number of consecutively available sites
ERROR: DSP utilization in constrained region pblock_dynamic_SLR1 is greater than it's capacity. Detected utilization = 902.857 percent.

Resolution: For technical support on this issue, please visit http://www.xilinx.com/support.
Phase 2.5 Global Placement Core | Checksum: 1c04132c5

Time (s): cpu = 00:58:54 ; elapsed = 00:34:00 . Memory (MB): peak = 16036.148 ; gain = 3653.164 ; free physical = 98222 ; free virtual = 115376
Phase 2 Global Placement | Checksum: 1c04132c5

Time (s): cpu = 00:58:58 ; elapsed = 00:34:04 . Memory (MB): peak = 16036.148 ; gain = 3653.164 ; free physical = 98387 ; free virtual = 115541
ERROR: [Place 30-99] Placer failed with error: 'Exit after global placer'
Please review all ERROR, CRITICAL WARNING, and WARNING messages during placement to understand the cause for failure.
Ending Placer Task | Checksum: aa718fd7

Time (s): cpu = 00:59:48 ; elapsed = 00:34:55 . Memory (MB): peak = 16036.148 ; gain = 3653.164 ; free physical = 98814 ; free virtual = 115969
INFO: [Common 17-83] Releasing license: Implementation
206 Infos, 373 Warnings, 151 Critical Warnings and 4 Errors encountered.
place_design failed
ERROR: [Common 17-69] Command failed: Placer could not place all instances
INFO: [Common 17-206] Exiting Vivado at Tue May 24 15:56:32 2022...
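
With hundreds of warnings in a log like this, it helps to pull out only the errors and the over-utilized regions. The sketch below is based solely on the message wording in the log above, so other Vivado versions may need different patterns; `error_messages` and `over_utilized_regions` are hypothetical helper names.

```python
import re

# "ERROR: [Place 30-99] text" or "ERROR: text" (the message ID is optional)
SEVERITY_PATTERN = re.compile(
    r"^(ERROR|CRITICAL WARNING):\s*(?:\[([^\]]+)\]\s*)?(.*)$"
)
# "DSP utilization in constrained region <pblock> ... utilization = <n> percent"
UTILIZATION_PATTERN = re.compile(
    r"(\w+) utilization in constrained region (\S+) .*?utilization = ([\d.]+) percent"
)


def error_messages(log_text):
    """Return (severity, message ID or None, text) for each error-level line."""
    messages = []
    for line in log_text.splitlines():
        match = SEVERITY_PATTERN.match(line.strip())
        if match:
            messages.append(match.groups())
    return messages


def over_utilized_regions(log_text):
    """Return (resource, region, percent) for each reported over-utilization."""
    return [
        (resource, region, float(percent))
        for resource, region, percent in UTILIZATION_PATTERN.findall(log_text)
    ]
```

Applied to the log above, the second helper reports the DSP over-utilization in pblock_dynamic_SLR1, which points at the floorplan constraints rather than at the design as a whole.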

AlphaApodis · May 26, 2022, 3:31pm

I did get the design to build successfully but I had to remove the slr_floorplan.xdc. It looks like there is an issue between using an Alveo and AWS F1 as build targets for the xclbin for the XUP Vision Application using Vitis Accelerated Libraries example.

Is there a different set of constraints for the F1? If I do manage to build and create an AFI for this example, how do I go about rebuilding intro.xclbin for use on GovCloud? I don’t think the source is provided.

cathalmccabe · May 26, 2022, 5:09pm

The shell layouts of the Alveo U200 and AWS F1 are quite different, which looks like the cause of the problems you saw.

Which example(s) were you trying to rebuild? You shouldn’t need an xdc, or at least you don’t need one with floorplan constraints. If you are rebuilding Vitis designs, I would suggest you start with a new Vitis project and only use the software sources. You may need to check the memory mapping too, depending on the design.

Cathal

AlphaApodis · May 26, 2022, 5:28pm

I opened up the linked Vivado project to make the changes to slr_floorplan.xdc, removing the pblock constraints.

Currently, I am looking here to see if I can figure out how to regenerate xclbins for AWS F1:

But Mario referred me to Vision Application using Vitis Accelerated Libraries | XUP Vitis Tutorial

I cannot regenerate the xclbin for that example; I get the error message about DSP utilization in pblock SLR1 that I shared above.

Ultimately, I am trying to build and run xclbins for PYNQ on the GovCloud side.

AlphaApodis · May 29, 2022, 4:07pm

I was able to rebuild the xclbin file from this example:

I was also able to run create afi successfully.

Are the sources available to recreate the PYNQ intro.xclbin and the other getting-started examples?
