Adding Reconfigurable Modules to Partitions in the Composable Pipeline Design

Hi all,

I am currently looking into adding an IP from the Vitis Vision library into one of the DFX regions of the composable overlay design and synthesising it using the Abstract Shell methodology. The Xilinx guides on how to do so are very clear and straightforward; however, I have some queries about how to adapt the procedure for the composable overlay design.

First, I understand I need to add a Reconfigurable Module (RM) using the DFX wizard. However, when I attempt to do so, I am greeted with the following message:

Is there something I need to do first to allow me to add new RM to the Reconfigurable Partitions?

Secondly, I am curious as to why the static IPs are all connected to the Axis Stream Switch by one main stream in and one stream out, whereas all of the RMs preloaded in the DFX regions of the design have two streams in and out, which pass through the DFX decoupler before reaching the Axis Stream Switch.
Is this just to allow for multi-stream functionality (e.g. as used by the dilate and erode IPs or by the Subtract IP), so that both streams don’t actually need to be connected? For example, if I were to add a simple rgb2bgr IP from the Vitis Vision library, could I just have one stream in and one out, like I did when swapping it into the static region?

@marioruiz I acknowledge this task hasn’t been tested by you guys (as you mentioned in Changing the IP in the Composable Pipeline - #6 by marioruiz) but I hope maybe you can answer these fundamentals around it. Or is this maybe more suited for the general Xilinx forums?

If I was to create a new RM with the rgb2bgr IP block can I simply wire it up directly to the Axi Slice and out again or would I need to add a fifo block for the second stream input like with the rgb2xyz IP:

Any guidance is appreciated! :slight_smile:
Thanks,
Cameron

@cking

Did you add a BDC (Block Design Container)?
In newer Vivado versions, a BDC is what you need to enable the partial block feature.

BTW, you need a decoupler between the PR block and the main AXI interface, or any other signal you are trying to use.

ENJOY~

I have dug a bit deeper into the docs, and it has led me down the path of the Abstract Shell flow. However, I have a few questions about how the composable overlay project is created from the makefile.

Does the makefile process write abstract shells for the Partial Reconfigurable regions (pr_0,pr_1,pr_2) after it has finished implementation?

I ask because I noticed these .dcp files in the impl_1 folder:

If I want to add a new filter into pr_0, am I right in saying that I create the .bd file as part of the BDC and carry out synthesis, then open the .bd in a separate Vivado project and add the checkpoint abs_shell_video_cp_i_composable_pr_0.dcp through the Tcl console?

From there, I plan to continue from step 4 of the Abstract Shell flow tutorial shown here: AMD Adaptive Computing Documentation Portal

Any help is appreciated,
Thanks!
Cameron :slight_smile:

@cking

Maybe you need a better understanding of how to create the PR block and of the PR flow:

It is very similar in newer versions of Vivado; the only difference is that the PR block uses a BDC.

ENJOY~

Hi,

From my understanding, the BDC flow for adding RMs to PR blocks, which you are describing, takes longer to generate the partial bitstream for a new RM because it implements the full static design together with the new RM.

I am investigating the abstract shell flow, which extracts only a small section of the static design (all the timing and spatial constraints required external to the PR block) and saves it as a checkpoint. Implementation for partial bitstream generation is then carried out against that checkpoint with the new RM, which greatly reduces implementation time.

@cking

Yes, that is correct. The tutorial also uses this method, and I think it is the only method that works in Vivado 2020 or earlier. Read more about the Tcl commands used in the tutorial.
I am not sure what has changed in newer Vivado versions, though.

One major thing is that you need a decoupler on each interface to the PR block.

ENJOY~

Hi Cameron,

No, the script does not write the abstract shell dcp explicitly; this is something you’ll have to do yourself. However, it is using the abstract shell methodology under the hood.
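For readers who want to write the abstract shell themselves, a minimal Vivado Tcl sketch might look like the following. It assumes the full design has already been routed. The checkpoint path is a placeholder, and the hierarchical cell name is only guessed from the abs_shell_video_cp_i_composable_pr_0.dcp file name mentioned earlier in this thread, so check both against your own build.

```tcl
# Assumption: path to the routed checkpoint of the full design (placeholder).
set routed_dcp ./path/to/routed_design.dcp
# Assumption: hierarchical name of the reconfigurable partition cell.
set rp_cell video_cp_i/composable/pr_0

# Open the fully routed design, then write an abstract shell for one RP.
# The resulting checkpoint keeps only the static context needed by that RP.
open_checkpoint $routed_dcp
write_abstract_shell -force -cell $rp_cell ./abs_shell_pr_0.dcp
```

One abstract shell is written per reconfigurable partition, so the last command would be repeated with a different cell and file name for each RP of interest.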

I think the questions are getting too advanced to be answered in this forum. The Xilinx forum may be more appropriate.

There’s a full tutorial about abstract shell here
https://xilinx.github.io/kria-apps-docs/dfx/build/html/docs/Kria_DFX_K26.html
https://xilinx.github.io/kria-apps-docs/dfx/build/html/docs/creation_of_new_RM.html

To answer your previous questions:

Is there something I need to do first to allow me to add new RM to the Reconfigurable Partitions?

In this design the RPs are already created and configured. I suggest you look at the DFX tutorial to see how to create them.

Secondly I am curious as to why the static IP’s are all connected by 1 main stream in and 1 stream out to the Axis Stream Switch, whereas all of the RM preloaded in the DFX regions of the design have 2 streams in and out to the dfx decoupler before then the Axis Stream switch?

This was a design decision so that fork and join IPs can be placed in any RP.

If I was to create a new RM with the rgb2bgr IP block can I simply wire it up directly to the Axi Slice and out again or would I need to add a fifo block for the second stream input like with the rgb2xyz IP:

Yes, the design you’re showing should work.
Mario


@cking

I think it is better to clarify my response a bit.
My response referred to the time needed to re-implement a new PR block when the design is updated.
Using a checkpoint via Tcl commands, as shown in the tutorial, is highly recommended to reduce design turnaround time.

However, for abstract shell dcp files, @marioruiz has more experience, and I would suggest learning from the links he proposed.

ENJOY~

Hi @marioruiz,

I am still attempting to add my own Reconfigurable Modules into the composable overlay Vivado design. I have mainly been following this guide: AMD Adaptive Computing Documentation Portal, but, as with most Xilinx guides, it covers a very basic use case and doesn’t cater for many RMs in different RPs.
I have been bounced around on the main Xilinx forums, but they cannot provide much guidance, so I’m back here looking for a bit of clarification on the limits of the design:

  1. Is there room in any/all of the pblocks for me to add my own logic (nothing too large; I am trying to add the rgb2xyz block design to pr_0 as well as pr_1), or do I need to edit the floorplan of the FPGA to extend the pblock and, with it, the reconfigurable partition?
    → Currently, when trying to implement the design, I get a place_design error, suggesting the pblock can’t handle the additional logic.
    → Would it be easier to delete current RMs and replace them with my own in the same RP?

  2. Do I need to create my own DFX region via a block design container and connect it to the DFX decoupler to be able to add more RMs?

  3. Bit of a stretch, but do you know about the DFX Wizard and which configuration runs I need to carry out to implement (and eventually generate a partial bitstream for) the new RM? That is, do I need a run with just the new RM, or the new RM in one RP and all the other possible RMs in the other RPs?

I know the main purpose of the support forums isn’t to help with modifying the design of the overlay, but I believe it is a great open source tool that, if easily customizable, could greatly accelerate many areas of FPGA development.
Hence any help is much appreciated!!

Thanks,
Cameron

Hi Cameron,

I’ll answer to the best of my knowledge, but these questions are out-of-scope for this forum.

This document is the best description I have found of what you’re looking for. Please check it carefully and look at the associated scripts. All the steps are there. I would suggest you reproduce these steps, as you’ll clearly see what is needed (starting from only an abstract shell dcp). Although the design is different, the steps will be the same.

  1. Yes, there should be room to fit this IP in any of the PR regions.
  2. No, with the abstract shell you can target the existing RP.
  3. This is covered in the document linked above. You need a different project to do this: define the interfaces, add IP cores, synthesize, link against the dcp, and finally generate the partial bitstream.
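The last steps of item 3 (link against the dcp, implement, generate the partial bitstream) can be sketched in non-project Vivado Tcl as below. This is only an illustration under stated assumptions: every path and name is a placeholder, the RM is assumed to have been synthesized out of context already, and project-based scripts may arrange the same steps differently.

```tcl
# All names and paths are placeholders (assumptions), not values from the overlay.
set abs_shell ./abs_shell_pr_0.dcp        ;# abstract shell for the target RP
set rm_dcp    ./my_rm_synth.dcp           ;# RM already synthesized out of context
set rp_cell   video_cp_i/composable/pr_0  ;# guessed hierarchical RP cell name

# Load the abstract shell and fill its black box with the new RM.
open_checkpoint $abs_shell
read_checkpoint -cell $rp_cell $rm_dcp

# Implement only the RM against the locked static context.
opt_design
place_design
route_design
write_checkpoint -force ./my_rm_routed.dcp

# From an abstract shell, only the partial bitstream for this cell is written.
write_bitstream -force -cell $rp_cell ./my_rm_partial.bit
```

Because the static context in the abstract shell is locked, a partial bitstream produced this way is only expected to work with the full bitstream from the same routed static design.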

Mario


Hi Mario,

OK, great. Thanks for clearing those few things up for me; that’s a big help. I will have a go at following that abstract shell example and attempt to do the same for the composable overlay.

I was reading through the bdc_dfx.tcl (24.8 KB) file to see how the RMs are instantiated within the RPs and how the implementation runs are set up to generate the partial bitstreams.
Out of curiosity, would it be possible (and if so would it be easier) for me to write my own tcl script emulating these commands to first build a module such as laid out here:

And then also create a config run and child implementation leading to bitstream generation such as is done again in bdc_dfx.tcl:

I’m not the most familiar with Tcl commands, but this seems like an easy approach if I want to continually add and trial new modules. Can this process be done easily within the Tcl console of the main project, or does this only work in the build flow?

Thanks again Mario,
Cameron

Hi Cameron,

Out of curiosity, would it be possible (and if so would it be easier) for me to write my own tcl script emulating these commands to first build a module such as laid out here:

You can certainly do this.

And then also create a config run and child implementation leading to bitstream generation such as is done again in bdc_dfx.tcl:

You can do this as well.

Can this process be done easily within the tcl console of the main project, or does this only work in the build flow?

Yes, you can execute these commands from the Tcl console in Vivado. But you should make sure that all the variables are defined beforehand.
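One way to check this before running a fragment of a larger script is a small guard with `info exists`, which is plain Tcl and works in the Vivado Tcl console. This is only a sketch: the two names in the list are taken from messages later in this thread as examples and should be replaced with whatever variables your own script reads.

```tcl
# Example names from this thread; replace with the variables your script reads.
set required_vars {pr_0_dilate_erode pr_0_rgb2xyz_fifo_viatcl}

foreach name $required_vars {
    # info exists takes a variable NAME; ::$name checks the global namespace.
    if {![info exists ::$name]} {
        puts "Missing variable: $name (run the script section that sets it first)"
    }
}
```

Running this first avoids discovering an undefined variable only when a long synthesis command fails partway through.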

Mario


Hi Mario,

Thanks for the feedback. I am currently giving it a go and will report back on the process if it works.

I have written my own tcl file from sections of the bdc_dfx.tcl file (uploaded here:
Adding_rgb2xyz_RM.tcl (4.3 KB))

I am currently struggling at the synthesis stage. I have removed all trace of the other RM from the script, but because of this the synthesis line:


leads to a Vivado error, because the variables for the other RMs in pr_0 are undefined:
can’t read “pr_0_dilate_erode”: no such variable

If I only synthesise the new pr_0_rgb2xyz_fifo_viatcl, it leads to the deletion of the sources from the pr_0 block, and I can’t set the pr_0_rgb2xyz_fifo_viatcl block as an active source.

Do I need to synthesise all the blocks or only focus on my new RM?

Thanks for the assistance,
Cameron

Hi Cameron,

Do I need to synthesise all the blocks or only focus on my new RM?

It really depends on what you want to do. But, I would suggest you add your new RM in the existing bdc_dfx.tcl file, so all the variables are defined.

You can only focus on a single RM if you’re basing the implementation on an abstract shell dcp.

Mario

Hi Mario,

I would like to experiment with new filter designs in one of the RPs. Ideally, I would like to build a template Tcl script that I can run within an already fully implemented design to add another RM to one of the RPs, then implement and generate the full and partial bitstreams required.

By adding my new module to the bdc_dfx.tcl script, are you suggesting that I run it from the makefile to process it? Or could I then run bdc_dfx.tcl from the Tcl console in the main project?

Current errors and BDC state, in case they are of any help:

Thanks,
Cameron

Hi Cameron,

By adding my new module to the bdc_dfx.tcl script, are you suggesting the procedure of running from the makefile to process it.

Yes

or could I then run the bdc_dfx.tcl from the tcl console in the main project?

I haven’t tried this flow myself. But, I assume if you try this, all changes made by bdc_dfx.tcl for the RP you’re working on will be overwritten. On top of this, previous partial bitstreams may not be usable anymore because the main bitstream can change.

What I can suggest is that you open a new Vivado instance and source this file https://github.com/mariodruiz/PYNQ_Composable_Pipeline/blob/v1.1.0-dev.2022.2/boards/KV260/cv_dfx_3_pr.tcl. Once the whole process finishes, you should have all the variables declared in the environment.

Hope this helps.

Mario

Hi Mario,

Ok Cheers, well I will try running from the makefile with my new module added into bdc_dfx.tcl to be able to say I have been able to add in my new module to the design.

With regard to opening a new Vivado instance and sourcing cv_dfx_3_pr, am I correct in guessing that this will essentially do the same as running the makefile?

Earlier you mentioned using a dcp from within a new Vivado instance. Could I write a tcl script following abstract shell flow in a new instance to generate a partial bitstream for a new module?

EDIT: Also just wondering Mario if I add my new module to the RP in the project and set it as the active synthesis source will it require a new full bitstream?

Yes, but only the Vivado project part. It won’t run the dependencies.

Earlier you mentioned using a dcp from within a new Vivado instance. Could I write a tcl script following abstract shell flow in a new instance to generate a partial bitstream for a new module?

Yes, this process is described in this tutorial Create a new accelerator RM — Kria SOM DFX Examples 1.0 documentation

EDIT: Also just wondering Mario if I add my new module to the RP in the project and set it as the active synthesis source will it require a new full bitstream?

No, it shouldn’t. But, I haven’t tried to do this once the full project is created.


Hi Mario,

An Update:
I have given up on trying to create a Tcl script to run in project mode (within the full generated project). It leads to a dead end where I can’t launch an implementation run through to bitstream generation for impl_1 (full static) and child_7_impl_1 (new RM in pr_0), as it tells me “write_bitstream needs reset - reset runs” (I’m assuming this refers to impl_1), but when I do so it resets impl_1 and all the child runs.

When relaunched after this, the implementation fails with an error due to a “ports mismatch - 296 ports are missing”. Am I effectively destroying the implemented design in an attempt to regenerate an image of it with my new module in? Is this why you suggest going to a different project?

I am attempting now to follow the guide you sent (adding a new RM to the dfx KV260 design) and carry out the steps but for the composable overlay. In this guide it references working from the abstract shell design checkpoints within that project. Do these correspond to those written as part of the impl_1 run stored in the build files of the composable overlay:


(meaning abs_shell_video_cp_i…)

Lastly, the DFX KV260 example guide creates an abstract shell from which the partial bitstream is generated. Do I not also need a new full static bitstream as well, so that the partial bitstream is recognised?

Thanks Mario for the continued assistance and great support!
Cameron

Hi Mario,

Disregard my last message. Thanks for the handy guide to the abstract shell flow; it was easy enough to follow and adapt to the composable overlay now that I am more familiar with reading Tcl scripts.

I’ll post a tutorial in the learn section for anyone else interested so they don’t have to struggle through these posts and so I can add in any deviations from the tutorial I had to make.

Thanks for the help!
Cameron
