From 8c9ac9f840bce71c50a36f773bec5abc2a21d01c Mon Sep 17 00:00:00 2001 From: Deep Patel Date: Wed, 21 Dec 2016 15:08:02 -0600 Subject: [PATCH 01/29] Converting fiels to unix line endings --- FAQs.md | 230 +++---- hdk/LICENSE.txt | 322 ++++----- hdk/cl/examples/README.md | 420 ++++++------ hdk/cl/examples/cl_hello_world/README.md | 44 +- hdk/docs/AWS_Shell_Interface_Specification.md | 632 +++++++++--------- sdk/LICENSE.txt | 164 ++--- 6 files changed, 906 insertions(+), 906 deletions(-) diff --git a/FAQs.md b/FAQs.md index 8c5ede101..ebb3af740 100644 --- a/FAQs.md +++ b/FAQs.md @@ -1,115 +1,115 @@ -**Frequently Asked Questions** - -**What do I need to get started on building accelerators for FPGA -instances?** - -Getting started requires downloading the latest HDK and SDK from the AWS -FPGA GitHub repository. The HDK and SDK provide the needed code and -information for building FPGA code. The HDK provides all the information -needed on building source code for use within the FPGA. The SDK provides -all the information needed on building software for managing FPGAs on an -F1 instance. - -FPGA code requires a simulator to test code and a Vivado tool set for -synthesis of source code into compiled FPGA code. The FPGA Developer AMI -includes the Xilinx Vivado tools for simulation and synthesis of -compiled FPGA code. - -**How do I develop accelerator code for an FPGA in an F1 instance?** - -Start with the Shell interface specification: -AWS\_Shell\_Interface\_Specification.md. This document describes the -interface between Custom Logic and the AWS Shell. All Custom Logic for -an accelerator resides within the Custom Logic region of the F1 FPGA. - -**What are the major areas of the GitHub repository?** - -The HDK side of the GitHub repository contains the AWS Shell code, Build -scripts, Documentation, and Examples. Shell code is contained in -aws-fpga/hdk/common. Build scripts are in -aws-fpga/hdk/common/shell\_current/build. Documentation is in -aws-fpga/hdk/docs. Custom Logic examples are in aws-fpga/hdk/cl. - -The SDK side of the GitHub repository contains the FPGA Management -Tools, a preview of the AWS CLI for F1, and software for Xilinx XDMA and -SDAccell. The FPGA Management Tools are for loading/clearing AFIs and -getting status of the FPGAs mapped to an instance. FPGA Management Tools -are in aws-fpga/sdk/management. The AWS CLI preview is in -aws-fpga/sdk/aws-cli-preview. - -**What is included in the HDK?** - -The HDK includes documentation for the Shell interface and other Custom -Logic implementation guidelines, the Shell code needed for Custom Logic -development, simulation models for the Shell, software for exercising -the Custom Logic examples, a getting started guide for Custom Logic, and -examples for starting a Custom Logic Design. - -**What is in the AWS Shell?** - -The AWS Shell includes the PCIe interface for the FPGA, a single DDR -interface, and necessary FPGA management functionality. Also provided as -part of the Shell code, but implemented within the Custom Logic region -of the FPGA are three DDR interfaces. These interfaces are provided for -implementation within the Custom Logic region to provide maximum -efficiency for the developer. - -**Are there examples for getting started on accelerators?** - -Yes, examples are in the aws-fpga/hdk/cl/examples directory. The -cl\_hello\_world example is a simple example to build and test the CL -development process. The cl\_simple example provides an expanded example -for testing access to the DDR interfaces. 
- -**How do I get access to the Developer AMI?** - -Start with an AWS account and request access to the Developer AMI in AWS -Marketplace. Currently, the FPGA Developer AMI is private. You will -receive permission on the AWS account you submitted for access to the -FPGA Developer AMI. The AMI can be launched directly from AWS -Marketplace on any EC2 instance. See the FPGA Developer AMI README for -more details. - -**What is an AFI?** - -An AFI stands for Amazon FPGA Image. That is the compiled FPGA code that -is loaded into an FPGA for performing the Custom Logic function created -by the developer. AFIs are maintained by AWS according to the AWS -account that created them. An AFI ID is used to reference a particular -AFI from an F1 instance. The AFI ID is used to indicate the AFI that -should be loaded into a specific FPGA within the instance. - -**What is the process for creating an AFI?** - -The AFI process starts by creating Custom Logic code that conforms to -the Shell Specification. Then, the Custom Logic must be compiled using -the Vivado tools to create a Design Checkpoint. That Design Checkpoint -is submitted to AWS for generating an AFI using the API. - -See aws-fpga/hdk/cl and aws-fpga/hdk/cl/examples for more detailed -information. - -**Is there any software I need on my instance?** - -The required AWS software is the FPGA Management Tool set found in the -SDK directory. This software manages loading and clearing AFIs for FPGAs -in the instance. It also allows developers to retrieve status on the -FPGAs from within the instance. See the README in aws-fpga/sdk for more -details. - -**Why do I see error “vivado not found” while running hdk\_setup.sh** - -This is an indication that Xilinx vivado tool set are not installed. Try -installing the tool, or alternative use AWS FPGA Development AMI -available on AWS Marketplace, which comes with pre-installed Vivado -toolset and license - -**Do AWS Marketplace customers see FPGA source code or a bitstream?** - -Neither: AWS Marketplace customers that pick up an AMI with with one our -more AFIs associated with it will not see any source code nor bitstream. -Marketplace customers actually have permission to use the AFI but not -permission to see its code. The only reference to the AFI is through the -AFI ID. The Customer would call fpga-local-load-image with the correct -AFI ID for that Marketplace offering, which will result in AWS loading -the AFI into the FPGA. No FPGA internal design code is exposed. +**Frequently Asked Questions** + +**What do I need to get started on building accelerators for FPGA +instances?** + +Getting started requires downloading the latest HDK and SDK from the AWS +FPGA GitHub repository. The HDK and SDK provide the needed code and +information for building FPGA code. The HDK provides all the information +needed on building source code for use within the FPGA. The SDK provides +all the information needed on building software for managing FPGAs on an +F1 instance. + +FPGA code requires a simulator to test code and a Vivado tool set for +synthesis of source code into compiled FPGA code. The FPGA Developer AMI +includes the Xilinx Vivado tools for simulation and synthesis of +compiled FPGA code. + +**How do I develop accelerator code for an FPGA in an F1 instance?** + +Start with the Shell interface specification: +AWS\_Shell\_Interface\_Specification.md. This document describes the +interface between Custom Logic and the AWS Shell. 
All Custom Logic for +an accelerator resides within the Custom Logic region of the F1 FPGA. + +**What are the major areas of the GitHub repository?** + +The HDK side of the GitHub repository contains the AWS Shell code, Build +scripts, Documentation, and Examples. Shell code is contained in +aws-fpga/hdk/common. Build scripts are in +aws-fpga/hdk/common/shell\_current/build. Documentation is in +aws-fpga/hdk/docs. Custom Logic examples are in aws-fpga/hdk/cl. + +The SDK side of the GitHub repository contains the FPGA Management +Tools, a preview of the AWS CLI for F1, and software for Xilinx XDMA and +SDAccell. The FPGA Management Tools are for loading/clearing AFIs and +getting status of the FPGAs mapped to an instance. FPGA Management Tools +are in aws-fpga/sdk/management. The AWS CLI preview is in +aws-fpga/sdk/aws-cli-preview. + +**What is included in the HDK?** + +The HDK includes documentation for the Shell interface and other Custom +Logic implementation guidelines, the Shell code needed for Custom Logic +development, simulation models for the Shell, software for exercising +the Custom Logic examples, a getting started guide for Custom Logic, and +examples for starting a Custom Logic Design. + +**What is in the AWS Shell?** + +The AWS Shell includes the PCIe interface for the FPGA, a single DDR +interface, and necessary FPGA management functionality. Also provided as +part of the Shell code, but implemented within the Custom Logic region +of the FPGA are three DDR interfaces. These interfaces are provided for +implementation within the Custom Logic region to provide maximum +efficiency for the developer. + +**Are there examples for getting started on accelerators?** + +Yes, examples are in the aws-fpga/hdk/cl/examples directory. The +cl\_hello\_world example is a simple example to build and test the CL +development process. The cl\_simple example provides an expanded example +for testing access to the DDR interfaces. + +**How do I get access to the Developer AMI?** + +Start with an AWS account and request access to the Developer AMI in AWS +Marketplace. Currently, the FPGA Developer AMI is private. You will +receive permission on the AWS account you submitted for access to the +FPGA Developer AMI. The AMI can be launched directly from AWS +Marketplace on any EC2 instance. See the FPGA Developer AMI README for +more details. + +**What is an AFI?** + +An AFI stands for Amazon FPGA Image. That is the compiled FPGA code that +is loaded into an FPGA for performing the Custom Logic function created +by the developer. AFIs are maintained by AWS according to the AWS +account that created them. An AFI ID is used to reference a particular +AFI from an F1 instance. The AFI ID is used to indicate the AFI that +should be loaded into a specific FPGA within the instance. + +**What is the process for creating an AFI?** + +The AFI process starts by creating Custom Logic code that conforms to +the Shell Specification. Then, the Custom Logic must be compiled using +the Vivado tools to create a Design Checkpoint. That Design Checkpoint +is submitted to AWS for generating an AFI using the API. + +See aws-fpga/hdk/cl and aws-fpga/hdk/cl/examples for more detailed +information. + +**Is there any software I need on my instance?** + +The required AWS software is the FPGA Management Tool set found in the +SDK directory. This software manages loading and clearing AFIs for FPGAs +in the instance. It also allows developers to retrieve status on the +FPGAs from within the instance. 
See the README in aws-fpga/sdk for more +details. + +**Why do I see error “vivado not found” while running hdk\_setup.sh** + +This is an indication that Xilinx vivado tool set are not installed. Try +installing the tool, or alternative use AWS FPGA Development AMI +available on AWS Marketplace, which comes with pre-installed Vivado +toolset and license + +**Do AWS Marketplace customers see FPGA source code or a bitstream?** + +Neither: AWS Marketplace customers that pick up an AMI with with one our +more AFIs associated with it will not see any source code nor bitstream. +Marketplace customers actually have permission to use the AFI but not +permission to see its code. The only reference to the AFI is through the +AFI ID. The Customer would call fpga-local-load-image with the correct +AFI ID for that Marketplace offering, which will result in AWS loading +the AFI into the FPGA. No FPGA internal design code is exposed. diff --git a/hdk/LICENSE.txt b/hdk/LICENSE.txt index f22ce3c74..d217d7177 100644 --- a/hdk/LICENSE.txt +++ b/hdk/LICENSE.txt @@ -1,161 +1,161 @@ -Amazon Software License - - - - - - -1. Definitions - - - - - -Licensor means any person or entity that distributes its Work. - - - - - -Software means the original work of authorship made available under this License. - - - - - -Work means the Software and any additions to or derivative works of the Software that are -made available under this License. - - - - - -The terms reproduce, reproduction, derivative works, and distribution have the meaning -as provided under U.S. copyright law; provided, however, that for the purposes of this License, -derivative works shall not include works that remain separable from, or merely link (or bind by -name) to the interfaces of, the Work. - - - - - -Works, including the Software, are made available under this License by including in or with -the Work either (a) a copyright notice referencing the applicability of this License to the Work, -or (b) a copy of this License. - - -2. License Grants - - - - - -2.1 Copyright Grant. Subject to the terms and conditions of this License, each Licensor grants to -you a perpetual, worldwide, non-exclusive, royalty-free, copyright license to reproduce, prepare -derivative works of, publicly display, publicly perform, sublicense and distribute its Work and -any resulting derivative works in any form. - - - - - -2.2 Patent Grant. Subject to the terms and conditions of this License, each Licensor grants to you -a perpetual, worldwide, non-exclusive, royalty-free patent license to make, have made, use, sell, -offer for sale, import, and otherwise transfer its Work, in whole or in part. The foregoing license -applies only to the patent claims licensable by Licensor that would be infringed by Licensors -Work (or portion thereof) individually and excluding any combinations with any other materials -or technology. - - -3. Limitations - - - - - -3.1 Redistribution. You may reproduce or distribute the Work only if (a) you do so under this -License, (b) you include a complete copy of this License with your distribution, and (c) you -retain without modification any copyright, patent, trademark, or attribution notices that are -present in the Work. - - - - - -3.2 Derivative Works. 
You may specify that additional or different terms apply to the use, -reproduction, and distribution of your derivative works of the Work (Your Terms) only if (a) -Your Terms provide that the use limitation in Section 3.3 applies to your derivative works, and -(b) you identify the specific derivative works that are subject to Your Terms. Notwithstanding -Your Terms, this License (including the redistribution requirements in Section 3.1) will continue -to apply to the Work itself. - - - - - -3.3 Use Limitation. The Work and any derivative works thereof only may be used or intended for -use with the web services, computing platforms or applications provided by Amazon.com, Inc. -or its affiliates, including Amazon Web Services, Inc. - - - - - -3.4 Patent Claims. If you bring or threaten to bring a patent claim against any Licensor -(including any claim, cross-claim or counterclaim in a lawsuit) to enforce any patents that you -allege are infringed by any Work, then your rights under this License from such Licensor -(including the grants in Sections 2.1 and 2.2) will terminate immediately. - - - - - -3.5 Trademarks. This License does not grant any rights to use any Licensors or its affiliates -names, logos, or trademarks, except as necessary to reproduce the notices described in this -License. - - - - - -3.6 Termination. If you violate any term of this License, then your rights under this License -(including the grants in Sections 2.1 and 2.2) will terminate immediately. - - -4. Disclaimer of Warranty. - - - - - -THE WORK IS PROVIDED AS IS WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS -OF M ERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON- -INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER -THIS LICENSE. SOME STATES CONSUMER LAWS DO NOT ALLOW EXCLUSION OF -AN IMPLIED WARRANTY, SO THIS DISCLAIMER MAY NOT APPLY TO YOU. - - -5. Limitation of Liability. - - - - - -EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO -LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR -OTHERWISE SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, -INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL -DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR -INABILITY TO USE THE WORK (INCLUDING BUT NOT LIMITED TO LOSS OF -GOODWILL, BUSINESS INTERRUPTION, LOST PROFITS OR DATA, COMPUTER -FAILURE OR MALFUNCTION, OR ANY OTHER COMM ERCIAL DAMAGES OR -LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF -SUCH DAMAGES. - - - - - -Effective Date April 18, 2008 2008 Amazon.com, Inc. or its affiliates. All rights reserved. - +Amazon Software License + + + + + + +1. Definitions + + + + + +Licensor means any person or entity that distributes its Work. + + + + + +Software means the original work of authorship made available under this License. + + + + + +Work means the Software and any additions to or derivative works of the Software that are +made available under this License. + + + + + +The terms reproduce, reproduction, derivative works, and distribution have the meaning +as provided under U.S. copyright law; provided, however, that for the purposes of this License, +derivative works shall not include works that remain separable from, or merely link (or bind by +name) to the interfaces of, the Work. 
+ + + + + +Works, including the Software, are made available under this License by including in or with +the Work either (a) a copyright notice referencing the applicability of this License to the Work, +or (b) a copy of this License. + + +2. License Grants + + + + + +2.1 Copyright Grant. Subject to the terms and conditions of this License, each Licensor grants to +you a perpetual, worldwide, non-exclusive, royalty-free, copyright license to reproduce, prepare +derivative works of, publicly display, publicly perform, sublicense and distribute its Work and +any resulting derivative works in any form. + + + + + +2.2 Patent Grant. Subject to the terms and conditions of this License, each Licensor grants to you +a perpetual, worldwide, non-exclusive, royalty-free patent license to make, have made, use, sell, +offer for sale, import, and otherwise transfer its Work, in whole or in part. The foregoing license +applies only to the patent claims licensable by Licensor that would be infringed by Licensors +Work (or portion thereof) individually and excluding any combinations with any other materials +or technology. + + +3. Limitations + + + + + +3.1 Redistribution. You may reproduce or distribute the Work only if (a) you do so under this +License, (b) you include a complete copy of this License with your distribution, and (c) you +retain without modification any copyright, patent, trademark, or attribution notices that are +present in the Work. + + + + + +3.2 Derivative Works. You may specify that additional or different terms apply to the use, +reproduction, and distribution of your derivative works of the Work (Your Terms) only if (a) +Your Terms provide that the use limitation in Section 3.3 applies to your derivative works, and +(b) you identify the specific derivative works that are subject to Your Terms. Notwithstanding +Your Terms, this License (including the redistribution requirements in Section 3.1) will continue +to apply to the Work itself. + + + + + +3.3 Use Limitation. The Work and any derivative works thereof only may be used or intended for +use with the web services, computing platforms or applications provided by Amazon.com, Inc. +or its affiliates, including Amazon Web Services, Inc. + + + + + +3.4 Patent Claims. If you bring or threaten to bring a patent claim against any Licensor +(including any claim, cross-claim or counterclaim in a lawsuit) to enforce any patents that you +allege are infringed by any Work, then your rights under this License from such Licensor +(including the grants in Sections 2.1 and 2.2) will terminate immediately. + + + + + +3.5 Trademarks. This License does not grant any rights to use any Licensors or its affiliates +names, logos, or trademarks, except as necessary to reproduce the notices described in this +License. + + + + + +3.6 Termination. If you violate any term of this License, then your rights under this License +(including the grants in Sections 2.1 and 2.2) will terminate immediately. + + +4. Disclaimer of Warranty. + + + + + +THE WORK IS PROVIDED AS IS WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS +OF M ERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON- +INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER +THIS LICENSE. SOME STATES CONSUMER LAWS DO NOT ALLOW EXCLUSION OF +AN IMPLIED WARRANTY, SO THIS DISCLAIMER MAY NOT APPLY TO YOU. + + +5. Limitation of Liability. 
+ + + + + +EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO +LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR +OTHERWISE SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL +DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR +INABILITY TO USE THE WORK (INCLUDING BUT NOT LIMITED TO LOSS OF +GOODWILL, BUSINESS INTERRUPTION, LOST PROFITS OR DATA, COMPUTER +FAILURE OR MALFUNCTION, OR ANY OTHER COMM ERCIAL DAMAGES OR +LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF +SUCH DAMAGES. + + + + + +Effective Date April 18, 2008 2008 Amazon.com, Inc. or its affiliates. All rights reserved. + diff --git a/hdk/cl/examples/README.md b/hdk/cl/examples/README.md index 7ffa2f3e9..96f45677a 100644 --- a/hdk/cl/examples/README.md +++ b/hdk/cl/examples/README.md @@ -1,210 +1,210 @@ -# Overview on process for building a Custom Logic (CL) implementation for AWS FPGA instances - -The developer can build their own Custom Logic (CL) and deploy it on AWS. -The CL must comply with the [AWS Shell specifications](../../docs/AWS_Shell_Interface_Specification.md), and pass through the build scripts. - -The [CL Examples directory](https://github.com/aws/aws-fpga/tree/master/hdk/cl/examples) is provided to assist developers in creating a -functional CL implementation. Each example includes: - -1) The design source code for the example under the `/design` directory. -2) The timing, clock and placement constraints files, scripts for compiling the example design. (This requires running in an instance/server that have Xilinx tools and license installed. Developers are recommended to use the "FPGA Development AMI" available free of charge on [AWS Marketplace](https://aws.amazon.com/marketplace/). -3) The final build, called Design CheckPoint (DCP) that can be submitted for AWS to generate the AFI. -4) An AFI-ID for a pre-generated AFI that matches the example design. -5) Software source code required on the FPGA-enabled instance to run the example. -6) Software binary that can be loaded on an FPGA-enabled instance to test the AFI. - -In summary: - -- An AFI can be created using the files in 1, 2, and 3. The AFI creation can take place on any EC2 instance or on premise. -- The AFI can be used in an EC2 F1 instance by using the files in 4, 5 and 6. - -By following the example CLs, a developer should learn how to interface to the AWS Shell of the FPGA, compile the source code to create an AFI, and load an AFI from the F1 instance for use. - -# Step by step guide on how to create an AFI from one of the CL examples - -As a pre-requested to building the AFI, the developer should have an instance/server with Xilinx vivado tools and license. The "FPGA Developer AMI" provided free of charge on AWS Marketplace will be an ideal place to start an instance from. See the README.md on the AMI for the details how to launch the FPGA Developer's AMI, install the tools and set up the license. - -**NOTE:** *steps 1 through 3 can be done on any server or EC2 instance, C4/C5 instances are recommended for fastest build time* - -**NOTE:** *You can skip steps 0 through 3 if you are not interested in the build process. Step 4 through 6 will show you how to use one of the predesigned AFI* - -### 0. 
Setup the HDK and install AWS CLI - - $ git clone https://github.com/aws/aws-fpga - $ cd aws-fpga - $ source hdk_shell.sh - -To install the AWS CLI, please follow the instructions here: (http://docs.aws.amazon.com/cli/latest/userguide/installing.html). - - $ aws configure # to set your credentials (found in your console.aws.amazon.com page) and region (typically us-east-1) - -**NOTE**: During the F1 preview, not all FPGA-specific AWS CLI commands are available to the public. -To extend your AWS CLI installation, please execute the following: - - $ aws configure add-model --service-model file://$(pwd)/sdk/aws-cli-preview/ec2_preview_model.json - - -### 1. Pick one of the examples and move to its directory - -There are couple of ways to start a new CL: one option is to copy one of the examples provided in the HDK and modify the design files, scripts and constrains directory. - -Alternatively, by creating a new directory, setup the environment variables, and prepare the project datastructure: - - $ cd $HDK_DIR/cl/examples/cl_hello_world # you can change cl_hello_world to any other example - $ export CL_DIR=$(pwd) - -Setting up the CL_DIR environment variable is crucial as the build scripts rely on that value. -Each one of the examples following the recommended directory structure to match what's expected by the HDK simulation and build scripts. - -If you like to start your own CL, check out the [How to create your own CL Readme](../developer_designs/README.md) - -### 2. Build the CL before submitting to AWS - -**NOTE** *This step requires you have Xilinx Vivado Tools installed as well Vivado License:* - - $ vivado -mode batch # Run this command to see if vivado is installed - $ sudo perl /home/centos/src/project_data/license/license_manager.pl -status # To check if license server is up. this command is for AWS-provided FPGA Development machine, the license manager can be in different directory in your systems - -The next script two steps will go through the entire implementation process converting the CL design into a completed Design Checkpoint that meets timing and placement constrains of the target FPGA - - $ cd $CL_DIR/build/scripts - $ ./aws_build_dcp_from_cl.tcl - -**NOTE**: The DCP generation can take up to several hours to complete. -We recommend that you initiate the generation in a way that prevents interruption. -For example, if working on a remote machine, we recommend using window management tools such as [`screen`](https://www.gnu.org/software/screen/manual/screen.html) to mitigate potential network disconnects. - - -### 3. Submit the Design Checkpoint to AWS to register the AFI - -To submit the DCP, create an S3 bucket for submitting the design and upload the tar-zipped archive into that bucket. -You need to prepare the following information: - -1. Name of the logic design. -2. Generic description of the logic design. -3. PCI IDs: Device, Vendor, Subsystem, SubsystemVendor (these IDs should be found in the README files in the respective CL example directory). -4. Location of the DCP object (S3 bucket name and key). -5. Location of the directory to write logs (S3 bucket name and key). -6. Version of the AWS Shell. - -To upload your DCP to S3, - - $ aws s3 mb s3:// # Create an S3 bucket (choose a unique bucket name) - $ aws s3 cp *.SH_CL_routed.dcp \ # Upload the DCP file to S3 - s3:///cl_simple.dcp - -To generate the AFI, follow one of the two methods listed below. 
-After the AFI generation is complete, AWS will put the logs into the bucket location provided by the developer and notify them -by email. - -#### Method 1: If you have access to AWS EC2 CLI with support for `create-fpga-image` action -To create an AFI from the generated DCP, you need to upload the tar-zipped DCP file to an S3 bucket, and execute the `aws ec2 create-fpga-image` command as follows: - - $ aws ec2 create-fpga-image \ - --fpga-image-architecture xvu9p \ - --shell-version \ - --fpga-pci-id deviceId=,vendorId=,subsystemId=,subsystemVendorId= \ - --input-storage-location Bucket=,Key=cl_simple.dcp - --name MyCL - --logs-storage-location Bucket=,Key=logs/ - -The output of this command includes two identifiers that refer to your AFI: -- FPGA Image Identifier or AFI ID: this is the main ID used to manage your AFI through the AWS EC2 CLI commands and AWS SDK APIs. - This ID is regional, i.e., if an AFI is copied across multiple regions, it will have a different unique AFI ID in each region. - An example AFI ID is `afi-01234567890abcdef`. -- Glogal FPGA Image Identifier or AGFI ID: this is a global ID that is used to refer to an AFI from within an F1 instance. - For example, to load or clear an AFI from an FPGA slot, you use the AGFI ID. - Since the AGFI IDs is global (by design), it allows you to copy a combination of AFI/AMI to multiple regions, and they will work without requiring any extra setup. - An example AFI ID is `agfi-01234567890abcdef`. - -#### Method 2: During F1 preview and before AWS EC2 CLI action `create-fpga-image` is available - -Add a policy to the created S3 bucket granting [read/write permissions](http://docs.aws.amazon.com/AmazonS3/latest/dev/example-walkthroughs-managing-access-example2.html) to our team's account (Account ID: 371834676912). -A sample policy is shown below. - - { - "Version": "2012-10-17", - "Statement": [ - { - "Sid": "Bucket level permissions", - "Effect": "Allow", - "Principal": { - "AWS": "arn:aws:iam::371834676912:root" - }, - "Action": [ - "s3:ListBucket" - ], - "Resource": "arn:aws:s3:::" - }, - { - "Sid": "Object read permissions", - "Effect": "Allow", - "Principal": { - "AWS": "arn:aws:iam::371834676912:root" - }, - "Action": [ - "s3:GetObject" - ], - "Resource": "arn:aws:s3:::/" - }, - { - "Sid": "Folder write permissions", - "Effect": "Allow", - "Principal": { - "AWS": "arn:aws:iam::371834676912:root" - }, - "Action": [ - "s3:PutObject" - ], - "Resource": "arn:aws:s3:::/*" - } - ] - } - -Then, send an email to AWS (email TBD) providing the information listed earlier. - - -# Step by step guide how to load and test a registered AFI from within an F1 instance - -To follow the next steps, you have to run an instance on F1. AWS recommend you run an instance with latest Amazon Linux that have the FPGA management tools included, or alternatively the FPGA Developer AMI with both the HDK and SDK. - -## 4. Setup AWS FPGA management tools - -Execute the following: - - $ git clone https://github.com/aws/aws-fpga # Not needed if you have installed the HDK as in Step 0. - $ cd aws-fpga - $ source sdk_setup.sh - -## 5. Associate the AFI with your AMI - -To start using the AFI, you need to associate it with an [AMI (Amazon Machine Image)](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html) that you own. -Association means that any instance launched using this AMI will be able to load the AFIs to FPGAs as described in the next section. -You can associate multiple AFIs with your AMI. 
-There is a default limit of eight AFIs per AMI, if you need more, please reach out to AWS with your use case and we can adjust your limit. -To associate, simply invoke the following AWS EC2 CLI command. - - $ aws ec2 associate-fpga-image --fpga-image-id --image-id - -## 6. Load the AFI - -Run's the `fpga-describe-local-image` on slot 0 to confirm that the FPGA is cleared, and you should see similar output to the 4 lines below: - - $ sudo fpga-describe-local-image -S 0 -H - - Type FpgaImageSlot FpgaImageId StatusName StatusCode - AFI 0 none cleared 1 - Type VendorId DeviceId DBDF - AFIDEVICE 0x1d0f 0x1042 0000:00:17.0 - -Then loading the example AFI to FPGA slot 0 (you should have the AGFI ID from Step 3 above): - - $ sudo fpga-load-local-image -S 0 -I - -Now, you can verify the status of the previous load command: - - $ sudo fpga-describe-local-image -S 0 -H - -## 7. Call the specific CL example software - -[Validating CL Designs](https://github.com/aws/aws-fpga/wiki/Validating-CL-Designs#quick-start) +# Overview on process for building a Custom Logic (CL) implementation for AWS FPGA instances + +The developer can build their own Custom Logic (CL) and deploy it on AWS. +The CL must comply with the [AWS Shell specifications](../../docs/AWS_Shell_Interface_Specification.md), and pass through the build scripts. + +The [CL Examples directory](https://github.com/aws/aws-fpga/tree/master/hdk/cl/examples) is provided to assist developers in creating a +functional CL implementation. Each example includes: + +1) The design source code for the example under the `/design` directory. +2) The timing, clock and placement constraints files, scripts for compiling the example design. (This requires running in an instance/server that have Xilinx tools and license installed. Developers are recommended to use the "FPGA Development AMI" available free of charge on [AWS Marketplace](https://aws.amazon.com/marketplace/). +3) The final build, called Design CheckPoint (DCP) that can be submitted for AWS to generate the AFI. +4) An AFI-ID for a pre-generated AFI that matches the example design. +5) Software source code required on the FPGA-enabled instance to run the example. +6) Software binary that can be loaded on an FPGA-enabled instance to test the AFI. + +In summary: + +- An AFI can be created using the files in 1, 2, and 3. The AFI creation can take place on any EC2 instance or on premise. +- The AFI can be used in an EC2 F1 instance by using the files in 4, 5 and 6. + +By following the example CLs, a developer should learn how to interface to the AWS Shell of the FPGA, compile the source code to create an AFI, and load an AFI from the F1 instance for use. + +# Step by step guide on how to create an AFI from one of the CL examples + +As a pre-requested to building the AFI, the developer should have an instance/server with Xilinx vivado tools and license. The "FPGA Developer AMI" provided free of charge on AWS Marketplace will be an ideal place to start an instance from. See the README.md on the AMI for the details how to launch the FPGA Developer's AMI, install the tools and set up the license. + +**NOTE:** *steps 1 through 3 can be done on any server or EC2 instance, C4/C5 instances are recommended for fastest build time* + +**NOTE:** *You can skip steps 0 through 3 if you are not interested in the build process. Step 4 through 6 will show you how to use one of the predesigned AFI* + +### 0. 
Set up the HDK and install the AWS CLI
+
+    $ git clone https://github.com/aws/aws-fpga
+    $ cd aws-fpga
+    $ source hdk_shell.sh
+
+To install the AWS CLI, please follow the instructions here: http://docs.aws.amazon.com/cli/latest/userguide/installing.html
+
+    $ aws configure         # to set your credentials (found in your console.aws.amazon.com page) and region (typically us-east-1)
+
+**NOTE**: During the F1 preview, not all FPGA-specific AWS CLI commands are available to the public.
+To extend your AWS CLI installation, please execute the following:
+
+    $ aws configure add-model --service-model file://$(pwd)/sdk/aws-cli-preview/ec2_preview_model.json
+
+### 1. Pick one of the examples and move to its directory
+
+There are a couple of ways to start a new CL: one option is to copy one of the examples provided in the HDK and modify the design files, scripts, and constraints directory.
+
+Alternatively, create a new directory, set up the environment variables, and prepare the project data structure:
+
+    $ cd $HDK_DIR/cl/examples/cl_hello_world    # you can change cl_hello_world to any other example
+    $ export CL_DIR=$(pwd)
+
+Setting up the CL_DIR environment variable is crucial, as the build scripts rely on that value.
+Each one of the examples follows the recommended directory structure, matching what is expected by the HDK simulation and build scripts.
+
+If you would like to start your own CL, check out the [How to create your own CL Readme](../developer_designs/README.md).
+
+### 2. Build the CL before submitting to AWS
+
+**NOTE**: *This step requires that you have the Xilinx Vivado tools installed, as well as a Vivado license:*
+
+    $ vivado -mode batch    # Run this command to see if vivado is installed
+    $ sudo perl /home/centos/src/project_data/license/license_manager.pl -status    # Check that the license server is up. This path applies to the AWS-provided FPGA Developer AMI; the license manager may be in a different directory on your system
+
+The next two commands run the build script, which takes the CL design through the entire implementation process and produces a completed Design Checkpoint that meets the timing and placement constraints of the target FPGA:
+
+    $ cd $CL_DIR/build/scripts
+    $ ./aws_build_dcp_from_cl.tcl
+
+**NOTE**: The DCP generation can take up to several hours to complete.
+We recommend that you initiate the generation in a way that prevents interruption.
+For example, if working on a remote machine, we recommend using window management tools such as [`screen`](https://www.gnu.org/software/screen/manual/screen.html) to mitigate potential network disconnects.
+
+### 3. Submit the Design Checkpoint to AWS to register the AFI
+
+To submit the DCP, create an S3 bucket for submitting the design and upload the tar-zipped archive into that bucket.
+You need to prepare the following information:
+
+1. Name of the logic design.
+2. Generic description of the logic design.
+3. PCI IDs: Device, Vendor, Subsystem, SubsystemVendor (these IDs can be found in the README files in the respective CL example directory).
+4. Location of the DCP object (S3 bucket name and key).
+5. Location of the directory to write logs (S3 bucket name and key).
+6. Version of the AWS Shell.
+
+To upload your DCP to S3:
+
+    $ aws s3 mb s3://<bucket-name>              # Create an S3 bucket (choose a unique bucket name)
+    $ aws s3 cp *.SH_CL_routed.dcp \            # Upload the DCP file to S3
+         s3://<bucket-name>/cl_simple.dcp
+
+To generate the AFI, follow one of the two methods listed below.
+After the AFI generation is complete, AWS will put the logs into the bucket location provided by the developer and notify them
+by email.
+
+#### Method 1: If you have access to an AWS EC2 CLI with support for the `create-fpga-image` action
+
+To create an AFI from the generated DCP, upload the tar-zipped DCP file to an S3 bucket and execute the `aws ec2 create-fpga-image` command as follows:
+
+    $ aws ec2 create-fpga-image \
+        --fpga-image-architecture xvu9p \
+        --shell-version <shell-version> \
+        --fpga-pci-id deviceId=<device-id>,vendorId=<vendor-id>,subsystemId=<subsystem-id>,subsystemVendorId=<subsystem-vendor-id> \
+        --input-storage-location Bucket=<bucket-name>,Key=cl_simple.dcp \
+        --name MyCL \
+        --logs-storage-location Bucket=<bucket-name>,Key=logs/
+
+The output of this command includes two identifiers that refer to your AFI:
+- FPGA Image Identifier or AFI ID: this is the main ID used to manage your AFI through the AWS EC2 CLI commands and AWS SDK APIs.
+    This ID is regional, i.e., if an AFI is copied across multiple regions, it will have a different unique AFI ID in each region.
+    An example AFI ID is `afi-01234567890abcdef`.
+- Global FPGA Image Identifier or AGFI ID: this is a global ID that is used to refer to an AFI from within an F1 instance.
+    For example, to load or clear an AFI from an FPGA slot, you use the AGFI ID.
+    Since the AGFI ID is global (by design), it allows you to copy a combination of AFI/AMI to multiple regions, where they will work without requiring any extra setup.
+    An example AGFI ID is `agfi-01234567890abcdef`.
+
+#### Method 2: During the F1 preview and before the AWS EC2 CLI action `create-fpga-image` is available
+
+Add a policy to the created S3 bucket granting [read/write permissions](http://docs.aws.amazon.com/AmazonS3/latest/dev/example-walkthroughs-managing-access-example2.html) to our team's account (Account ID: 371834676912).
+A sample policy is shown below.
+
+    {
+        "Version": "2012-10-17",
+        "Statement": [
+            {
+                "Sid": "Bucket level permissions",
+                "Effect": "Allow",
+                "Principal": {
+                    "AWS": "arn:aws:iam::371834676912:root"
+                },
+                "Action": [
+                    "s3:ListBucket"
+                ],
+                "Resource": "arn:aws:s3:::<bucket-name>"
+            },
+            {
+                "Sid": "Object read permissions",
+                "Effect": "Allow",
+                "Principal": {
+                    "AWS": "arn:aws:iam::371834676912:root"
+                },
+                "Action": [
+                    "s3:GetObject"
+                ],
+                "Resource": "arn:aws:s3:::<bucket-name>/<dcp-key>"
+            },
+            {
+                "Sid": "Folder write permissions",
+                "Effect": "Allow",
+                "Principal": {
+                    "AWS": "arn:aws:iam::371834676912:root"
+                },
+                "Action": [
+                    "s3:PutObject"
+                ],
+                "Resource": "arn:aws:s3:::<bucket-name>/<logs-folder>/*"
+            }
+        ]
+    }
+
+Then, send an email to AWS (email TBD) providing the information listed earlier.
+
+# Step by step guide on how to load and test a registered AFI from within an F1 instance
+
+To follow the next steps, you must be running an F1 instance. AWS recommends launching it from the latest Amazon Linux AMI, which includes the FPGA management tools, or alternatively from the FPGA Developer AMI, which includes both the HDK and SDK.
+
+## 4. Set up the AWS FPGA management tools
+
+Execute the following:
+
+    $ git clone https://github.com/aws/aws-fpga   # Not needed if you have installed the HDK as in Step 0.
+    $ cd aws-fpga
+    $ source sdk_setup.sh
+
+## 5. Associate the AFI with your AMI
+
+To start using the AFI, you need to associate it with an [AMI (Amazon Machine Image)](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html) that you own.
+Association means that any instance launched using this AMI will be able to load the AFIs to FPGAs as described in the next section.
+You can associate multiple AFIs with your AMI.
+There is a default limit of eight AFIs per AMI; if you need more, please reach out to AWS with your use case and we can adjust your limit.
+To associate an AFI, simply invoke the following AWS EC2 CLI command:
+
+    $ aws ec2 associate-fpga-image --fpga-image-id <afi-id> --image-id <ami-id>
+
+## 6. Load the AFI
+
+Run `fpga-describe-local-image` on slot 0 to confirm that the FPGA is cleared; you should see output similar to the four lines below:
+
+    $ sudo fpga-describe-local-image -S 0 -H
+
+    Type  FpgaImageSlot  FpgaImageId  StatusName  StatusCode
+    AFI          0       none         cleared        1
+    Type  VendorId    DeviceId    DBDF
+    AFIDEVICE    0x1d0f      0x1042      0000:00:17.0
+
+Then load the example AFI into FPGA slot 0 (you should have the AGFI ID from Step 3 above):
+
+    $ sudo fpga-load-local-image -S 0 -I <agfi-id>
+
+Now, you can verify the status of the previous load command:
+
+    $ sudo fpga-describe-local-image -S 0 -H
+
+## 7. Call the specific CL example software
+
+[Validating CL Designs](https://github.com/aws/aws-fpga/wiki/Validating-CL-Designs#quick-start)
diff --git a/hdk/cl/examples/cl_hello_world/README.md b/hdk/cl/examples/cl_hello_world/README.md
index 353f2f8de..78f28e013 100644
--- a/hdk/cl/examples/cl_hello_world/README.md
+++ b/hdk/cl/examples/cl_hello_world/README.md
@@ -1,22 +1,22 @@
-# Hello World CL
-
-This simple *hello_world* example builds a Custom Logic (CL) that will enable the instance to "peek" and "poke" registers in the memory space of the CL inside the FPGA.
-
-Please read here for [general instructions to build the CL, register an AFI, and start using it on an F1 instance](https://github.com/aws/aws-fpga/blob/master/hdk/cl/examples/README.md).
-
-##Meta-data about this CL
-
-The following table displays information about the CL that is required to register it as an AFI with AWS.
-Alternatively, you can directly use a pre-generated AFI for this CL which you can associate to an instance or AMI.
-
-| Key | Value |
-|-----------|------|
-| FPGA Image Architecture | xvu9p |
-| Shell Version | 0x???????? |
-| PCI Device ID | 0x???? |
-| PCI Vendor ID | 0x???? |
-| PCI Subsystem ID | 0x???? |
-| PCI Subsystem Vendor ID | 0x???? |
-| Pre-generated AFI ID | afi-????????????????? |
-| Pre-generated AGFI ID | agfi-????????????????? |
-
+# Hello World CL
+
+This simple *hello_world* example builds a Custom Logic (CL) that will enable the instance to "peek" and "poke" registers in the memory space of the CL inside the FPGA.
+
+Please read here for [general instructions to build the CL, register an AFI, and start using it on an F1 instance](https://github.com/aws/aws-fpga/blob/master/hdk/cl/examples/README.md).
+
+## Meta-data about this CL
+
+The following table displays information about the CL that is required to register it as an AFI with AWS.
+Alternatively, you can directly use a pre-generated AFI for this CL, which you can associate with an instance or AMI.
+
+| Key | Value |
+|-----------|------|
+| FPGA Image Architecture | xvu9p |
+| Shell Version | 0x???????? |
+| PCI Device ID | 0x???? |
+| PCI Vendor ID | 0x???? |
+| PCI Subsystem ID | 0x???? |
+| PCI Subsystem Vendor ID | 0x???? |
+| Pre-generated AFI ID | afi-????????????????? |
+| Pre-generated AGFI ID | agfi-?????????????????
| + diff --git a/hdk/docs/AWS_Shell_Interface_Specification.md b/hdk/docs/AWS_Shell_Interface_Specification.md index a4991d604..488ab5afb 100644 --- a/hdk/docs/AWS_Shell_Interface_Specification.md +++ b/hdk/docs/AWS_Shell_Interface_Specification.md @@ -1,316 +1,316 @@ - -## Revision History - -2016/11/28 - Initial public release with HDK release version - -2016/12/06 - Added capability to remove DDR controllers in the CL through parameters in `sh_ddr.sv` - - - -# Overview - -The AWS FPGA instance provides FPGA acceleration capability to AWS -compute instances. Each FPGA is divided into two partitions: - -- Shell (SH) – AWS platform logic responsible for taking care of the FPGA external peripherals, PCIe, and Interrupts. - -- Customer Logic (CL) – Custom acceleration logic created by an FPGA Developer - -At the end of the development process, the Shell and CL will become an Amazon FPGA Image (AFI) - -This document specifies the hardware interface and functional behavior between the Shell and the CL. - -While there could be multiple versions and multiple generations of the FPGA-accelerated EC2 instances, the rest of this document focuses on the Shell design for xvu9p architecture used in EC2 F1 instance. - -Full details of the available FPGA enabled instances are [here](https://aws.amazon.com/ec2/instance-types) - -## Architecture and Version - -This specification applies to Xilinx Virtex Ultrascale Plus platform, referred to in AWS APIs and the HDK release as `FpgaImageArchitecture=xvu9p`. - -The Shell is tagged with a revision number. Note while AWS tries to keep the revision constant, sometimes it is necessary to update the revision due to discovered issues or added functionality. The HDK release includes the latest Shell version under `/hdk/common/shell_latest` - -New shell versions require updated CL implementation and regenerating the AFI. - -## Convention - - -**CL –** Custom’s Logic: the Logic to be provided by the developer and integrated with AWS Shell. - -**DW –** Doubleword: referring to 4-byte (32-bit) data size. - -**AXI-4** ARM Advanced eXtensible Interface. - -**AXI-4 Stream –** ARM Advanced eXtensible Stream Interface. - -**M –** Typically refers to the Master side of an AXI bus. - -**S –** Typical refers to the Slave side of AXI bus. - -# Shell Interfaces (for xvu9p architecture as in EC F1 instances) - - -The F1 FPGA platform includes the following interfaces available to the -CL: - -- One x16 PCIe Gen 3 Interface. - -- Four DDR4 RDIMM interfaces, each interface is 72-bit wide including ECC. - - -## CL/Shell Interfaces (AXI-4) - - -All interfaces except the inter-FPGA links uses the AXI-4 protocol. The AXI-4 interfaces in the Shell have the following restrictions: - -- AxSIZE – All transfers must be the entire width of the bus. While byte-enables bitmap are supported, it must adhere to the interface - protocol (i.e. PCIe contiguous byte enables on all transfers larger than 64-bits). - -- AxBURST – Only INCR burst is supported. - -- AxLOCK – Lock is not supported. - -- AxCACHE – Memory type is not supported. - -- AxPROT – Protection type is not supported. - -- AxQOS – Quality of Service is not supported. - -- AxREGION – Region identifier is not supported. 
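As an illustration of the restrictions above, the following SystemVerilog sketch shows one way a CL could tie the restricted AXI-4 fields of a Shell-facing master port to their single legal values, so internal logic only supplies the address, length, and handshake. This is a minimal, hypothetical fragment: the module and port names (`cl_axi_aw_shim`, `req_*`, `aw*`) are placeholders and are not taken from the HDK's actual port list.

```
// Hypothetical sketch only -- not part of the AWS Shell or HDK sources.
// Forces the AXI-4 fields that the Shell restricts to their single legal
// values, so internal CL logic only supplies address, length, and valid.
module cl_axi_aw_shim #(parameter ADDR_W = 64) (
   // Request from internal CL logic (INCR bursts of full 512-bit beats)
   input  logic [ADDR_W-1:0] req_addr,
   input  logic [7:0]        req_len,      // number of beats minus one
   input  logic              req_valid,
   output logic              req_ready,
   // Write-address channel toward the Shell (placeholder port names)
   output logic [ADDR_W-1:0] awaddr,
   output logic [7:0]        awlen,
   output logic [2:0]        awsize,
   output logic [1:0]        awburst,
   output logic              awlock,
   output logic [3:0]        awcache,
   output logic [2:0]        awprot,
   output logic [3:0]        awqos,
   output logic [3:0]        awregion,
   output logic              awvalid,
   input  logic              awready
);
   // Pass the real request through unchanged.
   assign awaddr    = req_addr;
   assign awlen     = req_len;
   assign awvalid   = req_valid;
   assign req_ready = awready;

   // Drive the restricted fields to constant, legal values.
   assign awsize    = 3'd6;    // 2^6 = 64 bytes, i.e. the full 512-bit bus width
   assign awburst   = 2'b01;   // INCR is the only supported burst type
   assign awlock    = 1'b0;    // AxLOCK not supported
   assign awcache   = 4'd0;    // AxCACHE not supported
   assign awprot    = 3'd0;    // AxPROT not supported
   assign awqos     = 4'd0;    // AxQOS not supported
   assign awregion  = 4'd0;    // AxREGION not supported
endmodule
```

The read-address channel of such a port could be tied off in the same way.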
- - -![alt tag](./images/AWS_Shell_CL_overview.jpg) - -### External Memory Interfaces implemented in CL - -Some of the DRAM interface controllers are implemented in the CL rather than the Shell for optimized resource utilization of the FPGA (Allowing higher utilization for the CL place and route region to maximize usuable FPGA resources) . For those interfaces, the designs and the constrains are provided by AWS and must be instantiated in the CL (by including the `sh_ddr.sv`). - -There are four DRAM interfaces labeled A, B, C, and D. Interfaces A, B, and D are in the CL while interface C is implemented in the Shell. A design block (sh_ddr.sv) instantiates the three DRAM interfaces in the CL (A, B, D). - -For DRAM interface controllers that are implemented in the CL, the AXI-4 interfaces do not connect into the Shell, but connect locally inside the CL to the AWS provided blocks. There are also statistics interfaces that must be connected from Shell to the DRAM interface controller modules. - -All CL's **must** instantiate sh_ddr.sv, regardless of the number of DDR's that should be implemented. There are three parameters (all default to '1') that define which DDR controllers are implemented: - * DDR_A_PRESENT - * DDR_B_PRESENT - * DDR_D_PRESENT - -These parameters are used to control which DDR controllers are impemented in the CL design. An example instantiation: - ``` - sh_ddr #(.DDR_A_PRESENT(1), - .DDR_B_PRESENT(1), - .DDR_D_PRESENT(0)) - SH_DDR ( - .clk(clk), - ... - ``` - - -**NOTE:** *There is no performance or frequency difference between the four DRAM controllers regardless whether anyone of them resides in the CL or the Shell logic* - - -### Clocking/Reset - -A single 250MHz clock, and associated asynchronous reset is provided to the CL. All Shell interfaces are synchronous to the 250MHz clock. The CL can derive clocks off of the 250MHz clock. - -+The Xilinx Mixed Mode Clock Manager (MMCM) IP can be used to generate slower clocks off of the 250MHz clock. - -![alt tag](./images/Dividing_clocks_inside_CL.jpg) - -The reset signal combines the board reset and PCIe reset conditions. Please refer to the Xilinx documentation (ug974) for more information. - -### Function Level Reset - -FLR is supported for the Application Physical Function using a separate FLR interface: - -- sh_cl_flr_assert – Level signal that is asserted when FLR has - been requested - -- cl_sh_flr_done – Asserted for a single clock to acknowledge - the FLR. This must be asserted in response to sh_cl_flr_assert. - Note due to pipeline delays it is possible sh_cl_flr_assert is - asserted for some number of clocks after cl_sh_flr_done. - - -## PCIe Endpoint Presentation to Instance - -There are two PCIe Physical Functions (PFs) presented to the F1 instance: - -- Management PF – This PF allows for management of the FPGA using the [FPGA Management Tools](../../sdk/management/management_tools/README.md) , including tracking FPGA state and loading CL images onto the FPGA. - -- Application PF – The PF for the Custom Logic (CL) specific functionality - - -### Management PF - -The management PF is a separate PF from the CL PF. Details are provided for reference for understanding the PCIe mapping from an F1 instance. This interface is strictly for AWS FPGA Management Tools, and does not support any interface with the CL code. - -The Management PF exposes: - -a) Amazon’s specific and fixed PCIe VendorID (0x1D05) and DeviceID. - -b) Two BARs with 4KB size - -c) Single MSI-X interrupt. - -d) No BusMaster support. 
- -e) A range of 32-bit addressable registers. - -The Management PF is persistent throughout the lifetime of the instance, and it will not be reset or cleared (even during the AFI attach/detach process). - -### Application PF (AppPF) - -The Application PF exposes the following: - -a) PCIe BAR0 as a 64-bit prefetchable BAR sized as 128MB (*note the BAR size is subject to change, goal is 64GB, but will be no smaller - than 128MB)*. This BAR may be used to map the entire External/Internal memory space to the instance address space if desired, through `mmap()` type calls. - -b) PCIe BAR2 as a 64-bit prefetchable BAR sized as 4KB for the MSI-X interrupt tables. - -c) FLR capability that will reset the CL. - -d) BusMaster capability to allow the CL to master transactions towards the instance memory. - -e) CL’s specific PCIe VendorID, DeviceID, VendorSystemID and SubsystemID as registered through `aws ec2 fpgaImageCreate` [*Available soon*] - -The Developer can write drivers for the App PF or can leverage the reference driver provided in the SDK (With plan to include the driver included in Amazon Linux by default). - - -### CL Interface to PCIe Interface via Shell - -The PCIe interface connecting the FPGA to the instance is in the Shell, and the CL can accessed it two AXI-4 interfaces: - -#### AXI-4 for Inbound PCIe Transactions (Shell is master, CL is slave) - -This AXI-4 bus is for PCIe transactions mastered by the instance and targeting AppPF BAR0. - -It is a 512-bit wide AXI-4 interface that supports 32-bit transactions only. *Future revisions this interface will support larger burst sizes (up to the Maximum Payload Size)*. - -A read or write request on this AXI-4 bus that is not acknowledged by the CL within a certain time window, will be internally terminated by the Shell [*May not be supported in early releases*]. If the time-out error happens on a read, the Shell will return `0xDEADBEEF` data back to the instance. This error is reported through the Management PF and could be retrieved by FPGA Management Tools metric. - -#### AXI-4 for Outbound PCIe Transactions (CL is master, Shell is slave) - -This is a 512-bit wide AXI-4 Interface for the CL to master cycles to the PCIe bus. This is used, for example, to DMA data to/from instance memory. - -The following PCIe interface configuration parameters are provided from the Shell to the CL, and the CL logic must respect these maximum limits: - -- sh_cl_cfg_max_payload[1:0] – PCIe max payload size: - - 2’b00 – 128 Byte - - 2’b01 – 256 Byte (Most probable value) - - 2’b10 – 512 Byte - - 2’b11 – Reserved - -- sh_cl_cfg_max_read_req[2:0] - - 3’b000 – 128 Byte - - 3’b001 – 256 Byte - - 3’b010 – 512 Byte (Most probable value) - - 3’b011 – 1024 Byte - - 3’b100 – 2048 Byte - - 3’b101 – 4096 Byte - -The PCIe CL to Shell AXI-4 interfaces **MUST** implement “USER” bits on the address channels (`AxUSER[18:0]`). - -- AxUSER[10:0] – DW length of the request. This is 1-based (0: zero DW, 1: one DW, 2: two DW, etc…) -- AxUSER[14:11] – First DW's Byte enable for the Request -- AxUSER[18:15] – Last DW's Byte enable for the Request - -##### Outbound PCIe AXI-4 Interface Restrictions: - -- Transfers must not violate PCIe byte enable rules (see byte enables below). -- Transfers must not cross a 4Kbyte address boundary (PCIe restriction). -- Transfers must not violate Max Payload Size. -- Read requests must not violate Max Read Request Size. -- A read request transaction must not be issued using the same ARID (AXI4 Read ID), if that ARID is already outstanding. 
**NOTE:** *The Shell does not enforce ordering between individual read transactions, and read responses may arrive in arbitrary order*.
-- The PCIe interface supports 5-bit ARID (32 outstanding read transactions maximum), as PCIe extended tag is not supported on the PCIe interface.
-- The address on the AXI-4 interface must reflect the correct byte address of the transfer. The Shell does not support using a 64-bit-aligned address with STRB signaling the actual starting DW.
-- The first/last byte enables are determined from the AxUSER bits. In addition, for writes, the WSTRB signal must be correct and reflect the appropriate valid bytes on the WDATA bus, even if the same information was provided on AxUSER.
-
-##### Byte Enable Rules
-
-All PCIe transactions must adhere to the PCIe Byte Enable rules (see the PCI Express Base specification). The rules are summarized below:
-
-- All transactions larger than two DW must have contiguous byte enables
-- Transactions that are two DW or smaller may have non-contiguous byte enables
-
-### AXI4 Error handling
-
-Transactions on the AXI4 interface will be terminated, reported as SLVERR on the RRESP/BRESP signals, and not passed to the instance in the following cases:
-
-- PCIe BME (BusMaster Enable) is not set in the PCIe configuration space
-
-- Illegal transaction address (addressing memory space that is not supported by the instance)
-
-- Transactions crossing 4KB boundaries, violating the PCIe specification
-
-- Illegal byte-masking
-
-- Illegal length
-
-- Illegal ARID (the ARID has already been used for an outstanding read transaction)
-
-**NOTE:** Pre-GA versions of the Shell and the FPGA Management tools may not have some of these checks and associated metrics exposed to the developers.
-
-### Interrupts (Future)
-
-Interrupts are not supported in the current version of the Shell. Future versions of the Shell will have support for at least 16 interrupt sources.
-
-## DDR4 DRAM Interface
-
-Each DRAM interface is accessed via an AXI-4 interface:
-
-- AXI-4 (CL Master and DRAM controller is slave) – 512-bit AXI-4 interface to read/write DDR
-
-There is a single status signal indicating that the DRAM interface is trained and ready for access. The addressing uses ROW/COLUMN/BANK mapping of the AXI address to DRAM Row/Col/BankGroup. The Read and Write channels are serviced with round-robin arbitration (equal priority).
-
-The DRAM interface uses the Xilinx DDR4 interface controller. The AXI-4 interface adheres to the Xilinx specification. User bits are added to the read data channel to signal ECC errors with the read data.
-
-**NOTE:** Even if no DDR4 controllers are desired in the CL, the `sh_ddr.sv` block must be instantiated in the CL (parameters are used to remove DDR controllers). If the `sh_ddr.sv` module is not instantiated, the design will have build errors.
-
-### DRAM Content Preservation between AFI Loads (Future)
-
-In future Shell versions a DRAM content preservation feature will be implemented. This feature allows the DDR state to be preserved when dynamically changing CL logic. The current Shell version will not guarantee preservation of DRAM contents if the CL logic is re-loaded.
-
-#### Miscellaneous signals
-
-There are some miscellaneous generic signals between the Shell and CL.
-
-### PCIe IDs
-
-Some signals must include the PCIe IDs of the CL.
A Developer’s specific PCIe VendorID, DeviceID, SubsystemVendorID and SubsystemID are registered through `aws ec2 fpgaImageCreate` command to reserve the PCIe IDs of the CL for mapping of the device into an F1 instance when the AFI is loaded. - -- cl_sh_id0 - - - [15:0] – Vendor ID - - - [31:16] – Device ID - -- cl_sh_id1 - - - [15:0] – Subsystem ID - - - [31:16] – Subsystem Vendor ID - -### General control/status - -The functionality of these signals is TBD. - -- cl_sh_status0[31:0] – Placeholder for generic CL to Shell status - -- cl_sh_status1[31:0] – Placeholder for generic CL to Shell status - -- sh_cl_ctl0[31:0] – Placeholder for generic Shell to CL control information - -- sh_cl_ctl1[31:0] – Placeholder for generic Shell to CL control information - -- sh_cl_pwr_state[1:0] – This is the power state of the FPGA. 0x0 - - - 0x0 – Power is normal - - - 0x1 – Power level 1 - - - 0x2 – Power level 2 - - - 0x3 – Power is critical and FPGA is subject to shutting off clocks or powering down + +## Revision History + +2016/11/28 - Initial public release with HDK release version + +2016/12/06 - Added capability to remove DDR controllers in the CL through parameters in `sh_ddr.sv` + + + +# Overview + +The AWS FPGA instance provides FPGA acceleration capability to AWS +compute instances. Each FPGA is divided into two partitions: + +- Shell (SH) – AWS platform logic responsible for taking care of the FPGA external peripherals, PCIe, and Interrupts. + +- Customer Logic (CL) – Custom acceleration logic created by an FPGA Developer + +At the end of the development process, the Shell and CL will become an Amazon FPGA Image (AFI) + +This document specifies the hardware interface and functional behavior between the Shell and the CL. + +While there could be multiple versions and multiple generations of the FPGA-accelerated EC2 instances, the rest of this document focuses on the Shell design for xvu9p architecture used in EC2 F1 instance. + +Full details of the available FPGA enabled instances are [here](https://aws.amazon.com/ec2/instance-types) + +## Architecture and Version + +This specification applies to Xilinx Virtex Ultrascale Plus platform, referred to in AWS APIs and the HDK release as `FpgaImageArchitecture=xvu9p`. + +The Shell is tagged with a revision number. Note while AWS tries to keep the revision constant, sometimes it is necessary to update the revision due to discovered issues or added functionality. The HDK release includes the latest Shell version under `/hdk/common/shell_latest` + +New shell versions require updated CL implementation and regenerating the AFI. + +## Convention + + +**CL –** Custom’s Logic: the Logic to be provided by the developer and integrated with AWS Shell. + +**DW –** Doubleword: referring to 4-byte (32-bit) data size. + +**AXI-4** ARM Advanced eXtensible Interface. + +**AXI-4 Stream –** ARM Advanced eXtensible Stream Interface. + +**M –** Typically refers to the Master side of an AXI bus. + +**S –** Typical refers to the Slave side of AXI bus. + +# Shell Interfaces (for xvu9p architecture as in EC F1 instances) + + +The F1 FPGA platform includes the following interfaces available to the +CL: + +- One x16 PCIe Gen 3 Interface. + +- Four DDR4 RDIMM interfaces, each interface is 72-bit wide including ECC. + + +## CL/Shell Interfaces (AXI-4) + + +All interfaces except the inter-FPGA links uses the AXI-4 protocol. The AXI-4 interfaces in the Shell have the following restrictions: + +- AxSIZE – All transfers must be the entire width of the bus. 
While byte-enable bitmaps are supported, they must adhere to the interface
+   protocol (i.e. PCIe contiguous byte enables on all transfers larger than 64-bits).
+
+- AxBURST – Only INCR burst is supported.
+
+- AxLOCK – Lock is not supported.
+
+- AxCACHE – Memory type is not supported.
+
+- AxPROT – Protection type is not supported.
+
+- AxQOS – Quality of Service is not supported.
+
+- AxREGION – Region identifier is not supported.
+
+
+![alt tag](./images/AWS_Shell_CL_overview.jpg)
+
+### External Memory Interfaces implemented in CL
+
+Some of the DRAM interface controllers are implemented in the CL rather than the Shell for optimized resource utilization of the FPGA (allowing higher utilization of the CL place-and-route region to maximize usable FPGA resources). For those interfaces, the designs and the constraints are provided by AWS and must be instantiated in the CL (by including `sh_ddr.sv`).
+
+There are four DRAM interfaces labeled A, B, C, and D. Interfaces A, B, and D are in the CL while interface C is implemented in the Shell. A design block (sh_ddr.sv) instantiates the three DRAM interfaces in the CL (A, B, D).
+
+For DRAM interface controllers that are implemented in the CL, the AXI-4 interfaces do not connect into the Shell, but connect locally inside the CL to the AWS-provided blocks. There are also statistics interfaces that must be connected from the Shell to the DRAM interface controller modules.
+
+All CLs **must** instantiate sh_ddr.sv, regardless of how many DDR controllers are implemented. There are three parameters (all default to '1') that define which DDR controllers are implemented:
+ * DDR_A_PRESENT
+ * DDR_B_PRESENT
+ * DDR_D_PRESENT
+
+These parameters are used to control which DDR controllers are implemented in the CL design. An example instantiation:
+ ```
+ sh_ddr #(.DDR_A_PRESENT(1),
+          .DDR_B_PRESENT(1),
+          .DDR_D_PRESENT(0))
+   SH_DDR (
+     .clk(clk),
+     ...
+ ```
+
+
+**NOTE:** *There is no performance or frequency difference between the four DRAM controllers, regardless of whether a given controller resides in the CL or the Shell logic.*
+
+
+### Clocking/Reset
+
+A single 250MHz clock and an associated asynchronous reset are provided to the CL. All Shell interfaces are synchronous to the 250MHz clock. The CL can derive clocks off of the 250MHz clock.
+
+The Xilinx Mixed Mode Clock Manager (MMCM) IP can be used to generate slower clocks off of the 250MHz clock.
+
+![alt tag](./images/Dividing_clocks_inside_CL.jpg)
+
+The reset signal combines the board reset and PCIe reset conditions. Please refer to the Xilinx documentation (ug974) for more information.
+
+### Function Level Reset
+
+FLR is supported for the Application Physical Function using a separate FLR interface:
+
+- sh_cl_flr_assert – Level signal that is asserted when FLR has
+  been requested
+
+- cl_sh_flr_done – Asserted for a single clock to acknowledge
+  the FLR. This must be asserted in response to sh_cl_flr_assert.
+  Note that due to pipeline delays, it is possible sh_cl_flr_assert remains
+  asserted for some number of clocks after cl_sh_flr_done.
+
+
+## PCIe Endpoint Presentation to Instance
+
+There are two PCIe Physical Functions (PFs) presented to the F1 instance:
+
+- Management PF – This PF allows for management of the FPGA using the [FPGA Management Tools](../../sdk/management/management_tools/README.md), including tracking FPGA state and loading CL images onto the FPGA.
+
+- Application PF – The PF for the Custom Logic (CL) specific functionality
+
+
+### Management PF
+
+The Management PF is a separate PF from the CL PF. Details are provided for reference to help understand the PCIe mapping from an F1 instance. This interface is strictly for AWS FPGA Management Tools, and does not support any interface with the CL code.
+
+The Management PF exposes:
+
+a) Amazon’s specific and fixed PCIe VendorID (0x1D05) and DeviceID.
+
+b) Two BARs with 4KB size.
+
+c) Single MSI-X interrupt.
+
+d) No BusMaster support.
+
+e) A range of 32-bit addressable registers.
+
+The Management PF is persistent throughout the lifetime of the instance, and it will not be reset or cleared (even during the AFI attach/detach process).
+
+### Application PF (AppPF)
+
+The Application PF exposes the following:
+
+a) PCIe BAR0 as a 64-bit prefetchable BAR sized as 128MB (*note the BAR size is subject to change, goal is 64GB, but will be no smaller
+   than 128MB)*. This BAR may be used to map the entire External/Internal memory space to the instance address space if desired, through `mmap()` type calls.
+
+b) PCIe BAR2 as a 64-bit prefetchable BAR sized as 4KB for the MSI-X interrupt tables.
+
+c) FLR capability that will reset the CL.
+
+d) BusMaster capability to allow the CL to master transactions towards the instance memory.
+
+e) CL’s specific PCIe VendorID, DeviceID, SubsystemVendorID and SubsystemID as registered through `aws ec2 fpgaImageCreate` [*Available soon*]
+
+The Developer can write drivers for the App PF or can leverage the reference driver provided in the SDK (with a plan to include the driver in Amazon Linux by default).
+
+
+### CL Interface to PCIe Interface via Shell
+
+The PCIe interface connecting the FPGA to the instance is in the Shell, and the CL can access it through two AXI-4 interfaces:
+
+#### AXI-4 for Inbound PCIe Transactions (Shell is master, CL is slave)
+
+This AXI-4 bus is for PCIe transactions mastered by the instance and targeting AppPF BAR0.
+
+It is a 512-bit wide AXI-4 interface that supports 32-bit transactions only. *Future revisions of this interface will support larger burst sizes (up to the Maximum Payload Size)*.
+
+A read or write request on this AXI-4 bus that is not acknowledged by the CL within a certain time window will be internally terminated by the Shell [*May not be supported in early releases*]. If the time-out error happens on a read, the Shell will return `0xDEADBEEF` data back to the instance. This error is reported through the Management PF and can be retrieved through the FPGA Management Tools metrics.
+
+#### AXI-4 for Outbound PCIe Transactions (CL is master, Shell is slave)
+
+This is a 512-bit wide AXI-4 Interface for the CL to master cycles to the PCIe bus. This is used, for example, to DMA data to/from instance memory.
+
+The following PCIe interface configuration parameters are provided from the Shell to the CL, and the CL logic must respect these maximum limits:
+
+- sh_cl_cfg_max_payload[1:0] – PCIe max payload size:
+    - 2’b00 – 128 Byte
+    - 2’b01 – 256 Byte (Most probable value)
+    - 2’b10 – 512 Byte
+    - 2’b11 – Reserved
+
+- sh_cl_cfg_max_read_req[2:0]
+    - 3’b000 – 128 Byte
+    - 3’b001 – 256 Byte
+    - 3’b010 – 512 Byte (Most probable value)
+    - 3’b011 – 1024 Byte
+    - 3’b100 – 2048 Byte
+    - 3’b101 – 4096 Byte
+
+The PCIe CL to Shell AXI-4 interfaces **MUST** implement “USER” bits on the address channels (`AxUSER[18:0]`).
+
+- AxUSER[10:0] – DW length of the request.
This is 1-based (0: zero DW, 1: one DW, 2: two DW, etc…)
+- AxUSER[14:11] – First DW's Byte enable for the Request
+- AxUSER[18:15] – Last DW's Byte enable for the Request
+
+##### Outbound PCIe AXI-4 Interface Restrictions:
+
+- Transfers must not violate PCIe byte enable rules (see byte enables below).
+- Transfers must not cross a 4Kbyte address boundary (PCIe restriction).
+- Transfers must not violate Max Payload Size.
+- Read requests must not violate Max Read Request Size.
+- A read request transaction must not be issued using the same ARID (AXI4 Read ID) if that ARID is already outstanding. **NOTE:** *The Shell does not enforce ordering between individual read transactions, and read responses could come back in arbitrary order*.
+- The PCIe interface supports 5-bit ARID (32 outstanding read transactions maximum), as PCIe extended tag is not supported on the PCIe interface.
+- The address on the AXI-4 interface must reflect the correct byte address of the transfer. The Shell does not support using a 64-bit
+  aligned address and using STRB to signal the actual starting DW.
+- The first/last byte enables are determined from the AxUSER bits. In addition, for writes, the WSTRB signal must be correct and reflect the appropriate valid bytes on the WDATA bus, even though the byte enables are also provided on AxUSER.
+
+##### Byte Enable Rules
+
+All PCIe transactions must adhere to the PCIe Byte Enable rules (see PCI Express Base specification). Rules are summarized below:
+
+- All transactions larger than two DW must have contiguous byte enables
+- Transactions that are less than two DW may have non-contiguous byte enables
+
+### AXI4 Error handling
+
+Transactions on the AXI4 interface will be terminated, reported as SLVERR on the RRESP/BRESP signals, and not passed to the instance in the following cases:
+
+- PCIe BME (BusMaster Enable) is not set in the PCIe configuration space
+
+- Illegal transaction address (Addressing memory space that’s not supported by the instance)
+
+- Transaction crossing 4KB boundaries violating PCIe specifications
+
+- Illegal byte-masking
+
+- Illegal length
+
+- Illegal ARID (ARID has already been used for an outstanding read transaction)
+
+**NOTE:** Pre-GA versions of the Shell and the FPGA Management Tools may not have some of these checks and associated metrics exposed to the developers.
+
+### Interrupts (Future)
+
+Interrupts are not supported in the current version of the Shell. Future
+versions of the Shell will have support for at least 16 interrupt
+sources.
+
+## DDR4 DRAM Interface
+
+Each DRAM interface is accessed via an AXI-4 interface:
+
+- AXI-4 (CL Master and DRAM controller is slave) – 512-bit AXI-4 interface to read/write DDR
+
+There is a single status signal indicating that the DRAM interface is trained and ready for access. The addressing uses ROW/COLUMN/BANK mapping of the AXI address to DRAM Row/Col/BankGroup. The Read and Write channels are serviced with round-robin arbitration (equal priority).
+
+The DRAM interface uses the Xilinx DDR-4 interface controller. The AXI-4 interface adheres to the Xilinx specification. User bits are added to the read data channel to signal ECC errors with the read data.
+
+**NOTE:** Even if no DDR4 controllers are desired in the CL, the `sh_ddr.sv` block must be instantiated in the CL (parameters are used to remove DDR controllers). If the `sh_ddr.sv` module is not instantiated, the design will have build errors.
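To make the preceding note concrete, a CL that uses no DDR4 at all still instantiates `sh_ddr` and simply disables every controller through its parameters. The sketch below follows the same abbreviated form as the instantiation example earlier in this specification; the remaining ports are elided exactly as they are there:

```
sh_ddr #(.DDR_A_PRESENT(0),   // no DDR controllers implemented in the CL
         .DDR_B_PRESENT(0),
         .DDR_D_PRESENT(0))
  SH_DDR (
    .clk(clk),
    ...                       // remaining ports as in the full template
```

With all three parameters set to 0 the controllers are removed from the build, while the module instantiation that the build flow expects is still present, avoiding the build errors described above.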
+ +### DRAM Content Preservation between AFI Loads (Future) + +In future Shell versions a DRAM content preservation feature will be implemented. This feature allows the DDR state to be preserved when dynamically changing CL logic. The current Shell version will not guarantee preservation of DRAM contents if the CL logic is re-loaded. + +#### Miscellaneous signals + +There are some miscellaneous generic signals between the Shell and CL. + +### PCIe IDs + +Some signals must include the PCIe IDs of the CL. A Developer’s specific PCIe VendorID, DeviceID, SubsystemVendorID and SubsystemID are registered through `aws ec2 fpgaImageCreate` command to reserve the PCIe IDs of the CL for mapping of the device into an F1 instance when the AFI is loaded. + +- cl_sh_id0 + + - [15:0] – Vendor ID + + - [31:16] – Device ID + +- cl_sh_id1 + + - [15:0] – Subsystem ID + + - [31:16] – Subsystem Vendor ID + +### General control/status + +The functionality of these signals is TBD. + +- cl_sh_status0[31:0] – Placeholder for generic CL to Shell status + +- cl_sh_status1[31:0] – Placeholder for generic CL to Shell status + +- sh_cl_ctl0[31:0] – Placeholder for generic Shell to CL control information + +- sh_cl_ctl1[31:0] – Placeholder for generic Shell to CL control information + +- sh_cl_pwr_state[1:0] – This is the power state of the FPGA. 0x0 + + - 0x0 – Power is normal + + - 0x1 – Power level 1 + + - 0x2 – Power level 2 + + - 0x3 – Power is critical and FPGA is subject to shutting off clocks or powering down diff --git a/sdk/LICENSE.txt b/sdk/LICENSE.txt index 0f416dad4..bf5d99aed 100644 --- a/sdk/LICENSE.txt +++ b/sdk/LICENSE.txt @@ -1,82 +1,82 @@ -Copyright 2014-2016 Amazon.com, Inc. or its affiliates. All Rights Reserved. - -Licensed under the Apache License, Version 2.0 (the "License"). You -may not use this file except in compliance with the License. A copy of -the License is located at - - http://aws.amazon.com/apache2.0/ - -or in the "license" file accompanying this file. This file is -distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF -ANY KIND, either express or implied. See the License for the specific -language governing permissions and limitations under the License. - -*** -Includes the following packages: -*** - - -each of which are licensed under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance with the -License. - -You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or -implied. See the License for the specific language governing -permissions and limitations under the License. - -Apache License - -Version 2.0, January 2004 - -http://www.apache.org/licenses/ -TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION -1. Definitions. -"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. -"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. -"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. 
For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. -"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. -"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. -"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. -"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). -"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. -"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." -"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. -2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. -3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. 
If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. -4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: -1. You must give any other recipients of the Work or Derivative Works a copy of this License; and -2. You must cause any modified files to carry prominent notices stating that You changed the files; and -3. You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and -4. If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. - -You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. -5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. -6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. -7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. 
You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. -8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. -9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. -END OF TERMS AND CONDITIONS -APPENDIX: How to apply the Apache License to your work -To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. -Copyright [yyyy] [name of copyright owner] - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. - - - +Copyright 2014-2016 Amazon.com, Inc. or its affiliates. All Rights Reserved. + +Licensed under the Apache License, Version 2.0 (the "License"). You +may not use this file except in compliance with the License. A copy of +the License is located at + + http://aws.amazon.com/apache2.0/ + +or in the "license" file accompanying this file. This file is +distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF +ANY KIND, either express or implied. See the License for the specific +language governing permissions and limitations under the License. + +*** +Includes the following packages: +*** + + +each of which are licensed under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance with the +License. 
+ +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or +implied. See the License for the specific language governing +permissions and limitations under the License. + +Apache License + +Version 2.0, January 2004 + +http://www.apache.org/licenses/ +TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION +1. Definitions. +"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. +"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. +"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. +"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. +"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. +"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. +"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). +"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. +"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." +"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. +2. 
Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. +3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. +4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: +1. You must give any other recipients of the Work or Derivative Works a copy of this License; and +2. You must cause any modified files to carry prominent notices stating that You changed the files; and +3. You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and +4. If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. + +You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. +5. Submission of Contributions. 
Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. +6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. +7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. +8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. +9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. +END OF TERMS AND CONDITIONS +APPENDIX: How to apply the Apache License to your work +To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. +Copyright [yyyy] [name of copyright owner] + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. 
+You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. + + + From 0728e7570bf896dab5942e01211b5c56058405d8 Mon Sep 17 00:00:00 2001 From: Robert Johnson Date: Wed, 21 Dec 2016 21:47:25 +0000 Subject: [PATCH 02/29] FPGA Image Tools: fpga-describe-local-image outputs the SH version + supporting MBOX HAL changes. Also added rescan option for future use --- .../fpga_image_tools/src/fpga_local_cmd.c | 69 ++++++++++- .../fpga_image_tools/src/fpga_local_cmd.h | 24 ++++ .../src/fpga_local_cmd_parse.c | 36 ++++-- .../src/fpga_local_cmd_pci_sysfs.c | 108 ++++++++++++++++++ sdk/management/hal/include/fpga_hal_mbox.h | 18 +++ .../hal/src/api/mbox/hw/fpga_hal_mbox.c | 15 +++ .../hal/src/api/mbox/hw/fpga_hal_mbox_regs.h | 1 + .../hal/src/platform/hw/fpga_hal_plat.c | 4 +- 8 files changed, 258 insertions(+), 17 deletions(-) diff --git a/sdk/management/fpga_image_tools/src/fpga_local_cmd.c b/sdk/management/fpga_image_tools/src/fpga_local_cmd.c index 785efb48e..16b8f6f38 100644 --- a/sdk/management/fpga_image_tools/src/fpga_local_cmd.c +++ b/sdk/management/fpga_image_tools/src/fpga_local_cmd.c @@ -124,6 +124,56 @@ cli_show_slot_app_pfs(uint32_t afi_slot) return -1; } +/** + * Rescan the application PF map info for the given AFI slot. + * + * @param[in] afi_slot the fpga slot + * + * @returns + * 0 on success + * -1 on failure + */ +static int +cli_rescan_slot_app_pfs(uint32_t afi_slot) +{ + fail_on_quiet(afi_slot >= FPGA_SLOT_MAX, err, CLI_INTERNAL_ERR_STR); + + /** Retrieve and display associated application PFs (if any) */ + bool found_app_pf = false; + int ret; + int i; + for (i = F1_APP_PF_START; i <= F1_APP_PF_END; i++) { + struct fpga_pci_resource_map app_map; + + /** + * cli_get_app_pf_map will skip the Mbox PF (ret==1). + * We continue up through F1_APP_PF_END (e.g. 15) for future + * compatibilty with any gaps in the PF numbering. + */ + ret = cli_get_app_pf_map(afi_slot, i, &app_map); + if (ret == 0) { + ret = cli_remove_app_pf(afi_slot, app_map.func); + if (ret != 0) { + /** Output an error but continue with the other app PFs */ + printf("Error: could not remove application PF device " + PCI_DEV_FMT "\n", + app_map.domain, app_map.bus, app_map.dev, + app_map.func); + } + found_app_pf = true; + } + } + if (found_app_pf) { + ret = cli_pci_rescan(); + fail_on_user(ret != 0, err, + "Error: could not rescan for application PF devices"); + } + + return 0; +err: + return -1; +} + /** * Attach for CLI processing. * @@ -533,6 +583,7 @@ handle_afi_cmd_metrics_rsp(const union afi_cmd *cmd, (void)cmd; /** We've already validated the header... */ struct afi_cmd_metrics_rsp *metrics = (void *)rsp->body; + int ret; uint32_t tmp_len = sizeof(struct afi_cmd_hdr) + sizeof(struct afi_cmd_metrics_rsp); @@ -541,7 +592,7 @@ handle_afi_cmd_metrics_rsp(const union afi_cmd *cmd, len, tmp_len); if (f1.show_headers) { - printf("Type FpgaImageSlot FpgaImageId StatusName StatusCode\n"); + printf("Type FpgaImageSlot FpgaImageId StatusName StatusCode ShVersion\n"); } char *afi_id = (!metrics->ids.afi_id[0]) ? 
"none" : metrics->ids.afi_id; @@ -551,10 +602,22 @@ handle_afi_cmd_metrics_rsp(const union afi_cmd *cmd, if ((metrics->status < ACMS_END) && acms_tbl[metrics->status]) { status_name = (void *)acms_tbl[metrics->status]; } - printf(" %-8s %2u\n", status_name, metrics->status); + + struct fpga_hal_mbox_versions ver; + ret = fpga_hal_mbox_get_versions(&ver); + fail_on_quiet(ret != 0, err, "fpga_hal_mbox_get_versions failed"); + + printf(" %-8s %2u 0x%08x\n", + status_name, metrics->status, ver.sh_version); + + if (f1.rescan) { + /** Rescan the application PFs for this slot */ + ret = cli_rescan_slot_app_pfs(f1.afi_slot); + fail_on_quiet(ret != 0, err, "cli_rescan_slot_app_pfs failed"); + } /** Display the application PFs for this slot */ - int ret = cli_show_slot_app_pfs(f1.afi_slot); + ret = cli_show_slot_app_pfs(f1.afi_slot); fail_on_quiet(ret != 0, err, "cli_show_slot_app_pfs failed"); return 0; diff --git a/sdk/management/fpga_image_tools/src/fpga_local_cmd.h b/sdk/management/fpga_image_tools/src/fpga_local_cmd.h index f1f0fab7c..409fcee6a 100644 --- a/sdk/management/fpga_image_tools/src/fpga_local_cmd.h +++ b/sdk/management/fpga_image_tools/src/fpga_local_cmd.h @@ -120,6 +120,7 @@ struct ec2_fpga_cmd { bool show_headers; bool get_hw_metrics; bool clear_hw_metrics; + bool rescan; }; extern struct ec2_fpga_cmd f1; @@ -160,3 +161,26 @@ void cli_pci_free(void); */ int cli_get_app_pf_map(uint32_t slot, uint32_t app_pf_num, struct fpga_pci_resource_map *map); + +/** + * Remove the application PF for the given mbox slot. + * + * @param[in] slot the fpga slot + * @param[in] app_pf_num the application PF number to check + * + * @returns + * 0 on success + * -1 on failure + */ +int +cli_remove_app_pf(uint32_t slot, uint32_t app_pf_num); + +/** + * PCI rescan. + * + * @returns + * 0 on success + * -1 on failure + */ +int +cli_pci_rescan(void); diff --git a/sdk/management/fpga_image_tools/src/fpga_local_cmd_parse.c b/sdk/management/fpga_image_tools/src/fpga_local_cmd_parse.c index 64ae12e8a..3192e8853 100644 --- a/sdk/management/fpga_image_tools/src/fpga_local_cmd_parse.c +++ b/sdk/management/fpga_image_tools/src/fpga_local_cmd_parse.c @@ -82,6 +82,13 @@ static const char *describe_afi_usage[] = { " -S, --fpga-image-slot", " The logical slot number for the FPGA image.", " Constraints: Positive integer from 0 to the total slots minus 1.", + " -R --rescan", + " Rescan the AFIDEVICE to update the per-AFI PCI VendorId and", + " DeviceId that may be dynamically modified due to a", + " fpga-load-local-image or fpga-clear-local-image command.", + " NOTE: this option removes the AFIDEVICE from the sysfs PCI", + " subsystem and then rescans the PCI subsystem in order for", + " the modified AFI PCI IDs to be refreshed.", " -?, --help", " Display this help.", " -H, --headers", @@ -224,7 +231,7 @@ parse_args_load_afi(int argc, char *argv[]) static struct option long_options[] = { {"fpga-image-slot", required_argument, 0, 'S' }, {"fpga-image-id", required_argument, 0, 'I' }, - {"request-timeout", required_argument, 0, 'R' }, + {"request-timeout", required_argument, 0, 'r' }, {"headers", no_argument, 0, 'H' }, {"help", no_argument, 0, '?' 
}, {"version", no_argument, 0, 'V' }, @@ -232,7 +239,7 @@ parse_args_load_afi(int argc, char *argv[]) }; int long_index = 0; - while ((opt = getopt_long(argc, argv, "S:I:R:H?hV", + while ((opt = getopt_long(argc, argv, "S:I:r:H?hV", long_options, &long_index)) != -1) { switch (opt) { case 'S': { @@ -249,7 +256,7 @@ parse_args_load_afi(int argc, char *argv[]) f1.afi_id[sizeof(f1.afi_id) - 1] = 0; break; } - case 'R': { + case 'r': { uint32_t value32; string_to_uint(&value32, optarg); int ret = config_request_timeout(value32); @@ -295,7 +302,7 @@ parse_args_clear_afi(int argc, char *argv[]) static struct option long_options[] = { {"fpga-image-slot", required_argument, 0, 'S' }, - {"request-timeout", required_argument, 0, 'R' }, + {"request-timeout", required_argument, 0, 'r' }, {"headers", no_argument, 0, 'H' }, {"help", no_argument, 0, '?' }, {"version", no_argument, 0, 'V' }, @@ -303,7 +310,7 @@ parse_args_clear_afi(int argc, char *argv[]) }; int long_index = 0; - while ((opt = getopt_long(argc, argv, "S:R:H?hV", + while ((opt = getopt_long(argc, argv, "S:r:H?hV", long_options, &long_index)) != -1) { switch (opt) { case 'S': { @@ -312,7 +319,7 @@ parse_args_clear_afi(int argc, char *argv[]) FPGA_SLOT_MAX); break; } - case 'R': { + case 'r': { uint32_t value32; string_to_uint(&value32, optarg); int ret = config_request_timeout(value32); @@ -357,7 +364,8 @@ parse_args_describe_afi(int argc, char *argv[]) static struct option long_options[] = { {"fpga-image-slot", required_argument, 0, 'S' }, - {"request-timeout", required_argument, 0, 'R' }, + {"request-timeout", required_argument, 0, 'r' }, + {"rescan", no_argument, 0, 'R' }, {"headers", no_argument, 0, 'H' }, {"help", no_argument, 0, '?' }, {"version", no_argument, 0, 'V' }, @@ -365,7 +373,7 @@ parse_args_describe_afi(int argc, char *argv[]) }; int long_index = 0; - while ((opt = getopt_long(argc, argv, "S:MCR:H?hV", + while ((opt = getopt_long(argc, argv, "S:r:RH?hV", long_options, &long_index)) != -1) { switch (opt) { case 'S': { @@ -374,13 +382,17 @@ parse_args_describe_afi(int argc, char *argv[]) "fpga-image-slot must be less than %u", FPGA_SLOT_MAX); break; } - case 'R': { + case 'r': { uint32_t value32; string_to_uint(&value32, optarg); int ret = config_request_timeout(value32); fail_on_quiet(ret != 0, err, "Could not configure the request-timeout"); break; } + case 'R': { + f1.rescan = true; + break; + } case 'H': { f1.show_headers = true; break; @@ -418,7 +430,7 @@ parse_args_describe_afi_slots(int argc, char *argv[]) int opt = 0; static struct option long_options[] = { - {"request-timeout", required_argument, 0, 'R' }, + {"request-timeout", required_argument, 0, 'r' }, {"headers", no_argument, 0, 'H' }, {"help", no_argument, 0, '?' 
}, {"version", no_argument, 0, 'V' }, @@ -426,10 +438,10 @@ parse_args_describe_afi_slots(int argc, char *argv[]) }; int long_index = 0; - while ((opt = getopt_long(argc, argv, "R:H?hV", + while ((opt = getopt_long(argc, argv, "r:H?hV", long_options, &long_index)) != -1) { switch (opt) { - case 'R': { + case 'r': { uint32_t value32; string_to_uint(&value32, optarg); int ret = config_request_timeout(value32); diff --git a/sdk/management/fpga_image_tools/src/fpga_local_cmd_pci_sysfs.c b/sdk/management/fpga_image_tools/src/fpga_local_cmd_pci_sysfs.c index 7dd1b3ef7..f8aa6c960 100644 --- a/sdk/management/fpga_image_tools/src/fpga_local_cmd_pci_sysfs.c +++ b/sdk/management/fpga_image_tools/src/fpga_local_cmd_pci_sysfs.c @@ -19,6 +19,8 @@ #include #include +#include +#include #include #include #include @@ -227,6 +229,112 @@ cli_get_app_pf_map(uint32_t slot, uint32_t app_pf_num, return -1; } +/** + * Write a '1' to the given sysfs file. + * + * @param[in] path the sysfs file path + * + * @returns + * 0 on success + * -1 on failure + */ +static int +cli_write_one2file(char *path) +{ + int ret = -1; + + int fd = open(path, O_WRONLY); + fail_on_quiet(fd == -1, err, "opening %s", path); + + char buf[] = { '1', 0 }; + ret = -!!write_loop(fd, buf, sizeof(buf)); + fail_on_quiet(ret != 0, err_close, "error writing %s", path); + +err_close: + close(fd); +err: + return ret; +} + +/** + * Remove the application PF for the given mbox slot. + * + * @param[in] slot the fpga slot + * @param[in] app_pf_num the application PF number to check + * + * @returns + * 0 on success + * -1 on failure + */ +int +cli_remove_app_pf(uint32_t slot, uint32_t app_pf_num) +{ + fail_on_quiet(slot >= FPGA_SLOT_MAX, err, CLI_INTERNAL_ERR_STR); + fail_on_quiet(app_pf_num > F1_APP_PF_END, err, CLI_INTERNAL_ERR_STR); + + /** Setup pointers to the mbox and associated PCI resource maps */ + struct fpga_pci_resource_map *mbox_map = &f1.mbox_slot_devs[slot].map; + + /** Construct the PCI device directory name using the PCI_DEV_FMT */ + char dir_name[NAME_MAX + 1]; + int ret = snprintf(dir_name, sizeof(dir_name), PCI_DEV_FMT, + mbox_map->domain, mbox_map->bus, mbox_map->dev, app_pf_num); + + fail_on_quiet(ret < 0, err, "Error building the dir_name"); + fail_on_quiet((size_t) ret >= sizeof(dir_name), err, "dir_name too long"); + + /** Setup the path to the device's remove file */ + char sysfs_name[NAME_MAX + 1]; + ret = snprintf(sysfs_name, sizeof(sysfs_name), + "/sys/bus/pci/devices/%s/remove", dir_name); + + fail_on_quiet(ret < 0, err, + "Error building the sysfs path for remove file"); + fail_on_quiet((size_t) ret >= sizeof(sysfs_name), err, + "sysfs path too long for remove file"); + + /** Write a "1" to the device's remove file */ + ret = cli_write_one2file(sysfs_name); + fail_on_quiet(ret != 0, err, "cli_write_one2file failed"); + + /** Check for file existence, should fail */ + struct stat file_stat; + ret = stat(sysfs_name, &file_stat); + fail_on_quiet(ret == 0, err, "remove failed for path=%s", sysfs_name); + + return 0; +err: + return -1; +} + +/** + * PCI rescan. 
+ * + * @returns + * 0 on success + * -1 on failure + */ +int +cli_pci_rescan(void) +{ + /** Setup and write '1' to the PCI rescan file */ + char sysfs_name[NAME_MAX + 1]; + int ret = snprintf(sysfs_name, sizeof(sysfs_name), "/sys/bus/pci/rescan"); + + fail_on_quiet(ret < 0, err, + "Error building the sysfs path for PCI rescan file"); + fail_on_quiet((size_t) ret >= sizeof(sysfs_name), err, + "sysfs path too long for PCI rescan file"); + + /** Write a "1" to the PCI rescan file */ + ret = cli_write_one2file(sysfs_name); + fail_on_quiet(ret != 0, err, "cli_write_one2file failed"); + + return 0; +err: + return -1; +} + /** * Handle one PCI device directory with the given directory name, and see if * it is an AFI mbox slot. If so, initialize a slot device structure for it diff --git a/sdk/management/hal/include/fpga_hal_mbox.h b/sdk/management/hal/include/fpga_hal_mbox.h index 4aa2caff1..7ebfa0fa1 100644 --- a/sdk/management/hal/include/fpga_hal_mbox.h +++ b/sdk/management/hal/include/fpga_hal_mbox.h @@ -26,6 +26,13 @@ #define FPGA_MBOX_MSG_DATA_LEN 4096 +/** + * Mailbox version info. + */ +struct fpga_hal_mbox_versions { + uint32_t sh_version; +}; + /** * Mailbox init structure. */ @@ -55,6 +62,17 @@ int fpga_hal_mbox_init(struct fpga_hal_mbox *mbox); */ int fpga_hal_mbox_reset(void); +/** + * Get Mailbox versions. + * + * @param[in,out] ver Mailbox version info to return. + * + * @returns + * 0 on success + * -1 on failure + */ +int fpga_hal_mbox_get_versions(struct fpga_hal_mbox_versions *ver); + /** * Attach the Mailbox. Wrapper around fpga_hal_mbox_reset for attach * semantics. diff --git a/sdk/management/hal/src/api/mbox/hw/fpga_hal_mbox.c b/sdk/management/hal/src/api/mbox/hw/fpga_hal_mbox.c index 324a99408..ca7ba3cd1 100644 --- a/sdk/management/hal/src/api/mbox/hw/fpga_hal_mbox.c +++ b/sdk/management/hal/src/api/mbox/hw/fpga_hal_mbox.c @@ -111,6 +111,21 @@ fpga_hal_mbox_detach(bool clear_state) return -1; } +int +fpga_hal_mbox_get_versions(struct fpga_hal_mbox_versions *ver) +{ + log_debug("enter"); + + int ret = fpga_hal_reg_read(FMB_REG_SH_VERSION, &ver->sh_version); + fail_on(ret != 0, err, "Error reading sh_version register"); + + log_debug("returning sh_version=0x%08x", ver->sh_version); + + return 0; +err: + return -1; +} + static int fpga_hal_mbox_check_len(uint32_t len) { diff --git a/sdk/management/hal/src/api/mbox/hw/fpga_hal_mbox_regs.h b/sdk/management/hal/src/api/mbox/hw/fpga_hal_mbox_regs.h index 98d246bc0..29b1be7e4 100644 --- a/sdk/management/hal/src/api/mbox/hw/fpga_hal_mbox_regs.h +++ b/sdk/management/hal/src/api/mbox/hw/fpga_hal_mbox_regs.h @@ -24,6 +24,7 @@ #define FMB_BASE 0x0 #define FMB_REG(offset) (FMB_BASE + (offset)) +#define FMB_REG_SH_VERSION FMB_REG(0x0) #define FMB_REG_STATUS FMB_REG(0xc) #define FMB_REG_WR_INDEX FMB_REG(0x20) #define FMB_REG_WR_DATA FMB_REG(0x24) diff --git a/sdk/management/hal/src/platform/hw/fpga_hal_plat.c b/sdk/management/hal/src/platform/hw/fpga_hal_plat.c index 2acee8f2b..e85fe7fe1 100644 --- a/sdk/management/hal/src/platform/hw/fpga_hal_plat.c +++ b/sdk/management/hal/src/platform/hw/fpga_hal_plat.c @@ -98,8 +98,8 @@ fpga_plat_attach(struct fpga_slot_spec *spec) log_debug("enter"); struct fpga_pci_resource_map *map = &spec->map; - log_debug("vendor_id=0x%04x, device_id=0x%04x, DBDF:" PCI_DEV_FMT - ", resource_num=%u, size=%u", + log_info("vendor_id=0x%04x, device_id=0x%04x, DBDF:" + PCI_DEV_FMT ", resource_num=%u, size=%u", map->vendor_id, map->device_id, map->domain, map->bus, map->dev, map->func, map->resource_num, map->size); 
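The patch above extends `fpga-describe-local-image` in two ways: the metrics output gains a ShVersion column (read through the new `fpga_hal_mbox_get_versions()` call), and a new `-R`/`--rescan` option removes the application PF from sysfs and rescans the PCI bus so that AFI PCI IDs modified by a load or clear are refreshed. A hypothetical invocation combining both is sketched below; the slot, AGFI ID, status, ShVersion value, and PCI IDs are illustrative only:

    $ sudo fpga-describe-local-image -S 0 -R -H

    Type  FpgaImageSlot  FpgaImageId         StatusName    StatusCode   ShVersion
    AFI          0       agfi-0123456789abcdefg  loaded            0       0x11241611
    Type  FpgaImageSlot  VendorId    DeviceId    DBDF
    AFIDEVICE    0       0x1d0f      0x1042      0000:00:17.0

Note that the rescan path writes to the sysfs `remove` and `rescan` files, so the command must be run with root privileges, as in the existing examples.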
From c021d6a4fe993cea7ce2d3e9456ff3a4e711a0e7 Mon Sep 17 00:00:00 2001 From: Robert Johnson Date: Wed, 21 Dec 2016 22:01:48 +0000 Subject: [PATCH 03/29] FPGA Image Tools README.md: added SH version output to fpga-describe-local-image examples --- sdk/management/fpga_image_tools/README.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/sdk/management/fpga_image_tools/README.md b/sdk/management/fpga_image_tools/README.md index 87cd4a1f2..bb11b6705 100644 --- a/sdk/management/fpga_image_tools/README.md +++ b/sdk/management/fpga_image_tools/README.md @@ -66,8 +66,8 @@ The following command displays the current state for the given FPGA slot number. $ sudo fpga-describe-local-image -S 0 -H - Type FpgaImageSlot FpgaImageId StatusName StatusCode - AFI 0 none cleared 1 + Type FpgaImageSlot FpgaImageId StatusName StatusCode ShVersion + AFI 0 none cleared 1 0x11241611 Type FpgaImageSlot VendorId DeviceId DBDF AFIDEVICE 0 0x1d0f 0x1042 0000:00:17.0 @@ -83,8 +83,8 @@ Displays the current state for the given FPGA slot number. The output shows the $ sudo fpga-describe-local-image -S 0 -H - Type FpgaImageSlot FpgaImageId StatusName StatusCode - AFI 0 agfi-0123456789abcdefg loaded 0 + Type FpgaImageSlot FpgaImageId StatusName StatusCode ShVersion + AFI 0 agfi-0123456789abcdefg loaded 0 0x11241611 Type FpgaImageSlot VendorId DeviceId DBDF AFIDEVICE 0 0x1d0f 0x1042 0000:00:17.0 @@ -100,8 +100,8 @@ The following command displays the current state for the given FPGA slot number. $ sudo fpga-describe-local-image -S 0 -H - Type FpgaImageSlot FpgaImageId StatusName StatusCode - AFI 0 none cleared 1 + Type FpgaImageSlot FpgaImageId StatusName StatusCode ShVersion + AFI 0 none cleared 1 0x11241611 Type FpgaImageSlot VendorId DeviceId DBDF AFIDEVICE 0 0x1d0f 0x1042 0000:00:17.0 From 99c32abb6e1f63be4e3e9d9cb377808379195e51 Mon Sep 17 00:00:00 2001 From: Atta Date: Thu, 22 Dec 2016 18:31:46 -0800 Subject: [PATCH 04/29] Adding note on including AWS Account ID. --- hdk/cl/examples/README.md | 35 ++++++++++++------- .../new_cl_template/build/README.md | 2 +- 2 files changed, 23 insertions(+), 14 deletions(-) diff --git a/hdk/cl/examples/README.md b/hdk/cl/examples/README.md index 96f45677a..d70721cc1 100644 --- a/hdk/cl/examples/README.md +++ b/hdk/cl/examples/README.md @@ -6,12 +6,12 @@ The CL must comply with the [AWS Shell specifications](../../docs/AWS_Shell_Inte The [CL Examples directory](https://github.com/aws/aws-fpga/tree/master/hdk/cl/examples) is provided to assist developers in creating a functional CL implementation. Each example includes: -1) The design source code for the example under the `/design` directory. -2) The timing, clock and placement constraints files, scripts for compiling the example design. (This requires running in an instance/server that have Xilinx tools and license installed. Developers are recommended to use the "FPGA Development AMI" available free of charge on [AWS Marketplace](https://aws.amazon.com/marketplace/). -3) The final build, called Design CheckPoint (DCP) that can be submitted for AWS to generate the AFI. -4) An AFI-ID for a pre-generated AFI that matches the example design. -5) Software source code required on the FPGA-enabled instance to run the example. -6) Software binary that can be loaded on an FPGA-enabled instance to test the AFI. +1. The design source code for the example under the `/design` directory. +2. The timing, clock and placement constraints files, scripts for compiling the example design. 
(This requires running in an instance/server that have Xilinx tools and license installed. Developers are recommended to use the "FPGA Development AMI" available free of charge on [AWS Marketplace](https://aws.amazon.com/marketplace/). +3. The final build, called Design CheckPoint (DCP) that can be submitted for AWS to generate the AFI. +4. An AFI-ID for a pre-generated AFI that matches the example design. +5. Software source code required on the FPGA-enabled instance to run the example. +6. Software binary that can be loaded on an FPGA-enabled instance to test the AFI. In summary: @@ -98,6 +98,11 @@ After the AFI generation is complete, AWS will put the logs into the bucket loca by email. #### Method 1: If you have access to AWS EC2 CLI with support for `create-fpga-image` action +To check whether you have access to the `create-fpga-image` command, simply try executing the command as follows. +If you get an "Invalid choice" error, then move to [Method 2](https://github.com/aws/aws-fpga/tree/master/hdk/cl/examples#method-2-during-f1-preview-and-before-aws-ec2-cli-action-create-fpga-image-is-available). + + $ aws ec2 create-fpga-image + To create an AFI from the generated DCP, you need to upload the tar-zipped DCP file to an S3 bucket, and execute the `aws ec2 create-fpga-image` command as follows: $ aws ec2 create-fpga-image \ @@ -109,13 +114,13 @@ To create an AFI from the generated DCP, you need to upload the tar-zipped DCP f --logs-storage-location Bucket=,Key=logs/ The output of this command includes two identifiers that refer to your AFI: -- FPGA Image Identifier or AFI ID: this is the main ID used to manage your AFI through the AWS EC2 CLI commands and AWS SDK APIs. +- **FPGA Image Identifier** or **AFI ID**: this is the main ID used to manage your AFI through the AWS EC2 CLI commands and AWS SDK APIs. This ID is regional, i.e., if an AFI is copied across multiple regions, it will have a different unique AFI ID in each region. - An example AFI ID is `afi-01234567890abcdef`. -- Glogal FPGA Image Identifier or AGFI ID: this is a global ID that is used to refer to an AFI from within an F1 instance. + An example AFI ID is **`afi-01234567890abcdef`**. +- **Glogal FPGA Image Identifier** or **AGFI ID**: this is a global ID that is used to refer to an AFI from within an F1 instance. For example, to load or clear an AFI from an FPGA slot, you use the AGFI ID. Since the AGFI IDs is global (by design), it allows you to copy a combination of AFI/AMI to multiple regions, and they will work without requiring any extra setup. - An example AFI ID is `agfi-01234567890abcdef`. + An example AGFI ID is **`agfi-01234567890abcdef`**. #### Method 2: During F1 preview and before AWS EC2 CLI action `create-fpga-image` is available @@ -161,14 +166,14 @@ A sample policy is shown below. ] } -Then, send an email to AWS (email TBD) providing the information listed earlier. +Then, send an email to AWS (email TBD) providing the information listed earlier (numbered 1-6), in addition to your **AWS Account ID number**. # Step by step guide how to load and test a registered AFI from within an F1 instance To follow the next steps, you have to run an instance on F1. AWS recommend you run an instance with latest Amazon Linux that have the FPGA management tools included, or alternatively the FPGA Developer AMI with both the HDK and SDK. -## 4. Setup AWS FPGA management tools +## 4. 
Setup AWS FPGA Management tools Execute the following: @@ -185,10 +190,12 @@ There is a default limit of eight AFIs per AMI, if you need more, please reach o To associate, simply invoke the following AWS EC2 CLI command. $ aws ec2 associate-fpga-image --fpga-image-id --image-id + +**NOTE**: The AWS CLI commands use the AFI ID (not the AGFI ID). ## 6. Load the AFI -Run's the `fpga-describe-local-image` on slot 0 to confirm that the FPGA is cleared, and you should see similar output to the 4 lines below: +Run the `fpga-describe-local-image` on slot 0 to confirm that the FPGA is cleared, and you should see similar output to the 4 lines below: $ sudo fpga-describe-local-image -S 0 -H @@ -201,6 +208,8 @@ Then loading the example AFI to FPGA slot 0 (you should have the AGFI ID from St $ sudo fpga-load-local-image -S 0 -I +**NOTE**: The FPGA Management tools use the AGFI ID (not the AFI ID). + Now, you can verify the status of the previous load command: $ sudo fpga-describe-local-image -S 0 -H diff --git a/hdk/common/shell_current/new_cl_template/build/README.md b/hdk/common/shell_current/new_cl_template/build/README.md index b6f2de337..3a41c6034 100644 --- a/hdk/common/shell_current/new_cl_template/build/README.md +++ b/hdk/common/shell_current/new_cl_template/build/README.md @@ -103,7 +103,7 @@ The output of this command includes two identifiers that refer to your AFI: After the AFI generation is complete, AWS will put the AFI generation logs into the bucket location provided by the developer and notify them by email. -**NOTE**: Preview-program customers without access to the AWS CLI EC2 action `create-fpga-image` should instead follow the instructions [here](https://github.com/aws/aws-fpga/blob/master/hdk/cl/examples/Getting_Started_With_CL_Examples.md). +**NOTE**: Preview-program customers without access to the AWS CLI EC2 action `create-fpga-image` should instead follow the instructions [here](https://github.com/aws/aws-fpga/tree/master/hdk/cl/examples#method-2-during-f1-preview-and-before-aws-ec2-cli-action-create-fpga-image-is-available). ## About Encryption Developer RTL is encrypted using IEEE 1735 V2 encryption. This level of encryption protects both the raw source files and the implemented design. From 544858374d6fd4f7feea4553eb9c3ee66c3913a3 Mon Sep 17 00:00:00 2001 From: Deep Patel Date: Fri, 23 Dec 2016 15:38:56 -0600 Subject: [PATCH 05/29] Changes to /cl/examples/README.md (#34) * Fixing typos * Add text about owning AMI --- hdk/cl/examples/README.md | 75 +++++++++++++++++++++------------------ 1 file changed, 40 insertions(+), 35 deletions(-) diff --git a/hdk/cl/examples/README.md b/hdk/cl/examples/README.md index d70721cc1..33713fc1e 100644 --- a/hdk/cl/examples/README.md +++ b/hdk/cl/examples/README.md @@ -1,7 +1,7 @@ # Overview on process for building a Custom Logic (CL) implementation for AWS FPGA instances The developer can build their own Custom Logic (CL) and deploy it on AWS. -The CL must comply with the [AWS Shell specifications](../../docs/AWS_Shell_Interface_Specification.md), and pass through the build scripts. +The CL must comply with the [AWS Shell specifications](../../docs/AWS_Shell_Interface_Specification.md), and pass through the build scripts. The [CL Examples directory](https://github.com/aws/aws-fpga/tree/master/hdk/cl/examples) is provided to assist developers in creating a functional CL implementation. Each example includes: @@ -11,7 +11,7 @@ functional CL implementation. Each example includes: 3. 
The final build, called Design CheckPoint (DCP) that can be submitted for AWS to generate the AFI. 4. An AFI-ID for a pre-generated AFI that matches the example design. 5. Software source code required on the FPGA-enabled instance to run the example. -6. Software binary that can be loaded on an FPGA-enabled instance to test the AFI. +6. Software binary that can be loaded on an FPGA-enabled instance to test the AFI. In summary: @@ -22,7 +22,7 @@ By following the example CLs, a developer should learn how to interface to the A # Step by step guide on how to create an AFI from one of the CL examples -As a pre-requested to building the AFI, the developer should have an instance/server with Xilinx vivado tools and license. The "FPGA Developer AMI" provided free of charge on AWS Marketplace will be an ideal place to start an instance from. See the README.md on the AMI for the details how to launch the FPGA Developer's AMI, install the tools and set up the license. +As a pre-requisite to building the AFI, the developer should have an instance/server with Xilinx Vivado Tools and the necessary Licenses. The "FPGA Developer AMI" provided free of charge on AWS Marketplace will be an ideal place to start an instance from. See the README.md on the AMI for the details how to launch the FPGA Developer's AMI, install the tools and set up the license. **NOTE:** *steps 1 through 3 can be done on any server or EC2 instance, C4/C5 instances are recommended for fastest build time* @@ -33,17 +33,17 @@ As a pre-requested to building the AFI, the developer should have an instance/se $ git clone https://github.com/aws/aws-fpga $ cd aws-fpga $ source hdk_shell.sh - + To install the AWS CLI, please follow the instructions here: (http://docs.aws.amazon.com/cli/latest/userguide/installing.html). - + $ aws configure # to set your credentials (found in your console.aws.amazon.com page) and region (typically us-east-1) -**NOTE**: During the F1 preview, not all FPGA-specific AWS CLI commands are available to the public. +**NOTE**: During the F1 preview, not all FPGA-specific AWS CLI commands are available to the public. To extend your AWS CLI installation, please execute the following: $ aws configure add-model --service-model file://$(pwd)/sdk/aws-cli-preview/ec2_preview_model.json - - + + ### 1. Pick one of the examples and move to its directory There are couple of ways to start a new CL: one option is to copy one of the examples provided in the HDK and modify the design files, scripts and constrains directory. @@ -52,7 +52,7 @@ Alternatively, by creating a new directory, setup the environment variables, and $ cd $HDK_DIR/cl/examples/cl_hello_world # you can change cl_hello_world to any other example $ export CL_DIR=$(pwd) - + Setting up the CL_DIR environment variable is crucial as the build scripts rely on that value. Each one of the examples following the recommended directory structure to match what's expected by the HDK simulation and build scripts. @@ -60,24 +60,23 @@ If you like to start your own CL, check out the [How to create your own CL Readm ### 2. Build the CL before submitting to AWS -**NOTE** *This step requires you have Xilinx Vivado Tools installed as well Vivado License:* +**NOTE** *This step requires you to have Xilinx Vivado Tools and Licenses installed* $ vivado -mode batch # Run this command to see if vivado is installed - $ sudo perl /home/centos/src/project_data/license/license_manager.pl -status # To check if license server is up. 
this command is for AWS-provided FPGA Development machine, the license manager can be in different directory in your systems - + The next script two steps will go through the entire implementation process converting the CL design into a completed Design Checkpoint that meets timing and placement constrains of the target FPGA $ cd $CL_DIR/build/scripts $ ./aws_build_dcp_from_cl.tcl -**NOTE**: The DCP generation can take up to several hours to complete. -We recommend that you initiate the generation in a way that prevents interruption. +**NOTE**: The DCP generation can take up to several hours to complete. +We recommend that you initiate the generation in a way that prevents interruption. For example, if working on a remote machine, we recommend using window management tools such as [`screen`](https://www.gnu.org/software/screen/manual/screen.html) to mitigate potential network disconnects. - + ### 3. Submit the Design Checkpoint to AWS to register the AFI -To submit the DCP, create an S3 bucket for submitting the design and upload the tar-zipped archive into that bucket. +To submit the DCP, create an S3 bucket for submitting the design and upload the tar-zipped archive into that bucket. You need to prepare the following information: 1. Name of the logic design. @@ -87,7 +86,7 @@ You need to prepare the following information: 5. Location of the directory to write logs (S3 bucket name and key). 6. Version of the AWS Shell. -To upload your DCP to S3, +To upload your DCP to S3, $ aws s3 mb s3:// # Create an S3 bucket (choose a unique bucket name) $ aws s3 cp *.SH_CL_routed.dcp \ # Upload the DCP file to S3 @@ -100,10 +99,10 @@ by email. #### Method 1: If you have access to AWS EC2 CLI with support for `create-fpga-image` action To check whether you have access to the `create-fpga-image` command, simply try executing the command as follows. If you get an "Invalid choice" error, then move to [Method 2](https://github.com/aws/aws-fpga/tree/master/hdk/cl/examples#method-2-during-f1-preview-and-before-aws-ec2-cli-action-create-fpga-image-is-available). - + $ aws ec2 create-fpga-image -To create an AFI from the generated DCP, you need to upload the tar-zipped DCP file to an S3 bucket, and execute the `aws ec2 create-fpga-image` command as follows: +To create an AFI from the generated DCP, you need to upload the tar-zipped DCP file to an S3 bucket, and execute the `aws ec2 create-fpga-image` command as follows: $ aws ec2 create-fpga-image \ --fpga-image-architecture xvu9p \ @@ -114,18 +113,18 @@ To create an AFI from the generated DCP, you need to upload the tar-zipped DCP f --logs-storage-location Bucket=,Key=logs/ The output of this command includes two identifiers that refer to your AFI: -- **FPGA Image Identifier** or **AFI ID**: this is the main ID used to manage your AFI through the AWS EC2 CLI commands and AWS SDK APIs. +- **FPGA Image Identifier** or **AFI ID**: this is the main ID used to manage your AFI through the AWS EC2 CLI commands and AWS SDK APIs. This ID is regional, i.e., if an AFI is copied across multiple regions, it will have a different unique AFI ID in each region. - An example AFI ID is **`afi-01234567890abcdef`**. + An example AFI ID is **`afi-01234567890abcdef`**. - **Glogal FPGA Image Identifier** or **AGFI ID**: this is a global ID that is used to refer to an AFI from within an F1 instance. For example, to load or clear an AFI from an FPGA slot, you use the AGFI ID. 
- Since the AGFI IDs is global (by design), it allows you to copy a combination of AFI/AMI to multiple regions, and they will work without requiring any extra setup. + Since the AGFI IDs is global (by design), it allows you to copy a combination of AFI/AMI to multiple regions, and they will work without requiring any extra setup. An example AGFI ID is **`agfi-01234567890abcdef`**. #### Method 2: During F1 preview and before AWS EC2 CLI action `create-fpga-image` is available -Add a policy to the created S3 bucket granting [read/write permissions](http://docs.aws.amazon.com/AmazonS3/latest/dev/example-walkthroughs-managing-access-example2.html) to our team's account (Account ID: 371834676912). -A sample policy is shown below. +Add a policy to the created S3 bucket granting [read/write permissions](http://docs.aws.amazon.com/AmazonS3/latest/dev/example-walkthroughs-managing-access-example2.html) to our team's account (Account ID: 371834676912). +A sample policy is shown below. { "Version": "2012-10-17", @@ -171,7 +170,7 @@ Then, send an email to AWS (email TBD) providing the information listed earlier # Step by step guide how to load and test a registered AFI from within an F1 instance -To follow the next steps, you have to run an instance on F1. AWS recommend you run an instance with latest Amazon Linux that have the FPGA management tools included, or alternatively the FPGA Developer AMI with both the HDK and SDK. +To follow the next steps, you have to run an instance on F1. AWS recommends that you run an instance with latest Amazon Linux that has the FPGA management tools included, or alternatively the FPGA Developer AMI with both the HDK and SDK. ## 4. Setup AWS FPGA Management tools @@ -180,19 +179,25 @@ Execute the following: $ git clone https://github.com/aws/aws-fpga # Not needed if you have installed the HDK as in Step 0. $ cd aws-fpga $ source sdk_setup.sh - + ## 5. Associate the AFI with your AMI -To start using the AFI, you need to associate it with an [AMI (Amazon Machine Image)](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html) that you own. -Association means that any instance launched using this AMI will be able to load the AFIs to FPGAs as described in the next section. -You can associate multiple AFIs with your AMI. -There is a default limit of eight AFIs per AMI, if you need more, please reach out to AWS with your use case and we can adjust your limit. -To associate, simply invoke the following AWS EC2 CLI command. +* To start using the AFI, you need to associate it with an [AMI (Amazon Machine Image)](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html) that you own. +* Association means that any instance launched using this AMI will be able to load the AFIs to FPGAs as described in the next section. +* You can associate multiple AFIs with your AMI. +* There is a default limit of eight AFIs per AMI, if you need more, please reach out to AWS with your use case and we can adjust your limit. +* FPGA Developer AMI's are owned by AWS and you can not associate your AFI with them. + * If you are developing using the FPGA Developer AMI's, just create a new image of your instance after you are done developing and that will create an AMI of your instance that you own. + $ aws create-image --instance-id --name + * This will create a new AMI of the current state of your instance and you would be able to associate an AFI with this AMI. 
+ * You would have to start your F1 Instance with this new image as that load commands would only work on associated AMI's + +* To associate, simply invoke the following AWS EC2 CLI command. $ aws ec2 associate-fpga-image --fpga-image-id --image-id - + **NOTE**: The AWS CLI commands use the AFI ID (not the AGFI ID). - + ## 6. Load the AFI Run the `fpga-describe-local-image` on slot 0 to confirm that the FPGA is cleared, and you should see similar output to the 4 lines below: @@ -204,12 +209,12 @@ Run the `fpga-describe-local-image` on slot 0 to confirm that the FPGA is cleare Type VendorId DeviceId DBDF AFIDEVICE 0x1d0f 0x1042 0000:00:17.0 -Then loading the example AFI to FPGA slot 0 (you should have the AGFI ID from Step 3 above): +Then loading the example AFI to FPGA slot 0 (you should have the AGFI ID from Step 3 above): $ sudo fpga-load-local-image -S 0 -I **NOTE**: The FPGA Management tools use the AGFI ID (not the AFI ID). - + Now, you can verify the status of the previous load command: $ sudo fpga-describe-local-image -S 0 -H From c2d0c536fd99ddda792795a360ae266cc4de6cbc Mon Sep 17 00:00:00 2001 From: King Date: Fri, 23 Dec 2016 15:44:30 -0600 Subject: [PATCH 06/29] clean up of devAMI mentions --- FAQs.md | 21 +++++++++++++++------ README.md | 13 +++++++++++-- RELEASE_NOTES.md | 2 +- 3 files changed, 27 insertions(+), 9 deletions(-) diff --git a/FAQs.md b/FAQs.md index ebb3af740..8e1d5d35b 100644 --- a/FAQs.md +++ b/FAQs.md @@ -63,12 +63,10 @@ for testing access to the DDR interfaces. **How do I get access to the Developer AMI?** -Start with an AWS account and request access to the Developer AMI in AWS -Marketplace. Currently, the FPGA Developer AMI is private. You will -receive permission on the AWS account you submitted for access to the -FPGA Developer AMI. The AMI can be launched directly from AWS -Marketplace on any EC2 instance. See the FPGA Developer AMI README for -more details. +Currently, the FPGA Developer AMI is private and you will need to be whitelisted. You will +receive permission and notifications via email. Email aws-fpga-developer-support@amazon.com with any questions +See the FPGA Developer AMI README for more details. + **What is an AFI?** @@ -113,3 +111,14 @@ permission to see its code. The only reference to the AFI is through the AFI ID. The Customer would call fpga-local-load-image with the correct AFI ID for that Marketplace offering, which will result in AWS loading the AFI into the FPGA. No FPGA internal design code is exposed. + +**Why did my example job run and die without generating a DCP file?** + +The error message below indicates that you ran out of memory. Restart your instance +with a different instance type that has 8GiB or more. + +Finished applying 'set_property' XDC Constraints : Time (s): cpu = 00:06:26 ; +elapsed = 00:08:59 . Memory (MB): peak = 4032.184 ; gain = 3031.297 ; free physical = 1285 ; free virtual = 1957 +/opt/Xilinx/Vivado/2016.3/bin/loader: line 164: 8160 Killed "$RDI_PROG" "$@" +Parent process (pid 8160) has died. This helper process will now exit + diff --git a/README.md b/README.md index 6aac9ebc8..486a5e4de 100644 --- a/README.md +++ b/README.md @@ -14,7 +14,7 @@ Please click the "Watch" button in GitHub upper right corner to stay posted. ## FPGA HDK -The [HDK directory](./hdk) is recommended for developers wanting to start building Amazon FPGA Images (AFI). It includes the development environment, simulation, build and AFI creation scripts. The HDK can be installed on any server or EC2 instance. 
AWS recommends the use of the [FPGA Developer AMI on AWS Marketplace](https//aws.amazon.com/marketplace/AmazonFPGAAmi). The HDK is not required if you are using a pre-built AFI and not planning to build your own AFI. +The [HDK directory](./hdk) is recommended for developers wanting to start building Amazon FPGA Images (AFI). It includes the development environment, simulation, build and AFI creation scripts. The HDK can be installed on any server or EC2 instance. The HDK is not required if you are using a pre-built AFI and not planning to build your own AFI. Execute [`source ./hdk_setup.sh`](./hdk_setup.sh) to setup the environment variables required by the rest of the HDK scripts. @@ -22,11 +22,20 @@ Execute [`source ./hdk_setup.sh`](./hdk_setup.sh) to setup the environment varia The [SDK directory](./sdk) includes the drivers and runtime environment required by any EC2 Instance running on F1. It includes the drivers and tools to interact with pre-built AFIs that are loaded to EC2 F1 FPGAs. The SDK is not required during the AFI development process; it is only required once the AFI is loaded onto an F1 instance. +## FPGA Developer AMI + +AWS recommends the use of the F1 FPGA developer AMI for development on EC2 instances. The HDK examples and quick start can be run on any [C4/M4](https://aws.amazon.com/ec2/instance-types/) EC2 instance with atleast 8GiB Memory. For the best performance, c4.2xlarge is recommended. To start using the AMI your AWS account needs to be whitelisted. Once you are whitelisted, from the AWS console you will have access to AMIs. Make sure you are in N. Virginia (us-east-1). +Go to EC2->Launch Instance->My AMIs +Tick the ‘Shared with me’ box on the Ownership tab on the left. +FPGA developer AMI will be prefixed with F1 + +During private access period, developers are emailed with details on how to get started with the AMI, terms and conditions and additional info on how to get started using F1 instances. Please email aws-fpga-developer-support@amazon.com for questions regarding developer AMI. + # Quick Start ## Building an Example AFI -By following the next few steps, you would have downloaded the HDK, compiled and built one of the example Custom Logic (CL) designs included in this HDK, and registered it with AWS. You can run these steps on any EC2 instance, with [C4](https://aws.amazon.com/ec2/instance-types/) and [M4](https://aws.amazon.com/ec2/instance-types/) being the recommended instance types for performance. +By following the next few steps, you would have downloaded the HDK, compiled and built one of the example Custom Logic (CL) designs included in this HDK, and registered it with AWS. #### Prerequisites * AWS FPGA HDK and SDK run in Linux environment only. diff --git a/RELEASE_NOTES.md b/RELEASE_NOTES.md index 534db4495..bc256bd45 100644 --- a/RELEASE_NOTES.md +++ b/RELEASE_NOTES.md @@ -66,7 +66,7 @@ This is first public release for AWS EC2 FPGA Development Kit. The kit comes wit * The HDK and SDK are designed for **Linux** environment and has not been tested on other platforms. * First install of AWS FPGA SDK requires having gcc installed in the instance server. If that's not available, try `sudo yum update && sudo yum group install "Development Tools"` -* The HDK build step requires having Xilinx's Vivado tool and Vivado License Management running +* The HDK build step requires having Xilinx's Vivado tool and Vivado License Management running. 
Tools and licenses are provided free of charge in AWS FPGA Developer AMI * Vivado License need to support VU9p ES1 FPGA * Vivado License need to support encryption * This release tested and validated with Vivado 2016.3 From 9aad00c1e46b14e4e4a1e5794b0793de786f11e1 Mon Sep 17 00:00:00 2001 From: AWSwinefred Date: Fri, 23 Dec 2016 15:46:27 -0600 Subject: [PATCH 07/29] Updating Develop Branch with Latest master branch changes (#35) * FPGA Image Tools README.md: added SH version output to fpga-describe-local-image examples * FPGA Image Tools: fpga-describe-local-image outputs the SH version + supporting MBOX HAL changes. Also added rescan option for future use * adding SDAccel driver (#32) Change-Id: I4409f468bb2f680086ce8768e5a700fcd2280cd5 * Update ISSUE_TEMPLATE.md (#31) Added @aws/fpga-user to enable email notifiations --- .github/ISSUE_TEMPLATE.md | 8 +- sdk/SDAccel/HAL/driver/include/xclbin.h | 124 ++ sdk/SDAccel/HAL/driver/include/xclhal.h | 394 +++++ sdk/SDAccel/HAL/driver/include/xclperf.h | 300 ++++ .../xcldma/include/perfmon_parameters.h | 274 +++ .../xcldma/include/xbar_sys_parameters.h | 146 ++ .../HAL/driver/xcldma/include/xdma-ioctl.h | 148 ++ .../HAL/driver/xcldma/user/datamover.h | 182 ++ .../HAL/driver/xcldma/user/memorymanager.cpp | 220 +++ .../HAL/driver/xcldma/user/memorymanager.h | 76 + sdk/SDAccel/HAL/driver/xcldma/user/perf.cpp | 980 +++++++++++ sdk/SDAccel/HAL/driver/xcldma/user/prom.cpp | 445 +++++ sdk/SDAccel/HAL/driver/xcldma/user/shim.cpp | 1250 ++++++++++++++ sdk/SDAccel/HAL/driver/xcldma/user/shim.h | 256 +++ sdk/SDAccel/HAL/driver/xcldma/user/xspi.cpp | 1531 +++++++++++++++++ 15 files changed, 6328 insertions(+), 6 deletions(-) create mode 100644 sdk/SDAccel/HAL/driver/include/xclbin.h create mode 100644 sdk/SDAccel/HAL/driver/include/xclhal.h create mode 100755 sdk/SDAccel/HAL/driver/include/xclperf.h create mode 100644 sdk/SDAccel/HAL/driver/xcldma/include/perfmon_parameters.h create mode 100644 sdk/SDAccel/HAL/driver/xcldma/include/xbar_sys_parameters.h create mode 100644 sdk/SDAccel/HAL/driver/xcldma/include/xdma-ioctl.h create mode 100644 sdk/SDAccel/HAL/driver/xcldma/user/datamover.h create mode 100644 sdk/SDAccel/HAL/driver/xcldma/user/memorymanager.cpp create mode 100644 sdk/SDAccel/HAL/driver/xcldma/user/memorymanager.h create mode 100644 sdk/SDAccel/HAL/driver/xcldma/user/perf.cpp create mode 100644 sdk/SDAccel/HAL/driver/xcldma/user/prom.cpp create mode 100644 sdk/SDAccel/HAL/driver/xcldma/user/shim.cpp create mode 100644 sdk/SDAccel/HAL/driver/xcldma/user/shim.h create mode 100755 sdk/SDAccel/HAL/driver/xcldma/user/xspi.cpp diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md index 744c16b60..b412d3eeb 100644 --- a/.github/ISSUE_TEMPLATE.md +++ b/.github/ISSUE_TEMPLATE.md @@ -1,7 +1,3 @@ -### Release version +Please don't remove the line below -### Expected Behavior - -### Actual Behavior - -### Steps to reproduce +@aws/fpga-user diff --git a/sdk/SDAccel/HAL/driver/include/xclbin.h b/sdk/SDAccel/HAL/driver/include/xclbin.h new file mode 100644 index 000000000..dcd3c1a67 --- /dev/null +++ b/sdk/SDAccel/HAL/driver/include/xclbin.h @@ -0,0 +1,124 @@ +/** + * Xilinx SDAccel xclbin container definition + * Copyright (C) 2015-2016, Xilinx Inc - All rights reserved + */ + +#ifndef _XCLBIN_H_ +#define _XCLBIN_H_ + +#if defined(__KERNEL__) +#include +#elif defined(__cplusplus) +#include +#include +#else +#include +#include +#endif + +#ifdef __cplusplus +extern "C" { +#endif + + /** + * Container format for Xilinx bitstreams, metadata and other + 
* binary blobs. + * Every segment must be aligned at 8 byte boundary with null byte padding + * between adjacent segments if required. + * For segements which are not present both offset and length must be 0 in + * the header. + * Currently only xclbin0\0 is recognized as file magic. In future if/when file + * format is updated the magic string will be changed to xclbin1\0 and so on. + */ + enum XCLBIN_MODE { + XCLBIN_FLAT, + XCLBIN_PR, + XCLBIN_TANDEM_STAGE2, + XCLBIN_TANDEM_STAGE2_WITH_PR, + XCLBIN_MODE_MAX + }; + + struct xclBin { + char m_magic[8]; /* should be xclbin0\0 */ + uint64_t m_length; /* total size of the xclbin file */ + uint64_t m_timeStamp; /* number of seconds since epoch when xclbin was created */ + uint64_t m_version; /* tool version used to create xclbin */ + unsigned m_mode; /* XCLBIN_MODE */ + char m_nextXclBin[24]; /* Name of next xclbin file in the daisy chain */ + uint64_t m_metadataOffset; /* file offset of embedded metadata */ + uint64_t m_metadataLength; /* size of the embedded metdata */ + uint64_t m_primaryFirmwareOffset; /* file offset of bitstream or emulation archive */ + uint64_t m_primaryFirmwareLength; /* size of the bistream or emulation archive */ + uint64_t m_secondaryFirmwareOffset; /* file offset of clear bitstream if any */ + uint64_t m_secondaryFirmwareLength; /* size of the clear bitstream */ + uint64_t m_driverOffset; /* file offset of embedded device driver if any (currently unused) */ + uint64_t m_driverLength; /* size of the embedded device driver (currently unused) */ + + // Extra debug information for hardware and hardware emulation debug + + uint64_t m_dwarfOffset ; + uint64_t m_dwarfLength ; + uint64_t m_ipiMappingOffset ; + uint64_t m_ipiMappingLength ; + }; + + /* + * XCLBIN1 LAYOUT + * -------------- + * + * ----------------------------------------- + * | Magic | + * ----------------------------------------- + * | Header | + * ----------------------------------------- + * | One or more section headers | + * ----------------------------------------- + * | Matching number of sections with data | + * ----------------------------------------- + * + */ + enum xclBin1SectionKind { + BITSTREAM, + CLEARING_BITSTREAM, + EMBEDDED_METADATA, + FIRMWARE, + DEBUG_DATA + }; + + struct xclBin1SectionHeader { + unsigned m_sectionKind; /* Section type */ + unsigned short m_freq[4]; /* Target frequency for the section if applicable */ + char m_sectionName[16]; /* Examples: "stage2", "clear1", "clear2", "ocl1", "ocl2, "ublaze" */ + unsigned m_customFlagsA; /* Example: Number of Kernels in this region */ + unsigned m_customFlagsB; /* Example: Number of Kernels in this region */ + uint64_t m_sectionOffset; /* File offset of section data */ + uint64_t m_sectionSize; /* Size of section data */ + }; + + struct xclBin1Header { + uint64_t m_length; /* Total size of the xclbin file */ + uint64_t m_timeStamp; /* Number of seconds since epoch when xclbin was created */ + unsigned m_version; /* Tool version used to create xclbin */ + unsigned m_mode; /* XCLBIN_MODE */ + uint64_t m_platformId; /* 64 bit platform ID: vendor-device-subvendor-subdev */ + uint64_t m_featureId; /* 64 bit feature id */ + char m_nextXclBin[16]; /* Name of next xclbin file in the daisy chain */ + char m_debugBin[16]; /* Name of binary with debug information */ + unsigned m_numSections; /* Number of section headers */ + }; + + struct xclBin1 { + char m_magic[8]; /* Should be xclbin1\0 */ + uint64_t m_signature[4]; /* File signature for validation of binary */ + struct xclBin1Header m_header; 
/* Inline header */ + struct xclBin1SectionHeader m_sections[1]; /* One or more section headers follow */ + }; + + +#ifdef __cplusplus +} +#endif + +#endif + +// XSIP watermark, do not delete 67d7842dbbe25473c3c32b93c0da8047785f30d78e8a024de1b57352245f9689 diff --git a/sdk/SDAccel/HAL/driver/include/xclhal.h b/sdk/SDAccel/HAL/driver/include/xclhal.h new file mode 100644 index 000000000..ce4c78fe6 --- /dev/null +++ b/sdk/SDAccel/HAL/driver/include/xclhal.h @@ -0,0 +1,394 @@ +/** + * Xilinx SDAccel HAL userspace driver APIs + * Copyright (C) 2015-2016, Xilinx Inc - All rights reserved + * + * Licensed under the Apache License, Version 2.0 (the "License"). You may + * not use this file except in compliance with the License. A copy of the + * License is located at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations + * under the License. + */ + +#ifndef _XCL_HAL_H_ +#define _XCL_HAL_H_ + +#ifdef __cplusplus +#include +#include +#else +#include +#include +#endif + +#if defined(_WIN32) +#ifdef XCL_DRIVER_DLL_EXPORT +#define XCL_DRIVER_DLLESPEC __declspec(dllexport) +#else +#define XCL_DRIVER_DLLESPEC __declspec(dllimport) +#endif +#else +#define XCL_DRIVER_DLLESPEC __attribute__((visibility("default"))) +#endif + + +#include "xclperf.h" + +#ifdef __cplusplus +extern "C" { +#endif + + typedef void * xclDeviceHandle; + + struct xclBin; + /** + * Structure used to obtain various bits of information from the device. + */ + + struct xclDeviceInfo { + unsigned mMagic; // = 0X586C0C6C; XL OpenCL X->58(ASCII), L->6C(ASCII), O->0 C->C L->6C(ASCII); + char mName[256]; + unsigned short mHALMajorVersion; + unsigned short mHALMinorVersion; + unsigned short mVendorId; + unsigned short mDeviceId; + unsigned mDeviceVersion; + unsigned short mSubsystemId; + unsigned short mSubsystemVendorId; + size_t mDDRSize; // Size of DDR memory + size_t mDataAlignment; // Minimum data alignment requirement for host buffers + size_t mDDRFreeSize; // Total unused/available DDR memory + size_t mMinTransferSize; // Minimum DMA buffer size + float mTemp; + float mVoltage; + float mCurrent; + unsigned mDDRBankCount; + unsigned mOCLFrequency; + unsigned mPCIeLinkWidth; + unsigned mPCIeLinkSpeed; + unsigned short mDMAThreads; + // More properties here + }; + + struct xclDeviceInfo2 { + unsigned mMagic; // = 0X586C0C6C; XL OpenCL X->58(ASCII), L->6C(ASCII), O->0 C->C L->6C(ASCII); + char mName[256]; + unsigned short mHALMajorVersion; + unsigned short mHALMinorVersion; + unsigned short mVendorId; + unsigned short mDeviceId; + unsigned short mSubsystemId; + unsigned short mSubsystemVendorId; + unsigned short mDeviceVersion; +// unsigned mDriverVersion; // Enable this after driver unification since it changes the ABI + size_t mDDRSize; // Size of DDR memory + size_t mDataAlignment; // Minimum data alignment requirement for host buffers + size_t mDDRFreeSize; // Total unused/available DDR memory + size_t mMinTransferSize; // Minimum DMA buffer size +// size_t mBRAMSize; // Enable this after driver unification since it changes the ABI + unsigned short mDDRBankCount; + unsigned short mOCLFrequency[4]; + unsigned short mPCIeLinkWidth; + unsigned short mPCIeLinkSpeed; + unsigned short mDMAThreads; + short mOnChipTemp; + short mFanTemp; + 
unsigned short mVInt; + unsigned short mVAux; + unsigned short mVBram; + float mCurrent; +// unsigned short mCurrent; // Change float to short after driver unification since it changes the ABI + unsigned short mNumClocks; + unsigned short mFanSpeed; + bool mMigCalib; + // More properties here + }; + + enum xclMemoryDomains { + XCL_MEM_HOST_RAM = 0x00000000, + XCL_MEM_DEVICE_RAM = 0x00000001, + XCL_MEM_DEVICE_BRAM = 0x00000002, + XCL_MEM_SVM = 0x00000003, + XCL_MEM_CMA = 0x00000004, + XCL_MEM_DEVICE_REG = 0x00000005 + }; + + enum xclDDRFlags { + XCL_DEVICE_RAM_BANK0 = 0, + XCL_DEVICE_RAM_BANK1 = 1, + XCL_DEVICE_RAM_BANK2 = 2, + XCL_DEVICE_RAM_BANK3 = 3 + }; + + enum xclBRAMFlags { + XCL_DEVICE_BRAM0 = 0, + XCL_DEVICE_BRAM1 = 1, + XCL_DEVICE_BRAM2 = 2, + XCL_DEVICE_BRAM3 = 3, + }; + + /** + * Define address spaces on the device AXI bus. The enums are used in xclRead() and xclWrite() + * to pass relative offsets. + */ + + enum xclAddressSpace { + XCL_ADDR_SPACE_DEVICE_FLAT = 0, // Absolute address space + XCL_ADDR_SPACE_DEVICE_RAM = 1, // Address space for the DDR memory + XCL_ADDR_KERNEL_CTRL = 2, // Address space for the OCL Region control port + XCL_ADDR_SPACE_DEVICE_PERFMON = 3, // Address space for the Performance monitors + XCL_ADDR_SPACE_DEVICE_REG = 4, // Address space for device registers. + XCL_ADDR_SPACE_MAX = 8 + }; + + /** + * Defines verbosity levels which are passed to xclOpen during device creation time + */ + + enum xclVerbosityLevel { + XCL_QUIET = 0, + XCL_INFO = 1, + XCL_WARN = 2, + XCL_ERROR = 3 + }; + + enum xclResetKind { + XCL_RESET_KERNEL, + XCL_RESET_FULL + }; + + // VERSION 1.0 APIs + // ---------------- + + /** + * @defgroup devman DEVICE MANAGMENT APIs + * -------------------------------------- + * APIs to open, close, query and program the device + * @{ + */ + + /** + * Open a device and obtain its handle. + * "deviceIndex" is 0 for first device, 1 for the second device and so on + * "logFileName" is optional and if not NULL should be used to log messages + * "level" specifies the verbosity level for the messages being logged to logFileName + */ + + XCL_DRIVER_DLLESPEC xclDeviceHandle xclOpen(unsigned deviceIndex, const char *logFileName, xclVerbosityLevel level); + + /** + * Close an opened device + */ + + XCL_DRIVER_DLLESPEC void xclClose(xclDeviceHandle handle); + + /** + * Obtain various bits of information from the device + */ + + XCL_DRIVER_DLLESPEC int xclGetDeviceInfo(xclDeviceHandle handle, xclDeviceInfo *info); + + /** + * Obtain various bits of information from the device + */ + + XCL_DRIVER_DLLESPEC int xclGetDeviceInfo2(xclDeviceHandle handle, xclDeviceInfo2 *info); + + /** + * Download bitstream to the device. The bitstream is in xclBin format and stored in xclBinFileName. + * The bitstream may be PR bistream for devices which support PR and full bitstream for devices + * which require full configuration. + */ + + XCL_DRIVER_DLLESPEC int xclLoadBitstream(xclDeviceHandle handle, const char *xclBinFileName); + + /** + * Download bitstream to the device. The bitstream is passed in memory in xclBin format. The bitstream + * may be PR bistream for devices which support PR and full bitstream for devices which require full + * configuration. + */ + + XCL_DRIVER_DLLESPEC int xclLoadXclBin(xclDeviceHandle handle, const xclBin *buffer); + + /** @} */ + + /** + * @defgroup bufman BUFFER MANAGMENT APIs + * -------------------------------------- + * + * Buffer management APIs are used for managing device memory. 
The board vendors are expected to + * provide a memory manager with the following 4 APIs. The xclCopyXXX functions will be used by + * runtime to migrate buffers between host and device memory. + * @{ + */ + + /** + * Allocate a buffer on the device DDR and return its address + */ + + XCL_DRIVER_DLLESPEC uint64_t xclAllocDeviceBuffer(xclDeviceHandle handle, size_t size); + + /** + * Allocate a buffer on the device DDR bank and return its address + */ + + XCL_DRIVER_DLLESPEC uint64_t xclAllocDeviceBuffer2(xclDeviceHandle handle, size_t size, + xclMemoryDomains domain, + unsigned flags); + + /** + * Free a previously allocated buffer on the device DDR + */ + + XCL_DRIVER_DLLESPEC void xclFreeDeviceBuffer(xclDeviceHandle handle, uint64_t buf); + + /** + * Copy host buffer contents to previously allocated device memory. "seek" specifies how many bytes to skip + * at the beginning of the destination before copying "size" bytes of host buffer. + */ + + XCL_DRIVER_DLLESPEC size_t xclCopyBufferHost2Device(xclDeviceHandle handle, uint64_t dest, + const void *src, size_t size, size_t seek); + + /** + * Copy contents of previously allocated device memory to host buffer. "skip" specifies how many bytes to skip + * from the beginning of the source before copying "size" bytes of device buffer. + */ + + XCL_DRIVER_DLLESPEC size_t xclCopyBufferDevice2Host(xclDeviceHandle handle, void *dest, + uint64_t src, size_t size, size_t skip); + + /** @} */ + + /** + * @defgroup readwrite DEVICE READ AND WRITE APIs + * ---------------------------------------------- + * + * These functions are used to read and write peripherals sitting on the address map. An implementation + * may use these to implement xclCopyXXX functions. OpenCL runtime will be using the BUFFER MANAGEMNT + * APIs described above to manage OpenCL buffers. It would use xclRead/xclWrite to program and manage + * peripherals on the card. For programming the Kernel, OpenCL runtime uses the kernel control register + * map generated by the OpenCL compiler. + * Note that the offset is wrt the address space + * @{ + */ + + XCL_DRIVER_DLLESPEC size_t xclWrite(xclDeviceHandle handle, xclAddressSpace space, uint64_t offset, + const void *hostBuf, size_t size); + + XCL_DRIVER_DLLESPEC size_t xclRead(xclDeviceHandle handle, xclAddressSpace space, uint64_t offset, + void *hostbuf, size_t size); + + /** @} */ + + // EXTENSIONS FOR PARTIAL RECONFIG FLOW + // ------------------------------------ + // TODO: Deprecate this. Update the device PROM with new base bitsream + XCL_DRIVER_DLLESPEC int xclUpgradeFirmware(xclDeviceHandle handle, const char *fileName); + + // Update the device PROM with new base bitsream(s). + XCL_DRIVER_DLLESPEC int xclUpgradeFirmware2(xclDeviceHandle handle, const char *file1, const char* file2); + + //TODO: Deprecate this. Update the device PROM for XSpi + XCL_DRIVER_DLLESPEC int xclUpgradeFirmwareXSpi(xclDeviceHandle handle, const char *fileName, int index); + + //Test the flash + XCL_DRIVER_DLLESPEC int xclTestXSpi(xclDeviceHandle handle, int slave_index); + + // Boot the FPGA with new bitsream in PROM. This will break the PCIe link and render the device + // unusable till a reboot of the host + XCL_DRIVER_DLLESPEC int xclBootFPGA(xclDeviceHandle handle); + + // NEW APIs in VERSION 1.1 + // ----------------------- + + /** + * @addtogroup devman + * @{ + */ + + /** + * Reset the device. All running kernels will be killed and buffers in DDR will be purged. 
+ * A device would be reset if a user's application dies without waiting for running kernel(s) to finish. + */ + + XCL_DRIVER_DLLESPEC int xclResetDevice(xclDeviceHandle handle, xclResetKind kind); + + /** + * Set the OCL region frequncy + */ + + XCL_DRIVER_DLLESPEC int xclReClock(xclDeviceHandle handle, unsigned targetFreqMHz); + + /** + * Set the OCL region frequncies + */ + + XCL_DRIVER_DLLESPEC int xclReClock2(xclDeviceHandle handle, unsigned short region, + const unsigned short *targetFreqMHz); + + /** + * Return a count of devices found in the system + */ + XCL_DRIVER_DLLESPEC unsigned xclProbe(); + + /** + * Get exclusive ownership of the device. The lock is necessary before performing buffer + * migration, register access or bitstream downloads. + */ + XCL_DRIVER_DLLESPEC int xclLockDevice(xclDeviceHandle handle); + + /** @} */ + + /** + * @defgroup perfmon PERFORMANCE MONITORING OPERATIONS + * --------------------------------------------------- + * + * These functions are used to read and write to the performance monitoring infrastructure. + * OpenCL runtime will be using the BUFFER MANAGEMNT APIs described above to manage OpenCL buffers. + * It would use these functions to initialize and sample the performance monitoring on the card. + * Note that the offset is wrt the address space + */ + + XCL_DRIVER_DLLESPEC size_t xclGetDeviceTimestamp(xclDeviceHandle handle); + + XCL_DRIVER_DLLESPEC double xclGetDeviceClockFreqMHz(xclDeviceHandle handle); + + XCL_DRIVER_DLLESPEC double xclGetReadMaxBandwidthMBps(xclDeviceHandle handle); + + XCL_DRIVER_DLLESPEC double xclGetWriteMaxBandwidthMBps(xclDeviceHandle handle); + + XCL_DRIVER_DLLESPEC void xclSetOclRegionProfilingNumberSlots(xclDeviceHandle handle, + uint32_t numSlots); + + XCL_DRIVER_DLLESPEC size_t xclPerfMonClockTraining(xclDeviceHandle handle, xclPerfMonType type); + + XCL_DRIVER_DLLESPEC size_t xclPerfMonStartCounters(xclDeviceHandle handle, xclPerfMonType type); + + XCL_DRIVER_DLLESPEC size_t xclPerfMonStopCounters(xclDeviceHandle handle, xclPerfMonType type); + + XCL_DRIVER_DLLESPEC size_t xclPerfMonReadCounters(xclDeviceHandle handle, xclPerfMonType type, + xclCounterResults& counterResults); + + XCL_DRIVER_DLLESPEC size_t xclPerfMonStartTrace(xclDeviceHandle handle, xclPerfMonType type, + uint32_t startTrigger); + + XCL_DRIVER_DLLESPEC size_t xclPerfMonStopTrace(xclDeviceHandle handle, xclPerfMonType type); + + XCL_DRIVER_DLLESPEC uint32_t xclPerfMonGetTraceCount(xclDeviceHandle handle, xclPerfMonType type); + + XCL_DRIVER_DLLESPEC size_t xclPerfMonReadTrace(xclDeviceHandle handle, xclPerfMonType type, + xclTraceResultsVector& traceVector); + + /** @} */ + +#ifdef __cplusplus +} +#endif + +#endif diff --git a/sdk/SDAccel/HAL/driver/include/xclperf.h b/sdk/SDAccel/HAL/driver/include/xclperf.h new file mode 100755 index 000000000..6be7ae9f8 --- /dev/null +++ b/sdk/SDAccel/HAL/driver/include/xclperf.h @@ -0,0 +1,300 @@ +/** + * Xilinx SDAccel HAL userspace driver extension APIs + * Performance Monitoring Exposed Parameters + * Copyright (C) 2015-2016, Xilinx Inc - All rights reserved + * + * Licensed under the Apache License, Version 2.0 (the "License"). You may + * not use this file except in compliance with the License. 
A copy of the + * License is located at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations + * under the License. + */ + +#ifndef _XCL_PERF_H_ +#define _XCL_PERF_H_ + +// DSA version (e.g., XCL_PLATFORM=xilinx_adm-pcie-7v3_1ddr_1_1) +// TODO: this will eventually be read from the device using lspci (see CR 870994) +#define DSA_MAJOR_VERSION 1 +#define DSA_MINOR_VERSION 1 + +/************************ APM 0: Monitor MIG Ports ****************************/ + +#define XPAR_AXI_PERF_MON_0_NUMBER_SLOTS 2 + +#if 1 +#define XPAR_AXI_PERF_MON_0_SLOT0_NAME "OCL Region" +#define XPAR_AXI_PERF_MON_0_SLOT1_NAME "Host" +#define XPAR_AXI_PERF_MON_0_OCL_REGION_SLOT 0 +#define XPAR_AXI_PERF_MON_0_HOST_SLOT 1 +#else +// Uncomment for DSA v1.0 +// NOTE: since device profiling didn't work in v1.0, we'll leave this commented +//#define XPAR_AXI_PERF_MON_0_SLOT0_NAME "Host" +//#define XPAR_AXI_PERF_MON_0_SLOT1_NAME "OCL Region" +//#define XPAR_AXI_PERF_MON_0_HOST_SLOT 0 +//#define XPAR_AXI_PERF_MON_0_OCL_REGION_SLOT 1 +#endif + +#define XPAR_AXI_PERF_MON_0_OCL_REGION_SLOT2 2 +#define XPAR_AXI_PERF_MON_0_OCL_REGION_SLOT3 3 +#define XPAR_AXI_PERF_MON_0_OCL_REGION_SLOT4 4 +#define XPAR_AXI_PERF_MON_0_OCL_REGION_SLOT5 5 +#define XPAR_AXI_PERF_MON_0_OCL_REGION_SLOT6 6 +#define XPAR_AXI_PERF_MON_0_OCL_REGION_SLOT7 7 + +#define XPAR_AXI_PERF_MON_0_SLOT2_NAME "OCL Region, Master 2" +#define XPAR_AXI_PERF_MON_0_SLOT3_NAME "OCL Region, Master 3" +#define XPAR_AXI_PERF_MON_0_SLOT4_NAME "OCL Region, Master 4" +#define XPAR_AXI_PERF_MON_0_SLOT5_NAME "OCL Region, Master 5" +#define XPAR_AXI_PERF_MON_0_SLOT6_NAME "OCL Region, Master 6" +#define XPAR_AXI_PERF_MON_0_SLOT7_NAME "OCL Region, Master 7" + +#define XPAR_AXI_PERF_MON_0_SLOT0_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_0_SLOT1_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_0_SLOT2_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_0_SLOT3_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_0_SLOT4_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_0_SLOT5_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_0_SLOT6_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_0_SLOT7_DATA_WIDTH 512 + +/* Profile */ +#define XPAR_AXI_PERF_MON_0_IS_EVENT_COUNT 1 +#define XPAR_AXI_PERF_MON_0_HAVE_SAMPLED_COUNTERS 1 +#define XPAR_AXI_PERF_MON_0_NUMBER_COUNTERS (XPAR_AXI_PERF_MON_0_NUMBER_SLOTS * XAPM_METRIC_COUNTERS_PER_SLOT) + +/* Trace */ +#define XPAR_AXI_PERF_MON_0_IS_EVENT_LOG 1 +#define XPAR_AXI_PERF_MON_0_SHOW_AXI_IDS 1 +#define XPAR_AXI_PERF_MON_0_SHOW_AXI_LEN 1 +// 2 DDR platform +#define XPAR_AXI_PERF_MON_0_SHOW_AXI_IDS_2DDR 0 +#define XPAR_AXI_PERF_MON_0_SHOW_AXI_LEN_2DDR 1 + +/* AXI Stream FIFOs */ +#define XPAR_AXI_PERF_MON_0_TRACE_NUMBER_FIFO 3 +#define XPAR_AXI_PERF_MON_0_TRACE_WORD_WIDTH 128 +#define XPAR_AXI_PERF_MON_0_TRACE_NUMBER_SAMPLES 4096 +#define MAX_TRACE_NUMBER_SAMPLES 8192 + +#define XPAR_AXI_PERF_MON_0_TRACE_OFFSET_0 0x010000 +#define XPAR_AXI_PERF_MON_0_TRACE_OFFSET_1 0x011000 +#define XPAR_AXI_PERF_MON_0_TRACE_OFFSET_2 0x012000 +// CR 877788: the extra 0x80001000 is a bug in Vivado where the AXI4 base address is not set correctly +// TODO: remove it once that bug is fixed! 
+#define XPAR_AXI_PERF_MON_0_TRACE_OFFSET_AXI_FULL (0x2000000000 + 0x80001000) + +/********************* APM 1: Monitor PCIe DMA Masters ************************/ + +#define XPAR_AXI_PERF_MON_1_NUMBER_SLOTS 2 + +#define XPAR_AXI_PERF_MON_1_SLOT0_NAME "DMA AXI4 Master" +#define XPAR_AXI_PERF_MON_1_SLOT1_NAME "DMA AXI4-Lite Master" +#define XPAR_AXI_PERF_MON_1_SLOT2_NAME "Null" +#define XPAR_AXI_PERF_MON_1_SLOT3_NAME "Null" +#define XPAR_AXI_PERF_MON_1_SLOT4_NAME "Null" +#define XPAR_AXI_PERF_MON_1_SLOT5_NAME "Null" +#define XPAR_AXI_PERF_MON_1_SLOT6_NAME "Null" +#define XPAR_AXI_PERF_MON_1_SLOT7_NAME "Null" + +#define XPAR_AXI_PERF_MON_1_SLOT0_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_1_SLOT1_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_1_SLOT2_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_1_SLOT3_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_1_SLOT4_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_1_SLOT5_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_1_SLOT6_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_1_SLOT7_DATA_WIDTH 512 + +/* Profile */ +#define XPAR_AXI_PERF_MON_1_IS_EVENT_COUNT 1 +#define XPAR_AXI_PERF_MON_1_HAVE_SAMPLED_COUNTERS 1 +#define XPAR_AXI_PERF_MON_1_NUMBER_COUNTERS (XPAR_AXI_PERF_MON_1_NUMBER_SLOTS * XAPM_METRIC_COUNTERS_PER_SLOT) +#define XPAR_AXI_PERF_MON_1_SCALE_FACTOR 1 + +/* Trace */ +#define XPAR_AXI_PERF_MON_1_IS_EVENT_LOG 0 +#define XPAR_AXI_PERF_MON_1_SHOW_AXI_IDS 0 +#define XPAR_AXI_PERF_MON_1_SHOW_AXI_LEN 0 + +/* AXI Stream FIFOs */ +#define XPAR_AXI_PERF_MON_1_TRACE_NUMBER_FIFO 0 +#define XPAR_AXI_PERF_MON_1_TRACE_WORD_WIDTH 0 +#define XPAR_AXI_PERF_MON_1_TRACE_NUMBER_SAMPLES 0 + +/************************ APM 2: Monitor OCL Region ***************************/ + +#define XPAR_AXI_PERF_MON_2_NUMBER_SLOTS 1 + +#define XPAR_AXI_PERF_MON_2_SLOT0_NAME "Kernel0" +#define XPAR_AXI_PERF_MON_2_SLOT1_NAME "Kernel1" +#define XPAR_AXI_PERF_MON_2_SLOT2_NAME "Kernel2" +#define XPAR_AXI_PERF_MON_2_SLOT3_NAME "Kernel3" +#define XPAR_AXI_PERF_MON_2_SLOT4_NAME "Kernel4" +#define XPAR_AXI_PERF_MON_2_SLOT5_NAME "Kernel5" +#define XPAR_AXI_PERF_MON_2_SLOT6_NAME "Kernel6" +#define XPAR_AXI_PERF_MON_2_SLOT7_NAME "Kernel7" + +#define XPAR_AXI_PERF_MON_2_SLOT0_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_2_SLOT1_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_2_SLOT2_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_2_SLOT3_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_2_SLOT4_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_2_SLOT5_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_2_SLOT6_DATA_WIDTH 512 +#define XPAR_AXI_PERF_MON_2_SLOT7_DATA_WIDTH 512 + +/* Profile */ +#define XPAR_AXI_PERF_MON_2_IS_EVENT_COUNT 0 +#define XPAR_AXI_PERF_MON_2_HAVE_SAMPLED_COUNTERS 0 +#define XPAR_AXI_PERF_MON_2_NUMBER_COUNTERS 0 +#define XPAR_AXI_PERF_MON_2_SCALE_FACTOR 1 + +/* Trace */ +#define XPAR_AXI_PERF_MON_2_IS_EVENT_LOG 1 +#define XPAR_AXI_PERF_MON_2_SHOW_AXI_IDS 0 +#define XPAR_AXI_PERF_MON_2_SHOW_AXI_LEN 0 + +/* AXI Stream FIFOs */ +/* NOTE: number of FIFOs is dependent upon the number of compute units being monitored */ +//#define XPAR_AXI_PERF_MON_2_TRACE_NUMBER_FIFO 2 +#define XPAR_AXI_PERF_MON_2_TRACE_WORD_WIDTH 64 +#define XPAR_AXI_PERF_MON_2_TRACE_NUMBER_SAMPLES 4096 + +#define XPAR_AXI_PERF_MON_2_TRACE_OFFSET_0 -0x03000 +#define XPAR_AXI_PERF_MON_2_TRACE_OFFSET_1 -0x02000 +#define XPAR_AXI_PERF_MON_2_TRACE_OFFSET_2 -0x01000 + +/************************ APM Profile Counters ********************************/ + +#define XAPM_MAX_NUMBER_SLOTS 8 +#define XAPM_METRIC_COUNTERS_PER_SLOT 8 + +/* Metric counters per slot */ +#define 
XAPM_METRIC_WRITE_BYTES 0 +#define XAPM_METRIC_WRITE_TRANX 1 +#define XAPM_METRIC_WRITE_LATENCY 2 +#define XAPM_METRIC_READ_BYTES 3 +#define XAPM_METRIC_READ_TRANX 4 +#define XAPM_METRIC_READ_LATENCY 5 +#define XAPM_METRIC_WRITE_MIN_MAX 6 +#define XAPM_METRIC_READ_MIN_MAX 7 + +#define XAPM_METRIC_COUNT0_NAME "Write Byte Count" +#define XAPM_METRIC_COUNT1_NAME "Write Transaction Count" +#define XAPM_METRIC_COUNT2_NAME "Total Write Latency" +#define XAPM_METRIC_COUNT3_NAME "Read Byte Count" +#define XAPM_METRIC_COUNT4_NAME "Read Transaction Count" +#define XAPM_METRIC_COUNT5_NAME "Total Read Latency" +#define XAPM_METRIC_COUNT6_NAME "Min/Max Write Latency" +#define XAPM_METRIC_COUNT7_NAME "Min/Max Read Latency" + +/************************ APM Trace Stream ************************************/ + +/* Bit locations of trace flags */ +#define XAPM_READ_LAST 6 +#define XAPM_READ_FIRST 5 +#define XAPM_READ_ADDR 4 +#define XAPM_RESPONSE 3 +#define XAPM_WRITE_LAST 2 +#define XAPM_WRITE_FIRST 1 +#define XAPM_WRITE_ADDR 0 + +/* Bit locations of external event flags */ +#define XAPM_EXT_START 2 +#define XAPM_EXT_STOP 1 +#define XAPM_EXT_EVENT 0 + +/* Total number of bits per slot */ +#define FLAGS_PER_SLOT 7 +#define EXT_EVENTS_PER_SLOT 3 + +/* Cycles to add to timestamp if overflow occurs */ +#define LOOP_ADD_TIME (1<<16) + +/********************** Definitions: Enums, Structs ***************************/ + +/* Performance monitor type or location */ +enum xclPerfMonType { + XCL_PERF_MON_MEMORY = 0, + XCL_PERF_MON_HOST_INTERFACE = 1, + XCL_PERF_MON_OCL_REGION = 2, + XCL_PERF_MON_TOTAL_PROFILE = 3 +}; + +/* Performance monitor start event */ +enum xclPerfMonStartEvent { + XCL_PERF_MON_START_ADDR = 0, + XCL_PERF_MON_START_FIRST_DATA = 1 +}; + +/* Performance monitor end event */ +enum xclPerfMonEndEvent { + XCL_PERF_MON_END_LAST_DATA = 0, + XCL_PERF_MON_END_RESPONSE = 1 +}; + +enum xclPerfMonCounterType { + XCL_PERF_MON_WRITE_BYTES = 0, + XCL_PERF_MON_WRITE_TRANX = 1, + XCL_PERF_MON_WRITE_LATENCY = 2, + XCL_PERF_MON_READ_BYTES = 3, + XCL_PERF_MON_READ_TRANX = 4, + XCL_PERF_MON_READ_LATENCY = 5 +}; + +/* Performance monitor counter results */ +typedef struct { + //unsigned int NumSlots; + float SampleIntervalUsec; + unsigned int WriteBytes[XAPM_MAX_NUMBER_SLOTS]; + unsigned int WriteTranx[XAPM_MAX_NUMBER_SLOTS]; + unsigned int WriteLatency[XAPM_MAX_NUMBER_SLOTS]; + unsigned short WriteMinLatency[XAPM_MAX_NUMBER_SLOTS]; + unsigned short WriteMaxLatency[XAPM_MAX_NUMBER_SLOTS]; + unsigned int ReadBytes[XAPM_MAX_NUMBER_SLOTS]; + unsigned int ReadTranx[XAPM_MAX_NUMBER_SLOTS]; + unsigned int ReadLatency[XAPM_MAX_NUMBER_SLOTS]; + unsigned short ReadMinLatency[XAPM_MAX_NUMBER_SLOTS]; + unsigned short ReadMaxLatency[XAPM_MAX_NUMBER_SLOTS]; +} xclCounterResults; + +/* Performance monitor trace results */ +typedef struct { + unsigned char LogID; /* 0: event flags, 1: host timestamp */ + unsigned char Overflow; + unsigned char WriteStartEvent; + unsigned char WriteEndEvent; + unsigned char ReadStartEvent; + unsigned short Timestamp; + unsigned int HostTimestamp; + unsigned char RID[XAPM_MAX_NUMBER_SLOTS]; + unsigned char ARID[XAPM_MAX_NUMBER_SLOTS]; + unsigned char BID[XAPM_MAX_NUMBER_SLOTS]; + unsigned char AWID[XAPM_MAX_NUMBER_SLOTS]; + unsigned char EventFlags[XAPM_MAX_NUMBER_SLOTS]; + unsigned char ExtEventFlags[XAPM_MAX_NUMBER_SLOTS]; + unsigned char WriteAddrLen[XAPM_MAX_NUMBER_SLOTS]; + unsigned char ReadAddrLen[XAPM_MAX_NUMBER_SLOTS]; + unsigned short WriteBytes[XAPM_MAX_NUMBER_SLOTS]; + unsigned short 
ReadBytes[XAPM_MAX_NUMBER_SLOTS]; + unsigned short WriteAddrId[XAPM_MAX_NUMBER_SLOTS]; + unsigned short ReadAddrId[XAPM_MAX_NUMBER_SLOTS]; +} xclTraceResults; + +typedef struct { + unsigned int mLength; + //unsigned int mNumSlots; + xclTraceResults mArray[MAX_TRACE_NUMBER_SAMPLES]; +} xclTraceResultsVector; + +#endif + +// XSIP watermark, do not delete 67d7842dbbe25473c3c32b93c0da8047785f30d78e8a024de1b57352245f9689 diff --git a/sdk/SDAccel/HAL/driver/xcldma/include/perfmon_parameters.h b/sdk/SDAccel/HAL/driver/xcldma/include/perfmon_parameters.h new file mode 100644 index 000000000..85be84d10 --- /dev/null +++ b/sdk/SDAccel/HAL/driver/xcldma/include/perfmon_parameters.h @@ -0,0 +1,274 @@ +/* + * Performance Monitoring Internal Parameters + * Date: January 9, 2015 + * Author: Paul Schumacher + * + * NOTE: partially taken from file xaxipmon_hw.h in v5.0 of APM driver + */ + +#ifndef _PERFMON_PARAMETERS_H +#define _PERFMON_PARAMETERS_H + +/************************ AXI Stream FIFOs ************************************/ + +/* Address offsets in core */ +#define AXI_FIFO_RDFR 0x18 +#define AXI_FIFO_RDFD 0x20 +#define AXI_FIFO_RDFD_AXI_FULL 0x1000 +#define AXI_FIFO_RLR 0x24 +#define AXI_FIFO_SRR 0x28 +#define AXI_FIFO_RESET_VALUE 0xA5 + +/************************ APM Constant Definitions ****************************/ + +/* Register offsets of AXIMONITOR in the Device Config */ + +#define XAPM_GCC_HIGH_OFFSET 0x0000 /**< Global Clock Counter 32 to 63 bits */ +#define XAPM_GCC_LOW_OFFSET 0x0004 /**< Global Clock Counter Lower 0-31 bits */ +#define XAPM_SI_HIGH_OFFSET 0x0020 /**< Sample Interval MSB */ +#define XAPM_SI_LOW_OFFSET 0x0024 /**< Sample Interval LSB */ +#define XAPM_SICR_OFFSET 0x0028 /**< Sample Interval Control Register */ +#define XAPM_SR_OFFSET 0x002C /**< Sample Register */ +#define XAPM_GIE_OFFSET 0x0030 /**< Global Interrupt Enable Register */ +#define XAPM_IE_OFFSET 0x0034 /**< Interrupt Enable Register */ +#define XAPM_IS_OFFSET 0x0038 /**< Interrupt Status Register */ + +#define XAPM_MSR0_OFFSET 0x0044 /**< Metric Selector 0 Register */ +#define XAPM_MSR1_OFFSET 0x0048 /**< Metric Selector 1 Register */ +#define XAPM_MSR2_OFFSET 0x004C /**< Metric Selector 2 Register */ + +#define XAPM_MC0_OFFSET 0x0100 /**< Metric Counter 0 Register */ +#define XAPM_INC0_OFFSET 0x0104 /**< Incrementer 0 Register */ +#define XAPM_RANGE0_OFFSET 0x0108 /**< Range 0 Register */ +#define XAPM_MC0LOGEN_OFFSET 0x010C /**< Metric Counter 0 Log Enable Register */ +#define XAPM_MC1_OFFSET 0x0110 /**< Metric Counter 1 Register */ +#define XAPM_INC1_OFFSET 0x0114 /**< Incrementer 1 Register */ +#define XAPM_RANGE1_OFFSET 0x0118 /**< Range 1 Register */ +#define XAPM_MC1LOGEN_OFFSET 0x011C /**< Metric Counter 1 Log Enable Register */ +#define XAPM_MC2_OFFSET 0x0120 /**< Metric Counter 2 Register */ +#define XAPM_INC2_OFFSET 0x0124 /**< Incrementer 2 Register */ +#define XAPM_RANGE2_OFFSET 0x0128 /**< Range 2 Register */ +#define XAPM_MC2LOGEN_OFFSET 0x012C /**< Metric Counter 2 Log Enable Register */ +#define XAPM_MC3_OFFSET 0x0130 /**< Metric Counter 3 Register */ +#define XAPM_INC3_OFFSET 0x0134 /**< Incrementer 3 Register */ +#define XAPM_RANGE3_OFFSET 0x0138 /**< Range 3 Register */ +#define XAPM_MC3LOGEN_OFFSET 0x013C /**< Metric Counter 3 Log Enable Register */ +#define XAPM_MC4_OFFSET 0x0140 /**< Metric Counter 4 Register */ +#define XAPM_INC4_OFFSET 0x0144 /**< Incrementer 4 Register */ +#define XAPM_RANGE4_OFFSET 0x0148 /**< Range 4 Register */ +#define XAPM_MC4LOGEN_OFFSET 0x014C /**< Metric 
Counter 4 Log Enable Register */ +#define XAPM_MC5_OFFSET 0x0150 /**< Metric Counter 5 Register */ +#define XAPM_INC5_OFFSET 0x0154 /**< Incrementer 5 Register */ +#define XAPM_RANGE5_OFFSET 0x0158 /**< Range 5 Register */ +#define XAPM_MC5LOGEN_OFFSET 0x015C /**< Metric Counter 5 Log Enable Register */ +#define XAPM_MC6_OFFSET 0x0160 /**< Metric Counter 6 Register */ +#define XAPM_INC6_OFFSET 0x0164 /**< Incrementer 6 Register */ +#define XAPM_RANGE6_OFFSET 0x0168 /**< Range 6 Register */ +#define XAPM_MC6LOGEN_OFFSET 0x016C /**< Metric Counter 6 Log Enable Register */ +#define XAPM_MC7_OFFSET 0x0170 /**< Metric Counter 7 Register */ +#define XAPM_INC7_OFFSET 0x0174 /**< Incrementer 7 Register */ +#define XAPM_RANGE7_OFFSET 0x0178 /**< Range 7 Register */ +#define XAPM_MC7LOGEN_OFFSET 0x017C /**< Metric Counter 7 Log Enable Register */ +#define XAPM_MC8_OFFSET 0x0180 /**< Metric Counter 8 Register */ +#define XAPM_INC8_OFFSET 0x0184 /**< Incrementer 8 Register */ +#define XAPM_RANGE8_OFFSET 0x0188 /**< Range 8 Register */ +#define XAPM_MC8LOGEN_OFFSET 0x018C /**< Metric Counter 8 Log Enable Register */ +#define XAPM_MC9_OFFSET 0x0190 /**< Metric Counter 9 Register */ +#define XAPM_INC9_OFFSET 0x0194 /**< Incrementer 9 Register */ +#define XAPM_RANGE9_OFFSET 0x0198 /**< Range 9 Register */ +#define XAPM_MC9LOGEN_OFFSET 0x019C /**< Metric Counter 9 Log Enable Register */ + +#define XAPM_SMC0_OFFSET 0x0200 /**< Sampled Metric Counter 0 Register */ +#define XAPM_SINC0_OFFSET 0x0204 /**< Sampled Incrementer 0 Register */ +#define XAPM_SMC1_OFFSET 0x0210 /**< Sampled Metric Counter 1 Register */ +#define XAPM_SINC1_OFFSET 0x0214 /**< Sampled Incrementer 1 Register */ +#define XAPM_SMC2_OFFSET 0x0220 /**< Sampled Metric Counter 2 Register */ +#define XAPM_SINC2_OFFSET 0x0224 /**< Sampled Incrementer 2 Register */ +#define XAPM_SMC3_OFFSET 0x0230 /**< Sampled Metric Counter 3 Register */ +#define XAPM_SINC3_OFFSET 0x0234 /**< Sampled Incrementer 3 Register */ +#define XAPM_SMC4_OFFSET 0x0240 /**< Sampled Metric Counter 4 Register */ +#define XAPM_SINC4_OFFSET 0x0244 /**< Sampled Incrementer 4 Register */ +#define XAPM_SMC5_OFFSET 0x0250 /**< Sampled Metric Counter 5 Register */ +#define XAPM_SINC5_OFFSET 0x0254 /**< Sampled Incrementer 5 Register */ +#define XAPM_SMC6_OFFSET 0x0260 /**< Sampled Metric Counter 6 Register */ +#define XAPM_SINC6_OFFSET 0x0264 /**< Sampled Incrementer 6 Register */ +#define XAPM_SMC7_OFFSET 0x0270 /**< Sampled Metric Counter 7 Register */ +#define XAPM_SINC7_OFFSET 0x0274 /**< Sampled Incrementer 7 Register */ +#define XAPM_SMC8_OFFSET 0x0280 /**< Sampled Metric Counter 8 Register */ +#define XAPM_SINC8_OFFSET 0x0284 /**< Sampled Incrementer 8 Register */ +#define XAPM_SMC9_OFFSET 0x0290 /**< Sampled Metric Counter 9 Register */ +#define XAPM_SINC9_OFFSET 0x0294 /**< Sampled Incrementer 9 Register */ + +#define XAPM_MC10_OFFSET 0x01A0 /**< Metric Counter 10 Register */ +#define XAPM_MC11_OFFSET 0x01B0 /**< Metric Counter 11 Register */ +#define XAPM_MC12_OFFSET 0x0500 /**< Metric Counter 12 Register */ +#define XAPM_MC13_OFFSET 0x0510 /**< Metric Counter 13 Register */ +#define XAPM_MC14_OFFSET 0x0520 /**< Metric Counter 14Register */ +#define XAPM_MC15_OFFSET 0x0530 /**< Metric Counter 15 Register */ +#define XAPM_MC16_OFFSET 0x0540 /**< Metric Counter 16 Register */ +#define XAPM_MC17_OFFSET 0x0550 /**< Metric Counter 17 Register */ +#define XAPM_MC18_OFFSET 0x0560 /**< Metric Counter 18 Register */ +#define XAPM_MC19_OFFSET 0x0570 /**< Metric Counter 19 Register 
*/ +#define XAPM_MC20_OFFSET 0x0580 /**< Metric Counter 20 Register */ +#define XAPM_MC21_OFFSET 0x0590 /**< Metric Counter 21 Register */ +#define XAPM_MC22_OFFSET 0x05A0 /**< Metric Counter 22 Register */ +#define XAPM_MC23_OFFSET 0x05B0 /**< Metric Counter 23 Register */ +#define XAPM_MC24_OFFSET 0x0700 /**< Metric Counter 24 Register */ +#define XAPM_MC25_OFFSET 0x0710 /**< Metric Counter 25 Register */ +#define XAPM_MC26_OFFSET 0x0720 /**< Metric Counter 26 Register */ +#define XAPM_MC27_OFFSET 0x0730 /**< Metric Counter 27 Register */ +#define XAPM_MC28_OFFSET 0x0740 /**< Metric Counter 28 Register */ +#define XAPM_MC29_OFFSET 0x0750 /**< Metric Counter 29 Register */ +#define XAPM_MC30_OFFSET 0x0760 /**< Metric Counter 30 Register */ +#define XAPM_MC31_OFFSET 0x0770 /**< Metric Counter 31 Register */ +#define XAPM_MC32_OFFSET 0x0780 /**< Metric Counter 32 Register */ +#define XAPM_MC33_OFFSET 0x0790 /**< Metric Counter 33 Register */ +#define XAPM_MC34_OFFSET 0x07A0 /**< Metric Counter 34 Register */ +#define XAPM_MC35_OFFSET 0x07B0 /**< Metric Counter 35 Register */ +#define XAPM_MC36_OFFSET 0x0900 /**< Metric Counter 36 Register */ +#define XAPM_MC37_OFFSET 0x0910 /**< Metric Counter 37 Register */ +#define XAPM_MC38_OFFSET 0x0920 /**< Metric Counter 38 Register */ +#define XAPM_MC39_OFFSET 0x0930 /**< Metric Counter 39 Register */ +#define XAPM_MC40_OFFSET 0x0940 /**< Metric Counter 40 Register */ +#define XAPM_MC41_OFFSET 0x0950 /**< Metric Counter 41 Register */ +#define XAPM_MC42_OFFSET 0x0960 /**< Metric Counter 42 Register */ +#define XAPM_MC43_OFFSET 0x0970 /**< Metric Counter 43 Register */ +#define XAPM_MC44_OFFSET 0x0980 /**< Metric Counter 44 Register */ +#define XAPM_MC45_OFFSET 0x0990 /**< Metric Counter 45 Register */ +#define XAPM_MC46_OFFSET 0x09A0 /**< Metric Counter 46 Register */ +#define XAPM_MC47_OFFSET 0x09B0 /**< Metric Counter 47 Register */ + +#define XAPM_SMC10_OFFSET 0x02A0 /**< Sampled Metric Counter 10 Register */ +#define XAPM_SMC11_OFFSET 0x02B0 /**< Sampled Metric Counter 11 Register */ +#define XAPM_SMC12_OFFSET 0x0600 /**< Sampled Metric Counter 12 Register */ +#define XAPM_SMC13_OFFSET 0x0610 /**< Sampled Metric Counter 13 Register */ +#define XAPM_SMC14_OFFSET 0x0620 /**< Sampled Metric Counter 14 Register */ +#define XAPM_SMC15_OFFSET 0x0630 /**< Sampled Metric Counter 15 Register */ +#define XAPM_SMC16_OFFSET 0x0640 /**< Sampled Metric Counter 16 Register */ +#define XAPM_SMC17_OFFSET 0x0650 /**< Sampled Metric Counter 17 Register */ +#define XAPM_SMC18_OFFSET 0x0660 /**< Sampled Metric Counter 18 Register */ +#define XAPM_SMC19_OFFSET 0x0670 /**< Sampled Metric Counter 19 Register */ +#define XAPM_SMC20_OFFSET 0x0680 /**< Sampled Metric Counter 20 Register */ +#define XAPM_SMC21_OFFSET 0x0690 /**< Sampled Metric Counter 21 Register */ +#define XAPM_SMC22_OFFSET 0x06A0 /**< Sampled Metric Counter 22 Register */ +#define XAPM_SMC23_OFFSET 0x06B0 /**< Sampled Metric Counter 23 Register */ +#define XAPM_SMC24_OFFSET 0x0800 /**< Sampled Metric Counter 24 Register */ +#define XAPM_SMC25_OFFSET 0x0810 /**< Sampled Metric Counter 25 Register */ +#define XAPM_SMC26_OFFSET 0x0820 /**< Sampled Metric Counter 26 Register */ +#define XAPM_SMC27_OFFSET 0x0830 /**< Sampled Metric Counter 27 Register */ +#define XAPM_SMC28_OFFSET 0x0840 /**< Sampled Metric Counter 28 Register */ +#define XAPM_SMC29_OFFSET 0x0850 /**< Sampled Metric Counter 29 Register */ +#define XAPM_SMC30_OFFSET 0x0860 /**< Sampled Metric Counter 30 Register */ +#define XAPM_SMC31_OFFSET 
0x0870 /**< Sampled Metric Counter 31 Register */ +#define XAPM_SMC32_OFFSET 0x0880 /**< Sampled Metric Counter 32 Register */ +#define XAPM_SMC33_OFFSET 0x0890 /**< Sampled Metric Counter 33 Register */ +#define XAPM_SMC34_OFFSET 0x08A0 /**< Sampled Metric Counter 34 Register */ +#define XAPM_SMC35_OFFSET 0x08B0 /**< Sampled Metric Counter 35 Register */ +#define XAPM_SMC36_OFFSET 0x0A00 /**< Sampled Metric Counter 36 Register */ +#define XAPM_SMC37_OFFSET 0x0A10 /**< Sampled Metric Counter 37 Register */ +#define XAPM_SMC38_OFFSET 0x0A20 /**< Sampled Metric Counter 38 Register */ +#define XAPM_SMC39_OFFSET 0x0A30 /**< Sampled Metric Counter 39 Register */ +#define XAPM_SMC40_OFFSET 0x0A40 /**< Sampled Metric Counter 40 Register */ +#define XAPM_SMC41_OFFSET 0x0A50 /**< Sampled Metric Counter 41 Register */ +#define XAPM_SMC42_OFFSET 0x0A60 /**< Sampled Metric Counter 42 Register */ +#define XAPM_SMC43_OFFSET 0x0A70 /**< Sampled Metric Counter 43 Register */ +#define XAPM_SMC44_OFFSET 0x0A80 /**< Sampled Metric Counter 44 Register */ +#define XAPM_SMC45_OFFSET 0x0A90 /**< Sampled Metric Counter 45 Register */ +#define XAPM_SMC46_OFFSET 0x0AA0 /**< Sampled Metric Counter 46 Register */ +#define XAPM_SMC47_OFFSET 0x0AB0 /**< Sampled Metric Counter 47 Register */ +/* Sampled metric counters 48-63: In Profile mode, this are min/max latency registers */ +#define XAPM_SMC48_OFFSET 0x0254 /**< Sampled Metric Counter 48 Register */ +#define XAPM_SMC49_OFFSET 0x0258 /**< Sampled Metric Counter 49 Register */ +#define XAPM_SMC50_OFFSET 0x02B4 /**< Sampled Metric Counter 50 Register */ +#define XAPM_SMC51_OFFSET 0x02B8 /**< Sampled Metric Counter 51 Register */ +#define XAPM_SMC52_OFFSET 0x0654 /**< Sampled Metric Counter 52 Register */ +#define XAPM_SMC53_OFFSET 0x0658 /**< Sampled Metric Counter 53 Register */ +#define XAPM_SMC54_OFFSET 0x06B4 /**< Sampled Metric Counter 54 Register */ +#define XAPM_SMC55_OFFSET 0x06B8 /**< Sampled Metric Counter 55 Register */ +#define XAPM_SMC56_OFFSET 0x0854 /**< Sampled Metric Counter 56 Register */ +#define XAPM_SMC57_OFFSET 0x0858 /**< Sampled Metric Counter 57 Register */ +#define XAPM_SMC58_OFFSET 0x08B4 /**< Sampled Metric Counter 58 Register */ +#define XAPM_SMC59_OFFSET 0x08B8 /**< Sampled Metric Counter 59 Register */ +#define XAPM_SMC60_OFFSET 0x0A54 /**< Sampled Metric Counter 60 Register */ +#define XAPM_SMC61_OFFSET 0x0A58 /**< Sampled Metric Counter 61 Register */ +#define XAPM_SMC62_OFFSET 0x0AB4 /**< Sampled Metric Counter 62 Register */ +#define XAPM_SMC63_OFFSET 0x0AB8 /**< Sampled Metric Counter 63 Register */ + +#define XAPM_CTL_OFFSET 0x0300 /**< Control Register */ +#define XAPM_ID_OFFSET 0x0304 /**< Latency ID Register */ +#define XAPM_IDMASK_OFFSET 0x0308 /**< ID Mask Register */ +#define XAPM_FEC_OFFSET 0x0400 /**< Flag Enable Control Register */ +#define XAPM_SWD_OFFSET 0x0404 /**< Software-written Data Register */ +#define XAPM_ENT_OFFSET 0x0408 /**< Enable Trace Register */ + +/* AXI Monitor Sample Interval Control Register mask(s) */ + +#define XAPM_SICR_MCNTR_RST_MASK 0x00000100 /**< Enable the Metric Counter Reset */ +#define XAPM_SICR_LOAD_MASK 0x00000002 /**< Load the Sample Interval Register Value into the counter */ +#define XAPM_SICR_ENABLE_MASK 0x00000001 /**< Enable the downcounter */ + +/* Interrupt Status/Enable Register Bit Definitions and Masks */ + +#define XAPM_IXR_MC9_OVERFLOW_MASK 0x00001000 /**< Metric Counter 9 Overflow> */ +#define XAPM_IXR_MC8_OVERFLOW_MASK 0x00000800 /**< Metric Counter 8 Overflow> */ +#define 
XAPM_IXR_MC7_OVERFLOW_MASK 0x00000400 /**< Metric Counter 7 Overflow> */ +#define XAPM_IXR_MC6_OVERFLOW_MASK 0x00000200 /**< Metric Counter 6 Overflow> */ +#define XAPM_IXR_MC5_OVERFLOW_MASK 0x00000100 /**< Metric Counter 5 Overflow> */ +#define XAPM_IXR_MC4_OVERFLOW_MASK 0x00000080 /**< Metric Counter 4 Overflow> */ +#define XAPM_IXR_MC3_OVERFLOW_MASK 0x00000040 /**< Metric Counter 3 Overflow> */ +#define XAPM_IXR_MC2_OVERFLOW_MASK 0x00000020 /**< Metric Counter 2 Overflow> */ +#define XAPM_IXR_MC1_OVERFLOW_MASK 0x00000010 /**< Metric Counter 1 Overflow> */ +#define XAPM_IXR_MC0_OVERFLOW_MASK 0x00000008 /**< Metric Counter 0 Overflow> */ +#define XAPM_IXR_FIFO_FULL_MASK 0x00000004 /**< Event Log FIFO full> */ +#define XAPM_IXR_SIC_OVERFLOW_MASK 0x00000002 /**< Sample Interval Counter Overflow> */ +#define XAPM_IXR_GCC_OVERFLOW_MASK 0x00000001 /**< Global Clock Counter Overflow> */ +#define XAPM_IXR_ALL_MASK (XAPM_IXR_SIC_OVERFLOW_MASK | \ + XAPM_IXR_GCC_OVERFLOW_MASK | \ + XAPM_IXR_FIFO_FULL_MASK | \ + XAPM_IXR_MC0_OVERFLOW_MASK | \ + XAPM_IXR_MC1_OVERFLOW_MASK | \ + XAPM_IXR_MC2_OVERFLOW_MASK | \ + XAPM_IXR_MC3_OVERFLOW_MASK | \ + XAPM_IXR_MC4_OVERFLOW_MASK | \ + XAPM_IXR_MC5_OVERFLOW_MASK | \ + XAPM_IXR_MC6_OVERFLOW_MASK | \ + XAPM_IXR_MC7_OVERFLOW_MASK | \ + XAPM_IXR_MC8_OVERFLOW_MASK | \ + XAPM_IXR_MC9_OVERFLOW_MASK) + +/* AXI Monitor Control Register mask(s) */ + +#define XAPM_CR_FIFO_RESET_MASK 0x02000000 /**< FIFO Reset */ +#define XAPM_CR_GCC_RESET_MASK 0x00020000 /**< Global Clk Counter Reset */ +#define XAPM_CR_GCC_ENABLE_MASK 0x00010000 /**< Global Clk Counter Enable */ +#define XAPM_CR_EVTLOG_EXTTRIGGER_MASK 0x00000200 /**< Enable External trigger to start event Log */ +#define XAPM_CR_EVENTLOG_ENABLE_MASK 0x00000100 /**< Event Log Enable */ +#define XAPM_CR_RDLATENCY_END_MASK 0x00000080 /**< Write Latency End point */ +#define XAPM_CR_RDLATENCY_START_MASK 0x00000040 /**< Read Latency Start point */ +#define XAPM_CR_WRLATENCY_END_MASK 0x00000020 /**< Write Latency End point */ +#define XAPM_CR_WRLATENCY_START_MASK 0x00000010 /**< Write Latency Start point */ +#define XAPM_CR_IDFILTER_ENABLE_MASK 0x00000008 /**< ID Filter Enable */ +#define XAPM_CR_MCNTR_EXTTRIGGER_MASK 0x00000004 /**< Enable External trigger to start Metric Counters */ +#define XAPM_CR_MCNTR_RESET_MASK 0x00000002 /**< Metrics Counter Reset */ +#define XAPM_CR_MCNTR_ENABLE_MASK 0x00000001 /**< Metrics Counter Enable */ + +/* AXI Monitor ID Register mask(s) */ + +#define XAPM_ID_RID_MASK 0xFFFF0000 /**< Read ID */ +#define XAPM_ID_WID_MASK 0x0000FFFF /**< Write ID */ + +/* AXI Monitor ID Mask Register mask(s) */ + +#define XAPM_MASKID_RID_MASK 0xFFFF0000 /**< Read ID Mask */ +#define XAPM_MASKID_WID_MASK 0x0000FFFF /**< Write ID Mask*/ + +/* AXI Monitor Min/Max Register masks and shifts */ + +#define XAPM_MAX_LATENCY_MASK 0xFFFF0000 /**< Max Latency Mask */ +#define XAPM_MIN_LATENCY_MASK 0x0000FFFF /**< Min Latency Mask */ +#define XAPM_MAX_LATENCY_SHIFT 16 /**< Max Latency Shift */ +#define XAPM_MIN_LATENCY_SHIFT 0 /**< Min Latency Shift */ + +#endif + +// XSIP watermark, do not delete 67d7842dbbe25473c3c32b93c0da8047785f30d78e8a024de1b57352245f9689 diff --git a/sdk/SDAccel/HAL/driver/xcldma/include/xbar_sys_parameters.h b/sdk/SDAccel/HAL/driver/xcldma/include/xbar_sys_parameters.h new file mode 100644 index 000000000..3e5ae36a3 --- /dev/null +++ b/sdk/SDAccel/HAL/driver/xcldma/include/xbar_sys_parameters.h @@ -0,0 +1,146 @@ +// Copyright Xilinx, Inc 2014-2016 +// Author: Sonal Santan +// Register definition 
for the XDMA + +#ifndef __XDMA_SYS_PARAMETERS_H__ +#define __XDMA_SYS_PARAMETERS_H__ + +#include "perfmon_parameters.h" + +#define XILINX_VENDOR_ID 0x10EE + +//parameters for HWICAP, Flash and APM on PCIe BAR +#define OCL_CTLR_OFFSET 0x000000 +#define HWICAP_OFFSET 0x020000 +#define AXI_GATE_OFFSET 0x030000 +#define AXI_GATE_OFFSET_READ 0x030008 + +#define FEATURE_ID 0x031000 + +#define GENERAL_STATUS 0x032000 + +#define BPI_FLASH_OFFSET 0x040000 + +#define AXI_I2C_OFFSET 0x041000 +#define PERFMON0_OFFSET 0x100000 +#define PERFMON1_OFFSET 0x120000 +#define PERFMON2_OFFSET 0x010000 + +#define OCL_CLKWIZ_OFFSET 0x050000 +#define OCL_CLKWIZ_BASEADDR 0x050000 +#define OCL_CLKWIZ_BASEADDR2 0x051000 + +#define OCL_CLKWIZ_STATUS_OFFSET 0x4 +#define OCL_CLKWIZ_CONFIG_OFFSET(n) (0x200 + 4 * (n)) + +// These are kept only for backwards compatipility. These macros should +// not be used anymore. +#define OCL_CLKWIZ_STATUS (OCL_CLKWIZ_BASEADDR + OCL_CLKWIZ_STATUS_OFFSET) +#define OCL_CLKWIZ_CONFIG(n) (OCL_CLKWIZ_BASEADDR + OCL_CLKWIZ_CONFIG_OFFSET(n)) + +#define HWICAP_BAR 0 +#define BPI_FLASH_BAR 0 +#define ACCELERATOR_BAR 0 +#define PERFMON_BAR 0 +#define HWICAP_WRITE_FIFO_SIZE 64 +#define MMAP_SIZE_USER 0x400000 +#define MMAP_SIZE_CTRL 0x8000 +#define DDR_BUFFER_ALIGNMENT 0x40 +#define DMA_HWICAP_BITFILE_BUFFER_SIZE 1024 +#define OCL_CU_CTRL_RANGE 0x1000 + +#define ULTRASCALE_MCAP_CONFIG_BASE 0x340 + +/************************** Constant Definitions ****************************/ + +/* Input frequency */ +#define XDMA_7V3_INPUT_FREQ 100 +#define XDMA_KU3_INPUT_FREQ 100 + +#define XDMA_7V3_CLKWIZ_CONFIG0 0x04000a01 +#define XDMA_KU3_CLKWIZ_CONFIG0 0x04000a01 + +/* Used for parsing bitstream header */ +#define XHI_EVEN_MAGIC_BYTE 0x0f +#define XHI_ODD_MAGIC_BYTE 0xf0 + +/* Extra mode for IDLE */ +#define XHI_OP_IDLE -1 + +#define XHI_BIT_HEADER_FAILURE -1 + +/* The imaginary module length register */ +#define XHI_MLR 15 + +/** + * AXI IIC Bus Interface v2.0 + * http://www.xilinx.com/support/documentation/ip_documentation/axi_iic/v2_0/pg090-axi-iic.pdf + */ +#define AXI_I2C_SOFT_RESET AXI_I2C_OFFSET+0x040 +#define AXI_I2C_CR AXI_I2C_OFFSET+0x100 +#define AXI_I2C_TX_FIFO AXI_I2C_OFFSET+0x108 +#define AXI_I2C_RX_FIFO AXI_I2C_OFFSET+0x10c +#define AXI_I2C_RX_FIFO_PIRQ AXI_I2C_OFFSET+0x120 + + +/** ICAP register definition **/ +#define XHWICAP_GIER HWICAP_OFFSET+0x1c +#define XHWICAP_ISR HWICAP_OFFSET+0x20 +#define XHWICAP_IER HWICAP_OFFSET+0x28 +#define XHWICAP_WF HWICAP_OFFSET+0x100 +#define XHWICAP_RF HWICAP_OFFSET+0x104 +#define XHWICAP_SZ HWICAP_OFFSET+0x108 +#define XHWICAP_CR HWICAP_OFFSET+0x10c +#define XHWICAP_SR HWICAP_OFFSET+0x110 +#define XHWICAP_WFV HWICAP_OFFSET+0x114 +#define XHWICAP_RFO HWICAP_OFFSET+0x118 +#define XHWICAP_ASR HWICAP_OFFSET+0x11c + +/** +* Bitstream header information. 
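The clock wizard macros above are easiest to sanity-check with concrete values. The compile-time checks below are illustrative only (not part of the patch); they simply restate the arithmetic of OCL_CLKWIZ_CONFIG_OFFSET(n) and of the deprecated absolute form.

// Each config register is 4 bytes, starting 0x200 into the clock wizard block.
static_assert(OCL_CLKWIZ_CONFIG_OFFSET(0) == 0x200, "first config register");
static_assert(OCL_CLKWIZ_CONFIG_OFFSET(2) == 0x208, "config registers are 4 bytes apart");
// Deprecated absolute address: base 0x050000 plus the relative offset.
static_assert(OCL_CLKWIZ_CONFIG(0) == 0x050200, "kept only for old callers");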
+*/ +typedef struct { + unsigned int HeaderLength; /* Length of header in 32 bit words */ + unsigned int BitstreamLength; /* Length of bitstream to read in bytes*/ + unsigned char *DesignName; /* Design name read from bitstream header */ + unsigned char *PartName; /* Part name read from bitstream header */ + unsigned char *Date; /* Date read from bitstream header */ + unsigned char *Time; /* Bitstream creation time read from header */ + unsigned int MagicLength; /* Length of the magic numbers in header */ +} XHwIcap_Bit_Header; + +/* + * Flash programming constants + * XAPP 518 + * http://www.xilinx.com/support/documentation/application_notes/xapp518-isp-bpi-prom-virtex-6-pcie.pdf + * Table 1 + */ + +#define START_ADDR_CMD 0x53410000 +#define END_ADDR_CMD 0x45000000 +#define UNLOCK_CMD 0x556E6C6B +#define ERASE_CMD 0x45726173 +#define PROGRAM_CMD 0x50726F67 + +#define READY_STAT 0x00008000 +#define ERASE_STAT 0x00000000 +#define PROGRAM_STAT 0x00000080 + +/* + * Booting FPGA from PROM + * http://www.xilinx.com/support/documentation/user_guides/ug470_7Series_Config.pdf + * Table 7.1 + */ + +#define DUMMY_WORD 0xFFFFFFFF +#define SYNC_WORD 0xAA995566 +#define TYPE1_NOOP 0x20000000 +#define TYPE1_WRITE_WBSTAR 0x30020001 +#define WBSTAR_ADD10 0x00000000 +#define WBSTAR_ADD11 0x01000000 +#define TYPE1_WRITE_CMD 0x30008001 +#define IPROG_CMD 0x0000000F + +#endif + +// XSIP watermark, do not delete 67d7842dbbe25473c3c32b93c0da8047785f30d78e8a024de1b57352245f9689 diff --git a/sdk/SDAccel/HAL/driver/xcldma/include/xdma-ioctl.h b/sdk/SDAccel/HAL/driver/xcldma/include/xdma-ioctl.h new file mode 100644 index 000000000..a4e5b16e5 --- /dev/null +++ b/sdk/SDAccel/HAL/driver/xcldma/include/xdma-ioctl.h @@ -0,0 +1,148 @@ +#ifndef _XDMA_IOCALLS_POSIX_H_ +#define _XDMA_IOCALLS_POSIX_H_ + +#ifndef _WINDOWS +// TODO: Windows build support +#include +#endif + +/* Use 'x' as magic number */ +#define XDMA_IOC_MAGIC 'x' +/* XL OpenCL X->58(ASCII), L->6C(ASCII), O->0 C->C L->6C(ASCII); */ +#define XDMA_XCL_MAGIC 0X586C0C6C + +#define OCL_NUM_CLOCKS 2 + +/* + * S means "Set" through a ptr, + * T means "Tell" directly with the argument value + * G means "Get": reply by setting through a pointer + * Q means "Query": response is on the return value + * X means "eXchange": switch G and S atomically + * H means "sHift": switch T and Q atomically + * + * _IO(type,nr) no arguments + * _IOR(type,nr,datatype) read data from driver + * _IOW(type,nr.datatype) write data to driver + * _IORW(type,nr,datatype) read/write data + * + * _IOC_DIR(nr) returns direction + * _IOC_TYPE(nr) returns magic + * _IOC_NR(nr) returns number + * _IOC_SIZE(nr) returns size + */ + +enum XDMA_IOC_TYPES { + XDMA_IOC_NOP, + XDMA_IOC_INFO, + XDMA_IOC_ICAP_DOWNLOAD, + XDMA_IOC_MCAP_DOWNLOAD, + XDMA_IOC_HOT_RESET, + XDMA_IOC_OCL_RESET, + XDMA_IOC_OCL_FREQ_SCALING, + XDMA_IOC_REBOOT, + XDMA_IOC_INFO2, + XDMA_IOC_OCL_FREQ_SCALING2, + XDMA_IOC_MAX +}; + +/** + * TODO: Change the structs to use linux kernel preferred types like (u)int64_t + * instead of (unsigned) short, etc. 
+ */ + +struct xdma_ioc_base { + unsigned int magic; + unsigned int command; +}; + +struct xdma_ioc_info { + struct xdma_ioc_base base; + unsigned short vendor; + unsigned short device; + unsigned short subsystem_vendor; + unsigned short subsystem_device; + unsigned dma_engine_version; + unsigned driver_version; + unsigned long long feature_id; + unsigned ocl_frequency; + unsigned pcie_link_width; + unsigned pcie_link_speed; +}; + +struct xdma_ioc_info2 { + struct xdma_ioc_base base; + unsigned short vendor; + unsigned short device; + unsigned short subsystem_vendor; + unsigned short subsystem_device; + unsigned dma_engine_version; + unsigned driver_version; + unsigned long long feature_id; + unsigned short ocl_frequency[OCL_NUM_CLOCKS]; + unsigned short pcie_link_width; + unsigned short pcie_link_speed; + unsigned short num_clocks; + int16_t onchip_temp; + int16_t fan_temp; + unsigned short fan_speed; + unsigned short vcc_int; + unsigned short vcc_aux; + unsigned short vcc_bram; + bool mig_calibration; + char reserved[64]; +}; + +struct xdma_ioc_bitstream { + struct xdma_ioc_base base; + struct xclBin *xclbin; +}; + +struct xdma_performance_ioctl +{ + /* IOCTL_XDMA_IOCTL_Vx */ + uint32_t version; + uint32_t transfer_size; + /* measurement */ + uint32_t stopped; + uint32_t iterations; + uint64_t clock_cycle_count; + uint64_t data_cycle_count; + uint64_t pending_count; +}; + +struct xdma_ioc_freqscaling { + struct xdma_ioc_base base; + unsigned ocl_target_freq; +}; + +struct xdma_ioc_freqscaling2 { + struct xdma_ioc_base base; + unsigned ocl_region; + unsigned short ocl_target_freq[OCL_NUM_CLOCKS]; +}; + +#define XDMA_IOCINFO _IOWR(XDMA_IOC_MAGIC,XDMA_IOC_INFO, struct xdma_ioc_info) +#define XDMA_IOCINFO2 _IOWR(XDMA_IOC_MAGIC,XDMA_IOC_INFO2, struct xdma_ioc_info2) +#define XDMA_IOCICAPDOWNLOAD _IOW(XDMA_IOC_MAGIC,XDMA_IOC_ICAP_DOWNLOAD, struct xdma_ioc_bitstream) +#define XDMA_IOCMCAPDOWNLOAD _IOW(XDMA_IOC_MAGIC,XDMA_IOC_MCAP_DOWNLOAD, struct xdma_ioc_bitstream) +#define XDMA_IOCHOTRESET _IOW(XDMA_IOC_MAGIC,XDMA_IOC_HOT_RESET, struct xdma_ioc_base) +#define XDMA_IOCOCLRESET _IOW(XDMA_IOC_MAGIC,XDMA_IOC_OCL_RESET, struct xdma_ioc_base) +#define XDMA_IOCFREQSCALING _IOWR(XDMA_IOC_MAGIC,XDMA_IOC_OCL_FREQ_SCALING, struct xdma_ioc_freqscaling) +#define XDMA_IOCFREQSCALING2 _IOWR(XDMA_IOC_MAGIC,XDMA_IOC_OCL_FREQ_SCALING2, struct xdma_ioc_freqscaling2) +#define XDMA_IOCREBOOT _IOW(XDMA_IOC_MAGIC,XDMA_IOC_REBOOT, struct xdma_ioc_base) +// Legacy IOCTL NAME +#define XDMA_IOCRESET (XDMA_IOCHOTRESET) +#define IOCTL_XDMA_PERF_V1 (1) + +/* IOCTL codes */ +#define IOCTL_XDMA_PERF_START _IOW('q', 1, struct xdma_performance_ioctl *) +#define IOCTL_XDMA_PERF_STOP _IOW('q', 2, struct xdma_performance_ioctl *) +#define IOCTL_XDMA_PERF_GET _IOR('q', 3, struct xdma_performance_ioctl *) + +#define IOCTL_XDMA_ADDRMODE_SET _IOW('q', 4, int) +#define IOCTL_XDMA_ADDRMODE_GET _IOR('q', 5, int) + +#define XDMA_ADDRMODE_MEMORY (0) +#define XDMA_ADDRMODE_FIXED (1) +#endif diff --git a/sdk/SDAccel/HAL/driver/xcldma/user/datamover.h b/sdk/SDAccel/HAL/driver/xcldma/user/datamover.h new file mode 100644 index 000000000..706fc8697 --- /dev/null +++ b/sdk/SDAccel/HAL/driver/xcldma/user/datamover.h @@ -0,0 +1,182 @@ +#ifndef _XDMA_DATA_MOVER_H_ +#define _XDMA_DATA_MOVER_H_ + +/** + * Copyright (C) 2016 Xilinx, Inc + * Author: Sonal Santan + * XDMA HAL multi-threading safe, multi-channel DMA read/write support + * + * Licensed under the Apache License, Version 2.0 (the "License"). 
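For context, the ioctl numbers defined above are issued from user space with the matching structs. The sketch below is illustrative only: the device node path is a placeholder (the node actually opened for ioctls is chosen by the shim and is not shown in this hunk), and how strictly the driver validates base.magic and base.command is a driver-side detail.

#include <sys/ioctl.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

// 'node' is a placeholder path for illustration only.
int queryXdmaInfo(const char *node)
{
    int fd = open(node, O_RDWR);
    if (fd < 0)
        return -1;

    struct xdma_ioc_info info;
    std::memset(&info, 0, sizeof(info));
    info.base.magic = XDMA_XCL_MAGIC;     // tag the request per xdma_ioc_base
    info.base.command = XDMA_IOC_INFO;

    int rc = ioctl(fd, XDMA_IOCINFO, &info);
    if (rc == 0)
        std::printf("vendor 0x%x device 0x%x ocl %u MHz link x%u gen%u\n",
                    info.vendor, info.device, info.ocl_frequency,
                    info.pcie_link_width, info.pcie_link_speed);
    close(fd);
    return rc;
}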
You may
+ * not use this file except in compliance with the License. A copy of the
+ * License is located at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations
+ * under the License.
+ */
+
+#include <string>
+#include <vector>
+#include <mutex>
+#include <condition_variable>
+#include <cassert>
+#include <cstring>
+#include <cstdlib>
+#include <iostream>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/file.h>
+#include <sys/types.h>
+
+// Work around GCC 4.8 + XDMA BAR implementation bugs
+// With -O3 PCIe BAR read/write are not reliable hence force -O2 as max
+// optimization level for pcieBarRead() and pcieBarWrite()
+#if defined(__GNUC__) && defined(NDEBUG)
+#define SHIM_O2 __attribute__ ((optimize("-O2")))
+#else
+#define SHIM_O2
+#endif
+
+namespace xclxdma {
+    class DMAChannelManager
+    {
+    public:
+        DMAChannelManager(unsigned deviceIndex, unsigned count, std::ios_base::openmode mode) : mCount(count) {
+            std::string baseName("/dev/xcldma/xcldma");
+            baseName += std::to_string(deviceIndex);
+            assert((mode == std::ios_base::in) || (mode == std::ios_base::out));
+            const char *suffix = (mode == std::ios_base::out) ? "_h2c_" : "_c2h_";
+            baseName += suffix;
+            for (mIndex = 0; mIndex < static_cast<int>(mCount); ++mIndex) {
+                std::string fileName(baseName);
+                fileName += std::to_string(mIndex);
+                mChannel.push_back(open(fileName.c_str(), (mode == std::ios_base::out) ? O_WRONLY : O_RDONLY));
+            }
+            --mIndex;
+        }
+
+        ~DMAChannelManager() {
+            unlock();
+            for (unsigned i = 0; i < mCount; i++) {
+                close(mChannel[i]);
+            }
+        }
+
+        bool isGood() const {
+            for (unsigned i = 0; i < mCount; i++) {
+                if (mChannel[i] < 0)
+                    return false;
+            }
+            return true;
+        }
+
+        void releaseDMAChannel(int channel) {
+            std::lock_guard<std::mutex> lck(mMtx);
+            mChannel[++mIndex] = channel;
+            mCV.notify_one();
+        }
+
+        int acquireDMAChannel() {
+            std::unique_lock<std::mutex> lck(mMtx);
+            while (mIndex < 0) {
+                mCV.wait(lck);
+            }
+            return mChannel[mIndex--];
+        }
+
+        bool lock() const {
+            for (unsigned i = 0; i < mCount; i++) {
+                if (!flock(mChannel[i], LOCK_EX | LOCK_NB))
+                    continue;
+                // Unable to lock channel i, unlock all channels locked so far
+                for (unsigned j = 0; j < i; j++) {
+                    flock(mChannel[j], LOCK_UN);
+                }
+                return false;
+            }
+            return true;
+        }
+
+        void unlock() const {
+            for (unsigned i = 0; i < mCount; i++) {
+                flock(mChannel[i], LOCK_UN);
+            }
+        }
+
+        unsigned channelCount() const {
+            return mCount;
+        }
+
+    private:
+        std::mutex mMtx;
+        std::condition_variable mCV;
+        std::vector<int> mChannel;
+        const unsigned mCount;
+        int mIndex;
+    };
+
+    class DataMover {
+    public:
+        DataMover(unsigned index, unsigned count) : mWrite(index, count, std::ios_base::out),
+                                                    mRead(index, count, std::ios_base::in) {}
+
+        // TODO: Make pwrite64 and pread64 use RAII for the channel resource
+        ssize_t pwrite64(const void* buf, size_t count, off64_t offset) {
+            int fd = mWrite.acquireDMAChannel();
+            ssize_t rc = pwrite(fd, buf, count, offset);
+            mWrite.releaseDMAChannel(fd);
+            return rc;
+        }
+        ssize_t pread64(void* buf, size_t count, off64_t offset) {
+            int fd = mRead.acquireDMAChannel();
+            ssize_t rc = pread(fd, buf, count, offset);
+            mRead.releaseDMAChannel(fd);
+            return rc;
+        }
+        // Like memset but using pwrite
+        void pset64(const void* buf, size_t count, off64_t offset, unsigned rep) {
+            int fd = mWrite.acquireDMAChannel();
+            off64_t curr = offset;
+            while (rep-- > 0) {
+#ifndef 
RDI_COVERITY +# pragma GCC diagnostic push +# pragma GCC diagnostic ignored "-Wunused-result" + pwrite(fd, buf, count, curr); +# pragma GCC diagnostic pop + curr += count; +#endif + } + mWrite.releaseDMAChannel(fd); + } + bool isGood() { + return (mWrite.isGood() && mRead.isGood()); + } + + int lock() { + if (mWrite.lock() && mRead.lock()) + return true; + unlock(); + return false; + } + + void unlock() { + mWrite.unlock(); + mRead.unlock(); + } + + unsigned channelCount() const { + return mWrite.channelCount() + mRead.channelCount(); + } + + private: + DMAChannelManager mWrite; + DMAChannelManager mRead; + }; +} + + +#endif diff --git a/sdk/SDAccel/HAL/driver/xcldma/user/memorymanager.cpp b/sdk/SDAccel/HAL/driver/xcldma/user/memorymanager.cpp new file mode 100644 index 000000000..f4cb7fc1d --- /dev/null +++ b/sdk/SDAccel/HAL/driver/xcldma/user/memorymanager.cpp @@ -0,0 +1,220 @@ +/** + * Copyright (C) 2015 Xilinx, Inc + * Author: Sonal Santan + * XDMA HAL Driver layered on top of XDMA kernel driver + * + * Licensed under the Apache License, Version 2.0 (the "License"). You may + * not use this file except in compliance with the License. A copy of the + * License is located at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations + * under the License. + */ + +#include "memorymanager.h" +#include +#include + +/* + * Define GCC version macro so we can use newer C++11 features + * if possible + */ +#define GCC_VERSION (__GNUC__ * 10000 \ + + __GNUC_MINOR__ * 100 \ + + __GNUC_PATCHLEVEL__) + + +xclxdma::MemoryManager::MemoryManager(uint64_t size, uint64_t start, + unsigned alignment) : mSize(size), mStart(start), mAlignment(alignment), + mCoalesceThreshold(4), mFreeSize(0) +{ + assert(start % alignment == 0); + mFreeBufferList.push_back(std::make_pair(mStart, mSize)); + mFreeSize = mSize; +} + +xclxdma::MemoryManager::~MemoryManager() +{ + + +} + +uint64_t +xclxdma::MemoryManager::alloc(size_t size) +{ + if (size == 0) + size = mAlignment; + + uint64_t result = mNull; + const size_t mod_size = size % mAlignment; + const size_t pad = (mod_size > 0) ? 
(mAlignment - mod_size) : 0; + size += pad; + + std::lock_guard lock(mMemManagerMutex); + for (PairList::iterator i = mFreeBufferList.begin(), e = mFreeBufferList.end(); i != e; ++i) { + if (i->second < size) + continue; + result = i->first; + if (i->second > size) { + // Resize the existing entry in freelist + i->first += size; + i->second -= size; + } + else { + // remove the exact match found + mFreeBufferList.erase(i); + } + mBusyBufferList.push_back(std::make_pair(result, size)); + mFreeSize -= size; + break; + } + return result; +} + +void +xclxdma::MemoryManager::free(uint64_t buf) +{ + std::lock_guard lock(mMemManagerMutex); + PairList::iterator i = find(buf); + if (i == mBusyBufferList.end()) + return; + mFreeSize += i->second; + mFreeBufferList.push_back(std::make_pair(i->first, i->second)); + mBusyBufferList.erase(i); + if (mFreeBufferList.size() > mCoalesceThreshold) { + coalesce(); + } +} + + +void +xclxdma::MemoryManager::coalesce() +{ + // First sort the free buffers and then attempt to coalesce the neighbors + mFreeBufferList.sort(); + + PairList::iterator curr = mFreeBufferList.begin(); + PairList::iterator next = curr; + ++next; + PairList::iterator last = mFreeBufferList.end(); + while (next != last) { + if ((curr->first + curr->second) != next->first) { + // Non contiguous blocks + curr = next; + ++next; + continue; + } + // Coalesce curr and next + curr->second += next->second; + mFreeBufferList.erase(next); + next = curr; + ++next; + } +} + +// Caller should have acquired the mutex lock before calling find(); +xclxdma::MemoryManager::PairList::iterator +xclxdma::MemoryManager::find(uint64_t buf) +{ +#if GCC_VERSION >= 40800 + PairList::iterator i = std::find_if(mBusyBufferList.begin(), mBusyBufferList.end(), [&] (const PairList::value_type& s) + { return s.first == buf; }); +#else + PairList::iterator i = mBusyBufferList.begin(); + PairList::iterator last = mBusyBufferList.end(); + while(i != last) { + if (i->first == buf) + break; + ++i; + } +#endif + return i; +} + +void +xclxdma::MemoryManager::reset() +{ + std::lock_guard lock(mMemManagerMutex); + mFreeBufferList.clear(); + mBusyBufferList.clear(); + mFreeBufferList.push_back(std::make_pair(mStart, mSize)); + mFreeSize = 0; +} + +std::pair +xclxdma::MemoryManager::lookup(uint64_t buf) +{ + std::lock_guard lock(mMemManagerMutex); + PairList::iterator i = find(buf); + if (i != mBusyBufferList.end()) + return *i; + // Compiler bug -- Some versions of GCC C++11 compiler do not + // like mNull directly inside std::make_pair, so capture mNull + // in a temporary + const uint64_t v = mNull; + return std::make_pair(v, v); +} + + +bool +xclxdma::MemoryManager::reserve(uint64_t base, size_t size) +{ + assert(size); + if (size > mSize) + return false; + + if (base < mStart) + return false; + + if (base > (mStart + mSize)) + return false; + + const size_t mod_size = size % mAlignment; + const size_t pad = (mod_size > 0) ? 
(mAlignment - mod_size) : 0; + size += pad; + + std::lock_guard lock(mMemManagerMutex); + for (PairList::iterator i = mFreeBufferList.begin(), e = mFreeBufferList.end(); i != e; ++i) { + if (i->second < size) + continue; + if (i->first > base) + continue; + if ((base + size) > (i->first + i->second)) + continue; + uint64_t a = i->first; + uint64_t b = i->second; + + i->second = base - i->first; + if ((i->first == base) && (i->second == 0)) { + //Exact match + mFreeBufferList.erase(i); + break; + } + if (i->first == base) { + // Hole at the end; Resize exisiting entry + i->first = base + size; + break; + } + if ((i->first + i->second) == (base + size)) { + // Hole in the beginning; Resize exisiting entry + i->second -= size; + break; + } + // We have holes on both sides + // Resize hole in the beginning + i->second = base - i->first; + + // Now create an entry for the hole at the end + b = b + a - base - size; + a = base + size; + mFreeBufferList.insert(++i, std::make_pair(a, b)); + } + mBusyBufferList.push_back(std::make_pair(base, size)); + mFreeSize -= size; + return true; +} diff --git a/sdk/SDAccel/HAL/driver/xcldma/user/memorymanager.h b/sdk/SDAccel/HAL/driver/xcldma/user/memorymanager.h new file mode 100644 index 000000000..85661cd73 --- /dev/null +++ b/sdk/SDAccel/HAL/driver/xcldma/user/memorymanager.h @@ -0,0 +1,76 @@ +#ifndef _XDMA_MEMORY_MANAGER_H_ +#define _XDMA_MEMORY_MANAGER_H_ + +/** + * Copyright (C) 2015 Xilinx, Inc + * Author: Sonal Santan + * Simple usermode XDMA DDR memory manager used by HAL + * Eventually the common code here will be used by all HAL drivers. + * + * Licensed under the Apache License, Version 2.0 (the "License"). You may + * not use this file except in compliance with the License. A copy of the + * License is located at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations + * under the License. 
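Since the allocator above is only exercised indirectly through the shim, a short usage sketch may help. It is illustrative only: the bank size and base address are made up, the function name is hypothetical, and 0x40 stands in for DDR_BUFFER_ALIGNMENT from xbar_sys_parameters.h earlier in this patch.

#include "memorymanager.h"
#include <cstdint>
#include <cstdio>
#include <utility>

void exerciseAllocator()
{
    // Manage a hypothetical 1 GB DDR bank that starts at device address 0.
    // 0x40 matches DDR_BUFFER_ALIGNMENT defined in xbar_sys_parameters.h.
    xclxdma::MemoryManager mm(0x40000000ull, 0x0ull, 0x40);

    uint64_t buf = mm.alloc(4096);                    // size is rounded up to the alignment
    if (buf == xclxdma::MemoryManager::mNull) {       // mNull means no free block fits
        std::printf("allocation failed\n");
        return;
    }

    std::pair<uint64_t, uint64_t> entry = mm.lookup(buf);   // (address, size) of the busy block
    std::printf("allocated %llu bytes at 0x%llx, %llu bytes still free\n",
                (unsigned long long)entry.second, (unsigned long long)buf,
                (unsigned long long)mm.freeSize());

    mm.free(buf);                                     // returns the block to the free list
}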
+ */ + + +#include +#include +#include "driver/include/xclhal.h" + +namespace xclxdma { + class MemoryManager { + std::mutex mMemManagerMutex; + std::list > mFreeBufferList; + std::list > mBusyBufferList; + const uint64_t mSize; + const uint64_t mStart; + const uint64_t mAlignment; + const unsigned mCoalesceThreshold; + uint64_t mFreeSize; + + typedef std::list > PairList; + + public: + static const uint64_t mNull = 0xffffffffffffffffull; + + public: + MemoryManager(uint64_t size, uint64_t start, unsigned alignment); + ~MemoryManager(); + uint64_t alloc(size_t size); + void free(uint64_t buf); + void reset(); + std::pairlookup(uint64_t buf); + bool reserve(uint64_t base, size_t size); + + uint64_t size() const { + return mSize; + } + + uint64_t start() const { + return mStart; + } + + uint64_t freeSize() const { + return mFreeSize; + } + + static bool isNullAlloc(const std::pair& buf) { + return ((buf.first == mNull) || (buf.second == mNull)); + } + + private: + /* Note that these should be called after acquiring mMemManagerMutex */ + void coalesce(); + PairList::iterator find(uint64_t buf); + }; +} + +#endif diff --git a/sdk/SDAccel/HAL/driver/xcldma/user/perf.cpp b/sdk/SDAccel/HAL/driver/xcldma/user/perf.cpp new file mode 100644 index 000000000..ffdd10ec6 --- /dev/null +++ b/sdk/SDAccel/HAL/driver/xcldma/user/perf.cpp @@ -0,0 +1,980 @@ +/* + * Copyright (C) 2015 Xilinx, Inc + * Author: Paul Schumacher + * Performance Monitoring using PCIe for XDMA HAL Driver + * + * Licensed under the Apache License, Version 2.0 (the "License"). You may + * not use this file except in compliance with the License. A copy of the + * License is located at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations + * under the License. 
+ */ + +#include "shim.h" +#include "datamover.h" + +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#ifndef _WINDOWS +// TODO: Windows build support +// unistd.h is linux only header file +// it is included for read, write, close, lseek64 +#include +#endif + +#ifdef _WINDOWS +#define __func__ __FUNCTION__ +#endif + +#define FAST_OFFLOAD_MAJOR 2 +#define FAST_OFFLOAD_MINOR 2 + +namespace xclxdma { + // **************** + // Helper functions + // **************** + + bool XDMAShim::isDSAVersion(unsigned majorVersion, unsigned minorVersion, bool onlyThisVersion) { + unsigned checkVersion = (majorVersion << 4) + (minorVersion); + if (onlyThisVersion) + return (mDeviceInfo.mDeviceVersion == checkVersion); + return (mDeviceInfo.mDeviceVersion >= checkVersion); + } + + unsigned XDMAShim::getBankCount() { + return mDeviceInfo.mDDRBankCount; + } + + void XDMAShim::xclSetOclRegionProfilingNumberSlots(uint32_t numSlots) { + mOclRegionProfilingNumberSlots = numSlots; + } + + // Get host timestamp to write to APM + // IMPORTANT NOTE: this *must* be compatible with the method of generating + // timestamps as defined in RTProfile::getTraceTime() + uint64_t XDMAShim::getHostTraceTimeNsec() { + struct timespec now; + int err; + if ((err = clock_gettime(CLOCK_MONOTONIC, &now)) < 0) + return 0; + + return (uint64_t) now.tv_sec * 1000000000UL + (uint64_t) now.tv_nsec; + } + + uint64_t XDMAShim::getPerfMonBaseAddress(xclPerfMonType type) { + if (type == XCL_PERF_MON_MEMORY) return PERFMON0_OFFSET; + if (type == XCL_PERF_MON_HOST_INTERFACE) return PERFMON1_OFFSET; + if (type == XCL_PERF_MON_OCL_REGION) return PERFMON2_OFFSET; + return 0; + } + + uint64_t XDMAShim::getPerfMonFifoBaseAddress(xclPerfMonType type, uint32_t fifonum) { + if (type == XCL_PERF_MON_MEMORY) { + // Only one FIFO in >= v2.2 + if (isDSAVersion(FAST_OFFLOAD_MAJOR, FAST_OFFLOAD_MINOR, false)) + return PERFMON0_OFFSET + XPAR_AXI_PERF_MON_0_TRACE_OFFSET_0; + + if (fifonum == 0) return (PERFMON0_OFFSET + XPAR_AXI_PERF_MON_0_TRACE_OFFSET_0); + if (fifonum == 1) return (PERFMON0_OFFSET + XPAR_AXI_PERF_MON_0_TRACE_OFFSET_1); + if (fifonum == 2) return (PERFMON0_OFFSET + XPAR_AXI_PERF_MON_0_TRACE_OFFSET_2); + return 0; + } + if (type == XCL_PERF_MON_OCL_REGION) { + if (fifonum == 0) return (PERFMON2_OFFSET + XPAR_AXI_PERF_MON_2_TRACE_OFFSET_0); + if (fifonum == 1) return (PERFMON2_OFFSET + XPAR_AXI_PERF_MON_2_TRACE_OFFSET_1); + if (fifonum == 2) return (PERFMON2_OFFSET + XPAR_AXI_PERF_MON_2_TRACE_OFFSET_2); + return 0; + } + return 0; + } + + uint64_t XDMAShim::getPerfMonFifoReadBaseAddress(xclPerfMonType type, uint32_t fifonum) { + if (type == XCL_PERF_MON_MEMORY) { + // Use AXI-MM to access trace FIFO + // NOTE: requires compatible change in base platform + if (isDSAVersion(FAST_OFFLOAD_MAJOR, FAST_OFFLOAD_MINOR, false)) + return XPAR_AXI_PERF_MON_0_TRACE_OFFSET_AXI_FULL; + + if (fifonum == 0) return (PERFMON0_OFFSET + XPAR_AXI_PERF_MON_0_TRACE_OFFSET_0); + if (fifonum == 1) return (PERFMON0_OFFSET + XPAR_AXI_PERF_MON_0_TRACE_OFFSET_1); + if (fifonum == 2) return (PERFMON0_OFFSET + XPAR_AXI_PERF_MON_0_TRACE_OFFSET_2); + return 0; + } + if (type == XCL_PERF_MON_OCL_REGION) { + if (fifonum == 0) return (PERFMON2_OFFSET + XPAR_AXI_PERF_MON_2_TRACE_OFFSET_0); + if (fifonum == 1) return (PERFMON2_OFFSET + XPAR_AXI_PERF_MON_2_TRACE_OFFSET_1); + if (fifonum == 2) return (PERFMON2_OFFSET + XPAR_AXI_PERF_MON_2_TRACE_OFFSET_2); + return 0; + } + return 0; + } + + uint32_t 
XDMAShim::getPerfMonNumberFifos(xclPerfMonType type) { + if (type == XCL_PERF_MON_MEMORY) + return XPAR_AXI_PERF_MON_0_TRACE_NUMBER_FIFO; + if (type == XCL_PERF_MON_HOST_INTERFACE) + return XPAR_AXI_PERF_MON_1_TRACE_NUMBER_FIFO; + if (type == XCL_PERF_MON_OCL_REGION) { + if (mOclRegionProfilingNumberSlots > 4) + return 3; + else + return 2; + } + return 0; + } + + uint32_t XDMAShim::getPerfMonNumberSlots(xclPerfMonType type) { + if (type == XCL_PERF_MON_MEMORY) { + return (getBankCount() + 1); + } + if (type == XCL_PERF_MON_HOST_INTERFACE) { + return XPAR_AXI_PERF_MON_1_NUMBER_SLOTS; + } + if (type == XCL_PERF_MON_OCL_REGION) { + return mOclRegionProfilingNumberSlots; + } + return 1; + } + + uint32_t XDMAShim::getPerfMonNumberSamples(xclPerfMonType type) { + if (type == XCL_PERF_MON_MEMORY) return XPAR_AXI_PERF_MON_0_TRACE_NUMBER_SAMPLES; + if (type == XCL_PERF_MON_HOST_INTERFACE) return XPAR_AXI_PERF_MON_1_TRACE_NUMBER_SAMPLES; + // TODO: get number of samples from metadata + if (type == XCL_PERF_MON_OCL_REGION) return XPAR_AXI_PERF_MON_2_TRACE_NUMBER_SAMPLES; + return 0; + } + + uint32_t XDMAShim::getPerfMonByteScaleFactor(xclPerfMonType type) { + // NOTE: In the NWL DMA base platform, the APM slot data was only 32 bits + // while the MIG interface was 512 bits + //if (type == XCL_PERF_MON_MEMORY && isDSAVersion(1, 0, true)) + // return 16; + return 1; + } + + uint8_t XDMAShim::getPerfMonShowIDS(xclPerfMonType type) { + if (type == XCL_PERF_MON_MEMORY) { + if (isDSAVersion(1, 0, true)) + return 0; + if (getBankCount() > 1) + return XPAR_AXI_PERF_MON_0_SHOW_AXI_IDS_2DDR; + return XPAR_AXI_PERF_MON_0_SHOW_AXI_IDS; + } + if (type == XCL_PERF_MON_HOST_INTERFACE) { + return XPAR_AXI_PERF_MON_1_SHOW_AXI_IDS; + } + // TODO: get show IDs + if (type == XCL_PERF_MON_OCL_REGION) { + return XPAR_AXI_PERF_MON_2_SHOW_AXI_IDS; + } + return 0; + } + + uint8_t XDMAShim::getPerfMonShowLEN(xclPerfMonType type) { + if (type == XCL_PERF_MON_MEMORY) { + if (getBankCount() > 1) + return XPAR_AXI_PERF_MON_0_SHOW_AXI_LEN_2DDR; + return XPAR_AXI_PERF_MON_0_SHOW_AXI_LEN; + } + if (type == XCL_PERF_MON_HOST_INTERFACE) { + return XPAR_AXI_PERF_MON_1_SHOW_AXI_LEN; + } + // TODO: get show IDs + if (type == XCL_PERF_MON_OCL_REGION) { + return XPAR_AXI_PERF_MON_2_SHOW_AXI_LEN; + } + return 0; + } + + uint32_t XDMAShim::getPerfMonSlotStartBit(xclPerfMonType type, uint32_t slotnum) { + // NOTE: ID widths also set to 5 in HEAD/data/sdaccel/board_support/alpha_data/common/xclplat/xclplat_ip.tcl + uint32_t bitsPerID = 5; + uint8_t showIDs = getPerfMonShowIDS(type); + uint8_t showLen = getPerfMonShowLEN(type); + uint32_t bitsPerSlot = 10 + (bitsPerID * 4 * showIDs) + (16 * showLen); + return (18 + (bitsPerSlot * slotnum)); + } + + uint32_t XDMAShim::getPerfMonSlotDataWidth(xclPerfMonType type, uint32_t slotnum) { + // TODO: this only supports slot 0 + if (slotnum == 0) return XPAR_AXI_PERF_MON_0_SLOT0_DATA_WIDTH; + if (slotnum == 1) return XPAR_AXI_PERF_MON_0_SLOT1_DATA_WIDTH; + if (slotnum == 2) return XPAR_AXI_PERF_MON_0_SLOT2_DATA_WIDTH; + if (slotnum == 3) return XPAR_AXI_PERF_MON_0_SLOT3_DATA_WIDTH; + if (slotnum == 4) return XPAR_AXI_PERF_MON_0_SLOT4_DATA_WIDTH; + if (slotnum == 5) return XPAR_AXI_PERF_MON_0_SLOT5_DATA_WIDTH; + if (slotnum == 6) return XPAR_AXI_PERF_MON_0_SLOT6_DATA_WIDTH; + if (slotnum == 7) return XPAR_AXI_PERF_MON_0_SLOT7_DATA_WIDTH; + return XPAR_AXI_PERF_MON_0_SLOT0_DATA_WIDTH; + } + + // Get the device clock frequency (in MHz) + double XDMAShim::xclGetDeviceClockFreqMHz() { + unsigned clockFreq = 
mDeviceInfo.mOCLFrequency[0]; + if (clockFreq == 0) + clockFreq = 200; + + //if (mLogStream.is_open()) + // mLogStream << __func__ << ": clock freq = " << clockFreq << std::endl; + return ((double)clockFreq); + } + + // Get the maximum bandwidth for host reads from the device (in MB/sec) + // NOTE: for now, set to: (256/8 bytes) * 300 MHz = 9600 MBps + double XDMAShim::xclGetReadMaxBandwidthMBps() { + return 9600.0; + } + + // Get the maximum bandwidth for host writes to the device (in MB/sec) + // NOTE: for now, set to: (256/8 bytes) * 300 MHz = 9600 MBps + double XDMAShim::xclGetWriteMaxBandwidthMBps() { + return 9600.0; + } + + // Convert binary string to decimal + uint32_t XDMAShim::bin2dec(std::string str, int start, int number) { + return bin2dec(str.c_str(), start, number); + } + + // Convert binary char * to decimal + uint32_t XDMAShim::bin2dec(const char* ptr, int start, int number) { + const char* temp_ptr = ptr + start; + uint32_t value = 0; + int i = 0; + + do { + if (*temp_ptr != '0' && *temp_ptr!= '1') + return value; + value <<= 1; + if(*temp_ptr=='1') + value += 1; + i++; + temp_ptr++; + } while (i < number); + + return value; + } + + // Convert decimal to binary string + // NOTE: length of string is always sizeof(uint32_t) * 8 + std::string XDMAShim::dec2bin(uint32_t n) { + char result[(sizeof(uint32_t) * 8) + 1]; + unsigned index = sizeof(uint32_t) * 8; + result[index] = '\0'; + + do { + result[ --index ] = '0' + (n & 1); + } while (n >>= 1); + + for (int i=index-1; i >= 0; --i) + result[i] = '0'; + + return std::string( result ); + } + + // Convert decimal to binary string of length bits + std::string XDMAShim::dec2bin(uint32_t n, unsigned bits) { + char result[bits + 1]; + unsigned index = bits; + result[index] = '\0'; + + do result[ --index ] = '0' + (n & 1); + while (n >>= 1); + + for (int i=index-1; i >= 0; --i) + result[i] = '0'; + + return std::string( result ); + } + + // Reset all APM trace AXI stream FIFOs + size_t XDMAShim::resetFifos(xclPerfMonType type) { + uint64_t resetCoreAddress[] = { + getPerfMonFifoBaseAddress(type, 0) + AXI_FIFO_SRR, + getPerfMonFifoBaseAddress(type, 1) + AXI_FIFO_SRR, + getPerfMonFifoBaseAddress(type, 2) + AXI_FIFO_SRR + }; + + uint64_t resetFifoAddress[] = { + getPerfMonFifoBaseAddress(type, 0) + AXI_FIFO_RDFR, + getPerfMonFifoBaseAddress(type, 1) + AXI_FIFO_RDFR, + getPerfMonFifoBaseAddress(type, 2) + AXI_FIFO_RDFR + }; + + size_t size = 0; + uint32_t regValue = AXI_FIFO_RESET_VALUE; + + for (int f=0; f < XPAR_AXI_PERF_MON_0_TRACE_NUMBER_FIFO; f++) { + size += xclWrite(XCL_ADDR_SPACE_DEVICE_PERFMON, resetCoreAddress[f], ®Value, 4); + size += xclWrite(XCL_ADDR_SPACE_DEVICE_PERFMON, resetFifoAddress[f], ®Value, 4); + } + + return size; + } + + // ******** + // Counters + // ******** + + // Start device counters performance monitoring + size_t XDMAShim::xclPerfMonStartCounters(xclPerfMonType type) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " + << type << ", Start device counters..." << std::endl; + } + + size_t size = 0; + uint32_t regValue; + uint64_t baseAddress = getPerfMonBaseAddress(type); + + // 1. 
Reset APM metric counters + size += xclRead(XCL_ADDR_SPACE_DEVICE_PERFMON, baseAddress + XAPM_CTL_OFFSET, ®Value, 4); + + regValue = regValue | XAPM_CR_MCNTR_RESET_MASK; + size += xclWrite(XCL_ADDR_SPACE_DEVICE_PERFMON, baseAddress + XAPM_CTL_OFFSET, ®Value, 4); + + regValue = regValue & ~(XAPM_CR_MCNTR_RESET_MASK); + size += xclWrite(XCL_ADDR_SPACE_DEVICE_PERFMON, baseAddress + XAPM_CTL_OFFSET, ®Value, 4); + + // 2. Start APM metric counters + regValue = regValue | XAPM_CR_MCNTR_ENABLE_MASK; + size += xclWrite(XCL_ADDR_SPACE_DEVICE_PERFMON, baseAddress + XAPM_CTL_OFFSET, ®Value, 4); + + // 3. Specify APM metric counters to _not_ reset after reading + regValue = 0x0; + size += xclWrite(XCL_ADDR_SPACE_DEVICE_PERFMON, baseAddress + XAPM_SICR_OFFSET, ®Value, 4); + + // 4. Read from sample register to ensure total time is read again at end + size += xclRead(XCL_ADDR_SPACE_DEVICE_PERFMON, baseAddress + XAPM_SR_OFFSET, ®Value, 4); + + return size; + } + + // Stop both profile and trace performance monitoring + size_t XDMAShim::xclPerfMonStopCounters(xclPerfMonType type) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " + << type << ", Stop and reset device counters..." << std::endl; + } + + size_t size = 0; + uint32_t regValue; + uint64_t baseAddress = getPerfMonBaseAddress(type); + + // 1. Stop APM metric counters + size += xclRead(XCL_ADDR_SPACE_DEVICE_PERFMON, baseAddress + XAPM_CTL_OFFSET, ®Value, 4); + + regValue = regValue & ~(XAPM_CR_MCNTR_ENABLE_MASK); + size += xclWrite(XCL_ADDR_SPACE_DEVICE_PERFMON, baseAddress + XAPM_CTL_OFFSET, ®Value, 4); + + return size; + } + + // Read APM performance counters + size_t XDMAShim::xclPerfMonReadCounters(xclPerfMonType type, xclCounterResults& counterResults) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() + << ", " << type << ", " << &counterResults + << ", Read device counters..." 
<< std::endl; + } + + // Initialize all values in struct to 0 + memset(&counterResults, 0, sizeof(xclCounterResults)); + + size_t size = 0; + uint32_t scaleFactor = getPerfMonByteScaleFactor(type); + uint64_t baseAddress = getPerfMonBaseAddress(type); + + uint64_t intervalAddress = baseAddress + XAPM_SR_OFFSET; + uint64_t metricAddress[] = { + // Slot 0 + baseAddress + XAPM_SMC0_OFFSET, baseAddress + XAPM_SMC1_OFFSET, + baseAddress + XAPM_SMC2_OFFSET, baseAddress + XAPM_SMC3_OFFSET, + baseAddress + XAPM_SMC4_OFFSET, baseAddress + XAPM_SMC5_OFFSET, + baseAddress + XAPM_SMC48_OFFSET, baseAddress + XAPM_SMC49_OFFSET, + // Slot 1 + baseAddress + XAPM_SMC6_OFFSET, baseAddress + XAPM_SMC7_OFFSET, + baseAddress + XAPM_SMC8_OFFSET, baseAddress + XAPM_SMC9_OFFSET, + baseAddress + XAPM_SMC10_OFFSET, baseAddress + XAPM_SMC11_OFFSET, + baseAddress + XAPM_SMC50_OFFSET, baseAddress + XAPM_SMC51_OFFSET, + // Slot 2 + baseAddress + XAPM_SMC12_OFFSET, baseAddress + XAPM_SMC13_OFFSET, + baseAddress + XAPM_SMC14_OFFSET, baseAddress + XAPM_SMC15_OFFSET, + baseAddress + XAPM_SMC16_OFFSET, baseAddress + XAPM_SMC17_OFFSET, + baseAddress + XAPM_SMC52_OFFSET, baseAddress + XAPM_SMC53_OFFSET, + // Slot 3 + baseAddress + XAPM_SMC18_OFFSET, baseAddress + XAPM_SMC19_OFFSET, + baseAddress + XAPM_SMC20_OFFSET, baseAddress + XAPM_SMC21_OFFSET, + baseAddress + XAPM_SMC22_OFFSET, baseAddress + XAPM_SMC23_OFFSET, + baseAddress + XAPM_SMC54_OFFSET, baseAddress + XAPM_SMC55_OFFSET, + // Slot 4 + baseAddress + XAPM_SMC24_OFFSET, baseAddress + XAPM_SMC25_OFFSET, + baseAddress + XAPM_SMC26_OFFSET, baseAddress + XAPM_SMC27_OFFSET, + baseAddress + XAPM_SMC28_OFFSET, baseAddress + XAPM_SMC29_OFFSET, + baseAddress + XAPM_SMC56_OFFSET, baseAddress + XAPM_SMC57_OFFSET, + // Slot 5 + baseAddress + XAPM_SMC30_OFFSET, baseAddress + XAPM_SMC31_OFFSET, + baseAddress + XAPM_SMC32_OFFSET, baseAddress + XAPM_SMC33_OFFSET, + baseAddress + XAPM_SMC34_OFFSET, baseAddress + XAPM_SMC35_OFFSET, + baseAddress + XAPM_SMC58_OFFSET, baseAddress + XAPM_SMC59_OFFSET, + // Slot 6 + baseAddress + XAPM_SMC36_OFFSET, baseAddress + XAPM_SMC37_OFFSET, + baseAddress + XAPM_SMC38_OFFSET, baseAddress + XAPM_SMC39_OFFSET, + baseAddress + XAPM_SMC40_OFFSET, baseAddress + XAPM_SMC41_OFFSET, + baseAddress + XAPM_SMC60_OFFSET, baseAddress + XAPM_SMC61_OFFSET, + // Slot 7 + baseAddress + XAPM_SMC42_OFFSET, baseAddress + XAPM_SMC43_OFFSET, + baseAddress + XAPM_SMC44_OFFSET, baseAddress + XAPM_SMC45_OFFSET, + baseAddress + XAPM_SMC46_OFFSET, baseAddress + XAPM_SMC47_OFFSET, + baseAddress + XAPM_SMC62_OFFSET, baseAddress + XAPM_SMC63_OFFSET + }; + + // Read sample interval register + // NOTE: this also latches the sampled metric counters + uint32_t sampleInterval; + size_t ret = xclRead(XCL_ADDR_SPACE_DEVICE_PERFMON, intervalAddress, &sampleInterval, 4); + if (ret < 0) return ret; + counterResults.SampleIntervalUsec = sampleInterval / xclGetDeviceClockFreqMHz(); + + // Read all sampled metric counters + uint32_t countnum = 0; + uint32_t numSlots = getPerfMonNumberSlots(type); + //counterResults.NumSlots = numSlots; + + uint32_t temp[XAPM_METRIC_COUNTERS_PER_SLOT]; + + for (uint32_t s=0; s < numSlots; s++) { + for (int c=0; c < XAPM_METRIC_COUNTERS_PER_SLOT; c++) + size += xclRead(XCL_ADDR_SPACE_DEVICE_PERFMON, metricAddress[countnum++], &temp[c], 4); + + counterResults.WriteBytes[s] = temp[XAPM_METRIC_WRITE_BYTES] * scaleFactor; + counterResults.WriteTranx[s] = temp[XAPM_METRIC_WRITE_TRANX]; + counterResults.WriteLatency[s] = temp[XAPM_METRIC_WRITE_LATENCY]; + 
counterResults.WriteMinLatency[s] = (temp[XAPM_METRIC_WRITE_MIN_MAX] & XAPM_MIN_LATENCY_MASK) >> XAPM_MIN_LATENCY_SHIFT; + counterResults.WriteMaxLatency[s] = (temp[XAPM_METRIC_WRITE_MIN_MAX] & XAPM_MAX_LATENCY_MASK) >> XAPM_MAX_LATENCY_SHIFT; + + counterResults.ReadBytes[s] = temp[XAPM_METRIC_READ_BYTES] * scaleFactor; + counterResults.ReadTranx[s] = temp[XAPM_METRIC_READ_TRANX]; + counterResults.ReadLatency[s] = temp[XAPM_METRIC_READ_LATENCY]; + counterResults.ReadMinLatency[s] = (temp[XAPM_METRIC_READ_MIN_MAX] & XAPM_MIN_LATENCY_MASK) >> XAPM_MIN_LATENCY_SHIFT; + counterResults.ReadMaxLatency[s] = (temp[XAPM_METRIC_READ_MIN_MAX] & XAPM_MAX_LATENCY_MASK) >> XAPM_MAX_LATENCY_SHIFT; + } + + return size; + } + + // ***** + // Trace + // ***** + + // Clock training used in converting device trace timestamps to host domain + size_t XDMAShim::xclPerfMonClockTraining(xclPerfMonType type) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " + << type << ", Send clock training..." << std::endl; + } + + size_t size = 0; + uint64_t baseAddress = getPerfMonBaseAddress(type); + + // Send host timestamps to target device + // NOTE: this is used for training to interpolate between time domains + for (int i=0; i < 3; i++) { +#if 1 + uint64_t hostTimeNsec = getHostTraceTimeNsec(); + + uint32_t hostTimeHigh = hostTimeNsec >> 32; + uint32_t hostTimeLow = hostTimeNsec & 0xffffffff; +#else + // Test values + uint32_t hostTimeHigh = 0xf00df00d; + uint32_t hostTimeLow = 0xdeadbeef; +#endif + + // Send upper then lower 32 bits of host timestamp to APM SW data register + size += xclWrite(XCL_ADDR_SPACE_DEVICE_PERFMON, baseAddress + XAPM_SWD_OFFSET, &hostTimeHigh, 4); + size += xclWrite(XCL_ADDR_SPACE_DEVICE_PERFMON, baseAddress + XAPM_SWD_OFFSET, &hostTimeLow, 4); + + if (mLogStream.is_open()) { + mLogStream << " Host timestamp: 0x" << std::hex << hostTimeHigh + << " " << hostTimeLow << std::dec << std::endl; + } + } + + return size; + } + + // Start trace performance monitoring + size_t XDMAShim::xclPerfMonStartTrace(xclPerfMonType type, uint32_t startTrigger) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() + << ", " << type << ", " << startTrigger + << ", Start device tracing..." << std::endl; + } + + size_t size = 0; + uint32_t regValue; + uint64_t ctrlAddress = getPerfMonBaseAddress(type) + XAPM_CTL_OFFSET; + xclAddressSpace addressSpace = (type == XCL_PERF_MON_OCL_REGION) ? + XCL_ADDR_KERNEL_CTRL : XCL_ADDR_SPACE_DEVICE_PERFMON; + + // 1. Reset APM trace stream FIFO + size += xclRead(addressSpace, ctrlAddress, ®Value, 4); + + regValue = regValue | XAPM_CR_FIFO_RESET_MASK; + size += xclWrite(addressSpace, ctrlAddress, ®Value, 4); + + regValue = regValue & ~(XAPM_CR_FIFO_RESET_MASK); + size += xclWrite(addressSpace, ctrlAddress, ®Value, 4); + + // 2. Start APM event log + regValue = regValue | XAPM_CR_EVENTLOG_ENABLE_MASK; + size += xclWrite(addressSpace, ctrlAddress, ®Value, 4); + + // 3. Reset trace FIFOs + size += resetFifos(type); + + // 4. Send host timestamps to target device + size += xclPerfMonClockTraining(type); + + // 5. 
Disable host monitoring on slot 1 + // TODO: replace check for value of startTrigger (temp way + // of keeping slot 1 enabled in 06_perfmon test) + if ((type == XCL_PERF_MON_MEMORY) && (startTrigger == 0)) { + regValue = 0xFFFFFF0F; + uint64_t enableTraceAddress = getPerfMonBaseAddress(type) + XAPM_ENT_OFFSET; + size += xclWrite(addressSpace, enableTraceAddress, ®Value, 4); + } + + // 6. Write to event trace trigger register + // TODO: add support for triggering in device here + //size += xclWrite(addressSpace, TBD, &startTrigger, 4); + + return size; + } + + // Stop trace performance monitoring + size_t XDMAShim::xclPerfMonStopTrace(xclPerfMonType type) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " + << type << ", Stop and reset device tracing..." << std::endl; + } + + size_t size = 0; + uint32_t regValue; + uint64_t ctrlAddress = getPerfMonBaseAddress(type) + XAPM_CTL_OFFSET; + xclAddressSpace addressSpace = (type == XCL_PERF_MON_OCL_REGION) ? + XCL_ADDR_KERNEL_CTRL : XCL_ADDR_SPACE_DEVICE_PERFMON; + + // 1. Stop APM event log and metric counters + size += xclRead(addressSpace, ctrlAddress, ®Value, 4); + + regValue = regValue & ~(XAPM_CR_EVENTLOG_ENABLE_MASK); + size += xclWrite(addressSpace, ctrlAddress, ®Value, 4); + + size += resetFifos(type); + + return size; + } + + // Get trace word count + uint32_t XDMAShim::xclPerfMonGetTraceCount(xclPerfMonType type) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() + << ", " << type << std::endl; + } + + xclAddressSpace addressSpace = (type == XCL_PERF_MON_OCL_REGION) ? + XCL_ADDR_KERNEL_CTRL : XCL_ADDR_SPACE_DEVICE_PERFMON; + + // Only read first FIFO (and assume the others have the same # words) + // NOTE: we do this for speed improvements + uint32_t fifoCount; + xclRead(addressSpace, getPerfMonFifoBaseAddress(type, 0) + AXI_FIFO_RLR, &fifoCount, 4); + // Read bits 22:0 per AXI-Stream FIFO product guide (PG080, 10/1/14) + uint32_t numBytes = fifoCount & 0x7FFFFF; + + uint32_t numSamples = 0; + if (type == XCL_PERF_MON_MEMORY && isDSAVersion(FAST_OFFLOAD_MAJOR, FAST_OFFLOAD_MINOR, false)) + numSamples = numBytes / (XPAR_AXI_PERF_MON_0_TRACE_WORD_WIDTH/8); + else + numSamples = numBytes >> 2; + + if (mLogStream.is_open()) { + mLogStream << " No. of trace samples = " << std::dec << numSamples + << " (fifoCount = 0x" << std::hex << fifoCount << ")" << std::dec << std::endl; + } + + return numSamples; + } + + // Read all values from APM trace AXI stream FIFOs + size_t XDMAShim::xclPerfMonReadTrace(xclPerfMonType type, xclTraceResultsVector& traceVector) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() + << ", " << type << ", " << &traceVector + << ", Reading device trace stream..." << std::endl; + } + + traceVector.mLength = 0; + + uint32_t numSamples = xclPerfMonGetTraceCount(type); + if (numSamples == 0) + return 0; + + uint64_t fifoReadAddress[] = {0, 0, 0}; + if (type == XCL_PERF_MON_MEMORY && isDSAVersion(FAST_OFFLOAD_MAJOR, FAST_OFFLOAD_MINOR, false)) { + fifoReadAddress[0] = getPerfMonFifoReadBaseAddress(type, 0) + AXI_FIFO_RDFD_AXI_FULL; + } + else { + for (int i=0; i < 3; i++) + fifoReadAddress[i] = getPerfMonFifoReadBaseAddress(type, i) + AXI_FIFO_RDFD; + } + + xclAddressSpace addressSpace = (type == XCL_PERF_MON_OCL_REGION) ? 
+ XCL_ADDR_KERNEL_CTRL : XCL_ADDR_SPACE_DEVICE_PERFMON; + uint32_t numSlots = getPerfMonNumberSlots(type); + uint32_t numFifos = getPerfMonNumberFifos(type); + + size_t size = 0; +#ifndef _WINDOWS + // TODO: Windows build support + // runtime array size is not supported + uint32_t temp[numFifos]; + memset(&temp, 0, numFifos*sizeof(uint32_t)); +#else + uint32_t temp[3]; + memset(&temp, 0, 3*sizeof(uint32_t)); +#endif + + // Limit to max number of samples so we don't overrun trace buffer on host + uint32_t maxSamples = getPerfMonNumberSamples(type); + numSamples = (numSamples > maxSamples) ? maxSamples : numSamples; + traceVector.mLength = numSamples; + + const uint32_t bytesPerSample = (XPAR_AXI_PERF_MON_0_TRACE_WORD_WIDTH / 8); + const uint32_t wordsPerSample = (XPAR_AXI_PERF_MON_0_TRACE_WORD_WIDTH / 32); + //uint32_t numBytes = numSamples * bytesPerSample; + uint32_t numWords = numSamples * wordsPerSample; + + // Create trace buffer on host (requires alignment) + const int BUFFER_BYTES = MAX_TRACE_NUMBER_SAMPLES * bytesPerSample; + const int BUFFER_WORDS = MAX_TRACE_NUMBER_SAMPLES * wordsPerSample; +#ifndef _WINDOWS +// TODO: Windows build support +// alignas is defined in c++11 +#if GCC_VERSION >= 40800 + alignas(AXI_FIFO_RDFD_AXI_FULL) uint32_t hostbuf[BUFFER_WORDS]; +#else + AlignedAllocator alignedBuffer(AXI_FIFO_RDFD_AXI_FULL, BUFFER_WORDS); + uint32_t* hostbuf = alignedBuffer.getBuffer(); +#endif +#else + uint32_t hostbuf[BUFFER_WORDS]; +#endif + + // ****************************** + // Read all words from trace FIFO + // NOTE: DSA Version >= 2.2 + // ****************************** + if (type == XCL_PERF_MON_MEMORY && isDSAVersion(FAST_OFFLOAD_MAJOR, FAST_OFFLOAD_MINOR, false)) { + memset((void *)hostbuf, 0, BUFFER_BYTES); + + // Iterate over chunks + // NOTE: AXI limits this to 4K bytes per transfer + uint32_t chunkSizeWords = 256 * wordsPerSample; + if (chunkSizeWords > 1024) chunkSizeWords = 1024; + uint32_t chunkSizeBytes = 4 * chunkSizeWords; + uint32_t words=0; + + // Read trace a chunk of bytes at a time + if (numWords > chunkSizeWords) { + for (; words < (numWords-chunkSizeWords); words += chunkSizeWords) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ": reading " << chunkSizeBytes << " bytes from 0x" + << std::hex << fifoReadAddress[0] << " and writing it to 0x" + << (void *)(hostbuf + words) << std::dec << std::endl; + } + + if (mDataMover->pread64((void *)(hostbuf + words), chunkSizeBytes, fifoReadAddress[0]) < 0) + return 0; + + size += chunkSizeBytes; + } + } + + // Read remainder of trace not divisible by chunk size + if (words < numWords) { + chunkSizeBytes = 4 * (numWords - words); + + if (mLogStream.is_open()) { + mLogStream << __func__ << ": reading " << chunkSizeBytes << " bytes from 0x" + << std::hex << fifoReadAddress[0] << " and writing it to 0x" + << (void *)(hostbuf + words) << std::dec << std::endl; + } + + if (mDataMover->pread64((void *)(hostbuf + words), chunkSizeBytes, fifoReadAddress[0]) < 0) + return 0; + + size += chunkSizeBytes; + } + + if (mLogStream.is_open()) { + mLogStream << __func__ << ": done reading " << size << " bytes " << std::endl; + } + } + + // ****************************** + // Read & process all trace FIFOs + // ****************************** + for (uint32_t wordnum=0; wordnum < numSamples; wordnum++) { + if (type == XCL_PERF_MON_MEMORY && isDSAVersion(FAST_OFFLOAD_MAJOR, FAST_OFFLOAD_MINOR, false)) { + uint32_t index = wordsPerSample * wordnum; + bool allZeros = true; + for (uint32_t fifonum=0; fifonum < numFifos; 
fifonum++) { + temp[fifonum] = *(hostbuf + index + fifonum); + allZeros &= (temp[fifonum] == 0); + } + if (allZeros) + continue; + } + else { + // NOTE: Using AXI-Lite so we use the same address with burst length of 1 word + for (uint32_t fifonum=0; fifonum < numFifos; fifonum++) + size += xclRead(addressSpace, fifoReadAddress[fifonum], &temp[fifonum], 4); + } + + xclTraceResults results; + // Assign to all 0s to avoid uninitialized variables + memset(&results, 0, sizeof(xclTraceResults)); + + uint64_t temp64 = ((uint64_t)temp[1] << 32) | temp[0]; + results.LogID = temp64 & 0x1; + results.Timestamp = (temp64 >> 1) & 0xFFFF; + results.Overflow = (temp64 >> 17) & 0x1; + results.ReadStartEvent = XCL_PERF_MON_START_ADDR; + results.WriteStartEvent = XCL_PERF_MON_START_ADDR; + results.WriteEndEvent = XCL_PERF_MON_END_LAST_DATA; + + if (results.LogID != 0) { + results.HostTimestamp = (temp64 >> 18) & 0xFFFFFFFF; + } + else { + for (uint32_t s=0; s < numSlots; s++) { + uint32_t b = getPerfMonSlotStartBit(type, s); + + if (b >= 32) + temp64 = ((((uint64_t)temp[2] << 32) | temp[1]) >> (b-32)); + else + temp64 = ((((uint64_t)temp[1] << 32) | temp[0]) >> b); + + results.ExtEventFlags[s] = temp64 & 0x7; + results.EventFlags[s] = (temp64 >> 3) & 0x7F; + + if (getPerfMonShowIDS(type)) { + if (getPerfMonShowLEN(type)) { + results.ReadAddrLen[s] = (temp64 >> 10) & 0xFF; + results.WriteAddrLen[s] = (temp64 >> 18) & 0xFF; + + // TODO: assumes AXI ID width of 5 + results.RID[s] = (temp64 >> 26) & 0x1F; + results.ARID[s] = (temp64 >> 31) & 0x1F; + results.BID[s] = (temp64 >> 36) & 0x1F; + results.AWID[s] = (temp64 >> 41) & 0x1F; + } + else { + // TODO: assumes AXI ID width of 5 + results.RID[s] = (temp64 >> 10) & 0x1F; + results.ARID[s] = (temp64 >> 15) & 0x1F; + results.BID[s] = (temp64 >> 20) & 0x1F; + results.AWID[s] = (temp64 >> 25) & 0x1F; + } + } + else { + if (getPerfMonShowLEN(type)) { + results.ReadAddrLen[s] = (temp64 >> 10) & 0xFF; + results.WriteAddrLen[s] = (temp64 >> 18) & 0xFF; + } + } + + // # bytes = burst length * bytes/burst = (addr len + 1) * bytes/burst + uint32_t dataWidth = getPerfMonSlotDataWidth(type, s); + results.ReadBytes[s] = (results.ReadAddrLen[s] + 1) * (dataWidth/8); + results.WriteBytes[s] = (results.WriteAddrLen[s] + 1) * (dataWidth/8); + } // for slot + } // if-else logID != 0 + + traceVector.mArray[wordnum] = results; + + // Log values (if requested) + if (mLogStream.is_open()) { + mLogStream << " Trace sample " << std::dec << wordnum << ": "; + for (int fifonum=numFifos-1; fifonum >= 0; fifonum--) + mLogStream << dec2bin(temp[fifonum]) << " "; + mLogStream << std::endl; + + if (results.LogID == 1) { + mLogStream << std::hex << " Host Timestamp: " << results.HostTimestamp << std::endl; + } + else { + if (type == XCL_PERF_MON_OCL_REGION) { + mLogStream << " Ext Event flags: "; + for (int slot=numSlots-1; slot >= 0; slot--) + mLogStream << dec2bin(results.ExtEventFlags[slot], 3) << " "; + } + else { + mLogStream << " Event flags: "; + for (int slot=numSlots-1; slot >= 0; slot--) + mLogStream << dec2bin(results.EventFlags[slot], 7) << " "; + } + + mLogStream << "(ReadAddrLen[0] = " << (int)(results.ReadAddrLen[0]) + << ", WriteAddrLen[0] = " << (int)(results.WriteAddrLen[0]) + << ", ReadAddrLen[1] = " << (int)(results.ReadAddrLen[1]) + << ", WriteAddrLen[1] = " << (int)(results.WriteAddrLen[1]); + + if (getPerfMonShowIDS(type)) { + mLogStream << ", RID: " << (int)results.RID[0] << ", ARID: " << (int)results.ARID[0] + << ", BID: " << (int)results.BID[0] << ", AWID: " << 
(int)results.AWID[0]; + } + mLogStream << ")" << std::endl; + } + } + } + + return size; + } + +} // namespace xclxdma + + +size_t xclPerfMonStartCounters(xclDeviceHandle handle, xclPerfMonType type) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclPerfMonStartCounters(type); +} + + +size_t xclPerfMonStopCounters(xclDeviceHandle handle, xclPerfMonType type) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclPerfMonStopCounters(type); +} + + +size_t xclPerfMonReadCounters(xclDeviceHandle handle, xclPerfMonType type, xclCounterResults& counterResults) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclPerfMonReadCounters(type, counterResults); +} + + +size_t xclPerfMonClockTraining(xclDeviceHandle handle, xclPerfMonType type) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclPerfMonClockTraining(type); +} + + +size_t xclPerfMonStartTrace(xclDeviceHandle handle, xclPerfMonType type, uint32_t startTrigger) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclPerfMonStartTrace(type, startTrigger); +} + + +size_t xclPerfMonStopTrace(xclDeviceHandle handle, xclPerfMonType type) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclPerfMonStopTrace(type); +} + + +uint32_t xclPerfMonGetTraceCount(xclDeviceHandle handle, xclPerfMonType type) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclPerfMonGetTraceCount(type); +} + + +size_t xclPerfMonReadTrace(xclDeviceHandle handle, xclPerfMonType type, xclTraceResultsVector& traceVector) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclPerfMonReadTrace(type, traceVector); +} + + +double xclGetDeviceClockFreqMHz(xclDeviceHandle handle) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return 0.0; + return drv->xclGetDeviceClockFreqMHz(); +} + + +double xclGetReadMaxBandwidthMBps(xclDeviceHandle handle) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return 0.0; + return drv->xclGetReadMaxBandwidthMBps(); +} + + +double xclGetWriteMaxBandwidthMBps(xclDeviceHandle handle) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return 0.0; + return drv->xclGetWriteMaxBandwidthMBps(); +} + + +size_t xclGetDeviceTimestamp(xclDeviceHandle handle) +{ + return 0; +} + + +void xclSetOclRegionProfilingNumberSlots(xclDeviceHandle handle, uint32_t numSlots) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return; + return drv->xclSetOclRegionProfilingNumberSlots(numSlots); +} + +// XSIP watermark, do not delete 67d7842dbbe25473c3c32b93c0da8047785f30d78e8a024de1b57352245f9689 diff --git a/sdk/SDAccel/HAL/driver/xcldma/user/prom.cpp b/sdk/SDAccel/HAL/driver/xcldma/user/prom.cpp new file mode 100644 index 000000000..061f25428 --- /dev/null +++ b/sdk/SDAccel/HAL/driver/xcldma/user/prom.cpp @@ -0,0 +1,445 @@ +/* + * Copyright (C) 2015-2016 Xilinx, Inc + * In-System Programming of BPI PROM using PCIe + * Based on XAPP518 (v1.3) April 23, 2014 + * Author: Sonal Santan + * + * Licensed under the Apache License, Version 2.0 (the "License"). 
You may + * not use this file except in compliance with the License. A copy of the + * License is located at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations + * under the License. + */ + +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "shim.h" +#include "driver/xcldma/include/xdma-ioctl.h" + +#ifdef WINDOWS +#define __func__ __FUNCTION__ +#endif + +namespace xclxdma { + int XDMAShim::freezeAXIGate() { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << std::endl; + } + unsigned char buf = 0x0; + return pcieBarWrite(HWICAP_BAR, AXI_GATE_OFFSET, &buf, 1); + } + + int XDMAShim::freeAXIGate() { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << std::endl; + } + // First pulse the OCL RESET. This is important for PR with multiple + // clocks as it resets the edge triggered clock converter FIFO +#ifndef _WINDOWS + const timespec interval = {0, 500}; +#endif + unsigned char buf = 0x2; + if (pcieBarWrite(HWICAP_BAR, AXI_GATE_OFFSET, &buf, 1)) + return -1; + buf = 0x0; +#ifndef _WINDOWS +// TODO: Windows build support +// nanosleep is defined in unistd.h + nanosleep(&interval, 0); +#endif + if (pcieBarWrite(HWICAP_BAR, AXI_GATE_OFFSET, &buf, 1)) + return -1; + buf = 0x2; +#ifndef _WINDOWS +// TODO: Windows build support +// nanosleep is defined in unistd.h + nanosleep(&interval, 0); +#endif + if (pcieBarWrite(HWICAP_BAR, AXI_GATE_OFFSET, &buf, 1)) + return -1; + buf = 0x3; +#ifndef _WINDOWS +// TODO: Windows build support +// nanosleep is defined in unistd.h + nanosleep(&interval, 0); +#endif + return pcieBarWrite(HWICAP_BAR, AXI_GATE_OFFSET, &buf, 1); + } + + + int XDMAShim::xclUpgradeFirmware(const char *mcsFile) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " << mcsFile << std::endl; + } + + std::cout << "INFO: Reseting hardware\n"; + if (freezeAXIGate() != 0) { + return -1; + } + +#ifndef _WINDOWS +// TODO: Windows build support +// timespec + const timespec req = {0, 5000}; + nanosleep(&req, 0); +#endif + if (freeAXIGate() != 0) { + return -1; + } +#ifndef _WINDOWS +// TODO: Windows build support +// nanosleep is defined in unistd.h + nanosleep(&req, 0); +#endif + + std::string line; + std::ifstream mcsStream(mcsFile); + std::string startAddress; + ELARecord record; + bool endRecordFound = false; + + if(!mcsStream.is_open()) { + std::cout << "ERROR: Cannot open " << mcsFile << ". Check that it exists and is readable." 
<< std::endl; + return -ENOENT; + } + + std::cout << "INFO: Parsing file " << mcsFile << std::endl; + while (!mcsStream.eof() && !endRecordFound) { + std::string line; + std::getline(mcsStream, line); + if (line.size() == 0) { + continue; + } + if (line[0] != ':') { + return -1; + } + const unsigned dataLen = std::stoi(line.substr(1, 2), 0 , 16); + const unsigned address = std::stoi(line.substr(3, 4), 0, 16); + const unsigned recordType = std::stoi(line.substr(7, 2), 0 , 16); + switch (recordType) { + case 0x00: + { + if (dataLen > 16) { + // For xilinx mcs files data length should be 16 for all records + // except for the last one which can be smaller + return -1; + } + if (address != record.mDataCount) { + return -1; + } + if (record.mEndAddress != address) { + return -1; + } + record.mDataCount += dataLen; + record.mEndAddress += dataLen; + break; + } + case 0x01: + { + if (startAddress.size() == 0) { + break; + } + mRecordList.push_back(record); + endRecordFound = true; + break; + } + case 0x02: + { + break; + } + case 0x04: + { + if (address != 0x0) { + return -1; + } + if (dataLen != 2) { + return -1; + } + std::string newAddress = line.substr(9, dataLen * 2); + if (startAddress.size()) { + // Finish the old record + mRecordList.push_back(record); + } + // Start a new record + record.mStartAddress = std::stoi(newAddress, 0 , 16); + record.mDataPos = mcsStream.tellg(); + record.mEndAddress = 0; + record.mDataCount = 0; + startAddress = newAddress; + } + } + } + + mcsStream.seekg(0); + std::cout << "INFO: Found " << mRecordList.size() << " ELA Records\n"; + + return program(mcsStream); + } + + int XDMAShim::prepare(unsigned startAddress, unsigned endAddress) { + startAddress &= 0x00ffffff; // truncate to 24 bits + startAddress >>= 8; // Pick the middle 16 bits + endAddress &= 0x00ffffff; // truncate to 24 bits + + if (waitForReady(READY_STAT)) { + return -1; + } + + std::cout << "INFO: Sending the address range\n"; + // Send start and end address + unsigned command = START_ADDR_CMD; + command |= startAddress; + if (pcieBarWrite(BPI_FLASH_BAR, BPI_FLASH_OFFSET, &command, 4)) { + return -1; + } + + command = END_ADDR_CMD; + command |= endAddress; + if (pcieBarWrite(BPI_FLASH_BAR, BPI_FLASH_OFFSET, &command, 4)) { + return -1; + } + +// if (waitForReady(READY_STAT)) { +// return -1; +// } + + std::cout << "INFO: Sending unlock command\n"; + // Send unlock command + command = UNLOCK_CMD; + if (pcieBarWrite(BPI_FLASH_BAR, BPI_FLASH_OFFSET, &command, 4)) { + return -1; + } + if (waitForReady(READY_STAT)) { + return -1; + } + + // Send erase command + std::cout << "INFO: Sending erase command\n"; + command = ERASE_CMD; + if (pcieBarWrite(BPI_FLASH_BAR, BPI_FLASH_OFFSET, &command, 4)) { + return -1; + } + // now hanging here + if (waitForReady(ERASE_STAT)) { + return -1; + } + + if (waitForReady(READY_STAT)) { + return -1; + } + + // Send program command + std::cout << "INFO: Erasing the address range\n"; + command = PROGRAM_CMD; + if (pcieBarWrite(BPI_FLASH_BAR, BPI_FLASH_OFFSET, &command, 4)) { + return -1; + } + + if (waitForReady(PROGRAM_STAT)) { + return -1; + } + + return 0; + } + + int XDMAShim::program(std::ifstream& mcsStream, const ELARecord& record) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << std::endl; + } +#ifndef _WINDOWS +// TODO: Windows build support +// timespec + const timespec req = {0, 2000}; +#endif + + std::cout << "Programming block (" << std::hex << record.mStartAddress << ", " << record.mEndAddress << std::dec << 
")" << std::endl; + assert(mcsStream.tellg() < record.mDataPos); + mcsStream.seekg(record.mDataPos, std::ifstream::beg); + unsigned char buffer[64]; + int bufferIndex = 0; + for (unsigned index = record.mDataCount; index > 0;) { + std::string line; + std::getline(mcsStream, line); + const unsigned dataLen = std::stoi(line.substr(1, 2), 0 , 16); + index -= dataLen; + const unsigned recordType = std::stoi(line.substr(7, 2), 0 , 16); + if (recordType != 0x00) { + continue; + } + const std::string data = line.substr(9, dataLen * 2); + // Write in byte swapped order + for (unsigned i = 0; i < data.length(); i += 2) { + if ((bufferIndex % 4) == 0) { + bufferIndex += 4; + } + assert(bufferIndex <= 64); + unsigned value = std::stoi(data.substr(i, 2), 0, 16); + buffer[--bufferIndex] = (unsigned char)value; + if ((bufferIndex % 4) == 0) { + bufferIndex += 4; + } + if (bufferIndex == 64) { + break; + } + } + + assert((bufferIndex % 4) == 0); + assert(bufferIndex <= 64); + if (bufferIndex == 64) { + if (waitForReady(PROGRAM_STAT, false)) { + return -1; + } + if (pcieBarWrite(BPI_FLASH_BAR, BPI_FLASH_OFFSET, buffer, 64)) { + return -1; + } + if (waitForReady(PROGRAM_STAT, false)) { + return -1; + } +#ifndef _WINDOWS +// TODO: Windows build support +// nanosleep is defined in unistd.h + nanosleep(&req, 0); +#endif + bufferIndex = 0; + } + } + if (bufferIndex) { + if (waitForReady(PROGRAM_STAT, false)) { + return -1; + } + if (pcieBarWrite(BPI_FLASH_BAR, BPI_FLASH_OFFSET, buffer, bufferIndex)) { + return -1; + } + if (waitForReady(PROGRAM_STAT, false)) { + return -1; + } +#ifndef _WINDOWS +// TODO: Windows build support +// nanosleep is defined in unistd.h + nanosleep(&req, 0); +#endif + } + return 0; + } + + int XDMAShim::program(std::ifstream& mcsStream) { + int status = 0; + for (ELARecordList::iterator i = mRecordList.begin(), e = mRecordList.end(); i != e; ++i) { + i->mStartAddress <<= 16; + i->mEndAddress += i->mStartAddress; + // Convert from 2 bytes address to 4 bytes address + i->mStartAddress /= 2; + i->mEndAddress /= 2; + } + std::cout << "INFO: Start address 0x" << std::hex << mRecordList.front().mStartAddress << std::dec << "\n"; + std::cout << "INFO: End address 0x" << std::hex << mRecordList.back().mEndAddress << std::dec << "\n"; + if (prepare(mRecordList.front().mStartAddress, mRecordList.back().mEndAddress)) { + std::cout << "ERROR: Could not unlock or erase the blocks\n"; + return -1; + } +#ifndef _WINDOWS +// TODO: Windows build support +// timespec + const timespec req = {0, 1000}; +#endif + int beatCount = 0; + for (ELARecordList::iterator i = mRecordList.begin(), e = mRecordList.end(); i != e; ++i) + { + beatCount++; + if(beatCount%10==0) { + std::cout << "." 
<< std::flush; + } + + if (program(mcsStream, *i)) { + std::cout << "ERROR: Could not program the block\n"; + return -1; + } +#ifndef _WINDOWS +// TODO: Windows build support +// nanosleep is defined in unistd.h + nanosleep(&req, 0); +#endif + } + std::cout << std::endl; + // Now keep writing 0xff till the hardware says ready + if (waitAndFinish(READY_STAT, 0xff)) { + return -1; + } + return status; + } + + int XDMAShim::waitForReady(unsigned code, bool verbose) { + unsigned status = ~code; + long long delay = 0; +#ifndef _WINDOWS +// TODO: Windows build support +// timespec + const timespec req = {0, 5000}; +#endif + if (verbose) { + std::cout << "INFO: Waiting for hardware\n"; + } + while ((status != code) && (delay < 30000000000)) { +#ifndef _WINDOWS +// TODO: Windows build support +// nanosleep is defined in unistd.h + nanosleep(&req, 0); +#endif + if (pcieBarRead(BPI_FLASH_BAR, BPI_FLASH_OFFSET, &status, 4)) { + return -1; + } + delay += 5000; + } + return (status == code) ? 0 : -1; + } + + int XDMAShim::waitAndFinish(unsigned code, unsigned data, bool verbose) { + unsigned status = ~code; + long long delay = 0; +#ifndef _WINDOWS +// TODO: Windows build support +// timespec + const timespec req = {0, 5000}; +#endif + if (verbose) { + std::cout << "INFO: Finishing up\n"; + } + if (pcieBarRead(BPI_FLASH_BAR, BPI_FLASH_OFFSET, &status, 4)) { + return -1; + } + while ((status != code) && (delay < 30000000000)) { +#ifndef _WINDOWS +// TODO: Windows build support +// nanosleep is defined in unistd.h + nanosleep(&req, 0); +#endif + if (pcieBarWrite(BPI_FLASH_BAR, BPI_FLASH_OFFSET, &data, 4)) { + return -1; + } + if (pcieBarRead(BPI_FLASH_BAR, BPI_FLASH_OFFSET, &status, 4)) { + return -1; + } + delay += 5000; + } + return (status == code) ? 0 : -1; + } + + int XDMAShim::xclBootFPGA() { + xdma_ioc_base base = {0X586C0C6C, XDMA_IOCREBOOT}; + return ioctl(mUserHandle, XDMA_IOCREBOOT, &base); + } +} diff --git a/sdk/SDAccel/HAL/driver/xcldma/user/shim.cpp b/sdk/SDAccel/HAL/driver/xcldma/user/shim.cpp new file mode 100644 index 000000000..1630dcfde --- /dev/null +++ b/sdk/SDAccel/HAL/driver/xcldma/user/shim.cpp @@ -0,0 +1,1250 @@ +/** + * Copyright (C) 2015-2016 Xilinx, Inc + * Author: Sonal Santan + * XDMA HAL Driver layered on top of XDMA kernel driver + * + * Licensed under the Apache License, Version 2.0 (the "License"). You may + * not use this file except in compliance with the License. A copy of the + * License is located at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations + * under the License. 
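
The ELA-record parser in XDMAShim::xclUpgradeFirmware above walks the .mcs file using fixed character offsets. Those offsets follow the standard Intel-HEX record layout; the short decode below is an editorial sketch, not part of the patch, and the sample record is a generic Intel-HEX example rather than data from a real DSA .mcs file.

// Illustrative decode of one Intel-HEX/MCS record using the same field
// offsets as the parser above (sketch only; not part of the patch).
#include <iostream>
#include <string>

int main() {
    //  :  | count | address | type |            data            | checksum
    const std::string line = ":10010000214601360121470136007EFE09D2190140";
    const unsigned dataLen    = std::stoi(line.substr(1, 2), nullptr, 16); // 0x10 bytes of data
    const unsigned address    = std::stoi(line.substr(3, 4), nullptr, 16); // load offset 0x0100
    const unsigned recordType = std::stoi(line.substr(7, 2), nullptr, 16); // 0x00 = data record
    const std::string data    = line.substr(9, dataLen * 2);               // payload as hex text
    std::cout << std::hex << "len=0x" << dataLen << " addr=0x" << address
              << " type=0x" << recordType << " data=" << data << std::endl;
    return 0;
}

The parser above only acts on record types 0x00 (data), 0x01 (end of file) and 0x04 (extended linear address, which starts a new ELARecord); type 0x02 records are skipped and checksums are not verified.
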
+ */ + +#include "shim.h" +#include "memorymanager.h" +#include "datamover.h" +#include +/* + * Define GCC version macro so we can use newer C++11 features + * if possible + */ +#define GCC_VERSION (__GNUC__ * 10000 \ + + __GNUC_MINOR__ * 100 \ + + __GNUC_PATCHLEVEL__) + +#include + +#ifndef _WINDOWS +// TODO: Windows build support +// sys/mman.h is linux only header file +// it is included for mmap +#include +#endif + +#ifndef _WINDOWS +// TODO: Windows build support +// unistd.h is linux only header file +// it is included for read, write, close, lseek64 +#include +#endif + +#include +#include + +#ifndef _WINDOWS +// TODO: Windows build support +// sys/ioctl.h is linux only header file +// it is included for ioctl +#include +#endif + +#ifndef _WINDOWS +// TODO: Windows build support +// sys/file.h is linux only header file +// it is included for flock +#include +#endif + + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "driver/include/xclbin.h" +#include "driver/xcldma/include/xdma-ioctl.h" + +#ifdef _WINDOWS +#define __func__ __FUNCTION__ +#endif + +#ifdef _WINDOWS +#define MAP_FAILED (void *)-1 +#endif + +#if defined(__PPC64__) +#define OSTAG "-ppc64le" +#else +#define OSTAG "" +#endif + +namespace xclxdma { + const unsigned XDMAShim::TAG = 0X586C0C6C; // XL OpenCL X->58(ASCII), L->6C(ASCII), O->0 C->C L->6C(ASCII); + + xclDeviceInfo2 to_info2(const xclDeviceInfo info) { + xclDeviceInfo2 info2; + std::memset(&info2, 0, sizeof(info2)); + info2.mMagic = info.mMagic; + std::memcpy(info2.mName, info.mName, 256); + info2.mHALMajorVersion = info.mHALMajorVersion; + info2.mHALMinorVersion = info.mHALMinorVersion; + info2.mVendorId = info.mVendorId; + info2.mDeviceId = info.mDeviceId; + info2.mSubsystemId = info.mSubsystemId; + info2.mSubsystemVendorId = info.mSubsystemVendorId; + info2.mDeviceVersion = info.mDeviceVersion; + info2.mDDRSize = info.mDDRSize; + info2.mDataAlignment = info.mDataAlignment; + info2.mDDRFreeSize = info.mDDRFreeSize; + info2.mMinTransferSize = info.mMinTransferSize; + info2.mDDRBankCount = info.mDDRBankCount; + info2.mOCLFrequency[0] = info.mOCLFrequency; + info2.mPCIeLinkWidth = info.mPCIeLinkWidth; + info2.mPCIeLinkSpeed = info.mPCIeLinkSpeed; + info2.mDMAThreads = info.mDMAThreads; + return info2; + } + + int XDMAShim::xclLoadBitstream(const char *fileName) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " << fileName << std::endl; + } + + if (!mLocked) + return -EPERM; + + std::ifstream stream(fileName); + if (!stream.is_open()) { + return errno; + } + + stream.seekg(0, stream.end); + int length = stream.tellg(); + stream.seekg(0, stream.beg); + char *buffer = new char[length]; + stream.read(buffer, length); + stream.close(); + xclBin *header = (xclBin *)buffer; + if (std::memcmp(header->m_magic, "xclbin0", 8)) { + return -EINVAL; + } + + return xclLoadXclBin(header); + } + + + int XDMAShim::xclLoadXclBin(const xclBin *buffer) + { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " << buffer << std::endl; + } + + if (!mLocked) + return -EPERM; + +#ifndef _WINDOWS + const unsigned cmd = isUltraScale() ? 
XDMA_IOCMCAPDOWNLOAD : XDMA_IOCICAPDOWNLOAD; + xdma_ioc_bitstream obj = {{0X586C0C6C, cmd}, const_cast(buffer)}; + int ret = ioctl(mUserHandle, cmd, &obj); + if(0 != ret) + return ret; + + // If it is an XPR DSA, zero out the DDR again as downloading the XCLBIN + // reinitializes the DDR and results in ECC error. + if(isXPR()) { + if (mLogStream.is_open()) { + mLogStream << __func__ << "XPR Device found, zeroing out DDR again.." << std::endl; + } + + if (zeroOutDDR() == false){ + if (mLogStream.is_open()) { + mLogStream << __func__ << "zeroing out DDR failed" << std::endl; + } + return -EIO; + } + } + + return ret; +#endif + } + + size_t XDMAShim::xclReadModifyWrite(uint64_t offset, const void *hostBuf, size_t size) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " + << offset << ", " << hostBuf << ", " << size << std::endl; + } +#ifndef _WINDOWS +// TODO: Windows build support +// alignas is defined in c++11 +#if GCC_VERSION >= 40800 + alignas(DDR_BUFFER_ALIGNMENT) char buffer[DDR_BUFFER_ALIGNMENT]; +#else + AlignedAllocator alignedBuffer(DDR_BUFFER_ALIGNMENT, DDR_BUFFER_ALIGNMENT); + char* buffer = alignedBuffer.getBuffer(); +#endif +#else + char buffer[DDR_BUFFER_ALIGNMENT]; +#endif + + const size_t mod_size = offset % DDR_BUFFER_ALIGNMENT; + // Read back one full aligned block starting from preceding aligned address + const uint64_t mod_offset = offset - mod_size; + if (xclRead(XCL_ADDR_SPACE_DEVICE_RAM, mod_offset, buffer, DDR_BUFFER_ALIGNMENT) != DDR_BUFFER_ALIGNMENT) + return -1; + + // Update the local copy of buffer with user requested data + const size_t copy_size = (size + mod_size > DDR_BUFFER_ALIGNMENT) ? DDR_BUFFER_ALIGNMENT - mod_size : size; + std::memcpy(buffer + mod_size, hostBuf, copy_size); + + // Write back the updated aligned block + if (xclWrite(XCL_ADDR_SPACE_DEVICE_RAM, mod_offset, buffer, DDR_BUFFER_ALIGNMENT) != DDR_BUFFER_ALIGNMENT) + return -1; + + // Write any remaining blocks over DDR_BUFFER_ALIGNMENT size + if (size + mod_size > DDR_BUFFER_ALIGNMENT) { + size_t write_size = xclWrite(XCL_ADDR_SPACE_DEVICE_RAM, mod_offset + DDR_BUFFER_ALIGNMENT, + (const char *)hostBuf + copy_size, size - copy_size); + if (write_size != (size - copy_size)) + return -1; + } + return size; + } + + size_t XDMAShim::xclWrite(xclAddressSpace space, uint64_t offset, const void *hostBuf, size_t size) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " << space << ", " + << offset << ", " << hostBuf << ", " << size << std::endl; + } + + if (!mLocked) + return -1; + + switch (space) { + case XCL_ADDR_SPACE_DEVICE_RAM: + { + const size_t totalSize = size; + const size_t mod_size1 = offset % DDR_BUFFER_ALIGNMENT; + const size_t mod_size2 = size % DDR_BUFFER_ALIGNMENT; + if (mod_size1) { + // Buffer not aligned at DDR_BUFFER_ALIGNMENT boundary, need to do Read-Modify-Write + return xclReadModifyWrite(offset, hostBuf, size); + } + else if (mod_size2) { + // Buffer not a multiple of DDR_BUFFER_ALIGNMENT, write out the initial block and + // then perform a Read-Modify-Write for the remainder buffer + const size_t blockSize = size - mod_size2; + if (xclWrite(space, offset, hostBuf, blockSize) != blockSize) + return -1; + offset += blockSize; + hostBuf = (const char *)hostBuf + blockSize; + if (xclReadModifyWrite(offset, hostBuf, mod_size2) != mod_size2) + return -1; + return totalSize; + } + + const char *curr = static_cast(hostBuf); + while (size > maxDMASize) { +#ifndef _WINDOWS +// TODO: 
Windows build support + if (mDataMover->pwrite64(curr,maxDMASize,offset) < 0) + return -1; +#endif + offset += maxDMASize; + curr += maxDMASize; + size -= maxDMASize; + } +#ifndef _WINDOWS +// TODO: Windows build support + if (mDataMover->pwrite64(curr,size,offset) < 0) + return -1; +#endif + return totalSize; + } + case XCL_ADDR_SPACE_DEVICE_PERFMON: + { + if (pcieBarWrite(PERFMON_BAR, offset, hostBuf, size) == 0) { + return size; + } + return -1; + } + case XCL_ADDR_KERNEL_CTRL: + { + if (mLogStream.is_open()) { + const unsigned *reg = static_cast(hostBuf); + size_t regSize = size / 4; + if (regSize > 32) + regSize = 32; + for (unsigned i = 0; i < regSize; i++) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " << space << ", 0x" + << std::hex << offset + i << std::dec << ", 0x" << std::hex << reg[i] << std::dec << std::endl; + + } + } + if (pcieBarWrite(ACCELERATOR_BAR, offset, hostBuf, size) == 0) { + return size; + } + return -1; + } + default: + { + return -1; + } + } + } + + + size_t XDMAShim::xclReadSkipCopy(uint64_t offset, void *hostBuf, size_t size) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " + << offset << ", " << hostBuf << ", " << size << std::endl; + } + + const size_t mod_size = offset % DDR_BUFFER_ALIGNMENT; + // Need to do Read-Modify-Read +#ifndef _WINDOWS +// TODO: Windows build support +// alignas is defined in c++11 +#if GCC_VERSION >= 40800 + alignas(DDR_BUFFER_ALIGNMENT) char buffer[DDR_BUFFER_ALIGNMENT]; +#else + AlignedAllocator alignedBuffer(DDR_BUFFER_ALIGNMENT, DDR_BUFFER_ALIGNMENT); + char* buffer = alignedBuffer.getBuffer(); +#endif +#else + char buffer[DDR_BUFFER_ALIGNMENT]; +#endif + + // Read back one full aligned block starting from preceding aligned address + const uint64_t mod_offset = offset - mod_size; + if (xclRead(XCL_ADDR_SPACE_DEVICE_RAM, mod_offset, buffer, DDR_BUFFER_ALIGNMENT) != DDR_BUFFER_ALIGNMENT) + return -1; + + const size_t copy_size = (size + mod_size > DDR_BUFFER_ALIGNMENT) ? 
DDR_BUFFER_ALIGNMENT - mod_size : size; + + // Update the user buffer with partial read + std::memcpy(hostBuf, buffer + mod_size, copy_size); + + // Update the remainder of user buffer + if (size + mod_size > DDR_BUFFER_ALIGNMENT) { + const size_t read_size = xclRead(XCL_ADDR_SPACE_DEVICE_RAM, mod_offset + DDR_BUFFER_ALIGNMENT, + (char *)hostBuf + copy_size, size - copy_size); + if (read_size != (size - copy_size)) + return -1; + } + return size; + } + + size_t XDMAShim::xclRead(xclAddressSpace space, uint64_t offset, void *hostBuf, size_t size) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " << space << ", " + << offset << ", " << hostBuf << ", " << size << std::endl; + } + + switch (space) { + case XCL_ADDR_SPACE_DEVICE_RAM: + { + const size_t mod_size1 = offset % DDR_BUFFER_ALIGNMENT; + const size_t mod_size2 = size % DDR_BUFFER_ALIGNMENT; + const size_t totalSize = size; + +// if(!mLocked) +// return -1; + + if (mod_size1) { + // Buffer not aligned at DDR_BUFFER_ALIGNMENT boundary, need to do Read-Skip-Copy + return xclReadSkipCopy(offset, hostBuf, size); + } + else if (mod_size2) { + // Buffer not a multiple of DDR_BUFFER_ALIGNMENT, read the initial block and + // then perform a Read-Skip-Copy for the remainder buffer + const size_t blockSize = size - mod_size2; + if (xclRead(space, offset, hostBuf, blockSize) != blockSize) + return -1; + offset += blockSize; + hostBuf = (char *)hostBuf + blockSize; + if (xclReadSkipCopy(offset, hostBuf, mod_size2) != mod_size2) + return -1; + return totalSize; + } + + char *curr = static_cast(hostBuf); + while (size > maxDMASize) { +#ifndef _WINDOWS +// TODO: Windows build support + if (mDataMover->pread64(curr,maxDMASize,offset) < 0) + return -1; +#endif + offset += maxDMASize; + curr += maxDMASize; + size -= maxDMASize; + } + +#ifndef _WINDOWS +// TODO: Windows build support + if (mDataMover->pread64(curr,size,offset) < 0) + return -1; +#endif + return totalSize; + } + case XCL_ADDR_SPACE_DEVICE_PERFMON: + { + if (pcieBarRead(PERFMON_BAR, offset, hostBuf, size) == 0) { + return size; + } + return -1; + } + case XCL_ADDR_KERNEL_CTRL: + { + int result = pcieBarRead(ACCELERATOR_BAR, offset, hostBuf, size); + if (mLogStream.is_open()) { + const unsigned *reg = static_cast(hostBuf); + size_t regSize = size / 4; + if (regSize > 4) + regSize = 4; + for (unsigned i = 0; i < regSize; i++) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " << space << ", 0x" + << std::hex << offset + i << std::dec << ", 0x" << std::hex << reg[i] << std::dec << std::endl; + } + } + return !result ? 
size : 0; + } + default: + { + return -1; + } + } + } + + uint64_t XDMAShim::xclAllocDeviceBuffer(size_t size) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " << size << std::endl; + } + + if (size == 0) + size = DDR_BUFFER_ALIGNMENT; + + uint64_t result = MemoryManager::mNull; + for (auto i : mDDRMemoryManager) { + result = i->alloc(size); + if (result != MemoryManager::mNull) + break; + } + return result; + } + + uint64_t XDMAShim::xclAllocDeviceBuffer2(size_t size, xclMemoryDomains domain, unsigned flags) + { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " << size << ", " + << domain << ", " << flags << std::endl; + } + + if (domain != XCL_MEM_DEVICE_RAM) + return MemoryManager::mNull; + + if (size == 0) + size = DDR_BUFFER_ALIGNMENT; + + if (flags >= mDDRMemoryManager.size()) { + return MemoryManager::mNull; + } + return mDDRMemoryManager[flags]->alloc(size); + } + + void XDMAShim::xclFreeDeviceBuffer(uint64_t buf) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " << buf << std::endl; + } + + uint64_t size = 0; + for (auto i : mDDRMemoryManager) { + size += i->size(); + if (buf < size) { + i->free(buf); + } + } + } + + + size_t XDMAShim::xclCopyBufferHost2Device(uint64_t dest, const void *src, size_t size, size_t seek) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " << dest << ", " + << src << ", " << size << ", " << seek << std::endl; + } + +#ifdef DEBUG + { + // Ensure that this buffer was allocated by memory manager before + const uint64_t v = MemoryManager::mNull; + std::pair buf = std::make_pair(v, v); + uint64_t high = 0; + for (auto i : mDDRMemoryManager) { + high += i->size(); + if (dest < high) { + buf = i->lookup(dest); + break; + } + } + if (MemoryManager::isNullAlloc(buf)) + return -1; + + if (buf.second < (size + seek)) + return -1; + } +#endif + dest += seek; + return xclWrite(XCL_ADDR_SPACE_DEVICE_RAM, dest, src, size); + } + + + size_t XDMAShim::xclCopyBufferDevice2Host(void *dest, uint64_t src, size_t size, size_t skip) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " << dest << ", " + << src << ", " << size << ", " << skip << std::endl; + } + + +#ifdef DEBUG + { + // Ensure that this buffer was allocated by memory manager before + const uint64_t v = MemoryManager::mNull; + std::pair buf = std::make_pair(v, v); + uint64_t high = 0; + for (auto i : mDDRMemoryManager) { + high += i->size(); + if (src < high) { + buf = i->lookup(src); + break; + } + } + if (MemoryManager::isNullAlloc(buf)) + return -1; + + if (buf.second < (size + skip)) + return -1; + } +#endif + src += skip; + return xclRead(XCL_ADDR_SPACE_DEVICE_RAM, src, dest, size); + } + + + XDMAShim *XDMAShim::handleCheck(void *handle) { + // Sanity checks + if (!handle) + return 0; + if (*(unsigned *)handle != TAG) + return 0; + if (!((XDMAShim *)handle)->isGood()) { + return 0; + } + + return (XDMAShim *)handle; + } + + unsigned XDMAShim::xclProbe() { + char file_name_buf[128]; + unsigned i = 0; + for (i = 0; i < 64; i++) { + std::sprintf((char *)&file_name_buf, "/dev/xcldma/xcldma%d_user", i); +#ifndef _WINDOWS +// TODO: Windows build support +// open, close is defined in unistd.h + int fd = open(file_name_buf, O_RDWR); + if (fd < 0) { + return i; + } + close(fd); +#endif + } + return i; + } + + void XDMAShim::initMemoryManager() + { + if 
(!mDeviceInfo.mDDRBankCount) + return; + const uint64_t bankSize = mDeviceInfo.mDDRSize / mDeviceInfo.mDDRBankCount; + uint64_t start = 0; + for (unsigned i = 0; i < mDeviceInfo.mDDRBankCount; i++) { + mDDRMemoryManager.push_back(new MemoryManager(bankSize, start, DDR_BUFFER_ALIGNMENT)); + start += bankSize; + } + } + + XDMAShim::~XDMAShim() + { +#ifndef _WINDOWS +// TODO: Windows build support +// munmap is defined in sys/mman.h +// close is defined in unistd.h + if (mUserMap != MAP_FAILED) { + munmap(mUserMap, MMAP_SIZE_USER); + } + if (mUserHandle > 0) { + close(mUserHandle); + } + + delete mDataMover; +#endif + for (auto i : mDDRMemoryManager) { + delete i; + } + + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << std::endl; + mLogStream.close(); + } + } + + XDMAShim::XDMAShim(unsigned index, const char *logfileName, + xclVerbosityLevel verbosity) : mTag(TAG), mBoardNumber(index), + maxDMASize(0xfa0000), + mLocked(false), + mOffsets{0x0, 0x0, 0x0, 0x0}, + mOclRegionProfilingNumberSlots(XPAR_AXI_PERF_MON_2_NUMBER_SLOTS) + { + mDataMover = new DataMover(mBoardNumber, 1 /* 1 channel each dir */); + char file_name_buf[128]; + std::sprintf((char *)&file_name_buf, "/dev/xcldma/xcldma%d_user", mBoardNumber); + mUserHandle = open(file_name_buf, O_RDWR | O_SYNC); + + mUserMap = (char *)mmap(0, MMAP_SIZE_USER, PROT_READ | PROT_WRITE, MAP_SHARED, mUserHandle, 0); + if (mUserMap == MAP_FAILED) { + close(mUserHandle); + mUserHandle = -1; + } + + if (logfileName && (logfileName[0] != '\0')) { + mLogStream.open(logfileName); + mLogStream << "FUNCTION, THREAD ID, ARG..." << std::endl; + mLogStream << __func__ << ", " << std::this_thread::get_id() << std::endl; + } + + // First try the new info2 method and if that fails fall back to legacy info + if (xclGetDeviceInfo2(&mDeviceInfo)) { + xclDeviceInfo oldInfo; + if (xclGetDeviceInfo(&oldInfo)) { + close(mUserHandle); + mUserHandle = -1; + } + else { + mDeviceInfo = to_info2(oldInfo); + } + } + initMemoryManager(); + } + + bool XDMAShim::isGood() const { + if (!mDataMover) + return false; + if (mUserHandle < 0) + return false; + return mDataMover->isGood(); + // TODO: Add sanity check for card state + } + + + int XDMAShim::pcieBarRead(int bar_num, unsigned long long offset, void* buffer, unsigned long long length) { + const char *mem = 0; + switch (bar_num) { + case 0: + { + if ((length + offset) > MMAP_SIZE_USER) { + return -1; + } + mem = mUserMap; + break; + } + default: + { + return -1; + } + } + + char *qBuf = (char *)buffer; + while (length >= 4) { + *(unsigned *)qBuf = *(unsigned *)(mem + offset); + offset += 4; + qBuf += 4; + length -= 4; + } + while (length) { + *qBuf = *(mem + offset); + offset++; + qBuf++; + length--; + } + +// std::memcpy(buffer, mem + offset, length); + return 0; + } + + int XDMAShim::pcieBarWrite(int bar_num, unsigned long long offset, const void* buffer, unsigned long long length) { + char *mem = 0; + switch (bar_num) { + case 0: + { + if ((length + offset) > MMAP_SIZE_USER) { + return -1; + } + mem = mUserMap; + break; + } + default: + { + return -1; + } + } + + char *qBuf = (char *)buffer; + while (length >= 4) { + *(unsigned *)(mem + offset) = *(unsigned *)qBuf; + offset += 4; + qBuf += 4; + length -= 4; + } + while (length) { + *(mem + offset) = *qBuf; + offset++; + qBuf++; + length--; + } + +// std::memcpy(mem + offset, buffer, length); + return 0; + } + + bool XDMAShim::zeroOutDDR() + { + // Zero out the DDR so MIG ECC believes we have touched all the bits + // and it does 
not complain when we try to read back without explicit + // write. The latter usually happens as a result of read-modify-write + // TODO: Try and speed this up. + // [1] Possibly move to kernel mode driver. + // [2] Zero out specific buffers when they are allocated + static const unsigned long long BLOCK_SIZE = 0x4000000; + void *buf = 0; + if (posix_memalign(&buf, DDR_BUFFER_ALIGNMENT, BLOCK_SIZE)) + return false; + memset(buf, 0, BLOCK_SIZE); + mDataMover->pset64(buf, BLOCK_SIZE, 0, mDeviceInfo.mDDRSize/BLOCK_SIZE); + free(buf); + return true; + } + + bool XDMAShim::xclLockDevice() + { + if (mDataMover->lock() == false) + return false; + + if (flock(mUserHandle, LOCK_EX | LOCK_NB) == -1) { + mDataMover->unlock(); + return false; + } + mLocked = true; + + return zeroOutDDR(); + } + + std::string XDMAShim::getDSAName(unsigned short deviceId, unsigned short subsystemId) + { + std::string dsa("xilinx:?:?:?"); + const unsigned dsaNum = (deviceId << 16) | subsystemId; + switch(dsaNum) + { + case 0x71380121: + dsa = "xilinx:adm-pcie-7v3:1ddr" OSTAG ":2.1"; + break; + case 0x71380122: + dsa = "xilinx:adm-pcie-7v3:1ddr" OSTAG ":2.2"; + break; + case 0x71380123: + dsa = "xilinx:adm-pcie-7v3:1ddr" OSTAG ":2.3"; + break; + case 0x71380130: + dsa = "xilinx:adm-pcie-7v3:1ddr" OSTAG ":3.0"; + break; + case 0x71380131: + dsa = "xilinx:adm-pcie-7v3:1ddr" OSTAG ":3.1"; + break; + case 0x71380132: + dsa = "xilinx:adm-pcie-7v3:1ddr" OSTAG ":3.2"; + break; + case 0x71380221: + dsa = "xilinx:adm-pcie-7v3:2ddr" OSTAG ":2.1"; + break; + case 0x81380121: + dsa = "xilinx:adm-pcie-ku3:1ddr" OSTAG ":2.1"; + break; + case 0x81380122: + dsa = "xilinx:adm-pcie-ku3:1ddr" OSTAG ":2.2"; + break; + case 0x81380130: + dsa = "xilinx:adm-pcie-ku3:1ddr" OSTAG ":3.0"; + break; + case 0x81380221: + dsa = "xilinx:adm-pcie-ku3:2ddr" OSTAG ":2.1"; + break; + case 0x81380222: + dsa = "xilinx:adm-pcie-ku3:2ddr" OSTAG ":2.2"; + break; + case 0x81380230: + dsa = "xilinx:adm-pcie-ku3:2ddr" OSTAG ":3.0"; + break; + case 0x81380231: + dsa = "xilinx:adm-pcie-ku3:2ddr" OSTAG ":3.1"; + break; + case 0x81380232: + dsa = "xilinx:adm-pcie-ku3:2ddr" OSTAG ":3.2"; + break; + case 0x81381231: + dsa = "xilinx:adm-pcie-ku3:2ddr-40g:3.1"; + break; + case 0x81381232: + dsa = "xilinx:adm-pcie-ku3:2ddr-40g:3.2"; + break; + case 0x81388221: + dsa = "xilinx:adm-pcie-ku3:tandem-2ddr:2.1"; + break; + case 0x81388222: + dsa = "xilinx:adm-pcie-ku3:tandem-2ddr:2.2"; + break; + case 0x81388230: + dsa = "xilinx:adm-pcie-ku3:tandem-2ddr:3.0"; + break; + case 0x81384221: + dsa = "xilinx:adm-pcie-ku3:exp-pr-2ddr:2.1"; + break; + case 0x81384222: + dsa = "xilinx:adm-pcie-ku3:2ddr-xpr:2.2"; + break; + case 0x81384230: + dsa = "xilinx:adm-pcie-ku3:2ddr-xpr:3.0"; + break; + case 0x81384231: + dsa = "xilinx:adm-pcie-ku3:2ddr-xpr:3.1"; + break; + case 0x81384232: + dsa = "xilinx:adm-pcie-ku3:2ddr-xpr:3.2"; + break; + case 0x82380222: + dsa = "xilinx:tul-pcie3-ku115:2ddr:2.2"; + break; + case 0x82380230: + dsa = "xilinx:tul-pcie3-ku115:2ddr:3.0"; + break; + case 0x82380231: + dsa = "xilinx:tul-pcie3-ku115:2ddr:3.1"; + break; + case 0x82380232: + dsa = "xilinx:tul-pcie3-ku115:2ddr:3.2"; + break; + case 0x82384422: + dsa = "xilinx:tul-pcie3-ku115:4ddr-xpr:2.2"; + break; + case 0x82384430: + dsa = "xilinx:tul-pcie3-ku115:4ddr-xpr:3.0"; + break; + case 0x82384431: + dsa = "xilinx:tul-pcie3-ku115:4ddr-xpr:3.1"; + break; + case 0x82384432: + dsa = "xilinx:xil-accel-rd-ku115:4ddr-xpr:3.2"; + break; + case 0x83384431: + dsa = "xilinx:tul-pcie3-vu095:4ddr-xpr:3.1"; + break; + 
case 0x83384432: + dsa = "xilinx:tul-pcie3-vu095:4ddr-xpr:3.2"; + break; + case 0x84380231: + dsa = "xilinx:adm-pcie-8k5:2ddr:3.1"; + break; + case 0x84380232: + dsa = "xilinx:adm-pcie-8k5:2ddr:3.2"; + break; + case 0x923F4232: + dsa = "xilinx:minotaur-pcie-vu9p:2ddr-xpr:3.2"; + break; + case 0x923F4432: + dsa = "xilinx:minotaur-pcie-vu9p:4ddr-xpr:3.2"; + break; + + default: + break; + } + return dsa; + } + + int XDMAShim::xclGetDeviceInfo2(xclDeviceInfo2 *info) + { + std::memset(info, 0, sizeof(xclDeviceInfo2)); + info->mMagic = 0X586C0C6C; + info->mHALMajorVersion = XCLHAL_MAJOR_VER; + info->mHALMajorVersion = XCLHAL_MINOR_VER; + info->mMinTransferSize = DDR_BUFFER_ALIGNMENT; + info->mDMAThreads = mDataMover->channelCount(); +#ifndef _WINDOWS +// TODO: Windows build support +// XDMA_IOCINFO depends on _IOW, which is defined indirectly by +// ioctl is defined in sys/ioctl.h + xdma_ioc_info2 obj = {{0X586C0C6C, XDMA_IOCINFO2}}; + int ret = ioctl(mUserHandle, XDMA_IOCINFO2, &obj); + if (ret) + return ret; + info->mVendorId = obj.vendor; + info->mDeviceId = obj.device; + info->mSubsystemId = obj.subsystem_device; + info->mSubsystemVendorId = obj.subsystem_vendor; + info->mDeviceVersion = obj.subsystem_device & 0x00ff; +#endif + // TUL cards (0x8238) have 4 GB / bank; other cards have 8 GB memory / bank + info->mDDRSize = (info->mDeviceId == 0x8238) ? 0x100000000 : 0x200000000; + info->mDataAlignment = DDR_BUFFER_ALIGNMENT; + info->mNumClocks = obj.num_clocks; + for (int i = 0; i < obj.num_clocks; ++i) { + info->mOCLFrequency[i] = obj.ocl_frequency[i]; + } + info->mPCIeLinkWidth = obj.pcie_link_width; + info->mPCIeLinkSpeed = obj.pcie_link_speed; + info->mDDRBankCount = info->mSubsystemId & 0x0f00; + info->mDDRBankCount >>= 8; + if (info->mDDRBankCount == 0) + info->mDDRBankCount = 1; + + info->mDDRSize *= info->mDDRBankCount; + for (auto i : mDDRMemoryManager) { + info->mDDRFreeSize += i->freeSize(); + } + + const std::string deviceName = getDSAName(info->mDeviceId, info->mSubsystemId); + if (mLogStream.is_open()) + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " << deviceName << std::endl; + + std::size_t length = deviceName.copy(info->mName, deviceName.length(),0); + info->mName[length] = '\0'; + + if (mLogStream.is_open()) { + mLogStream << __func__ << ": name=" << deviceName << ", version=0x" << std::hex << info->mDeviceVersion + << ", clock freq=" << std::dec << info->mOCLFrequency[0] + << ", clock freq 2=" << std::dec << info->mOCLFrequency[1] << std::endl; + } + + info->mOnChipTemp = obj.onchip_temp; + info->mFanTemp = obj.fan_temp; + info->mVInt = obj.vcc_int; + info->mVAux = obj.vcc_aux; + info->mVBram = obj.vcc_bram; + info->mMigCalib = obj.mig_calibration; + + return 0; + } + + int XDMAShim::xclGetDeviceInfo(xclDeviceInfo *info) + { + std::memset(info, 0, sizeof(xclDeviceInfo)); + info->mMagic = 0X586C0C6C; + info->mHALMajorVersion = XCLHAL_MAJOR_VER; + info->mHALMajorVersion = XCLHAL_MINOR_VER; + info->mMinTransferSize = DDR_BUFFER_ALIGNMENT; + info->mDMAThreads = mDataMover->channelCount(); +#ifndef _WINDOWS +// TODO: Windows build support +// XDMA_IOCINFO depends on _IOW, which is defined indirectly by +// ioctl is defined in sys/ioctl.h + xdma_ioc_info obj = {{0X586C0C6C, XDMA_IOCINFO}}; + int ret = ioctl(mUserHandle, XDMA_IOCINFO, &obj); + if (ret) + return ret; + info->mVendorId = obj.vendor; + info->mDeviceId = obj.device; + info->mSubsystemId = obj.subsystem_device; + info->mSubsystemVendorId = obj.subsystem_vendor; + info->mDeviceVersion = 
obj.subsystem_device & 0x00ff; +#endif + // TUL cards (0x8238) have 4 GB / bank; other cards have 8 GB memory / bank + info->mDDRSize = (info->mDeviceId == 0x8238) ? 0x100000000 : 0x200000000; + info->mDataAlignment = DDR_BUFFER_ALIGNMENT; + info->mOCLFrequency = obj.ocl_frequency; + info->mPCIeLinkWidth = obj.pcie_link_width; + info->mPCIeLinkSpeed = obj.pcie_link_speed; + info->mDDRBankCount = info->mSubsystemId & 0x0f00; + info->mDDRBankCount >>= 8; + if (info->mDDRBankCount == 0) + info->mDDRBankCount = 1; + + info->mDDRSize *= info->mDDRBankCount; + for (auto i : mDDRMemoryManager) { + info->mDDRFreeSize += i->freeSize(); + } + + const std::string deviceName = getDSAName(info->mDeviceId, info->mSubsystemId); + if (mLogStream.is_open()) + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " << deviceName << std::endl; + + + std::size_t length = deviceName.copy(info->mName, deviceName.length(),0); + info->mName[length] = '\0'; + + if (mLogStream.is_open()) { + mLogStream << __func__ << ": name=" << deviceName << ", version=0x" << std::hex << info->mDeviceVersion + << ", clock freq=" << std::dec << info->mOCLFrequency << std::endl; + } + return 0; + } + + int XDMAShim::resetDevice(xclResetKind kind) { +#ifndef _WINDOWS +// TODO: Windows build support +// XDMA_IOCRESET depends on _IOW, which is defined indirectly by +// ioctl is defined in sys/ioctl.h + for (auto i : mDDRMemoryManager) { + i->reset(); + } + + // Call a new IOCTL to just reset the OCL region + if (kind == XCL_RESET_FULL) { + xdma_ioc_base obj = {0X586C0C6C, XDMA_IOCHOTRESET}; + return ioctl(mUserHandle, XDMA_IOCHOTRESET, &obj); + } + else if (kind == XCL_RESET_KERNEL) { + xdma_ioc_base obj = {0X586C0C6C, XDMA_IOCOCLRESET}; + return ioctl(mUserHandle, XDMA_IOCOCLRESET, &obj); + } + return -EINVAL; +#else + return 0; +#endif + } + + int XDMAShim::xclReClock(unsigned freqMHz) + { + xdma_ioc_freqscaling obj = {{0X586C0C6C, XDMA_IOCFREQSCALING}, freqMHz}; + return ioctl(mUserHandle, XDMA_IOCFREQSCALING, &obj); + } + + int XDMAShim::xclReClock2(unsigned short region, const unsigned short *targetFreqMHz) + { + xdma_ioc_freqscaling2 obj; + std::memset(&obj, 0, sizeof(xdma_ioc_freqscaling2)); + obj.base= {0X586C0C6C, XDMA_IOCFREQSCALING2}; + obj.ocl_region = region; + obj.ocl_target_freq[0] = targetFreqMHz[0]; + obj.ocl_target_freq[1] = targetFreqMHz[1]; + return ioctl(mUserHandle, XDMA_IOCFREQSCALING2, &obj); + } +} + + +xclDeviceHandle xclOpen(unsigned index, const char *logfileName, xclVerbosityLevel level) +{ + xclxdma::XDMAShim *handle = new xclxdma::XDMAShim(index, logfileName, level); + if (!xclxdma::XDMAShim::handleCheck(handle)) { + delete handle; + handle = 0; + } + + return (xclDeviceHandle *)handle; +} + +void xclClose(xclDeviceHandle handle) +{ + if (xclxdma::XDMAShim::handleCheck(handle)) { + delete ((xclxdma::XDMAShim *)handle); + } +} + + +int xclGetDeviceInfo(xclDeviceHandle handle, xclDeviceInfo *info) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclGetDeviceInfo(info); +} + +int xclGetDeviceInfo2(xclDeviceHandle handle, xclDeviceInfo2 *info) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclGetDeviceInfo2(info); +} + +int xclLoadBitstream(xclDeviceHandle handle, const char *xclBinFileName) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclLoadBitstream(xclBinFileName); +} + +int xclLoadXclBin(xclDeviceHandle 
handle, const xclBin *buffer) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclLoadXclBin(buffer); +} + +size_t xclWrite(xclDeviceHandle handle, xclAddressSpace space, uint64_t offset, const void *hostBuf, size_t size) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclWrite(space, offset, hostBuf, size); +} + +size_t xclRead(xclDeviceHandle handle, xclAddressSpace space, uint64_t offset, void *hostBuf, size_t size) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclRead(space, offset, hostBuf, size); +} + + +uint64_t xclAllocDeviceBuffer(xclDeviceHandle handle, size_t size) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclAllocDeviceBuffer(size); +} + + +uint64_t xclAllocDeviceBuffer2(xclDeviceHandle handle, size_t size, xclMemoryDomains domain, + unsigned flags) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclAllocDeviceBuffer2(size, domain, flags); +} + + +void xclFreeDeviceBuffer(xclDeviceHandle handle, uint64_t buf) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return; + return drv->xclFreeDeviceBuffer(buf); +} + + +size_t xclCopyBufferHost2Device(xclDeviceHandle handle, uint64_t dest, const void *src, size_t size, size_t seek) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclCopyBufferHost2Device(dest, src, size, seek); +} + + +size_t xclCopyBufferDevice2Host(xclDeviceHandle handle, void *dest, uint64_t src, size_t size, size_t skip) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclCopyBufferDevice2Host(dest, src, size, skip); +} + + +//This will be deprecated. 
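
The wrappers above expose the HAL's device open/close, info, bitstream download, raw read/write and buffer-management entry points; the firmware and flash entry points follow below. As a rough illustration of how a host program might drive the buffer-management calls, here is a minimal sketch. It is not part of the patch: device index 0, the log file name and the 4 KB buffer size are arbitrary, the include assumes the SDAccel HAL driver directory is on the include path, and the verbosity argument is cast from 0 because the enumerator names are defined in driver/include/xclhal.h and are not shown in this patch.

// Rough host-side sketch (not part of the patch) of the buffer-management flow.
#include <cstdio>
#include <cstring>
#include <vector>
#include "driver/include/xclhal.h"

int main() {
    xclDeviceHandle dev = xclOpen(0, "hal.log", static_cast<xclVerbosityLevel>(0));
    if (!dev)
        return 1;
    if (xclLockDevice(dev)) {            // exclusive access; also zeroes out DDR on this DSA
        xclClose(dev);
        return 1;
    }

    std::vector<char> host(4096, 0x5a);
    uint64_t devBuf = xclAllocDeviceBuffer(dev, host.size());          // offset into device DDR
    xclCopyBufferHost2Device(dev, devBuf, host.data(), host.size(), 0);

    std::vector<char> readback(host.size());
    xclCopyBufferDevice2Host(dev, readback.data(), devBuf, readback.size(), 0);
    std::printf("match: %d\n",
                std::memcmp(host.data(), readback.data(), host.size()) == 0);

    xclFreeDeviceBuffer(dev, devBuf);
    xclClose(dev);
    return 0;
}

Note that xclAllocDeviceBuffer returns an offset into device DDR managed by the shim's MemoryManager, not a host pointer, which is why the copy helpers take a uint64_t on the device side.
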
+int xclUpgradeFirmware(xclDeviceHandle handle, const char *fileName) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclUpgradeFirmware(fileName); +} + +int xclUpgradeFirmware2(xclDeviceHandle handle, const char *fileName1, const char* fileName2) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + + if(!fileName2 || std::strlen(fileName2) == 0) + return drv->xclUpgradeFirmware(fileName1); + else + return drv->xclUpgradeFirmware2(fileName1, fileName2); +} + +int xclUpgradeFirmwareXSpi(xclDeviceHandle handle, const char *fileName, int index) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclUpgradeFirmwareXSpi(fileName, index); +} + +int xclTestXSpi(xclDeviceHandle handle, int index) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclTestXSpi(index); +} + +int xclBootFPGA(xclDeviceHandle handle) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclBootFPGA(); +} + +unsigned xclProbe() +{ + return xclxdma::XDMAShim::xclProbe(); +} + + +int xclResetDevice(xclDeviceHandle handle, xclResetKind kind) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->resetDevice(kind); +} + +int xclReClock(xclDeviceHandle handle, unsigned targetFreqMHz) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclReClock(targetFreqMHz); +} + + +int xclReClock2(xclDeviceHandle handle, unsigned short region, const unsigned short *targetFreqMHz) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclReClock2(region, targetFreqMHz); +} + + +int xclLockDevice(xclDeviceHandle handle) +{ + xclxdma::XDMAShim *drv = xclxdma::XDMAShim::handleCheck(handle); + if (!drv) + return -1; + return drv->xclLockDevice() ? 0 : -1; +} + +// XSIP watermark, do not delete 67d7842dbbe25473c3c32b93c0da8047785f30d78e8a024de1b57352245f9689 diff --git a/sdk/SDAccel/HAL/driver/xcldma/user/shim.h b/sdk/SDAccel/HAL/driver/xcldma/user/shim.h new file mode 100644 index 000000000..ae3f820d7 --- /dev/null +++ b/sdk/SDAccel/HAL/driver/xcldma/user/shim.h @@ -0,0 +1,256 @@ +#ifndef _XDMA_SHIM_H_ +#define _XDMA_SHIM_H_ + +/** + * Copyright (C) 2015-2016 Xilinx, Inc + * Author: Sonal Santan + * XDMA HAL Driver layered on top of XDMA kernel driver + * + * Licensed under the Apache License, Version 2.0 (the "License"). You may + * not use this file except in compliance with the License. A copy of the + * License is located at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations + * under the License. 
+ */ + +#include "driver/include/xclhal.h" +#include "driver/include/xclperf.h" +#include "driver/xcldma/include/xbar_sys_parameters.h" + +#include +#include +#include +#include +#include + +// Work around GCC 4.8 + XDMA BAR implementation bugs +// With -O3 PCIe BAR read/write are not reliable hence force -O2 as max +// optimization level for pcieBarRead() and pcieBarWrite() +#if defined(__GNUC__) && defined(NDEBUG) +#define SHIM_O2 __attribute__ ((optimize("-O2"))) +#else +#define SHIM_O2 +#endif + +namespace xclxdma { + // Memory alignment for DDR and AXI-MM trace access + template class AlignedAllocator { + void *mBuffer; + size_t mCount; + public: + T *getBuffer() { + return (T *)mBuffer; + } + + size_t size() const { + return mCount * sizeof(T); + } + + AlignedAllocator(size_t alignment, size_t count) : mBuffer(0), mCount(count) { + if (posix_memalign(&mBuffer, alignment, count * sizeof(T))) { + mBuffer = 0; + } + } + ~AlignedAllocator() { + if (mBuffer) + free(mBuffer); + } + }; + + class MemoryManager; + class DataMover; + // XDMA Shim + class XDMAShim { + + struct ELARecord { + unsigned mStartAddress; + unsigned mEndAddress; + unsigned mDataCount; + + std::streampos mDataPos; + ELARecord() : mStartAddress(0), mEndAddress(0), + mDataCount(0), mDataPos(0) {} + }; + + typedef std::list ELARecordList; + + typedef std::list > PairList; + + public: + + // Bitstreams + int xclLoadBitstream(const char *fileName); + int xclLoadXclBin(const xclBin *buffer); + int xclUpgradeFirmware(const char *fileName); + int xclUpgradeFirmware2(const char *file1, const char* file2); + int xclUpgradeFirmwareXSpi(const char *fileName, int device_index=0); + int xclTestXSpi(int device_index); + int xclBootFPGA(); + int resetDevice(xclResetKind kind); + int xclReClock(unsigned targetFreqMHz); + int xclReClock2(unsigned short region, const unsigned short *targetFreqMHz); + + // Raw read/write + size_t xclWrite(xclAddressSpace space, uint64_t offset, const void *hostBuf, size_t size); + size_t xclRead(xclAddressSpace space, uint64_t offset, void *hostBuf, size_t size); + + // Buffer management + uint64_t xclAllocDeviceBuffer(size_t size); + uint64_t xclAllocDeviceBuffer2(size_t size, xclMemoryDomains domain, unsigned flags); + void xclFreeDeviceBuffer(uint64_t buf); + size_t xclCopyBufferHost2Device(uint64_t dest, const void *src, size_t size, size_t seek); + size_t xclCopyBufferDevice2Host(void *dest, uint64_t src, size_t size, size_t skip); + + // Performance monitoring + // Control + double xclGetDeviceClockFreqMHz(); + double xclGetReadMaxBandwidthMBps(); + double xclGetWriteMaxBandwidthMBps(); + void xclSetOclRegionProfilingNumberSlots(uint32_t numSlots); + size_t xclPerfMonClockTraining(xclPerfMonType type); + // Counters + size_t xclPerfMonStartCounters(xclPerfMonType type); + size_t xclPerfMonStopCounters(xclPerfMonType type); + size_t xclPerfMonReadCounters(xclPerfMonType type, xclCounterResults& counterResults); + // Trace + size_t xclPerfMonStartTrace(xclPerfMonType type, uint32_t startTrigger); + size_t xclPerfMonStopTrace(xclPerfMonType type); + uint32_t xclPerfMonGetTraceCount(xclPerfMonType type); + size_t xclPerfMonReadTrace(xclPerfMonType type, xclTraceResultsVector& traceVector); + + // Sanity checks + int xclGetDeviceInfo(xclDeviceInfo *info); + int xclGetDeviceInfo2(xclDeviceInfo2 *info); + static XDMAShim *handleCheck(void *handle); + static unsigned xclProbe(); + bool xclLockDevice(); + unsigned getTAG() const { + return mTag; + } + bool isGood() const; + + ~XDMAShim(); + XDMAShim(unsigned 
index, const char *logfileName, xclVerbosityLevel verbosity); + + private: + + size_t xclReadModifyWrite(uint64_t offset, const void *hostBuf, size_t size); + size_t xclReadSkipCopy(uint64_t offset, void *hostBuf, size_t size); + bool zeroOutDDR(); + + bool isXPR() const { + return ((mDeviceInfo.mSubsystemId >> 12) == 4); + } + + bool isMultipleOCLClockSupported() { + unsigned dsaNum = ((mDeviceInfo.mDeviceId << 16) | mDeviceInfo.mSubsystemId); + // 0x82384431 : TUL KU115 4ddr 3.1 DSA + return ((dsaNum == 0x82384431) || (dsaNum == 0x82384432))? true : false; + } + + bool isUltraScale() const { + return (mDeviceInfo.mDeviceId & 0x8000); + } + void initMemoryManager(); + + // Core DMA code + SHIM_O2 int pcieBarRead(int bar_num, unsigned long long offset, void* buffer, unsigned long long length); + SHIM_O2 int pcieBarWrite(int bar_num, unsigned long long offset, const void* buffer, unsigned long long length); + int freezeAXIGate(); + int freeAXIGate(); + + // PROM flashing + int prepare(unsigned startAddress, unsigned endAddress); + int program(std::ifstream& mcsStream, const ELARecord& record); + int program(std::ifstream& mcsStream); + int waitForReady(unsigned code, bool verbose = true); + int waitAndFinish(unsigned code, unsigned data, bool verbose = true); + + //XSpi flashing. + bool prepareXSpi(); + int programXSpi(std::ifstream& mcsStream, const ELARecord& record); + int programXSpi(std::ifstream& mcsStream); + bool waitTxEmpty(); + bool isFlashReady(); + //bool windDownWrites(); + bool bulkErase(); + bool sectorErase(unsigned Addr); + bool writeEnable(); +#if 0 + bool dataTransfer(bool read); +#endif + bool readPage(unsigned addr, uint8_t readCmd = 0xff); + bool writePage(unsigned addr, uint8_t writeCmd = 0xff); + unsigned readReg(unsigned offset); + int writeReg(unsigned regOffset, unsigned value); + bool finalTransfer(uint8_t *sendBufPtr, uint8_t *recvBufPtr, int byteCount); + bool getFlashId(); + //All remaining read /write register commands can be issued through this function. 
+ bool readRegister(unsigned commandCode, unsigned bytes); + bool writeRegister(unsigned commandCode, unsigned value, unsigned bytes); + bool select4ByteAddressMode(); + bool deSelect4ByteAddressMode(); + + + // Performance monitoring helper functions + bool isDSAVersion(unsigned majorVersion, unsigned minorVersion, bool onlyThisVersion); + unsigned getBankCount(); + uint64_t getHostTraceTimeNsec(); + uint64_t getPerfMonBaseAddress(xclPerfMonType type); + uint64_t getPerfMonFifoBaseAddress(xclPerfMonType type, uint32_t fifonum); + uint64_t getPerfMonFifoReadBaseAddress(xclPerfMonType type, uint32_t fifonum); + uint32_t getPerfMonNumberSlots(xclPerfMonType type); + uint32_t getPerfMonNumberSamples(xclPerfMonType type); + uint32_t getPerfMonNumberFifos(xclPerfMonType type); + uint32_t getPerfMonByteScaleFactor(xclPerfMonType type); + uint8_t getPerfMonShowIDS(xclPerfMonType type); + uint8_t getPerfMonShowLEN(xclPerfMonType type); + uint32_t getPerfMonSlotStartBit(xclPerfMonType type, uint32_t slotnum); + uint32_t getPerfMonSlotDataWidth(xclPerfMonType type, uint32_t slotnum); + size_t resetFifos(xclPerfMonType type); + uint32_t bin2dec(std::string str, int start, int number); + uint32_t bin2dec(const char * str, int start, int number); + std::string dec2bin(uint32_t n); + std::string dec2bin(uint32_t n, unsigned bits); + static std::string getDSAName(unsigned short deviceId, unsigned short subsystemId); + + private: + // This is a hidden signature of this class and helps in preventing + // user errors when incorrect pointers are passed in as handles. + const unsigned mTag; + const int mBoardNumber; + const size_t maxDMASize; + bool mLocked; + +#ifndef _WINDOWS +// TODO: Windows build support + // mOffsets doesn't seem to be used + // and it caused window compilation error when we try to initialize it + const uint64_t mOffsets[XCL_ADDR_SPACE_MAX]; +#endif + DataMover *mDataMover; + int mUserHandle; + uint32_t mOclRegionProfilingNumberSlots; + + char *mUserMap; + std::ofstream mLogStream; + xclVerbosityLevel mVerbosity; + std::string mBinfile; + ELARecordList mRecordList; + std::vector mDDRMemoryManager; + xclDeviceInfo2 mDeviceInfo; + + public: + static const unsigned TAG; + }; +} + +#endif + +// XSIP watermark, do not delete 67d7842dbbe25473c3c32b93c0da8047785f30d78e8a024de1b57352245f9689 diff --git a/sdk/SDAccel/HAL/driver/xcldma/user/xspi.cpp b/sdk/SDAccel/HAL/driver/xcldma/user/xspi.cpp new file mode 100755 index 000000000..158050026 --- /dev/null +++ b/sdk/SDAccel/HAL/driver/xcldma/user/xspi.cpp @@ -0,0 +1,1531 @@ +/* + * Copyright (C) 2016 Xilinx, Inc + * Author(s) : Sonal Santan + * : Hem Neema + * + * Licensed under the Apache License, Version 2.0 (the "License"). You may + * not use this file except in compliance with the License. A copy of the + * License is located at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations + * under the License. 
+ */ + +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "shim.h" +#include "driver/xcldma/include/xdma-ioctl.h" + +#ifdef WINDOWS +#define __func__ __FUNCTION__ +#endif + +#define FLASH_BASE_ADDRESS BPI_FLASH_OFFSET +#define PAGE_SIZE 256 +static const bool FOUR_BYTE_ADDRESSING = false; + +//testing sizes. +#define WRITE_DATA_SIZE 128 +#define READ_DATA_SIZE 128 + + +#define COMMAND_PAGE_PROGRAM 0x02 /* Page Program command */ +#define COMMAND_QUAD_WRITE 0x32 /* Quad Input Fast Program */ +#define COMMAND_EXT_QUAD_WRITE 0x38 /* Extended quad input fast program */ +#define COMMAND_SECTOR_ERASE 0xD8 /* Sector Erase command */ +#define COMMAND_BULK_ERASE 0xC7 /* Bulk Erase command */ +#define COMMAND_RANDOM_READ 0x03 /* Random read command */ +#define COMMAND_DUAL_READ 0x3B /* Dual Output Fast Read */ +#define COMMAND_DUAL_IO_READ 0xBB /* Dual IO Fast Read */ +#define COMMAND_QUAD_READ 0x6B /* Quad Output Fast Read */ +#define COMMAND_QUAD_IO_READ 0xEB /* Quad IO Fast Read */ +#define COMMAND_IDCODE_READ 0x9F /* Read ID Code */ +//read commands +#define COMMAND_STATUSREG_READ 0x05 /* Status read command */ +#define COMMAND_FLAG_STATUSREG_READ 0x70 /* Status flag read command */ +#define COMMAND_NON_VOLATILE_CFGREG_READ 0xB5 /* Non volatile configuration register read command */ +#define COMMAND_VOLATILE_CFGREG_READ 0x85 /* Volatile configuration register read command */ +#define COMMAND_ENH_VOLATILE_CFGREG_READ 0x65 /* Enhanced volatile configuration register read command */ +#define COMMAND_EXTENDED_ADDRESS_REG_READ 0xC8 /* Enhanced volatile configuration register read command */ +//write commands +#define COMMAND_STATUSREG_WRITE 0x01 /* Status read command */ +#define COMMAND_NON_VOLATILE_CFGREG_WRITE 0xB1 /* Non volatile configuration register read command */ +#define COMMAND_VOLATILE_CFGREG_WRITE 0x81 /* Volatile configuration register read command */ +#define COMMAND_ENH_VOLATILE_CFGREG_WRITE 0x61 /* Enhanced volatile configuration register read command */ +#define COMMAND_EXTENDED_ADDRESS_REG_WRITE 0xC5 /* Enhanced volatile configuration register read command */ + +#define COMMAND_CLEAR_FLAG_REGISTER 0x50 /* Clear flag register */ + +//4-byte addressing +#define ENTER_FOUR_BYTE_ADDR_MODE 0xB7 /* enter 4-byte address mode */ +#define EXIT_FOUR_BYTE_ADDR_MODE 0xE9 /* exit 4-byte address mode */ +#define FOUR_BYTE_READ 0x13 /* 4-byte read */ +#define FOUR_BYTE_FAST_READ 0x0C /* 4-byte fast read */ +#define FOUR_BYTE_DUAL_OUTPUT_FAST_READ 0x3C /* 4-byte dual output fast read */ +#define FOUR_BYTE_DUAL_IO_FAST_READ 0xBC /* 4-byte dual Input/output fast read */ +#define FOUR_BYTE_QUAD_OUTPUT_FAST_READ 0x6C /* 4-byte quad output fast read */ +#define FOUR_BYTE_QUAD_IO_FAST_READ 0xEC /* 4-byte quad output fast read */ +#define FOUR_BYTE_PAGE_PROGRAM 0x12 /* 4-byte page program */ +#define FOUR_BYTE_QUAD_INPUT_FAST_PROGRAM 0x34 /* 4-byte quad input fast program */ +#define FOUR_BYTE_QUAD_INPUT_EXT_FAST_PROGRAM 0x3E /* 4-byte quad input extended fast program */ +#define FOUR_BYTE_SECTOR_ERASE 0xDC /* 4-byte sector erase */ + +static const unsigned READ_WRITE_EXTRA_BYTES = FOUR_BYTE_ADDRESSING ? 5 :4; +static const unsigned SECTOR_ERASE_BYTES = FOUR_BYTE_ADDRESSING ? 
5 :4; + + +#define IDCODE_READ_BYTES 5 + +#define DUAL_READ_DUMMY_BYTES 2 +#define QUAD_READ_DUMMY_BYTES 4 +#define DUAL_IO_READ_DUMMY_BYTES 2 +#define QUAD_IO_READ_DUMMY_BYTES 5 + +//#define READ_WRITE_EXTRA_BYTES 4 /* Read/Write extra bytes */ +//#define SECTOR_ERASE_BYTES 4 /* Sector erase extra bytes */ +#define WRITE_ENABLE_BYTES 1 /* Write Enable bytes */ +#define BULK_ERASE_BYTES 1 /* Bulk erase extra bytes */ +#define STATUS_READ_BYTES 2 /* Status read bytes count */ +#define STATUS_WRITE_BYTES 2 /* Status write bytes count */ + + + +#define NUM_SLAVES 2 +#define SLAVE_SELECT_MASK ((1 << NUM_SLAVES) -1) +/* + * Flash not busy mask in the status register of the flash device. + */ +#define FLASH_SR_IS_READY_MASK 0x01 /* Ready mask */ +#define COMMAND_WRITE_ENABLE 0x06 /* Write Enable command */ + +//SPI control reg masks. +#define XSP_CR_LOOPBACK_MASK 0x00000001 /**< Local loopback mode */ +#define XSP_CR_ENABLE_MASK 0x00000002 /**< System enable */ +#define XSP_CR_MASTER_MODE_MASK 0x00000004 /**< Enable master mode */ +#define XSP_CR_CLK_POLARITY_MASK 0x00000008 /**< Clock polarity high + or low */ +#define XSP_CR_CLK_PHASE_MASK 0x00000010 /**< Clock phase 0 or 1 */ +#define XSP_CR_TXFIFO_RESET_MASK 0x00000020 /**< Reset transmit FIFO */ +#define XSP_CR_RXFIFO_RESET_MASK 0x00000040 /**< Reset receive FIFO */ +#define XSP_CR_MANUAL_SS_MASK 0x00000080 /**< Manual slave select + assert */ +#define XSP_CR_TRANS_INHIBIT_MASK 0x00000100 /**< Master transaction + inhibit */ + +/** + * LSB/MSB first data format select. The default data format is MSB first. + * The LSB first data format is not available in all versions of the Xilinx Spi + * Device whereas the MSB first data format is supported by all the versions of + * the Xilinx Spi Devices. Please check the HW specification to see if this + * feature is supported or not. + */ +#define XSP_CR_LSB_MSB_FIRST_MASK 0x00000200 + +//End SPI CR masks + +//SPI status reg masks +#define XSP_SR_RX_EMPTY_MASK 0x00000001 /**< Receive Reg/FIFO is empty */ +#define XSP_SR_RX_FULL_MASK 0x00000002 /**< Receive Reg/FIFO is full */ +#define XSP_SR_TX_EMPTY_MASK 0x00000004 /**< Transmit Reg/FIFO is empty */ +#define XSP_SR_TX_FULL_MASK 0x00000008 /**< Transmit Reg/FIFO is full */ +#define XSP_SR_MODE_FAULT_MASK 0x00000010 /**< Mode fault error */ +#define XSP_SR_SLAVE_MODE_MASK 0x00000020 /**< Slave mode select */ + +/* + * The following bits are available only in axi_qspi Status register. 
+ */ +#define XSP_SR_CPOL_CPHA_ERR_MASK 0x00000040 /**< CPOL/CPHA error */ +#define XSP_SR_SLAVE_MODE_ERR_MASK 0x00000080 /**< Slave mode error */ +#define XSP_SR_MSB_ERR_MASK 0x00000100 /**< MSB Error */ +#define XSP_SR_LOOP_BACK_ERR_MASK 0x00000200 /**< Loop back error */ +#define XSP_SR_CMD_ERR_MASK 0x00000400 /**< 'Invalid cmd' error */ + + +//End SPI SR masks + +#define XSP_SRR_OFFSET 0x40 /**< Software Reset register */ +#define XSP_CR_OFFSET 0x60 /**< Control register */ +#define XSP_SR_OFFSET 0x64 /**< Status Register */ +#define XSP_DTR_OFFSET 0x68 /**< Data transmit */ +#define XSP_DRR_OFFSET 0x6C /**< Data receive */ +#define XSP_SSR_OFFSET 0x70 /**< 32-bit slave select */ +#define XSP_TFO_OFFSET 0x74 /**< Tx FIFO occupancy */ +#define XSP_RFO_OFFSET 0x78 /**< Rx FIFO occupancy */ + +#define BYTE1 0 /* Byte 1 position */ +#define BYTE2 1 /* Byte 2 position */ +#define BYTE3 2 /* Byte 3 position */ +#define BYTE4 3 /* Byte 4 position */ +#define BYTE5 4 /* Byte 5 position */ +#define BYTE6 5 /* Byte 6 position */ +#define BYTE7 6 /* Byte 7 position */ +#define BYTE8 7 /* Byte 8 position */ + +/** + * SPI Software Reset Register (SRR) mask. + */ +#define XSP_SRR_RESET_MASK 0x0000000A + + +//---- +#define XSpi_ReadReg(RegOffset) readReg(RegOffset) +#define XSpi_WriteReg(RegOffset, RegisterValue) writeReg(RegOffset, RegisterValue) + +#define XSpi_SetControlReg(Mask) XSpi_WriteReg(XSP_CR_OFFSET, (Mask)) +#define XSpi_GetControlReg() XSpi_ReadReg(XSP_CR_OFFSET) + +#define XSpi_GetStatusReg() XSpi_ReadReg(XSP_SR_OFFSET) + +#define XSpi_SetSlaveSelectReg(Mask) XSpi_WriteReg(XSP_SSR_OFFSET, (Mask)) +#define XSpi_GetSlaveSelectReg() XSpi_ReadReg(XSP_SSR_OFFSET) + +//--- + +static uint8_t WriteBuffer[PAGE_SIZE + READ_WRITE_EXTRA_BYTES]; +static uint8_t ReadBuffer[PAGE_SIZE + READ_WRITE_EXTRA_BYTES + 4]; + +static int slave_index = 0; + +static bool TEST_MODE = false; +static bool TEST_MODE_MCS_ONLY = false; + +static const uint32_t CONTROL_REG_START_STATE + = XSP_CR_TRANS_INHIBIT_MASK | XSP_CR_MANUAL_SS_MASK |XSP_CR_RXFIFO_RESET_MASK + | XSP_CR_TXFIFO_RESET_MASK | XSP_CR_ENABLE_MASK | XSP_CR_MASTER_MODE_MASK ; + +namespace xclxdma +{ + +static void clearReadBuffer(unsigned size) { + for(unsigned i =0; i < size; ++i) { + ReadBuffer[i] = 0; + } +} + +static void clearWriteBuffer(unsigned size) { + for(unsigned i =0; i < size; ++i) { + WriteBuffer[i] = 0; + } +} + +static void clearBuffers() { + clearReadBuffer(PAGE_SIZE + READ_WRITE_EXTRA_BYTES+4); + clearWriteBuffer(PAGE_SIZE + READ_WRITE_EXTRA_BYTES); +} + +static unsigned getSector(unsigned address) { + return (address >> 24) & 0xF; +} + +int XDMAShim::xclTestXSpi(int index) +{ + TEST_MODE = true; + + if(TEST_MODE_MCS_ONLY) { + //just test the mcs. + return 0; + } + + //2 slaves present, set the slave index. + slave_index = index; + + + //print the IP (not of flash) control/status register. + uint32_t ControlReg = XSpi_GetControlReg(); + uint32_t StatusReg = XSpi_GetStatusReg(); + std::cout << "Boot IP Control/Status " << std::hex << ControlReg << "/" << StatusReg << std::dec << std::endl; + + + //Make sure it is ready to receive commands. + ControlReg = XSpi_GetControlReg(); + ControlReg = CONTROL_REG_START_STATE; + + XSpi_SetControlReg(ControlReg); + ControlReg = XSpi_GetControlReg(); + StatusReg = XSpi_GetStatusReg(); + std::cout << "Reset IP Control/Status " << std::hex << ControlReg << "/" << StatusReg << std::dec << std::endl; + +// if(!isFlashReady()) + // return -1; + + //1. Testing idCode reads. 
+ //-- + std::cout << "Testing id code " << std::endl; + if(!getFlashId()) { + std::cout << "Exiting now, as could not get correct idcode" << std::endl; + exit(0); + return -1; + } + + std::cout << "id code successful (please verify the idcode output too" << std::endl; + std::cout << "Now reading various flash registers" << std::endl; + + //2. Testing register reads. + //Using STATUS_READ_BYTES 2 for all, TODO ? + uint8_t Cmd = COMMAND_STATUSREG_READ; + std::cout << "Testing COMMAND_STATUSREG_READ" << std::endl; + readRegister(Cmd, STATUS_READ_BYTES); + + std::cout << "Testing COMMAND_FLAG_STATUSREG_READ" << std::endl; + Cmd = COMMAND_FLAG_STATUSREG_READ; + readRegister(Cmd, STATUS_READ_BYTES); + + std::cout << "Testing COMMAND_NON_VOLATILE_CFGREG_READ" << std::endl; + Cmd = COMMAND_NON_VOLATILE_CFGREG_READ; + readRegister(Cmd, 4); + + std::cout << "Testing COMMAND_VOLATILE_CFGREG_READ" << std::endl; + Cmd = COMMAND_VOLATILE_CFGREG_READ; + readRegister(Cmd, STATUS_READ_BYTES); + + std::cout << "Testing COMMAND_ENH_VOLATILE_CFGREG_READ" << std::endl; + Cmd = COMMAND_ENH_VOLATILE_CFGREG_READ; + readRegister(Cmd, STATUS_READ_BYTES); + + std::cout << "Testing COMMAND_EXTENDED_ADDRESS_REG_READ" << std::endl; + Cmd = COMMAND_EXTENDED_ADDRESS_REG_READ; + readRegister(Cmd, STATUS_READ_BYTES); + + //3. Testing simple read and write + std::cout << "Testing read and write of 16 bytes" << std::endl; + + //unsigned baseAddr = 0x007A0000; + unsigned baseAddr = 0; + unsigned Addr = 0; + unsigned AddressBytes = 3; + if(FOUR_BYTE_ADDRESSING) { + AddressBytes = 4; + writeRegister(ENTER_FOUR_BYTE_ADDR_MODE, 0, 0); + }else + writeRegister(EXIT_FOUR_BYTE_ADDR_MODE, 0, 0); + + //Verify 3 or 4 byte addressing, 0th bit == 1 => 4 byte. + std::cout << "Testing COMMAND_FLAG_STATUSREG_READ" << std::endl; + Cmd = COMMAND_FLAG_STATUSREG_READ; + readRegister(Cmd, STATUS_READ_BYTES); + + uint8_t WriteCmd = 0xff; + uint8_t ReadCmd = 0xff; + + //Test the higher two sectors - first test erase. + + //First try erasing a sector and reading a + //page (we should get FFFF ...) + for(unsigned sector = 2 ; sector <= 3; sector++) + { + clearBuffers(); + + if(!writeRegister(COMMAND_EXTENDED_ADDRESS_REG_WRITE, sector, 1)) + return false; + + std::cout << "Testing COMMAND_EXTENDED_ADDRESS_REG_READ" << std::endl; + Cmd = COMMAND_EXTENDED_ADDRESS_REG_READ; + readRegister(Cmd, STATUS_READ_BYTES); + + //Sector Erase will reset TX and RX FIFO + if(!sectorErase(Addr + baseAddr)) + return false; + + bool ready = isFlashReady(); + if(!ready){ + std::cout << "Unable to get flash ready" << std::endl; + return false; + } + + //try faster read. + if(FOUR_BYTE_ADDRESSING) { + ReadCmd = FOUR_BYTE_QUAD_OUTPUT_FAST_READ; + }else + ReadCmd = COMMAND_QUAD_READ; + + //if(!readPage(Addr, ReadCmd)) + if(!readPage(Addr + baseAddr)) + return false; + } + + clearBuffers(); + //---Erase test done + + + //---Now try writing and reading a page. + //first write 2 pages (using 4 128Mb writes) each to 2 sectors, and then read them + + //Write data + for(unsigned sector = 2 ; sector <= 3; sector++) + { + if(!writeRegister(COMMAND_EXTENDED_ADDRESS_REG_WRITE, sector, 1)) + return false; + + std::cout << "Testing COMMAND_EXTENDED_ADDRESS_REG_READ" << std::endl; + Cmd = COMMAND_EXTENDED_ADDRESS_REG_READ; + readRegister(Cmd, STATUS_READ_BYTES); + + for(int j = 0; j < 4; ++j) + { + clearBuffers(); + for(unsigned i = 0; i < WRITE_DATA_SIZE; ++ i) { + WriteBuffer[i+ AddressBytes + 1] = j + sector + i; //some random data. 
+ } + + Addr = baseAddr + WRITE_DATA_SIZE*j; + + if(!writePage(Addr)) { + std::cout << "Write page unsuccessful, returning" << std::endl; + return -1; + } + } + + } + + + clearBuffers(); + + //Read the data back, use 2 reads each of 128 bytes, twice to test 2 pages. + for(unsigned sector = 2 ; sector <= 3; sector++) + { + //Select a sector (sector 2) + if(!writeRegister(COMMAND_EXTENDED_ADDRESS_REG_WRITE, sector, 1)) + return false; + + std::cout << "Testing COMMAND_EXTENDED_ADDRESS_REG_READ" << std::endl; + Cmd = COMMAND_EXTENDED_ADDRESS_REG_READ; + readRegister(Cmd, STATUS_READ_BYTES); + + //This read should be mix of a b c .. and Z Y X ... + for(int j = 0 ; j < 4; ++j) + { + clearBuffers(); + Addr = baseAddr + WRITE_DATA_SIZE*j; + if(!readPage(Addr)) { + std::cout << "Read page unsuccessful, returning" << std::endl; + return -1; + } + } + std::cout << "Done reading sector: " << sector << std::endl; + } + + return 0; +} + +int XDMAShim::xclUpgradeFirmware2(const char *file1, const char* file2) { + int status = 0; + status = xclUpgradeFirmwareXSpi(file1, 0); + if(status) + return status; + clearBuffers(); + mRecordList.clear(); + return xclUpgradeFirmwareXSpi(file2, 1); +} + +int XDMAShim::xclUpgradeFirmwareXSpi(const char *mcsFile, int index) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << ", " << mcsFile << std::endl; + } + + slave_index = index; + + if(!TEST_MODE) { +// std::cout << "INFO: Reseting hardware\n"; +// if (freezeAXIGate() != 0) { +// return -1; +// } +// +// const timespec req = {0, 5000}; +// nanosleep(&req, 0); +// if (freeAXIGate() != 0) { +// return -1; +// } +// nanosleep(&req, 0); + } + + std::string line; + std::ifstream mcsStream(mcsFile); + std::string startAddress; + ELARecord record; + bool endRecordFound = false; + + if(!mcsStream.is_open()) { + std::cout << "ERROR: Cannot open " << mcsFile << ". Check that it exists and is readable." << std::endl; + return -ENOENT; + } + + std::cout << "INFO: Parsing file " << mcsFile << std::endl; + while (!mcsStream.eof() && !endRecordFound) { + std::string line; + std::getline(mcsStream, line); + if (line.size() == 0) { + continue; + } + if (line[0] != ':') { + return -1; + } + const unsigned dataLen = std::stoi(line.substr(1, 2), 0 , 16); + const unsigned address = std::stoi(line.substr(3, 4), 0, 16); + const unsigned recordType = std::stoi(line.substr(7, 2), 0 , 16); + switch (recordType) { + case 0x00: + { + if (dataLen > 16) { + // For xilinx mcs files data length should be 16 for all records + // except for the last one which can be smaller + return -1; + } + if (address != record.mDataCount) { + std::cout << "Address is not contiguous ! 
" << std::endl; + return -1; + } + if (record.mEndAddress != address) { + return -1; + } + record.mDataCount += dataLen; + record.mEndAddress += dataLen; + break; + } + case 0x01: + { + if (startAddress.size() == 0) { + break; + } + mRecordList.push_back(record); + endRecordFound = true; + break; + } + case 0x02: + { + assert(0); + break; + } + case 0x04: + { + if (address != 0x0) { + return -1; + } + if (dataLen != 2) { + return -1; + } + std::string newAddress = line.substr(9, dataLen * 2); + if (startAddress.size()) { + // Finish the old record + mRecordList.push_back(record); + } + // Start a new record + record.mStartAddress = std::stoi(newAddress, 0 , 16); + record.mDataPos = mcsStream.tellg(); + record.mEndAddress = 0; + record.mDataCount = 0; + startAddress = newAddress; + } + } + } + + mcsStream.seekg(0); + std::cout << "INFO: Found " << mRecordList.size() << " ELA Records" << std::endl; + + return programXSpi(mcsStream); +} + +unsigned XDMAShim::readReg(unsigned RegOffset) { + unsigned value; + if(pcieBarRead(BPI_FLASH_BAR, FLASH_BASE_ADDRESS + RegOffset, &value, 4) != 0) { + assert(0); + std::cout << "read reg ERROR" << std::endl; + } + return value; +} + +int XDMAShim::writeReg(unsigned RegOffset, unsigned value) { + int status = pcieBarWrite(BPI_FLASH_BAR, FLASH_BASE_ADDRESS + RegOffset, &value, 4); + if(status != 0) { + assert(0); + std::cout << "write reg ERROR " << std::endl; + } + return status; +} + + +bool XDMAShim::waitTxEmpty() { + long long delay = 0; + const timespec req = {0, 5000}; + while (delay < 30000000000) { + uint32_t StatusReg = XSpi_GetStatusReg(); + if(StatusReg & XSP_SR_TX_EMPTY_MASK ) + return true; + //If not empty, check how many bytes remain. + uint32_t Data = XSpi_ReadReg(XSP_TFO_OFFSET); + std::cout << std::hex << Data << std::dec << std::endl; + nanosleep(&req, 0); + delay += 5000; + } + std::cout << "Unable to get Tx Empty\n"; + return false; +} + +bool XDMAShim::isFlashReady() { + uint32_t StatusReg; + const timespec req = {0, 5000}; + long long delay = 0; + while (delay < 30000000000) { + //StatusReg = XSpi_GetStatusReg(); + WriteBuffer[BYTE1] = COMMAND_STATUSREG_READ; + bool status = finalTransfer(WriteBuffer, ReadBuffer, STATUS_READ_BYTES); + if( !status ) { + return false; + } + //TODO: wait ? + StatusReg = ReadBuffer[1]; + if( (StatusReg & FLASH_SR_IS_READY_MASK) == 0) + return true; + //TODO: Try resetting. Uncomment next line? + //XSpi_WriteReg(XSP_SRR_OFFSET, XSP_SRR_RESET_MASK); + nanosleep(&req, 0); + delay += 5000; + } + std::cout << "Unable to get Flash Ready\n"; + return false; + +#if 0 + uint32_t StatusReg; + const timespec req = {0, 5000}; + long long delay = 0; + while (delay < 30000000000) { + StatusReg = XSpi_GetStatusReg(); + if(StatusReg & FLASH_SR_IS_READY_MASK) + return true; + //Try resetting. + XSpi_WriteReg(XSP_SRR_OFFSET, XSP_SRR_RESET_MASK); + nanosleep(&req, 0); + delay += 5000; + } + std::cout << "Unable to get Flash Ready\n"; + return false; +#endif +} + +bool XDMAShim::sectorErase(unsigned Addr) { + if(!isFlashReady()) + return false; + + if(!writeEnable()) + return false; + + if(TEST_MODE) { + std::cout << "Testing COMMAND_FLAG_STATUSREG_READ" << std::endl; + unsigned Cmd = COMMAND_FLAG_STATUSREG_READ; + readRegister(Cmd, STATUS_READ_BYTES); + } + + uint32_t ControlReg = XSpi_GetControlReg(); + ControlReg |= XSP_CR_RXFIFO_RESET_MASK ; + ControlReg |= XSP_CR_TXFIFO_RESET_MASK; + XSpi_SetControlReg(ControlReg); + + /* + * Prepare the WriteBuffer. 
+ */ + if(!FOUR_BYTE_ADDRESSING) { + WriteBuffer[BYTE1] = COMMAND_SECTOR_ERASE; + WriteBuffer[BYTE2] = (uint8_t) (Addr >> 16); + WriteBuffer[BYTE3] = (uint8_t) (Addr >> 8); + WriteBuffer[BYTE4] = (uint8_t) (Addr); + }else { + WriteBuffer[BYTE1] = FOUR_BYTE_SECTOR_ERASE; + WriteBuffer[BYTE2] = (uint8_t) (Addr >> 24); + WriteBuffer[BYTE3] = (uint8_t) (Addr >> 16); + WriteBuffer[BYTE4] = (uint8_t) (Addr >> 8); + WriteBuffer[BYTE5] = (uint8_t) Addr; + } + + if(!finalTransfer(WriteBuffer, NULL, SECTOR_ERASE_BYTES)) + return false; + + /* + * Wait till the Transfer is complete and check if there are any errors + * in the transaction.. + */ + if(!waitTxEmpty()) + return false; + + return true; +} + +bool XDMAShim::bulkErase() +{ + if(!isFlashReady()) + return false; + + if(!writeEnable()) + return false; + + uint32_t ControlReg = CONTROL_REG_START_STATE; + XSpi_SetControlReg(ControlReg); + + uint32_t testControlReg = XSpi_GetControlReg(); + uint32_t testStatusReg = XSpi_GetStatusReg(); + //2 + WriteBuffer[BYTE1] = COMMAND_BULK_ERASE; + + if(!finalTransfer(WriteBuffer, NULL, BULK_ERASE_BYTES)) + return false; + + return waitTxEmpty(); +} + +bool XDMAShim::writeEnable() { + uint32_t StatusReg = XSpi_GetStatusReg(); + if(StatusReg & XSP_SR_TX_FULL_MASK) { + std::cout << "Tx fifo fill during WriteEnable" << std::endl; + return false; + } + + //1 + uint32_t ControlReg = XSpi_GetControlReg(); + ControlReg |= CONTROL_REG_START_STATE; + XSpi_SetControlReg(ControlReg); + + //2 + WriteBuffer[BYTE1] = COMMAND_WRITE_ENABLE; //0x06 + + if(!finalTransfer(WriteBuffer, NULL, WRITE_ENABLE_BYTES)) + return false; + + return waitTxEmpty(); +} + +bool XDMAShim::getFlashId() +{ + + if(!isFlashReady()) { + std::cout << "Unable to get flash ready " << std::endl; + return false; + } + + bool Status = false; + /* * Prepare the Write Buffer. */ + WriteBuffer[BYTE1] = COMMAND_IDCODE_READ; + + Status = finalTransfer(WriteBuffer, ReadBuffer, IDCODE_READ_BYTES); + if( !Status ) { + return false; + } + + for (int i = 0; i < IDCODE_READ_BYTES; i++) { + std::cout << "Idcode byte[" << i << "] " << std::hex << (int)ReadBuffer[i] << std::endl; + ReadBuffer[i] = 0; + } + + unsigned ffCount = 0; + for (int i = 1; i < IDCODE_READ_BYTES; i++) { + if ((unsigned int)ReadBuffer[i] == 0xff) + ffCount++; + } + + if(ffCount == IDCODE_READ_BYTES -1) + return false; + + return true; +} + + +bool XDMAShim::finalTransfer(uint8_t *SendBufPtr, uint8_t *RecvBufPtr, int ByteCount) +{ + uint32_t ControlReg; + uint32_t StatusReg; + uint32_t Data = 0; + uint8_t DataWidth = 8; + uint32_t SlaveSelectMask = SLAVE_SELECT_MASK; + + uint32_t SlaveSelectReg = 0; + if(slave_index == 0) + SlaveSelectReg = ~0x01; + else if(slave_index == 1) + SlaveSelectReg = ~0x02; + + /* + * Enter a critical section from here to the end of the function since + * state is modified, an interrupt is enabled, and the control register + * is modified (r/m/w). + */ + + ControlReg = XSpi_GetControlReg(); + StatusReg = XSpi_GetStatusReg(); + + if(TEST_MODE) + std::cout << "Control/Status " << std::hex << ControlReg << "/" << StatusReg << std::dec << std::endl; + + + /* + * If configured as a master, be sure there is a slave select bit set + * in the slave select register. If no slaves have been selected, the + * value of the register will equal the mask. When the device is in + * loopback mode, however, no slave selects need be set. 
+ */ + if (ControlReg & XSP_CR_MASTER_MODE_MASK) { + if ((ControlReg & XSP_CR_LOOPBACK_MASK) == 0) { + if (SlaveSelectReg == SlaveSelectMask) { + std::cout << "No slave selected" << std::endl; + return false; + } + } + } + + /* + * Set up buffer pointers. + */ + uint8_t* SendBufferPtr = SendBufPtr; + uint8_t* RecvBufferPtr = RecvBufPtr; + + //int RequestedBytes = ByteCount; + int RemainingBytes = ByteCount; + unsigned int BytesTransferred = 0; + + /* + * Fill the DTR/FIFO with as many bytes as it will take (or as many as + * we have to send). We use the tx full status bit to know if the device + * can take more data. By doing this, the driver does not need to know + * the size of the FIFO or that there even is a FIFO. The downside is + * that the status register must be read each loop iteration. + */ + StatusReg = XSpi_GetStatusReg(); + if((StatusReg & (1<<10)) != 0) { + std::cout << "status reg in error situation " << std::endl; + return false; + } + + while (((StatusReg & XSP_SR_TX_FULL_MASK) == 0) && (RemainingBytes > 0)) { + if (DataWidth == 8) { + Data = *SendBufferPtr; + } else if (DataWidth == 16) { + Data = *(uint16_t *)SendBufferPtr; + } else if (DataWidth == 32){ + Data = *(uint32_t *)SendBufferPtr; + } + + if(pcieBarWrite(BPI_FLASH_BAR, FLASH_BASE_ADDRESS + XSP_DTR_OFFSET, &Data, 4) != 0) + return false; + SendBufferPtr += (DataWidth >> 3); + RemainingBytes -= (DataWidth >> 3); + StatusReg = XSpi_GetStatusReg(); + if((StatusReg & (1<<10)) !=0) { + std::cout << "Write command caused created error" << std::endl; + return false; + } + } + + + /* + * Set the slave select register to select the device on the SPI before + * starting the transfer of data. + */ + XSpi_SetSlaveSelectReg(SlaveSelectReg); + + ControlReg = XSpi_GetControlReg(); + StatusReg = XSpi_GetStatusReg(); + + if(TEST_MODE) + std::cout << "Control/Status " << std::hex << ControlReg << "/" << StatusReg << std::dec << std::endl; + + if((StatusReg & (1<<10)) != 0) { + std::cout << "status reg in error situation: 2 " << std::endl; + return false; + } + + /* + * Start the transfer by no longer inhibiting the transmitter and + * enabling the device. For a master, this will in fact start the + * transfer, but for a slave it only prepares the device for a transfer + * that must be initiated by a master. + */ + ControlReg = XSpi_GetControlReg(); + ControlReg &= ~XSP_CR_TRANS_INHIBIT_MASK; + XSpi_SetControlReg(ControlReg); + + if(TEST_MODE) + std::cout << "Control/Status " << std::hex << ControlReg << "/" << StatusReg << std::dec << std::endl; + + + //Data transfer to actual flash has already started happening here. + + { /* Polled mode of operation */ + + // poll the status register to * Transmit/Receive SPI data. + while(ByteCount > 0) + { + + /* + * Wait for the transfer to be done by polling the + * Transmit empty status bit + */ + do { + StatusReg = XSpi_GetStatusReg(); + } while ((StatusReg & XSP_SR_TX_EMPTY_MASK) == 0); + + + //Do masking of slaves at the end as it doesnt make a difference. + //XSpi_SetSlaveSelectReg(SlaveSelectMask); + + /* + * A transmit has just completed. Process received data + * and check for more data to transmit. Always inhibit + * the transmitter while the transmit register/FIFO is + * being filled, or make sure it is stopped if we're + * done. 
+ */ + ControlReg = XSpi_GetControlReg(); + XSpi_SetControlReg(ControlReg | XSP_CR_TRANS_INHIBIT_MASK); + + ControlReg = XSpi_GetControlReg(); + + if(TEST_MODE) + std::cout << "Control/Status " << std::hex << ControlReg << "/" << StatusReg << std::dec << std::endl; + + /* + * First get the data received as a result of the + * transmit that just completed. We get all the data + * available by reading the status register to determine + * when the Receive register/FIFO is empty. Always get + * the received data, but only fill the receive + * buffer if it points to something (the upper layer + * software may not care to receive data). + */ + StatusReg = XSpi_GetStatusReg(); + + while ((StatusReg & XSP_SR_RX_EMPTY_MASK) == 0) + { + //read the data. + if(pcieBarRead(BPI_FLASH_BAR, FLASH_BASE_ADDRESS + XSP_DRR_OFFSET, &Data, 4) != 0) + return false; + + if (DataWidth == 8) { + if(RecvBufferPtr != NULL) { + *RecvBufferPtr++ = (uint8_t)Data; + } + } else if (DataWidth == 16) { + if (RecvBufferPtr != NULL){ + *(uint16_t *)RecvBufferPtr = (uint16_t)Data; + RecvBufferPtr += 2; + } + } else if (DataWidth == 32) { + if (RecvBufferPtr != NULL){ + *(uint32_t *)RecvBufferPtr = Data; + RecvBufferPtr += 4; + } + } + + BytesTransferred += (DataWidth >> 3); + ByteCount -= (DataWidth >> 3); + StatusReg = XSpi_GetStatusReg(); + if((StatusReg & (1<<10)) != 0) { + std::cout << "status reg in error situation " << std::endl; + return false; + } + } + + //If there are still unwritten bytes, then finishing writing (below code) + //and reading (above code) them. + if (RemainingBytes > 0) { + + /* + * Fill the DTR/FIFO with as many bytes as it + * will take (or as many as we have to send). + * We use the Tx full status bit to know if the + * device can take more data. + * By doing this, the driver does not need to + * know the size of the FIFO or that there even + * is a FIFO. + * The downside is that the status must be read + * each loop iteration. + */ + StatusReg = XSpi_GetStatusReg(); + + while(((StatusReg & XSP_SR_TX_FULL_MASK)== 0) && (RemainingBytes > 0)) + { + if (DataWidth == 8) { + Data = *SendBufferPtr; + } else if (DataWidth == 16) { + Data = *(uint16_t *)SendBufferPtr; + } else if (DataWidth == 32) { + Data = *(uint32_t *)SendBufferPtr; + } + + if(pcieBarWrite(BPI_FLASH_BAR, FLASH_BASE_ADDRESS + XSP_DTR_OFFSET, &Data, 4) != 0) + return false; + + SendBufferPtr += (DataWidth >> 3); + RemainingBytes -= (DataWidth >> 3); + StatusReg = XSpi_GetStatusReg(); + if((StatusReg & (1<<10)) != 0) { + std::cout << "status reg in error situation " << std::endl; + return false; + } + } + + //Start the transfer by not inhibiting the transmitter any longer. + ControlReg = XSpi_GetControlReg(); + ControlReg &= ~XSP_CR_TRANS_INHIBIT_MASK; + XSpi_SetControlReg(ControlReg); + } + } + + //Stop the transfer by inhibiting * the transmitter. 
+ ControlReg = XSpi_GetControlReg(); + XSpi_SetControlReg(ControlReg | XSP_CR_TRANS_INHIBIT_MASK); + + /* + * Deassert the slaves on the SPI bus when the transfer is complete, + */ + XSpi_SetSlaveSelectReg(SlaveSelectMask); + } + + return true; +} + + +bool XDMAShim::writePage(unsigned Addr, uint8_t writeCmd) +{ + if(!isFlashReady()) + return false; + + /* + { + //debug + std::cout << "Testing COMMAND_EXTENDED_ADDRESS_REG_READ" << std::endl; + uint8_t Cmd = COMMAND_EXTENDED_ADDRESS_REG_READ; + readRegister(Cmd, STATUS_READ_BYTES); + if(!isFlashReady()) + return false; + }*/ + + if(!writeEnable()) + return false; + + unsigned bkupAddr = Addr; + + //1 : reset Tx and Rx FIFO's + uint32_t ControlReg = CONTROL_REG_START_STATE; +// uint32_t ControlReg = XSpi_GetControlReg(); +// ControlReg |= XSP_CR_RXFIFO_RESET_MASK ; +// ControlReg |= XSP_CR_TXFIFO_RESET_MASK; + XSpi_SetControlReg(ControlReg); + + uint8_t WriteCmd = writeCmd; + //2 + if(!FOUR_BYTE_ADDRESSING) { + if(writeCmd == 0xff) + WriteCmd = COMMAND_QUAD_WRITE; + bkupAddr &= 0x00ffffff; // truncate to 24 bits + //3 byte address mode + //COMMAND_PAGE_PROGRAM gives out all FF's + //COMMAND_EXT_QUAD_WRITE: hangs the system + WriteBuffer[BYTE1] = WriteCmd; + WriteBuffer[BYTE2] = (uint8_t) (bkupAddr >> 16); + WriteBuffer[BYTE3] = (uint8_t) (bkupAddr >> 8); + WriteBuffer[BYTE4] = (uint8_t) bkupAddr; + }else { + if(writeCmd == 0xff) + WriteBuffer[BYTE1] = FOUR_BYTE_QUAD_INPUT_FAST_PROGRAM; + WriteBuffer[BYTE2] = (uint8_t) (bkupAddr >> 24); + WriteBuffer[BYTE3] = (uint8_t) (bkupAddr >> 16); + WriteBuffer[BYTE4] = (uint8_t) (bkupAddr >> 8); + WriteBuffer[BYTE5] = (uint8_t) bkupAddr; + } + + bkupAddr = Addr; + //The data to write is already filled up, so now just write the buffer. + + if(!finalTransfer(WriteBuffer, ReadBuffer, WRITE_DATA_SIZE + READ_WRITE_EXTRA_BYTES)) + return false; + + if(!waitTxEmpty()) + return false; + + + return true; + +} + +bool XDMAShim::readPage(unsigned Addr, uint8_t readCmd) +{ + if(!isFlashReady()) + return false; + + /* + { + //debug + std::cout << "Testing COMMAND_EXTENDED_ADDRESS_REG_READ" << std::endl; + uint8_t Cmd = COMMAND_EXTENDED_ADDRESS_REG_READ; + readRegister(Cmd, STATUS_READ_BYTES); + if(!isFlashReady()) + return false; + clearBuffer(); + }*/ + + unsigned bkupAddr = Addr; + //-- + uint32_t ControlReg = CONTROL_REG_START_STATE; +// uint32_t ControlReg = XSpi_GetControlReg(); +// ControlReg |= XSP_CR_RXFIFO_RESET_MASK ; +// ControlReg |= XSP_CR_TXFIFO_RESET_MASK; + XSpi_SetControlReg(ControlReg); + + //1 : reset TX/RX FIFO's + uint8_t ReadCmd = readCmd; + + //uint8_t ReadCmd = COMMAND_RANDOM_READ; + if(!FOUR_BYTE_ADDRESSING) { + //3 byte addressing mode + if(readCmd == 0xff) + ReadCmd = COMMAND_QUAD_READ; + bkupAddr &= 0x00ffffff; // truncate to 24 bits + //3 byte address mode + WriteBuffer[BYTE1] = ReadCmd; + WriteBuffer[BYTE2] = (uint8_t) (bkupAddr >> 16); + WriteBuffer[BYTE3] = (uint8_t) (bkupAddr >> 8); + WriteBuffer[BYTE4] = (uint8_t) bkupAddr; + }else { + if(readCmd == 0xff) + ReadCmd = FOUR_BYTE_READ; + WriteBuffer[BYTE1] = ReadCmd; + WriteBuffer[BYTE2] = (uint8_t) (bkupAddr >> 24); + WriteBuffer[BYTE3] = (uint8_t) (bkupAddr >> 16); + WriteBuffer[BYTE4] = (uint8_t) (bkupAddr >> 8); + WriteBuffer[BYTE5] = (uint8_t) bkupAddr; + } + + bkupAddr = Addr; + + + unsigned ByteCount = READ_DATA_SIZE; + + if (ReadCmd == COMMAND_DUAL_READ) { + ByteCount += DUAL_READ_DUMMY_BYTES; + } else if (ReadCmd == COMMAND_DUAL_IO_READ) { + ByteCount += DUAL_READ_DUMMY_BYTES; + } else if (ReadCmd == COMMAND_QUAD_IO_READ) { + 
ByteCount += QUAD_IO_READ_DUMMY_BYTES; + } else if ( (ReadCmd==COMMAND_QUAD_READ) || (ReadCmd==FOUR_BYTE_QUAD_OUTPUT_FAST_READ)) { + ByteCount += QUAD_READ_DUMMY_BYTES; + } + + //Clear the read buffer +// for(unsigned int i = 0; i < ByteCount + READ_WRITE_EXTRA_BYTES; ++i) { +// ReadBuffer[i] = 0; +// } + + if(!finalTransfer(WriteBuffer, ReadBuffer, ByteCount + READ_WRITE_EXTRA_BYTES)) + return false; + + if(!waitTxEmpty()) + return false; + + //reset the RXFIFO bit so. + ControlReg = XSpi_GetControlReg(); + ControlReg |= XSP_CR_RXFIFO_RESET_MASK ; + XSpi_SetControlReg(ControlReg); + + return true; + +} + +bool XDMAShim::prepareXSpi() +{ + if(TEST_MODE) + return true; + + + uint32_t tControlReg = XSpi_GetControlReg(); + uint32_t tStatusReg = XSpi_GetStatusReg(); + +#if defined(_debug) + std::cout << "Boot Control/Status " << std::hex << tControlReg << "/" << tStatusReg << std::dec << std::endl; +#endif + + uint32_t ControlReg = CONTROL_REG_START_STATE; + XSpi_SetControlReg(ControlReg); + + tControlReg = XSpi_GetControlReg(); + tStatusReg = XSpi_GetStatusReg(); + +#if defined(_debug) + std::cout << "After setting start state, Control/Status " << std::hex << tControlReg << "/" << tStatusReg << std::dec << std::endl; +#endif + //-- + + if(!getFlashId()) { + std::cout << "Exiting now, as could not get correct idcode" << std::endl; + exit(0); + return false; + } + + //WriteEnable writes CONTROL_REG_START_STATE - that should be enough for initial configuration ? + //if(!writeEnable()) + //return false; + + //Bulk erase the flash. + //if(!bulkErase()) + //return false; + + return true; +} + +int XDMAShim::programXSpi(std::ifstream& mcsStream, const ELARecord& record) { + if (mLogStream.is_open()) { + mLogStream << __func__ << ", " << std::this_thread::get_id() << std::endl; + } + + //TODO: decrease the sleep time. + const timespec req = {0, 20000}; + +#if defined(_debug) + std::cout << "Programming block (" << std::hex << record.mStartAddress << ", " << record.mEndAddress << std::dec << ")" << std::endl; +#endif + + assert(mcsStream.tellg() < record.mDataPos); + mcsStream.seekg(record.mDataPos, std::ifstream::beg); + unsigned char* buffer = &WriteBuffer[READ_WRITE_EXTRA_BYTES]; + int bufferIndex = 0; + int pageIndex = 0; + std::string prevLine(""); + for (unsigned index = record.mDataCount; index > 0;) { + std::string line; + std::getline(mcsStream, line); + if(TEST_MODE) + std::cout << line << std::endl; + const unsigned dataLen = std::stoi(line.substr(1, 2), 0 , 16); + index -= dataLen; + const unsigned recordType = std::stoi(line.substr(7, 2), 0 , 16); + if (recordType != 0x00) { + continue; + } + const std::string data = line.substr(9, dataLen * 2); + // Write in byte swapped order + for (unsigned i = 0; i < data.length(); i += 2) { + unsigned value = std::stoi(data.substr(i, 2), 0, 16); + buffer[bufferIndex++] = (unsigned char)value; + assert(bufferIndex <= WRITE_DATA_SIZE); + +#if 0 + //To enable byte swapping uncomment this. +// if ((bufferIndex % 4) == 0) { +// bufferIndex += 4; +// } +// assert(bufferIndex <= WRITE_DATA_SIZE); +// unsigned value = std::stoi(data.substr(i, 2), 0, 16); +// if(TEST_MODE) +// std::cout << data.substr(i, 2); +// buffer[--bufferIndex] = (unsigned char)value; +// if ((bufferIndex % 4) == 0) { +// bufferIndex += 4; +// } +#endif + if (bufferIndex == WRITE_DATA_SIZE) { + break; + } + } + + if(TEST_MODE) + std::cout << std::endl; + +#if 0 + //Uncomment if byte swapping enabled. 
+ + //account for the last line + //which can have say 14 bytes instead of 16 + if((bufferIndex %4)!= 0) { + while ((bufferIndex %4)!= 0) { + unsigned char fillValue = 0xFF; + buffer[--bufferIndex] = fillValue; + } + bufferIndex += 4; + } + + assert((bufferIndex % 4) == 0); +#endif + + assert(bufferIndex <= WRITE_DATA_SIZE); + if (bufferIndex == WRITE_DATA_SIZE) { +#if defined(_debug) + std::cout << "writing page " << pageIndex << std::endl; +#endif + const unsigned address = std::stoi(line.substr(3, 4), 0, 16); + assert ( (address + dataLen) == (pageIndex +1)*WRITE_DATA_SIZE); + if(TEST_MODE) { + std::cout << (address + dataLen) << " " << (pageIndex +1)*WRITE_DATA_SIZE << std::endl; + std::cout << record.mStartAddress << " " << record.mStartAddress + pageIndex*PAGE_SIZE; + std::cout << " " << address << std::endl; + } else + { + if(!writePage(record.mStartAddress + pageIndex*WRITE_DATA_SIZE)) + return -1; + clearBuffers(); + { + //debug stuff +#if defined(_debug) + if(pageIndex == 0) { + if(!readPage(record.mStartAddress + pageIndex*WRITE_DATA_SIZE)) + return -1; + clearBuffers(); + } +#endif + } + } + pageIndex++; + nanosleep(&req, 0); + bufferIndex = 0; + } + prevLine = line; + + } + if (bufferIndex) { + //Write the last page + if(TEST_MODE) { + std::cout << "writing final page " << pageIndex << std::endl; + std::cout << bufferIndex << std::endl; + std::cout << prevLine << std::endl; + } + + const unsigned address = std::stoi(prevLine.substr(3, 4), 0, 16); + const unsigned dataLen = std::stoi(prevLine.substr(1, 2), 0 , 16); + + if(TEST_MODE) + std::cout << address % WRITE_DATA_SIZE << " " << dataLen << std::endl; + + //assert( (address % WRITE_DATA_SIZE + dataLen) == bufferIndex); + + if(!TEST_MODE) { + + //Fill unused half page to FF + for(unsigned i = bufferIndex; i < WRITE_DATA_SIZE; ++i) { + buffer[i] = 0xff; + } + + if(!writePage(record.mStartAddress + pageIndex*WRITE_DATA_SIZE)) + return -1; + nanosleep(&req, 0); + clearBuffers(); + { + //debug stuff +#if defined(_debug) + if(!readPage(record.mStartAddress + pageIndex*WRITE_DATA_SIZE)) + return -1; + clearBuffers(); +#endif + } + } + } + return 0; +} + +int XDMAShim::programXSpi(std::ifstream& mcsStream) +{ +// for (ELARecordList::iterator i = mRecordList.begin(), e = mRecordList.end(); i != e; ++i) { +// i->mStartAddress <<= 16; +// i->mEndAddress += i->mStartAddress; +// // Convert from 2 bytes address to 4 bytes address +// i->mStartAddress /= 2; +// i->mEndAddress /= 2; +// } + + if (!prepareXSpi()) { + std::cout << "ERROR: Unable to prepare the XSpi\n"; + return -1; + } + + //if(!bulkErase()) + //return false; + + const timespec req = {0, 20000}; + nanosleep(&req, 0); + + unsigned current_sector = -1; + std::vector erased_sectors; + erased_sectors.reserve(4); + for(int i =0; i < 4; ++i) + erased_sectors.push_back(false); + + int beatCount = 0; + for (ELARecordList::iterator i = mRecordList.begin(), e = mRecordList.end(); i != e; ++i) + { + beatCount++; + if(beatCount%20==0) { + std::cout << "." << std::flush; + } + + i->mStartAddress <<= 16; + + unsigned sector = getSector(i->mStartAddress); + bool valid_sector = false; + if ( (sector == 0) || (sector == 1) || (sector == 2) || (sector == 3) ) + valid_sector = true; + if(!valid_sector) { + std::cout << "Invalid sector encountered" << std::endl; + return -1; + } + + //Remove the sector determinant half byte. 
+ i->mStartAddress &= 0xFFFFFF; + i->mEndAddress += i->mStartAddress; + + if(TEST_MODE) { + std::cout << "INFO: Start address 0x" << std::hex << mRecordList.front().mStartAddress << std::dec << "\n"; + std::cout << "INFO: End address 0x" << std::hex << mRecordList.back().mEndAddress << std::dec << "\n"; + } + + if(current_sector != sector) { + //Issue sector select + if(!writeRegister(COMMAND_EXTENDED_ADDRESS_REG_WRITE, sector, 1)) + return false; + current_sector = sector; + } + + { + //debug +#if defined(_debug) + std::cout << "Testing COMMAND_EXTENDED_ADDRESS_REG_READ" << std::endl; + uint8_t Cmd = COMMAND_EXTENDED_ADDRESS_REG_READ; + readRegister(Cmd, STATUS_READ_BYTES); + if(!isFlashReady()) + return false; +#endif + } + + //Erase the sector if not already erased. + if(!erased_sectors.at(current_sector)) { + //Use addr 0 to erase the sector. + unsigned Addr = 0; + + //Erase the entire segment. Each segment is 128 Mb (bits). + //Each sector is 64KB (bytes). So total 256 sectors in a segment. + for(int i = 0; i < 256 ; ++i) { + if(!sectorErase(Addr)) { + return false; + } + Addr+= 0x10000; + nanosleep(&req, 0); + } + + Addr = 0; + if(!readPage(Addr)) + return false; + erased_sectors.at(sector)=true; + } + + { + //debug +#if defined(_debug) + std::cout << "Testing COMMAND_EXTENDED_ADDRESS_REG_READ" << std::endl; + uint8_t Cmd = COMMAND_EXTENDED_ADDRESS_REG_READ; + readRegister(Cmd, STATUS_READ_BYTES); + if(!isFlashReady()) + return false; +#endif + } + + bool ready = isFlashReady(); + if(!ready){ + std::cout << "Unable to get flash ready" << std::endl; + return false; + } + + clearBuffers(); + + if (programXSpi(mcsStream, *i)) { + std::cout << "ERROR: Could not programXSpi the block\n"; + return -1; + } + nanosleep(&req, 0); + } + std::cout << std::endl; + return 0; +} + +bool XDMAShim::readRegister(unsigned commandCode, unsigned bytes) { + + if(!isFlashReady()) + return false; + + bool Status = false; + + WriteBuffer[BYTE1] = commandCode; + + Status = finalTransfer(WriteBuffer, ReadBuffer, bytes); + + if( !Status ) { + return false; + } + +#if defined(_debug) + std::cout << "Printing output (with some extra bytes of readRegister cmd)" << std::endl; +#endif + + for(unsigned i = 0; i < 5; ++ i) //Some extra bytes, no harm + { +#if defined(_debug) + std::cout << i << " " << std::hex << (int)ReadBuffer[i] << std::dec << std::endl; +#endif + ReadBuffer[i] = 0; //clear + } + //Reset the FIFO bit. + uint32_t ControlReg = XSpi_GetControlReg(); + ControlReg |= XSP_CR_RXFIFO_RESET_MASK ; + ControlReg |= XSP_CR_TXFIFO_RESET_MASK ; + XSpi_SetControlReg(ControlReg); + + return Status; +} + +//max 16 bits for nonvolative cfg register. +//If extra_bytes == 0, then only the command is sent. +bool XDMAShim::writeRegister(unsigned commandCode, unsigned value, unsigned extra_bytes) { + if(!isFlashReady()) + return false; + + if(!writeEnable()) + return false; + + uint32_t ControlReg = XSpi_GetControlReg(); + ControlReg |= XSP_CR_TXFIFO_RESET_MASK; + ControlReg |= XSP_CR_RXFIFO_RESET_MASK; + XSpi_SetControlReg(ControlReg); + + bool Status = false; + + WriteBuffer[BYTE1] = commandCode; + + if(extra_bytes == 0) { + //do nothing + } else if(extra_bytes == 1) + WriteBuffer[BYTE2] = (uint8_t) (value); + else if(extra_bytes == 2) { + WriteBuffer[BYTE2] = (uint8_t) (value >> 8); + WriteBuffer[BYTE3] = (uint8_t) value; + }else { + std::cout << "ERROR: Setting more than 2 bytes" << std::endl; + assert(0); + } + + //+1 for cmd byte. 
+ Status = finalTransfer(WriteBuffer,NULL, extra_bytes+1); + if(!Status) + return false; + + if(!waitTxEmpty()) + return false; + + return Status; +} + + +} //end namespace From 646a246ef4bc9e30616dc28a35fc8682c2e4d6cc Mon Sep 17 00:00:00 2001 From: AWScpettey Date: Sun, 1 Jan 2017 10:30:35 -0600 Subject: [PATCH 08/29] Update AWS_Shell_Interface_Specification.md --- hdk/docs/AWS_Shell_Interface_Specification.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hdk/docs/AWS_Shell_Interface_Specification.md b/hdk/docs/AWS_Shell_Interface_Specification.md index 488ab5afb..2f1d656b3 100644 --- a/hdk/docs/AWS_Shell_Interface_Specification.md +++ b/hdk/docs/AWS_Shell_Interface_Specification.md @@ -289,9 +289,9 @@ Some signals must include the PCIe IDs of the CL. A Developer’s specific PCIe - cl_sh_id1 - - [15:0] – Subsystem ID + - [15:0] – Subsystem Vendor ID - - [31:16] – Subsystem Vendor ID + - [31:16] – Subsystem ID ### General control/status From 19c3c3c1bac21745ec12bb78c871c732c18855f7 Mon Sep 17 00:00:00 2001 From: AWScpettey Date: Mon, 2 Jan 2017 14:36:12 -0600 Subject: [PATCH 09/29] Update FAQs.md --- FAQs.md | 267 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 267 insertions(+) diff --git a/FAQs.md b/FAQs.md index 8e1d5d35b..1e4b2032e 100644 --- a/FAQs.md +++ b/FAQs.md @@ -122,3 +122,270 @@ elapsed = 00:08:59 . Memory (MB): peak = 4032.184 ; gain = 3031.297 ; free physi /opt/Xilinx/Vivado/2016.3/bin/loader: line 164: 8160 Killed "$RDI_PROG" "$@" Parent process (pid 8160) has died. This helper process will now exit +**Can I bring my own bitstream for loading on an F1 FPGA?** + +No. There is no mechanism for loading a bitstream directly onto the +FPGAs of an F1 instance. All Custom Logic bitstreams are loaded onto the +FPGA by AWS. Developers create an AFI by creating a Vivado Design +Checkpoint (DCP) and submitting that DCP to AWS. AWS creates the final +AFI and bitstream from that DCP and returns an AFI ID for referencing +that AFI. + +**Do I need to interface to the AWS Shell?** + +Yes. The only interface to PCIe and the instance CPU is through the AWS +shell. The AWS Shell is included in all F1 FPGAs. There is no option to +run the F1 FPGA without the Shell. + +**Can I generate my bitstream locally?** + +Yes, local tools can be used to develop the DCP needed for creating an +AFI. The HDK can be downloaded from GitHub and run on any local machine. +If a Developer uses local tools, the exact tool version specified in the +HDK and FPGA Developer AMI will need to be used. Note that AWS does not +provide support for generating a bitstream and testing that bitstream. + +**Do I need to get a Xilinx license to generate an AFI?** + +No, if the Developer uses the FPGA Developer AMI, Xilinx licenses for +simulation and DCP generation are included. Ingestion of a DCP to +generate an AFI is handled by AWS. No license is needed for DCP to AFI +generation. If a local machine is used for development, the Developer is +responsible for obtaining any necessary licenses. AWS only directly +support cloud development using the AWS Developer AMI. + +**Does AWS provide local development boards?** + +No. AWS supports a cloud-only development model and provides the +necessary elements for doing 100% cloud development. No development +board is provided for on-premise development. + +**Which HDL languages are supported?** + +Verilog and HVDL are both supported in the FPGA Developer AMI and in +generating a DCP. 
The Xilinx Vivado tools and simulator support mixed +mode simulation of Verilog and VHDL. The AWS Shell is written in +Verilog. Support for mixed mode simulation may vary if Developers use +other simulators. Check your simulator documentation for +Verilog/VHDL/System Verilog support. + +**What simulators are supported?** + +The FPGA Developer AMI has built-in support for the Xilinx XSIM +simulator. All licensing and software for XSIM is included in the +Developer AMI when launched. Support for other simulators is included +through the bring-your-own license in the license manager for the +Developer AMI. AWS tests the HDK with Synopsys VCS, Mentor +Questa/ModelSim, and Cadence Incisive. Licenses for these simulators +must be acquired by the Developer. + +**Is OpenCL and/or SDAccel Supported?** + +Yes. OpenCL is supported through either the Xilinx SDAccel tool or any +SDAccel tool capable of generating RTL supported by the Xilinx Vivado +synthesis tool. There is a branch in the AWS SDK tree for SDAccel. Note +that during the Preview period, SDAccel may not be available. + +**Can I use High Level Synthesis (HLS) Tools to generate an AFI?** + +Yes. Vivado HLS and SDAccel are directly supported through the FPGA +Developer AMI. Any HLS tool that generates compatible Verilog or VHDL +for Vivado input can also be used for writing in HLS. + +**Do I need to design for a specific power envelope?** + +Yes, the design scripts provided in the HDK include checks for power +consumption that exceeds the allocated power for the Custom Logic +region. Developers do not need to include design considerations for +DRAM, Shell, or Thermal. AWS includes the design considerations for +those as part of providing the power envelop for the CL region. + +**Is a simulation model of the AWS Shell available?** + +Yes. The HDK includes a simulation model for the AWS shell. See the +HDK/common tree for more information on the Shell simulation model. + +**What example CL designs are provided in the HDK?** + +There are two example designs provided in the HDK. There is a +hello\_world example that accepts reads and writes from an F1 instance. +There is a cl\_simple example that expands on hello\_world by adding +traffic generation to DRAM. Both examples are found in the +hdk/cl/examples directory. + +**What resources within the FPGA does the AWS Shell consume?** + +The Shell consumes 20% of the F1 FPGA resources. The nature of partial +reconfiguration consumes all resources (BRAM, URAM, Logic Elements, DSP, +etc) in the partition allocated for the AWS Shell. No modifications to +the Shell or the partition pins between the Shell and the CL are +possible by the Developer. + +**What IP blocks are provided in the HDK?** + +The HDK includes IP for the Shell and DDR controllers. Inside the Shell, +there is a PCIe interface, the Xilinx XDMA Engine, and one DDR +controller. These blocks are only accessible via the AXI interfaces +defined by the Shell interface. There are IP blocks for DDR controllers, +enabling up to 3 additional DDR interfaces instantiated by the Developer +in the CL region. Future versions of the HDK will include IP for the +FPGA Link interface. + +**Can I use other IP blocks from Xilinx or other 3^rd^ parties?** + +Yes. Developers are free to use any IP blocks within the CL region that +can be utilized by Vivado to create a Partial Reconfiguration region. +Note that AWS does not provide direct support for IP blocks not +contained in the HDK. 
+ +**What OS can run on the F1 instance?** + +Amazon Linux is supported directly on F1. Developers can utilize the +source code in the SDK directory to compile other variants of Linux for +use on F1. Windows is not supported on F1. + +**What support exists for host DMA?** + +There are two mechanisms for host DMA between the instance CPU and the +FPGA. The first is the Xilinx XDMA engine. This engine is included in +the AWS Shell and programmed through address space in a Physical +Function directly mapped to the instance. There are dedicated AXI +interfaces for data movement between the CL and the XDMA in the Shell. +The second is the capability for Developers to create their own DMA +engine in the CL region. Developers can create any DMA structure using +the CL to Shell AXI master interface. Interrupt support is through +MSI-X. See the Shell\_Interface document in HDK/docs for detailed +information. + +**What is the API for the host CPU to the FPGA?** + +There are two types of interface from the host (instance) CPU to the +FPGA. The first is the API for FPGA Image Management Tools. This API is +detailed in the SDK portion of the GitHub repository. FPGA Image +Management tools include APIs to load, clear, and get status of the +FPGA. The second type of interface is direct address access to the +Physical Functions (PF) of the FPGA. There is no API for this access. +Rather, there is direct access to resources in the CL region or Shell +that can be accessed by software written on the instance. For example, +the Chipscope software uses address space in a PF to provide debug +support in the FPGA. Developers can create any API to the resources in +their CL that is needed. See the Shell\_Interface specification for more +details on the PF mapping. + +**Is the FPGA a kernel or user space interface in the instance?** + +The address space in the FPGA can be interfaced via user space. + +**How do I change what AFI is loaded in an FPGA?** + +Changing the AFI loaded in an FPGA is done using the +fpga-load-local-image API from the FGPA Image Management tools. This +command takes the AFI ID and requests it to be programmed into the +identified FPGA. The AWS infrastructure manages the actual bitstream and +programming of the FPGA using Partial Reconfiguration. The AFI bitstream +is not stored in the F1 instance or AMI. The bitstream can’t be read or +modified within the FPGA by the instance. A users may call +fpga-load-local-image at any time during the life of an instance, and +may call fpga-load-local-image any number of times. + +**Will FPGA state be scrubbed?** + +Yes. The AWS infrastructure scrubs FPGA state on termination of an F1 +instance and any reuse of the FPGA hardware. Scrubbing includes both +FPGA internal state and the contents of DRAM attached to the FPGA. +Additionally, users can call the fpga-clear-local-image command from the +FPGA Image Management tools to force a clear of FPGA and DRAM contents +while the instance is running. + +**What does publishing to AWS Marketplace enable?** + +Publishing an AFI and AMI for F1 to AWS Marketplace enables Developers +to sell their AFI/AMI combination through the AWS Marketplace. Once in +Marketplace, customers can launch an F1 instance with that AFI/AMI +combination directly and be billed directly for the use of the instance +and AFI/AMI. Contact AWS Marketplace for more details on becoming an AWS +Marketplace seller. + +**How do the FPGAs connect to the Xeon CPU?** + +Each FPGA in F1 is connected via a x16 Gen3 PCIe interface. 
Physcial +Functions (PF) within the FPGA are directly mapped into the F1 instance. +Software on the instance can directly access the address in the PF to +take advantage of the high performance PCIe interface. + +**What network performance is available on F1?** + +F1 supports 20Gbps Networking using the AWS ENA interface. + +**Can the FPGAs on F1 directly access Amazon’s network?** + +No. The FPGAs do not have direct access to the network. The FPGAs +communicate via PCIe to the host CPU, where the ENA drivers are run. ENA +provides a high-performance, low-latency network interface suitable for +data movement to the F1 instance. See the AWS ENA driver documentation +for more details. + +**Can the FPGAs on F1 directly access the disks in the instance?** + +No. The FPGAs do not have direct access to the disks on F1. The disks on +F1 are high-performance, NVMe SSD devices. The interface to the host CPU +on the instance is high-performance and low-latency with NVMe. + +**What is FPGA direct and how fast is it?** + +FPGA direct is FPGA to FPGA peer communication through the PCIe links on +each FPGA. The BAR space in the Application PF (see Shell Interface +specification for more details) allows the Developer to map regions of +the CL (such as DDR space) to other FPGAs. The Developer can create +software to DMA data between FPGAs directly, without using Instance +memory as a buffer. The implementation of communication across the PCIe +interface using FPGA direct is left to the Developer. + +**What is FPGA link and how fast is it? ** + +FPGA Link is based on 4 x 100Gbps links on each FPGA card. The FPGA Link +is organized as a ring, with 2 x 100Gbps links to each adjacent card. +This enables each FPGA card to send/receive data from an adjacent card +at 200Gbps. Details on the FPGA Link interface are provided in the Shell +Interface specification when available. + +**What protocol is used for FPGA link?** + +There is no transport protocol for the FPGA link. It is a data streaming +interface. Details on the shell interface to the FPGA Link IP blocks are +provided in the Shell Interface specification when available. + +**What clock speed does the FPGA utilize?** + +The FPGA provides a 250MHz clock from the Shell to the CL region. The +AXI interfaces to the Shell are synchronous to that clock. Developers +can create an ansynchronous interface to the AXI busses and run their CL +region at any clock frequency needed. Clocks can be created in the CL +region using the Xilinx clock generation modules. See the Shell +Interface specification for more details. + +**What FPGA debug capabilities are supported?** + +There are two debug capabilities supported in F1 for FPGA debug. The +first is the use of Xilinx Chipscope. Xilinx Chipscope is natively +supported on F1 by running the FPGA Developer AMI on the F1 instance to +be debugged. The Chipscope software is included in the Developer AMI. +Not that Chipscope in the F1 instance uses a memory-mapped interface to +communicate with the FPGA. The JTAG/ICAP interface is not available to +the F1 instance. The second is the use metrics available through the +FPGA Image Management tools. The fpga-describe-local-image command +allows the F1 instance to query metrics from the Shell and Shell to CL +interface. See Shell Interface specification and FPGA Image Management +tools for more information on supported metrics. + +**What FPGA is used?** + +The FPGA for F1 is the Xilinx Ultrascale+ VU9P device in the -2 speed +grade. 
The HDK scripts have the compile scripts needed for the VU9P +device. + +**What memory is attached to the FPGA?** + +Each FPGA on F1 has 4 x DDR4 2400 interfaces at 72bits wide (64bit +data). Each DDR interface has 16GB of DRAM attached. This yields 64GB of +total DDR memory local to each FPGA on F1. From 05a2a3c048da596b7d3832db172b4ede2c6f4064 Mon Sep 17 00:00:00 2001 From: AWScpettey Date: Mon, 2 Jan 2017 14:37:41 -0600 Subject: [PATCH 10/29] Update FAQs.md --- FAQs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/FAQs.md b/FAQs.md index 1e4b2032e..ab283c218 100644 --- a/FAQs.md +++ b/FAQs.md @@ -341,7 +341,7 @@ software to DMA data between FPGAs directly, without using Instance memory as a buffer. The implementation of communication across the PCIe interface using FPGA direct is left to the Developer. -**What is FPGA link and how fast is it? ** +**What protocol is used for FPGA link?** FPGA Link is based on 4 x 100Gbps links on each FPGA card. The FPGA Link is organized as a ring, with 2 x 100Gbps links to each adjacent card. From 954a81c501d74a4021f9b62664ddd927ed584182 Mon Sep 17 00:00:00 2001 From: AWScpettey Date: Mon, 2 Jan 2017 14:38:38 -0600 Subject: [PATCH 11/29] Update FAQs.md --- FAQs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/FAQs.md b/FAQs.md index ab283c218..b6315ceef 100644 --- a/FAQs.md +++ b/FAQs.md @@ -341,7 +341,7 @@ software to DMA data between FPGAs directly, without using Instance memory as a buffer. The implementation of communication across the PCIe interface using FPGA direct is left to the Developer. -**What protocol is used for FPGA link?** +**What is FPGA Link and how fast is it?** FPGA Link is based on 4 x 100Gbps links on each FPGA card. The FPGA Link is organized as a ring, with 2 x 100Gbps links to each adjacent card. From 4e8c5987a79a987e87dca2107ae9534c95bd2b95 Mon Sep 17 00:00:00 2001 From: Carlos Cabral Date: Wed, 4 Jan 2017 09:18:24 -0800 Subject: [PATCH 12/29] README.md Change-Id: I38d77604e0a62098a15b2b774fcad8cac0f7300e --- README.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/README.md b/README.md index 486a5e4de..8f97f4539 100644 --- a/README.md +++ b/README.md @@ -49,15 +49,15 @@ We recommend that you initiate the generation in a way that prevents interruptio For example, if working on a remote machine, we recommend using window management tools such as [`screen`](https://www.gnu.org/software/screen/manual/screen.html) to mitigate potential network disconnects. 
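For instance, one way to keep the long-running build alive across a dropped SSH connection is to launch it inside a named `screen` session (the session name below is purely illustrative):

```
$ screen -S dcp_build      # start a named session and run the build steps inside it
$ # ... run the steps shown below ...
$ # detach with Ctrl-a d; re-attach later with:
$ screen -r dcp_build
```

The full sequence of steps, from cloning the repository to submitting the design checkpoint, is: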
``` -$ git clone https://github.com/aws/aws-fpga # Step 1: Download the HDK and SDK code -$ cd aws-fpga # Step 2: Move to the root directory -$ source hdk_setup.sh # Step 3: Set up the HDK environment variables -$ cd hdk/cl/examples/cl_simple # Step 4: Change directory to one of the provided examples -$ export CL_DIR=$(pwd) # Step 5: Define this directory as the root for the CL design -$ cd build/scripts # Step 6: The build directory for synthesizing, placement, timing etc -$ source aws_build_dcp_from_cl.sh # Step 7: Generate a placed-and-routed design checkpoint (DCP) -$ cd $CL_DIR/build/checkpoints/to_aws # Step 8: This directory includes the DCP file -$ aws s3 mb s3:// # Step 9: Create an S3 bucket (choose a unique bucket name) +$ git clone https://github.com/aws/aws-fpga # Step 1: Download the HDK and SDK code +$ cd aws-fpga # Step 2: Move to the root directory +$ source hdk_setup.sh # Step 3: Set up the HDK environment variables +$ cd hdk/cl/examples/cl_simple # Step 4: Change directory to one of the provided examples +$ export CL_DIR=$(pwd) # Step 5: Define this directory as the root for the CL design +$ cd build/scripts # Step 6: The build directory for synthesizing, placement, timing etc +$ source aws_build_dcp_from_cl.sh # Step 7: Generate a placed-and-routed design checkpoint (DCP) +$ cd $CL_DIR/build/checkpoints/to_aws # Step 8: This directory includes the DCP file +$ aws s3 mb s3:// # Step 9: Create an S3 bucket (choose a unique bucket name) $ aws s3 cp *.SH_CL_routed.dcp \ # Step 10: Upload the DCP file to S3 s3:///cl_simple.dcp $ aws ec2 create-fpga-image \ # Step 11: Ingest the generated DCP to create an AFI From 05e66522d1881131ab44ad551d106075e6c4aa6a Mon Sep 17 00:00:00 2001 From: Carlos Cabral Date: Wed, 4 Jan 2017 09:21:10 -0800 Subject: [PATCH 13/29] Minor edit. Change-Id: I34122b70020f664fbdefbad390a4d0581df557b6 --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 8f97f4539..94e7aef88 100644 --- a/README.md +++ b/README.md @@ -57,7 +57,7 @@ $ export CL_DIR=$(pwd) # Step 5: Define this directory $ cd build/scripts # Step 6: The build directory for synthesizing, placement, timing etc $ source aws_build_dcp_from_cl.sh # Step 7: Generate a placed-and-routed design checkpoint (DCP) $ cd $CL_DIR/build/checkpoints/to_aws # Step 8: This directory includes the DCP file -$ aws s3 mb s3:// # Step 9: Create an S3 bucket (choose a unique bucket name) +$ aws s3 mb s3:// # Step 9: Create an S3 bucket (choose a unique bucket name) $ aws s3 cp *.SH_CL_routed.dcp \ # Step 10: Upload the DCP file to S3 s3:///cl_simple.dcp $ aws ec2 create-fpga-image \ # Step 11: Ingest the generated DCP to create an AFI From b71d963dd066fe745527a47279118eedac190534 Mon Sep 17 00:00:00 2001 From: AWScccabra Date: Wed, 4 Jan 2017 16:26:49 -0600 Subject: [PATCH 14/29] Update README.md --- README.md | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 94e7aef88..2b1fc183e 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,13 @@ -# AWS EC2 FPGA Hardware and Software Development Kit +# Table of Contents + +1. [AWS EC2 FPGA Hardware and Software Development Kits] (#devkit) + - [FPGA Hardware Development Kit (HDK)] (hdk/README.md) + - [FPGA Software Development Kit (SDK)] (sdk/README.md) +2. 
[Quick Start] (#quickstart) + +# AWS EC2 FPGA Hardware and Software Development Kits This release includes two portions: [HDK](./hdk) for developing Amazon FPGA Image (AFI), and [SDK](./sdk) for using AFI on FPGA-enabled EC2 instances [such as F1](https://aws.amazon.com/ec2/instance-types/f1/). @@ -31,7 +38,7 @@ FPGA developer AMI will be prefixed with F1 During private access period, developers are emailed with details on how to get started with the AMI, terms and conditions and additional info on how to get started using F1 instances. Please email aws-fpga-developer-support@amazon.com for questions regarding developer AMI. -# Quick Start +# Quick Start ## Building an Example AFI From eac7664f88583eb60773dfa079a1499602e82a19 Mon Sep 17 00:00:00 2001 From: AWScccabra Date: Wed, 4 Jan 2017 16:55:55 -0600 Subject: [PATCH 15/29] Update README.md --- hdk/README.md | 110 +++++++++----------------------------------------- 1 file changed, 20 insertions(+), 90 deletions(-) diff --git a/hdk/README.md b/hdk/README.md index cac6bd388..db5009f76 100644 --- a/hdk/README.md +++ b/hdk/README.md @@ -2,14 +2,24 @@ [![API Reference](http://img.shields.io/badge/api-reference-blue.svg)](http://docs.aws.amazon.com/techdoc/fpga) [![Join the chat at https://gitter.im/aws/aws-fpga](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/aws/aws-fpga?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) -<<<<<<< HEAD AWS FPGA HDK is the official kit for developing an Amazon FPGA Image (AFI) which can be loaded on FPGAs in FPGA-enabled EC2 instances (i.e. F1 Instance). Check out the [release notes](../RELEASE_NOTES.md) for information about the latest bug fixes, updates, and features added to the HDK. -## Overview +## Table of Contents +1. [Overview] (#overview) +2. [Getting Started] (#gettingstarted) + - [Xilinx Vivado Tools and License Requirements] (#vivado) + - [HDK Installation and Environment Setup] (#setup) + - [Custom Logic (CL) Examples] (#examples) + - [Start Custom Logic (CL) Design] (#startcl) + - [Simulate Custom Logic (CL) Design] (#simcl) + - [Build Custom Logic (CL) Design for AWS] (#buildcl) +3. [Frequently Asked Questions (FAQ)] (#faq) + +## Overview The AWS FPGA HDK includes all the design files and scripts required to generate an Amazon FPGA Image (AFI). Developers can download the HDK and use it in their preferred design environment. AWS offers the `FPGA Developer AMI` on the [AWS Marketplace](https://aws.amazon.com/marketplace) with the required tools to develop, simulate, and build an AFI. @@ -25,15 +35,15 @@ The [Custom Logic (cl) directory](./cl) is where the Custom Logic is expected to The HDK also includes test benches for each provided example, and instructions on how to run RTL-level simulations. -## Getting Started +## Getting Started -### Have an instance or server with Xilinx Vivado tools and License +### Have an instance or server with Xilinx Vivado tools and License To get started, the developer needs to have a development environment with Xilinx Vivado tools installed. An easy way to get this by using the AWS FPGA Developer AMI and following the instructions inside the README.md of that AMI. Please refer to the [release notes](../RELEASE_NOTES.md) for the exact version of Vivado tools, and the required license components. 
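Before moving on, it can help to confirm that the expected Vivado installation is actually visible in your shell, whether you are on the FPGA Developer AMI or your own machine. A minimal check (assuming Vivado's settings script has already been sourced) is:

```
$ which vivado         # confirm the tool is on the PATH
$ vivado -version      # compare the reported version against the release notes
```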
-### Install the HDK and setup environment +### Install the HDK and setup environment The AWS FPGA HDK can be cloned to your EC2 instance or server by executing: @@ -41,28 +51,28 @@ The AWS FPGA HDK can be cloned to your EC2 instance or server by executing: $ cd aws-fpga $ source hdk_setup.sh -### Try out a "Hello World" example and others +### Try out a "Hello World" example and others The [Getting started with CL examples](./cl/examples/Getting_Started_With_CL_Examples.md) walks you through how to build, register, and use an AFI. The [Hello World readme](./cl/examples/cl_hello_world/README.md) provides the steps to build an AFI from the provided Hello World example CL, and how to load it on an F1 instance. Other examples are available in the [examples directory](./cl/examples), each with its own README.md file. -### Start your own Custom Logic design +### Start your own Custom Logic design The [start your own CL design](./cl/developer_designs/README.md) will guide you on how to setup your own CL project environment. -### Simulate your Custom Logic design +### Simulate your Custom Logic design You can use Vivado XSIM simulator, or bring your own simulator (like Synopsys', Mentor's, or Cadence). Follow the [verification environment setup](https://github.com/aws/aws-fpga/wiki/Simulating-CL-Designs-(RTL-Simulation)#introduction) to run these simulations -### Build and submit the Custom Logic to AWS for generating an AFI +### Build and submit the Custom Logic to AWS for generating an AFI You can follow the [build scripts readme](./common/shell_current/new_cl_template/build/README.md) for step-by-step instructions on how to setup the scripts and run the build process. This [checklist](./cl/CHECKLIST_BEFORE_BUILDING_CL.md) should be consulted before you start the build process. -## FAQ +## FAQ ### Does the HDK Include DMA? The current release of the HDK does not include DMA. Upcoming releases will include both Xilinx's XDMA and AWS EDMA in the HDK and their respective drivers in the SDK. @@ -79,83 +89,3 @@ The HDK does not currently support chipscope debug, but this will be enabled in ### Does the HDK support dynamic Partial Reconfiguration? The HDK supports dynamic partial reconfiguration (PR) of the Custom Logic. Each AFI is actually a partial bitstream, and AFI's can be swapped during operation. Using [FPGA Management Tools provided by the SDK](../sdk/management/fpga_image_tools), the users can load/unload AFIs from within the instance. **NOTE: Users can only load/unload AFI-id(s) that have been associated a priori to the instance-id or the AMI-id** - - - -======= - - -AWS FPGA HDK is the official kit for developing an Amazon FPGA Image (AFI) which can be loaded on FPGAs in FPGA-enabled EC2 instances (i.e. F1 Instance). - -Check out the [release notes](../RELEASE_NOTES.md) for information about the latest bug fixes, updates, and features added to the HDK. - -## Overview - -The AWS FPGA HDK includes all the design files and scripts required to generate an Amazon FPGA Image (AFI). Developers can download the HDK and use it in their preferred design environment. AWS offers the `FPGA Developer AMI` on the [AWS Marketplace](https://aws.amazon.com/marketplace) with the required tools to develop, simulate, and build an AFI. - -**NOTE:** The HDK is developed and tested in a **Linux** environment only - -### Content of the release - -The [documents directory](./docs) provides the specification for the AWS Shell (SH) to Custom Logic (CL) interface, and best practices for CL design and development. 
- -The [common directory](./common) includes scripts, timing constraints and compile settings required during the AFI generation process. Developers should not change these files. - -The [Custom Logic (cl) directory](./cl) is where the Custom Logic is expected to be developed. It includes a number of examples under the [examples directory](./cl/examples), as well as a placeholder for the developer's own Custom Logic under [developer_designs directory](./cl/developer_designs). - -The HDK also includes test benches for each provided example, and instructions on how to run RTL-level simulations. - -## Getting Started - -### Have an instance or server with Xilinx Vivado tools and License - -To get started, the developer needs to have a development environment with Xilinx Vivado tools installed. An easy way to get this by using the AWS FPGA Developer AMI and following the instructions inside the README.md of that AMI. - -Please refer to the [release notes](../RELEASE_NOTES.md) for the exact version of Vivado tools, and the required license components. - -### Install the HDK and setup environment - -The AWS FPGA HDK can be cloned to your EC2 instance or server by executing: - - $ git clone https://github.com/aws/aws-fpga - $ cd aws-fpga - $ source hdk_setup.sh - -### Try out a "Hello World" example and others - -The [Getting started with CL examples](./cl/examples/README.md) walks you through how to build, register, and use an AFI. -The [Hello World readme](./cl/examples/cl_hello_world/README.md) provides the steps to build an AFI from the provided Hello World example CL, and how to load it on an F1 instance. -Other examples are available in the [examples directory](./cl/examples), each with its own README.md file. - - -### Start your own Custom Logic design - -The [start your own CL design](./cl/developer_designs/README.md) will guide you on how to setup your own CL project environment. - -### Simulate your Custom Logic design - -You can use Vivado XSIM simulator, or bring your own simulator (like Synopsys', Mentor's, or Cadence). -Follow the [verification environment setup](https://github.com/aws/aws-fpga/wiki/Simulating-CL-Designs-(RTL-Simulation)#introduction) to run these simulations - -### Build and submit the Custom Logic to AWS for generating an AFI - -You can follow the [build scripts readme](./common/shell_current/new_cl_template/build/README.md) for step-by-step instructions on how to setup the scripts and run the build process. -This [checklist](./cl/CHECKLIST_BEFORE_BUILDING_CL.md) should be consulted before you start the build process. - -## FAQ - -### Does the HDK Include DMA? -The current release of the HDK does not include DMA. Upcoming releases will include both Xilinx's XDMA and AWS EDMA in the HDK and their respective drivers in the SDK. - -### Does the HDK support OpenCL? -The current release of the HDK does not include OpenCL support. - -### Does the HDK support SDAccel? -The current release of the HDK does not include SDAccel support. - -### Does the HDK support Chipscope? -The HDK does not currently support chipscope debug, but this will be enabled in upcoming HDK/SDK releases. - -### Does the HDK support dynamic Partial Reconfiguration? -The HDK supports dynamic partial reconfiguration (PR) of the Custom Logic. Each AFI is actually a partial bitstream, and AFI's can be swapped during operation. Using [FPGA Management Tools provided by the SDK](../sdk/management/fpga_image_tools), the users can load/unload AFIs from within the instance. 
**NOTE: Users can only load/unload AFI-id(s) that have been associated a priori to the instance-id or the AMI-id** ->>>>>>> master From fc6101873dda91bbd1b3f62b03ff14a758515bf1 Mon Sep 17 00:00:00 2001 From: AWSNB Date: Wed, 4 Jan 2017 18:44:20 -0800 Subject: [PATCH 16/29] Update README.md --- hdk/README.md | 6 ------ 1 file changed, 6 deletions(-) diff --git a/hdk/README.md b/hdk/README.md index db5009f76..bb28c2f0d 100644 --- a/hdk/README.md +++ b/hdk/README.md @@ -1,10 +1,4 @@ # AWS FPGA HDK - -[![API Reference](http://img.shields.io/badge/api-reference-blue.svg)](http://docs.aws.amazon.com/techdoc/fpga) -[![Join the chat at https://gitter.im/aws/aws-fpga](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/aws/aws-fpga?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) - - -AWS FPGA HDK is the official kit for developing an Amazon FPGA Image (AFI) which can be loaded on FPGAs in FPGA-enabled EC2 instances (i.e. F1 Instance). Check out the [release notes](../RELEASE_NOTES.md) for information about the latest bug fixes, updates, and features added to the HDK. From cfa769e2ea2fb8618b4311267c238baf753030dd Mon Sep 17 00:00:00 2001 From: AWSNB Date: Wed, 4 Jan 2017 19:08:34 -0800 Subject: [PATCH 17/29] initial edits --- FAQs.md | 90 ++++++++++++++++++++++++--------------------------------- 1 file changed, 37 insertions(+), 53 deletions(-) diff --git a/FAQs.md b/FAQs.md index b6315ceef..461068cc8 100644 --- a/FAQs.md +++ b/FAQs.md @@ -1,74 +1,42 @@ -**Frequently Asked Questions** +#AWS FPGA - Frequently Asked Questions -**What do I need to get started on building accelerators for FPGA -instances?** +**Q: What do I need to get started on building accelerators for FPGA +instances?* -Getting started requires downloading the latest HDK and SDK from the AWS -FPGA GitHub repository. The HDK and SDK provide the needed code and -information for building FPGA code. The HDK provides all the information -needed on building source code for use within the FPGA. The SDK provides -all the information needed on building software for managing FPGAs on an -F1 instance. +Getting started requires downloading the latest HDK and SDK from the AWS FPGA GitHub repository. The HDK and SDK provide the needed code and information for building FPGA code. The HDK provides all the information needed for developing an FPGA image from source code, while the SDK provides all the runtime software for managing Amazon FPGAs image (AFI) on loaded into F1 instance FPGA. -FPGA code requires a simulator to test code and a Vivado tool set for -synthesis of source code into compiled FPGA code. The FPGA Developer AMI -includes the Xilinx Vivado tools for simulation and synthesis of -compiled FPGA code. +Typically, FPGA development process requires a simulator to perform functional test on the source code, and a Vivado tool set for synthesis of source code into compiled FPGA code. The FPGA Developer AMI provided by AWS includes the complete Xilinx Vivado tools for simulation (XSMI) and synthesis of FPGA . -**How do I develop accelerator code for an FPGA in an F1 instance?** +**Q: How do I develop accelerator code for an FPGA in an F1 instance?** -Start with the Shell interface specification: -AWS\_Shell\_Interface\_Specification.md. This document describes the -interface between Custom Logic and the AWS Shell. All Custom Logic for -an accelerator resides within the Custom Logic region of the F1 FPGA. +Start with the [Shell interface specification](./hdk/docs/AWS_Shell_Interface_Specification.md). 
This document describes the interface between Custom Logic and the AWS Shell. All Custom Logic for an accelerator resides within the Custom Logic region of the F1 FPGA. -**What are the major areas of the GitHub repository?** +The [HDK README](./hdk/README.md) walks the developer through the steps to build an FPGA image from one of the provided examples as well starting a new code -The HDK side of the GitHub repository contains the AWS Shell code, Build -scripts, Documentation, and Examples. Shell code is contained in -aws-fpga/hdk/common. Build scripts are in -aws-fpga/hdk/common/shell\_current/build. Documentation is in -aws-fpga/hdk/docs. Custom Logic examples are in aws-fpga/hdk/cl. +**Q: What is included in the HDK?** -The SDK side of the GitHub repository contains the FPGA Management -Tools, a preview of the AWS CLI for F1, and software for Xilinx XDMA and -SDAccell. The FPGA Management Tools are for loading/clearing AFIs and -getting status of the FPGAs mapped to an instance. FPGA Management Tools -are in aws-fpga/sdk/management. The AWS CLI preview is in -aws-fpga/sdk/aws-cli-preview. - -**What is included in the HDK?** - -The HDK includes documentation for the Shell interface and other Custom -Logic implementation guidelines, the Shell code needed for Custom Logic +The HDK includes major portions: +1) Documentation for the Shell interface and other Custom Logic implementation guidelines, the Shell code needed for Custom Logic development, simulation models for the Shell, software for exercising -the Custom Logic examples, a getting started guide for Custom Logic, and + +2) Custom Logic examples, a getting started guide for building your own Custom Logic, and examples for starting a Custom Logic Design. -**What is in the AWS Shell?** +3) Scripts for building and submitting Amazon FPGA Image (AFI) from a Custom Logic -The AWS Shell includes the PCIe interface for the FPGA, a single DDR -interface, and necessary FPGA management functionality. Also provided as -part of the Shell code, but implemented within the Custom Logic region -of the FPGA are three DDR interfaces. These interfaces are provided for -implementation within the Custom Logic region to provide maximum -efficiency for the developer. +4) Reference software drivers to be used in conjunction with the Custom Logic examples -**Are there examples for getting started on accelerators?** +5) RTL Simulation models and RTL simula -Yes, examples are in the aws-fpga/hdk/cl/examples directory. The -cl\_hello\_world example is a simple example to build and test the CL -development process. The cl\_simple example provides an expanded example -for testing access to the DDR interfaces. +**Q: What is in the AWS Shell?** -**How do I get access to the Developer AMI?** +The AWS Shell is a piece of code provided and managed by AWS, that does a lot of the non-differentiated heavy lefting like setting up the PCIe interface, and FPGA image loading infrastructure, security and operational isolation, metrics and debug hooks -Currently, the FPGA Developer AMI is private and you will need to be whitelisted. You will -receive permission and notifications via email. Email aws-fpga-developer-support@amazon.com with any questions -See the FPGA Developer AMI README for more details. +Every FPGA deployed in AWS cloud includes AWS shell, and the developer Custom Logic (CL) actually interfaces with the available AWS Shell interfaces. +AWS itselfs includes the PCIe interface for the FPGA, and necessary FPGA management functionality. 
One of the four DRAM interface controllers is included in the Shell, while the three other DRAM interface controllers is expected to be instanciated in the Custom Logic code (A design choice that was made to achieve optimal utilization of FPGA resources from placement perspective) -**What is an AFI?** +**Q: What is an AFI?** An AFI stands for Amazon FPGA Image. That is the compiled FPGA code that is loaded into an FPGA for performing the Custom Logic function created @@ -77,6 +45,22 @@ account that created them. An AFI ID is used to reference a particular AFI from an F1 instance. The AFI ID is used to indicate the AFI that should be loaded into a specific FPGA within the instance. +**Q: Are there examples for getting started on accelerators?** + +Yes, examples are in the [examples directory](./hdk/cl/examples): + +The [cl_hello_world example](.hdk/cl/examples/cl_hello_world) is an RTL/Verilog simple example to build and test the CL development process, it does not use any of the external interfaces of the FPGA except the PCIe. + +The [cl_simple example]((.hdk/cl/examples/cl_simple) provides an expanded example for testing access to the DRAM interfaces. + +**Q: How do I get access to AWS FPGA Developer AMI?** + +Currently, the FPGA Developer AMI is private and you will need to be whitelisted. You will receive permission and notifications via email. Email aws-fpga-developer-support@amazon.com with any questions. + +Once you got access to the FPGA Developer AMI, we suggest you read the the README file within the FPGA Developer for more details. + +XXXXX + **What is the process for creating an AFI?** The AFI process starts by creating Custom Logic code that conforms to From 487b5f15952d78a210bf5cefdb19343201247f5f Mon Sep 17 00:00:00 2001 From: AWSNB Date: Wed, 4 Jan 2017 22:36:07 -0800 Subject: [PATCH 18/29] 2nd wave of edits, one more left --- FAQs.md | 264 ++++++++++++++++++++++++-------------------------------- 1 file changed, 112 insertions(+), 152 deletions(-) diff --git a/FAQs.md b/FAQs.md index 461068cc8..8825df3fc 100644 --- a/FAQs.md +++ b/FAQs.md @@ -5,7 +5,8 @@ instances?* Getting started requires downloading the latest HDK and SDK from the AWS FPGA GitHub repository. The HDK and SDK provide the needed code and information for building FPGA code. The HDK provides all the information needed for developing an FPGA image from source code, while the SDK provides all the runtime software for managing Amazon FPGAs image (AFI) on loaded into F1 instance FPGA. -Typically, FPGA development process requires a simulator to perform functional test on the source code, and a Vivado tool set for synthesis of source code into compiled FPGA code. The FPGA Developer AMI provided by AWS includes the complete Xilinx Vivado tools for simulation (XSMI) and synthesis of FPGA . +Typically, FPGA development process requires a simulator to perform functional test on the source code, and a Vivado tool set for synthesis of source code into compiled FPGA code. The FPGA Developer AMI provided by AWS includes the complete Xilinx Vivado tools for simulation (XSIM) and synthesis of FPGA . 
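As a minimal sketch of those first steps (the repository URL and setup script are the ones used elsewhere in this repository's documentation; the simulation and synthesis runs themselves are covered in the HDK README):

```
$ git clone https://github.com/aws/aws-fpga    # download the HDK and SDK
$ cd aws-fpga
$ source hdk_setup.sh                          # set up the HDK environment variables
$ vivado -version                              # confirm the Vivado tools are installed
```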
+ **Q: How do I develop accelerator code for an FPGA in an F1 instance?** @@ -13,6 +14,7 @@ Start with the [Shell interface specification](./hdk/docs/AWS_Shell_Interface_Sp The [HDK README](./hdk/README.md) walks the developer through the steps to build an FPGA image from one of the provided examples as well starting a new code + **Q: What is included in the HDK?** The HDK includes major portions: @@ -28,6 +30,7 @@ examples for starting a Custom Logic Design. 5) RTL Simulation models and RTL simula + **Q: What is in the AWS Shell?** The AWS Shell is a piece of code provided and managed by AWS, that does a lot of the non-differentiated heavy lefting like setting up the PCIe interface, and FPGA image loading infrastructure, security and operational isolation, metrics and debug hooks @@ -36,22 +39,24 @@ Every FPGA deployed in AWS cloud includes AWS shell, and the developer Custom Lo AWS itselfs includes the PCIe interface for the FPGA, and necessary FPGA management functionality. One of the four DRAM interface controllers is included in the Shell, while the three other DRAM interface controllers is expected to be instanciated in the Custom Logic code (A design choice that was made to achieve optimal utilization of FPGA resources from placement perspective) + **Q: What is an AFI?** -An AFI stands for Amazon FPGA Image. That is the compiled FPGA code that -is loaded into an FPGA for performing the Custom Logic function created -by the developer. AFIs are maintained by AWS according to the AWS -account that created them. An AFI ID is used to reference a particular -AFI from an F1 instance. The AFI ID is used to indicate the AFI that -should be loaded into a specific FPGA within the instance. +An AFI stands for Amazon FPGA Image. That is the compiled FPGA code that is loaded into an FPGA in AWS for performing the Custom Logic function created by the developer. AFIs are maintained by AWS according to the AWS account that created them. An AFI ID is used to reference a particular AFI from an F1 instance. + +The developer can create multiple AFIs at no extra cost, up to a defined limited (typically 100 AFIs per AWS account). An AFI can be loaded as many FPGAs as the developer wants. + +A given instance can only load AFIs that has been assoicated with the instance or with the AMI that created the instance. Please refer to AFI documentation in [AWS AFI docs](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AFI) + **Q: Are there examples for getting started on accelerators?** Yes, examples are in the [examples directory](./hdk/cl/examples): -The [cl_hello_world example](.hdk/cl/examples/cl_hello_world) is an RTL/Verilog simple example to build and test the CL development process, it does not use any of the external interfaces of the FPGA except the PCIe. +The [cl_hello_world example](./hdk/cl/examples/cl_hello_world) is an RTL/Verilog simple example to build and test the CL development process, it does not use any of the external interfaces of the FPGA except the PCIe. + +The [cl_simple example](.hdk/cl/examples/cl_simple) provides an expanded example for testing access to the DRAM interfaces. -The [cl_simple example]((.hdk/cl/examples/cl_simple) provides an expanded example for testing access to the DRAM interfaces. **Q: How do I get access to AWS FPGA Developer AMI?** @@ -59,176 +64,137 @@ Currently, the FPGA Developer AMI is private and you will need to be whitelisted Once you got access to the FPGA Developer AMI, we suggest you read the the README file within the FPGA Developer for more details. 
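To make the AFI ID mentioned above concrete: once an AFI has been associated with your instance, it is loaded and inspected from within the instance using the FPGA Management Tools covered elsewhere in this FAQ. The snippet below is only an illustrative sketch; the slot number and AFI ID are placeholders, and the exact option spellings should be checked against the SDK documentation:

```
$ sudo fpga-load-local-image -S 0 -I agfi-0123456789abcdef   # load the AFI into FPGA slot 0
$ sudo fpga-describe-local-image -S 0                        # check which AFI is loaded and its status
$ sudo fpga-clear-local-image -S 0                           # clear the slot when done
```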
-XXXXX -**What is the process for creating an AFI?** +**Q: What is the process for creating an AFI?** -The AFI process starts by creating Custom Logic code that conforms to -the Shell Specification. Then, the Custom Logic must be compiled using -the Vivado tools to create a Design Checkpoint. That Design Checkpoint -is submitted to AWS for generating an AFI using the API. +The AFI process starts by creating Custom Logic code that conforms to the [Shell Specification]((./hdk/docs/AWS_Shell_Interface_Specification.md). Then, the Custom Logic must be compiled using the HDK scripts which leverages Vivado tools to create a Design Checkpoint. That Design Checkpoint is submitted to AWS for generating an AFI using the `aws ec2 create-fpga-image` API. -See aws-fpga/hdk/cl and aws-fpga/hdk/cl/examples for more detailed -information. -**Is there any software I need on my instance?** +**Q: Is there any software I need on my F1 instance that will use the AFI?** + +The required AWS software is the [FPGA Management Tool set](./SDK/ManagementTools). This software manages loading and clearing AFIs for FPGAs in the instance. It also allows developers to retrieve status on the FPGAs from within the instance. -The required AWS software is the FPGA Management Tool set found in the -SDK directory. This software manages loading and clearing AFIs for FPGAs -in the instance. It also allows developers to retrieve status on the -FPGAs from within the instance. See the README in aws-fpga/sdk for more -details. +Typically, you will not need the HDK nor any Xilinx vivado tools on F1 instance that using AFIs, unless you want to do in-field debug using Vivado's chipscope. -**Why do I see error “vivado not found” while running hdk\_setup.sh** -This is an indication that Xilinx vivado tool set are not installed. Try -installing the tool, or alternative use AWS FPGA Development AMI -available on AWS Marketplace, which comes with pre-installed Vivado -toolset and license +**Q: Why do I see error “vivado not found” while running hdk_setup.sh*?* -**Do AWS Marketplace customers see FPGA source code or a bitstream?** +This is an indication that Xilinx vivado tool set are not installed. Try installing the tool if you are working on your own environment, or alternative use AWS FPGA Development AMI available on AWS Marketplace, which comes with pre-installed Vivado toolset and license. -Neither: AWS Marketplace customers that pick up an AMI with with one our -more AFIs associated with it will not see any source code nor bitstream. -Marketplace customers actually have permission to use the AFI but not -permission to see its code. The only reference to the AFI is through the -AFI ID. The Customer would call fpga-local-load-image with the correct -AFI ID for that Marketplace offering, which will result in AWS loading -the AFI into the FPGA. No FPGA internal design code is exposed. -**Why did my example job run and die without generating a DCP file?** +**Q: How can i publish my AFI to AWS Marketplace?** + +First, you should create an AMI that includes the drivers and runtime libraries needed to use the AFI. Then you would need to associate one or more of the AFIs you developed to the AMI. And lastely, follow the standard flow for publish AMI on AWS marketplace. + +In other words, AFIs are not published directly on AWS marketplace, rather AFI(s) should be associated with an AMI and the AMI get published. 
+ + +**Q: Do AWS Marketplace customers see FPGA source code or a bitstream?** + +Neither: AWS Marketplace customers that pick up an AMI with with one our more AFIs associated with it will not see any source code nor bitstream. Marketplace customers actually have permission to use the AFI but not permission to see its code. The only reference to the AFI is through the AFI ID. The Customer would call fpga-local-load-image with the correct AFI ID for that Marketplace offering, which will result in **AWS loading the AFI into the FPGA** in sideband and without sending the AFI code through the customer's instance. No FPGA internal design code is exposed. + + +**Q: Why did my example job run and die without generating a DCP file?** The error message below indicates that you ran out of memory. Restart your instance -with a different instance type that has 8GiB or more. +with a different instance type that has 32GiB or more. Finished applying 'set_property' XDC Constraints : Time (s): cpu = 00:06:26 ; elapsed = 00:08:59 . Memory (MB): peak = 4032.184 ; gain = 3031.297 ; free physical = 1285 ; free virtual = 1957 /opt/Xilinx/Vivado/2016.3/bin/loader: line 164: 8160 Killed "$RDI_PROG" "$@" Parent process (pid 8160) has died. This helper process will now exit -**Can I bring my own bitstream for loading on an F1 FPGA?** -No. There is no mechanism for loading a bitstream directly onto the -FPGAs of an F1 instance. All Custom Logic bitstreams are loaded onto the -FPGA by AWS. Developers create an AFI by creating a Vivado Design -Checkpoint (DCP) and submitting that DCP to AWS. AWS creates the final -AFI and bitstream from that DCP and returns an AFI ID for referencing -that AFI. +**Q: Can I bring my own bitstream for loading on an F1 FPGA?** + +No. There is no mechanism for loading a bitstream directly onto the FPGAs of an F1 instance. All Custom Logic bitstreams are loaded onto the FPGA by AWS. Developers create an AFI by creating a Vivado Design Checkpoint (DCP) and submitting that DCP to AWS. AWS creates the final AFI and bitstream from that DCP and returns an AFI ID for referencing that AFI. + + +**Q: Do I need to interface to the AWS Shell?** -**Do I need to interface to the AWS Shell?** +Yes. The only interface to PCIe and the instance CPU is through the AWS shell. The AWS Shell is included in all F1 FPGAs. There is no option to run the F1 FPGA without the Shell. The Shell takes care of the non-differentiating heavy lefting like PCIe tuning, FPGA I/O assigment, power and thermal management, and runtimr health monitoring. -Yes. The only interface to PCIe and the instance CPU is through the AWS -shell. The AWS Shell is included in all F1 FPGAs. There is no option to -run the F1 FPGA without the Shell. -**Can I generate my bitstream locally?** +**Q: Can I generate my bitstream on my own desktop/server (not on AWS cloud)?** -Yes, local tools can be used to develop the DCP needed for creating an -AFI. The HDK can be downloaded from GitHub and run on any local machine. -If a Developer uses local tools, the exact tool version specified in the -HDK and FPGA Developer AMI will need to be used. Note that AWS does not -provide support for generating a bitstream and testing that bitstream. +Yes, on-premise tools can be used to develop the (Design checkpoint) DCP needed for creating an AFI. The developer needs to download HDK can be downloaded from GitHub and run on any local machine. 
-**Do I need to get a Xilinx license to generate an AFI?** +If a Developer uses local tools and license, the exact Xilinx Vivado tool version specified in the HDK and FPGA Developer AMI will need to be use. -No, if the Developer uses the FPGA Developer AMI, Xilinx licenses for -simulation and DCP generation are included. Ingestion of a DCP to + +**Q: Do I need to get a Xilinx license to generate an AFI?** + +If the Developer uses the FPGA Developer AMI, Xilinx licenses for simulation, encryption, SDAccel and DCP generation are included. Ingestion of a DCP to generate an AFI is handled by AWS. No license is needed for DCP to AFI generation. If a local machine is used for development, the Developer is responsible for obtaining any necessary licenses. AWS only directly support cloud development using the AWS Developer AMI. -**Does AWS provide local development boards?** -No. AWS supports a cloud-only development model and provides the -necessary elements for doing 100% cloud development. No development -board is provided for on-premise development. +**Q: Does AWS provide actual FPGA boards for on-premise developer?** + +No. AWS supports a cloud-only development model and provides the necessary elements for doing 100% cloud development including Virtual JTAG (Vivado ChipScope), Emulated LED and Emulated DIP-switch. No development board is provided for on-premise development. + + +**Q: Which HDL languages are supported?** -**Which HDL languages are supported?** +For RTL level development: Verilog and VHDL are both supported in the FPGA Developer AMI and in generating a DCP. The Xilinx Vivado tools and simulator support mixed mode simulation of Verilog and VHDL. The AWS Shell is written in Verilog. Support for mixed mode simulation may vary if Developers use other simulators. Check your simulator documentation for Verilog/VHDL/System Verilog support. -Verilog and HVDL are both supported in the FPGA Developer AMI and in -generating a DCP. The Xilinx Vivado tools and simulator support mixed -mode simulation of Verilog and VHDL. The AWS Shell is written in -Verilog. Support for mixed mode simulation may vary if Developers use -other simulators. Check your simulator documentation for -Verilog/VHDL/System Verilog support. +**Q: What RTL simulators are supported?** -**What simulators are supported?** +The FPGA Developer AMI has built-in support for the Xilinx XSIM simulator. All licensing and software for XSIM is included in the +FPGA Developer AMI when launched. -The FPGA Developer AMI has built-in support for the Xilinx XSIM -simulator. All licensing and software for XSIM is included in the -Developer AMI when launched. Support for other simulators is included -through the bring-your-own license in the license manager for the -Developer AMI. AWS tests the HDK with Synopsys VCS, Mentor -Questa/ModelSim, and Cadence Incisive. Licenses for these simulators -must be acquired by the Developer. +Support for other simulators is included through the bring-your-own license in the license manager for the +FPGA Developer AMI. AWS tests the HDK with Synopsys VCS, Mentor Questa/ModelSim, and Cadence Incisive. Licenses for these simulators must be acquired by the Developer and not available with AWS FPGA Developer AMI. -**Is OpenCL and/or SDAccel Supported?** -Yes. OpenCL is supported through either the Xilinx SDAccel tool or any -SDAccel tool capable of generating RTL supported by the Xilinx Vivado +**Q: Is OpenCL and/or SDAccel Supported?** + +Yes. 
OpenCL is supported through either the Xilinx SDAccel tool or any OpenCL tool capable of generating RTL supported by the Xilinx Vivado synthesis tool. There is a branch in the AWS SDK tree for SDAccel. Note that during the Preview period, SDAccel may not be available. -**Can I use High Level Synthesis (HLS) Tools to generate an AFI?** -Yes. Vivado HLS and SDAccel are directly supported through the FPGA -Developer AMI. Any HLS tool that generates compatible Verilog or VHDL +**Q: Can I use High Level Synthesis (HLS) Tools to generate an AFI?** + +Yes. Vivado HLS and SDAccel are directly supported through the FPGA Developer AMI. Any HLS tool that generates compatible Verilog or VHDL for Vivado input can also be used for writing in HLS. -**Do I need to design for a specific power envelope?** -Yes, the design scripts provided in the HDK include checks for power -consumption that exceeds the allocated power for the Custom Logic -region. Developers do not need to include design considerations for -DRAM, Shell, or Thermal. AWS includes the design considerations for +**Q: Do I need to design for a specific power envelope?** + +Yes, the design scripts provided in the HDK include checks for power consumption that exceeds the allocated power for the Custom Logic region. Developers do not need to include design considerations for DRAM, Shell, or Thermal. AWS includes the design considerations for those as part of providing the power envelop for the CL region. -**Is a simulation model of the AWS Shell available?** -Yes. The HDK includes a simulation model for the AWS shell. See the -HDK/common tree for more information on the Shell simulation model. +**Q: Is a simulation model of the AWS Shell available?** + +Yes. The HDK includes a simulation model for the AWS shell. See the [HDK common tree](./hdk/common/verif) for more information on the Shell simulation model. + -**What example CL designs are provided in the HDK?** +**Q: What resources within the FPGA does the AWS Shell consume?** -There are two example designs provided in the HDK. There is a -hello\_world example that accepts reads and writes from an F1 instance. -There is a cl\_simple example that expands on hello\_world by adding -traffic generation to DRAM. Both examples are found in the -hdk/cl/examples directory. +The Shell consumes about 20% of the FPGA resources, and that includes the PCIe Gen3 X16, a DMA engine, A DRAM controller interface, chipscope and other health monitoring and image loading logic. No modifications to the Shell or the partition pins between the Shell and the CL are possible by the Developer. -**What resources within the FPGA does the AWS Shell consume?** +**Q: What IP blocks are provided in the HDK?** -The Shell consumes 20% of the F1 FPGA resources. The nature of partial -reconfiguration consumes all resources (BRAM, URAM, Logic Elements, DSP, -etc) in the partition allocated for the AWS Shell. No modifications to -the Shell or the partition pins between the Shell and the CL are -possible by the Developer. +The HDK includes IP for the Shell and DRAM interface controllers. Inside the Shell, there is a PCIe interface, the a DMA Engine, and one DRAM interface controller. These blocks are only accessible via the AXI interfaces defined by the Shell-CL interface. There are IP blocks for the other DRAM interfaces, enabling up to 3 additional DRAM interfaces instantiated by the Developerin the CL region. Future versions of the HDK will include IP for the FPGA Link interface. 
-**What IP blocks are provided in the HDK?** -The HDK includes IP for the Shell and DDR controllers. Inside the Shell, -there is a PCIe interface, the Xilinx XDMA Engine, and one DDR -controller. These blocks are only accessible via the AXI interfaces -defined by the Shell interface. There are IP blocks for DDR controllers, -enabling up to 3 additional DDR interfaces instantiated by the Developer -in the CL region. Future versions of the HDK will include IP for the -FPGA Link interface. +**Q: Can I use other IP blocks from Xilinx or other 3rd parties?** -**Can I use other IP blocks from Xilinx or other 3^rd^ parties?** +Yes. Developers are free to use any IP blocks within the CL region. Those can be 3rd party IP or IP available in Vivado IP catalog. -Yes. Developers are free to use any IP blocks within the CL region that -can be utilized by Vivado to create a Partial Reconfiguration region. -Note that AWS does not provide direct support for IP blocks not -contained in the HDK. +*Note that AWS does not provide direct support for IP blocks not contained in the HDK.* -**What OS can run on the F1 instance?** -Amazon Linux is supported directly on F1. Developers can utilize the -source code in the SDK directory to compile other variants of Linux for -use on F1. Windows is not supported on F1. +**Q: What OS can run on the F1 instance?** -**What support exists for host DMA?** +Amazon Linux and CentOS 7 are supported and tested on AWS EC2 F1 instance. Developers can utilize the source code in the SDK directory to compile other variants of Linux for use on F1. Windows is not supported on F1. + + +**Q: What support exists for host DMA?** There are two mechanisms for host DMA between the instance CPU and the FPGA. The first is the Xilinx XDMA engine. This engine is included in @@ -333,43 +299,37 @@ This enables each FPGA card to send/receive data from an adjacent card at 200Gbps. Details on the FPGA Link interface are provided in the Shell Interface specification when available. -**What protocol is used for FPGA link?** +**Q: What protocol is used for FPGA link?** + +There is no transport protocol for the FPGA link. It is a generic raw streaming interface. Details on the shell interface to the FPGA Link IP blocks are provided in the Shell Interface specification when available. + +It is expected that developers would take advantage of standard PCIe protocol, Ethernet protocol, or Xilinx's (reliable) Aurora protocol layer on this interface. + +**Q: What clock speed does the FPGA utilize?** + +The FPGA provides a 250MHz clock from the Shell to the CL region. All the AXI interfaces betwenn Shell and CL are synchronous to that clock (with exception of DDR_C interface). Developers can create an ansynchronous interface to the AXI busses and run their CL region at any clock frequency needed. Clocks can be created in the CL region using the Xilinx clock generation modules. See the [Shell Interface specification](./hdk/docs/AWS_Shell_Interface_Specification.md) for more details. + + +**Q: What FPGA debug capabilities are supported?** + +There are four debug capabilities supported in F1 for FPGA debug: + +1) The first is the use of Xilinx's Chipscope. Xilinx Chipscope is natively supported on F1 and included in AWS Shell. It provides equivalent function to JTAG debug with exception that is emulated JTAG-over-PCIe. Chipscope circuit is pre-integrated with AWS Shell and available to the instance over memory-mapped PCIe space. The Chipscope software is included in the FPGA Developer AMI. 
-There is no transport protocol for the FPGA link. It is a data streaming -interface. Details on the shell interface to the FPGA Link IP blocks are -provided in the Shell Interface specification when available. +2) The second is the use metrics available through the FPGA Image Management tools. The fpga-describe-local-image command allows the F1 instance to query metrics from the Shell and Shell to CL interface. See Shell Interface specification and FPGA Image Management tools for more information on supported metrics. -**What clock speed does the FPGA utilize?** +3) An Emulated LED to represet the status of 16 different LEDs (On/Off), emulated what otherwise will be an on-board LED. The LED status is read through the PCIe management Physical Function (PF). -The FPGA provides a 250MHz clock from the Shell to the CL region. The -AXI interfaces to the Shell are synchronous to that clock. Developers -can create an ansynchronous interface to the AXI busses and run their CL -region at any clock frequency needed. Clocks can be created in the CL -region using the Xilinx clock generation modules. See the Shell -Interface specification for more details. +4) An Emulated DIP Switch to represent a generic 16 binrary DIP switch that get pass to the CL. -**What FPGA debug capabilities are supported?** -There are two debug capabilities supported in F1 for FPGA debug. The -first is the use of Xilinx Chipscope. Xilinx Chipscope is natively -supported on F1 by running the FPGA Developer AMI on the F1 instance to -be debugged. The Chipscope software is included in the Developer AMI. -Not that Chipscope in the F1 instance uses a memory-mapped interface to -communicate with the FPGA. The JTAG/ICAP interface is not available to -the F1 instance. The second is the use metrics available through the -FPGA Image Management tools. The fpga-describe-local-image command -allows the F1 instance to query metrics from the Shell and Shell to CL -interface. See Shell Interface specification and FPGA Image Management -tools for more information on supported metrics. +**Q: What FPGA is used in AWS EC2 F1 instance?** -**What FPGA is used?** +The FPGA for F1 is the Xilinx Ultrascale+ VU9P device with the -2 speed grade. The HDK scripts have the compile scripts needed for the VU9P device. -The FPGA for F1 is the Xilinx Ultrascale+ VU9P device in the -2 speed -grade. The HDK scripts have the compile scripts needed for the VU9P -device. -**What memory is attached to the FPGA?** +**Q: What memory is attached to the FPGA?** -Each FPGA on F1 has 4 x DDR4 2400 interfaces at 72bits wide (64bit -data). Each DDR interface has 16GB of DRAM attached. This yields 64GB of -total DDR memory local to each FPGA on F1. +Each FPGA on F1 has 4 x DDR4-2133 interfaces, each at 72bits wide (64bit +data). Each DDR interface has 16GiB of DRAM attached. This yields 64GB of +total DRAM memory local to each FPGA on F1. From 0d359cef121c53f3e5182e68afc94f12692f966c Mon Sep 17 00:00:00 2001 From: Carlos Cabral Date: Wed, 11 Jan 2017 08:09:43 -0800 Subject: [PATCH 19/29] Adding separate synthesis strategy scripts. 
Change-Id: I5b8497bb8e5411c7f37ddbd5d2d1f2e7986c891b --- .../build/scripts/aws_build_dcp_from_cl.sh | 13 +- .../scripts/create_dcp_from_cl.basic.tcl | 300 ++++++++++++++++++ .../scripts/create_dcp_from_cl.congestion.tcl | 274 ++++++++++++++++ .../scripts/create_dcp_from_cl.default.tcl | 274 ++++++++++++++++ .../scripts/create_dcp_from_cl.explore.tcl | 274 ++++++++++++++++ .../scripts/create_dcp_from_cl.timing.tcl | 276 ++++++++++++++++ hdk_setup.sh | 6 + 7 files changed, 1416 insertions(+), 1 deletion(-) create mode 100644 hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.basic.tcl create mode 100644 hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.congestion.tcl create mode 100644 hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.default.tcl create mode 100644 hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.explore.tcl create mode 100644 hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.timing.tcl diff --git a/hdk/cl/examples/cl_hello_world/build/scripts/aws_build_dcp_from_cl.sh b/hdk/cl/examples/cl_hello_world/build/scripts/aws_build_dcp_from_cl.sh index 15a784649..a17481074 100755 --- a/hdk/cl/examples/cl_hello_world/build/scripts/aws_build_dcp_from_cl.sh +++ b/hdk/cl/examples/cl_hello_world/build/scripts/aws_build_dcp_from_cl.sh @@ -15,6 +15,13 @@ ## implied. See the License for the specific language governing permissions and ## limitations under the License. +# If specified use script specified, otherwise use default vivado script +if [ "$1" != "" ]; then + vivado_script="$1" +else + vivado_script="create_dcp_from_cl.tcl" +fi + echo "AWS FPGA: Starting the design checkpoint build process" echo "AWS FPGA: Checking for proper environment variables and build directories" @@ -43,6 +50,9 @@ then exit 1 fi +# Use timestamp for logs and output files +timestamp=$(date +"%y_%m_%d-%H%M%S") +logname=$timestamp.vivado.log echo "AWS FPGA: Environment variables and directories are present. Checking for Vivado installation." @@ -50,7 +60,8 @@ echo "AWS FPGA: Environment variables and directories are present. Checking for vivado -version >/dev/null 2>&1 || { echo >&2 "ERROR - Please install/enable Vivado." ; return 1; } # Run vivado -nohup vivado -mode batch -nojournal -source create_dcp_from_cl.tcl & +#nohup vivado -mode batch -nojournal -source create_dcp_from_cl.tcl & +nohup vivado -mode batch -nojournal -log $logname -source $vivado_script -tclargs $timestamp > $timestamp.nohup.out 2>&1& echo "AWS FPGA: Build through Vivado is running as background process, this may take few hours." echo "AWS FPGA: You can set up an email notification upon Vivado run finish by following the instructions in TBD" diff --git a/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.basic.tcl b/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.basic.tcl new file mode 100644 index 000000000..ab2272223 --- /dev/null +++ b/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.basic.tcl @@ -0,0 +1,300 @@ +## ============================================================================= +## Copyright 2016 Amazon.com, Inc. or its affiliates. +## All Rights Reserved Worldwide. 
+## Amazon Confidential information +## Restricted NDA Material +## create_cl.tcl: Build to generate CL design checkpoint based on +## developer code +## ============================================================================= + +package require tar + +################################################# +## Generate CL_routed.dcp (Done by User) +################################################# +puts "AWS FPGA Scripts"; +puts "Creating Design Checkpoint from Custom Logic source code"; +puts "Shell Version: VenomCL_unc - 0x11241611"; +puts "Vivado Script Name: $argv0"; + +#checking if CL_DIR env variable exists +if { [info exists ::env(CL_DIR)] } { + set CL_DIR $::env(CL_DIR) + puts "Using CL directory $CL_DIR"; +} else { + puts "Error: CL_DIR environment variable not defined ! "; + puts "Use export CL_DIR=Your_Design_Root_Directory" + exit 2 +} + +#checking if HDK_SHELL_DIR env variable exists +if { [info exists ::env(HDK_SHELL_DIR)] } { + set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) + puts "Using Shell directory $HDK_SHELL_DIR"; +} else { + puts "Error: HDK_SHELL_DIR environment variable not defined ! "; + puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; + exit 2 +} + +#Convenience to set the root of the RTL directory +#Timestamp passed in from AWS script so that log and output files match +set timestamp [lindex $argv 0] +puts "All reports and intermediate results will be time stamped with $timestamp"; + +##Identify build strategy for script outputs (file or directory names) - TBD +#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { +# set strategy "basic" +#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { +# set strategy "default" +#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { +# set strategy "explore" +#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { +# set strategy "timing" +#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { +# set strategy "congestion" +#} else { +# set strategy "test" +#} + +set_msg_config -severity INFO -suppress +set_msg_config -severity STATUS -suppress +set_msg_config -severity WARNING -suppress +set_msg_config -id {Chipscope 16-3} -suppress +set_msg_config -string {AXI_QUAD_SPI} -suppress + +puts "AWS FPGA: Calling the encrypt.tcl"; + +source encrypt.tcl + +#This sets the Device Type +source $HDK_SHELL_DIR/build/scripts/device_type.tcl + +create_project -in_memory -part [DEVICE_TYPE] -force + +#set_param chipscope.enablePRFlow true + +############################# +## Read design files +############################# + +#---- User would replace this section ----- + +#Global defines (this is specific to the CL design). 
This file is encrypted by encrypt.tcl +read_verilog [ list \ + $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh +] +set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] +set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] + +puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; + +#User design files (these are the files that were encrypted by encrypt.tcl) +read_verilog [ list \ +$CL_DIR/build/src_post_encryption/cl_hello_world.sv +] + +#---- End of section replaced by User ---- +puts "AWS FPGA: Reading AWS Shell design"; + +#Read AWS Design files +read_verilog [ list \ +$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ +$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ +$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ +$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ +$HDK_SHELL_DIR/design/lib/sync.v \ +$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ +$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ +$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ +$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ +$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ +$HDK_SHELL_DIR/design/interfaces/cl_ports.vh +] + +puts "AWS FPGA: Reading IP blocks"; +#Read DDR IP +read_ip [ list \ +$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci +] + +puts "AWS FPGA: Reading AWS constraints"; + + +#Read all the constraints +# +# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** +# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** +# cl_ddr.xdc - AWS provided DDR pin constraints. ***DO NOT MODIFY*** +read_xdc [ list \ + $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ + $CL_DIR/build/constraints/cl_synth_user.xdc +] + +#Do not propagate local clock constraints for clocks generated in the SH +set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] + +update_compile_order -fileset sources_1 +set_property verilog_define XSDB_SLV_DIS [current_fileset] + +######################## +# CL Synthesis +######################## +puts "AWS FPGA: Start design synthesis"; + +synth_design -top cl_hello_world -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. 
+set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +set failval [catch {exec grep "FAIL" failfast.csv}] +if { $failval==0 } { + puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" + exit 1 +} + +######################## +# CL Optimize +######################## +puts "AWS FPGA: Optimizing design"; +opt_design -directive Explore + +check_timing -file $CL_DIR/build/reports/${timestamp}.cl.synth.check_timing_report.txt +report_timing_summary -file $CL_DIR/build/reports/${timestamp}.cl.synth.timing_summary.rpt +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp +close_design + +# Implementation +#Read in the Shell checkpoint and do the CL implementation +puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; + +open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp +read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp + +#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) +read_xdc [ list \ +$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ +$CL_DIR/build/constraints/cl_pnr_user.xdc \ +$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc +] + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. +set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +puts "AWS FPGA: Optimize design during implementation"; + +opt_design -directive Explore +check_timing -file $CL_DIR/build/reports/${timestamp}.SH_CL.check_timing_report.txt + +######################## +# CL Place +######################## +puts "AWS FPGA: Place design stage"; + +place_design -directive WLDrivenBlockPlacement +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp + +########################### +# CL Physical Optimization +########################### +puts "AWS FPGA: Physical optimization stage"; + +phys_opt_design -directive Explore +report_timing_summary -file $CL_DIR/build/reports/${timestamp}.cl.post_place_opt.timing_summary.rpt +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place_opt.dcp + +######################## +# CL Route +######################## +puts "AWS FPGA: Route design stage"; + +route_design -directive MoreGlobalIterations + +################################# +# CL Final Physical Optimization +################################# +puts "AWS FPGA: Post-route Physical optimization stage "; + +phys_opt_design -directive Explore + +puts "AWS FPGA: Locking design "; + +lock_design -level routing + +report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL.post_route_opt.timing_summary.rpt + +#This is what will deliver to AWS +write_checkpoint -force 
$CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp + +# ################################################ +# Emulate what AWS will do (Bitstream Generation) +# ################################################ +puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" + +#Verify PR build +pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -file $CL_DIR/build/checkpoints/to_aws/${timestamp}.pr_verify.log + +# Make temp dir for bitstream +file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +# Verify the Developer DCP is compatible with SH_BB. +pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp +open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp +report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt +report_io -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_report_io.rpt + +# Write out the CL DCP to integrate with SH_BB +write_checkpoint -force -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL_routed.dcp +close_design + +# Integreate Developer CL with SH BB +open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp +read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL_routed.dcp +report_drc -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_DRC.rpt +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL_final.route.dcp +pr_verify -full_check $CL_DIR/build/checkpoints/${timestamp}.SH_CL_final.route.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp +set_param bitstream.enablePR 4123 +write_bitstream -force -bin_file $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit + +# Clean-up temp dir for bitstream +file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +### -------------------------------------------- + +# Create a zipped tar file, that would be used for createFpgaImage EC2 API +puts "Compress files for sending back to AWS" + +# clean up vivado.log file +exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl + +cd $CL_DIR/build/checkpoints +tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] + +close_design + diff --git a/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.congestion.tcl b/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.congestion.tcl new file mode 100644 index 000000000..e83b4ffe6 --- /dev/null +++ b/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.congestion.tcl @@ -0,0 +1,274 @@ +## ============================================================================= +## Copyright 2016 Amazon.com, Inc. or its affiliates. +## All Rights Reserved Worldwide. 
+## Amazon Confidential information +## Restricted NDA Material +## create_cl.tcl: Build to generate CL design checkpoint based on +## developer code +## ============================================================================= + +package require tar + +################################################# +## Generate CL_routed.dcp (Done by User) +################################################# +puts "AWS FPGA Scripts"; +puts "Creating Design Checkpoint from Custom Logic source code"; +puts "Shell Version: VenomCL_unc - 0x11241611"; +puts "Vivado Script Name: $argv0"; + +#checking if CL_DIR env variable exists +if { [info exists ::env(CL_DIR)] } { + set CL_DIR $::env(CL_DIR) + puts "Using CL directory $CL_DIR"; +} else { + puts "Error: CL_DIR environment variable not defined ! "; + puts "Use export CL_DIR=Your_Design_Root_Directory" + exit 2 +} + +#checking if HDK_SHELL_DIR env variable exists +if { [info exists ::env(HDK_SHELL_DIR)] } { + set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) + puts "Using Shell directory $HDK_SHELL_DIR"; +} else { + puts "Error: HDK_SHELL_DIR environment variable not defined ! "; + puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; + exit 2 +} + +#Convenience to set the root of the RTL directory +#Timestamp passed in from AWS script so that log and output files match +set timestamp [lindex $argv 0] +puts "All reports and intermediate results will be time stamped with $timestamp"; + +##Identify build strategy for script outputs (file or directory names) - TBD +#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { +# set strategy "basic" +#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { +# set strategy "default" +#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { +# set strategy "explore" +#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { +# set strategy "timing" +#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { +# set strategy "congestion" +#} else { +# set strategy "test" +#} + +set_msg_config -severity INFO -suppress +set_msg_config -severity STATUS -suppress +set_msg_config -severity WARNING -suppress +set_msg_config -id {Chipscope 16-3} -suppress +set_msg_config -string {AXI_QUAD_SPI} -suppress + +puts "AWS FPGA: Calling the encrypt.tcl"; + +source encrypt.tcl + +#This sets the Device Type +source $HDK_SHELL_DIR/build/scripts/device_type.tcl + +create_project -in_memory -part [DEVICE_TYPE] -force + +#set_param chipscope.enablePRFlow true + +############################# +## Read design files +############################# + +#---- User would replace this section ----- + +#Global defines (this is specific to the CL design). 
This file is encrypted by encrypt.tcl +read_verilog [ list \ + $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh +] +set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] +set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] + +puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; + +#User design files (these are the files that were encrypted by encrypt.tcl) +read_verilog [ list \ +$CL_DIR/build/src_post_encryption/cl_hello_world.sv +] + +#---- End of section replaced by User ---- +puts "AWS FPGA: Reading AWS Shell design"; + +#Read AWS Design files +read_verilog [ list \ +$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ +$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ +$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ +$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ +$HDK_SHELL_DIR/design/lib/sync.v \ +$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ +$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ +$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ +$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ +$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ +$HDK_SHELL_DIR/design/interfaces/cl_ports.vh +] + +puts "AWS FPGA: Reading IP blocks"; +#Read DDR IP +read_ip [ list \ +$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci +] + +puts "AWS FPGA: Reading AWS constraints"; + + +#Read all the constraints +# +# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** +# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** +# cl_ddr.xdc - AWS provided DDR pin constraints. ***DO NOT MODIFY*** +read_xdc [ list \ + $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ + $CL_DIR/build/constraints/cl_synth_user.xdc +] + +#Do not propagate local clock constraints for clocks generated in the SH +set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] + +update_compile_order -fileset sources_1 +set_property verilog_define XSDB_SLV_DIS [current_fileset] + +######################## +# CL Synthesis +######################## +puts "AWS FPGA: Start design synthesis"; + +synth_design -top cl_hello_world -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -directive AlternateRoutability -no_lc -shreg_min_size 10 -control_set_opt_threshold 16 + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. 
+set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +set failval [catch {exec grep "FAIL" failfast.csv}] +if { $failval==0 } { + puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" + exit 1 +} + +######################## +# CL Optimize +######################## +puts "AWS FPGA: Optimizing design"; +opt_design -directive Explore + +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp +close_design + +# Implementation +#Read in the Shell checkpoint and do the CL implementation +puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; + +open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp +read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp + +#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) +read_xdc [ list \ +$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ +$CL_DIR/build/constraints/cl_pnr_user.xdc \ +$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc +] + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. +set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +puts "AWS FPGA: Optimize design during implementation"; + +opt_design -bufg_opt -control_set_merge -hier_fanout_limit 512 -muxf_remap -propconst -retarget -sweep + +######################## +# CL Place +######################## +puts "AWS FPGA: Place design stage"; + +place_design -directive AltSpreadLogic_medium +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp + +########################### +# CL Physical Optimization +########################### +puts "AWS FPGA: Physical optimization stage"; + +phys_opt_design -directive Explore + +######################## +# CL Route +######################## +puts "AWS FPGA: Route design stage"; + +route_design -directive Explore + +################################# +# CL Final Physical Optimization +################################# +# N/A + +#This is what will deliver to AWS +write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp + +# ################################################ +# Emulate what AWS will do (Bitstream Generation) +# ################################################ +puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" + +# Make temp dir for bitstream +file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +# Verify the Developer DCP is compatible with SH_BB. 
+pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp + +open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp + +report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt +set_param bitstream.enablePR 4123 +write_bitstream -force -bin_file -cell CL $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit + +# Clean-up temp dir for bitstream +file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +### -------------------------------------------- + +# Create a zipped tar file, that would be used for createFpgaImage EC2 API +puts "Compress files for sending back to AWS" + +# clean up vivado.log file +exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl + +cd $CL_DIR/build/checkpoints +tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] + +close_design + diff --git a/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.default.tcl b/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.default.tcl new file mode 100644 index 000000000..0fa9ff65b --- /dev/null +++ b/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.default.tcl @@ -0,0 +1,274 @@ +## ============================================================================= +## Copyright 2016 Amazon.com, Inc. or its affiliates. +## All Rights Reserved Worldwide. +## Amazon Confidential information +## Restricted NDA Material +## create_cl.tcl: Build to generate CL design checkpoint based on +## developer code +## ============================================================================= + +package require tar + +################################################# +## Generate CL_routed.dcp (Done by User) +################################################# +puts "AWS FPGA Scripts"; +puts "Creating Design Checkpoint from Custom Logic source code"; +puts "Shell Version: VenomCL_unc - 0x11241611"; +puts "Vivado Script Name: $argv0"; + +#checking if CL_DIR env variable exists +if { [info exists ::env(CL_DIR)] } { + set CL_DIR $::env(CL_DIR) + puts "Using CL directory $CL_DIR"; +} else { + puts "Error: CL_DIR environment variable not defined ! "; + puts "Use export CL_DIR=Your_Design_Root_Directory" + exit 2 +} + +#checking if HDK_SHELL_DIR env variable exists +if { [info exists ::env(HDK_SHELL_DIR)] } { + set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) + puts "Using Shell directory $HDK_SHELL_DIR"; +} else { + puts "Error: HDK_SHELL_DIR environment variable not defined ! 
"; + puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; + exit 2 +} + +#Convenience to set the root of the RTL directory +#Timestamp passed in from AWS script so that log and output files match +set timestamp [lindex $argv 0] +puts "All reports and intermediate results will be time stamped with $timestamp"; + +##Identify build strategy for script outputs (file or directory names) - TBD +#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { +# set strategy "basic" +#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { +# set strategy "default" +#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { +# set strategy "explore" +#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { +# set strategy "timing" +#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { +# set strategy "congestion" +#} else { +# set strategy "test" +#} + +set_msg_config -severity INFO -suppress +set_msg_config -severity STATUS -suppress +set_msg_config -severity WARNING -suppress +set_msg_config -id {Chipscope 16-3} -suppress +set_msg_config -string {AXI_QUAD_SPI} -suppress + +puts "AWS FPGA: Calling the encrypt.tcl"; + +source encrypt.tcl + +#This sets the Device Type +source $HDK_SHELL_DIR/build/scripts/device_type.tcl + +create_project -in_memory -part [DEVICE_TYPE] -force + +#set_param chipscope.enablePRFlow true + +############################# +## Read design files +############################# + +#---- User would replace this section ----- + +#Global defines (this is specific to the CL design). This file is encrypted by encrypt.tcl +read_verilog [ list \ + $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh +] +set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] +set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] + +puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; + +#User design files (these are the files that were encrypted by encrypt.tcl) +read_verilog [ list \ +$CL_DIR/build/src_post_encryption/cl_hello_world.sv +] + +#---- End of section replaced by User ---- +puts "AWS FPGA: Reading AWS Shell design"; + +#Read AWS Design files +read_verilog [ list \ +$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ +$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ +$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ +$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ +$HDK_SHELL_DIR/design/lib/sync.v \ +$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ +$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ +$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ +$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ +$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ +$HDK_SHELL_DIR/design/interfaces/cl_ports.vh +] + +puts "AWS FPGA: Reading IP blocks"; +#Read DDR IP +read_ip [ list \ +$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci +] + +puts "AWS FPGA: Reading AWS constraints"; + + +#Read all the constraints +# +# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** +# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** +# cl_ddr.xdc - AWS provided DDR pin constraints. 
***DO NOT MODIFY*** +read_xdc [ list \ + $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ + $CL_DIR/build/constraints/cl_synth_user.xdc +] + +#Do not propagate local clock constraints for clocks generated in the SH +set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] + +update_compile_order -fileset sources_1 +set_property verilog_define XSDB_SLV_DIS [current_fileset] + +######################## +# CL Synthesis +######################## +puts "AWS FPGA: Start design synthesis"; + +synth_design -top cl_hello_world -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. +set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +set failval [catch {exec grep "FAIL" failfast.csv}] +if { $failval==0 } { + puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" + exit 1 +} + +######################## +# CL Optimize +######################## +puts "AWS FPGA: Optimizing design"; +opt_design -directive Explore + +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp +close_design + +# Implementation +#Read in the Shell checkpoint and do the CL implementation +puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; + +open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp +read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp + +#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) +read_xdc [ list \ +$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ +$CL_DIR/build/constraints/cl_pnr_user.xdc \ +$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc +] + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. 
+set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +puts "AWS FPGA: Optimize design during implementation"; + +opt_design + +######################## +# CL Place +######################## +puts "AWS FPGA: Place design stage"; + +place_design +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp + +########################### +# CL Physical Optimization +########################### +puts "AWS FPGA: Physical optimization stage"; + +phys_opt_design + +######################## +# CL Route +######################## +puts "AWS FPGA: Route design stage"; + +route_design + +################################# +# CL Final Physical Optimization +################################# +# N/A + +#This is what will deliver to AWS +write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp + +# ################################################ +# Emulate what AWS will do (Bitstream Generation) +# ################################################ +puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" + +# Make temp dir for bitstream +file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +# Verify the Developer DCP is compatible with SH_BB. +pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp + +open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp + +report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt +set_param bitstream.enablePR 4123 +write_bitstream -force -bin_file -cell CL $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit + +# Clean-up temp dir for bitstream +file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +### -------------------------------------------- + +# Create a zipped tar file, that would be used for createFpgaImage EC2 API +puts "Compress files for sending back to AWS" + +# clean up vivado.log file +exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl + +cd $CL_DIR/build/checkpoints +tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] + +close_design + diff --git a/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.explore.tcl b/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.explore.tcl new file mode 100644 index 000000000..76668dfc0 --- /dev/null +++ b/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.explore.tcl @@ -0,0 +1,274 @@ +## ============================================================================= +## Copyright 2016 Amazon.com, Inc. or its affiliates. +## All Rights Reserved Worldwide. 
+## Amazon Confidential information +## Restricted NDA Material +## create_cl.tcl: Build to generate CL design checkpoint based on +## developer code +## ============================================================================= + +package require tar + +################################################# +## Generate CL_routed.dcp (Done by User) +################################################# +puts "AWS FPGA Scripts"; +puts "Creating Design Checkpoint from Custom Logic source code"; +puts "Shell Version: VenomCL_unc - 0x11241611"; +puts "Vivado Script Name: $argv0"; + +#checking if CL_DIR env variable exists +if { [info exists ::env(CL_DIR)] } { + set CL_DIR $::env(CL_DIR) + puts "Using CL directory $CL_DIR"; +} else { + puts "Error: CL_DIR environment variable not defined ! "; + puts "Use export CL_DIR=Your_Design_Root_Directory" + exit 2 +} + +#checking if HDK_SHELL_DIR env variable exists +if { [info exists ::env(HDK_SHELL_DIR)] } { + set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) + puts "Using Shell directory $HDK_SHELL_DIR"; +} else { + puts "Error: HDK_SHELL_DIR environment variable not defined ! "; + puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; + exit 2 +} + +#Convenience to set the root of the RTL directory +#Timestamp passed in from AWS script so that log and output files match +set timestamp [lindex $argv 0] +puts "All reports and intermediate results will be time stamped with $timestamp"; + +##Identify build strategy for script outputs (file or directory names) - TBD +#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { +# set strategy "basic" +#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { +# set strategy "default" +#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { +# set strategy "explore" +#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { +# set strategy "timing" +#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { +# set strategy "congestion" +#} else { +# set strategy "test" +#} + +set_msg_config -severity INFO -suppress +set_msg_config -severity STATUS -suppress +set_msg_config -severity WARNING -suppress +set_msg_config -id {Chipscope 16-3} -suppress +set_msg_config -string {AXI_QUAD_SPI} -suppress + +puts "AWS FPGA: Calling the encrypt.tcl"; + +source encrypt.tcl + +#This sets the Device Type +source $HDK_SHELL_DIR/build/scripts/device_type.tcl + +create_project -in_memory -part [DEVICE_TYPE] -force + +#set_param chipscope.enablePRFlow true + +############################# +## Read design files +############################# + +#---- User would replace this section ----- + +#Global defines (this is specific to the CL design). 
This file is encrypted by encrypt.tcl +read_verilog [ list \ + $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh +] +set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] +set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] + +puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; + +#User design files (these are the files that were encrypted by encrypt.tcl) +read_verilog [ list \ +$CL_DIR/build/src_post_encryption/cl_hello_world.sv +] + +#---- End of section replaced by User ---- +puts "AWS FPGA: Reading AWS Shell design"; + +#Read AWS Design files +read_verilog [ list \ +$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ +$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ +$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ +$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ +$HDK_SHELL_DIR/design/lib/sync.v \ +$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ +$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ +$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ +$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ +$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ +$HDK_SHELL_DIR/design/interfaces/cl_ports.vh +] + +puts "AWS FPGA: Reading IP blocks"; +#Read DDR IP +read_ip [ list \ +$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci +] + +puts "AWS FPGA: Reading AWS constraints"; + + +#Read all the constraints +# +# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** +# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** +# cl_ddr.xdc - AWS provided DDR pin constraints. ***DO NOT MODIFY*** +read_xdc [ list \ + $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ + $CL_DIR/build/constraints/cl_synth_user.xdc +] + +#Do not propagate local clock constraints for clocks generated in the SH +set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] + +update_compile_order -fileset sources_1 +set_property verilog_define XSDB_SLV_DIS [current_fileset] + +######################## +# CL Synthesis +######################## +puts "AWS FPGA: Start design synthesis"; + +synth_design -top cl_hello_world -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. 
+set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +set failval [catch {exec grep "FAIL" failfast.csv}] +if { $failval==0 } { + puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" + exit 1 +} + +######################## +# CL Optimize +######################## +puts "AWS FPGA: Optimizing design"; +opt_design -directive Explore + +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp +close_design + +# Implementation +#Read in the Shell checkpoint and do the CL implementation +puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; + +open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp +read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp + +#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) +read_xdc [ list \ +$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ +$CL_DIR/build/constraints/cl_pnr_user.xdc \ +$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc +] + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. +set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +puts "AWS FPGA: Optimize design during implementation"; + +opt_design -directive Explore + +######################## +# CL Place +######################## +puts "AWS FPGA: Place design stage"; + +place_design -directive Explore +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp + +########################### +# CL Physical Optimization +########################### +puts "AWS FPGA: Physical optimization stage"; + +phys_opt_design -directive Explore + +######################## +# CL Route +######################## +puts "AWS FPGA: Route design stage"; + +route_design -directive Explore + +################################# +# CL Final Physical Optimization +################################# +# N/A + +#This is what will deliver to AWS +write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp + +# ################################################ +# Emulate what AWS will do (Bitstream Generation) +# ################################################ +puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" + +# Make temp dir for bitstream +file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +# Verify the Developer DCP is compatible with SH_BB. 
+pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp + +open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp + +report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt +set_param bitstream.enablePR 4123 +write_bitstream -force -bin_file -cell CL $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit + +# Clean-up temp dir for bitstream +file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +### -------------------------------------------- + +# Create a zipped tar file, that would be used for createFpgaImage EC2 API +puts "Compress files for sending back to AWS" + +# clean up vivado.log file +exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl + +cd $CL_DIR/build/checkpoints +tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] + +close_design + diff --git a/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.timing.tcl b/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.timing.tcl new file mode 100644 index 000000000..f2be4b03c --- /dev/null +++ b/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.timing.tcl @@ -0,0 +1,276 @@ +## ============================================================================= +## Copyright 2016 Amazon.com, Inc. or its affiliates. +## All Rights Reserved Worldwide. +## Amazon Confidential information +## Restricted NDA Material +## create_cl.tcl: Build to generate CL design checkpoint based on +## developer code +## ============================================================================= + +package require tar + +################################################# +## Generate CL_routed.dcp (Done by User) +################################################# +puts "AWS FPGA Scripts"; +puts "Creating Design Checkpoint from Custom Logic source code"; +puts "Shell Version: VenomCL_unc - 0x11241611"; +puts "Vivado Script Name: $argv0"; + +#checking if CL_DIR env variable exists +if { [info exists ::env(CL_DIR)] } { + set CL_DIR $::env(CL_DIR) + puts "Using CL directory $CL_DIR"; +} else { + puts "Error: CL_DIR environment variable not defined ! "; + puts "Use export CL_DIR=Your_Design_Root_Directory" + exit 2 +} + +#checking if HDK_SHELL_DIR env variable exists +if { [info exists ::env(HDK_SHELL_DIR)] } { + set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) + puts "Using Shell directory $HDK_SHELL_DIR"; +} else { + puts "Error: HDK_SHELL_DIR environment variable not defined ! 
"; + puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; + exit 2 +} + +#Convenience to set the root of the RTL directory +#Timestamp passed in from AWS script so that log and output files match +set timestamp [lindex $argv 0] +puts "All reports and intermediate results will be time stamped with $timestamp"; + +##Identify build strategy for script outputs (file or directory names) - TBD +#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { +# set strategy "basic" +#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { +# set strategy "default" +#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { +# set strategy "explore" +#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { +# set strategy "timing" +#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { +# set strategy "congestion" +#} else { +# set strategy "test" +#} + +set_msg_config -severity INFO -suppress +set_msg_config -severity STATUS -suppress +set_msg_config -severity WARNING -suppress +set_msg_config -id {Chipscope 16-3} -suppress +set_msg_config -string {AXI_QUAD_SPI} -suppress + +puts "AWS FPGA: Calling the encrypt.tcl"; + +source encrypt.tcl + +#This sets the Device Type +source $HDK_SHELL_DIR/build/scripts/device_type.tcl + +create_project -in_memory -part [DEVICE_TYPE] -force + +#set_param chipscope.enablePRFlow true + +############################# +## Read design files +############################# + +#---- User would replace this section ----- + +#Global defines (this is specific to the CL design). This file is encrypted by encrypt.tcl +read_verilog [ list \ + $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh +] +set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] +set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] + +puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; + +#User design files (these are the files that were encrypted by encrypt.tcl) +read_verilog [ list \ +$CL_DIR/build/src_post_encryption/cl_hello_world.sv +] + +#---- End of section replaced by User ---- +puts "AWS FPGA: Reading AWS Shell design"; + +#Read AWS Design files +read_verilog [ list \ +$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ +$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ +$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ +$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ +$HDK_SHELL_DIR/design/lib/sync.v \ +$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ +$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ +$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ +$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ +$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ +$HDK_SHELL_DIR/design/interfaces/cl_ports.vh +] + +puts "AWS FPGA: Reading IP blocks"; +#Read DDR IP +read_ip [ list \ +$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci +] + +puts "AWS FPGA: Reading AWS constraints"; + + +#Read all the constraints +# +# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** +# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** +# cl_ddr.xdc - AWS provided DDR pin constraints. 
***DO NOT MODIFY*** +read_xdc [ list \ + $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ + $CL_DIR/build/constraints/cl_synth_user.xdc +] + +#Do not propagate local clock constraints for clocks generated in the SH +set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] + +update_compile_order -fileset sources_1 +set_property verilog_define XSDB_SLV_DIS [current_fileset] + +######################## +# CL Synthesis +######################## +puts "AWS FPGA: Start design synthesis"; + +synth_design -top cl_hello_world -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -no_lc -shreg_min_size 5 -fsm_extraction one_hot -resource_sharing off + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. +set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +set failval [catch {exec grep "FAIL" failfast.csv}] +if { $failval==0 } { + puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" + exit 1 +} + +######################## +# CL Optimize +######################## +puts "AWS FPGA: Optimizing design"; +opt_design -directive Explore + +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp +close_design + +# Implementation +#Read in the Shell checkpoint and do the CL implementation +puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; + +open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp +read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp + +#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) +read_xdc [ list \ +$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ +$CL_DIR/build/constraints/cl_pnr_user.xdc \ +$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc +] + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. 
+set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +puts "AWS FPGA: Optimize design during implementation"; + +opt_design -directive Explore + +######################## +# CL Place +######################## +puts "AWS FPGA: Place design stage"; + +place_design -directive ExtraNetDelay_high +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp + +########################### +# CL Physical Optimization +########################### +puts "AWS FPGA: Physical optimization stage"; + +phys_opt_design -directive Explore + +######################## +# CL Route +######################## +puts "AWS FPGA: Route design stage"; + +route_design -directive Explore -tns_cleanup + +################################# +# CL Final Physical Optimization +################################# +puts "AWS FPGA: Post-route Physical optimization stage "; + +phys_opt_design -directive Explore + +#This is what will deliver to AWS +write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp + +# ################################################ +# Emulate what AWS will do (Bitstream Generation) +# ################################################ +puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" + +# Make temp dir for bitstream +file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +# Verify the Developer DCP is compatible with SH_BB. +pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp + +open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp + +report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt +set_param bitstream.enablePR 4123 +write_bitstream -force -bin_file -cell CL $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit + +# Clean-up temp dir for bitstream +file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +### -------------------------------------------- + +# Create a zipped tar file, that would be used for createFpgaImage EC2 API +puts "Compress files for sending back to AWS" + +# clean up vivado.log file +exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl + +cd $CL_DIR/build/checkpoints +tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] + +close_design + diff --git a/hdk_setup.sh b/hdk_setup.sh index bed24b2f6..4092384ca 100755 --- a/hdk_setup.sh +++ b/hdk_setup.sh @@ -3,6 +3,12 @@ # before going too far make sure Vivado is available vivado -version >/dev/null 2>&1 || { echo >&2 "ERROR - Please install/enable Vivado." ; return 1; } +# Clear environment variables +unset HDK_DIR +unset HDK_COMMON_DIR +unset HDK_SHELL_DIR +unset CL_DIR + export HDK_DIR=${HDK_DIR:=$(pwd)/hdk} # The next variable should not be modified and should always point to the /common directory under HDK_DIR From 3b4ffe2666eb5440f972ae00ce2932ae354630d5 Mon Sep 17 00:00:00 2001 From: Carlos Cabral Date: Wed, 11 Jan 2017 11:06:19 -0800 Subject: [PATCH 20/29] Adding separate synthesis strategy scripts. 
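
The same strategy-script selection is applied to the cl_simple example:
aws_build_dcp_from_cl.sh takes an optional script name and otherwise
defaults to create_dcp_from_cl.tcl. For instance (working directory
assumed as in the previous patch):

    $ cd $CL_DIR/build/scripts
    $ ./aws_build_dcp_from_cl.sh create_dcp_from_cl.congestion.tcl  # congestion-reduction strategy
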
Change-Id: Ifd5dcfbef4e94f59404baf5a45c08a1f12ee3393 --- .../build/scripts/aws_build_dcp_from_cl.sh | 13 +- .../scripts/create_dcp_from_cl.basic.tcl | 304 ++++++++++++++++++ .../scripts/create_dcp_from_cl.congestion.tcl | 278 ++++++++++++++++ .../scripts/create_dcp_from_cl.default.tcl | 278 ++++++++++++++++ .../scripts/create_dcp_from_cl.explore.tcl | 278 ++++++++++++++++ .../scripts/create_dcp_from_cl.timing.tcl | 280 ++++++++++++++++ 6 files changed, 1430 insertions(+), 1 deletion(-) create mode 100644 hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.basic.tcl create mode 100644 hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.congestion.tcl create mode 100644 hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.default.tcl create mode 100644 hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.explore.tcl create mode 100644 hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.timing.tcl diff --git a/hdk/cl/examples/cl_simple/build/scripts/aws_build_dcp_from_cl.sh b/hdk/cl/examples/cl_simple/build/scripts/aws_build_dcp_from_cl.sh index 15a784649..a17481074 100755 --- a/hdk/cl/examples/cl_simple/build/scripts/aws_build_dcp_from_cl.sh +++ b/hdk/cl/examples/cl_simple/build/scripts/aws_build_dcp_from_cl.sh @@ -15,6 +15,13 @@ ## implied. See the License for the specific language governing permissions and ## limitations under the License. +# If specified use script specified, otherwise use default vivado script +if [ "$1" != "" ]; then + vivado_script="$1" +else + vivado_script="create_dcp_from_cl.tcl" +fi + echo "AWS FPGA: Starting the design checkpoint build process" echo "AWS FPGA: Checking for proper environment variables and build directories" @@ -43,6 +50,9 @@ then exit 1 fi +# Use timestamp for logs and output files +timestamp=$(date +"%y_%m_%d-%H%M%S") +logname=$timestamp.vivado.log echo "AWS FPGA: Environment variables and directories are present. Checking for Vivado installation." @@ -50,7 +60,8 @@ echo "AWS FPGA: Environment variables and directories are present. Checking for vivado -version >/dev/null 2>&1 || { echo >&2 "ERROR - Please install/enable Vivado." ; return 1; } # Run vivado -nohup vivado -mode batch -nojournal -source create_dcp_from_cl.tcl & +#nohup vivado -mode batch -nojournal -source create_dcp_from_cl.tcl & +nohup vivado -mode batch -nojournal -log $logname -source $vivado_script -tclargs $timestamp > $timestamp.nohup.out 2>&1& echo "AWS FPGA: Build through Vivado is running as background process, this may take few hours." echo "AWS FPGA: You can set up an email notification upon Vivado run finish by following the instructions in TBD" diff --git a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.basic.tcl b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.basic.tcl new file mode 100644 index 000000000..23f8be34e --- /dev/null +++ b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.basic.tcl @@ -0,0 +1,304 @@ +## ============================================================================= +## Copyright 2016 Amazon.com, Inc. or its affiliates. +## All Rights Reserved Worldwide. 
+## Amazon Confidential information +## Restricted NDA Material +## create_cl.tcl: Build to generate CL design checkpoint based on +## developer code +## ============================================================================= + +package require tar + +################################################# +## Generate CL_routed.dcp (Done by User) +################################################# +puts "AWS FPGA Scripts"; +puts "Creating Design Checkpoint from Custom Logic source code"; +puts "Shell Version: VenomCL_unc - 0x11241611"; +puts "Vivado Script Name: $argv0"; + +#checking if CL_DIR env variable exists +if { [info exists ::env(CL_DIR)] } { + set CL_DIR $::env(CL_DIR) + puts "Using CL directory $CL_DIR"; +} else { + puts "Error: CL_DIR environment variable not defined ! "; + puts "Use export CL_DIR=Your_Design_Root_Directory" + exit 2 +} + +#checking if HDK_SHELL_DIR env variable exists +if { [info exists ::env(HDK_SHELL_DIR)] } { + set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) + puts "Using Shell directory $HDK_SHELL_DIR"; +} else { + puts "Error: HDK_SHELL_DIR environment variable not defined ! "; + puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; + exit 2 +} + +#Convenience to set the root of the RTL directory +#Timestamp passed in from AWS script so that log and output files match +set timestamp [lindex $argv 0] +puts "All reports and intermediate results will be time stamped with $timestamp"; + +##Identify build strategy for script outputs (file or directory names) - TBD +#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { +# set strategy "basic" +#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { +# set strategy "default" +#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { +# set strategy "explore" +#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { +# set strategy "timing" +#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { +# set strategy "congestion" +#} else { +# set strategy "test" +#} + +set_msg_config -severity INFO -suppress +set_msg_config -severity STATUS -suppress +set_msg_config -severity WARNING -suppress +set_msg_config -id {Chipscope 16-3} -suppress +set_msg_config -string {AXI_QUAD_SPI} -suppress + +puts "AWS FPGA: Calling the encrypt.tcl"; + +source encrypt.tcl + +#This sets the Device Type +source $HDK_SHELL_DIR/build/scripts/device_type.tcl + +create_project -in_memory -part [DEVICE_TYPE] -force + +#set_param chipscope.enablePRFlow true + +############################# +## Read design files +############################# + +#---- User would replace this section ----- + +#Global defines (this is specific to the CL design). 
This file is encrypted by encrypt.tcl +read_verilog [ list \ + $CL_DIR/build/src_post_encryption/cl_simple_defines.vh +] +set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] +set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] + +puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; + +#User design files (these are the files that were encrypted by encrypt.tcl) +read_verilog [ list \ +$CL_DIR/build/src_post_encryption/cl_simple.sv \ +$CL_DIR/build/src_post_encryption/cl_tst.sv \ +$CL_DIR/build/src_post_encryption/cl_int_tst.sv \ +$CL_DIR/build/src_post_encryption/mem_scrb.sv \ +$CL_DIR/build/src_post_encryption/cl_tst_scrb.sv +] + +#---- End of section replaced by User ---- +puts "AWS FPGA: Reading AWS Shell design"; + +#Read AWS Design files +read_verilog [ list \ +$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ +$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ +$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ +$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ +$HDK_SHELL_DIR/design/lib/sync.v \ +$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ +$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ +$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ +$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ +$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ +$HDK_SHELL_DIR/design/interfaces/cl_ports.vh +] + +puts "AWS FPGA: Reading IP blocks"; +#Read DDR IP +read_ip [ list \ +$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci +] + +puts "AWS FPGA: Reading AWS constraints"; + + +#Read all the constraints +# +# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** +# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** +# cl_ddr.xdc - AWS provided DDR pin constraints. ***DO NOT MODIFY*** +read_xdc [ list \ + $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ + $CL_DIR/build/constraints/cl_synth_user.xdc +] + +#Do not propagate local clock constraints for clocks generated in the SH +set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] + +update_compile_order -fileset sources_1 +set_property verilog_define XSDB_SLV_DIS [current_fileset] + +######################## +# CL Synthesis +######################## +puts "AWS FPGA: Start design synthesis"; + +synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. 
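+# (URAM sites are named like URAM288_X0Y12: the number after the 'Y', taken modulo 4, is the
+# site's position within its four-site quad, so positions 2 and 3 below are the quad's top
+# two sites.)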
+set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +set failval [catch {exec grep "FAIL" failfast.csv}] +if { $failval==0 } { + puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" + exit 1 +} + +######################## +# CL Optimize +######################## +puts "AWS FPGA: Optimizing design"; +opt_design -directive Explore + +check_timing -file $CL_DIR/build/reports/${timestamp}.cl.synth.check_timing_report.txt +report_timing_summary -file $CL_DIR/build/reports/${timestamp}.cl.synth.timing_summary.rpt +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp +close_design + +# Implementation +#Read in the Shell checkpoint and do the CL implementation +puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; + +open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp +read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp + +#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) +read_xdc [ list \ +$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ +$CL_DIR/build/constraints/cl_pnr_user.xdc \ +$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc +] + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. +set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +puts "AWS FPGA: Optimize design during implementation"; + +opt_design -directive Explore +check_timing -file $CL_DIR/build/reports/${timestamp}.SH_CL.check_timing_report.txt + +######################## +# CL Place +######################## +puts "AWS FPGA: Place design stage"; + +place_design -directive WLDrivenBlockPlacement +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp + +########################### +# CL Physical Optimization +########################### +puts "AWS FPGA: Physical optimization stage"; + +phys_opt_design -directive Explore +report_timing_summary -file $CL_DIR/build/reports/${timestamp}.cl.post_place_opt.timing_summary.rpt +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place_opt.dcp + +######################## +# CL Route +######################## +puts "AWS FPGA: Route design stage"; + +route_design -directive MoreGlobalIterations + +################################# +# CL Final Physical Optimization +################################# +puts "AWS FPGA: Post-route Physical optimization stage "; + +phys_opt_design -directive Explore + +puts "AWS FPGA: Locking design "; + +lock_design -level routing + +report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL.post_route_opt.timing_summary.rpt + +#This is what will deliver to AWS +write_checkpoint -force 
$CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp + +# ################################################ +# Emulate what AWS will do (Bitstream Generation) +# ################################################ +puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" + +#Verify PR build +pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -file $CL_DIR/build/checkpoints/to_aws/${timestamp}.pr_verify.log + +# Make temp dir for bitstream +file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +# Verify the Developer DCP is compatible with SH_BB. +pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp +open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp +report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt +report_io -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_report_io.rpt + +# Write out the CL DCP to integrate with SH_BB +write_checkpoint -force -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL_routed.dcp +close_design + +# Integreate Developer CL with SH BB +open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp +read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL_routed.dcp +report_drc -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_DRC.rpt +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL_final.route.dcp +pr_verify -full_check $CL_DIR/build/checkpoints/${timestamp}.SH_CL_final.route.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp +set_param bitstream.enablePR 4123 +write_bitstream -force -bin_file $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit + +# Clean-up temp dir for bitstream +file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +### -------------------------------------------- + +# Create a zipped tar file, that would be used for createFpgaImage EC2 API +puts "Compress files for sending back to AWS" + +# clean up vivado.log file +exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl + +cd $CL_DIR/build/checkpoints +tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] + +close_design + diff --git a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.congestion.tcl b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.congestion.tcl new file mode 100644 index 000000000..8eea12553 --- /dev/null +++ b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.congestion.tcl @@ -0,0 +1,278 @@ +## ============================================================================= +## Copyright 2016 Amazon.com, Inc. or its affiliates. +## All Rights Reserved Worldwide. 
+## Amazon Confidential information +## Restricted NDA Material +## create_cl.tcl: Build to generate CL design checkpoint based on +## developer code +## ============================================================================= + +package require tar + +################################################# +## Generate CL_routed.dcp (Done by User) +################################################# +puts "AWS FPGA Scripts"; +puts "Creating Design Checkpoint from Custom Logic source code"; +puts "Shell Version: VenomCL_unc - 0x11241611"; +puts "Vivado Script Name: $argv0"; + +#checking if CL_DIR env variable exists +if { [info exists ::env(CL_DIR)] } { + set CL_DIR $::env(CL_DIR) + puts "Using CL directory $CL_DIR"; +} else { + puts "Error: CL_DIR environment variable not defined ! "; + puts "Use export CL_DIR=Your_Design_Root_Directory" + exit 2 +} + +#checking if HDK_SHELL_DIR env variable exists +if { [info exists ::env(HDK_SHELL_DIR)] } { + set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) + puts "Using Shell directory $HDK_SHELL_DIR"; +} else { + puts "Error: HDK_SHELL_DIR environment variable not defined ! "; + puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; + exit 2 +} + +#Convenience to set the root of the RTL directory +#Timestamp passed in from AWS script so that log and output files match +set timestamp [lindex $argv 0] +puts "All reports and intermediate results will be time stamped with $timestamp"; + +##Identify build strategy for script outputs (file or directory names) - TBD +#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { +# set strategy "basic" +#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { +# set strategy "default" +#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { +# set strategy "explore" +#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { +# set strategy "timing" +#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { +# set strategy "congestion" +#} else { +# set strategy "test" +#} + +set_msg_config -severity INFO -suppress +set_msg_config -severity STATUS -suppress +set_msg_config -severity WARNING -suppress +set_msg_config -id {Chipscope 16-3} -suppress +set_msg_config -string {AXI_QUAD_SPI} -suppress + +puts "AWS FPGA: Calling the encrypt.tcl"; + +source encrypt.tcl + +#This sets the Device Type +source $HDK_SHELL_DIR/build/scripts/device_type.tcl + +create_project -in_memory -part [DEVICE_TYPE] -force + +#set_param chipscope.enablePRFlow true + +############################# +## Read design files +############################# + +#---- User would replace this section ----- + +#Global defines (this is specific to the CL design). 
This file is encrypted by encrypt.tcl +read_verilog [ list \ + $CL_DIR/build/src_post_encryption/cl_simple_defines.vh +] +set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] +set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] + +puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; + +#User design files (these are the files that were encrypted by encrypt.tcl) +read_verilog [ list \ +$CL_DIR/build/src_post_encryption/cl_simple.sv \ +$CL_DIR/build/src_post_encryption/cl_tst.sv \ +$CL_DIR/build/src_post_encryption/cl_int_tst.sv \ +$CL_DIR/build/src_post_encryption/mem_scrb.sv \ +$CL_DIR/build/src_post_encryption/cl_tst_scrb.sv +] + +#---- End of section replaced by User ---- +puts "AWS FPGA: Reading AWS Shell design"; + +#Read AWS Design files +read_verilog [ list \ +$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ +$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ +$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ +$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ +$HDK_SHELL_DIR/design/lib/sync.v \ +$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ +$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ +$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ +$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ +$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ +$HDK_SHELL_DIR/design/interfaces/cl_ports.vh +] + +puts "AWS FPGA: Reading IP blocks"; +#Read DDR IP +read_ip [ list \ +$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci +] + +puts "AWS FPGA: Reading AWS constraints"; + + +#Read all the constraints +# +# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** +# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** +# cl_ddr.xdc - AWS provided DDR pin constraints. ***DO NOT MODIFY*** +read_xdc [ list \ + $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ + $CL_DIR/build/constraints/cl_synth_user.xdc +] + +#Do not propagate local clock constraints for clocks generated in the SH +set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] + +update_compile_order -fileset sources_1 +set_property verilog_define XSDB_SLV_DIS [current_fileset] + +######################## +# CL Synthesis +######################## +puts "AWS FPGA: Start design synthesis"; + +synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -directive AlternateRoutability -no_lc -shreg_min_size 10 -control_set_opt_threshold 16 + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. 
+set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +set failval [catch {exec grep "FAIL" failfast.csv}] +if { $failval==0 } { + puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" + exit 1 +} + +######################## +# CL Optimize +######################## +puts "AWS FPGA: Optimizing design"; +opt_design -directive Explore + +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp +close_design + +# Implementation +#Read in the Shell checkpoint and do the CL implementation +puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; + +open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp +read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp + +#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) +read_xdc [ list \ +$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ +$CL_DIR/build/constraints/cl_pnr_user.xdc \ +$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc +] + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. +set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +puts "AWS FPGA: Optimize design during implementation"; + +opt_design -bufg_opt -control_set_merge -hier_fanout_limit 512 -muxf_remap -propconst -retarget -sweep + +######################## +# CL Place +######################## +puts "AWS FPGA: Place design stage"; + +place_design -directive AltSpreadLogic_medium +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp + +########################### +# CL Physical Optimization +########################### +puts "AWS FPGA: Physical optimization stage"; + +phys_opt_design -directive Explore + +######################## +# CL Route +######################## +puts "AWS FPGA: Route design stage"; + +route_design -directive Explore + +################################# +# CL Final Physical Optimization +################################# +# N/A + +#This is what will deliver to AWS +write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp + +# ################################################ +# Emulate what AWS will do (Bitstream Generation) +# ################################################ +puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" + +# Make temp dir for bitstream +file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +# Verify the Developer DCP is compatible with SH_BB. 
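+# (pr_verify confirms that the static shell logic in the routed SH_CL checkpoint matches the
+# SH_CL_BB_routed black-box checkpoint; this mirrors the compatibility check AWS runs on the
+# submitted DCP before generating the bitstream.)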
+pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp + +open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp + +report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt +set_param bitstream.enablePR 4123 +write_bitstream -force -bin_file -cell CL $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit + +# Clean-up temp dir for bitstream +file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +### -------------------------------------------- + +# Create a zipped tar file, that would be used for createFpgaImage EC2 API +puts "Compress files for sending back to AWS" + +# clean up vivado.log file +exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl + +cd $CL_DIR/build/checkpoints +tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] + +close_design + diff --git a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.default.tcl b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.default.tcl new file mode 100644 index 000000000..badf40c0b --- /dev/null +++ b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.default.tcl @@ -0,0 +1,278 @@ +## ============================================================================= +## Copyright 2016 Amazon.com, Inc. or its affiliates. +## All Rights Reserved Worldwide. +## Amazon Confidential information +## Restricted NDA Material +## create_cl.tcl: Build to generate CL design checkpoint based on +## developer code +## ============================================================================= + +package require tar + +################################################# +## Generate CL_routed.dcp (Done by User) +################################################# +puts "AWS FPGA Scripts"; +puts "Creating Design Checkpoint from Custom Logic source code"; +puts "Shell Version: VenomCL_unc - 0x11241611"; +puts "Vivado Script Name: $argv0"; + +#checking if CL_DIR env variable exists +if { [info exists ::env(CL_DIR)] } { + set CL_DIR $::env(CL_DIR) + puts "Using CL directory $CL_DIR"; +} else { + puts "Error: CL_DIR environment variable not defined ! "; + puts "Use export CL_DIR=Your_Design_Root_Directory" + exit 2 +} + +#checking if HDK_SHELL_DIR env variable exists +if { [info exists ::env(HDK_SHELL_DIR)] } { + set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) + puts "Using Shell directory $HDK_SHELL_DIR"; +} else { + puts "Error: HDK_SHELL_DIR environment variable not defined ! 
"; + puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; + exit 2 +} + +#Convenience to set the root of the RTL directory +#Timestamp passed in from AWS script so that log and output files match +set timestamp [lindex $argv 0] +puts "All reports and intermediate results will be time stamped with $timestamp"; + +##Identify build strategy for script outputs (file or directory names) - TBD +#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { +# set strategy "basic" +#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { +# set strategy "default" +#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { +# set strategy "explore" +#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { +# set strategy "timing" +#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { +# set strategy "congestion" +#} else { +# set strategy "test" +#} + +set_msg_config -severity INFO -suppress +set_msg_config -severity STATUS -suppress +set_msg_config -severity WARNING -suppress +set_msg_config -id {Chipscope 16-3} -suppress +set_msg_config -string {AXI_QUAD_SPI} -suppress + +puts "AWS FPGA: Calling the encrypt.tcl"; + +source encrypt.tcl + +#This sets the Device Type +source $HDK_SHELL_DIR/build/scripts/device_type.tcl + +create_project -in_memory -part [DEVICE_TYPE] -force + +#set_param chipscope.enablePRFlow true + +############################# +## Read design files +############################# + +#---- User would replace this section ----- + +#Global defines (this is specific to the CL design). This file is encrypted by encrypt.tcl +read_verilog [ list \ + $CL_DIR/build/src_post_encryption/cl_simple_defines.vh +] +set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] +set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] + +puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; + +#User design files (these are the files that were encrypted by encrypt.tcl) +read_verilog [ list \ +$CL_DIR/build/src_post_encryption/cl_simple.sv \ +$CL_DIR/build/src_post_encryption/cl_tst.sv \ +$CL_DIR/build/src_post_encryption/cl_int_tst.sv \ +$CL_DIR/build/src_post_encryption/mem_scrb.sv \ +$CL_DIR/build/src_post_encryption/cl_tst_scrb.sv +] + +#---- End of section replaced by User ---- +puts "AWS FPGA: Reading AWS Shell design"; + +#Read AWS Design files +read_verilog [ list \ +$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ +$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ +$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ +$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ +$HDK_SHELL_DIR/design/lib/sync.v \ +$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ +$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ +$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ +$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ +$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ +$HDK_SHELL_DIR/design/interfaces/cl_ports.vh +] + +puts "AWS FPGA: Reading IP blocks"; +#Read DDR IP +read_ip [ list \ +$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci +] + +puts "AWS FPGA: Reading AWS constraints"; + + +#Read all the constraints +# +# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** +# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** +# cl_ddr.xdc - AWS provided DDR pin constraints. 
***DO NOT MODIFY*** +read_xdc [ list \ + $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ + $CL_DIR/build/constraints/cl_synth_user.xdc +] + +#Do not propagate local clock constraints for clocks generated in the SH +set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] + +update_compile_order -fileset sources_1 +set_property verilog_define XSDB_SLV_DIS [current_fileset] + +######################## +# CL Synthesis +######################## +puts "AWS FPGA: Start design synthesis"; + +synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. +set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +set failval [catch {exec grep "FAIL" failfast.csv}] +if { $failval==0 } { + puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" + exit 1 +} + +######################## +# CL Optimize +######################## +puts "AWS FPGA: Optimizing design"; +opt_design -directive Explore + +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp +close_design + +# Implementation +#Read in the Shell checkpoint and do the CL implementation +puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; + +open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp +read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp + +#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) +read_xdc [ list \ +$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ +$CL_DIR/build/constraints/cl_pnr_user.xdc \ +$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc +] + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. 
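+# (The quad restriction is applied again here because the PROHIBIT properties set before
+# close_design do not carry over into the newly opened Shell+CL checkpoint.)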
+set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +puts "AWS FPGA: Optimize design during implementation"; + +opt_design + +######################## +# CL Place +######################## +puts "AWS FPGA: Place design stage"; + +place_design +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp + +########################### +# CL Physical Optimization +########################### +puts "AWS FPGA: Physical optimization stage"; + +phys_opt_design + +######################## +# CL Route +######################## +puts "AWS FPGA: Route design stage"; + +route_design + +################################# +# CL Final Physical Optimization +################################# +# N/A + +#This is what will deliver to AWS +write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp + +# ################################################ +# Emulate what AWS will do (Bitstream Generation) +# ################################################ +puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" + +# Make temp dir for bitstream +file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +# Verify the Developer DCP is compatible with SH_BB. +pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp + +open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp + +report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt +set_param bitstream.enablePR 4123 +write_bitstream -force -bin_file -cell CL $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit + +# Clean-up temp dir for bitstream +file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +### -------------------------------------------- + +# Create a zipped tar file, that would be used for createFpgaImage EC2 API +puts "Compress files for sending back to AWS" + +# clean up vivado.log file +exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl + +cd $CL_DIR/build/checkpoints +tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] + +close_design + diff --git a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.explore.tcl b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.explore.tcl new file mode 100644 index 000000000..76822788e --- /dev/null +++ b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.explore.tcl @@ -0,0 +1,278 @@ +## ============================================================================= +## Copyright 2016 Amazon.com, Inc. or its affiliates. +## All Rights Reserved Worldwide. 
+## Amazon Confidential information +## Restricted NDA Material +## create_cl.tcl: Build to generate CL design checkpoint based on +## developer code +## ============================================================================= + +package require tar + +################################################# +## Generate CL_routed.dcp (Done by User) +################################################# +puts "AWS FPGA Scripts"; +puts "Creating Design Checkpoint from Custom Logic source code"; +puts "Shell Version: VenomCL_unc - 0x11241611"; +puts "Vivado Script Name: $argv0"; + +#checking if CL_DIR env variable exists +if { [info exists ::env(CL_DIR)] } { + set CL_DIR $::env(CL_DIR) + puts "Using CL directory $CL_DIR"; +} else { + puts "Error: CL_DIR environment variable not defined ! "; + puts "Use export CL_DIR=Your_Design_Root_Directory" + exit 2 +} + +#checking if HDK_SHELL_DIR env variable exists +if { [info exists ::env(HDK_SHELL_DIR)] } { + set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) + puts "Using Shell directory $HDK_SHELL_DIR"; +} else { + puts "Error: HDK_SHELL_DIR environment variable not defined ! "; + puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; + exit 2 +} + +#Convenience to set the root of the RTL directory +#Timestamp passed in from AWS script so that log and output files match +set timestamp [lindex $argv 0] +puts "All reports and intermediate results will be time stamped with $timestamp"; + +##Identify build strategy for script outputs (file or directory names) - TBD +#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { +# set strategy "basic" +#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { +# set strategy "default" +#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { +# set strategy "explore" +#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { +# set strategy "timing" +#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { +# set strategy "congestion" +#} else { +# set strategy "test" +#} + +set_msg_config -severity INFO -suppress +set_msg_config -severity STATUS -suppress +set_msg_config -severity WARNING -suppress +set_msg_config -id {Chipscope 16-3} -suppress +set_msg_config -string {AXI_QUAD_SPI} -suppress + +puts "AWS FPGA: Calling the encrypt.tcl"; + +source encrypt.tcl + +#This sets the Device Type +source $HDK_SHELL_DIR/build/scripts/device_type.tcl + +create_project -in_memory -part [DEVICE_TYPE] -force + +#set_param chipscope.enablePRFlow true + +############################# +## Read design files +############################# + +#---- User would replace this section ----- + +#Global defines (this is specific to the CL design). 
This file is encrypted by encrypt.tcl +read_verilog [ list \ + $CL_DIR/build/src_post_encryption/cl_simple_defines.vh +] +set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] +set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] + +puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; + +#User design files (these are the files that were encrypted by encrypt.tcl) +read_verilog [ list \ +$CL_DIR/build/src_post_encryption/cl_simple.sv \ +$CL_DIR/build/src_post_encryption/cl_tst.sv \ +$CL_DIR/build/src_post_encryption/cl_int_tst.sv \ +$CL_DIR/build/src_post_encryption/mem_scrb.sv \ +$CL_DIR/build/src_post_encryption/cl_tst_scrb.sv +] + +#---- End of section replaced by User ---- +puts "AWS FPGA: Reading AWS Shell design"; + +#Read AWS Design files +read_verilog [ list \ +$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ +$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ +$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ +$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ +$HDK_SHELL_DIR/design/lib/sync.v \ +$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ +$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ +$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ +$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ +$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ +$HDK_SHELL_DIR/design/interfaces/cl_ports.vh +] + +puts "AWS FPGA: Reading IP blocks"; +#Read DDR IP +read_ip [ list \ +$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci +] + +puts "AWS FPGA: Reading AWS constraints"; + + +#Read all the constraints +# +# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** +# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** +# cl_ddr.xdc - AWS provided DDR pin constraints. ***DO NOT MODIFY*** +read_xdc [ list \ + $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ + $CL_DIR/build/constraints/cl_synth_user.xdc +] + +#Do not propagate local clock constraints for clocks generated in the SH +set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] + +update_compile_order -fileset sources_1 +set_property verilog_define XSDB_SLV_DIS [current_fileset] + +######################## +# CL Synthesis +######################## +puts "AWS FPGA: Start design synthesis"; + +synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. 
+set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +set failval [catch {exec grep "FAIL" failfast.csv}] +if { $failval==0 } { + puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" + exit 1 +} + +######################## +# CL Optimize +######################## +puts "AWS FPGA: Optimizing design"; +opt_design -directive Explore + +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp +close_design + +# Implementation +#Read in the Shell checkpoint and do the CL implementation +puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; + +open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp +read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp + +#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) +read_xdc [ list \ +$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ +$CL_DIR/build/constraints/cl_pnr_user.xdc \ +$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc +] + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. +set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +puts "AWS FPGA: Optimize design during implementation"; + +opt_design -directive Explore + +######################## +# CL Place +######################## +puts "AWS FPGA: Place design stage"; + +place_design -directive Explore +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp + +########################### +# CL Physical Optimization +########################### +puts "AWS FPGA: Physical optimization stage"; + +phys_opt_design -directive Explore + +######################## +# CL Route +######################## +puts "AWS FPGA: Route design stage"; + +route_design -directive Explore + +################################# +# CL Final Physical Optimization +################################# +# N/A + +#This is what will deliver to AWS +write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp + +# ################################################ +# Emulate what AWS will do (Bitstream Generation) +# ################################################ +puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" + +# Make temp dir for bitstream +file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +# Verify the Developer DCP is compatible with SH_BB. 
+pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp + +open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp + +report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt +set_param bitstream.enablePR 4123 +write_bitstream -force -bin_file -cell CL $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit + +# Clean-up temp dir for bitstream +file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +### -------------------------------------------- + +# Create a zipped tar file, that would be used for createFpgaImage EC2 API +puts "Compress files for sending back to AWS" + +# clean up vivado.log file +exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl + +cd $CL_DIR/build/checkpoints +tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] + +close_design + diff --git a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.timing.tcl b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.timing.tcl new file mode 100644 index 000000000..bcc04fc6c --- /dev/null +++ b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.timing.tcl @@ -0,0 +1,280 @@ +## ============================================================================= +## Copyright 2016 Amazon.com, Inc. or its affiliates. +## All Rights Reserved Worldwide. +## Amazon Confidential information +## Restricted NDA Material +## create_cl.tcl: Build to generate CL design checkpoint based on +## developer code +## ============================================================================= + +package require tar + +################################################# +## Generate CL_routed.dcp (Done by User) +################################################# +puts "AWS FPGA Scripts"; +puts "Creating Design Checkpoint from Custom Logic source code"; +puts "Shell Version: VenomCL_unc - 0x11241611"; +puts "Vivado Script Name: $argv0"; + +#checking if CL_DIR env variable exists +if { [info exists ::env(CL_DIR)] } { + set CL_DIR $::env(CL_DIR) + puts "Using CL directory $CL_DIR"; +} else { + puts "Error: CL_DIR environment variable not defined ! "; + puts "Use export CL_DIR=Your_Design_Root_Directory" + exit 2 +} + +#checking if HDK_SHELL_DIR env variable exists +if { [info exists ::env(HDK_SHELL_DIR)] } { + set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) + puts "Using Shell directory $HDK_SHELL_DIR"; +} else { + puts "Error: HDK_SHELL_DIR environment variable not defined ! 
"; + puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; + exit 2 +} + +#Convenience to set the root of the RTL directory +#Timestamp passed in from AWS script so that log and output files match +set timestamp [lindex $argv 0] +puts "All reports and intermediate results will be time stamped with $timestamp"; + +##Identify build strategy for script outputs (file or directory names) - TBD +#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { +# set strategy "basic" +#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { +# set strategy "default" +#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { +# set strategy "explore" +#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { +# set strategy "timing" +#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { +# set strategy "congestion" +#} else { +# set strategy "test" +#} + +set_msg_config -severity INFO -suppress +set_msg_config -severity STATUS -suppress +set_msg_config -severity WARNING -suppress +set_msg_config -id {Chipscope 16-3} -suppress +set_msg_config -string {AXI_QUAD_SPI} -suppress + +puts "AWS FPGA: Calling the encrypt.tcl"; + +source encrypt.tcl + +#This sets the Device Type +source $HDK_SHELL_DIR/build/scripts/device_type.tcl + +create_project -in_memory -part [DEVICE_TYPE] -force + +#set_param chipscope.enablePRFlow true + +############################# +## Read design files +############################# + +#---- User would replace this section ----- + +#Global defines (this is specific to the CL design). This file is encrypted by encrypt.tcl +read_verilog [ list \ + $CL_DIR/build/src_post_encryption/cl_simple_defines.vh +] +set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] +set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] + +puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; + +#User design files (these are the files that were encrypted by encrypt.tcl) +read_verilog [ list \ +$CL_DIR/build/src_post_encryption/cl_simple.sv \ +$CL_DIR/build/src_post_encryption/cl_tst.sv \ +$CL_DIR/build/src_post_encryption/cl_int_tst.sv \ +$CL_DIR/build/src_post_encryption/mem_scrb.sv \ +$CL_DIR/build/src_post_encryption/cl_tst_scrb.sv +] + +#---- End of section replaced by User ---- +puts "AWS FPGA: Reading AWS Shell design"; + +#Read AWS Design files +read_verilog [ list \ +$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ +$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ +$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ +$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ +$HDK_SHELL_DIR/design/lib/sync.v \ +$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ +$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ +$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ +$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ +$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ +$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ +$HDK_SHELL_DIR/design/interfaces/cl_ports.vh +] + +puts "AWS FPGA: Reading IP blocks"; +#Read DDR IP +read_ip [ list \ +$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci +] + +puts "AWS FPGA: Reading AWS constraints"; + + +#Read all the constraints +# +# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** +# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** +# cl_ddr.xdc - AWS provided DDR pin constraints. 
***DO NOT MODIFY*** +read_xdc [ list \ + $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ + $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ + $CL_DIR/build/constraints/cl_synth_user.xdc +] + +#Do not propagate local clock constraints for clocks generated in the SH +set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] + +update_compile_order -fileset sources_1 +set_property verilog_define XSDB_SLV_DIS [current_fileset] + +######################## +# CL Synthesis +######################## +puts "AWS FPGA: Start design synthesis"; + +synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -no_lc -shreg_min_size 5 -fsm_extraction one_hot -resource_sharing off + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. +set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +set failval [catch {exec grep "FAIL" failfast.csv}] +if { $failval==0 } { + puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" + exit 1 +} + +######################## +# CL Optimize +######################## +puts "AWS FPGA: Optimizing design"; +opt_design -directive Explore + +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp +close_design + +# Implementation +#Read in the Shell checkpoint and do the CL implementation +puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; + +open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp +read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp + +#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) +read_xdc [ list \ +$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ +$CL_DIR/build/constraints/cl_pnr_user.xdc \ +$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc +] + +# Prohibit the top two URAM sites of each URAM quad. +# These two sites cannot be used within PR designs. 
+set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] +foreach uramSite $uramSites { + # Get the URAM location within a quad + set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] + # The top-two sites have usage restrictions + if {$quadLoc == 2 || $quadLoc == 3} { + # Prohibit the appropriate site + set_property PROHIBIT true $uramSite + puts "Setting Placement Prohibit on $uramSite" + } +} + +puts "AWS FPGA: Optimize design during implementation"; + +opt_design -directive Explore + +######################## +# CL Place +######################## +puts "AWS FPGA: Place design stage"; + +place_design -directive ExtraNetDelay_high +write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp + +########################### +# CL Physical Optimization +########################### +puts "AWS FPGA: Physical optimization stage"; + +phys_opt_design -directive Explore + +######################## +# CL Route +######################## +puts "AWS FPGA: Route design stage"; + +route_design -directive Explore -tns_cleanup + +################################# +# CL Final Physical Optimization +################################# +puts "AWS FPGA: Post-route Physical optimization stage "; + +phys_opt_design -directive Explore + +#This is what will deliver to AWS +write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp + +# ################################################ +# Emulate what AWS will do (Bitstream Generation) +# ################################################ +puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" + +# Make temp dir for bitstream +file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +# Verify the Developer DCP is compatible with SH_BB. +pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp + +open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp + +report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt +set_param bitstream.enablePR 4123 +write_bitstream -force -bin_file -cell CL $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit + +# Clean-up temp dir for bitstream +file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir + +### -------------------------------------------- + +# Create a zipped tar file, that would be used for createFpgaImage EC2 API +puts "Compress files for sending back to AWS" + +# clean up vivado.log file +exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl + +cd $CL_DIR/build/checkpoints +tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] + +close_design + From 2cbdffc87edbaa1f30309102399dd5cdbedbca64 Mon Sep 17 00:00:00 2001 From: Carlos Cabral Date: Thu, 19 Jan 2017 13:50:50 -0800 Subject: [PATCH 21/29] Updates to scripts for 5 strategies. 
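create_dcp_from_cl.tcl now takes the build strategy as a second Tcl argument (after the
timestamp) and switches each synthesis/implementation step on DEFAULT, EXPLORE, TIMING,
CONGESTION, OLD or APP1; aws_build_dcp_from_cl.sh moves under common/shell_current. A rough
sketch of the resulting Vivado call, assuming the updated wrapper (not shown in this hunk)
simply appends the chosen strategy to -tclargs:

    vivado -mode batch -nojournal -log $logname -source create_dcp_from_cl.tcl \
        -tclargs $timestamp TIMING
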
Change-Id: I17f802f87d8c1e5a3f77cee7d21f3dd054c34059 --- .../build/scripts/create_dcp_from_cl.tcl | 346 +++++++++++++++--- .../build/scripts/aws_build_dcp_from_cl.sh | 47 ++- .../shell_current/build/scripts/clean_log.pl | 9 +- 3 files changed, 346 insertions(+), 56 deletions(-) rename hdk/{cl/examples/cl_simple => common/shell_current}/build/scripts/aws_build_dcp_from_cl.sh (64%) diff --git a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.tcl b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.tcl index 9d212bd7b..9d6ce6c48 100644 --- a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.tcl +++ b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.tcl @@ -15,6 +15,7 @@ package require tar puts "AWS FPGA Scripts"; puts "Creating Design Checkpoint from Custom Logic source code"; puts "Shell Version: VenomCL_unc - 0x11241611"; +puts "Vivado Script Name: $argv0"; #checking if CL_DIR env variable exists if { [info exists ::env(CL_DIR)] } { @@ -36,10 +37,11 @@ if { [info exists ::env(HDK_SHELL_DIR)] } { exit 2 } -#Convenience to set the root of the RTL directory -set systemtime [clock seconds] -set timestamp [clock format $systemtime -gmt 1 -format {%y_%m_%d-%H%M}] +# Command-line Arguments +set timestamp [lindex $argv 0] +set strategy [lindex $argv 1] +#Convenience to set the root of the RTL directory puts "All reports and intermediate results will be time stamped with $timestamp"; set_msg_config -severity INFO -suppress @@ -135,7 +137,35 @@ set_property verilog_define XSDB_SLV_DIS [current_fileset] ######################## puts "AWS FPGA: Start design synthesis"; -synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt +switch $strategy { + "DEFAULT" { + puts "DEFAULT strategy." + synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt + } + "EXPLORE" { + puts "EXPLORE strategy." + synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt + } + "TIMING" { + puts "TIMING strategy." + synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -no_lc -shreg_min_size 5 -fsm_extraction one_hot -resource_sharing off + } + "CONGESTION" { + puts "CONGESTION strategy." + synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -directive AlternateRoutability -no_lc -shreg_min_size 10 -control_set_opt_threshold 16 + } + "OLD" { + puts "OLD strategy." + synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt + } + "APP1" { + puts "APP1 strategy." + synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt + } + default { + puts "$strategy is NOT a valid strategy." + } +} # Prohibit the top two URAM sites of each URAM quad. # These two sites cannot be used within PR designs. 
@@ -157,17 +187,45 @@ if { $failval==0 } { exit 1 } +######################## +# CL Optimize +######################## puts "AWS FPGA: Optimizing design"; -opt_design -directive Explore -check_timing -file $CL_DIR/build/reports/${timestamp}.cl.synth.check_timing_report.txt -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.cl.synth.timing_summary.rpt +switch $strategy { + "DEFAULT" { + puts "DEFAULT strategy." + } + "EXPLORE" { + puts "EXPLORE strategy." + opt_design -directive Explore + } + "TIMING" { + puts "TIMING strategy." + opt_design -directive Explore + } + "CONGESTION" { + puts "CONGESTION strategy." + opt_design -directive Explore + } + "OLD" { + puts "OLD strategy." + opt_design -directive Explore + check_timing -file $CL_DIR/build/reports/${timestamp}.cl.synth.check_timing_report.txt + report_timing_summary -file $CL_DIR/build/reports/${timestamp}.cl.synth.timing_summary.rpt + } + "APP1" { + puts "APP1 strategy." + opt_design -directive Explore + } + default { + puts "$strategy is NOT a valid strategy." + } +} write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp close_design -####################### # Implementation -####################### #Read in the Shell checkpoint and do the CL implementation puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; @@ -197,71 +255,267 @@ foreach uramSite $uramSites { puts "AWS FPGA: Optimize design during implementation"; -opt_design -directive Explore -check_timing -file $CL_DIR/build/reports/${timestamp}.SH_CL.check_timing_report.txt +switch $strategy { + "DEFAULT" { + puts "DEFAULT strategy." + opt_design + } + "EXPLORE" { + puts "EXPLORE strategy." + opt_design -directive Explore + } + "TIMING" { + puts "TIMING strategy." + opt_design -directive Explore + } + "CONGESTION" { + puts "CONGESTION strategy." + opt_design -bufg_opt -control_set_merge -hier_fanout_limit 512 -muxf_remap -propconst -retarget -sweep + } + "OLD" { + puts "OLD strategy." + opt_design -directive Explore + check_timing -file $CL_DIR/build/reports/${timestamp}.SH_CL.check_timing_report.txt + } + "APP1" { + puts "APP1 strategy." + opt_design + } + default { + puts "$strategy is NOT a valid strategy." + } +} +######################## +# CL Place +######################## puts "AWS FPGA: Place design stage"; -#place_design -verbose -directive Explore -place_design -directive WLDrivenBlockPlacement + +switch $strategy { + "DEFAULT" { + puts "DEFAULT strategy." + place_design + } + "EXPLORE" { + puts "EXPLORE strategy." + place_design -directive Explore + } + "TIMING" { + puts "TIMING strategy." + place_design -directive ExtraNetDelay_high + } + "CONGESTION" { + puts "CONGESTION strategy." + place_design -directive AltSpreadLogic_medium + } + "OLD" { + puts "OLD strategy." + place_design -directive WLDrivenBlockPlacement + } + "APP1" { + puts "APP1 strategy." + place_design + } + default { + puts "$strategy is NOT a valid strategy." + } +} write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp +########################### +# CL Physical Optimization +########################### puts "AWS FPGA: Physical optimization stage"; -phys_opt_design -directive Explore -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.cl.post_place_opt.timing_summary.rpt -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place_opt.dcp +switch $strategy { + "DEFAULT" { + puts "DEFAULT strategy." + } + "EXPLORE" { + puts "EXPLORE strategy." 
+ phys_opt_design -directive Explore + } + "TIMING" { + puts "TIMING strategy." + phys_opt_design -directive Explore + } + "CONGESTION" { + puts "CONGESTION strategy." + phys_opt_design -directive Explore + } + "OLD" { + puts "OLD strategy." + phys_opt_design -directive Explore + report_timing_summary -file $CL_DIR/build/reports/${timestamp}.cl.post_place_opt.timing_summary.rpt + write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place_opt.dcp + } + "APP1" { + puts "APP1 strategy." + } + default { + puts "$strategy is NOT a valid strategy." + } +} +######################## +# CL Route +######################## puts "AWS FPGA: Route design stage"; -#route_design -verbose -directive Explore -route_design -directive MoreGlobalIterations +switch $strategy { + "DEFAULT" { + puts "DEFAULT strategy." + route_design + } + "EXPLORE" { + puts "EXPLORE strategy." + route_design -directive Explore + } + "TIMING" { + puts "TIMING strategy." + route_design -directive Explore -tns_cleanup + } + "CONGESTION" { + puts "CONGESTION strategy." + route_design -directive Explore + } + "OLD" { + puts "OLD strategy." + route_design -directive MoreGlobalIterations + } + "APP1" { + puts "APP1 strategy." + route_design + } + default { + puts "$strategy is NOT a valid strategy." + } +} +################################# +# CL Final Physical Optimization +################################# puts "AWS FPGA: Post-route Physical optimization stage "; -phys_opt_design -directive Explore - -puts "AWS FPGA: Locking design "; - -lock_design -level routing - -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL.post_route_opt.timing_summary.rpt +switch $strategy { + "DEFAULT" { + puts "DEFAULT strategy." + } + "EXPLORE" { + puts "EXPLORE strategy." + } + "TIMING" { + puts "TIMING strategy." + phys_opt_design -directive Explore + } + "CONGESTION" { + puts "CONGESTION strategy." + } + "OLD" { + puts "OLD strategy." + phys_opt_design -directive Explore + puts "AWS FPGA: Locking design "; + lock_design -level routing + report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL.post_route_opt.timing_summary.rpt + } + "APP1" { + puts "APP1 strategy." + } + default { + puts "$strategy is NOT a valid strategy." + } +} #This is what will deliver to AWS write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp -puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" +# ################################################ +# Verify PR Build +# ################################################ +switch $strategy { + "DEFAULT" { + puts "DEFAULT strategy." + } + "EXPLORE" { + puts "EXPLORE strategy." + puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" + pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -file $CL_DIR/build/checkpoints/to_aws/${timestamp}.pr_verify.log + } + "TIMING" { + puts "TIMING strategy." + puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" + pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -file $CL_DIR/build/checkpoints/to_aws/${timestamp}.pr_verify.log + } + "CONGESTION" { + puts "CONGESTION strategy." 
+ puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" + pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -file $CL_DIR/build/checkpoints/to_aws/${timestamp}.pr_verify.log + } + "OLD" { + puts "OLD strategy." + puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" + pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -file $CL_DIR/build/checkpoints/to_aws/${timestamp}.pr_verify.log + } + "APP1" { + puts "APP1 strategy." + } + default { + puts "$strategy is NOT a valid strategy." + } +} -#Verify PR build -pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -file $CL_DIR/build/checkpoints/to_aws/${timestamp}.pr_verify.log - -### -------------------------------------------- -### Emulate what AWS will do -### -------------------------------------------- +# ################################################ +# Emulate AWS Bitstream Generation +# ################################################ +puts "AWS FPGA: Emulate AWS bitstream generation" # Make temp dir for bitstream -file mkdir $CL_DIR/build/aws_verify_temp_dir +file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir # Verify the Developer DCP is compatible with SH_BB. pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp + open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt -report_io -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_report_io.rpt -# Write out the CL DCP to integrate with SH_BB -write_checkpoint -force -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL_routed.dcp -close_design +switch $strategy { + "DEFAULT" { + puts "DEFAULT strategy." + } + "EXPLORE" { + puts "EXPLORE strategy." + } + "TIMING" { + puts "TIMING strategy." + } + "CONGESTION" { + puts "CONGESTION strategy." + } + "OLD" { + puts "OLD strategy." + report_io -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_report_io.rpt + # Write out the CL DCP to integrate with SH_BB + write_checkpoint -force -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL_routed.dcp + close_design + open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp + read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL_routed.dcp + report_drc -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_DRC.rpt + write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL_final.route.dcp + pr_verify -full_check $CL_DIR/build/checkpoints/${timestamp}.SH_CL_final.route.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp + } + "APP1" { + puts "APP1 strategy." + } + default { + puts "$strategy is NOT a valid strategy." 
+ } +} -# Integreate Developer CL with SH BB -open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL_routed.dcp -report_drc -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_DRC.rpt -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL_final.route.dcp -pr_verify -full_check $CL_DIR/build/checkpoints/${timestamp}.SH_CL_final.route.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp set_param bitstream.enablePR 4123 -write_bitstream -force -bin_file $CL_DIR/build/aws_verify_temp_dir/${timestamp}.SH_CL_final.bit +write_bitstream -force -bin_file $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit # Clean-up temp dir for bitstream -file delete -force $CL_DIR/build/aws_verify_temp_dir +file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir ### -------------------------------------------- @@ -269,7 +523,7 @@ file delete -force $CL_DIR/build/aws_verify_temp_dir puts "Compress files for sending back to AWS" # clean up vivado.log file -exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl +exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl ${timestamp} cd $CL_DIR/build/checkpoints tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] diff --git a/hdk/cl/examples/cl_simple/build/scripts/aws_build_dcp_from_cl.sh b/hdk/common/shell_current/build/scripts/aws_build_dcp_from_cl.sh similarity index 64% rename from hdk/cl/examples/cl_simple/build/scripts/aws_build_dcp_from_cl.sh rename to hdk/common/shell_current/build/scripts/aws_build_dcp_from_cl.sh index a17481074..d42636945 100755 --- a/hdk/cl/examples/cl_simple/build/scripts/aws_build_dcp_from_cl.sh +++ b/hdk/common/shell_current/build/scripts/aws_build_dcp_from_cl.sh @@ -15,11 +15,45 @@ ## implied. See the License for the specific language governing permissions and ## limitations under the License. -# If specified use script specified, otherwise use default vivado script -if [ "$1" != "" ]; then - vivado_script="$1" -else - vivado_script="create_dcp_from_cl.tcl" +# Usage help +function usage +{ + echo "usage: aws_build_dcp_from_cl.sh [ [-script ] | [-stratey DEFAULT | EXPLORE | TIMING | CONGESTION] | [-h]]" +} + +# Default arguments for script and strategy +strategy=DEFAULT +vivado_script="create_dcp_from_cl.tcl" + +# Parse command-line arguments +while [ "$1" != "" ]; do + case $1 in + -script ) shift + vivado_script=$1 + ;; + -strategy ) shift + strategy=$1 + ;; + -h | -help ) usage + exit + ;; + * ) usage + exit 1 + esac + shift +done + +# Check that script exists +if ! [ -f "$vivado_script" ]; then + echo "ERROR: $vivado_script doesn't exist." + exit 1 +fi + +# Check that strategy is valid +shopt -s extglob +if [[ $strategy != @(DEFAULT|EXPLORE|TIMING|CONGESTION|OLD|APP1) ]]; then + echo "ERROR: $strategy isn't a valid strategy. Valid strategies are DEFAULT, EXPLORE, TIMING and CONGESTION." + exit 1 fi echo "AWS FPGA: Starting the design checkpoint build process" @@ -60,8 +94,7 @@ echo "AWS FPGA: Environment variables and directories are present. Checking for vivado -version >/dev/null 2>&1 || { echo >&2 "ERROR - Please install/enable Vivado." 
; return 1; } # Run vivado -#nohup vivado -mode batch -nojournal -source create_dcp_from_cl.tcl & -nohup vivado -mode batch -nojournal -log $logname -source $vivado_script -tclargs $timestamp > $timestamp.nohup.out 2>&1& +nohup vivado -mode batch -nojournal -log $logname -source $vivado_script -tclargs $timestamp $strategy > $timestamp.nohup.out 2>&1& echo "AWS FPGA: Build through Vivado is running as background process, this may take few hours." echo "AWS FPGA: You can set up an email notification upon Vivado run finish by following the instructions in TBD" diff --git a/hdk/common/shell_current/build/scripts/clean_log.pl b/hdk/common/shell_current/build/scripts/clean_log.pl index 5c9e71325..db741353d 100644 --- a/hdk/common/shell_current/build/scripts/clean_log.pl +++ b/hdk/common/shell_current/build/scripts/clean_log.pl @@ -19,12 +19,15 @@ use warnings; use File::Copy; -copy("vivado.log", "vivado_temp.log") or die "Copy failed: $!"; +# The timestamp is an input to the script +my $timestamp = $ARGV[0]; + +copy("${timestamp}.vivado.log", "vivado_temp.log") or die "Copy failed: $!"; open(FILE1, "vivado_temp.log") or die "Can't open < vivado_temp.log: $!"; -open(my $fh_wr, ">", "vivado.log") - or die "Can't open > vivado.log: $!"; +open(my $fh_wr, ">", "${timestamp}.vivado.log") + or die "Can't open > ${timestamp}.vivado.log: $!"; my $match; my @warning_regexps = ("CRITICAL WARNING.*BRAM instance.*Please verify the instance name in the \.bmm file and the netlist. The BRAM initialization strings will not get populated with data.*", From 08683a9ce71562ba79b3aab68e4e5f73a4d6efce Mon Sep 17 00:00:00 2001 From: Carlos Cabral Date: Thu, 19 Jan 2017 15:25:11 -0800 Subject: [PATCH 22/29] Build script is soft link to common. Change-Id: If6aa9a04e135518caf78f7152d8e6f8c4041ff95 --- .../build/scripts/aws_build_dcp_from_cl.sh | 69 +------------------ .../build/scripts/aws_build_dcp_from_cl.sh | 1 + 2 files changed, 2 insertions(+), 68 deletions(-) mode change 100755 => 120000 hdk/cl/examples/cl_hello_world/build/scripts/aws_build_dcp_from_cl.sh create mode 120000 hdk/cl/examples/cl_simple/build/scripts/aws_build_dcp_from_cl.sh diff --git a/hdk/cl/examples/cl_hello_world/build/scripts/aws_build_dcp_from_cl.sh b/hdk/cl/examples/cl_hello_world/build/scripts/aws_build_dcp_from_cl.sh deleted file mode 100755 index a17481074..000000000 --- a/hdk/cl/examples/cl_hello_world/build/scripts/aws_build_dcp_from_cl.sh +++ /dev/null @@ -1,68 +0,0 @@ -#!/bin/bash - -## Amazon FGPA Hardware Development Kit -## -## Copyright 2016 Amazon.com, Inc. or its affiliates. All Rights Reserved. -## -## Licensed under the Amazon Software License (the "License"). You may not use -## this file except in compliance with the License. A copy of the License is -## located at -## -## http://aws.amazon.com/asl/ -## -## or in the "license" file accompanying this file. This file is distributed on -## an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or -## implied. See the License for the specific language governing permissions and -## limitations under the License. - -# If specified use script specified, otherwise use default vivado script -if [ "$1" != "" ]; then - vivado_script="$1" -else - vivado_script="create_dcp_from_cl.tcl" -fi - -echo "AWS FPGA: Starting the design checkpoint build process" -echo "AWS FPGA: Checking for proper environment variables and build directories" - -if ! 
[ $HDK_SHELL_DIR ] -then - echo "ERROR: HDK_SHELL_DIR environment variable is not set, try running hdk_setup.sh script from the root directory of AWS FPGA repository." - exit 1 -fi - -if ! [ -x $HDK_SHELL_DIR/build/scripts/prepare_build_environment.sh ] -then - echo "prepare_build_env.sh script is not eXecutable, trying to apply chmod +x" - chmod +x $HDK_SHELL_DIR/build/scripts/prepare_build_environment.sh - if ! [ -x $HDK_SHELL_DIR/build/scripts/prepare_build_environment.sh ] - then - echo "ERROR: Failed to change prepare_build_environment.sh to eXecutable, aborting!" - exit 1 - fi -fi - -$HDK_SHELL_DIR/build/scripts/prepare_build_environment.sh - -if ! [[ $? -eq 0 ]] -then - echo "ERROR: Missing environment variable or unable to create the needed build directories, aborting!" - exit 1 -fi - -# Use timestamp for logs and output files -timestamp=$(date +"%y_%m_%d-%H%M%S") -logname=$timestamp.vivado.log - -echo "AWS FPGA: Environment variables and directories are present. Checking for Vivado installation." - -# before going too far make sure Vivado is available -vivado -version >/dev/null 2>&1 || { echo >&2 "ERROR - Please install/enable Vivado." ; return 1; } - -# Run vivado -#nohup vivado -mode batch -nojournal -source create_dcp_from_cl.tcl & -nohup vivado -mode batch -nojournal -log $logname -source $vivado_script -tclargs $timestamp > $timestamp.nohup.out 2>&1& - -echo "AWS FPGA: Build through Vivado is running as background process, this may take few hours." -echo "AWS FPGA: You can set up an email notification upon Vivado run finish by following the instructions in TBD" - diff --git a/hdk/cl/examples/cl_hello_world/build/scripts/aws_build_dcp_from_cl.sh b/hdk/cl/examples/cl_hello_world/build/scripts/aws_build_dcp_from_cl.sh new file mode 120000 index 000000000..825d042d9 --- /dev/null +++ b/hdk/cl/examples/cl_hello_world/build/scripts/aws_build_dcp_from_cl.sh @@ -0,0 +1 @@ +../../../../../common/shell_current/build/scripts/aws_build_dcp_from_cl.sh \ No newline at end of file diff --git a/hdk/cl/examples/cl_simple/build/scripts/aws_build_dcp_from_cl.sh b/hdk/cl/examples/cl_simple/build/scripts/aws_build_dcp_from_cl.sh new file mode 120000 index 000000000..8019754a7 --- /dev/null +++ b/hdk/cl/examples/cl_simple/build/scripts/aws_build_dcp_from_cl.sh @@ -0,0 +1 @@ +../../../../../common/shell_latest/build/scripts/aws_build_dcp_from_cl.sh \ No newline at end of file From 6d0cf44cbf13bacf797c8afc3be09c08343bfbd2 Mon Sep 17 00:00:00 2001 From: AWSNB Date: Fri, 20 Jan 2017 10:12:38 -0800 Subject: [PATCH 23/29] typos and adding -help / -H --- .../shell_current/build/scripts/aws_build_dcp_from_cl.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hdk/common/shell_current/build/scripts/aws_build_dcp_from_cl.sh b/hdk/common/shell_current/build/scripts/aws_build_dcp_from_cl.sh index d42636945..3ec994b51 100755 --- a/hdk/common/shell_current/build/scripts/aws_build_dcp_from_cl.sh +++ b/hdk/common/shell_current/build/scripts/aws_build_dcp_from_cl.sh @@ -18,7 +18,7 @@ # Usage help function usage { - echo "usage: aws_build_dcp_from_cl.sh [ [-script ] | [-stratey DEFAULT | EXPLORE | TIMING | CONGESTION] | [-h]]" + echo "usage: aws_build_dcp_from_cl.sh [ [-script ] | [-strategy DEFAULT | EXPLORE | TIMING | CONGESTION] | [-h] | [-H] | [-help] | ]" } # Default arguments for script and strategy @@ -34,7 +34,7 @@ while [ "$1" != "" ]; do -strategy ) shift strategy=$1 ;; - -h | -help ) usage + -h | -H | -help ) usage exit ;; * ) usage From cd7e222bea4313c3d8e0c492c6df2656702613a7 Mon 
Sep 17 00:00:00 2001 From: Carlos Cabral Date: Tue, 24 Jan 2017 15:40:12 -0800 Subject: [PATCH 24/29] Script strategy changes; cl_hello_world ID changes. Change-Id: I7ee86791e50567eb9c33fa28ab7f61558420547b --- hdk/cl/examples/cl_hello_world/README.md | 10 +- .../scripts/create_dcp_from_cl.basic.tcl | 300 ----------------- .../scripts/create_dcp_from_cl.congestion.tcl | 274 ---------------- .../scripts/create_dcp_from_cl.default.tcl | 274 ---------------- .../scripts/create_dcp_from_cl.explore.tcl | 274 ---------------- .../build/scripts/create_dcp_from_cl.tcl | 308 +++++++++++++++--- .../scripts/create_dcp_from_cl.timing.tcl | 276 ---------------- .../cl_hello_world/design/cl_hello_world.sv | 4 +- .../scripts/create_dcp_from_cl.basic.tcl | 304 ----------------- .../scripts/create_dcp_from_cl.congestion.tcl | 278 ---------------- .../scripts/create_dcp_from_cl.default.tcl | 278 ---------------- .../scripts/create_dcp_from_cl.explore.tcl | 278 ---------------- .../build/scripts/create_dcp_from_cl.tcl | 166 ++++------ .../scripts/create_dcp_from_cl.timing.tcl | 280 ---------------- .../build/scripts/aws_build_dcp_from_cl.sh | 6 +- 15 files changed, 328 insertions(+), 2982 deletions(-) delete mode 100644 hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.basic.tcl delete mode 100644 hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.congestion.tcl delete mode 100644 hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.default.tcl delete mode 100644 hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.explore.tcl delete mode 100644 hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.timing.tcl delete mode 100644 hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.basic.tcl delete mode 100644 hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.congestion.tcl delete mode 100644 hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.default.tcl delete mode 100644 hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.explore.tcl delete mode 100644 hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.timing.tcl diff --git a/hdk/cl/examples/cl_hello_world/README.md b/hdk/cl/examples/cl_hello_world/README.md index 78f28e013..be6debb2f 100644 --- a/hdk/cl/examples/cl_hello_world/README.md +++ b/hdk/cl/examples/cl_hello_world/README.md @@ -12,11 +12,11 @@ Alternatively, you can directly use a pre-generated AFI for this CL which you ca | Key | Value | |-----------|------| | FPGA Image Architecture | xvu9p | -| Shell Version | 0x???????? | -| PCI Device ID | 0x???? | -| PCI Vendor ID | 0x???? | -| PCI Subsystem ID | 0x???? | -| PCI Subsystem Vendor ID | 0x???? | +| Shell Version | 0x11241611 | +| PCI Device ID | 0x1d50 | +| PCI Vendor ID | 0x678A | +| PCI Subsystem ID | 0x1d51 | +| PCI Subsystem Vendor ID | 0xfedd | | Pre-generated AFI ID | afi-????????????????? | | Pre-generated AGFI ID | agfi-????????????????? | diff --git a/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.basic.tcl b/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.basic.tcl deleted file mode 100644 index ab2272223..000000000 --- a/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.basic.tcl +++ /dev/null @@ -1,300 +0,0 @@ -## ============================================================================= -## Copyright 2016 Amazon.com, Inc. or its affiliates. -## All Rights Reserved Worldwide. 
-## Amazon Confidential information -## Restricted NDA Material -## create_cl.tcl: Build to generate CL design checkpoint based on -## developer code -## ============================================================================= - -package require tar - -################################################# -## Generate CL_routed.dcp (Done by User) -################################################# -puts "AWS FPGA Scripts"; -puts "Creating Design Checkpoint from Custom Logic source code"; -puts "Shell Version: VenomCL_unc - 0x11241611"; -puts "Vivado Script Name: $argv0"; - -#checking if CL_DIR env variable exists -if { [info exists ::env(CL_DIR)] } { - set CL_DIR $::env(CL_DIR) - puts "Using CL directory $CL_DIR"; -} else { - puts "Error: CL_DIR environment variable not defined ! "; - puts "Use export CL_DIR=Your_Design_Root_Directory" - exit 2 -} - -#checking if HDK_SHELL_DIR env variable exists -if { [info exists ::env(HDK_SHELL_DIR)] } { - set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) - puts "Using Shell directory $HDK_SHELL_DIR"; -} else { - puts "Error: HDK_SHELL_DIR environment variable not defined ! "; - puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; - exit 2 -} - -#Convenience to set the root of the RTL directory -#Timestamp passed in from AWS script so that log and output files match -set timestamp [lindex $argv 0] -puts "All reports and intermediate results will be time stamped with $timestamp"; - -##Identify build strategy for script outputs (file or directory names) - TBD -#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { -# set strategy "basic" -#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { -# set strategy "default" -#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { -# set strategy "explore" -#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { -# set strategy "timing" -#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { -# set strategy "congestion" -#} else { -# set strategy "test" -#} - -set_msg_config -severity INFO -suppress -set_msg_config -severity STATUS -suppress -set_msg_config -severity WARNING -suppress -set_msg_config -id {Chipscope 16-3} -suppress -set_msg_config -string {AXI_QUAD_SPI} -suppress - -puts "AWS FPGA: Calling the encrypt.tcl"; - -source encrypt.tcl - -#This sets the Device Type -source $HDK_SHELL_DIR/build/scripts/device_type.tcl - -create_project -in_memory -part [DEVICE_TYPE] -force - -#set_param chipscope.enablePRFlow true - -############################# -## Read design files -############################# - -#---- User would replace this section ----- - -#Global defines (this is specific to the CL design). 
This file is encrypted by encrypt.tcl -read_verilog [ list \ - $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh -] -set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] -set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] - -puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; - -#User design files (these are the files that were encrypted by encrypt.tcl) -read_verilog [ list \ -$CL_DIR/build/src_post_encryption/cl_hello_world.sv -] - -#---- End of section replaced by User ---- -puts "AWS FPGA: Reading AWS Shell design"; - -#Read AWS Design files -read_verilog [ list \ -$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ -$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ -$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ -$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ -$HDK_SHELL_DIR/design/lib/sync.v \ -$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ -$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ -$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ -$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ -$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ -$HDK_SHELL_DIR/design/interfaces/cl_ports.vh -] - -puts "AWS FPGA: Reading IP blocks"; -#Read DDR IP -read_ip [ list \ -$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci -] - -puts "AWS FPGA: Reading AWS constraints"; - - -#Read all the constraints -# -# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** -# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** -# cl_ddr.xdc - AWS provided DDR pin constraints. ***DO NOT MODIFY*** -read_xdc [ list \ - $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ - $CL_DIR/build/constraints/cl_synth_user.xdc -] - -#Do not propagate local clock constraints for clocks generated in the SH -set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] - -update_compile_order -fileset sources_1 -set_property verilog_define XSDB_SLV_DIS [current_fileset] - -######################## -# CL Synthesis -######################## -puts "AWS FPGA: Start design synthesis"; - -synth_design -top cl_hello_world -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. 
-set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -set failval [catch {exec grep "FAIL" failfast.csv}] -if { $failval==0 } { - puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" - exit 1 -} - -######################## -# CL Optimize -######################## -puts "AWS FPGA: Optimizing design"; -opt_design -directive Explore - -check_timing -file $CL_DIR/build/reports/${timestamp}.cl.synth.check_timing_report.txt -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.cl.synth.timing_summary.rpt -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp -close_design - -# Implementation -#Read in the Shell checkpoint and do the CL implementation -puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; - -open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp - -#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) -read_xdc [ list \ -$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ -$CL_DIR/build/constraints/cl_pnr_user.xdc \ -$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc -] - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. -set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -puts "AWS FPGA: Optimize design during implementation"; - -opt_design -directive Explore -check_timing -file $CL_DIR/build/reports/${timestamp}.SH_CL.check_timing_report.txt - -######################## -# CL Place -######################## -puts "AWS FPGA: Place design stage"; - -place_design -directive WLDrivenBlockPlacement -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp - -########################### -# CL Physical Optimization -########################### -puts "AWS FPGA: Physical optimization stage"; - -phys_opt_design -directive Explore -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.cl.post_place_opt.timing_summary.rpt -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place_opt.dcp - -######################## -# CL Route -######################## -puts "AWS FPGA: Route design stage"; - -route_design -directive MoreGlobalIterations - -################################# -# CL Final Physical Optimization -################################# -puts "AWS FPGA: Post-route Physical optimization stage "; - -phys_opt_design -directive Explore - -puts "AWS FPGA: Locking design "; - -lock_design -level routing - -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL.post_route_opt.timing_summary.rpt - -#This is what will deliver to AWS -write_checkpoint -force 
$CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp - -# ################################################ -# Emulate what AWS will do (Bitstream Generation) -# ################################################ -puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" - -#Verify PR build -pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -file $CL_DIR/build/checkpoints/to_aws/${timestamp}.pr_verify.log - -# Make temp dir for bitstream -file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -# Verify the Developer DCP is compatible with SH_BB. -pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt -report_io -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_report_io.rpt - -# Write out the CL DCP to integrate with SH_BB -write_checkpoint -force -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL_routed.dcp -close_design - -# Integreate Developer CL with SH BB -open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL_routed.dcp -report_drc -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_DRC.rpt -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL_final.route.dcp -pr_verify -full_check $CL_DIR/build/checkpoints/${timestamp}.SH_CL_final.route.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -set_param bitstream.enablePR 4123 -write_bitstream -force -bin_file $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit - -# Clean-up temp dir for bitstream -file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -### -------------------------------------------- - -# Create a zipped tar file, that would be used for createFpgaImage EC2 API -puts "Compress files for sending back to AWS" - -# clean up vivado.log file -exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl - -cd $CL_DIR/build/checkpoints -tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] - -close_design - diff --git a/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.congestion.tcl b/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.congestion.tcl deleted file mode 100644 index e83b4ffe6..000000000 --- a/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.congestion.tcl +++ /dev/null @@ -1,274 +0,0 @@ -## ============================================================================= -## Copyright 2016 Amazon.com, Inc. or its affiliates. -## All Rights Reserved Worldwide. 
-## Amazon Confidential information -## Restricted NDA Material -## create_cl.tcl: Build to generate CL design checkpoint based on -## developer code -## ============================================================================= - -package require tar - -################################################# -## Generate CL_routed.dcp (Done by User) -################################################# -puts "AWS FPGA Scripts"; -puts "Creating Design Checkpoint from Custom Logic source code"; -puts "Shell Version: VenomCL_unc - 0x11241611"; -puts "Vivado Script Name: $argv0"; - -#checking if CL_DIR env variable exists -if { [info exists ::env(CL_DIR)] } { - set CL_DIR $::env(CL_DIR) - puts "Using CL directory $CL_DIR"; -} else { - puts "Error: CL_DIR environment variable not defined ! "; - puts "Use export CL_DIR=Your_Design_Root_Directory" - exit 2 -} - -#checking if HDK_SHELL_DIR env variable exists -if { [info exists ::env(HDK_SHELL_DIR)] } { - set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) - puts "Using Shell directory $HDK_SHELL_DIR"; -} else { - puts "Error: HDK_SHELL_DIR environment variable not defined ! "; - puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; - exit 2 -} - -#Convenience to set the root of the RTL directory -#Timestamp passed in from AWS script so that log and output files match -set timestamp [lindex $argv 0] -puts "All reports and intermediate results will be time stamped with $timestamp"; - -##Identify build strategy for script outputs (file or directory names) - TBD -#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { -# set strategy "basic" -#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { -# set strategy "default" -#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { -# set strategy "explore" -#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { -# set strategy "timing" -#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { -# set strategy "congestion" -#} else { -# set strategy "test" -#} - -set_msg_config -severity INFO -suppress -set_msg_config -severity STATUS -suppress -set_msg_config -severity WARNING -suppress -set_msg_config -id {Chipscope 16-3} -suppress -set_msg_config -string {AXI_QUAD_SPI} -suppress - -puts "AWS FPGA: Calling the encrypt.tcl"; - -source encrypt.tcl - -#This sets the Device Type -source $HDK_SHELL_DIR/build/scripts/device_type.tcl - -create_project -in_memory -part [DEVICE_TYPE] -force - -#set_param chipscope.enablePRFlow true - -############################# -## Read design files -############################# - -#---- User would replace this section ----- - -#Global defines (this is specific to the CL design). 
This file is encrypted by encrypt.tcl -read_verilog [ list \ - $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh -] -set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] -set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] - -puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; - -#User design files (these are the files that were encrypted by encrypt.tcl) -read_verilog [ list \ -$CL_DIR/build/src_post_encryption/cl_hello_world.sv -] - -#---- End of section replaced by User ---- -puts "AWS FPGA: Reading AWS Shell design"; - -#Read AWS Design files -read_verilog [ list \ -$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ -$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ -$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ -$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ -$HDK_SHELL_DIR/design/lib/sync.v \ -$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ -$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ -$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ -$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ -$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ -$HDK_SHELL_DIR/design/interfaces/cl_ports.vh -] - -puts "AWS FPGA: Reading IP blocks"; -#Read DDR IP -read_ip [ list \ -$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci -] - -puts "AWS FPGA: Reading AWS constraints"; - - -#Read all the constraints -# -# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** -# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** -# cl_ddr.xdc - AWS provided DDR pin constraints. ***DO NOT MODIFY*** -read_xdc [ list \ - $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ - $CL_DIR/build/constraints/cl_synth_user.xdc -] - -#Do not propagate local clock constraints for clocks generated in the SH -set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] - -update_compile_order -fileset sources_1 -set_property verilog_define XSDB_SLV_DIS [current_fileset] - -######################## -# CL Synthesis -######################## -puts "AWS FPGA: Start design synthesis"; - -synth_design -top cl_hello_world -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -directive AlternateRoutability -no_lc -shreg_min_size 10 -control_set_opt_threshold 16 - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. 
-set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -set failval [catch {exec grep "FAIL" failfast.csv}] -if { $failval==0 } { - puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" - exit 1 -} - -######################## -# CL Optimize -######################## -puts "AWS FPGA: Optimizing design"; -opt_design -directive Explore - -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp -close_design - -# Implementation -#Read in the Shell checkpoint and do the CL implementation -puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; - -open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp - -#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) -read_xdc [ list \ -$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ -$CL_DIR/build/constraints/cl_pnr_user.xdc \ -$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc -] - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. -set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -puts "AWS FPGA: Optimize design during implementation"; - -opt_design -bufg_opt -control_set_merge -hier_fanout_limit 512 -muxf_remap -propconst -retarget -sweep - -######################## -# CL Place -######################## -puts "AWS FPGA: Place design stage"; - -place_design -directive AltSpreadLogic_medium -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp - -########################### -# CL Physical Optimization -########################### -puts "AWS FPGA: Physical optimization stage"; - -phys_opt_design -directive Explore - -######################## -# CL Route -######################## -puts "AWS FPGA: Route design stage"; - -route_design -directive Explore - -################################# -# CL Final Physical Optimization -################################# -# N/A - -#This is what will deliver to AWS -write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp - -# ################################################ -# Emulate what AWS will do (Bitstream Generation) -# ################################################ -puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" - -# Make temp dir for bitstream -file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -# Verify the Developer DCP is compatible with SH_BB. 
-pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp - -open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp - -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt -set_param bitstream.enablePR 4123 -write_bitstream -force -bin_file -cell CL $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit - -# Clean-up temp dir for bitstream -file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -### -------------------------------------------- - -# Create a zipped tar file, that would be used for createFpgaImage EC2 API -puts "Compress files for sending back to AWS" - -# clean up vivado.log file -exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl - -cd $CL_DIR/build/checkpoints -tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] - -close_design - diff --git a/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.default.tcl b/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.default.tcl deleted file mode 100644 index 0fa9ff65b..000000000 --- a/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.default.tcl +++ /dev/null @@ -1,274 +0,0 @@ -## ============================================================================= -## Copyright 2016 Amazon.com, Inc. or its affiliates. -## All Rights Reserved Worldwide. -## Amazon Confidential information -## Restricted NDA Material -## create_cl.tcl: Build to generate CL design checkpoint based on -## developer code -## ============================================================================= - -package require tar - -################################################# -## Generate CL_routed.dcp (Done by User) -################################################# -puts "AWS FPGA Scripts"; -puts "Creating Design Checkpoint from Custom Logic source code"; -puts "Shell Version: VenomCL_unc - 0x11241611"; -puts "Vivado Script Name: $argv0"; - -#checking if CL_DIR env variable exists -if { [info exists ::env(CL_DIR)] } { - set CL_DIR $::env(CL_DIR) - puts "Using CL directory $CL_DIR"; -} else { - puts "Error: CL_DIR environment variable not defined ! "; - puts "Use export CL_DIR=Your_Design_Root_Directory" - exit 2 -} - -#checking if HDK_SHELL_DIR env variable exists -if { [info exists ::env(HDK_SHELL_DIR)] } { - set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) - puts "Using Shell directory $HDK_SHELL_DIR"; -} else { - puts "Error: HDK_SHELL_DIR environment variable not defined ! 
"; - puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; - exit 2 -} - -#Convenience to set the root of the RTL directory -#Timestamp passed in from AWS script so that log and output files match -set timestamp [lindex $argv 0] -puts "All reports and intermediate results will be time stamped with $timestamp"; - -##Identify build strategy for script outputs (file or directory names) - TBD -#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { -# set strategy "basic" -#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { -# set strategy "default" -#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { -# set strategy "explore" -#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { -# set strategy "timing" -#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { -# set strategy "congestion" -#} else { -# set strategy "test" -#} - -set_msg_config -severity INFO -suppress -set_msg_config -severity STATUS -suppress -set_msg_config -severity WARNING -suppress -set_msg_config -id {Chipscope 16-3} -suppress -set_msg_config -string {AXI_QUAD_SPI} -suppress - -puts "AWS FPGA: Calling the encrypt.tcl"; - -source encrypt.tcl - -#This sets the Device Type -source $HDK_SHELL_DIR/build/scripts/device_type.tcl - -create_project -in_memory -part [DEVICE_TYPE] -force - -#set_param chipscope.enablePRFlow true - -############################# -## Read design files -############################# - -#---- User would replace this section ----- - -#Global defines (this is specific to the CL design). This file is encrypted by encrypt.tcl -read_verilog [ list \ - $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh -] -set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] -set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] - -puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; - -#User design files (these are the files that were encrypted by encrypt.tcl) -read_verilog [ list \ -$CL_DIR/build/src_post_encryption/cl_hello_world.sv -] - -#---- End of section replaced by User ---- -puts "AWS FPGA: Reading AWS Shell design"; - -#Read AWS Design files -read_verilog [ list \ -$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ -$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ -$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ -$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ -$HDK_SHELL_DIR/design/lib/sync.v \ -$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ -$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ -$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ -$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ -$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ -$HDK_SHELL_DIR/design/interfaces/cl_ports.vh -] - -puts "AWS FPGA: Reading IP blocks"; -#Read DDR IP -read_ip [ list \ -$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci -] - -puts "AWS FPGA: Reading AWS constraints"; - - -#Read all the constraints -# -# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** -# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** -# cl_ddr.xdc - AWS provided DDR pin constraints. 
***DO NOT MODIFY*** -read_xdc [ list \ - $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ - $CL_DIR/build/constraints/cl_synth_user.xdc -] - -#Do not propagate local clock constraints for clocks generated in the SH -set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] - -update_compile_order -fileset sources_1 -set_property verilog_define XSDB_SLV_DIS [current_fileset] - -######################## -# CL Synthesis -######################## -puts "AWS FPGA: Start design synthesis"; - -synth_design -top cl_hello_world -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. -set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -set failval [catch {exec grep "FAIL" failfast.csv}] -if { $failval==0 } { - puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" - exit 1 -} - -######################## -# CL Optimize -######################## -puts "AWS FPGA: Optimizing design"; -opt_design -directive Explore - -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp -close_design - -# Implementation -#Read in the Shell checkpoint and do the CL implementation -puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; - -open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp - -#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) -read_xdc [ list \ -$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ -$CL_DIR/build/constraints/cl_pnr_user.xdc \ -$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc -] - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. 
-set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -puts "AWS FPGA: Optimize design during implementation"; - -opt_design - -######################## -# CL Place -######################## -puts "AWS FPGA: Place design stage"; - -place_design -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp - -########################### -# CL Physical Optimization -########################### -puts "AWS FPGA: Physical optimization stage"; - -phys_opt_design - -######################## -# CL Route -######################## -puts "AWS FPGA: Route design stage"; - -route_design - -################################# -# CL Final Physical Optimization -################################# -# N/A - -#This is what will deliver to AWS -write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp - -# ################################################ -# Emulate what AWS will do (Bitstream Generation) -# ################################################ -puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" - -# Make temp dir for bitstream -file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -# Verify the Developer DCP is compatible with SH_BB. -pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp - -open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp - -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt -set_param bitstream.enablePR 4123 -write_bitstream -force -bin_file -cell CL $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit - -# Clean-up temp dir for bitstream -file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -### -------------------------------------------- - -# Create a zipped tar file, that would be used for createFpgaImage EC2 API -puts "Compress files for sending back to AWS" - -# clean up vivado.log file -exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl - -cd $CL_DIR/build/checkpoints -tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] - -close_design - diff --git a/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.explore.tcl b/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.explore.tcl deleted file mode 100644 index 76668dfc0..000000000 --- a/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.explore.tcl +++ /dev/null @@ -1,274 +0,0 @@ -## ============================================================================= -## Copyright 2016 Amazon.com, Inc. or its affiliates. -## All Rights Reserved Worldwide. 
-## Amazon Confidential information -## Restricted NDA Material -## create_cl.tcl: Build to generate CL design checkpoint based on -## developer code -## ============================================================================= - -package require tar - -################################################# -## Generate CL_routed.dcp (Done by User) -################################################# -puts "AWS FPGA Scripts"; -puts "Creating Design Checkpoint from Custom Logic source code"; -puts "Shell Version: VenomCL_unc - 0x11241611"; -puts "Vivado Script Name: $argv0"; - -#checking if CL_DIR env variable exists -if { [info exists ::env(CL_DIR)] } { - set CL_DIR $::env(CL_DIR) - puts "Using CL directory $CL_DIR"; -} else { - puts "Error: CL_DIR environment variable not defined ! "; - puts "Use export CL_DIR=Your_Design_Root_Directory" - exit 2 -} - -#checking if HDK_SHELL_DIR env variable exists -if { [info exists ::env(HDK_SHELL_DIR)] } { - set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) - puts "Using Shell directory $HDK_SHELL_DIR"; -} else { - puts "Error: HDK_SHELL_DIR environment variable not defined ! "; - puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; - exit 2 -} - -#Convenience to set the root of the RTL directory -#Timestamp passed in from AWS script so that log and output files match -set timestamp [lindex $argv 0] -puts "All reports and intermediate results will be time stamped with $timestamp"; - -##Identify build strategy for script outputs (file or directory names) - TBD -#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { -# set strategy "basic" -#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { -# set strategy "default" -#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { -# set strategy "explore" -#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { -# set strategy "timing" -#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { -# set strategy "congestion" -#} else { -# set strategy "test" -#} - -set_msg_config -severity INFO -suppress -set_msg_config -severity STATUS -suppress -set_msg_config -severity WARNING -suppress -set_msg_config -id {Chipscope 16-3} -suppress -set_msg_config -string {AXI_QUAD_SPI} -suppress - -puts "AWS FPGA: Calling the encrypt.tcl"; - -source encrypt.tcl - -#This sets the Device Type -source $HDK_SHELL_DIR/build/scripts/device_type.tcl - -create_project -in_memory -part [DEVICE_TYPE] -force - -#set_param chipscope.enablePRFlow true - -############################# -## Read design files -############################# - -#---- User would replace this section ----- - -#Global defines (this is specific to the CL design). 
This file is encrypted by encrypt.tcl -read_verilog [ list \ - $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh -] -set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] -set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] - -puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; - -#User design files (these are the files that were encrypted by encrypt.tcl) -read_verilog [ list \ -$CL_DIR/build/src_post_encryption/cl_hello_world.sv -] - -#---- End of section replaced by User ---- -puts "AWS FPGA: Reading AWS Shell design"; - -#Read AWS Design files -read_verilog [ list \ -$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ -$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ -$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ -$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ -$HDK_SHELL_DIR/design/lib/sync.v \ -$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ -$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ -$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ -$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ -$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ -$HDK_SHELL_DIR/design/interfaces/cl_ports.vh -] - -puts "AWS FPGA: Reading IP blocks"; -#Read DDR IP -read_ip [ list \ -$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci -] - -puts "AWS FPGA: Reading AWS constraints"; - - -#Read all the constraints -# -# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** -# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** -# cl_ddr.xdc - AWS provided DDR pin constraints. ***DO NOT MODIFY*** -read_xdc [ list \ - $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ - $CL_DIR/build/constraints/cl_synth_user.xdc -] - -#Do not propagate local clock constraints for clocks generated in the SH -set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] - -update_compile_order -fileset sources_1 -set_property verilog_define XSDB_SLV_DIS [current_fileset] - -######################## -# CL Synthesis -######################## -puts "AWS FPGA: Start design synthesis"; - -synth_design -top cl_hello_world -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. 
-set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -set failval [catch {exec grep "FAIL" failfast.csv}] -if { $failval==0 } { - puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" - exit 1 -} - -######################## -# CL Optimize -######################## -puts "AWS FPGA: Optimizing design"; -opt_design -directive Explore - -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp -close_design - -# Implementation -#Read in the Shell checkpoint and do the CL implementation -puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; - -open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp - -#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) -read_xdc [ list \ -$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ -$CL_DIR/build/constraints/cl_pnr_user.xdc \ -$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc -] - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. -set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -puts "AWS FPGA: Optimize design during implementation"; - -opt_design -directive Explore - -######################## -# CL Place -######################## -puts "AWS FPGA: Place design stage"; - -place_design -directive Explore -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp - -########################### -# CL Physical Optimization -########################### -puts "AWS FPGA: Physical optimization stage"; - -phys_opt_design -directive Explore - -######################## -# CL Route -######################## -puts "AWS FPGA: Route design stage"; - -route_design -directive Explore - -################################# -# CL Final Physical Optimization -################################# -# N/A - -#This is what will deliver to AWS -write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp - -# ################################################ -# Emulate what AWS will do (Bitstream Generation) -# ################################################ -puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" - -# Make temp dir for bitstream -file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -# Verify the Developer DCP is compatible with SH_BB. 
-pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp - -open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp - -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt -set_param bitstream.enablePR 4123 -write_bitstream -force -bin_file -cell CL $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit - -# Clean-up temp dir for bitstream -file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -### -------------------------------------------- - -# Create a zipped tar file, that would be used for createFpgaImage EC2 API -puts "Compress files for sending back to AWS" - -# clean up vivado.log file -exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl - -cd $CL_DIR/build/checkpoints -tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] - -close_design - diff --git a/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.tcl b/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.tcl index 9d0918de4..502e7dbe1 100644 --- a/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.tcl +++ b/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.tcl @@ -9,12 +9,20 @@ package require tar +################################################# +## Versions +################################################# +set shell_version "0x11241611" +set hdk_version "1.0.0" + ################################################# ## Generate CL_routed.dcp (Done by User) ################################################# puts "AWS FPGA Scripts"; puts "Creating Design Checkpoint from Custom Logic source code"; -puts "Shell Version: VenomCL_unc - 0x11241611"; +puts "Shell Version: VenomCL_unc - $shell_version"; +puts "Vivado Script Name: $argv0"; +puts "HDK Version: $hdk_version"; #checking if CL_DIR env variable exists if { [info exists ::env(CL_DIR)] } { @@ -36,10 +44,11 @@ if { [info exists ::env(HDK_SHELL_DIR)] } { exit 2 } -#Convenience to set the root of the RTL directory -set systemtime [clock seconds] -set timestamp [clock format $systemtime -gmt 1 -format {%y_%m_%d-%H%M}] +# Command-line Arguments +set timestamp [lindex $argv 0] +set strategy [lindex $argv 1] +#Convenience to set the root of the RTL directory puts "All reports and intermediate results will be time stamped with $timestamp"; set_msg_config -severity INFO -suppress @@ -76,7 +85,7 @@ puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; #User design files (these are the files that were encrypted by encrypt.tcl) read_verilog [ list \ -$CL_DIR/build/src_post_encryption/cl_hello_world.sv + $CL_DIR/build/src_post_encryption/cl_hello_world.sv ] #---- End of section replaced by User ---- @@ -131,7 +140,31 @@ set_property verilog_define XSDB_SLV_DIS [current_fileset] ######################## puts "AWS FPGA: Start design synthesis"; -synth_design -top cl_hello_world -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt +switch $strategy { + "BASIC" { + puts "BASIC strategy." + synth_design -top cl_hello_world -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt + } + "EXPLORE" { + puts "EXPLORE strategy." 
+ synth_design -top cl_hello_world -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt + } + "TIMING" { + puts "TIMING strategy." + synth_design -top cl_hello_world -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -no_lc -shreg_min_size 5 -fsm_extraction one_hot -resource_sharing off + } + "CONGESTION" { + puts "CONGESTION strategy." + synth_design -top cl_hello_world -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -directive AlternateRoutability -no_lc -shreg_min_size 10 -control_set_opt_threshold 16 + } + "DEFAULT" { + puts "DEFAULT strategy." + synth_design -top cl_hello_world -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt + } + default { + puts "$strategy is NOT a valid strategy." + } +} # Prohibit the top two URAM sites of each URAM quad. # These two sites cannot be used within PR designs. @@ -153,17 +186,40 @@ if { $failval==0 } { exit 1 } +######################## +# CL Optimize +######################## puts "AWS FPGA: Optimizing design"; -opt_design -directive Explore -check_timing -file $CL_DIR/build/reports/${timestamp}.cl.synth.check_timing_report.txt -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.cl.synth.timing_summary.rpt +switch $strategy { + "BASIC" { + puts "BASIC strategy." + opt_design + } + "EXPLORE" { + puts "EXPLORE strategy." + opt_design -directive Explore + } + "TIMING" { + puts "TIMING strategy." + opt_design -directive Explore + } + "CONGESTION" { + puts "CONGESTION strategy." + opt_design -directive Explore + } + "DEFAULT" { + puts "DEFAULT strategy." + opt_design -directive Explore + } + default { + puts "$strategy is NOT a valid strategy." + } +} write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp close_design -####################### # Implementation -####################### #Read in the Shell checkpoint and do the CL implementation puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; @@ -193,71 +249,209 @@ foreach uramSite $uramSites { puts "AWS FPGA: Optimize design during implementation"; -opt_design -directive Explore -check_timing -file $CL_DIR/build/reports/${timestamp}.SH_CL.check_timing_report.txt +switch $strategy { + "BASIC" { + puts "BASIC strategy." + opt_design + } + "EXPLORE" { + puts "EXPLORE strategy." + opt_design -directive Explore + } + "TIMING" { + puts "TIMING strategy." + opt_design -directive Explore + } + "CONGESTION" { + puts "CONGESTION strategy." + opt_design -bufg_opt -control_set_merge -hier_fanout_limit 512 -muxf_remap -propconst -retarget -sweep + } + "DEFAULT" { + puts "DEFAULT strategy." + opt_design -directive Explore + } + default { + puts "$strategy is NOT a valid strategy." + } +} +######################## +# CL Place +######################## puts "AWS FPGA: Place design stage"; -#place_design -verbose -directive Explore -place_design -directive WLDrivenBlockPlacement + +switch $strategy { + "BASIC" { + puts "BASIC strategy." + place_design + } + "EXPLORE" { + puts "EXPLORE strategy." + place_design -directive Explore + } + "TIMING" { + puts "TIMING strategy." + place_design -directive ExtraNetDelay_high + } + "CONGESTION" { + puts "CONGESTION strategy." + place_design -directive AltSpreadLogic_medium + } + "DEFAULT" { + puts "DEFAULT strategy." + place_design -directive WLDrivenBlockPlacement + } + default { + puts "$strategy is NOT a valid strategy." 
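# --- Editorial sketch, not part of the patch: $timestamp and $strategy come from
# the two -tclargs positional arguments, so the script is presumably launched
# along the lines of
#   vivado -mode batch -source create_dcp_from_cl.tcl -tclargs <timestamp> DEFAULT
# (the wrapper doing this is not shown in this diff). A minimal guard, assuming
# both arguments are required, would surface a missing strategy before every
# switch falls through to its "NOT a valid strategy" branch:
if { [llength $argv] < 2 } {
    puts "Error: expected <timestamp> and <strategy> arguments, e.g. -tclargs 17_01_01-0000 DEFAULT";
    exit 2
}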
+ } +} write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp +########################### +# CL Physical Optimization +########################### puts "AWS FPGA: Physical optimization stage"; -phys_opt_design -directive Explore -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.cl.post_place_opt.timing_summary.rpt -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place_opt.dcp +switch $strategy { + "BASIC" { + puts "BASIC strategy." + } + "EXPLORE" { + puts "EXPLORE strategy." + phys_opt_design -directive Explore + } + "TIMING" { + puts "TIMING strategy." + phys_opt_design -directive Explore + } + "CONGESTION" { + puts "CONGESTION strategy." + phys_opt_design -directive Explore + } + "DEFAULT" { + puts "DEFAULT strategy." + phys_opt_design -directive Explore + } + default { + puts "$strategy is NOT a valid strategy." + } +} +######################## +# CL Route +######################## puts "AWS FPGA: Route design stage"; -#route_design -verbose -directive Explore -route_design -directive MoreGlobalIterations +switch $strategy { + "BASIC" { + puts "BASIC strategy." + route_design + } + "EXPLORE" { + puts "EXPLORE strategy." + route_design -directive Explore + } + "TIMING" { + puts "TIMING strategy." + route_design -directive Explore -tns_cleanup + } + "CONGESTION" { + puts "CONGESTION strategy." + route_design -directive Explore + } + "DEFAULT" { + puts "DEFAULT strategy." + route_design -directive MoreGlobalIterations + } + default { + puts "$strategy is NOT a valid strategy." + } +} +################################# +# CL Final Physical Optimization +################################# puts "AWS FPGA: Post-route Physical optimization stage "; -phys_opt_design -directive Explore - -puts "AWS FPGA: Locking design "; - -lock_design -level routing - -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL.post_route_opt.timing_summary.rpt +switch $strategy { + "BASIC" { + puts "BASIC strategy." + } + "EXPLORE" { + puts "EXPLORE strategy." + } + "TIMING" { + puts "TIMING strategy." + phys_opt_design -directive Explore + } + "CONGESTION" { + puts "CONGESTION strategy." + } + "DEFAULT" { + puts "DEFAULT strategy." + phys_opt_design -directive Explore + puts "AWS FPGA: Locking design "; + lock_design -level routing + } + default { + puts "$strategy is NOT a valid strategy." + } +} #This is what will deliver to AWS write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp -puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" - -#Verify PR build -pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -file $CL_DIR/build/checkpoints/to_aws/${timestamp}.pr_verify.log +# ################################################ +# Verify PR Build +# ################################################ +switch $strategy { + "BASIC" { + puts "BASIC strategy." + } + "EXPLORE" { + puts "EXPLORE strategy." + puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" + pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp + } + "TIMING" { + puts "TIMING strategy." 
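# --- Editorial sketch, not part of the patch: the per-strategy pr_verify calls
# below only report; if the intent is to abort the build when the routed CL is
# not compatible with the shell checkpoint, and assuming pr_verify raises a Tcl
# error on incompatibility, the call could be wrapped like this:
if { [catch {pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp} prErr] } {
    puts "AWS FPGA: FATAL ERROR--pr_verify failed: $prErr"
    exit 1
}
# Note that the same pair of checkpoints is verified again, unconditionally, in
# the "Emulate AWS Bitstream Generation" section further down.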
+ puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" + pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp + } + "CONGESTION" { + puts "CONGESTION strategy." + puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" + pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp + } + "DEFAULT" { + puts "DEFAULT strategy." + puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" + pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp + } + default { + puts "$strategy is NOT a valid strategy." + } +} -### -------------------------------------------- -### Emulate what AWS will do -### -------------------------------------------- +# ################################################ +# Emulate AWS Bitstream Generation +# ################################################ +puts "AWS FPGA: Emulate AWS bitstream generation" # Make temp dir for bitstream -file mkdir $CL_DIR/build/aws_verify_temp_dir +file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir # Verify the Developer DCP is compatible with SH_BB. pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp + open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt -report_io -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_report_io.rpt - -# Write out the CL DCP to integrate with SH_BB -write_checkpoint -force -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL_routed.dcp -close_design -# Integreate Developer CL with SH BB -open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL_routed.dcp -report_drc -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_DRC.rpt -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL_final.route.dcp -pr_verify -full_check $CL_DIR/build/checkpoints/${timestamp}.SH_CL_final.route.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp set_param bitstream.enablePR 4123 -write_bitstream -force -bin_file $CL_DIR/build/aws_verify_temp_dir/${timestamp}.SH_CL_final.bit +write_bitstream -force -bin_file $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit # Clean-up temp dir for bitstream -file delete -force $CL_DIR/build/aws_verify_temp_dir +file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir ### -------------------------------------------- @@ -265,10 +459,24 @@ file delete -force $CL_DIR/build/aws_verify_temp_dir puts "Compress files for sending back to AWS" # clean up vivado.log file -exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl +exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl ${timestamp} + +# Create manifest file +set manifest_file [open "$CL_DIR/build/checkpoints/to_aws/${timestamp}.manifest.txt" w] +set hash [lindex [split [exec sha256sum $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp] ] 0] + +puts $manifest_file "MANIFEST_FORMAT_VERSION=1\n" +puts $manifest_file "DCP_HASH=$hash\n" +puts $manifest_file "SHELL_VERSION=$shell_version\n" +puts $manifest_file 
"FILE_NAME=${timestamp}.SH_CL_routed.dcp\n" +puts $manifest_file "HDK_VERSION=$hdk_version\n" +puts $manifest_file "DATE=$timestamp\n" + +close $manifest_file +# Tar checkpoint to aws cd $CL_DIR/build/checkpoints -tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] +tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] close_design diff --git a/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.timing.tcl b/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.timing.tcl deleted file mode 100644 index f2be4b03c..000000000 --- a/hdk/cl/examples/cl_hello_world/build/scripts/create_dcp_from_cl.timing.tcl +++ /dev/null @@ -1,276 +0,0 @@ -## ============================================================================= -## Copyright 2016 Amazon.com, Inc. or its affiliates. -## All Rights Reserved Worldwide. -## Amazon Confidential information -## Restricted NDA Material -## create_cl.tcl: Build to generate CL design checkpoint based on -## developer code -## ============================================================================= - -package require tar - -################################################# -## Generate CL_routed.dcp (Done by User) -################################################# -puts "AWS FPGA Scripts"; -puts "Creating Design Checkpoint from Custom Logic source code"; -puts "Shell Version: VenomCL_unc - 0x11241611"; -puts "Vivado Script Name: $argv0"; - -#checking if CL_DIR env variable exists -if { [info exists ::env(CL_DIR)] } { - set CL_DIR $::env(CL_DIR) - puts "Using CL directory $CL_DIR"; -} else { - puts "Error: CL_DIR environment variable not defined ! "; - puts "Use export CL_DIR=Your_Design_Root_Directory" - exit 2 -} - -#checking if HDK_SHELL_DIR env variable exists -if { [info exists ::env(HDK_SHELL_DIR)] } { - set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) - puts "Using Shell directory $HDK_SHELL_DIR"; -} else { - puts "Error: HDK_SHELL_DIR environment variable not defined ! 
"; - puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; - exit 2 -} - -#Convenience to set the root of the RTL directory -#Timestamp passed in from AWS script so that log and output files match -set timestamp [lindex $argv 0] -puts "All reports and intermediate results will be time stamped with $timestamp"; - -##Identify build strategy for script outputs (file or directory names) - TBD -#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { -# set strategy "basic" -#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { -# set strategy "default" -#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { -# set strategy "explore" -#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { -# set strategy "timing" -#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { -# set strategy "congestion" -#} else { -# set strategy "test" -#} - -set_msg_config -severity INFO -suppress -set_msg_config -severity STATUS -suppress -set_msg_config -severity WARNING -suppress -set_msg_config -id {Chipscope 16-3} -suppress -set_msg_config -string {AXI_QUAD_SPI} -suppress - -puts "AWS FPGA: Calling the encrypt.tcl"; - -source encrypt.tcl - -#This sets the Device Type -source $HDK_SHELL_DIR/build/scripts/device_type.tcl - -create_project -in_memory -part [DEVICE_TYPE] -force - -#set_param chipscope.enablePRFlow true - -############################# -## Read design files -############################# - -#---- User would replace this section ----- - -#Global defines (this is specific to the CL design). This file is encrypted by encrypt.tcl -read_verilog [ list \ - $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh -] -set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] -set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_hello_world_defines.vh ] - -puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; - -#User design files (these are the files that were encrypted by encrypt.tcl) -read_verilog [ list \ -$CL_DIR/build/src_post_encryption/cl_hello_world.sv -] - -#---- End of section replaced by User ---- -puts "AWS FPGA: Reading AWS Shell design"; - -#Read AWS Design files -read_verilog [ list \ -$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ -$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ -$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ -$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ -$HDK_SHELL_DIR/design/lib/sync.v \ -$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ -$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ -$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ -$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ -$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ -$HDK_SHELL_DIR/design/interfaces/cl_ports.vh -] - -puts "AWS FPGA: Reading IP blocks"; -#Read DDR IP -read_ip [ list \ -$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci -] - -puts "AWS FPGA: Reading AWS constraints"; - - -#Read all the constraints -# -# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** -# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** -# cl_ddr.xdc - AWS provided DDR pin constraints. 
***DO NOT MODIFY*** -read_xdc [ list \ - $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ - $CL_DIR/build/constraints/cl_synth_user.xdc -] - -#Do not propagate local clock constraints for clocks generated in the SH -set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] - -update_compile_order -fileset sources_1 -set_property verilog_define XSDB_SLV_DIS [current_fileset] - -######################## -# CL Synthesis -######################## -puts "AWS FPGA: Start design synthesis"; - -synth_design -top cl_hello_world -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -no_lc -shreg_min_size 5 -fsm_extraction one_hot -resource_sharing off - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. -set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -set failval [catch {exec grep "FAIL" failfast.csv}] -if { $failval==0 } { - puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" - exit 1 -} - -######################## -# CL Optimize -######################## -puts "AWS FPGA: Optimizing design"; -opt_design -directive Explore - -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp -close_design - -# Implementation -#Read in the Shell checkpoint and do the CL implementation -puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; - -open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp - -#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) -read_xdc [ list \ -$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ -$CL_DIR/build/constraints/cl_pnr_user.xdc \ -$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc -] - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. 
-set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -puts "AWS FPGA: Optimize design during implementation"; - -opt_design -directive Explore - -######################## -# CL Place -######################## -puts "AWS FPGA: Place design stage"; - -place_design -directive ExtraNetDelay_high -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp - -########################### -# CL Physical Optimization -########################### -puts "AWS FPGA: Physical optimization stage"; - -phys_opt_design -directive Explore - -######################## -# CL Route -######################## -puts "AWS FPGA: Route design stage"; - -route_design -directive Explore -tns_cleanup - -################################# -# CL Final Physical Optimization -################################# -puts "AWS FPGA: Post-route Physical optimization stage "; - -phys_opt_design -directive Explore - -#This is what will deliver to AWS -write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp - -# ################################################ -# Emulate what AWS will do (Bitstream Generation) -# ################################################ -puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" - -# Make temp dir for bitstream -file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -# Verify the Developer DCP is compatible with SH_BB. 
-pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp - -open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp - -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt -set_param bitstream.enablePR 4123 -write_bitstream -force -bin_file -cell CL $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit - -# Clean-up temp dir for bitstream -file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -### -------------------------------------------- - -# Create a zipped tar file, that would be used for createFpgaImage EC2 API -puts "Compress files for sending back to AWS" - -# clean up vivado.log file -exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl - -cd $CL_DIR/build/checkpoints -tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] - -close_design - diff --git a/hdk/cl/examples/cl_hello_world/design/cl_hello_world.sv b/hdk/cl/examples/cl_hello_world/design/cl_hello_world.sv index 035231585..255175f25 100644 --- a/hdk/cl/examples/cl_hello_world/design/cl_hello_world.sv +++ b/hdk/cl/examples/cl_hello_world/design/cl_hello_world.sv @@ -557,8 +557,8 @@ sh_ddr #(.DDR_A_PRESENT(0), `endif assign cl_sh_flr_done = 1'b0; - assign cl_sh_id0[31:0] = 32'h0000_0000; - assign cl_sh_id1[31:0] = 32'h0000_0000; + assign cl_sh_id0[31:0] = 32'h1d50_678A; + assign cl_sh_id1[31:0] = 32'h1d51_fedD; assign cl_sh_status0[31:0] = 32'h0000_0000; assign cl_sh_status1[31:0] = `CL_VERSION; diff --git a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.basic.tcl b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.basic.tcl deleted file mode 100644 index 23f8be34e..000000000 --- a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.basic.tcl +++ /dev/null @@ -1,304 +0,0 @@ -## ============================================================================= -## Copyright 2016 Amazon.com, Inc. or its affiliates. -## All Rights Reserved Worldwide. -## Amazon Confidential information -## Restricted NDA Material -## create_cl.tcl: Build to generate CL design checkpoint based on -## developer code -## ============================================================================= - -package require tar - -################################################# -## Generate CL_routed.dcp (Done by User) -################################################# -puts "AWS FPGA Scripts"; -puts "Creating Design Checkpoint from Custom Logic source code"; -puts "Shell Version: VenomCL_unc - 0x11241611"; -puts "Vivado Script Name: $argv0"; - -#checking if CL_DIR env variable exists -if { [info exists ::env(CL_DIR)] } { - set CL_DIR $::env(CL_DIR) - puts "Using CL directory $CL_DIR"; -} else { - puts "Error: CL_DIR environment variable not defined ! "; - puts "Use export CL_DIR=Your_Design_Root_Directory" - exit 2 -} - -#checking if HDK_SHELL_DIR env variable exists -if { [info exists ::env(HDK_SHELL_DIR)] } { - set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) - puts "Using Shell directory $HDK_SHELL_DIR"; -} else { - puts "Error: HDK_SHELL_DIR environment variable not defined ! 
"; - puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; - exit 2 -} - -#Convenience to set the root of the RTL directory -#Timestamp passed in from AWS script so that log and output files match -set timestamp [lindex $argv 0] -puts "All reports and intermediate results will be time stamped with $timestamp"; - -##Identify build strategy for script outputs (file or directory names) - TBD -#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { -# set strategy "basic" -#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { -# set strategy "default" -#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { -# set strategy "explore" -#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { -# set strategy "timing" -#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { -# set strategy "congestion" -#} else { -# set strategy "test" -#} - -set_msg_config -severity INFO -suppress -set_msg_config -severity STATUS -suppress -set_msg_config -severity WARNING -suppress -set_msg_config -id {Chipscope 16-3} -suppress -set_msg_config -string {AXI_QUAD_SPI} -suppress - -puts "AWS FPGA: Calling the encrypt.tcl"; - -source encrypt.tcl - -#This sets the Device Type -source $HDK_SHELL_DIR/build/scripts/device_type.tcl - -create_project -in_memory -part [DEVICE_TYPE] -force - -#set_param chipscope.enablePRFlow true - -############################# -## Read design files -############################# - -#---- User would replace this section ----- - -#Global defines (this is specific to the CL design). This file is encrypted by encrypt.tcl -read_verilog [ list \ - $CL_DIR/build/src_post_encryption/cl_simple_defines.vh -] -set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] -set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] - -puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; - -#User design files (these are the files that were encrypted by encrypt.tcl) -read_verilog [ list \ -$CL_DIR/build/src_post_encryption/cl_simple.sv \ -$CL_DIR/build/src_post_encryption/cl_tst.sv \ -$CL_DIR/build/src_post_encryption/cl_int_tst.sv \ -$CL_DIR/build/src_post_encryption/mem_scrb.sv \ -$CL_DIR/build/src_post_encryption/cl_tst_scrb.sv -] - -#---- End of section replaced by User ---- -puts "AWS FPGA: Reading AWS Shell design"; - -#Read AWS Design files -read_verilog [ list \ -$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ -$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ -$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ -$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ -$HDK_SHELL_DIR/design/lib/sync.v \ -$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ -$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ -$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ -$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ -$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ -$HDK_SHELL_DIR/design/interfaces/cl_ports.vh -] - -puts "AWS FPGA: Reading IP blocks"; -#Read DDR IP -read_ip [ list \ -$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci -] - -puts "AWS FPGA: Reading AWS constraints"; - - -#Read all the constraints -# -# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** -# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** -# cl_ddr.xdc - AWS provided DDR pin constraints. 
***DO NOT MODIFY*** -read_xdc [ list \ - $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ - $CL_DIR/build/constraints/cl_synth_user.xdc -] - -#Do not propagate local clock constraints for clocks generated in the SH -set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] - -update_compile_order -fileset sources_1 -set_property verilog_define XSDB_SLV_DIS [current_fileset] - -######################## -# CL Synthesis -######################## -puts "AWS FPGA: Start design synthesis"; - -synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. -set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -set failval [catch {exec grep "FAIL" failfast.csv}] -if { $failval==0 } { - puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" - exit 1 -} - -######################## -# CL Optimize -######################## -puts "AWS FPGA: Optimizing design"; -opt_design -directive Explore - -check_timing -file $CL_DIR/build/reports/${timestamp}.cl.synth.check_timing_report.txt -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.cl.synth.timing_summary.rpt -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp -close_design - -# Implementation -#Read in the Shell checkpoint and do the CL implementation -puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; - -open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp - -#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) -read_xdc [ list \ -$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ -$CL_DIR/build/constraints/cl_pnr_user.xdc \ -$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc -] - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. 
-set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -puts "AWS FPGA: Optimize design during implementation"; - -opt_design -directive Explore -check_timing -file $CL_DIR/build/reports/${timestamp}.SH_CL.check_timing_report.txt - -######################## -# CL Place -######################## -puts "AWS FPGA: Place design stage"; - -place_design -directive WLDrivenBlockPlacement -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp - -########################### -# CL Physical Optimization -########################### -puts "AWS FPGA: Physical optimization stage"; - -phys_opt_design -directive Explore -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.cl.post_place_opt.timing_summary.rpt -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place_opt.dcp - -######################## -# CL Route -######################## -puts "AWS FPGA: Route design stage"; - -route_design -directive MoreGlobalIterations - -################################# -# CL Final Physical Optimization -################################# -puts "AWS FPGA: Post-route Physical optimization stage "; - -phys_opt_design -directive Explore - -puts "AWS FPGA: Locking design "; - -lock_design -level routing - -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL.post_route_opt.timing_summary.rpt - -#This is what will deliver to AWS -write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp - -# ################################################ -# Emulate what AWS will do (Bitstream Generation) -# ################################################ -puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" - -#Verify PR build -pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -file $CL_DIR/build/checkpoints/to_aws/${timestamp}.pr_verify.log - -# Make temp dir for bitstream -file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -# Verify the Developer DCP is compatible with SH_BB. 
-pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt -report_io -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_report_io.rpt - -# Write out the CL DCP to integrate with SH_BB -write_checkpoint -force -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL_routed.dcp -close_design - -# Integreate Developer CL with SH BB -open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL_routed.dcp -report_drc -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_DRC.rpt -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL_final.route.dcp -pr_verify -full_check $CL_DIR/build/checkpoints/${timestamp}.SH_CL_final.route.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -set_param bitstream.enablePR 4123 -write_bitstream -force -bin_file $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit - -# Clean-up temp dir for bitstream -file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -### -------------------------------------------- - -# Create a zipped tar file, that would be used for createFpgaImage EC2 API -puts "Compress files for sending back to AWS" - -# clean up vivado.log file -exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl - -cd $CL_DIR/build/checkpoints -tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] - -close_design - diff --git a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.congestion.tcl b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.congestion.tcl deleted file mode 100644 index 8eea12553..000000000 --- a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.congestion.tcl +++ /dev/null @@ -1,278 +0,0 @@ -## ============================================================================= -## Copyright 2016 Amazon.com, Inc. or its affiliates. -## All Rights Reserved Worldwide. -## Amazon Confidential information -## Restricted NDA Material -## create_cl.tcl: Build to generate CL design checkpoint based on -## developer code -## ============================================================================= - -package require tar - -################################################# -## Generate CL_routed.dcp (Done by User) -################################################# -puts "AWS FPGA Scripts"; -puts "Creating Design Checkpoint from Custom Logic source code"; -puts "Shell Version: VenomCL_unc - 0x11241611"; -puts "Vivado Script Name: $argv0"; - -#checking if CL_DIR env variable exists -if { [info exists ::env(CL_DIR)] } { - set CL_DIR $::env(CL_DIR) - puts "Using CL directory $CL_DIR"; -} else { - puts "Error: CL_DIR environment variable not defined ! "; - puts "Use export CL_DIR=Your_Design_Root_Directory" - exit 2 -} - -#checking if HDK_SHELL_DIR env variable exists -if { [info exists ::env(HDK_SHELL_DIR)] } { - set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) - puts "Using Shell directory $HDK_SHELL_DIR"; -} else { - puts "Error: HDK_SHELL_DIR environment variable not defined ! 
"; - puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; - exit 2 -} - -#Convenience to set the root of the RTL directory -#Timestamp passed in from AWS script so that log and output files match -set timestamp [lindex $argv 0] -puts "All reports and intermediate results will be time stamped with $timestamp"; - -##Identify build strategy for script outputs (file or directory names) - TBD -#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { -# set strategy "basic" -#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { -# set strategy "default" -#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { -# set strategy "explore" -#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { -# set strategy "timing" -#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { -# set strategy "congestion" -#} else { -# set strategy "test" -#} - -set_msg_config -severity INFO -suppress -set_msg_config -severity STATUS -suppress -set_msg_config -severity WARNING -suppress -set_msg_config -id {Chipscope 16-3} -suppress -set_msg_config -string {AXI_QUAD_SPI} -suppress - -puts "AWS FPGA: Calling the encrypt.tcl"; - -source encrypt.tcl - -#This sets the Device Type -source $HDK_SHELL_DIR/build/scripts/device_type.tcl - -create_project -in_memory -part [DEVICE_TYPE] -force - -#set_param chipscope.enablePRFlow true - -############################# -## Read design files -############################# - -#---- User would replace this section ----- - -#Global defines (this is specific to the CL design). This file is encrypted by encrypt.tcl -read_verilog [ list \ - $CL_DIR/build/src_post_encryption/cl_simple_defines.vh -] -set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] -set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] - -puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; - -#User design files (these are the files that were encrypted by encrypt.tcl) -read_verilog [ list \ -$CL_DIR/build/src_post_encryption/cl_simple.sv \ -$CL_DIR/build/src_post_encryption/cl_tst.sv \ -$CL_DIR/build/src_post_encryption/cl_int_tst.sv \ -$CL_DIR/build/src_post_encryption/mem_scrb.sv \ -$CL_DIR/build/src_post_encryption/cl_tst_scrb.sv -] - -#---- End of section replaced by User ---- -puts "AWS FPGA: Reading AWS Shell design"; - -#Read AWS Design files -read_verilog [ list \ -$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ -$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ -$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ -$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ -$HDK_SHELL_DIR/design/lib/sync.v \ -$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ -$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ -$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ -$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ -$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ -$HDK_SHELL_DIR/design/interfaces/cl_ports.vh -] - -puts "AWS FPGA: Reading IP blocks"; -#Read DDR IP -read_ip [ list \ -$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci -] - -puts "AWS FPGA: Reading AWS constraints"; - - -#Read all the constraints -# -# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** -# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** -# cl_ddr.xdc - AWS provided DDR pin constraints. 
***DO NOT MODIFY*** -read_xdc [ list \ - $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ - $CL_DIR/build/constraints/cl_synth_user.xdc -] - -#Do not propagate local clock constraints for clocks generated in the SH -set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] - -update_compile_order -fileset sources_1 -set_property verilog_define XSDB_SLV_DIS [current_fileset] - -######################## -# CL Synthesis -######################## -puts "AWS FPGA: Start design synthesis"; - -synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -directive AlternateRoutability -no_lc -shreg_min_size 10 -control_set_opt_threshold 16 - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. -set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -set failval [catch {exec grep "FAIL" failfast.csv}] -if { $failval==0 } { - puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" - exit 1 -} - -######################## -# CL Optimize -######################## -puts "AWS FPGA: Optimizing design"; -opt_design -directive Explore - -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp -close_design - -# Implementation -#Read in the Shell checkpoint and do the CL implementation -puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; - -open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp - -#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) -read_xdc [ list \ -$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ -$CL_DIR/build/constraints/cl_pnr_user.xdc \ -$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc -] - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. 
-set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -puts "AWS FPGA: Optimize design during implementation"; - -opt_design -bufg_opt -control_set_merge -hier_fanout_limit 512 -muxf_remap -propconst -retarget -sweep - -######################## -# CL Place -######################## -puts "AWS FPGA: Place design stage"; - -place_design -directive AltSpreadLogic_medium -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp - -########################### -# CL Physical Optimization -########################### -puts "AWS FPGA: Physical optimization stage"; - -phys_opt_design -directive Explore - -######################## -# CL Route -######################## -puts "AWS FPGA: Route design stage"; - -route_design -directive Explore - -################################# -# CL Final Physical Optimization -################################# -# N/A - -#This is what will deliver to AWS -write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp - -# ################################################ -# Emulate what AWS will do (Bitstream Generation) -# ################################################ -puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" - -# Make temp dir for bitstream -file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -# Verify the Developer DCP is compatible with SH_BB. -pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp - -open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp - -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt -set_param bitstream.enablePR 4123 -write_bitstream -force -bin_file -cell CL $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit - -# Clean-up temp dir for bitstream -file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -### -------------------------------------------- - -# Create a zipped tar file, that would be used for createFpgaImage EC2 API -puts "Compress files for sending back to AWS" - -# clean up vivado.log file -exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl - -cd $CL_DIR/build/checkpoints -tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] - -close_design - diff --git a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.default.tcl b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.default.tcl deleted file mode 100644 index badf40c0b..000000000 --- a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.default.tcl +++ /dev/null @@ -1,278 +0,0 @@ -## ============================================================================= -## Copyright 2016 Amazon.com, Inc. or its affiliates. -## All Rights Reserved Worldwide. 
-## Amazon Confidential information -## Restricted NDA Material -## create_cl.tcl: Build to generate CL design checkpoint based on -## developer code -## ============================================================================= - -package require tar - -################################################# -## Generate CL_routed.dcp (Done by User) -################################################# -puts "AWS FPGA Scripts"; -puts "Creating Design Checkpoint from Custom Logic source code"; -puts "Shell Version: VenomCL_unc - 0x11241611"; -puts "Vivado Script Name: $argv0"; - -#checking if CL_DIR env variable exists -if { [info exists ::env(CL_DIR)] } { - set CL_DIR $::env(CL_DIR) - puts "Using CL directory $CL_DIR"; -} else { - puts "Error: CL_DIR environment variable not defined ! "; - puts "Use export CL_DIR=Your_Design_Root_Directory" - exit 2 -} - -#checking if HDK_SHELL_DIR env variable exists -if { [info exists ::env(HDK_SHELL_DIR)] } { - set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) - puts "Using Shell directory $HDK_SHELL_DIR"; -} else { - puts "Error: HDK_SHELL_DIR environment variable not defined ! "; - puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; - exit 2 -} - -#Convenience to set the root of the RTL directory -#Timestamp passed in from AWS script so that log and output files match -set timestamp [lindex $argv 0] -puts "All reports and intermediate results will be time stamped with $timestamp"; - -##Identify build strategy for script outputs (file or directory names) - TBD -#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { -# set strategy "basic" -#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { -# set strategy "default" -#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { -# set strategy "explore" -#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { -# set strategy "timing" -#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { -# set strategy "congestion" -#} else { -# set strategy "test" -#} - -set_msg_config -severity INFO -suppress -set_msg_config -severity STATUS -suppress -set_msg_config -severity WARNING -suppress -set_msg_config -id {Chipscope 16-3} -suppress -set_msg_config -string {AXI_QUAD_SPI} -suppress - -puts "AWS FPGA: Calling the encrypt.tcl"; - -source encrypt.tcl - -#This sets the Device Type -source $HDK_SHELL_DIR/build/scripts/device_type.tcl - -create_project -in_memory -part [DEVICE_TYPE] -force - -#set_param chipscope.enablePRFlow true - -############################# -## Read design files -############################# - -#---- User would replace this section ----- - -#Global defines (this is specific to the CL design). 
This file is encrypted by encrypt.tcl -read_verilog [ list \ - $CL_DIR/build/src_post_encryption/cl_simple_defines.vh -] -set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] -set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] - -puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; - -#User design files (these are the files that were encrypted by encrypt.tcl) -read_verilog [ list \ -$CL_DIR/build/src_post_encryption/cl_simple.sv \ -$CL_DIR/build/src_post_encryption/cl_tst.sv \ -$CL_DIR/build/src_post_encryption/cl_int_tst.sv \ -$CL_DIR/build/src_post_encryption/mem_scrb.sv \ -$CL_DIR/build/src_post_encryption/cl_tst_scrb.sv -] - -#---- End of section replaced by User ---- -puts "AWS FPGA: Reading AWS Shell design"; - -#Read AWS Design files -read_verilog [ list \ -$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ -$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ -$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ -$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ -$HDK_SHELL_DIR/design/lib/sync.v \ -$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ -$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ -$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ -$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ -$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ -$HDK_SHELL_DIR/design/interfaces/cl_ports.vh -] - -puts "AWS FPGA: Reading IP blocks"; -#Read DDR IP -read_ip [ list \ -$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci -] - -puts "AWS FPGA: Reading AWS constraints"; - - -#Read all the constraints -# -# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** -# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** -# cl_ddr.xdc - AWS provided DDR pin constraints. ***DO NOT MODIFY*** -read_xdc [ list \ - $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ - $CL_DIR/build/constraints/cl_synth_user.xdc -] - -#Do not propagate local clock constraints for clocks generated in the SH -set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] - -update_compile_order -fileset sources_1 -set_property verilog_define XSDB_SLV_DIS [current_fileset] - -######################## -# CL Synthesis -######################## -puts "AWS FPGA: Start design synthesis"; - -synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. 
-set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -set failval [catch {exec grep "FAIL" failfast.csv}] -if { $failval==0 } { - puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" - exit 1 -} - -######################## -# CL Optimize -######################## -puts "AWS FPGA: Optimizing design"; -opt_design -directive Explore - -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp -close_design - -# Implementation -#Read in the Shell checkpoint and do the CL implementation -puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; - -open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp - -#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) -read_xdc [ list \ -$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ -$CL_DIR/build/constraints/cl_pnr_user.xdc \ -$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc -] - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. -set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -puts "AWS FPGA: Optimize design during implementation"; - -opt_design - -######################## -# CL Place -######################## -puts "AWS FPGA: Place design stage"; - -place_design -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp - -########################### -# CL Physical Optimization -########################### -puts "AWS FPGA: Physical optimization stage"; - -phys_opt_design - -######################## -# CL Route -######################## -puts "AWS FPGA: Route design stage"; - -route_design - -################################# -# CL Final Physical Optimization -################################# -# N/A - -#This is what will deliver to AWS -write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp - -# ################################################ -# Emulate what AWS will do (Bitstream Generation) -# ################################################ -puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" - -# Make temp dir for bitstream -file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -# Verify the Developer DCP is compatible with SH_BB. 
-pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp - -open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp - -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt -set_param bitstream.enablePR 4123 -write_bitstream -force -bin_file -cell CL $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit - -# Clean-up temp dir for bitstream -file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -### -------------------------------------------- - -# Create a zipped tar file, that would be used for createFpgaImage EC2 API -puts "Compress files for sending back to AWS" - -# clean up vivado.log file -exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl - -cd $CL_DIR/build/checkpoints -tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] - -close_design - diff --git a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.explore.tcl b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.explore.tcl deleted file mode 100644 index 76822788e..000000000 --- a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.explore.tcl +++ /dev/null @@ -1,278 +0,0 @@ -## ============================================================================= -## Copyright 2016 Amazon.com, Inc. or its affiliates. -## All Rights Reserved Worldwide. -## Amazon Confidential information -## Restricted NDA Material -## create_cl.tcl: Build to generate CL design checkpoint based on -## developer code -## ============================================================================= - -package require tar - -################################################# -## Generate CL_routed.dcp (Done by User) -################################################# -puts "AWS FPGA Scripts"; -puts "Creating Design Checkpoint from Custom Logic source code"; -puts "Shell Version: VenomCL_unc - 0x11241611"; -puts "Vivado Script Name: $argv0"; - -#checking if CL_DIR env variable exists -if { [info exists ::env(CL_DIR)] } { - set CL_DIR $::env(CL_DIR) - puts "Using CL directory $CL_DIR"; -} else { - puts "Error: CL_DIR environment variable not defined ! "; - puts "Use export CL_DIR=Your_Design_Root_Directory" - exit 2 -} - -#checking if HDK_SHELL_DIR env variable exists -if { [info exists ::env(HDK_SHELL_DIR)] } { - set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) - puts "Using Shell directory $HDK_SHELL_DIR"; -} else { - puts "Error: HDK_SHELL_DIR environment variable not defined ! 
"; - puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; - exit 2 -} - -#Convenience to set the root of the RTL directory -#Timestamp passed in from AWS script so that log and output files match -set timestamp [lindex $argv 0] -puts "All reports and intermediate results will be time stamped with $timestamp"; - -##Identify build strategy for script outputs (file or directory names) - TBD -#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { -# set strategy "basic" -#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { -# set strategy "default" -#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { -# set strategy "explore" -#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { -# set strategy "timing" -#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { -# set strategy "congestion" -#} else { -# set strategy "test" -#} - -set_msg_config -severity INFO -suppress -set_msg_config -severity STATUS -suppress -set_msg_config -severity WARNING -suppress -set_msg_config -id {Chipscope 16-3} -suppress -set_msg_config -string {AXI_QUAD_SPI} -suppress - -puts "AWS FPGA: Calling the encrypt.tcl"; - -source encrypt.tcl - -#This sets the Device Type -source $HDK_SHELL_DIR/build/scripts/device_type.tcl - -create_project -in_memory -part [DEVICE_TYPE] -force - -#set_param chipscope.enablePRFlow true - -############################# -## Read design files -############################# - -#---- User would replace this section ----- - -#Global defines (this is specific to the CL design). This file is encrypted by encrypt.tcl -read_verilog [ list \ - $CL_DIR/build/src_post_encryption/cl_simple_defines.vh -] -set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] -set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] - -puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; - -#User design files (these are the files that were encrypted by encrypt.tcl) -read_verilog [ list \ -$CL_DIR/build/src_post_encryption/cl_simple.sv \ -$CL_DIR/build/src_post_encryption/cl_tst.sv \ -$CL_DIR/build/src_post_encryption/cl_int_tst.sv \ -$CL_DIR/build/src_post_encryption/mem_scrb.sv \ -$CL_DIR/build/src_post_encryption/cl_tst_scrb.sv -] - -#---- End of section replaced by User ---- -puts "AWS FPGA: Reading AWS Shell design"; - -#Read AWS Design files -read_verilog [ list \ -$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ -$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ -$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ -$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ -$HDK_SHELL_DIR/design/lib/sync.v \ -$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ -$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ -$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ -$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ -$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ -$HDK_SHELL_DIR/design/interfaces/cl_ports.vh -] - -puts "AWS FPGA: Reading IP blocks"; -#Read DDR IP -read_ip [ list \ -$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci -] - -puts "AWS FPGA: Reading AWS constraints"; - - -#Read all the constraints -# -# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** -# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** -# cl_ddr.xdc - AWS provided DDR pin constraints. 
***DO NOT MODIFY*** -read_xdc [ list \ - $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ - $CL_DIR/build/constraints/cl_synth_user.xdc -] - -#Do not propagate local clock constraints for clocks generated in the SH -set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] - -update_compile_order -fileset sources_1 -set_property verilog_define XSDB_SLV_DIS [current_fileset] - -######################## -# CL Synthesis -######################## -puts "AWS FPGA: Start design synthesis"; - -synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. -set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -set failval [catch {exec grep "FAIL" failfast.csv}] -if { $failval==0 } { - puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" - exit 1 -} - -######################## -# CL Optimize -######################## -puts "AWS FPGA: Optimizing design"; -opt_design -directive Explore - -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp -close_design - -# Implementation -#Read in the Shell checkpoint and do the CL implementation -puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; - -open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp - -#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) -read_xdc [ list \ -$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ -$CL_DIR/build/constraints/cl_pnr_user.xdc \ -$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc -] - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. 
-set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -puts "AWS FPGA: Optimize design during implementation"; - -opt_design -directive Explore - -######################## -# CL Place -######################## -puts "AWS FPGA: Place design stage"; - -place_design -directive Explore -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp - -########################### -# CL Physical Optimization -########################### -puts "AWS FPGA: Physical optimization stage"; - -phys_opt_design -directive Explore - -######################## -# CL Route -######################## -puts "AWS FPGA: Route design stage"; - -route_design -directive Explore - -################################# -# CL Final Physical Optimization -################################# -# N/A - -#This is what will deliver to AWS -write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp - -# ################################################ -# Emulate what AWS will do (Bitstream Generation) -# ################################################ -puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" - -# Make temp dir for bitstream -file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -# Verify the Developer DCP is compatible with SH_BB. -pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp - -open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp - -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt -set_param bitstream.enablePR 4123 -write_bitstream -force -bin_file -cell CL $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit - -# Clean-up temp dir for bitstream -file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -### -------------------------------------------- - -# Create a zipped tar file, that would be used for createFpgaImage EC2 API -puts "Compress files for sending back to AWS" - -# clean up vivado.log file -exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl - -cd $CL_DIR/build/checkpoints -tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] - -close_design - diff --git a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.tcl b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.tcl index 9d6ce6c48..7db9eb50c 100644 --- a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.tcl +++ b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.tcl @@ -9,13 +9,20 @@ package require tar +################################################# +## Versions +################################################# +set shell_version "0x11241611" +set hdk_version "1.0.0" + ################################################# ## Generate CL_routed.dcp (Done by User) ################################################# puts "AWS FPGA Scripts"; puts "Creating Design Checkpoint from Custom Logic source code"; -puts "Shell Version: VenomCL_unc - 0x11241611"; +puts "Shell Version: VenomCL_unc - $shell_version"; puts "Vivado Script Name: 
$argv0"; +puts "HDK Version: $hdk_version"; #checking if CL_DIR env variable exists if { [info exists ::env(CL_DIR)] } { @@ -138,8 +145,8 @@ set_property verilog_define XSDB_SLV_DIS [current_fileset] puts "AWS FPGA: Start design synthesis"; switch $strategy { - "DEFAULT" { - puts "DEFAULT strategy." + "BASIC" { + puts "BASIC strategy." synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt } "EXPLORE" { @@ -154,12 +161,8 @@ switch $strategy { puts "CONGESTION strategy." synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -directive AlternateRoutability -no_lc -shreg_min_size 10 -control_set_opt_threshold 16 } - "OLD" { - puts "OLD strategy." - synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt - } - "APP1" { - puts "APP1 strategy." + "DEFAULT" { + puts "DEFAULT strategy." synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -keep_equivalent_registers -flatten_hierarchy rebuilt } default { @@ -193,8 +196,9 @@ if { $failval==0 } { puts "AWS FPGA: Optimizing design"; switch $strategy { - "DEFAULT" { - puts "DEFAULT strategy." + "BASIC" { + puts "BASIC strategy." + opt_design } "EXPLORE" { puts "EXPLORE strategy." @@ -208,14 +212,8 @@ switch $strategy { puts "CONGESTION strategy." opt_design -directive Explore } - "OLD" { - puts "OLD strategy." - opt_design -directive Explore - check_timing -file $CL_DIR/build/reports/${timestamp}.cl.synth.check_timing_report.txt - report_timing_summary -file $CL_DIR/build/reports/${timestamp}.cl.synth.timing_summary.rpt - } - "APP1" { - puts "APP1 strategy." + "DEFAULT" { + puts "DEFAULT strategy." opt_design -directive Explore } default { @@ -256,8 +254,8 @@ foreach uramSite $uramSites { puts "AWS FPGA: Optimize design during implementation"; switch $strategy { - "DEFAULT" { - puts "DEFAULT strategy." + "BASIC" { + puts "BASIC strategy." opt_design } "EXPLORE" { @@ -272,14 +270,9 @@ switch $strategy { puts "CONGESTION strategy." opt_design -bufg_opt -control_set_merge -hier_fanout_limit 512 -muxf_remap -propconst -retarget -sweep } - "OLD" { - puts "OLD strategy." + "DEFAULT" { + puts "DEFAULT strategy." opt_design -directive Explore - check_timing -file $CL_DIR/build/reports/${timestamp}.SH_CL.check_timing_report.txt - } - "APP1" { - puts "APP1 strategy." - opt_design } default { puts "$strategy is NOT a valid strategy." @@ -292,8 +285,8 @@ switch $strategy { puts "AWS FPGA: Place design stage"; switch $strategy { - "DEFAULT" { - puts "DEFAULT strategy." + "BASIC" { + puts "BASIC strategy." place_design } "EXPLORE" { @@ -308,14 +301,10 @@ switch $strategy { puts "CONGESTION strategy." place_design -directive AltSpreadLogic_medium } - "OLD" { - puts "OLD strategy." + "DEFAULT" { + puts "DEFAULT strategy." place_design -directive WLDrivenBlockPlacement } - "APP1" { - puts "APP1 strategy." - place_design - } default { puts "$strategy is NOT a valid strategy." } @@ -328,8 +317,8 @@ write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place. puts "AWS FPGA: Physical optimization stage"; switch $strategy { - "DEFAULT" { - puts "DEFAULT strategy." + "BASIC" { + puts "BASIC strategy." } "EXPLORE" { puts "EXPLORE strategy." @@ -343,14 +332,9 @@ switch $strategy { puts "CONGESTION strategy." phys_opt_design -directive Explore } - "OLD" { - puts "OLD strategy." 
+ "DEFAULT" { + puts "DEFAULT strategy." phys_opt_design -directive Explore - report_timing_summary -file $CL_DIR/build/reports/${timestamp}.cl.post_place_opt.timing_summary.rpt - write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place_opt.dcp - } - "APP1" { - puts "APP1 strategy." } default { puts "$strategy is NOT a valid strategy." @@ -363,8 +347,8 @@ switch $strategy { puts "AWS FPGA: Route design stage"; switch $strategy { - "DEFAULT" { - puts "DEFAULT strategy." + "BASIC" { + puts "BASIC strategy." route_design } "EXPLORE" { @@ -379,14 +363,10 @@ switch $strategy { puts "CONGESTION strategy." route_design -directive Explore } - "OLD" { - puts "OLD strategy." + "DEFAULT" { + puts "DEFAULT strategy." route_design -directive MoreGlobalIterations } - "APP1" { - puts "APP1 strategy." - route_design - } default { puts "$strategy is NOT a valid strategy." } @@ -398,8 +378,8 @@ switch $strategy { puts "AWS FPGA: Post-route Physical optimization stage "; switch $strategy { - "DEFAULT" { - puts "DEFAULT strategy." + "BASIC" { + puts "BASIC strategy." } "EXPLORE" { puts "EXPLORE strategy." @@ -411,15 +391,11 @@ switch $strategy { "CONGESTION" { puts "CONGESTION strategy." } - "OLD" { - puts "OLD strategy." + "DEFAULT" { + puts "DEFAULT strategy." phys_opt_design -directive Explore puts "AWS FPGA: Locking design "; lock_design -level routing - report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL.post_route_opt.timing_summary.rpt - } - "APP1" { - puts "APP1 strategy." } default { puts "$strategy is NOT a valid strategy." @@ -433,31 +409,28 @@ write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_rout # Verify PR Build # ################################################ switch $strategy { - "DEFAULT" { - puts "DEFAULT strategy." + "BASIC" { + puts "BASIC strategy." } "EXPLORE" { puts "EXPLORE strategy." puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" - pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -file $CL_DIR/build/checkpoints/to_aws/${timestamp}.pr_verify.log + pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp } "TIMING" { puts "TIMING strategy." puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" - pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -file $CL_DIR/build/checkpoints/to_aws/${timestamp}.pr_verify.log + pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp } "CONGESTION" { puts "CONGESTION strategy." puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" - pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -file $CL_DIR/build/checkpoints/to_aws/${timestamp}.pr_verify.log + pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp } - "OLD" { - puts "OLD strategy." + "DEFAULT" { + puts "DEFAULT strategy." 
puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" - pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -file $CL_DIR/build/checkpoints/to_aws/${timestamp}.pr_verify.log - } - "APP1" { - puts "APP1 strategy." + pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp } default { puts "$strategy is NOT a valid strategy." @@ -478,39 +451,6 @@ pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt -switch $strategy { - "DEFAULT" { - puts "DEFAULT strategy." - } - "EXPLORE" { - puts "EXPLORE strategy." - } - "TIMING" { - puts "TIMING strategy." - } - "CONGESTION" { - puts "CONGESTION strategy." - } - "OLD" { - puts "OLD strategy." - report_io -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_report_io.rpt - # Write out the CL DCP to integrate with SH_BB - write_checkpoint -force -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL_routed.dcp - close_design - open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp - read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL_routed.dcp - report_drc -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_DRC.rpt - write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL_final.route.dcp - pr_verify -full_check $CL_DIR/build/checkpoints/${timestamp}.SH_CL_final.route.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp - } - "APP1" { - puts "APP1 strategy." - } - default { - puts "$strategy is NOT a valid strategy." - } -} - set_param bitstream.enablePR 4123 write_bitstream -force -bin_file $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit @@ -525,8 +465,22 @@ puts "Compress files for sending back to AWS" # clean up vivado.log file exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl ${timestamp} +# Create manifest file +set manifest_file [open "$CL_DIR/build/checkpoints/to_aws/${timestamp}.manifest.txt" w] +set hash [lindex [split [exec sha256sum $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp] ] 0] + +puts $manifest_file "MANIFEST_FORMAT_VERSION=1\n" +puts $manifest_file "DCP_HASH=$hash\n" +puts $manifest_file "SHELL_VERSION=$shell_version\n" +puts $manifest_file "FILE_NAME=${timestamp}.SH_CL_routed.dcp\n" +puts $manifest_file "HDK_VERSION=$hdk_version\n" +puts $manifest_file "DATE=$timestamp\n" + +close $manifest_file + +# Tar checkpoint to aws cd $CL_DIR/build/checkpoints -tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] +tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] close_design diff --git a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.timing.tcl b/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.timing.tcl deleted file mode 100644 index bcc04fc6c..000000000 --- a/hdk/cl/examples/cl_simple/build/scripts/create_dcp_from_cl.timing.tcl +++ /dev/null @@ -1,280 +0,0 @@ -## ============================================================================= -## Copyright 2016 Amazon.com, Inc. or its affiliates. -## All Rights Reserved Worldwide. 
-## Amazon Confidential information -## Restricted NDA Material -## create_cl.tcl: Build to generate CL design checkpoint based on -## developer code -## ============================================================================= - -package require tar - -################################################# -## Generate CL_routed.dcp (Done by User) -################################################# -puts "AWS FPGA Scripts"; -puts "Creating Design Checkpoint from Custom Logic source code"; -puts "Shell Version: VenomCL_unc - 0x11241611"; -puts "Vivado Script Name: $argv0"; - -#checking if CL_DIR env variable exists -if { [info exists ::env(CL_DIR)] } { - set CL_DIR $::env(CL_DIR) - puts "Using CL directory $CL_DIR"; -} else { - puts "Error: CL_DIR environment variable not defined ! "; - puts "Use export CL_DIR=Your_Design_Root_Directory" - exit 2 -} - -#checking if HDK_SHELL_DIR env variable exists -if { [info exists ::env(HDK_SHELL_DIR)] } { - set HDK_SHELL_DIR $::env(HDK_SHELL_DIR) - puts "Using Shell directory $HDK_SHELL_DIR"; -} else { - puts "Error: HDK_SHELL_DIR environment variable not defined ! "; - puts "Run the hdk_setup.sh script from the root directory of aws-fpga"; - exit 2 -} - -#Convenience to set the root of the RTL directory -#Timestamp passed in from AWS script so that log and output files match -set timestamp [lindex $argv 0] -puts "All reports and intermediate results will be time stamped with $timestamp"; - -##Identify build strategy for script outputs (file or directory names) - TBD -#if {[string equal $argv0 create_dcp_from_cl.basic.tcl]} { -# set strategy "basic" -#} elseif {[string equal $argv0 create_dcp_from_cl.default.tcl]} { -# set strategy "default" -#} elseif {[string equal $argv0 create_dcp_from_cl.explore.tcl]} { -# set strategy "explore" -#} elseif {[string equal $argv0 create_dcp_from_cl.timing.tcl]} { -# set strategy "timing" -#} elseif {[string equal $argv0 create_dcp_from_cl.congestion.tcl]} { -# set strategy "congestion" -#} else { -# set strategy "test" -#} - -set_msg_config -severity INFO -suppress -set_msg_config -severity STATUS -suppress -set_msg_config -severity WARNING -suppress -set_msg_config -id {Chipscope 16-3} -suppress -set_msg_config -string {AXI_QUAD_SPI} -suppress - -puts "AWS FPGA: Calling the encrypt.tcl"; - -source encrypt.tcl - -#This sets the Device Type -source $HDK_SHELL_DIR/build/scripts/device_type.tcl - -create_project -in_memory -part [DEVICE_TYPE] -force - -#set_param chipscope.enablePRFlow true - -############################# -## Read design files -############################# - -#---- User would replace this section ----- - -#Global defines (this is specific to the CL design). 
This file is encrypted by encrypt.tcl -read_verilog [ list \ - $CL_DIR/build/src_post_encryption/cl_simple_defines.vh -] -set_property file_type {Verilog Header} [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] -set_property is_global_include true [get_files $CL_DIR/build/src_post_encryption/cl_simple_defines.vh ] - -puts "AWS FPGA: Reading developer's Custom Logic files post encryption"; - -#User design files (these are the files that were encrypted by encrypt.tcl) -read_verilog [ list \ -$CL_DIR/build/src_post_encryption/cl_simple.sv \ -$CL_DIR/build/src_post_encryption/cl_tst.sv \ -$CL_DIR/build/src_post_encryption/cl_int_tst.sv \ -$CL_DIR/build/src_post_encryption/mem_scrb.sv \ -$CL_DIR/build/src_post_encryption/cl_tst_scrb.sv -] - -#---- End of section replaced by User ---- -puts "AWS FPGA: Reading AWS Shell design"; - -#Read AWS Design files -read_verilog [ list \ -$HDK_SHELL_DIR/design/lib/flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/flop_fifo_in.sv \ -$HDK_SHELL_DIR/design/lib/bram_2rw.sv \ -$HDK_SHELL_DIR/design/lib/flop_ccf.sv \ -$HDK_SHELL_DIR/design/lib/ccf_ctl.v \ -$HDK_SHELL_DIR/design/lib/sync.v \ -$HDK_SHELL_DIR/design/lib/axi4_ccf.sv \ -$HDK_SHELL_DIR/design/lib/axi4_flop_fifo.sv \ -$HDK_SHELL_DIR/design/lib/lib_pipe.sv \ -$HDK_SHELL_DIR/design/lib/mgt_acc_axl.sv \ -$HDK_SHELL_DIR/design/lib/mgt_gen_axl.sv \ -$HDK_SHELL_DIR/design/interfaces/sh_ddr.sv \ -$HDK_SHELL_DIR/design/interfaces/cl_ports.vh -] - -puts "AWS FPGA: Reading IP blocks"; -#Read DDR IP -read_ip [ list \ -$HDK_SHELL_DIR/design/ip/ddr4_core/ddr4_core.xci -] - -puts "AWS FPGA: Reading AWS constraints"; - - -#Read all the constraints -# -# cl_synth_aws.xdc - AWS provided constraints. ***DO NOT MODIFY*** -# cl_clocks_aws.xdc - AWS provided clock constraint. ***DO NOT MODIFY*** -# cl_ddr.xdc - AWS provided DDR pin constraints. ***DO NOT MODIFY*** -read_xdc [ list \ - $HDK_SHELL_DIR/build/constraints/cl_synth_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_clocks_aws.xdc \ - $HDK_SHELL_DIR/build/constraints/cl_ddr.xdc \ - $CL_DIR/build/constraints/cl_synth_user.xdc -] - -#Do not propagate local clock constraints for clocks generated in the SH -set_property USED_IN {synthesis OUT_OF_CONTEXT} [get_files cl_clocks_aws.xdc] - -update_compile_order -fileset sources_1 -set_property verilog_define XSDB_SLV_DIS [current_fileset] - -######################## -# CL Synthesis -######################## -puts "AWS FPGA: Start design synthesis"; - -synth_design -top cl_simple -verilog_define XSDB_SLV_DIS -part [DEVICE_TYPE] -mode out_of_context -no_lc -shreg_min_size 5 -fsm_extraction one_hot -resource_sharing off - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. 
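# ---------------------------------------------------------------------------
# Editor's note (illustrative sketch, not part of the original script): the
# failfast.csv check a few lines below greps for the string FAIL and aborts
# the build on a match. A variant of the same check that also flags a
# missing report file might look like:
if { ![file exists failfast.csv] } {
    puts "AWS FPGA: WARNING--failfast.csv not found; utilization check skipped"
} elseif { ![catch {exec grep "FAIL" failfast.csv}] } {
    puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details"
    exit 1
}
# ---------------------------------------------------------------------------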
-set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -set failval [catch {exec grep "FAIL" failfast.csv}] -if { $failval==0 } { - puts "AWS FPGA: FATAL ERROR--Resource utilization error; check failfast.csv for details" - exit 1 -} - -######################## -# CL Optimize -######################## -puts "AWS FPGA: Optimizing design"; -opt_design -directive Explore - -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp -close_design - -# Implementation -#Read in the Shell checkpoint and do the CL implementation -puts "AWS FPGA: Implementation step -Combining Shell and CL design checkpoints"; - -open_checkpoint $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp -read_checkpoint -strict -cell CL $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth_opt.dcp - -#Read the constraints, note *DO NOT* read cl_clocks_aws (clocks originating from AWS shell) -read_xdc [ list \ -$HDK_SHELL_DIR/build/constraints/cl_pnr_aws.xdc \ -$CL_DIR/build/constraints/cl_pnr_user.xdc \ -$HDK_SHELL_DIR/build/constraints/cl_ddr.xdc -] - -# Prohibit the top two URAM sites of each URAM quad. -# These two sites cannot be used within PR designs. -set uramSites [get_sites -filter { SITE_TYPE == "URAM288" } ] -foreach uramSite $uramSites { - # Get the URAM location within a quad - set quadLoc [expr [string range $uramSite [expr [string first Y $uramSite] + 1] end] % 4] - # The top-two sites have usage restrictions - if {$quadLoc == 2 || $quadLoc == 3} { - # Prohibit the appropriate site - set_property PROHIBIT true $uramSite - puts "Setting Placement Prohibit on $uramSite" - } -} - -puts "AWS FPGA: Optimize design during implementation"; - -opt_design -directive Explore - -######################## -# CL Place -######################## -puts "AWS FPGA: Place design stage"; - -place_design -directive ExtraNetDelay_high -write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.SH_CL.post_place.dcp - -########################### -# CL Physical Optimization -########################### -puts "AWS FPGA: Physical optimization stage"; - -phys_opt_design -directive Explore - -######################## -# CL Route -######################## -puts "AWS FPGA: Route design stage"; - -route_design -directive Explore -tns_cleanup - -################################# -# CL Final Physical Optimization -################################# -puts "AWS FPGA: Post-route Physical optimization stage "; - -phys_opt_design -directive Explore - -#This is what will deliver to AWS -write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp - -# ################################################ -# Emulate what AWS will do (Bitstream Generation) -# ################################################ -puts "AWS FPGA: Verify compatibility of generated checkpoint with SH checkpoint" - -# Make temp dir for bitstream -file mkdir $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -# Verify the Developer DCP is compatible with SH_BB. 
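# ---------------------------------------------------------------------------
# Editor's note (illustrative sketch, not part of the original script):
# report_timing_summary, used a few lines below, only writes a report; it
# does not stop the flow when timing is not met. A small example of aborting
# the build on negative setup slack while the routed design is still open:
set worstPath [get_timing_paths -max_paths 1 -nworst 1 -setup]
if { [llength $worstPath] > 0 && [get_property SLACK $worstPath] < 0 } {
    puts "AWS FPGA: FATAL ERROR--setup timing not met, WNS = [get_property SLACK $worstPath]"
    exit 1
}
# ---------------------------------------------------------------------------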
-pr_verify -full_check $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp $HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp - -open_checkpoint $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp - -report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timing_summary.rpt -set_param bitstream.enablePR 4123 -write_bitstream -force -bin_file -cell CL $CL_DIR/build/${timestamp}_aws_verify_temp_dir/${timestamp}.SH_CL_final.bit - -# Clean-up temp dir for bitstream -file delete -force $CL_DIR/build/${timestamp}_aws_verify_temp_dir - -### -------------------------------------------- - -# Create a zipped tar file, that would be used for createFpgaImage EC2 API -puts "Compress files for sending back to AWS" - -# clean up vivado.log file -exec perl $HDK_SHELL_DIR/build/scripts/clean_log.pl - -cd $CL_DIR/build/checkpoints -tar::create to_aws/${timestamp}.Developer_CL.tar [glob to_aws/${timestamp}*] - -close_design - diff --git a/hdk/common/shell_current/build/scripts/aws_build_dcp_from_cl.sh b/hdk/common/shell_current/build/scripts/aws_build_dcp_from_cl.sh index 3ec994b51..2c37cb1ca 100755 --- a/hdk/common/shell_current/build/scripts/aws_build_dcp_from_cl.sh +++ b/hdk/common/shell_current/build/scripts/aws_build_dcp_from_cl.sh @@ -18,7 +18,7 @@ # Usage help function usage { - echo "usage: aws_build_dcp_from_cl.sh [ [-script ] | [-strategy DEFAULT | EXPLORE | TIMING | CONGESTION] | [-h] | [-H] | [-help] | ]" + echo "usage: aws_build_dcp_from_cl.sh [ [-script ] | [-strategy BASIC | DEFAULT | EXPLORE | TIMING | CONGESTION] | [-h] | [-H] | [-help] | ]" } # Default arguments for script and strategy @@ -51,8 +51,8 @@ fi # Check that strategy is valid shopt -s extglob -if [[ $strategy != @(DEFAULT|EXPLORE|TIMING|CONGESTION|OLD|APP1) ]]; then - echo "ERROR: $strategy isn't a valid strategy. Valid strategies are DEFAULT, EXPLORE, TIMING and CONGESTION." +if [[ $strategy != @(BASIC|DEFAULT|EXPLORE|TIMING|CONGESTION) ]]; then + echo "ERROR: $strategy isn't a valid strategy. Valid strategies are BASIC, DEFAULT, EXPLORE, TIMING and CONGESTION." exit 1 fi From 3ca2ffb6759156e658093f637cb54ba5171ac257 Mon Sep 17 00:00:00 2001 From: Carlos Cabral Date: Thu, 26 Jan 2017 09:59:42 -0800 Subject: [PATCH 25/29] Updated build README with build strategies. Change-Id: Iae042a8ceb7141f630d6c5fa28c18ee589248f46 --- hdk/cl/examples/cl_simple/build/README.md | 42 ++++++++++++++++++++--- 1 file changed, 37 insertions(+), 5 deletions(-) diff --git a/hdk/cl/examples/cl_simple/build/README.md b/hdk/cl/examples/cl_simple/build/README.md index b6f2de337..65981ee1d 100644 --- a/hdk/cl/examples/cl_simple/build/README.md +++ b/hdk/cl/examples/cl_simple/build/README.md @@ -42,14 +42,46 @@ Modify the `$CL_DIR/build/scripts/create_dcp_from_cl.tcl` script to include: ### 4) Build -Run the build from the `$CL_DIR/build/scripts` directory as follows: +Run the build script, aws_build_dcp_from_cl.sh, from the `$CL_DIR/build/scripts` directory. - $ ./aws_build_dcp_from_cl.sh - -This performs: +This build script performs: - Synthesis of CL. - Implementation of CL with AWS Shell. - - Generates design checkpoint for AWS ingestion and associated logs. + - Generates design checkpoint (DCP) for AWS ingestion and associated logs. + +In order to help developers close timing goals and successfully build their designs efficiently, the build script provides the means to synthesize with different strategies. The different strategies alter the directives used by the synthesis tool. 
For example, some directives might specify additional optimizations to close timing, while others may specify less effort to minimize synthesis time for designs that can more easily close timing and area goals. Since every design is different, some strategies may provide better results than anothers. If a developer has trouble successfully building their design with one strategy it is encouraged that they try a different strategy. The strategies are described in more detail below. + +Build script usage: + + $ aws_build_dcp_from_cl.sh [-h | -H | -help] [-script ] [-strategy ] + +Options: + + -script + Use the specified vivado script. The default script is create_dcp_from_cl.tcl. + + -h, -H, -help + Print a usage message. + + -strategy + Use the specified strategy to alter the directives used during synthesis. The DEFAULT strategy will be used if a strategy is not specified. + +Strategy descriptions: + + BASIC + This is the basic flow in Vivado and contains the mandatory steps to be able to build a design. It is designed to provide a good balance betwwen runtime and QOR. + + EXPLORE + This is a high-effort flow which is designed to give improved QOR results at the expense of runtime. + + TIMING + This flow is designed for more aggressive timing optimization at the expense of runtime and congestion. + + CONGESTION + This flow is designed to insert more aggressive whitespace to alleviate routing congestion. + + DEFAULT + This is an additional high-effort flow that results in improved QOR results for the example design at the expense of runtime. To aid developers in build verification, there is a final step in the build script that emulates the process that AWS uses to generate bitstreams from a developer DCP. From 802b08c283da4615c22f161c24069a8a87520b42 Mon Sep 17 00:00:00 2001 From: AWScccabra Date: Thu, 26 Jan 2017 12:11:19 -0600 Subject: [PATCH 26/29] Update README.md --- hdk/cl/examples/cl_simple/build/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hdk/cl/examples/cl_simple/build/README.md b/hdk/cl/examples/cl_simple/build/README.md index 65981ee1d..ce65da804 100644 --- a/hdk/cl/examples/cl_simple/build/README.md +++ b/hdk/cl/examples/cl_simple/build/README.md @@ -57,13 +57,13 @@ Build script usage: Options: - -script + -script / Use the specified vivado script. The default script is create_dcp_from_cl.tcl. -h, -H, -help Print a usage message. - -strategy + -strategy / Use the specified strategy to alter the directives used during synthesis. The DEFAULT strategy will be used if a strategy is not specified. Strategy descriptions: From e70c2b38d46c55fb43bbbbb53e8786896addc446 Mon Sep 17 00:00:00 2001 From: AWScccabra Date: Thu, 26 Jan 2017 12:12:39 -0600 Subject: [PATCH 27/29] Update README.md --- hdk/cl/examples/cl_simple/build/README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/hdk/cl/examples/cl_simple/build/README.md b/hdk/cl/examples/cl_simple/build/README.md index ce65da804..5cacfa8c0 100644 --- a/hdk/cl/examples/cl_simple/build/README.md +++ b/hdk/cl/examples/cl_simple/build/README.md @@ -57,13 +57,13 @@ Build script usage: Options: - -script / - Use the specified vivado script. The default script is create_dcp_from_cl.tcl. + -script (vivado_script) + Use the specified vivado script. The default script create_dcp_from_cl.tcl will be used if a script is not specified. -h, -H, -help Print a usage message. 
- -strategy / + -strategy (BASIC | EXPLORE | TIMING | CONGESTION | DEFAULT) Use the specified strategy to alter the directives used during synthesis. The DEFAULT strategy will be used if a strategy is not specified. Strategy descriptions: From dab1e131813e0b1b0a1a089e8a705d1005a14cce Mon Sep 17 00:00:00 2001 From: Carlos Cabral Date: Fri, 27 Jan 2017 09:35:49 -0800 Subject: [PATCH 28/29] Merged build strategies README edits and created links for the examples to the new_cl_template build README. Change-Id: I0b1912cdd96f0402cb4d695d35159486b6adf3dc --- .../examples/cl_hello_world/build/README.md | 179 +-------------- hdk/cl/examples/cl_simple/build/README.md | 211 +----------------- .../new_cl_template/build/README.md | 45 +++- 3 files changed, 40 insertions(+), 395 deletions(-) mode change 100644 => 120000 hdk/cl/examples/cl_hello_world/build/README.md mode change 100644 => 120000 hdk/cl/examples/cl_simple/build/README.md diff --git a/hdk/cl/examples/cl_hello_world/build/README.md b/hdk/cl/examples/cl_hello_world/build/README.md deleted file mode 100644 index 74a5ec024..000000000 --- a/hdk/cl/examples/cl_hello_world/build/README.md +++ /dev/null @@ -1,178 +0,0 @@ -# How to build and submit your Custom Logic (CL) to AWS - - -## Overview - -Once the developer has a functional design, the next steps are to: synthesize the design into basic FPGA cells, perform place-and-route, and check that the design meets the timing/frequency constraints. This could be an iterative process. Upon success, the developer will need to pass the output of the flow to AWS for final AFI creation. - -The developer needs to transfer to AWS the encrypted placed-and-routed design checkpoints (referred to as DCP throughout this document). The DCP includes the complete developer design that meets timing/frequency constraints, placement boundraries within the allocated CL area on the FPGA, and the functional requirements laid out in the Shell Interface Specification file. - -To assist in this process, AWS provides a reference DCP that includes the shell (SH) logic with a black-boxed CL under: `$HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp` - -AWS also provides out-of-the-box scripts that compile a few examples like `CL_simple` design as if they were developer code. These reference examples can serve as starting points for new designs. The AWS-provided scripts create an encrypted placed-and-routed DCP that AWS will use to generate final bitstreams. - -Advanced developers can use different scripts, tools, and techniques (e.g., regioning), with the condition that they submit "encrypted placed-and-routed design checkpoints", that pass final checks that are included in the build scripts. (TBD - final_check_dcp). - -The following section covers the step-by-step procedure. Some of these steps can be modified or adjusted based on developer experience and design needs. - -## Build Procedure - -Overview: A developer can execute `$HDK_SHELL_DIR/build/scripts/aws_build_dcp_from_cl.sh` to check the environment, setup the build directory and invoke Xilinx Vivado to create the encrypted placed-and-routed DCP (which include AWS Shell + Developer CL) that AWS will ingest through the CreateFpgaImage EC2 API. - -Executing this script also entails encryption of developer-specified RTL files. Further details on invoking the script from Vivado are provided below. - -Steps: - -### 1) Pre-requisite: Environment Variables and Tools - - 1. The environment variable `HDK_SHELL_DIR` should have been set. 
This is usually done by executing `source hdk_setup.sh` from the HDK root directory - 2. The environment variable `CL_DIR` should have been set pointing to the root directory where the CL exists. The CL root directory should have the `/build` and `/design` subdirectories. One way to make sure to have the right directory is to execute `source $(HDK_DIR)/cl/developer_designs/prepare_new_cl.sh` - 3. Developer have Xilinx Vivado tools installed, with the supported version by the HDK, and with proper license. If the developer is using AWS supplied [FPGA Development AMI](https//aws.amazon.com/marketplace/AmazonFPGAAmi) from AWS marketplace, it includes the README.md how to setup up the tools and license. - -### 2) Encrypt Source Files - -As a pre-cursor to the encryption and build process, modify the `$CL_DIR/build/scripts/encrypt.tcl` script to include all the CL source files, so the script can encrypt and copy them to the `$CL_DIR/build/src_post_encryption` directory. - -### 3) Prepare for the CL Build - -Modify the `$CL_DIR/build/scripts/create_dcp_from_cl.tcl` script to include: - 1. The list of CL encrypted files in `$CL_DIR/build/src_post_encryption`. - 2. The list of CL specific timing and placement constraints in `$CL_DIR/build/constraints`. - 3. The specific constraints and design file for IP included in your CL (e.g., DDR4). - -### 4) Build - -Run the build from the `$CL_DIR/build/scripts` directory as follows: - - $ ./aws_build_dcp_from_cl.sh - -This performs: - - Synthesis of CL. - - Implementation of CL with AWS Shell. - - Generates design checkpoint for AWS ingestion and associated logs. - -To aid developers in build verification, there is a final step in the build script that emulates -the process that AWS uses to generate bitstreams from a developer DCP. - -The outputs are: - - `$CL_DIR/build/checkpoints/*`: Various checkpoints generated during the build process. - - `$CL_DIR/build/to_aws/SH_CL_routed.dcp`: Encrypted placed-and-routed design checkpoint for AWS ingestion. - - `$CL_DIR/build/reports/*`: Various build reports (generally, check_timing/report_timing). - - `$CL_DIR/build/src_post_encryption/*`: Encrypted developer source. - - `$CL_DIR/build/constraints/*`: Implementation constraints. - -A developer may need to iterate multiple times through this process until arriving upon an error-free run. - -### 5) Submit your DCP to AWS to register the AFI - -To submit the DCP, create an S3 bucket for submitting the design and upload the tarball file into that bucket. -You need to prepare the following information: - -1. Name of the logic design *(Optional)*. -2. Generic description of the logic design *(Optional)*. -3. PCI IDs: Device, Vendor, Subsystem, SubsystemVendor. -4. Location of the tarball file object in S3. -5. Location of an S3 directory where AWS would write back logs of the AFI creation. -6. Version of the AWS Shell. - -**NOTE**: *The PCI IDs for the example CLs should be found in the README files in the respective CL example directory. -If you are building a custom CL, then you need to incorporate these values in your design as shown in the [AWS Shell Interface Specifications](https://github.com/aws/aws-fpga/blob/master/hdk/docs/AWS_Shell_Interface_Specification.md#pcie-ids).* - -To upload your tarball file to S3, you can use any of [the tools supported by S3](http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html)). 
-For example, you can use the AWS CLI as follows: - - $ aws s3 mb s3:// # Create an S3 bucket (choose a unique bucket name) - $ aws s3 cp *.Developer_CL.tar \ # Upload the file to S3 - s3:/// - -Now you need to provide AWS (Account ID: 365015490807) the appropriate [read/write permissions](http://docs.aws.amazon.com/AmazonS3/latest/dev/example-walkthroughs-managing-access-example2.html) to your S3 buckets. -Below is a sample policy. - -**NOTE**: *The AWS Account ID has changed, please ensure you are using the correct Account ID listed here.* - - { - "Version": "2012-10-17", - "Statement": [ - { - "Sid": "Bucket level permissions", - "Effect": "Allow", - "Principal": { - "AWS": "arn:aws:iam::365015490807:root" - }, - "Action": [ - "s3:ListBucket" - ], - "Resource": "arn:aws:s3:::" - }, - { - "Sid": "Object read permissions", - "Effect": "Allow", - "Principal": { - "AWS": "arn:aws:iam::365015490807:root" - }, - "Action": [ - "s3:GetObject" - ], - "Resource": "arn:aws:s3:::/" - }, - { - "Sid": "Folder write permissions", - "Effect": "Allow", - "Principal": { - "AWS": "arn:aws:iam::365015490807:root" - }, - "Action": [ - "s3:PutObject" - ], - "Resource": "arn:aws:s3:::/*" - } - ] - } - -To create an AFI execute the `create-fpga-image` command as follows: - - $ aws ec2 create-fpga-image \ - --shell-version \ - --fpga-pci-id DeviceId=,VendorId=,SubsystemId=,SubsystemVendorId= \ - --input-storage-location Bucket=,Key= \ - --name \ - --description \ - --logs-storage-location Bucket=,Key=logs/ - -The output of this command includes two identifiers that refer to your AFI: -- **FPGA Image Identifier** or **AFI ID**: this is the main ID used to manage your AFI through the AWS EC2 CLI commands and AWS SDK APIs. - This ID is regional, i.e., if an AFI is copied across multiple regions, it will have a different unique AFI ID in each region. - An example AFI ID is **`afi-01234567890abcdef`**. -- **Glogal FPGA Image Identifier** or **AGFI ID**: this is a global ID that is used to refer to an AFI from within an F1 instance. - For example, to load or clear an AFI from an FPGA slot, you use the AGFI ID. - Since the AGFI IDs is global (by design), it allows you to copy a combination of AFI/AMI to multiple regions, and they will work without requiring any extra setup. - An example AGFI ID is **`agfi-01234567890abcdef`**. - -After the AFI generation is complete, AWS will put the logs into the bucket location provided by the developer and notify them -by email. - -**NOTE**: *Attempting to associate the AFI to an AMI before the AFI is ready will result in an `InvalidFpgaImageID.Unavailable` error. -Please wait until you receive a confirmation email from AWS indicating the creation process is complete.* - -## About Encryption - Developer RTL is encrypted using IEEE 1735 V2 encryption. This level of encryption protects both the raw source files and the implemented design. - - -## Advanced Notes - - The included implementation flow is a baseline flow. It is possible to add advanced commands/constraints (e.g, regioning) to the flow. - - Developers are free to modify the flow, but the final output must be a combined (AWS Shell + CL), encrypted, placed-and-routed design checkpoint. - -# Frequently Asked Questions - - -1. What are the different files that a developer needs to provide to AWS? - -2. How do I ensure that the DCP I create will generate a good bistream at AWS? - -3. What should I do my design is not meeting timing? - -4. 
My design was meeting timing, but even without changes, subsequent builds are not meeting timing? - -5. "pr_verify" is complaining that the design checkpoints are incompatible. What should I do? - -6. What version of Vivado do I need to use? diff --git a/hdk/cl/examples/cl_hello_world/build/README.md b/hdk/cl/examples/cl_hello_world/build/README.md new file mode 120000 index 000000000..44212905b --- /dev/null +++ b/hdk/cl/examples/cl_hello_world/build/README.md @@ -0,0 +1 @@ +../../../../common/shell_current/new_cl_template/build/README.md \ No newline at end of file diff --git a/hdk/cl/examples/cl_simple/build/README.md b/hdk/cl/examples/cl_simple/build/README.md deleted file mode 100644 index c9f38812c..000000000 --- a/hdk/cl/examples/cl_simple/build/README.md +++ /dev/null @@ -1,210 +0,0 @@ -# How to build and submit your Custom Logic (CL) to AWS - - -## Overview - -Once the developer has a functional design, the next steps are to: synthesize the design into basic FPGA cells, perform place-and-route, and check that the design meets the timing/frequency constraints. This could be an iterative process. Upon success, the developer will need to pass the output of the flow to AWS for final AFI creation. - -The developer needs to transfer to AWS the encrypted placed-and-routed design checkpoints (referred to as DCP throughout this document). The DCP includes the complete developer design that meets timing/frequency constraints, placement boundraries within the allocated CL area on the FPGA, and the functional requirements laid out in the Shell Interface Specification file. - -To assist in this process, AWS provides a reference DCP that includes the shell (SH) logic with a black-boxed CL under: `$HDK_SHELL_DIR/build/checkpoints/from_aws/SH_CL_BB_routed.dcp` - -AWS also provides out-of-the-box scripts that compile a few examples like `CL_simple` design as if they were developer code. These reference examples can serve as starting points for new designs. The AWS-provided scripts create an encrypted placed-and-routed DCP that AWS will use to generate final bitstreams. - -Advanced developers can use different scripts, tools, and techniques (e.g., regioning), with the condition that they submit "encrypted placed-and-routed design checkpoints", that pass final checks that are included in the build scripts. (TBD - final_check_dcp). - -The following section covers the step-by-step procedure. Some of these steps can be modified or adjusted based on developer experience and design needs. - -## Build Procedure - -Overview: A developer can execute `$HDK_SHELL_DIR/build/scripts/aws_build_dcp_from_cl.sh` to check the environment, setup the build directory and invoke Xilinx Vivado to create the encrypted placed-and-routed DCP (which include AWS Shell + Developer CL) that AWS will ingest through the CreateFpgaImage EC2 API. - -Executing this script also entails encryption of developer-specified RTL files. Further details on invoking the script from Vivado are provided below. - -Steps: - -### 1) Pre-requisite: Environment Variables and Tools - - 1. The environment variable `HDK_SHELL_DIR` should have been set. This is usually done by executing `source hdk_setup.sh` from the HDK root directory - 2. The environment variable `CL_DIR` should have been set pointing to the root directory where the CL exists. The CL root directory should have the `/build` and `/design` subdirectories. One way to make sure to have the right directory is to execute `source $(HDK_DIR)/cl/developer_designs/prepare_new_cl.sh` - 3. 
Developer have Xilinx Vivado tools installed, with the supported version by the HDK, and with proper license. If the developer is using AWS supplied [FPGA Development AMI](https//aws.amazon.com/marketplace/AmazonFPGAAmi) from AWS marketplace, it includes the README.md how to setup up the tools and license. - -### 2) Encrypt Source Files - -As a pre-cursor to the encryption and build process, modify the `$CL_DIR/build/scripts/encrypt.tcl` script to include all the CL source files, so the script can encrypt and copy them to the `$CL_DIR/build/src_post_encryption` directory. - -### 3) Prepare for the CL Build - -Modify the `$CL_DIR/build/scripts/create_dcp_from_cl.tcl` script to include: - 1. The list of CL encrypted files in `$CL_DIR/build/src_post_encryption`. - 2. The list of CL specific timing and placement constraints in `$CL_DIR/build/constraints`. - 3. The specific constraints and design file for IP included in your CL (e.g., DDR4). - -### 4) Build - -Run the build script, aws_build_dcp_from_cl.sh, from the `$CL_DIR/build/scripts` directory. - -This build script performs: - - Synthesis of CL. - - Implementation of CL with AWS Shell. - - Generates design checkpoint (DCP) for AWS ingestion and associated logs. - -In order to help developers close timing goals and successfully build their designs efficiently, the build script provides the means to synthesize with different strategies. The different strategies alter the directives used by the synthesis tool. For example, some directives might specify additional optimizations to close timing, while others may specify less effort to minimize synthesis time for designs that can more easily close timing and area goals. Since every design is different, some strategies may provide better results than anothers. If a developer has trouble successfully building their design with one strategy it is encouraged that they try a different strategy. The strategies are described in more detail below. - -Build script usage: - - $ aws_build_dcp_from_cl.sh [-h | -H | -help] [-script ] [-strategy ] - -Options: - - -script (vivado_script) - Use the specified vivado script. The default script create_dcp_from_cl.tcl will be used if a script is not specified. - - -h, -H, -help - Print a usage message. - - -strategy (BASIC | EXPLORE | TIMING | CONGESTION | DEFAULT) - Use the specified strategy to alter the directives used during synthesis. The DEFAULT strategy will be used if a strategy is not specified. - -Strategy descriptions: - - BASIC - This is the basic flow in Vivado and contains the mandatory steps to be able to build a design. It is designed to provide a good balance betwwen runtime and QOR. - - EXPLORE - This is a high-effort flow which is designed to give improved QOR results at the expense of runtime. - - TIMING - This flow is designed for more aggressive timing optimization at the expense of runtime and congestion. - - CONGESTION - This flow is designed to insert more aggressive whitespace to alleviate routing congestion. - - DEFAULT - This is an additional high-effort flow that results in improved QOR results for the example design at the expense of runtime. - -To aid developers in build verification, there is a final step in the build script that emulates -the process that AWS uses to generate bitstreams from a developer DCP. - -The outputs are: - - `$CL_DIR/build/checkpoints/*`: Various checkpoints generated during the build process. - - `$CL_DIR/build/to_aws/SH_CL_routed.dcp`: Encrypted placed-and-routed design checkpoint for AWS ingestion. 
- - `$CL_DIR/build/reports/*`: Various build reports (generally, check_timing/report_timing). - - `$CL_DIR/build/src_post_encryption/*`: Encrypted developer source. - - `$CL_DIR/build/constraints/*`: Implementation constraints. - -A developer may need to iterate multiple times through this process until arriving upon an error-free run. - -### 5) Submit your DCP to AWS to register the AFI - -To submit the DCP, create an S3 bucket for submitting the design and upload the tarball file into that bucket. -You need to prepare the following information: - -1. Name of the logic design *(Optional)*. -2. Generic description of the logic design *(Optional)*. -3. PCI IDs: Device, Vendor, Subsystem, SubsystemVendor. -4. Location of the tarball file object in S3. -5. Location of an S3 directory where AWS would write back logs of the AFI creation. -6. Version of the AWS Shell. - -**NOTE**: *The PCI IDs for the example CLs should be found in the README files in the respective CL example directory. -If you are building a custom CL, then you need to incorporate these values in your design as shown in the [AWS Shell Interface Specifications](https://github.com/aws/aws-fpga/blob/master/hdk/docs/AWS_Shell_Interface_Specification.md#pcie-ids).* - -To upload your tarball file to S3, you can use any of [the tools supported by S3](http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html)). -For example, you can use the AWS CLI as follows: - - $ aws s3 mb s3:// # Create an S3 bucket (choose a unique bucket name) - $ aws s3 cp *.Developer_CL.tar \ # Upload the file to S3 - s3:/// - -Now you need to provide AWS (Account ID: 365015490807) the appropriate [read/write permissions](http://docs.aws.amazon.com/AmazonS3/latest/dev/example-walkthroughs-managing-access-example2.html) to your S3 buckets. -Below is a sample policy. - -**NOTE**: *The AWS Account ID has changed, please ensure you are using the correct Account ID listed here.* - - { - "Version": "2012-10-17", - "Statement": [ - { - "Sid": "Bucket level permissions", - "Effect": "Allow", - "Principal": { - "AWS": "arn:aws:iam::365015490807:root" - }, - "Action": [ - "s3:ListBucket" - ], - "Resource": "arn:aws:s3:::" - }, - { - "Sid": "Object read permissions", - "Effect": "Allow", - "Principal": { - "AWS": "arn:aws:iam::365015490807:root" - }, - "Action": [ - "s3:GetObject" - ], - "Resource": "arn:aws:s3:::/" - }, - { - "Sid": "Folder write permissions", - "Effect": "Allow", - "Principal": { - "AWS": "arn:aws:iam::365015490807:root" - }, - "Action": [ - "s3:PutObject" - ], - "Resource": "arn:aws:s3:::/*" - } - ] - } - -To create an AFI execute the `create-fpga-image` command as follows: - - $ aws ec2 create-fpga-image \ - --shell-version \ - --fpga-pci-id DeviceId=,VendorId=,SubsystemId=,SubsystemVendorId= \ - --input-storage-location Bucket=,Key= \ - --name \ - --description \ - --logs-storage-location Bucket=,Key=logs/ - -The output of this command includes two identifiers that refer to your AFI: -- **FPGA Image Identifier** or **AFI ID**: this is the main ID used to manage your AFI through the AWS EC2 CLI commands and AWS SDK APIs. - This ID is regional, i.e., if an AFI is copied across multiple regions, it will have a different unique AFI ID in each region. - An example AFI ID is **`afi-01234567890abcdef`**. -- **Glogal FPGA Image Identifier** or **AGFI ID**: this is a global ID that is used to refer to an AFI from within an F1 instance. - For example, to load or clear an AFI from an FPGA slot, you use the AGFI ID. 
-
-To create an AFI, execute the `create-fpga-image` command as follows:
-
-    $ aws ec2 create-fpga-image \
-        --shell-version <shell-version> \
-        --fpga-pci-id DeviceId=<device-id>,VendorId=<vendor-id>,SubsystemId=<subsystem-id>,SubsystemVendorId=<subsystem-vendor-id> \
-        --input-storage-location Bucket=<bucket-name>,Key=<tarball-name> \
-        --name <cl-name> \
-        --description <cl-description> \
-        --logs-storage-location Bucket=<bucket-name>,Key=logs/
-
-The output of this command includes two identifiers that refer to your AFI:
-- **FPGA Image Identifier** or **AFI ID**: this is the main ID used to manage your AFI through the AWS EC2 CLI commands and AWS SDK APIs.
-    This ID is regional, i.e., if an AFI is copied across multiple regions, it will have a different unique AFI ID in each region.
-    An example AFI ID is **`afi-01234567890abcdef`**.
-- **Global FPGA Image Identifier** or **AGFI ID**: this is a global ID that is used to refer to an AFI from within an F1 instance.
-    For example, to load or clear an AFI from an FPGA slot, you use the AGFI ID.
-    Since the AGFI ID is global (by design), it allows you to copy a combination of AFI/AMI to multiple regions, and they will work without requiring any extra setup.
-    An example AGFI ID is **`agfi-01234567890abcdef`**.
-
-After the AFI generation is complete, AWS will put the logs into the bucket location provided by the developer and notify them
-by email.
-
-**NOTE**: *Attempting to associate the AFI to an AMI before the AFI is ready will result in an `InvalidFpgaImageID.Unavailable` error.
-Please wait until you receive a confirmation email from AWS indicating the creation process is complete.*
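If you prefer to check progress from the command line instead of waiting for the email, the AFI creation state can usually be polled with the EC2 CLI; this assumes an AWS CLI version that includes the FPGA image commands and reuses the example AFI ID from above:

    $ aws ec2 describe-fpga-images --fpga-image-ids afi-01234567890abcdef
    # The AFI is ready to be associated and loaded once the returned State reports "available".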
-
-## About Encryption
- Developer RTL is encrypted using IEEE 1735 V2 encryption. This level of encryption protects both the raw source files and the implemented design.
-
-
-## Advanced Notes
- - The included implementation flow is a baseline flow. It is possible to add advanced commands/constraints (e.g., regioning) to the flow.
- - Developers are free to modify the flow, but the final output must be a combined (AWS Shell + CL), encrypted, placed-and-routed design checkpoint.
-
-# Frequently Asked Questions
-
-
-1. What are the different files that a developer needs to provide to AWS?
-
-2. How do I ensure that the DCP I create will generate a good bitstream at AWS?
-
-3. What should I do if my design is not meeting timing?
-
-4. My design was meeting timing, but subsequent builds without changes are not meeting timing. Why?
-
-5. "pr_verify" is complaining that the design checkpoints are incompatible. What should I do?
-
-6. What version of Vivado do I need to use?
diff --git a/hdk/cl/examples/cl_simple/build/README.md b/hdk/cl/examples/cl_simple/build/README.md
new file mode 120000
index 000000000..44212905b
--- /dev/null
+++ b/hdk/cl/examples/cl_simple/build/README.md
@@ -0,0 +1 @@
+../../../../common/shell_current/new_cl_template/build/README.md
\ No newline at end of file
diff --git a/hdk/common/shell_current/new_cl_template/build/README.md b/hdk/common/shell_current/new_cl_template/build/README.md
index ca5314b82..7e021ad2d 100644
--- a/hdk/common/shell_current/new_cl_template/build/README.md
+++ b/hdk/common/shell_current/new_cl_template/build/README.md
@@ -50,20 +50,51 @@ Modify the `$CL_DIR/build/scripts/create_dcp_from_cl.tcl` script to include:
 
 ### 4) Build
 
-Run the build from the `$CL_DIR/build/scripts` directory as follows:
+Run the build script, aws_build_dcp_from_cl.sh, from the `$CL_DIR/build/scripts` directory.
 
-    $ ./aws_build_dcp_from_cl.sh
-
-This performs:
+The build script performs:
  - Synthesis of CL.
  - Implementation of CL with AWS Shell.
  - Generation of Design Checkpoint (DCP) for AWS ingestion with the associated logs.
  - Generation of the corresponding manifest.txt.
 
-To aid developers in build verification, there is a final step in the build script that emulates
-the process that AWS uses to generate bitstreams from a developer DCP.
+In order to help developers close timing goals and build their designs efficiently, the build script provides the means to synthesize with different strategies. The different strategies alter the directives used by the synthesis tool. For example, some directives might specify additional optimizations to close timing, while others may specify less effort to minimize synthesis time for designs that can more easily close timing and area goals. Since every design is different, some strategies may provide better results than others. If a developer has trouble building their design with one strategy, they are encouraged to try a different one. The strategies are described in more detail below.
 
-The outputs are:
+Build script usage:
+
+    $ ./aws_build_dcp_from_cl.sh [-h | -H | -help] [-script <vivado_script>] [-strategy <BASIC | EXPLORE | TIMING | CONGESTION | DEFAULT>]
+
+Options:
+
+    -script (vivado_script)
+        Use the specified Vivado script. The default script, create_dcp_from_cl.tcl, will be used if a script is not specified.
+
+    -h, -H, -help
+        Print a usage message.
+
+    -strategy (BASIC | EXPLORE | TIMING | CONGESTION | DEFAULT)
+        Use the specified strategy to alter the directives used during synthesis. The DEFAULT strategy will be used if a strategy is not specified.
+
+Strategy descriptions:
+
+    BASIC
+        This is the basic flow in Vivado and contains the mandatory steps needed to build a design. It is designed to provide a good balance between runtime and Quality of Results (QOR).
+
+    EXPLORE
+        This is a high-effort flow which is designed to give improved QOR results at the expense of runtime.
+
+    TIMING
+        This flow is designed for more aggressive timing optimization at the expense of runtime and congestion.
+
+    CONGESTION
+        This flow is designed to insert more aggressive whitespace to alleviate routing congestion.
+
+    DEFAULT
+        This is an additional high-effort flow that results in improved QOR results for the example design at the expense of runtime.
+
+In addition, in order to aid developers with build verification, there is a final step in the build script that emulates the process that AWS uses to generate bitstreams from a developer DCP.
+
+The outputs of the build script are:
  - `$CL_DIR/build/checkpoints/*`: Various checkpoints generated during the build process.
  - `$CL_DIR/build/to_aws/SH_CL_routed.dcp`: Encrypted placed-and-routed design checkpoint for AWS ingestion.
  - `$CL_DIR/build/reports/*`: Various build reports (generally, check_timing/report_timing).
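Once a run completes, the outputs listed above can be sanity-checked from the shell; this is only an orientation sketch using the directories this README already defines:

    $ ls -l $CL_DIR/build/to_aws/SH_CL_routed.dcp   # the routed DCP that is submitted to AWS
    $ ls $CL_DIR/build/reports/                     # check_timing / report_timing output
    $ ls $CL_DIR/build/checkpoints/                 # intermediate checkpoints from the flow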
From b850f1f4b41371bce6b6fcb5a4da5585483cee80 Mon Sep 17 00:00:00 2001
From: AWScccabra
Date: Fri, 27 Jan 2017 11:53:41 -0600
Subject: [PATCH 29/29] Update README.md

Fixed formatting for the script and strategies descriptions.
---
 .../new_cl_template/build/README.md | 32 ++++++++++----------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/hdk/common/shell_current/new_cl_template/build/README.md b/hdk/common/shell_current/new_cl_template/build/README.md
index 7e021ad2d..32cd9049e 100644
--- a/hdk/common/shell_current/new_cl_template/build/README.md
+++ b/hdk/common/shell_current/new_cl_template/build/README.md
@@ -66,31 +66,31 @@ Build script usage:
 
 Options:
 
-    -script (vivado_script)
-        Use the specified Vivado script. The default script, create_dcp_from_cl.tcl, will be used if a script is not specified.
+* -script \<vivado\_script\>
+  * Use the specified Vivado script. The default script, create_dcp_from_cl.tcl, will be used if a script is not specified.
 
-    -h, -H, -help
-        Print a usage message.
+* -h, -H, -help
+  * Print a usage message.
 
-    -strategy (BASIC | EXPLORE | TIMING | CONGESTION | DEFAULT)
-        Use the specified strategy to alter the directives used during synthesis. The DEFAULT strategy will be used if a strategy is not specified.
+* -strategy \<BASIC | EXPLORE | TIMING | CONGESTION | DEFAULT\>
+  * Use the specified strategy to alter the directives used during synthesis. The DEFAULT strategy will be used if a strategy is not specified.
 
 Strategy descriptions:
 
-    BASIC
-        This is the basic flow in Vivado and contains the mandatory steps needed to build a design. It is designed to provide a good balance between runtime and Quality of Results (QOR).
+* BASIC
+  * This is the basic flow in Vivado and contains the mandatory steps needed to build a design. It is designed to provide a good balance between runtime and Quality of Results (QOR).
 
-    EXPLORE
-        This is a high-effort flow which is designed to give improved QOR results at the expense of runtime.
+* EXPLORE
+  * This is a high-effort flow which is designed to give improved QOR results at the expense of runtime.
 
-    TIMING
-        This flow is designed for more aggressive timing optimization at the expense of runtime and congestion.
+* TIMING
+  * This flow is designed for more aggressive timing optimization at the expense of runtime and congestion.
 
-    CONGESTION
-        This flow is designed to insert more aggressive whitespace to alleviate routing congestion.
+* CONGESTION
+  * This flow is designed to insert more aggressive whitespace to alleviate routing congestion.
 
-    DEFAULT
-        This is an additional high-effort flow that results in improved QOR results for the example design at the expense of runtime.
+* DEFAULT
+  * This is an additional high-effort flow that results in improved QOR results for the example design at the expense of runtime.
 
 In addition, in order to aid developers with build verification, there is a final step in the build script that emulates the process that AWS uses to generate bitstreams from a developer DCP.
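As a closing illustration of the -script and -strategy options documented above, a customized flow might be launched as follows; my_cl_build.tcl is a hypothetical copy of the default script, shown here only as a sketch:

    $ cd $CL_DIR/build/scripts
    $ cp create_dcp_from_cl.tcl my_cl_build.tcl     # start from the default flow, then add your own commands/constraints
    $ ./aws_build_dcp_from_cl.sh -script my_cl_build.tcl -strategy EXPLORE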