From 57e92be2be3d79f2066d62d78fee0e6e4f169e45 Mon Sep 17 00:00:00 2001 From: Andrew Shao Date: Tue, 15 Oct 2024 16:59:25 -0700 Subject: [PATCH 1/6] Update instructions for organization and for offline install --- .../{platform => install_notes}/generic.rst | 6 ++-- .../nonroot-linux.rst | 2 +- .../install_notes/offline.rst | 35 +++++++++++++++++++ doc/installation_instructions/platform.rst | 23 ++++++++---- .../platform/cray.rst | 2 +- .../platform/frontier.rst | 12 +++---- .../platform/ncar-cheyenne.rst | 33 ----------------- .../platform/olcf-summit.rst | 2 +- .../platform/perlmutter.rst | 6 ++-- .../platform/pml-scylla.rst | 6 ++-- smartsim/_core/_cli/build.py | 8 ++++- smartsim/_core/_cli/info.py | 7 ++++ 12 files changed, 83 insertions(+), 59 deletions(-) rename doc/installation_instructions/{platform => install_notes}/generic.rst (96%) rename doc/installation_instructions/{platform => install_notes}/nonroot-linux.rst (96%) create mode 100644 doc/installation_instructions/install_notes/offline.rst delete mode 100644 doc/installation_instructions/platform/ncar-cheyenne.rst diff --git a/doc/installation_instructions/platform/generic.rst b/doc/installation_instructions/install_notes/generic.rst similarity index 96% rename from doc/installation_instructions/platform/generic.rst rename to doc/installation_instructions/install_notes/generic.rst index 6ead091028..790f84d13d 100644 --- a/doc/installation_instructions/platform/generic.rst +++ b/doc/installation_instructions/install_notes/generic.rst @@ -1,5 +1,5 @@ Customizing environment variables -================================= +--------------------------------- Various environment variables can be used to control the compilers and dependencies for SmartSim. These are particularly important to set before the @@ -21,7 +21,7 @@ Toolkit libraries findable by the link loader (e.g. available in the ``LD_LIBRARY_PATH`` environment variable). 
Compiler environment --------------------- +^^^^^^^^^^^^^^^^^^^^ Unlike SmartRedis, we *strongly* encourage users to only use the GNU compiler chain to build the SmartSim dependencies. Notably, RedisAI has some coding @@ -33,7 +33,7 @@ the following environment variables will control the C and C++ compilers: - ``CXX``: Path the C++ compiler CUDA-related ------------- +^^^^^^^^^^^^ The following environment variables help the ``smart build`` step find and link in the CUDA Toolkit and cuDNN libraries needed to build the ML backends. diff --git a/doc/installation_instructions/platform/nonroot-linux.rst b/doc/installation_instructions/install_notes/nonroot-linux.rst similarity index 96% rename from doc/installation_instructions/platform/nonroot-linux.rst rename to doc/installation_instructions/install_notes/nonroot-linux.rst index 3070a871ae..b49c6f2b9c 100644 --- a/doc/installation_instructions/platform/nonroot-linux.rst +++ b/doc/installation_instructions/install_notes/nonroot-linux.rst @@ -1,5 +1,5 @@ GPU dependencies (non-root) -=========================== +--------------------------- The Nvidia installation instructions for CUDA Toolkit and cuDNN tend to be tailored for users with root access. For those on HPC platforms where root diff --git a/doc/installation_instructions/install_notes/offline.rst b/doc/installation_instructions/install_notes/offline.rst new file mode 100644 index 0000000000..6711be64e6 --- /dev/null +++ b/doc/installation_instructions/install_notes/offline.rst @@ -0,0 +1,35 @@ +Non-internet machines +--------------------- + +SmartSim implictly assumes that dependencies can be retrieved via the Internet. +The ``smart build`` step can be bypassed by transferring the build artifacts +from a different machine. + +.. warning:: + + The Redis Source Available License (which licenses RedisAI) prohibits + distributing binaries to third-parties. Thus, compiled binaries should not + be shared outside of your organization (see `RSAL v2 + `_). 
+ + +The easiest way to accomplish this assumes that you have a machine that can be +connected to the internet and has built SmartSim (referred to as Machine A). +This machine should have a similar compilation and build environment as the +target machine (referred to as Machine B) to ensure compatibility. + +**Step 1:** Note the path to SmartSim's ``core`` directory on Machine A + +.. code:: + + smart info + +**Step 2:** tar the ``bin`` and ``lib`` directories + +.. code:: + + tar -cf smartsim_build_artifacts.tar -C bin/ lib/ + +**Step 3:** Copy the tarball to Machine B (method will vary by machine) + +**Step 4:** pip install SmartSim on Machine B diff --git a/doc/installation_instructions/platform.rst b/doc/installation_instructions/platform.rst index c1eb51df1a..46bf5b0c9d 100644 --- a/doc/installation_instructions/platform.rst +++ b/doc/installation_instructions/platform.rst @@ -1,27 +1,36 @@ .. _install-notes: -Installation on specific platforms -================================== +General Installation Notes +========================== -The following describes installation details for various systems and platforms -that SmartSim may be used on. +SmartSim has been installed on a variety of systems and our users often have +different build environments and toolchains. The following two sections detail +some common situations and how to setup and modify your build environment: .. include:: platform/generic.rst .. include:: platform/nonroot-linux.rst +Installation guides for specific platforms +========================================== + +HPC platforms have specific modules that users can often use so they do not +need to retrieve all of the build dependencies themselves. Some machines +have specific environment variables and/or configuration settings that need +to be set for optimal performance. The below machines have vetted +instructions, please feel free to contribute instructions for your own +machine. + .. include:: platform/frontier.rst .. 
include:: platform/perlmutter.rst .. include:: platform/cray.rst -.. include:: platform/ncar-cheyenne.rst +.. include:: platform/pml-scylla.rst .. include:: platform/olcf-summit.rst -.. include:: platform/pml-scylla.rst - .. _site_installation: .. include:: site-install.rst diff --git a/doc/installation_instructions/platform/cray.rst b/doc/installation_instructions/platform/cray.rst index 1a352abd99..6b763c0236 100644 --- a/doc/installation_instructions/platform/cray.rst +++ b/doc/installation_instructions/platform/cray.rst @@ -1,5 +1,5 @@ HPE Cray supercomputers -======================= +----------------------- On certain HPE Cray machines, the SmartSim dependencies have been installed system-wide though specific paths and names might vary (please contact the team diff --git a/doc/installation_instructions/platform/frontier.rst b/doc/installation_instructions/platform/frontier.rst index 9b05061fe1..d1f5a8bb3b 100644 --- a/doc/installation_instructions/platform/frontier.rst +++ b/doc/installation_instructions/platform/frontier.rst @@ -2,7 +2,7 @@ OLCF Frontier ============= Known limitations ------------------ +^^^^^^^^^^^^^^^^^ We are continually working on getting all the features of SmartSim working on Frontier, however we do have some known limitations: @@ -23,7 +23,7 @@ Please raise an issue in the SmartSim Github or contact the developers if the ab issues are affecting your workflow or if you find any other problems. One-time Setup --------------- +^^^^^^^^^^^^^^ To install the SmartRedis and SmartSim python packages on Frontier, please follow these instructions, being sure to set the following variables @@ -63,7 +63,7 @@ these instructions, being sure to set the following variables .. 
code:: bash - smart build --device=rocm-6 + smart build ^^device=rocm-6 **Step 5:** Check that SmartSim has been installed and built correctly: @@ -76,7 +76,7 @@ these instructions, being sure to set the following variables mkdir -p $MIOPEN_USER_DB_PATH # Run the install validation utility - smart validate --device gpu + smart validate ^^device gpu The following output indicates a successful install: @@ -87,7 +87,7 @@ The following output indicates a successful install: 16:26:35 login SmartSim[557020:MainThread] INFO Success! Post-installation ------------------ +^^^^^^^^^^^^^^^^^ Before running SmartSim, the environment should match the one used to build, and some variables should be set to optimize performance: @@ -109,7 +109,7 @@ build, and some variables should be set to optimize performance: mkdir -p ${MIOPEN_USER_DB_PATH} Binding DBs to Slingshot ------------------------- +^^^^^^^^^^^^^^^^^^^^^^^^ Each Frontier node has *four* NICs, which also means users need to bind DBs to *four* network interfaces, ``hsn0``, ``hsn1``, ``hsn2``, diff --git a/doc/installation_instructions/platform/ncar-cheyenne.rst b/doc/installation_instructions/platform/ncar-cheyenne.rst deleted file mode 100644 index aeb994e917..0000000000 --- a/doc/installation_instructions/platform/ncar-cheyenne.rst +++ /dev/null @@ -1,33 +0,0 @@ - -Cheyenne at NCAR -================ - -Since SmartSim does not currently support the Message Passing Toolkit (MPT), -Cheyenne users of SmartSim will need to utilize OpenMPI. - -The following module commands were utilized to run the examples: - -.. code-block:: bash - - $ module purge - $ module load ncarenv/1.3 gnu/8.3.0 ncarcompilers/0.5.0 netcdf/4.7.4 openmpi/4.0.5 - -With this environment loaded, users will need to build and install both SmartSim -and SmartRedis through pip. Usually we recommend users installing or loading -miniconda and using the pip that comes with that installation. - -.. 
code-block:: bash - - $ pip install smartsim - $ smart build --device cpu #(Since Cheyenne does not have GPUs) - -To make the SmartRedis library (C, C++, Fortran clients), follow these steps -with the same environment loaded. - -.. code-block:: bash - - # clone SmartRedis and build - $ git clone https://github.com/SmartRedis.git smartredis - $ cd smartredis - $ make lib - diff --git a/doc/installation_instructions/platform/olcf-summit.rst b/doc/installation_instructions/platform/olcf-summit.rst index 07be24eec7..b61112370b 100644 --- a/doc/installation_instructions/platform/olcf-summit.rst +++ b/doc/installation_instructions/platform/olcf-summit.rst @@ -1,6 +1,6 @@ Summit at OLCF -============== +-------------- Since SmartSim does not have a built PowerPC build, the build steps for an IBM system are slightly different than other systems. diff --git a/doc/installation_instructions/platform/perlmutter.rst b/doc/installation_instructions/platform/perlmutter.rst index 71f97a4dc9..1380381eb4 100644 --- a/doc/installation_instructions/platform/perlmutter.rst +++ b/doc/installation_instructions/platform/perlmutter.rst @@ -1,8 +1,8 @@ NERSC Perlmutter -================ +---------------- One-time Setup --------------- +^^^^^^^^^^^^^^ To install SmartSim on Perlmutter, follow these steps: @@ -53,7 +53,7 @@ The following output indicates a successful install: 16:26:35 login SmartSim[557020:MainThread] INFO Success! Post-installation ------------------ +^^^^^^^^^^^^^^^^^ After completing the above steps to install SmartSim in a conda environment, you can reload the conda environment by running the following commands: diff --git a/doc/installation_instructions/platform/pml-scylla.rst b/doc/installation_instructions/platform/pml-scylla.rst index c13f178213..5d9a7bfa0d 100644 --- a/doc/installation_instructions/platform/pml-scylla.rst +++ b/doc/installation_instructions/platform/pml-scylla.rst @@ -1,12 +1,12 @@ PML Scylla -========== +---------- .. 
warning:: As of September 2024, the software stack on Scylla is still being finalized. Therefore, please consider these instructions as preliminary for now. One-time Setup --------------- +^^^^^^^^^^^^^^ To install SmartSim on Scylla, follow these steps: @@ -72,7 +72,7 @@ The following output indicates a successful install: 16:26:35 login SmartSim[557020:MainThread] INFO Success! Post-installation ------------------ +^^^^^^^^^^^^^^^^^ After completing the above steps to install SmartSim in a conda environment, you can reload the conda environment by running the following commands: diff --git a/smartsim/_core/_cli/build.py b/smartsim/_core/_cli/build.py index 5d094b72f4..dc2b543ac9 100644 --- a/smartsim/_core/_cli/build.py +++ b/smartsim/_core/_cli/build.py @@ -306,7 +306,8 @@ def execute( logger.warning("Dragon installation failed") # REDIS/KeyDB - build_database(build_env, versions, keydb, verbose) + if not args.skip_database: + build_database(build_env, versions, keydb, verbose) if (CONFIG.lib_path / "redisai.so").exists(): logger.warning("RedisAI was previously built, run 'smart clean' to rebuild") @@ -368,6 +369,11 @@ def configure_parser(parser: argparse.ArgumentParser) -> None: action="store_true", help="Do not compile RedisAI and the backends", ) + parser.add_argument( + "--skip-database", + action="store_true", + help="Do not build the database" + ) parser.add_argument( "--skip-torch", action="store_true", diff --git a/smartsim/_core/_cli/info.py b/smartsim/_core/_cli/info.py index c08fcb1a35..21a426bafc 100644 --- a/smartsim/_core/_cli/info.py +++ b/smartsim/_core/_cli/info.py @@ -9,6 +9,7 @@ import smartsim._core._cli.utils as _utils import smartsim._core.utils.helpers as _helpers from smartsim._core._install.buildenv import BuildEnv as _BuildEnv +from smartsim._core.config import CONFIG _MISSING_DEP = _helpers.colorize("Not Installed", "red") @@ -29,6 +30,12 @@ def execute( end="\n\n", ) + print("SmartSim Paths") + path_table = [["core", 
str(CONFIG.dependency_path)]] + path_table.append(["bin", str(CONFIG.bin_path)]) + path_table.append(["lib", str(CONFIG.lib_path)]) + print(tabulate(path_table, tablefmt="fancy_outline"), end="\n\n") + print("Orchestrator Configuration:") db_path = _utils.get_db_path() db_table = [["Installed", _fmt_installed_db(db_path)]] From 6fef8ecd31e59901490d6eab109a387234fbd457 Mon Sep 17 00:00:00 2001 From: Andrew Shao Date: Tue, 15 Oct 2024 17:22:10 -0700 Subject: [PATCH 2/6] Add documentation describing how to make your configuration --- .../install_notes.rst | 16 +++++++++ .../install_notes/custom_backends.rst | 34 +++++++++++++++++++ 2 files changed, 50 insertions(+) create mode 100644 doc/installation_instructions/install_notes.rst create mode 100644 doc/installation_instructions/install_notes/custom_backends.rst diff --git a/doc/installation_instructions/install_notes.rst b/doc/installation_instructions/install_notes.rst new file mode 100644 index 0000000000..0411aaa50e --- /dev/null +++ b/doc/installation_instructions/install_notes.rst @@ -0,0 +1,16 @@ +.. _install-notes: + +General Installation Notes +========================== + +SmartSim has been installed on a variety of systems and our users often have +different build environments and toolchains. The following two sections detail +some common situations and how to setup and modify your build environment: + +.. include:: install_notes/generic.rst + +.. include:: install_notes/nonroot-linux.rst + +.. include:: install_notes/offline.rst + +.. 
include:: install_notes/custom_backends.rst \ No newline at end of file diff --git a/doc/installation_instructions/install_notes/custom_backends.rst b/doc/installation_instructions/install_notes/custom_backends.rst new file mode 100644 index 0000000000..96be2e1402 --- /dev/null +++ b/doc/installation_instructions/install_notes/custom_backends.rst @@ -0,0 +1,34 @@ +Custom ML backends +------------------ + +The ML backends (Torch, ONNX Runtime, and Tensorflow) and their associated +python packages have different versions and indices that can be supported based +on the intended device (CPU, ROCM, CUDA-11, or CUDA-12). SmartSim stores this +information in JSON files within the ``smartsim/_core/_install/configs/mlpackages`` +directory. If a different version or variant is needed, these can be specified +using ``smart build --config-dir ``. The following is the +JSON file used for Linux with CUDA-12. + +.. literalinclude:: ../../../smartsim/_core/_install/configs/mlpackages/Linux64CUDA12.json + +The following table explains what each of the main fields are: + +.. list-table:: MLPackages fields + :widths: 25 50 + :header-rows: 1 + + * - Field Name + - Description + * - name + - The name of the C++ frontend to the ML package itself (e.g. libtorch) + * - version + - A string used to identify the version of the library. Note that this does not have + an effect on the build process itself, but is used to display information + * - pip_index + - The pip index from which to install the python packages associated with this ML package + * - lib_source + - The location of the archive which contains the ML backend. 
If this is a URL, the file + will be downloaded, otherwise if this is a local path, the archive will be copied to + the build library and extracted + * - rai_patches + - Patch RedisAI source code with modifications needed by this ML package \ No newline at end of file From c68c0bf3016ae19d8e7f4d23b4b2c687ca8dd3f1 Mon Sep 17 00:00:00 2001 From: Andrew Shao Date: Thu, 31 Oct 2024 09:22:15 -0700 Subject: [PATCH 3/6] Update changelog --- doc/changelog.md | 8 +++++++- doc/index.rst | 1 + doc/installation_instructions/platform.rst | 17 ++--------------- 3 files changed, 10 insertions(+), 16 deletions(-) diff --git a/doc/changelog.md b/doc/changelog.md index b2bf0152d5..bcca548854 100644 --- a/doc/changelog.md +++ b/doc/changelog.md @@ -14,7 +14,8 @@ To be released at some point in the future Description - Implement workaround for Tensorflow that allows RedisAI to build with GCC-14 -- Add instructions for installing SmartSim on PML's Scylla +- Add installation instructions for airgapped machines +- Add installation instructions for PML's Scylla - Fix typos in documentation Detailed Notes @@ -26,6 +27,11 @@ Detailed Notes Future versions of Tensorflow may fix this problem, but for now this seems to be the best workaround. ([SmartSim-PR738](https://github.com/CrayLabs/SmartSim/pull/738)) +- Update install notes and documentation for custom backends +- Update/reorganize the install instructions to include a split between advanced + install notes and instructions for specific platforms. Additionally, add + instructions for machines which do not have access to the internet. + ([SmartSim-PR749](https://github.com/CrayLabs/SmartSim/pull/749)) - PML's Scylla is still under development. The usual SmartSim build instructions do not apply because the GPU dependencies have yet to be installed at a system-wide level. 
Scylla has diff --git a/doc/index.rst b/doc/index.rst index 4c64712b23..59ef95d885 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -12,6 +12,7 @@ overview installation_instructions/basic + installation_instructions/install_notes installation_instructions/platform contributing smartsim_zoo diff --git a/doc/installation_instructions/platform.rst b/doc/installation_instructions/platform.rst index 46bf5b0c9d..79052d65d7 100644 --- a/doc/installation_instructions/platform.rst +++ b/doc/installation_instructions/platform.rst @@ -1,18 +1,5 @@ -.. _install-notes: - -General Installation Notes -========================== - -SmartSim has been installed on a variety of systems and our users often have -different build environments and toolchains. The following two sections detail -some common situations and how to setup and modify your build environment: - -.. include:: platform/generic.rst - -.. include:: platform/nonroot-linux.rst - -Installation guides for specific platforms -========================================== +Platform Install Guide +====================== HPC platforms have specific modules that users can often use so they do not need to retrieve all of the build dependencies themselves. 
Some machines From 2194083c3e8df35432b0abb4a5e50b17c2196b75 Mon Sep 17 00:00:00 2001 From: Andrew Shao Date: Wed, 16 Oct 2024 17:31:13 -0700 Subject: [PATCH 4/6] Update installation guides --- doc/index.rst | 2 +- doc/installation_instructions/basic.rst | 37 +++--- .../install_notes.rst | 16 --- .../install_notes/nonroot-linux.rst | 18 --- doc/installation_instructions/platform.rst | 14 +-- .../platform/frontier.rst | 8 +- .../platform/olcf-summit.rst | 4 +- .../platform/perlmutter.rst | 4 +- .../platform/pml-scylla.rst | 4 +- .../site-install.rst | 2 +- .../troubleshooting/cuda-dependencies.rst | 111 ++++++++++++++++++ .../troubleshooting/cudatoolkit | 24 ++++ .../troubleshooting/cudnn | 13 ++ .../custom_backends.rst | 34 +++--- .../generic.rst | 26 +--- .../offline.rst | 31 ++++- .../troubleshooting/troubleshooting.rst | 16 +++ 17 files changed, 244 insertions(+), 120 deletions(-) delete mode 100644 doc/installation_instructions/install_notes.rst delete mode 100644 doc/installation_instructions/install_notes/nonroot-linux.rst create mode 100644 doc/installation_instructions/troubleshooting/cuda-dependencies.rst create mode 100644 doc/installation_instructions/troubleshooting/cudatoolkit create mode 100644 doc/installation_instructions/troubleshooting/cudnn rename doc/installation_instructions/{install_notes => troubleshooting}/custom_backends.rst (53%) rename doc/installation_instructions/{install_notes => troubleshooting}/generic.rst (55%) rename doc/installation_instructions/{install_notes => troubleshooting}/offline.rst (56%) create mode 100644 doc/installation_instructions/troubleshooting/troubleshooting.rst diff --git a/doc/index.rst b/doc/index.rst index 59ef95d885..f9e05a51c7 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -12,8 +12,8 @@ overview installation_instructions/basic - installation_instructions/install_notes installation_instructions/platform + installation_instructions/troubleshooting/troubleshooting contributing smartsim_zoo diff --git 
a/doc/installation_instructions/basic.rst b/doc/installation_instructions/basic.rst index 226ccb0854..382cad9495 100644 --- a/doc/installation_instructions/basic.rst +++ b/doc/installation_instructions/basic.rst @@ -4,7 +4,9 @@ Basic Installation ****************** -The following will show how to install both SmartSim and SmartRedis. +The following instructions serve as a guide for installing both SmartSim and +SmartRedis. SmartSim, despite being a Python-library, has a second build +step for Redis and RedisAI. Please follow these instructions carefully. .. note:: @@ -30,30 +32,26 @@ The base prerequisites to install SmartSim and SmartRedis wtih CPU-only support .. note:: - GCC 9, 11-13 is recommended (here are known issues compiling with GCC 10). For - CUDA 11.8, GCC 9 or 11 must be used. - -.. warning:: - - Apple Clang 15 seems to have issues on MacOS with Apple Silicon. Please modify - your path to ensure that a version of GCC installed by brew has priority. Note - this seems to be hardcoded to `gcc` and `g++` in the Redis build so ensure that - `which gcc g++` do not point to Apple Clang. - + GCC is recommended to build the backends for SmartSim. CUDA 11.8 requires GCC + 9 or 11, CUDA 12 requires GCC 11 or higher. SmartRedis can be compiled with + GCC, Intel, Cray, and Nvidia compilers. ML Library Support ================== -We currently support both Nvidia and AMD GPUs when using RedisAI for GPU inference. The support -for these GPUs often depends on the version of the CUDA or ROCm stack that is availble on your -machine. In _most_ cases, the versions backwards compatible. If you encounter problems, please -contact us and we can build the backend libraries for your desired version of CUDA and ROCm. +We currently support both Nvidia and AMD GPUs when using RedisAI for GPU +inference. The support for these GPUs often depends on the version of the CUDA +or ROCm stack that is availble on your machine. 
In _most_ cases, the versions of +the ML frameworks are backwards compatible. If you encounter problems, please +contact us and we can build the backend libraries for your desired version of +CUDA and ROCm. CPU backends are provided for Apple (both Intel and Apple Silicon) and Linux (x86_64). -Be sure to reference the table below to find which versions of the ML libraries are supported for -your particular platform. Additional, see :ref:`installation notes ` for helpful -information regarding various system types before installation. +Be sure to reference the table below to find which versions of the ML libraries +are supported for your particular platform. Additionally, see :ref:`Platform +Installation Guide ` for helpful information regarding +for specific systems. Linux ----- @@ -287,8 +285,7 @@ combination. GPU builds can be troublesome due to the way that RedisAI and the ML-package backends look for the CUDA Toolkit and cuDNN libraries. Please see the - :ref:`Platform Installation Section ` section for guidance. - + :ref:`Install Troubleshooting ` section for guidance. .. _dragon_install: diff --git a/doc/installation_instructions/install_notes.rst b/doc/installation_instructions/install_notes.rst deleted file mode 100644 index 0411aaa50e..0000000000 --- a/doc/installation_instructions/install_notes.rst +++ /dev/null @@ -1,16 +0,0 @@ -.. _install-notes: - -General Installation Notes -========================== - -SmartSim has been installed on a variety of systems and our users often have -different build environments and toolchains. The following two sections detail -some common situations and how to setup and modify your build environment: - -.. include:: install_notes/generic.rst - -.. include:: install_notes/nonroot-linux.rst - -.. include:: install_notes/offline.rst - -.. 
include:: install_notes/custom_backends.rst \ No newline at end of file diff --git a/doc/installation_instructions/install_notes/nonroot-linux.rst b/doc/installation_instructions/install_notes/nonroot-linux.rst deleted file mode 100644 index b49c6f2b9c..0000000000 --- a/doc/installation_instructions/install_notes/nonroot-linux.rst +++ /dev/null @@ -1,18 +0,0 @@ -GPU dependencies (non-root) ---------------------------- - -The Nvidia installation instructions for CUDA Toolkit and cuDNN tend to be -tailored for users with root access. For those on HPC platforms where root -access is rare, manually downloading and installing these dependencies as -a user is possible. - -.. code-block:: bash - - wget https://developer.download.nvidia.com/compute/cuda/11.4.4/local_installers/cuda_11.4.4_470.82.01_linux.run - chmod +x cuda_11.4.4_470.82.01_linux.run - ./cuda_11.4.4_470.82.01_linux.run --toolkit --silent --toolkitpath=/path/to/install/location/ - -For cuDNN, follow `Nvidia's instructions -`_, -and copy the cuDNN libraries to the `lib64` directory at the CUDA Toolkit -location specified above. \ No newline at end of file diff --git a/doc/installation_instructions/platform.rst b/doc/installation_instructions/platform.rst index 79052d65d7..a8f5cea350 100644 --- a/doc/installation_instructions/platform.rst +++ b/doc/installation_instructions/platform.rst @@ -1,3 +1,5 @@ +.. _platform-installation: + Platform Install Guide ====================== @@ -9,18 +11,10 @@ instructions, please feel free to contribute instructions for your own machine. .. include:: platform/frontier.rst - .. include:: platform/perlmutter.rst - -.. include:: platform/cray.rst - .. include:: platform/pml-scylla.rst - +.. include:: platform/cray.rst .. include:: platform/olcf-summit.rst .. _site_installation: - -.. include:: site-install.rst - - - +.. 
include:: site-install.rst \ No newline at end of file diff --git a/doc/installation_instructions/platform/frontier.rst b/doc/installation_instructions/platform/frontier.rst index d1f5a8bb3b..06828bac9e 100644 --- a/doc/installation_instructions/platform/frontier.rst +++ b/doc/installation_instructions/platform/frontier.rst @@ -1,5 +1,5 @@ -OLCF Frontier -============= +Frontier (OLCF) +--------------- Known limitations ^^^^^^^^^^^^^^^^^ @@ -63,7 +63,7 @@ these instructions, being sure to set the following variables .. code:: bash - smart build ^^device=rocm-6 + smart build --device=rocm-6 **Step 5:** Check that SmartSim has been installed and built correctly: @@ -76,7 +76,7 @@ these instructions, being sure to set the following variables mkdir -p $MIOPEN_USER_DB_PATH # Run the install validation utility - smart validate ^^device gpu + smart validate --device gpu The following output indicates a successful install: diff --git a/doc/installation_instructions/platform/olcf-summit.rst b/doc/installation_instructions/platform/olcf-summit.rst index b61112370b..fbb4f9b6d2 100644 --- a/doc/installation_instructions/platform/olcf-summit.rst +++ b/doc/installation_instructions/platform/olcf-summit.rst @@ -1,6 +1,6 @@ -Summit at OLCF --------------- +Summit (OLCF) +------------- Since SmartSim does not have a built PowerPC build, the build steps for an IBM system are slightly different than other systems. 
diff --git a/doc/installation_instructions/platform/perlmutter.rst b/doc/installation_instructions/platform/perlmutter.rst index 1380381eb4..7f1a0088c8 100644 --- a/doc/installation_instructions/platform/perlmutter.rst +++ b/doc/installation_instructions/platform/perlmutter.rst @@ -1,5 +1,5 @@ -NERSC Perlmutter ----------------- +Perlmutter (NERSC) +------------------ One-time Setup ^^^^^^^^^^^^^^ diff --git a/doc/installation_instructions/platform/pml-scylla.rst b/doc/installation_instructions/platform/pml-scylla.rst index 5d9a7bfa0d..8aa80c0e7f 100644 --- a/doc/installation_instructions/platform/pml-scylla.rst +++ b/doc/installation_instructions/platform/pml-scylla.rst @@ -1,5 +1,5 @@ -PML Scylla ----------- +Scylla (PML) +------------ .. warning:: As of September 2024, the software stack on Scylla is still being finalized. diff --git a/doc/installation_instructions/site-install.rst b/doc/installation_instructions/site-install.rst index 53e0ff8bf0..ca4b3f0c60 100644 --- a/doc/installation_instructions/site-install.rst +++ b/doc/installation_instructions/site-install.rst @@ -12,4 +12,4 @@ from source with the following steps replacing ``COMPILER_VERSION`` and module use -a /lus/scratch/smartsim/local/modulefiles module load cudatoolkit/11.8 cudnn smartsim-deps/COMPILER_VERSION/SMARTSIM_VERSION pip install smartsim - smart build --skip-backends --device gpu [--onnx] + smart build --skip-backends --device gpu diff --git a/doc/installation_instructions/troubleshooting/cuda-dependencies.rst b/doc/installation_instructions/troubleshooting/cuda-dependencies.rst new file mode 100644 index 0000000000..3132056dea --- /dev/null +++ b/doc/installation_instructions/troubleshooting/cuda-dependencies.rst @@ -0,0 +1,111 @@ +Nvidia GPU Dependencies +----------------------- + +The Nvidia installation instructions for CUDA Toolkit and cuDNN tend to be +tailored for users with root access. 
For those on HPC platforms where root +access is rare, users can install Nvidia dependencies in user-space. Even on +machines where these dependencies are available, if environment variables are +not set, the ``smart build`` step may fail. This section details how to download +and install these dependencies and configure your build environment. + +.. note:: + + At runtime, the environment in which the Orchestrator is launched must have + the cuDNN and CUDA Toolkit libraries findable by the link loader (e.g. + available in the ``LD_LIBRARY_PATH`` environment variable). + +Download and install +^^^^^^^^^^^^^^^^^^^^ + +**Step 1:** Find a location which is globally accessible and has sufficient +storage space (about 12GB) and set an environment variable + +.. code-block:: bash + + export CUDA_TOOLKIT_INSTALL_PATH=/path/to/install/location/cudatoolkit + export CUDNN_INSTALL_PATH=/path/to/install/location/cudnn + +**Step 2:** Download cudatoolkit and install it + +.. tabs:: + + .. group-tab:: CUDA 11 + + .. code-block:: bash + + wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run + sh ./cuda_11.8.0_520.61.05_linux.run --toolkit --silent --toolkitpath=$CUDA_TOOLKIT_INSTALL_PATH + + .. group-tab:: CUDA 12 + + .. code-block:: bash + + wget https://developer.download.nvidia.com/compute/cuda/12.5.0/local_installers/cuda_12.5.0_555.42.02_linux.run + sh ./cuda_12.5.0_555.42.02_linux.run --toolkit --silent --toolkitpath=$CUDA_TOOLKIT_INSTALL_PATH + +**Step 3:** Download cuDNN +For cuDNN, follow `Nvidia's instructions +`_ for +downloading cuDNN version 8.9 for either CUDA-11 or CUDA-12. + +**Step 4:** Untar the cuDNN archive + +.. tabs:: + + .. group-tab:: CUDA 11 + + .. code-block:: bash + + mkdir -p $CUDNN_INSTALL_PATH + tar -xf cudnn-linux-x86_64-8.9.7.29_cuda11-archive.tar -C $CUDNN_INSTALL_PATH --strip-components 1 + + .. group-tab:: CUDA 12 + + .. 
code-block:: bash + + mkdir -p $CUDNN_INSTALL_PATH + tar -xf cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar -C $CUDNN_INSTALL_PATH --strip-components 1 + +Option 1: Environment Variables +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The following environment variables help the ``smart build`` step find and link in the +CUDA Toolkit and cuDNN libraries needed to build the ML backends. + +.. code-block:: bash + + # CUDA Toolkit variables + export CUDA_TOOLKIT_ROOT_DIR=$CUDA_TOOLKIT_INSTALL_PATH + export CUDA_NVCC_EXECUTABLE=$CUDA_TOOLKIT_ROOT_DIR/bin/nvcc + export CUDA_INCLUDE_DIRS=$CUDA_TOOLKIT_ROOT_DIR/include + + # cuDNN Variables + export CUDNN_LIBRARY=$CUDNN_INSTALL_PATH/lib/libcudnn.so + export CUDNN_INCLUDE_DIR=$CUDNN_INSTALL_PATH/include + +Option 2: Setup Modulefiles +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Alternatively, these environment variables can be setup by using environment +modules instead. This can be especially useful when the CUDA dependencies are +intended to be shared across users. + +**Step 1:** Download these two modulefiles to a directory of your choosing + +- :download:`CUDA Toolkit <./cudatoolkit>` +- :download:`cuDNN <./cudnn>` + +**Step 2:** Modify the files to set the ``cuda_home`` and ``CUDNN_ROOT`` +variables to match the installed locations for CUDA Toolkit and cuDNN. + +**Step 3:** In your ``.bashrc`` add the following line + +.. code-block:: + + module use /path/to/modulefile root + +**Step 4:** Activate the modulefiles + +.. 
code-block:: + + module load cudatoolkit cudnn \ No newline at end of file diff --git a/doc/installation_instructions/troubleshooting/cudatoolkit b/doc/installation_instructions/troubleshooting/cudatoolkit new file mode 100644 index 0000000000..7b8dd73068 --- /dev/null +++ b/doc/installation_instructions/troubleshooting/cudatoolkit @@ -0,0 +1,24 @@ +#%Module -*- tcl -*- ## +## modulefile + +proc ModulesHelp { } { + + puts stderr "\tAdds CUDA Toolkit to your environment," + +} + +module-whatis "CUDA Toolkit development libraries" + +conflict cudatoolkit + +set cuda_home path/to/cudatoolkit + +setenv CUDA_HOME $cuda_home +setenv CUDA_TOOLKIT_ROOT_DIR $cuda_home +set cuda_lib $cuda_home/lib64/ +setenv CUDA_LIBRARY $cuda_lib +prepend-path LD_LIBRARY_PATH $cuda_lib + +prepend-path PATH $cuda_home/bin +setenv CUDA_NVCC_EXECUTABLE $cuda_home/bin/nvcc +setenv CUDA_INCLUDE_DIRS $cuda_home/include \ No newline at end of file diff --git a/doc/installation_instructions/troubleshooting/cudnn b/doc/installation_instructions/troubleshooting/cudnn new file mode 100644 index 0000000000..ccf6a6f211 --- /dev/null +++ b/doc/installation_instructions/troubleshooting/cudnn @@ -0,0 +1,13 @@ +#%Module -*- tcl -*- ## +## modulefile +proc ModulesHelp { } { + +puts stderr "\tAdds CUDNN to your environment," + +} + +module-whatis "CUDNN development libraries" + +set CUDNN_ROOT /path/to/cudnn +set cudnn_lib $CUDNN_ROOT/lib +setenv CUDNN_INSTALL_PATH $CUDNN_ROOT \ No newline at end of file diff --git a/doc/installation_instructions/install_notes/custom_backends.rst b/doc/installation_instructions/troubleshooting/custom_backends.rst similarity index 53% rename from doc/installation_instructions/install_notes/custom_backends.rst rename to doc/installation_instructions/troubleshooting/custom_backends.rst index 96be2e1402..c745dbd5ec 100644 --- a/doc/installation_instructions/install_notes/custom_backends.rst +++ b/doc/installation_instructions/troubleshooting/custom_backends.rst @@ -3,32 +3,38 @@ 
Custom ML backends The ML backends (Torch, ONNX Runtime, and Tensorflow) and their associated python packages have different versions and indices that can be supported based -on the intended device (CPU, ROCM, CUDA-11, or CUDA-12). SmartSim stores this -information in JSON files within the ``smartsim/_core/_install/configs/mlpackages`` -directory. If a different version or variant is needed, these can be specified -using ``smart build --config-dir ``. The following is the -JSON file used for Linux with CUDA-12. +on the intended device (CPU, ROCM, CUDA-11, or CUDA-12). The officially +supported backends are stored in JSON files within the +``smartsim/_core/_install/configs/mlpackages`` directory. -.. literalinclude:: ../../../smartsim/_core/_install/configs/mlpackages/Linux64CUDA12.json +If you need to define a different version of the backend and/or the packages, we +recommend that you copy one of the JSON files (for example the one at the end of +this section) that SmartSim ships with, modify as needed, and then use ``smart +build --config-dir`` to specify the path to your custom configuration(s). -The following table explains what each of the main fields are: +The following table describes the main fields needed to define a machine learning +backend used by RedisAI. .. list-table:: MLPackages fields - :widths: 25 50 + :widths: 15 60 :header-rows: 1 * - Field Name - Description - * - name + * - ``name`` - The name of the C++ frontend to the ML package itself (e.g. libtorch) - * - version + * - ``version`` - A string used to identify the version of the library. Note that this does not have an effect on the build process itself, but is used to display information - * - pip_index + * - ``pip_index`` - The pip index from which to install the python packages associated with this ML package - * - lib_source + * - ``lib_source`` - The location of the archive which contains the ML backend. 
If this is a URL, the file will be downloaded, otherwise if this is a local path, the archive will be copied to the build library and extracted - * - rai_patches - - Patch RedisAI source code with modifications needed by this ML package \ No newline at end of file + * - ``rai_patches`` + - Patch RedisAI source code with modifications needed by this ML package + +As an example, the following file describes the ML frameworks for Linux on CUDA-12 devices: + +.. literalinclude:: ../../../smartsim/_core/_install/configs/mlpackages/LinuxX64CUDA12.json diff --git a/doc/installation_instructions/install_notes/generic.rst b/doc/installation_instructions/troubleshooting/generic.rst similarity index 55% rename from doc/installation_instructions/install_notes/generic.rst rename to doc/installation_instructions/troubleshooting/generic.rst index 790f84d13d..f035c3bc47 100644 --- a/doc/installation_instructions/install_notes/generic.rst +++ b/doc/installation_instructions/troubleshooting/generic.rst @@ -14,12 +14,6 @@ backends are compiled with the desired compilation environment. that this works as intended however, please be sure to set the correct environment for the simulation using the ``RunSettings``. -All of the following environment variables must be *exported* to ensure that -they are used throughout the entire build process. Additionally at runtime, the -environment in which the Orchestrator is launched must have the cuDNN and CUDA -Toolkit libraries findable by the link loader (e.g. available in the -``LD_LIBRARY_PATH`` environment variable). - Compiler environment ^^^^^^^^^^^^^^^^^^^^ @@ -30,22 +24,4 @@ compiler should be used (e.g. 
the Cray Programming Environment wrappers), the following environment variables will control the C and C++ compilers: - ``CC``: Path to the C compiler -- ``CXX``: Path the C++ compiler - -CUDA-related -^^^^^^^^^^^^ - -The following environment variables help the ``smart build`` step find and link in the -CUDA Toolkit and cuDNN libraries needed to build the ML backends. - -cuDNN: - -- ``CUDNN_LIBRARY``: Path to the cuDNN shared libraries (e.g. ``libcudnn.so``) are found -- ``CUDNN_INCLUDE_DIR``: Path to cuDNN header files (e.g. ``cudnn.h``) - -CUDA Toolkit: - -- ``CUDA_TOOLKIT_ROOT_DIR``: Path to the root directory of CUDA Toolkit -- ``CUDA_NVCC_EXECUTABLE``: Path to the ``nvcc`` compiler -- ``CUDA_INCLUDE_DIRS``: Path to the CUDA Toolkit headers - +- ``CXX``: Path the C++ compiler \ No newline at end of file diff --git a/doc/installation_instructions/install_notes/offline.rst b/doc/installation_instructions/troubleshooting/offline.rst similarity index 56% rename from doc/installation_instructions/install_notes/offline.rst rename to doc/installation_instructions/troubleshooting/offline.rst index 6711be64e6..6fef0fc73f 100644 --- a/doc/installation_instructions/install_notes/offline.rst +++ b/doc/installation_instructions/troubleshooting/offline.rst @@ -1,5 +1,5 @@ -Non-internet machines ---------------------- +Airgapped Systems +----------------- SmartSim implictly assumes that dependencies can be retrieved via the Internet. The ``smart build`` step can be bypassed by transferring the build artifacts @@ -28,8 +28,29 @@ target machine (referred to as Machine B) to ensure compatibility. .. 
code:: - tar -cf smartsim_build_artifacts.tar -C bin/ lib/ + tar -cf smartsim_build_artifacts.tar -C bin/ lib/ -**Step 3:** Copy the tarball to Machine B (method will vary by machine) +**Step 3:** Copy the tarball, SmartSim wheel, SmartRedis wheel, +SmartRedis libraries to Machine B (method will vary by machine) -**Step 4:** pip install SmartSim on Machine B +**Step 4:** Install SmartSim and SmartRedis on Machine B + +.. code:: + + pip install + +**Step 5:** Find the path to the core directory again with + +.. code:: + + smart info + +**Step 6:** Unpack the tarball to the core directory + +.. code:: + + tar -xf smartsim_build_artifacts.tar -C + +**Step 7:** Install the python packages associated with the ML frameworks +(for the default versions reference +``smartsim/_core/_install/configs/mlpackages``) \ No newline at end of file diff --git a/doc/installation_instructions/troubleshooting/troubleshooting.rst b/doc/installation_instructions/troubleshooting/troubleshooting.rst new file mode 100644 index 0000000000..63878e8a35 --- /dev/null +++ b/doc/installation_instructions/troubleshooting/troubleshooting.rst @@ -0,0 +1,16 @@ +.. _installation-troubleshooting: + +Installation Troubleshooting +============================ + +SmartSim has been installed on a variety of systems and our users often have +different build environments and toolchains. The following two sections detail +some common situations and how to setup and modify your build environment: + +.. include:: generic.rst + +.. include:: cuda-dependencies.rst + +.. include:: offline.rst + +.. 
include:: custom_backends.rst \ No newline at end of file From ee754c80be10c17f8e9934843b8379d36a9d93b5 Mon Sep 17 00:00:00 2001 From: Andrew Shao Date: Thu, 17 Oct 2024 08:37:30 -0700 Subject: [PATCH 5/6] Fix style error --- smartsim/_core/_cli/build.py | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/smartsim/_core/_cli/build.py b/smartsim/_core/_cli/build.py index dc2b543ac9..7b6347ea40 100644 --- a/smartsim/_core/_cli/build.py +++ b/smartsim/_core/_cli/build.py @@ -370,9 +370,7 @@ def configure_parser(parser: argparse.ArgumentParser) -> None: help="Do not compile RedisAI and the backends", ) parser.add_argument( - "--skip-database", - action="store_true", - help="Do not build the database" + "--skip-database", action="store_true", help="Do not build the database" ) parser.add_argument( "--skip-torch", From 4952a13f8b01f066cbc92e55d92e224057270d2b Mon Sep 17 00:00:00 2001 From: Andrew Shao Date: Thu, 31 Oct 2024 09:18:04 -0700 Subject: [PATCH 6/6] Fix grammatical/clarity issues from review --- doc/contributing.rst | 1 + doc/installation_instructions/basic.rst | 32 +++++++++++-------- doc/installation_instructions/platform.rst | 13 ++++---- .../troubleshooting/cuda-dependencies.rst | 18 +++++------ .../troubleshooting/custom_backends.rst | 11 ++++--- .../troubleshooting/offline.rst | 16 ++++++---- .../troubleshooting/troubleshooting.rst | 6 ++-- 7 files changed, 53 insertions(+), 44 deletions(-) diff --git a/doc/contributing.rst b/doc/contributing.rst index a8a860045c..7e200078f1 100644 --- a/doc/contributing.rst +++ b/doc/contributing.rst @@ -1,3 +1,4 @@ +.. 
_contributing: ****************** Contributing Guide diff --git a/doc/installation_instructions/basic.rst b/doc/installation_instructions/basic.rst index 382cad9495..d4dda3b688 100644 --- a/doc/installation_instructions/basic.rst +++ b/doc/installation_instructions/basic.rst @@ -4,9 +4,9 @@ Basic Installation ****************** -The following instructions serve as a guide for installing both SmartSim and -SmartRedis. SmartSim, despite being a Python-library, has a second build -step for Redis and RedisAI. Please follow these instructions carefully. +The following instructions guide you through installing SmartSim and SmartRedis. +SmartSim, despite being a Python library, has a second build step for Redis and +RedisAI. Please follow these instructions carefully. .. note:: @@ -32,26 +32,29 @@ The base prerequisites to install SmartSim and SmartRedis wtih CPU-only support .. note:: - GCC is recommended to build the backends for SmartSim. CUDA 11.8 requires GCC - 9 or 11, CUDA 12 requires GCC 11 or higher. SmartRedis can be compiled with - GCC, Intel, Cray, and Nvidia compilers. + We suggest using GCC to build Redis, RedisAI, and the ML backends. For specific + version requirements see the :ref:`Requirements ` section. + + SmartRedis can be compiled with GCC, Intel, Cray, and Nvidia compilers. ML Library Support ================== -We currently support both Nvidia and AMD GPUs when using RedisAI for GPU -inference. The support for these GPUs often depends on the version of the CUDA -or ROCm stack that is availble on your machine. In _most_ cases, the versions of -the ML frameworks are backwards compatible. If you encounter problems, please -contact us and we can build the backend libraries for your desired version of -CUDA and ROCm. +SmartSim supports Nvidia and AMD GPUs when using RedisAI for GPU +inference. GPU support often depends on the version of the CUDA or ROCm stack +that is available on your machine.
In *most* cases, the versions of the ML +frameworks are backwards compatible. If you encounter problems, please contact +us at (smartsim at hpe dot com) and we can build the backend libraries for your +desired version of CUDA and/or ROCm. CPU backends are provided for Apple (both Intel and Apple Silicon) and Linux (x86_64). Be sure to reference the table below to find which versions of the ML libraries are supported for your particular platform. Additionally, see :ref:`Platform Installation Guide ` for helpful information regarding -for specific systems. +specific systems. + +.. _requirements: Linux ----- @@ -62,7 +65,7 @@ Linux Additional requirements: - * GCC <= 11 + * GCC <= 11 (except 10) * CUDA Toolkit 11.7 or 11.8 * cuDNN 8.9 @@ -84,6 +87,7 @@ Linux Additional requirements: + * GCC >= 11 * CUDA Toolkit 12 * cuDNN 8.9 diff --git a/doc/installation_instructions/platform.rst b/doc/installation_instructions/platform.rst index a8f5cea350..c2aca958fc 100644 --- a/doc/installation_instructions/platform.rst +++ b/doc/installation_instructions/platform.rst @@ -3,12 +3,13 @@ Platform Install Guide ====================== -HPC platforms have specific modules that users can often use so they do not -need to retrieve all of the build dependencies themselves. Some machines -have specific environment variables and/or configuration settings that need -to be set for optimal performance. The below machines have vetted -instructions, please feel free to contribute instructions for your own -machine. + +HPC platforms often provide modules that enable users to avoid retrieving all +build dependencies themselves. Additionally, some machines require environment +variables and/or configuration settings to be set for optimal +performance. The machines below have vetted instructions. Please feel free to +contribute instructions for your own platform (see :ref:`Contributing Guide +`). .. include:: platform/frontier.rst ..
include:: platform/perlmutter.rst diff --git a/doc/installation_instructions/troubleshooting/cuda-dependencies.rst b/doc/installation_instructions/troubleshooting/cuda-dependencies.rst index 3132056dea..cf6dcacc0f 100644 --- a/doc/installation_instructions/troubleshooting/cuda-dependencies.rst +++ b/doc/installation_instructions/troubleshooting/cuda-dependencies.rst @@ -10,22 +10,22 @@ and install these dependencies and configure your build environment. .. note:: - At runtime, the environment in which the Orchestrator is launched must have - the cuDNN and CUDA Toolkit libraries findable by the link loader (e.g. - available in the ``LD_LIBRARY_PATH`` environment variable). + The Orchestrator must be launched in an environment with the cuDNN and CUDA + Toolkit libraries findable by the link loader (e.g. available in the + ``LD_LIBRARY_PATH`` environment variable). Download and install ^^^^^^^^^^^^^^^^^^^^ **Step 1:** Find a location which is globally accessible and has sufficient -storage space (about 12GB) and set an environment variable +storage space (about 12GB) and set the following environment variables: .. code-block:: bash export CUDA_TOOLKIT_INSTALL_PATH=/path/to/install/location/cudatoolkit export CUDNN_INSTALL_PATH=/path/to/install/location/cudnn -**Step 2:** Download cudatoolkit and install it +**Step 2:** Download the CUDA Toolkit and install it: .. tabs:: @@ -43,12 +43,12 @@ storage space (about 12GB) and set an environment variable wget https://developer.download.nvidia.com/compute/cuda/12.5.0/local_installers/cuda_12.5.0_555.42.02_linux.run sh ./cuda_12.5.0_555.42.02_linux.run --toolkit --silent --toolkitpath=$CUDA_TOOLKIT_INSTALL_PATH -**Step 3:** Download cuDNN +**Step 3:** Download cuDNN: For cuDNN, follow `Nvidia's instructions `_ for downloading cuDNN version 8.9 for either CUDA-11 or CUDA-12. -**Step 4:** Untar the cuDNN archive +**Step 4:** Untar the cuDNN archive: ..
tabs:: @@ -87,8 +87,8 @@ Option 2: Setup Modulefiles ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Alternatively, these environment variables can be setup by using environment -modules instead. This can be especially useful when the CUDA dependencies are -intended to be shared across users. +modules. This is useful when the CUDA dependencies are intended to be shared +across users. **Step 1:** Download these two modulefiles to a directory of your choosing diff --git a/doc/installation_instructions/troubleshooting/custom_backends.rst b/doc/installation_instructions/troubleshooting/custom_backends.rst index c745dbd5ec..98031cf5e8 100644 --- a/doc/installation_instructions/troubleshooting/custom_backends.rst +++ b/doc/installation_instructions/troubleshooting/custom_backends.rst @@ -3,14 +3,15 @@ Custom ML backends The ML backends (Torch, ONNX Runtime, and Tensorflow) and their associated python packages have different versions and indices that can be supported based -on the intended device (CPU, ROCM, CUDA-11, or CUDA-12). The officially +on the intended device (CPU, ROCm, CUDA-11, or CUDA-12). The officially supported backends are stored in JSON files within the ``smartsim/_core/_install/configs/mlpackages`` directory. -If you need to define a different version of the backend and/or the packages, we -recommend that you copy one of the JSON files (for example the one at the end of -this section) that SmartSim ships with, modify as needed, and then use ``smart -build --config-dir`` to specify the path to your custom configuration(s). +To customize the version of a backend and/or package, we recommend that you use +a configuration shipped with SmartSim as a template (for example the one at the +end of this section). Copy the file and update as needed. Afterwards, use +``smart build --config-dir`` to tell the build process to use custom +configuration(s). The following table describes the main fields needed to define a machine learning backend used by RedisAI. 
diff --git a/doc/installation_instructions/troubleshooting/offline.rst b/doc/installation_instructions/troubleshooting/offline.rst index 6fef0fc73f..87b19f1c3d 100644 --- a/doc/installation_instructions/troubleshooting/offline.rst +++ b/doc/installation_instructions/troubleshooting/offline.rst @@ -1,9 +1,9 @@ Airgapped Systems ----------------- -SmartSim implictly assumes that dependencies can be retrieved via the Internet. -The ``smart build`` step can be bypassed by transferring the build artifacts -from a different machine. +SmartSim assumes that dependencies can be retrieved via the Internet. The +``smart build`` step can be bypassed by transferring the build artifacts from a +different machine. .. warning:: @@ -13,10 +13,12 @@ from a different machine. `_). -The easiest way to accomplish this assumes that you have a machine that can be -connected to the internet and has built SmartSim (referred to as Machine A). -This machine should have a similar compilation and build environment as the -target machine (referred to as Machine B) to ensure compatibility. +The easiest way to accomplish this assumes that you have the following: +- A source machine connected to the Internet with SmartSim built (referred to as Machine A). +- A target machine not connected to the Internet (referred to as Machine B). + +.. warning:: + The build and compilation environments of Machine A and B must be compatible. **Step 1:** Note the path to SmartSim's ``core`` directory on Machine A diff --git a/doc/installation_instructions/troubleshooting/troubleshooting.rst b/doc/installation_instructions/troubleshooting/troubleshooting.rst index 63878e8a35..c9490637fb 100644 --- a/doc/installation_instructions/troubleshooting/troubleshooting.rst +++ b/doc/installation_instructions/troubleshooting/troubleshooting.rst @@ -3,9 +3,9 @@ Installation Troubleshooting ============================ -SmartSim has been installed on a variety of systems and our users often have -different build environments and toolchains.
The following two sections detail -some common situations and how to setup and modify your build environment: +SmartSim has been installed on a variety of systems with different build +environments and toolchains. The following sections detail some common +situations and how to configure your build environment: .. include:: generic.rst