| 1 | +# oneCCL Getting Started samples |
| 2 | +The oneCCL samples are implemented in C++, C, and DPC++ for CPU and GPU. |
| 3 | +The allreduce collective operation samples show how to compile oneCCL code with various oneCCL configurations in the Intel oneAPI environment. |
| 4 | + |
| 5 | +| Optimized for | Description |
| 6 | +|:--- |:--- |
| 7 | +| OS | Linux Ubuntu 18.04 |
| 8 | +| Hardware | Kaby Lake with GEN9 or newer |
| 9 | +| Software | Intel oneAPI Collective Communications Library (oneCCL), Intel oneAPI DPC++/C++ Compiler, Intel oneAPI DPC++ Library (oneDPL), GNU Compiler |
| 10 | +| What you will learn | Basic oneCCL programming model for both Intel CPUs and GPUs |
| 11 | +| Time to complete | 15 minutes |
| 12 | + |
| 13 | +## List of Samples |
| 14 | +| C++ API | C API | Collective Operation | |
| 15 | +| ------ | ------ | ------ | |
| 16 | +| sycl_allreduce_cpp_test.cpp | sycl_allreduce_test.cpp |[Allreduce](https://intel.github.io/oneccl/spec/communication_primitives.html#allreduce) | |
| 17 | +| cpu_allreduce_cpp_test.cpp | cpu_allreduce_test.cpp/cpu_allreduce_bfp16.c |[Allreduce](https://intel.github.io/oneccl/spec/communication_primitives.html#allreduce) | |
| 18 | +> Notice: Please use Intel oneAPI DevCloud as the environment for the Jupyter notebook samples. \ |
| 19 | +Users can refer to [DevCloud Getting Started](https://devcloud.intel.com/oneapi/get-started/) for instructions on using DevCloud. \ |
| 20 | +Users can open JupyterLab from DevCloud via "One-click Login in", and download the samples via "git clone" or the "oneapi-cli" tool. \ |
| 21 | +Once in JupyterLab with the downloaded Jupyter notebook samples, users can follow the steps without any further installation. |
| 22 | + |
| 23 | +## Purpose |
| 24 | +The samples implement the allreduce collective operation with oneCCL APIs. |
| 25 | +With these samples, users will learn how to compile the code with various oneCCL configurations in the Intel oneAPI environment. |
| 26 | + |
| 27 | +## License |
| 28 | +These code samples are licensed under the MIT license. |
| 29 | + |
| 30 | +## Prerequisites |
| 31 | + |
| 32 | +### CPU |
| 33 | + |
| 34 | +----- |
| 35 | + |
| 36 | +The samples below require the following components, which are part of the [Intel oneAPI DL Framework Developer Toolkit (DLFD Kit) |
| 37 | +](https://software.intel.com/en-us/oneapi/dldev-kit) |
| 38 | +* Intel oneAPI Collective Communications Library (oneCCL) |
| 39 | + |
| 40 | +You can refer to this page [oneAPI](https://software.intel.com/en-us/oneapi) for toolkit installation. |
| 41 | + |
| 42 | + |
| 43 | +### GPU and CPU |
| 44 | + |
| 45 | +----- |
| 46 | + |
| 47 | +The samples below require the following components, which are part of the [Intel oneAPI Base Toolkit](https://software.intel.com/en-us/oneapi/oneapi-kit) |
| 48 | +* Intel oneAPI Collective Communications Library (oneCCL) |
| 49 | +* Intel oneAPI DPC++/C++ Compiler |
| 50 | +* Intel oneAPI DPC++ Library (oneDPL) |
| 51 | + |
| 52 | +The samples also require an OpenCL driver. Please refer to [System Requirements](https://software.intel.com/en-us/articles/intel-oneapi-base-toolkit-system-requirements) for OpenCL driver installation. |
| 53 | + |
| 54 | + |
| 55 | +You can refer to this page [oneAPI](https://software.intel.com/en-us/oneapi) for toolkit installation. |
| 56 | + |
| 57 | + |
| 58 | + |
| 59 | + |
| 60 | +## Building the samples for CPU and GPU |
| 61 | + |
| 62 | +### on a Linux* System |
| 63 | + |
| 64 | +#### CPU only: |
| 65 | + |
| 66 | +- Build the samples with GCC for CPU only. \ |
| 67 | + Replace ${ONEAPI_ROOT} with your installation path. \ |
| 68 | + ex: /opt/intel/oneapi \ |
| 69 | + You don't need to replace {DPCPP_CMPLR_ROOT}. |
| 70 | + ``` |
| 71 | + source ${ONEAPI_ROOT}/setvars.sh --ccl-configuration=cpu_icc |
| 72 | + |
| 73 | + cd oneapi-toolkit/oneCCL/oneCCL_Getting_Started |
| 74 | + mkdir build |
| 75 | + cd build |
| 76 | + cmake .. -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ |
| 77 | + make cpu_allreduce_cpp_test |
| 78 | + ``` |
| 79 | +> NOTE: The source file "cpu_allreduce_cpp_test.cpp" is copied from ${INTEL_ONEAPI_INSTALL_FOLDER}/ccl/latest/examples/cpu to the build/src/cpu folder. |
| 80 | +Users can rebuild cpu_allreduce_cpp_test.cpp by running "make cpu_allreduce_cpp_test" in the build folder. |
| 81 | + |
| 82 | +#### GPU and CPU: |
| 83 | + |
| 84 | +- Build the samples with SYCL for GPU and CPU. \ |
| 85 | + Replace ${ONEAPI_ROOT} with your installation path. \ |
| 86 | + ex: /opt/intel/oneapi \ |
| 87 | + You don't need to replace {DPCPP_CMPLR_ROOT}. |
| 88 | + ``` |
| 89 | + source ${ONEAPI_ROOT}/setvars.sh --ccl-configuration=cpu_gpu_dpcpp |
| 90 | + |
| 91 | + cd oneapi-toolkit/oneCCL/oneCCL_Getting_Started |
| 92 | + mkdir build |
| 93 | + cd build |
| 94 | + cmake .. -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=dpcpp |
| 95 | + make sycl_allreduce_cpp_test |
| 96 | + ``` |
| 97 | +> NOTE: The source file "sycl_allreduce_cpp_test.cpp" is copied from ${INTEL_ONEAPI_INSTALL_FOLDER}/ccl/latest/examples/sycl to the build/src/sycl folder. |
| 98 | +Users can rebuild sycl_allreduce_cpp_test.cpp by running "make sycl_allreduce_cpp_test" in the build folder. |
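
The two build variants above differ only in the `--ccl-configuration` value, the compilers passed to CMake, and the make target. A minimal sketch that collects those differences in one place is shown below. The function name `ccl_build_settings` is hypothetical and not part of the samples; the values it prints are taken from the commands above.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with the samples): print the settings used by
# the two build variants above, in the order
#   <ccl-configuration> <C compiler> <C++ compiler> <make target>
ccl_build_settings() {
    case "$1" in
        cpu) echo "cpu_icc gcc g++ cpu_allreduce_cpp_test" ;;
        gpu) echo "cpu_gpu_dpcpp clang dpcpp sycl_allreduce_cpp_test" ;;
        *)   echo "usage: ccl_build_settings cpu|gpu" >&2; return 1 ;;
    esac
}

# Usage sketch (needs a oneAPI installation, so it is left commented out):
#   set -- $(ccl_build_settings gpu)
#   source "${ONEAPI_ROOT}/setvars.sh" --ccl-configuration="$1"
#   cmake .. -DCMAKE_C_COMPILER="$2" -DCMAKE_CXX_COMPILER="$3"
#   make "$4"
```

Keeping the four values together makes it harder to mix, for example, the `cpu_icc` configuration with the `dpcpp` compiler by accident.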
| 99 | + |
| 100 | +### Include Files |
| 101 | +The include folder is located at ${CCL_ROOT}/include on your development system. |
| 102 | + |
| 103 | +## Running the Sample |
| 104 | + |
| 105 | +### on a Linux* System |
| 106 | + |
| 107 | +#### CPU only: |
| 108 | +- Run the program. \ |
| 109 | + This example uses cpu_allreduce_cpp_test. \ |
| 110 | + The same steps apply to all other sample binaries. \ |
| 111 | + Replace ${NUMBER_OF_PROCESSES} with the number of processes to launch. |
| 112 | + |
| 113 | + ``` |
| 114 | + mpirun -n ${NUMBER_OF_PROCESSES} ./out/cpu/cpu_allreduce_cpp_test |
| 115 | + ``` |
| 116 | + |
| 117 | + ex: |
| 118 | + ``` |
| 119 | + mpirun -n 2 ./out/cpu/cpu_allreduce_cpp_test |
| 120 | + ``` |
| 121 | + |
| 122 | + |
| 123 | +#### GPU and CPU: |
| 124 | +- Run the program. \ |
| 125 | + This example uses sycl_allreduce_cpp_test. \ |
| 126 | + The same steps apply to all other sample binaries. \ |
| 127 | + Replace ${NUMBER_OF_PROCESSES} with the number of processes to launch. |
| 128 | + |
| 129 | + ``` |
| 130 | + mpirun -n ${NUMBER_OF_PROCESSES} ./out/sycl/sycl_allreduce_cpp_test gpu|cpu|host|default |
| 131 | + ``` |
| 132 | + |
| 133 | + ex: run on GPU |
| 134 | + ``` |
| 135 | + mpirun -n 2 ./out/sycl/sycl_allreduce_cpp_test gpu |
| 136 | + ``` |
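
The SYCL sample takes a process count and one of the device arguments `gpu`, `cpu`, `host`, or `default`, as shown above. The following sketch checks both values before assembling the `mpirun` command line. The function name `ccl_run_cmd` is hypothetical; it only prints the command and does not launch anything.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with the samples): validate the arguments and
# print the mpirun command line used in the examples above.
ccl_run_cmd() {
    nprocs=$1
    device=$2
    # The process count must consist of digits only and be greater than zero.
    case "$nprocs" in
        ''|*[!0-9]*) echo "process count must be a positive integer" >&2; return 1 ;;
    esac
    if [ "$nprocs" -le 0 ]; then
        echo "process count must be a positive integer" >&2
        return 1
    fi
    # The device argument must be one of the values listed above.
    case "$device" in
        gpu|cpu|host|default) ;;
        *) echo "device must be one of: gpu cpu host default" >&2; return 1 ;;
    esac
    echo "mpirun -n $nprocs ./out/sycl/sycl_allreduce_cpp_test $device"
}

# Print the command for two processes on the GPU.
ccl_run_cmd 2 gpu
```

To run the printed command, execute it from the build folder after sourcing `setvars.sh`.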
| 137 | + |
| 138 | + |
| 139 | +### Example of Output |
| 140 | + |
| 141 | +#### on Linux |
| 142 | +- Run the program on CPU or GPU by following the [Running the Sample](#running-the-sample) section. |
| 143 | +- CPU Results |
| 144 | + |
| 145 | + ``` |
| 146 | + Provided device type: cpu |
| 147 | + Running on Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz |
| 148 | + Example passes |
| 149 | + ``` |
| 150 | + Note that the device name may vary depending on your environment. |
| 151 | + |
| 152 | + |
| 153 | +- GPU Results |
| 154 | + ``` |
| 155 | + Provided device type: gpu |
| 156 | + Running on Intel(R) Gen9 HD Graphics NEO |
| 157 | + Example passes |
| 158 | + ``` |
| 159 | + Note that the device name may vary depending on your environment. |
| 160 | + |
| 161 | +- Enable oneCCL verbose logging |
| 162 | + |
| 163 | + oneCCL supports several log levels, listed in the table below. |
| 164 | + |
| 165 | + | CCL_LOG_LEVEL | value |
| 166 | + | :------ | :------ |
| 167 | + | ERROR | 0 |
| 168 | + | INFO | 1 |
| 169 | + | DEBUG | 2 |
| 170 | + | TRACE | 3 |
| 171 | + |
| 172 | + |
| 173 | + Users can enable oneCCL verbose logging with the command below to see more |
| 174 | + runtime information from oneCCL. |
| 175 | + ``` |
| 176 | + export CCL_LOG_LEVEL=1 |
| 177 | + ``` |
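
The numeric values in the log level table above are easy to mix up. The sketch below maps a level name to its value before exporting `CCL_LOG_LEVEL`. The function name `ccl_log_level_value` is hypothetical, and the mapping simply mirrors the table in this README.

```shell
#!/bin/bash
# Hypothetical helper (not part of oneCCL): translate a log level name from the
# table above into the numeric value expected by CCL_LOG_LEVEL.
ccl_log_level_value() {
    case "$1" in
        ERROR) echo 0 ;;
        INFO)  echo 1 ;;
        DEBUG) echo 2 ;;
        TRACE) echo 3 ;;
        *)     echo "unknown log level: $1" >&2; return 1 ;;
    esac
}

# Example: select DEBUG output for subsequent oneCCL runs in this shell.
export CCL_LOG_LEVEL="$(ccl_log_level_value DEBUG)"
```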
| 178 | + |