@crcrpar crcrpar commented Dec 15, 2021

As per the title, this PR speeds up the apex build.

The main change is:

def append_nvcc_threads(nvcc_extra_args):
    return nvcc_extra_args + ["--threads", "4"]

With this, in my local environment, the apex build time improved from 15m45s to 10m49s.
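
For illustration only, here is a minimal sketch of a guarded variant of the helper. It rests on assumptions that are not part of this PR: the function name `append_nvcc_threads_guarded`, the `cuda_version` tuple argument, and the `NVCC_THREADS` environment override are all hypothetical. The one outside fact it relies on is that nvcc accepts `--threads` starting with CUDA 11.2, so older toolkits should get the arguments back unchanged.

```python
import os


def append_nvcc_threads_guarded(nvcc_extra_args, cuda_version, default_threads=4):
    """Return a new list of nvcc args, adding --threads only when supported.

    cuda_version is a (major, minor) tuple supplied by the caller; how it is
    detected is out of scope here (hypothetical parameter, not from the PR).
    """
    args = list(nvcc_extra_args)  # copy so the caller's list is not mutated
    # nvcc understands --threads from CUDA 11.2 onward.
    if tuple(cuda_version) < (11, 2):
        return args
    # NVCC_THREADS is a hypothetical override for machines with fewer or more cores.
    threads = os.environ.get("NVCC_THREADS", str(default_threads))
    return args + ["--threads", threads]
```

A call such as `append_nvcc_threads_guarded(["-O3"], (11, 4))` would then be passed wherever the nvcc argument list is built, in the same way the original helper wraps `nvcc_extra_args`.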

Build command: CC=/usr/bin/gcc-9 CXX=/usr/bin/g++-9 TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6+PTX" CFLAGS="-g0" pip install -q --no-cache-dir --disable-pip-version-check --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--bnp" --global-option="--xentropy" --global-option="--deprecated_fused_adam" --global-option="--deprecated_fused_lamb" --global-option="--fast_multihead_attn" --global-option="--distributed_lamb" --global-option="--fast_layer_norm" --global-option="--transducer" --global-option="--distributed_adam" --global-option="--fmha" --global-option="--fast_bottleneck" .

cc @xwang233 @ptrblck

@crcrpar crcrpar marked this pull request as ready for review December 15, 2021 06:35
@crcrpar crcrpar merged commit f63dac8 into NVIDIA:master Dec 15, 2021
@crcrpar crcrpar deleted the nvcc-threads branch December 15, 2021 07:09