@msaroufim
Member

@msaroufim msaroufim commented Oct 17, 2024

Mostly copied from the torchvision workflows. The upload is working fine, but I still need to test it for real and will validate on my PC.

It seems to be working fine, although I'm having issues building custom CUDA extensions on Windows, and only for CUDA 11.8:

  • FAILED: C:/actions-runner/_work/ao/ao/pytorch/ao/build/temp.win-amd64-cpython-39/Release/torchao/csrc/cuda/sparse_marlin/marlin_kernel_nm.obj
  • FAILED: C:/actions-runner/_work/ao/ao/pytorch/ao/build/temp.win-amd64-cpython-39/Release/torchao/csrc/cuda/tensor_core_tiled_layout/tensor_core_tiled_layout.obj
  • FAILED: C:/actions-runner/_work/ao/ao/pytorch/ao/build/temp.win-amd64-cpython-39/Release/torchao/csrc/cuda/fp6_llm/fp6_linear.obj

So I tried skipping the CUDA 11.8 builds instead by adding the snippet below. Unfortunately, the Nova builds don't let me configure a specific CUDA version, and I suspect the snippet will just make the build succeed while the upload fails.

REM Check CUDA version and skip build if it's 11.8
if "%CUDA_VERSION%" == "11.8" (
    echo Skipping build for CUDA 11.8
    exit /b 0
)

REM Set TORCH_CUDA_ARCH_LIST
set TORCH_CUDA_ARCH_LIST=8.0;8.6
if "%CU_VERSION%" == "cu124" set TORCH_CUDA_ARCH_LIST=%TORCH_CUDA_ARCH_LIST%;9.0
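
The concern about the early exit can be illustrated with a small shell sketch. This is not the Nova workflow itself: `build_step`, `upload_step`, and the placeholder wheel name are hypothetical stand-ins, and the sketch assumes only that the upload stage looks for a wheel that the build stage was supposed to produce.

```shell
#!/bin/sh
# Hypothetical stand-ins for a build stage and an upload stage.
# Assumption: the upload stage expects at least one *.whl in the output directory.

build_step() {
    # $1 = CUDA version string, $2 = output directory
    mkdir -p "$2"
    if [ "$1" = "11.8" ]; then
        echo "Skipping build for CUDA 11.8"
        return 0    # reports success, but no wheel is produced
    fi
    : > "$2/example-0.0.0-py3-none-any.whl"    # placeholder artifact
}

upload_step() {
    # $1 = output directory
    for wheel in "$1"/*.whl; do
        # An unmatched glob stays literal, so check that the file exists.
        if [ -e "$wheel" ]; then
            echo "Uploading $wheel"
            return 0
        fi
    done
    echo "No wheel found in $1" >&2
    return 1
}
```

With these definitions, `build_step 11.8 out` returns 0 but leaves `out` empty, so a following `upload_step out` returns 1, which matches the "build succeeds, upload fails" outcome suspected above.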

EDIT: Well, considering that CUDA extensions aren't built anywhere on Windows and Triton's Windows support is still experimental, it seems fine as a stopgap to publish only Windows CPU binaries for now.

@pytorch-bot

pytorch-bot bot commented Oct 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1101

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9d6c5d3 with merge base 6b52996:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Oct 17, 2024
@msaroufim msaroufim changed the title Create build_wheels_windows.yml [WIP] Create build_wheels_windows.yml Oct 17, 2024
@msaroufim msaroufim requested a review from atalman October 17, 2024 04:23
@msaroufim msaroufim mentioned this pull request Oct 17, 2024
17 tasks
@msaroufim msaroufim requested a review from cpuhrsch October 17, 2024 04:46
@gau-nernst
Collaborator

The undefined symbols are probably caused by building for earlier compute capabilities. Something like this would fix it.

# Set ARCH list so that we can build fp16 with SM75+, the logic is copied from
# pytorch/builder
TORCH_CUDA_ARCH_LIST="8.0;8.6"
if [[ ${CU_VERSION:-} == "cu124" ]]; then
  TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST};9.0"
fi

The CUDA sources are actually not being compiled in the 12.1 and 12.4 jobs, which is kind of strange.
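
The arch-list logic suggested in this comment can also be wrapped in a small function so that it is easy to check for each CUDA tag. This is only a sketch: `arch_list_for` is a hypothetical helper name, not part of torchao or pytorch/builder, and it assumes `CU_VERSION` holds tags of the form `cu124`.

```shell
#!/bin/sh
# Hypothetical helper (arch_list_for is an illustrative name, not a real script).
# It mirrors the branching in the comment: a base list of 8.0 and 8.6,
# with 9.0 appended only when the CUDA tag is cu124.
arch_list_for() {
    arch_list="8.0;8.6"
    if [ "${1:-}" = "cu124" ]; then
        arch_list="${arch_list};9.0"
    fi
    printf '%s\n' "$arch_list"
}

# Usage: export the result so that the extension build can read it.
TORCH_CUDA_ARCH_LIST="$(arch_list_for "${CU_VERSION:-}")"
export TORCH_CUDA_ARCH_LIST
echo "TORCH_CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST}"
```

Keeping the branching in one function means any other tag, or an unset `CU_VERSION`, falls back to the base list rather than failing.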

@msaroufim msaroufim merged commit 7aaf0ff into main Oct 17, 2024
22 checks passed
@msaroufim msaroufim deleted the msaroufim-patch-24 branch October 17, 2024 18:23
@msaroufim msaroufim changed the title [WIP] Create build_wheels_windows.yml Create build_wheels_windows.yml Oct 17, 2024
5 participants