🐛 Describe the bug
I could not reproduce the scenario with a simple standalone test script. It is a bug found in vLLM on ROCm when running `meta-llama/Llama-4-Scout-17B-16E-Instruct`. The behaviour only occurs in HIPGraph mode + `torch.compile`. In EAGER mode + `torch.compile`, the `contiguous()` API and `stride()` API are consistent.
Under HIPGraph, a tensor `A` can end up with the following properties:

- `.shape`: `[1024, 1]`
- `.is_contiguous()`: `True`
- `.stride()`: `(1, 1024)`
- `.is_contiguous(memory_format=torch.channels_last)`: `False`
- `.is_contiguous(memory_format=torch.contiguous_format)`: `True`

The expected behaviour is that `.stride(1) == 1` is `True` whenever `.is_contiguous()` is `True`.
`A = A.contiguous()` does not fix the issue; `.stride()` is still `(1, 1024)`.
The workaround right now is `A = A.view(-1).reshape(A.shape)`.
On CUDA, the stride returns `(1, 1)`, but on ROCm it returns `(1, 1024)`.
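For reference, the stride and contiguity semantics involved can be sketched on CPU with a plain transpose. This is only an illustration of why `.contiguous()` is a no-op here and why the `view(-1).reshape(...)` workaround normalizes the stride; it is not a reproduction of the HIPGraph bug itself, which I could not isolate:

```python
import torch

# A [1024, 1] tensor with stride (1, 1024), produced here via transpose
# (not via HIPGraph -- this only illustrates the stride semantics).
a = torch.zeros(1, 1024).t()

print(a.shape)    # torch.Size([1024, 1])
print(a.stride())  # (1, 1024)

# PyTorch ignores the stride of size-1 dimensions when computing contiguity,
# so this tensor is reported as contiguous even though stride(1) != 1.
print(a.is_contiguous())  # True

# Because is_contiguous() is already True, .contiguous() returns the tensor
# unchanged, and the unusual stride survives:
print(a.contiguous().stride())  # (1, 1024)

# The workaround: flattening and reshaping forces canonical strides.
fixed = a.view(-1).reshape(a.shape)
print(fixed.stride())  # (1, 1)
```

This suggests the ROCm-specific part of the bug is how the tensor acquires the `(1, 1024)` stride under HIPGraph in the first place, since `.contiguous()` cannot repair it afterwards by design.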
Versions
Collecting environment information...
PyTorch version: 2.7.0a0+git295f2ed
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 6.3.42133-1b9c17779
OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 18.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-6.3.1 24491 1e0fda770a2079fbd71e4b70974d74f62fd3af10)
CMake version: version 3.31.6
Libc version: glibc-2.35
Python version: 3.12.9 (main, Feb 5 2025, 08:49:00) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-116-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: AMD Instinct MI300X (gfx942:sramecc+:xnack-)
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: 6.3.42133
MIOpen runtime version: 3.3.0
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.7.0a0+git295f2ed
[pip3] torchvision==0.21.0+7af6987
[pip3] triton==3.2.0+gite5be006a