
@will-cromar
Contributor

@will-cromar will-cromar commented Feb 26, 2024

Also bump ecosystem packages (torchtext, torchvision, torchaudio) to latest versions

@will-cromar
Contributor Author

cc @djherbis

@djherbis djherbis merged commit 3434de7 into Kaggle:main Feb 26, 2024
@djherbis
Contributor

Thanks!

@will-cromar
Contributor Author

I saw that the Jenkins workflow failed but can't see the reason. Feel free to revert this PR if you need to.

I experimented in my own notebook and ran into this error:

024-02-26 22:23:13.206001: I external/xla/xla/pjrt/pjrt_api.cc:146] The PJRT plugin has PJRT API version 0.32. The framework PJRT API version is 0.40.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1708986193.206053     185 pjrt_computation_client.cc:158] Non-OK-status: tpu_status status: INVALID_ARGUMENT: Mismatched PJRT plugin PJRT API version (0.32) and framework PJRT API version 0.40).

It looks like it's picking up the wrong libtpu somewhere, even though the bundled libtpu exists at /usr/local/lib/python3.10/site-packages/torch_xla/lib/libtpu.so

@will-cromar
Contributor Author

It looks like TPU_LIBRARY_PATH is being overridden to point at the libtpu package.

In the 2.2 release, we intentionally don't set that variable, but we do honor it as an override if it is already set: https://github.com/pytorch/xla/blob/v2.2.0/torch_xla/__init__.py#L98-L125
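A minimal Python sketch of the precedence described above, assuming that an explicitly set TPU_LIBRARY_PATH wins over the library bundled with torch_xla. `resolve_libtpu_path` is a hypothetical helper for illustration, not torch_xla's actual code (the linked `__init__.py` is the authority), and the override path below is made up. The possible workaround of clearing the variable before importing torch_xla is likewise an assumption based on this precedence.

```python
import os
from typing import Mapping, Optional

# Bundled library path observed in the notebook above.
BUNDLED_LIBTPU = "/usr/local/lib/python3.10/site-packages/torch_xla/lib/libtpu.so"


def resolve_libtpu_path(
    env: Optional[Mapping[str, str]] = None,
    bundled: str = BUNDLED_LIBTPU,
) -> str:
    """Hypothetical helper: return the libtpu path that would be loaded.

    Assumes an explicitly set TPU_LIBRARY_PATH overrides the bundled
    library, as described in the comment above.
    """
    env = os.environ if env is None else env
    return env.get("TPU_LIBRARY_PATH", bundled)


if __name__ == "__main__":
    # A stale override (hypothetical path) takes precedence over the bundled library.
    stale = {"TPU_LIBRARY_PATH": "/path/to/libtpu-package/libtpu.so"}
    print(resolve_libtpu_path(stale))
    # Possible workaround (assumption): remove the override before `import torch_xla`
    # so the bundled libtpu.so is used instead.
    os.environ.pop("TPU_LIBRARY_PATH", None)
    print(resolve_libtpu_path())
```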

@djherbis
Contributor

Looks like this is the actual cause of the failure:

#13 90.98 ERROR: Cannot install torch==2.2.0 and torchvision==0.17.1 because these package versions have conflicting dependencies.
#13 90.98
#13 90.98 The conflict is caused by:
#13 90.98     The user requested torch==2.2.0
#13 90.98     torchvision 0.17.1 depends on torch==2.2.1

@will-cromar Can we use 2.2.1?

@djherbis
Contributor

Trying 2.2.1: da1e2ec

@will-cromar
Contributor Author

will-cromar commented Feb 27, 2024

2.2.1 won't work because torch_xla hasn't published a corresponding 2.2.1 patch release. Instead, keep 2.2.0 in the config and change torch==${TORCH_VERSION} to torch~=${TORCH_VERSION}. The compatible-release operator (~=) lets pip install the latest torch patch release in the 2.2 series.
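To illustrate why the `~=` change resolves the conflict, here is a simplified Python reimplementation of the PEP 440 compatible-release operator. It handles plain numeric versions only and ignores pre-release, post-release, and local version tags; in practice pip (via the `packaging` library) does this matching, and `matches_compatible_release` is a hypothetical name.

```python
def _release(version: str) -> tuple[int, ...]:
    """Parse a plain numeric version such as '2.2.1' into (2, 2, 1)."""
    return tuple(int(part) for part in version.split("."))


def matches_compatible_release(candidate: str, spec: str) -> bool:
    """Simplified `~=spec` check for plain numeric versions only.

    Per PEP 440, `~=2.2.0` is equivalent to `>=2.2.0, ==2.2.*`.
    """
    spec_parts = _release(spec)
    if len(spec_parts) < 2:
        raise ValueError("~= requires at least two release segments")
    cand = _release(candidate)
    # Pad with zeros so that '2.2' compares equal to '2.2.0'.
    width = max(len(cand), len(spec_parts))
    cand_padded = cand + (0,) * (width - len(cand))
    spec_padded = spec_parts + (0,) * (width - len(spec_parts))
    # Every segment except the last must match exactly (the '==2.2.*' part).
    prefix = spec_parts[:-1]
    return cand_padded >= spec_padded and cand_padded[: len(prefix)] == prefix


# torchvision 0.17.1 needs torch==2.2.1, which ~=2.2.0 allows but ==2.2.0 does not.
for version in ["2.2.0", "2.2.1", "2.3.0"]:
    print(version, matches_compatible_release(version, "2.2.0"))
```

With `~=2.2.0`, pip is free to choose 2.2.1 to satisfy torchvision 0.17.1, while still refusing to jump to 2.3.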

Edit: You could also add a separate TORCH_XLA_VERSION so torch and torch_xla can be pinned independently. In general, we don't publish torch_xla patch releases on the same schedule as upstream torch.
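A sketch of the separate-variable approach in Python. The `CONFIG` dictionary and the `pip_requirements` helper are hypothetical, and the real build config's format may differ. It pins torch with `~=` so pip can pick up upstream patch releases, and pins torch_xla exactly, since its patch releases may lag behind. The check that both versions share a major.minor series is an added assumption, not something stated in the thread.

```python
# Hypothetical config values; the real build config's format may differ.
CONFIG = {"TORCH_VERSION": "2.2.0", "TORCH_XLA_VERSION": "2.2.0"}


def minor_series(version: str) -> str:
    """Return the 'major.minor' prefix, e.g. '2.2' for '2.2.1'."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}"


def pip_requirements(config: dict[str, str]) -> list[str]:
    """Build requirement specifiers with torch and torch_xla pinned independently."""
    torch_version = config["TORCH_VERSION"]
    # Fall back to the torch version if no separate torch_xla pin is configured.
    xla_version = config.get("TORCH_XLA_VERSION", torch_version)
    # Assumption: torch_xla releases track the matching torch minor series.
    if minor_series(torch_version) != minor_series(xla_version):
        raise ValueError(
            f"torch {torch_version} and torch_xla {xla_version} "
            "are in different minor series"
        )
    return [f"torch~={torch_version}", f"torch_xla=={xla_version}"]


print(" ".join(pip_requirements(CONFIG)))
```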

@djherbis
Contributor

Thanks, trying that: #1365

@djherbis
Copy link
Contributor

Nice, it builds at least :)
