Draft
Changes from all commits
194 commits
a2f1736
add sparse_marlin kernel to the build
Oct 17, 2024
f817edf
drop .h from conversion
Oct 17, 2024
c9bc1bc
cp_asyc4_pred_zfill() AMD implementation
Oct 17, 2024
16feff4
implement matching mem utility with amd GCN isa
Oct 18, 2024
0b21555
implement mma util with amd gcn isa
Oct 18, 2024
f23b194
enable rocm path
Oct 18, 2024
ecc3927
update copy from global to lds
lcskrishna Oct 22, 2024
a80730b
implement cvta_to_shared()
Oct 23, 2024
d2c7ce4
consolidate code with cvta_to_shared()
Oct 23, 2024
15974c7
Merge branch 'main' into rocm_sparse_marlin
petrex Jan 8, 2025
a4e8c30
lint
Jan 8, 2025
c678cb0
add GPU arch check for MI300x
Jan 9, 2025
08d1cfb
revert change in tensor_core_tile_layout.cu
Jan 9, 2025
b5b739b
Skip tests on fbcode
jainapurva Jan 9, 2025
982141b
Make it easer to isolate test cases (#1537)
drisspg Jan 10, 2025
cedadc7
Fix failing docs build in CI (#1542)
jainapurva Jan 10, 2025
9c2635b
torchao setup.py with cmake
metascroy Jan 10, 2025
79979ec
SAM2: Rerun batch size 1 experiments on latest nightly (#1543)
cpuhrsch Jan 10, 2025
24a78fe
Add run_tutorials github action and fix existing errors (#1546)
jerryzh168 Jan 10, 2025
6d6aa01
Add support for eager mode performance (#1539)
jerryzh168 Jan 11, 2025
1651ffa
Update run_tutorials.yml (#1550)
jerryzh168 Jan 11, 2025
ad61822
Remove temp build files from torchao (#1551)
metascroy Jan 11, 2025
f15ec15
Add convert path for quantize_ QAT API (#1540)
andrewor14 Jan 13, 2025
d57704c
Update QAT READMEs using new APIs (#1541)
andrewor14 Jan 13, 2025
12a58cf
Fix run_tutorials code (#1552)
jerryzh168 Jan 13, 2025
7b3caa6
Verify that submodules are checked out (#1536)
alexsamardzic Jan 13, 2025
9ea7d30
[cleanup][1/x] make hp_tensor_to_float8_dynamic only work with hp inp…
vkuzo Jan 13, 2025
2ec9bc1
[cleanup][2/x] split float8 mm by delayed vs dynamic (#1461)
vkuzo Jan 13, 2025
12396c6
[cleanup][3/x] unify dynamic input and grad_output casting (#1480)
vkuzo Jan 13, 2025
de5c6e1
Make sure tests are ran with pytest (#1538)
drisspg Jan 13, 2025
b3deb16
Fix torch.intx support in FakeQuantizeConfig (#1544)
andrewor14 Jan 14, 2025
0bc5b00
Clean up linear_int8_dynamic_activation_intx_weight_subclass
metascroy Jan 14, 2025
71c6231
SAM2 Modal script extensions (#1500)
cpuhrsch Jan 14, 2025
1c0ea5b
Fix float related autoquant options (#1562)
jerryzh168 Jan 15, 2025
11333ba
Update __init__.py to load experimental ops even if other C++ ops are…
metascroy Jan 15, 2025
e1cb44a
Bug Fix (#1559): sparsity instead of sparstiy (#1560)
jaewoosong Jan 15, 2025
b96196b
Merge branch 'main' into rocm_sparse_marlin
petrex Jan 15, 2025
aea9d81
lint
Jan 15, 2025
f90b29e
[float8nocompile] support option to not precompute fp8 tensor for bac…
danielvegamyhre Jan 16, 2025
5e59b51
[float8nocompile] add e2e fsdp test (#1523)
danielvegamyhre Jan 16, 2025
522f5b8
[float8nocompile] add triton kernel which does fp8 conversion to col …
danielvegamyhre Jan 16, 2025
74a15f1
Add a register_replacement to fix float8 delayed scaling kernel fusio…
y-sq Jan 16, 2025
eea4d25
Update version to 0.9.0 (#1568)
jainapurva Jan 16, 2025
f520c91
Update supported dtypes for fp8 (#1573)
jainapurva Jan 17, 2025
cf45336
Relax dtype requirements for int4 and float8 quants in autoquant (#1571)
jerryzh168 Jan 17, 2025
d96c6a7
Enable ROCM in CI (#999)
msaroufim Jan 17, 2025
a1c67b9
Skip Unit Tests for ROCm CI (#1563)
petrex Jan 17, 2025
69f3795
Delete unused QAT utils code (#1579)
andrewor14 Jan 17, 2025
9afaabb
Revert "Skip Unit Tests for ROCm CI" (#1580)
andrewor14 Jan 17, 2025
1240b19
Revert "Enable ROCM in CI" (#1583)
andrewor14 Jan 17, 2025
32d9b0b
Fix CI linux_job permissions (#1576)
jainapurva Jan 18, 2025
ea7910e
Refactor s8s4_linear_cutlass() (#1545)
alexsamardzic Jan 21, 2025
5d1444b
Sparsity docs update (#1590)
jainapurva Jan 21, 2025
166a357
Sparsity getting started docs (#1592)
jainapurva Jan 22, 2025
602ba86
gate sparsity tests by presence of cusparselt (#1602)
vkuzo Jan 23, 2025
d0e434c
Fix broken link on doc page (#1582)
andrewor14 Jan 23, 2025
e53edaa
pin nightlies to 20250122 (#1608)
vkuzo Jan 23, 2025
52280bb
[BE] Only run docs build in CI if docs have changed (#1589)
danielvegamyhre Jan 23, 2025
2d4c848
[float8nocompile] Add float8nocompile CI tests which only trigger on …
danielvegamyhre Jan 24, 2025
4ed93b9
[CPU] Fix registration of int4wo linear implementation on CPU (#1578)
Xia-Weiwen Jan 24, 2025
0fae693
Add H100 to Float8 CI for testing (#1575)
jainapurva Jan 24, 2025
4e4f4df
Add quick start guide for first time users (#1611)
andrewor14 Jan 24, 2025
70be245
Move fpx to tensor subclass (#1603)
jainapurva Jan 24, 2025
fb335e0
Revert "Move fpx to tensor subclass" (#1616)
jainapurva Jan 24, 2025
6c3bc53
Update api_ref_dtypes docs (#1610)
jainapurva Jan 24, 2025
860da26
Add module swap -> tensor subclass migration tutorial (#1596)
andrewor14 Jan 24, 2025
11440c2
mx cleanup [1/x]: unbreak mx_formats tests (#1569)
vkuzo Jan 24, 2025
6b472e5
mx cleanup [2/x]: refactor mx gemm (#1593)
vkuzo Jan 24, 2025
47f96f1
add separate quantization primitives for float8 (#1597)
danielvegamyhre Jan 25, 2025
09dd636
Prepare for -DPy_LIMITED_API flag in pytorch #145764 (#1627)
janeyx99 Jan 27, 2025
13bd59e
Update docs to refer to version.html (#1631)
jainapurva Jan 27, 2025
e151d6a
notify when CI job fails (#1547)
HDCharles Jan 28, 2025
abd41e5
Add torchao/experimental CI test (#1586)
metascroy Jan 28, 2025
7b0d2ce
Consolidate `ZeroPointDomain.NONE` & `None` zero point domains (#1556)
sanchitintel Jan 29, 2025
2aed684
Pass all args to pytest.main to propagate user options like -k (#1640)
janeyx99 Jan 29, 2025
2d8c8eb
only run docs CI jobs on PRs when docs have changed (#1612)
danielvegamyhre Jan 29, 2025
0c42823
Fix `.item()` issue in running parallel evaluation for BO mixed preci…
haodongucsb Jan 29, 2025
aa0b7ca
Split contributor guide into quantization overview (#1618)
andrewor14 Jan 29, 2025
c1f5872
Update api_ref_quantization docs (#1619)
andrewor14 Jan 29, 2025
b559c6d
[Experimental][Kleidi] Add GEMM operator tests (#1638)
digantdesai Jan 30, 2025
463a872
skip failing MX tests on cuda capability 10.0 (#1624)
vkuzo Jan 30, 2025
7815262
[Feat]: Add support for kleidiai quantization schemes (#1447)
nikhil-arm Jan 30, 2025
48fdd31
Ruff lint (#1646)
metascroy Jan 30, 2025
3eb18e7
float8 rowwise training: add FSDP workaround (#1629)
vkuzo Jan 31, 2025
122eb73
more stringent test for CPUOffloadOptimizer (#1650)
ngc92 Feb 1, 2025
6ffe236
Fix LR scheduler issue with CPU offload optimizer (#1649)
gau-nernst Feb 2, 2025
7e54629
Fix ruff and make sure pre-commit is at same version (#1658)
drisspg Feb 4, 2025
b2fb664
Add int8 dynamic activation + int8 weight only test to TensorParallel…
jainapurva Feb 4, 2025
1a4c8f9
Add CUTLASS-based W4A4 (#1515)
gau-nernst Feb 5, 2025
8afd10e
Fix compile issue for Marin qqq on sm<8.0 (#1651)
gau-nernst Feb 5, 2025
8d14f0e
SAM2: more export, small perf improvements (#1673)
cpuhrsch Feb 5, 2025
4df4d03
Moved CUTLASS pin to v3.7.0 (#1672)
alexsamardzic Feb 5, 2025
bc1530b
Q dq layout (#1642)
metascroy Feb 5, 2025
c6611be
Remove duplicate definitions of fill_defaults (#1674)
jainapurva Feb 6, 2025
867a91f
update notify in build_wheels_linux.yml (#1676)
HDCharles Feb 6, 2025
1d75c8f
Support mixed MX element dtype in `mx_mm` function and `MXLinear`. (#…
balancap Feb 6, 2025
753ba98
Test fix (#1678)
jainapurva Feb 6, 2025
d1e6c03
CI fix for linux wheels (#1679)
jainapurva Feb 6, 2025
cc6244c
Add boiler plate code to Tensor subclass (#1663)
jainapurva Feb 7, 2025
e7aa4ca
add a deprecation warning for float8 delayed and static scaling (#1681)
vkuzo Feb 7, 2025
c8eb8d3
Lint fixes for fbcode (#1682)
jainapurva Feb 7, 2025
4d1c774
SAM2: Modal experiments QoL improvements (#1683)
cpuhrsch Feb 9, 2025
bae41d1
mx: add ceil and RNE rounding modes to the cast from fp32 to e8m0 (#1…
vkuzo Feb 10, 2025
32a51ec
Support power of 2 scaling factors in float8 training and use e4m3 ev…
danielvegamyhre Feb 10, 2025
999b16d
Add third_party to exclude (#1692)
drisspg Feb 11, 2025
d99785c
Update float8nocompile readme (#1693)
danielvegamyhre Feb 11, 2025
39dd340
Change TORCH_LIBRARY to TORCH_LIBRARY_FRAGMENT (#1645)
metascroy Feb 12, 2025
682ffd5
Update to cutlass 3.8 (#1634)
drisspg Feb 12, 2025
aa51486
SAM2: Collect p90 latency statistics (#1703)
cpuhrsch Feb 12, 2025
d3306b2
Add mx_fp8_bf16 kernel (#1637)
drisspg Feb 12, 2025
dff29c0
Fix use_hqq for int4_weight_only quantize (#1707)
jainapurva Feb 13, 2025
52f4737
[bc-breaking] enable direct configuration in quantize_ (#1595)
vkuzo Feb 14, 2025
2e51872
config migration: float8* (#1694)
vkuzo Feb 14, 2025
6fe41c2
config migration: int* (#1696)
vkuzo Feb 14, 2025
413689d
config migration: fpx, gemlite, uintx (#1697)
vkuzo Feb 14, 2025
17b9ce3
unbreak float8 static quant tutorial (#1709)
vkuzo Feb 14, 2025
3fa8e44
migrate static quant tutorials to direct configuration (#1710)
vkuzo Feb 14, 2025
12e830b
update torchao READMEs with new configuration APIs (#1711)
vkuzo Feb 14, 2025
3227472
make quantize_.set_inductor_config None by default (#1716)
vkuzo Feb 14, 2025
c3bb80e
mx formats: create MXLinearConfig (#1688)
vkuzo Feb 14, 2025
40d01cd
MX: move block_size and elem_dtype into MXLinearConfig (#1689)
vkuzo Feb 14, 2025
8fc49fe
MX: hook up mxfp8 and mxfp4 CUTLASS kernels to MXLinear (#1713)
vkuzo Feb 14, 2025
22d7d51
Reformat (#1723)
metascroy Feb 18, 2025
aa9b9c9
Fix `DDP` with `nf4` (#1684)
jeromeku Feb 18, 2025
f2e8f56
notify on wheel failure for aarch, m1, windows (#1725)
HDCharles Feb 18, 2025
7b37eb0
Make TorchAO cpp/Python extension
drisspg Feb 18, 2025
988c5c9
fix tensor parallelism for float8 training with rowwise scaling (#1718)
vkuzo Feb 18, 2025
79ac44e
Promote Supermask out of prototype (#1729)
jcaip Feb 18, 2025
c59561a
SAM2: Update README.md (#1735)
cpuhrsch Feb 19, 2025
7fc8ad4
float8 training: clean up recipe names (#1730)
vkuzo Feb 19, 2025
c6c388b
float8 training: make the "config from recipe" API polished (#1731)
vkuzo Feb 19, 2025
ed16fe7
float8 training: add README.md entry for rowwise scaling (#1733)
vkuzo Feb 19, 2025
ceceea5
promote blocksparse from prototype, make it faster (#1734)
jcaip Feb 19, 2025
217d968
Make FakeQuantizer expose useful config details (#1717)
andrewor14 Feb 19, 2025
4780e10
Update version.txt to 0.10.0 (#1714)
HDCharles Feb 20, 2025
f6f3322
Add ukernel selection logic + clean up KleidiAI integration (#1652)
metascroy Feb 20, 2025
0293bcd
Remove duplicate, confusing conditional in setup.py (#1748)
janeyx99 Feb 20, 2025
6bab4db
SAM2: Use torch.export for VOS (#1708)
cpuhrsch Feb 20, 2025
1c76736
Fix ruff for torchao/float8/config.py (#1750)
cpuhrsch Feb 21, 2025
dc0134e
Add ciflow/rocm to bot-created tags (#1749)
jithunnair-amd Feb 21, 2025
e0f7148
Update to cutlass 3.8 tag (#1754)
drisspg Feb 21, 2025
878ec7a
Add linear bias support for QAT (#1755)
andrewor14 Feb 21, 2025
ed361ff
[Reland] ROCm CI (Infra + Skips) (#1581)
petrex Feb 21, 2025
c72ebc6
move decorators to testing/utils.py (#1761)
jcaip Feb 22, 2025
25ddb77
Allow for scales to be in new e8m0 dtype (#1742)
drisspg Feb 22, 2025
d370196
delete delayed scaling from torchao.float8 (#1753)
vkuzo Feb 22, 2025
2a3fbff
MX Updated to_blocked to not call nn.pad (#1762)
drisspg Feb 22, 2025
8d38814
add MX support to lowp training profiling script (#1765)
vkuzo Feb 24, 2025
bac039f
Update README.md (#1758)
jerryzh168 Feb 24, 2025
09ebb12
mx bench: add cast with to_blocked (#1771)
vkuzo Feb 24, 2025
089cd7e
update mixed mm weight only quant test to work w mixed mm deletion (#…
eellison Feb 24, 2025
38e36de
Auto-fix lint violations from Fixit] fbcode//pytorch/ao (#1752)
facebook-github-bot Feb 24, 2025
98c4e2e
Fix potential out-of-bound access in int8_mm.py (#1751)
mark14wu Feb 25, 2025
f18043d
Merge branch 'main' into rocm_sparse_marlin
petrex Feb 25, 2025
8706d3f
Fix internal test_linear_8bit_act_xbit_weightAppleMac
metascroy Feb 26, 2025
7d87946
[1/x] float8 cleanup: remove float8_python_api (#1779)
vkuzo Feb 26, 2025
d00ee41
[2/x] float8 cleanup: move roofline utils to testing (#1780)
vkuzo Feb 26, 2025
8d110bf
modify cast from hp to mx to help inductor fuse (#1786)
vkuzo Feb 26, 2025
1ab1b77
add a benchmark for casting a tensor to MX across dim0 and dim1 (#1787)
vkuzo Feb 26, 2025
c788ee7
[1/x] mx roofline: make the script work on NVIDIA B200 (#1778)
vkuzo Feb 27, 2025
e6706ca
roofline estimation: delete scaling type (#1781)
vkuzo Feb 27, 2025
cd69415
roofline estimation: delete axiswise scaling, for now (#1782)
vkuzo Feb 27, 2025
f478692
roofline estimator: simplify (#1783)
vkuzo Feb 27, 2025
79e3366
Add support for copy_ for plain layout and tensor core tiled layout (…
jerryzh168 Feb 28, 2025
b9c51b7
Updating Cuda 12.1/12.4 to 12.4/12.6 to reflect current state (#1794)
HDCharles Feb 28, 2025
ac832b0
Fixing DORA imports (#1795)
HDCharles Feb 28, 2025
890e0ac
Use exp2 for mx scaling (#1530)
drisspg Feb 28, 2025
3219318
bugfix clean_release_notes.py (#1801)
HDCharles Feb 28, 2025
4a4925f
Revert "Add support for copy_ for plain layout and tensor core tiled …
jainapurva Feb 28, 2025
8f93751
metal lowbit kernels: pip install (#1785)
manuelcandales Mar 1, 2025
7963f9c
[float8] add float8 training benchmarking scripts (#1802)
danielvegamyhre Mar 1, 2025
3bc1dd4
Silence loud error on torchao cpu builds (#1808)
msaroufim Mar 3, 2025
55600a1
Delete DORA (#1815)
msaroufim Mar 3, 2025
914de78
Revert "Use exp2 for mx scaling" (#1813)
jainapurva Mar 3, 2025
bc54ae5
Fix experimental CI (#1820)
metascroy Mar 3, 2025
7b496c9
Remove split_k kernel (#1816)
msaroufim Mar 3, 2025
e2f4ab4
CPUOffload: only offload parameters above a certain size (#1720)
ngc92 Mar 4, 2025
2c2a590
update typehint (#1740)
crcrpar Mar 4, 2025
81a2813
Move torchao/_models to benchmarks/_models (#1784)
jainapurva Mar 4, 2025
d8af7d7
roofline estimator: add float8 rowwise and mxfp8 recipe support (#1789)
vkuzo Mar 4, 2025
173d9bf
metal lowbit ops: ci (#1825)
manuelcandales Mar 4, 2025
e767713
Fix experimental CI (#1827)
metascroy Mar 4, 2025
9bcd73b
Optionally enable KleidiAI + clean up setup.py flags (#1826)
metascroy Mar 4, 2025
8b34390
Merge branch 'main' into rocm_sparse_marlin
petrex Mar 4, 2025
1ff8592
Fix float8nocompile CI workflow (#1695)
danielvegamyhre Mar 4, 2025
d4be9e4
ROCm Support : Tile_Layout kernel (#1201)
petrex Mar 4, 2025
883dc65
ruff fix for setup.py (#1833)
jcaip Mar 4, 2025
75b6816
Merge branch 'main' into rocm_sparse_marlin
petrex Mar 4, 2025
8124a58
lint
Mar 4, 2025
29d1be6
fix gpu_arch
Mar 6, 2025
617e792
Improve ROCm GPU architecture detection in setup.py
Mar 6, 2025
3db4c4d
Refactor CUDA/ROCm source file handling in setup.py
Mar 6, 2025
92fedc8
Improve CUDA/ROCm extension build configuration
Mar 10, 2025
67a538a
Add detailed logging for CUDA/ROCm source file discovery
Mar 10, 2025
2 changes: 2 additions & 0 deletions .github/pytorch-probot.yml
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
mergebot: True
ciflow_push_tags:
- ciflow/benchmark
- ciflow/tutorials
- ciflow/rocm
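The new `ciflow/rocm` entry registers the tag prefix with pytorch-probot; the ROCm regression workflow (regression_test_rocm.yml, added later in this diff) triggers on tags matching `ciflow/rocm/*`. A minimal sketch of that convention follows — the PR number is illustrative, and in practice pytorchbot pushes these tags when a maintainer applies the corresponding label, so the manual `git tag`/`git push` step is shown only as a hypothetical equivalent.

```shell
# Illustrative only: show how a ciflow tag name lines up with the
# workflow's `tags: - ciflow/rocm/*` trigger pattern.
PR_NUMBER=1581               # hypothetical PR number
TAG="ciflow/rocm/${PR_NUMBER}"

# Confirm the tag matches the trigger glob before pushing.
case "$TAG" in
  ciflow/rocm/*) echo "tag $TAG matches trigger ciflow/rocm/*" ;;
  *) echo "tag $TAG would NOT trigger the workflow" >&2; exit 1 ;;
esac

# git tag "$TAG" && git push origin "$TAG"   # requires write access; normally done by the bot
```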
31 changes: 31 additions & 0 deletions .github/workflows/build-wheels_m1.yml
@@ -41,3 +41,34 @@ jobs:
runner-type: macos-m1-stable
smoke-test-script: test/smoke_test.py
trigger-event: ${{ github.event_name }}
notify:
runs-on: ubuntu-latest
name: Email notification
needs: [generate-matrix, build]
if: failure() && github.event_name == 'schedule'
steps:
- uses: dawidd6/action-send-mail@v4
with:
server_address: smtp.gmail.com
server_port: 465
username: torchao.notify
password: ${{ secrets.TORCHAO_NOTIFY_PASSWORD }}
from: [email protected]
to: ${{ secrets.TORCHAO_NOTIFY_RECIPIENT }}
subject: Scheduled Build Failure for TorchAO
body: |
Build Failure Notification for TorchAO
A failure occurred in the Build Linux Wheels workflow.
Run Details:
- Workflow: ${{ github.workflow }}
- Run Type: ${{ github.event_name }}
- Repository: ${{ github.repository }}
- Branch/PR: ${{ github.ref }}
- Commit: ${{ github.sha }}
You can view the full run details here:
${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
Error Information:
${{ needs.generate-matrix.result == 'failure' && 'Matrix generation failed' || '' }}
${{ needs.build.result == 'failure' && 'Build job failed' || '' }}

This is an automated notification. Please check the GitHub Actions page for more details about the failure.
34 changes: 33 additions & 1 deletion .github/workflows/build_wheels_aarch64_linux.yml
@@ -29,7 +29,8 @@ jobs:
test-infra-repository: pytorch/test-infra
test-infra-ref: main
with-cuda: disable

# please note: excluding 3.13t for aarch64 builds for now
python-versions: '["3.9", "3.10", "3.11", "3.12", "3.13"]'
build:
needs: generate-matrix
permissions:
@@ -53,3 +54,34 @@
setup-miniconda: false
secrets:
PYPI_API_TOKEN: ${{ secrets.PYPI_API_TOKEN }}
notify:
runs-on: ubuntu-latest
name: Email notification
needs: [generate-matrix, build]
if: failure() && github.event_name == 'schedule'
steps:
- uses: dawidd6/action-send-mail@v4
with:
server_address: smtp.gmail.com
server_port: 465
username: torchao.notify
password: ${{ secrets.TORCHAO_NOTIFY_PASSWORD }}
from: [email protected]
to: ${{ secrets.TORCHAO_NOTIFY_RECIPIENT }}
subject: Scheduled Build Failure for TorchAO
body: |
Build Failure Notification for TorchAO
A failure occurred in the Build AARCH64 Wheels workflow.
Run Details:
- Workflow: ${{ github.workflow }}
- Run Type: ${{ github.event_name }}
- Repository: ${{ github.repository }}
- Branch/PR: ${{ github.ref }}
- Commit: ${{ github.sha }}
You can view the full run details here:
${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
Error Information:
${{ needs.generate-matrix.result == 'failure' && 'Matrix generation failed' || '' }}
${{ needs.build.result == 'failure' && 'Build job failed' || '' }}

This is an automated notification. Please check the GitHub Actions page for more details about the failure.
37 changes: 37 additions & 0 deletions .github/workflows/build_wheels_linux.yml
@@ -30,6 +30,8 @@ jobs:
with-cuda: enable
with-rocm: enable
with-xpu: enable
# please note: excluding 3.13t for aarch64 builds for now
python-versions: '["3.9", "3.10", "3.11", "3.12", "3.13"]'

build:
needs: generate-matrix
@@ -56,3 +58,38 @@ jobs:
upload-to-pypi: cu121
secrets:
PYPI_API_TOKEN: ${{ secrets.PYPI_API_TOKEN }}
notify:
runs-on: ubuntu-latest
name: Email notification
needs: [generate-matrix, build]
if: failure() && github.event_name == 'schedule'
steps:
- uses: dawidd6/action-send-mail@v4
with:
server_address: smtp.gmail.com
server_port: 465
username: torchao.notify
password: ${{ secrets.TORCHAO_NOTIFY_PASSWORD }}
from: [email protected]
to: ${{ secrets.TORCHAO_NOTIFY_RECIPIENT }}
subject: Scheduled Build Failure for TorchAO
body: |
Build Failure Notification for TorchAO

A failure occurred in the Build Linux Wheels workflow.

Run Details:
- Workflow: ${{ github.workflow }}
- Run Type: ${{ github.event_name }}
- Repository: ${{ github.repository }}
- Branch/PR: ${{ github.ref }}
- Commit: ${{ github.sha }}

You can view the full run details here:
${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}

Error Information:
${{ needs.generate-matrix.result == 'failure' && 'Matrix generation failed' || '' }}
${{ needs.build.result == 'failure' && 'Build job failed' || '' }}

This is an automated notification. Please check the GitHub Actions page for more details about the failure.
35 changes: 35 additions & 0 deletions .github/workflows/build_wheels_windows.yml
@@ -60,3 +60,38 @@ jobs:
package-name: ${{ matrix.package-name }}
smoke-test-script: ${{ matrix.smoke-test-script }}
trigger-event: ${{ github.event_name }}
notify:
runs-on: ubuntu-latest
name: Email notification
needs: [generate-matrix, build]
if: failure() && github.event_name == 'schedule'
steps:
- uses: dawidd6/action-send-mail@v4
with:
server_address: smtp.gmail.com
server_port: 465
username: torchao.notify
password: ${{ secrets.TORCHAO_NOTIFY_PASSWORD }}
from: [email protected]
to: ${{ secrets.TORCHAO_NOTIFY_RECIPIENT }}
subject: Scheduled Build Failure for TorchAO
body: |
Build Failure Notification for TorchAO

A failure occurred in the Build Windows Wheels workflow.

Run Details:
- Workflow: ${{ github.workflow }}
- Run Type: ${{ github.event_name }}
- Repository: ${{ github.repository }}
- Branch/PR: ${{ github.ref }}
- Commit: ${{ github.sha }}

You can view the full run details here:
${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}

Error Information:
${{ needs.generate-matrix.result == 'failure' && 'Matrix generation failed' || '' }}
${{ needs.build.result == 'failure' && 'Build job failed' || '' }}

This is an automated notification. Please check the GitHub Actions page for more details about the failure.
10 changes: 5 additions & 5 deletions .github/workflows/dashboard_perf_test.yml
@@ -42,19 +42,19 @@ jobs:

mkdir -p ${{ runner.temp }}/benchmark-results
# llama3 - compile baseline
${CONDA_RUN} python torchao/_models/llama/generate.py --checkpoint_path "${CHECKPOINT_PATH}/${MODEL_REPO}/model.pth" --compile --compile_prefill --output_json_path ${{ runner.temp }}/benchmark-results/llama3-benchmark-results.json
${CONDA_RUN} python benchmarks/_models/llama/generate.py --checkpoint_path "${CHECKPOINT_PATH}/${MODEL_REPO}/model.pth" --compile --compile_prefill --output_json_path ${{ runner.temp }}/benchmark-results/llama3-benchmark-results.json

# llama3 - autoquant
${CONDA_RUN} python torchao/_models/llama/generate.py --checkpoint_path "${CHECKPOINT_PATH}/${MODEL_REPO}/model.pth" --compile --compile_prefill --quantization autoquant --output_json_path ${{ runner.temp }}/benchmark-results/llama3-benchmark-results.json
${CONDA_RUN} python benchmarks/_models/llama/generate.py --checkpoint_path "${CHECKPOINT_PATH}/${MODEL_REPO}/model.pth" --compile --compile_prefill --quantization autoquant --output_json_path ${{ runner.temp }}/benchmark-results/llama3-benchmark-results.json

# skipping SAM because of https://hud.pytorch.org/pr/pytorch/ao/1407
# # SAM
# ${CONDA_RUN} pip install git+https://github.com/pytorch-labs/segment-anything-fast.git@main
# # SAM compile baselilne
# ${CONDA_RUN} sh torchao/_models/sam/setup.sh
# ${CONDA_RUN} python torchao/_models/sam/eval_combo.py --coco_root_dir datasets/coco2017 --coco_slice_name val2017 --sam_checkpoint_base_path checkpoints --sam_model_type vit_h --point_sampling_cache_dir tmp/sam_coco_mask_center_cache --mask_debug_out_dir tmp/sam_eval_masks_out --batch_size 32 --num_workers 8 --use_compile max-autotune --use_half bfloat16 --device cuda --output_json_path ${{ runner.temp }}/benchmark-results/sam-benchmark-results.json
# ${CONDA_RUN} sh benchmarks/_models/sam/setup.sh
# ${CONDA_RUN} python benchmarks/_models/sam/eval_combo.py --coco_root_dir datasets/coco2017 --coco_slice_name val2017 --sam_checkpoint_base_path checkpoints --sam_model_type vit_h --point_sampling_cache_dir tmp/sam_coco_mask_center_cache --mask_debug_out_dir tmp/sam_eval_masks_out --batch_size 32 --num_workers 8 --use_compile max-autotune --use_half bfloat16 --device cuda --output_json_path ${{ runner.temp }}/benchmark-results/sam-benchmark-results.json

# ${CONDA_RUN} python torchao/_models/sam/eval_combo.py --coco_root_dir datasets/coco2017 --coco_slice_name val2017 --sam_checkpoint_base_path checkpoints --sam_model_type vit_h --point_sampling_cache_dir tmp/sam_coco_mask_center_cache --mask_debug_out_dir tmp/sam_eval_masks_out --batch_size 32 --num_workers 8 --use_compile max-autotune --use_half bfloat16 --device cuda --compression autoquant --output_json_path ${{ runner.temp }}/benchmark-results/sam-benchmark-results.json
# ${CONDA_RUN} python benchmarks/_models/sam/eval_combo.py --coco_root_dir datasets/coco2017 --coco_slice_name val2017 --sam_checkpoint_base_path checkpoints --sam_model_type vit_h --point_sampling_cache_dir tmp/sam_coco_mask_center_cache --mask_debug_out_dir tmp/sam_eval_masks_out --batch_size 32 --num_workers 8 --use_compile max-autotune --use_half bfloat16 --device cuda --compression autoquant --output_json_path ${{ runner.temp }}/benchmark-results/sam-benchmark-results.json

# SAM 2.1
# ${CONDA_RUN} sh scripts/download_sam2_ckpts.sh ${CHECKPOINT_PATH}/sam2
5 changes: 4 additions & 1 deletion .github/workflows/doc_build.yml
@@ -10,6 +10,9 @@ on:
- v[0-9]+.[0-9]+.[0-9]
- v[0-9]+.[0-9]+.[0-9]+-rc[0-9]+
pull_request:
paths:
- 'docs/**'
- '!docs/**'
workflow_dispatch:

concurrency:
@@ -91,7 +94,7 @@ jobs:
ref: gh-pages
persist-credentials: true
- name: Download artifact
uses: actions/download-artifact@v3
uses: actions/download-artifact@v4
with:
name: Doc-Build
path: docs
12 changes: 10 additions & 2 deletions .github/workflows/float8_test.yml
@@ -25,10 +25,18 @@ jobs:
include:
- name: SM-89
runs-on: linux.g6.4xlarge.experimental.nvidia.gpu
torch-spec: '--pre torch --index-url https://download.pytorch.org/whl/nightly/cu121'
torch-spec: '--pre torch==2.7.0.dev20250122 --index-url https://download.pytorch.org/whl/nightly/cu124'
gpu-arch-type: "cuda"
gpu-arch-version: "12.1"
gpu-arch-version: "12.4"
- name: H100
runs-on: linux.aws.h100
torch-spec: '--pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124'
gpu-arch-type: "cuda"
gpu-arch-version: "12.4"

permissions:
id-token: write
contents: read
uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
with:
timeout: 60
53 changes: 53 additions & 0 deletions .github/workflows/float8nocompile_test.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
name: Run Float8nocompile Tests

on:
push:
branches:
- main
- 'gh/**'
paths:
- 'torchao/prototype/float8nocompile/**'
pull_request:
branches:
- main
- 'gh/**'
paths:
- 'torchao/prototype/float8nocompile/**'

concurrency:
group: floatnocompile_test-${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_number || github.ref }}
cancel-in-progress: true

env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}

jobs:
test:
strategy:
fail-fast: false
matrix:
include:
- name: SM-89
runs-on: linux.g6.4xlarge.experimental.nvidia.gpu
torch-spec: '--pre torch --index-url https://download.pytorch.org/whl/nightly/cu121'
gpu-arch-type: "cuda"
gpu-arch-version: "12.1"

uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
with:
timeout: 300
runner: ${{ matrix.runs-on }}
gpu-arch-type: ${{ matrix.gpu-arch-type }}
gpu-arch-version: ${{ matrix.gpu-arch-version }}
submodules: recursive
script: |
conda create -n venv python=3.9 -y
conda activate venv
export PATH=/opt/rh/devtoolset-10/root/usr/bin/:$PATH
python -m pip install --upgrade pip
pip install ${{ matrix.torch-spec }}
pip install -r dev-requirements.txt
pip install .
cd torchao/prototype/float8nocompile
pytest kernels/ --verbose -s
pytest test/train_test.py --verbose -s
10 changes: 6 additions & 4 deletions .github/workflows/nightly_smoke_test.yml
@@ -11,7 +11,7 @@ concurrency:
cancel-in-progress: true

env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_TOKEN: ${{ secrets.HF_TOKEN }}

jobs:
test:
@@ -21,11 +21,13 @@
include:
- name: CUDA Nightly
runs-on: linux.g5.12xlarge.nvidia.gpu
torch-spec: '--pre torch --index-url https://download.pytorch.org/whl/nightly/cu121'
torch-spec: '--pre torch==2.7.0.dev20250122 --index-url https://download.pytorch.org/whl/nightly/cu124'
gpu-arch-type: "cuda"
gpu-arch-version: "12.1"

gpu-arch-version: "12.4"

permissions:
id-token: write
contents: read
uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
with:
runner: ${{ matrix.runs-on }}
7 changes: 5 additions & 2 deletions .github/workflows/regression_test.yml
@@ -25,15 +25,18 @@ jobs:
include:
- name: CUDA Nightly
runs-on: linux.g5.12xlarge.nvidia.gpu
torch-spec: '--pre torch --index-url https://download.pytorch.org/whl/nightly/cu124'
torch-spec: '--pre torch==2.7.0.dev20250122 --index-url https://download.pytorch.org/whl/nightly/cu124'
gpu-arch-type: "cuda"
gpu-arch-version: "12.4"
- name: CPU Nightly
runs-on: linux.4xlarge
torch-spec: '--pre torch --index-url https://download.pytorch.org/whl/nightly/cpu'
torch-spec: '--pre torch==2.7.0.dev20250122 --index-url https://download.pytorch.org/whl/nightly/cpu'
gpu-arch-type: "cpu"
gpu-arch-version: ""

permissions:
id-token: write
contents: read
uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
with:
timeout: 120
49 changes: 49 additions & 0 deletions .github/workflows/regression_test_rocm.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
name: Run Regression Tests on ROCm

on:
push:
branches:
- main
tags:
- ciflow/rocm/*

concurrency:
group: regression_test-${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_number || github.ref }}
cancel-in-progress: true

env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}

jobs:
test-nightly:
strategy:
fail-fast: false
matrix:
include:
- name: ROCM Nightly
runs-on: linux.rocm.gpu.torchao
torch-spec: '--pre torch==2.7.0.dev20250122 --index-url https://download.pytorch.org/whl/nightly/rocm6.3'
gpu-arch-type: "rocm"
gpu-arch-version: "6.3"

permissions:
id-token: write
contents: read
uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
with:
timeout: 120
no-sudo: ${{ matrix.gpu-arch-type == 'rocm' }}
runner: ${{ matrix.runs-on }}
gpu-arch-type: ${{ matrix.gpu-arch-type }}
gpu-arch-version: ${{ matrix.gpu-arch-version }}
submodules: recursive
script: |
conda create -n venv python=3.9 -y
conda activate venv
python -m pip install --upgrade pip
pip install ${{ matrix.torch-spec }}
pip install -r dev-requirements.txt
pip install .
export CONDA=$(dirname $(dirname $(which conda)))
export LD_LIBRARY_PATH=$CONDA/lib/:$LD_LIBRARY_PATH
pytest test --verbose -s