@vmoens (Collaborator) commented Jun 13, 2023

Note: printing nested lazy stacks is broken.

@facebook-github-bot added the CLA Signed label Jun 13, 2023
github-actions bot commented Jun 13, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 47. Improved: $\large\color{#35bf28}1$. Worsened: $\large\color{#d91a1a}3$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_common_ops | 1.0499ms | 1.0220ms | 978.4535 Ops/s | 999.1627 Ops/s | $\color{#d91a1a}-2.07\%$ |
| test_creation | 4.7999μs | 4.3883μs | 227.8811 KOps/s | 232.6772 KOps/s | $\color{#d91a1a}-2.06\%$ |
| test_creation_empty | 11.7058μs | 11.3125μs | 88.3977 KOps/s | 90.4767 KOps/s | $\color{#d91a1a}-2.30\%$ |
| test_creation_nested_1 | 21.1327μs | 20.5429μs | 48.6787 KOps/s | 49.3337 KOps/s | $\color{#d91a1a}-1.33\%$ |
| test_creation_nested_2 | 22.0027μs | 21.6266μs | 46.2394 KOps/s | 47.2263 KOps/s | $\color{#d91a1a}-2.09\%$ |
| test_clone | 28.4546μs | 27.4990μs | 36.3649 KOps/s | 37.5471 KOps/s | $\color{#d91a1a}-3.15\%$ |
| test_getitem[int] | 33.0192μs | 32.4615μs | 30.8057 KOps/s | 30.8907 KOps/s | $\color{#d91a1a}-0.28\%$ |
| test_getitem[slice_int] | 73.1895μs | 68.3819μs | 14.6238 KOps/s | 14.5562 KOps/s | $\color{#35bf28}+0.46\%$ |
| test_getitem[range] | 66.4809μs | 65.8291μs | 15.1909 KOps/s | 15.2812 KOps/s | $\color{#d91a1a}-0.59\%$ |
| test_getitem[tuple] | 64.0397μs | 63.1665μs | 15.8312 KOps/s | 15.9300 KOps/s | $\color{#d91a1a}-0.62\%$ |
| test_getitem[list] | 72.5791μs | 56.6256μs | 17.6599 KOps/s | 17.7459 KOps/s | $\color{#d91a1a}-0.48\%$ |
| test_setitem_dim[int] | 73.1990μs | 47.1737μs | 21.1983 KOps/s | 21.8904 KOps/s | $\color{#d91a1a}-3.16\%$ |
| test_setitem_dim[slice_int] | 0.1142ms | 85.6788μs | 11.6715 KOps/s | 11.7827 KOps/s | $\color{#d91a1a}-0.94\%$ |
| test_setitem_dim[range] | 0.1356ms | 77.7719μs | 12.8581 KOps/s | 13.1120 KOps/s | $\color{#d91a1a}-1.94\%$ |
| test_setitem_dim[tuple] | 0.1192ms | 79.5123μs | 12.5767 KOps/s | 12.8940 KOps/s | $\color{#d91a1a}-2.46\%$ |
| test_setitem | 33.5835μs | 32.6713μs | 30.6079 KOps/s | 31.7410 KOps/s | $\color{#d91a1a}-3.57\%$ |
| test_set | 77.7109μs | 38.2407μs | 26.1502 KOps/s | 32.4972 KOps/s | $\textbf{\color{#d91a1a}-19.53\%}$ |
| test_set_shared | 0.1424ms | 0.1399ms | 7.1475 KOps/s | 7.0815 KOps/s | $\color{#35bf28}+0.93\%$ |
| test_update | 35.7365μs | 34.1496μs | 29.2829 KOps/s | 29.7834 KOps/s | $\color{#d91a1a}-1.68\%$ |
| test_update_nested | 51.8783μs | 50.5533μs | 19.7811 KOps/s | 19.5251 KOps/s | $\color{#35bf28}+1.31\%$ |
| test_set_nested | 42.5744μs | 41.6940μs | 23.9843 KOps/s | 24.7513 KOps/s | $\color{#d91a1a}-3.10\%$ |
| test_set_nested_new | 65.5511μs | 59.6680μs | 16.7594 KOps/s | 17.2075 KOps/s | $\color{#d91a1a}-2.60\%$ |
| test_select | 0.1008ms | 0.1002ms | 9.9768 KOps/s | 10.2213 KOps/s | $\color{#d91a1a}-2.39\%$ |
| test_creation[device0] | 1.1224ms | 0.4249ms | 2.3533 KOps/s | 2.3774 KOps/s | $\color{#d91a1a}-1.01\%$ |
| test_creation_from_tensor | 0.5498ms | 0.4027ms | 2.4830 KOps/s | 2.1453 KOps/s | $\textbf{\color{#35bf28}+15.74\%}$ |
| test_add_one[memmap_tensor0] | 43.9334μs | 30.4702μs | 32.8190 KOps/s | 34.0314 KOps/s | $\color{#d91a1a}-3.56\%$ |
| test_contiguous[memmap_tensor0] | 8.7689μs | 8.1129μs | 123.2610 KOps/s | 127.5079 KOps/s | $\color{#d91a1a}-3.33\%$ |
| test_stack[memmap_tensor0] | 0.1544ms | 39.4094μs | 25.3747 KOps/s | 25.4780 KOps/s | $\color{#d91a1a}-0.41\%$ |
| test_reshape_pytree | 38.0665μs | 35.9130μs | 27.8451 KOps/s | 28.6798 KOps/s | $\color{#d91a1a}-2.91\%$ |
| test_reshape_td | 51.8483μs | 49.5414μs | 20.1851 KOps/s | 20.4285 KOps/s | $\color{#d91a1a}-1.19\%$ |
| test_view_pytree | 34.0395μs | 33.0535μs | 30.2540 KOps/s | 30.6666 KOps/s | $\color{#d91a1a}-1.35\%$ |
| test_view_td | 9.5569μs | 8.8542μs | 112.9404 KOps/s | 113.2393 KOps/s | $\color{#d91a1a}-0.26\%$ |
| test_unbind_pytree | 38.4465μs | 37.2545μs | 26.8424 KOps/s | 27.4741 KOps/s | $\color{#d91a1a}-2.30\%$ |
| test_unbind_td | 0.1540ms | 0.1500ms | 6.6673 KOps/s | 6.8372 KOps/s | $\color{#d91a1a}-2.49\%$ |
| test_split_pytree | 43.7794μs | 41.8514μs | 23.8940 KOps/s | 24.2901 KOps/s | $\color{#d91a1a}-1.63\%$ |
| test_split_td | 0.1196ms | 0.1177ms | 8.4942 KOps/s | 8.5346 KOps/s | $\color{#d91a1a}-0.47\%$ |
| test_add_pytree | 47.0733μs | 45.0446μs | 22.2002 KOps/s | 22.7341 KOps/s | $\color{#d91a1a}-2.35\%$ |
| test_add_td | 64.3831μs | 62.4276μs | 16.0186 KOps/s | 16.4166 KOps/s | $\color{#d91a1a}-2.42\%$ |
| test_distributed | 69.7990μs | 69.7990μs | 14.3269 KOps/s | 14.7931 KOps/s | $\color{#d91a1a}-3.15\%$ |
| test_tdmodule | 51.0990μs | 24.6616μs | 40.5488 KOps/s | 40.9115 KOps/s | $\color{#d91a1a}-0.89\%$ |
| test_tdmodule_dispatch | 41.2505ms | 55.7931μs | 17.9234 KOps/s | 19.4652 KOps/s | $\textbf{\color{#d91a1a}-7.92\%}$ |
| test_tdseq | 0.1786ms | 30.6602μs | 32.6155 KOps/s | 32.1887 KOps/s | $\color{#35bf28}+1.33\%$ |
| test_tdseq_dispatch | 95.3990μs | 54.0545μs | 18.4999 KOps/s | 18.6912 KOps/s | $\color{#d91a1a}-1.02\%$ |
| test_instantiation_functorch | 1.6036ms | 1.5204ms | 657.7083 Ops/s | 662.6847 Ops/s | $\color{#d91a1a}-0.75\%$ |
| test_instantiation_td | 6.0842ms | 1.1985ms | 834.3755 Ops/s | 880.0086 Ops/s | $\textbf{\color{#d91a1a}-5.19\%}$ |
| test_exec_functorch | 0.1803ms | 0.1743ms | 5.7360 KOps/s | 5.8828 KOps/s | $\color{#d91a1a}-2.49\%$ |
| test_exec_td | 0.3038ms | 0.3011ms | 3.3215 KOps/s | 3.3892 KOps/s | $\color{#d91a1a}-2.00\%$ |
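The Change column can be reproduced from the two throughput columns as `(Ops / Ops on Repo HEAD - 1) * 100`; for example, test_set gives 26.1502 / 32.4972 - 1 ≈ -19.53%. The sketch below checks that arithmetic against a few CPU rows. The bot's significance threshold is not stated in the report; the 5% used here is an assumption inferred from which rows are bolded (and counted as Improved/Worsened) in these two reports, and `percent_change`/`classify` are hypothetical helpers, not part of the benchmark tooling.

```python
def percent_change(ops: float, ops_head: float) -> float:
    """Relative throughput change of this PR vs. repo HEAD, in percent."""
    return (ops / ops_head - 1.0) * 100.0


# ASSUMPTION: significance threshold inferred from the bolded rows in these
# reports; the benchmark bot does not document it here.
THRESHOLD_PCT = 5.0


def classify(change_pct: float, threshold: float = THRESHOLD_PCT) -> str:
    """Label a change as improved/worsened only if it crosses the threshold."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# (name, Ops, Ops on Repo HEAD), both in KOps/s, copied from the CPU table
rows = [
    ("test_set", 26.1502, 32.4972),
    ("test_creation_from_tensor", 2.4830, 2.1453),
    ("test_tdmodule_dispatch", 17.9234, 19.4652),
    ("test_setitem", 30.6079, 31.7410),
]

for name, ops, head in rows:
    change = percent_change(ops, head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Using a ratio of throughputs (rather than of mean times) means a positive value always reads as "faster than HEAD", which matches the green/red coloring in the tables.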

github-actions bot commented Jun 13, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 47. Improved: $\large\color{#35bf28}10$. Worsened: $\large\color{#d91a1a}9$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_common_ops | 1.7412ms | 1.4958ms | 668.5264 Ops/s | 646.9268 Ops/s | $\color{#35bf28}+3.34\%$ |
| test_creation | 6.1950μs | 3.9354μs | 254.1028 KOps/s | 217.0127 KOps/s | $\textbf{\color{#35bf28}+17.09\%}$ |
| test_creation_empty | 20.5381μs | 11.6174μs | 86.0781 KOps/s | 75.4562 KOps/s | $\textbf{\color{#35bf28}+14.08\%}$ |
| test_creation_nested_1 | 36.6261μs | 23.2324μs | 43.0433 KOps/s | 38.2269 KOps/s | $\textbf{\color{#35bf28}+12.60\%}$ |
| test_creation_nested_2 | 41.4152μs | 23.6129μs | 42.3497 KOps/s | 34.9997 KOps/s | $\textbf{\color{#35bf28}+21.00\%}$ |
| test_clone | 45.4242μs | 33.0175μs | 30.2870 KOps/s | 27.9657 KOps/s | $\textbf{\color{#35bf28}+8.30\%}$ |
| test_getitem[int] | 61.4114μs | 43.9763μs | 22.7395 KOps/s | 23.6805 KOps/s | $\color{#d91a1a}-3.97\%$ |
| test_getitem[slice_int] | 0.1119ms | 90.5720μs | 11.0409 KOps/s | 11.3994 KOps/s | $\color{#d91a1a}-3.14\%$ |
| test_getitem[range] | 0.1438ms | 0.1168ms | 8.5624 KOps/s | 8.6275 KOps/s | $\color{#d91a1a}-0.75\%$ |
| test_getitem[tuple] | 0.1011ms | 82.2942μs | 12.1515 KOps/s | 13.1245 KOps/s | $\textbf{\color{#d91a1a}-7.41\%}$ |
| test_getitem[list] | 0.1299ms | 0.1052ms | 9.5014 KOps/s | 9.5453 KOps/s | $\color{#d91a1a}-0.46\%$ |
| test_setitem_dim[int] | 2.1241ms | 71.9805μs | 13.8926 KOps/s | 14.8040 KOps/s | $\textbf{\color{#d91a1a}-6.16\%}$ |
| test_setitem_dim[slice_int] | 2.8672ms | 0.1285ms | 7.7814 KOps/s | 8.1470 KOps/s | $\color{#d91a1a}-4.49\%$ |
| test_setitem_dim[range] | 9.8842ms | 0.1392ms | 7.1861 KOps/s | 7.4844 KOps/s | $\color{#d91a1a}-3.99\%$ |
| test_setitem_dim[tuple] | 5.0997ms | 0.1100ms | 9.0939 KOps/s | 9.4267 KOps/s | $\color{#d91a1a}-3.53\%$ |
| test_setitem | 56.7643μs | 45.8520μs | 21.8093 KOps/s | 21.5682 KOps/s | $\color{#35bf28}+1.12\%$ |
| test_set | 0.1151ms | 45.7297μs | 21.8676 KOps/s | 22.8544 KOps/s | $\color{#d91a1a}-4.32\%$ |
| test_set_shared | 0.3891ms | 0.2851ms | 3.5079 KOps/s | 3.1937 KOps/s | $\textbf{\color{#35bf28}+9.84\%}$ |
| test_update | 0.1242ms | 46.7934μs | 21.3705 KOps/s | 20.8710 KOps/s | $\color{#35bf28}+2.39\%$ |
| test_update_nested | 0.1193ms | 67.8972μs | 14.7282 KOps/s | 14.0989 KOps/s | $\color{#35bf28}+4.46\%$ |
| test_set_nested | 90.0645μs | 58.7392μs | 17.0244 KOps/s | 17.2272 KOps/s | $\color{#d91a1a}-1.18\%$ |
| test_set_nested_new | 0.1089ms | 78.8169μs | 12.6876 KOps/s | 12.6587 KOps/s | $\color{#35bf28}+0.23\%$ |
| test_select | 0.1709ms | 0.1252ms | 7.9867 KOps/s | 8.1071 KOps/s | $\color{#d91a1a}-1.48\%$ |
| test_creation[device0] | 1.6380ms | 0.6179ms | 1.6185 KOps/s | 1.6349 KOps/s | $\color{#d91a1a}-1.00\%$ |
| test_creation_from_tensor | 0.8791ms | 0.5796ms | 1.7253 KOps/s | 1.5934 KOps/s | $\textbf{\color{#35bf28}+8.28\%}$ |
| test_add_one[memmap_tensor0] | 80.0394μs | 64.4643μs | 15.5125 KOps/s | 15.3387 KOps/s | $\color{#35bf28}+1.13\%$ |
| test_contiguous[memmap_tensor0] | 16.1741μs | 11.6590μs | 85.7708 KOps/s | 73.6024 KOps/s | $\textbf{\color{#35bf28}+16.53\%}$ |
| test_stack[memmap_tensor0] | 0.2761ms | 74.9104μs | 13.3493 KOps/s | 16.0926 KOps/s | $\textbf{\color{#d91a1a}-17.05\%}$ |
| test_reshape_pytree | 49.6243μs | 38.4782μs | 25.9887 KOps/s | 25.7622 KOps/s | $\color{#35bf28}+0.88\%$ |
| test_reshape_td | 92.5585μs | 66.4194μs | 15.0558 KOps/s | 16.1451 KOps/s | $\textbf{\color{#d91a1a}-6.75\%}$ |
| test_view_pytree | 56.8453μs | 35.3719μs | 28.2710 KOps/s | 28.1142 KOps/s | $\color{#35bf28}+0.56\%$ |
| test_view_td | 12.2931μs | 8.9310μs | 111.9692 KOps/s | 101.9729 KOps/s | $\textbf{\color{#35bf28}+9.80\%}$ |
| test_unbind_pytree | 66.5284μs | 43.9337μs | 22.7616 KOps/s | 23.7802 KOps/s | $\color{#d91a1a}-4.28\%$ |
| test_unbind_td | 0.3461ms | 0.2153ms | 4.6439 KOps/s | 4.9311 KOps/s | $\textbf{\color{#d91a1a}-5.82\%}$ |
| test_split_pytree | 98.4437μs | 56.2513μs | 17.7774 KOps/s | 20.7696 KOps/s | $\textbf{\color{#d91a1a}-14.41\%}$ |
| test_split_td | 0.2361ms | 0.1648ms | 6.0661 KOps/s | 6.2999 KOps/s | $\color{#d91a1a}-3.71\%$ |
| test_add_pytree | 70.4255μs | 63.2403μs | 15.8127 KOps/s | 15.7246 KOps/s | $\color{#35bf28}+0.56\%$ |
| test_add_td | 0.1833ms | 0.1113ms | 8.9843 KOps/s | 9.5092 KOps/s | $\textbf{\color{#d91a1a}-5.52\%}$ |
| test_distributed | 0.1625ms | 0.1625ms | 6.1538 KOps/s | 5.6721 KOps/s | $\textbf{\color{#35bf28}+8.49\%}$ |
| test_tdmodule | 2.1849ms | 39.0934μs | 25.5798 KOps/s | 27.3572 KOps/s | $\textbf{\color{#d91a1a}-6.50\%}$ |
| test_tdmodule_dispatch | 4.5261ms | 82.1722μs | 12.1696 KOps/s | 12.4297 KOps/s | $\color{#d91a1a}-2.09\%$ |
| test_tdseq | 2.3490ms | 49.0917μs | 20.3700 KOps/s | 20.7986 KOps/s | $\color{#d91a1a}-2.06\%$ |
| test_tdseq_dispatch | 6.1115ms | 93.5635μs | 10.6879 KOps/s | 10.4861 KOps/s | $\color{#35bf28}+1.92\%$ |
| test_instantiation_functorch | 2.2201ms | 1.8937ms | 528.0589 Ops/s | 551.3513 Ops/s | $\color{#d91a1a}-4.22\%$ |
| test_instantiation_td | 2.0751ms | 1.4785ms | 676.3503 Ops/s | 695.7380 Ops/s | $\color{#d91a1a}-2.79\%$ |
| test_exec_functorch | 0.3449ms | 0.2834ms | 3.5292 KOps/s | 3.7548 KOps/s | $\textbf{\color{#d91a1a}-6.01\%}$ |
| test_exec_td | 0.5581ms | 0.4600ms | 2.1740 KOps/s | 2.2161 KOps/s | $\color{#d91a1a}-1.90\%$ |

@vmoens added the enhancement label Jun 13, 2023
@vmoens changed the title [WIP] Fix het lazy stack ops [BugFix] Fix het lazy stack ops Jun 14, 2023
@vmoens merged commit 4b676dc into main Jun 14, 2023
@vmoens deleted the fix_lazy_stack_ops branch June 14, 2023 08:32