Pull requests: pytorch/FBGEMM
Save some binary size
cla signed
fb-exported
meta-exported
#4900
opened Sep 19, 2025 by
cthi
Small changes to improve blackwell_fmha_test.py
cla signed
fb-exported
meta-exported
#4896
opened Sep 18, 2025 by
henrylhtsang
Migrate GenAI gqa attn splitk kernels to FBGEMM_LAUNCH_KERNEL, pt 1
cla signed
fb-exported
meta-exported
#4894
opened Sep 18, 2025 by
q10
upgrade cutlass to 4.2.0 for fbcode
cla signed
fb-exported
meta-exported
#4893
opened Sep 18, 2025 by
henrylhtsang
Optimize wgrad CUTLASS grouped gemm
cla signed
fb-exported
meta-exported
#4891
opened Sep 18, 2025 by
jiawenliu64
Support multiple total-k and total-m in quantize bench
cla signed
fb-exported
meta-exported
#4890
opened Sep 18, 2025 by
jiawenliu64
MI350X FP8 triton patch
cla signed
fb-exported
meta-exported
#4889
opened Sep 17, 2025 by
JChunX
fix the scaled input issue
cla signed
fb-exported
meta-exported
#4884
opened Sep 16, 2025 by
pyjhzwh
- Clean torch.check
cla signed
fb-exported
meta-exported
#4871
opened Sep 12, 2025 by
flaviotruzzi
dequantize_fp8_cache_kernel: Move D=128 device-side-assertion check to host
cla signed
fb-exported
meta-exported
#4869
opened Sep 12, 2025 by
ColinPeppler
symmetric quantization to FBGEMM prefill token-wise FP8 (fixed)
cla signed
fb-exported
meta-exported
#4868
opened Sep 12, 2025 by
ColinPeppler
- Reland D75563906
ci-no-td
cla signed
fb-exported
meta-exported
#4865
opened Sep 11, 2025 by
flaviotruzzi
Migrate GenAI quantize kernels to FBGEMM_LAUNCH_KERNEL, pt 4
cla signed
fb-exported
#4863
opened Sep 11, 2025 by
q10
Add cutlass decode kernel to TritonBench
cla signed
fb-exported
#4853
opened Sep 10, 2025 by
Aya-ZIbra
remove std out in EEG estimator
cla signed
fb-exported
#4832
opened Sep 6, 2025 by
YanXiong-Meta
convert batch size to float before torch.std in params reporter
cla signed
fb-exported
#4828
opened Sep 5, 2025 by
YanXiong-Meta
Migrate backward warp kernel arguments to use PTA_B
cla signed
fb-exported
#4825
opened Sep 5, 2025 by
q10
Migrate TBE UVM cache kernels to FBGEMM_LAUNCH_KERNEL
cla signed
fb-exported
#4817
opened Sep 4, 2025 by
q10
remove cpu check and hardcoded row alignment to 8
cla signed
fb-exported
#4781
opened Aug 27, 2025 by
chenyuzhcy