Pull requests: pytorch/FBGEMM
Update to batch processing delta update for kvzch and fix table idx issue.
cla signed
fb-exported
meta-exported
#5148
opened Nov 18, 2025 by
EddyLXJ
shortcut for merge_pooled_embedding
cla signed
fb-exported
meta-exported
#5147
opened Nov 18, 2025 by
garroud
Add support for rowwise_adagrad_with_counter on CPU
cla signed
fb-exported
meta-exported
#5146
opened Nov 18, 2025 by
gchalump
Adding support for top_k=8 to index_shuffling moe kernel
cla signed
fb-exported
meta-exported
#5142
opened Nov 17, 2025 by
metastableB
cutlass-fa3 new mask interface
cla signed
fb-exported
meta-exported
#5141
opened Nov 17, 2025 by
arsatis
Bump setuptools from 75.1.0 to 78.1.1 in /fbgemm_gpu
cla signed
dependencies
Pull requests that update a dependency file
python
Pull requests that update python code
#5139
opened Nov 16, 2025 by
dependabot
bot
Updated asmjit & adapted to its latest changes
cla signed
#5137
opened Nov 15, 2025 by
kobalicek
Slightly improve requantize_ AVX2 code performance
cla signed
fb-exported
meta-exported
#5135
opened Nov 14, 2025 by
mcfi
Vectorize requantize_ for Arm64 with NEON intrinsics
cla signed
fb-exported
meta-exported
#5130
opened Nov 14, 2025 by
mcfi
Allow specifying the use of persistent kernel
cla signed
fb-exported
meta-exported
#5129
opened Nov 14, 2025 by
peiying779
Extend raw_id_tracker to track ShardedManagedCollisionEmbeddingCollection
cla signed
fb-exported
meta-exported
#5128
opened Nov 14, 2025 by
FriedCosey
Enable arm64 convolution for fbgemm through the reference convolution APIs
cla signed
fb-exported
meta-exported
#5126
opened Nov 13, 2025 by
mcfi
minimize gpuAtomicAdd overhead in bounds_check_indices_kernel_v2
cla signed
#5124
opened Nov 13, 2025 by
liligwu
Backward optimization for group_index_select_or_add_2d_kernel
cla signed
#5123
opened Nov 13, 2025 by
shbiswas834
backward performance optimization for MI350 (#4925)
cla signed
fb-exported
meta-exported
module: rocm
#5121
opened Nov 12, 2025 by
spcyppt
embedding forward optimization for rocm
cla signed
module: rocm
#5120
opened Nov 12, 2025 by
JaxChen29
add group_index_select_or_add_2d_kernel optimizations
cla signed
module: rocm
#5119
opened Nov 12, 2025 by
shbiswas834
Draft
Add NEON implementation of FloatOrHalfToFusedNBitRowwiseQuantizedSBHalf
cla signed
fb-exported
meta-exported
#5115
opened Nov 11, 2025 by
Nicoshev
Add support of 64 headDim
cla signed
fb-exported
meta-exported
#5114
opened Nov 11, 2025 by
Aya-ZIbra
accelerate permute_1D_data_kernel
cla signed
fb-exported
meta-exported
#5110
opened Nov 10, 2025 by
royren622
Fix NAN for the prediction (#2096)
cla signed
fb-exported
meta-exported
#5088
opened Nov 4, 2025 by
quhang
Several kernel optimizations from the aiter team
cla signed
module: rocm
#5074
opened Oct 31, 2025 by
Bernard-Liu
Draft