Commit 2a4a5f5
Add NEON implementation of FloatOrHalfToFusedNBitRowwiseQuantizedSBHalf (#5115)
Summary:
X-link: facebookresearch/FBGEMM#2121
Adds a NEON translation of FloatOrHalfToFusedNBitRowwiseQuantizedSBHalf, which is used by Ads.
Throughput improves by roughly 5x to 26x, depending on bit rate and shape:
Before:
bit_rate, rows, cols, elems_per_usec, GB/Sec
2, 100, 16, 211.26, 0.85
2, 100, 64, 210.96, 0.84
2, 100, 128, 204.26, 0.82
2, 100, 256, 200.47, 0.80
2, 100, 512, 194.19, 0.78
2, 100, 1024, 190.98, 0.76
2, 100, 2048, 186.85, 0.75
2, 120, 16, 206.88, 0.83
2, 120, 64, 211.64, 0.85
2, 120, 128, 203.97, 0.82
2, 120, 256, 200.22, 0.80
2, 120, 512, 194.97, 0.78
2, 120, 1024, 191.76, 0.77
2, 120, 2048, 187.45, 0.75
2, 1000, 16, 205.10, 0.82
2, 1000, 64, 214.15, 0.86
2, 1000, 128, 205.43, 0.82
2, 1000, 256, 200.34, 0.80
2, 1000, 512, 196.62, 0.79
2, 1000, 1024, 194.64, 0.78
2, 1000, 2048, 187.54, 0.75
4, 100, 16, 197.97, 0.79
4, 100, 64, 200.02, 0.80
4, 100, 128, 191.06, 0.76
4, 100, 256, 186.58, 0.75
4, 100, 512, 180.76, 0.72
4, 100, 1024, 176.65, 0.71
4, 100, 2048, 175.00, 0.70
4, 120, 16, 198.93, 0.80
4, 120, 64, 201.74, 0.81
4, 120, 128, 190.95, 0.76
4, 120, 256, 186.79, 0.75
4, 120, 512, 181.32, 0.73
4, 120, 1024, 177.54, 0.71
4, 120, 2048, 174.69, 0.70
4, 1000, 16, 194.63, 0.78
4, 1000, 64, 201.64, 0.81
4, 1000, 128, 191.78, 0.77
4, 1000, 256, 186.87, 0.75
4, 1000, 512, 182.91, 0.73
4, 1000, 1024, 180.66, 0.72
4, 1000, 2048, 175.04, 0.70
8, 100, 16, 171.01, 0.68
8, 100, 64, 177.53, 0.71
8, 100, 128, 168.92, 0.68
8, 100, 256, 165.23, 0.66
8, 100, 512, 162.25, 0.65
8, 100, 1024, 158.87, 0.64
8, 100, 2048, 155.39, 0.62
8, 120, 16, 173.77, 0.70
8, 120, 64, 178.34, 0.71
8, 120, 128, 168.66, 0.67
8, 120, 256, 165.60, 0.66
8, 120, 512, 162.30, 0.65
8, 120, 1024, 159.38, 0.64
8, 120, 2048, 156.17, 0.62
8, 1000, 16, 171.34, 0.69
8, 1000, 64, 178.96, 0.72
8, 1000, 128, 169.71, 0.68
8, 1000, 256, 165.62, 0.66
8, 1000, 512, 162.98, 0.65
8, 1000, 1024, 161.59, 0.65
8, 1000, 2048, 157.16, 0.63
After:
bit_rate, rows, cols, elems_per_usec, GB/Sec
2, 100, 16, 1006.83, 4.03
2, 100, 64, 1542.11, 6.17
2, 100, 128, 1882.99, 7.53
2, 100, 256, 2063.71, 8.25
2, 100, 512, 2232.29, 8.93
2, 100, 1024, 2298.69, 9.19
2, 100, 2048, 2333.73, 9.33
2, 120, 16, 1016.40, 4.07
2, 120, 64, 1524.36, 6.10
2, 120, 128, 1853.40, 7.41
2, 120, 256, 2158.92, 8.64
2, 120, 512, 2321.61, 9.29
2, 120, 1024, 2353.80, 9.42
2, 120, 2048, 2332.84, 9.33
2, 1000, 16, 1129.08, 4.52
2, 1000, 64, 1606.46, 6.43
2, 1000, 128, 2095.33, 8.38
2, 1000, 256, 2470.88, 9.88
2, 1000, 512, 2746.67, 10.99
2, 1000, 1024, 2882.32, 11.53
2, 1000, 2048, 2447.96, 9.79
4, 100, 16, 999.05, 4.00
4, 100, 64, 1666.00, 6.66
4, 100, 128, 2062.08, 8.25
4, 100, 256, 2226.33, 8.91
4, 100, 512, 2481.11, 9.92
4, 100, 1024, 2717.50, 10.87
4, 100, 2048, 2656.00, 10.62
4, 120, 16, 1056.31, 4.23
4, 120, 64, 1651.95, 6.61
4, 120, 128, 2058.65, 8.23
4, 120, 256, 2339.64, 9.36
4, 120, 512, 2570.03, 10.28
4, 120, 1024, 2788.24, 11.15
4, 120, 2048, 2701.20, 10.80
4, 1000, 16, 1184.28, 4.74
4, 1000, 64, 1765.47, 7.06
4, 1000, 128, 2348.17, 9.39
4, 1000, 256, 2852.72, 11.41
4, 1000, 512, 3249.46, 13.00
4, 1000, 1024, 3418.46, 13.67
4, 1000, 2048, 2841.77, 11.37
8, 100, 16, 1176.35, 4.71
8, 100, 64, 1902.76, 7.61
8, 100, 128, 2196.23, 8.78
8, 100, 256, 2596.55, 10.39
8, 100, 512, 2814.30, 11.26
8, 100, 1024, 3175.49, 12.70
8, 100, 2048, 3334.41, 13.34
8, 120, 16, 1213.55, 4.85
8, 120, 64, 1806.19, 7.22
8, 120, 128, 2390.64, 9.56
8, 120, 256, 2736.11, 10.94
8, 120, 512, 3015.86, 12.06
8, 120, 1024, 3332.53, 13.33
8, 120, 2048, 3319.50, 13.28
8, 1000, 16, 1362.12, 5.45
8, 1000, 64, 2029.25, 8.12
8, 1000, 128, 2759.50, 11.04
8, 1000, 256, 3532.71, 14.13
8, 1000, 512, 4014.48, 16.06
8, 1000, 1024, 4240.49, 16.96
8, 1000, 2048, 3440.59, 13.76
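To compare the two tables row by row, the speedup is the ratio of the After and Before elems_per_usec values for the same shape. The sketch below copies three rows from the tables; the `Row` struct and function names are illustrative and not part of FBGEMM.

```cpp
#include <vector>

// One benchmark shape with its throughput before and after, in elems_per_usec.
struct Row {
  int bit_rate;
  int rows;
  int cols;
  double before;
  double after;
};

// Speedup of the NEON path over the previous path for one shape.
double speedup(const Row& r) { return r.after / r.before; }

// A few rows copied from the Before and After tables (not the full set).
std::vector<Row> sample_rows() {
  return {
      {2, 100, 16, 211.26, 1006.83},
      {4, 1000, 1024, 180.66, 3418.46},
      {8, 1000, 1024, 161.59, 4240.49},
  };
}
```

For these three rows the ratio is about 4.8x, 18.9x, and 26.2x respectively; the gain grows with column count and is smallest at 16 columns.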
Differential Revision: D867741721
Parent: bc6d968
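For readers unfamiliar with the operation, here is a simplified scalar sketch of N-bit rowwise quantization, assuming the usual per-row min/max scheme. It is not the NEON code from this commit and not the FBGEMM layout: the fused format appends an fp16 scale and bias to each packed row, while this sketch keeps them as 32-bit floats in a hypothetical `QuantizedRow` struct so that it compiles with standard C++17 alone.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for one quantized row; not an FBGEMM type.
struct QuantizedRow {
  std::vector<std::uint8_t> packed;  // bit_rate-bit codes, lowest bits first
  float scale;  // kept as float here; the fused format stores fp16
  float bias;   // row minimum
};

// Quantize one row to bit_rate bits per element (bit_rate is 2, 4, or 8).
QuantizedRow quantize_row(const std::vector<float>& row, int bit_rate) {
  const std::size_t elems_per_byte = 8 / static_cast<std::size_t>(bit_rate);
  const int max_code = (1 << bit_rate) - 1;

  float lo = 0.0f;
  float hi = 0.0f;
  if (!row.empty()) {
    const auto mm = std::minmax_element(row.begin(), row.end());
    lo = *mm.first;
    hi = *mm.second;
  }
  // A constant row has zero range; fall back to scale 1 to avoid dividing by zero.
  float scale = (hi - lo) / static_cast<float>(max_code);
  if (scale == 0.0f) scale = 1.0f;

  QuantizedRow out;
  out.scale = scale;
  out.bias = lo;
  out.packed.assign((row.size() + elems_per_byte - 1) / elems_per_byte, 0);
  for (std::size_t col = 0; col < row.size(); ++col) {
    // Round to the nearest code and clamp into [0, max_code].
    long code = std::lrintf((row[col] - lo) / scale);
    code = std::clamp(code, 0L, static_cast<long>(max_code));
    const int shift = static_cast<int>(col % elems_per_byte) * bit_rate;
    out.packed[col / elems_per_byte] |= static_cast<std::uint8_t>(code << shift);
  }
  return out;
}

// Recover an approximate value for element col.
float dequantize(const QuantizedRow& q, std::size_t col, int bit_rate) {
  const std::size_t elems_per_byte = 8 / static_cast<std::size_t>(bit_rate);
  const int shift = static_cast<int>(col % elems_per_byte) * bit_rate;
  const int code = (q.packed[col / elems_per_byte] >> shift) & ((1 << bit_rate) - 1);
  return q.bias + q.scale * static_cast<float>(code);
}
```

With `bit_rate` 2, four elements share one byte, so a row of 16 floats (64 bytes) packs into 4 bytes plus the scale and bias. A vectorized version would apply the same min/max, scale, round, and pack steps to several elements at a time.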
3 files changed under include/fbgemm and src: 285 additions, 4 deletions.
Diff contents are not shown; the hunks touch these line ranges:
File 1: +7 (new lines 39-45)
File 2: +22 (new lines 639-658 and 683-684)
File 3: +256 -4 (old lines 98-99 replaced by new lines 98-103; old lines 144-145 replaced by new lines 148-149; new lines 264-493 and 609-626 added)