@gggekov
Collaborator

@gggekov gggekov commented Aug 19, 2025

We were annotating ADD/SUB with a shared observer, which gave both inputs the same quantisation parameters even when their ranges differed (for example, adding a positive-only tensor to a tensor with both positive and negative values). As a result, the quantisation parameters were suboptimal. This change annotates the operator with independent observers and changes how we rescale the two inputs to bring them to the same range. It also adds a unit test of a ResNet model and lowers the number of channels in a few unit tests, so that the Total SRAM Used on the Ethos-U55 stays below 2MB and fits within the memory limit of the Corstone-300.

Fixes #12959

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218
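
To illustrate why a shared observer hurts here, the sketch below compares the int8 resolution each input gets under shared versus independent quantisation parameters. It assumes plain min/max affine quantisation; the helper names (`affine_qparams`, `fake_quantize`) and the example ranges are hypothetical and are not taken from the Arm quantizer.

```python
def affine_qparams(lo, hi, qmin=-128, qmax=127):
    """Min/max affine int8 parameters (an assumed observer model)."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must contain zero
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def fake_quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Quantise x to int8 and map it back to a float."""
    q = max(qmin, min(qmax, round(x / scale) + zero_point))
    return (q - zero_point) * scale


# Hypothetical input ranges: A is positive-only, B is signed and much wider.
range_a = (0.0, 1.0)
range_b = (-8.0, 8.0)

# Shared observer: one set of parameters covering both ranges.
shared = affine_qparams(min(range_a[0], range_b[0]), max(range_a[1], range_b[1]))
# Independent observers: each input gets parameters fitted to its own range.
independent_a = affine_qparams(*range_a)

x = 0.3  # a value from tensor A
err_shared = abs(fake_quantize(x, *shared) - x)
err_independent = abs(fake_quantize(x, *independent_a) - x)
print(f"shared scale={shared[0]:.5f}, error on A={err_shared:.5f}")
print(f"independent scale={independent_a[0]:.5f}, error on A={err_independent:.5f}")
```

With a shared observer the narrow tensor only uses a small slice of the int8 range, so its rounding error is set by the wider tensor's scale.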

@gggekov gggekov requested a review from digantdesai as a code owner August 19, 2025 14:53
@pytorch-bot

pytorch-bot bot commented Aug 19, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13516

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 2 New Failures, 1 Unrelated Failure

As of commit 87ac448 with merge base 71a7806 (image):

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the `CLA Signed` label Aug 19, 2025
@gggekov gggekov added the `partner: arm`, `ciflow/trunk` and `topic: not user facing` labels Aug 19, 2025
@zingo zingo added the `release notes: arm` label and removed the `topic: not user facing` label Aug 19, 2025
@facebook-github-bot
Contributor

@digantdesai has imported this pull request. If you are a Meta employee, you can view this in D80562110.

  # pyre-ignore
  tqutils.insert_rescale_op_to_int8(
-     tosa_graph, add_output, scale_back, node, self.tosa_spec
+     tosa_graph, add_output, scale_back, node, False, self.tosa_spec
Contributor

@digantdesai digantdesai Aug 20, 2025

Readability nit

Suggested change
- tosa_graph, add_output, scale_back, node, False, self.tosa_spec
+ tosa_graph, add_output, scale_back, node, compute_rescale=False, self.tosa_spec

Collaborator Author

Will do, thanks. We have a long weekend in the UK, will be back on Tuesday.

Collaborator Author

Now fixed.
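
One caveat on the suggestion in this thread: taken literally, `compute_rescale=False, self.tosa_spec` puts a positional argument after a keyword argument, which Python rejects at parse time. The sketch below checks that with `ast.parse` and shows one call shape that keeps the readability gain. The real signature of `insert_rescale_op_to_int8` is not shown in this thread, so `insert_rescale` is a hypothetical stand-in and `tosa_spec` as a keyword name is an assumption.

```python
import ast


def insert_rescale(graph, tensor, scale, node, compute_rescale, tosa_spec):
    """Hypothetical stand-in for the real helper; it only records its arguments."""
    return {"compute_rescale": compute_rescale, "tosa_spec": tosa_spec}


def is_valid_call(source):
    """Return True if the call expression parses as valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# Keyword argument followed by a positional one: a SyntaxError.
rejected = is_valid_call("f(a, b, compute_rescale=False, spec)")
# Passing every trailing argument by keyword parses fine.
accepted = is_valid_call("f(a, b, compute_rescale=False, tosa_spec=spec)")

# The same shape applied to the stand-in function.
result = insert_rescale(
    "graph", "add_output", 0.5, "node", compute_rescale=False, tosa_spec="spec"
)
```

Another option, if the parameter order can change, is to move the flag to the end of the signature so it can be passed by keyword after all positional arguments.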

}
fvp_sub2_xfails = {"rand_4D_2x2x4x4": "MLETORCH-517 : Multiple batches not supported"}

# Sub and tan - the tan has a really steep curve just before Pi/2 and a point of discontinuity at Pi/2
Contributor

❤️

from tosa.RoundingMode import RoundingMode # type: ignore


def insert_rescale_ops_to_int32_for_add_sub(
Contributor

Nit: why should the function name have the `_for_add_sub` suffix?

Collaborator Author

The `insert_rescale_ops_to_int32_for_add_sub` function is only called for the ADD and SUB ops, because these are the only operators where we use a common scale of 2 * max(scale_A, scale_B). With that common scale we can multiply the original scale by 1 << 20 without overflowing a 32-bit accumulator.

Contributor

I mean the function name shouldn't list its call sites :)

Collaborator Author

Fixed :)
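
The rescaling arithmetic described in this thread can be sketched in plain Python. This is not the ExecuTorch implementation: it assumes the common scale is 2 * max(scale_A, scale_B), takes the 20-bit shift from the comment above, sets all zero points to 0, and uses hypothetical function names.

```python
SHIFT = 20  # left shift taken from the "1 << 20" mentioned in the thread


def rescale_inputs_to_int32(q_a, scale_a, q_b, scale_b):
    """Bring two int8 values onto one shared int32 scale (zero points assumed 0)."""
    common = 2.0 * max(scale_a, scale_b)
    # Each ratio scale / common is at most 0.5, so |value| <= 128 * 0.5 * 2**20 = 2**26.
    a32 = round(q_a * (scale_a / common) * (1 << SHIFT))
    b32 = round(q_b * (scale_b / common) * (1 << SHIFT))
    return a32, b32, common / (1 << SHIFT)


def quantized_add(q_a, scale_a, q_b, scale_b, scale_out):
    """Add two int8 values with independent scales and requantise to int8."""
    a32, b32, acc_scale = rescale_inputs_to_int32(q_a, scale_a, q_b, scale_b)
    acc = a32 + b32  # |acc| <= 2**27, well inside a signed 32-bit accumulator
    assert -(1 << 31) <= acc < (1 << 31)
    q_out = round(acc * acc_scale / scale_out)
    return max(-128, min(127, q_out))


# 100 * 0.01 = 1.0 and -50 * 0.04 = -2.0, so the real-valued sum is -1.0.
q_sum = quantized_add(100, 0.01, -50, 0.04, scale_out=0.05)
print(q_sum, q_sum * 0.05)
```

The factor of 2 in the common scale is what leaves headroom: each rescaled input is at most half of the shifted range, so their sum cannot exceed it.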

@gggekov gggekov force-pushed the Meta_fusing_conv1d_relu_residual_add branch 6 times, most recently from 80e0d2b to 03d88e1 Compare August 29, 2025 15:12
We were annotating ADD/SUB with a shared observer, which gave both
inputs the same quantisation parameters even when their ranges
differed (for example, adding a positive-only tensor to a tensor
with both positive and negative values). As a result,
the quantisation parameters were suboptimal. This change annotates
the operator with independent observers and changes how we rescale
the two inputs to bring them to the same range. Added a unit test
of a ResNet model. Lowered the number of channels in a few unit tests
to keep the Total SRAM Used below 2MB on the Ethos-U55, so that they
fit within the memory limit of the Corstone-300.

Change-Id: I7adde636f901c9df6b779d946a157e66fd12e24e
@gggekov gggekov force-pushed the Meta_fusing_conv1d_relu_residual_add branch from 03d88e1 to c87751e Compare August 29, 2025 16:40
@zingo
Collaborator

zingo commented Sep 1, 2025

Rebased after a fix for some broken arm tests was merged

@zingo
Collaborator

zingo commented Sep 1, 2025

The test failures are unrelated.

@zingo
Collaborator

zingo commented Sep 1, 2025

@digantdesai I can't merge this. Is there an older version of this PR internally that is blocking it?

@facebook-github-bot
Contributor

@digantdesai has imported this pull request. If you are a Meta employee, you can view this in D80562110.

@zingo zingo merged commit 8ba92a9 into pytorch:main Sep 2, 2025
245 of 248 checks passed

Labels

`ciflow/trunk`
`CLA Signed` (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
`partner: arm` (For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm)
`release notes: arm` (Changes to the ARM backend delegate)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect Observer Sharing/Derivation at Conv-ReLU+ Residual with Arm Ethos Quantizer

4 participants