
Conversation

@nathanmarks
Contributor

@josephsavona this was briefly discussed in an old thread, lmk your thoughts on the approach. I have some fixes ready as well but wanted to get this test case in first... there are some things I don't love about this approach, but at the end of the day it's just a tool for the test suite rather than something for end users, so even if it only does a 70% good-enough job, that's fine.

refresher on the problem

When we generate coverage reports with jest (istanbul), our coverage ends up completely out of whack because the generated AST is missing a ton of (let's call them "important") source locations after the compiler pipeline has run.

At the moment, to get around this, we've been doing something a bit unorthodox: also running our test suite with istanbul instrumenting the code before the compiler -- which results in its own set of issues (e.g. things being memoized differently, or the compiler completely bailing out on the instrumented code).

Before getting into fixes, I wanted to set up a test case to start chipping away at, as you had recommended.

how it works

The validator basically:

  1. Traverses the original AST and collects the source locations for some "important" node types
    • (excludes useMemo/useCallback calls, as those are stripped out by the compiler)
  2. Traverses the generated AST and looks for nodes with matching source locations.
  3. Generates errors for source locations that have no matching node in the generated AST (see the sketch below)
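
To make the flow concrete, here is a rough sketch of that three-step validation, assuming @babel/traverse and @babel/types. The helper names and the specific set of "important" node types below are illustrative assumptions, not the actual implementation in this PR:

```ts
import traverse from '@babel/traverse';
import * as t from '@babel/types';

// Illustrative only: the real validator picks its own set of "important"
// node types and also skips useMemo/useCallback calls (omitted here).
const IMPORTANT_TYPES = new Set(['ReturnStatement', 'IfStatement', 'CallExpression']);

function locKey(loc: t.SourceLocation): string {
  return `${loc.start.line}:${loc.start.column}-${loc.end.line}:${loc.end.column}`;
}

// Step 1: walk the original AST and record locations of "important" nodes.
function collectImportantLocations(ast: t.File): Map<string, string> {
  const locations = new Map<string, string>(); // locKey -> node type
  traverse(ast, {
    enter(path) {
      if (IMPORTANT_TYPES.has(path.node.type) && path.node.loc != null) {
        locations.set(locKey(path.node.loc), path.node.type);
      }
    },
  });
  return locations;
}

// Steps 2 and 3: walk the generated AST, record which locations survived,
// and report every original location that no generated node carries.
function validateLocations(original: t.File, generated: t.File): Array<string> {
  const expected = collectImportantLocations(original);
  const seen = new Set<string>();
  traverse(generated, {
    enter(path) {
      if (path.node.loc != null) {
        seen.add(locKey(path.node.loc));
      }
    },
  });
  const errors: Array<string> = [];
  for (const [key, type] of expected) {
    if (!seen.has(key)) {
      errors.push(`Missing ${type} source location at ${key} in generated AST`);
    }
  }
  return errors;
}
```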

caveats/drawbacks

There are some things that don't work super well with this approach. A more natural fit, I think, would be explicit assertions against an AST in a test file, since you can bake in all of the assumptions/nuance that are difficult to handle in a generic manner. However, this is maybe "good enough" for now.

  1. You have to be careful what you put into the test fixture. If you put in code that the compiler simply removes (e.g. an unused variable assignment), you're creating a failure case that's impossible to fix. I added a skip for useMemo/useCallback.
  2. "Important" locations must exactly match for validation to pass.
    • It might get tricky to make sure things are mapped correctly when a node type changes completely, e.g. when a block-statement arrow function body gets turned into an implicit return, with the body becoming just an expression/identifier (illustrated just after this list).
    • This could result in scenarios where more changes are needed to shuttle the locations through, since HIR doesn't have a 1:1 mapping to all the Babel nuances, even if some combination of other data might be good enough without being 100% accurate. That might be the right thing anyway, so we don't end up with edge cases carrying incorrect source locations.
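
To illustrate the node-type change from the first bullet (a hand-written example, not compiler output): once a block-bodied arrow function is collapsed to an implicit return, there is no BlockStatement or ReturnStatement left in the output to carry the original locations:

```ts
// Original source: the BlockStatement and ReturnStatement each carry a loc.
const double = (x: number) => {
  return x * 2;
};

// After a transform that collapses the body to an implicit return, only the
// expression node remains, so the BlockStatement/ReturnStatement locations
// have nothing to map onto unless they are shuttled through explicitly.
const doubleCollapsed = (x: number) => x * 2;
```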

@meta-cla meta-cla bot added the CLA Signed label Nov 11, 2025
if (t.isIdentifier(callee)) {
  return callee.name === 'useMemo' || callee.name === 'useCallback';
}
if (t.isMemberExpression(callee) && t.isIdentifier(callee.property)) {
let's check that the callee is 'React'
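
A sketch of what that check could look like, assuming the member-expression branch is meant to catch React.useMemo / React.useCallback (the exact shape of the real check may differ):

```ts
if (t.isMemberExpression(callee) && t.isIdentifier(callee.property)) {
  // Only treat React.useMemo / React.useCallback as memoization calls:
  // require the object to be the `React` identifier, as suggested above.
  return (
    t.isIdentifier(callee.object) &&
    callee.object.name === 'React' &&
    (callee.property.name === 'useMemo' || callee.property.name === 'useCallback')
  );
}
```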


// Use Babel's VISITOR_KEYS to traverse only actual node properties
const keys = t.VISITOR_KEYS[node.type as keyof typeof t.VISITOR_KEYS];
if (!keys) return;

if (keys == null) {
  return
}
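
For context, this is the kind of manual walk VISITOR_KEYS enables (a minimal sketch, not the PR's actual traversal code):

```ts
import * as t from '@babel/types';

// Minimal recursive walk driven by VISITOR_KEYS: for each node, only the
// properties Babel itself considers child-node slots are visited, which
// skips metadata such as `loc` or attached comments.
function walk(node: t.Node, visit: (node: t.Node) => void): void {
  visit(node);
  const keys = t.VISITOR_KEYS[node.type as keyof typeof t.VISITOR_KEYS];
  if (keys == null) {
    return;
  }
  for (const key of keys) {
    const child = (node as unknown as Record<string, unknown>)[key];
    const children = Array.isArray(child) ? child : [child];
    for (const item of children) {
      if (item != null && typeof (item as t.Node).type === 'string') {
        walk(item as t.Node, visit);
      }
    }
  }
}
```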

@josephsavona josephsavona left a comment
Member

This makes sense to me as something we can use internally to improve and test source map coverage. See a couple minor comments, but otherwise looks good. Thanks for the contribution!

@nathanmarks
Contributor Author

Cool, I'll fix those things then.

I have some follow-up PRs mostly ready to address a few of the lower-hanging-fruit gaps; a couple of them are a little hairy, but we can dive into that later.

@josephsavona josephsavona marked this pull request as ready for review November 13, 2025 03:02
@josephsavona josephsavona merged commit 3a495ae into facebook:main Nov 13, 2025
21 checks passed
github-actions bot pushed a commit that referenced this pull request Nov 13, 2025
DiffTrain build for 3a495ae
github-actions bot pushed a commit that referenced this pull request Nov 13, 2025
DiffTrain build for 3a495ae
manNomi pushed a commit to manNomi/react that referenced this pull request Nov 15, 2025
github-actions bot pushed a commit to code/lib-react that referenced this pull request Nov 16, 2025
DiffTrain build for 3a495ae
github-actions bot pushed a commit to code/lib-react that referenced this pull request Nov 16, 2025
DiffTrain build for 3a495ae