Some performance optimizations and code cleanup #16302

Thorium · 2023-11-19T02:15:45Z

I did some code cleanup without significant changes, and managed to cut around 10% of unit-test-execution times in my laptop (which might be coincidence).

I cloned fsharp-repo and my unit-tests on main branch fails for
FSharp.Compiler.Scripting.UnitTests.InteractiveTests.Script with nuget package that yields out of order dependencies works correctly So I have to do this PR to see CI results.

I tried to not touch functionality of anything.
I tried to keep the test-files as they are, so all the possible performance gain is from real code.
This came to touch quite a big amount of files, so I don't care if you merge or not, maybe just copy&paste some meaningful bits, whatever: I use F# daily, so any speedup would help.

Here are the main changes:

Instead of doing 2 dictionary lookups if whateverDict.ContainsKey x then Some whateverDict.[x] else None this uses .TryContainsKey to do the same business with only one lookup.
When your nested for-loops are having list.Contains, rather convert the nested list first to a Set for faster searching. So instead of for x in xs do if List.contains x ys this does let ySet = ys |> Set.ofList; for x in xs do if Set.contains x ySet (meanwhile still applicable, reverted to separate PR due to memory concerns)
~~I also changed some small DU to a Struct, and Some int to ValueSome int~~ (reverted to avoid changing public api)
FSharpLint: Recommends to use isNull x over x = null because it's faster
FSharpLint: Recommends to use List.collect(fun x -> ...) over List.map(fun x -> ...) |> List.concat etc to avoid iterating lists multiple times.
FSharpLint: Recommends to use List.map myFunc over List.map(fun x -> myFunc x), and I'm not sure if it will cut some extra lambda from AST or not. For this reason it also prefers >> but I generally try to avoid too much point-free style.

I didn't include FSharpLint as tool because it gave also many false-positives like member-access setting to false: MyClass(x, MyMember = false), second parameter should not be replaced to false, because it's a named setter, not Boolean comparison.

Some ContainsKey double lookups to TryContainsKey Some nested list Contains to set Contains. Some FSharpLint rules also "dotnet fantomas . -r" but I bet it'll make merging harder.

Thorium · 2023-11-19T09:19:18Z

This is "how fast F# code executes" in current main:

This is "how fast F# code executes" in this PR:

However there are 3 failing tests because this PR made trimmed FSharp.Core.dll smaller:

CheckTrim : Test failed with unexpected net8.0  -  trimmed FSharp.Core.dll length:
Expected:
	287744 Bytes
Actual:
	287232 Bytes

I think I need to update the test rather than trying to make the file larger.

…h SelfContained_Trimming_Test and StaticLinkedFSharpCore_Trimming_Test the results once.

kerams · 2023-11-19T10:33:22Z

Nice changes. However, I would not take the CI duration difference at face value. They sometimes wildly fluctuate between runs.

src/Compiler/AbstractIL/ilreflect.fs

src/Compiler/Checking/NameResolution.fs

src/Compiler/Driver/CompilerDiagnostics.fs

src/Compiler/Optimize/Optimizer.fs

src/Compiler/Service/FSharpCheckerResults.fs

src/Compiler/SyntaxTree/XmlDoc.fs

tests/AheadOfTime/Trimming/check.ps1

Thorium · 2023-11-19T11:00:09Z

I find the more verbose case much easier to understand,

I know there is a tiny difference on readability, but the other one makes more complex and slower AST.

I would not take the CI duration difference at face value.

I know, but that was the only thing I could measure easily the build time, it was consistently faster also in my local machine.
Besides of people saving on build times on daily basis, the performance improvement is achieved by FSharp.Core working faster, not a build-time trick. And AOT saves half a megabyte.

vzarytovskii · 2023-11-19T11:02:58Z

Nice changes. However, I would not take the CI duration difference at face value. They sometimes wildly fluctuate between runs.

Yeah, CI runs aren't trustworthy, they use different machines and containers.

vzarytovskii · 2023-11-19T11:07:33Z

I find the more verbose case much easier to understand,

I know there is a tiny difference on readability, but the other one makes more complex and slower AST.

I am sure JIT will optimize branching. It sure can't be that significant.

Thorium · 2023-11-19T11:26:20Z

Modifying 80 files, I don't want to make separate PR per each one. And I don't want to cause more work for you. These are all little independent changes without public API, so you could do merge so that you just unstage/ignore the parts you don't like.

Thorium · 2023-11-19T11:27:42Z

I am sure JIT will optimize branching.

I like your optimism. :-)

vzarytovskii · 2023-11-19T11:30:42Z

Modifying 80 files, I don't want to make separate PR per each one. And I don't want to cause more work for you. These are all little independent changes without public API, so you could do merge so that you just unstage/ignore the parts you don't like.

It looks alright, the only singular part I'm slightly worried about is set allocation and GC. Want to hear others' opinions.

vzarytovskii · 2023-11-19T11:38:19Z

I am sure JIT will optimize branching.

I like your optimism. :-)

Hm, actually, JIT generates exactly the same code for both:

https://sharplab.io/#v2:EYLgtghglgdgNAGxAMwM5wC4gqsAfBAUwwAJkB7ckiE4EgYxIBMSBeAWACgSSplqSGABaEYggE4BXQlx6EEfWiQBkyhirUthoidNkl5qQmQgIj+rkVLAI4gXUYsO3AXjxLV6z0wsuuQA

Thorium · 2023-11-19T12:31:54Z

Ok I think I found even better way to do the check. Now reverted some changes. Interesting to see if AOT size changes again.

But in longer run I think the unit-test testing exact binary byte-size of dll is not sustainable solution. It'll just make the 30min build fail twice.

vzarytovskii · 2023-11-19T13:14:08Z

But in longer run I think the unit-test testing exact binary byte-size of dll is not sustainable solution. It'll just make the 30min build fail twice.

Its purpose is exactly that. To verify consciously that size change is desired.

Thorium · 2023-11-19T13:24:31Z

But in longer run I think the unit-test testing exact binary byte-size of dll is not sustainable solution. It'll just make the 30min build fail twice.

Its purpose is exactly that. To verify consciously that size change is desired.

Can it be run locally quickly? E.g. build Release without testing then look the size?

Thorium · 2023-11-19T13:42:29Z

It looks alright, the only singular part I'm slightly worried about is set allocation and GC. Want to hear others' opinions.

I'll split this PR to 2, to send the set allocations as a separate PR, so that can be explored more in detail.

…e PR comments Will open this as separate PR

En3Tho · 2023-11-19T13:57:18Z

@vzarytovskii Your sharplab points to x86 version of code. You have to choose x64 explicitly there. X86 code is generally worse

vzarytovskii · 2023-11-19T14:00:08Z

@vzarytovskii Your sharplab points to x86 version of code. You have to choose x64 explicitly there. X86 code is generally worse

Either it doesn't work or x64 yields the same thing

Thorium · 2023-11-19T14:54:41Z

@vzarytovskii As far as I can see all the issues were resolved and this would ready for merge.

tests/AheadOfTime/Trimming/check.ps1

vzarytovskii · 2023-11-20T11:42:52Z

I did some benchmarks (very rough ones), and the difference is within statistical error for the most part.

Both compilers are not ngen'd or crossgen'd, so difference in elapsed time will be even smaller in SDK.

Compiling compiler itself:
Before, elapsed, seconds: 26,0397-26,3572
After, elapsed, seconds: 25,3997-25,7576

Before, working set, mb: 3312
After, working set, mb: 3483

Before, GC, (Gen0, Gen1, Gen2), number of: 22, 13, 7
After, GC, (Gen0, Gen1, Gen2), number of: 23, 13, 7

So it's marginally faster, migth be GC-ing and allocating a bit more.

I am alright with merging it if folks are fine with trimming changes @0101 @T-Gro

src/FSharp.Core/Query.fs

Thorium · 2023-11-20T12:41:09Z

I did some benchmarks (very rough ones)

Was this with the List.contains -> Set.contains or without? As I already removed that from this PR as non-beneficial.

vzarytovskii · 2023-11-20T13:18:27Z

I did some benchmarks (very rough ones)

Was this with the List.contains -> Set.contains or without? As I already removed that from this PR as non-beneficial.

With just latest commit from the branch. So probably without. Not sure where do allocations come from, they might as well come from the differences in the code we compile (less lambda lifting).

Thorium · 2023-11-20T13:40:00Z

Hm, actually, JIT generates exactly the same code for both:

Off-topic but look at this:
https://sharplab.io/#v2:EYLgtghglgdgNAGxAMwM5wC4gqsAfBAUwwAJkB7ckiE4EgXgFgAoEkqZakjAC0JjIQEqQizaEEHWtz4DkQkWJISR3AE4BXUaxIsipYBDVc6THTHIYAFDTx5aASiXPmQA

src/Compiler/AbstractIL/ilwritepdb.fs

src/Compiler/Optimize/InnerLambdasToTopLevelFuncs.fs

src/Compiler/Checking/MethodOverrides.fs

src/Compiler/Checking/QuotationTranslator.fs

smoothdeveloper · 2023-11-20T22:00:17Z

Thinking of infrastructure regarding the "compiler doesn't generate performant IL":

putting the sample code under a test with IL baselines, under a "optimization" folder
the test would have comments in a certain format (some json?) highlighting version of F# which generates imperfect/faulty IL, and the version it got fixed (whenever this occur)

Then we can track tests added like this to tickets in a csv fail which is maintained in the repository, and helps with the process of having a "Tracking" meta issue about all codegen / optimization related issue, and generating some release notes about code generation.

I can give this a look to sketch something, taking the comments and issues opened about those here, and maybe in the issue tracker, and add to the IL baseline tests and list the tests in a csv file to make a PR.

psfinaki

I think in its current shape the changes are beneficial and benign. Up for merging.

T-Gro · 2023-11-21T09:54:13Z

I am also fine with merging, approved + marked my comments as resolved.

vzarytovskii · 2023-11-22T17:17:34Z

Sigh...fantomas changes from latest Fantomas update...it's going now to affect majority of PRs :(

vzarytovskii · 2023-11-22T17:19:23Z

@nojaf FYI - rather small example of what's usually happening when a lot of small formatting changes introduced.

# Conflicts: # src/Compiler/CodeGen/IlxGen.fs # src/Compiler/Driver/CompilerDiagnostics.fs # src/Compiler/Driver/GraphChecking/TrieMapping.fs # src/Compiler/Interactive/fsi.fs # src/Compiler/Service/ServiceParamInfoLocations.fs # src/Compiler/SyntaxTree/SyntaxTreeOps.fs # src/Compiler/Utilities/sformat.fs # src/FSharp.Core/QueryExtensions.fs # src/FSharp.Core/seq.fs

vzarytovskii · 2023-11-22T17:29:15Z

@Thorium thanks for taking care of it. It was rather fast. Did you do manual resolution or with something more "smart"?

Thorium · 2023-11-22T17:31:14Z

Manual, I know what are the changes and I see what Fantomas did, and there were actually not too many conflicts. Just had to force because I didn't want to deal with GitHub partial successful automation. I recommend Kdiff3 as mergetool.

nojaf · 2023-11-23T09:32:34Z

@nojaf FYI - rather small example of what's usually happening when a lot of small formatting changes introduced.

Well, this was bound to happen for all open PRs. But again, I do not perceive this as a blocker to not update Fantomas. There will never be a good time when there are no open PRs.

Thorium requested a review from a team as a code owner November 19, 2023 02:15

code optimisations and cleanup:

839886c

Some ContainsKey double lookups to TryContainsKey Some nested list Contains to set Contains. Some FSharpLint rules also "dotnet fantomas . -r" but I bet it'll make merging harder.

Thorium force-pushed the cleanup branch from 131fcab to 839886c Compare November 19, 2023 08:03

Thorium added 2 commits November 19, 2023 09:38

Made FSharp.Core.dll expectation tighter. It would be nice to get bot…

dd3c211

…h SelfContained_Trimming_Test and StaticLinkedFSharpCore_Trimming_Test the results once.

As expected, StaticLinkedFSharpCore_Trimming_Test smaller too...

d363de6

vzarytovskii requested changes Nov 19, 2023

View reviewed changes

vzarytovskii feedback implemented

b279629

Reverted Sets to Lists due to speculation of memory consumption in th…

6ecd140

…e PR comments Will open this as separate PR

Thorium requested a review from vzarytovskii November 19, 2023 15:00

T-Gro reviewed Nov 20, 2023

View reviewed changes

tests/AheadOfTime/Trimming/check.ps1 Show resolved Hide resolved

T-Gro reviewed Nov 20, 2023

View reviewed changes

tests/AheadOfTime/Trimming/check.ps1 Show resolved Hide resolved

vzarytovskii reviewed Nov 20, 2023

View reviewed changes

src/FSharp.Core/Query.fs Show resolved Hide resolved

brianrourkeboll reviewed Nov 20, 2023

View reviewed changes

src/Compiler/AbstractIL/ilwritepdb.fs Show resolved Hide resolved

src/Compiler/Optimize/InnerLambdasToTopLevelFuncs.fs Show resolved Hide resolved

psfinaki reviewed Nov 20, 2023

View reviewed changes

src/Compiler/Checking/MethodOverrides.fs Show resolved Hide resolved

psfinaki reviewed Nov 20, 2023

View reviewed changes

src/Compiler/Checking/QuotationTranslator.fs Show resolved Hide resolved

psfinaki approved these changes Nov 20, 2023

View reviewed changes

T-Gro approved these changes Nov 21, 2023

View reviewed changes

smoothdeveloper mentioned this pull request Nov 22, 2023

putting a bunch of opened issues about IL optimisation under IL baseline tests #16325

Draft

vzarytovskii approved these changes Nov 22, 2023

View reviewed changes

Thorium force-pushed the cleanup branch from 2c5f594 to c829eef Compare November 22, 2023 17:27

psfinaki merged commit 2a25184 into dotnet:main Nov 22, 2023

allisonchou mentioned this pull request Dec 15, 2023

[Automated] PRs inserted in VS build main-34415.28 #16445

Closed

Some performance optimizations and code cleanup #16302

Some performance optimizations and code cleanup #16302

Uh oh!

Conversation

Thorium commented Nov 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Thorium commented Nov 19, 2023

Uh oh!

kerams commented Nov 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Thorium commented Nov 19, 2023

Uh oh!

vzarytovskii commented Nov 19, 2023

Uh oh!

vzarytovskii commented Nov 19, 2023

Uh oh!

Thorium commented Nov 19, 2023

Uh oh!

Thorium commented Nov 19, 2023

Uh oh!

vzarytovskii commented Nov 19, 2023

Uh oh!

vzarytovskii commented Nov 19, 2023

Uh oh!

Thorium commented Nov 19, 2023

Uh oh!

vzarytovskii commented Nov 19, 2023

Uh oh!

Thorium commented Nov 19, 2023

Uh oh!

Thorium commented Nov 19, 2023

Uh oh!

En3Tho commented Nov 19, 2023

Uh oh!

vzarytovskii commented Nov 19, 2023

Uh oh!

Thorium commented Nov 19, 2023

Uh oh!

Uh oh!

Uh oh!

vzarytovskii commented Nov 20, 2023

Uh oh!

Uh oh!

Thorium commented Nov 20, 2023

Uh oh!

vzarytovskii commented Nov 20, 2023

Uh oh!

Thorium commented Nov 20, 2023

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

smoothdeveloper commented Nov 20, 2023

Uh oh!

psfinaki left a comment

Choose a reason for hiding this comment

Uh oh!

T-Gro commented Nov 21, 2023

Uh oh!

vzarytovskii commented Nov 22, 2023

Uh oh!

vzarytovskii commented Nov 22, 2023

Uh oh!

vzarytovskii commented Nov 22, 2023

Uh oh!

Thorium commented Nov 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nojaf commented Nov 23, 2023

Thorium commented Nov 19, 2023 •

edited

Loading

kerams commented Nov 19, 2023 •

edited

Loading

Thorium commented Nov 22, 2023 •

edited

Loading