Skip to content

Conversation

@trylek
Copy link
Member

@trylek trylek commented Jun 29, 2022

This change modifies Crossgen2 to use the generic cycle detector
originally implemented for NativeAOT to trim infinite generic
expansion as observed in the LanguageExt public nuget package
and tracked by the issue

#66079

At this point I have only implemented a back-compat option to
opt out of the generic cycle detection in case it turns out to
cause problems for certain payloads. I haven't added the option
to set the cutoff level - in practice, LanguageExt compilation
still fails when the cutoff is set to anything higher than 0;
moreover for Crossgen2 skipping precompilation of certain methods
merely incurs slightly reduced performance, not runtime failures
as is the case with NativeAOT.

Thanks

Tomas

Fixes: #66079

/cc @dotnet/crossgen-contrib

@MichalStrehovsky
Copy link
Member

LanguageExt compilation still fails when the cutoff is set to anything higher than 0

You might need to add a callout to the detection in more than one place. If anything higher than 0 doesn't work, I suspect the problem is that we don't have a callout in all the places that need it. I would at least try to look where the cycle is forming if we have a higher cutoff because that part of crossgen2 might be reachable if user code has a different kind of cycle.

Copy link
Member

@davidwrighton davidwrighton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need cleaner error handling here.

@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jul 1, 2022
@trylek trylek force-pushed the Crossgen2GenericCycleDetector branch from d456e05 to 47788a4 Compare July 11, 2022 12:18
@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jul 11, 2022
@mangod9
Copy link
Member

mangod9 commented Nov 7, 2022

@trylek is this still being considered or should be closed?

@trylek
Copy link
Member Author

trylek commented Nov 7, 2022

It should be ultimately fixed but we can close it for now, I'll reopen this once we're done with the experiments and start working on .NET 8 features.

@trylek trylek closed this Nov 7, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2022
@trylek trylek mentioned this pull request May 4, 2023
46 tasks
@trylek trylek reopened this May 19, 2023
@trylek trylek force-pushed the Crossgen2GenericCycleDetector branch 2 times, most recently from b4f5253 to 8c77748 Compare May 19, 2023 22:56
@trylek
Copy link
Member Author

trylek commented May 19, 2023

I believe I have finally managed to make this work for the repro case described in #66079. The main trick is that the depth cutoff heuristic doesn't work well in the case of LanguageExt.Core - the problem with this assembly is that many generic types have lots of type parameters and what basically happens is a form of exponential explosion not necessarily recursively repeating a given generic type within a single type parameter. To cater for this I have introduced a second heuristic - breadth cutoff - that corresponds to the total number of occurrences of types marked as poisoned w.r.t. potentially forming cycles within a generic type that cause the dependency to be identified as potentially cycle-forming. With the default value of 4 for the depth cutoff I have measured sizes of the compiled LanguageExt.Core assembly (Windows x64 release) variating the breadth cutoff:

Breadth cutoff LanguageExt.Core Size
1 32 MB
2 49 MB
3 120 MB
4 592 MB

In light of these findings I set the default breadth cutoff to 2. After discussing the matter with Manish I have come to the conclusion that for now it's probably better to make this just an opt-in especially in light of the fact that the compilation takes longer due to the need to build the module tables of poisoned types. If at some point we decide to make this the default (possibly with an opt-out switch) there are several places where we might be able to optimize the algorithm - e.g. by parallelizing the initial poisoned types generation, by caching the results of IsDeepPossiblyCyclicInstantiation, by improving the quadratic algorithm for checking the depth cutoff and the like. I'm still working on validating the change in the lab but for now I believe that it should be feature complete, so to say.

@trylek
Copy link
Member Author

trylek commented May 19, 2023

One other aspect worth mentioning is that at this point the change is slightly hacky by manually bringing in bits of NativeAOT logic pertinent to the generic cycle detection along with bits of logging logic. I think that it would be useful to unify this between Crossgen2 and NativeAOT by either putting the relevant code in one of the existing common projects or by creating a new one shared between the two compilers.

@MichalStrehovsky
Copy link
Member

Is it possible to write a test where crossgen would previously fail and now works? We have some NativeAOT specific testing in smoketests for generic recursions that were killing the NAOT compiler, mostly extracted from the real world patterns.

@dotnet dotnet unlocked this conversation May 22, 2023
@trylek trylek force-pushed the Crossgen2GenericCycleDetector branch 2 times, most recently from 0040ad7 to 42fe681 Compare May 23, 2023 22:03
@trylek trylek force-pushed the Crossgen2GenericCycleDetector branch from 0660b5f to dfd968b Compare June 9, 2023 17:50
Copy link
Member

@davidwrighton davidwrighton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like the handling of the fixups for compiled methods to be tweaked a bit. Also, do we know what impact this has on the performance of crossgen?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dropping the dependencies of the fixup. I'm not very comfortable with that, and agree with @MichalStrehovsky here. Instead, could we detect the cycle when adding the fixup to _fixups, and if we do so, mark the method with whatever details we do, when then entire method can't be compiled.

@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jun 9, 2023
@trylek
Copy link
Member Author

trylek commented Jun 9, 2023

Thanks David for your feedback, I'll modify the fixup management based on your suggestion. The impact on Crossgen2 performance is quite detrimental, that's why it's just an opt-in for now, if we decided to turn it on by default, we would at the very least need to improve the perf quite a bit. For the LanguageExt assembly mentioned in the original issue, just the module analysis takes something like a minute on my devbox.

@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jun 9, 2023
@MichalStrehovsky
Copy link
Member

just the module analysis takes something like a minute on my devbox

Is that a debug build? From what I saw when enabling this, it usually takes single digits milliseconds to analyze an assembly.

@trylek
Copy link
Member Author

trylek commented Jun 9, 2023

Thanks David and Michal for your feedback. I have updated the PR along David's suggestion. For Michal's comment, I have instrumented CycleInfoHashtable to show time spent in the analysis and this is the result for the LanguageExt.Core assembly from the original issue repro:

Cycle analysis for LanguageExt.Core took 360382 msecs
Cycle analysis for System.Private.CoreLib took 45 msecs
Cycle analysis for LanguageExt.Core took 363084 msecs
Cycle analysis for LanguageExt.Core took 363232 msecs
Cycle analysis for LanguageExt.Core took 363388 msecs
Cycle analysis for LanguageExt.Core took 364608 msecs
Cycle analysis for LanguageExt.Core took 365848 msecs
Cycle analysis for LanguageExt.Core took 367601 msecs
Cycle analysis for LanguageExt.Core took 368732 msecs

Apparently one problem is that the internal Crossgen2 parallelism triggers initial analysis of the assembly on all the threads by virtue of the lock-free hashtable; I'm not sure if some form of locking would improve this. This is kind of the slowest of my devboxes but I doubt its Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz is a thousand times slower than Michal's box. I think one problem could be that the LanguageExt.Core assembly is a real large one, about 10 MB or on par with CoreLib, just much more complex in terms of heavy use of generic structs everywhere; please feel free to try it out in NativeAOT mode, I don't see how the Tarjan cycle detector could be so much faster there considering it basically just uses the ECMA / typesystem code that is shared between the two compilers.

@trylek
Copy link
Member Author

trylek commented Jun 9, 2023

I have experimentally added a somewhat absurd lock around the lock-free hashtable and it indeed made things go faster in general:

Cycle analysis for LanguageExt.Core took 96969 msecs
Cycle analysis for System.Private.CoreLib took 32 msecs
Emitting R2R PE file: D:\git\AndreSteenveld.CrossgenLanguageExt\obj\Release\net6.0\win-x64\R2R\LanguageExt.Core.dll

In light of this fact I think it might make sense to consider dropping the parallelized hashtable population in this particular case but technically such change would be somewhat orthogonal to my primary effort and I would feel more comfortable making it in a separate change if we agree it's the right way to go. Also my somewhat ham-fisted approach globally locks the hashtable whereas a more subtle fix could just lock its population for a particular module, I think that two different modules can be happily scanned in parallel.

@trylek
Copy link
Member Author

trylek commented Jun 9, 2023

(Just to clarify regarding Michal's above question, these are measurements on Windows.x64.Release build.)

@MichalStrehovsky
Copy link
Member

Oh, if it's a 10 MB assembly, I'm not too concerned. This is absolutely huge by any standards. CoreLib is a good example of a "normal" big assembly and that's 45 ms, which is the expected ballpark.

The locking may or may not benefit things. Normal sized assemblies take single digits ms and that's comparable with jitting a couple methods. It would only help if we somehow figure out things will take longer (maybe from size of typespec table?)

@trylek
Copy link
Member Author

trylek commented Jun 10, 2023

Thanks Michal for your additional feedback. Looking at the compilation in the debugger, I see that LanguageExt.Core has | NumberOfRows | 0x000090ed | (about 37K) in the TypeSpec table; in contrast, SPC has just | NumberOfRows | 0x00000636 | (1.5K or so) in the same table. The interesting question is how to actually utilize this number in the compilation - we could possibly treat it as an indicator that we need to run the generic cycle analysis, to avoid parallelization in its initial scanning and / or for additional tweaks in the compilation pipeline. We'd also need to establish and measure additional heuristics like - is this supposed to be compared with a hardcoded constant, or somehow weighted as a percentage of the number of TypeDesc / MethodDesc table entries or something yet completely different.

@trylek trylek merged commit bc4cbb6 into dotnet:main Jun 13, 2023
@trylek trylek deleted the Crossgen2GenericCycleDetector branch June 13, 2023 00:49
trylek added a commit to trylek/runtime that referenced this pull request Jun 13, 2023
As part of investigation of the bug dotnet#66079 and implementation of
the fix dotnet#71426 I noticed that one aggravating factor is that
Crossgen2 starts analyzing the same assembly multiple times on
its various parallel threads; this busywork additionally makes the
app compete for access to the same metadata, making the initial
analysis even slower.

In accordance with Michal's suggestion from the PR thread dotnet#71426
I propose to modify the generic cycle detector to run the initial
analysis single-threaded if the module in question is expected
to be "generics-heavy" using the number of TypeSpec rows in its
ECMA metadata as an indicator of module generic complexity.

I have written a simple managed app scanning all runtime framework
assemblies, ASP.NET assemblies and assemblies used in internal
CoreCLR testing. I found out that the largest number of TypeSpec
rows is in FSharp.Core (3855), followed by Microsoft.CodeAnalysis
(3148). Based on these findings I have set the initial value of
the cutoff for single-threaded analysis to 5000.

Thanks

Tomas
@ghost ghost locked as resolved and limited conversation to collaborators Jul 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Compiling application with LanguageExt as a dependency and ReadyToRun enabled using .NET 6

5 participants