- 
                Notifications
    You must be signed in to change notification settings 
- Fork 5.2k
Adapt Tarjan generic cycle detector for use in Crossgen2 #71426
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| 
 You might need to add a callout to the detection in more than one place. If anything higher than 0 doesn't work, I suspect the problem is that we don't have a callout in all the places that need it. I would at least try to look where the cycle is forming if we have a higher cutoff because that part of crossgen2 might be reachable if user code has a different kind of cycle. | 
        
          
                src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/ReadyToRunCodegenCompilation.cs
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need cleaner error handling here.
        
          
                src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/ReadyToRunCodegenCompilation.cs
              
                Outdated
          
            Show resolved
            Hide resolved
        
      d456e05    to
    47788a4      
    Compare
  
    | @trylek is this still being considered or should be closed? | 
| It should be ultimately fixed but we can close it for now, I'll reopen this once we're done with the experiments and start working on .NET 8 features. | 
b4f5253    to
    8c77748      
    Compare
  
    | I believe I have finally managed to make this work for the repro case described in #66079. The main trick is that the depth cutoff heuristic doesn't work well in the case of LanguageExt.Core - the problem with this assembly is that many generic types have lots of type parameters and what basically happens is a form of exponential explosion not necessarily recursively repeating a given generic type within a single type parameter. To cater for this I have introduced a second heuristic - breadth cutoff - that corresponds to the total number of occurrences of types marked as poisoned w.r.t. potentially forming cycles within a generic type that cause the dependency to be identified as potentially cycle-forming. With the default value of 4 for the depth cutoff I have measured sizes of the compiled LanguageExt.Core assembly (Windows x64 release) variating the breadth cutoff: 
 In light of these findings I set the default breadth cutoff to 2. After discussing the matter with Manish I have come to the conclusion that for now it's probably better to make this just an opt-in especially in light of the fact that the compilation takes longer due to the need to build the module tables of poisoned types. If at some point we decide to make this the default (possibly with an opt-out switch) there are several places where we might be able to optimize the algorithm - e.g. by parallelizing the initial poisoned types generation, by caching the results of IsDeepPossiblyCyclicInstantiation, by improving the quadratic algorithm for checking the depth cutoff and the like. I'm still working on validating the change in the lab but for now I believe that it should be feature complete, so to say. | 
| One other aspect worth mentioning is that at this point the change is slightly hacky by manually bringing in bits of NativeAOT logic pertinent to the generic cycle detection along with bits of logging logic. I think that it would be useful to unify this between Crossgen2 and NativeAOT by either putting the relevant code in one of the existing common projects or by creating a new one shared between the two compilers. | 
| Is it possible to write a test where crossgen would previously fail and now works? We have some NativeAOT specific testing in smoketests for generic recursions that were killing the NAOT compiler, mostly extracted from the real world patterns. | 
        
          
                src/coreclr/tools/Common/TypeSystem/Common/TypeSystemContext.cs
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/CompilerTypeSystemContext.Aot.cs
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/ReadyToRunCompilerContext.cs
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                src/coreclr/tools/aot/ILCompiler.ReadyToRun/ILCompiler.ReadyToRun.csproj
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                src/coreclr/tools/aot/ILCompiler.ReadyToRun/ILCompiler.ReadyToRun.csproj
              
                Outdated
          
            Show resolved
            Hide resolved
        
      0040ad7    to
    42fe681      
    Compare
  
            
          
                ...ols/aot/ILCompiler.DependencyAnalysisFramework/ILCompiler.DependencyAnalysisFramework.csproj
              
                Outdated
          
            Show resolved
            Hide resolved
        
      I'll address the code reorg (GenericCycleDetection) and regression test in subsequent commits. Thanks Tomas
So I have a trivial depth cutoff test working; without enabling generic cycle detection Crossgen2 crashes after about an hour on my box with the Arithmetic Overflow as described in the issue dotnet#66079. As next step I'll work on the equivalent breadth test, that is somewhat more tricky. Thanks Tomas
0660b5f    to
    dfd968b      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like the handling of the fixups for compiled methods to be tweaked a bit. Also, do we know what impact this has on the performance of crossgen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is dropping the dependencies of the fixup. I'm not very comfortable with that, and agree with @MichalStrehovsky here. Instead, could we detect the cycle when adding the fixup to _fixups, and if we do so, mark the method with whatever details we do, when then entire method can't be compiled.
| Thanks David for your feedback, I'll modify the fixup management based on your suggestion. The impact on Crossgen2 performance is quite detrimental, that's why it's just an opt-in for now, if we decided to turn it on by default, we would at the very least need to improve the perf quite a bit. For the LanguageExt assembly mentioned in the original issue, just the module analysis takes something like a minute on my devbox. | 
| 
 Is that a debug build? From what I saw when enabling this, it usually takes single digits milliseconds to analyze an assembly. | 
| Thanks David and Michal for your feedback. I have updated the PR along David's suggestion. For Michal's comment, I have instrumented  Cycle analysis for LanguageExt.Core took 360382 msecs Cycle analysis for System.Private.CoreLib took 45 msecs Cycle analysis for LanguageExt.Core took 363084 msecs Cycle analysis for LanguageExt.Core took 363232 msecs Cycle analysis for LanguageExt.Core took 363388 msecs Cycle analysis for LanguageExt.Core took 364608 msecs Cycle analysis for LanguageExt.Core took 365848 msecs Cycle analysis for LanguageExt.Core took 367601 msecs Cycle analysis for LanguageExt.Core took 368732 msecs Apparently one problem is that the internal Crossgen2 parallelism triggers initial analysis of the assembly on all the threads by virtue of the lock-free hashtable; I'm not sure if some form of locking would improve this. This is kind of the slowest of my devboxes but I doubt its  | 
| I have experimentally added a somewhat absurd lock around the lock-free hashtable and it indeed made things go faster in general: Cycle analysis for LanguageExt.Core took 96969 msecs Cycle analysis for System.Private.CoreLib took 32 msecs Emitting R2R PE file: D:\git\AndreSteenveld.CrossgenLanguageExt\obj\Release\net6.0\win-x64\R2R\LanguageExt.Core.dll In light of this fact I think it might make sense to consider dropping the parallelized hashtable population in this particular case but technically such change would be somewhat orthogonal to my primary effort and I would feel more comfortable making it in a separate change if we agree it's the right way to go. Also my somewhat ham-fisted approach globally locks the hashtable whereas a more subtle fix could just lock its population for a particular module, I think that two different modules can be happily scanned in parallel. | 
| (Just to clarify regarding Michal's above question, these are measurements on Windows.x64.Release build.) | 
| Oh, if it's a 10 MB assembly, I'm not too concerned. This is absolutely huge by any standards. CoreLib is a good example of a "normal" big assembly and that's 45 ms, which is the expected ballpark. The locking may or may not benefit things. Normal sized assemblies take single digits ms and that's comparable with jitting a couple methods. It would only help if we somehow figure out things will take longer (maybe from size of typespec table?) | 
| Thanks Michal for your additional feedback. Looking at the compilation in the debugger, I see that  | 
As part of investigation of the bug dotnet#66079 and implementation of the fix dotnet#71426 I noticed that one aggravating factor is that Crossgen2 starts analyzing the same assembly multiple times on its various parallel threads; this busywork additionally makes the app compete for access to the same metadata, making the initial analysis even slower. In accordance with Michal's suggestion from the PR thread dotnet#71426 I propose to modify the generic cycle detector to run the initial analysis single-threaded if the module in question is expected to be "generics-heavy" using the number of TypeSpec rows in its ECMA metadata as an indicator of module generic complexity. I have written a simple managed app scanning all runtime framework assemblies, ASP.NET assemblies and assemblies used in internal CoreCLR testing. I found out that the largest number of TypeSpec rows is in FSharp.Core (3855), followed by Microsoft.CodeAnalysis (3148). Based on these findings I have set the initial value of the cutoff for single-threaded analysis to 5000. Thanks Tomas
This change modifies Crossgen2 to use the generic cycle detector
originally implemented for NativeAOT to trim infinite generic
expansion as observed in the LanguageExt public nuget package
and tracked by the issue
#66079
At this point I have only implemented a back-compat option to
opt out of the generic cycle detection in case it turns out to
cause problems for certain payloads. I haven't added the option
to set the cutoff level - in practice, LanguageExt compilation
still fails when the cutoff is set to anything higher than 0;
moreover for Crossgen2 skipping precompilation of certain methods
merely incurs slightly reduced performance, not runtime failures
as is the case with NativeAOT.
Thanks
Tomas
Fixes: #66079
/cc @dotnet/crossgen-contrib