Sort the finisher #15489

nojaf · 2023-06-26T11:22:20Z

Sorting the finishers,
Before folding the first state,
Destinies entwined,
In cosmic symphony, they await.

Some of you may have seen the following error message when compiling FSharp.Compiler.Service.fsproj using the new --test:GraphBasedChecking flag.

MSBuild version 17.7.0-preview-23320-09+d30eef85d for .NET
C:\Users\nojaf\Projects\fsharp\src\Compiler\Driver\GraphChecking\Continuation.fsi(2,17): error FS0248: Two modules named 'Continuation' occur in two parts of this assembly [C:\Users\nojaf\Projects\fsharp\src\Compiler\FSharp.Compiler.Service.fsproj::TargetFramework=netstandard2.0]

Build FAILED.

C:\Users\nojaf\Projects\fsharp\src\Compiler\Driver\GraphChecking\Continuation.fsi(2,17): error FS0248: Two modules named 'Continuation' occur in two parts of this assembly [C:\Users\nojaf\Projects\fsharp\src\Compiler\FSharp.Compiler.Service.fsproj::TargetFramework=netstandard2.0]
    0 Warning(s)
    1 Error(s)

You can easily reproduce this problem when you add

if inp.FileName.EndsWith("Continuation.fsi") then
                Thread.Sleep 1000

in CheckOneInputWithCallback in ParseAndCheckInputs.fs.

It was initially discovered reliably by @safesparrow via

#r "nuget: CliWrap, 3.6.0"

open System
open System.IO
open CliWrap

task {
    for idx in [ 0..50 ] do
        printfn "Start %i" idx

        let dll =
            FileInfo(
                @"C:\Users\nojaf\Projects\fsharp\artifacts\obj\FSharp.Compiler.Service\Debug\netstandard2.0\FSharp.Compiler.Service.dll"
            )

        if dll.Exists then
            dll.Delete()

        let! _ =
            Cli
                .Wrap("dotnet") // Assumption is that you use a preview 7.0.4xx SDK (by removing the global.json)
                .WithArguments(
                    "build FSharp.Compiler.Service.fsproj"
                    + " -bl /p:OtherFlags=\"--test:ParallelIlxGen --test:ParallelOptimization --test:GraphBasedChecking --test:DumpCheckingGraph --parallelreferenceresolution\""
                    + " --no-dependencies --no-restore -noWarn:MSB4011 /v:quiet /p:BUILDING_USING_DOTNET=true"
                )
                .WithWorkingDirectory(@"C:\Users\nojaf\Projects\fsharp\src\Compiler")
                .WithStandardOutputPipe(PipeTarget.ToDelegate(printfn "%s"))
                .WithStandardErrorPipe(PipeTarget.ToDelegate(printfn "%s"))
                .ExecuteAsync()
                .Task

        ()
}

In this example, it would fail after a couple of attempts. And show the error FS0248: Two modules named 'Continuation' occur in two parts of this assembly message from:

fsharp/src/Compiler/TypedTree/TypedTreeOps.fs

Line 10164 in 5225dc8

    
           errorR(Error(FSComp.SR.tastTwoModulesWithSameNameInAssembly(textOfPath path2), entity2.Range))

After some investigation, we noticed that when preparing a certain TcState to be used in CheckOneInputWithCallback, the order of the folded finishers is random.
Initially, this didn't seem to matter. And to me, it is still somewhat unclear whether it does.

To recapitulate what happens before CheckOneInputWithCallback to construct the TcState:
combineResults gets called for the next node we wish to process

fsharp/src/Compiler/Driver/GraphChecking/TypeCheckingGraphProcessing.fs

Lines 21 to 59 in 1d8aa13

    
           let combineResults 
        
               (emptyState: 'State) 
        
               (deps: ProcessedNode<'Item, 'State * Finisher<'ChosenItem, 'State, 'FinalFileResult>>[]) 
        
               (transitiveDeps: ProcessedNode<'Item, 'State * Finisher<'ChosenItem, 'State, 'FinalFileResult>>[]) 
        
               (sortResultsToAdd: 'Item -> 'Item -> int) 
        
               (folder: 'State -> Finisher<'ChosenItem, 'State, 'FinalFileResult> -> 'State) 
        
               : 'State = 
        
               match deps with 
        
               | [||] -> emptyState 
        
               | _ -> 
        
                   // Instead of starting with empty state, 
        
                   // reuse state produced by the dependency with the biggest number of transitive dependencies. 
        
                   // This is to reduce the number of folds required to achieve the final state. 
        
                   let biggestDependency = 
        
                       let sizeMetric (node: ProcessedNode<_, _>) = node.Info.TransitiveDeps.Length 
        
                       deps |> Array.maxBy sizeMetric 
        
                   let firstState = biggestDependency.Result |> fst 
        
                   // Find items not already included in the state. 
        
                   // Note: Ordering is not preserved due to reusing results of the biggest child 
        
                   // rather than starting with empty state 
        
                   let itemsPresent = 
        
                       set 
        
                           [| 
        
                               yield! biggestDependency.Info.TransitiveDeps 
        
                               yield biggestDependency.Info.Item 
        
                           |] 
        
                   let resultsToAdd = 
        
                       transitiveDeps 
        
                       |> Array.filter (fun dep -> itemsPresent.Contains dep.Info.Item = false) 
        
                       |> Array.distinctBy (fun dep -> dep.Info.Item) 
        
                       |> Array.sortWith (fun a b -> sortResultsToAdd a.Info.Item b.Info.Item) 
        
                       |> Array.map (fun dep -> dep.Result |> snd) 
        
                   // Fold results not already included and produce the final state 
        
                   let state = Array.fold folder firstState resultsToAdd 
        
                   state

Here, the firstState is picked by looking at the biggest dependency, in order to reduce the number of folds that are still required to have all dependencies information of the current node in the TcState.

In the concrete case of Driver\GraphChecking\FileContentMapping.fs there are four nodes left to fold over the firstState in order to get the correct starting state to begin processing FileContentMapping.fs:

NodeToTypeCheck.ArtificialImplFile "SyntaxTree\SyntaxTreeOps.fsi"
NodeToTypeCheck.ArtificialImplFile "Driver\GraphChecking\Continuation.fsi"
NodeToTypeCheck.PhysicalFile "SyntaxTree\SyntaxTreeOps.fsi"
NodeToTypeCheck.PhysicalFile "Driver\GraphChecking\Continuation.fsi"

(What is an ArtificialImplFile again? It is adding the results of a typed-check signature file to the TcEnv of implementation files. Mimicking as if we type-checked the actual implementation file).

Currently, this order is random. And so far that didn't seem to matter. Eventually, once everything is folded, all information is available in both TcEnv in TcState and all is fine.

However, given the Two modules named 'Continuation' occur in two parts of this assembly error, the order appears to be relevant after all.

The old setting --test:ParallelCheckingWithSignatureFilesOn, always did respect this order. It used the same gimmick of adding signature information to the TcEnv of implementation files. But it always respected the file order.

What currently feels wrong to me here, is that we fold the information of an implementation file to the TcState.TcEnvFromImpls of implementation files (be it a copy of the signature file) first before the actual physical signature is merged on top of the TcState.TcEnvFromSignatures of signature files. This means that NodeToTypeCheck.ArtificialImplFile of X is folded before the NodeToTypeCheck.PhysicalFile of X. If X has both the ArtificialImplFile (copy of signature data for implementation TcEnv) and PhysicalFile (real signature), that should always be folded first.
This is not the expected way this should happen. For the same reason, your signature file always needs to be above your implementation in your project.

My educated guess is that this might impact the CcuThunk of the TcState and thus leads us to the error we see. I have yet to find real evidence for this. Nor can I really put my finger on why this happens for Continuation.fsi and no other files. Speculation is that both Continuation.fs and Continuation.fsi are very small and this might be related.

Once I sorted the order of finishers, I was no longer able to reproduce the problem.

On a side note, the graph-based node code can be a bit confusing to debug at times. That is why I added the NodeToTypeCheck to the Finisher. I think we might benefit from a few more additions to improve the debugging experience.

@safesparrow I would be in favour of removing the generics in TypeCheckingGraphProcessing.processTypeCheckingGraph, I don't think they add that much, to be honest. Thoughts?

Lastly, when you think about it. Continuation.fsi would also just be a private module for FileContentMapping.fs. As it really is only used there and contains two helper functions.

safesparrow · 2023-06-26T11:37:30Z

After some investigation, we noticed that when preparing a certain TcState to be used in CheckOneInputWithCallback, the order of the folded finishers is random.
Initially, this didn't seem to matter. And to me, it is still somewhat unclear whether it does.

I think we do have a good grasp on why processing artificial files before signature files leads to an error - signature files check their contents for duplicates against CcuSig, but add results to tcsRootSigs.
ArtificialImplFiles also check against CcuSig, but add to CcuSig as well.

So processing of a signature file does not affect further duplicate checks, because it doesn't modify CcuSig.

Processing of ArtificialImplFile (or real impl file) does add its symbols to CcuSig and causes signature file to detect duplicates.

This is equivalent to processing an implementation file before its signature (ignoring the fact this would fail at earlier stages).

So it makes sense that order should be maintained.

The one thing I wasn't sure of was why this wasn't spotted before, given it's a generic issue. This is still not clear to me.
Ordering can be different on each run due to using unordered dictionaries in graph/node handling - but why this doesn't error for all other files, I'm not sure.

The change is definitely the right thing to do regardless.

0101

Nice! I just started investigating this error in Transparent Compiler. Will try to apply the changes there and see if it helps. Even though I'm not using processTypeCheckingGraph or combineResults, it looks like the underlying issue might be the same.

src/Compiler/Driver/GraphChecking/TypeCheckingGraphProcessing.fs

vzarytovskii · 2023-06-26T13:48:23Z

Nice! I just started investigating this error in Transparent Compiler. Will try to apply the changes there and see if it helps. Even though I'm not using processTypeCheckingGraph or combineResults, it looks like the underlying issue might be the same.

Oh, is it the one when we have wrong data after unpicking?

safesparrow · 2023-06-26T21:01:49Z

@safesparrow I would be in favour of removing the generics in TypeCheckingGraphProcessing.processTypeCheckingGraph, I don't think they add that much, to be honest. Thoughts?

I don't mind either way.
The generics were used in an attempt to separate the scheduling from the type-checking internals, AST types etc., to make them easier to use in tests - but this part of the code doesn't have tests atm. And the abstraction does make it more difficult to reason about the code for little benefit.

0101 · 2023-06-27T07:41:36Z

Oh, is it the one when we have wrong data after unpicking?

No that's a separate issue. Most likely unrelated to this.

@safesparrow I would be in favour of removing the generics in TypeCheckingGraphProcessing.processTypeCheckingGraph, I don't think they add that much, to be honest. Thoughts?

I don't mind either way. The generics were used in an attempt to separate the scheduling from the type-checking internals, AST types etc., to make them easier to use in tests - but this part of the code doesn't have tests atm. And the abstraction does make it more difficult to reason about the code for little benefit.

Can we keep it for now in case we'll be able to re-use it in Transparent Compiler? Where we'd potentially need to drag more data through it. Not sure yet if it's possible but would be nice if it was unified.

nojaf · 2023-06-27T08:28:51Z

Can we keep it for now in case we'll be able to re-use it in Transparent Compiler?

I genuinely think it would be great if we could address that situation when it arises. We can easily identify and streamline the code when we notice any duplication.

nojaf added 3 commits June 23, 2023 09:09

WIP

e7f9930

Sort resultsToAdd when folding the firstState.

1d8aa13

Undo temporary changes.

0203625

nojaf requested a review from a team as a code owner June 26, 2023 11:22

nojaf added 3 commits June 26, 2023 13:24

Use NodeToTypeCheck instead of idx.

12fdbdd

Undo spaces

8616d7f

Merge branch 'main' into continuation

dbc0988

nojaf marked this pull request as draft June 26, 2023 11:26

0101 approved these changes Jun 26, 2023

View reviewed changes

src/Compiler/Driver/GraphChecking/TypeCheckingGraphProcessing.fs Outdated Show resolved Hide resolved

Merge branch 'main' into continuation

d3fa340

Move processTypeCheckingGraph into ParseAndCheckInputs.fs

a1a6194

nojaf mentioned this pull request Jun 27, 2023

Remove FSharp.Compiler.AbstractIL.IL opens. #15493

Merged

nojaf marked this pull request as ready for review June 27, 2023 09:41

vzarytovskii merged commit 778df95 into dotnet:main Jun 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Sort the finisher #15489

Sort the finisher #15489

Uh oh!

nojaf commented Jun 26, 2023

Uh oh!

safesparrow commented Jun 26, 2023

Uh oh!

0101 left a comment

Uh oh!

Uh oh!

vzarytovskii commented Jun 26, 2023

Uh oh!

safesparrow commented Jun 26, 2023

Uh oh!

0101 commented Jun 27, 2023

Uh oh!

nojaf commented Jun 27, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	let combineResults
	(emptyState: 'State)
	(deps: ProcessedNode<'Item, 'State * Finisher<'ChosenItem, 'State, 'FinalFileResult>>[])
	(transitiveDeps: ProcessedNode<'Item, 'State * Finisher<'ChosenItem, 'State, 'FinalFileResult>>[])
	(sortResultsToAdd: 'Item -> 'Item -> int)
	(folder: 'State -> Finisher<'ChosenItem, 'State, 'FinalFileResult> -> 'State)
	: 'State =
	match deps with
	\| [\|\|] -> emptyState
	\| _ ->
	// Instead of starting with empty state,
	// reuse state produced by the dependency with the biggest number of transitive dependencies.
	// This is to reduce the number of folds required to achieve the final state.
	let biggestDependency =
	let sizeMetric (node: ProcessedNode<_, _>) = node.Info.TransitiveDeps.Length
	deps \|> Array.maxBy sizeMetric

	let firstState = biggestDependency.Result \|> fst

	// Find items not already included in the state.
	// Note: Ordering is not preserved due to reusing results of the biggest child
	// rather than starting with empty state
	let itemsPresent =
	set
	[\|
	yield! biggestDependency.Info.TransitiveDeps
	yield biggestDependency.Info.Item
	\|]

	let resultsToAdd =
	transitiveDeps
	\|> Array.filter (fun dep -> itemsPresent.Contains dep.Info.Item = false)
	\|> Array.distinctBy (fun dep -> dep.Info.Item)
	\|> Array.sortWith (fun a b -> sortResultsToAdd a.Info.Item b.Info.Item)
	\|> Array.map (fun dep -> dep.Result \|> snd)

	// Fold results not already included and produce the final state
	let state = Array.fold folder firstState resultsToAdd
	state

Sort the finisher #15489

Sort the finisher #15489

Uh oh!

Conversation

nojaf commented Jun 26, 2023

Uh oh!

safesparrow commented Jun 26, 2023

Uh oh!

0101 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

vzarytovskii commented Jun 26, 2023

Uh oh!

safesparrow commented Jun 26, 2023

Uh oh!

0101 commented Jun 27, 2023

Uh oh!

nojaf commented Jun 27, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants