Skip to content

Conversation

@nhirschey
Copy link
Contributor

This change increases GC throughput when doing parallel processing with fsi.
A simple test:

let nums = [|1..1000000|]

let isPrime n = 
    let upperBound = int (sqrt (float n))
    let hasDivisor =     
        [2..upperBound]
        |> List.exists (fun i -> n % i = 0)
    not hasDivisor

System.GC.Collect()

let finalDigitOfPrimes = 
        nums 
        |> PSeq.filter isPrime
        |> PSeq.groupBy (fun i -> i % 10)
        |> PSeq.map (fun (k, vs) -> (k, Seq.length vs))
        |> PSeq.toArray

// Timing for finalDigit of Primes only.
// Default GC: Real: 00:00:04.444, CPU: 00:00:14.812, GC gen0: 5117, gen1: 3, gen2: 0
// Changing to gcServer: Real: 00:00:02.536, CPU: 00:00:15.187, GC gen0: 205, gen1: 0, gen2: 0

This difference is small in this example, but on an 18 core cpu processing lots of data I saw this change result in computation time falling from ~ 40 minutes to 10 minutes. This seems like an acceptable tradeoff when fsharp is being used interactively.

This change increases GC throughput when doing parallel processing using fsharp interactively.
@KevinRansom
Copy link
Contributor

@nhirschey,

This is an interesting moment for this PR, up until now our view has been, it's easy enough for the developer to pick a value and edit their own config file. However, now with side by side installs and frequent updates of the compiler it seems like editing it, would cause the settings to get overwritten on a fairly frequent basis.

This means we either:

  • need to pick a good default, and I would at least assert that the current default is 'mostly good' except for multi-threads and lots of allocs with short lived objects.
  • or make these kind of settings configurable, with the values stored alongside the vs roaming settings, and have vs configure them at launch.
  • or both.

What do you think?

Kevin

@cartermp
Copy link
Contributor

Another consideration (cc @dsyme) is having a way to configure this for .NET Core projects, particularly for the churn-through-lots-of-data scenario outlined by @nhirschey. Since App.Config is not a thing, it's not quite as straightforward. The ASP.NET Configuration API can certainly be used to understand JSON files, and since we live in a webscale world where JSON is used a lot, this doesn't seem unreasonable. The question is how to make that a decent experience and not something that people need to hand-wire.

@KevinRansom
Copy link
Contributor

@cartermp

I doubt if we will invent a configurable way to set this for coreclr. It may even not be a thing it supports at this time, however, I will follow up with the folk upstairs tomorrow. However, whatever we do, I think configuration via VS, and giving VS a way to start dotnet fsi when it exists seems like a good thing to me.

@nhirschey
Copy link
Contributor Author

nhirschey commented Jul 27, 2018

Thank you Kevin and Phillip for considering this and the thoughtful responses.

You both point out the real problem: poor discoverability and ease of configuration, especially for people who are not very familiar with .NET (hi!). My main goal is to improve out of the box performance for people who do not know about this setting and its effects.

A setting in VS Tools > Options > F# Tools > F# Interactive would be a great. If this existed, I would have figured out why my "parallel" code was running slowly on a single thread much faster than I did.

On defaults: I understand the motivation for making client gc the default. But gcServer is far superior for anyone (like me) using FSI as a faster and easier to parallelize alternative to R or Python. While self-serving, I also think it makes sense to optimize the FSI out of the box experience for these types of users. They are less likely to know about the setting, it's impact, and how to change it. The cost to other types of users seems small.

@KevinRansom
Copy link
Contributor

@nhirschey

I think that it is pretty easy to make a convincing argument that for fsiAnyCpu the default should be server gc. and that for 32 bit fsi the default should be client gc.

However, we hope that developers will be using the coreclr fsi in the longer term, a consequence of that is we need to understand what the longer term needs of the feature are going to look like.

Just as an FYI, I am inclined to pull this PR, the benefits to developers with large datasets and performing sophisticated analysis, seems to be compelling.

@nhirschey
Copy link
Contributor Author

Sounds great @KevinRansom!

The coreclr implications for GC config sound complex. I appreciate you adding this to your list of things to consider for that. I know you guys have a lot on your plate with the rest of that transition.

@KevinRansom
Copy link
Contributor

@nhirschey,

I have had a bunch of conversations with folks around the wider dotnet team and the consensus here is that a benefit of this change is highly situational. There are scenarios where it improves things, and others where it will likely makes things worse.

We are prepared to take a change that allows this to be configured in the IDE, although the default will stay as off, but we are not keen on changing the default to always use serverGC, and really not keen on requiring developers who want the old behavior having to edit a config file buried deep in the program files (x86) folder.

Thanks for the PR, it gave us an opportunity to think through the ramifications of this change.

Kevin

@KevinRansom KevinRansom closed this Aug 8, 2018
@nhirschey
Copy link
Contributor Author

Understood. Thank you for considering.

@cartermp
Copy link
Contributor

Tracking with #5502

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants