-
Notifications
You must be signed in to change notification settings - Fork 830
Description
I was looking into why the Seq.map function is so much slower than System.Linq's Select extension method and found that one major reason is that Seq.map has an unnecessary unboxing step (see the unbox.any below).
IL_0000: nop
IL_0001: ldarg.0
IL_0002: ldarg.1
IL_0003: newobj instance void class Microsoft.FSharp.Collections.IEnumerator/map@111<!!b, !!a>::.ctor(class Microsoft.FSharp.Core.FSharpFunc`2<!1, !0>, class [mscorlib] System.Collections.Generic.IEnumerator`1<!1>)
IL_0008: unbox.any class [mscorlib]System.Collections.Generic.IEnumerator`1<!!b>
IL_000d: ret
Method 1:
To see how to remove this step, I poked around and found this line in IlxGen.fs, which seems to generate this extra line of IL. Commenting out that line appears to have a this sort of effect (but cost more allocations):
| Method | Platform | Jit | count | Median | StdDev | Gen 0 | Gen 1 | Gen 2 | Bytes Allocated/Op |
|---|---|---|---|---|---|---|---|---|---|
| LinqSelect | X64 | RyuJit | 10 | 848.7264 ns | 34.2269 ns | 3,038.81 | - | - | 590.74 |
| oldMap | X64 | RyuJit | 10 | 1,544.1877 ns | 87.8153 ns | 2,318.00 | - | - | 451.25 |
| newMap | X64 | RyuJit | 10 | 1,259.8084 ns | 48.7878 ns | 2,492.00 | - | - | 484.89 |
| LinqSelect | X86 | LegacyJit | 10 | 860.3415 ns | 31.4182 ns | 1,547.38 | - | - | 302.12 |
| oldMap | X86 | LegacyJit | 10 | 1,975.4831 ns | 88.3854 ns | 1,178.00 | - | - | 229.05 |
| newMap | X86 | LegacyJit | 10 | 1,750.1149 ns | 47.8851 ns | 1,207.03 | - | - | 240.31 |
I also found an earlier discussion (#871) and a comment by @manofstick about the performance benefits and how that change at least used to pass all the tests. However, there appears to be a related comment that the reason this exists is because of stack merge points. Are these actually a problem because Roslyn also has a similar comment but it doesn't generate the extra casts? And if so, is there a way we could ignore the casts just for those cases?
Some additional benefits of this approach would be that there would be a performance benefit every time an upcast is done (for example every time you use an interface inside F#)
Method 2:
Another way to deal with the extra cast might be to just remove the MapEnumerator type and thus avoid upcasting at all with something like this (for comparison here):
let mapImplementation f (e : IEnumerator<_>) : IEnumerator<_> =
let mutable curr = Unchecked.defaultof<_>
let mutable state = NotStarted
let GetCurrent () =
match state with
| InProcess -> curr
| NotStarted -> notStarted()
| Finished -> alreadyFinished()
{ new IEnumerator<'T> with
member this.Current = GetCurrent()
interface IEnumerator with
member this.Current = box(GetCurrent())
member this.MoveNext () =
state <- InProcess
if e.MoveNext() then
curr <- f e.Current
true
else
state <- Finished
false
member this.Reset() = noReset()
interface System.IDisposable with
member this.Dispose() = e.Dispose() }
Although this helps with the performance, it does appear to come at the cost of more allocations (this
test does 4 chained select/map operations with the id function)
| Method | Platform | Jit | count | Median | StdDev | Gen 0 | Gen 1 | Gen 2 | Bytes Allocated/Op |
|---|---|---|---|---|---|---|---|---|---|
| LinqSelect | X64 | RyuJit | 10 | 842.2784 ns | 36.2418 ns | 2,921.75 | - | - | 563.48 |
| oldMap | X64 | RyuJit | 10 | 1,545.4245 ns | 61.1134 ns | 2,358.00 | - | - | 455.39 |
| newMap | X64 | RyuJit | 10 | 1,190.1807 ns | 57.4496 ns | 3,058.00 | - | - | 589.40 |
| LinqSelect | X86 | LegacyJit | 10 | 859.0624 ns | 26.6690 ns | 1,547.28 | - | - | 299.70 |
| oldMap | X86 | LegacyJit | 10 | 1,957.1439 ns | 70.0879 ns | 1,178.00 | - | - | 227.23 |
| newMap | X86 | LegacyJit | 10 | 1,660.6148 ns | 66.1325 ns | 1,472.00 | - | - | 289.18 |
I'm looking for some guidance on which kind of approach would be better before I try submitting a PR.
Thanks