-
Notifications
You must be signed in to change notification settings - Fork 833
Optimize integer for loop code gen #13573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Nice. Can I please ask you to add some benchmarks, so we understand what are implications in runtime here? Comparisons - before/after, comparison with normal loop, etc. Also:
What is current behaviour? Also, it should produce warning/error if step is constant and known at compile time. |
|
@vzarytovskii I added benchmark results to the original pr message. Throwing an error at runtime is a behavior replicated from the RangeInt32 enumerator type that backed the .. .. syntax. sharplab example I can look into displaying a warning/error if the step is zero however it might be better to emit such a message for any usage of an integer range step not just in a for loop. |
Thanks!
Yep, agree, should be done separately for all such cases. |
* adds new active pattern to match int32 range step for loops * existing Expr structure changed, no longer backed by an enumerator * updates test, existing check changed to int64 instead of int32
* even smaller and faster generated il code * ran fantomas
|
I've made further improvements to the codegen that results in smaller and faster il. Disassembled il, c#, and benchmarks updated. |
|
this is ready |
Wow! I noticed you showed timings for a loop of 90x, did you see similar behavior with larger loops? Did you compare when there is slightly more code in the body, or some reference type so that JIT cannot optimize everything away? Just curious! These timings are amazing! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, need one more approval @KevinRansom @dsyme @0101 @psfinaki
|
@albert-du This is fantastic work We need some more testing
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments above
58dd212 to
65e6553
Compare
|
@dsyme I added over/underflow checking to the constant step optimized loops, variable step loops don't seem to suffer from this as the logic was completely lifted from RangeInt32. Tests for under/overflow checking and tasks added. Debug stepping for loops looks good. Original comment updated with newer code gen for constant step loops. |
|
benchmarks updated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some further changes are going to be needed here to move the optimization out of checking and into Optimizer.fs or similar, to avoid additions to the quotation API and changes to quotation formats.
This is a little painful as the loop lowering code is placed most naturally in CheckExpressions.
Do you feel able to try the changes with guidance? I'd be happy to help you with them
| | FSharpForLoopUp -> QP.mkIntegerForLoop(ConvExpr cenv env lim0, ConvExpr cenv env lim1, ConvExpr cenv env body) | ||
| | _ -> wfail(Error(FSComp.SR.crefQuotationsCantContainDescendingForLoops(), m)) | ||
|
|
||
| | TOp.IntegerForLoop (_, _, _), [], [Expr.Lambda (_, _, _, [_], lim0, _, _);Expr.Lambda (_, _, _, [_], lim1, _, _);body; Expr.Lambda (_, _, _, [_], step, _, _)] -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned by the addition of these new quotation constructs, see above
src/FSharp.Core/quotations.fsi
Outdated
| /// | ||
| /// <example-tbd></example-tbd> | ||
| [<CompiledName("ForIntegerRangeLoopWithStepPattern")>] | ||
| val (|ForIntegerRangeLoopWithStep|_|): input: Expr -> (Var * Expr * Expr * Expr * Expr) option |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimizations shouldn't really cause additions to the quotation API. If we were to add something, we would need to add a corresponding quotation construction method, e.g. similar to
fsharp/src/FSharp.Core/quotations.fs
Line 2632 in 9978a14
| static member ForIntegerRangeLoop(loopVariable, start: Expr, endExpr: Expr, body: Expr) = |
|
|
||
| // optimize 'for i in n .. step .. m do' | ||
| | Expr.App(Expr.Val(vref, _, _), _, [ tyarg; stepTyarg ], [ startExpr; stepExpr; finishExpr ], _) | ||
| when valRefEq g vref g.range_step_op_vref && typeEquiv g tyarg g.int_ty && typeEquiv g stepTyarg g.int_ty -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may need to guard this via a language version switch or move this to an optimization phase (though that is painful). It changes the form of quotations for these constructs which is a breaking change.
Quoting these loops is rare but it's also more philosophical: we shouldn't be adding any optimizations which change the quoted form or require new additions to the quotations API. So optimziations should be in later phases.
|
@dsyme I think I'd like to try making those changes, it'll be a good opportunity for me to understand more of the compiler |
|
That's 400% to 1740% faster! |
|
@albert-du please don’t let this go stale, it’s amazing work! 💯 🥇 If it gets behind ‘main’ for too long, the conflicts may become hard to solve. If you need some help with seeing this through, feel free to ping me, or anybody, on F# Slack. |
|
@abelbraaksma I've been busy with school the past few months but I fully intend to work more on this soon. Thank you so much for your support! |
|
This was subsumed by @brianrourkeboll in his PR, closing this one. |
#938 #9548
Optimizes
for i in n .. step .. m doil code gen to avoid allocation.Throws ArgumentException at runtime if step is zero.
Replicates logic of OperatorIntrinsics.RangeInt32 with additional optimizations if the step is a non-zero compiled constant.
Example:
Before:
.method public static void f ( int32 n, int32 step, int32 m ) cil managed { .custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationArgumentCountsAttribute::.ctor(int32[]) = ( 01 00 03 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00 00 00 ) // Method begins at RVA 0x2050 // Code size 59 (0x3b) .maxstack 5 .locals init ( [0] class [System.Runtime]System.Collections.Generic.IEnumerable`1<int32>, [1] class [System.Runtime]System.Collections.Generic.IEnumerator`1<int32>, [2] int32, [3] class [System.Runtime]System.IDisposable ) IL_0000: ldarg.0 IL_0001: ldarg.1 IL_0002: ldarg.2 IL_0003: call class [System.Runtime]System.Collections.Generic.IEnumerable`1<int32> [FSharp.Core]Microsoft.FSharp.Core.Operators/OperatorIntrinsics::RangeInt32(int32, int32, int32) IL_0008: stloc.0 IL_0009: ldloc.0 IL_000a: callvirt instance class [System.Runtime]System.Collections.Generic.IEnumerator`1<!0> class [System.Runtime]System.Collections.Generic.IEnumerable`1<int32>::GetEnumerator() IL_000f: stloc.1 .try { // loop start (head: IL_0010) IL_0010: ldloc.1 IL_0011: callvirt instance bool [System.Runtime]System.Collections.IEnumerator::MoveNext() IL_0016: brfalse.s IL_0026 IL_0018: ldloc.1 IL_0019: callvirt instance !0 class [System.Runtime]System.Collections.Generic.IEnumerator`1<int32>::get_Current() IL_001e: stloc.2 IL_001f: call void [System.Console]System.Console::WriteLine() IL_0024: br.s IL_0010 // end loop IL_0026: leave.s IL_003a } // end .try finally { IL_0028: ldloc.1 IL_0029: isinst [System.Runtime]System.IDisposable IL_002e: stloc.3 IL_002f: ldloc.3 IL_0030: brfalse.s IL_0039 IL_0032: ldloc.3 IL_0033: callvirt instance void [System.Runtime]System.IDisposable::Dispose() IL_0038: endfinally IL_0039: endfinally } // end handler IL_003a: ret } // end of method _::fAfter:
.method public static void f ( int32 n, int32 step, int32 m ) cil managed { .custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationArgumentCountsAttribute::.ctor(int32[]) = ( 01 00 03 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00 00 00 ) // Method begins at RVA 0x2050 // Code size 53 (0x35) .maxstack 4 .locals init ( [0] int32, [1] int32, [2] int32 ) IL_0000: ldarg.0 IL_0001: stloc.2 IL_0002: ldarg.1 IL_0003: stloc.1 IL_0004: ldloc.1 IL_0005: brtrue.s IL_0017 IL_0007: ldstr "The step of a range cannot be zero." IL_000c: ldstr "step" IL_0011: newobj instance void [mscorlib]System.ArgumentException::.ctor(string, string) IL_0016: throw IL_0017: ldarg.2 IL_0018: stloc.0 IL_0019: br.s IL_0024 // loop start (head: IL_0024) IL_001b: call void [mscorlib]System.Console::WriteLine() IL_0020: ldloc.2 IL_0021: ldloc.1 IL_0022: add IL_0023: stloc.2 IL_0024: ldloc.2 IL_0025: ldloc.0 IL_0026: ble.s IL_002c IL_0028: ldloc.1 IL_0029: ldc.i4.0 IL_002a: bgt.s IL_0034 IL_002c: ldloc.2 IL_002d: ldloc.0 IL_002e: bge.s IL_001b IL_0030: ldloc.1 IL_0031: ldc.i4.0 IL_0032: bge.s IL_001b // end loop IL_0034: ret }Constant step optimization:
.method public static void f ( int32 n, int32 m ) cil managed { .custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationArgumentCountsAttribute::.ctor(int32[]) = ( 01 00 02 00 00 00 01 00 00 00 01 00 00 00 00 00 ) // Method begins at RVA 0x2050 // Code size 24 (0x18) .maxstack 4 .locals init ( [0] int32, [1] int32 ) IL_0000: ldarg.0 IL_0001: stloc.1 IL_0002: ldarg.1 IL_0003: stloc.0 IL_0004: br.s IL_000f // loop start (head: IL_000f) IL_0006: call void [mscorlib]System.Console::WriteLine() IL_000b: ldloc.1 IL_000c: ldc.i4.2 IL_000d: add IL_000e: stloc.1 IL_000f: ldloc.1 IL_0010: ldarg.0 IL_0011: blt.s IL_0017 IL_0013: ldloc.1 IL_0014: ldloc.0 IL_0015: ble.s IL_0006 // end loop IL_0017: ret }Benchmarks
Before:
After:
79.9% to 94.6% faster, while smaller and without allocation.