Improve async stack traces #4867

dsyme · 2018-05-09T17:31:36Z

The stack traces for F# async are not good see #2741. This PR starts to improve them (though doesn't solve the problem anywhere near completely)

Summary

Debugging computationally embedded languages gives bad stack traces. This PR outlines

the problem
a technique for improving things for most synchronous parts of computations
applies the technique to Async
also improves exception stack traces for the first throw of an exception

Problem One - losing stack traces for computationally embedded languages

There is a fundamental problem with debugging for pretty much all "computationally embedded DSLs" which delay computation. Many basic DSLs suffer this problem, both in functional and object programming.

For our purposes, "Computationally embedded DSLs" are ones where:

The user code creates a composition of objects using helper calls from a library (e.g. async { ... })
The user code then executes the computation implied by that collection of objects (e.g. calls Async.RunSynchronously).

The process of execution may dynamically create more computation objects along the way during phase2, though that's kind of irrelevant.

A very, very simple example of such a "computationally embedded language" is below. This DSL has lousy debugging in F#:

module DSL = 
    type Computation<'T> = C of (unit -> 'T)

    let bind (C f) k = C(fun () -> let (C f2) = k (f ()) in f2())

    let ret x = C (fun () -> x)

    let add c1 c2 = bind c1 (fun v1 -> bind c2 (fun v2 -> ret (v1 + v2)))

    let run (C f) = f ()

which lets you write things like:

    open DSL
    let f1() = ret 6
    let f2() = ret 7
    let f3() = add (f1()) (f2())
    f3() |> run

Now the "stack" when the ret 6 computation of f1() is actually "invoked" (i.e. when it gets invoked as part of f3() |> run) is crappy. You can see the crappy stack by augmenting the example as follows:

type Computation<'T> = C of (unit -> 'T)

let delay f = bind (ret ()) f
let stack msg c1 = 
    delay (fun () -> 
        System.Console.WriteLine("----{0}-----\n{1}\n----------", msg, System.Diagnostics.StackTrace(true).ToString())
        c1)

Here is the stack:

----f1-----
   at [email protected](Unit unitVar0) in C:\GitHub\dsyme\visualfsharp\a.fs:line 12
   at [email protected](Unit unitVar0) in C:\GitHub\dsyme\visualfsharp\a.fs:line 5
   at [email protected](Unit unitVar0) in C:\GitHub\dsyme\visualfsharp\a.fs:line 5
   at A.run[a](Computation`1 _arg1) in C:\GitHub\dsyme\visualfsharp\a.fs:line 13
   at A.main(String[] argv) in C:\GitHub\dsyme\visualfsharp\a.fs:line 22

----------

Barely any user code is on the stack at all! The line numbers are all the closures in the implementation of the DSL!!! Ugh!! 😞 👿

Note that C# async and other implementations of Promises/Tasks/"Hot Tasks" don't suffer this problem since they are not "delayed". But most F# DSLs (and many C# ones) are indeed "delayed". For async, it is F#'s use of "cold" asyncs that require explicit starting that causes the issue. Using "cold" asyncs brings many benefits - most particularly the implicit plumbing of cancellation tokens - but also bring problems.

Problem Two - Losing stack traces for exceptions

There is a second fundamental problem with .NET stack traces for any computationally embedded DSL that may throw .NET exceptions. This is based on this problem, where only a limited number of stack frames are included in the .StackTrace property of a .NET exception when it is thrown, up to the first catch handler.

This is a serious and surprising limitation in the CLR, and I think it would be great to fix it. For example it’s a very common pattern in functional programming to quickly turn exceptions into data, e.g. consider basic patterns like the Railway Oriented programming pattern. This might have some code like this:

/// A type very similar to this is in the FSharp.Core library
type Result<'T> = 
   | Ok of 'T 
   | Error of System.Exception

let embed f = 
    try Ok(f()) 
    with exn -> Error exn

let run f = f()

run (embed (fun () -> failwith "fail"))

Here the StackTrace data for the exception carried by the Error data will only carry a partial stack trace containing one stack frame. See the link above for why - but basically the CLR only adds stack frames up to the first try ... with handler. Ugh. From the user’s point of view that’s not what they normally need or expect. But it's what the CLR does.

There are C# equivalents to such patterns, especially with C# becoming more and more expression-oriented. They suffer the same problem.

Problem One - What can we do?

Now, what cards can we play to fix this? In general, it seems "hard". Here are the options I know of:

Keep causal stacks in objects and reinstate them when the behaviour of the objects is activated.
- This could be done if there was a way to do some kind of [<CaptureCallerStackToken>] and pass ti down. We can simulate that by capturing a System.Diagnostics.StackTrace for each and every computational object created and passing them down.
- This could give very nice stack traces, but you would need to be able to hijack the normal debugging mechanisms and say "hey, this is the real causal stack". This can be done for exception stack traces by hacking the internals of a .NET exception object (see this blog post).
- This is expensive. We can't normally afford to keep stacks from the creation of objects in the actual object closures.
- In some sense, this is what C# async debugging is doing. However I don't yet understand the mechanism C# async is using or if it is reusable for other computationally embedded languages. It is possible that it is, by emitting just the right calls and attributes. That's future work, any advice or links appreciated.
- It is also roughly what is done for rethrow exceptions using ExceptionDispatchInfo. However AFAIK this can't be done for stack traces at breakpoints.
We could pass source/line number information into the process of constructing the computational objects. This was done by Eirik Tsarpalis here in order to get better exception stack traces. However this feels expensive - or at least not cost free - and also requires changing API surface area. Also, it is only helpful for stack traces produced for exceptions.
The third option, and the one used here, is to "inline" the key parts of the computational language (and its composition operations) so that the key closures which implement the composition become user code. This relies on the fact that when the F# compiler inlines code, it re-tags closures contained in that code with the source information of callsite.

Here is an example of how we can play this game with the very simple computational language above:

type Computation<'T> = C of (unit -> 'T)

[<System.Diagnostics.DebuggerHidden>]
let apply (C f) = f()
let inline bind f k = C(fun () -> let f2 = k (apply f) in apply f2)
let inline ret x = C (fun () -> x)
let inline add c1 c2 = bind c1 (fun v1 -> bind c2 (fun v2 -> ret (v1 + v2)))
let inline stack msg c1 = bind (ret ()) (fun () -> System.Console.WriteLine("----{0}-----\n{1}\n----------", msg, System.Diagnostics.StackTrace(true).ToString()); c1)
let run c = apply c

The rules we've applied are:

Inline the mechanisms of composition
Expose any necessary "runtime" functions such as apply and mark them DebuggerHidden

The resulting stack trace (excluding DebuggerHidden methods) is

----f1-----
   at [email protected](Unit unitVar0) in C:\GitHub\dsyme\visualfsharp\a.fs:line 16
   at [email protected](Unit unitVar0) in C:\GitHub\dsyme\visualfsharp\a.fs:line 18
   at A.main(String[] argv) in C:\GitHub\dsyme\visualfsharp\a.fs:line 23

This is not perfect - we see the generated closure names - but at least the line numbers are right and names A.f1 and A.f3 appear. This is a significant improvement.

Problem Two - What can we do?

One way to improve the exception stacks for problem 2 is to use a "trampoline" to run the synchronous parts of computations. When an exception happens, the exception continuation (or other information required to continue execution) is written into the trampoline.

This technique works for async since we have a trampoline available.

Note that the stack trace is lost when rethrowing an exception. Sadly this means that it is lost when an implicit rethrow happens with this sort of thing:

try ... 
with :? ArgumentException as e -> ...

You are encouraged to do something like this pattern to keep good InnerException information:

try ... 
with 
    | :? ArgumentException as e -> ... 
    | otherExn -> raise (AggregateException(..., otherExn))

What this PR actually does

This PR applies technique #3 to async. The steps:

The Bind composition operation in AsyncBuilder is marked inline along with a couple of others.
An extra level of underlying async execution machinery is exposed in the API of FSharp.Core. The new API surface area is in control.fsi in the PR - all of the added APIs are for generated code only. All of these will be marked compiler use only but become part of the long-term binary compatible API for FSharp.Core. This is "just enough" to allow key closures to be inlined into user code, but not too much that we can't re-implement how Async is implemented at a later stage.

Limitations

Only improves debug user code that is compiled with tailcalls off
Improves the debugging experience for first throw of exceptions and the synchronous parts of asynchronous code.
The stack is still lost if
1. the "trampoline" has been used, which happens every 300 executions of bind on a stack or when any exception is raised (even if it is caught)
2. an exception is rethrown.
The improvements need a modified FSharp.Core.dll (i.e. F# 4.5)

Remaining Steps

Design question: return! asyncExpr results in the loss of a debug stack frame, it is always treated as an async tailcall even if tailcalls are off. See this comment
Design question: currently we don't inline TryWith, consider doing this.: Now done
Get it green (e.g. updates to API surface area, updates to generated code samples)
Double check the stepping and breakpoint behaviour of async debugging after this change.

Later:

See if we can apply the same technique to cancellable and eventually in the F# compiler codebase

…o async-st

…o async-cleanup

cartermp · 2018-05-09T17:43:06Z

See if we can apply the same technique to cancellable and eventually in the F# compiler codebase

Oh yes yes yes

…o async-st

dsyme · 2018-05-09T21:27:36Z

I've realised as a side effect of this that our F# async exception implementation is hitting this problem badly, where only one stack trace is included in the .StackTrace property of the exception. This is partly because our F# async machinery catches the exception “early”.

If someone knows a way, somewhere deep in .NET Framework or .NET Core, by which to attach a deeper stack trace string to an exception that has already been thrown, I'd be very grateful.

We’re willing to do anything to make this happen: create a task, whatever. I’ve even considered using the reflection hack documented here to set the _stackTraceString field, which Eirik Tsarpalis has also used.

The one thing we can’t do is wrap the exception in an AggregateException – we have to throw the user’s exception.

dsyme · 2018-05-09T22:39:33Z

I've realised as a side effect of this that our F# async exception implementation is hitting this problem ...

I think I've found an approach that fixes this. Essentially we let exceptions propagate all the way up to the trampoline and deal with them there. When combined with the rest of this work this should give much, much better exception stacks

For example for the code

let rec async_syncLoopViaTailCallFail(n) = async {
    if n > 10 then 
        let! res = failwith "fail"
        return res
    else
        return! async_syncLoopViaTailCallFail(n+1)
}

let async1() = async {
    System.Console.WriteLine("next")
    let! res = async_syncLoopViaTailCallFail(0)
    return "no"
}


try 
     async1() |> Async.RunSynchronously
 with e -> 
     e.StackTrace

we get exception stack trace

   at [email protected](Unit unitVar) in C:\GitHub\dsyme\visualfsharp\a.fs:line 24
   at Microsoft.FSharp.Control.AsyncPrimitives.Call[T,TResult](AsyncActivation`1 ctxt, TResult result1, FSharpFunc`2 part2)
   at Microsoft.FSharp.Control.AsyncPrimitives.Bind[T,TResult](Boolean keepStack, AsyncActivation`1 ctxt, FSharpAsync`1 part1, FSharpFunc`2 part2)
   at [email protected](AsyncActivation`1 ctxt) in C:\GitHub\dsyme\visualfsharp\a.fs:line 84
   at [email protected](AsyncActivation`1 ctxt) in C:\GitHub\dsyme\visualfsharp\a.fs:line 83
   at Microsoft.FSharp.Control.Trampoline.Execute(FSharpFunc`2 firstAction)
--- End of stack trace from previous location where exception was thrown ---
...

instead of

   at [email protected](Unit unitVar) in C:\GitHub\dsyme\visualfsharp\a.fs:line 24
   at Microsoft.FSharp.Control.AsyncPrimitives.Call[T,TResult](AsyncActivation`1 ctxt, TResult result1, FSharpFunc`2 part2f)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()

If you somehow ignore DebuggerHidden stuff in those traces then you get

   at [email protected](Unit unitVar) in C:\GitHub\dsyme\visualfsharp\a.fs:line 24
   at [email protected](AsyncActivation`1 ctxt) in C:\GitHub\dsyme\visualfsharp\a.fs:line 84
   at [email protected](AsyncActivation`1 ctxt) in C:\GitHub\dsyme\visualfsharp\a.fs:line 83
...

instead of just

   at [email protected](Unit unitVar) in C:\GitHub\dsyme\visualfsharp\a.fs:line 24

src/fsharp/FSharp.Core/control.fs

dsyme · 2018-05-12T01:12:13Z

@dotnet-bot test this please

dsyme · 2018-05-12T10:26:39Z

Yay, it's green!

dsyme · 2018-05-13T01:36:44Z

Note: we need to double check the stepping and breakpoint behaviour of async debugging after this change.

dsyme · 2018-05-13T03:31:53Z

Note: we need to double check the stepping and breakpoint behaviour of async debugging after this change.

I've done that now - stepping (F11) is as good as it has been before. step-over (F10) is also as good as before (though is not great or intuitive)

dsyme · 2018-05-13T03:32:08Z

This is ready for further road testing

forki · 2018-05-13T08:20:28Z

Awesome! Congrats.

dsyme · 2018-05-13T23:17:23Z

This PR is no nirvana, but it does make some improvements. Some of the glitches remaining (even in Debug code) are

breakpoints are generally useful
locals are generally useful but maybe hidden in this or eliminated if no longer in scope.
stepping F11 often takes multiple stops on the same source line, e.g. the try in an async try/finally is is visited twice on the way into the async and once on the way out. There is also an extra step on the return from the try/finally
stepping F11 into an async function let f() = async { ... } takes one step onto the async (when the async computation is constructed) then back out then back in to execute the function.
stepping F11 on a async for loop hits the for multiple times. This is because the for is really a Using+TryFinally
step-over F10 is often counter-intuitive. For example, it doesn't step into a try/finally. F10 over a let! is counter-intuitive and not useful. Much of this is due to the interaction between the generated code, tailcalls and the remaining code in FSharp.Core.dll
exception stacks are improved but not when exceptions are rethrown

When I look at the code we generate for async, I can see room for many improvements, both from performance and debugging perspectives. But we should get something in the bank, so I'm going to stop here for now.

* Fixing #4967 (#4968) * Fix internal error when compiling interface implementation lacking an overloaded method implementation - fixes #4967 * Adding a test * Update E_OverloadMismatch.fs * Remove a setify + isSingleton combo (#4980) * remove a setify * Remove intermediate collection * Improve async stack traces (#4867) * very early prototype * async stack traces * async cleanup * minor fix * async cleanup * more async cleanup * integrate async-cleanup * async cleanup * fix build * more cleanup * minor fixes * minor fixes * full exception stacktraces * fix test * fix test * code review * cleanup naming * fix build * undo rethrow and integrate cleanup * apply renamings * Further cleanup in control.fs * add tests and add filtering TryWith, plus other cleanup * integrate cleanup * fix tests * test only runs on .net framework * slightly tweak primitives to be more suitable for later optimization * slightly tweak primitives to be more suitable for later optimization * update baselines * add check that no line 0 appear in stack * update baseline * use struct wrapper for async activation * simplify code * simplify code * update baselines * update baselines * fix baseline * remove dead code * simplify code * apply DebuggerHidden in a couple more places * [ RFC FS-1039] implementation of value options (#4837) * posible implementation of value options * fix surface area test * fix test * VNone --> ValueNone * fix surface area * fix build * update baselines * fix baselines * fix baselines * fix baselines * fix baselines * fix build

matthid

Just some smaller things that came to mind when looking at the changes.

matthid · 2018-05-31T08:54:19Z

src/fsharp/FSharp.Core/async.fs


+        /// Save the exception continuation during propagation of an exception, or prior to raising an exception
+        member __.OnExceptionRaised (action: econt) = 
+            storedExnCont <- Some action


Shouldn't we check if the storedExnCont is already Some value and throw (which I guess indicates a programming error)?

Actually on second look we already do this for storedCont so another indication this might be a good idea.

matthid · 2018-05-31T08:56:53Z

src/fsharp/FSharp.Core/async.fs

+
+        member ctxt.WithExceptionContinuation econt = AsyncActivation<_> { contents with aux = { ctxt.aux with econt = econt } } 
+
+        member ctxt.WithContinuation(cont) = AsyncActivation<_> { cont = cont; aux = contents.aux } 


why not { contents with here?

@matthid The type of the object changes, and {a with ... } assumes the type of the expression is exactly the same as a

matthid · 2018-05-31T09:01:49Z

src/fsharp/FSharp.Core/async.fs

                if thisIsTopTrampoline then
                    Trampoline.thisThreadHasTrampoline <- false
-            FakeUnit
+            Unchecked.defaultof<AsyncReturn>


This is essentially null, correct? Should we add AllowNull attribute and use null? And add a comment that only null is used as value for this type?

This is just a question, I guess the current solution is fine - just a bit unexpected when looking at the type

matthid · 2018-05-31T09:17:49Z

src/fsharp/FSharp.Core/async.fsi

+
+    /// <summary>The F# compiler emits references to this type to implement F# async expressions.</summary>
+    [<Struct; NoEquality; NoComparison>]
+    type AsyncActivation<'T> =


Should we add some attribute here like CompilerGenerated or Obsolete or something else to indicate users should not use the type directly?

cartermp · 2018-06-04T21:02:02Z

@dsyme would you consider an RFC for this? No need to be super detailed. Just want to have the RFC repo accurately reflect major changes going into F# 4.5/FSharp.Core 4.5.0.0 here: fsharp/fslang-design#311

dsyme · 2018-06-07T00:17:04Z

@cartermp I'll write up something

dsyme · 2018-06-07T10:50:13Z

@cartermp wrote an RFC here: https://github.com/fsharp/fslang-design/blob/master/FSharp.Core-4.5.0.0/FS-1059-improve-async-stack-traces.md

dsyme and others added 21 commits January 30, 2018 17:12

very early prototype

9be5969

Merge branch 'master' of http://github.com/Microsoft/visualfsharp int…

b8412b3

…o async-st

async stack traces

41082e2

integrate master

ddf8a43

async cleanup

efd785c

minor fix

d7268ca

Merge branch 'master' of http://github.com/Microsoft/visualfsharp int…

f684806

…o async-cleanup

integrate async-cleanup

3620df5

async cleanup

4e7de49

integrate async-cleanup

73750f7

more async cleanup

a31caa7

integrate async-cleanup

bef1c38

integrate async-cleanup

60030b7

async cleanup

54a335c

integrate async-cleanup

88e68cf

fix build

2dd05d7

more cleanup

7462682

Merge branch 'async-cleanup' into async-st

6c63a26

minor fixes

683a777

integrate cleanup

841454b

minor fixes

cd2d7c7

dsyme added the Candidate for F# 4.5 label May 9, 2018

Merge branch 'master' of http://github.com/Microsoft/visualfsharp int…

adcb276

…o async-st

full exception stacktraces

660a881

eiriktsarpalis reviewed May 9, 2018

View reviewed changes

src/fsharp/FSharp.Core/control.fs Outdated Show resolved Hide resolved

dsyme added 2 commits May 10, 2018 00:42

fix test

fa9d4dc

fix test

1742cb9

dsyme added 5 commits May 11, 2018 18:11

simplify code

1b08d5a

simplify code

dc6707c

update baselines

4549b0f

update baselines

368d009

fix baseline

4c8e9f2

dsyme mentioned this pull request May 12, 2018

cleanup: separate control.fs into multiple files #4893

Merged

dsyme added 3 commits May 12, 2018 23:32

integrate master

a388e7d

remove dead code

5045d64

simplify code

b0fea33

dsyme mentioned this pull request May 13, 2018

Stepping in computation expressions steps out on let!/do! #4835

Closed

apply DebuggerHidden in a couple more places

1cd72fe

cartermp added this to the 15.8 milestone May 20, 2018

KevinRansom approved these changes May 24, 2018

View reviewed changes

KevinRansom merged commit a8e059f into dotnet:master May 24, 2018

dsyme mentioned this pull request May 30, 2018

Don't hide exceptions on the async cancellation-path #3257

Closed

matthid reviewed May 31, 2018

View reviewed changes

dsyme mentioned this pull request Jun 1, 2018

Async comments and cleanup from code review #5065

Merged

baronfel mentioned this pull request Apr 13, 2020

Enhance stack traces/debugging for CE's exposed in this project demystifyfp/FsToolkit.ErrorHandling#76

Closed

abelbraaksma mentioned this pull request May 27, 2020

Improve async stack traces #2741

Open


		member ctxt.WithExceptionContinuation econt = AsyncActivation<_> { contents with aux = { ctxt.aux with econt = econt } }

		member ctxt.WithContinuation(cont) = AsyncActivation<_> { cont = cont; aux = contents.aux }

Improve async stack traces #4867

Improve async stack traces #4867

Uh oh!

Conversation

dsyme commented May 9, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem One - losing stack traces for computationally embedded languages

Problem Two - Losing stack traces for exceptions

Problem One - What can we do?

Problem Two - What can we do?

What this PR actually does

Limitations

Remaining Steps

Uh oh!

cartermp commented May 9, 2018

Uh oh!

dsyme commented May 9, 2018

Uh oh!

dsyme commented May 9, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

dsyme commented May 12, 2018

Uh oh!

dsyme commented May 12, 2018

Uh oh!

dsyme commented May 13, 2018

Uh oh!

dsyme commented May 13, 2018

Uh oh!

dsyme commented May 13, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

forki commented May 13, 2018 via email • edited by dsyme Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dsyme commented May 13, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

matthid left a comment

Choose a reason for hiding this comment

Uh oh!

matthid May 31, 2018

Choose a reason for hiding this comment

Uh oh!

matthid May 31, 2018

Choose a reason for hiding this comment

Uh oh!

matthid May 31, 2018

Choose a reason for hiding this comment

Uh oh!

dsyme Jun 1, 2018

Choose a reason for hiding this comment

Uh oh!

matthid May 31, 2018

Choose a reason for hiding this comment

Uh oh!

matthid May 31, 2018

Choose a reason for hiding this comment

Uh oh!

matthid May 31, 2018

Choose a reason for hiding this comment

Uh oh!

cartermp commented Jun 4, 2018

Uh oh!

dsyme commented Jun 7, 2018

Uh oh!

dsyme commented Jun 7, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

dsyme commented May 9, 2018 •

edited

Loading

dsyme commented May 9, 2018 •

edited

Loading

dsyme commented May 13, 2018 •

edited

Loading

forki commented May 13, 2018 via email •

edited by dsyme

Loading

dsyme commented May 13, 2018 •

edited

Loading