Skip to content

Conversation

@dsyme
Copy link
Contributor

@dsyme dsyme commented May 9, 2018

The stack traces for F# async are not good see #2741. This PR starts to improve them (though doesn't solve the problem anywhere near completely)

Summary

Debugging computationally embedded languages gives bad stack traces. This PR outlines

  1. the problem
  2. a technique for improving things for most synchronous parts of computations
  3. applies the technique to Async
  4. also improves exception stack traces for the first throw of an exception

Problem One - losing stack traces for computationally embedded languages

There is a fundamental problem with debugging for pretty much all "computationally embedded DSLs" which delay computation. Many basic DSLs suffer this problem, both in functional and object programming.

For our purposes, "Computationally embedded DSLs" are ones where:

  1. The user code creates a composition of objects using helper calls from a library (e.g. async { ... })
  2. The user code then executes the computation implied by that collection of objects (e.g. calls Async.RunSynchronously).

The process of execution may dynamically create more computation objects along the way during phase2, though that's kind of irrelevant.

A very, very simple example of such a "computationally embedded language" is below. This DSL has lousy debugging in F#:

module DSL = 
    type Computation<'T> = C of (unit -> 'T)

    let bind (C f) k = C(fun () -> let (C f2) = k (f ()) in f2())

    let ret x = C (fun () -> x)

    let add c1 c2 = bind c1 (fun v1 -> bind c2 (fun v2 -> ret (v1 + v2)))

    let run (C f) = f ()

which lets you write things like:

    open DSL
    let f1() = ret 6
    let f2() = ret 7
    let f3() = add (f1()) (f2())
    f3() |> run

Now the "stack" when the ret 6 computation of f1() is actually "invoked" (i.e. when it gets invoked as part of f3() |> run) is crappy. You can see the crappy stack by augmenting the example as follows:

type Computation<'T> = C of (unit -> 'T)

let delay f = bind (ret ()) f
let stack msg c1 = 
    delay (fun () -> 
        System.Console.WriteLine("----{0}-----\n{1}\n----------", msg, System.Diagnostics.StackTrace(true).ToString())
        c1)

Here is the stack:

----f1-----
   at [email protected](Unit unitVar0) in C:\GitHub\dsyme\visualfsharp\a.fs:line 12
   at [email protected](Unit unitVar0) in C:\GitHub\dsyme\visualfsharp\a.fs:line 5
   at [email protected](Unit unitVar0) in C:\GitHub\dsyme\visualfsharp\a.fs:line 5
   at A.run[a](Computation`1 _arg1) in C:\GitHub\dsyme\visualfsharp\a.fs:line 13
   at A.main(String[] argv) in C:\GitHub\dsyme\visualfsharp\a.fs:line 22

----------

Barely any user code is on the stack at all! The line numbers are all the closures in the implementation of the DSL!!! Ugh!! 😞 👿

Note that C# async and other implementations of Promises/Tasks/"Hot Tasks" don't suffer this problem since they are not "delayed". But most F# DSLs (and many C# ones) are indeed "delayed". For async, it is F#'s use of "cold" asyncs that require explicit starting that causes the issue. Using "cold" asyncs brings many benefits - most particularly the implicit plumbing of cancellation tokens - but also bring problems.

Problem Two - Losing stack traces for exceptions

There is a second fundamental problem with .NET stack traces for any computationally embedded DSL that may throw .NET exceptions. This is based on this problem, where only a limited number of stack frames are included in the .StackTrace property of a .NET exception when it is thrown, up to the first catch handler.

This is a serious and surprising limitation in the CLR, and I think it would be great to fix it. For example it’s a very common pattern in functional programming to quickly turn exceptions into data, e.g. consider basic patterns like the Railway Oriented programming pattern. This might have some code like this:

/// A type very similar to this is in the FSharp.Core library
type Result<'T> = 
   | Ok of 'T 
   | Error of System.Exception

let embed f = 
    try Ok(f()) 
    with exn -> Error exn

let run f = f()

run (embed (fun () -> failwith "fail"))

Here the StackTrace data for the exception carried by the Error data will only carry a partial stack trace containing one stack frame. See the link above for why - but basically the CLR only adds stack frames up to the first try ... with handler. Ugh. From the user’s point of view that’s not what they normally need or expect. But it's what the CLR does.

There are C# equivalents to such patterns, especially with C# becoming more and more expression-oriented. They suffer the same problem.

Problem One - What can we do?

Now, what cards can we play to fix this? In general, it seems "hard". Here are the options I know of:

  1. Keep causal stacks in objects and reinstate them when the behaviour of the objects is activated.

    • This could be done if there was a way to do some kind of [<CaptureCallerStackToken>] and pass ti down. We can simulate that by capturing a System.Diagnostics.StackTrace for each and every computational object created and passing them down.

    • This could give very nice stack traces, but you would need to be able to hijack the normal debugging mechanisms and say "hey, this is the real causal stack". This can be done for exception stack traces by hacking the internals of a .NET exception object (see this blog post).

    • This is expensive. We can't normally afford to keep stacks from the creation of objects in the actual object closures.

    • In some sense, this is what C# async debugging is doing. However I don't yet understand the mechanism C# async is using or if it is reusable for other computationally embedded languages. It is possible that it is, by emitting just the right calls and attributes. That's future work, any advice or links appreciated.

    • It is also roughly what is done for rethrow exceptions using ExceptionDispatchInfo. However AFAIK this can't be done for stack traces at breakpoints.

  2. We could pass source/line number information into the process of constructing the computational objects. This was done by Eirik Tsarpalis here in order to get better exception stack traces. However this feels expensive - or at least not cost free - and also requires changing API surface area. Also, it is only helpful for stack traces produced for exceptions.

  3. The third option, and the one used here, is to "inline" the key parts of the computational language (and its composition operations) so that the key closures which implement the composition become user code. This relies on the fact that when the F# compiler inlines code, it re-tags closures contained in that code with the source information of callsite.

Here is an example of how we can play this game with the very simple computational language above:

type Computation<'T> = C of (unit -> 'T)

[<System.Diagnostics.DebuggerHidden>]
let apply (C f) = f()
let inline bind f k = C(fun () -> let f2 = k (apply f) in apply f2)
let inline ret x = C (fun () -> x)
let inline add c1 c2 = bind c1 (fun v1 -> bind c2 (fun v2 -> ret (v1 + v2)))
let inline stack msg c1 = bind (ret ()) (fun () -> System.Console.WriteLine("----{0}-----\n{1}\n----------", msg, System.Diagnostics.StackTrace(true).ToString()); c1)
let run c = apply c

The rules we've applied are:

  1. Inline the mechanisms of composition
  2. Expose any necessary "runtime" functions such as apply and mark them DebuggerHidden

The resulting stack trace (excluding DebuggerHidden methods) is

----f1-----
   at [email protected](Unit unitVar0) in C:\GitHub\dsyme\visualfsharp\a.fs:line 16
   at [email protected](Unit unitVar0) in C:\GitHub\dsyme\visualfsharp\a.fs:line 18
   at A.main(String[] argv) in C:\GitHub\dsyme\visualfsharp\a.fs:line 23

This is not perfect - we see the generated closure names - but at least the line numbers are right and names A.f1 and A.f3 appear. This is a significant improvement.

Problem Two - What can we do?

One way to improve the exception stacks for problem 2 is to use a "trampoline" to run the synchronous parts of computations. When an exception happens, the exception continuation (or other information required to continue execution) is written into the trampoline.

This technique works for async since we have a trampoline available.

Note that the stack trace is lost when rethrowing an exception. Sadly this means that it is lost when an implicit rethrow happens with this sort of thing:

try ... 
with :? ArgumentException as e -> ...

You are encouraged to do something like this pattern to keep good InnerException information:

try ... 
with 
    | :? ArgumentException as e -> ... 
    | otherExn -> raise (AggregateException(..., otherExn))

What this PR actually does

This PR applies technique #3 to async. The steps:

  1. The Bind composition operation in AsyncBuilder is marked inline along with a couple of others.

  2. An extra level of underlying async execution machinery is exposed in the API of FSharp.Core. The new API surface area is in control.fsi in the PR - all of the added APIs are for generated code only. All of these will be marked compiler use only but become part of the long-term binary compatible API for FSharp.Core. This is "just enough" to allow key closures to be inlined into user code, but not too much that we can't re-implement how Async is implemented at a later stage.

Limitations

  • Only improves debug user code that is compiled with tailcalls off

  • Improves the debugging experience for first throw of exceptions and the synchronous parts of asynchronous code.

  • The stack is still lost if

    1. the "trampoline" has been used, which happens every 300 executions of bind on a stack or when any exception is raised (even if it is caught)
    2. an exception is rethrown.
  • The improvements need a modified FSharp.Core.dll (i.e. F# 4.5)

Remaining Steps

  • Design question: return! asyncExpr results in the loss of a debug stack frame, it is always treated as an async tailcall even if tailcalls are off. See this comment
  • Design question: currently we don't inline TryWith, consider doing this.: Now done
  • Get it green (e.g. updates to API surface area, updates to generated code samples)
  • Double check the stepping and breakpoint behaviour of async debugging after this change.

Later:

  • See if we can apply the same technique to cancellable and eventually in the F# compiler codebase

@cartermp
Copy link
Contributor

cartermp commented May 9, 2018

See if we can apply the same technique to cancellable and eventually in the F# compiler codebase

Oh yes yes yes

@dsyme
Copy link
Contributor Author

dsyme commented May 9, 2018

I've realised as a side effect of this that our F# async exception implementation is hitting this problem badly, where only one stack trace is included in the .StackTrace property of the exception. This is partly because our F# async machinery catches the exception “early”.

If someone knows a way, somewhere deep in .NET Framework or .NET Core, by which to attach a deeper stack trace string to an exception that has already been thrown, I'd be very grateful.

We’re willing to do anything to make this happen: create a task, whatever. I’ve even considered using the reflection hack documented here to set the _stackTraceString field, which Eirik Tsarpalis has also used.

The one thing we can’t do is wrap the exception in an AggregateException – we have to throw the user’s exception.

@dsyme
Copy link
Contributor Author

dsyme commented May 9, 2018

I've realised as a side effect of this that our F# async exception implementation is hitting this problem ...

I think I've found an approach that fixes this. Essentially we let exceptions propagate all the way up to the trampoline and deal with them there. When combined with the rest of this work this should give much, much better exception stacks

For example for the code

let rec async_syncLoopViaTailCallFail(n) = async {
    if n > 10 then 
        let! res = failwith "fail"
        return res
    else
        return! async_syncLoopViaTailCallFail(n+1)
}

let async1() = async {
    System.Console.WriteLine("next")
    let! res = async_syncLoopViaTailCallFail(0)
    return "no"
}


try 
     async1() |> Async.RunSynchronously
 with e -> 
     e.StackTrace

we get exception stack trace

   at [email protected](Unit unitVar) in C:\GitHub\dsyme\visualfsharp\a.fs:line 24
   at Microsoft.FSharp.Control.AsyncPrimitives.Call[T,TResult](AsyncActivation`1 ctxt, TResult result1, FSharpFunc`2 part2)
   at Microsoft.FSharp.Control.AsyncPrimitives.Bind[T,TResult](Boolean keepStack, AsyncActivation`1 ctxt, FSharpAsync`1 part1, FSharpFunc`2 part2)
   at [email protected](AsyncActivation`1 ctxt) in C:\GitHub\dsyme\visualfsharp\a.fs:line 84
   at [email protected](AsyncActivation`1 ctxt) in C:\GitHub\dsyme\visualfsharp\a.fs:line 83
   at Microsoft.FSharp.Control.Trampoline.Execute(FSharpFunc`2 firstAction)
--- End of stack trace from previous location where exception was thrown ---
...

instead of

   at [email protected](Unit unitVar) in C:\GitHub\dsyme\visualfsharp\a.fs:line 24
   at Microsoft.FSharp.Control.AsyncPrimitives.Call[T,TResult](AsyncActivation`1 ctxt, TResult result1, FSharpFunc`2 part2f)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()

If you somehow ignore DebuggerHidden stuff in those traces then you get

   at [email protected](Unit unitVar) in C:\GitHub\dsyme\visualfsharp\a.fs:line 24
   at [email protected](AsyncActivation`1 ctxt) in C:\GitHub\dsyme\visualfsharp\a.fs:line 84
   at [email protected](AsyncActivation`1 ctxt) in C:\GitHub\dsyme\visualfsharp\a.fs:line 83
...

instead of just

   at [email protected](Unit unitVar) in C:\GitHub\dsyme\visualfsharp\a.fs:line 24

@dsyme
Copy link
Contributor Author

dsyme commented May 12, 2018

@dotnet-bot test this please

@dsyme
Copy link
Contributor Author

dsyme commented May 12, 2018

Yay, it's green!

@dsyme
Copy link
Contributor Author

dsyme commented May 13, 2018

Note: we need to double check the stepping and breakpoint behaviour of async debugging after this change.

@dsyme
Copy link
Contributor Author

dsyme commented May 13, 2018

Note: we need to double check the stepping and breakpoint behaviour of async debugging after this change.

I've done that now - stepping (F11) is as good as it has been before. step-over (F10) is also as good as before (though is not great or intuitive)

@dsyme
Copy link
Contributor Author

dsyme commented May 13, 2018

This is ready for further road testing

@forki
Copy link
Contributor

forki commented May 13, 2018 via email

@dsyme
Copy link
Contributor Author

dsyme commented May 13, 2018

This PR is no nirvana, but it does make some improvements. Some of the glitches remaining (even in Debug code) are

  • breakpoints are generally useful

  • locals are generally useful but maybe hidden in this or eliminated if no longer in scope.

  • stepping F11 often takes multiple stops on the same source line, e.g. the try in an async try/finally is is visited twice on the way into the async and once on the way out. There is also an extra step on the return from the try/finally

  • stepping F11 into an async function let f() = async { ... } takes one step onto the async (when the async computation is constructed) then back out then back in to execute the function.

  • stepping F11 on a async for loop hits the for multiple times. This is because the for is really a Using+TryFinally

  • step-over F10 is often counter-intuitive. For example, it doesn't step into a try/finally. F10 over a let! is counter-intuitive and not useful. Much of this is due to the interaction between the generated code, tailcalls and the remaining code in FSharp.Core.dll

  • exception stacks are improved but not when exceptions are rethrown

When I look at the code we generate for async, I can see room for many improvements, both from performance and debugging perspectives. But we should get something in the bank, so I'm going to stop here for now.

@cartermp cartermp added this to the 15.8 milestone May 20, 2018
@KevinRansom KevinRansom merged commit a8e059f into dotnet:master May 24, 2018
KevinRansom pushed a commit that referenced this pull request May 24, 2018
* Fixing #4967 (#4968)

* Fix internal error when compiling interface implementation lacking an overloaded method implementation - fixes #4967

* Adding a test

* Update E_OverloadMismatch.fs

* Remove a setify + isSingleton combo (#4980)

* remove a setify

* Remove intermediate collection

* Improve async stack traces (#4867)

* very early prototype

* async stack traces

* async cleanup

* minor fix

* async cleanup

* more async cleanup

* integrate async-cleanup

* async cleanup

* fix build

* more cleanup

* minor fixes

* minor fixes

* full exception stacktraces

* fix test

* fix test

* code review

* cleanup naming

* fix build

* undo rethrow and integrate cleanup

* apply renamings

* Further cleanup in control.fs

* add tests and add filtering TryWith, plus other cleanup

* integrate cleanup

* fix tests

* test only runs on .net framework

* slightly tweak primitives to be more suitable for later optimization

* slightly tweak primitives to be more suitable for later optimization

* update baselines

* add check that no line 0 appear in stack

* update baseline

* use struct wrapper for async activation

* simplify code

* simplify code

* update baselines

* update baselines

* fix baseline

* remove dead code

* simplify code

* apply DebuggerHidden in a couple more places

* [ RFC FS-1039] implementation of value options (#4837)

* posible implementation of value options

* fix surface area test

* fix test

* VNone --> ValueNone

* fix surface area

* fix build

* update baselines

* fix baselines

* fix baselines

* fix baselines

* fix baselines

* fix build
Copy link
Contributor

@matthid matthid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some smaller things that came to mind when looking at the changes.


/// Save the exception continuation during propagation of an exception, or prior to raising an exception
member __.OnExceptionRaised (action: econt) =
storedExnCont <- Some action
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we check if the storedExnCont is already Some value and throw (which I guess indicates a programming error)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually on second look we already do this for storedCont so another indication this might be a good idea.


member ctxt.WithExceptionContinuation econt = AsyncActivation<_> { contents with aux = { ctxt.aux with econt = econt } }

member ctxt.WithContinuation(cont) = AsyncActivation<_> { cont = cont; aux = contents.aux }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not { contents with here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthid The type of the object changes, and {a with ... } assumes the type of the expression is exactly the same as a

if thisIsTopTrampoline then
Trampoline.thisThreadHasTrampoline <- false
FakeUnit
Unchecked.defaultof<AsyncReturn>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is essentially null, correct? Should we add AllowNull attribute and use null? And add a comment that only null is used as value for this type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a question, I guess the current solution is fine - just a bit unexpected when looking at the type


/// <summary>The F# compiler emits references to this type to implement F# async expressions.</summary>
[<Struct; NoEquality; NoComparison>]
type AsyncActivation<'T> =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add some attribute here like CompilerGenerated or Obsolete or something else to indicate users should not use the type directly?

@cartermp
Copy link
Contributor

cartermp commented Jun 4, 2018

@dsyme would you consider an RFC for this? No need to be super detailed. Just want to have the RFC repo accurately reflect major changes going into F# 4.5/FSharp.Core 4.5.0.0 here: fsharp/fslang-design#311

@dsyme
Copy link
Contributor Author

dsyme commented Jun 7, 2018

@cartermp I'll write up something

@dsyme
Copy link
Contributor Author

dsyme commented Jun 7, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants