Improve error reporting: Better heuristics for identifying the code line with a type error

This may be too large in scope for #1103, but I believe it's worth considering.

Thomas Wies gave a presentation at Compose Conference 2016 titled "Improving Type Error Localization for Languages with Type Inference". The [presentation video](https://www.youtube.com/watch?v=BMT6MZ5zuvw) is about 50 minutes long, and the [corresponding paper](http://cs.nyu.edu/wies/publ/finding_minimum_type_error_sources.pdf) is 18 pages long counting the bibliography, so I'll try my best to summarize them. I certainly won't be able to do it justice, though, so watching the 50-minute video is recommended at some point. (Beware: there's an audio problem at 24:54 in that video, in which a beep that lasts for about 40 seconds is accidentally superimposed over the audo. Turn your headphone volume down before that point to spare your ears; at 25:35 the beep goes away and Thomas resumes his presentation).
## What
#### What - an error message that could be improved, and a code example to trigger it

The code example that Thomas Wies starts discussing around 10:45 of the video (the slide comes on a few seconds later), adapted to something that will compile in F#, is basically the following:

``` fsharp
type Point = float * float
let example2 (lst:Point list): Point list =
    // ... a real example would have some more code here ...
    let rec loop lst x y dir acc =
        if lst = [] then
            acc
        else
            printfn "Debugging message"
    loop lst 0.0 0.0 0.0 [(0.0,0.0)] |> List.rev
```

The compiler produces two error messages from this code. The first error is on the `[(0.0,0.0)]` argument when `loop` is called, and the error message is:

```
This expression was expected to have type
    unit    
but here has type
    'a list
```
## Why
#### Why the error message is confusing

An F# newbie who wrote that code would be confused by that error message. This value is getting passed in as the `acc` parameter, which he intended to be treated as a list of Points. Why would `acc` be considered to have the `unit` type? He doesn't have any `()` values anywhere in his function, so where is that coming from? (An experienced programmer would, of course, remember that `printfn` returns `unit`, but an F# newbie probably still thinks of `printfn` as a "statement" and doesn't realize that his `printfn` invocation is causing the return type of `loop` to be considered `unit`.)

Further, there is other information in the code that might allow the compiler to infer that the `acc` parameter, and the return type of `loop`, are both intended to be a list. The result of `loop` is piped into `List.rev`, suggesting that it should have been a list. And passing `[(0.0,0.0)]` to `loop` as the value of `acc` also suggests that the programmer's intent is to use a list.
## How
#### How the error message could be improved

Thomas Wies's presentation suggests a weight-based heuristic, whose details I won't go into here in order to save space. But essentially, every place where the types are mismatched should be looked at, and a weight assigned to each. Then at each place, the compiler would consider "Here the expression has type A, but elsewhere it has type B. What if this expression had had type B instead -- would that cause the type mismatch to go away?" And of all the possible "switch type A to type B" or "switch type B to type A" conversions, the one with the smallest weight (which would usually mean the one that involved the fewest type changes to the code) would be chosen as the error message.

In this code, that heuristic would identify several places in the code where there's a type mismatch between `'a list` (or in some cases `Point list`) and `unit`:
1. The `acc` parameter of `loop`
2. The `then` branch where `acc` is returned from `loop`, thereby constraining `loop`'s return type to be the same as `acc`
3. The `else` branch where the return value of `printfn` is returned from `loop`, thereby constraining `loop`'s return type to be the same as what `printfn` returns
4. The place where `[(0.0,0.0)]` is passed to `loop` as the value of the `acc` parameter in this function call
5. The place where `List.rev` is called with the return value of `loop`

The compiler can infer that all five of these locations must have the same type `'T`. Locations 1 and 2 do not constrain `'T`. Location 3 constrains `'T` to be `unit`, location 4 constrains it to be `Point list` (or, more accurately, `(float * float) list`, but I'll call it `Point list` from now on), and location 5 constrains it to be `'a list`. Following a basic weight-based heuristic, we could assign a weight of 0 to locations 1 and 2 since they impose no constraints, and a weight of 1 to the three other locations. Then the compiler would consider each option in turn:
- Changing the type at location 3 to be `Point list` would satisfy all constraints, since `'a list` is compatible with `Point list`. So this change gets a score of 1.
- Changing the type at location 4 to be `unit` and changing no other locations would not be enough to satisfy all constraints.
- Changing the type at location 5 to be `unit` and changing no other locations would not be enough to satisfy all constraints.
- Changing the type at locations 4 and 5 to be `unit` would satisfy all constraints. That involves making two changes at locations with weight 1, so this change gets a score of 2.

The lowest-score change that would satisfy all constraints is to change the type at location 3, so that is the one the compiler would base the error message on. And the error message would then identify the expression `printfn "Debugging message"` as the source of the error, and give the error message:

```
This expression was expected to have type
    Point list    
but here has type
    unit
```
## Drawbacks
#### Why we might not want to do this

This seems like it would be a non-trivial change to implement in the compiler -- certainly it seems it would require more work than most of the other suggestions from #1103. Although 39:12 in Thomas Weis's presentation, he starts talking about how this idea could be implemented, and he asserts that "it doesn't require all that much work to actually build this into a compiler." So maybe this would be feasible to implement -- if not in F# 4.1, then perhaps in 4.2 or 5.0. And if this could make it into F# 4.1, it would certainly make for a much friendlier experience for newbies.

So I believe this idea is at least worth considering: watch Wies's presentation, read his paper, and decide whether the benefit would be worth the work.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve error reporting: Better heuristics for identifying the code line with a type error #1384

What

What - an error message that could be improved, and a code example to trigger it

Why

Why the error message is confusing

How

How the error message could be improved

Drawbacks

Why we might not want to do this

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Improve error reporting: Better heuristics for identifying the code line with a type error #1384

Description

What

What - an error message that could be improved, and a code example to trigger it

Why

Why the error message is confusing

How

How the error message could be improved

Drawbacks

Why we might not want to do this

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions