Skip to content

Improve error reporting: Better heuristics for identifying the code line with a type error #1384

@rmunn

Description

@rmunn

This may be too large in scope for #1103, but I believe it's worth considering.

Thomas Wies gave a presentation at Compose Conference 2016 titled "Improving Type Error Localization for Languages with Type Inference". The presentation video is about 50 minutes long, and the corresponding paper is 18 pages long counting the bibliography, so I'll try my best to summarize them. I certainly won't be able to do it justice, though, so watching the 50-minute video is recommended at some point. (Beware: there's an audio problem at 24:54 in that video, in which a beep that lasts for about 40 seconds is accidentally superimposed over the audo. Turn your headphone volume down before that point to spare your ears; at 25:35 the beep goes away and Thomas resumes his presentation).

What

What - an error message that could be improved, and a code example to trigger it

The code example that Thomas Wies starts discussing around 10:45 of the video (the slide comes on a few seconds later), adapted to something that will compile in F#, is basically the following:

type Point = float * float
let example2 (lst:Point list): Point list =
    // ... a real example would have some more code here ...
    let rec loop lst x y dir acc =
        if lst = [] then
            acc
        else
            printfn "Debugging message"
    loop lst 0.0 0.0 0.0 [(0.0,0.0)] |> List.rev

The compiler produces two error messages from this code. The first error is on the [(0.0,0.0)] argument when loop is called, and the error message is:

This expression was expected to have type
    unit    
but here has type
    'a list

Why

Why the error message is confusing

An F# newbie who wrote that code would be confused by that error message. This value is getting passed in as the acc parameter, which he intended to be treated as a list of Points. Why would acc be considered to have the unit type? He doesn't have any () values anywhere in his function, so where is that coming from? (An experienced programmer would, of course, remember that printfn returns unit, but an F# newbie probably still thinks of printfn as a "statement" and doesn't realize that his printfn invocation is causing the return type of loop to be considered unit.)

Further, there is other information in the code that might allow the compiler to infer that the acc parameter, and the return type of loop, are both intended to be a list. The result of loop is piped into List.rev, suggesting that it should have been a list. And passing [(0.0,0.0)] to loop as the value of acc also suggests that the programmer's intent is to use a list.

How

How the error message could be improved

Thomas Wies's presentation suggests a weight-based heuristic, whose details I won't go into here in order to save space. But essentially, every place where the types are mismatched should be looked at, and a weight assigned to each. Then at each place, the compiler would consider "Here the expression has type A, but elsewhere it has type B. What if this expression had had type B instead -- would that cause the type mismatch to go away?" And of all the possible "switch type A to type B" or "switch type B to type A" conversions, the one with the smallest weight (which would usually mean the one that involved the fewest type changes to the code) would be chosen as the error message.

In this code, that heuristic would identify several places in the code where there's a type mismatch between 'a list (or in some cases Point list) and unit:

  1. The acc parameter of loop
  2. The then branch where acc is returned from loop, thereby constraining loop's return type to be the same as acc
  3. The else branch where the return value of printfn is returned from loop, thereby constraining loop's return type to be the same as what printfn returns
  4. The place where [(0.0,0.0)] is passed to loop as the value of the acc parameter in this function call
  5. The place where List.rev is called with the return value of loop

The compiler can infer that all five of these locations must have the same type 'T. Locations 1 and 2 do not constrain 'T. Location 3 constrains 'T to be unit, location 4 constrains it to be Point list (or, more accurately, (float * float) list, but I'll call it Point list from now on), and location 5 constrains it to be 'a list. Following a basic weight-based heuristic, we could assign a weight of 0 to locations 1 and 2 since they impose no constraints, and a weight of 1 to the three other locations. Then the compiler would consider each option in turn:

  • Changing the type at location 3 to be Point list would satisfy all constraints, since 'a list is compatible with Point list. So this change gets a score of 1.
  • Changing the type at location 4 to be unit and changing no other locations would not be enough to satisfy all constraints.
  • Changing the type at location 5 to be unit and changing no other locations would not be enough to satisfy all constraints.
  • Changing the type at locations 4 and 5 to be unit would satisfy all constraints. That involves making two changes at locations with weight 1, so this change gets a score of 2.

The lowest-score change that would satisfy all constraints is to change the type at location 3, so that is the one the compiler would base the error message on. And the error message would then identify the expression printfn "Debugging message" as the source of the error, and give the error message:

This expression was expected to have type
    Point list    
but here has type
    unit

Drawbacks

Why we might not want to do this

This seems like it would be a non-trivial change to implement in the compiler -- certainly it seems it would require more work than most of the other suggestions from #1103. Although 39:12 in Thomas Weis's presentation, he starts talking about how this idea could be implemented, and he asserts that "it doesn't require all that much work to actually build this into a compiler." So maybe this would be feasible to implement -- if not in F# 4.1, then perhaps in 4.2 or 5.0. And if this could make it into F# 4.1, it would certainly make for a much friendlier experience for newbies.

So I believe this idea is at least worth considering: watch Wies's presentation, read his paper, and decide whether the benefit would be worth the work.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    Status

    In Progress

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions