JaroWinkler string distance algorithm allocates 1.11 GB and spends 4.8% of CPU time generating suggestions that are rarely used #6044

@cartermp


This was found by:

  • Using the VF# solution in VS 2019 Preview
  • Working in service.fs
  • Noticing that memory usage was high
  • Profiling with good ol' PerfView
  • Typing a bit, invoking some features, etc.

GC Rollup
image

Using a 50 second sample of the full trace:

Allocations
image

System.Char[] allocations
image

CPU time
image

There are two problems at play here:

  1. Whenever there is a compile error, we send every identifier the compiler knows about through this algorithm to generate a set of suggested names
  2. The routine itself allocates a lot of char[]s and generally does a lot of work: https://github.com/Microsoft/visualfsharp/blob/master/src/utils/EditDistance.fs#L29
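
To make the two problems concrete, here is a minimal Python sketch of a textbook Jaro-Winkler similarity together with a scan over every known identifier. It is not the code in EditDistance.fs; the names `jaro`, `jaro_winkler`, and `suggest_names`, the 0.85 threshold, and the result limit are all assumptions made for illustration. It shows how each comparison needs its own scratch buffers, and how that per-call cost is multiplied by the number of identifiers in scope.

```python
from typing import Iterable, List


def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as a match only if equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    # Scratch buffers allocated on every call. The issue reports char[]
    # allocations in the real routine; these lists only play a similar
    # role in this sketch.
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3.0


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1.0 - j)


def suggest_names(unknown: str, known: Iterable[str],
                  threshold: float = 0.85, limit: int = 5) -> List[str]:
    """Hypothetical suggestion pass: score EVERY known identifier."""
    scored = [(jaro_winkler(unknown, name), name) for name in known]
    good = [(s, n) for s, n in scored if s >= threshold]
    good.sort(key=lambda p: (-p[0], p[1]))  # best score first, then by name
    return [n for _, n in good[:limit]]
```

For example, `suggest_names("lenght", ["length", "List", "map"])` keeps only `length`. The work per error is one `jaro_winkler` call per identifier, so both the CPU time and the scratch allocations grow with the size of the identifier set.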

Taking a brief look, it doesn't seem trivial to make this faster or allocate less. It's also not clear how we could generate useful suggestions for people if we cannot scan all known identifiers.

That said, we should find something to do here. This cost is incurred as a result of typing in the IDE. Because typing happens a lot (duh!), and slower typers leave code in an erroneous state for longer, improving this would be useful.
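
One possible direction, offered only as an illustration and not as something this issue commits to, follows from the title's observation that the suggestions are rarely used: defer the scan until something actually reads them. The Python sketch below assumes a hypothetical `LazySuggestions` wrapper and a hypothetical `make_error` helper, and it uses the standard library's `difflib.get_close_matches` as a stand-in scorer rather than Jaro-Winkler.

```python
import difflib
from typing import Callable, List, Optional


class LazySuggestions:
    """Hypothetical wrapper: runs the expensive scan at most once, on demand."""

    def __init__(self, compute: Callable[[], List[str]]) -> None:
        self._compute = compute
        self._value: Optional[List[str]] = None

    def get(self) -> List[str]:
        if self._value is None:
            self._value = self._compute()
        return self._value


scan_log: List[str] = []  # records each time the expensive scan really runs


def make_error(unknown: str, known: List[str]) -> dict:
    """Build an error record without scoring any identifiers up front."""
    def scan() -> List[str]:
        scan_log.append(unknown)
        # Stand-in scorer from the standard library, not Jaro-Winkler.
        return difflib.get_close_matches(unknown, known, n=3, cutoff=0.8)

    return {
        "message": f"The value '{unknown}' is not defined",
        "suggestions": LazySuggestions(scan),
    }


err = make_error("lenght", ["length", "List", "map"])
# Nothing has been scanned yet; the cost is paid only if this is read:
names = err["suggestions"].get()
```

Under this assumption, errors that are replaced by the next keystroke before anyone looks at them never pay for the scan. It does not make the scan itself cheaper, so it addresses only the first of the two problems.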
