@droberts195

When doing a fieldwise Levenshtein distance comparison
between CSV rows, this change ignores all fields that
have long values, not just the longest field.

This approach works better for CSV formats that have
multiple freeform text fields rather than just a single
"message" field.
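
As a rough illustration of the idea described above (not the code from this PR), the comparison could be sketched as follows. The function names and the `long_field_threshold` cutoff are hypothetical, chosen only for this example; the actual rule for deciding what counts as a "long" value may differ.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fieldwise_distance(row_a: Sequence[str],
                       row_b: Sequence[str],
                       long_field_threshold: int = 20) -> int:
    """Sum per-field edit distances, skipping every field with a long value.

    long_field_threshold is a hypothetical cutoff used for illustration.
    Rows are compared position by position; zip() stops at the shorter row.
    """
    total = 0
    for value_a, value_b in zip(row_a, row_b):
        # Skip ALL long fields, not just the single longest one.
        if len(value_a) > long_field_threshold or len(value_b) > long_field_threshold:
            continue
        total += levenshtein(value_a, value_b)
    return total
```

With two freeform text columns, skipping only the longest field would leave the other one to dominate the total distance; skipping every long field keeps the comparison focused on the short, structured columns.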

Fixes #45047

@elasticmachine
Collaborator

Pinging @elastic/ml-core

@benwtrent benwtrent self-requested a review August 1, 2019 15:53
@droberts195 droberts195 merged commit 7c43894 into elastic:master Aug 1, 2019
@droberts195 droberts195 deleted the improve_csv_header_detection branch August 1, 2019 19:08
droberts195 pushed a commit that referenced this pull request Aug 2, 2019

Development

Successfully merging this pull request may close these issues.

[ML] find_file_structure not detecting CSV header with many long and highly variable field values
