[ML] Improve CSV header row detection in find_file_structure #45099

droberts195 · 2019-08-01T15:11:08Z

When doing a fieldwise Levenshtein distance comparison
between CSV rows, this change ignores all fields that
have long values, not just the longest field.

This approach works better for CSV formats that have
multiple freeform text fields rather than just a single
"message" field.
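A minimal Python sketch of the idea, not the actual Elasticsearch (Java) implementation: the names `levenshtein`, `long_field_mask`, and `fieldwise_distance`, and the `max_len` threshold of 20, are all hypothetical choices for illustration. Every column in which any row holds a long value is excluded from the comparison, rather than only the single longest column.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def long_field_mask(rows, max_len=20):
    """Return one flag per column: True if any row's value exceeds max_len.

    max_len is an assumed threshold chosen for this example only.
    """
    width = max(len(row) for row in rows)
    mask = [False] * width
    for row in rows:
        for i, value in enumerate(row):
            if len(value) > max_len:
                mask[i] = True
    return mask


def fieldwise_distance(row_a, row_b, ignore):
    """Sum per-field edit distances, skipping columns flagged in ignore."""
    return sum(levenshtein(x, y)
               for i, (x, y) in enumerate(zip(row_a, row_b))
               if not ignore[i])


rows = [
    ["time", "level", "message", "details"],
    ["2019-08-01", "INFO", "Node started successfully on host alpha",
     "Cluster state recovered from disk"],
    ["2019-08-02", "WARN", "Disk usage above the high watermark",
     "Shards will be relocated away now"],
]
mask = long_field_mask(rows)  # both freeform text columns are flagged
header_vs_data = fieldwise_distance(rows[0], rows[1], mask)
data_vs_data = fieldwise_distance(rows[1], rows[2], mask)
```

With both freeform columns masked, the comparison rests only on the short, structured fields, so the first row stands out as being much further from the data rows than the data rows are from each other.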

Fixes #45047

elasticmachine · 2019-08-01T15:11:10Z

Pinging @elastic/ml-core

droberts195 added >enhancement :ml Machine learning v8.0.0 v7.4.0 labels Aug 1, 2019

benwtrent self-requested a review August 1, 2019 15:53

benwtrent approved these changes Aug 1, 2019

droberts195 merged commit 7c43894 into elastic:master Aug 1, 2019

droberts195 deleted the improve_csv_header_detection branch August 1, 2019 19:08

codebrain mentioned this pull request Oct 14, 2019

7.4 meta ticket elastic/elasticsearch-net#4133 (closed, 56 tasks)

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
