-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Date detection should not rely on a hardcoded set of characters. #22171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Currently we only apply date detection on strings that contain either `:`, `-` or `/`. This commit inverses the heuristic in order to only apply date detection on strings that are not parseable as a number, so that more date formats can be used as dynamic dates formats. Closes elastic#1694
jimczi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks good.
To make the thing clear you could maybe add a test with a pattern that was not detected before and one with a pattern composed of numbers only that should be detected as a number.
| } else if (parseableAsLong == false && parseableAsDouble == false && context.root().dateDetection()) { | ||
| // We refuse to match pure numbers, which are too likely to be | ||
| // false positives with date formats that include eg. | ||
| // `epoch_millis` or `YYYY` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's difficult to detect ? If the pattern is epoch_millis or composed of number symbol only ? If it's not detectable then maybe add a small warning in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I don't think we can detect this confidently. I'll work on your test suggestions.
|
@jimczi Can you have another look? |
jimczi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks adding the tests.
LGTM
|
NOTE: I'm thinking of only having this change in 6.0 as this might be something surprising to get in a minor release for some of our users. Please speak up if you think I should merge to 5.2. |
|
I think it should be merged to 5.2. I don't think that people rely on the fact that some pattern are not detected with dates. Although it's for dynamic detection so this won't change the old indices but could be beneficial for new indices ? |
|
It could change old indices as well if a new field is introduced. But indeed, I would expect most old indices to already have a "stable" mapping. @clintongormley I'd like to run it by you before making a decision, what do you think? |
* master: (22 commits) Support negative numbers in writeVLong (elastic#22314) UnicastZenPing's PingingRound should prevent opening connections after being closed Add task to clean idea build directory. Make cleanIdea task invoke it. add trace logging to UnicastZenPingTests.testResolveReuseExistingNodeConnections Adds ingest processor headers to exception for unknown processor. (elastic#22315) Remove much ceremony from parsing client yaml test suites (elastic#22311) Support numeric bounds with decimal parts for long/integer/short/byte datatypes (elastic#21972) inner hits: Don't inline inner hits if the query the inner hits is inlined into can't resolve mappings and ignore_unmapped has been set to true Fix stackoverflow error on InternalNumericMetricAggregation Date detection should not rely on a hardcoded set of characters. (elastic#22171) `value_type` is useful regardless of scripting. (elastic#22160) Improve concurrency of ShardCoreKeyMap. (elastic#22316) fixed jdocs and removed already fixed norelease Adds abstract test classes for serialisation (elastic#22281) Introduce translog no-op Provide helpful error message if a plugin exists Clear static variable after suite Repeated language analyzers (elastic#22240) Restore deprecation warning for invalid match_mapping_type values (elastic#22304) Make `-0` compare less than `+0` consistently. (elastic#22173) ...
|
It has been like this since the first version, I think it is ok to wait a bit longer for 6.0 |
Currently we only apply date detection on strings that contain either
:,-or
/. This commit inverses the heuristic in order to only apply date detectionon strings that are not parseable as a number, so that more date formats can be
used as dynamic dates formats.
Closes #1694