-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Epoch millis and second formats accept float implicitly #26119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The coerce parameter is implicity true for the epoch millis DateFormater. It is not defined for other date formaters. This extends the current "coerce" from numbers to strings for all dates. See: elastic#14641
|
@cbuescher I think |
colings86
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@albertzaharovits I left a small comment
|
|
||
| public void testThatFloatEpochsCanBeParsed() { | ||
|
|
||
| long millisFromEpoch = randomNonNegativeLong(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since dates previous to epoch 0 can still be expressed as a long and may well be encountered if the user is indexing historical data should we test negative epoch values here too?
cbuescher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@albertzaharovits this looks great, I left a couple of smaller comments but nothing big.
Some more things:
-
since the original issue revolves around documents not being indexed when they have float values, should we add an integration test for this? DateFieldMapperTests already more or less checks that, but I think it would be good to have one round-trip test here as well
-
Not sure if we also want to parse floats with
,as decimal separator, but maybe thats overkill. Any opinions on this @colings86 -
As I understand this change now makes
"coerce" : falsenot reject any Strings any more forepoch_millisandepoch_seconds. I just want to double check that this is okay, maybe we can document this somewhere?
| public int parseInto(DateTimeParserBucket bucket, String text, int position) { | ||
| boolean isPositive = text.startsWith("-") == false; | ||
| boolean isTooLong = text.length() > estimateParsedLength(); | ||
| int firstDotIndex = text.indexOf((int)'.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think we don't need the int cast here, it is done implicitely. At least my IDE removes it on "save"
| int factor = hasMilliSecondPrecision ? 1 : 1000; | ||
| try { | ||
| long millis = Long.valueOf(text) * factor; | ||
| long millis = new BigDecimal(text).longValue() * factor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, so this can handle all kinds of formats it seems.
|
|
||
| // test floats get truncated | ||
| String epochFloatValue = String.format(Locale.US, "%d.%d", dateTime.getMillis() / (parseMilliSeconds ? 1L : 1000L), randomNonNegativeLong()); | ||
| assertThat(formatter.parser().parseDateTime(epochFloatValue).getMillis(), is(dateTime.getMillis())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we also should support european decimal separators like ,? Maybe not, but just wanted to throw it in. I'm not sure if this complicates things too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should accept numbers as defined by javascript/JSON schema, and not formatted string representation of numbers. The issue is that we didn't accept the numbers from javascript which does not have an integer datatype. Accepting string representation of valid javascript numbers is the bonus part since all dates are parsed as strings anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
| } | ||
|
|
||
| // test floats get truncated | ||
| String epochFloatValue = String.format(Locale.US, "%d.%d", dateTime.getMillis() / (parseMilliSeconds ? 1L : 1000L), randomNonNegativeLong()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a negative long in this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is negative since dateTime.getMillis() is negative, the non negative long is for the fractional part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get it now, thanks.
| assertEquals(mapping, mapper.mappingSource().toString()); | ||
|
|
||
| long millisFromEpoch = randomNonNegativeLong(); | ||
| String epochFloatValue = String.format(Locale.US, "%d.%d", millisFromEpoch, randomNonNegativeLong()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also randomly append a negative prefix to also test parsing negative values here?
|
|
||
| // date_range ignores the coerce parameter and epoch_millis date format truncates floats (see issue: #14641) | ||
| if (type.equals("date_range")) { | ||
| return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe just personal preference, but early returns in test look strange to me. Can you change this to execute the rest of the test only for type.equals("date_range") == false)
| assertThat(searchResponse.getHits().getTotalHits(), equalTo(3L)); | ||
| buckets = checkBuckets(searchResponse.getAggregations().get("date_range"), "date_range", 2); | ||
| assertBucket(buckets.get(0), 2L, "1000-3000", 1000000L, 3000000L); | ||
| assertBucket(buckets.get(1), 1L, "3000-4000", 3000000L, 4000000L); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great this works
|
@albertzaharovits thanks, those recent changes look good to me. That leaves the question about whether we should document the changed behaviour around |
|
@cbuescher I am not sure what you mean by:
|
Thanks, thats what I was missing. So the existing behaviour is that |
|
That's correct, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM. I think CI might be still failing because of unrelated problems, had the same yesterday. Maybe rebasing or merging in master helps to get a clean build. Also I don't know if you want to wait for @colings86 to have another look, I'm good.
Thanks a lot for this change.
colings86
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* master: (30 commits) Rewrite range queries with open bounds to exists query (elastic#26160) Fix eclipse compilation problem (elastic#26170) Epoch millis and second formats parse float implicitly (Closes elastic#14641) (elastic#26119) fix SplitProcessor targetField test (elastic#26178) Fixed typo in README.textile (elastic#26168) Fix incorrect class name in deleteByQuery docs (elastic#26151) Move more token filters to analysis-common module reindex: automatically choose the number of slices (elastic#26030) Fix serialization of the `_all` field. (elastic#26143) percolator: Hint what clauses are important in a conjunction query based on fields Remove unused Netty-related settings (elastic#26161) Remove SimpleQueryStringIT#testPhraseQueryOnFieldWithNoPositions. Tests: reenable ShardReduceIT#testIpRange. Allow `ClusterState.Custom` to be created on initial cluster states (elastic#26144) Teach the build about betas and rcs (elastic#26066) Fix wrong header level inner hits: Unfiltered nested source should keep its full path Document how to import Lucene Snapshot libs when elasticsearch clients (elastic#26113) Use `global_ordinals_hash` execution mode when sorting by sub aggregations. (elastic#26014) Make the README use a single type in examples. (elastic#26098) ...
All floats parsed by epoch_millis/second date formatter get truncated, either as strings or as numbers - coerce behavior. This builds on the existing behavior of parsing all dates to strings.
In this way there is no 'coerce parameter' for the DateFieldMapper. 'Coerce parameter' remains valid only for numeric data types.
The coerce behavior is implicitly enabled for a specific Formatter only, i.e. epoch_*.
A coerce parameter at the DateFieldMapper level cannot be defined irrespective of the date format because of conflicts, e.g. basic_time and epoch_second as float.
Closes: #14641