Skip to content

Conversation

@albertzaharovits
Copy link
Contributor

All floats parsed by epoch_millis/second date formatter get truncated, either as strings or as numbers - coerce behavior. This builds on the existing behavior of parsing all dates to strings.
In this way there is no 'coerce parameter' for the DateFieldMapper. 'Coerce parameter' remains valid only for numeric data types.
The coerce behavior is implicitly enabled for a specific Formatter only, i.e. epoch_*.
A coerce parameter at the DateFieldMapper level cannot be defined irrespective of the date format because of conflicts, e.g. basic_time and epoch_second as float.

Closes: #14641

The coerce parameter is implicity true for the epoch
millis DateFormater. It is not defined for other date formaters.
This extends the current "coerce" from numbers to strings for all dates.

See: elastic#14641
@albertzaharovits
Copy link
Contributor Author

@cbuescher I think from/to as double from org.elasticsearch.search.aggregations.bucket.range.RangeAggregator can be removed, since all dates can now be parsed as strings, plus this avoids the conversion from double to long. What do you think?

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@albertzaharovits I left a small comment


public void testThatFloatEpochsCanBeParsed() {

long millisFromEpoch = randomNonNegativeLong();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since dates previous to epoch 0 can still be expressed as a long and may well be encountered if the user is indexing historical data should we test negative epoch values here too?

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@albertzaharovits this looks great, I left a couple of smaller comments but nothing big.

Some more things:

  • since the original issue revolves around documents not being indexed when they have float values, should we add an integration test for this? DateFieldMapperTests already more or less checks that, but I think it would be good to have one round-trip test here as well

  • Not sure if we also want to parse floats with , as decimal separator, but maybe thats overkill. Any opinions on this @colings86

  • As I understand this change now makes "coerce" : false not reject any Strings any more for epoch_millis and epoch_seconds. I just want to double check that this is okay, maybe we can document this somewhere?

public int parseInto(DateTimeParserBucket bucket, String text, int position) {
boolean isPositive = text.startsWith("-") == false;
boolean isTooLong = text.length() > estimateParsedLength();
int firstDotIndex = text.indexOf((int)'.');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we don't need the int cast here, it is done implicitely. At least my IDE removes it on "save"

int factor = hasMilliSecondPrecision ? 1 : 1000;
try {
long millis = Long.valueOf(text) * factor;
long millis = new BigDecimal(text).longValue() * factor;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, so this can handle all kinds of formats it seems.


// test floats get truncated
String epochFloatValue = String.format(Locale.US, "%d.%d", dateTime.getMillis() / (parseMilliSeconds ? 1L : 1000L), randomNonNegativeLong());
assertThat(formatter.parser().parseDateTime(epochFloatValue).getMillis(), is(dateTime.getMillis()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we also should support european decimal separators like ,? Maybe not, but just wanted to throw it in. I'm not sure if this complicates things too much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should accept numbers as defined by javascript/JSON schema, and not formatted string representation of numbers. The issue is that we didn't accept the numbers from javascript which does not have an integer datatype. Accepting string representation of valid javascript numbers is the bonus part since all dates are parsed as strings anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

}

// test floats get truncated
String epochFloatValue = String.format(Locale.US, "%d.%d", dateTime.getMillis() / (parseMilliSeconds ? 1L : 1000L), randomNonNegativeLong());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a negative long in this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is negative since dateTime.getMillis() is negative, the non negative long is for the fractional part.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get it now, thanks.

assertEquals(mapping, mapper.mappingSource().toString());

long millisFromEpoch = randomNonNegativeLong();
String epochFloatValue = String.format(Locale.US, "%d.%d", millisFromEpoch, randomNonNegativeLong());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also randomly append a negative prefix to also test parsing negative values here?


// date_range ignores the coerce parameter and epoch_millis date format truncates floats (see issue: #14641)
if (type.equals("date_range")) {
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe just personal preference, but early returns in test look strange to me. Can you change this to execute the rest of the test only for type.equals("date_range") == false)

assertThat(searchResponse.getHits().getTotalHits(), equalTo(3L));
buckets = checkBuckets(searchResponse.getAggregations().get("date_range"), "date_range", 2);
assertBucket(buckets.get(0), 2L, "1000-3000", 1000000L, 3000000L);
assertBucket(buckets.get(1), 1L, "3000-4000", 3000000L, 4000000L);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great this works

@albertzaharovits albertzaharovits changed the title Epoch millis and second formats accept float implicitly Epoch millis and second formats accept float implicitly (Closes #14641) Aug 10, 2017
@cbuescher
Copy link
Member

@albertzaharovits thanks, those recent changes look good to me. That leaves the question about whether we should document the changed behaviour around "coerce" : false. Maybe @colings86 also wants to take another look at this?

@albertzaharovits
Copy link
Contributor Author

@cbuescher I am not sure what you mean by:

"coerce" : false not reject any Strings any more for epoch_millis and epoch_seconds

coerce parameter is invalid for date field type and is ignored in aggregations.
Any String that is a valid number is acceptable, other strings are considered malformed.

@cbuescher
Copy link
Member

coerce parameter is invalid for date field type and is ignored in aggregations

Thanks, thats what I was missing. So the existing behaviour is that coerce doesn't work with the date datatype? I wasn't really sure from reading the coerce docs.

@albertzaharovits
Copy link
Contributor Author

albertzaharovits commented Aug 10, 2017

That's correct, coerce does not work with date and this is not changing.

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM. I think CI might be still failing because of unrelated problems, had the same yesterday. Maybe rebasing or merging in master helps to get a clean build. Also I don't know if you want to wait for @colings86 to have another look, I'm good.
Thanks a lot for this change.

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@albertzaharovits albertzaharovits merged commit 3e3132f into elastic:master Aug 13, 2017
albertzaharovits added a commit that referenced this pull request Aug 13, 2017
#26119)

`epoch_millis` and `epoch_second` date formats truncate float values, as numbers or as strings.
The `coerce` parameter is not defined for `date` field type and this is not changing.
See PR #26119

Closes #14641
albertzaharovits added a commit that referenced this pull request Aug 13, 2017
#26119)

`epoch_millis` and `epoch_second` date formats truncate float values, as numbers or as strings.
The `coerce` parameter is not defined for `date` field type and this is not changing.
See PR #26119

Closes #14641
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Aug 14, 2017
* master: (30 commits)
  Rewrite range queries with open bounds to exists query (elastic#26160)
  Fix eclipse compilation problem (elastic#26170)
  Epoch millis and second formats parse float implicitly (Closes elastic#14641) (elastic#26119)
  fix SplitProcessor targetField test (elastic#26178)
  Fixed typo in README.textile (elastic#26168)
  Fix incorrect class name in deleteByQuery docs (elastic#26151)
  Move more token filters to analysis-common module
  reindex: automatically choose the number of slices (elastic#26030)
  Fix serialization of the `_all` field. (elastic#26143)
  percolator: Hint what clauses are important in a conjunction query based on fields
  Remove unused Netty-related settings (elastic#26161)
  Remove SimpleQueryStringIT#testPhraseQueryOnFieldWithNoPositions.
  Tests: reenable ShardReduceIT#testIpRange.
  Allow `ClusterState.Custom` to be created on initial cluster states (elastic#26144)
  Teach the build about betas and rcs (elastic#26066)
  Fix wrong header level
  inner hits: Unfiltered nested source should keep its full path
  Document how to import Lucene Snapshot libs when elasticsearch clients (elastic#26113)
  Use `global_ordinals_hash` execution mode when sorting by sub aggregations. (elastic#26014)
  Make the README use a single type in examples. (elastic#26098)
  ...
@clintongormley clintongormley changed the title Epoch millis and second formats accept float implicitly (Closes #14641) Epoch millis and second formats accept float implicitly Aug 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants