Skip to content

Conversation

@scampi
Copy link
Contributor

@scampi scampi commented Dec 5, 2016

close #21600

@scampi
Copy link
Contributor Author

scampi commented Dec 5, 2016

I have a question about the code. The {byte,short} rangeQuery methods rely on the INTEGER implementation of rangeQuery. Therefore, the call to parse uses the INTEGER.parse implementation, not the BYTE.parse one. Is this normal ?

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the call to parse uses the INTEGER.parse implementation, not the BYTE.parse one. Is this normal?

Yes: we do not optimize storage for bytes or shorts at the moment, so it is fine to share the same code as far as queries are concerned. The indexing code is different however since we want to fail when someone indexes eg. a large integer in a byte field.

If I am not mistaken, a side effect of this pull request is that searching a decimal value on an integer field used to raise an error while it would not silently round the value down. I am fine with not throwing an error, but could we make sure to create a query that does not match anything if the decimal part is not null rather than silently rounding down? I think both term and terms are affected.

double doubleValue = ((Number) number).doubleValue();
return doubleValue % 1 != 0;
}
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to return false otherwise? Maybe return Double.parseDouble(number) % 1 != 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If number is a string like 1.1 then the parsing would fail with java.lang.NumberFormatException: For input string: "1.1". This is already the case in the existing parse methods. So I think it should be fine to return false here.

}
if (doubleValue % 1 != 0) {
throw new IllegalArgumentException("Value [" + value + "] has a decimal part");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still like an exception to be thrown if this is called from parseCreateField. Maybe add a boolean coerce parameter (consistent with the rest of this class) and only perform this check if coerce is false?

@jpountz jpountz self-assigned this Dec 5, 2016
@scampi scampi force-pushed the range-query-decimal-part branch from ce43e31 to 19cdc1b Compare December 7, 2016 17:16
@scampi
Copy link
Contributor Author

scampi commented Dec 7, 2016

@jpountz Thanks for the explanation of the rangeQuery.
Ready for another look!

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the lag, I just had anothen look at your changes. I think it does not work with negative values, since your code assumes that calling longValue() on a decimal rounds down, while it actually rounds up for negative values.

float[] v = new float[values.size()];
for (int i = 0; i < values.size(); ++i) {
v[i] = parse(values.get(i));
v[i] = (float) parse(values.get(i), false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the cast looks unnecessary?

@jpountz
Copy link
Contributor

jpountz commented Dec 14, 2016

I have a recreation if you want to look into it:

PUT index 
{
  "mappings": {
    "type": {
      "properties": {
        "field": {
          "type": "integer"
        }
      }
    }
  }
}

PUT index/type/1
{
  "field": -3
}

GET index/_search
{
  "query": {
    "range": {
      "field": {
        "gte": -3.5,
        "lte": -2.5
      }
    }
  }
}

GET index/_search
{
  "query": {
    "range": {
      "field": {
        "gte": -4.5,
        "lte": -3.5
      }
    }
  }
}

The document matches the second query but not the first one.


int[] v = new int[nonDecimalValues.size()];
for (int i = 0; i < nonDecimalValues.size(); ++i) {
v[i] = parse(nonDecimalValues.get(i), true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could avoid generating intermediate garbage because of boxed objects by directly creating the primitive array, eg. something like:

int[] v = new int[values.size()];
int upTo = 0;
for (Object value : values) {
  if (hasDecimalPart(value) == false) {
    v[upTo++] = parse(value, true);
  }
}
if (upTo != v.length) {
  v = Arrays.copyOf(v, upTo);
}

@scampi
Copy link
Contributor Author

scampi commented Dec 18, 2016

@jpountz thanks for the review! I have resolved your comments in the last two PRs. Ready for another round

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to terms queries look good, but I think there are still issues with range query generation.

MappedFieldType ftInt = new NumberFieldMapper.NumberFieldType(NumberType.INTEGER);
ftInt.setName("field");
ftInt.setIndexOptions(IndexOptions.DOCS);
assertEquals(IntPoint.newRangeQuery("field", -3, -2), ftInt.rangeQuery(-3.5, -2.5, true, true, null));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I may be a bit confused, but if the range is [-3.5, -2.5] then -2 should not match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry about that, i'll correct this in next commit

// - if the bound is negative then we leave it as is:
// if lowerTerm=-1.5 then the (inclusive) bound becomes -1 due to the call to longValue
boolean lowerTermHasDecimalPart = hasDecimalPart(lowerTerm);
if ((includeLower == false && !lowerTermHasDecimalPart) || (lowerTermHasDecimalPart && l > 0)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this approach cannot work since -0.5 and +0.5 both parse to 0 when coerce is true, so you have no way to know whether the original value was positive or negative?

@scampi
Copy link
Contributor Author

scampi commented Dec 21, 2016

@jpountz I corrected those two issues you pointed out.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me. I left some comments about formatting if you don't mind applying them and then I'll merge. Thanks @scampi!

// - if the bound is negative then we leave it as is:
// if lowerTerm=-1.5 then the (inclusive) bound becomes -1 due to the call to longValue
boolean lowerTermHasDecimalPart = hasDecimalPart(lowerTerm);
if ((!lowerTermHasDecimalPart && includeLower == false) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you either use ! in both cases or == false?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(others tend to have a preference for == false so I'd recommend using that)

// if lowerTerm=-1.5 then the (inclusive) bound becomes -1 due to the call to longValue
boolean lowerTermHasDecimalPart = hasDecimalPart(lowerTerm);
if ((!lowerTermHasDecimalPart && includeLower == false) ||
(lowerTermHasDecimalPart && signum(lowerTerm) > 0)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you indent one level more so that it is easier to figure out what is part of the if statement and what is part of the inner block?

u = parse(upperTerm, true);
boolean upperTermHasDecimalPart = hasDecimalPart(upperTerm);
if ((!upperTermHasDecimalPart && includeUpper == false) ||
(upperTermHasDecimalPart && signum(upperTerm) < 0)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

// if lowerTerm=-1.5 then the (inclusive) bound becomes -1 due to the call to longValue
boolean lowerTermHasDecimalPart = hasDecimalPart(lowerTerm);
if ((!lowerTermHasDecimalPart && includeLower == false) ||
(lowerTermHasDecimalPart && signum(lowerTerm) > 0)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

u = parse(upperTerm, true);
boolean upperTermHasDecimalPart = hasDecimalPart(upperTerm);
if ((!upperTermHasDecimalPart && includeUpper == false) ||
(upperTermHasDecimalPart && signum(upperTerm) < 0)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

/**
* Returns -1, 0, or 1 if the value is lower than, equal to, or greater than 0
*/
protected double signum(Object value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make those two methods private, I don't think we need to extend them?

@scampi
Copy link
Contributor Author

scampi commented Dec 21, 2016

@jpountz I have addressed your comments, thanks for the review ;o)

@jpountz jpountz merged commit e1b8528 into elastic:master Dec 22, 2016
@scampi scampi deleted the range-query-decimal-part branch December 22, 2016 14:54
jpountz pushed a commit that referenced this pull request Dec 22, 2016
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Dec 22, 2016
* master: (22 commits)
  Support negative numbers in writeVLong (elastic#22314)
  UnicastZenPing's PingingRound should prevent opening connections after being closed
  Add task to clean idea build directory. Make cleanIdea task invoke it.
  add trace logging to UnicastZenPingTests.testResolveReuseExistingNodeConnections
  Adds ingest processor headers to exception for unknown processor. (elastic#22315)
  Remove much ceremony from parsing client yaml test suites (elastic#22311)
  Support numeric bounds with decimal parts for long/integer/short/byte datatypes (elastic#21972)
  inner hits: Don't inline inner hits if the query the inner hits is inlined into can't resolve mappings and ignore_unmapped has been set to true
  Fix stackoverflow error on InternalNumericMetricAggregation
  Date detection should not rely on a hardcoded set of characters. (elastic#22171)
  `value_type` is useful regardless of scripting. (elastic#22160)
  Improve concurrency of ShardCoreKeyMap. (elastic#22316)
  fixed jdocs and removed already fixed norelease
  Adds abstract test classes for serialisation (elastic#22281)
  Introduce translog no-op
  Provide helpful error message if a plugin exists
  Clear static variable after suite
  Repeated language analyzers (elastic#22240)
  Restore deprecation warning for invalid match_mapping_type values (elastic#22304)
  Make `-0` compare less than `+0` consistently. (elastic#22173)
  ...
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :Search/Search Search-related issues that do not fall into other categories v5.2.0 v6.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

inconsistent behaviour regarding numeric range queries and aggregations

3 participants