support numeric bounds with decimal parts for long/integer/short/byte datatypes #21972
|
I have a question about the code. The {byte,short} rangeQuery methods rely on the |
jpountz
left a comment
the call to parse uses the INTEGER.parse implementation, not the BYTE.parse one. Is this normal?
Yes: we do not optimize storage for bytes or shorts at the moment, so it is fine to share the same code as far as queries are concerned. The indexing code is different however since we want to fail when someone indexes eg. a large integer in a byte field.
If I am not mistaken, a side effect of this pull request is that searching for a decimal value on an integer field used to raise an error, while it would now silently round the value down. I am fine with not throwing an error, but could we make sure to create a query that does not match anything if the decimal part is non-zero rather than silently rounding down? I think both term and terms are affected.
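To make the request above concrete, here is a minimal sketch of the "match nothing instead of rounding" behaviour for a single term. It does not use the Lucene or Elasticsearch APIs: the class name `TermValueSketch` and the method `termToMatch` are hypothetical, and an empty `OptionalLong` stands in for a query that matches no documents.

```java
import java.util.OptionalLong;

// Hypothetical illustration only; not the code from this pull request.
final class TermValueSketch {

    /** True when the value is a Number whose fractional part is non-zero. */
    static boolean hasDecimalPart(Object value) {
        if (value instanceof Number) {
            double d = ((Number) value).doubleValue();
            return d % 1 != 0;
        }
        return false;
    }

    /**
     * Returns the exact integer to search for, or an empty result when the
     * searched value has a decimal part and therefore cannot equal any
     * value stored in an integer field.
     */
    static OptionalLong termToMatch(Object value) {
        if (hasDecimalPart(value)) {
            return OptionalLong.empty(); // stands in for a match-nothing query
        }
        if (value instanceof Number) {
            return OptionalLong.of(((Number) value).longValue());
        }
        return OptionalLong.of(Long.parseLong(value.toString()));
    }
}
```

With this shape, searching for `1.5` yields no candidate value at all, rather than quietly becoming a search for `1`.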
        double doubleValue = ((Number) number).doubleValue();
        return doubleValue % 1 != 0;
    }
    return false;
Is it safe to return false otherwise? Maybe return Double.parseDouble(number) % 1 != 0?
If number is a string like 1.1 then the parsing would fail with java.lang.NumberFormatException: For input string: "1.1". This is already the case in the existing parse methods. So I think it should be fine to return false here.
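The failure described above can be reproduced with the JDK alone. The sketch below assumes the existing parse methods end up in something equivalent to `Long.parseLong` for string input; that is an assumption about the surrounding code, and the class and method names are hypothetical.

```java
// Hypothetical illustration of why a string such as "1.1" never reaches
// the decimal-part check: integer parsing rejects it first.
final class StringParseSketch {

    /** Returns true if Long.parseLong rejects the input. */
    static boolean rejectedByParseLong(String s) {
        try {
            Long.parseLong(s);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }
}
```

Because `"1.1"` is rejected during parsing, returning `false` for non-`Number` inputs in `hasDecimalPart` does not let a decimal string slip through.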
    }
    if (doubleValue % 1 != 0) {
        throw new IllegalArgumentException("Value [" + value + "] has a decimal part");
    }
I'd still like an exception to be thrown if this is called from parseCreateField. Maybe add a boolean coerce parameter (consistent with the rest of this class) and only perform this check if coerce is false?
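A rough sketch of the suggested `coerce` flag follows. Only the exception message is taken from the hunk above; the class name, the `long` return type, and the handling of non-`Number` input are assumptions made for illustration.

```java
// Hypothetical sketch: strict parsing for indexing (coerce == false),
// lenient parsing for query construction (coerce == true).
final class CoerceParseSketch {

    static long parse(Object value, boolean coerce) {
        if (value instanceof Number) {
            double doubleValue = ((Number) value).doubleValue();
            // Only reject decimal parts when the caller did not ask for coercion.
            if (coerce == false && doubleValue % 1 != 0) {
                throw new IllegalArgumentException("Value [" + value + "] has a decimal part");
            }
            return ((Number) value).longValue(); // truncates toward zero
        }
        return Long.parseLong(value.toString());
    }
}
```

An indexing path such as `parseCreateField` would then call `parse(value, false)` and keep failing on `3.7`, while query code could call `parse(value, true)` and handle the decimal part itself.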
ce43e31 to
19cdc1b
|
@jpountz Thanks for the explanation of the |
jpountz
left a comment
Sorry for the lag, I just had another look at your changes. I think it does not work with negative values, since your code assumes that calling longValue() on a decimal rounds down, while it actually rounds up for negative values.
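The rounding behaviour in question is easy to check in isolation: `longValue()` on a `Double` truncates toward zero, so it rounds down for positive values and up for negative ones, whereas `Math.floor` always rounds down. The class and method names below are hypothetical.

```java
// Illustration of truncation toward zero versus flooring.
final class TruncationSketch {

    /** What Number.longValue() does for a double: truncate toward zero. */
    static long viaLongValue(double d) {
        return Double.valueOf(d).longValue();
    }

    /** Rounds toward negative infinity regardless of sign. */
    static long viaFloor(double d) {
        return (long) Math.floor(d);
    }
}
```

For `1.5` both methods give `1`, but for `-1.5` truncation gives `-1` while flooring gives `-2`.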
    float[] v = new float[values.size()];
    for (int i = 0; i < values.size(); ++i) {
-       v[i] = parse(values.get(i));
+       v[i] = (float) parse(values.get(i), false);
the cast looks unnecessary?
I have a recreation if you want to look into it: The document matches the second query but not the first one.
    int[] v = new int[nonDecimalValues.size()];
    for (int i = 0; i < nonDecimalValues.size(); ++i) {
        v[i] = parse(nonDecimalValues.get(i), true);
Maybe we could avoid generating intermediate garbage because of boxed objects by directly creating the primitive array, eg. something like:
    int[] v = new int[values.size()];
    int upTo = 0;
    for (Object value : values) {
        if (hasDecimalPart(value) == false) {
            v[upTo++] = parse(value, true);
        }
    }
    if (upTo != v.length) {
        v = Arrays.copyOf(v, upTo);
    }
@jpountz thanks for the review! I have resolved your comments in the last two PRs. Ready for another round
jpountz
left a comment
The changes to terms queries look good, but I think there are still issues with range query generation.
    MappedFieldType ftInt = new NumberFieldMapper.NumberFieldType(NumberType.INTEGER);
    ftInt.setName("field");
    ftInt.setIndexOptions(IndexOptions.DOCS);
    assertEquals(IntPoint.newRangeQuery("field", -3, -2), ftInt.rangeQuery(-3.5, -2.5, true, true, null));
Sorry I may be a bit confused, but if the range is [-3.5, -2.5] then -2 should not match?
sorry about that, i'll correct this in next commit
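The expected result for this test case can be worked out independently of the pull request's implementation. The sketch below uses `Math.ceil` and `Math.floor` on the decimal bounds, which is a different technique from the sign-based adjustment in the reviewed code; the class and method names are hypothetical, and overflow at the extremes of the `long` range is ignored.

```java
// Hypothetical reference computation: which inclusive integer range is
// equivalent to a range with decimal bounds?
final class RangeBoundsSketch {

    /** Returns {lower, upper} as inclusive integer bounds. */
    static long[] integerBounds(double lower, double upper,
                                boolean includeLower, boolean includeUpper) {
        // Smallest integer >= lower; skip it if the bound is exclusive and integral.
        long l = (long) Math.ceil(lower);
        if (includeLower == false && l == lower) {
            l++;
        }
        // Largest integer <= upper; skip it if the bound is exclusive and integral.
        long u = (long) Math.floor(upper);
        if (includeUpper == false && u == upper) {
            u--;
        }
        return new long[] { l, u };
    }
}
```

For the range `[-3.5, -2.5]` this gives the inclusive integer range `[-3, -3]`, which agrees with the observation that `-2` should not match.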
    // - if the bound is negative then we leave it as is:
    //   if lowerTerm=-1.5 then the (inclusive) bound becomes -1 due to the call to longValue
    boolean lowerTermHasDecimalPart = hasDecimalPart(lowerTerm);
    if ((includeLower == false && !lowerTermHasDecimalPart) || (lowerTermHasDecimalPart && l > 0)) {
I suspect this approach cannot work since -0.5 and +0.5 both parse to 0 when coerce is true, so you have no way to know whether the original value was positive or negative?
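The problem can be shown in a few lines: once a bound has been truncated, `-0.5` and `+0.5` are indistinguishable, so the sign has to be read from the original value. The sketch below checks the sign of the untruncated lower bound; the class and method names are hypothetical, and incrementing the bound inside the `if` is an assumption about what the reviewed code does in that branch.

```java
// Hypothetical sketch: decide the lower-bound adjustment from the sign of
// the original value, not from the truncated one.
final class SignAwareLowerBoundSketch {

    /** Truncates toward zero, like longValue() on a double. */
    static long truncated(double d) {
        return (long) d;
    }

    static long inclusiveLowerBound(double lowerTerm, boolean includeLower) {
        long l = truncated(lowerTerm);
        boolean hasDecimalPart = lowerTerm % 1 != 0;
        if ((hasDecimalPart == false && includeLower == false)
                || (hasDecimalPart && Math.signum(lowerTerm) > 0)) {
            // Integral exclusive bound, or positive decimal bound that was
            // rounded down by truncation: move up to the next integer.
            l++;
        }
        return l;
    }
}
```

Here `0.5` becomes an inclusive lower bound of `1`, while `-0.5` stays at `0`, even though both truncate to `0`.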
@jpountz I corrected those two issues you pointed out.
jpountz
left a comment
It looks good to me. I left some comments about formatting if you don't mind applying them and then I'll merge. Thanks @scampi!
    // - if the bound is negative then we leave it as is:
    //   if lowerTerm=-1.5 then the (inclusive) bound becomes -1 due to the call to longValue
    boolean lowerTermHasDecimalPart = hasDecimalPart(lowerTerm);
    if ((!lowerTermHasDecimalPart && includeLower == false) ||
can you either use ! in both cases or == false?
(others tend to have a preference for == false so I'd recommend using that)
    // if lowerTerm=-1.5 then the (inclusive) bound becomes -1 due to the call to longValue
    boolean lowerTermHasDecimalPart = hasDecimalPart(lowerTerm);
    if ((!lowerTermHasDecimalPart && includeLower == false) ||
        (lowerTermHasDecimalPart && signum(lowerTerm) > 0)) {
can you indent one level more so that it is easier to figure out what is part of the if statement and what is part of the inner block?
    u = parse(upperTerm, true);
    boolean upperTermHasDecimalPart = hasDecimalPart(upperTerm);
    if ((!upperTermHasDecimalPart && includeUpper == false) ||
        (upperTermHasDecimalPart && signum(upperTerm) < 0)) {
same here
    // if lowerTerm=-1.5 then the (inclusive) bound becomes -1 due to the call to longValue
    boolean lowerTermHasDecimalPart = hasDecimalPart(lowerTerm);
    if ((!lowerTermHasDecimalPart && includeLower == false) ||
        (lowerTermHasDecimalPart && signum(lowerTerm) > 0)) {
same here
    u = parse(upperTerm, true);
    boolean upperTermHasDecimalPart = hasDecimalPart(upperTerm);
    if ((!upperTermHasDecimalPart && includeUpper == false) ||
        (upperTermHasDecimalPart && signum(upperTerm) < 0)) {
same here
    /**
     * Returns -1, 0, or 1 if the value is lower than, equal to, or greater than 0
     */
    protected double signum(Object value) {
let's make those two methods private, I don't think we need to extend them?
@jpountz I have addressed your comments, thanks for the review ;o)
* master: (22 commits)
  Support negative numbers in writeVLong (elastic#22314)
  UnicastZenPing's PingingRound should prevent opening connections after being closed
  Add task to clean idea build directory. Make cleanIdea task invoke it.
  add trace logging to UnicastZenPingTests.testResolveReuseExistingNodeConnections
  Adds ingest processor headers to exception for unknown processor. (elastic#22315)
  Remove much ceremony from parsing client yaml test suites (elastic#22311)
  Support numeric bounds with decimal parts for long/integer/short/byte datatypes (elastic#21972)
  inner hits: Don't inline inner hits if the query the inner hits is inlined into can't resolve mappings and ignore_unmapped has been set to true
  Fix stackoverflow error on InternalNumericMetricAggregation
  Date detection should not rely on a hardcoded set of characters. (elastic#22171)
  `value_type` is useful regardless of scripting. (elastic#22160)
  Improve concurrency of ShardCoreKeyMap. (elastic#22316)
  fixed jdocs and removed already fixed norelease
  Adds abstract test classes for serialisation (elastic#22281)
  Introduce translog no-op
  Provide helpful error message if a plugin exists
  Clear static variable after suite
  Repeated language analyzers (elastic#22240)
  Restore deprecation warning for invalid match_mapping_type values (elastic#22304)
  Make `-0` compare less than `+0` consistently. (elastic#22173)
  ...
close #21600