@verkhovin
Contributor

@verkhovin verkhovin commented May 9, 2020

Resolves #46396

After the refactoring made in #49693, the issue is still reproducible.
I've appended a bit of logic to the ResolveRefs rule, and now it seems to work as expected.
For the query

curl --request POST \
  --url 'http://localhost:9200/_sql?format=txt' \
  --header 'authorization: Basic ZWxhc3RpYzpwYXNzd29yZA==' \
  --header 'content-type: application/json' \
  --data '{"query": "select gender as g, sum(salary) as g from test_emp group by g"}'

the following response body is returned:

{
  "error": {
    "root_cause": [
      {
        "type": "verification_exception",
        "reason": "Found 1 problem\nline 1:61: Reference [g] is ambiguous (to disambiguate use quotes or qualifiers); matches any of [g, g]"
      }
    ],
    "type": "verification_exception",
    "reason": "Found 1 problem\nline 1:61: Reference [g] is ambiguous (to disambiguate use quotes or qualifiers); matches any of [g, g]"
  },
  "status": 400
}

I've run the tests from org.elasticsearch.xpack.sql and the precommit Gradle task to verify that everything is OK.
Please let me know if I missed something.

@cbuescher cbuescher added the :Analytics/SQL SQL querying label May 11, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-ql (:Query Languages/SQL)

@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label May 11, 2020
@matriv
Contributor

matriv commented May 11, 2020

@verkhovin Thank you for picking this up!
Could you please add some unit tests in FieldAttributeTest?
There is already a testFieldAmbiguity(), but it doesn't test aggregations and GROUP BY.
Please test with aggregates, both with and without GROUP BY.

@verkhovin
Contributor Author

@matriv Yes, thank you.

@verkhovin
Contributor Author

@matriv Please take a look, I've added a couple of tests.
I used basic-mapping.json in these tests. Let me know if you think they don't fully cover the cases.

@matriv
Contributor

matriv commented May 11, 2020

@elasticmachine test this please

@matriv
Contributor

matriv commented May 11, 2020

@verkhovin Could you also test with multiple agg functions sharing the same alias?
MAX(int) AS m, MIN(int) AS m .... GROUP BY m
And also with multiple fields sharing the same alias:
MAX(a), field1 AS f, field2 AS f ... GROUP BY f.

@matriv matriv requested review from astefan, costin and matriv May 11, 2020 14:10
@verkhovin
Contributor Author

@matriv I've found a problem. For the query SELECT gender AS g, max(salary) AS g, min(salary) AS g FROM test GROUP BY g, I expected the following error message:
Found 1 problem line 1:75: Reference [g] is ambiguous (to disambiguate use quotes or qualifiers); matches any of ["g", "g", "g"]

but the actual message is:
Found 1 problem line 1:75: Reference [g] is ambiguous (to disambiguate use quotes or qualifiers); matches any of ["g", "g"]

Only two matches are listed in the message.
I'll look into it now.

@verkhovin
Contributor Author

verkhovin commented May 11, 2020

OK. It is caused by org.elasticsearch.xpack.ql.expression.Expressions#aliases()
https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/ql/src/main/java/org/elasticsearch/xpack/ql/expression/Expressions.java#L167

There, the NamedExpressions from max(salary) AS g and min(salary) AS g are converted to attributes by the toAttribute() method, and the resulting attributes are equal. So the resulting aliasMap has size 2, and the third AS g is lost.

I'm not sure whether this should be fixed or whether it's expected behavior.
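A minimal, self-contained Java sketch of why an entry disappears when aliases are collected into a map keyed by attributes. It does not use Elasticsearch classes: Attr is a hypothetical stand-in whose equals/hashCode consider only the alias name and data type. That is an assumption chosen to reproduce the observed size of 2, not the real Attribute semantics.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class AliasCollapseSketch {

    // Hypothetical, simplified stand-in for a resolved attribute (NOT Elasticsearch's
    // Attribute). ASSUMPTION: equality looks only at the alias name and data type,
    // ignoring which expression the alias points to.
    static final class Attr {
        final String name;
        final String dataType;

        Attr(String name, String dataType) {
            this.name = name;
            this.dataType = dataType;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Attr)) {
                return false;
            }
            Attr other = (Attr) o;
            return name.equals(other.name) && dataType.equals(other.dataType);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, dataType);
        }
    }

    // Collects "gender AS g, max(salary) AS g, min(salary) AS g" into a map keyed by
    // attribute. A later put() with an equal key replaces the earlier value.
    static Map<Attr, String> aliasMap() {
        Map<Attr, String> aliases = new LinkedHashMap<>();
        aliases.put(new Attr("g", "keyword"), "gender");
        aliases.put(new Attr("g", "integer"), "max(salary)");
        aliases.put(new Attr("g", "integer"), "min(salary)"); // overwrites max(salary)
        return aliases;
    }

    public static void main(String[] args) {
        Map<Attr, String> aliases = aliasMap();
        System.out.println(aliases.size());   // 2
        System.out.println(aliases.values()); // [gender, min(salary)]
    }
}
```

Because LinkedHashMap.put replaces the value stored under an existing equal key, the min(salary) entry overwrites max(salary), which mirrors how the third AS g goes missing.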

Comment on lines 256 to 261
ex = expectThrows(VerificationException.class,
        () -> plan("SELECT gender AS g, max(salary) AS g, min(salary) AS g FROM test GROUP BY g"));
assertEquals(
        "Found 1 problem\nline 1:75: Reference [g] is ambiguous (to disambiguate use quotes or qualifiers); " +
        "matches any of [\"g\", \"g\"]",
        ex.getMessage());
Contributor Author

Here's where the problem is. This case passes because max(salary) AS g and min(salary) AS g are considered equal Attributes (see my comment on the PR).

Contributor

@matriv matriv May 12, 2020

Maybe we should just change the error message to something like:

Found 1 problem\nline 1:75: Reference [g] is ambiguous, (to disambiguate use quotes, qualifiers or different aliases)

@costin @astefan @bpintea what do you think?

Contributor Author

It's up to you, but I think such a message would be quite confusing. Imagine you have a really complex query with a lot of aliases, and only two of them are duplicated. I'd spend a lot of time trying to understand what such an error message means.

Contributor Author

@verkhovin verkhovin May 12, 2020

Oh, I see you changed Reference [] is ambiguous to Reference [g] is ambiguous in your comment. Sorry for the misunderstanding. If we specify which alias is ambiguous, then LGTM.

Contributor

@bpintea what do you think?

Sounds good to me. I can't imagine a case where the "ambiguity count" is indispensable.

Contributor Author

@verkhovin verkhovin May 12, 2020

Please note that the same error message (and the same message-building implementation) is used in more complicated cases, e.g. here:

ex = expectThrows(VerificationException.class, () -> plan("SELECT test.test FROM test"));
assertEquals(
        "Found 1 problem\nline 1:8: Reference [test.test] is ambiguous (to disambiguate use quotes or qualifiers); "
        + "matches any of [\"test\".\"test\", \"test\".\"test.test\"]",
        ex.getMessage());

@verkhovin
Contributor Author

Eventually, I got rid of the part of the message that shows the "ambiguity count".
So I'm waiting for feedback.

@matriv
Contributor

matriv commented May 19, 2020

@verkhovin Apologies for the delay in my response. After some discussion, we decided that we shouldn't change the error message but should instead handle the resolution differently. Here: https://github.com/elastic/elasticsearch/pull/56489/files#diff-bb55908282831d2f432f4a4650d55521L168 we pass the AttributeMap keySet, where fields have been overridden because of its special hashCode & equals implementation. We need to address this with a different approach, using a List or a normal Set (if that works).

Would you like to give that a try? Otherwise, we're happy to take it over from here as well.
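A minimal sketch of the List-based idea, using a hypothetical AliasEntry class rather than Elasticsearch's Attribute or AttributeMap. Matching every alias by name into a List keeps entries that would collide as map keys, so all three g aliases from the earlier example are counted.

```java
import java.util.ArrayList;
import java.util.List;

public class AmbiguityListSketch {

    // Hypothetical alias entry: the alias name plus the expression it stands for.
    // Not an Elasticsearch class; used only to illustrate the List-based approach.
    static final class AliasEntry {
        final String name;
        final String expression;

        AliasEntry(String name, String expression) {
            this.name = name;
            this.expression = expression;
        }
    }

    // Collects every alias whose name matches the reference. A List keeps duplicates,
    // so aliases that would compare "equal" as map keys are all retained.
    static List<AliasEntry> findMatches(String reference, List<AliasEntry> aliases) {
        List<AliasEntry> matches = new ArrayList<>();
        for (AliasEntry alias : aliases) {
            if (alias.name.equals(reference)) {
                matches.add(alias);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<AliasEntry> aliases = List.of(
            new AliasEntry("g", "gender"),
            new AliasEntry("g", "max(salary)"),
            new AliasEntry("g", "min(salary)"));
        List<AliasEntry> matches = findMatches("g", aliases);
        if (matches.size() > 1) {
            // More than one match means the reference is ambiguous.
            System.out.println("Reference [g] is ambiguous; " + matches.size() + " matches");
        }
    }
}
```

A plain HashSet would only help if the element type's equals distinguishes the aliased expressions; the List sidesteps that question entirely.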

@verkhovin
Contributor Author

verkhovin commented May 19, 2020

@matriv That's great! I'd like to finish it up the right way, so I'll take a look at it. Thank you for the guidance. I'll write here if I need any help.

@matriv
Contributor

matriv commented May 20, 2020

@verkhovin Go ahead, thank you!

@matriv
Contributor

matriv commented May 20, 2020

@elasticmachine test this please

@matriv
Contributor

matriv commented May 21, 2020

@elasticmachine update branch

@matriv
Contributor

matriv commented May 21, 2020

@elasticmachine test this please

@matriv
Contributor

matriv commented May 21, 2020

Thank you @verkhovin for the new approach.
I think it's a good opportunity to improve the message and use the attribute location as a prefix, so that it's easier for the user to find the ambiguous aliases (especially within a long query).

@verkhovin
Contributor Author

@matriv Do you mean we can now build messages like this:
line 1:61: Reference [g] is ambiguous (to disambiguate use quotes or qualifiers); matches any of [line 1:18 [g], line 1:36 [g]] ?

I can try to implement it in the current PR. Let me know if you think it should be implemented as a separate feature in another PR.
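A sketch of how such a location-prefixed message could be assembled. Located and ambiguityMessage are hypothetical names; in the real code the line and column would come from each attribute's source location. With the positions from the example, it produces exactly the proposed text.

```java
import java.util.List;
import java.util.stream.Collectors;

public class AmbiguityMessageSketch {

    // Hypothetical (name, line, column) triple standing in for an attribute and the
    // position where it appears in the query text.
    static final class Located {
        final String name;
        final int line;
        final int column;

        Located(String name, int line, int column) {
            this.name = name;
            this.line = line;
            this.column = column;
        }

        // Renders a candidate as "line L:C [name]".
        String describe() {
            return "line " + line + ":" + column + " [" + name + "]";
        }
    }

    // Builds the message in the format proposed in the discussion, prefixing each
    // ambiguous candidate with where it was declared.
    static String ambiguityMessage(Located reference, List<Located> matches) {
        String candidates = matches.stream()
            .map(Located::describe)
            .collect(Collectors.joining(", ", "[", "]"));
        return "line " + reference.line + ":" + reference.column
            + ": Reference [" + reference.name + "] is ambiguous"
            + " (to disambiguate use quotes or qualifiers); matches any of " + candidates;
    }

    public static void main(String[] args) {
        Located ref = new Located("g", 1, 61);
        List<Located> matches = List.of(new Located("g", 1, 18), new Located("g", 1, 36));
        System.out.println(ambiguityMessage(ref, matches));
    }
}
```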

@matriv
Contributor

matriv commented May 21, 2020

@verkhovin Yes, that seems fine. You can include it in this PR. thx!

@verkhovin
Contributor Author

@matriv please take a look

Contributor

@matriv matriv left a comment

Looks good overall; I left some comments.
I'd like to ask you to add some tests for the ORDER BY ambiguity.

Moreover, there is still an issue when you have duplicate aliases and use numeric (ordinal) references or the original column expressions in GROUP BY:

SELECT emp_no % languages as a, gender as a, max(salary) as a FROM test_emp GROUP BY 1, 2

or

SELECT emp_no % languages as a, gender as a, max(salary) as a FROM test_emp GROUP BY emp_no % languages, gender

both return an error like the following:

{
    "error": {
        "root_cause": [
            {
                "type": "sql_illegal_argument_exception",
                "reason": "Cannot resolve field extractor index for column [a{r}#121]"
            }
        ],
        "type": "sql_illegal_argument_exception",
        "reason": "Cannot resolve field extractor index for column [a{r}#121]"
    },
    "status": 500
}

Would you like to dig into it as well? (It can also be done in a separate PR, as the issue was already there and is not affected by your fix.)


import java.util.Objects;

public class AttributeAlias {
Contributor

Could you please replace this with a Tuple<Attribute, Expression>?

* The rest are not as they are not part of the projection and thus are not part of the derived table.
*/
public abstract class Attribute extends NamedExpression {
public abstract class Attribute extends NamedExpression implements Comparable<Attribute>{
Contributor

Making the generic Attribute class implement Comparable just to construct a nice error message is a bit too much. Please remove the comparison implementation from here and use a custom comparator to sort the ambiguous attributes.
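A sketch of the comparator-based alternative: the candidate type stays free of Comparable, and ordering by position (line, then column) is expressed with Comparator.comparingInt(...).thenComparingInt(...). Candidate, BY_POSITION and sortedByPosition are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AmbiguitySortSketch {

    // Hypothetical ambiguous candidate with the position where it was declared.
    // It deliberately does not implement Comparable.
    static final class Candidate {
        final String name;
        final int line;
        final int column;

        Candidate(String name, int line, int column) {
            this.name = name;
            this.line = line;
            this.column = column;
        }

        @Override
        public String toString() {
            return "line " + line + ":" + column + " [" + name + "]";
        }
    }

    // Orders candidates by position in the query text: line first, then column.
    static final Comparator<Candidate> BY_POSITION =
        Comparator.comparingInt((Candidate c) -> c.line).thenComparingInt(c -> c.column);

    // Returns a sorted copy, leaving the caller's list untouched.
    static List<Candidate> sortedByPosition(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(BY_POSITION);
        return sorted;
    }

    public static void main(String[] args) {
        List<Candidate> found = List.of(
            new Candidate("g", 2, 4),
            new Candidate("g", 1, 36),
            new Candidate("g", 1, 18));
        System.out.println(sortedByPosition(found)); // [line 1:18 [g], line 1:36 [g], line 2:4 [g]]
    }
}
```

Keeping the ordering in a local comparator means only the message-building code depends on it, while Attribute's equality and hashing stay unaffected.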

|| Objects.equals(u.name(), attribute.qualifiedName()));
if (match) {
matches.add(attribute.withLocation(u.source()));
matches.add(attribute);
Contributor

Why remove it from here and add it only if matches.size() == 1?

if (maybeResolved.resolved()) {
// use the matched expression (not its attribute)
grouping = resolved.stream()
.filter(attributeAlias -> attributeAlias.getAttribute().equals(maybeResolved))
Contributor

IMHO, a for loop would be more readable here.
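For illustration only, a sketch of that lookup written as a plain for loop. The excerpt cuts off after the filter(...) call, so this assumes (hypothetically) that the chain goes on to take the first matching alias and use its expression. AttributeAlias here is a simplified String-based stand-in, not the PR's class.

```java
import java.util.List;

public class ForLoopLookupSketch {

    // Simplified, hypothetical (attribute, expression) pair. Strings replace the real
    // Attribute/Expression types so the sketch runs without Elasticsearch.
    static final class AttributeAlias {
        final String attribute;
        final String expression;

        AttributeAlias(String attribute, String expression) {
            this.attribute = attribute;
            this.expression = expression;
        }
    }

    // Returns the expression of the first alias whose attribute equals maybeResolved,
    // or null when nothing matches. ASSUMED to match the intent of the truncated
    // stream().filter(...) chain; written as an explicit loop for readability.
    static String findExpression(List<AttributeAlias> resolved, String maybeResolved) {
        for (AttributeAlias alias : resolved) {
            if (alias.attribute.equals(maybeResolved)) {
                return alias.expression;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<AttributeAlias> resolved = List.of(
            new AttributeAlias("a", "emp_no % languages"),
            new AttributeAlias("g", "gender"));
        System.out.println(findExpression(resolved, "g")); // gender
    }
}
```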

@verkhovin
Contributor Author

@matriv Thank you for the review and suggestions. I'll get to work on it in the near future.

> Would you like to dig into it as well? (can also be done in a separate PR as the issue was already there, and not affected with your fix).

Yeah, I'm interested in digging into this after finishing the current PR. If you don't mind, I can create an issue about the error and assign it to myself.

@matriv
Contributor

matriv commented Jun 4, 2020

@verkhovin FYI: #57668

If you feel like working on it, please feel free to assign yourself, but I'd like to ask you to finish up the work on this PR first.

@verkhovin
Contributor Author

@matriv sure

@bpintea
Contributor

bpintea commented Jul 20, 2020

@verkhovin, many thanks for your contribution!

Issue fixed by continuation PR #59370.

@bpintea bpintea closed this Jul 20, 2020
@verkhovin
Contributor Author

verkhovin commented Jul 20, 2020

Sorry for the delay. I really wanted to finish it, but I didn't have enough time.
I should have posted a note here about it.
@bpintea thank you for doing this!

Labels

:Analytics/SQL SQL querying Team:QL (Deprecated) Meta label for query languages team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SQL: NPE when using the same alias for a projection and an aggregate and GROUPed BY

5 participants