-
Notifications
You must be signed in to change notification settings - Fork 28.9k
SPARK-2096 [SQL]: Correctly parse dot notations for accessing an array of structs #2082
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Can one of the admins verify this patch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you changing the test case since it still cannot work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wang pangzi, someone add field3 in testData arrayOfStruct. So it requires another null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add sql("select arrayOfStruct.field1 from jsonTable where arrayOfStruct.field1 = true") this test case in ignored part. It does not work because I came up with it but did not solve it. Or it makes no sense to solve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
End files with a newline. Run sbt/sbt scalastyle to check style locally.
|
Thanks for working on this. I made a few small suggestions. |
|
Jenkins, test this please. |
|
QA tests have started for PR 2082 at commit
|
|
QA tests have finished for PR 2082 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent too much
|
Thank you, marmbrus, you are so nice. I am fresh here and never post any PR to a open project. I will take your suggestions and modify my code as the scala style. |
|
Jenkins, test this please. |
|
ok to test |
|
QA tests have started for PR 2082 at commit
|
|
QA tests have finished for PR 2082 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just use
value.map{ t =>
if (t == null) null else t(ordinal)
}
as the last line of this eval function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
= =
|
@marmbrus , do I need to check something else? Or merge the code? Besides, I see another PR : #2230 :) it is my friend, I suggested him have a look of the nested parquet sqlpaser in the sql parquet test suit which parses dot as a delimiter, not identChar. it has a problem that we don't know whether the dot should be in identchar or delimiter. It leads to different sql parsing result. If only struct in a json or parquet, apparently put the dot. in identchar is better. Does it have some other reasons about the dot parsing? |
|
Hi @chuxi, it would be great if you could discuss with @cloud-fan and perhaps adapt your |
|
Can one of the admins verify this patch? |
For example, "arrayOfStruct" is an array of structs and every element of this array has a field called "field1". "arrayOfStruct[0].field1" means to access the value of "field1" for the first element of "arrayOfStruct", but the SQL parser (in sql-core) treats "field1" as an alias. Also, "arrayOfStruct.field1" means to access all values of "field1" in this array of structs and the returns those values as an array. But, the SQL parser cannot resolve it.
I have passed the test case in JsonSuite ("Complex field and type inferring (Ignored)") which is ignored, by a little modified.
modified test part :
checkAnswer(
sql("select arrayOfStruct.field1, arrayOfStruct.field2 from jsonTable"),
(Seq(true, false, null), Seq("str1", null, null)) :: Nil
)
However, another question is repeated nested structure is a problem, like arrayOfStruct.field1.arrayOfStruct.field1 or arrayOfStruct[0].field1.arrayOfStruct[0].field1
I plan to ignore this problem and try to add "select arrayOfStruct.field1, arrayOfStruct.field2 from jsonTable where arrayOfStruct.field1==true "
Besides, my friend anyweil (Wei Li) solved the problem of arrayOfStruct.field1 and its Filter part( means where parsing).
I am fresh here but will continue working on spark :)
I checked the problem " where arrayOfStruct.field1==true "
this problem will lead to modify every kind of comparisonExpression. And I think it makes no sense to add this function. So I discard it.
Over.