[SQL][SPARK-10451]: Prevent unnecessary serializations in InMemoryColumnarTableScan #8604
cc @liancheng thoughts ? |
|
Test build #42014 has finished for PR 8604 at commit
|
…en SpecificPredicate class
|
Test build #42042 has finished for PR 8604 at commit
|
|
I get this failure: [error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/core/src/test/scala/org/apache/spark/sql/SQLConfSuite.scala:83: not found: value ctx. Please help. |
|
We fixed this. Lemme trigger the test again. |
|
Jenkins, retest this please. |
|
Test build #42047 has finished for PR 8604 at commit
|
|
thank you @rxin :) |
Could you please add a comment above this hunk to indicate that these variables are added to avoid unnecessary serialization?
|
LGTM except for one minor issue. Thanks for investigating and fixing this! |
|
added comments |
|
Test build #42325 has finished for PR 8604 at commit
|
|
Some OrcHadoopFSRelationSuite test is failing. Can you help with this one, @liancheng? refer: I don't understand, I just added a comment! |
|
@saucam This is not your fault :) This test case uses a random SQL data generator to test all supported data types. It seems that something goes wrong there. However, the output is garbled because of the random string and random binary values. @yhuai I can't tell whether this one is a legitimate test failure because the output is completely unreadable. Maybe we shouldn't test all data types at the same time within a single row. Putting each data type into a single-cell row may improve the readability of the test failure output. |
|
retest this please. |
|
phew! ohk :) |
|
@liancheng I think there is something wrong with ORC. In #8702, I made this test run 100 times, and it seems ORC's suite failed several times: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42299/. We can ignore this test for now. #8705 is the PR. If we want to do it, we should merge it into branch master and branch 1.5. |
|
OK. I will merge #8705. Looks like this test is pretty flaky. |
|
Test build #42341 has finished for PR 8604 at commit
|
|
This time jsonHadoopFSRelationSuite fails! |
|
test this please |
|
I have already ignored that test. You will not see the noise any more. |
|
Test build #42343 has finished for PR 8604 at commit
|
|
@yhuai thanks for the help! |
|
Thanks! Merging to master. |
|
thanks for the merge :) |
Many of the fields in InMemoryColumnarTableScan and InMemoryRelation can be made transient, so they are no longer serialized with each task.
This reduces my 1000 ms job to about 700 ms. The task size drops from 2.8 MB to about 1300 KB.
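To illustrate why marking fields transient shrinks the serialized task, here is a minimal sketch using plain Java serialization. `EagerScan`, `LeanScan`, and `TransientDemo` are hypothetical stand-ins invented for this example; they are not the classes touched by this PR, and the 1 MiB buffer is an arbitrary placeholder for driver-side state that executors do not need.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical operator whose large field is written out with every instance.
class EagerScan(val name: String) extends Serializable {
  val buffer: Array[Byte] = new Array[Byte](1 << 20) // 1 MiB placeholder state
}

// Same shape, but the large field is marked @transient, so Java serialization
// skips it. Declaring it lazy lets it be rebuilt on first access after
// deserialization instead of being left null.
class LeanScan(val name: String) extends Serializable {
  @transient lazy val buffer: Array[Byte] = new Array[Byte](1 << 20)
}

object TransientDemo {
  // Serializes obj with plain Java serialization and returns the byte count.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    val eager = new EagerScan("scan")
    val lean = new LeanScan("scan")
    lean.buffer // force initialization; the field is still not serialized
    println(s"eager: ${serializedSize(eager)} bytes")
    println(s"lean:  ${serializedSize(lean)} bytes")
  }
}
```

The eager variant serializes to more than 1 MiB, while the lean variant carries only its class descriptor and the `name` string. The trade-off is that a transient field must either be unused on the receiving side or be cheap to recompute there.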