[SQL][SPARK-10451]: Prevent unnecessary serializations in InMemoryColumnarTableScan #8604

saucam · 2015-09-04T19:20:08Z

Many of the fields in InMemoryColumnarTableScan and InMemoryRelation can be made transient.

This reduces my 1000 ms job to about 700 ms. The task size drops from 2.8 MB to ~1300 KB.
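
For readers unfamiliar with the mechanism, here is a minimal sketch of how marking a field `@transient` shrinks the serialized form of an object. It uses plain Java serialization rather than Spark, and the classes `EagerScan` and `LeanScan` and the field `stats` are hypothetical stand-ins, not the fields changed in this patch.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical operator that ships its large field with every serialized copy.
class EagerScan(val name: String, val stats: Array[Byte]) extends Serializable

// Same shape, but the large field is skipped by Java serialization.
class LeanScan(val name: String, @transient val stats: Array[Byte]) extends Serializable

object TransientDemo {
  // Size in bytes of the Java-serialized form of `obj`.
  def serializedSize(obj: AnyRef): Int = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(obj)
    out.close()
    buffer.size()
  }

  def main(args: Array[String]): Unit = {
    val payload = new Array[Byte](1 << 20) // 1 MiB of data only needed on one side
    println(s"eager: ${serializedSize(new EagerScan("scan", payload))} bytes")
    println(s"lean:  ${serializedSize(new LeanScan("scan", payload))} bytes")
  }
}
```

A transient field is null after deserialization, so this only suits fields that the receiving side never reads.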

saucam · 2015-09-04T19:21:10Z

cc @liancheng

Thoughts?

SparkQA · 2015-09-04T19:47:17Z

Test build #42014 has finished for PR 8604 at commit 5afb9eb.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class BlockFetchException(messages: String, throwable: Throwable)

SparkQA · 2015-09-05T08:47:04Z

Test build #42042 has finished for PR 8604 at commit 1587a8b.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

saucam · 2015-09-05T08:52:35Z

I get this failure:

[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/core/src/test/scala/org/apache/spark/sql/SQLConfSuite.scala:83: not found: value ctx
[error] assert(ctx.conf.numShufflePartitions === 10)
[error] ^
[error] one error found
error Compilation failed
[error] Total time: 108 s, completed Sep 5, 2015 1:47:01 AM

Please help

rxin · 2015-09-05T09:18:10Z

We fixed this. Lemme trigger the test again.

rxin · 2015-09-05T09:18:15Z

Jenkins, retest this please.

SparkQA · 2015-09-05T11:35:57Z

Test build #42047 has finished for PR 8604 at commit 1587a8b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

saucam · 2015-09-05T16:16:05Z

thank you @rxin :)

liancheng · 2015-09-11T09:04:19Z

sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala

Could you please add a comment above this hunk to indicate that these variables are added to avoid unnecessary serialization?
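
As background on why such variables can matter, here is a sketch under stated assumptions, not the code from this hunk. A Scala closure that reads a field of its enclosing object captures the whole object, while a closure that reads a local copy captures only that value. The class `DemoScan` and its members are hypothetical names, and plain Java serialization stands in for Spark's task serialization.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical operator; `label` and `cache` are illustrative names only.
class DemoScan(val label: String, val cache: Array[Byte]) extends Serializable {
  // Reads the field inside the closure, so the closure captures `this`
  // and drags `cache` along when it is serialized.
  def capturing: Int => String = i => s"$label-$i"

  // Copies the field to a local first, so the closure captures only the String.
  def viaLocal: Int => String = {
    val localLabel = label
    i => s"$localLabel-$i"
  }
}

object ClosureDemo {
  // Size in bytes of the Java-serialized form of `obj`.
  def serializedSize(obj: AnyRef): Int = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(obj)
    out.close()
    buffer.size()
  }

  def main(args: Array[String]): Unit = {
    val scan = new DemoScan("scan", new Array[Byte](1 << 20))
    println(s"capturing this: ${serializedSize(scan.capturing)} bytes")
    println(s"local copy:     ${serializedSize(scan.viaLocal)} bytes")
  }
}
```

Both closures return the same strings; only the amount of state shipped with them differs.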

liancheng · 2015-09-11T09:05:18Z

LGTM except for one minor issue. Thanks for investigating and fixing this!

saucam · 2015-09-11T09:35:11Z

added comments

SparkQA · 2015-09-11T11:18:19Z

Test build #42325 has finished for PR 8604 at commit f01f989.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

saucam · 2015-09-11T11:29:05Z

A test in OrcHadoopFsRelationSuite is failing. Can you help with this one, @liancheng?

See:

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42325/testReport/junit/org.apache.spark.sql.hive.orc/OrcHadoopFsRelationSuite/test_all_data_types/

I don't understand; I only added a comment!

liancheng · 2015-09-11T16:14:13Z

@saucam This is not your fault :) This test case uses a random SQL data generator to test all supported data types. It seems that something goes wrong there. However, the output is garbled because of the random string and binary values.

@yhuai I can't tell whether this is a legitimate test failure because the output is completely unreadable. Maybe we shouldn't test all data types at the same time within a single row. Putting each data type into a single-cell row may improve the readability of the test failure output.

liancheng · 2015-09-11T16:14:19Z

retest this please.

saucam · 2015-09-11T16:16:07Z

phew! ohk :)

yhuai · 2015-09-11T16:37:10Z

@liancheng I think there is something wrong with ORC. In #8702, I made this test run 100 times, and it seems the ORC suite failed several times: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42299/.

We can ignore this test for now. #8705 is the PR. If we want to do it, we should merge it to both master and branch-1.5.

yhuai · 2015-09-11T16:41:01Z

OK. I will merge #8705. Looks like this test is pretty flaky.

SparkQA · 2015-09-11T18:03:52Z

Test build #42341 has finished for PR 8604 at commit f01f989.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

saucam · 2015-09-11T18:06:54Z

This time it fails JsonHadoopFsRelationSuite!

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42341/testReport/junit/org.apache.spark.sql.sources/JsonHadoopFsRelationSuite/test_all_data_types/

yhuai · 2015-09-11T18:07:30Z

test this please

yhuai · 2015-09-11T18:08:48Z

I have already ignored that test. You will not see the noise any more.

SparkQA · 2015-09-11T20:24:49Z

Test build #42343 has finished for PR 8604 at commit f01f989.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

saucam · 2015-09-18T09:26:37Z

@yhuai thanks for the help!

yhuai · 2015-09-18T15:20:45Z

Thanks! Merging to master.

saucam · 2015-09-18T15:28:56Z

thanks for the merge :)

SPARK-10451: Prevent unnecessary serializations in InMemoryColumnarTa…

5afb9eb

…bleScan

saucam changed the title ~~[SQL][SPARK-10451]: Prevent unnecessary serializations in InMemoryColumnarTableScan~~ [WIP][SQL][SPARK-10451]: Prevent unnecessary serializations in InMemoryColumnarTableScan Sep 5, 2015

SPARK-10451: partitionFilters cannot be out of scope because of codeg…

1587a8b

…en SpecificPredicate class

saucam changed the title ~~[WIP][SQL][SPARK-10451]: Prevent unnecessary serializations in InMemoryColumnarTableScan~~ [SQL][SPARK-10451]: Prevent unnecessary serializations in InMemoryColumnarTableScan Sep 5, 2015

liancheng reviewed Sep 11, 2015

SPARK-10451: Incorporate review comments

f01f989

asfgit closed this in 20fd35d Sep 18, 2015
