
@saucam

@saucam saucam commented Sep 4, 2015

Many of the fields in InMemoryColumnarTableScan and InMemoryRelation can be made transient.

This reduces my 1000 ms job to about 700 ms. The task size drops from 2.8 MB to ~1300 KB.
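A minimal sketch of why marking fields transient shrinks the serialized task, assuming plain Java serialization (which Spark uses for task closures). `EagerNode`, `TransientNode` and `driverOnlyState` are hypothetical stand-ins, not the classes changed in this PR: one keeps a large field that executors never read, the other marks the same field `@transient` so it is skipped when the object is written.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical stand-ins for a plan node; these are NOT Spark's real classes.
// `driverOnlyState` models a large field that executors never read.
class EagerNode(val output: Seq[String], val driverOnlyState: Array[Byte])
  extends Serializable

class TransientNode(val output: Seq[String], state: Array[Byte])
  extends Serializable {
  // @transient makes Java serialization skip this field entirely;
  // on the deserializing side it comes back as null.
  @transient val driverOnlyState: Array[Byte] = state
}

object TransientSizeDemo {
  // Number of bytes Java serialization produces for `obj`.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    val state = new Array[Byte](1 << 20) // 1 MiB of driver-side state
    println(s"eager:     ${serializedSize(new EagerNode(Seq("a", "b"), state))} bytes")
    println(s"transient: ${serializedSize(new TransientNode(Seq("a", "b"), state))} bytes")
  }
}
```

The trade-off is that a transient field is null after deserialization, so this only suits fields that executors never read, or that they can recompute (for example via a `@transient lazy val`).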

@saucam
Author

saucam commented Sep 4, 2015

cc @liancheng

Thoughts?

@SparkQA

SparkQA commented Sep 4, 2015

Test build #42014 has finished for PR 8604 at commit 5afb9eb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class BlockFetchException(messages: String, throwable: Throwable)

@saucam saucam changed the title [SQL][SPARK-10451]: Prevent unnecessary serializations in InMemoryColumnarTableScan [WIP][SQL][SPARK-10451]: Prevent unnecessary serializations in InMemoryColumnarTableScan Sep 5, 2015
@SparkQA

SparkQA commented Sep 5, 2015

Test build #42042 has finished for PR 8604 at commit 1587a8b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@saucam
Author

saucam commented Sep 5, 2015

I get this failure:

[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/core/src/test/scala/org/apache/spark/sql/SQLConfSuite.scala:83: not found: value ctx
[error] assert(ctx.conf.numShufflePartitions === 10)
[error] ^
[error] one error found
error Compilation failed
[error] Total time: 108 s, completed Sep 5, 2015 1:47:01 AM

Please help.

@saucam saucam changed the title [WIP][SQL][SPARK-10451]: Prevent unnecessary serializations in InMemoryColumnarTableScan [SQL][SPARK-10451]: Prevent unnecessary serializations in InMemoryColumnarTableScan Sep 5, 2015
@rxin
Contributor

rxin commented Sep 5, 2015

We fixed this. Lemme trigger the test again.

@rxin
Contributor

rxin commented Sep 5, 2015

Jenkins, retest this please.

@SparkQA

SparkQA commented Sep 5, 2015

Test build #42047 has finished for PR 8604 at commit 1587a8b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@saucam
Author

saucam commented Sep 5, 2015

thank you @rxin :)

Contributor


Could you please add a comment above this hunk to indicate that these variables are added to avoid unnecessary serialization?
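A minimal sketch of the pattern this comment refers to: copying fields into local variables before building a closure, so the closure does not capture the whole enclosing object. It assumes the Scala 2.12+ lambda encoding, where a closure captures `this` only if its body references a member. `CachedRelation`, `buffer` and the two `plusColumnCount*` methods are hypothetical names, not the code in this PR.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical relation: small metadata plus a large cached buffer.
class CachedRelation(val output: Seq[String], val buffer: Array[Byte])
  extends Serializable {

  // The lambda reads the field `output`, so it captures `this`; serializing
  // the closure then drags the whole relation (including `buffer`) along.
  def plusColumnCountCapturingThis: Int => Int = i => i + output.size

  def plusColumnCountCapturingLocal: Int => Int = {
    // Using this local variable avoids serializing the entire relation
    // (which would happen if `output` were referenced directly) in the closure.
    val relOutput = output
    i => i + relOutput.size
  }
}

object ClosureCaptureDemo {
  // Number of bytes Java serialization produces for `obj`.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    val rel = new CachedRelation(Seq("a", "b"), new Array[Byte](1 << 20))
    println(s"captures this:  ${serializedSize(rel.plusColumnCountCapturingThis)} bytes")
    println(s"captures local: ${serializedSize(rel.plusColumnCountCapturingLocal)} bytes")
  }
}
```

Both closures compute the same result; only what they capture differs, which is why a short comment next to such local variables helps keep a later refactoring from "simplifying" them away.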

@liancheng
Contributor

LGTM except for one minor issue. Thanks for investigating and fixing this!

@saucam
Author

saucam commented Sep 11, 2015

added comments

@SparkQA

SparkQA commented Sep 11, 2015

Test build #42325 has finished for PR 8604 at commit f01f989.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@saucam
Author

saucam commented Sep 11, 2015

A test in OrcHadoopFsRelationSuite is failing. Can you help with this one, @liancheng?

See:

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42325/testReport/junit/org.apache.spark.sql.hive.orc/OrcHadoopFsRelationSuite/test_all_data_types/

I don't understand, I only added a comment!

@liancheng
Contributor

@saucam This is not your fault :) This test case uses a random SQL data generator to test all supported data types. It seems that something goes wrong there. However, the output is garbled by the random string and binary values.

@yhuai I can't tell whether this one is a legitimate test failure because the output is completely unreadable. Maybe we shouldn't test all data types at the same time within a single row. Putting each data type into its own single-cell row may improve the readability of the test failure output.

@liancheng
Contributor

retest this please.

@saucam
Author

saucam commented Sep 11, 2015

Phew! OK :)

@yhuai
Contributor

yhuai commented Sep 11, 2015

@liancheng I think there is something wrong with ORC. In #8702, I made this test run 100 times, and ORC's suite seems to have failed several times: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42299/.

We can ignore this test for now; #8705 is the PR. If we do, we should merge it into both master and branch 1.5.

@yhuai
Contributor

yhuai commented Sep 11, 2015

OK. I will merge #8705. Looks like this test is pretty flaky.

@SparkQA

SparkQA commented Sep 11, 2015

Test build #42341 has finished for PR 8604 at commit f01f989.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@saucam
Author

saucam commented Sep 11, 2015

@yhuai
Contributor

yhuai commented Sep 11, 2015

test this please

@yhuai
Contributor

yhuai commented Sep 11, 2015

I have already ignored that test. You will not see the noise any more.

@SparkQA

SparkQA commented Sep 11, 2015

Test build #42343 has finished for PR 8604 at commit f01f989.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@saucam
Author

saucam commented Sep 18, 2015

@yhuai thanks for the help!

@yhuai
Contributor

yhuai commented Sep 18, 2015

Thanks! Merging to master.

@asfgit asfgit closed this in 20fd35d Sep 18, 2015
@saucam
Author

saucam commented Sep 18, 2015

thanks for the merge :)


5 participants