
@liancheng
Contributor

DataFrame.collect() calls SparkPlan.executeCollect(), which consists of a single line:

execute().map(ScalaReflection.convertRowToScala(_, schema)).collect()

The problem is that QueryPlan.schema is a def, so the schema object is rebuilt every time it is called, and since 1.3.0 convertRowToScala returns a GenericRowWithSchema. As a result, every GenericRowWithSchema instance holds a separate copy of the schema object. A YJP profiling run of the following simple micro-benchmark (executed in the Spark shell) shows that constructing all these schema objects takes up to ~35% of CPU time.

// Write ten million (key, value) rows to a Parquet file
sc.parallelize(1 to 10000000).
  map(i => (i, s"val_$i")).
  toDF("key", "value").
  saveAsParquetFile("file:///tmp/src.parquet")

// Profiling started from this line
sqlContext.parquetFile("file:///tmp/src.parquet").collect()
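
For reference, a minimal sketch of the kind of fix this analysis points at (an assumption about the approach, not necessarily the actual patch in this PR): hoist the schema lookup out of the per-row closure so the StructType is built once per partition instead of once per row.

// Hypothetical sketch only; it reuses ScalaReflection.convertRowToScala and
// schema from the snippet above. Hoisting `schema` into a local val means the
// def is evaluated once per partition rather than for every row.
def executeCollect(): Array[Row] = {
  execute().mapPartitions { iter =>
    val localSchema = schema  // built once here, not inside iter.map
    iter.map(ScalaReflection.convertRowToScala(_, localSchema))
  }.collect()
}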

@SparkQA

SparkQA commented Apr 7, 2015

Test build #29804 has started for PR 5398 at commit 3159469.

@rxin
Contributor

rxin commented Apr 7, 2015

LGTM

@SparkQA

SparkQA commented Apr 7, 2015

Test build #29804 has finished for PR 5398 at commit 3159469.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29804/

@asfgit closed this in 77bcceb Apr 7, 2015
@liancheng deleted the spark-6748 branch April 7, 2015 23:54
@liancheng
Contributor Author

Merged to master.
