Skip to content

Conversation

@vlyubin
Copy link
Contributor

@vlyubin vlyubin commented Mar 31, 2015

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite related to this PR, but I don't think it was necessary to use GenericMutableRow here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. The original version uses a mutable row mostly because of the updates in the while loop I guess.

@SparkQA
Copy link

SparkQA commented Mar 31, 2015

Test build #29441 has started for PR 5279 at commit ab7585b.

@SparkQA
Copy link

SparkQA commented Mar 31, 2015

Test build #29441 has finished for PR 5279 at commit ab7585b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29441/
Test PASSed.

@rxin
Copy link
Contributor

rxin commented Mar 31, 2015

I know you are probably still working on this - any benchmark numbers?

@rxin
Copy link
Contributor

rxin commented Mar 31, 2015

cc @davies since you guys are both changing this part of the code lately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please break the line after the first =>.

@liancheng
Copy link
Contributor

Went though very quickly for the first time, left some styling comments.

@SparkQA
Copy link

SparkQA commented Mar 31, 2015

Test build #29499 has started for PR 5279 at commit 6a35425.

@SparkQA
Copy link

SparkQA commented Mar 31, 2015

Test build #29499 has finished for PR 5279 at commit 6a35425.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29499/
Test PASSed.

@vlyubin
Copy link
Contributor Author

vlyubin commented Apr 1, 2015

Here are the benchmark numbers: http://pastie.org/private/6vg7kg2yfwop2ov5zu2eq
TLDR: On real cluster the new implementation made rdd() 1.9 times faster and toDF() 2.2 times faster.

@SparkQA
Copy link

SparkQA commented Apr 1, 2015

Test build #29514 has started for PR 5279 at commit 99d333a.

@SparkQA
Copy link

SparkQA commented Apr 1, 2015

Test build #29514 has finished for PR 5279 at commit 99d333a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29514/
Test PASSed.

@vlyubin
Copy link
Contributor Author

vlyubin commented Apr 3, 2015

ping

@rxin
Copy link
Contributor

rxin commented Apr 3, 2015

@vlyubin who are you pinging? is this still "WIP"?

@vlyubin vlyubin changed the title [WIP] [SPARK-6620] Speed up toDF() and rdd() functions by constructing converters in ScalaReflection [SPARK-6620] Speed up toDF() and rdd() functions by constructing converters in ScalaReflection Apr 3, 2015
@vlyubin
Copy link
Contributor Author

vlyubin commented Apr 3, 2015

@rxin Sorry, I just removed the WIP tag. There isn't anything more to add, as it turned out that there are no places here where we could use SpecificMutableRow to speed things up.
So I'm pinging whoever wants to add more comments on the code ...

@rxin
Copy link
Contributor

rxin commented Apr 3, 2015

Jenkins, retest this please.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this pattern cannot be

val converter = CatalystTypeConverters.createToScalaConverter(schema)
rows.map(converter).toArray

?

It looks like this pretty efficiently handles this situation in the same way that you've extracted it here (even with calling convertRowWithConverters).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, I'll update these.

@SparkQA
Copy link

SparkQA commented Apr 8, 2015

Test build #29896 has started for PR 5279 at commit dec6802.

@SparkQA
Copy link

SparkQA commented Apr 8, 2015

Test build #29897 has started for PR 5279 at commit c327bc9.

@SparkQA
Copy link

SparkQA commented Apr 9, 2015

Test build #29896 has finished for PR 5279 at commit dec6802.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29896/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Apr 9, 2015

Test build #29897 has finished for PR 5279 at commit c327bc9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29897/
Test PASSed.

@liancheng
Copy link
Contributor

@vlyubin Would you mind to add [SQL] to the PR title? Thanks!

@vlyubin vlyubin changed the title [SPARK-6620] Speed up toDF() and rdd() functions by constructing converters in ScalaReflection [SQL] [SPARK-6620] Speed up toDF() and rdd() functions by constructing converters in ScalaReflection Apr 9, 2015
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use
convertedMap(keyConverter(entry.getKey)) = valueConverter(entry.getValue)
to avoid creating a tuple.

@aarondav
Copy link
Contributor

aarondav commented Apr 9, 2015

LGTM, @marmbrus would you mind doing a final pass?

@SparkQA
Copy link

SparkQA commented Apr 10, 2015

Test build #29988 has started for PR 5279 at commit 11a20ec.

@SparkQA
Copy link

SparkQA commented Apr 10, 2015

Test build #29988 has finished for PR 5279 at commit 11a20ec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29988/
Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong comment style.

@SparkQA
Copy link

SparkQA commented Apr 10, 2015

Test build #30051 has started for PR 5279 at commit e75a387.

@SparkQA
Copy link

SparkQA commented Apr 10, 2015

Test build #30051 has finished for PR 5279 at commit e75a387.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30051/
Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why use lazy val here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants