PR #2: Dt features style guidelines #17

fabuzaid21 · 2015-11-30T19:14:03Z

No description provided.

jkbradley · 2015-12-01T19:41:52Z

Some of these style fixes aren't quite right. It's mainly from indentation issues which the scalastyle script won't find. I'll comment where applicable.

IntelliJ should handle most of the issues. Btw, it can be useful to run "build/sbt gen-idea" to create the IntelliJ configs. IntelliJ complains about them being outdated, but it can convert them to its newer representation when you re-open the project.

jkbradley · 2015-12-01T19:44:33Z

mllib/src/main/scala/org/apache/spark/ml/tree/impl/AltDT.scala

This style was already acceptable. To be sure, you could put the "new" clause in braces, but it's fine as is.

fabuzaid21 · 2015-12-01

When I ran scalastyle, it said that, since fromStrategy was public, the return type of the method had to be declared, so I added : AltDTMetadata. Should I just ignore the output of scalastyle?

jkbradley · 2015-12-01

Sorry, you're right about adding the AltDTMetadata return type. My comment only applies to putting it on a new line.
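For illustration, here is a minimal sketch of the two points in this thread. It assumes hypothetical stand-ins for Strategy and AltDTMetadata, since the real definitions in AltDT.scala are not shown here. The public method declares its return type, as scalastyle asks, and the "new" expression either stays on the signature line or goes in braces.

```scala
// Hypothetical stand-ins: the real Strategy and AltDTMetadata are not shown
// in this thread and will have different fields.
final case class Strategy(maxDepth: Int)

final class AltDTMetadata(val maxDepth: Int)

object AltDTMetadata {
  // Public method, so the return type is declared explicitly.
  // The body fits on the signature line, so no line break is needed.
  def fromStrategy(strategy: Strategy): AltDTMetadata = new AltDTMetadata(strategy.maxDepth)

  // Equivalent form with the "new" clause in braces.
  def fromStrategyBraced(strategy: Strategy): AltDTMetadata = {
    new AltDTMetadata(strategy.maxDepth)
  }
}
```

Both forms behave the same. The choice between them is purely about layout, which scalastyle does not check.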

jkbradley · 2015-12-01T19:44:50Z

That should be it, thanks!

fabuzaid21 · 2015-12-02T02:04:05Z

Okay, I just pushed in an update. Lemme know if anything else needs to be fixed.

jkbradley · 2015-12-02T23:52:07Z

LGTM. Thank you! Could you please rebase your other PRs after I merge this? Then I'll see about testing them.

fabuzaid21 · 2015-12-03T00:20:06Z

Yep! They've already been rebased and pushed.

…onfig option.

## What changes were proposed in this pull request?

Currently, the `OptimizeIn` optimizer converts an `In` expression into an `InSet` expression if the size of the set is greater than a constant, 10. This issue adds a configuration, `spark.sql.optimizer.inSetConversionThreshold`, for that threshold. After this PR, `OptimizeIn` is configurable.

```scala
scala> sql("select a in (1,2,3) from (select explode(array(1,2)) a) T").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [a#7 IN (1,2,3) AS (a IN (1, 2, 3))#8]
:     +- INPUT
+- Generate explode([1,2]), false, false, [a#7]
   +- Scan OneRowRelation[]

scala> sqlContext.setConf("spark.sql.optimizer.inSetConversionThreshold", "2")

scala> sql("select a in (1,2,3) from (select explode(array(1,2)) a) T").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [a#16 INSET (1,2,3) AS (a IN (1, 2, 3))#17]
:     +- INPUT
+- Generate explode([1,2]), false, false, [a#16]
   +- Scan OneRowRelation[]
```

## How was this patch tested?

Pass the Jenkins tests (with a new testcase)

Author: Dongjoon Hyun <[email protected]>

Closes apache#12562 from dongjoon-hyun/SPARK-14796.

…aggregations

## What changes were proposed in this pull request?

Partial aggregations are generated in `EnsureRequirements`, but the planner fails to check whether a partial aggregation satisfies its sort requirements. Consider the following query:

```
val df2 = (0 to 1000).map(x => (x % 2, x.toString)).toDF("a", "b").createOrReplaceTempView("t2")
spark.sql("select max(b) from t2 group by a").explain(true)
```

Currently, the SortAggregator does not insert a Sort operator before the partial aggregation, which breaks sort-based partial aggregation:

```
== Physical Plan ==
SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17])
+- *Sort [a#5 ASC], false, 0
   +- Exchange hashpartitioning(a#5, 200)
      +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19])
         +- LocalTableScan [a#5, b#6]
```

A correct plan is:

```
== Physical Plan ==
SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17])
+- *Sort [a#5 ASC], false, 0
   +- Exchange hashpartitioning(a#5, 200)
      +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19])
         +- *Sort [a#5 ASC], false, 0
            +- LocalTableScan [a#5, b#6]
```

## How was this patch tested?

Added tests in `PlannerSuite`.

Author: Takeshi YAMAMURO <[email protected]>

Closes apache#14865 from maropu/SPARK-17289.

fabuzaid21 added 2 commits October 20, 2015 17:40

removed partitionInfosDebug

2f2ac8d

cosmetic changes to conform to Spark style guidelines

5ab1528

jkbradley reviewed Dec 1, 2015

addressing Joseph's comments on the PR

5370559

jkbradley added a commit that referenced this pull request Dec 2, 2015

Merge pull request #17 from fabuzaid21/dt-features-style-guidelines

b459c88

jkbradley merged commit b459c88 into jkbradley:dt-features Dec 2, 2015

fabuzaid21 deleted the dt-features-style-guidelines branch December 7, 2015 17:56
