Skip to content

Conversation

@fabuzaid21
Copy link

No description provided.

@jkbradley
Copy link
Owner

Some of these style fixes aren't quite right. It's mainly from indentation issues which the scalastyle script won't find. I'll comment where applicable.

IntelliJ should handle most of the issues. Btw, it can be useful to run "build/sbt gen-idea" to create the IntelliJ configs. IntelliJ complains about them being outdated, but it can convert them to its newer representation when you re-open the project.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This style was already acceptable. To be sure, you could put the "new" clause in braces, but it's fine as is.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I ran scalastyle, it said that, since fromStrategy was public, the return type of the method had to be declared, so I added : AltDTMetadata. Should I just ignore the output of scalastyle?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, you're right about adding the ALTDTMetadata. My comment only applies to putting it on a newline.

@jkbradley
Copy link
Owner

That should be it, thanks!

@fabuzaid21
Copy link
Author

Okay, I just pushed in an update. Lemme know if anything else needs to be fixed.

@jkbradley
Copy link
Owner

LGTM Thank you! Could you please rebase your other PRs after I merge this? Then I'll see about testing them.

jkbradley added a commit that referenced this pull request Dec 2, 2015
@jkbradley jkbradley merged commit b459c88 into jkbradley:dt-features Dec 2, 2015
@fabuzaid21
Copy link
Author

Yep! They've already been rebased and pushed.

@fabuzaid21 fabuzaid21 deleted the dt-features-style-guidelines branch December 7, 2015 17:56
jkbradley pushed a commit that referenced this pull request Apr 25, 2016
…onfig option.

## What changes were proposed in this pull request?

Currently, `OptimizeIn` optimizer replaces `In` expression into `InSet` expression if the size of set is greater than a constant, 10.
This issue aims to make a configuration `spark.sql.optimizer.inSetConversionThreshold` for that.

After this PR, `OptimizerIn` is configurable.
```scala
scala> sql("select a in (1,2,3) from (select explode(array(1,2)) a) T").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [a#7 IN (1,2,3) AS (a IN (1, 2, 3))#8]
:     +- INPUT
+- Generate explode([1,2]), false, false, [a#7]
   +- Scan OneRowRelation[]

scala> sqlContext.setConf("spark.sql.optimizer.inSetConversionThreshold", "2")

scala> sql("select a in (1,2,3) from (select explode(array(1,2)) a) T").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [a#16 INSET (1,2,3) AS (a IN (1, 2, 3))#17]
:     +- INPUT
+- Generate explode([1,2]), false, false, [a#16]
   +- Scan OneRowRelation[]
```

## How was this patch tested?

Pass the Jenkins tests (with a new testcase)

Author: Dongjoon Hyun <[email protected]>

Closes apache#12562 from dongjoon-hyun/SPARK-14796.
jkbradley pushed a commit that referenced this pull request Sep 6, 2016
…aggregations

## What changes were proposed in this pull request?
Partial aggregations are generated in `EnsureRequirements`, but the planner fails to
check if partial aggregation satisfies sort requirements.
For the following query:
```
val df2 = (0 to 1000).map(x => (x % 2, x.toString)).toDF("a", "b").createOrReplaceTempView("t2")
spark.sql("select max(b) from t2 group by a").explain(true)
```
Now, the SortAggregator won't insert Sort operator before partial aggregation, this will break sort-based partial aggregation.
```
== Physical Plan ==
SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17])
+- *Sort [a#5 ASC], false, 0
   +- Exchange hashpartitioning(a#5, 200)
      +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19])
         +- LocalTableScan [a#5, b#6]
```
Actually, a correct plan is:
```
== Physical Plan ==
SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17])
+- *Sort [a#5 ASC], false, 0
   +- Exchange hashpartitioning(a#5, 200)
      +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19])
         +- *Sort [a#5 ASC], false, 0
            +- LocalTableScan [a#5, b#6]
```

## How was this patch tested?
Added tests in `PlannerSuite`.

Author: Takeshi YAMAMURO <[email protected]>

Closes apache#14865 from maropu/SPARK-17289.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants