Closed

@@ -640,7 +640,8 @@ class Analyzer(

 // Try resolving the ordering as though it is in the aggregate clause.
 try {
-  val aliasedOrdering = sortOrder.map(o => Alias(o.child, "aggOrder")())
+  val unresolvedSortOrders = sortOrder.filterNot(_.resolved)
+  val aliasedOrdering = unresolvedSortOrders.map(o => Alias(o.child, "aggOrder")())
   val aggregatedOrdering = aggregate.copy(aggregateExpressions = aliasedOrdering)
   val resolvedAggregate: Aggregate = execute(aggregatedOrdering).asInstanceOf[Aggregate]
   val resolvedAliasedOrdering: Seq[Alias] =
@@ -673,13 +674,18 @@ class Analyzer(
     }
   }

+  val sortOrdersMap = unresolvedSortOrders.map(
+    new TreeNodeRef(_)).zip(evaluatedOrderings).toMap
+  val finalSortOrders = sortOrder.map(
+    s => sortOrdersMap.getOrElse(new TreeNodeRef(s), s))
+
   // Since we don't rely on sort.resolved as the stop condition for this rule,
   // we need to check this and prevent applying this rule multiple times
-  if (sortOrder == evaluatedOrderings) {
+  if (sortOrder == finalSortOrders) {
     sort
   } else {
     Project(aggregate.output,
-      Sort(evaluatedOrderings, global,
+      Sort(finalSortOrders, global,
         aggregate.copy(aggregateExpressions = originalAggExprs ++ needsPushDown)))
   }
 } catch {
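The Analyzer change follows a common pattern: run the expensive step only on the list elements that still need it, then splice the results back into the original list, keyed by object identity (the diff uses `new TreeNodeRef(_)` as the map key for this). Below is a minimal, Spark-free sketch of that pattern. `Key`, `KeyRef`, `SpliceBack`, and the `resolve` parameter are hypothetical stand-ins for `SortOrder`, `TreeNodeRef`, and the aggregate-based resolution in the rule. The sketch assumes, as the diff's usage suggests, that `TreeNodeRef` compares the wrapped nodes by reference.

```scala
// Hypothetical stand-in for a sort key; `resolved` becomes true once an
// analyzer pass has bound it.
final case class Key(name: String, resolved: Boolean)

// Hypothetical stand-in for TreeNodeRef: equality and hashing use object
// identity instead of case-class structural equality.
final class KeyRef(val key: Key) {
  override def equals(other: Any): Boolean = other match {
    case that: KeyRef => that.key eq key
    case _            => false
  }
  override def hashCode: Int = System.identityHashCode(key)
}

object SpliceBack {
  // Run `resolve` on the unresolved keys only, then rebuild the full list in
  // its original order. Already-resolved keys come back as the same objects.
  def resolveUnresolved(keys: Seq[Key], resolve: Seq[Key] => Seq[Key]): Seq[Key] = {
    val unresolved = keys.filterNot(_.resolved)
    // Assumes `resolve` returns one result per input, in the same order.
    val evaluated = resolve(unresolved)
    val byRef: Map[KeyRef, Key] = unresolved.map(k => new KeyRef(k)).zip(evaluated).toMap
    keys.map(k => byRef.getOrElse(new KeyRef(k), k))
  }

  def main(args: Array[String]): Unit = {
    val keys = Seq(Key("a1", resolved = false), Key("c", resolved = true))
    // A toy "resolver" that only marks keys as bound.
    val bindAll: Seq[Key] => Seq[Key] =
      _.map(k => k.copy(name = k.name + "#bound", resolved = true))
    val result = resolveUnresolved(keys, bindAll)
    println(result)
    // Mirrors the diff's stop condition: an unchanged list means nothing to do.
    println(result == keys)
  }
}
```

The final `result == keys` check plays the role of the diff's `sortOrder == finalSortOrders` test: when nothing was rewritten, the rule hands back the original `sort` instead of building a new plan.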

@@ -217,5 +217,23 @@ class AnalysisSuite extends AnalysisTest {
nullResult,
udf4)
// checkUDF(udf4, expected4)
}

test("SPARK-11863 mixture of aliases and real columns in orderby clause - tpcds 19,55,71") {
val a = testRelation2.output(0)
val c = testRelation2.output(2)
val alias1 = a.as("a1")
val alias2 = c.as("a2")
val alias3 = count(a).as("a3")

val plan = testRelation2.
groupBy('a, 'c) ('a.as("a1"), 'c.as("a2"), count('a).as("a3")).
orderBy('a1.asc, 'c.asc)
Contributor

Code style: we should put the . at the beginning of a line, not at the end. Also, remove the space between groupBy('a, 'c) and ('a.as("a1"),...

Contributor Author

@cloud-fan Will do. Wenchen, there are a few test failures, and I am still looking into them. I think our idea of NOT considering already-resolved attributes in the pushdown decision is causing the issue.

Here are the failing tests:

  1. SELECT count(*) FROM orderByData GROUP BY a ORDER BY count(*)
    Here we want the sort attribute representing count(*) to be replaced by the
    corresponding alias in the aggregate.
  2. SELECT a FROM orderByData GROUP BY a ORDER BY a, count(*), sum(b)
    Here we want count(*) to be pushed down into the aggregate.

In both cases we skip pushdown processing because the sort expression is already resolved.
Given this, Wenchen, may I ask you to look at the original fix again? Having learned more about
the different cases, it seems like that may be the safer fix. Let me know what you think.


val expected = testRelation2.
groupBy(a, c) (alias1, alias2, alias3).
orderBy(alias1.toAttribute.asc, alias2.toAttribute.asc).
select(alias1.toAttribute, alias2.toAttribute, alias3.toAttribute)
checkAnalysis(plan, expected)
}
}
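The point in the author's reply, that a resolved sort key may still need work from the rule, can be illustrated with a toy model of the two failing queries. This is a minimal sketch under stated assumptions: `Expr`, `UnresolvedName`, `Column`, `AggCall`, the `Action` values, and `ResolvedIsNotEnough` are all hypothetical and only mimic the shape of the problem. They are not Catalyst classes, and the real rule's matching is more involved.

```scala
// Toy expression model (hypothetical; these are not Catalyst's classes).
sealed trait Expr { def resolved: Boolean }
// A name the analyzer has not bound yet, such as the alias 'a1 in ORDER BY.
final case class UnresolvedName(name: String) extends Expr { def resolved: Boolean = false }
// A column already bound to an input or grouping attribute.
final case class Column(name: String) extends Expr { def resolved: Boolean = true }
// An aggregate call such as count(*): resolved as soon as its input is.
final case class AggCall(fn: String, arg: String) extends Expr { def resolved: Boolean = true }

// What a sort key actually needs from the rule.
sealed trait Action
case object Keep extends Action           // leave the sort key alone
case object ReuseAggOutput extends Action // point it at an existing aggregate output
case object PushDown extends Action       // add it to the Aggregate's expressions

object ResolvedIsNotEnough {
  // The filter introduced in the diff: only unresolved keys are processed.
  def selectedByResolvedFilter(keys: Seq[Expr]): Seq[Expr] = keys.filterNot(_.resolved)

  // The work each key needs in this toy model.
  def requiredAction(key: Expr, aggExprs: Seq[Expr]): Action = key match {
    case agg: AggCall if aggExprs.contains(agg) => ReuseAggOutput
    case _: AggCall                             => PushDown
    // Toy assumption: an unresolved name refers to an alias in the aggregate's
    // output, as 'a1 does in the SPARK-11863 test.
    case _: UnresolvedName                      => ReuseAggOutput
    case _                                      => Keep
  }

  def main(args: Array[String]): Unit = {
    // Case 1: SELECT count(*) FROM orderByData GROUP BY a ORDER BY count(*)
    val aggExprs1: Seq[Expr] = Seq(AggCall("count", "*"))
    val keys1: Seq[Expr] = Seq(AggCall("count", "*"))
    // Case 2: SELECT a FROM orderByData GROUP BY a ORDER BY a, count(*), sum(b)
    val aggExprs2: Seq[Expr] = Seq(Column("a"))
    val keys2: Seq[Expr] = Seq(Column("a"), AggCall("count", "*"), AggCall("sum", "b"))

    for ((keys, aggExprs) <- Seq((keys1, aggExprs1), (keys2, aggExprs2))) {
      println(s"picked by resolved filter: ${selectedByResolvedFilter(keys)}")
      println(s"actually required:         ${keys.map(requiredAction(_, aggExprs))}")
    }
  }
}
```

Running `main` shows the resolved filter picking nothing in either case, while the toy classification still finds keys to rewrite or push down. That matches the failure mode described in the reply.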