Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
Original file line number Diff line number Diff line change
Expand Up @@ -192,7 +192,7 @@ class Dataset[T] private[sql](
}

// Wraps analyzed logical plans with an analysis barrier so we won't traverse/resolve it again.
@transient private val planWithBarrier = AnalysisBarrier(logicalPlan)
@transient private[sql] val planWithBarrier = AnalysisBarrier(logicalPlan)

/**
* Currently [[ExpressionEncoder]] is the only implementation of [[Encoder]], here we turn the
Expand Down Expand Up @@ -1216,7 +1216,7 @@ class Dataset[T] private[sql](
*/
@scala.annotation.varargs
def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan {
UnresolvedHint(name, parameters, logicalPlan)
UnresolvedHint(name, parameters, planWithBarrier)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ResolveBroadcastHints rule will traverse recursively the children of logical plan. If we wrap it with a barrier, we can't be traverse down the tree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding however is that planWithBarrier is already analyzed (and ResolveBroadcastHints as the very first rule had its chance to do its work). That's the extra processing hint does every time it's called. Using planWithBarrier makes it less "painful".

Just use hint twice and see the analyzed plan.

Copy link
Contributor

@cloud-fan cloud-fan Jan 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @viirya has a valid concern. think about

val df1 = spark.table("t").select("id")
df1.hint("broadcast", "t")

We should transform down the plan of df1, find the bottom table relation and apply the hint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that that's what ResolveBroadcastHints does --> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala#L93-L101, doesn't it? I'm going to write a test case for it to confirm (and that's what I was asking for in the email to dev@spark the other day).

Copy link
Member

@viirya viirya Jan 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaceklaskowski Because the logical plan is wrapped in analysis barrier, ResolveBroadcastHints can't traverse down it to reach the UnresolvedRelation/SubqueryAlias. at https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala#L60-L61.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a possible workaround is to explicitly go through the barrier in the hint resolution rules, so that we can still use barrier here and skip analysis in other analyzer rules.

Copy link
Member

@viirya viirya Jan 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. makes sense to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we go with this workaround?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. cc @jaceklaskowski

}

/**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -35,27 +35,23 @@ class DataFrameHintSuite extends AnalysisTest with SharedSQLContext {
test("various hint parameters") {
check(
df.hint("hint1"),
UnresolvedHint("hint1", Seq(),
df.logicalPlan
)
UnresolvedHint("hint1", Seq(), df.planWithBarrier)
)

check(
df.hint("hint1", 1, "a"),
UnresolvedHint("hint1", Seq(1, "a"), df.logicalPlan)
UnresolvedHint("hint1", Seq(1, "a"), df.planWithBarrier)
)

check(
df.hint("hint1", 1, $"a"),
UnresolvedHint("hint1", Seq(1, $"a"),
df.logicalPlan
)
UnresolvedHint("hint1", Seq(1, $"a"), df.planWithBarrier)
)

check(
df.hint("hint1", Seq(1, 2, 3), Seq($"a", $"b", $"c")),
UnresolvedHint("hint1", Seq(Seq(1, 2, 3), Seq($"a", $"b", $"c")),
df.logicalPlan
df.planWithBarrier
)
)
}
Expand Down