Skip to content

Conversation

@wangyum
Copy link
Member

@wangyum wangyum commented Jul 26, 2021

What changes were proposed in this pull request?

Similar to PullOutGroupingExpressions. This pr add a new rule(PullOutJoinCondition) to pull out join condition. Otherwise the expression in join condition may be evaluated three times(ShuffleExchangeExec, SortExec and the join itself). For example:

CREATE TABLE t1 using parquet AS select id as a, id as b from range(100000000L);
CREATE TABLE t2 using parquet AS select id as a, id as b from range(200000000L);
SELECT t1.* FROM t1 JOIN t2 ON translate(t1.a, '123', 'abc') = translate(t2.a, '123', 'abc');

Before this pr:

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- Project [a#6L, b#7L]
   +- SortMergeJoin [translate(cast(a#6L as string), 123, abc)], [translate(cast(a#8L as string), 123, abc)], Inner
      :- Sort [translate(cast(a#6L as string), 123, abc) ASC NULLS FIRST], false, 0
      :  +- Exchange hashpartitioning(translate(cast(a#6L as string), 123, abc), 5), ENSURE_REQUIREMENTS, [id=#89]
      :     +- Filter isnotnull(a#6L)
      :        +- FileScan parquet default.t1[a#6L,b#7L]
      +- Sort [translate(cast(a#8L as string), 123, abc) ASC NULLS FIRST], false, 0
         +- Exchange hashpartitioning(translate(cast(a#8L as string), 123, abc), 5), ENSURE_REQUIREMENTS, [id=#90]
            +- Filter isnotnull(a#8L)
               +- FileScan parquet default.t2[a#8L]

After this pr:

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- Project [a#6L, b#7L]
   +- SortMergeJoin [translate(CAST(spark_catalog.default.t1.a AS STRING), '123', 'abc')#12], [translate(CAST(spark_catalog.default.t2.a AS STRING), '123', 'abc')#13], Inner
      :- Sort [translate(CAST(spark_catalog.default.t1.a AS STRING), '123', 'abc')#12 ASC NULLS FIRST], false, 0
      :  +- Exchange hashpartitioning(translate(CAST(spark_catalog.default.t1.a AS STRING), '123', 'abc')#12, 5), ENSURE_REQUIREMENTS, [id=#53]
      :     +- Project [a#6L, b#7L, translate(cast(a#6L as string), 123, abc) AS translate(CAST(spark_catalog.default.t1.a AS STRING), '123', 'abc')#12]
      :        +- Filter isnotnull(translate(cast(a#6L as string), 123, abc))
      :           +- FileScan parquet default.t1[a#6L,b#7L]
      +- Sort [translate(CAST(spark_catalog.default.t2.a AS STRING), '123', 'abc')#13 ASC NULLS FIRST], false, 0
         +- Exchange hashpartitioning(translate(CAST(spark_catalog.default.t2.a AS STRING), '123', 'abc')#13, 5), ENSURE_REQUIREMENTS, [id=#54]
            +- Project [translate(cast(a#8L as string), 123, abc) AS translate(CAST(spark_catalog.default.t2.a AS STRING), '123', 'abc')#13]
               +- Filter isnotnull(translate(cast(a#8L as string), 123, abc))
                  +- FileScan parquet default.t2[a#8L]

Why are the changes needed?

Improve query performance.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test and benchmark test:

import org.apache.spark.benchmark.Benchmark
val numRows = 1024 * 1024 * 15
spark.sql(s"CREATE TABLE t1 using parquet AS select id as a, id as b from range(${numRows}L)")
spark.sql(s"CREATE TABLE t2 using parquet AS select id as a, id as b from range(${numRows}L)")
val benchmark = new Benchmark("Benchmark pull out join condition", numRows, minNumIters = 5)

Seq(false, true).foreach { pullOutEnabled =>
  val name = s"Pull out join condition ${if (pullOutEnabled) "(Enabled)" else "(Disabled)"}"
  benchmark.addCase(name) { _ =>
    withSQLConf("spark.sql.pullOutJoinCondition" -> s"$pullOutEnabled") {
      spark.sql("SELECT t1.* FROM t1 JOIN t2 ON translate(t1.a, '123', 'abc') = translate(t2.a, '123', 'abc')").write.format("noop").mode("Overwrite").save()
    }
  }
}
benchmark.run()

Benchmark result:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.15.7
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark pull out join condition:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Pull out join condition (Disabled)                30197          34046         690          0.5        1919.9       1.0X
Pull out join condition (Enabled)                 19631          20484         535          0.8        1248.1       1.5X

@github-actions github-actions bot added the SQL label Jul 26, 2021
# Conflicts:
#	sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q72.sf100/explain.txt
#	sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q72/explain.txt
#	sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q72.sf100/explain.txt
#	sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q72/explain.txt
@SparkQA
Copy link

SparkQA commented Jul 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46162/

@SparkQA
Copy link

SparkQA commented Jul 26, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46162/

@SparkQA
Copy link

SparkQA commented Jul 26, 2021

Test build #141646 has finished for PR 33522 at commit bbe2193.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 28, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46289/

@SparkQA
Copy link

SparkQA commented Jul 28, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46289/

@SparkQA
Copy link

SparkQA commented Jul 28, 2021

Test build #141777 has finished for PR 33522 at commit c16d606.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 31, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46430/

@SparkQA
Copy link

SparkQA commented Jul 31, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46430/

@SparkQA
Copy link

SparkQA commented Aug 1, 2021

Test build #141921 has finished for PR 33522 at commit fc9fa2d.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum wangyum changed the title [SPARK-36290][SQL] Push down join condition evaluation [SPARK-36290][SQL] Pull out join condition Aug 1, 2021
@SparkQA
Copy link

SparkQA commented Aug 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46432/

@SparkQA
Copy link

SparkQA commented Aug 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46432/

@SparkQA
Copy link

SparkQA commented Aug 1, 2021

Test build #141923 has finished for PR 33522 at commit 431e873.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46438/

@SparkQA
Copy link

SparkQA commented Aug 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46438/

@SparkQA
Copy link

SparkQA commented Aug 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46440/

@SparkQA
Copy link

SparkQA commented Aug 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46440/

@SparkQA
Copy link

SparkQA commented Aug 1, 2021

Test build #141928 has finished for PR 33522 at commit 990a1b4.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 1, 2021

Test build #141930 has finished for PR 33522 at commit f098c03.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum wangyum marked this pull request as draft August 2, 2021 02:33
@SparkQA
Copy link

SparkQA commented Nov 4, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49370/

@wangyum
Copy link
Member Author

wangyum commented Nov 4, 2021

I'm surprised that there is no plan change for TPCDS queries...

  1. Pull out does not support BroadcastHashJoin.
  2. Replace Operators and RewriteSubquery will introduce some complex join conditions. The plan will change if we move PullOutJoinCondition after RewriteSubquery.

@SparkQA
Copy link

SparkQA commented Nov 4, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49370/

@SparkQA
Copy link

SparkQA commented Nov 4, 2021

Test build #144900 has finished for PR 33522 at commit 62a5ac0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

# Conflicts:
#	sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@SparkQA
Copy link

SparkQA commented Nov 4, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49372/

@SparkQA
Copy link

SparkQA commented Nov 4, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49374/

@SparkQA
Copy link

SparkQA commented Nov 4, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49374/

@SparkQA
Copy link

SparkQA commented Nov 4, 2021

Test build #144902 has finished for PR 33522 at commit b988b9e.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 4, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49372/

@SparkQA
Copy link

SparkQA commented Nov 4, 2021

Test build #144904 has finished for PR 33522 at commit 7f6edf8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait ExpressionBuilder
  • trait PadExpressionBuilderBase extends ExpressionBuilder
  • case class StringLPad(str: Expression, len: Expression, pad: Expression)
  • case class BinaryLPad(str: Expression, len: Expression, pad: Expression, child: Expression)
  • case class BinaryRPad(str: Expression, len: Expression, pad: Expression, child: Expression)
  • case class DropIndex(
  • public class ColumnIOUtil
  • case class ParquetColumn(
  • case class DropIndexExec(
  • case class PushedDownOperators(
  • case class TableSampleInfo(
  • class RatePerMicroBatchProvider extends SimpleTableProvider with DataSourceRegister
  • class RatePerMicroBatchTable(
  • class RatePerMicroBatchStream(
  • case class RatePerMicroBatchStreamOffset(offset: Long, timestamp: Long) extends Offset
  • case class RatePerMicroBatchStreamInputPartition(
  • class RatePerMicroBatchStreamPartitionReader(

@SparkQA
Copy link

SparkQA commented Nov 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49389/

@SparkQA
Copy link

SparkQA commented Nov 5, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49389/

@SparkQA
Copy link

SparkQA commented Nov 5, 2021

Test build #144918 has finished for PR 33522 at commit 4aebe0a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

private val y = testRelation1.subquery('y)

test("Pull out join keys evaluation(String expressions)") {
val joinType = Inner
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just inline it? the join type doesn't matter anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto for other places in this test suite

val joinType = Inner
Seq(Upper("y.d".attr), Substring("y.d".attr, 1, 5)).foreach { udf =>
val originalQuery = x.join(y, joinType, Option("x.a".attr === udf))
.select("x.a".attr, "y.e".attr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to use qualified column names. There is no name conflict.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto for other places in this test suite

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it. But UDF need it otherwise:

== FAIL: Plans do not match ===
 'Project [a#0, e#0]                                    'Project [a#0, e#0]
!+- 'Join Inner, (a#0 = upper(y.d)#0)                   +- 'Join Inner, (upper(d)#0 = a#0)
    :- Project [a#0, b#0, c#0]                             :- Project [a#0, b#0, c#0]
    :  +- LocalRelation <empty>, [a#0, b#0, c#0]           :  +- LocalRelation <empty>, [a#0, b#0, c#0]
!   +- Project [d#0, e#0, upper(d#0) AS upper(y.d)#0]      +- Project [d#0, e#0, upper(d#0) AS upper(d)#0]
       +- LocalRelation <empty>, [d#0, e#0]                   +- LocalRelation <empty>, [d#0, e#0]
  


test("Pull out join keys evaluation(null expressions)") {
val joinType = Inner
val udf = Coalesce(Seq("x.b".attr, "x.c".attr))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we merge this test case with the one above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seq(Upper("d".attr), Substring("d".attr, 1, 5), Coalesce(Seq("b".attr, "c".attr)))...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this test to SQLQuerySuite to test infer more IsNotNull:

test("SPARK-36290: Pull out join condition can infer more filter conditions") {
import org.apache.spark.sql.catalyst.dsl.expressions.DslString
withTable("t1", "t2") {
spark.sql("CREATE TABLE t1(a int, b int) using parquet")
spark.sql("CREATE TABLE t2(a string, b string, c string) using parquet")
spark.sql("SELECT t1.* FROM t1 RIGHT JOIN t2 ON coalesce(t1.a, t1.b) = t2.a")
.queryExecution.optimizedPlan.find(_.isInstanceOf[Filter]) match {
case Some(Filter(condition, _)) =>
condition === IsNotNull(Coalesce(Seq("a".attr, "b".attr)))
case _ =>
fail("It should contains Filter")
}
spark.sql("SELECT t1.* FROM t1 LEFT JOIN t2 ON t1.a = t2.a")
.queryExecution.optimizedPlan.find(_.isInstanceOf[Filter]) match {
case Some(Filter(condition, _)) =>
condition === IsNotNull(Cast("a".attr, IntegerType))
case _ =>
fail("It should contains Filter")
}
}

}

test("Pull out EqualNullSafe join condition") {
withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it.

case p: BatchEvalPythonExec => p
}
assert(pythonEvals.size == 2)
assert(pythonEvals.size == 4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a big problem. Can you investigate why we run python UDF 2 more times?

Copy link
Member Author

@wangyum wangyum Nov 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because we will infer two isnotnull(cast(pythonUDF0 as int)):

== Optimized Logical Plan ==
Project [a#225, b#226, c#236, d#237]
+- Join Inner, (CAST(udf(cast(a as string)) AS INT)#250 = CAST(udf(cast(c as string)) AS INT)#251)
   :- Project [_1#220 AS a#225, _2#221 AS b#226, cast(pythonUDF0#253 as int) AS CAST(udf(cast(a as string)) AS INT)#250]
   :  +- BatchEvalPython [udf(cast(_1#220 as string))], [pythonUDF0#253]
   :     +- Project [_1#220, _2#221]
   :        +- Filter isnotnull(cast(pythonUDF0#252 as int))
   :           +- BatchEvalPython [udf(cast(_1#220 as string))], [pythonUDF0#252]
   :              +- LocalRelation [_1#220, _2#221]
   +- Project [_1#231 AS c#236, _2#232 AS d#237, cast(pythonUDF0#255 as int) AS CAST(udf(cast(c as string)) AS INT)#251]
      +- BatchEvalPython [udf(cast(_1#231 as string))], [pythonUDF0#255]
         +- Project [_1#231, _2#232]
            +- Filter isnotnull(cast(pythonUDF0#254 as int))
               +- BatchEvalPython [udf(cast(_1#231 as string))], [pythonUDF0#254]
                  +- LocalRelation [_1#231, _2#232]

Before this pr:

== Optimized Logical Plan ==
Project [a#225, b#226, c#236, d#237]
+- Join Inner, (cast(pythonUDF0#250 as int) = cast(pythonUDF0#251 as int))
   :- BatchEvalPython [udf(cast(a#225 as string))], [pythonUDF0#250]
   :  +- Project [_1#220 AS a#225, _2#221 AS b#226]
   :     +- LocalRelation [_1#220, _2#221]
   +- BatchEvalPython [udf(cast(c#236 as string))], [pythonUDF0#251]
      +- Project [_1#231 AS c#236, _2#232 AS d#237]
         +- LocalRelation [_1#231, _2#232]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so it's because filter push down can lead to extra expression evaluation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. If we disable spark.sql.constraintPropagation.enabled, the plan is:

== Optimized Logical Plan ==
Project [a#225, b#226, c#236, d#237]
+- Join Inner, (CAST(udf(cast(a as string)) AS INT)#250 = CAST(udf(cast(c as string)) AS INT)#251)
   :- Project [_1#220 AS a#225, _2#221 AS b#226, cast(pythonUDF0#252 as int) AS CAST(udf(cast(a as string)) AS INT)#250]
   :  +- BatchEvalPython [udf(cast(_1#220 as string))], [pythonUDF0#252]
   :     +- LocalRelation [_1#220, _2#221]
   +- Project [_1#231 AS c#236, _2#232 AS d#237, cast(pythonUDF0#253 as int) AS CAST(udf(cast(c as string)) AS INT)#251]
      +- BatchEvalPython [udf(cast(_1#231 as string))], [pythonUDF0#253]
         +- LocalRelation [_1#231, _2#232]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wangyum @cloud-fan Seems we can just put the rule after InferFiltersFromConstraints, and actually after the earlyScanPushDownRules. The python UDF test passed in my pr, see #36874.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pull out has 3 advantages:

  1. Reduce complex join key runs from 3 to 2 for SMJ.
  2. Infer additional filters, sometimes can avoid data skew. For example: [SPARK-31809][SQL] Infer IsNotNull from some special equality join keys #28642
  3. Avoid other rules also handle this case. For example: https://github.com/apache/spark/blob/dee7396204e2f6e7346e220867953fc74cd4253d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushPartialAggregationThroughJoin.scala#L325-L327

It has two disadvantage:

  1. Increase complex join key runs from 1 to 2 for BHJ.
  2. It may increase the data size of shuffle. For example: the join key is: concat(col1, col2, col3, col4 ...).

Personally, I think this rule is valuable. We have been using this rule for half a year.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Increase complex join key runs from 1 to 2 for BHJ.

We can check if the poll out side can be broadcast so it should not be a blocker ?

It may increase the data size of shuffle. For example: the join key is: concat(col1, col2, col3, col4 ...).

This is really a trade-off, one conservative option may be: We only poll out the complex keys which the inside attribute is not the final output. So we can avoid the extra shuffle data as far as possible, for example:

SELECT a FROM t1 JOIN t2 on t1.a = t2.x + 1;

And a config should be introduced for enable or disable easily.

@SparkQA
Copy link

SparkQA commented Nov 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49408/

@SparkQA
Copy link

SparkQA commented Nov 5, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49408/

@SparkQA
Copy link

SparkQA commented Nov 5, 2021

Test build #144936 has finished for PR 33522 at commit c6fca9a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

p.isInstanceOf[WholeStageCodegenExec] &&
p.asInstanceOf[WholeStageCodegenExec].child.isInstanceOf[BroadcastHashJoinExec]).isDefined)
val broadcastHashJoin = df.queryExecution.executedPlan.find {
case WholeStageCodegenExec(ProjectExec(_, _: BroadcastHashJoinExec)) => true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering why we now need an extra project after join here? Can we remove it? The join seems not have complex key here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a cast.

== Optimized Logical Plan ==
Project [id#6L, k#2, v#3]
+- Join Inner, (id#6L = CAST(k AS BIGINT)#14L), rightHint=(strategy=broadcast)
   :- Range (0, 10, step=1, splits=Some(2))
   +- Project [k#2, v#3, cast(k#2 as bigint) AS CAST(k AS BIGINT)#14L]
      +- Filter isnotnull(cast(k#2 as bigint))
         +- LogicalRDD [k#2, v#3], false

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants