@heary-cao
Contributor

@heary-cao heary-cao commented Aug 9, 2017

What changes were proposed in this pull request?

The optimizer already contains a lot of special handling for non-deterministic projects and filters, but it does not cover every case. This patch adds a new special case for non-deterministic projects, so that the optimizer reads only the fields the query actually needs.
For example, when the project list contains a non-deterministic function (such as rand), the optimizer generates the following executedPlan:

*HashAggregate(keys=[k#403L], functions=[partial_sum(cast(id#402 as bigint))], output=[k#403L, sum#800L])
+- Project [d004#607 AS id#402, FLOOR((rand(8828525941469309371) * 10000.0)) AS k#403L]
   +- HiveTableScan [c030#606L, d004#607, d005#608, d025#609, c002#610, d023#611, d024#612, c005#613L, c008#614, c009#615, c010#616, d021#617, d022#618, c017#619, c018#620, c019#621, c020#622, c021#623, c022#624, c023#625, c024#626, c025#627, c026#628, c027#629, ... 169 more fields], MetastoreRelation XXX_database, XXX_table

HiveTableScan reads every field of the table, but the query only needs 'd004'. Reading the unused fields hurts task performance.
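The intended effect can be sketched with a small toy model. This is not Catalyst code and not the change made in this patch: the classes `Expr`, `Scan`, `Project` and the function `prune_scan` are hypothetical names invented for illustration, and the column list is shortened. The sketch assumes that narrowing the scan is safe as long as the project expressions themselves are left untouched, so a non-deterministic expression such as rand is still evaluated exactly once per row.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Expr:
    """A projected expression: output name, input columns it reads, determinism flag."""
    name: str
    references: FrozenSet[str]
    deterministic: bool = True


@dataclass(frozen=True)
class Scan:
    """A table scan that outputs the listed columns."""
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Project:
    """A projection over a scan."""
    exprs: Tuple[Expr, ...]
    child: Scan


def prune_scan(project: Project) -> Project:
    """Narrow the scan to the columns referenced by the project list.

    Only the scan's column list changes. The project expressions,
    including non-deterministic ones, are passed through unchanged.
    """
    needed = set()
    for expr in project.exprs:
        needed |= expr.references
    kept = tuple(c for c in project.child.columns if c in needed)
    return Project(project.exprs, Scan(kept))


# Shortened stand-in for the wide Hive table in the plan above.
table = Scan(("c030", "d004", "d005", "d025", "c002"))
plan = Project(
    (
        Expr("id", frozenset({"d004"})),
        # Stands in for FLOOR(rand(seed) * 10000.0): reads no input columns.
        Expr("k", frozenset(), deterministic=False),
    ),
    table,
)

pruned = prune_scan(plan)
print(pruned.child.columns)  # ('d004',)
```

In this model the non-deterministic expression `k` references no input columns, so it places no constraint on which columns the scan must produce; only `d004` survives.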

How was this patch tested?

Should be covered by existing test cases, plus the test cases added in this patch.

@heary-cao heary-cao changed the title Improvement a special case for non-deterministic projects and filters in optimizer [SPARK-21520][SQL]Improvement a special case for non-deterministic projects and filters in optimizer Aug 9, 2017
@heary-cao heary-cao force-pushed the non-deterministic branch 2 times, most recently from 71aba4d to a1c7559 Compare August 9, 2017 09:32
@hvanhovell
Contributor

ok to test

@hvanhovell
Contributor

@heary-cao can you improve the PR description and explain which special cases you are improving?

@SparkQA

SparkQA commented Aug 9, 2017

Test build #80450 has finished for PR 18892 at commit a1c7559.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@heary-cao heary-cao force-pushed the non-deterministic branch 2 times, most recently from 987a30d to e168fb8 Compare August 10, 2017 08:17
@SparkQA

SparkQA commented Aug 10, 2017

Test build #80482 has finished for PR 18892 at commit 987a30d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 10, 2017

Test build #80483 has finished for PR 18892 at commit e168fb8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 10, 2017

Test build #80486 has finished for PR 18892 at commit b01fec8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@heary-cao heary-cao changed the title [SPARK-21520][SQL]Improvement a special case for non-deterministic projects and filters in optimizer [SPARK-21520][SQL]Improvement a special case for non-deterministic projects in optimizer Aug 11, 2017
@gatorsmile
Member

It sounds like we should not do this in the optimizer. The same comment also applies to #18918.

@SparkQA

SparkQA commented Aug 11, 2017

Test build #80531 has finished for PR 18892 at commit 1ee1a76.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 17, 2017

Test build #80764 has finished for PR 18892 at commit 72e0252.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 17, 2017

Test build #80773 has finished for PR 18892 at commit 15596ee.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 18, 2017

Test build #80815 has finished for PR 18892 at commit 9cf8243.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 18, 2017

Test build #80821 has finished for PR 18892 at commit d163b57.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@heary-cao
Contributor Author

This will be fixed in PhysicalOperation instead; that change is submitted as #18969.

@heary-cao heary-cao closed this Aug 22, 2017
@heary-cao heary-cao deleted the non-deterministic branch August 22, 2017 02:40

4 participants