Conversation

@saucam

@saucam saucam commented Apr 23, 2015

This PR adds better size estimation for partitioned tables, so that only the sizes of the partitions a query refers to are considered when testing against autoBroadcastJoinThreshold and deciding whether to create a broadcast join or a shuffle hash join.

We can use the values stored in the Hive metastore by alter table / insert into partition commands to estimate the size of each referenced partition.

In most cases, since both the alter table query and 'insert into table partition <part=val> select * from .....' store the partition size in the metastore automatically, we expect to get the correct partition size. If there is a mismatch, an ANALYZE TABLE query could be used as well.
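The idea described above can be sketched roughly as follows. This is a minimal illustration and not the code in this patch: `PartitionMeta`, `estimateSizeInBytes` and `canBroadcast` are hypothetical names, the partition metadata is modelled as plain maps rather than real Hive or Spark classes, and it assumes the size is recorded under Hive's `totalSize` partition parameter. The 10 MB threshold used in `main` is only an example value.

```scala
// Hypothetical stand-in for a partition's metastore entry; not a Spark or Hive class.
final case class PartitionMeta(
    spec: Map[String, String],       // partition column -> value, e.g. ds -> 2015-04-23
    parameters: Map[String, String]) // metastore parameters, e.g. totalSize -> bytes

object PartitionSizeSketch {
  // Assumption: the metastore records the partition's size in bytes under this key.
  val TotalSizeKey = "totalSize"

  // Sum the recorded sizes of the partitions that survive pruning.
  // Returns None if any surviving partition has no usable size, so the
  // caller can fall back to a conservative (non-broadcast) plan.
  def estimateSizeInBytes(
      partitions: Seq[PartitionMeta],
      keep: PartitionMeta => Boolean): Option[BigInt] = {
    val sizes = partitions.filter(keep).map { p =>
      p.parameters.get(TotalSizeKey).flatMap(s => scala.util.Try(BigInt(s.trim)).toOption)
    }
    if (sizes.forall(_.isDefined)) Some(sizes.flatten.sum) else None
  }

  // Broadcast only when a size is known and does not exceed the threshold.
  def canBroadcast(estimate: Option[BigInt], thresholdBytes: Long): Boolean =
    estimate.exists(_ <= BigInt(thresholdBytes))

  def main(args: Array[String]): Unit = {
    val parts = Seq(
      PartitionMeta(Map("ds" -> "2015-04-22"), Map(TotalSizeKey -> "52428800")), // 50 MB
      PartitionMeta(Map("ds" -> "2015-04-23"), Map(TotalSizeKey -> "4194304")))  // 4 MB
    val threshold = 10L * 1024 * 1024 // example threshold of 10 MB

    // Only the ds=2015-04-23 partition is referenced by the query.
    val pruned = estimateSizeInBytes(parts, _.spec.get("ds").contains("2015-04-23"))
    // No pruning: the whole table is counted.
    val whole = estimateSizeInBytes(parts, _ => true)

    println(s"pruned estimate: $pruned, broadcast: ${canBroadcast(pruned, threshold)}")
    println(s"whole-table estimate: $whole, broadcast: ${canBroadcast(whole, threshold)}")
  }
}
```

In this sketch the whole table is too large to broadcast, while the single referenced partition fits under the threshold, which is the case this PR is aiming at. Returning `None` when a size is missing is a design choice of the sketch, so that a partition without statistics never triggers a broadcast by accident.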

@SparkQA

SparkQA commented Apr 23, 2015

Test build #30860 has finished for PR 5668 at commit b0beb34.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@saucam
Author

saucam commented Apr 23, 2015

retest please

@SparkQA

SparkQA commented Apr 23, 2015

Test build #30868 has finished for PR 5668 at commit d25bc2a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 24, 2015

Test build #30919 has finished for PR 5668 at commit b4651fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 27, 2015

Test build #30981 has started for PR 5668 at commit ce89b15.

@SparkQA

SparkQA commented Apr 28, 2015

Test build #31139 has finished for PR 5668 at commit d5b4b52.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented May 12, 2015

Test build #32496 has finished for PR 5668 at commit 136b594.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 13, 2015

Test build #32592 has finished for PR 5668 at commit 449db10.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Yash Datta added 7 commits July 1, 2015 15:53
…ns in query during size estimation for checking against autoBroadcastJoinThreshold
…ectly updated when the query is being run. This is because totalsize of partitions gets updated both when alter table is called as well as when insert into overwrite partition is called.
@SparkQA

SparkQA commented Jul 1, 2015

Test build #36258 has finished for PR 5668 at commit 75fc3e7.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 1, 2015

Test build #36276 has finished for PR 5668 at commit dd630e7.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

Looks like an old patch. @yhuai would you mind taking a look?

@JoshRosen
Contributor

Hi @saucam, I'm going through the backlog of old pull requests and noticed that this PR and #6767 seem to both be trying to solve the same issue. Would you mind taking a look at that other PR to help figure out which approach we should move forward with?

Contributor

My concern here is that we probably don't want to duplicate the optimizer rules. A better idea is to reflect the real statistics in MetastoreRelation, as @navis did in #6767, so the default optimizer will handle the rest for us.

@SparkQA

SparkQA commented Nov 4, 2015

Test build #44983 has finished for PR 5668 at commit dd630e7.

  • This patch fails to build.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 93b52ab Dec 31, 2015
5 participants