
@navis (Contributor) commented Jun 11, 2015

Currently, Spark SQL estimates the size of a Hive table from the statistics stored in the metastore, so `ANALYZE TABLE` must be run before the table is queried to get good plans, especially for joins. Even with those statistics, the estimate cannot reflect the real input size of a query that contains a partition-pruning predicate, because the table-level stats cover every partition, not just the ones the query reads.

Worse, Hive cannot update metastore stats for external tables, a bug that was fixed only recently in HIVE-6727. According to the issue details, the bug affects all Hive versions between 0.13.0 and 1.2.0.
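To illustrate the gap, here is a minimal Scala sketch, assuming a simplified model in which each partition carries a total on-disk size (`PartitionInfo` and `PrunedSizeEstimate` are hypothetical names, not Spark or Hive APIs). Table-level stats sum every partition, while a query filtered on the partition column `ds` reads only the partitions that survive pruning. The threshold is the default of Spark's `spark.sql.autoBroadcastJoinThreshold` (10 MB), the setting that decides whether one side of a join is broadcast.

```scala
// Hypothetical, simplified model: each Hive partition with its on-disk size in bytes.
// These names are illustrative only; they are not Spark or Hive APIs.
final case class PartitionInfo(spec: Map[String, String], totalSizeBytes: Long)

object PrunedSizeEstimate {
  // What table-level metastore stats report: every partition, regardless of predicates.
  def tableLevelSize(parts: Seq[PartitionInfo]): Long =
    parts.map(_.totalSizeBytes).sum

  // What the query actually reads: only partitions that pass the pruning predicate.
  def prunedSize(parts: Seq[PartitionInfo], keep: Map[String, String] => Boolean): Long =
    parts.filter(p => keep(p.spec)).map(_.totalSizeBytes).sum

  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024L
    val parts = Seq(
      PartitionInfo(Map("ds" -> "2015-06-01"), 40 * mb),
      PartitionInfo(Map("ds" -> "2015-06-02"), 35 * mb),
      PartitionInfo(Map("ds" -> "2015-06-03"), 5 * mb)
    )
    // 10 MB, the default value of spark.sql.autoBroadcastJoinThreshold.
    val broadcastThreshold = 10 * mb
    // Stand-in for a predicate such as WHERE ds = '2015-06-03'.
    val onlyJune3: Map[String, String] => Boolean = _.get("ds").contains("2015-06-03")

    println(tableLevelSize(parts) <= broadcastThreshold)        // 80 MB in total
    println(prunedSize(parts, onlyJune3) <= broadcastThreshold) // 5 MB after pruning
  }
}
```

Run as-is, `main` prints `false` and then `true`: the 80 MB table-level figure rules out a broadcast join, while the 5 MB the pruned query actually reads would qualify.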

@navis force-pushed the SPARK-8312 branch 2 times, most recently from c2c7d87 to 7a534df (August 26, 2015 01:00)
@andrewor14 (Contributor)

ok to test @yhuai

@SparkQA commented Sep 2, 2015

Test build #41905 has finished for PR 6767 at commit 7a534df.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 18, 2015

Test build #42636 has finished for PR 6767 at commit 6dbedd1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 18, 2015

Test build #42652 has finished for PR 6767 at commit e905c6d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class HiveTableStats extends Rule[LogicalPlan]

@JoshRosen (Contributor)

Hi @navis, I'm going through the backlog of old pull requests and noticed that this PR and #5668 seem to both be trying to solve the same issue. Would you mind taking a look at that other PR to help figure out which approach we should move forward with?

Review comment (Contributor):

`plan foreach`?

@chenghao-intel (Contributor)

@navis I like the idea of moving the partition-pruning logic into `MetastoreRelation`. My only concern is that we don't want to make `MetastoreRelation` mutable. A better approach is probably to put the partition predicates in the constructor argument list, so the operator stays immutable and becomes more informative. What do you think?
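A minimal Scala sketch of this suggestion, assuming simplified stand-in types (`Partition`, `PartitionFilter`, and `PrunedRelation` are hypothetical and are not Spark's actual `MetastoreRelation` or `Expression` classes): the partition predicates are a constructor argument of an immutable case class, and the size estimate is derived from them.

```scala
// Hypothetical stand-ins for illustration; not Spark's MetastoreRelation or Expression types.
final case class Partition(values: Map[String, String], sizeInBytes: Long)

// A partition predicate modeled as plain data (column IN allowed values), so two
// relations with the same filters compare equal and print the same.
final case class PartitionFilter(column: String, allowed: Set[String]) {
  def matches(values: Map[String, String]): Boolean =
    values.get(column).exists(allowed.contains)
}

// The predicates are a constructor argument, so pruning is part of the relation's
// immutable state rather than something set on it after construction.
final case class PrunedRelation(
    tableName: String,
    allPartitions: Seq[Partition],
    partitionFilters: Seq[PartitionFilter] = Nil) {

  // Partitions that satisfy every filter; derived only from constructor fields.
  lazy val selectedPartitions: Seq[Partition] =
    allPartitions.filter(p => partitionFilters.forall(_.matches(p.values)))

  lazy val estimatedSizeInBytes: Long = selectedPartitions.map(_.sizeInBytes).sum

  // Adding filters returns a new relation; this instance is never modified.
  def withPartitionFilters(filters: Seq[PartitionFilter]): PrunedRelation =
    copy(partitionFilters = partitionFilters ++ filters)
}
```

Because `withPartitionFilters` goes through `copy`, the unpruned and pruned relations can coexist safely, and the filters appear in the case class's `toString` and equality, which is the "informative" property the comment asks for.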

@SparkQA commented Nov 19, 2015

Test build #46308 has finished for PR 6767 at commit 4362f94.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public final class UnsafeSorterSpillReader extends UnsafeSorterIterator implements Closeable
    • class CountVectorizerModelWriter(instance: CountVectorizerModel) extends Writer
    • final class IDF(override val uid: String) extends Estimator[IDFModel] with IDFBase with Writable
    • class MinMaxScalerModelWriter(instance: MinMaxScalerModel) extends Writer
    • class StandardScalerModelWriter(instance: StandardScalerModel) extends Writer
    • class StringIndexModelWriter(instance: StringIndexerModel) extends Writer
    • trait ScalaReflection
    • case class Schema(dataType: DataType, nullable: Boolean)
    • s"Unable to generate an encoder for inner class$
    • case class EncodeUsingSerializer(child: Expression, kryo: Boolean) extends UnaryExpression
    • case class DecodeUsingSerializer[T](child: Expression, tag: ClassTag[T], kryo: Boolean)
    • class HiveTableStats extends Rule[LogicalPlan]

@SparkQA commented Nov 20, 2015

Test build #46396 has finished for PR 6767 at commit 84c45c0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class HiveTableStats extends Rule[LogicalPlan]

@SparkQA commented Nov 23, 2015

Test build #46501 has finished for PR 6767 at commit 1b7a50b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class HiveTableStats extends Rule[LogicalPlan]

@SparkQA commented Nov 23, 2015

Test build #46503 has finished for PR 6767 at commit ec3e274.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class HiveTableStats extends Rule[LogicalPlan]

@navis (Contributor, Author) commented Nov 23, 2015

@chenghao-intel Sorry for the long delay. Could you take another look at this when you have time?

@rxin (Contributor) commented Jun 15, 2016

Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one.

@asfgit closed this in 1a33f2e on Jun 15, 2016