@piaozhexiu

@marmbrus @liancheng as requested, I am reopening the PR that contains #7216 and #7386.

Can you help me understand the unit test failures?

@SparkQA

SparkQA commented Jul 15, 2015

Test build #37372 has finished for PR 7421 at commit 69eb136.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@piaozhexiu
Author

Hmm, Jenkins failed for an unknown reason:

[error] running /home/jenkins/workspace/SparkPullRequestBuilder@2/build/sbt -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Phive-thriftserver -Phive sql/test mllib/test hive-thriftserver/test hive/test catalyst/test examples/test ; received return code 143
Archiving unit tests logs...
> Send successful.
Attempting to post to Github...
 > Post successful.
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Finished: FAILURE
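Return code 143 most likely means the build process was killed rather than that a test failed: by the usual Unix convention, a process terminated by a signal reports status 128 plus the signal number, and 143 - 128 = 15 is SIGTERM (possibly sent by a build timeout, although the log above does not say why). A small Java sketch of that decoding convention; the class and method names are illustrative only:

```java
/** Decodes shell-style exit statuses, where values 129..255 conventionally mean "killed by signal (status - 128)". */
public final class ExitStatus {
    private ExitStatus() {}

    /** Returns the signal number encoded in a shell exit status, or -1 for an ordinary exit code. */
    static int signalNumber(int status) {
        return status > 128 && status < 256 ? status - 128 : -1;
    }

    /** Names a few common signals using their standard POSIX numbers. */
    static String signalName(int signal) {
        switch (signal) {
            case 2: return "SIGINT";
            case 9: return "SIGKILL";
            case 15: return "SIGTERM";
            default: return "signal " + signal;
        }
    }

    public static void main(String[] args) {
        int status = 143; // return code reported by the Jenkins build above
        int sig = signalNumber(status);
        System.out.println(sig > 0 ? "killed by " + signalName(sig) : "exited with code " + status);
    }
}
```

Running it prints `killed by SIGTERM`.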

@marmbrus
Contributor

ok to test

@SparkQA

SparkQA commented Jul 16, 2015

Test build #37421 has finished for PR 7421 at commit 5599cc4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class StandaloneRecoveryModeFactory(conf: SparkConf, serializer: Serializer)
    • class LDAModel(JavaModelWrapper):
    • class LDA(object):
    • trait ImplicitCastInputTypes extends ExpectsInputTypes
    • abstract class BinaryOperator extends BinaryExpression with ExpectsInputTypes
    • case class UnaryMinus(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • case class UnaryPositive(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • case class Abs(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • case class Pmod(left: Expression, right: Expression) extends BinaryArithmetic
    • case class BitwiseNot(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • final class SpecificRow extends $
    • case class Factorial(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Hex(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Unhex(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Round(child: Expression, scale: Expression)
    • case class Md5(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Sha1(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Crc32(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Not(child: Expression)
    • case class And(left: Expression, right: Expression) extends BinaryOperator with Predicate
    • case class Or(left: Expression, right: Expression) extends BinaryOperator with Predicate
    • trait StringRegexExpression extends ImplicitCastInputTypes
    • trait String2StringExpression extends ImplicitCastInputTypes
    • trait StringComparison extends ImplicitCastInputTypes
    • case class StringSpace(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class StringLength(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Ascii(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Base64(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class UnBase64(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Exchange(newPartitioning: Partitioning, child: SparkPlan) extends UnaryNode

@marmbrus
Contributor

I'm going to trigger a bunch of test runs. Let's see what happens...

@SparkQA

SparkQA commented Jul 16, 2015

Test build #1087 has finished for PR 7421 at commit 5599cc4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 16, 2015

Test build #1083 has finished for PR 7421 at commit 5599cc4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class StandaloneRecoveryModeFactory(conf: SparkConf, serializer: Serializer)
    • case class Pmod(left: Expression, right: Expression) extends BinaryArithmetic
    • final class SpecificRow extends $

@SparkQA

SparkQA commented Jul 16, 2015

Test build #1085 has finished for PR 7421 at commit 5599cc4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class StandaloneRecoveryModeFactory(conf: SparkConf, serializer: Serializer)

@SparkQA

SparkQA commented Jul 16, 2015

Test build #1086 has finished for PR 7421 at commit 5599cc4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class StandaloneRecoveryModeFactory(conf: SparkConf, serializer: Serializer)
    • case class Pmod(left: Expression, right: Expression) extends BinaryArithmetic
    • final class SpecificRow extends $

@SparkQA

SparkQA commented Jul 16, 2015

Test build #1084 has finished for PR 7421 at commit 5599cc4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@piaozhexiu
Author

4 of 5 runs passed... Could this be caused by multithreading in the unit tests? Otherwise, I have no explanation.

@chutium
Contributor

chutium commented Jul 17, 2015

getPartitionsByFilter is a great improvement: production Hive data warehouses typically have tables with a huge number of partitions. Looking forward to seeing this included in the next release :)

@piaozhexiu
Author

I am repeatedly running the sql/hive unit tests after synchronizing getPartitionsByFilterMethod.invoke. I'll report back on how it goes.
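Serializing a reflective call on a shared lock, as described in the comment above, can be sketched roughly as follows. `FakeClient`, its `getPartitionsByFilter(String)` method, and the `LOCK` object are hypothetical stand-ins for the Hive shim code, not Spark's actual classes; only `java.lang.reflect.Method.invoke` is the real JDK API. The fake client records how many calls overlap, so the effect of the lock is observable.

```java
import java.lang.reflect.Method;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public final class SynchronizedInvoke {
    /** Hypothetical stand-in for a metastore client; records the peak number of overlapping calls. */
    public static final class FakeClient {
        private final AtomicInteger active = new AtomicInteger();
        private final AtomicInteger peak = new AtomicInteger();

        public List<String> getPartitionsByFilter(String filter) throws InterruptedException {
            peak.accumulateAndGet(active.incrementAndGet(), Math::max);
            try {
                Thread.sleep(1); // widen the window in which unsynchronized calls would overlap
            } finally {
                active.decrementAndGet();
            }
            return Collections.singletonList("partitions matching " + filter);
        }

        int peak() { return peak.get(); }
    }

    /** Shared lock guarding every reflective call (hypothetical). */
    private static final Object LOCK = new Object();

    /** Invokes {@code method} while holding LOCK, so at most one call runs at a time. */
    static Object invokeSerialized(Method method, Object target, Object... args)
            throws ReflectiveOperationException {
        synchronized (LOCK) {
            return method.invoke(target, args);
        }
    }

    /** Issues {@code calls} reflective calls from {@code threads} threads; returns the peak overlap. */
    static int peakConcurrency(int threads, int calls) {
        FakeClient client = new FakeClient();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Method method = FakeClient.class.getMethod("getPartitionsByFilter", String.class);
            for (int i = 0; i < calls; i++) {
                String filter = "ds = " + i;
                pool.submit(() -> invokeSerialized(method, client, filter));
            }
            pool.shutdown();
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("calls did not finish in time");
            }
        } catch (ReflectiveOperationException | InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return client.peak();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent calls: " + peakConcurrency(8, 100));
    }
}
```

With the lock in place the peak is always 1. If the call is already made under another lock (for example, one taken by an enclosing wrapper), the extra `synchronized` block adds nothing, which is why such a change may not affect the failure at all.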

@marmbrus
Contributor

I am pretty confused about why this is failing, but only sometimes. I don't think it could be a locking issue, because getPartitionsByFilter is guarded by the locks in withHiveState, right?

Have you been able to reproduce the failure locally?

@liancheng
Contributor

Investigated the following 3 build failure samples:

Firstly, this issue could not be reproduced reliably and only showed up on Jenkins occasionally. An obvious guess is that it's a concurrency bug that only occurs in highly concurrent jobs. (Notice that TestHive is configured with 32 local executor threads, and the Jenkins server has 32 cores, while our laptops usually have only 8 or fewer.)

Secondly, all 3 build failures behaved extremely consistently: 18 ParquetDataSourceOffMetastoreSuite test cases involving partitioned Hive metastore Parquet tables failed together. It seems that some internal Hive state got corrupted before this test suite was executed. However, this PR only updates the read path and doesn't introduce any extra state. So my guess is that this PR doesn't introduce the issue but somehow triggers an existing one. The root cause probably lies in some initialization phase, e.g. HiveContext initialization, or the creation of the partitioned test tables in ParquetDataSourceOffMetastoreSuite.beforeAll().

I made another interesting finding after single-step debugging a failed test case. The following stack trace snippet appears in all 3 build failures:

Caused by: MetaException(message:Filtering is supported only on partition keys of type string)
      .----
      | at org.apache.hadoop.hive.metastore.parser.ExpressionTree$FilterBuilder.setError(ExpressionTree.java:185)
      | at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.getJdoFilterPushdownParam(ExpressionTree.java:452)
      | at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.generateJDOFilterOverPartitions(ExpressionTree.java:357)
      | at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.generateJDOFilter(ExpressionTree.java:279)
      | at org.apache.hadoop.hive.metastore.parser.ExpressionTree.generateJDOFilterFragment(ExpressionTree.java:590)
      | at org.apache.hadoop.hive.metastore.ObjectStore.makeQueryFilterString(ObjectStore.java:2417)
      | at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsViaOrmFilter(ObjectStore.java:2029)
      | at org.apache.hadoop.hive.metastore.ObjectStore.access$500(ObjectStore.java:146)
      | at org.apache.hadoop.hive.metastore.ObjectStore$4.getJdoResult(ObjectStore.java:2332)
      | at org.apache.hadoop.hive.metastore.ObjectStore$4.getJdoResult(ObjectStore.java:2317)
      `----
        at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2214)

The marked code path shown above is actually NEVER executed in normal cases. More specifically, the getJdoResult() method in the anonymous GetListHelper object is never called in GetHelper<T>.run() in normal cases; only the getSqlResult() method is called. This behavior is controlled by doUseDirectSql, which is partially determined by ObjectStore.directSql.isCompatibleDatastore. Since ObjectStore is initialized while initializing HiveContext, ObjectStore.directSql.isCompatibleDatastore is probably the corrupted Hive internal state.
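The control flow described in the comment above can be modeled roughly as below. This is a simplified, hypothetical sketch, not Hive's `ObjectStore` source: the `Helper` class and its constructor parameters are assumptions that mirror the names mentioned (`doUseDirectSql`, `getSqlResult()`, `getJdoResult()`, `isCompatibleDatastore`), and the other inputs that partially determine `doUseDirectSql` are folded into a single `allowSql` flag. It shows that the JDO path is reached only when direct SQL is disabled, fails, or yields nothing.

```java
import java.util.function.Supplier;

public final class DirectSqlFallback {
    /** Hypothetical, simplified model of a "try direct SQL, else fall back to JDO" helper. */
    static final class Helper<T> {
        private final boolean doUseDirectSql;
        private final Supplier<T> sqlResult; // stands in for getSqlResult()
        private final Supplier<T> jdoResult; // stands in for getJdoResult()

        Helper(boolean allowSql, boolean isCompatibleDatastore,
               Supplier<T> sqlResult, Supplier<T> jdoResult) {
            // Direct SQL is used only if it is allowed AND the datastore was judged compatible.
            this.doUseDirectSql = allowSql && isCompatibleDatastore;
            this.sqlResult = sqlResult;
            this.jdoResult = jdoResult;
        }

        T run() {
            T result = null;
            if (doUseDirectSql) {
                try {
                    result = sqlResult.get();
                } catch (RuntimeException e) {
                    result = null; // on a direct SQL error, fall through to the JDO path
                }
            }
            if (result == null) {
                result = jdoResult.get(); // the path that is normally never reached
            }
            return result;
        }
    }

    /** Returns which path served the request, given the (possibly corrupted) compatibility flag. */
    static String fetch(boolean isCompatibleDatastore) {
        return new Helper<String>(true, isCompatibleDatastore, () -> "direct SQL", () -> "JDO ORM").run();
    }

    public static void main(String[] args) {
        System.out.println(fetch(true));  // healthy state: prints "direct SQL"
        System.out.println(fetch(false)); // corrupted flag: prints "JDO ORM"
    }
}
```

Under this model a single wrong boolean at initialization silently reroutes every later metastore query to the JDO path, which matches the symptom of one whole suite failing at once.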

I don't have any clue yet how this state gets corrupted. My guess is that there is a race condition during HiveContext initialization. For example, maybe the underlying Derby database is not fully created while ObjectStore is being initialized.

@liancheng
Contributor

It seems that Hive prefers to access the underlying metastore database via direct SQL and uses the JDO ORM as a fallback. The existing bug usually doesn't show up because either direct SQL or the JDO ORM is capable of doing the work. But in the case of getPartitionsByFilter, the ORM path doesn't support predicates involving integral types by default, which leads to the build failures.

Until the root cause is fixed, I guess we can work around this issue by setting hive.metastore.integral.jdo.pushdown to true so that the JDO ORM code path can handle integral partition columns.
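The check behind the `MetaException` in the stack trace, and the effect of the proposed flag, can be modeled as follows. This is a hypothetical simplification of the JDO filter path, not Hive's code: the `canPushDown` method, the set of integral type names, and the `integralJdoPushdown` parameter (standing in for `hive.metastore.integral.jdo.pushdown`) are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Set;

public final class JdoPushdownCheck {
    // Hive's integral column types, as assumed for this sketch.
    private static final Set<String> INTEGRAL_TYPES = Set.of("tinyint", "smallint", "int", "bigint");

    /**
     * Hypothetical model: can a predicate on a partition key of type {@code keyType}
     * be pushed down through the JDO ORM path?
     */
    static boolean canPushDown(String keyType, boolean integralJdoPushdown) {
        String t = keyType.toLowerCase(Locale.ROOT);
        if (t.equals("string")) {
            return true; // string keys are always supported
        }
        // Integral keys are supported only when the flag is enabled.
        return integralJdoPushdown && INTEGRAL_TYPES.contains(t);
    }

    public static void main(String[] args) {
        for (boolean flag : new boolean[] {false, true}) {
            for (String type : new String[] {"string", "int"}) {
                System.out.printf("integral pushdown=%b, key type=%s -> %s%n", flag, type,
                        canPushDown(type, flag)
                                ? "pushed down"
                                : "MetaException: Filtering is supported only on partition keys of type string");
            }
        }
    }
}
```

In this model the `int` partition key fails only in the `flag=false` case, which is why flipping the flag hides the failure without addressing why the JDO path was taken in the first place.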

@liancheng
Contributor

Experimenting with the workaround mentioned above in PR #7492.

liancheng added a commit to liancheng/spark that referenced this pull request Jul 18, 2015
asfgit pushed a commit that referenced this pull request Jul 20, 2015
…or partition pruning

This PR forks PR #7421 authored by piaozhexiu and adds [a workaround] [1] for fixing the occasional test failures occurred in PR #7421. Please refer to these [two] [2] [comments] [3] for details.

[1]: liancheng@536ac41
[2]: #7421 (comment)
[3]: #7421 (comment)

Author: Cheolsoo Park
Author: Cheng Lian
Author: Michael Armbrust

Closes #7492 from liancheng/pr-7421-workaround and squashes the following commits:

5599cc4 [Cheolsoo Park] Predicate pushdown to hive metastore
536ac41 [Cheng Lian] Sets hive.metastore.integral.jdo.pushdown to true to workaround test failures caused by in #7421
@piaozhexiu
Author

Closing, as it has been merged as part of #7492.

@litao-buptsse
Contributor

@liancheng @piaozhexiu Have you cherry-picked this PR to Spark branch-1.5?

@piaozhexiu
Author

@litao-buptsse yes, this patch is committed to branch-1.5. To enable it, set spark.sql.hive.metastorePartitionPruning to true; it is false by default.
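How a boolean flag like this can gate the two metastore code paths is sketched below. Apart from the property name `spark.sql.hive.metastorePartitionPruning` and its `false` default, everything here (the `FakeMetastore` class, integer partitions, the predicate, and the client-side pruning fallback) is a hypothetical illustration and not Spark's implementation. It counts how many partitions the metastore returns in each mode.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public final class PartitionPruningToggle {
    static final String KEY = "spark.sql.hive.metastorePartitionPruning";

    /** Hypothetical metastore that counts how many partitions it returns. */
    static final class FakeMetastore {
        private final List<Integer> partitions;
        int partitionsReturned = 0;

        FakeMetastore(List<Integer> partitions) { this.partitions = partitions; }

        List<Integer> getAllPartitions() {
            partitionsReturned += partitions.size();
            return partitions;
        }

        List<Integer> getPartitionsByFilter(Predicate<Integer> filter) {
            List<Integer> matched = partitions.stream().filter(filter).collect(Collectors.toList());
            partitionsReturned += matched.size();
            return matched;
        }
    }

    /** Chooses the code path from the flag, treating an unset flag as false. */
    static List<Integer> prunedPartitions(Map<String, String> conf, FakeMetastore ms, Predicate<Integer> filter) {
        boolean pushDown = Boolean.parseBoolean(conf.getOrDefault(KEY, "false"));
        if (pushDown) {
            return ms.getPartitionsByFilter(filter); // filter evaluated on the metastore side
        }
        // Fetch every partition, then prune on the client.
        return ms.getAllPartitions().stream().filter(filter).collect(Collectors.toList());
    }

    /** Returns how many partitions the metastore handed back for a query selecting partition 42. */
    static int transferred(boolean enabled) {
        FakeMetastore ms = new FakeMetastore(IntStream.range(0, 1000).boxed().collect(Collectors.toList()));
        Map<String, String> conf = enabled ? Map.of(KEY, "true") : Map.of();
        List<Integer> result = prunedPartitions(conf, ms, p -> p == 42);
        if (!result.equals(List.of(42))) {
            throw new IllegalStateException("unexpected result " + result);
        }
        return ms.partitionsReturned;
    }

    public static void main(String[] args) {
        System.out.println("flag off: " + transferred(false) + " partitions returned");
        System.out.println("flag on:  " + transferred(true) + " partitions returned");
    }
}
```

Both modes select the same partition; the difference is that with the flag on, only the matching partition leaves the metastore, which is what matters for tables with a huge number of partitions.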

@litao-buptsse
Contributor

@piaozhexiu OK, I got it, thank you very much!

6 participants