@yaooqinn
Member

yaooqinn commented Jun 15, 2020

What changes were proposed in this pull request?

This PR brings #28751 back.

  • It was first reverted by 4a25200 because of an inevitable Maven test failure.

    • See the related updates in the follow-up a0187cd.
  • It was then reverted again because the added unit tests were flaky.

    • In this PR, the flakiness is traced to the Hive metastore connection that SparkSQLCLIService tries to create, which turns out to be unnecessary. That metastore client only points to a dummy metastore server.
    • This PR also adds cleanup to the before and after hooks of the SharedThriftServer trait, so that its configurations are neither polluted by other tests nor pollute them.
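
The cleanup idea in the last bullet can be sketched as a save-and-restore helper. This is only an illustration under assumptions: `ConfCleanup`, `withRestoredConf`, and the plain mutable map standing in for the session configuration are hypothetical names, not the code in this PR and not part of the SharedThriftServer trait.

```scala
import scala.collection.mutable

// Hypothetical helper (not Spark API): snapshot selected keys, run the body,
// then put the configuration back exactly as it was.
object ConfCleanup {
  def withRestoredConf[T](conf: mutable.Map[String, String], keys: Seq[String])(body: => T): T = {
    // Remember whether each key was set, and to what value.
    val saved: Seq[(String, Option[String])] = keys.map(k => k -> conf.get(k))
    try body
    finally saved.foreach {
      case (k, Some(v)) => conf(k) = v    // restore the previous value
      case (k, None)    => conf.remove(k) // the key was unset before, so unset it again
    }
  }
}
```

In a test suite the same snapshot could presumably be taken in the before hook and restored in the after hook, so that a value such as the metastore URI set by one suite does not leak into the next.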

Why are the changes needed?

To fix the flaky tests.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

By passing the sbt and Maven tests.

@dongjoon-hyun
Member

Thank you for investigating this, @yaooqinn.
cc @cloud-fan

@yaooqinn changed the title from "[WIP][SPARK-31926][TESTS][FOLLOWUP] Cleanup the thread local variable of hive metastore" to "[WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore" on Jun 15, 2020
@yaooqinn
Member Author

retest this please

@SparkQA

SparkQA commented Jun 15, 2020

Test build #124064 has finished for PR 28835 at commit f9ea941.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 15, 2020

Test build #124067 has finished for PR 28835 at commit 03fcf59.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 15, 2020

Test build #124068 has finished for PR 28835 at commit 7908d62.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 15, 2020

Test build #124069 has finished for PR 28835 at commit 7908d62.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Sorry, @yaooqinn and @cloud-fan. It seems that we had better revert the patch first because it's too flaky.

@yaooqinn
Member Author

Thanks, @dongjoon-hyun, and sorry for the inconvenience. I will keep working on this.

yaooqinn added 2 commits June 16, 2020 11:40
…urrency issue for ThriftCLIService to getPortNumber""

This reverts commit 75afd88.
@yaooqinn changed the title from "[WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore" to "[WIP][SPARK-31926][TESTS][FOLLOWUP] Cleanup the thread local variable of hive metastore" on Jun 16, 2020
@yaooqinn
Member Author

retest this please

@SparkQA

SparkQA commented Jun 16, 2020

Test build #124088 has finished for PR 28835 at commit 9c6ac83.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn changed the title from "[WIP][SPARK-31926][TESTS][FOLLOWUP] Cleanup the thread local variable of hive metastore" to "[WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore" on Jun 16, 2020
@yaooqinn
Member Author

retest this please

@SparkQA

SparkQA commented Jun 16, 2020

Test build #124087 has finished for PR 28835 at commit 9c6ac83.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 16, 2020

Test build #124102 has finished for PR 28835 at commit 9c6ac83.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

retest this please

@SparkQA

SparkQA commented Jun 16, 2020

Test build #124113 has started for PR 28835 at commit 5e79ad9.

* the super class [[CLIService#start]] starts a useless dummy metastore client, skip it and call
* the ancestor [[CompositeService#start]] directly.
*/
override def start(): Unit = startCompositeService()
Member Author

CLIService creates a metastore connection during start, which is useless for our dummy execution Hive conf and causes class cast issues across different classloaders.

Member Author

Here we bypass it and start the registered services the way CompositeService does.
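
To illustrate the bypass, here is a minimal, self-contained sketch with toy classes. All `Toy*` names and the `metastoreConnections` counter are hypothetical stand-ins for the Hive service hierarchy, not the actual classes. Scala cannot reach a grandparent's method through `super`, so the sketch re-implements the composite start loop in the subclass instead of calling the parent's `start`.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy stand-in for a basic service: it only records that it was started.
class ToyService(val name: String) {
  var started = false
  def start(): Unit = started = true
}

// Toy stand-in for a composite service: starts every registered child, then itself.
class ToyCompositeService(name: String) extends ToyService(name) {
  protected val services = ArrayBuffer.empty[ToyService]
  def addService(s: ToyService): Unit = services += s
  override def start(): Unit = {
    services.foreach(_.start())
    super.start()
  }
}

// Toy stand-in for the parent whose start() has an unwanted side effect.
class ToyCLIService(name: String) extends ToyCompositeService(name) {
  var metastoreConnections = 0
  override def start(): Unit = {
    super.start()
    metastoreConnections += 1 // models opening the unnecessary metastore client
  }
}

// The subclass skips the parent's start() and repeats only the composite behaviour.
class ToySparkCLIService extends ToyCLIService("toy-spark-cli") {
  override def start(): Unit = {
    services.foreach(_.start()) // what the composite ancestor would do
    started = true              // what the base service would do
  }
}
```

After `start()` on a `ToySparkCLIService`, the registered children and the service itself are started while `metastoreConnections` stays at 0. The cost of this design is that the subclass has to mirror the ancestor's bookkeeping by hand.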

@SparkQA

SparkQA commented Jun 16, 2020

Test build #124119 has started for PR 28835 at commit 9cdd7fa.

@SparkQA

SparkQA commented Jun 16, 2020

Test build #124108 has finished for PR 28835 at commit 9c6ac83.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

retest this please

@yaooqinn changed the title from "[WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore" to "[WIP][SPARK-31926][TESTS][FOLLOWUP]Cleanup the thread local variable of hive metastore" on Jun 17, 2020
@yaooqinn
Member Author

retest this please

@SparkQA

SparkQA commented Jun 17, 2020

Test build #124157 has finished for PR 28835 at commit 5ce343a.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 17, 2020

Test build #124158 has finished for PR 28835 at commit 5ce343a.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 17, 2020

Test build #124150 has finished for PR 28835 at commit 5ce343a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

retest this please

@SparkQA

SparkQA commented Jun 17, 2020

Test build #124162 has finished for PR 28835 at commit 5ce343a.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

cc @juliuszsompolski

@yaooqinn
Member Author

retest this please

@yaooqinn changed the title from "[WIP][SPARK-31926][TESTS][FOLLOWUP]Cleanup the thread local variable of hive metastore" to "[WIP][SPARK-31926][TESTS][FOLLOWUP] Fix concurrency issue for ThriftCLIService to getPortNumber" on Jun 18, 2020
Contributor

@juliuszsompolski left a comment

Thanks for digging into this!

throw new ServiceException("Failed to Start " + getName, e)
}

// Emulating `AbstractService.start`
Contributor

AbstractService.start also sets startTime = System.currentTimeMillis();
Let's set startTime as well, just in case.

Member Author

Nice catch!

Comment on lines 94 to 96
sqlContext.setConf(ConfVars.METASTORECONNECTURLKEY.varname,
s"jdbc:derby:;databaseName=$metastorePath;create=true")
sqlContext.setConf(ConfVars.METASTOREURIS.varname, "")
Contributor

What's this for?

Member Author

Some test failures showed that the metastore client was trying to connect through METASTOREURIS, which seems to be set by other tests. But now I guess this won't be necessary, as we skip that part.

@SparkQA

SparkQA commented Jun 18, 2020

Test build #124202 has finished for PR 28835 at commit 5ce343a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 18, 2020

Test build #124215 has finished for PR 28835 at commit d5341ea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Emulating `AbstractService.start`
val startTime = new java.lang.Long(System.currentTimeMillis())
setAncestorField(this, 3, "startTime", startTime)
invoke(classOf[AbstractService], this, "ensureCurrentState", classOf[STATE] -> STATE.INITED)
Contributor

This can throw an IllegalStateException, and in the original implementation it is inside the try-catch block, which would turn it into a ServiceException.
Let's do that as well, just in case :-).

Member Author

Ah, I see.
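
The point raised in this thread can be sketched with a toy lifecycle class. This is an illustration under assumptions, not the PR's code: `ToyLifecycle`, `ToyState`, and `ToyServiceException` are hypothetical stand-ins. The sketch keeps the state check inside the try block, so an IllegalStateException surfaces as the service-level exception, and it records the start time as suggested in the earlier review comment.

```scala
// Hypothetical stand-in for the service-level exception type.
final class ToyServiceException(msg: String, cause: Throwable)
  extends RuntimeException(msg, cause)

object ToyState extends Enumeration {
  val NotInited, Inited, Started = Value
}

class ToyLifecycle(name: String) {
  private var state = ToyState.NotInited
  private var startTime = 0L

  def init(): Unit = state = ToyState.Inited

  // Throws IllegalStateException when the service is not in the expected state.
  private def ensureCurrentState(expected: ToyState.Value): Unit =
    if (state != expected) {
      throw new IllegalStateException(s"Expected state $expected but was $state")
    }

  def start(): Unit = {
    try {
      ensureCurrentState(ToyState.Inited)    // the check sits inside the try block
      startTime = System.currentTimeMillis() // record the start time as well
      state = ToyState.Started
    } catch {
      case e: IllegalStateException =>
        throw new ToyServiceException(s"Failed to Start $name", e)
    }
  }

  def getStartTime: Long = startTime
}
```

Calling `start()` before `init()` then fails with a `ToyServiceException` whose cause is the original `IllegalStateException`, which keeps the failure type uniform for callers.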

@yaooqinn changed the title from "[WIP][SPARK-31926][TESTS][FOLLOWUP] Fix concurrency issue for ThriftCLIService to getPortNumber" to "[SPARK-31926][SQL][TESTS][FOLLOWUP] Fix concurrency issue for ThriftCLIService to getPortNumber" on Jun 18, 2020
Contributor

@juliuszsompolski left a comment

LGTM

@SparkQA

SparkQA commented Jun 18, 2020

Test build #124218 has finished for PR 28835 at commit e409f9a.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

retest this please

@SparkQA

SparkQA commented Jun 18, 2020

Test build #124228 has finished for PR 28835 at commit e409f9a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn changed the title from "[SPARK-31926][SQL][TESTS][FOLLOWUP] Fix concurrency issue for ThriftCLIService to getPortNumber" to "[SPARK-31926][SQL][TESTS][FOLLOWUP][test-hive1.2] Fix concurrency issue for ThriftCLIService to getPortNumber" on Jun 19, 2020
@yaooqinn
Member Author

retest this please

@yaooqinn changed the title from "[SPARK-31926][SQL][TESTS][FOLLOWUP][test-hive1.2] Fix concurrency issue for ThriftCLIService to getPortNumber" to "[SPARK-31926][SQL][TESTS][FOLLOWUP][test-hive1.2][test-maven] Fix concurrency issue for ThriftCLIService to getPortNumber" on Jun 19, 2020
@yaooqinn
Member Author

retest this please

@SparkQA

SparkQA commented Jun 19, 2020

Test build #124249 has finished for PR 28835 at commit e409f9a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

Thanks, merging to master! (Not backporting, as this is no longer a test-only PR.)

@cloud-fan closed this in abc8ccc on Jun 19, 2020
@SparkQA

SparkQA commented Jun 19, 2020

Test build #124250 has finished for PR 28835 at commit e409f9a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.
