@yaooqinn
Member

@yaooqinn yaooqinn commented Jun 11, 2020

What changes were proposed in this pull request?

This PR brings back 02f32cf, which was reverted by 4a25200 because of a Maven test failure.

New changes on top of the original commit:

  1. Add a missing log4j file to the test resources.
  2. Call SessionState.detachSession() in afterAll to clear the thread-local session state.
  3. Do not use dedicated JVMs in the sbt test runner either.

Why are the changes needed?

To fix the Maven test failure.

Does this PR introduce any user-facing change?

no

How was this patch tested?

New tests were added.

@dongjoon-hyun
Member

Thank you, @yaooqinn !

@SparkQA

SparkQA commented Jun 11, 2020

Test build #123816 has finished for PR 28797 at commit dbbd1e0.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -0,0 +1,65 @@
#
Member Author

@yaooqinn yaooqinn Jun 11, 2020

This file is added because it was reported missing when I ran the mvn tests for the hive-thriftserver module.

  hiveServer2.stop()
} finally {
  super.afterAll()
  SessionState.detachSession()
Member Author

@yaooqinn yaooqinn Jun 11, 2020

To keep the scope of this PR small, this fix is added to SharedThriftServer rather than SharedSparkSession, but I think the need for a dedicated JVM in most Hive- and SharedSparkSession-related tests may be caused by the same issue.
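
For readers following along, here is a minimal sketch of the pattern being discussed: clearing thread-local state in a `finally` block so that it is detached even when server shutdown throws. `FakeSessionState`, `FakeSuite`, and `ThreadLocalCleanupDemo` are hypothetical stand-ins invented for illustration; they are not Hive's `SessionState` or Spark's `SharedThriftServer`, and the real cleanup order in the PR may differ.

```scala
import scala.util.Try

// Hypothetical stand-in for a thread-local session holder; not Hive's SessionState.
object FakeSessionState {
  private val current = new ThreadLocal[String]
  def attach(name: String): Unit = current.set(name)
  def detach(): Unit = current.remove()
  def get: Option[String] = Option(current.get())
}

// Hypothetical suite: beforeAll attaches a session; afterAll always detaches it.
class FakeSuite(name: String, stopServer: () => Unit) {
  def beforeAll(): Unit = FakeSessionState.attach(name)

  def afterAll(): Unit = {
    try {
      stopServer() // may throw
    } finally {
      // Runs even if stopServer() throws, so the next suite on this
      // thread does not inherit a stale session.
      FakeSessionState.detach()
    }
  }
}

object ThreadLocalCleanupDemo {
  // Runs a suite's lifecycle and returns whatever session is still attached.
  def leftoverAfter(suite: FakeSuite): Option[String] = {
    suite.beforeAll()
    Try(suite.afterAll()) // swallow the failure; only the leftover state matters here
    FakeSessionState.get
  }

  def main(args: Array[String]): Unit = {
    val failing = new FakeSuite("suite-A", () => throw new RuntimeException("stop failed"))
    println(leftoverAfter(failing))
  }
}
```

If suites share one JVM and one runner thread, a session left attached by one suite would presumably be visible to the next, which is the kind of cross-suite interference a dedicated JVM would otherwise hide.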

@yaooqinn
Member Author

"org.apache.spark.sql.hive.thriftserver.SparkSQLEnvSuite",
"org.apache.spark.sql.hive.thriftserver.ui.ThriftServerPageSuite",
"org.apache.spark.sql.hive.thriftserver.ui.HiveThriftServer2ListenerSuite",
"org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite",
Member

There are multiple test suites from the thriftserver module here. Are the other suites OK to be executed in parallel?

Member Author

I verified it locally by running all SharedThriftServer-like tests in a single JVM, which required removing org.apache.spark.sql.hive.thriftserver.ThriftServerQueryTestSuite from this list, and they passed.

command:

build/sbt "hive-thriftserver/test-only *ThriftServer*" -Phive-thriftserver

Contributor

Let's make sure we can pass tests when running within a single JVM. We can send a new PR to optimize it by running tests in parallel.

Member

@cloud-fan I meant the other hive-thriftserver suites in testsWhichShouldRunInTheirOwnDedicatedJvm.
From line 479 to 482. I haven't looked into it.

Member Author

I checked the log again and found that the three SharedThriftServer-like tests in #28797 (comment) were executed sequentially in a single JVM.

https://www.scala-sbt.org/1.x/docs/Testing.html#Forking+tests

By default, sbt runs all tasks in parallel and within the same JVM as sbt itself.

The tests in a single group are run sequentially.

I'm not quite sure which of the rules above applied here.
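
As background for the two quoted rules, here is a rough sketch of how test groups can be mapped to forked JVMs in an sbt 1.x build using `testGrouping`. This is an illustration only, not Spark's actual build definition: `dedicatedJvmSuites` and the suite name in it are hypothetical, and the group layout is an assumption chosen to show the idea.

```scala
// build.sbt fragment (sbt 1.x). Illustrative only; not Spark's actual build definition.

// Hypothetical suite names that must not share a JVM with other tests.
val dedicatedJvmSuites = Set("com.example.IsolatedSuite")

Test / fork := true

Test / testGrouping := {
  val (isolated, shared) =
    (Test / definedTests).value.partition(t => dedicatedJvmSuites.contains(t.name))

  // One group per isolated suite, each run as its own subprocess.
  val isolatedGroups = isolated.map { t =>
    Tests.Group(name = t.name, tests = Seq(t), runPolicy = Tests.SubProcess(ForkOptions()))
  }

  // Everything else goes into one group; tests within a group run sequentially.
  val sharedGroup =
    Tests.Group(name = "shared", tests = shared, runPolicy = Tests.SubProcess(ForkOptions()))

  isolatedGroups :+ sharedGroup
}
```

Under a layout like this, suites removed from the dedicated list would land in the shared group and run one after another, which would be consistent with the sequential execution seen in the log.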

@yaooqinn yaooqinn changed the title from "[SPARK-31926][SQL][TEST-HIVE1.2][test-maven] Fix concurrency issue for ThriftCLIService to getPortNumber" to "[SPARK-31926][SQL][TEST-HIVE1.2] Fix concurrency issue for ThriftCLIService to getPortNumber" on Jun 11, 2020
@yaooqinn
Member Author

retest this please

@SparkQA

SparkQA commented Jun 11, 2020

Test build #123834 has finished for PR 28797 at commit c00ac4e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@yaooqinn yaooqinn changed the title from "[SPARK-31926][SQL][TEST-HIVE1.2] Fix concurrency issue for ThriftCLIService to getPortNumber" to "[SPARK-31926][SQL][TEST-HIVE1.2][test-maven] Fix concurrency issue for ThriftCLIService to getPortNumber" on Jun 11, 2020
@SparkQA

SparkQA commented Jun 11, 2020

Test build #123859 has started for PR 28797 at commit c00ac4e.

@yaooqinn
Member Author

retest this please

@SparkQA

SparkQA commented Jun 11, 2020

Test build #123855 has finished for PR 28797 at commit c00ac4e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 11, 2020

Test build #123861 has finished for PR 28797 at commit c00ac4e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

retest this please

@yaooqinn
Member Author

The latest test failure is not related to this PR:

- get observable metrics by callback *** FAILED ***
  2 did not equal 1 (DataFrameCallbackSuite.scala:269)

@SparkQA

SparkQA commented Jun 12, 2020

Test build #123877 has finished for PR 28797 at commit c00ac4e.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

retest this please

@SparkQA

SparkQA commented Jun 12, 2020

Test build #123897 has started for PR 28797 at commit c00ac4e.

@yaooqinn
Member Author

Build timed out (after 500 minutes). Marking the build as aborted.
@cloud-fan, any thoughts on increasing the Maven build time limit?

@yaooqinn
Member Author

retest this please

@SparkQA

SparkQA commented Jun 12, 2020

Test build #123937 has finished for PR 28797 at commit c00ac4e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master/3.0!

cloud-fan pushed a commit that referenced this pull request Jun 15, 2020
…r ThriftCLIService to getPortNumber

Closes #28797 from yaooqinn/SPARK-31926-NEW.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit a0187cd)
Signed-off-by: Wenchen Fan <[email protected]>
@dongjoon-hyun
Member

dongjoon-hyun commented Jun 15, 2020

Hi, All.
The situation is better than before, but this seems to break one test case in the branch-3.0 Jenkins jobs again. Could you take a look at this?

...ThriftServerWithSparkContextInHttpSuite.SPARK-29911: Uncache cached tables when session closed
requirement failed: Failed to bind an actual port for HiveThriftServer2
java.lang.IllegalArgumentException: requirement failed: Failed to bind an actual port for HiveThriftServer2

@dongjoon-hyun
Member

dongjoon-hyun commented Jun 15, 2020


override def afterAll(): Unit = {
  try {
    hiveServer2.stop()
Member

In some Jenkins jobs, an NPE is reported here.

sbt.ForkMain$ForkError: java.lang.NullPointerException: null
	at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.afterAll(SharedThriftServer.scala:53)

sqlContext.setConf(ConfVars.HIVE_SERVER2_TRANSPORT_MODE.varname, mode.toString)

try {
  hiveServer2 = HiveThriftServer2.startWithContext(sqlContext)
Member

Before the NPE occurs, this call fails.

ThriftServerWithSparkContextInHttpSuite:
05:42:37.405 WARN hive.metastore: Failed to connect to the MetaStore Server...
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
	at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:480)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:247)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:129)
	at org.apache.hive.service.cli.CLIService.start(CLIService.java:152)
	at org.apache.hive.service.CompositeService.start(CompositeService.java:70)
	at org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:105)
	at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:161)
	at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:62)
	at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.startThriftServer(SharedThriftServer.scala:92)
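
The two review comments above suggest a sequence: the start call throws, the server field is never assigned, and `afterAll` then dereferences null. A minimal sketch of guarding against that follows. `FakeServer`, `GuardedFixture`, and `GuardedFixtureDemo` are hypothetical names made up for this example; this illustrates one possible guard and is not the fix adopted in Spark.

```scala
import scala.util.Try

// Hypothetical stand-in for the server handle; not the real HiveThriftServer2.
class FakeServer {
  var stopped = false
  def stop(): Unit = { stopped = true }
}

// Hypothetical fixture mirroring the shape of the snippets above.
class GuardedFixture(start: () => FakeServer) {
  private var server: FakeServer = null // remains null if start() throws

  def beforeAll(): Unit = { server = start() }

  def afterAll(): Unit = {
    // Option(null) is None, so stop() is skipped when startup never completed.
    Option(server).foreach(_.stop())
  }
}

object GuardedFixtureDemo {
  // Returns true if afterAll completes without throwing after a failed startup.
  def cleanupSurvivesFailedStart(): Boolean = {
    val fixture = new GuardedFixture(() => throw new IllegalStateException("bind failed"))
    Try(fixture.beforeAll()) // startup fails, server stays null
    Try(fixture.afterAll()).isSuccess
  }

  def main(args: Array[String]): Unit = {
    println(cleanupSurvivesFailedStart())
  }
}
```

A guard like this only keeps the teardown from masking the original failure; it does not address why the server failed to start.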

@dongjoon-hyun
Member

For the record, this fails only sometimes (not always), but the failures seem to occur frequently on branch-3.0.

@yaooqinn
Member Author

It seems to hit the Derby metastore's constraints, which cause the connection to the metastore to fail.

@yaooqinn
Member Author

Are all the test failures from sbt? If so, should I send a follow-up to use dedicated JVMs for them first? @dongjoon-hyun

@dongjoon-hyun
Member

dongjoon-hyun commented Jun 15, 2020

The failures occur on Maven, too. Please see #28797 (comment).

@dongjoon-hyun
Member

dongjoon-hyun commented Jun 15, 2020

@dongjoon-hyun
Member

dongjoon-hyun commented Jun 16, 2020
