-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-31926][SQL][TEST-HIVE1.2][test-maven] Fix concurrency issue for ThriftCLIService to getPortNumber #28797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ervice to getPortNumber
|
Thank you, @yaooqinn ! |
|
Test build #123816 has finished for PR 28797 at commit
|
| @@ -0,0 +1,65 @@ | |||
| # | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is added because when I run the mvn tests for the hive-thriftserver module, it was reported missing
| hiveServer2.stop() | ||
| } finally { | ||
| super.afterAll() | ||
| SessionState.detachSession() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fix is added to SharedThriftServer not SharedThriftServer for a lower scope in this PR, but I think most of hive and SharedSparkSession related tests that need a dedicated JVM may cause by this.
|
cc @cloud-fan @dongjoon-hyun @gengliangwang @wangyum thanks. |
| "org.apache.spark.sql.hive.thriftserver.SparkSQLEnvSuite", | ||
| "org.apache.spark.sql.hive.thriftserver.ui.ThriftServerPageSuite", | ||
| "org.apache.spark.sql.hive.thriftserver.ui.HiveThriftServer2ListenerSuite", | ||
| "org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple test suites from thriftserver moduel here. Are the other suites OK to be executed in parallel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified it locally with all SharedThriftServer-like tests in a single JVM, which needed to remove org.apache.spark.sql.hive.thriftserver.ThriftServerQueryTestSuite here, and passed.
command:
build/sbt "hive-thriftserver/test-only *ThriftServer*" -Phive-thriftserverThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make sure we can pass tests when running within a single JVM. We can send a new PR to optimize it by running tests in parallel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan I meant the other hive-thriftserver suites in testsWhichShouldRunInTheirOwnDedicatedJvm.
From line 479 to 482. I haven't looked into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the log again and found that the 3 of SharedThriftServer-like tests in #28797 (comment) were executed sequentially in the single JVM.
https://www.scala-sbt.org/1.x/docs/Testing.html#Forking+tests
By default, sbt runs all tasks in parallel and within the same JVM as sbt itself.
The tests in a single group are run sequentially.
Not quite sure which rule above worked
|
retest this please |
|
Test build #123834 has finished for PR 28797 at commit
|
|
retest this please |
|
Test build #123859 has started for PR 28797 at commit |
|
retest this please |
|
Test build #123855 has finished for PR 28797 at commit
|
|
Test build #123861 has finished for PR 28797 at commit
|
|
retest this please |
|
the lastest test failure was not related |
|
Test build #123877 has finished for PR 28797 at commit
|
|
retest this please |
|
Test build #123897 has started for PR 28797 at commit |
|
Build timed out (after 500 minutes). Marking the build as aborted. |
|
retest this please |
|
Test build #123937 has finished for PR 28797 at commit
|
|
thanks, merging to master/3.0! |
…r ThriftCLIService to getPortNumber ### What changes were proposed in this pull request? This PR brings 02f32cf back which reverted by 4a25200 because of maven test failure diffs newly made: 1. add a missing log4j file to test resources 2. Call `SessionState.detachSession()` to clean the thread local one in `afterAll`. 3. Not use dedicated JVMs for sbt test runner too ### Why are the changes needed? fix the maven test ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? add new tests Closes #28797 from yaooqinn/SPARK-31926-NEW. Authored-by: Kent Yao <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit a0187cd) Signed-off-by: Wenchen Fan <[email protected]>
|
Hi, All.
|
|
For |
|
|
||
| override def afterAll(): Unit = { | ||
| try { | ||
| hiveServer2.stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From some Jenkins jobs, NPE is reported here.
sbt.ForkMain$ForkError: java.lang.NullPointerException: null
at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.afterAll(SharedThriftServer.scala:53)
| sqlContext.setConf(ConfVars.HIVE_SERVER2_TRANSPORT_MODE.varname, mode.toString) | ||
|
|
||
| try { | ||
| hiveServer2 = HiveThriftServer2.startWithContext(sqlContext) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before NPE occurs, this failed.
ThriftServerWithSparkContextInHttpSuite:
05:42:37.405 WARN hive.metastore: Failed to connect to the MetaStore Server...
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:480)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:247)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:129)
at org.apache.hive.service.cli.CLIService.start(CLIService.java:152)
at org.apache.hive.service.CompositeService.start(CompositeService.java:70)
at org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:105)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:161)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:62)
at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.startThriftServer(SharedThriftServer.scala:92)
|
For a record, this fails sometime (not always), but the failures seems to occur frequently on |
|
It seems to hit the derby metastore constraints that fail the connection to metastore |
|
The test failures are all by SBT? If so, should I send a followup to use dedicated JVMs for them first? @dongjoon-hyun |
|
The failures occurs on Maven, too. Please see #28797 (comment) . |
|
The followings is the failure on |
|
Sorry, @yaooqinn and @cloud-fan . As I mentioned on @yaooqinn's followup PR , it seems that we need to revert this again. |
What changes were proposed in this pull request?
This PR brings 02f32cf back which reverted by 4a25200 because of maven test failure
diffs newly made:
SessionState.detachSession()to clean the thread local one inafterAll.Why are the changes needed?
fix the maven test
Does this PR introduce any user-facing change?
no
How was this patch tested?
add new tests