-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-11624][SPARK-11972][SQL]fix commands that need hive to exec #9589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #45508 has finished for PR 9589 at commit
|
|
Test build #45710 has finished for PR 9589 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment about why we need special handling here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, it would be better if we didn't have to handle this specially. It make the control flow even more confusing than it already is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of adding flags to these function, but that seems too complicate. @marmbrus do you have a better idea here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we need to have special handling for CLISessionState. It is complicating the dependencies and making the already complex control flow even harder to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the idea is to check if it contain user setting(i.e from .hiverc) or not. If we create new SessionState from scratch those info would be lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not arguing against the change to avoid replacing the SessionState if it already exists. I'm asking why it has to check to see if it is a CliSessionState or not in order for this logic to work. Ideally we would not couple these components this closely. If we have to do so, then you need to explain why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we have a CliSessionState, we are using Spark SQL CLI, in this case we never need second SessionState here. Creating another SessionState would fail some cases since CliSessionState is inherited from SessionState, which could lead to ClassCastException.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what case does a plain SessionState exist, where you need to create a new one?
|
Test build #45834 has finished for PR 9589 at commit
|
|
retest this please. |
|
Test build #45839 has finished for PR 9589 at commit
|
|
Hi @adrian-wang , |
|
This is critical bug. Strong hopefully it can be reviewed and merged in 1.6.0. Thanks ! |
|
Unfortunately RC1 has been cut already and changes to initialization are too likely to break things at this point. |
|
Thanks @marmbrus for your response. This is regression bug(not found in 1.5.X) so hopefully it can be fixed in 1.6.0. |
6203bc7 to
88eef1f
Compare
|
Test build #50813 has finished for PR 9589 at commit
|
|
@marmbrus We have instantiated and started a instance of |
88eef1f to
e8f1846
Compare
|
Test build #51358 has finished for PR 9589 at commit
|
| logDebug(s"Hive Config: $k=$v") | ||
| // originState will be created if not exists, will never be null | ||
| val originalState = SessionState.get() | ||
| if (originalState.isInstanceOf[CliSessionState]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if you don't special case this? Why is this dependent on the type of the session state and not just on the fact that a session has already been started?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the SessionState.get() method would create a instance of SessionState if not exists.
|
test this please |
In SparkSQLCLI, we have created a `CliSessionState`, but then we call `SparkSQLEnv.init()`, which will start another `SessionState`. This would lead to exception because `processCmd` need to get the `CliSessionState` instance by calling `SessionState.get()`, but the return value would be a instance of `SessionState`. See the exception below. spark-sql> !echo "test"; Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.hive.ql.session.SessionState cannot be cast to org.apache.hadoop.hive.cli.CliSessionState at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:112) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:301) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:242) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:691) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Author: Daoyuan Wang <[email protected]> Closes #9589 from adrian-wang/clicommand. (cherry picked from commit 5d80fac) Signed-off-by: Michael Armbrust <[email protected]> Conflicts: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala
|
Merged to master and 1.6 |
|
Test build #51702 has finished for PR 9589 at commit
|
The `hive` subproject currently depends on `hive-cli` in order to perform a check to see whether a `SessionState` is an instance of `org.apache.hadoop.hive.cli.CliSessionState` (see #9589). The introduction of this `hive-cli` dependency has caused problems for users whose Hive metastore JAR classpaths don't include the `hive-cli` classes (such as in #11495). This patch removes this dependency on `hive-cli` and replaces the `isInstanceOf` check by reflection. I added a Maven Enforcer rule to ban `hive-cli` from the `hive` subproject in order to make sure that this dependency is not accidentally reintroduced. /cc rxin yhuai adrian-wang preecet Author: Josh Rosen <[email protected]> Closes #12551 from JoshRosen/remove-hive-cli-dep-from-hive-subproject.
In SparkSQLCLI, we have created a
CliSessionState, but then we callSparkSQLEnv.init(), which will start anotherSessionState. This would lead to exception becauseprocessCmdneed to get theCliSessionStateinstance by callingSessionState.get(), but the return value would be a instance ofSessionState. See the exception below.spark-sql> !echo "test";
Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.hive.ql.session.SessionState cannot be cast to org.apache.hadoop.hive.cli.CliSessionState
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:112)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:301)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:242)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:691)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)