-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-2087] [SQL] Multiple thriftserver sessions with single HiveContext instance #4885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-2087] [SQL] Multiple thriftserver sessions with single HiveContext instance #4885
Conversation
|
Test build #28253 has started for PR 4885 at commit
|
|
cc @liancheng @tianyi @guowei2 |
|
Test build #28253 has finished for PR 4885 at commit
|
|
Test FAILed. |
|
Test build #28256 has started for PR 4885 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's no need to overwrite SQLSession and createSession here, for SessionState self is ThreadLocal. we just need to set SessionState when openSession in SparkSQLSessionManager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@guowei2 I think either way is OK for now. Putting all session-specific stuff into a central place (SQLSession) seems cleaner to me. Making SQLSession a thread-local does look a little ugly, however, right now it's not used anywhere other than the Thrift server. When we do decide to move Hive into a separate data source and make our own data source neutral Spark SQL server, we can handle the session problem in a cleaner way (e.g., using an actor for each session and keep all session-specific stuff in the actor instance).
|
Test build #28256 has finished for PR 4885 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
�Indentations are off in this test case.
|
Hey @chenghao-intel, terribly sorry for the delay. In general this LGTM. Left some comments, mostly on styling issues. Thanks! |
0ca4bbd to
815b27a
Compare
|
Test build #28636 has started for PR 4885 at commit
|
|
Thank you @liancheng @guowei2 for the review, I've updated the code as suggested. Still, I am thinking how to handle the temporal |
|
Test build #28636 has finished for PR 4885 at commit
|
|
Test FAILed. |
|
Test build #28638 has started for PR 4885 at commit
|
|
Test build #28638 has finished for PR 4885 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to add comment to indicate that the expected value should be "<undefined>". I was quite confused at first as 200 should be the default value of "spark.sql.shuffle.partitions" :)
|
Hey @chenghao-intel, left another 3 minor comments. But I'm gonna merge this. Please fix them in another PR. Also verified locally that both session isolation and cache sharing work as expected. Thanks for the efforts!! |
|
Thank you very much @liancheng, I will create another PR for the requirements that we discussed above, and also the minor issues. |
Still, we keep only a single HiveContext within ThriftServer, and we also create a object called
SQLSessionfor isolating the different user states.Developers can obtain/release a new user session via
openSessionandcloseSession, andSQLContextandHiveContextwill also provide a default session if noopenSessioncalled, for backward-compatibility.