[SPARK-2087] [SQL] Multiple thriftserver sessions with different HiveContext instances #4382
|
Test build #26816 has started for PR 4382 at commit
|
|
@liancheng Do you have any idea how to collect the Thrift server logs in the unit test? It reports a timeout exception, and I believe it is either a port error or the server process exiting due to some error (such as a failure to create the Hive metastore client). |
|
@chenghao-intel |
|
@guowei2 Probably not. HiveContext has its own internal metastore (temp metastore), |
|
Test build #26816 has finished for PR 4382 at commit
|
|
Test PASSed. |
|
Seems |
|
FWIW I'd like to add my two cents. The main piece of functionality the installation at my company would benefit from is independent user sessions. I'm not familiar enough with the source to say exactly what that means in terms of a source patch, but one of the key use cases is the ability to set the session default database ("use ") and SQLConf settings independent of other beeline connections. Right now, setting the database sets it across all connections and that is a major impediment to wider use of a shared thriftserver. Cheers! |
|
@mallman this PR aims to fix exactly the bug you mentioned, and it passed the tests on my local machine. However, I am still investigating some of the unit test failures; hopefully I can remove the "WIP" from the title soon. |
|
@liancheng It seems HiveThriftServer2Suite didn't run. Is it disabled by default? |
|
@chenghao-intel This is just fixed by #4486. |
|
Should I retest this? |
|
test this please. |
|
HiveThriftServer2Suite timeout is also fixed, please refer to #4484. |
|
Yeah, you should retest. |
|
Test build #27143 has started for PR 4382 at commit
|
|
Test build #27143 has finished for PR 4382 at commit
|
|
Test PASSed. |
|
@liancheng Seems the |
|
retest this please |
|
Test build #27158 has started for PR 4382 at commit
|
|
Test build #27158 has finished for PR 4382 at commit
|
|
Test PASSed. |
|
@chenghao-intel The biggest issue I see in this PR is that users can no longer share cached tables, which breaks many existing use scenarios. I'm afraid it may require major efforts to support both multi-session and cache sharing. I can think of two alternatives:
|
|
Yeah, I can understand that people want to share cached tables across sessions. Is there any potential requirement for keeping a 'temp' table visible only within its own HiveContext (that is, not shared with other sessions)? |
|
I agree with liancheng; users want to share cached tables in many cases. |
Branch force-pushed from 403d6ec to 73922ae.
|
Test build #27326 has started for PR 4382 at commit
|
|
I agree with @liancheng too; both the code and the description are updated. |
|
Test build #27326 has finished for PR 4382 at commit
|
|
Test FAILed. |
|
The failure seems to be caused by unrelated code. |
|
retest this please |
|
Test build #27328 has started for PR 4382 at commit
|
|
Test build #27328 has finished for PR 4382 at commit
|
|
Test FAILed. |
Branch force-pushed from 73922ae to 9d3b296.
|
Test build #27501 has started for PR 4382 at commit
|
|
Test build #27501 has finished for PR 4382 at commit
|
|
Test FAILed. |
Branch force-pushed from 9d3b296 to 197e806.
|
Test build #27830 has started for PR 4382 at commit
|
|
Test build #27830 has finished for PR 4382 at commit
|
|
Test PASSed. |
|
/cc @liancheng can you review this for me? |
|
/cc @liancheng @marmbrus can you give some high-level comments? Then I can start rebasing (again). |
|
Hey @chenghao-intel, thanks for working on this. AFAIK this is a pain point for many Spark SQL users who would like to put HiveThriftServer2 into production. I also had a discussion with @marmbrus about this recently. As we've discussed offline, instead of changing
The benefits of this approach are:
|
|
@liancheng thank you very much for such a detailed comment! Actually, I have been struggling to choose between the two approaches: The main reason I chose the latter approach is that people can create an arbitrary number of Whatever approach we take, it's intuitive that the What do you think? |
|
@chenghao-intel I'm a little confused about why people have to create multiple |
|
@chenghao-intel I'm posting the summary of our offline discussion here for future reference:
|
|
Test build #28377 has started for PR 4382 at commit
|
|
Test build #28377 has finished for PR 4382 at commit
|
|
Test PASSed. |
|
Closing it since #4885 has been merged. |
In thriftserver mode, there is only a single HiveContext instance for the process, which leads to one sqlconf object being shared across multiple user sessions.
In order to isolate the sqlconf for each user session, we create a new HiveContext instance for each user session. However, we also want to keep the existing behavior in which multiple users share the same catalog and cache, hence I pull out the catalog, cachemanager, and functionRegistry as globally unique instances.
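
The idea in this description can be illustrated with a small, language-neutral toy model. This is only a sketch of the design, not the code from this PR or from Spark: `SharedState` and `SessionContext` are hypothetical names, plain dictionaries stand in for the real catalog, cache manager, function registry and SQLConf, and Python is used purely for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class SharedState:
    """Process-wide state that every session sees (hypothetical name)."""
    catalog: dict = field(default_factory=dict)            # table name -> definition
    cache_manager: dict = field(default_factory=dict)      # table name -> cached rows
    function_registry: dict = field(default_factory=dict)  # function name -> callable


class SessionContext:
    """Stand-in for a per-session HiveContext: private conf, shared everything else."""

    def __init__(self, shared: SharedState):
        self.shared = shared  # the same object is handed to every session
        self.conf = {}        # stand-in for a per-session SQLConf

    def set_conf(self, key, value):
        # Only this session's settings are changed.
        self.conf[key] = value

    def get_conf(self, key, default=None):
        return self.conf.get(key, default)

    def cache_table(self, name, rows):
        # Cached data goes into the shared cache manager, visible to all sessions.
        self.shared.cache_manager[name] = rows

    def cached(self, name):
        return self.shared.cache_manager.get(name)


# Two user sessions built on top of one shared state object.
shared = SharedState()
session_a = SessionContext(shared)
session_b = SessionContext(shared)

session_a.set_conf("spark.sql.shuffle.partitions", "10")
session_a.cache_table("src", [(1, "a"), (2, "b")])
```

In this model a setting made in `session_a` is not visible through `session_b.get_conf(...)`, while the table cached by `session_a` can still be read through `session_b.cached("src")`, which is the combination of isolation and sharing the description asks for.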