
Conversation


@zuotingbing commented Dec 15, 2017

What changes were proposed in this pull request?

  1. Start HiveThriftServer2.
  2. Connect to the Thrift server through Beeline.
  3. Close the Beeline connection.
  4. Repeat steps 2 and 3 several times, which causes the memory leak.

We found that many directories under `hive.exec.local.scratchdir` and `hive.exec.scratchdir` are never dropped. The scratch directory is added to `deleteOnExit` when it is created, so the FileSystem `deleteOnExit` set keeps growing until the JVM terminates.
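
For illustration, here is a minimal, standalone Scala sketch (not code from this patch) of the behavior described above: paths registered with the Hadoop `FileSystem.deleteOnExit` API stay in the cached `FileSystem` instance until `cancelDeleteOnExit` is called or the JVM exits, so per-session scratch directories that are registered but never cancelled accumulate for the life of the server. The scratch paths below are made up.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object DeleteOnExitGrowth {
  def main(args: Array[String]): Unit = {
    // The cached FileSystem instance lives for the whole JVM.
    val fs = FileSystem.get(new Configuration())

    // Simulate several beeline sessions, each registering a scratch dir.
    // deleteOnExit only records the path; nothing is removed until JVM exit.
    (1 to 3).foreach { i =>
      val scratch = new Path(s"/tmp/hive-scratch-$i") // illustrative path
      fs.mkdirs(scratch)
      fs.deleteOnExit(scratch)
    }

    // A session that cleans up after itself has to cancel the registration
    // and delete the directory explicitly; otherwise the entry lingers.
    val done = new Path("/tmp/hive-scratch-1")
    fs.cancelDeleteOnExit(done)
    fs.delete(done, true)
  }
}
```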

In addition, using `jmap -histo:live [PID]` to print the live objects in the HiveThriftServer2 process, we can see that the counts of `org.apache.spark.sql.hive.client.HiveClientImpl` and `org.apache.hadoop.hive.ql.session.SessionState` keep increasing even after all the Beeline connections are closed, which causes the memory leak.

How was this patch tested?

manual tests

@AmplabJenkins

Can one of the admins verify this patch?

@mgaido91
Contributor

I am not sure about this change, actually. In this way, all the users would use the same metadataHive, which might also cause concurrency issues. Did you experience an OOM error due to the memory leak?

@gatorsmile
Member

cc @liufengdb

@liufengdb

I think this method can take care of resource cleanup automatically: https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/SessionManager.java#L151

Could you take a heap dump and find out why the sessions are not cleaned up?

@zuotingbing
Author

zuotingbing commented Dec 18, 2017

While debugging, I found that every time I connect to the Thrift server through Beeline, `SessionState.start(state)` is called twice: once in `HiveSessionImpl.open`, and once in `HiveClientImpl` when it runs `use default`.
`SessionManager.java#L151` and `HiveSessionImpl.close` only clean up the first one.
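
For illustration only (this is not the merged patch), a minimal Scala sketch of the kind of cleanup the second `SessionState` needs when the connection goes away; `closeClientSessionState` is a hypothetical helper name:

```scala
import org.apache.hadoop.hive.ql.session.SessionState

// Hypothetical helper: also dispose of the SessionState started for the
// per-connection HiveClientImpl, so its scratch dirs (and the FileSystem
// deleteOnExit entries they registered) are released when the session ends.
def closeClientSessionState(state: SessionState): Unit = {
  try {
    state.close()                // drops the session's scratch directories
  } finally {
    SessionState.detachSession() // clear the thread-local reference as well
  }
}
```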

By the way, the session timeout checker does not work in SessionManager; I created another PR, #20025, to follow up on that. Thanks @liufengdb

@zuotingbing
Author

zuotingbing commented Dec 18, 2017

The cache size of the FileSystem `deleteOnExit` set keeps increasing even though all the Beeline connections have been closed:

[screenshot: FileSystem deleteOnExit entries growing across Beeline connections]
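
One rough way to watch that set without a full heap dump is a reflection probe like the Scala sketch below; note that the private field name `deleteOnExit` is an assumption about the Hadoop release we tested and may differ in other versions.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

object DeleteOnExitProbe {
  // Diagnostic only: read the private deleteOnExit set of a FileSystem
  // instance via reflection (field name assumed, see note above).
  def deleteOnExitCount(fs: FileSystem): Int = {
    val field = classOf[FileSystem].getDeclaredField("deleteOnExit")
    field.setAccessible(true)
    field.get(fs).asInstanceOf[java.util.Set[_]].size()
  }

  def main(args: Array[String]): Unit = {
    // Log this after each beeline connect/disconnect cycle; if sessions
    // clean up after themselves, the count should stay flat instead of growing.
    val fs = FileSystem.get(new Configuration())
    println(s"deleteOnExit entries: ${deleteOnExitCount(fs)}")
  }
}
```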

@zuotingbing changed the title from "[SPARK-22793][SQL]Memory leak in Spark Thrift Server" to "[SPARK-22793][SQL][BACKPORT-2.0]Memory leak in Spark Thrift Server" Dec 20, 2017
@zuotingbing
Author

zuotingbing commented Dec 21, 2017

Could anybody please check this PR or suggest how to fix it? It seems to be a critical bug. Thanks very much! @cloud-fan @rxin @gatorsmile @vanzin @jerryshao @zsxwing

@cloud-fan
Contributor

This is not a backport, as this patch has not been merged to master yet. Let's move the discussion to the primary PR against the master branch.

@zuotingbing changed the title from "[SPARK-22793][SQL][BACKPORT-2.0]Memory leak in Spark Thrift Server" to "[SPARK-22793][SQL]Memory leak in Spark Thrift Server" Dec 21, 2017
@zuotingbing
Author

OK, please move to #20029. Thanks all.

asfgit pushed a commit that referenced this pull request Jan 6, 2018
# What changes were proposed in this pull request?
1. Start HiveThriftServer2.
2. Connect to the Thrift server through Beeline.
3. Close the Beeline connection.
4. Repeat steps 2 and 3 many times.

We found that many directories under `hive.exec.local.scratchdir` and `hive.exec.scratchdir` are never dropped. The scratch directory is added to `deleteOnExit` when it is created, so the FileSystem `deleteOnExit` set keeps growing until the JVM terminates.

In addition, using `jmap -histo:live [PID]` to print the live objects in the HiveThriftServer2 process, we can see that the counts of `org.apache.spark.sql.hive.client.HiveClientImpl` and `org.apache.hadoop.hive.ql.session.SessionState` keep increasing even after all the Beeline connections are closed, which may cause the memory leak.

# How was this patch tested?
Manual tests.

This PR is a follow-up to #19989.

Author: zuotingbing <[email protected]>

Closes #20029 from zuotingbing/SPARK-22793.

(cherry picked from commit be9a804)
Signed-off-by: gatorsmile <[email protected]>
ghost pushed a commit to dbtsai/spark that referenced this pull request Jan 6, 2018
@zuotingbing changed the title from "[SPARK-22793][SQL]Memory leak in Spark Thrift Server" to "[SPARK-22793][SQL][BACKPORT-2.0]Memory leak in Spark Thrift Server" Jan 8, 2018
@zuotingbing
Author

zuotingbing commented Jan 8, 2018

@gatorsmile @liufengdb Could you please also check this PR? It is a [BACKPORT-2.0] of #20029 from master/2.3.

@gatorsmile
Member

@zuotingbing No new 2.0 release is planned, so we will not backport this to 2.0.

@zuotingbing
Author

ok, got it. Thanks!

@zuotingbing closed this Jan 8, 2018
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Dec 6, 2018
