Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

ReplSuite.newProductSeqEncoder with REPL defined class was flaky and throws OOM exception frequently. By analyzing the heap dump, we found the reason is that, in each test case of ReplSuite, we create a REPL instance, which creates a classloader and loads a lot of classes related to SparkContext. More details please see #17833 (comment).

In this PR, we create a new test suite, SingletonReplSuite, which shares one REPL instances among all the test cases. Then we move most of the tests from ReplSuite to SingletonReplSuite, to avoid creating a lot of REPL instances and reduce memory footprint.

How was this patch tested?

test only change

@cloud-fan
Copy link
Contributor Author

cc @zsxwing @sameeragarwal

@SparkQA
Copy link

SparkQA commented May 3, 2017

Test build #76423 has finished for PR 17844 at commit 1565a34.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sameeragarwal
Copy link
Member

test this please

@SparkQA
Copy link

SparkQA commented May 4, 2017

Test build #76461 has finished for PR 17844 at commit 1565a34.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 5, 2017

Test build #76471 has finished for PR 17844 at commit 9248a5e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@zsxwing zsxwing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. Just some nits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's better to use some more special string. E.g.,

val timestamp = System.currentTimeMillis()
in.write((input + s"\nval _result_$timestamp = 1\n").getBytes)
in.flush()
val stopMessage = s"_result_$timestamp: Int = 1"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment mentioning that the codes in input should not throw an exception?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use eventually

    eventually(timeout(50.seconds), interval(500.millis)) {
      assert(cond(), "current output: " + out.toString)
    }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you don't move this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test only works in local mode...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think it's safe to just use local-cluster[2,1,1024].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please call Thread.setDaemon(true) in case that something is wrong and it prevents the process from exiting.

@SparkQA
Copy link

SparkQA commented May 8, 2017

Test build #76569 has finished for PR 17844 at commit fd5b43b.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented May 8, 2017

Test build #76577 has finished for PR 17844 at commit fd5b43b.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented May 8, 2017

Test build #76582 has finished for PR 17844 at commit fd5b43b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented May 9, 2017

Test build #76600 has finished for PR 17844 at commit fd5b43b.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 9, 2017

Test build #76618 has started for PR 17844 at commit 7dde745.

@gatorsmile
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented May 9, 2017

Test build #76649 has finished for PR 17844 at commit 7dde745.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 9, 2017

Test build #76656 has finished for PR 17844 at commit 7dde745.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented May 9, 2017

Test build #76675 has finished for PR 17844 at commit 7dde745.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request May 9, 2017
`ReplSuite.newProductSeqEncoder with REPL defined class` was flaky and throws OOM exception frequently. By analyzing the heap dump, we found the reason is that, in each test case of `ReplSuite`, we create a REPL instance, which creates a classloader and loads a lot of classes related to `SparkContext`. More details please see #17833 (comment).

In this PR, we create a new test suite, `SingletonReplSuite`, which shares one REPL instances among all the test cases. Then we move most of the tests from `ReplSuite` to `SingletonReplSuite`, to avoid creating a lot of REPL instances and reduce memory footprint.

test only change

Author: Wenchen Fan <[email protected]>

Closes #17844 from cloud-fan/flaky-test.

(cherry picked from commit f561a76)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan
Copy link
Contributor Author

thanks for the review, merging to maser/2.2!

@asfgit asfgit closed this in f561a76 May 9, 2017
liyichao pushed a commit to liyichao/spark that referenced this pull request May 24, 2017
## What changes were proposed in this pull request?

`ReplSuite.newProductSeqEncoder with REPL defined class` was flaky and throws OOM exception frequently. By analyzing the heap dump, we found the reason is that, in each test case of `ReplSuite`, we create a REPL instance, which creates a classloader and loads a lot of classes related to `SparkContext`. More details please see apache#17833 (comment).

In this PR, we create a new test suite, `SingletonReplSuite`, which shares one REPL instances among all the test cases. Then we move most of the tests from `ReplSuite` to `SingletonReplSuite`, to avoid creating a lot of REPL instances and reduce memory footprint.

## How was this patch tested?

test only change

Author: Wenchen Fan <[email protected]>

Closes apache#17844 from cloud-fan/flaky-test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants