Skip to content

Conversation

@zsxwing
Copy link
Member

@zsxwing zsxwing commented Jun 17, 2015

KafkaStreamSuite, DirectKafkaStreamSuite, JavaKafkaStreamSuite and JavaDirectKafkaStreamSuite use non-thread-safe collections to collect data in one thread and check it in another thread. It may fail the tests.

This PR changes them to thread-safe collections.

Note: I cannot reproduce the test failures in my environment. But at least, this PR should make the tests more reliable.

@zsxwing
Copy link
Member Author

zsxwing commented Jun 17, 2015

cc @tdas

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this be run concurrently though?

@srowen
Copy link
Member

srowen commented Jun 17, 2015

In all cases foreachRDD executes serially on the driver; what's the thread safety issue?

@zsxwing
Copy link
Member Author

zsxwing commented Jun 17, 2015

In all cases foreachRDD executes serially on the driver; what's the thread safety issue?

The codes in foreachRDD run in JobScheduler.jobExecutor. But the checking codes run in the main thread. And there is no memory barrier to guarantee the memory visibility.

@srowen
Copy link
Member

srowen commented Jun 17, 2015

Yes I understand that argument, though have you observed a failure as a result?
There are certainly some memory barriers in between the writes and reads without this.
If you go this route, there are 1000 other places this needs to change in Spark. I am not sure it helps to do it piecemeal, and not sure it fixes something?

@zsxwing
Copy link
Member Author

zsxwing commented Jun 17, 2015

There are certainly some memory barriers in between the writes and reads without this.

For these tests, there is no memory barrier because the checking codes are called after ssc.start() immediately.

And one potential issue is that, e.g., writing and reading a java.util.HashMap at the same time may cause an infinite loop: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6423457 http://mailinator.blogspot.jp/2009/06/beautiful-race-condition.html

If a HashMap is used in a concurrent setting with insufficient synchronization,
it is possible for the data structure to get corrupted in such a way that
infinite loops appear in the data structure and thus get() could loop forever.

Of cause, I'm not sure if scala.collection.mutable.HashMap has a similar issue.

Sorry. There are no enough entries in these tests to trigger this issue. The issue happens in HashMap.resize.

@SparkQA
Copy link

SparkQA commented Jun 17, 2015

Test build #35028 has finished for PR 6852 at commit d464211.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member Author

zsxwing commented Jun 17, 2015

retest this please

@srowen
Copy link
Member

srowen commented Jun 17, 2015

Are you worried about writes being visible or a correctness issue? Yes I see the potential correctness issue -- it's not foreachRDDs happening at the same time but the polling read afterwards.

I think you'd find there are definitely memory barriers in, for example, merely submitting a task. So I don't think there's a possibility that the writes never turn up in the reading thread. However that point is moot since I think the change is needed for the reason above anyway.

@zsxwing
Copy link
Member Author

zsxwing commented Jun 17, 2015

Writes being visible is what I'm concerned.

@srowen
Copy link
Member

srowen commented Jun 17, 2015

OK, that's fine, but the problems you mentioned above aren't a symptom of that.

@SparkQA
Copy link

SparkQA commented Jun 17, 2015

Test build #35032 timed out for PR 6852 at commit d464211 after a configured wait of 175m.

@srowen
Copy link
Member

srowen commented Jun 17, 2015

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented Jun 17, 2015

Test build #35050 has finished for PR 6852 at commit d464211.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Copy link
Contributor

tdas commented Jun 17, 2015

@srowen We have been seeing some flakiness in this test in our daily master builds in Jenkins. So these fixes LGTM.

@tdas
Copy link
Contributor

tdas commented Jun 17, 2015

Merging this in Spark 1.4 and master

asfgit pushed a commit that referenced this pull request Jun 17, 2015
…the tests more reliable

KafkaStreamSuite, DirectKafkaStreamSuite, JavaKafkaStreamSuite and JavaDirectKafkaStreamSuite use non-thread-safe collections to collect data in one thread and check it in another thread. It may fail the tests.

This PR changes them to thread-safe collections.

Note: I cannot reproduce the test failures in my environment. But at least, this PR should make the tests more reliable.

Author: zsxwing <[email protected]>

Closes #6852 from zsxwing/fix-KafkaStreamSuite and squashes the following commits:

d464211 [zsxwing] Use thread-safe collections to make the tests more reliable

(cherry picked from commit a06d9c8)
Signed-off-by: Tathagata Das <[email protected]>
@asfgit asfgit closed this in a06d9c8 Jun 17, 2015
@zsxwing zsxwing deleted the fix-KafkaStreamSuite branch June 18, 2015 02:27
@srowen
Copy link
Member

srowen commented Jun 18, 2015

@tdas +1 yep #6852 (comment) is the good catch here; was just questioning the other motivation

@zsxwing
Copy link
Member Author

zsxwing commented Jun 18, 2015

@srowen Sorry. The issue I mentioned in #6852 (comment) won't happen. I put strikethrough there.

nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
…the tests more reliable

KafkaStreamSuite, DirectKafkaStreamSuite, JavaKafkaStreamSuite and JavaDirectKafkaStreamSuite use non-thread-safe collections to collect data in one thread and check it in another thread. It may fail the tests.

This PR changes them to thread-safe collections.

Note: I cannot reproduce the test failures in my environment. But at least, this PR should make the tests more reliable.

Author: zsxwing <[email protected]>

Closes apache#6852 from zsxwing/fix-KafkaStreamSuite and squashes the following commits:

d464211 [zsxwing] Use thread-safe collections to make the tests more reliable
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants