Skip to content

Conversation

@mengxr
Copy link
Contributor

@mengxr mengxr commented Oct 30, 2014

The current way of seed distribution makes the random sequences from partition i and i+1 offset by 1.

In [14]: import random

In [15]: r1 = random.Random(10)

In [16]: r1.randint(0, 1)
Out[16]: 1

In [17]: r1.random()
Out[17]: 0.4288890546751146

In [18]: r1.random()
Out[18]: 0.5780913011344704

In [19]: r2 = random.Random(10)

In [20]: r2.randint(0, 1)
Out[20]: 1

In [21]: r2.randint(0, 1)
Out[21]: 0

In [22]: r2.random()
Out[22]: 0.5780913011344704

Note: The new tests are not for this bug fix.

@SparkQA
Copy link

SparkQA commented Oct 30, 2014

Test build #22513 has started for PR 3010 at commit c1bacd9.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 30, 2014

Test build #22513 has finished for PR 3010 at commit c1bacd9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22513/
Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests should be moved into tests.py, then the docstring will be much clearer.

@SparkQA
Copy link

SparkQA commented Oct 31, 2014

Test build #22637 has started for PR 3010 at commit 869ae4b.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 31, 2014

Test build #22637 has finished for PR 3010 at commit 869ae4b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22637/
Test FAILed.

@JoshRosen
Copy link
Contributor

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Oct 31, 2014

Test build #22648 has started for PR 3010 at commit 869ae4b.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 31, 2014

Test build #22648 has finished for PR 3010 at commit 869ae4b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22648/
Test PASSed.

@mengxr
Copy link
Contributor Author

mengxr commented Nov 3, 2014

@JoshRosen Does it look good to you? We should fix it for 1.2 and backport it to 1.0.x and 1.1.x.

@JoshRosen
Copy link
Contributor

This seems fine to me.

@davies
Copy link
Contributor

davies commented Nov 3, 2014

LGTM。

asfgit pushed a commit that referenced this pull request Nov 3, 2014
…d.sample

The current way of seed distribution makes the random sequences from partition i and i+1 offset by 1.

~~~
In [14]: import random

In [15]: r1 = random.Random(10)

In [16]: r1.randint(0, 1)
Out[16]: 1

In [17]: r1.random()
Out[17]: 0.4288890546751146

In [18]: r1.random()
Out[18]: 0.5780913011344704

In [19]: r2 = random.Random(10)

In [20]: r2.randint(0, 1)
Out[20]: 1

In [21]: r2.randint(0, 1)
Out[21]: 0

In [22]: r2.random()
Out[22]: 0.5780913011344704
~~~

Note: The new tests are not for this bug fix.

Author: Xiangrui Meng <[email protected]>

Closes #3010 from mengxr/SPARK-4148 and squashes the following commits:

869ae4b [Xiangrui Meng] move tests tests.py
c1bacd9 [Xiangrui Meng] fix seed distribution and add some tests for rdd.sample

(cherry picked from commit 3cca196)
Signed-off-by: Xiangrui Meng <[email protected]>
@asfgit asfgit closed this in 3cca196 Nov 3, 2014
@mengxr
Copy link
Contributor Author

mengxr commented Nov 3, 2014

Thanks! I've merged this into both master and branch-1.2. I need to resolve conflicts for branch-1.1 and branch-1.0. I will send separate PRs later.

asfgit pushed a commit that referenced this pull request Nov 5, 2014
…tests for rdd.sample

Port #3010 to branch-1.1.

Author: Xiangrui Meng <[email protected]>

Closes #3104 from mengxr/SPARK-4148-1.1 and squashes the following commits:

684c002 [Xiangrui Meng] apply SPARK-4148 to branch-1.1
asfgit pushed a commit that referenced this pull request Dec 19, 2014
…tests for rdd.sample

Port #3010 to branch-1.0.

Author: Xiangrui Meng <[email protected]>

Closes #3106 from mengxr/SPARK-4148-1.0 and squashes the following commits:

c834cee [Xiangrui Meng] apply SPARK-4148 to branch-1.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants