@w3iBStime

@w3iBStime w3iBStime commented Apr 5, 2017

The random number generated by XORShiftRandom.nextDouble() is a value between zero and one, including zero but not including one, i.e., 0 <= x < 1. I've denoted this by changing the closing square bracket to a closing parenthesis.

As an illustration, consider uniformly assigning each item in a list to one of three classes, 'A', 'B' and 'C'. For each item, compute randomDouble * 3.0: a result in [0, 1) goes to A, a result in [1, 2) goes to B, and a result in [2, 3) goes to C, so all three classes have the same probability of receiving the item. If the raw random number could be exactly 1.0, the scaled value could be exactly 3.0, which falls outside all three ranges. With simple logic that folds that value into C (rather than more extensive/expensive logic to break ties), class C would be slightly more likely to receive the item than A or B.
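
The three-class argument above can be sketched in a few lines. This is only an illustration, assuming Python's random.random() (documented as returning a float in [0.0, 1.0)) as a stand-in for XORShiftRandom.nextDouble(); the names assign_class and tally are hypothetical and not part of Spark.

```python
import math
import random
from collections import Counter

CLASSES = ["A", "B", "C"]  # hypothetical class labels from the example


def assign_class(x: float) -> str:
    """Map a draw x to a class using floor(x * 3)."""
    index = math.floor(x * len(CLASSES))
    # For x in [0, 1) the index is already 0, 1 or 2. The clamp only matters
    # for x == 1.0, which a closed range [0, 1] would allow; that extra
    # value is folded into C.
    return CLASSES[min(index, len(CLASSES) - 1)]


def tally(n: int, seed: int = 0) -> Counter:
    """Assign n draws from random.random() and count items per class."""
    rng = random.Random(seed)
    return Counter(assign_class(rng.random()) for _ in range(n))


if __name__ == "__main__":
    print(tally(30_000))      # counts should be roughly equal
    print(assign_class(1.0))  # C: reachable only if 1.0 could be drawn
```

With a half-open range the clamp never fires, so each class covers an interval of the same width. With a closed range, C would cover its interval plus the single extra value 1.0.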

Also, see the existing comment in SamplingUtils which uses the same function:

// below is a random long chosen uniformly from [0,l)
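
The half-open notation in that comment follows from the same kind of scaling. A minimal sketch, assuming Python's random.random() in place of the Scala generator; scale_to_index and random_index are hypothetical helpers, and this is not a copy of the SamplingUtils code.

```python
import random


def scale_to_index(x: float, l: int) -> int:
    """Scale a draw x by l and truncate toward zero."""
    return int(x * l)


def random_index(rng: random.Random, l: int) -> int:
    """Integer in [0, l), assuming rng.random() returns a float in [0, 1)."""
    return scale_to_index(rng.random(), l)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [random_index(rng, 7) for _ in range(10_000)]
    print(min(draws) >= 0 and max(draws) < 7)  # every draw is a valid index
    print(scale_to_index(1.0, 7))              # 7: out of range, needs x == 1.0
```

In ordinary Python code, random.randrange(l) is the usual way to get such an integer; the explicit scaling is shown only to mirror the wording of the comment.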

https://en.wikipedia.org/wiki/Interval_(mathematics)

What changes were proposed in this pull request?

A small notation correction in some documentation comments.

How was this patch tested?

N/A; documentation only.

Also, see the existing comment in SamplingUtils which uses the same function: https://github.com/apache/spark/blob/79f5f281bb69cb2de9f64006180abd753e8ae427/core/src/main/scala/org/apache/spark/util/random/SamplingUtils.scala#L62

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Apr 5, 2017

That's fine, can you look for other instances?

@w3iBStime
Author

I don't have a local clone of the code; I just used the GitHub web UI for a small change. Unfortunately, it doesn't look like GitHub can search for "1.0]".

@srowen
Member

srowen commented Apr 6, 2017

@w3iBStime the Python and R docs need to change too if this changes, as does RandomDataGenerator. There are several instances of this. If you can make the change with a proper git clone, go for it.

@srowen
Member

srowen commented Apr 25, 2017

Ping @w3iBStime can you update this or close?
