Commit b715aa0
[SPARK-2937] Separate out sampleByKeyExact as its own API in PairRDDFunctions
This enables consistency with the Python API and allows the `sampleByKeyExact` API to be labeled `Experimental`.
Author: Doris Xin <[email protected]>
Author: Xiangrui Meng <[email protected]>
Closes apache#1866 from dorx/stratified and squashes the following commits:
0ad97b2 [Doris Xin] reviewer comments.
2948aae [Doris Xin] remove unrelated changes
e990325 [Doris Xin] Merge branch 'master' into stratified
555a3f9 [Doris Xin] separate out sampleByKeyExact as its own API
616e55c [Doris Xin] merge master
245439e [Doris Xin] moved minSamplingRate to getUpperBound
eaf5771 [Doris Xin] bug fixes.
17a381b [Doris Xin] fixed a merge issue and a failed unit
ea7d27f [Doris Xin] merge master
b223529 [Xiangrui Meng] use approx bounds for poisson; fix poisson mean for waitlisting; add unit tests for Java
b3013a4 [Xiangrui Meng] move math3 back to test scope
eecee5f [Doris Xin] Merge branch 'master' into stratified
f4c21f3 [Doris Xin] Reviewer comments
a10e68d [Doris Xin] style fix
a2bf756 [Doris Xin] Merge branch 'master' into stratified
680b677 [Doris Xin] use mapPartitionWithIndex instead
9884a9f [Doris Xin] style fix
bbfb8c9 [Doris Xin] Merge branch 'master' into stratified
ee9d260 [Doris Xin] addressed reviewer comments
6b5b10b [Doris Xin] Merge branch 'master' into stratified
254e03c [Doris Xin] minor fixes and Java API.
4ad516b [Doris Xin] remove unused imports from PairRDDFunctions
bd9dc6e [Doris Xin] unit bug and style violation fixed
1fe1cff [Doris Xin] Changed fractionByKey to a map to enable arg check
944a10c [Doris Xin] [SPARK-2145] Add lower bound on sampling rate
0214a76 [Doris Xin] cleanUp
90d94c0 [Doris Xin] merge master
9e74ab5 [Doris Xin] Separated out most of the logic in sampleByKey
7327611 [Doris Xin] merge master
50581fc [Doris Xin] added a TODO for logging in python
46f6c8c [Doris Xin] fixed the NPE caused by closures being cleaned before being passed into the aggregate function
7e1a481 [Doris Xin] changed the permission on SamplingUtil
1d413ce [Doris Xin] fixed checkstyle issues
9ee94ee [Doris Xin] [SPARK-2082] stratified sampling in PairRDDFunctions that guarantees exact sample size
e3fd6a6 [Doris Xin] Merge branch 'master' into takeSample
7cab53a [Doris Xin] fixed import bug in rdd.py
ffea61a [Doris Xin] SPARK-1939: Refactor takeSample method in RDD
1441977 [Doris Xin] SPARK-1939 Refactor takeSample method in RDD to use ScaSRS
1 parent 28dcbb5 commit b715aa0
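The commit log above describes `sampleByKeyExact` as stratified sampling that guarantees an exact sample size, in contrast to the approximate per-key sampling of `sampleByKey`. The plain-Python sketch below illustrates that difference only; it is not Spark's implementation. The function names `sample_by_key` and `sample_by_key_exact`, the in-memory list of pairs, and the per-key target of `ceil(fraction * count)` are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def sample_by_key(pairs, fractions, seed):
    """Approximate stratified sample (hypothetical helper).

    Each pair is kept independently with probability fractions[key],
    so the number of pairs kept per key varies around the target.
    """
    rng = random.Random(seed)
    return [(k, v) for k, v in pairs if rng.random() < fractions[k]]


def sample_by_key_exact(pairs, fractions, seed):
    """Exact stratified sample without replacement (hypothetical helper).

    Assumes the per-key target is ceil(fraction * count) and that all
    pairs fit in memory, which a distributed implementation cannot assume.
    """
    rng = random.Random(seed)
    by_key = defaultdict(list)
    for k, v in pairs:
        by_key[k].append(v)
    sample = []
    for k, values in by_key.items():
        size = math.ceil(fractions[k] * len(values))
        sample.extend((k, v) for v in rng.sample(values, size))
    return sample


# 100 pairs under key "a" and 40 pairs under key "b".
pairs = [("a", i) for i in range(100)] + [("b", i) for i in range(40)]
fractions = {"a": 0.25, "b": 0.5}

approx = sample_by_key(pairs, fractions, seed=42)
exact = sample_by_key_exact(pairs, fractions, seed=42)
```

With these inputs the exact variant always returns 25 pairs for key "a" and 20 for key "b", while the approximate variant returns per-key counts that depend on the seed. The exact guarantee costs an extra grouping pass in this sketch, which hints at why it makes sense as a separate API.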
File tree
4 files changed, +216 -128 lines changed
- core/src
  - main/scala/org/apache/spark
    - api/java
    - rdd
  - test
    - java/org/apache/spark
    - scala/org/apache/spark/rdd
Lines changed: 31 additions & 37 deletions
[Diff body not captured: the changed hunk spans original lines 133-200 and new lines 133-194.]
Lines changed: 37 additions & 14 deletions
[Diff body not captured: the changed hunk spans original lines 197-229 and new lines 197-252.]
[Diff body not captured: the changed hunk spans original lines 1239-1250 and new lines 1239-1266.]