Commit 84454d7
[SPARK-14932][SQL] Allow DataFrame.replace() to replace values with None
## What changes were proposed in this pull request?
Currently, `df.na.replace("*", Map[String, String]("NULL" -> null))` throws an exception.
This PR allows null/None to be passed as a value in the replacement map of `DataFrame.replace()`.
Note that the keys and values of the replacement map must still be of the same type, although the values may mix null/None with that type.
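The typing rule above can be modelled in plain Python. The sketch below is a hypothetical checker, not PySpark's implementation: `check_replacement` and `_group_of` are illustrative names, and grouping `int` and `float` together as one "numeric" type is an assumption of the sketch. Keys must all fall in one group, and values must be either None or in that same group.

```python
from typing import Any, Dict


def _group_of(x: Any) -> str:
    """Map a key or value to a hypothetical type group."""
    # Check bool first: in Python, bool is a subclass of int.
    if isinstance(x, bool):
        return "bool"
    if isinstance(x, str):
        return "str"
    if isinstance(x, (int, float)):
        return "numeric"  # assumption: ints and floats are interchangeable
    raise TypeError(f"unsupported replacement type: {type(x).__name__}")


def check_replacement(rep: Dict[Any, Any]) -> str:
    """Return the single type group shared by a replacement map.

    Keys must all belong to one group; values must be None or belong to
    that same group. Raises ValueError if the map mixes groups.
    """
    if not rep:
        raise ValueError("replacement map is empty")
    groups = {_group_of(k) for k in rep}
    # None values are allowed alongside the common type, so skip them.
    groups |= {_group_of(v) for v in rep.values() if v is not None}
    if len(groups) != 1:
        raise ValueError("mixed-type replacement map")
    return groups.pop()
```

With this model, `{"NULL": None}` and `{60: None, 70: 80}` are accepted (mirroring the Scala examples in this PR), while `{"a": 1}` is rejected.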
For example, this PR enables the following operations:
- `df.na.replace("*", Map[String, String]("NULL" -> null))` (Scala)
- `df.na.replace("*", Map[Any, Any](60 -> null, 70 -> 80))` (Scala)
- `df.na.replace('Alice', None)` (Python)
- `df.na.replace([10, 20])` (Python; the replacement value defaults to None)
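The Python call forms above differ only in how the replacement map is spelled. A minimal sketch of how they could be normalized into one map, assuming that a scalar `value` applies to every key and that a list `value` pairs with `to_replace` element by element; `to_replacement_map` is a hypothetical helper, not a PySpark function:

```python
from typing import Dict, List, Optional, Union

Scalar = Union[bool, int, float, str]


def to_replacement_map(
    to_replace: Union[Scalar, List[Scalar], Dict[Scalar, Optional[Scalar]]],
    value: Union[None, Scalar, List[Optional[Scalar]]] = None,
) -> Dict[Scalar, Optional[Scalar]]:
    """Normalize a replace() call into a single {old: new} map.

    A dict is used as-is; a scalar or a list of keys is paired with
    `value`, which defaults to None.
    """
    if isinstance(to_replace, dict):
        return dict(to_replace)
    keys = list(to_replace) if isinstance(to_replace, (list, tuple)) else [to_replace]
    if isinstance(value, (list, tuple)):
        # Pairwise form: keys and values must line up one-to-one.
        if len(value) != len(keys):
            raise ValueError("to_replace and value must have the same length")
        return dict(zip(keys, value))
    # Scalar (or None) value: every key maps to it.
    return {k: value for k in keys}
```

Under these assumptions, `to_replacement_map('Alice', None)` gives `{'Alice': None}` and `to_replacement_map([10, 20])` gives `{10: None, 20: None}`.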
One use case: replace all empty strings with null/None because they were generated incorrectly, then drop every row that contains null/None:
- `df.na.replace("*", Map("" -> null)).na.drop()` (Scala)
- `df.replace(u'', None).dropna()` (Python)
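The replace-then-drop pipeline can be illustrated on plain Python rows (dicts) without Spark. This is a sketch under stated assumptions: `replace_values` and `drop_rows_with_nulls` are hypothetical helpers, and the drop step assumes the "drop a row if any value is null" behaviour. Spark's own version also takes column types into account; this sketch does not.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def replace_values(rows: List[Row], rep: Dict[Any, Any]) -> List[Row]:
    """Return copies of rows where any value that is a key of rep becomes rep[value]."""
    return [{col: rep.get(v, v) for col, v in row.items()} for row in rows]


def drop_rows_with_nulls(rows: List[Row]) -> List[Row]:
    """Keep only rows that contain no None values (drop if any value is null)."""
    return [row for row in rows if all(v is not None for v in row.values())]


rows = [
    {"name": "Alice", "city": ""},     # incorrectly generated empty string
    {"name": "Bob", "city": "Paris"},
    {"name": "", "city": "Oslo"},      # incorrectly generated empty string
]
cleaned = drop_rows_with_nulls(replace_values(rows, {"": None}))
print(cleaned)  # [{'name': 'Bob', 'city': 'Paris'}]
```

Mapping `""` to None first is what lets the generic null-dropping step remove the bad rows; before this change, the Scala map `Map("" -> null)` could not be passed to `replace` at all.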
## How was this patch tested?
Scala unit test.
Python doctest and unit test.
Author: bravo-zhang
Closes #18820 from bravo-zhang/spark-14932.
Parent commit: c06f3f5
4 files changed (+113 additions, -37 deletions) under:
- python/pyspark/sql
- sql/core/src/main/scala/org/apache/spark/sql
- sql/core/src/test/scala/org/apache/spark/sql
Lines changed: 32 additions & 25 deletions
Lines changed: 43 additions & 0 deletions