Commit 2ca9bb0
[SPARK-23173][SQL] Avoid creating corrupt parquet files when loading data from JSON
## What changes were proposed in this pull request?
The from_json() function accepts an additional parameter, where the user might specify the schema. The issue is that the specified schema might not be compatible with data. In particular, the JSON data might be missing data for fields declared as non-nullable in the schema. The from_json() function does not verify the data against such errors. When data with missing fields is sent to the parquet encoder, there is no verification either. The end results is a corrupt parquet file.
To avoid corruptions, make sure that all fields in the user-specified schema are set to be nullable.
Since this changes the behavior of a public function, we need to include it in release notes.
The behavior can be reverted by setting `spark.sql.fromJsonForceNullableSchema=false`
## How was this patch tested?
Added two new tests.
Author: Michał Świtakowski <[email protected]>
Closes #20694 from mswit-databricks/SPARK-23173.1 parent 2c36736 commit 2ca9bb0
File tree
4 files changed
+70
-9
lines changed- sql
- catalyst/src
- main/scala/org/apache/spark/sql
- catalyst/expressions
- internal
- test/scala/org/apache/spark/sql/catalyst/expressions
- core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet
4 files changed
+70
-9
lines changedLines changed: 14 additions & 8 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
30 | 30 | | |
31 | 31 | | |
32 | 32 | | |
| 33 | + | |
33 | 34 | | |
34 | 35 | | |
35 | 36 | | |
| |||
515 | 516 | | |
516 | 517 | | |
517 | 518 | | |
518 | | - | |
519 | 519 | | |
520 | | - | |
521 | | - | |
| 520 | + | |
| 521 | + | |
| 522 | + | |
| 523 | + | |
| 524 | + | |
| 525 | + | |
| 526 | + | |
| 527 | + | |
522 | 528 | | |
523 | 529 | | |
524 | 530 | | |
| |||
535 | 541 | | |
536 | 542 | | |
537 | 543 | | |
538 | | - | |
| 544 | + | |
539 | 545 | | |
540 | 546 | | |
541 | 547 | | |
542 | | - | |
| 548 | + | |
543 | 549 | | |
544 | 550 | | |
545 | 551 | | |
546 | | - | |
| 552 | + | |
547 | 553 | | |
548 | 554 | | |
549 | 555 | | |
550 | 556 | | |
551 | 557 | | |
552 | 558 | | |
553 | | - | |
| 559 | + | |
554 | 560 | | |
555 | 561 | | |
556 | 562 | | |
| |||
563 | 569 | | |
564 | 570 | | |
565 | 571 | | |
566 | | - | |
| 572 | + | |
567 | 573 | | |
568 | 574 | | |
569 | 575 | | |
| |||
Lines changed: 8 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
493 | 493 | | |
494 | 494 | | |
495 | 495 | | |
| 496 | + | |
| 497 | + | |
| 498 | + | |
| 499 | + | |
| 500 | + | |
| 501 | + | |
| 502 | + | |
| 503 | + | |
496 | 504 | | |
497 | 505 | | |
498 | 506 | | |
| |||
Lines changed: 29 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
| 25 | + | |
25 | 26 | | |
| 27 | + | |
26 | 28 | | |
27 | 29 | | |
28 | 30 | | |
29 | | - | |
| 31 | + | |
30 | 32 | | |
31 | 33 | | |
32 | 34 | | |
| |||
680 | 682 | | |
681 | 683 | | |
682 | 684 | | |
| 685 | + | |
| 686 | + | |
| 687 | + | |
| 688 | + | |
| 689 | + | |
| 690 | + | |
| 691 | + | |
| 692 | + | |
| 693 | + | |
| 694 | + | |
| 695 | + | |
| 696 | + | |
| 697 | + | |
| 698 | + | |
| 699 | + | |
| 700 | + | |
| 701 | + | |
| 702 | + | |
| 703 | + | |
| 704 | + | |
| 705 | + | |
| 706 | + | |
| 707 | + | |
| 708 | + | |
| 709 | + | |
| 710 | + | |
683 | 711 | | |
Lines changed: 19 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
43 | 43 | | |
44 | 44 | | |
45 | 45 | | |
| 46 | + | |
46 | 47 | | |
47 | 48 | | |
48 | 49 | | |
| |||
780 | 781 | | |
781 | 782 | | |
782 | 783 | | |
| 784 | + | |
| 785 | + | |
| 786 | + | |
| 787 | + | |
| 788 | + | |
| 789 | + | |
| 790 | + | |
| 791 | + | |
| 792 | + | |
| 793 | + | |
| 794 | + | |
| 795 | + | |
| 796 | + | |
| 797 | + | |
| 798 | + | |
| 799 | + | |
| 800 | + | |
| 801 | + | |
783 | 802 | | |
784 | 803 | | |
785 | 804 | | |
| |||
0 commit comments