Commit 1d9338b
[SPARK-23786][SQL] Checking column names of csv headers
## What changes were proposed in this pull request?
Currently column names of headers in CSV files are not checked against provided schema of CSV data. It could cause errors like showed in the [SPARK-23786](https://issues.apache.org/jira/browse/SPARK-23786) and #20894 (comment). I introduced new CSV option - `enforceSchema`. If it is enabled (by default `true`), Spark forcibly applies provided or inferred schema to CSV files. In that case, CSV headers are ignored and not checked against the schema. If `enforceSchema` is set to `false`, additional checks can be performed. For example, if column in CSV header and in the schema have different ordering, the following exception is thrown:
```
java.lang.IllegalArgumentException: CSV file header does not contain the expected fields
Header: depth, temperature
Schema: temperature, depth
CSV file: marina.csv
```
## How was this patch tested?
The changes were tested by existing tests of CSVSuite and by 2 new tests.
Author: Maxim Gekk <[email protected]>
Author: Maxim Gekk <[email protected]>
Closes #20894 from MaxGekk/check-column-names.1 parent 416cd1f commit 1d9338b
File tree
10 files changed
+411
-37
lines changed- python/pyspark/sql
- sql/core/src
- main/scala/org/apache/spark/sql
- execution/datasources/csv
- test/scala/org/apache/spark/sql/execution/datasources/csv
10 files changed
+411
-37
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
346 | 346 | | |
347 | 347 | | |
348 | 348 | | |
349 | | - | |
| 349 | + | |
350 | 350 | | |
351 | 351 | | |
352 | 352 | | |
| |||
373 | 373 | | |
374 | 374 | | |
375 | 375 | | |
| 376 | + | |
| 377 | + | |
| 378 | + | |
| 379 | + | |
| 380 | + | |
| 381 | + | |
| 382 | + | |
| 383 | + | |
| 384 | + | |
| 385 | + | |
376 | 386 | | |
377 | 387 | | |
378 | 388 | | |
| |||
449 | 459 | | |
450 | 460 | | |
451 | 461 | | |
452 | | - | |
| 462 | + | |
| 463 | + | |
453 | 464 | | |
454 | 465 | | |
455 | 466 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
564 | 564 | | |
565 | 565 | | |
566 | 566 | | |
567 | | - | |
| 567 | + | |
| 568 | + | |
568 | 569 | | |
569 | 570 | | |
570 | 571 | | |
| |||
592 | 593 | | |
593 | 594 | | |
594 | 595 | | |
| 596 | + | |
| 597 | + | |
| 598 | + | |
| 599 | + | |
| 600 | + | |
| 601 | + | |
| 602 | + | |
| 603 | + | |
| 604 | + | |
| 605 | + | |
595 | 606 | | |
596 | 607 | | |
597 | 608 | | |
| |||
664 | 675 | | |
665 | 676 | | |
666 | 677 | | |
667 | | - | |
| 678 | + | |
668 | 679 | | |
669 | 680 | | |
670 | 681 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3056 | 3056 | | |
3057 | 3057 | | |
3058 | 3058 | | |
| 3059 | + | |
| 3060 | + | |
| 3061 | + | |
| 3062 | + | |
| 3063 | + | |
| 3064 | + | |
| 3065 | + | |
| 3066 | + | |
| 3067 | + | |
| 3068 | + | |
| 3069 | + | |
| 3070 | + | |
| 3071 | + | |
| 3072 | + | |
| 3073 | + | |
| 3074 | + | |
| 3075 | + | |
| 3076 | + | |
3059 | 3077 | | |
3060 | 3078 | | |
3061 | 3079 | | |
| |||
Lines changed: 19 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
| 25 | + | |
25 | 26 | | |
26 | 27 | | |
27 | 28 | | |
| |||
474 | 475 | | |
475 | 476 | | |
476 | 477 | | |
| 478 | + | |
| 479 | + | |
| 480 | + | |
477 | 481 | | |
478 | 482 | | |
479 | 483 | | |
| |||
499 | 503 | | |
500 | 504 | | |
501 | 505 | | |
| 506 | + | |
| 507 | + | |
| 508 | + | |
| 509 | + | |
| 510 | + | |
| 511 | + | |
| 512 | + | |
502 | 513 | | |
503 | 514 | | |
504 | 515 | | |
| |||
539 | 550 | | |
540 | 551 | | |
541 | 552 | | |
| 553 | + | |
| 554 | + | |
| 555 | + | |
| 556 | + | |
| 557 | + | |
| 558 | + | |
| 559 | + | |
542 | 560 | | |
543 | 561 | | |
544 | 562 | | |
| |||
583 | 601 | | |
584 | 602 | | |
585 | 603 | | |
| 604 | + | |
586 | 605 | | |
587 | 606 | | |
588 | 607 | | |
| |||
Lines changed: 119 additions & 7 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
30 | 30 | | |
31 | 31 | | |
32 | 32 | | |
| 33 | + | |
33 | 34 | | |
34 | 35 | | |
35 | 36 | | |
| |||
50 | 51 | | |
51 | 52 | | |
52 | 53 | | |
53 | | - | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
54 | 58 | | |
55 | 59 | | |
56 | 60 | | |
| |||
110 | 114 | | |
111 | 115 | | |
112 | 116 | | |
113 | | - | |
| 117 | + | |
114 | 118 | | |
115 | 119 | | |
116 | 120 | | |
117 | 121 | | |
118 | 122 | | |
119 | 123 | | |
120 | 124 | | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
121 | 203 | | |
122 | 204 | | |
123 | 205 | | |
| |||
127 | 209 | | |
128 | 210 | | |
129 | 211 | | |
130 | | - | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
131 | 215 | | |
132 | 216 | | |
133 | 217 | | |
| |||
136 | 220 | | |
137 | 221 | | |
138 | 222 | | |
139 | | - | |
140 | | - | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
141 | 241 | | |
142 | 242 | | |
143 | 243 | | |
| |||
206 | 306 | | |
207 | 307 | | |
208 | 308 | | |
209 | | - | |
| 309 | + | |
| 310 | + | |
| 311 | + | |
| 312 | + | |
| 313 | + | |
| 314 | + | |
| 315 | + | |
| 316 | + | |
| 317 | + | |
| 318 | + | |
| 319 | + | |
| 320 | + | |
210 | 321 | | |
211 | 322 | | |
212 | 323 | | |
213 | 324 | | |
214 | | - | |
| 325 | + | |
| 326 | + | |
215 | 327 | | |
216 | 328 | | |
217 | 329 | | |
| |||
Lines changed: 8 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
130 | 130 | | |
131 | 131 | | |
132 | 132 | | |
| 133 | + | |
133 | 134 | | |
134 | 135 | | |
135 | 136 | | |
136 | 137 | | |
137 | 138 | | |
138 | 139 | | |
139 | 140 | | |
140 | | - | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
141 | 148 | | |
142 | 149 | | |
143 | 150 | | |
| |||
Lines changed: 6 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
156 | 156 | | |
157 | 157 | | |
158 | 158 | | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
159 | 165 | | |
160 | 166 | | |
161 | 167 | | |
| |||
0 commit comments