Commit 4fa1a43
[SPARK-19641][SQL] JSON schema inference in DROPMALFORMED mode produces incorrect schema for non-array/object JSONs
## What changes were proposed in this pull request?
Currently, when we infer types for valid JSON strings that are neither objects nor arrays, we produce empty schemas regardless of the parse mode, as shown below:
```scala
scala> spark.read.option("mode", "DROPMALFORMED").json(Seq("""{"a": 1}""", """"a"""").toDS).printSchema()
root
```
```scala
scala> spark.read.option("mode", "FAILFAST").json(Seq("""{"a": 1}""", """"a"""").toDS).printSchema()
root
```
This PR proposes to handle parse modes in type inference.
After this PR,
```scala
scala> spark.read.option("mode", "DROPMALFORMED").json(Seq("""{"a": 1}""", """"a"""").toDS).printSchema()
root
|-- a: long (nullable = true)
```
```scala
scala> spark.read.option("mode", "FAILFAST").json(Seq("""{"a": 1}""", """"a"""").toDS).printSchema()
java.lang.RuntimeException: Failed to infer a common schema. Struct types are expected but string was found.
```
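The mode-dependent behavior shown above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the code in this patch: `Mode`, `JType`, `JStruct`, `JString`, and `RootTypeSketch.mergeRoot` are hypothetical names made up for illustration, and the field-merging logic is deliberately naive. It only shows the idea that, when a struct type meets a non-struct type at the root level, `DROPMALFORMED` keeps the struct while `FAILFAST` raises an error.

```scala
// Hypothetical, simplified model of mode-aware root type merging.
// None of these names belong to Spark's internal API.
sealed trait Mode
case object DropMalformed extends Mode
case object FailFast extends Mode

sealed trait JType { def name: String }
case object JString extends JType { val name = "string" }
final case class JStruct(fields: Map[String, String]) extends JType { val name = "struct" }

object RootTypeSketch {
  // Combine the types inferred for two top-level JSON documents.
  def mergeRoot(mode: Mode)(a: JType, b: JType): JType = (a, b) match {
    case (JStruct(f1), JStruct(f2)) => JStruct(f1 ++ f2) // naive field union (assumption)
    case (s: JStruct, other)        => handleNonStruct(mode, s, other)
    case (other, s: JStruct)        => handleNonStruct(mode, s, other)
    case (x, _)                     => x // both non-struct: out of scope for this sketch
  }

  // Decide what to do when a non-struct type meets a struct at the root level.
  private def handleNonStruct(mode: Mode, struct: JStruct, other: JType): JType =
    mode match {
      case DropMalformed =>
        struct // ignore the non-object record and keep the struct schema
      case FailFast =>
        throw new RuntimeException(
          "Failed to infer a common schema. " +
            s"Struct types are expected but ${other.name} was found.")
    }
}

object RootTypeSketchDemo {
  def main(args: Array[String]): Unit = {
    // Types for the two documents {"a": 1} and "a" from the examples above.
    val docs: Seq[JType] = Seq(JStruct(Map("a" -> "long")), JString)

    // DROPMALFORMED: the struct survives the merge.
    println(docs.reduce((a, b) => RootTypeSketch.mergeRoot(DropMalformed)(a, b)))

    // FAILFAST: the merge throws, so wrap it in Try to inspect the failure.
    println(scala.util.Try(docs.reduce((a, b) => RootTypeSketch.mergeRoot(FailFast)(a, b))))
  }
}
```

Threading the mode through the merge function as a parameter keeps the fold over per-record types unchanged, so only the struct versus non-struct case needs to consult it.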
This PR is based on NathanHowell@e233fd0; NathanHowell and I discussed this in https://issues.apache.org/jira/browse/SPARK-19641
## How was this patch tested?
Unit tests in `JsonSuite` for both `DROPMALFORMED` and `FAILFAST` modes.
Author: hyukjinkwon <[email protected]>
Closes #17492 from HyukjinKwon/SPARK-19641.
1 parent 4d28e84 commit 4fa1a43
File tree
2 files changed, +78 -33 lines changed
- sql/core/src
  - main/scala/org/apache/spark/sql/execution/datasources/json
  - test/scala/org/apache/spark/sql/execution/datasources/json
Lines changed: 47 additions & 30 deletions
Lines changed: 31 additions & 3 deletions