Commit eab10f9
[SPARK-24068][BACKPORT-2.3] Propagating DataFrameReader's options to Text datasource on schema inferring
## What changes were proposed in this pull request?
While reading CSV or JSON files, DataFrameReader's options are converted to Hadoop's parameters, for example there:
https://github.com/apache/spark/blob/branch-2.3/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L302
but the options are not propagated to Text datasource on schema inferring, for instance:
https://github.com/apache/spark/blob/branch-2.3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala#L184-L188
The PR proposes propagation of user's options to Text datasource on scheme inferring in similar way as user's options are converted to Hadoop parameters if schema is specified.
## How was this patch tested?
The changes were tested manually by using https://github.com/twitter/hadoop-lzo:
```
hadoop-lzo> mvn clean package
hadoop-lzo> ln -s ./target/hadoop-lzo-0.4.21-SNAPSHOT.jar ./hadoop-lzo.jar
```
Create 2 test files in JSON and CSV format and compress them:
```shell
$ cat test.csv
col1|col2
a|1
$ lzop test.csv
$ cat test.json
{"col1":"a","col2":1}
$ lzop test.json
```
Run `spark-shell` with hadoop-lzo:
```
bin/spark-shell --jars ~/hadoop-lzo/hadoop-lzo.jar
```
reading compressed CSV and JSON without schema:
```scala
spark.read.option("io.compression.codecs", "com.hadoop.compression.lzo.LzopCodec").option("inferSchema",true).option("header",true).option("sep","|").csv("test.csv.lzo").show()
+----+----+
|col1|col2|
+----+----+
| a| 1|
+----+----+
```
```scala
spark.read.option("io.compression.codecs", "com.hadoop.compression.lzo.LzopCodec").option("multiLine", true).json("test.json.lzo").printSchema
root
|-- col1: string (nullable = true)
|-- col2: long (nullable = true)
```
Author: Maxim Gekk <[email protected]>
Author: Maxim Gekk <[email protected]>
Closes #21292 from MaxGekk/text-options-backport-v2.3.1 parent 8889d78 commit eab10f9
File tree
5 files changed
+16
-12
lines changed- sql
- catalyst/src/main/scala/org/apache/spark/sql/catalyst/json
- core/src/main/scala/org/apache/spark/sql/execution/datasources
- csv
- json
5 files changed
+16
-12
lines changedLines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
31 | 31 | | |
32 | 32 | | |
33 | 33 | | |
34 | | - | |
| 34 | + | |
35 | 35 | | |
36 | 36 | | |
37 | 37 | | |
| |||
Lines changed: 4 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
184 | 184 | | |
185 | 185 | | |
186 | 186 | | |
187 | | - | |
| 187 | + | |
| 188 | + | |
188 | 189 | | |
189 | 190 | | |
190 | 191 | | |
| |||
248 | 249 | | |
249 | 250 | | |
250 | 251 | | |
251 | | - | |
| 252 | + | |
| 253 | + | |
252 | 254 | | |
253 | 255 | | |
254 | 256 | | |
| |||
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
27 | 27 | | |
28 | 28 | | |
29 | 29 | | |
30 | | - | |
| 30 | + | |
31 | 31 | | |
32 | 32 | | |
33 | 33 | | |
| |||
Lines changed: 0 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
19 | 19 | | |
20 | 20 | | |
21 | 21 | | |
22 | | - | |
23 | | - | |
24 | 22 | | |
25 | 23 | | |
26 | 24 | | |
| |||
Lines changed: 10 additions & 6 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
92 | 92 | | |
93 | 93 | | |
94 | 94 | | |
95 | | - | |
| 95 | + | |
96 | 96 | | |
97 | 97 | | |
98 | 98 | | |
| |||
104 | 104 | | |
105 | 105 | | |
106 | 106 | | |
107 | | - | |
| 107 | + | |
| 108 | + | |
108 | 109 | | |
109 | 110 | | |
110 | 111 | | |
111 | 112 | | |
112 | 113 | | |
113 | | - | |
| 114 | + | |
| 115 | + | |
114 | 116 | | |
115 | 117 | | |
116 | 118 | | |
| |||
144 | 146 | | |
145 | 147 | | |
146 | 148 | | |
147 | | - | |
| 149 | + | |
148 | 150 | | |
149 | 151 | | |
150 | 152 | | |
151 | 153 | | |
152 | 154 | | |
153 | 155 | | |
154 | | - | |
| 156 | + | |
| 157 | + | |
155 | 158 | | |
156 | | - | |
| 159 | + | |
| 160 | + | |
157 | 161 | | |
158 | 162 | | |
159 | 163 | | |
| |||
0 commit comments