Commit 6be8b93
[SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
### What changes were proposed in this pull request?
Spark sql commands are failing on selecting the orc tables
Steps to reproduce
Example 1 -
Prerequisite - This is the location(/Users/test/tpcds_scale5data/date_dim) for orc data which is generated by the hive.
```
val table = """CREATE TABLE `date_dim` (
`d_date_sk` INT,
`d_date_id` STRING,
`d_date` TIMESTAMP,
`d_month_seq` INT,
`d_week_seq` INT,
`d_quarter_seq` INT,
`d_year` INT,
`d_dow` INT,
`d_moy` INT,
`d_dom` INT,
`d_qoy` INT,
`d_fy_year` INT,
`d_fy_quarter_seq` INT,
`d_fy_week_seq` INT,
`d_day_name` STRING,
`d_quarter_name` STRING,
`d_holiday` STRING,
`d_weekend` STRING,
`d_following_holiday` STRING,
`d_first_dom` INT,
`d_last_dom` INT,
`d_same_day_ly` INT,
`d_same_day_lq` INT,
`d_current_day` STRING,
`d_current_week` STRING,
`d_current_month` STRING,
`d_current_quarter` STRING,
`d_current_year` STRING)
USING orc
LOCATION '/Users/test/tpcds_scale5data/date_dim'"""
spark.sql(table).collect
val u = """select date_dim.d_date_id from date_dim limit 5"""
spark.sql(u).collect
```
Example 2
```
val table = """CREATE TABLE `test_orc_data` (
`_col1` INT,
`_col2` STRING,
`_col3` INT)
USING orc"""
spark.sql(table).collect
spark.sql("insert into test_orc_data values(13, '155', 2020)").collect
val df = """select _col2 from test_orc_data limit 5"""
spark.sql(df).collect
```
Its Failing with below error
```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, 192.168.0.103, executor driver): java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initBatch(OrcColumnarBatchReader.java:156)
at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$7(OrcFileFormat.scala:258)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:141)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:203)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:620)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:343)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:895)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:895)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:372)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:336)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:133)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:445)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1489)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:448)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)`
```
The reason behind this initBatch is not getting the schema that is needed to find out the column value in OrcFileFormat.scala
```
batchReader.initBatch(
TypeDescription.fromString(resultSchemaString)
```
### Why are the changes needed?
Spark sql queries for orc tables are failing
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit test is added for this .Also Tested through spark shell and spark submit the failing queries
Closes #29045 from SaurabhChawla100/SPARK-32234.
Lead-authored-by: SaurabhChawla <[email protected]>
Co-authored-by: SaurabhChawla <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>1 parent bdeb626 commit 6be8b93
File tree
4 files changed
+78
-22
lines changed- sql
- core/src/main/scala/org/apache/spark/sql/execution/datasources
- orc
- v2/orc
- hive/src/test/scala/org/apache/spark/sql/hive/orc
4 files changed
+78
-22
lines changedLines changed: 5 additions & 5 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
164 | 164 | | |
165 | 165 | | |
166 | 166 | | |
167 | | - | |
168 | | - | |
169 | 167 | | |
170 | 168 | | |
171 | 169 | | |
| |||
179 | 177 | | |
180 | 178 | | |
181 | 179 | | |
182 | | - | |
| 180 | + | |
183 | 181 | | |
184 | 182 | | |
185 | 183 | | |
186 | 184 | | |
187 | 185 | | |
188 | | - | |
| 186 | + | |
189 | 187 | | |
190 | 188 | | |
191 | | - | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
192 | 192 | | |
193 | 193 | | |
194 | 194 | | |
| |||
Lines changed: 34 additions & 7 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
24 | 24 | | |
25 | 25 | | |
26 | 26 | | |
27 | | - | |
| 27 | + | |
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
| |||
116 | 116 | | |
117 | 117 | | |
118 | 118 | | |
119 | | - | |
120 | | - | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
121 | 123 | | |
122 | 124 | | |
123 | 125 | | |
124 | 126 | | |
125 | 127 | | |
126 | 128 | | |
127 | | - | |
| 129 | + | |
128 | 130 | | |
129 | 131 | | |
130 | 132 | | |
| |||
136 | 138 | | |
137 | 139 | | |
138 | 140 | | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
139 | 145 | | |
140 | 146 | | |
141 | 147 | | |
142 | 148 | | |
143 | 149 | | |
144 | 150 | | |
145 | 151 | | |
146 | | - | |
| 152 | + | |
147 | 153 | | |
148 | 154 | | |
149 | 155 | | |
| |||
152 | 158 | | |
153 | 159 | | |
154 | 160 | | |
155 | | - | |
| 161 | + | |
156 | 162 | | |
157 | 163 | | |
158 | 164 | | |
| |||
170 | 176 | | |
171 | 177 | | |
172 | 178 | | |
173 | | - | |
| 179 | + | |
174 | 180 | | |
175 | 181 | | |
176 | 182 | | |
| |||
199 | 205 | | |
200 | 206 | | |
201 | 207 | | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
202 | 229 | | |
Lines changed: 11 additions & 10 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
66 | 66 | | |
67 | 67 | | |
68 | 68 | | |
69 | | - | |
70 | | - | |
71 | 69 | | |
72 | 70 | | |
73 | 71 | | |
74 | 72 | | |
75 | 73 | | |
76 | 74 | | |
77 | | - | |
| 75 | + | |
78 | 76 | | |
79 | 77 | | |
80 | 78 | | |
81 | 79 | | |
82 | 80 | | |
83 | | - | |
| 81 | + | |
84 | 82 | | |
85 | 83 | | |
86 | | - | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
87 | 87 | | |
88 | 88 | | |
89 | 89 | | |
| |||
112 | 112 | | |
113 | 113 | | |
114 | 114 | | |
115 | | - | |
116 | | - | |
117 | 115 | | |
118 | 116 | | |
119 | 117 | | |
120 | 118 | | |
121 | 119 | | |
122 | 120 | | |
123 | | - | |
| 121 | + | |
124 | 122 | | |
125 | 123 | | |
126 | 124 | | |
127 | 125 | | |
128 | 126 | | |
129 | | - | |
| 127 | + | |
130 | 128 | | |
131 | 129 | | |
132 | | - | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
133 | 134 | | |
134 | 135 | | |
135 | 136 | | |
| |||
Lines changed: 28 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
288 | 288 | | |
289 | 289 | | |
290 | 290 | | |
| 291 | + | |
| 292 | + | |
| 293 | + | |
| 294 | + | |
| 295 | + | |
| 296 | + | |
| 297 | + | |
| 298 | + | |
| 299 | + | |
| 300 | + | |
| 301 | + | |
| 302 | + | |
| 303 | + | |
| 304 | + | |
| 305 | + | |
| 306 | + | |
| 307 | + | |
| 308 | + | |
| 309 | + | |
| 310 | + | |
| 311 | + | |
| 312 | + | |
| 313 | + | |
| 314 | + | |
| 315 | + | |
| 316 | + | |
| 317 | + | |
| 318 | + | |
291 | 319 | | |
0 commit comments