Commit 6d7ebf2
[SPARK-22165][SQL] Fixes type conflicts between double, long, decimals, dates and timestamps in partition column
## What changes were proposed in this pull request?
This PR proposes to add a rule that re-uses `TypeCoercion.findWiderCommonType` when resolving type conflicts in partition values.
Currently, this uses numeric precedence-like comparison; therefore, it looks introducing failures for type conflicts between timestamps, dates and decimals, please see:
```scala
private val upCastingOrder: Seq[DataType] =
Seq(NullType, IntegerType, LongType, FloatType, DoubleType, StringType)
...
literals.map(_.dataType).maxBy(upCastingOrder.indexOf(_))
```
The codes below:
```scala
val df = Seq((1, "2015-01-01"), (2, "2016-01-01 00:00:00")).toDF("i", "ts")
df.write.format("parquet").partitionBy("ts").save("/tmp/foo")
spark.read.load("/tmp/foo").printSchema()
val df = Seq((1, "1"), (2, "1" * 30)).toDF("i", "decimal")
df.write.format("parquet").partitionBy("decimal").save("/tmp/bar")
spark.read.load("/tmp/bar").printSchema()
```
produces output as below:
**Before**
```
root
|-- i: integer (nullable = true)
|-- ts: date (nullable = true)
root
|-- i: integer (nullable = true)
|-- decimal: integer (nullable = true)
```
**After**
```
root
|-- i: integer (nullable = true)
|-- ts: timestamp (nullable = true)
root
|-- i: integer (nullable = true)
|-- decimal: decimal(30,0) (nullable = true)
```
### Type coercion table:
This PR proposes the type conflict resolusion as below:
**Before**
|InputA \ InputB|`NullType`|`IntegerType`|`LongType`|`DecimalType(38,0)`|`DoubleType`|`DateType`|`TimestampType`|`StringType`|
|------------------------|----------|----------|----------|----------|----------|----------|----------|----------|
|**`NullType`**|`StringType`|`IntegerType`|`LongType`|`StringType`|`DoubleType`|`StringType`|`StringType`|`StringType`|
|**`IntegerType`**|`IntegerType`|`IntegerType`|`LongType`|`IntegerType`|`DoubleType`|`IntegerType`|`IntegerType`|`StringType`|
|**`LongType`**|`LongType`|`LongType`|`LongType`|`LongType`|`DoubleType`|`LongType`|`LongType`|`StringType`|
|**`DecimalType(38,0)`**|`StringType`|`IntegerType`|`LongType`|`DecimalType(38,0)`|`DoubleType`|`DecimalType(38,0)`|`DecimalType(38,0)`|`StringType`|
|**`DoubleType`**|`DoubleType`|`DoubleType`|`DoubleType`|`DoubleType`|`DoubleType`|`DoubleType`|`DoubleType`|`StringType`|
|**`DateType`**|`StringType`|`IntegerType`|`LongType`|`DateType`|`DoubleType`|`DateType`|`DateType`|`StringType`|
|**`TimestampType`**|`StringType`|`IntegerType`|`LongType`|`TimestampType`|`DoubleType`|`TimestampType`|`TimestampType`|`StringType`|
|**`StringType`**|`StringType`|`StringType`|`StringType`|`StringType`|`StringType`|`StringType`|`StringType`|`StringType`|
**After**
|InputA \ InputB|`NullType`|`IntegerType`|`LongType`|`DecimalType(38,0)`|`DoubleType`|`DateType`|`TimestampType`|`StringType`|
|------------------------|----------|----------|----------|----------|----------|----------|----------|----------|
|**`NullType`**|`NullType`|`IntegerType`|`LongType`|`DecimalType(38,0)`|`DoubleType`|`DateType`|`TimestampType`|`StringType`|
|**`IntegerType`**|`IntegerType`|`IntegerType`|`LongType`|`DecimalType(38,0)`|`DoubleType`|`StringType`|`StringType`|`StringType`|
|**`LongType`**|`LongType`|`LongType`|`LongType`|`DecimalType(38,0)`|`StringType`|`StringType`|`StringType`|`StringType`|
|**`DecimalType(38,0)`**|`DecimalType(38,0)`|`DecimalType(38,0)`|`DecimalType(38,0)`|`DecimalType(38,0)`|`StringType`|`StringType`|`StringType`|`StringType`|
|**`DoubleType`**|`DoubleType`|`DoubleType`|`StringType`|`StringType`|`DoubleType`|`StringType`|`StringType`|`StringType`|
|**`DateType`**|`DateType`|`StringType`|`StringType`|`StringType`|`StringType`|`DateType`|`TimestampType`|`StringType`|
|**`TimestampType`**|`TimestampType`|`StringType`|`StringType`|`StringType`|`StringType`|`TimestampType`|`TimestampType`|`StringType`|
|**`StringType`**|`StringType`|`StringType`|`StringType`|`StringType`|`StringType`|`StringType`|`StringType`|`StringType`|
This was produced by:
```scala
test("Print out chart") {
val supportedTypes: Seq[DataType] = Seq(
NullType, IntegerType, LongType, DecimalType(38, 0), DoubleType,
DateType, TimestampType, StringType)
// Old type conflict resolution:
val upCastingOrder: Seq[DataType] =
Seq(NullType, IntegerType, LongType, FloatType, DoubleType, StringType)
def oldResolveTypeConflicts(dataTypes: Seq[DataType]): DataType = {
val topType = dataTypes.maxBy(upCastingOrder.indexOf(_))
if (topType == NullType) StringType else topType
}
println(s"|InputA \\ InputB|${supportedTypes.map(dt => s"`${dt.toString}`").mkString("|")}|")
println(s"|------------------------|${supportedTypes.map(_ => "----------").mkString("|")}|")
supportedTypes.foreach { inputA =>
val types = supportedTypes.map(inputB => oldResolveTypeConflicts(Seq(inputA, inputB)))
println(s"|**`$inputA`**|${types.map(dt => s"`${dt.toString}`").mkString("|")}|")
}
// New type conflict resolution:
def newResolveTypeConflicts(dataTypes: Seq[DataType]): DataType = {
dataTypes.fold[DataType](NullType)(findWiderTypeForPartitionColumn)
}
println(s"|InputA \\ InputB|${supportedTypes.map(dt => s"`${dt.toString}`").mkString("|")}|")
println(s"|------------------------|${supportedTypes.map(_ => "----------").mkString("|")}|")
supportedTypes.foreach { inputA =>
val types = supportedTypes.map(inputB => newResolveTypeConflicts(Seq(inputA, inputB)))
println(s"|**`$inputA`**|${types.map(dt => s"`${dt.toString}`").mkString("|")}|")
}
}
```
## How was this patch tested?
Unit tests added in `ParquetPartitionDiscoverySuite`.
Author: hyukjinkwon <[email protected]>
Closes #19389 from HyukjinKwon/partition-type-coercion.1 parent 2d868d9 commit 6d7ebf2
File tree
4 files changed
+235
-23
lines changed- docs
- sql
- catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis
- core/src
- main/scala/org/apache/spark/sql/execution/datasources
- test/scala/org/apache/spark/sql/execution/datasources/parquet
4 files changed
+235
-23
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1577 | 1577 | | |
1578 | 1578 | | |
1579 | 1579 | | |
| 1580 | + | |
| 1581 | + | |
| 1582 | + | |
| 1583 | + | |
| 1584 | + | |
| 1585 | + | |
| 1586 | + | |
| 1587 | + | |
| 1588 | + | |
| 1589 | + | |
| 1590 | + | |
| 1591 | + | |
| 1592 | + | |
| 1593 | + | |
| 1594 | + | |
| 1595 | + | |
| 1596 | + | |
| 1597 | + | |
| 1598 | + | |
| 1599 | + | |
| 1600 | + | |
| 1601 | + | |
| 1602 | + | |
| 1603 | + | |
| 1604 | + | |
| 1605 | + | |
| 1606 | + | |
| 1607 | + | |
| 1608 | + | |
| 1609 | + | |
| 1610 | + | |
| 1611 | + | |
| 1612 | + | |
| 1613 | + | |
| 1614 | + | |
| 1615 | + | |
| 1616 | + | |
| 1617 | + | |
| 1618 | + | |
| 1619 | + | |
| 1620 | + | |
| 1621 | + | |
| 1622 | + | |
| 1623 | + | |
| 1624 | + | |
| 1625 | + | |
| 1626 | + | |
| 1627 | + | |
| 1628 | + | |
| 1629 | + | |
| 1630 | + | |
| 1631 | + | |
| 1632 | + | |
| 1633 | + | |
| 1634 | + | |
| 1635 | + | |
| 1636 | + | |
| 1637 | + | |
| 1638 | + | |
| 1639 | + | |
| 1640 | + | |
| 1641 | + | |
| 1642 | + | |
| 1643 | + | |
| 1644 | + | |
| 1645 | + | |
| 1646 | + | |
| 1647 | + | |
| 1648 | + | |
| 1649 | + | |
| 1650 | + | |
| 1651 | + | |
| 1652 | + | |
| 1653 | + | |
| 1654 | + | |
| 1655 | + | |
| 1656 | + | |
| 1657 | + | |
| 1658 | + | |
| 1659 | + | |
| 1660 | + | |
| 1661 | + | |
| 1662 | + | |
| 1663 | + | |
| 1664 | + | |
| 1665 | + | |
| 1666 | + | |
| 1667 | + | |
| 1668 | + | |
| 1669 | + | |
| 1670 | + | |
| 1671 | + | |
| 1672 | + | |
| 1673 | + | |
| 1674 | + | |
| 1675 | + | |
| 1676 | + | |
| 1677 | + | |
| 1678 | + | |
| 1679 | + | |
| 1680 | + | |
| 1681 | + | |
| 1682 | + | |
| 1683 | + | |
| 1684 | + | |
| 1685 | + | |
| 1686 | + | |
| 1687 | + | |
| 1688 | + | |
| 1689 | + | |
| 1690 | + | |
| 1691 | + | |
| 1692 | + | |
| 1693 | + | |
| 1694 | + | |
| 1695 | + | |
| 1696 | + | |
| 1697 | + | |
| 1698 | + | |
| 1699 | + | |
| 1700 | + | |
| 1701 | + | |
| 1702 | + | |
| 1703 | + | |
| 1704 | + | |
| 1705 | + | |
| 1706 | + | |
| 1707 | + | |
| 1708 | + | |
| 1709 | + | |
| 1710 | + | |
| 1711 | + | |
| 1712 | + | |
| 1713 | + | |
| 1714 | + | |
| 1715 | + | |
| 1716 | + | |
| 1717 | + | |
| 1718 | + | |
1580 | 1719 | | |
1581 | 1720 | | |
1582 | 1721 | | |
| |||
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
155 | 155 | | |
156 | 156 | | |
157 | 157 | | |
158 | | - | |
| 158 | + | |
159 | 159 | | |
160 | 160 | | |
161 | 161 | | |
| |||
Lines changed: 40 additions & 20 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
31 | | - | |
| 31 | + | |
32 | 32 | | |
33 | 33 | | |
34 | 34 | | |
| |||
309 | 309 | | |
310 | 310 | | |
311 | 311 | | |
312 | | - | |
313 | | - | |
314 | | - | |
315 | | - | |
316 | | - | |
317 | | - | |
318 | | - | |
| 312 | + | |
| 313 | + | |
319 | 314 | | |
320 | 315 | | |
321 | 316 | | |
| |||
372 | 367 | | |
373 | 368 | | |
374 | 369 | | |
| 370 | + | |
375 | 371 | | |
376 | | - | |
377 | | - | |
| 372 | + | |
| 373 | + | |
378 | 374 | | |
| 375 | + | |
| 376 | + | |
| 377 | + | |
| 378 | + | |
| 379 | + | |
| 380 | + | |
| 381 | + | |
| 382 | + | |
| 383 | + | |
| 384 | + | |
| 385 | + | |
| 386 | + | |
| 387 | + | |
| 388 | + | |
| 389 | + | |
| 390 | + | |
| 391 | + | |
| 392 | + | |
379 | 393 | | |
| 394 | + | |
380 | 395 | | |
381 | 396 | | |
382 | 397 | | |
| |||
427 | 442 | | |
428 | 443 | | |
429 | 444 | | |
430 | | - | |
431 | | - | |
432 | | - | |
433 | 445 | | |
434 | 446 | | |
435 | 447 | | |
| |||
468 | 480 | | |
469 | 481 | | |
470 | 482 | | |
471 | | - | |
472 | | - | |
| 483 | + | |
| 484 | + | |
473 | 485 | | |
474 | 486 | | |
475 | | - | |
476 | | - | |
477 | | - | |
478 | | - | |
479 | | - | |
| 487 | + | |
| 488 | + | |
480 | 489 | | |
481 | 490 | | |
482 | 491 | | |
483 | 492 | | |
484 | 493 | | |
| 494 | + | |
| 495 | + | |
| 496 | + | |
| 497 | + | |
| 498 | + | |
| 499 | + | |
| 500 | + | |
| 501 | + | |
| 502 | + | |
| 503 | + | |
| 504 | + | |
485 | 505 | | |
Lines changed: 55 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
249 | 249 | | |
250 | 250 | | |
251 | 251 | | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
252 | 257 | | |
253 | 258 | | |
254 | 259 | | |
| |||
314 | 319 | | |
315 | 320 | | |
316 | 321 | | |
317 | | - | |
| 322 | + | |
318 | 323 | | |
319 | 324 | | |
320 | 325 | | |
| |||
324 | 329 | | |
325 | 330 | | |
326 | 331 | | |
| 332 | + | |
| 333 | + | |
| 334 | + | |
| 335 | + | |
| 336 | + | |
| 337 | + | |
| 338 | + | |
| 339 | + | |
| 340 | + | |
| 341 | + | |
| 342 | + | |
| 343 | + | |
| 344 | + | |
| 345 | + | |
| 346 | + | |
| 347 | + | |
| 348 | + | |
| 349 | + | |
| 350 | + | |
| 351 | + | |
| 352 | + | |
| 353 | + | |
| 354 | + | |
| 355 | + | |
| 356 | + | |
| 357 | + | |
327 | 358 | | |
328 | 359 | | |
329 | 360 | | |
| |||
395 | 426 | | |
396 | 427 | | |
397 | 428 | | |
398 | | - | |
| 429 | + | |
399 | 430 | | |
400 | 431 | | |
401 | 432 | | |
| |||
1067 | 1098 | | |
1068 | 1099 | | |
1069 | 1100 | | |
| 1101 | + | |
| 1102 | + | |
| 1103 | + | |
| 1104 | + | |
| 1105 | + | |
| 1106 | + | |
| 1107 | + | |
| 1108 | + | |
| 1109 | + | |
| 1110 | + | |
| 1111 | + | |
| 1112 | + | |
| 1113 | + | |
| 1114 | + | |
| 1115 | + | |
| 1116 | + | |
| 1117 | + | |
| 1118 | + | |
| 1119 | + | |
| 1120 | + | |
| 1121 | + | |
| 1122 | + | |
1070 | 1123 | | |
0 commit comments