Commit 7546b44

[SPARK-42164][CORE] Register partitioned-table-related classes to KryoSerializer
### What changes were proposed in this pull request?

This PR aims to register partitioned-table-related classes to `KryoSerializer`. Specifically, `CREATE TABLE` and `MSCK REPAIR TABLE` use these classes.

### Why are the changes needed?

To support partitioned tables more easily with `KryoSerializer`. Previously, it failed like the following:

```
java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.util.HadoopFSUtils$SerializableBlockLocation
```

```
java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.util.HadoopFSUtils$SerializableFileStatus
```

```
java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.sql.execution.command.PartitionStatistics
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs and manual tests.

**TEST TABLE**

```
$ tree /tmp/t
/tmp/t
├── p=1
│   └── users.orc
├── p=10
│   └── users.orc
├── p=11
│   └── users.orc
├── p=2
│   └── users.orc
├── p=3
│   └── users.orc
├── p=4
│   └── users.orc
├── p=5
│   └── users.orc
├── p=6
│   └── users.orc
├── p=7
│   └── users.orc
├── p=8
│   └── users.orc
└── p=9
    └── users.orc
```

**CREATE PARTITIONED TABLES AND RECOVER PARTITIONS**

```
$ bin/spark-shell -c spark.kryo.registrationRequired=true -c spark.serializer=org.apache.spark.serializer.KryoSerializer -c spark.sql.sources.parallelPartitionDiscovery.threshold=1

scala> sql("CREATE TABLE t USING ORC LOCATION '/tmp/t'").show()
++
||
++
++

scala> sql("MSCK REPAIR TABLE t").show()
++
||
++
++
```

Closes apache#39713 from dongjoon-hyun/SPARK-42164.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
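On Spark releases without this fix, the errors above can be worked around by registering the affected classes explicitly. A sketch for `spark-defaults.conf`, assuming Spark's standard `spark.kryo.classesToRegister` option (the class list simply mirrors the error messages above):

```
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.kryo.registrationRequired  true
spark.kryo.classesToRegister     org.apache.spark.util.HadoopFSUtils$SerializableBlockLocation,org.apache.spark.util.HadoopFSUtils$SerializableFileStatus,org.apache.spark.sql.execution.command.PartitionStatistics
```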
1 parent 934d14d commit 7546b44

File tree

1 file changed (+5, −0 lines)


core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala

Lines changed: 5 additions & 0 deletions
```diff
@@ -510,6 +510,10 @@ private[serializer] object KryoSerializer {
   // SQL / ML / MLlib classes once and then re-use that filtered list in newInstance() calls.
   private lazy val loadableSparkClasses: Seq[Class[_]] = {
     Seq(
+      "org.apache.spark.util.HadoopFSUtils$SerializableBlockLocation",
+      "[Lorg.apache.spark.util.HadoopFSUtils$SerializableBlockLocation;",
+      "org.apache.spark.util.HadoopFSUtils$SerializableFileStatus",
+      "[Lorg.apache.spark.util.HadoopFSUtils$SerializableFileStatus;",
       "org.apache.spark.sql.catalyst.expressions.BoundReference",
       "org.apache.spark.sql.catalyst.expressions.SortOrder",
       "[Lorg.apache.spark.sql.catalyst.expressions.SortOrder;",
@@ -536,6 +540,7 @@ private[serializer] object KryoSerializer {
       "org.apache.spark.sql.types.DecimalType",
       "org.apache.spark.sql.types.Decimal$DecimalAsIfIntegral$",
       "org.apache.spark.sql.types.Decimal$DecimalIsFractional$",
+      "org.apache.spark.sql.execution.command.PartitionStatistics",
       "org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTaskResult",
       "org.apache.spark.sql.execution.joins.EmptyHashedRelation$",
       "org.apache.spark.sql.execution.joins.LongHashedRelation",
```
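The entries above go into `loadableSparkClasses`, which `core` resolves reflectively so it does not hard-depend on the SQL module; names absent from the classpath are skipped. A minimal sketch of that pattern (the `LoadableClasses` object here is hypothetical, not Spark's actual helper):

```scala
// Hypothetical sketch of reflective class-name resolution: look up each
// name with Class.forName and silently skip names not on the classpath.
// JVM array notation such as "[Ljava.lang.String;" resolves to the array
// class, which is why the diff registers both "Foo" and "[LFoo;".
object LoadableClasses {
  def loadable(names: Seq[String]): Seq[Class[_]] =
    names.flatMap { name =>
      try Some(Class.forName(name))
      catch { case _: ClassNotFoundException => None }
    }
}
```

Kryo treats an array type as a distinct class from its element type, so registering `org.apache.spark.util.HadoopFSUtils$SerializableBlockLocation` alone would not cover arrays of it; the `[L...;` entry handles that case.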
