
Commit 85cb388

viirya authored and cloud-fan committed
[SPARK-30050][SQL] analyze table and rename table should not erase hive table bucketing info
### What changes were proposed in this pull request?

This patch adds the Hive provider to the table metadata in `HiveExternalCatalog.alterTableStats`. When we call `HiveClient.alterTable`, `alterTable` will erase the bucketing info if it cannot find the Hive provider in the given table metadata. Rename table also has this issue.

### Why are the changes needed?

Running `ANALYZE TABLE` on a Hive table that has bucketing info erases the existing bucketing info.

### Does this PR introduce any user-facing change?

Yes. After this PR, running `ANALYZE TABLE` on a Hive table no longer erases its existing bucketing info.

### How was this patch tested?

Unit test.

Closes #26685 from viirya/fix-hive-bucket.

Lead-authored-by: Liang-Chi Hsieh <[email protected]>
Co-authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
1 parent 51e69fe commit 85cb388
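
For context, a minimal standalone repro of the reported behavior might look like the following sketch (hypothetical: the local session setup and the table name `bucketed_t` are assumptions for illustration; the SQL mirrors the table definition used in the new test below):

```scala
import org.apache.spark.sql.SparkSession

object Spark30050Repro extends App {
  // Local Hive-enabled session; assumes spark-hive is on the classpath.
  val spark = SparkSession.builder()
    .master("local[2]")
    .appName("SPARK-30050 repro")
    .enableHiveSupport()
    .getOrCreate()

  // A bucketed Hive-format table, mirroring the one created in the new test.
  spark.sql(
    """CREATE TABLE bucketed_t(a STRING, b STRING)
      |CLUSTERED BY (a, b) SORTED BY (a, b) INTO 10 BUCKETS
      |STORED AS PARQUET""".stripMargin)

  // Before this commit, computing statistics rewrote the table metadata
  // without the Hive provider, and the bucketing info was silently dropped.
  spark.sql("ANALYZE TABLE bucketed_t COMPUTE STATISTICS")

  // After the fix, "Num Buckets" should still report 10 here.
  spark.sql("DESC FORMATTED bucketed_t").show(100, truncate = false)

  spark.stop()
}
```

Before this commit, the final `DESC FORMATTED` output would no longer show the bucketing metadata; with the fix, `Num Buckets` still reports 10.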

2 files changed: 28 additions & 2 deletions

sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala

Lines changed: 2 additions & 2 deletions

@@ -55,7 +55,7 @@ import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
 import org.apache.spark.sql.catalyst.expressions.Expression
 import org.apache.spark.sql.catalyst.parser.{CatalystSqlParser, ParseException}
 import org.apache.spark.sql.execution.QueryExecutionException
-import org.apache.spark.sql.execution.command.DDLUtils
+import org.apache.spark.sql.hive.HiveExternalCatalog
 import org.apache.spark.sql.hive.HiveExternalCatalog.{DATASOURCE_SCHEMA, DATASOURCE_SCHEMA_NUMPARTS, DATASOURCE_SCHEMA_PART_PREFIX}
 import org.apache.spark.sql.hive.HiveUtils
 import org.apache.spark.sql.hive.client.HiveClientImpl._
@@ -1059,7 +1059,7 @@ private[hive] object HiveClientImpl {
     }
 
     table.bucketSpec match {
-      case Some(bucketSpec) if DDLUtils.isHiveTable(table) =>
+      case Some(bucketSpec) if !HiveExternalCatalog.isDatasourceTable(table) =>
         hiveTable.setNumBuckets(bucketSpec.numBuckets)
         hiveTable.setBucketCols(bucketSpec.bucketColumnNames.toList.asJava)
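
Why the new guard matters: `DDLUtils.isHiveTable` treats a table as Hive-format only when its `provider` field is explicitly set to `hive`, while the metadata passed down during `ANALYZE TABLE` or a rename may carry no provider at all, so the old guard mis-classified such tables and `toHiveTable` dropped their bucketing info. The sketch below models that difference with simplified stand-ins (`CatalogTableLike`, `isHiveTableLike`, `isDatasourceTableLike`, and the property key are assumptions for illustration, not the real Spark definitions):

```scala
object GuardDemo extends App {
  // Simplified stand-in, not the real Spark CatalogTable: it only models
  // the fields the changed check cares about.
  case class CatalogTableLike(
      provider: Option[String],        // e.g. Some("hive"), Some("parquet"), or None
      properties: Map[String, String]) // catalog table properties

  // Old guard (modeled on DDLUtils.isHiveTable): Hive-format only when the
  // provider field is present and equals "hive".
  def isHiveTableLike(t: CatalogTableLike): Boolean =
    t.provider.exists(_.equalsIgnoreCase("hive"))

  // New-style guard (modeled on HiveExternalCatalog.isDatasourceTable): a
  // data source table only when some non-"hive" provider is recorded; the
  // property key below is an assumption for illustration.
  def isDatasourceTableLike(t: CatalogTableLike): Boolean = {
    val p = t.provider.orElse(t.properties.get("spark.sql.sources.provider"))
    p.exists(s => !s.equalsIgnoreCase("hive"))
  }

  // Metadata as it may reach toHiveTable during ANALYZE TABLE or a rename:
  // a Hive table whose provider field was never populated.
  val rawHiveTable = CatalogTableLike(provider = None, properties = Map.empty)

  println(isHiveTableLike(rawHiveTable))        // false -> old guard dropped the buckets
  println(!isDatasourceTableLike(rawHiveTable)) // true  -> new guard keeps them
}
```

With the raw table's provider unset, the old predicate answers false and the buckets are dropped, while the negated data-source check answers true and keeps them, which matches the behavior the new test asserts.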

sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala

Lines changed: 26 additions & 0 deletions

@@ -108,6 +108,32 @@ class HiveExternalCatalogSuite extends ExternalCatalogSuite {
     assert(bucketString.contains("10"))
   }
 
+  test("SPARK-30050: analyze/rename table should not erase the bucketing metadata at hive side") {
+    val catalog = newBasicCatalog()
+    externalCatalog.client.runSqlHive(
+      """
+        |CREATE TABLE db1.t(a string, b string)
+        |CLUSTERED BY (a, b) SORTED BY (a, b) INTO 10 BUCKETS
+        |STORED AS PARQUET
+      """.stripMargin)
+
+    val bucketString1 = externalCatalog.client.runSqlHive("DESC FORMATTED db1.t")
+      .filter(_.contains("Num Buckets")).head
+    assert(bucketString1.contains("10"))
+
+    catalog.alterTableStats("db1", "t", None)
+
+    val bucketString2 = externalCatalog.client.runSqlHive("DESC FORMATTED db1.t")
+      .filter(_.contains("Num Buckets")).head
+    assert(bucketString2.contains("10"))
+
+    catalog.renameTable("db1", "t", "t2")
+
+    val bucketString3 = externalCatalog.client.runSqlHive("DESC FORMATTED db1.t2")
+      .filter(_.contains("Num Buckets")).head
+    assert(bucketString3.contains("10"))
+  }
+
   test("SPARK-23001: NullPointerException when running desc database") {
     val catalog = newBasicCatalog()
     catalog.createDatabase(newDb("dbWithNullDesc").copy(description = null), ignoreIfExists = false)
