
Commit 064db17

[SPARK-17810][SQL] Default spark.sql.warehouse.dir is relative to local FS but can resolve as HDFS path
Always resolve spark.sql.warehouse.dir as a local path, and as relative to the working dir, not the home dir.

Existing tests.

Author: Sean Owen <[email protected]>

Closes #15382 from srowen/SPARK-17810.

(cherry picked from commit 4ecbe1b)
Signed-off-by: Sean Owen <[email protected]>
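The failure mode being fixed can be sketched roughly as follows (a hypothetical Python illustration of Hadoop-style path qualification, not Spark or Hadoop code): a path that carries no URI scheme is qualified against the cluster's default filesystem, so the old scheme-less default could resolve as an HDFS path on a cluster.

```python
from urllib.parse import urlparse

def qualify(path: str, default_fs: str) -> str:
    """Hypothetical sketch of Hadoop-style path qualification: a path
    without a scheme inherits the default filesystem's scheme."""
    if urlparse(path).scheme:
        return path  # already fully qualified; its own scheme wins
    return default_fs.rstrip("/") + "/" + path.lstrip("/")

# A scheme-less default gets pushed onto the default filesystem (HDFS here):
print(qualify("/home/user/spark-warehouse", "hdfs://namenode:8020"))
# -> hdfs://namenode:8020/home/user/spark-warehouse

# Baking a file: scheme into the default keeps it on the local filesystem:
print(qualify("file:/opt/app/spark-warehouse", "hdfs://namenode:8020"))
# -> file:/opt/app/spark-warehouse
```

Hence the fix below: make the default an explicit local `file:` URI rather than a bare path.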
1 parent 00a2e01 commit 064db17

7 files changed: +22 −45 lines


docs/sql-programming-guide.md

Lines changed: 5 additions & 28 deletions

@@ -857,50 +857,27 @@ access data stored in Hive.
 Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` (for security configuration),
 and `hdfs-site.xml` (for HDFS configuration) file in `conf/`.
 
-<div class="codetabs">
-
-<div data-lang="scala" markdown="1">
-
 When working with Hive, one must instantiate `SparkSession` with Hive support, including
 connectivity to a persistent Hive metastore, support for Hive serdes, and Hive user-defined functions.
 Users who do not have an existing Hive deployment can still enable Hive support. When not configured
 by the `hive-site.xml`, the context automatically creates `metastore_db` in the current directory and
 creates a directory configured by `spark.sql.warehouse.dir`, which defaults to the directory
-`spark-warehouse` in the current directory that the spark application is started. Note that
+`spark-warehouse` in the current directory that the Spark application is started. Note that
 the `hive.metastore.warehouse.dir` property in `hive-site.xml` is deprecated since Spark 2.0.0.
 Instead, use `spark.sql.warehouse.dir` to specify the default location of database in warehouse.
-You may need to grant write privilege to the user who starts the spark application.
+You may need to grant write privilege to the user who starts the Spark application.
 
+<div class="codetabs">
+
+<div data-lang="scala" markdown="1">
 {% include_example spark_hive scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala %}
 </div>
 
 <div data-lang="java" markdown="1">
-
-When working with Hive, one must instantiate `SparkSession` with Hive support, including
-connectivity to a persistent Hive metastore, support for Hive serdes, and Hive user-defined functions.
-Users who do not have an existing Hive deployment can still enable Hive support. When not configured
-by the `hive-site.xml`, the context automatically creates `metastore_db` in the current directory and
-creates a directory configured by `spark.sql.warehouse.dir`, which defaults to the directory
-`spark-warehouse` in the current directory that the spark application is started. Note that
-the `hive.metastore.warehouse.dir` property in `hive-site.xml` is deprecated since Spark 2.0.0.
-Instead, use `spark.sql.warehouse.dir` to specify the default location of database in warehouse.
-You may need to grant write privilege to the user who starts the spark application.
-
 {% include_example spark_hive java/org/apache/spark/examples/sql/hive/JavaSparkHiveExample.java %}
 </div>
 
 <div data-lang="python" markdown="1">
-
-When working with Hive, one must instantiate `SparkSession` with Hive support, including
-connectivity to a persistent Hive metastore, support for Hive serdes, and Hive user-defined functions.
-Users who do not have an existing Hive deployment can still enable Hive support. When not configured
-by the `hive-site.xml`, the context automatically creates `metastore_db` in the current directory and
-creates a directory configured by `spark.sql.warehouse.dir`, which defaults to the directory
-`spark-warehouse` in the current directory that the spark application is started. Note that
-the `hive.metastore.warehouse.dir` property in `hive-site.xml` is deprecated since Spark 2.0.0.
-Instead, use `spark.sql.warehouse.dir` to specify the default location of database in warehouse.
-You may need to grant write privilege to the user who starts the spark application.
-
 {% include_example spark_hive python/sql/hive.py %}
 </div>

examples/src/main/java/org/apache/spark/examples/sql/hive/JavaSparkHiveExample.java

Lines changed: 1 addition & 1 deletion

@@ -56,7 +56,7 @@ public void setValue(String value) {
   public static void main(String[] args) {
     // $example on:spark_hive$
    // warehouseLocation points to the default location for managed databases and tables
-    String warehouseLocation = "file:" + System.getProperty("user.dir") + "spark-warehouse";
+    String warehouseLocation = "spark-warehouse";
     SparkSession spark = SparkSession
       .builder()
       .appName("Java Spark Hive Example")

examples/src/main/python/sql/hive.py

Lines changed: 1 addition & 1 deletion

@@ -34,7 +34,7 @@
 if __name__ == "__main__":
     # $example on:spark_hive$
     # warehouse_location points to the default location for managed databases and tables
-    warehouse_location = 'file:${system:user.dir}/spark-warehouse'
+    warehouse_location = 'spark-warehouse'
 
     spark = SparkSession \
         .builder \

examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala

Lines changed: 1 addition & 1 deletion

@@ -38,7 +38,7 @@ object SparkHiveExample {
 
    // $example on:spark_hive$
    // warehouseLocation points to the default location for managed databases and tables
-    val warehouseLocation = "file:${system:user.dir}/spark-warehouse"
+    val warehouseLocation = "spark-warehouse"
 
    val spark = SparkSession
      .builder()

sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

Lines changed: 2 additions & 1 deletion

@@ -30,6 +30,7 @@ import org.apache.spark.internal.Logging
 import org.apache.spark.internal.config._
 import org.apache.spark.network.util.ByteUnit
 import org.apache.spark.sql.catalyst.CatalystConf
+import org.apache.spark.util.Utils
 
 ////////////////////////////////////////////////////////////////////////////////////////////////////
 // This file defines the configuration options for Spark SQL.
@@ -56,7 +57,7 @@ object SQLConf {
   val WAREHOUSE_PATH = SQLConfigBuilder("spark.sql.warehouse.dir")
     .doc("The default location for managed databases and tables.")
     .stringConf
-    .createWithDefault("${system:user.dir}/spark-warehouse")
+    .createWithDefault(Utils.resolveURI("spark-warehouse").toString)
 
   val OPTIMIZER_MAX_ITERATIONS = SQLConfigBuilder("spark.sql.optimizer.maxIterations")
     .internal()
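The replacement default above, `Utils.resolveURI("spark-warehouse").toString`, turns the bare relative path into a `file:` URI anchored at the JVM's working directory, so it can no longer be re-qualified against HDFS. A rough Python approximation of that resolution (an assumption-labeled sketch, not Spark's actual implementation):

```python
import os
from urllib.parse import urlparse

def resolve_uri(path: str) -> str:
    """Approximation of Utils.resolveURI: leave a path that already carries
    a scheme untouched; resolve anything else against the current working
    directory as a file: URI."""
    if urlparse(path).scheme:
        return path
    return "file:" + os.path.abspath(path)

print(resolve_uri("spark-warehouse"))           # e.g. file:/current/dir/spark-warehouse
print(resolve_uri("hdfs://nn:8020/warehouse"))  # left unchanged
```

Because the scheme is fixed at default-construction time rather than at qualification time, the warehouse stays local regardless of the cluster's default filesystem.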

sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala

Lines changed: 9 additions & 11 deletions

@@ -115,7 +115,7 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
    val catalog = spark.sessionState.catalog
 
    withTempDir { tmpDir =>
-      val path = tmpDir.toString
+      val path = tmpDir.getCanonicalPath
      // The generated temp path is not qualified.
      assert(!path.startsWith("file:/"))
      val uri = tmpDir.toURI
@@ -147,7 +147,7 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
 
  test("Create/Drop Database") {
    withTempDir { tmpDir =>
-      val path = tmpDir.toString
+      val path = tmpDir.getCanonicalPath
      withSQLConf(SQLConf.WAREHOUSE_PATH.key -> path) {
        val catalog = spark.sessionState.catalog
        val databaseNames = Seq("db1", "`database`")
@@ -158,7 +158,7 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
 
          sql(s"CREATE DATABASE $dbName")
          val db1 = catalog.getDatabaseMetadata(dbNameWithoutBackTicks)
-          val expectedLocation = makeQualifiedPath(path + "/" + s"$dbNameWithoutBackTicks.db")
+          val expectedLocation = makeQualifiedPath(s"$path/$dbNameWithoutBackTicks.db")
          assert(db1 == CatalogDatabase(
            dbNameWithoutBackTicks,
            "",
@@ -183,9 +183,7 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
      try {
        sql(s"CREATE DATABASE $dbName")
        val db1 = catalog.getDatabaseMetadata(dbName)
-        val expectedLocation =
-          makeQualifiedPath(s"${System.getProperty("user.dir")}/spark-warehouse" +
-            "/" + s"$dbName.db")
+        val expectedLocation = makeQualifiedPath(s"spark-warehouse/$dbName.db")
        assert(db1 == CatalogDatabase(
          dbName,
          "",
@@ -203,7 +201,7 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
    val catalog = spark.sessionState.catalog
    val databaseNames = Seq("db1", "`database`")
    withTempDir { tmpDir =>
-      val path = new Path(tmpDir.toString).toUri.toString
+      val path = new Path(tmpDir.getCanonicalPath).toUri
      databaseNames.foreach { dbName =>
        try {
          val dbNameWithoutBackTicks = cleanIdentifier(dbName)
@@ -226,7 +224,7 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
 
  test("Create Database - database already exists") {
    withTempDir { tmpDir =>
-      val path = tmpDir.toString
+      val path = tmpDir.getCanonicalPath
      withSQLConf(SQLConf.WAREHOUSE_PATH.key -> path) {
        val catalog = spark.sessionState.catalog
        val databaseNames = Seq("db1", "`database`")
@@ -236,7 +234,7 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
          val dbNameWithoutBackTicks = cleanIdentifier(dbName)
          sql(s"CREATE DATABASE $dbName")
          val db1 = catalog.getDatabaseMetadata(dbNameWithoutBackTicks)
-          val expectedLocation = makeQualifiedPath(path + "/" + s"$dbNameWithoutBackTicks.db")
+          val expectedLocation = makeQualifiedPath(s"$path/$dbNameWithoutBackTicks.db")
          assert(db1 == CatalogDatabase(
            dbNameWithoutBackTicks,
            "",
@@ -269,15 +267,15 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
 
  test("Alter/Describe Database") {
    withTempDir { tmpDir =>
-      val path = tmpDir.toString
+      val path = tmpDir.getCanonicalPath
      withSQLConf(SQLConf.WAREHOUSE_PATH.key -> path) {
        val catalog = spark.sessionState.catalog
        val databaseNames = Seq("db1", "`database`")
 
        databaseNames.foreach { dbName =>
          try {
            val dbNameWithoutBackTicks = cleanIdentifier(dbName)
-            val location = makeQualifiedPath(path + "/" + s"$dbNameWithoutBackTicks.db")
+            val location = makeQualifiedPath(s"$path/$dbNameWithoutBackTicks.db")
 
            sql(s"CREATE DATABASE $dbName")
sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala

Lines changed: 3 additions & 2 deletions

@@ -22,6 +22,7 @@ import org.apache.hadoop.fs.Path
 import org.apache.spark.sql.{QueryTest, Row, SparkSession, SQLContext}
 import org.apache.spark.sql.execution.WholeStageCodegenExec
 import org.apache.spark.sql.test.{SharedSQLContext, TestSQLContext}
+import org.apache.spark.util.Utils
 
 class SQLConfSuite extends QueryTest with SharedSQLContext {
   private val testKey = "test.key.0"
@@ -215,8 +216,8 @@ class SQLConfSuite extends QueryTest with SharedSQLContext {
    try {
      // to get the default value, always unset it
      spark.conf.unset(SQLConf.WAREHOUSE_PATH.key)
-      assert(spark.sessionState.conf.warehousePath
-        === new Path(s"${System.getProperty("user.dir")}/spark-warehouse").toString)
+      assert(new Path(Utils.resolveURI("spark-warehouse")).toString ===
+        spark.sessionState.conf.warehousePath + "/")
    } finally {
      sql(s"set ${SQLConf.WAREHOUSE_PATH}=$original")
    }
