
Commit 314fae2

[SPARK-23434][SQL] Spark should not warn about the metadata directory for an HDFS file path
## What changes were proposed in this pull request?

In a kerberized cluster, when Spark reads a file path (e.g. `people.json`), it emits a misleading warning while looking up `people.json/_spark_metadata`. The root cause is a difference between `LocalFileSystem` and `DistributedFileSystem`: for this lookup, `LocalFileSystem.exists()` returns `false`, but `DistributedFileSystem.exists()` raises `org.apache.hadoop.security.AccessControlException`.

```scala
scala> spark.version
res0: String = 2.4.0-SNAPSHOT

scala> spark.read.json("file:///usr/hdp/current/spark-client/examples/src/main/resources/people.json").show
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

scala> spark.read.json("hdfs:///tmp/people.json")
18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for metadata directory.
18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for metadata directory.
```

After this PR:

```scala
scala> spark.read.json("hdfs:///tmp/people.json").show
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+
```

## How was this patch tested?

Manual.

Author: Dongjoon Hyun <[email protected]>

Closes #20616 from dongjoon-hyun/SPARK-23434.
1 parent 9bd25c9 commit 314fae2


1 file changed (+5, -3 lines)


sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala

Lines changed: 5 additions & 3 deletions
```diff
@@ -42,9 +42,11 @@ object FileStreamSink extends Logging {
         try {
           val hdfsPath = new Path(singlePath)
           val fs = hdfsPath.getFileSystem(hadoopConf)
-          val metadataPath = new Path(hdfsPath, metadataDir)
-          val res = fs.exists(metadataPath)
-          res
+          if (fs.isDirectory(hdfsPath)) {
+            fs.exists(new Path(hdfsPath, metadataDir))
+          } else {
+            false
+          }
         } catch {
           case NonFatal(e) =>
             logWarning(s"Error while looking for metadata directory.")
```
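To see why the `isDirectory` guard removes the warning, the lookup can be modeled without Hadoop or Spark. This is a minimal sketch under stated assumptions: `FakeFs`, `Lookup`, and `MetadataLookup` are hypothetical names, not Hadoop or Spark APIs. `FakeFs` stands in for the two `FileSystem` methods the change uses (`isDirectory` and `exists`) and throws a `SecurityException` as a stand-in for the `AccessControlException` that the commit message reports from `DistributedFileSystem`. `before` and `after` mirror the removed and added lines of the hunk, but return whether the warning branch would fire instead of logging.

```scala
import scala.util.control.NonFatal

// Hypothetical in-memory filesystem (not a Hadoop class). `files` and `dirs`
// hold absolute paths. With throwUnderFile = true it mimics the HDFS behavior
// described in the commit message: probing a path beneath a regular file throws.
class FakeFs(files: Set[String], dirs: Set[String], throwUnderFile: Boolean) {
  private def parentOf(path: String): String = path.substring(0, path.lastIndexOf('/'))

  def isDirectory(path: String): Boolean = dirs.contains(path)

  def exists(path: String): Boolean = {
    if (throwUnderFile && files.contains(parentOf(path)))
      // Stand-in for org.apache.hadoop.security.AccessControlException.
      throw new SecurityException(s"${parentOf(path)} is not a directory")
    files.contains(path) || dirs.contains(path)
  }
}

// Outcome of one lookup: whether metadata was found, and whether the
// "Error while looking for metadata directory." warning would be logged.
case class Lookup(found: Boolean, warned: Boolean)

object MetadataLookup {
  val metadataDir = "_spark_metadata"

  // Mirrors the removed lines: always probe <path>/_spark_metadata.
  def before(fs: FakeFs, path: String): Lookup =
    try Lookup(fs.exists(s"$path/$metadataDir"), warned = false)
    catch { case NonFatal(_) => Lookup(found = false, warned = true) }

  // Mirrors the added lines: only probe when the path is a directory.
  def after(fs: FakeFs, path: String): Lookup =
    try {
      if (fs.isDirectory(path)) Lookup(fs.exists(s"$path/$metadataDir"), warned = false)
      else Lookup(found = false, warned = false)
    } catch { case NonFatal(_) => Lookup(found = false, warned = true) }
}

// Usage: a plain JSON file on a local-like and an HDFS-like filesystem,
// plus a hypothetical streaming sink output directory.
val files = Set("/tmp/people.json")
val localLike = new FakeFs(files, Set("/tmp"), throwUnderFile = false)
val hdfsLike = new FakeFs(files, Set("/tmp"), throwUnderFile = true)
val sinkOut = new FakeFs(Set.empty, Set("/out", "/out/_spark_metadata"), throwUnderFile = true)

println(MetadataLookup.before(localLike, "/tmp/people.json")) // Lookup(false,false)
println(MetadataLookup.before(hdfsLike, "/tmp/people.json"))  // Lookup(false,true)
println(MetadataLookup.after(hdfsLike, "/tmp/people.json"))   // Lookup(false,false)
println(MetadataLookup.after(sinkOut, "/out"))                // Lookup(true,false)
```

In this model, checking `isDirectory` first means a plain file path never reaches the `exists` probe that the HDFS-like filesystem rejects, while a directory written by a streaming sink is still probed for `_spark_metadata` as before.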
