Commit 23a79ad

cloud-fan authored and Robert Kruszewski committed

[SPARK-23390][SQL] Flaky Test Suite: FileBasedDataSourceSuite in Spark 2.3/hadoop 2.7
## What changes were proposed in this pull request?

This test only fails with sbt on Hadoop 2.7. I can't reproduce it locally, but here is my speculation from reading the code:

1. `FileSystem.delete` doesn't delete the directory entirely, so somehow we can still open the file as a 0-length empty file (just speculation).
2. ORC intentionally allows empty files, and the reader fails during reading without closing the file stream.

This PR improves the test to make sure all files are deleted and can't be opened.

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes apache#20584 from cloud-fan/flaky-test.
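For context, the fix's core pattern can be sketched in isolation: list the data files under a directory, delete each one explicitly, then prove that none of them can be reopened. Below is a minimal, standalone sketch of that pattern against the Hadoop `FileSystem` API; the `VerifyDeletedSketch` object, the `verifyDeleted` helper, and the `/tmp/verify-deleted-sketch` path are invented for illustration and are not part of the patch.

```scala
import java.io.FileNotFoundException

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object VerifyDeletedSketch {
  // Delete every data file under `dir`, then assert that none of them can be
  // reopened -- a 0-length "ghost" read is exactly the failure mode the
  // commit message speculates about.
  def verifyDeleted(fs: FileSystem, dir: Path): Unit = {
    // Snapshot the data files before anything is deleted.
    val files = fs.listStatus(dir).filter(_.isFile).map(_.getPath)
    // Delete each file individually (non-recursive), then the directory tree.
    files.foreach(f => fs.delete(f, false))
    assert(fs.delete(dir, true), s"failed to delete $dir")
    // Opening any deleted file must now fail with FileNotFoundException.
    for (f <- files) {
      val stillReadable =
        try { fs.open(f).close(); true }
        catch { case _: FileNotFoundException => false }
      assert(!stillReadable, s"expected $f to be gone, but it could be opened")
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()                    // local filesystem by default
    val dir = new Path("/tmp/verify-deleted-sketch")  // illustrative scratch path
    val fs = dir.getFileSystem(conf)
    fs.mkdirs(dir)
    fs.create(new Path(dir, "part-00000")).close()    // stand-in for a data file
    verifyDeleted(fs, dir)
    println("all files verified deleted")
  }
}
```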
1 parent 777ea40 · commit 23a79ad

File tree

1 file changed (+13, -1)

sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala

Lines changed: 13 additions & 1 deletion
@@ -17,6 +17,8 @@
 
 package org.apache.spark.sql
 
+import java.io.FileNotFoundException
+
 import org.apache.hadoop.fs.Path
 
 import org.apache.spark.SparkException
@@ -102,17 +104,27 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext {
   def testIgnoreMissingFiles(): Unit = {
     withTempDir { dir =>
       val basePath = dir.getCanonicalPath
+
       Seq("0").toDF("a").write.format(format).save(new Path(basePath, "first").toString)
       Seq("1").toDF("a").write.format(format).save(new Path(basePath, "second").toString)
+
       val thirdPath = new Path(basePath, "third")
+      val fs = thirdPath.getFileSystem(spark.sparkContext.hadoopConfiguration)
       Seq("2").toDF("a").write.format(format).save(thirdPath.toString)
+      val files = fs.listStatus(thirdPath).filter(_.isFile).map(_.getPath)
+
       val df = spark.read.format(format).load(
         new Path(basePath, "first").toString,
         new Path(basePath, "second").toString,
         new Path(basePath, "third").toString)
 
-      val fs = thirdPath.getFileSystem(spark.sparkContext.hadoopConfiguration)
+      // Make sure all data files are deleted and can't be opened.
+      files.foreach(f => fs.delete(f, false))
       assert(fs.delete(thirdPath, true))
+      for (f <- files) {
+        intercept[FileNotFoundException](fs.open(f))
+      }
+
       checkAnswer(df, Seq(Row("0"), Row("1")))
     }
   }
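A note on the test idiom in the new lines: ScalaTest's `intercept[FileNotFoundException](fs.open(f))` asserts that the body throws exactly that exception type (failing the test otherwise) and returns the caught exception. The per-file `fs.delete(f, false)` before the recursive `fs.delete(thirdPath, true)` makes the test fail loudly if any individual data file survives deletion. Here is a minimal, self-contained sketch of the `intercept` idiom, assuming ScalaTest 3.1+ (where the base class is `AnyFunSuite`; older versions use `org.scalatest.FunSuite`) and an invented suite name:

```scala
import java.io.{FileInputStream, FileNotFoundException}

import org.scalatest.funsuite.AnyFunSuite

class InterceptIdiomSuite extends AnyFunSuite {
  test("opening a missing file throws FileNotFoundException") {
    // intercept fails the test unless the body throws the named exception,
    // and hands the caught exception back for further assertions.
    val e = intercept[FileNotFoundException] {
      new FileInputStream("/no/such/file").close()
    }
    assert(e.getMessage != null)
  }
}
```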
