Revert "[SPARK-10216][SQL] Avoid creating empty files during overwriting with group by query" #13181
Conversation
This reverts commit 8d05a7a.
Test build #58818 has finished for PR 13181 at commit

test this please

Test build #58821 has finished for PR 13181 at commit

test this please

Hmm, this might be failing tests? @HyukjinKwon, can you investigate if it fails again?

@marmbrus Sure, I will.

Test build #58829 has finished for PR 13181 at commit

Hi @marmbrus, it seems okay!
@marmbrus I tested and could reproduce the exceptions for reading described in https://issues.apache.org/jira/browse/SPARK-15393, but it seems this PR might not be the reason. I tested the code below on c0c3ec3 (right before this PR) and on the master branch:

  test("SPARK-15393: create empty file") {
    withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "10") {
      withTempPath { path =>
        val schema = StructType(
          StructField("k", StringType, true) ::
          StructField("v", IntegerType, false) :: Nil)
        val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
        emptyDf.write
          .format("parquet")
          .save(path.getCanonicalPath)
        val copyEmptyDf = spark.read
          .format("parquet")
          .load(path.getCanonicalPath)
        copyEmptyDf.show()
      }
    }
  }

and it seems both produce the exception below:

  org.apache.spark.sql.AnalysisException: Unable to infer schema for ParquetFormat at /private/var/folders/9j/gf_c342d7d150mwrxvkqnc180000gn/T/spark-98dfbe86-afca-413e-9be7-46ff18bac443. It must be specified manually;
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$16.apply(DataSource.scala:324)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$16.apply(DataSource.scala:324)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:323)

I will try to figure out why, but I don't mind reverting this if you think my PR is problematic in any way. I can fix both issues together later anyway.
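Since the AnalysisException asks for the schema to be specified manually, one possible workaround while this is being looked into might be to pass an explicit schema to the reader. This is only a sketch that reuses the spark, path, and schema values from the test above; it sidesteps schema inference rather than fixing the empty-output behavior itself.

  // Possible workaround sketch (not a fix): supply the schema explicitly so the
  // Parquet reader does not need to infer it from an empty output directory.
  // spark, path, and schema refer to the values defined in the test above.
  val copyWithSchema = spark.read
    .format("parquet")
    .schema(schema)
    .load(path.getCanonicalPath)
  copyWithSchema.show()  // should print an empty result if inference was the only problem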
Interesting, I'm currently working with a custom build where I've reverted the PR manually to work around the issue. I will add a test case to the JIRA.

@jurriaan Maybe I am doing something wrong. I will tell you after testing the one you add to the JIRA.

I'm going to go ahead and merge this, but please ping me on follow-up issues that try to add this back.
Revert "[SPARK-10216][SQL] Avoid creating empty files during overwriting with group by query"

This reverts commit 8d05a7a from #12855, which seems to have caused regressions when working with empty DataFrames.

Author: Michael Armbrust <[email protected]>

Closes #13181 from marmbrus/revert12855.

(cherry picked from commit 2ba3ff0)
Signed-off-by: Michael Armbrust <[email protected]>
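For context, here is a minimal, hypothetical sketch of how one might inspect what an empty-DataFrame write leaves on disk; the object name, output path, and local SparkSession setup are assumptions, not part of this PR. The commit message above suggests the regression is that no data files are produced for empty DataFrames, leaving the Parquet reader nothing to infer a schema from.

  import java.io.File

  import org.apache.spark.sql.{Row, SparkSession}
  import org.apache.spark.sql.types._

  // Hypothetical standalone check, not from this PR: write an empty DataFrame
  // and list what ends up in the output directory.
  object InspectEmptyWrite {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().master("local[2]").appName("inspect-empty-write").getOrCreate()

      val schema = StructType(
        StructField("k", StringType, nullable = true) ::
        StructField("v", IntegerType, nullable = false) :: Nil)
      val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

      // Made-up output location, used only for this sketch.
      val out = new File(System.getProperty("java.io.tmpdir"), "spark-empty-write-check")
      emptyDf.write.mode("overwrite").parquet(out.getAbsolutePath)

      // If only _SUCCESS (and no part-*.parquet files) shows up here, the read
      // side has no footer to infer a schema from, matching the AnalysisException above.
      Option(out.listFiles()).getOrElse(Array.empty[File]).map(_.getName).sorted.foreach(println)

      spark.stop()
    }
  }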