Revert "[SPARK-10216][SQL] Avoid creating empty files during overwriting with group by query" #13181
Conversation
This reverts commit 8d05a7a.
Test build #58818 has finished for PR 13181 at commit

test this please

Test build #58821 has finished for PR 13181 at commit

test this please

Hmm, this might be failing tests? @HyukjinKwon, can you investigate if it fails again?

@marmbrus Sure, I will.

Test build #58829 has finished for PR 13181 at commit

Hi @marmbrus, it seems okay!
@marmbrus I tested and could reproduce the exceptions for reading described in https://issues.apache.org/jira/browse/SPARK-15393, but it seems this PR might not be the reason. I tested the code below on c0c3ec3 (right before this PR) and on the master branch:

  test("SPARK-15393: create empty file") {
    withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "10") {
      withTempPath { path =>
        val schema = StructType(
          StructField("k", StringType, true) ::
          StructField("v", IntegerType, false) :: Nil)
        val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
        emptyDf.write
          .format("parquet")
          .save(path.getCanonicalPath)
        val copyEmptyDf = spark.read
          .format("parquet")
          .load(path.getCanonicalPath)
        copyEmptyDf.show()
      }
    }
  }

and it seems both produce the exception below:

  org.apache.spark.sql.AnalysisException: Unable to infer schema for ParquetFormat at /private/var/folders/9j/gf_c342d7d150mwrxvkqnc180000gn/T/spark-98dfbe86-afca-413e-9be7-46ff18bac443. It must be specified manually;
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$16.apply(DataSource.scala:324)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$16.apply(DataSource.scala:324)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:323)

I will try to figure out why, but I don't mind reverting this if you think my PR is problematic in any way. I can fix both issues together later anyway.
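Since the AnalysisException asks for the schema to be specified manually, one possible workaround while this is being looked into might be to pass an explicit schema to the reader. This is only a sketch that reuses the spark, path, and schema values from the test above; it sidesteps schema inference rather than fixing the empty-output behavior itself.

  // Possible workaround sketch (not a fix): supply the schema explicitly so the
  // Parquet reader does not need to infer it from an empty output directory.
  // spark, path, and schema refer to the values defined in the test above.
  val copyWithSchema = spark.read
    .format("parquet")
    .schema(schema)
    .load(path.getCanonicalPath)
  copyWithSchema.show()  // should print an empty result if inference was the only problem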
Interesting, I'm currently working with a custom build where I've reverted the PR manually to work around the issue. I will add a test case to the JIRA.

@jurriaan Maybe I am doing something wrong. I will tell you after testing the one you add to the JIRA.

I'm going to go ahead and merge this, but please ping me on follow-up issues that try to add this back.
Revert "[SPARK-10216][SQL] Avoid creating empty files during overwriting with group by query"

This reverts commit 8d05a7a from #12855, which seems to have caused regressions when working with empty DataFrames.

Author: Michael Armbrust <[email protected]>

Closes #13181 from marmbrus/revert12855.

(cherry picked from commit 2ba3ff0)
Signed-off-by: Michael Armbrust <[email protected]>
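For context, here is a minimal, hypothetical sketch of how one might inspect what an empty-DataFrame write leaves on disk; the object name, output path, and local SparkSession setup are assumptions, not part of this PR. The commit message above suggests the regression is that no data files are produced for empty DataFrames, leaving the Parquet reader nothing to infer a schema from.

  import java.io.File

  import org.apache.spark.sql.{Row, SparkSession}
  import org.apache.spark.sql.types._

  // Hypothetical standalone check, not from this PR: write an empty DataFrame
  // and list what ends up in the output directory.
  object InspectEmptyWrite {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().master("local[2]").appName("inspect-empty-write").getOrCreate()

      val schema = StructType(
        StructField("k", StringType, nullable = true) ::
        StructField("v", IntegerType, nullable = false) :: Nil)
      val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

      // Made-up output location, used only for this sketch.
      val out = new File(System.getProperty("java.io.tmpdir"), "spark-empty-write-check")
      emptyDf.write.mode("overwrite").parquet(out.getAbsolutePath)

      // If only _SUCCESS (and no part-*.parquet files) shows up here, the read
      // side has no footer to infer a schema from, matching the AnalysisException above.
      Option(out.listFiles()).getOrElse(Array.empty[File]).map(_.getName).sorted.foreach(println)

      spark.stop()
    }
  }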