
Conversation

@cfmcgrady
Contributor

What changes were proposed in this pull request?

Saving an empty DataFrame with partitions should write a metadata-only file, making this behavior consistent with saving a non-partitioned DataFrame (see PR-20525).

// create an empty DataFrame that still carries a schema
import org.apache.spark.sql.SaveMode
import spark.implicits._

val inputDF = Seq(
  ("value1", "value2", "partition1"),
  ("value3", "value4", "partition2"))
  .toDF("some_column_1", "some_column_2", "some_partition_column_1")
  .where("1 == 2") // filter out every row, keeping only the schema

// write dataframe into partitions
inputDF.write
  .partitionBy("some_partition_column_1")
  .mode(SaveMode.Overwrite)
  .parquet("/tmp/parquet/t1")


// read the DataFrame back
val readDF = spark.read.parquet("/tmp/parquet/t1")

Before this PR, reading the directory back throws an AnalysisException, because only the _SUCCESS marker file is written:

$ tree /tmp/parquet/t1
/tmp/parquet/t1
└── _SUCCESS

0 directories, 1 file

After this PR:

$ tree /tmp/parquet/t1
/tmp/parquet/t1
├── _SUCCESS
└── some_partition_column_1=__HIVE_DEFAULT_PARTITION__
    └── part-00000-2a29f11e-64fb-450d-8916-91ccac53476c.c000.snappy.parquet

1 directory, 2 files
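The `__HIVE_DEFAULT_PARTITION__` directory name in the tree above is Hive's placeholder for a null (or, as here, absent) partition value. As a rough illustration of how such a directory name maps back to a column/value pair, here is a simplified, self-contained sketch; Spark's actual partition-path parser additionally handles URL-escaped values, type inference, and nested partition directories:

```scala
// Decode a Hive-style partition directory name such as
// "some_partition_column_1=__HIVE_DEFAULT_PARTITION__" into a
// (column, Option[value]) pair. Illustrative sketch only, not
// Spark's real implementation.
object PartitionPath {
  val HiveDefaultPartition = "__HIVE_DEFAULT_PARTITION__"

  def decode(dirName: String): (String, Option[String]) = {
    // split on the first '=' only, so values containing '=' survive
    val Array(column, rawValue) = dirName.split("=", 2)
    val value =
      if (rawValue == HiveDefaultPartition) None // null / missing value
      else Some(rawValue)
    (column, value)
  }
}
```

For example, `PartitionPath.decode("some_partition_column_1=__HIVE_DEFAULT_PARTITION__")` yields `("some_partition_column_1", None)`, which is why the read side can still recover the partition column in the schema even though no row supplied a value for it.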

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New tests.

@github-actions github-actions bot added the SQL label Jun 6, 2021
@AmplabJenkins

Can one of the admins verify this patch?

@cfmcgrady cfmcgrady closed this Jun 6, 2021
