Open
Labels
bug, good first issue, help wanted, regression
Description
Describe the bug
Consider a snippet like this:
    df.write_parquet(
        "dir/data",
        DataFrameWriteOptions::new().with_single_file_output(true),
        None,
    ).await

Before v43 this would write a single file called data, but in v43 it creates data as a directory containing one or more randomly named files.
This seems to be related to #13079 (cc @dhegberg), which added an extension-based heuristic.
I see this as a regression: single file output is requested explicitly, so I don't want a heuristic to be applied.
We are using Parquet files with a content-addressable file system and our files don't have extensions.
To Reproduce
See above
Expected behavior
Considering the introduction of the extension-based heuristic I would suggest the following behavior:
- with_single_file_output is not called (single_file_output == None): apply the heuristic
- with_single_file_output(true): produce a single file at the exact path specified
- with_single_file_output(false): create a directory under the specified path if it doesn't exist and write one or many files
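
The three cases above could be sketched roughly as follows. This is only an illustration of the proposed decision logic, not DataFusion's actual code: `OutputMode` and `resolve_output_mode` are hypothetical names, and the heuristic shown (a path with a file extension and no trailing slash means a single file) is an assumption about what #13079 does.

```rust
use std::path::Path;

/// Hypothetical type for illustration: where the writer puts its output.
#[derive(Debug, PartialEq)]
enum OutputMode {
    /// Write exactly one file at the given path.
    SingleFile,
    /// Treat the path as a directory and write one or many files into it.
    Directory,
}

/// Hypothetical helper: resolve the output mode from a tri-state option.
/// An explicit `Some(..)` always wins; the heuristic only runs for `None`.
fn resolve_output_mode(single_file_output: Option<bool>, path: &str) -> OutputMode {
    match single_file_output {
        Some(true) => OutputMode::SingleFile,
        Some(false) => OutputMode::Directory,
        None => {
            // Assumed heuristic: a trailing slash means a directory;
            // otherwise a file extension means a single file.
            let has_extension = Path::new(path).extension().is_some();
            if !path.ends_with('/') && has_extension {
                OutputMode::SingleFile
            } else {
                OutputMode::Directory
            }
        }
    }
}
```

With this shape, `resolve_output_mode(Some(true), "dir/data")` selects a single file even though the path has no extension, which is the behavior needed for extension-less, content-addressed file names.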
Additional context