Description
Describe the bug
DataFrame::write_parquet takes the output path as a string. It used to create a directory at that path and write files into it, even if the path did not end with /. Starting with v36, it writes a single file when the path does not end with /.
To Reproduce
(using the Python library)
from pathlib import Path
import shutil

import datafusion
import pyarrow.dataset

target_path = Path("/tmp/") / "dataset"

# clean up any existing data at /tmp/dataset
if target_path.is_file():
    target_path.unlink()
elif target_path.is_dir():
    shutil.rmtree(target_path)
assert not target_path.exists()

ctx = datafusion.SessionContext()
ctx.sql("SELECT 1").write_parquet(str(target_path))
assert target_path.exists()

if target_path.is_file():
    print(f"{target_path} is a file")
else:
    print(
        f"{target_path} is a directory with entries:",
        ", ".join(map(str, target_path.iterdir()))
    )

# clean up /tmp/dataset
if target_path.is_file():
    target_path.unlink()
elif target_path.is_dir():
    shutil.rmtree(target_path)
assert not target_path.exists()

v36 behavior:
/tmp/dataset is a file
Expected behavior
v35 behavior:
/tmp/dataset is a directory with entries: /tmp/dataset/u6mIvvxwwFbjo17Q_0.parquet
Additional context
It is common to use the pathlib module in Python when working with paths, and it always strips the trailing /, so str(target_path) never ends with / and the v36 behavior always produces a single file.
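The pathlib behavior mentioned above can be checked in isolation, without datafusion. The sketch below also shows one possible workaround on the caller side, assuming the v36 rule really is "a path ending with / means directory" as described in this report. The helper name as_directory_path is hypothetical and is not part of datafusion or pathlib.

```python
from pathlib import PurePosixPath


def as_directory_path(path) -> str:
    """Hypothetical helper: return `path` as a string ending in exactly one "/"."""
    return str(path).rstrip("/") + "/"


# pathlib drops the trailing slash when the path is normalized,
# so the "directory" hint is lost before it reaches write_parquet.
stripped = str(PurePosixPath("/tmp/dataset/"))
print(stripped)  # /tmp/dataset

# Re-append the slash explicitly right before handing the path over.
print(as_directory_path(PurePosixPath("/tmp/dataset")))  # /tmp/dataset/
```

With this helper, the reproduction above would call write_parquet(as_directory_path(target_path)) instead of write_parquet(str(target_path)). Whether that restores the v35 directory output depends on the trailing-slash rule holding as reported here.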