Describe the bug
Saving a dataset with .to_json() fails with a ValueError since the latest pandas release (2.1.0).
The pandas 2.1.0 release notes include:
Improved error handling when using DataFrame.to_json() with incompatible index and orient arguments (GH 52143)
i.e. an error is now raised for invalid combinations of index and orient.
Unfortunately, this means that the custom logic at this line can produce a contradictory combination:
index = self.to_json_kwargs.pop("index", False if orient in ["split", "table"] else True)
For example, in the default case orient="records" leads to index=True, which now raises a ValueError.
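One possible direction for a fix, sketched here as an assumption rather than as the actual datasets implementation: only forward index to pandas when the caller set it explicitly or when orient is "split" or "table", and otherwise leave it out so that pandas applies its own default for that orient. The helper name build_index_kwargs is hypothetical.

```python
# Hypothetical helper (not part of `datasets`): decide whether to forward
# `index` to DataFrame.to_json() instead of always defaulting it to True.
def build_index_kwargs(orient, to_json_kwargs):
    """Return {"index": ...} to pass to pandas, or {} to keep pandas' default."""
    if "index" in to_json_kwargs:
        # The caller chose a value explicitly; pass it through unchanged and
        # let pandas validate it against `orient`.
        return {"index": to_json_kwargs["index"]}
    if orient in ("split", "table"):
        # Preserve the existing behaviour of dropping the index for these orients.
        return {"index": False}
    # For "records", "values", "index" and "columns", omit the argument so that
    # index=True is never forced onto an orient that rejects it.
    return {}


print(build_index_kwargs("records", {}))
print(build_index_kwargs("split", {}))
print(build_index_kwargs("records", {"index": True}))
```

With this approach the default orient="records" case no longer passes index=True, while an explicit but invalid index=True from the caller still surfaces the pandas ValueError instead of being silently changed.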
Steps to reproduce the bug
import datasets

if __name__ == '__main__':
    dataset = datasets.Dataset.from_dict({"A": [1, 2, 3], "B": [4, 5, 6]})
    dataset.to_json("dataset.json")
>>>
ValueError: 'index=True' is only valid when 'orient' is 'split', 'table', 'index', or 'columns'.
Expected behavior
The dataset is successfully saved as a .json file.
Environment info
python >= 3.9
pandas >= 2.1.0