Commit 92cad2a

xinrong-meng authored and dongjoon-hyun committed
[SPARK-49716][PS][DOCS][TESTS] Fix documentation and add test of barh plot
### What changes were proposed in this pull request?

- Update the documentation for barh plot to clarify the difference between axis interpretation in Plotly and Matplotlib.
- Test multiple columns as value axis.

The parameter difference is demonstrated as below.

```py
>>> df = ps.DataFrame({'lab': ['A', 'B', 'C'], 'val': [10, 30, 20]})
>>> df.plot.barh(x='val', y='lab').show()  # plot1

>>> ps.set_option('plotting.backend', 'matplotlib')
>>> import matplotlib.pyplot as plt
>>> df.plot.barh(x='lab', y='val')
>>> plt.show()  # plot2
```

plot1
![newplot (5)](https://github.com/user-attachments/assets/f1b6fabe-9509-41bb-8cfb-0733f65f1643)

plot2
![Figure_1](https://github.com/user-attachments/assets/10e1b65f-6116-4490-9956-29e1fbf0c053)

### Why are the changes needed?

The barh plot’s x and y axis behavior differs between Plotly and Matplotlib, which may confuse users. The updated documentation and tests help ensure clarity and prevent misinterpretation.

### Does this PR introduce _any_ user-facing change?

No. Doc change only.

### How was this patch tested?

Unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48161 from xinrong-meng/ps_barh.

Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent f0fb0c8 commit 92cad2a

2 files changed: 13 additions, 5 deletions

python/pyspark/pandas/plot/core.py

Lines changed: 10 additions & 3 deletions
@@ -756,10 +756,10 @@ def barh(self, x=None, y=None, **kwargs):

         Parameters
         ----------
-        x : label or position, default DataFrame.index
-            Column to be used for categories.
-        y : label or position, default All numeric columns in dataframe
+        x : label or position, default All numeric columns in dataframe
             Columns to be plotted from the DataFrame.
+        y : label or position, default DataFrame.index
+            Column to be used for categories.
         **kwds
             Keyword arguments to pass on to
             :meth:`pyspark.pandas.DataFrame.plot` or :meth:`pyspark.pandas.Series.plot`.
@@ -770,6 +770,13 @@ def barh(self, x=None, y=None, **kwargs):
             Return an custom object when ``backend!=plotly``.
             Return an ndarray when ``subplots=True`` (matplotlib-only).

+        Notes
+        -----
+        In Plotly and Matplotlib, the interpretation of `x` and `y` for `barh` plots differs.
+        In Plotly, `x` refers to the values and `y` refers to the categories.
+        In Matplotlib, `x` refers to the categories and `y` refers to the values.
+        Ensure correct axis labeling based on the backend used.
+
         See Also
         --------
         plotly.express.bar : Plot a vertical bar plot using plotly.
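
The backend rule stated in the new Notes section can be sketched as a small lookup. This is only an illustration, not code from this commit: `barh_axis_kwargs` is a hypothetical helper that does not exist in pyspark, and it assumes the two backend names `plotly` and `matplotlib` used in the commit message.

```python
def barh_axis_kwargs(backend, values, categories):
    """Hypothetical helper (not part of pyspark).

    Returns the x/y keyword arguments for ``DataFrame.plot.barh``
    following the rule in the Notes section of the docstring.
    """
    if backend == "plotly":
        # Plotly: x holds the values, y holds the categories.
        return {"x": values, "y": categories}
    if backend == "matplotlib":
        # Matplotlib: x holds the categories, y holds the values.
        return {"x": categories, "y": values}
    raise ValueError(f"unsupported backend: {backend!r}")


# Same column names as the example in the commit message.
plotly_kwargs = barh_axis_kwargs("plotly", values="val", categories="lab")
matplotlib_kwargs = barh_axis_kwargs("matplotlib", values="val", categories="lab")
print(plotly_kwargs)
print(matplotlib_kwargs)
```

With a pandas-on-Spark DataFrame `df`, the result could then be passed along as `df.plot.barh(**barh_axis_kwargs(ps.get_option("plotting.backend"), "val", "lab"))`, which matches the two calls shown in the commit message.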

python/pyspark/pandas/tests/plot/test_frame_plot_plotly.py

Lines changed: 3 additions & 2 deletions
@@ -105,9 +105,10 @@ def check_barh_plot_with_x_y(pdf, psdf, x, y):
             self.assertEqual(pdf.plot.barh(x=x, y=y), psdf.plot.barh(x=x, y=y))

         # this is testing plot with specified x and y
-        pdf1 = pd.DataFrame({"lab": ["A", "B", "C"], "val": [10, 30, 20]})
+        pdf1 = pd.DataFrame({"lab": ["A", "B", "C"], "val": [10, 30, 20], "val2": [1.1, 2.2, 3.3]})
         psdf1 = ps.from_pandas(pdf1)
-        check_barh_plot_with_x_y(pdf1, psdf1, x="lab", y="val")
+        check_barh_plot_with_x_y(pdf1, psdf1, x="val", y="lab")
+        check_barh_plot_with_x_y(pdf1, psdf1, x=["val", "val2"], y="lab")

     def test_barh_plot(self):
         def check_barh_plot(pdf, psdf):
