[SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for Rows
This PR is meant to replace #20503, which lay dormant for a while. The solution in the original PR is still valid, so this is just that patch rebased onto the current master.
Original summary follows.
## What changes were proposed in this pull request?
Fix `__repr__` behaviour for Rows.
`Row.__repr__` assumes every value is a string when column names are missing.
Examples:
```
>>> from pyspark.sql.types import Row
>>> Row("Alice", "11")
<Row(Alice, 11)>
>>> Row(name="Alice", age=11)
Row(age=11, name='Alice')
>>> Row("Alice", 11)
<snip stack trace>
TypeError: sequence item 1: expected string, int found
```
This is because `Row()`, when called without column names, assumes every value is a string and joins the values directly, which fails as soon as one of them is not a string.
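The failure can be reproduced without Spark. The following is a minimal sketch, not the actual patch: `BrokenRow` and `FixedRow` are hypothetical stand-ins for a `Row` created without column names, and the assumption is that the original `__repr__` passed the raw values to `str.join`, which accepts only strings.

```python
class BrokenRow(tuple):
    """Hypothetical stand-in for a Row without column names (not PySpark code)."""

    def __new__(cls, *values):
        # Store the positional values as the tuple's contents.
        return super().__new__(cls, values)

    def __repr__(self):
        # str.join requires every item to be a str, so any int,
        # float, or None in the row raises TypeError here.
        return "<Row(%s)>" % ", ".join(self)


class FixedRow(BrokenRow):
    """Same hypothetical row, converting each value to str before joining."""

    def __repr__(self):
        return "<Row(%s)>" % ", ".join(str(value) for value in self)


if __name__ == "__main__":
    print(repr(FixedRow("Alice", 11)))
    try:
        repr(BrokenRow("Alice", 11))
    except TypeError as exc:
        print("TypeError:", exc)
```

Converting each value with `str` keeps the existing output for all-string rows unchanged while allowing mixed types. Whether the real patch uses `str`, `repr`, or `%s` formatting is not stated in this description.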
## How was this patch tested?
Tested manually; a unit test was also added to `python/pyspark/sql/tests/test_types.py`.
Closes #24448 from tbcs/SPARK-23299.
Lead-authored-by: Tibor Csögör <[email protected]>
Co-authored-by: Shashwat Anand <[email protected]>
Signed-off-by: Holden Karau <[email protected]>