
Conversation

@zhengruifeng
Contributor

@zhengruifeng zhengruifeng commented Sep 19, 2024

What changes were proposed in this pull request?

Refine the string representation of timedelta by following the ISO-8601 format.
Note that the units used on the JVM side (java.time.Duration) and in Pandas differ.
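
For illustration, a minimal sketch of where the two spellings come from. It assumes the Connect side formats via pandas.Timedelta.isoformat(), which the pandas import in the change under review suggests:

import datetime
import pandas as pd

delta = datetime.timedelta(days=1, seconds=1)

# pandas spells the interval with day-based units:
print(pd.Timedelta(delta).isoformat())   # P1DT0H0M1S

# java.time.Duration on the JVM side spells the same interval with
# time-based units only: PT24H1S (24 hours rather than 1 day).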

Why are the changes needed?

The string representation should not leak the raw internal value (the interval in microseconds); it should be a readable ISO-8601 duration instead.

Does this PR introduce any user-facing change?

Yes. The string representation of timedelta literals in PySpark Connect changes, as shown below.

PySpark Classic:

In [1]: from pyspark.sql import functions as sf

In [2]: import datetime

In [3]: sf.lit(datetime.timedelta(1, 1))
Out[3]: Column<'PT24H1S'>

PySpark Connect (before):

In [1]: from pyspark.sql import functions as sf

In [2]: import datetime

In [3]: sf.lit(datetime.timedelta(1, 1))
Out[3]: Column<'86401000000'>

PySpark Connect (after):

In [1]: from pyspark.sql import functions as sf

In [2]: import datetime

In [3]: sf.lit(datetime.timedelta(1, 1))
Out[3]: Column<'P1DT0H0M1S'>

How was this patch tested?

Added a unit test; it runs under both PySpark Classic and PySpark Connect.

Was this patch authored or co-authored using generative AI tooling?

No.

delta = DayTimeIntervalType().fromInternal(self._value)
if delta is not None and isinstance(delta, datetime.timedelta):
    try:
        import pandas as pd
Member

Spark Connect requires pyarrow/pandas IIRC, so you won't need to handle exceptions here, if that's what you're covering.

Contributor Author

Yeah, pandas is a mandatory dependency for Connect; let me remove this try-catch.

s = str(sf.lit(delta))

# Strip the "Column<'" prefix and "'>" suffix, parse the remaining
# ISO-8601 string with pandas, and compare against the original value.
self.assertTrue(pd.Timedelta(s[8:-2]).to_pytimedelta() == delta)
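
For reference, a standalone sketch of what the s[8:-2] slice extracts (the string literal is copied from the Connect example above):

s = "Column<'P1DT0H0M1S'>"  # what str(sf.lit(delta)) renders after this PR
iso = s[8:-2]               # drop the 8-char "Column<'" prefix and the 2-char "'>" suffix
assert iso == "P1DT0H0M1S"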
Member

I think it should be a Connect-specific test?

Contributor Author

It also works for PySpark Classic.

Classic also uses an ISO-8601 string, but the JVM side and Pandas apply different units.

A string representation from the JVM side can also be parsed by Pandas, as sketched below.
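
A minimal sketch of that cross-parsing claim, with values taken from the examples above (assumes pandas is installed):

import datetime
import pandas as pd

delta = datetime.timedelta(days=1, seconds=1)

# pandas accepts both spellings of the same interval, so one assertion
# covers both Classic (JVM Duration) and Connect (pandas) output.
assert pd.Timedelta("PT24H1S").to_pytimedelta() == delta      # Classic / JVM form
assert pd.Timedelta("P1DT0H0M1S").to_pytimedelta() == delta   # Connect / pandas form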

Contributor Author

This test will be run in both Classic and Connect.

@zhengruifeng
Contributor Author

Thanks, merged to master.

@zhengruifeng zhengruifeng deleted the pc_lit_delta branch September 19, 2024 13:11