[SPARK-20639][SQL] Add single argument support for to_timestamp in SQL with documentation improvement #17901
Changes from all commits
45bf353
f8921f4
b2d3b0a
497a229
b6f867c
fc02460
b038927
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -144,12 +144,6 @@ def _(): | |
| 'measured in radians.', | ||
| } | ||
|
|
||
| _functions_2_2 = { | ||
| 'to_date': 'Converts a string date into a DateType using the (optionally) specified format.', | ||
| 'to_timestamp': 'Converts a string timestamp into a timestamp type using the ' + | ||
| '(optionally) specified format.', | ||
| } | ||
|
|
||
| # math functions that take two arguments as input | ||
| _binary_mathfunctions = { | ||
| 'atan2': 'Returns the angle theta from the conversion of rectangular coordinates (x, y) to' + | ||
|
|
@@ -987,9 +981,10 @@ def months_between(date1, date2): | |
| def to_date(col, format=None): | ||
| """Converts a :class:`Column` of :class:`pyspark.sql.types.StringType` or | ||
| :class:`pyspark.sql.types.TimestampType` into :class:`pyspark.sql.types.DateType` | ||
| using the optionally specified format. Default format is 'yyyy-MM-dd'. | ||
| Specify formats according to | ||
| using the optionally specified format. Specify formats according to | ||
| `SimpleDateFormats <http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html>`_. | ||
| By default, it follows casting rules to :class:`pyspark.sql.types.DateType` if the format | ||
|
Member
ditto, not sure if it's clear to python user with
||
| is omitted (equivalent to ``col.cast("date")``). | ||
|
|
||
| >>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t']) | ||
| >>> df.select(to_date(df.t).alias('date')).collect() | ||
|
|
@@ -1011,9 +1006,10 @@ def to_date(col, format=None): | |
| def to_timestamp(col, format=None): | ||
| """Converts a :class:`Column` of :class:`pyspark.sql.types.StringType` or | ||
| :class:`pyspark.sql.types.TimestampType` into :class:`pyspark.sql.types.DateType` | ||
| using the optionally specified format. Default format is 'yyyy-MM-dd HH:mm:ss'. Specify | ||
| formats according to | ||
| using the optionally specified format. Specify formats according to | ||
| `SimpleDateFormats <http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html>`_. | ||
| By default, it follows casting rules to :class:`pyspark.sql.types.TimestampType` if the format | ||
| is omitted (equivalent to ``col.cast("timestamp")``). | ||
|
|
||
| >>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t']) | ||
| >>> df.select(to_timestamp(df.t).alias('dt')).collect() | ||
|
|
||
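The docstring change above describes two paths: with no format, the input follows casting rules; with a format, it is parsed against that pattern. The sketch below models that dispatch in plain Python. It is not Spark code. The names `to_timestamp_model` and `to_date_model` are hypothetical, `datetime.fromisoformat` is only a rough stand-in for Spark's casting rules, and the format strings use Python `strptime` directives rather than Java SimpleDateFormat letters.

```python
from datetime import date, datetime
from typing import Optional


def to_timestamp_model(value: str, fmt: Optional[str] = None) -> Optional[datetime]:
    """Plain-Python model of the documented behaviour (hypothetical, not Spark)."""
    try:
        if fmt is None:
            # No format: stand-in for the cast path, i.e. col.cast("timestamp").
            # fromisoformat only approximates Spark's casting rules (assumption).
            return datetime.fromisoformat(value)
        # Format given: parse against that pattern.
        # These are strptime directives, not SimpleDateFormat pattern letters.
        return datetime.strptime(value, fmt)
    except ValueError:
        # Unparseable input yields None, mirroring "return null if fail".
        return None


def to_date_model(value: str, fmt: Optional[str] = None) -> Optional[date]:
    """Same dispatch as above, truncated to the date part."""
    ts = to_timestamp_model(value, fmt)
    return None if ts is None else ts.date()


print(to_timestamp_model('1997-02-28 10:30:00'))
print(to_date_model('1997-02-28 10:30:00', '%Y-%m-%d %H:%M:%S'))
print(to_timestamp_model('not a timestamp'))
```

Keeping the no-format case on a separate path, rather than substituting a hard-coded default pattern, is what lets the documentation say "equivalent to a cast" instead of naming a default format.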
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2683,13 +2683,12 @@ object functions { | |
| def unix_timestamp(s: Column, p: String): Column = withExpr { UnixTimestamp(s.expr, Literal(p)) } | ||
|
|
||
| /** | ||
| * Convert time string to a Unix timestamp (in seconds). | ||
| * Uses the pattern "yyyy-MM-dd HH:mm:ss" and will return null on failure. | ||
| * Convert time string to a Unix timestamp (in seconds) by casting rules to `TimestampType`. | ||
| * @group datetime_funcs | ||
| * @since 2.2.0 | ||
| */ | ||
| def to_timestamp(s: Column): Column = withExpr { | ||
| new ParseToTimestamp(s.expr, Literal("yyyy-MM-dd HH:mm:ss")) | ||
|
Contributor
here we change the default value of the format string to be locale sensitive (same as
Member
Author
@rxin and @cloud-fan, I would rather take out the change here if this holds off this PR. This is essentially orthogonal with this PR.
||
| new ParseToTimestamp(s.expr) | ||
| } | ||
|
|
||
| /** | ||
|
|
@@ -2704,15 +2703,15 @@ object functions { | |
| } | ||
|
|
||
| /** | ||
| * Converts the column into DateType. | ||
| * Converts the column into `DateType` by casting rules to `DateType`. | ||
| * | ||
| * @group datetime_funcs | ||
| * @since 1.5.0 | ||
| */ | ||
| def to_date(e: Column): Column = withExpr { ToDate(e.expr) } | ||
| def to_date(e: Column): Column = withExpr { new ParseToDate(e.expr) } | ||
|
|
||
| /** | ||
| * Converts the column into a DateType with a specified format | ||
| * Converts the column into a `DateType` with a specified format | ||
| * (see [http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html]) | ||
| * return null if fail. | ||
| * | ||
|
|
||
This seems not used.
actually, instead of deleting this we should keep it and we should add
this is for doc tag
in fact, we might need this as a standalone fix for 2.2
To my knowledge, these were for conveniently defining single-argument functions that take a column, but we already define them below, and both appear to take an additional format argument. Both also appear to have the annotations correctly.
Let me double check and address this comment if possible.
actually, that's right - we don't need them - not sure if these are left behind from before the format parameter was added or something.
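To make the point in this thread concrete, here is a minimal sketch of why a name-to-docstring table becomes redundant once explicit definitions with a format argument exist. It assumes the table is used to generate one-argument wrappers, as the comment above suggests. The names `SINGLE_ARG_DOCS`, `make_single_arg`, and `namespace` are hypothetical, and the wrappers only build a call string instead of invoking Spark.

```python
from typing import Callable, Dict, Optional

# Hypothetical name -> docstring table, shaped like the removed _functions_2_2 dict.
SINGLE_ARG_DOCS: Dict[str, str] = {
    'to_date': 'Converts a string date into a DateType.',
    'to_timestamp': 'Converts a string timestamp into a timestamp type.',
}


def make_single_arg(name: str, doc: str) -> Callable[..., str]:
    """Build a one-argument wrapper called `name` (hypothetical helper)."""
    def wrapper(col: str) -> str:
        # Stand-in body: returns a textual call expression only.
        return '{}({})'.format(name, col)
    wrapper.__name__ = name
    wrapper.__doc__ = doc
    return wrapper


def to_date(col: str, format: Optional[str] = None) -> str:
    """Explicit definition that also accepts an optional format."""
    if format is None:
        return 'to_date({})'.format(col)
    return "to_date({}, '{}')".format(col, format)


# Stand-in for a module namespace: generated entries first, explicit ones after.
namespace: Dict[str, Callable[..., str]] = {
    name: make_single_arg(name, doc) for name, doc in SINGLE_ARG_DOCS.items()
}
# The later explicit definition replaces the generated one of the same name.
namespace['to_date'] = to_date

print(namespace['to_date']('t', 'yyyy-MM-dd'))
print(namespace['to_timestamp']('t'))
```

Under that assumption, any table entry whose name is also defined explicitly is never reachable, which matches the conclusion that the entries are not needed.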