[SPARK-24324][PYTHON] Pandas Grouped Map UDF should assign result columns by name #21427
```diff
@@ -23,6 +23,7 @@ import org.apache.arrow.memory.RootAllocator
 import org.apache.arrow.vector.types.{DateUnit, FloatingPointPrecision, TimeUnit}
 import org.apache.arrow.vector.types.pojo.{ArrowType, Field, FieldType, Schema}
 
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types._
 
 object ArrowUtils {
@@ -120,4 +121,19 @@ object ArrowUtils {
       StructField(field.getName, dt, field.isNullable)
     })
   }
+
+  /** Return Map with conf settings to be used in ArrowPythonRunner */
+  def getPythonRunnerConfMap(conf: SQLConf): Map[String, String] = {
+    val timeZoneConf = if (conf.pandasRespectSessionTimeZone) {
+      Seq(SQLConf.SESSION_LOCAL_TIMEZONE.key -> conf.sessionLocalTimeZone)
+    } else {
+      Nil
+    }
+    val pandasColsByPosition = if (conf.pandasGroupedMapAssignColumnssByPosition) {
```
Contributor:
Can we do:

Member (Author):
I think it's better to just omit the config for the default case; that way it's easier to process in the worker.

Contributor:
Sorry, can you explain why it's easier to process in the worker? I think we just need to remove the default value here? Also, one thing that isn't great about omitting the conf for the default case is that you need to put the default value in two places (both Python and Java).
```diff
+      Seq(SQLConf.PANDAS_GROUPED_MAP_ASSIGN_COLUMNS_BY_POSITION.key -> "true")
+    } else {
+      Nil
+    }
+    Map(timeZoneConf ++ pandasColsByPosition: _*)
+  }
 }
```
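To make the branching in `getPythonRunnerConfMap` concrete outside a Spark build, here is a minimal standalone sketch that mirrors its logic. It assumes a hypothetical `RunnerSettings` case class in place of `SQLConf`. `spark.sql.session.timeZone` is the real key behind `SQLConf.SESSION_LOCAL_TIMEZONE`, but the position-assignment key string below is a made-up placeholder, since the diff only references the constant `PANDAS_GROUPED_MAP_ASSIGN_COLUMNS_BY_POSITION`.

```scala
// Hypothetical stand-in for the SQLConf getters that getPythonRunnerConfMap reads.
final case class RunnerSettings(
    respectSessionTimeZone: Boolean,
    sessionLocalTimeZone: String,
    assignColumnsByPosition: Boolean)

object ConfMapSketch {
  // Real key of SQLConf.SESSION_LOCAL_TIMEZONE.
  val SessionTimeZoneKey = "spark.sql.session.timeZone"
  // Hypothetical placeholder; the diff only names the constant, not its key string.
  val AssignByPositionKey = "example.pandas.groupedMap.assignColumnsByPosition"

  // Mirrors the diff: each setting contributes an entry only when it is enabled.
  def runnerConfMap(s: RunnerSettings): Map[String, String] = {
    val timeZoneConf =
      if (s.respectSessionTimeZone) Seq(SessionTimeZoneKey -> s.sessionLocalTimeZone)
      else Nil
    // The default (assign by name) is signalled by leaving the key out entirely.
    val colsByPosition =
      if (s.assignColumnsByPosition) Seq(AssignByPositionKey -> "true")
      else Nil
    Map(timeZoneConf ++ colsByPosition: _*)
  }

  def main(args: Array[String]): Unit = {
    // Time zone respected, positional assignment off: only the time zone entry is sent.
    println(runnerConfMap(RunnerSettings(true, "UTC", false)))
  }
}
```

With positional assignment off, the map carries at most the time zone entry, so the grouped-map key only reaches the worker when positional assignment is explicitly requested.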
Maybe move this function out of `ArrowUtils`? It doesn't seem to be Arrow-specific. Edit: Actually, never mind.
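The review thread on `pandasColsByPosition` turns on whether the default should be omitted from the conf map or always sent with an explicit value. The sketch below models both encodings and their matching lookups in Scala for illustration only: the real consumer is the PySpark worker, and `WorkerConfSketch` and its key string are hypothetical. It shows the reviewer's point that, when the default is omitted, the reader has to restate that default on its own side.

```scala
object WorkerConfSketch {
  // Hypothetical key string standing in for the PR's grouped-map config key.
  val AssignByPositionKey = "example.pandas.groupedMap.assignColumnsByPosition"

  // Encoding A (as in the diff): send the key only for the non-default value.
  def encodeOmittingDefault(byPosition: Boolean): Map[String, String] =
    if (byPosition) Map(AssignByPositionKey -> "true") else Map.empty

  // Encoding B (the alternative raised in review): always send an explicit value.
  def encodeExplicit(byPosition: Boolean): Map[String, String] =
    Map(AssignByPositionKey -> byPosition.toString)

  // Reader for A: the fallback "false" restates the JVM-side default here,
  // so the default now lives in two places.
  def readOmittingDefault(conf: Map[String, String]): Boolean =
    conf.getOrElse(AssignByPositionKey, "false").toBoolean

  // Reader for B: the key must be present; a missing key throws NoSuchElementException.
  def readExplicit(conf: Map[String, String]): Boolean =
    conf(AssignByPositionKey).toBoolean

  def main(args: Array[String]): Unit = {
    for (byPosition <- Seq(true, false)) {
      println(s"$byPosition: A=${encodeOmittingDefault(byPosition)} B=${encodeExplicit(byPosition)}")
    }
  }
}
```

Encoding A keeps the map smaller and lets the worker test for a single key, while encoding B keeps the default in one place at the cost of always shipping the entry.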