[SPARK-12070][PYSPARK] PySpark implementation of Slicing operator inc… #10062
Test build #46950 has finished for PR 10062 at commit
Test build #46953 has finished for PR 10062 at commit
Test build #46954 has finished for PR 10062 at commit
Test build #46963 has finished for PR 10062 at commit
@davies Please help review.
The start of a slice is zero-based, but the startPos of substr is one-based. Mixing the two is confusing, so I'd rather not support slice.
Test build #47027 has finished for PR 10062 at commit
@davies The inconsistency between slice and startPos exists because SQL users work with 1-based positions, while programmers usually work with 0-based indices. Column#substr (Scala) is already exposed in two ways: explicitly as part of the DataFrame API, and implicitly through SQL. I think Scala programmers will also be confused to find that substr is currently 1-based. Besides, slicing is a standard operation for Python users. If we don't support it, we force users to call substr directly, and they may be confused by the 1-based substr as well. I suspect more people use the DataFrame API directly than use SQL, so we should make the API comfortable for them. So here's my suggestion:
Anyway, I have to admit there's no perfect solution for now. If necessary, I can start a thread on the Spark user mailing list to get more feedback from users.
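To make the off-by-one concrete, here is a minimal plain-Python sketch of the two conventions discussed above. It does not use PySpark: `slice_to_substr` and `sql_substr` are hypothetical helpers written only for illustration, and `sql_substr` merely imitates the 1-based `(startPos, length)` arguments that substr takes. It is not the code from this pull request.

```python
def slice_to_substr(start, stop):
    """Hypothetical helper: map a 0-based, half-open Python slice [start:stop]
    to the 1-based (startPos, length) pair that substr expects."""
    if start < 0 or stop < start:
        raise ValueError("this sketch only handles non-negative, forward slices")
    return start + 1, stop - start


def sql_substr(s, start_pos, length):
    """Plain-Python stand-in for 1-based substr semantics (positive start_pos only)."""
    begin = start_pos - 1  # shift the 1-based position back to a 0-based index
    return s[begin:begin + length]


text = "Spark SQL"

# A Python user would write text[0:5]; the equivalent substr call starts at 1.
start_pos, length = slice_to_substr(0, 5)
print(text[0:5])                            # Spark
print(start_pos, length)                    # 1 5
print(sql_substr(text, start_pos, length))  # Spark
```

Passing a 0-based start directly as startPos would shift the result by one character, which is the kind of surprise both comments above are describing.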
Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one.
Thank you for this, @davies. I was racking my brain trying to figure out why slicing a string column wasn't working.
…orrect