[SPARK-12070][PYSPARK] PySpark implementation of Slicing operator inc… #10062

zjffdu · 2015-12-01T07:11:07Z

…orrect

SparkQA · 2015-12-01T07:21:56Z

Test build #46950 has finished for PR 10062 at commit dfdebb3.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-12-01T07:47:31Z

Test build #46953 has finished for PR 10062 at commit 5201882.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-12-01T08:45:53Z

Test build #46954 has finished for PR 10062 at commit a42528e.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-12-01T13:08:08Z

Test build #46963 has finished for PR 10062 at commit 9c20341.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

zjffdu · 2015-12-01T13:13:09Z

__getslice__ has been deprecated, so __getitem__ should be used instead.
Slicing with a step is not supported for now; that could be done in a follow-up ticket.

@davies Please help review.

davies · 2015-12-01T18:36:20Z

python/pyspark/sql/column.py

The start of a slice is zero-based, but the startPos of substr is one-based. Mixing the two is confusing, so I'd prefer not to support slicing.
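To make the off-by-one concrete, here is a minimal sketch of how a zero-based, half-open Python slice could be translated into the arguments of a one-based substr(startPos, length) call. The helper name slice_to_substr_args is hypothetical and is not part of PySpark or of this patch; negative indices and open-ended slices are deliberately left out.

```python
def slice_to_substr_args(s):
    """Map a zero-based, half-open slice to (start_pos, length).

    Hypothetical helper for illustration only: start_pos is one-based,
    as the review comment describes for substr.
    """
    if s.step is not None:
        raise ValueError("slice with step is not supported")
    if s.stop is None:
        raise ValueError("an explicit stop is required to compute a length")
    start = 0 if s.start is None else s.start
    if start < 0 or s.stop < 0:
        raise ValueError("negative indices are not handled in this sketch")
    # Half-open interval [start, stop) has stop - start characters.
    length = max(s.stop - start, 0)
    # Shift the zero-based start to a one-based position.
    return start + 1, length
```

Under these assumptions, slice(0, 19) and slice(None, 19) both map to a start position of 1 and a length of 19.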

SparkQA · 2015-12-02T02:15:00Z

Test build #47027 has finished for PR 10062 at commit fa89b5a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

zjffdu · 2015-12-02T02:18:59Z

@davies The inconsistency between slice and startPos arises because SQL users count from 1 while programmers usually count from 0. Column#substr (Scala) is already exposed in two ways: explicitly as part of the DataFrame API, and implicitly through SQL. I think Scala programmers will also be confused to find that substr is 1-based. Besides, slicing is a standard operation for Python users. If we don't support it, users are forced to call substr directly, and they may be just as confused by its 1-based indexing. I suppose more people use the DataFrame API directly than use SQL, so we should make the API comfortable for them. So here's my suggestion:

Add documentation on substr to highlight that it is 1-based.
Deprecate substr and replace it with a new function, substring, that is 0-based, to make people using the DataFrame API comfortable. SQL users would then keep substr, which is 1-based, while programmers use substring, which is 0-based.
Use substring to support Python slicing.

Admittedly, there is no perfect solution for now. If necessary, I can start a thread on the Spark user mailing list to get more feedback from users.
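The suggestion above can be sketched on plain Python strings. This is only a model of the proposal, not Spark code: substr here imitates a 1-based call for start positions of 1 or more, and substring and slice_text are hypothetical names for the proposed 0-based function and the slice support built on it.

```python
def substr(text, start_pos, length):
    """Model of a 1-based substr; only start_pos >= 1 is covered here."""
    if start_pos < 1 or length < 0:
        raise ValueError("this model needs start_pos >= 1 and length >= 0")
    begin = start_pos - 1  # convert to Python's 0-based indexing
    return text[begin:begin + length]


def substring(text, start, length):
    """Hypothetical 0-based counterpart that delegates to substr."""
    return substr(text, start + 1, length)


def slice_text(text, start, stop):
    """Hypothetical slice support expressed through substring."""
    return substring(text, start, max(stop - start, 0))
```

With this layering, slice_text(text, 0, 10) agrees with text[0:10] for non-negative bounds, while SQL-style callers keep the 1-based substr unchanged.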

rxin · 2016-06-15T22:07:31Z

Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one.

IshwarBhat · 2018-05-04T21:00:56Z

Thank you for this @davies

I was racking my brain trying to figure out why slicing a string column wasn't working. Using df['time'][0:19] instead of just df['time'][:19] worked.
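One way to see why the two spellings can differ is a toy stand-in for a column class. It is not PySpark code: FakeColumn is hypothetical, and it assumes that slice bounds are forwarded unchanged as (startPos, length), which matches the behaviour discussed in this thread but is not taken from the patch itself.

```python
class FakeColumn:
    """Toy stand-in that records the arguments a slice would forward."""

    def substr(self, start_pos, length):
        # A real column would build an expression; here we just echo
        # the arguments so they can be inspected.
        return (start_pos, length)

    def __getitem__(self, key):
        if isinstance(key, slice):
            if key.step is not None:
                raise ValueError("slice with step is not supported")
            # Assumption: bounds are passed through without adjustment.
            return self.substr(key.start, key.stop)
        raise TypeError("only slices are modelled in this sketch")


col = FakeColumn()
explicit = col[0:19]  # start is the integer 0
implicit = col[:19]   # start is None because it was omitted
```

In this model the omitted start arrives as None rather than as an integer, so any code that expects a numeric start position would treat col[:19] differently from col[0:19].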

[SPARK-12070][PYSPARK] PySpark implementation of Slicing operator inc…

dfdebb3

…orrect

fix code style

5201882

fix code style

a42528e

zjffdu added 2 commits December 1, 2015 20:28

fix the test failure

8b7fc5a

minor change on format

9c20341

davies reviewed Dec 1, 2015
minor change

fa89b5a

asfgit closed this in 1a33f2e Jun 15, 2016
