@zjffdu
Contributor

@zjffdu zjffdu commented Dec 1, 2015

…orrect

@SparkQA

SparkQA commented Dec 1, 2015

Test build #46950 has finished for PR 10062 at commit dfdebb3.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 1, 2015

Test build #46953 has finished for PR 10062 at commit 5201882.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 1, 2015

Test build #46954 has finished for PR 10062 at commit a42528e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 1, 2015

Test build #46963 has finished for PR 10062 at commit 9c20341.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zjffdu
Contributor Author

zjffdu commented Dec 1, 2015

  • __getslice__ has been deprecated; __getitem__ should be used instead.
  • Slicing with a step is not supported yet; it could be added in a follow-up ticket.
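The first point refers to Python's data model: in Python 3 an expression such as col[1:5] reaches __getitem__ as a slice object, so supporting slicing only requires checking for that type (and, per the second point, rejecting a step). A minimal pure-Python sketch; SliceProbe is a hypothetical class, not the PySpark Column:

```python
class SliceProbe:
    """Hypothetical stand-in that shows what __getitem__ receives."""

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Mirror the PR's limitation: slicing with a step is rejected.
            if key.step is not None:
                raise ValueError("slice with step is not supported")
            return ("slice", key.start, key.stop)
        # Plain indexing (e.g. probe[2] or probe["field"]) passes the key through.
        return ("item", key)


probe = SliceProbe()
print(probe[1:5])
print(probe[:19])   # an omitted start arrives as None
print(probe["a"])
```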

@davies Please help review.

Contributor

The start of a slice is zero-based, but the startPos of substr is one-based. Mixing the two is confusing, so I'd rather not support slicing.
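To make the mismatch concrete, here is a simplified pure-Python model of a 1-based SQL-style substr(pos, length) next to ordinary 0-based Python slicing. sql_substr is a hypothetical helper that only handles pos >= 1; real SQL substring functions also define behavior for zero and negative positions.

```python
def sql_substr(s: str, pos: int, length: int) -> str:
    """Simplified 1-based SUBSTR(s, pos, length); only pos >= 1 is modeled."""
    if pos < 1:
        raise ValueError("this sketch only models pos >= 1")
    if length <= 0:
        return ""
    start = pos - 1  # convert the 1-based position to a 0-based offset
    return s[start:start + length]


s = "hello"
print(s[1:3])               # 0-based, half-open [start, stop)
print(sql_substr(s, 1, 3))  # 1-based position plus a length
```

The same pair of numbers (1, 3) selects "el" in the first case and "hel" in the second, which is exactly the ambiguity a slice-to-substr mapping would have to resolve.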

@SparkQA

SparkQA commented Dec 2, 2015

Test build #47027 has finished for PR 10062 at commit fa89b5a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zjffdu
Contributor Author

zjffdu commented Dec 2, 2015

@davies The inconsistency between slicing and startPos exists because SQL users work with 1-based indexes, while programmers usually work with 0-based ones. Column#substr (Scala) is already exposed in two ways: explicitly as part of the DataFrame API, and implicitly through SQL. I think Scala programmers are also confused to find that substr is 1-based. Besides, slicing is a standard operation for Python users. If we don't support it, users are forced to call substr directly, and they may be just as confused by its 1-based indexing. I suspect more people use the DataFrame API directly than use SQL, so we should make the API comfortable for them. Here's my suggestion:

  • Document substr to highlight that it is 1-based.
  • Deprecate substr and replace it with a new 0-based function, substring, for people using the DataFrame API. SQL users would keep using the 1-based substr, while programmers would use the 0-based substring.
  • Use substring to support Python slicing.
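Whichever function ends up backing slicing, the core of the last suggestion is translating a 0-based, half-open Python slice [start:stop] into a 1-based position and a length. A minimal sketch, assuming non-negative bounds and no step; slice_to_substr_args and sql_substr are hypothetical helpers, not Spark APIs:

```python
def sql_substr(s: str, pos: int, length: int) -> str:
    """Simplified 1-based SUBSTR(s, pos, length); only pos >= 1 is modeled."""
    if pos < 1:
        raise ValueError("this sketch only models pos >= 1")
    if length <= 0:
        return ""
    return s[pos - 1:pos - 1 + length]


def slice_to_substr_args(sl: slice) -> tuple:
    """Map a 0-based slice [start:stop] to a 1-based (pos, length) pair."""
    if sl.step is not None:
        raise ValueError("slice with step is not supported")
    start = 0 if sl.start is None else sl.start  # an omitted start means 0
    if sl.stop is None:
        # The length cannot be derived without knowing the string's length.
        raise ValueError("an explicit stop is required in this sketch")
    if start < 0 or sl.stop < 0:
        raise ValueError("negative bounds are not handled in this sketch")
    return start + 1, max(sl.stop - start, 0)


s = "2015-12-01 10:30:00.123"
for sl in (slice(0, 19), slice(None, 10), slice(5, 7)):
    pos, length = slice_to_substr_args(sl)
    # The translated 1-based call should select the same text as the slice.
    print(sl, (pos, length), sql_substr(s, pos, length) == s[sl])
```

Note the design choice: an omitted start maps cleanly to position 1, but an omitted stop does not, because a length-based substr needs to know where the string ends.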

Anyway, I have to admit there's no perfect solution for now. If necessary, I can start a thread on the Spark user mailing list to get more feedback.

@rxin
Contributor

rxin commented Jun 15, 2016

Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one.

@asfgit asfgit closed this in 1a33f2e Jun 15, 2016
@IshwarBhat

Thank you for this @davies

I was racking my brain trying to figure out why slicing a string column wasn't working. Using df['time'][0:19] instead of just df['time'][:19] worked.
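One plausible explanation for this report, assuming the merged Column.__getitem__ forwards the slice's start and stop unchanged to substr(startPos, length), and that substr rejects arguments of different types: in df['time'][:19] the start is None while the stop is an int, so the call fails, whereas [0:19] passes two ints. Under the same assumption the stop is treated as a length rather than an end index. ColumnModel below is a hypothetical pure-Python stand-in, not PySpark code:

```python
class ColumnModel:
    """Hypothetical stand-in for a string column; records substr arguments."""

    def substr(self, start_pos, length):
        # Assumed check: both arguments must share a type.
        if type(start_pos) != type(length):
            raise TypeError(
                "startPos and length must be the same type, got %s and %s"
                % (type(start_pos).__name__, type(length).__name__)
            )
        return ("substr", start_pos, length)

    def __getitem__(self, k):
        if isinstance(k, slice):
            if k.step is not None:
                raise ValueError("slice with step is not supported")
            # Assumed forwarding: start -> startPos, stop -> length, unchanged.
            return self.substr(k.start, k.stop)
        return ("item", k)


col = ColumnModel()
print(col[0:19])  # two ints: accepted
try:
    col[:19]      # None and int: rejected by the type check
except TypeError as exc:
    print("TypeError:", exc)
```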

Labels

None yet

Projects

None yet

5 participants