Skip to content

Conversation

@jliwork
Copy link
Contributor

@jliwork jliwork commented Nov 23, 2015

I was having some problem with rebase on #9709, so I had to close that PR and creating a new pull request with my latest fix.

Thanks to @jkbradley and @holdenk for your comments. I have updated my fix so that it will allow user to config either to filter out null values or throw an error with StringIndexer.setHandleInvalid("skip") API. The default is StringIndexer.setHandleInvalid("error").

Please let me know what you think. Thanks again!

@jliwork
Copy link
Contributor Author

jliwork commented Nov 30, 2015

@jkbradley @holdenk Could you please take a look my latest fix and let me know of any further comments? thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This no longer filters out values that are not present in labaelToIndex

@holdenk
Copy link
Contributor

holdenk commented Dec 3, 2015

Sorry for my slow reply - looking at this it seems like you've updated the meaning of handleInvalid - it no longer serves its original purposes (unless I've missed something). This is probably not quite the best path forward - maybe something for handleNulls and keep the old handle invalid?

I really like the thoroughness of the tests & I think the logic is pretty solid (just changing the meaning of things in the API is to avoided).

@srowen
Copy link
Member

srowen commented May 6, 2016

@jliwork would you like to continue working on this? or else please close the PR

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@jliwork
Copy link
Contributor Author

jliwork commented May 12, 2016

Yes. I'd like to continue working on this. But since this PR is obsolete. I will close it and open a new one instead.

@jliwork jliwork closed this May 12, 2016
@holdenk
Copy link
Contributor

holdenk commented May 12, 2016

Great - can you ping me on your new PR when its ready? :)

@jliwork
Copy link
Contributor Author

jliwork commented May 12, 2016

@holdenk Definitely :-) Thanks!

@imatiach-msft
Copy link
Contributor

@jliwork @srowen are you currently working on this in-progress JIRA 11569? If not, I would be interested in continuing the initial pull request that was closed. Please let me know, thank you!

ghost pushed a commit to dbtsai/spark that referenced this pull request Mar 14, 2017
## What changes were proposed in this pull request?

This PR is to enhance StringIndexer with NULL values handling.

Before the PR, StringIndexer will throw an exception when encounters NULL values.
With this PR:
- handleInvalid=error: Throw an exception as before
- handleInvalid=skip: Skip null values as well as unseen labels
- handleInvalid=keep: Give null values an additional index as well as unseen labels

BTW, I noticed someone was trying to solve the same problem ( apache#9920 ) but seems getting no progress or response for a long time. Would you mind to give me a chance to solve it ? I'm eager to help. :-)

## How was this patch tested?

new unit tests

Author: Menglong TAN <[email protected]>
Author: Menglong TAN <[email protected]>

Closes apache#17233 from crackcell/11569_StringIndexer_NULL.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants