Skip to content

Conversation

@aditya1702
Copy link

What changes were proposed in this pull request?

This pull allows a dictionary containing string as keys to be passed in the fit() method. Presently it only allows an instance of the Estimator to be passed as key.

How was this patch tested?

This patch was manually tested on a local PC.

@aditya1702
Copy link
Author

aditya1702 commented Nov 13, 2016

@srowen @jaceklaskowski Could you take a look and check if this improvement is proper? Since this is my first contribution it might have some issues. 😅

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@aditya1702 aditya1702 changed the title [SPARK-17116] Allow parameters to be {string,value} dict at runtime [SPARK-17116][PYSPARK] Allow parameters to be {string,value} dict at runtime Nov 13, 2016
@aditya1702 aditya1702 changed the title [SPARK-17116][PYSPARK] Allow parameters to be {string,value} dict at runtime [SPARK-17116][Pyspark] Allow parameters to be {string,value} dict at runtime Nov 13, 2016
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we just use isinstance instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon I tried using isinstance(type(params.keys()[0],str) but it returned False. And why cant we directly check if it is a str or not?

Copy link
Member

@HyukjinKwon HyukjinKwon Nov 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I meant, isinstance(params.keys()[0], str). Direct type comparison does not take care of subclasses. I believe it is a common pattern to check the types. For example,

>>> class A(str):
...     pass
...
>>> isinstance(A(), str)
True
>>> type(A()) is str
False

This might be a major problem in a way, for example, when you give an argument as OrderedDict which extends dict.

Another reason is consistency across codebase. I believe isinstance is more widely used in the codes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon Ah I get you. I will make the changes

@HyukjinKwon
Copy link
Member

I think we need a test here maybe and update the argument description if this change is legitimate.

@HyukjinKwon
Copy link
Member

cc @holdenk

@aditya1702
Copy link
Author

@HyukjinKwon this is my first time with contributing in Spark and so I didnt know if we add tests to each pull. If it is proper I will add the tests :)

@aditya1702
Copy link
Author

aditya1702 commented Nov 14, 2016

@HyukjinKwon @holdenk @srowen I have updated the code. Could you please take a look

for param, value in params.items():
p = getattr(self, param)
param_new.update({p: value})
params=param_new
Copy link
Member

@HyukjinKwon HyukjinKwon Nov 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just make it just like the one as below?

params = dict([(getattr(self, p), v) for p, v in params.items()])

BTW, I am not supposed to decide what should be added into Spark. We can wait for committer's review :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon Ok sure.

@holdenk
Copy link
Contributor

holdenk commented Nov 15, 2016

Thanks for working on this! So at first glance I'm not super sure if we want to add this (is this being done to be more closely related to the skicitlearn apis or something?)

A good next step would probably be adding some tests for this as well :) Once we've got some tests we can ping some of the Python ML people and see what they think.

@aditya1702
Copy link
Author

@holdenk Sure. I will add the tests. I saw that as per the issue opened the setParams was taking in string dict while fit() method was not. Hence I thought it would be an improvement to do that to the fit() method as well.

@holdenk
Copy link
Contributor

holdenk commented Nov 26, 2016

@aditya1702 - thanks that sounds like a good reason for us to be consistent between the two. Let me know how adding the tests goes :)

@HyukjinKwon
Copy link
Member

Hi @aditya1702, do you mind if I ask whether you are still working on this? Maybe it should be closed for now if you are currently not able to work on this further. It seems inactive for few months.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants