Skip to content

Conversation

@WeichenXu123
Copy link
Contributor

What changes were proposed in this pull request?

Remove BisectingKMeansModel.setDistanceMeasure method.
In BisectingKMeansModel set this param is meaningless.

How was this patch tested?

N/A

@SparkQA
Copy link

SparkQA commented Sep 7, 2018

Test build #95797 has finished for PR 22360 at commit 64a1c27.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Sep 7, 2018

Pardon, why is it meaningless? It's used in the implementation it delegates to.

@WeichenXu123
Copy link
Contributor Author

@srowen The delegated mllib.BisectingKMeansModel is:

class BisectingKMeansModel private[clustering] (
    private[clustering] val root: ClusteringTreeNode,
    @Since("2.4.0") val distanceMeasure: String
  ) extends...

its constructor argument distanceMeasure is immutable and the setDistanceMeasure here only change the wrapper model's param and it effect nothing.

@srowen
Copy link
Member

srowen commented Sep 8, 2018

Oh right, the model, not the implementation. CC @mgaido91 too. OK we can take it out because it was only introduced for 2.4.0.

Hm, so the model still has a distanceMeasure, but it's just not something that should be settable.

The bisecting k-means model just wraps the older .mllib model. But I don't see that its getDistanceMeasure would actually return the value inside that wrapped model? I am missing where the wrapper gets updated to return the actual distance measure used.

Is that also an issue or did I miss something?

@mgaido91
Copy link
Contributor

mgaido91 commented Sep 8, 2018

Thanks for pinging me @srowen.

But I don't see that its getDistanceMeasure would actually return the value inside that wrapped model? I am missing where the wrapper gets updated to return the actual distance measure used.

Yes, that is the reason that setDistanceMeasure was introduces IIRC. So probably the issue here is that is never set. It would be good if we can add a test for this if there is no-one.

@srowen
Copy link
Member

srowen commented Sep 8, 2018

OK so the wrapper model's constructor needs to set this value once from the wrapped model? It does seem like it should be immutable and seems to not be exposed to set in k means for example.

@mgaido91
Copy link
Contributor

mgaido91 commented Sep 8, 2018

Yes, I think the point here is that the parameter is part of BisectingKMeansParams which defines as final the getter method. I think KMeans has the same issue. We can probably remove this and set the distanceMeasure from the parent model at creation time.

@WeichenXu123
Copy link
Contributor Author

Do we need to set distanceMeasure again for the parent model ?
When parent model created, it will use the same distanceMeasure with the one used in training.

@mgaido91
Copy link
Contributor

mgaido91 commented Sep 9, 2018

Oh, right, it is already set in copyValues. Yes, then we can remove this (we should indeed since it takes no effect and can be misleading). Thanks, LGTM.

@srowen
Copy link
Member

srowen commented Sep 9, 2018

Merged to master/2.4

@asfgit asfgit closed this in 88a930d Sep 9, 2018
asfgit pushed a commit that referenced this pull request Sep 9, 2018
## What changes were proposed in this pull request?

Remove `BisectingKMeansModel.setDistanceMeasure` method.
In `BisectingKMeansModel` set this param is meaningless.

## How was this patch tested?

N/A

Closes #22360 from WeichenXu123/bkmeans_update.

Authored-by: WeichenXu <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 88a930d)
Signed-off-by: Sean Owen <[email protected]>
@WeichenXu123 WeichenXu123 deleted the bkmeans_update branch September 10, 2018 01:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants