FEA add ValueDifferenceMetric as a pairwise metric #796

glemaitre · 2021-02-13T20:56:49Z

…ples

pep8speaks · 2021-02-13T21:16:53Z

Hello @glemaitre! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file doc/conf.py:

Line 23:1: E402 module level import not at top of file
Line 87:1: E402 module level import not at top of file

In the file imblearn/metrics/_classification.py:

Line 764:17: W503 line break before binary operator

Comment last updated at 2021-02-14 18:28:41 UTC

chkoar · 2021-02-13T21:21:37Z

Could you add in this branch some bench code so we could test vdm performance in order to improve it?

glemaitre · 2021-02-13T22:06:08Z

> Could you add in this branch some bench code so we could test vdm performance in order to improve it?

Yep, we could do that. I did some profiling and I am actually not sure that we can speed up the computation.
The expensive part was the encoding, which I moved outside the metric and made a requirement on the input data structure.

codecov · 2021-02-13T22:11:22Z

Codecov Report

Merging #796 (5526c8a) into master (b00e31b) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #796   +/-   ##
=======================================
  Coverage   98.62%   98.62%           
=======================================
  Files          89       89           
  Lines        5881     5881           
  Branches      494      494           
=======================================
  Hits         5800     5800           
  Misses         80       80           
  Partials        1        1

Powered by Codecov. Last update a66fb7d...f545386. Read the comment docs.

glemaitre · 2021-02-13T23:39:39Z

@chkoar I am wondering if this metric should be public or private. It requires an OrdinalEncoder and a proper dtype, which we will manage internally, but I don't know if we want users to use it on their own.

chkoar · 2021-02-14T12:04:42Z

> @chkoar I am wondering if this metric should be public or private.

We could add it without a leading _ but without referencing it in the docs.
I played with the DistanceMetric API but it seems slow.

glemaitre · 2021-02-14T12:20:46Z

> I played with the DistanceMetric API but it seems slow.

It seems 20x slower than the current implementation. I think that we are good to go.
The only issue is that we will have a matrix of shape (n_samples, n_samples) living in memory.
But that might be better than not having SMOTEN :)
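
For a sense of scale on the (n_samples, n_samples) matrix mentioned above, here is a back-of-the-envelope sketch. It assumes a dense float64 matrix (8 bytes per entry), which is an assumption and not something stated in this thread; `pairwise_matrix_bytes` is a hypothetical helper, not part of the PR.

```python
def pairwise_matrix_bytes(n_samples, itemsize=8):
    """Bytes needed for a dense (n_samples, n_samples) matrix.

    itemsize=8 assumes float64 entries (an assumption, see above).
    """
    return n_samples * n_samples * itemsize


# Memory grows quadratically with the number of samples.
for n in (1_000, 10_000, 100_000):
    gib = pairwise_matrix_bytes(n) / 2**30
    print(f"{n:>7} samples -> {gib:.2f} GiB")
```

Under that assumption, 10,000 samples need 800,000,000 bytes (roughly 0.75 GiB), so the temporary matrix only becomes a concern for fairly large datasets.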

glemaitre · 2021-02-14T12:21:58Z

> We could add it without a leading _ but without a reference to the docs.

I am starting to think that it could live in the documentation as well.
It is not too tricky to use publicly. The only requirement is to use an OrdinalEncoder, and I think that is quite explicit in the doc.
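
To make the encoding requirement concrete, here is a minimal stand-in written with the standard library only. It is not scikit-learn's OrdinalEncoder; `ordinal_encode` is a hypothetical helper that mimics the general idea (each column's categories are sorted and mapped to integer codes), so treat it as a sketch of the input contract rather than the actual preprocessing used in the PR.

```python
def ordinal_encode(rows):
    """Map each column's categories to integer codes 0..n-1 (sorted order).

    Hypothetical stand-in for an ordinal encoder, for illustration only.
    """
    n_features = len(rows[0])
    # One sorted category list per column.
    categories = [sorted({row[j] for row in rows}) for j in range(n_features)]
    # One {category: code} lookup per column.
    lookup = [{cat: code for code, cat in enumerate(cats)} for cats in categories]
    encoded = [[lookup[j][row[j]] for j in range(n_features)] for row in rows]
    return encoded, categories


X = [["green", "small"], ["red", "large"], ["blue", "small"], ["red", "small"]]
X_encoded, categories = ordinal_encode(X)
print(X_encoded)   # [[1, 1], [2, 0], [0, 1], [2, 1]]
print(categories)  # [['blue', 'green', 'red'], ['large', 'small']]
```

The metric would then receive `X_encoded` (integers) rather than the raw strings.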

chkoar

If we use fit, we could probably inherit from BaseEstimator, since the metric is estimated from data.

from sklearn.base import BaseEstimator

class ValueDifferenceMetric(BaseEstimator):
    def __init__(self, k=1, r=2):
        self.k = k
        self.r = r

    def fit(self, X, y):
        # learn the unique classes here
        ...
        return self

    def pairwise(self, X, Y=None):
        ...

Alternatively, we could require the data in __init__.

class ValueDifferenceMetric:
    def __init__(self, X, y, k=1, r=2):...

    def pairwise(self, X, Y=None):...

Additionally, we could implement the callable API:

vdm = ValueDifferenceMetric(...)
distance = vdm(x1, x2)

Design-wise, I would favor the following in order to use the DistanceMetric API, but it is way too slow.

class ValueDifferenceMetric:
    def __init__(self, X, y):
        ...

    def __call__(self, x1, x2, k=1, r=2):
        ...

vdm = ValueDifferenceMetric(X, y)

# user-defined callables go through the "pyfunc" metric
metric = DistanceMetric.get_metric("pyfunc", func=vdm)
metric.pairwise(X)

# or, since estimators accept a callable metric directly

knn = KNeighborsClassifier(metric=vdm)

All of this assumes that X is ordinal-encoded with ints.
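
To illustrate the fit / pairwise / callable options discussed above, here is a small pure-Python sketch. It is not the implementation merged in this PR. It assumes the commonly cited definition of the value difference metric, which this thread does not spell out: for each feature, the distance between two category values is the sum over classes of |P(c | a) - P(c | b)|^k, and the sample distance is the sum over features of those terms raised to the power r. The class name `ToyValueDifferenceMetric` is hypothetical.

```python
from collections import Counter, defaultdict


class ToyValueDifferenceMetric:
    """Illustrative sketch only; not imbalanced-learn's implementation."""

    def __init__(self, k=1, r=2):
        self.k = k
        self.r = r

    def fit(self, X, y):
        """Estimate P(class | category value) for every feature."""
        self.classes_ = sorted(set(y))
        n_features = len(X[0])
        # counts[f][value][label] = samples with that value and that label
        counts = [defaultdict(Counter) for _ in range(n_features)]
        for row, label in zip(X, y):
            for f, value in enumerate(row):
                counts[f][value][label] += 1
        # proba_[f][value] = [P(c | value) for c in classes_]
        self.proba_ = []
        for f in range(n_features):
            table = {}
            for value, per_class in counts[f].items():
                total = sum(per_class.values())
                table[value] = [per_class[c] / total for c in self.classes_]
            self.proba_.append(table)
        return self

    def __call__(self, x1, x2):
        """Distance between two encoded samples (callable API)."""
        distance = 0.0
        for f, (a, b) in enumerate(zip(x1, x2)):
            pa, pb = self.proba_[f][a], self.proba_[f][b]
            delta = sum(abs(p - q) ** self.k for p, q in zip(pa, pb))
            distance += delta ** self.r
        return distance

    def pairwise(self, X, Y=None):
        """Dense distance matrix as a list of lists."""
        Y = X if Y is None else Y
        return [[self(row_a, row_b) for row_b in Y] for row_a in X]


# One ordinal-encoded feature with two categories (0 and 1).
X = [[0], [0], [1], [1]]
y = [0, 1, 1, 1]
vdm = ToyValueDifferenceMetric().fit(X, y)
D = vdm.pairwise(X)
print(D[0][2])  # 1.0
```

Here category 0 has class probabilities [0.5, 0.5] and category 1 has [0.0, 1.0], so with k=1 and r=2 the distance between them is (0.5 + 0.5)^2 = 1.0. A category never seen during fit raises a KeyError in this sketch; how the real metric handles that case is not covered here.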

chkoar · 2021-02-14T12:27:51Z

> The only issue is that we will have a matrix of (n_samples, n_samples) leaving in memory.

Can we get rid of that after the sampling?

glemaitre · 2021-02-14T12:32:41Z

> Can we get rid of that after the sampling?

Yes, it is only temporary: it is needed for the NN search during sampling.

glemaitre · 2021-02-14T17:23:29Z

@chkoar I would like to see this PR merged as is and open another one for SMOTEN.
I think that the tests give us good coverage of the basic use cases.

Do you see anything else to add?

chkoar · 2021-02-14T17:26:47Z

> @chkoar I think that I would like to see this PR merge as is and open another one for SMOTEN.

Agreed.

glemaitre · 2021-02-14T18:32:15Z

OK I think this is good to be merged. I fixed the issue with what's new. @chkoar Feel free to merge.

chkoar · 2021-02-15T02:50:59Z

> OK I think this is good to be merged. I fixed the issue with what's new. @chkoar Feel free to merge.

Done

glemaitre added 2 commits February 13, 2021 21:55

FEA add ValueDifferenceMetric to compute distance between nominal samples

efaa4b0

linting

d0e5d2f

style

e59585a

glemaitre added 2 commits February 13, 2021 23:20

TST basic tests

92d89ca

TST check that still true wiht differen r and k

6cc8969

glemaitre changed the title ~~FEA add SMOTEN for nominal categorical features only~~ FEA add ValueDifferenceMetric as a pairwise metric Feb 13, 2021

glemaitre added 4 commits February 14, 2021 00:09

DEBUG

adb2433

iter

1635aca

iter

4e15dfb

iter

1d521d0

iter

dc1de98

improve support for str labels

7eaa696

chkoar reviewed Feb 14, 2021

imblearn/metrics/pairwise.py (outdated)

imblearn/metrics/tests/test_pairwise.py (outdated)

imblearn/metrics/pairwise.py (outdated)

imblearn/metrics/pairwise.py (outdated)

glemaitre and others added 7 commits February 14, 2021 13:33

Apply suggestions from code review

24232e0

Co-authored-by: Christos Aridas <[email protected]>

Apply suggestions from code review

2df76aa

Co-authored-by: Christos Aridas <[email protected]>

TST improve test and add auto mode for n_categories

e449a3c

add tags to mention that we expect categorical as X input

a4a6026

speed-up

741b76f

fix when missing categories

b52961e

TST add test for corner case

80d68aa

DOC update user guide

cc56be3

chkoar reviewed Feb 14, 2021

doc/metrics.rst (outdated)

imblearn/metrics/_classification.py (outdated)

glemaitre and others added 4 commits February 14, 2021 18:46

Apply suggestions from code review

7c897f6

Co-authored-by: Christos Aridas <[email protected]>

iter

5526c8a

add entry whats new

f16bfdd

merge master

f545386

glemaitre merged commit ce4e1f7 into scikit-learn-contrib:master Feb 14, 2021
