Consistency brigade

Parameter naming

Things that are not consistent and should be fixed

SVC's parameter C should be lowercase (uppercase names suggest a matrix, and C is not one).
Class labels are stored under inconsistent attribute names: classes in SGD models, labels_ in SVMs. They should be stored as classes_, which is the usual convention.
chunk_size parameters should be renamed to batch_size in all MiniBatch* models.
Single letter parameter names:
- p in affinity propagation clustering
Open question: which name is best in the cross validation module: n_train, train_fraction or train_size?
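
The chunk_size to batch_size rename listed above could be staged so that existing code keeps working for a while. The following is only a sketch of one possible deprecation shim: `MiniBatchSketch` and `iter_batches` are hypothetical names invented for this example, not scikit-learn classes or methods. The alias is resolved in the constructor purely for brevity; a real estimator would probably defer that logic to fit.

```python
import warnings


class MiniBatchSketch:
    """Hypothetical estimator used only to illustrate the rename."""

    def __init__(self, batch_size=100, chunk_size=None):
        # Accept the old name for a transition period, but warn about it.
        if chunk_size is not None:
            warnings.warn(
                "chunk_size is deprecated; use batch_size instead",
                DeprecationWarning,
                stacklevel=2,
            )
            batch_size = chunk_size
        self.batch_size = batch_size

    def iter_batches(self, n_samples):
        """Yield (start, stop) index pairs that cover range(n_samples)."""
        for start in range(0, n_samples, self.batch_size):
            yield start, min(start + self.batch_size, n_samples)


# New-style call: no warning is emitted.
batches = list(MiniBatchSketch(batch_size=4).iter_batches(10))
```

With this shim, a caller still passing chunk_size=4 gets the same batches plus a DeprecationWarning pointing at the new name.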

API

Models taking a symmetric kernel, affinity or distance matrix

Some models (SVC, KernelPCA, SpectralClustering...) can accept a precomputed kernel, affinity or distance matrix with shape (n_samples, n_samples) as the main data argument, in place of the traditional design matrix with shape (n_samples, n_features).
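
To make the clash between the two shapes concrete, here is a small plain-Python sketch. The helpers `gram_matrix` and `shape` are hypothetical, written only for this illustration. When a dataset happens to have as many features as samples, the design matrix and its precomputed kernel have exactly the same shape, so a single fit method cannot tell them apart by looking at the input.

```python
def gram_matrix(X):
    """Linear kernel: entry [i][j] is the dot product of samples i and j."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def shape(M):
    """Return (n_rows, n_columns) of a list-of-lists matrix."""
    return (len(M), len(M[0]))


# 3 samples described by 3 features: a square design matrix.
X = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0],
     [3.0, 1.0, 0.0]]

# Precomputed kernel over the same 3 samples.
K = gram_matrix(X)

# Both are (3, 3); only the caller knows which one is which.
same_shape = shape(X) == shape(K)
```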

The shape of the input alone does not say which kind of matrix was passed. One way to resolve this ambiguity would be to introduce a dedicated fit method for fitting from a precomputed kernel or affinity matrix. Possible names:

fit_symmetric
fit_pairwise (as we are fitting from a materialized pairwise relationship between the samples).
fit_kernel or fit_from_kernel (but the matrix is not always a kernel, a term that often has a special meaning).
fit_precomputed
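
As a rough illustration of what a dedicated method could look like, the sketch below uses the fit_pairwise name from the list above on a toy 1-nearest-neighbor classifier. Everything here is an assumption made for the example: `NearestNeighborSketch`, `euclidean`, and in particular `predict_pairwise` (with a query matrix of shape (n_queries, n_samples)) are hypothetical and are not part of any existing or agreed API.

```python
def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


class NearestNeighborSketch:
    """Toy 1-nearest-neighbor classifier, for illustration only."""

    def fit(self, X, y):
        # Traditional entry point: X has shape (n_samples, n_features).
        self._X = [list(row) for row in X]
        self._store_labels(y, n_samples=len(X))
        return self

    def fit_pairwise(self, D, y):
        # Proposed entry point: D has shape (n_samples, n_samples).
        if any(len(row) != len(D) for row in D):
            raise ValueError("D must be a square (n_samples, n_samples) matrix")
        self._X = None  # no feature vectors are available in this mode
        self._store_labels(y, n_samples=len(D))
        return self

    def _store_labels(self, y, n_samples):
        if len(y) != n_samples:
            raise ValueError("y must have one label per sample")
        self._y = list(y)
        self.classes_ = sorted(set(y))

    def predict(self, X):
        if self._X is None:
            raise ValueError("fitted from pairwise data; use predict_pairwise")
        D = [[euclidean(x, t) for t in self._X] for x in X]
        return self.predict_pairwise(D)

    def predict_pairwise(self, D):
        # D has shape (n_queries, n_samples): distances to the training samples.
        return [self._y[min(range(len(row)), key=row.__getitem__)] for row in D]


X_train = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
y_train = ["a", "a", "b"]
X_query = [[0.2, 0.1], [4.0, 4.5]]

# The same data, materialized as pairwise distances.
D_train = [[euclidean(u, v) for v in X_train] for u in X_train]
D_query = [[euclidean(q, t) for t in X_train] for q in X_query]

from_features = NearestNeighborSketch().fit(X_train, y_train).predict(X_query)
from_pairwise = (
    NearestNeighborSketch().fit_pairwise(D_train, y_train).predict_pairwise(D_query)
)
```

Both code paths give the same predictions, and the intent is explicit at the call site instead of being inferred from the shape of the input. The sketch also shows that the choice made at fit time leaks into prediction, which needs its own pairwise input.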

GridSearchCV, cross_val_score and other tools would also need to be updated to support the new method.

TODO: study the impact on the rest of the scikit-learn API (predict, transform...).
