
Consistency brigade

amueller edited this page Mar 4, 2012 · 6 revisions

Parameter naming

Things that are not consistent and should be fixed

  • SVC's parameter C should be lowercase (it's not a matrix)
  • Class labels are stored inconsistently: as the attribute classes in SGD and as labels_ in SVMs. They should be stored as classes_, as is usually the case.
  • chunk_size parameters should be renamed to batch_size in all MiniBatch* models.
  • Single letter parameter names:
    • p in affinity propagation clustering
  • Which name is better in the cross validation module: n_train, train_fraction or train_size?
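A rename such as chunk_size to batch_size does not have to break existing callers. The following is a minimal sketch of one way to keep the old name working during a transition period. MiniBatchThing is a hypothetical class used only for illustration (it is not a scikit-learn estimator), and the default value and warning text are assumptions.

```python
import warnings


class MiniBatchThing:
    """Hypothetical estimator, used only to illustrate the rename."""

    def __init__(self, batch_size=100, chunk_size=None):
        if chunk_size is not None:
            # The old name is still accepted, but callers are told to migrate.
            warnings.warn(
                "chunk_size is deprecated; use batch_size instead",
                DeprecationWarning,
                stacklevel=2,
            )
            batch_size = chunk_size
        self.batch_size = batch_size
```

With this approach, old code that passes chunk_size keeps running and sees a DeprecationWarning, while new code only ever uses batch_size.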

API

Models taking a symmetric kernel, affinity or distance matrix

Some models (SVC, KernelPCA, SpectralClustering...) can accept a precomputed kernel, affinity or distance matrix with shape (n_samples, n_samples) as the main data argument, in place of the traditional design matrix with shape (n_samples, n_features). The same argument of fit can therefore carry two different meanings.

One way to resolve this ambiguity would be to introduce a dedicated fit method for fitting from a precomputed kernel or affinity matrix. Possible names:

  • fit_symmetric
  • fit_pairwise (as we are fitting from a materialized pairwise relationship between the samples).
  • fit_kernel or fit_from_kernel (but the matrix is not always a kernel, a term that often has a special meaning).
  • fit_precomputed
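To make the proposal concrete, here is a minimal sketch of what a dedicated method could look like, using the fit_pairwise name from the list above. Everything in it is an assumption made for illustration: ToyNearestNeighbor is a hypothetical pure-Python 1-nearest-neighbour classifier, not a scikit-learn class, and predict_pairwise is an invented counterpart that the proposal itself does not define.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ToyNearestNeighbor:
    """Hypothetical 1-nearest-neighbour classifier (illustration only)."""

    def fit(self, X, y):
        # X is a design matrix with shape (n_samples, n_features).
        self.X_ = [list(row) for row in X]
        self.y_ = list(y)
        self.classes_ = sorted(set(y))
        return self

    def fit_pairwise(self, D, y):
        # D is a precomputed distance matrix with shape (n_samples, n_samples).
        n = len(D)
        if any(len(row) != n for row in D):
            raise ValueError("D must be square: (n_samples, n_samples)")
        if len(y) != n:
            raise ValueError("y must have one label per row of D")
        # 1-NN only needs the training labels; distances from new samples
        # to the training samples are supplied at prediction time.
        self.X_ = None
        self.y_ = list(y)
        self.classes_ = sorted(set(y))
        return self

    def predict(self, X):
        if self.X_ is None:
            raise ValueError("fitted with fit_pairwise; use predict_pairwise")
        D_test = [[euclidean(row, x) for x in self.X_] for row in X]
        return self.predict_pairwise(D_test)

    def predict_pairwise(self, D_test):
        # D_test has shape (n_test, n_train): distances to training samples.
        return [
            self.y_[min(range(len(row)), key=row.__getitem__)]
            for row in D_test
        ]


# Three training samples with two features each, and their pairwise distances.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
y = ["a", "a", "b"]
D = [[euclidean(p, q) for q in X] for p in X]

from_features = ToyNearestNeighbor().fit(X, y)
from_pairwise = ToyNearestNeighbor().fit_pairwise(D, y)
```

With separate entry points, the method name states how the array should be interpreted, so a square input is never ambiguous. The sketch also hints at the cost: prediction needs a matching pairwise variant, which is part of the wider API impact still to be studied.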

GridSearchCV, cross_val_score and other tools should also be updated.

TODO: study the impact on the rest of the scikit-learn API (predict, transform...).
