Consistency brigade
amueller edited this page Mar 4, 2012
Things that are not consistent and should be fixed
- SVC's parameter C should be lowercase (it's not a matrix)
- The labels are sometimes stored as attribute classes (SGD), others as labels_ (SVMs), but should be stored as `classes_` as is usually the case.
- `chunk_size` parameters should be renamed to `batch_size` in all `MiniBatch*` models.
- Single letter parameter names:
  - p in affinity propagation clustering
- Which is better: `n_train`, `train_fraction`, `train_size` (in cross validation module)?
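A rename such as `chunk_size` to `batch_size` could be rolled out gradually by keeping the old name as a deprecated alias for a release or two. The following is only a minimal sketch of that idea: `MiniBatchExample` is a hypothetical class, not an existing scikit-learn estimator, and the default value of 100 is an arbitrary assumption.

```python
import warnings


class MiniBatchExample:
    """Hypothetical estimator, used only to illustrate the rename."""

    def __init__(self, batch_size=100, chunk_size=None):
        # Accept the old name for now, but warn and map it to the new one.
        if chunk_size is not None:
            warnings.warn(
                "chunk_size is deprecated; use batch_size instead",
                DeprecationWarning,
                stacklevel=2,
            )
            batch_size = chunk_size
        self.batch_size = batch_size
```

Code that still passes `chunk_size=...` keeps working and sees a `DeprecationWarning`, while new code only needs to know about `batch_size`.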
Some models (SVC, KernelPCA, SpectralClustering...) can accept a precomputed kernel, affinity or distance matrix with shape (n_samples, n_samples) as main data argument in place of the traditional (n_samples, n_features) shaped design matrix.
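This makes the meaning of the data argument passed to fit ambiguous: the same method receives either a design matrix or a pairwise matrix. One way to resolve this ambiguity would be to introduce a dedicated fit method for fitting from a precomputed kernel / affinity matrix. Possible name suggestions: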
- `fit_symmetric`
- `fit_pairwise` (as we are fitting from a materialized pairwise relationship between the samples).
- `fit_kernel` or `fit_from_kernel` (but not always a kernel that often has a special meaning).
- `fit_precomputed`
GridSearchCV, cross_val_score and other tools should also be updated.
TODO: study the impact on the rest of the scikit-learn API (predict, transform...).
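To make the proposal more concrete, here is a rough sketch of what a dedicated method could look like, using the `fit_pairwise` name from the list above. Everything else is an assumption made for illustration: `OneNearestNeighborSketch` is a hypothetical toy estimator, and `predict_pairwise` is one possible answer to the TODO about `predict`, not an agreed design.

```python
class OneNearestNeighborSketch:
    """Hypothetical 1-nearest-neighbor classifier; not part of scikit-learn."""

    def fit_pairwise(self, D, y):
        # D is a precomputed (n_samples, n_samples) distance matrix,
        # given here as a list of rows.
        n_samples = len(D)
        if any(len(row) != n_samples for row in D):
            raise ValueError("expected a square (n_samples, n_samples) matrix")
        if len(y) != n_samples:
            raise ValueError("y must have one label per sample")
        self._y = list(y)
        self.classes_ = sorted(set(y))
        return self

    def predict_pairwise(self, D_test):
        # D_test has shape (n_test, n_train): distances from each test
        # sample to every training sample seen in fit_pairwise.
        n_train = len(self._y)
        predictions = []
        for row in D_test:
            if len(row) != n_train:
                raise ValueError("expected n_train distances per test sample")
            nearest = min(range(n_train), key=row.__getitem__)
            predictions.append(self._y[nearest])
        return predictions
```

With a separate method, the expected shape can be validated explicitly instead of being inferred from a constructor flag, but tools such as GridSearchCV would need to know which method to call and how to slice a pairwise matrix for each fold.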