LightGBM parameter changes to match Python implementation results #6064

torronen · 2022-01-30T11:30:58Z

Suggested changes so that LightGBM results through ML.NET match those through Python:

keep LightGBM default seed if seed has not been set
add mapping from NumberOfIterations to num_iterations
add NumberOfIterations to parameters array for LightGBM
change sigmoid default value to match LightGBM
default the evaluation metric to None, per the LightGBM default

A project for comparing LightGBM in Python with Microsoft.ML.LightGBM, and ModelBuilder with python-FLAML: https://github.com/torronen/lightgbm-comparison

Rationale: microsoft/FLAML#409 (comment)
Reasons for changes explained in the issues:

I suggest the results should be equal through Python and ML.NET so that developers can discuss and share best practices about hyperparameters. It would also make it possible to reuse hyperparameter tuning done in Python.

The sigmoid value change has been proposed before but was not implemented.
It may need more consideration: #667
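
To illustrate why the sigmoid default matters: LightGBM's binary objective converts a raw score into a probability with 1 / (1 + exp(-sigmoid * score)), so the same raw score gives a different probability under 0.5 than under 1. A minimal Python sketch of the output side only; the function name and the sample raw score are illustrative assumptions, not part of either library, and the parameter also enters the gradients during training, which this does not show.

```python
import math


def raw_score_to_probability(raw_score: float, sigmoid: float) -> float:
    """Logistic transform: 1 / (1 + exp(-sigmoid * raw_score))."""
    return 1.0 / (1.0 + math.exp(-sigmoid * raw_score))


# Hypothetical raw score from a trained binary model.
raw = 2.0

# 0.5 is the ML.NET default discussed in this PR; 1.0 is the LightGBM default.
p_mlnet = raw_score_to_probability(raw, sigmoid=0.5)
p_lightgbm = raw_score_to_probability(raw, sigmoid=1.0)

print(f"sigmoid=0.5: {p_mlnet:.4f}")
print(f"sigmoid=1.0: {p_lightgbm:.4f}")
```

A raw score of 0 maps to 0.5 under either setting; the two settings diverge as the score moves away from 0.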

This PR is for comments and discussion for now. Results are not yet equal through ML.NET and Python.

keep lightgbm default seed if it has not been specified in Seed

LightGBM: map NumberOfIterations to num_trees

LightGBM: Sigmod to default of LightGBM (0.5 => 1)

dnfadmin · 2022-01-30T11:31:12Z

All CLA requirements met.

torronen · 2022-02-01T11:37:04Z

These names are valid aliases. Defaults have not been reviewed yet, but at least metric should default to "" rather than logloss, although it might not matter much.

Missing:

min_child_samples

learning_rate
They are set here:

machinelearning/src/Microsoft.ML.LightGbm/LightGbmTrainerBase.cs

Lines 431 to 433 in 8eac93a

    
           GbmOptions["learning_rate"] = learningRate;
           GbmOptions["num_leaves"] = numberOfLeaves;
           GbmOptions["min_data_per_leaf"] = minimumExampleCountPerLeaf;

torronen · 2022-02-01T11:47:21Z

Is there any reason to use aliases? If not, I suggest we update:

min_split_gain = min_gain_to_split
min_sum_hessian_in_leaf = min_child_weight
bagging_freq = subsample_freq
bagging_fraction = subsample
lambda_l2 = reg_lambda
lambda_l1 = reg_alpha
boosting = boosting_type
verbosity = verbose
unbalanced_sets = is_unbalance
min_data_in_leaf = min_data_per_leaf

TODO: Check names are valid for 2.3.1, above is from current documentation.
2.3.1 source: https://github.com/microsoft/LightGBM/blob/v2.3.1/src/io/config_auto.cpp
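
As a rough illustration of the renaming, the sketch below rewrites alias keys in a parameter dictionary to the primary names. The pairs are the ones listed above, spelled as in the current LightGBM documentation; they have not been checked against 2.3.1 (see the TODO). The names `ALIAS_TO_PRIMARY` and `to_primary_names` are hypothetical helpers, not part of LightGBM or ML.NET.

```python
# Alias -> primary parameter name, per the current LightGBM documentation.
# Assumption: not verified against the 2.3.1 config_auto.cpp.
ALIAS_TO_PRIMARY = {
    "min_split_gain": "min_gain_to_split",
    "min_child_weight": "min_sum_hessian_in_leaf",
    "subsample_freq": "bagging_freq",
    "subsample": "bagging_fraction",
    "reg_lambda": "lambda_l2",
    "reg_alpha": "lambda_l1",
    "boosting_type": "boosting",
    "verbose": "verbosity",
    "unbalanced_sets": "is_unbalance",
    "min_data_per_leaf": "min_data_in_leaf",
}


def to_primary_names(params: dict) -> dict:
    """Return a copy of params with known aliases replaced by primary names."""
    result = {}
    for name, value in params.items():
        primary = ALIAS_TO_PRIMARY.get(name, name)
        if primary in result:
            # Both an alias and its primary name were supplied; refuse to guess.
            raise ValueError(f"parameter {primary!r} given more than once")
        result[primary] = value
    return result


# Example with the alias currently set in LightGbmTrainerBase.cs.
options = {"learning_rate": 0.1, "num_leaves": 31, "min_data_per_leaf": 20}
print(to_primary_names(options))
```

Names that are not in the table pass through unchanged, so the helper can be applied to a full options dictionary.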

torronen · 2022-02-01T15:00:29Z

Iterations actually seem to be run inside .NET, so the iteration count does not need to be passed anywhere.

machinelearning/src/Microsoft.ML.LightGbm/WrappedLightGbmTraining.cs

Line 19 in 8eac93a

    
           Dictionary<string, object> parameters, Dataset dtrain, Dataset dvalid = null, int numIteration = 100,

I will close this PR, as it is better not to update the defaults if doing so does not provide better performance. Changing the defaults may be a breaking change for some developers.

However, for some reason Python seems to provide better speed and accuracy for LightGBM (and therefore, for many applications) at least on a few datasets I've tried.

torronen added 3 commits January 30, 2022 13:12

Update LightGbmTrainerBase.cs

0ee875b

keep lightgbm default seed if it has not been specified in Seed

Update LightGbmTrainerBase.cs

e2c2f45

LightGBM: map NumberOfIterations to num_trees

Update LightGbmBinaryTrainer.cs

d9ad922

LightGBM: Sigmod to default of LightGBM (0.5 => 1)

syntax fix

dfd8cd1

torronen mentioned this pull request Jan 30, 2022

Q: Roadmap for LightGBM interface in .NET #6065

Closed

torronen added 3 commits January 31, 2022 09:30

num_iterations instead of alias

1a69660

style fix

eea96ac

NumberOfIterations to LightGBM param array

a91d9d4

torronen added 3 commits February 1, 2022 13:53

LightGBM default metric EvaluateMetricType.None

badc83a

MinimumExampleCountPerLeaf & LearningRate to array

c392524

Update LightGbmTrainerBase.cs

59c614c

torronen closed this Feb 1, 2022

ghost locked as resolved and limited conversation to collaborators Mar 17, 2022
