[SPARK-34415][ML] Randomization in hyperparameter optimization #31535
Non deterministic bug...?
…ouble for log-space. I guess this is just highly unlikely but not impossible.
Test build #135389 has finished for PR 31535 at commit
@srowen Cool. Will do.
…ts. Behaviour for float and integer are different. Unknown type raises exception.
Jenkins retest this please
Test build #135473 has started for PR 31535 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Jenkins retest this please
Kubernetes integration test starting
Test build #135518 has finished for PR 31535 at commit
Kubernetes integration test status success
Merged to master
Hi, @PhillHenry and @srowen.
Oh OK let me figure that out - can probably fix forward with a patch. Er, where can I see the output? I don't see it here.
Ah I see it: @PhillHenry was there an additional example file that was meant to be included in the PR? If so just open another PR and I'll add it. If necessary to restore the linter soon I can temporarily remove the reference to this example.
Thanks for the analysis, @srowen!
Missing Python example file for [SPARK-34415][ML] Randomization in hyperparameter optimization (#31535)

### What changes were proposed in this pull request?
For some reason (probably me being silly), the file examples/src/main/python/ml/model_selection_random_hyperparameters_example.py was not pushed in a previous PR. This PR restores that file.

### Why are the changes needed?
A single file (examples/src/main/python/ml/model_selection_random_hyperparameters_example.py) should have been pushed as part of SPARK-34415 but was not. This was causing lint errors, as highlighted by dongjoon-hyun. Consequently, srowen asked for a new PR.

### Does this PR introduce _any_ user-facing change?
No, it merely restores a file that was overlooked in SPARK-34415.

### How was this patch tested?
By running: `bin/spark-submit examples/src/main/python/ml/model_selection_random_hyperparameters_example.py`

Closes #31687 from PhillHenry/SPARK-34415_model_selection_random_hyperparameters_example.

Authored-by: Phillip Henry <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
What changes were proposed in this pull request?
Code in the PR generates random parameters for hyperparameter tuning. A discussion with Sean Owen can be found on the dev mailing list here:
http://apache-spark-developers-list.1001551.n3.nabble.com/Hyperparameter-Optimization-via-Randomization-td30629.html
All code is entirely my own work and I license the work to the project under the project’s open source license.
Why are the changes needed?
Randomization can be a more effective technique than a grid search, since minima and maxima can fall between the grid points and never be found. Randomization is not so restricted, although the probability of finding the minima or maxima depends on the number of attempts.
Alice Zheng has an accessible description of how this technique works at https://www.oreilly.com/library/view/evaluating-machine-learning/9781492048756/ch04.html
Although there are Python libraries with more sophisticated techniques, not every Spark developer is using Python.
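To illustrate the point about optima falling between grid points, here is a minimal, Spark-free sketch in plain Python. Everything in it is an assumption made for illustration: the toy `objective`, its peak at 0.37, and the helper names `grid_candidates`, `random_candidates` and `best` are hypothetical and are not part of this PR or of Spark.

```python
import random


def objective(x: float) -> float:
    """Toy score to maximise (hypothetical). Its peak at x = 0.37 lies
    between the points of a 5-point grid over [0, 1]."""
    return -(x - 0.37) ** 2


def grid_candidates(low: float, high: float, n: int) -> list:
    """Return n evenly spaced candidates from low to high inclusive (n >= 2)."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def random_candidates(low: float, high: float, n: int, rng: random.Random) -> list:
    """Return n candidates drawn uniformly at random from [low, high]."""
    return [rng.uniform(low, high) for _ in range(n)]


def best(candidates: list) -> float:
    """Pick the candidate with the highest objective value."""
    return max(candidates, key=objective)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    grid_best = best(grid_candidates(0.0, 1.0, 5))
    random_best = best(random_candidates(0.0, 1.0, 50, rng))
    # The grid can never do better than its nearest fixed point (0.25 here),
    # while the random search improves as the number of attempts grows.
    print("grid best:  ", grid_best, objective(grid_best))
    print("random best:", random_best, objective(random_best))
```

Note that the random search above is given more attempts than the grid. With an equal budget, neither approach is guaranteed to win, which matches the caveat that the outcome depends on the number of attempts.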
Does this PR introduce any user-facing change?
A new class (ParamRandomBuilder.scala) and its tests have been created, but there is no change to existing code. This class offers an alternative to ParamGridBuilder and can be dropped into the code wherever ParamGridBuilder appears. Indeed, it extends ParamGridBuilder and is completely compatible with its interface. It merely adds one method that provides a range over which a hyperparameter will be randomly defined.
How was this patch tested?
Tests ParamRandomBuilderSuite.scala and RandomRangesSuite.scala were added. ParamRandomBuilderSuite is the analogue of the already existing ParamGridBuilderSuite, which tests the user-facing interface. RandomRangesSuite uses ScalaCheck to test the random ranges over which hyperparameters are distributed.
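As a rough illustration of what a property-style test of random ranges checks, the sketch below samples from linear and log10 ranges in plain Python and asserts that every sample stays inside its bounds. It is not the PR's Scala code and does not use ScalaCheck. The function names (`random_in_range`, `log10_random`, `check_within_bounds`) are hypothetical, and the type dispatch and the clamping of log-space values are assumptions loosely based on the commit notes in this conversation.

```python
import math
import random


def random_in_range(low, high, rng: random.Random):
    """Sample uniformly from [low, high]; integers and floats are handled
    differently and any other type is rejected (assumed behaviour)."""
    if isinstance(low, int) and isinstance(high, int):
        return rng.randint(low, high)  # both end points included
    if isinstance(low, float) and isinstance(high, float):
        return rng.uniform(low, high)
    raise TypeError("unsupported range type: %r, %r" % (type(low), type(high)))


def log10_random(low: float, high: float, rng: random.Random) -> float:
    """Sample so that log10 of the result is uniform between log10(low) and
    log10(high). Both bounds must be strictly positive."""
    if low <= 0 or high <= 0:
        raise ValueError("log-space bounds must be positive")
    exponent = rng.uniform(math.log10(low), math.log10(high))
    value = 10.0 ** exponent
    # Floating-point rounding can push the value marginally outside the
    # bounds, so clamp it back (an assumption made for this sketch).
    return min(max(value, low), high)


def check_within_bounds(sampler, low, high, rng, trials: int = 1000) -> bool:
    """Property check: every sample drawn lies within [low, high]."""
    return all(low <= sampler(low, high, rng) <= high for _ in range(trials))


if __name__ == "__main__":
    rng = random.Random(0)
    print(check_within_bounds(random_in_range, 1, 10, rng))
    print(check_within_bounds(random_in_range, 0.0, 1.0, rng))
    print(check_within_bounds(log10_random, 0.01, 1.0, rng))
```

A property-based framework would generate the bounds themselves at random as well. Fixed bounds are used here only to keep the sketch short.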