@@ -537,24 +537,24 @@ the detectors in the `analysis_config`, starting at zero.
 end::detector-index[]
 
 tag::dfas-alpha[]
-Advanced configuration option. {ml-cap} uses loss guided tree growing.
-This means that trees will grow where the regularized loss reduces
-the most. This parameter multiplies a term based on tree depth in
-the regularized loss. Higher values result in shallower trees
-and faster training times. Values should be greater than or equal
-to zero. By default, this value is calculated during hyperparameter optimization.
+Advanced configuration option. {ml-cap} uses loss guided tree growing, which
+means that the decision trees grow where the regularized loss decreases most
+quickly. This parameter affects loss calculations by acting as a multiplier of
+the tree depth. Higher alpha values result in shallower trees and faster
+training times. By default, this value is calculated during hyperparameter
+optimization. It must be greater than or equal to zero.
 end::dfas-alpha[]
 
 tag::dfas-downsample-factor[]
-Advanced configuration option. This controls the fraction of data
-that is used to compute the derivatives of the loss function for tree training.
-The lower the value the smaller the fraction of data that is used.
-Typically accuracy improves if this is set to be less than 1. However, too small
-a value may result in poor convergence for the ensemble and so require more trees.
-For more information about shrinkage, refer to
+Advanced configuration option. Controls the fraction of data that is used to
+compute the derivatives of the loss function for tree training. A small value
+results in the use of a small fraction of the data. If this value is set to be
+less than 1, accuracy typically improves. However, too small a value may result
+in poor convergence for the ensemble and so require more trees. For more
+information about shrinkage, refer to
 {wikipedia}/Gradient_boosting#Stochastic_gradient_boosting[this wiki article].
-Values must be greater than zero and less than or equal to 1.
-By default, this value is calculated during hyperparameter optimization.
+By default, this value is calculated during hyperparameter optimization. It
+must be greater than zero and less than or equal to 1.
 end::dfas-downsample-factor[]
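As a rough sketch of what `downsample_factor` controls, assuming a hypothetical helper (the real implementation lives in the native {ml} code):

```python
import random

def sample_for_gradients(rows, downsample_factor, seed=0):
    """Pick the fraction of training rows used to compute loss derivatives.

    Hypothetical helper for illustration only; not the actual {ml} code.
    """
    if not 0 < downsample_factor <= 1:
        raise ValueError("downsample_factor must be in (0, 1]")
    rng = random.Random(seed)
    k = max(1, round(len(rows) * downsample_factor))
    return rng.sample(rows, k)

# Half of the rows are used to compute the derivatives:
subset = sample_for_gradients(list(range(1000)), 0.5)
print(len(subset))  # 500
```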
559559
560560tag::dfas-early-stopping-enabled[]
@@ -566,11 +566,10 @@ By default, early stopping is enabled.
 end::dfas-early-stopping-enabled[]
 
 tag::dfas-eta-growth[]
-Advanced configuration option.
-Specifies the rate at which `eta` increases for each new tree that is added
-to the forest. For example, a rate of `1.05` increases `eta` by 5% for each
-extra tree. Values must be in the range of 0.5 to 2.
-By default, this value is calculated during hyperparameter optimization.
+Advanced configuration option. Specifies the rate at which `eta` increases for
+each new tree that is added to the forest. For example, a rate of 1.05
+increases `eta` by 5% for each extra tree. By default, this value is calculated
+during hyperparameter optimization. It must be between 0.5 and 2.
 end::dfas-eta-growth[]
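The compounding this describes can be sketched as follows (the function name is hypothetical; {ml} applies this internally):

```python
def eta_for_tree(eta, growth_rate_per_tree, tree_index):
    """Learning rate applied to the tree at 0-based position tree_index.

    Illustrative only; mirrors the documented behavior, not the real code.
    """
    if not 0.5 <= growth_rate_per_tree <= 2:
        raise ValueError("growth rate must be between 0.5 and 2")
    return eta * growth_rate_per_tree ** tree_index

# A rate of 1.05 increases eta by 5% for each extra tree:
print(eta_for_tree(0.1, 1.05, 0))           # 0.1
print(round(eta_for_tree(0.1, 1.05, 1), 3)) # 0.105
```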
 
 tag::dfas-feature-processors[]
@@ -696,18 +695,18 @@ end::dfas-num-splits[]
 
 tag::dfas-soft-limit[]
 Advanced configuration option. {ml-cap} uses loss guided tree growing, which
-means that the decision trees grow where the regularized loss decreases most quickly. This
-soft limit combines with the `soft_tree_depth_tolerance` to penalize trees that
-exceed the specified depth; the regularized loss increases quickly beyond this
-depth. Values must be greater than or equal to 0. By default, this value is
-calculated during hyperparameter optimization.
+means that the decision trees grow where the regularized loss decreases most
+quickly. This soft limit combines with the `soft_tree_depth_tolerance` to
+penalize trees that exceed the specified depth; the regularized loss increases
+quickly beyond this depth. By default, this value is calculated during
+hyperparameter optimization. It must be greater than or equal to 0.
 end::dfas-soft-limit[]
 
 tag::dfas-soft-tolerance[]
 Advanced configuration option. This option controls how quickly the regularized
-loss increases when the tree depth exceeds `soft_tree_depth_limit`. Values must
-be greater than or equal to 0.01. By default, this value is calculated during
-hyperparameter optimization.
+loss increases when the tree depth exceeds `soft_tree_depth_limit`. By default,
+this value is calculated during hyperparameter optimization. It must be greater
+than or equal to 0.01.
 end::dfas-soft-tolerance[]
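The interplay of the two settings can be shown with a toy penalty curve. The sigmoid shape below is an assumption made purely for illustration, not the exact formula {ml} uses; it just stays near zero below `soft_tree_depth_limit` and rises quickly past it, with `soft_tree_depth_tolerance` controlling how sharply:

```python
import math

def depth_penalty(depth, soft_limit, tolerance):
    """Toy soft depth penalty: ~0 below soft_limit, rising quickly beyond it.

    The sigmoid form is an illustrative assumption, not the real {ml} formula.
    """
    if soft_limit < 0 or tolerance < 0.01:
        raise ValueError("limit must be >= 0 and tolerance >= 0.01")
    return 1.0 / (1.0 + math.exp(-(depth - soft_limit) / tolerance))

print(round(depth_penalty(2, 6, 0.5), 3))   # 0.0  (well under the limit)
print(round(depth_penalty(10, 6, 0.5), 3))  # 1.0  (well over the limit)
```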
 
 tag::dfas-timestamp[]
@@ -753,10 +752,11 @@ end::empty-bucket-count[]
 tag::eta[]
 Advanced configuration option. The shrinkage applied to the weights. Smaller
 values result in larger forests which have a better generalization error.
-However, the smaller the value the longer the training will take. For more
-information about shrinkage, refer to
+However, larger forests cause slower training. For more information about
+shrinkage, refer to
 {wikipedia}/Gradient_boosting#Shrinkage[this wiki article].
-By default, this value is calculated during hyperparameter optimization.
+By default, this value is calculated during hyperparameter optimization. It must
+be a value between 0.001 and 1.
 end::eta[]
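A minimal sketch of shrinkage, assuming a hypothetical helper: each tree's contribution is scaled by `eta`, so a small `eta` needs more trees (a larger forest) to reach the same prediction:

```python
def shrunk_prediction(tree_outputs, eta):
    """Sum tree outputs, scaling each contribution by the shrinkage eta.

    Illustrative only; not the actual {ml} implementation.
    """
    if not 0.001 <= eta <= 1:
        raise ValueError("eta must be between 0.001 and 1")
    return sum(eta * out for out in tree_outputs)

# Four trees at eta=0.25 contribute what one tree contributes at eta=1:
print(shrunk_prediction([1.0, 1.0, 1.0, 1.0], 0.25))  # 1.0
```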
 
 tag::exclude-frequent[]
@@ -842,11 +842,11 @@ end::function[]
 
 tag::gamma[]
 Advanced configuration option. Regularization parameter to prevent overfitting
-on the training data set. Multiplies a linear penalty associated with the size of
-individual trees in the forest. The higher the value the more training will
-prefer smaller trees. The smaller this parameter the larger individual trees
-will be and the longer training will take. By default, this value is calculated
-during hyperparameter optimization.
+on the training data set. Multiplies a linear penalty associated with the size
+of individual trees in the forest. A high gamma value causes training to prefer
+small trees. A small gamma value results in larger individual trees and slower
+training. By default, this value is calculated during hyperparameter
+optimization. It must be a nonnegative value.
 end::gamma[]
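For illustration, the linear size penalty might be sketched like this (hypothetical helper; the actual loss is computed in the native code):

```python
def size_regularized_loss(data_loss, tree_node_count, gamma):
    """Add a linear tree-size penalty to the data loss. Illustrative only."""
    if gamma < 0:
        raise ValueError("gamma must be nonnegative")
    return data_loss + gamma * tree_node_count

# With gamma high enough, the smaller tree has the lower regularized loss
# even though its raw data loss is slightly worse:
small = size_regularized_loss(data_loss=10.5, tree_node_count=7, gamma=0.5)
large = size_regularized_loss(data_loss=10.0, tree_node_count=31, gamma=0.5)
print(small < large)  # True
```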
 
 tag::groups[]
@@ -1046,13 +1046,14 @@ end::jobs-stats-anomaly-detection[]
 
 tag::lambda[]
 Advanced configuration option. Regularization parameter to prevent overfitting
-on the training data set. Multiplies an L2 regularisation term which applies to
-leaf weights of the individual trees in the forest. The higher the value the
-more training will attempt to keep leaf weights small. This makes the prediction
+on the training data set. Multiplies an L2 regularization term which applies to
+leaf weights of the individual trees in the forest. A high lambda value causes
+training to favor small leaf weights. This behavior makes the prediction
 function smoother at the expense of potentially not being able to capture
-relevant relationships between the features and the {depvar}. The smaller this
-parameter the larger individual trees will be and the longer training will take.
-By default, this value is calculated during hyperparameter optimization.
+relevant relationships between the features and the {depvar}. A small lambda
+value results in large individual trees and slower training. By default, this
+value is calculated during hyperparameter optimization. It must be a nonnegative
+value.
 end::lambda[]
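The L2 term itself is standard and can be sketched directly (the helper name is hypothetical):

```python
def l2_leaf_penalty(leaf_weights, lam):
    """Lambda times the sum of squared leaf weights; a high lambda pushes
    training toward small leaf weights and a smoother prediction function.
    Illustrative only.
    """
    if lam < 0:
        raise ValueError("lambda must be nonnegative")
    return lam * sum(w * w for w in leaf_weights)

print(l2_leaf_penalty([0.5, -0.5, 1.0], 2.0))  # 3.0
```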
 
 tag::last-data-time[]
@@ -1095,9 +1096,9 @@ set.
 end::max-empty-searches[]
 
 tag::max-trees[]
-Advanced configuration option. Defines the maximum number of trees the forest is
-allowed to contain. The maximum value is 2000. By default, this value is
-calculated during hyperparameter optimization.
+Advanced configuration option. Defines the maximum number of decision trees in
+the forest. The maximum value is 2000. By default, this value is calculated
+during hyperparameter optimization.
 end::max-trees[]
 
 tag::method[]
@@ -1386,11 +1387,10 @@ multiple jobs running on the same node. For more information, see
 end::query-delay[]
 
 tag::randomize-seed[]
-Defines the seed to the random generator that is used to pick
-which documents will be used for training. By default it is randomly generated.
-Set it to a specific value to ensure the same documents are used for training
-assuming other related parameters (e.g. `source`, `analyzed_fields`, etc.) are
-the same.
+Defines the seed for the random generator that is used to pick training data. By
+default, it is randomly generated. Set it to a specific value to use the same
+training data each time you start a job (assuming other related parameters such
+as `source` and `analyzed_fields` are the same).
 end::randomize-seed[]
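The reproducibility guarantee can be sketched as follows (hypothetical helper; it only illustrates that a fixed seed yields the same pick on every run):

```python
import random

def pick_training_docs(doc_ids, training_fraction, randomize_seed):
    """Choose the documents used for training; a fixed seed makes the
    choice repeatable. Hypothetical helper for illustration.
    """
    rng = random.Random(randomize_seed)
    k = round(len(doc_ids) * training_fraction)
    return sorted(rng.sample(doc_ids, k))

docs = list(range(100))
first = pick_training_docs(docs, 0.8, randomize_seed=42)
second = pick_training_docs(docs, 0.8, randomize_seed=42)
print(first == second)  # True
```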
 
 tag::rare-category-count[]