[SPARK-2152][MLlib] fix bin offset in DecisionTree node aggregations (also resolves SPARK-2160)
Hi, this pull fixes (what I believe to be) a bug in DecisionTree.scala.
In the extractLeftRightNodeAggregates function, the first set of rightNodeAgg values for Regression are set in line 792 as follows:
rightNodeAgg(featureIndex)(2 * (numBins - 2))
= binData(shift + (2 * (numBins - 1)))
Then there is a loop that sets the rest of the values, as in line 809:
rightNodeAgg(featureIndex)(2 * (numBins - 2 - splitIndex)) =
binData(shift + (2 *(numBins - 2 - splitIndex))) +
rightNodeAgg(featureIndex)(2 * (numBins - 1 - splitIndex))
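But since splitIndex starts at 1, the first iteration of the loop reads binData at offset 2 * (numBins - 3), so the binData values at offset 2 * (numBins - 2) are never added to any rightNodeAgg value.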
The changes here address this issue, for both the Regression and Classification cases.
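To illustrate the off-by-one, here is a minimal Scala sketch; it is not the code from DecisionTree.scala. It assumes a single feature and one statistic per bin (so the factor of 2 and the shift offset drop out), and the names RightAggSketch, rightAggBuggy and rightAggFixed are hypothetical. The fixed variant shows one way to correct the offset and is not necessarily the exact change in this commit.

```scala
// Simplified model (an assumption, not the Spark code): one feature, one value per bin.
// For split s (0 <= s <= numBins - 2), the right aggregate should be the sum of
// bins s + 1 through numBins - 1.
object RightAggSketch {

  // Mirrors the indexing described above: the loop reads bin (numBins - 2 - splitIndex).
  def rightAggBuggy(binData: Array[Double]): Array[Double] = {
    val numBins = binData.length
    val agg = new Array[Double](numBins - 1)
    // Highest split: only the last bin lies to its right.
    agg(numBins - 2) = binData(numBins - 1)
    var splitIndex = 1
    while (splitIndex < numBins - 1) {
      // With splitIndex starting at 1, bin (numBins - 2) is never read.
      agg(numBins - 2 - splitIndex) =
        binData(numBins - 2 - splitIndex) + agg(numBins - 1 - splitIndex)
      splitIndex += 1
    }
    agg
  }

  // One possible correction: read bin (numBins - 1 - splitIndex) instead.
  def rightAggFixed(binData: Array[Double]): Array[Double] = {
    val numBins = binData.length
    val agg = new Array[Double](numBins - 1)
    agg(numBins - 2) = binData(numBins - 1)
    var splitIndex = 1
    while (splitIndex < numBins - 1) {
      agg(numBins - 2 - splitIndex) =
        binData(numBins - 1 - splitIndex) + agg(numBins - 1 - splitIndex)
      splitIndex += 1
    }
    agg
  }
}
```

With bins (1, 2, 3, 4), the buggy version yields (7, 6, 4) and the fixed version yields (9, 7, 4): the bin holding 3 is dropped from every right aggregate in the buggy version, and the bins holding 2 and 1 are added one split too early.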
Author: johnnywalleye <[email protected]>
Closes apache#1316 from johnnywalleye/master and squashes the following commits:
73809da [johnnywalleye] fix bin offset in DecisionTree node aggregations