Open
Labels
P2: Priority of the issue for triage purpose: Needs to be fixed at some point.
Description
System information
- OS version/distro: Windows 10
- .NET Version (eg., dotnet --info):
.NET SDK (reflecting any global.json):
Version: 5.0.200
Commit: 70b3e65d53
Runtime Environment:
OS Name: Windows
OS Version: 10.0.19042
OS Platform: Windows
RID: win10-x64
Base Path: C:\Program Files\dotnet\sdk\5.0.200\
Issue
- What did you do?
I tried to use ML.NET to run a Stats 101 case to get familiar with the library.
The data points are generated so that y = x * 2 + random(). I used the OLS trainer to estimate the slope and output its t-statistics and p-values.
- What happened?
The p-value turned out to be 1 and the t-statistic turned out to be 0.
- What did you expect?
The p-value should be close to zero and the t-statistic should be very large.
Here is the equivalent R code:
df <- data.frame(x = 1:100, y = 1:100*2 + runif(100))
model <- lm(y ~ x, df)
summary(model)
Output of R:
Residuals:
Min 1Q Median 3Q Max
-0.48638 -0.20409 -0.04365 0.22835 0.52931
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.5067878 0.0562763 9.005 1.74e-14 ***
x 1.9994857 0.0009675 2066.691 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2793 on 98 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 4.271e+06 on 1 and 98 DF, p-value: < 2.2e-16
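For reference, the expectation above follows from the standard textbook definition of the OLS t-statistic, worked here against the R output. This is a sketch of the usual formula, not a statement about how ML.NET computes these values internally; the symbols (\hat{\beta}_j, SE, n, p, F) are notation introduced here for illustration.

```latex
% t-statistic for coefficient j: estimate divided by its standard error
t_j = \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)}

% Using the estimates and standard errors printed by R above
t_{x} = \frac{1.9994857}{0.0009675} \approx 2.07 \times 10^{3},
\qquad
t_{\text{(Intercept)}} = \frac{0.5067878}{0.0562763} \approx 9.005

% Two-sided p-value from the Student t CDF F with n - p degrees of freedom
% (n = 100 observations, p = 2 fitted parameters, matching "98 degrees of freedom")
p_j = 2\left(1 - F_{t,\,n-p}\!\left(|t_j|\right)\right), \qquad n - p = 100 - 2 = 98
```

The slope ratio differs slightly from R's printed 2066.691 only because the printed standard error is rounded. With a nonzero estimate and a finite standard error, a t-statistic of exactly 0 and a p-value of exactly 1 would not be expected for either coefficient.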
Source code / logs
The following is an F# script file. You can also run it in a Jupyter notebook via the .NET Interactive kernel.
#r "nuget: Microsoft.ML"
#r "nuget: Microsoft.ML.Mkl.Components"
open System
open Microsoft.ML
open Microsoft.ML.Data
[<CLIMutable>]
type Factor = {
    [<ColumnName("Label")>]
    y : float32
    intercept: float32
    x : float32
}
// Generate data: y = x * 2 + rnd
let rnd = Random()
let rows =
    [1.0 .. 100.0]
    |> Seq.map(fun v ->
        {
            y = float32 (v * 2.0 + rnd.NextDouble())
            intercept = float32 1.
            x = float32 v
        }
    )
let context = new MLContext()
let dataView = context.Data.LoadFromEnumerable(rows)
let pipeline =
    EstimatorChain()
        .Append(context.Transforms.Concatenate("Features", "intercept", "x"))
        .Append(context.Regression.Trainers.Ols())
let model = dataView |> pipeline.Fit
let modelParams = model.LastTransformer.Model
Seq.zip3 modelParams.Weights modelParams.TValues modelParams.PValues
|> Array.ofSeq
|> Array.iteri(fun i (w, t, p) ->
    printfn $"Beta {i}, w: {w:f3}, tStats: {t:f3}, pValue: {p:f3}")
Output
Beta 0, w: 0.005, tStats: 0.000, pValue: 1.000
Beta 1, w: 2.000, tStats: 0.000, pValue: 1.000
Another piece of general feedback: the ceremony in ML.NET is quite complicated compared with the simplicity of the R sample above. I do not expect users from the R/Python community to embrace this complexity. The library seems to be designed with only software engineers in mind. Maybe there is a balance to be found between R/Python and .NET.