OLS regression outputs wrong TStats and PValue

### System information

- **OS version/distro**: Windows 10
- **.NET Version (eg., dotnet --info)**: 
.NET SDK (reflecting any global.json):
 Version:   5.0.200
 Commit:    70b3e65d53

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.19042
 OS Platform: Windows
 RID:         win10-x64
 Base Path:   C:\Program Files\dotnet\sdk\5.0.200\

### Issue

- **What did you do?**
I tried to use ML.NET to run a Stats 101 case to get familiar with the library.
The data points are generated so that y = x * 2 + random(). I use the OLS trainer to estimate the slope and print the t-statistics and p-values.
- **What happened?**
The p-value turns out to be 1 and the t-statistic turns out to be 0 for every coefficient.
- **What did you expect?**
The p-value should be close to zero and the t-statistic should be very large.

Here is the equivalent R code:
```r
df <- data.frame(x = 1:100, y = 1:100*2 + runif(100))
model <- lm(y ~ x, df)
summary(model)
```
R output:
```
Residuals:
     Min       1Q   Median       3Q      Max 
-0.48638 -0.20409 -0.04365  0.22835  0.52931 

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)    
(Intercept) 0.5067878  0.0562763    9.005 1.74e-14 ***
x           1.9994857  0.0009675 2066.691  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2793 on 98 degrees of freedom
Multiple R-squared:      1,	Adjusted R-squared:      1 
F-statistic: 4.271e+06 on 1 and 98 DF,  p-value: < 2.2e-16
```
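
As a rough sanity check on the expected magnitude (a back-of-the-envelope sketch, assuming the textbook simple-regression formulas and noise drawn uniformly from [0, 1), so its standard deviation is about $\sqrt{1/12} \approx 0.289$), the slope's t-statistic can be estimated by hand:

$$
t_{\hat\beta_1} = \frac{\hat\beta_1}{\mathrm{SE}(\hat\beta_1)},
\qquad
\mathrm{SE}(\hat\beta_1) = \sqrt{\frac{\hat\sigma^2}{\sum_{i=1}^{n}(x_i - \bar x)^2}},
\qquad
\hat\sigma^2 = \frac{\mathrm{RSS}}{n - 2}
$$

With $x = 1, \dots, 100$, the denominator is $\sum_{i}(x_i - \bar x)^2 = n(n^2 - 1)/12 = 83325$, so

$$
\mathrm{SE}(\hat\beta_1) \approx \frac{0.289}{\sqrt{83325}} \approx 0.001,
\qquad
t_{\hat\beta_1} \approx \frac{2}{0.001} \approx 2000
$$

That is the same order of magnitude as the `t value` R reports for `x`, so a t-statistic of 0 cannot be right for this data.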

### Source code / logs
The following F# script reproduces the issue. It can also be run in a Jupyter notebook via the .NET Interactive kernel.
```fsharp
#r "nuget: Microsoft.ML"
#r "nuget: Microsoft.ML.Mkl.Components"
open System
open Microsoft.ML
open Microsoft.ML.Data

[<CLIMutable>]
type Factor = {
    [<ColumnName("Label")>]
    y : float32
    intercept: float32
    x : float32
}

// Generate data: y = x * 2 + rnd
let rnd = Random()
let rows =
    [1.0 .. 100.0]
    |> Seq.map(fun v ->
        {
            y = float32 (v * 2.0 + rnd.NextDouble())
            intercept = float32 1.
            x = float32 v
        }
    )

let context = new MLContext()
let dataView = context.Data.LoadFromEnumerable(rows)
let pipeline =
    EstimatorChain()
        .Append(context.Transforms.Concatenate("Features", "intercept", "x"))
        .Append(context.Regression.Trainers.Ols())

let model = dataView |> pipeline.Fit
let modelParams = model.LastTransformer.Model
Seq.zip3 modelParams.Weights modelParams.TValues modelParams.PValues
|> Array.ofSeq
|> Array.iteri(fun i (w, t, p) ->
    printfn $"Beta {i}, w: {w:f3}, tStats: {t:f3}, pValue: {p:f3}")
```
Output:
```
Beta 0, w: 0.005, tStats: 0.000, pValue: 1.000
Beta 1, w: 2.000, tStats: 0.000, pValue: 1.000
```

One more piece of general feedback: the ceremony in ML.NET is heavy compared to the simplicity of the R sample above. I do not expect users from the R/Python community to embrace this complexity. The library seems to be designed with only software engineers in mind. Maybe there is a balance to be found between R/Python and .NET.

