-
Notifications
You must be signed in to change notification settings - Fork 195
Description
I would like to Feature Request the following methods:
StatsBase.model_matrix(obj::RegressionModel)
StatsBase.coefnames(obj::RegressionModel)
StatsBase.wald_test(obj::RegressionModel;
restrictions::Vector{Symbol} = setdiff(StatsBase.coefnames(obj), ["(Intercept)"]),
Vcov::AbstractMatrix = vcov(obj),
rdf::Integer = StatsBase.dof_residual(obj))
-
model_matrixwould return a the design matrix as aMatrix{Float64}.This would allow various packages to access the field with a standard API instead of hunting down the structure of the structs.
-
coefnameswould return the names of the coefficients asVector{Symbol}.It mirrors
DataFrames.coefnames(obj::DataFrames.ModelFrame) -
wald_testwould return the Wald statistic for those restrictions as aDict{Symbol,Any}with keys: Statistic (Value of test), p_value (p-value based on the F-Distribution with number of restrictions and residual degrees of freedom), and possibly some way to display the restrictions.It would allow for a standard and automated default method rather than have each package define it (redundant code). Currently the most useful syntax is to especially which coefficients are jointly significant (different from zero) with the default being all, but the intercept if included. However, for being comprehensive a different syntax would have to be worked in order to allow for any valid linear combination restrictions, (e.g.,
coef(obj)[1] == coef(obj)[2] and coef(obj)[3] == 2 * coef(obj)[4]). Therdfkeyword parameter allows for degrees of freedom adjustments commonly used with cluster-robust variance-covariance estimators.
For example,
output = Dict{Symbol,Float64}()
varlist = coefnames(obj)
β = StatsBase.coef(obj)
R = diagm(map(var -> var ∈ restrictions, varlist))
R = R[find(mapslices(col -> col[max(1, findfirst(col))] .== 1,
RowEchelon.rref(round.(R, 12)), 1)),:]
k = size(R, 1)
Bread = R * β
Meat = inv(R * Vcov * R')
output[:Statistic] = (Bread' * Meat * Bread) / k
F_Dist = Distributions.FDist(k, rdf)
output[:p_value] = Distributions.ccdf(F_Dist, output[:Statistic])
output