This package aims to estimate a logistic regression model in a fast(er)
way using the iteratively reweighted least squares (IRLS) algorithm.
The algorithm is implemented using the C++ library Armadillo. The package
provides R bindings through Rcpp in the R package fastlr and Python
bindings through pybind11 in the Python package fastlr; the Python
package also provides a pure Python implementation of the IRLS algorithm.
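For readers unfamiliar with IRLS, here is a minimal NumPy sketch of the idea for logistic regression. It is not the package's actual code: the function name `irls_logistic`, its parameters, and the simulated data are assumptions made for illustration only. Each iteration solves a weighted least-squares problem, which for the logit link is equivalent to a Newton step.

```python
import numpy as np


def irls_logistic(X, y, tol=1e-8, max_iter=25):
    """Illustrative IRLS for logistic regression (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for iteration in range(1, max_iter + 1):
        eta = X @ beta                    # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))   # fitted probabilities
        w = mu * (1.0 - mu)               # IRLS weights
        # Weighted least-squares step written as a Newton update:
        # solve (X' W X) delta = X' (y - mu)
        delta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - mu))
        beta = beta + delta
        if np.max(np.abs(delta)) < tol:
            return beta, iteration, True
    return beta, max_iter, False


# Simulated data; this stands in for the package's generate_data helper.
rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 3))
true_beta = np.array([0.5, -1.0, 0.25])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

beta_hat, n_iter, converged = irls_logistic(X, y)
print(beta_hat, n_iter, converged)
```

The sketch solves the normal equations directly with `np.linalg.solve`; a production implementation might prefer a QR or Cholesky factorization for numerical stability.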
from fastlr import fastlr, generate_data
X, y = generate_data(N=10_000, k=10, seed=0)
print(py_res := fastlr(X, y))
FastLrResult(coefficients=array([-0.19547786, 0.26833757, -0.1303476 , -0.03979692, -0.15035753,
-0.26321948, 0.33105813, -0.19471808, 0.12025924, 0.11202108]), iterations=4, converged=True, time=0.051200235)
# Alternatively, use the pure Python implementation
print(py_res_simple := fastlr(X, y, method="python"))
FastLrResult(coefficients=array([-0.19547786, 0.26833757, -0.1303476 , -0.03979692, -0.15035753,
-0.26321948, 0.33105813, -0.19471808, 0.12025924, 0.11202108]), iterations=4, converged=True, time=0.002805208001518622)
import numpy as np
np.allclose(py_res.coefficients, py_res_simple.coefficients)
True
library(fastlr)
library(reticulate)
m <- fastlr(py$X, py$y) # `py` is reticulate's handle to the Python session above
print(m)
$coefficients
[1] -0.19547786 0.26833757 -0.13034760 -0.03979692 -0.15035753 -0.26321948
[7] 0.33105813 -0.19471808 0.12025924 0.11202108
$iterations
[1] 4
$time
[1] 0.00205007
$converged
[1] TRUE
Thanks reticulate!
py_estimates <- py$py_res$coefficients |> as.numeric()
r_estimates <- m$coefficients
print(py_estimates)
[1] -0.19547786 0.26833757 -0.13034760 -0.03979692 -0.15035753 -0.26321948
[7] 0.33105813 -0.19471808 0.12025924 0.11202108
print(r_estimates)
[1] -0.19547786 0.26833757 -0.13034760 -0.03979692 -0.15035753 -0.26321948
[7] 0.33105813 -0.19471808 0.12025924 0.11202108
all.equal(py_estimates, r_estimates, tolerance = 1e-6)
[1] TRUE
git clone https://github.com/jsr-p/fastlr
cd fastlr
uv sync
pip install .
Or install from PyPI:
pip install fastlr
git clone https://github.com/jsr-p/fastlr
cd fastlr
Rscript -e 'devtools::install_local(".")'
To reproduce the benchmarks, install the development versions of both packages and run:
just bench
This benchmark reproduces the one shown in the fastglm package, with the
fastlr (Rcpp) implementation added to the figure (run on my laptop).
See scripts/fastglm_bm.R and the Justfile.
For the sessionInfo() see here.
BTW:
grep 'Running under' output/sessioninfo.txt
Running under: Arch Linux
A benchmark study of this package's two implementations:
- Python implementation of the IRLS algorithm
- C++ implementation of the IRLS algorithm with Python bindings through pybind11

against:
- glum
- statsmodels logit
- minimal newton-cg minimize implementation from scipy

for varying sample size.
See the generate_data function here.
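As a rough idea of what the "minimal newton-cg" baseline could look like, here is a sketch using `scipy.optimize.minimize` with `method="Newton-CG"`. This is an assumption about the general approach, not the benchmark's actual script: the helper names `neg_loglik`, `grad`, and `hess` and the simulated data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def neg_loglik(beta, X, y):
    """Negative log-likelihood of the logit model."""
    eta = X @ beta
    # log(1 + exp(eta)) computed stably via logaddexp
    return np.sum(np.logaddexp(0.0, eta) - y * eta)


def grad(beta, X, y):
    """Gradient of the negative log-likelihood."""
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (mu - y)


def hess(beta, X, y):
    """Hessian of the negative log-likelihood, X' W X."""
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = mu * (1.0 - mu)
    return X.T @ (w[:, None] * X)


# Simulated data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 3))
true_beta = np.array([0.5, -1.0, 0.25])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

res = minimize(
    neg_loglik,
    x0=np.zeros(X.shape[1]),
    args=(X, y),
    method="Newton-CG",
    jac=grad,
    hess=hess,
)
print(res.x, res.nit)
```

Supplying the analytic gradient and Hessian makes this baseline use the same derivative information as IRLS, so timing differences mostly reflect optimizer overhead rather than the model.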
Interestingly, as seen from the figure, the pure Python implementation is quite fast and comparable to the C++ version!
- See here for the benchmark results as a table.