jsr-p/fastlr

fastlr: fast(er) logistic regression

This package estimates logistic regression models in a fast(er) way using the iteratively reweighted least squares (IRLS) algorithm. The algorithm is implemented in C++ using the Armadillo library. R bindings are provided through Rcpp in the R package fastlr, and Python bindings through pybind11 in the Python package fastlr; the Python package also provides a pure Python implementation of the IRLS algorithm.
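
For readers unfamiliar with IRLS, the following is a minimal NumPy sketch of the textbook algorithm for logistic regression. It is not the code used by this package: the function name `irls_logistic`, its `tol` and `max_iter` defaults, the weight clipping, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def irls_logistic(X, y, tol=1e-8, max_iter=25):
    """Illustrative IRLS for logistic regression (hypothetical helper).

    Returns (coefficients, iterations, converged).
    """
    beta = np.zeros(X.shape[1])
    for iteration in range(1, max_iter + 1):
        eta = X @ beta                        # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))       # fitted probabilities
        # IRLS weights; clipped (an assumption) to avoid division by zero
        w = np.clip(mu * (1.0 - mu), 1e-10, None)
        z = eta + (y - mu) / w                # working response
        # Weighted least squares step: solve (X' W X) beta = X' W z
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, iteration, True
        beta = beta_new
    return beta, max_iter, False


# Simulated data for the demo (not the package's generate_data)
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
beta_true = np.array([0.5, -1.0, 0.25])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true)))).astype(float)

beta_hat, n_iter, converged = irls_logistic(X, y)
print(beta_hat, n_iter, converged)
```

Each iteration is one weighted least squares solve, which is why the algorithm typically converges in a handful of iterations on well-behaved data.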

Usage

Python

from fastlr import fastlr, generate_data

X, y = generate_data(N=10_000, k=10, seed=0)
print(py_res := fastlr(X, y))
FastLrResult(coefficients=array([-0.19547786,  0.26833757, -0.1303476 , -0.03979692, -0.15035753,
       -0.26321948,  0.33105813, -0.19471808,  0.12025924,  0.11202108]), iterations=4, converged=True, time=0.051200235)
# Alternatively, use the pure Python implementation
print(py_res_simple := fastlr(X, y, method="python"))
FastLrResult(coefficients=array([-0.19547786,  0.26833757, -0.1303476 , -0.03979692, -0.15035753,
       -0.26321948,  0.33105813, -0.19471808,  0.12025924,  0.11202108]), iterations=4, converged=True, time=0.002805208001518622)
import numpy as np
np.allclose(py_res.coefficients, py_res_simple.coefficients)
True

R

library(fastlr)
library(reticulate)

m <- fastlr(py$X, py$y)  # `py` comes from reticulate and exposes X and y from the Python session
print(m)
$coefficients
 [1] -0.19547786  0.26833757 -0.13034760 -0.03979692 -0.15035753 -0.26321948
 [7]  0.33105813 -0.19471808  0.12025924  0.11202108

$iterations
[1] 4

$time
[1] 0.00205007

$converged
[1] TRUE

Thanks reticulate!

py_estimates <- py$py_res$coefficients |> as.numeric() 
r_estimates <- m$coefficients

print(py_estimates)
 [1] -0.19547786  0.26833757 -0.13034760 -0.03979692 -0.15035753 -0.26321948
 [7]  0.33105813 -0.19471808  0.12025924  0.11202108
print(r_estimates)
 [1] -0.19547786  0.26833757 -0.13034760 -0.03979692 -0.15035753 -0.26321948
 [7]  0.33105813 -0.19471808  0.12025924  0.11202108
all.equal(py_estimates, r_estimates, tolerance = 1e-6)
[1] TRUE

Installation

Python

git clone https://github.com/jsr-p/fastlr
cd fastlr
uv sync
pip install .

or from PyPI:

pip install fastlr

R

git clone https://github.com/jsr-p/fastlr
cd fastlr
Rscript -e 'devtools::install_local(".")'

Benchmarks

To reproduce the benchmarks, install the development versions of both packages and run:

just bench

Benchmark against fastglm

This benchmark reproduces the one shown in the fastglm package, with the fastlr (Rcpp) implementation added to the figure (run on my laptop).

See scripts/fastglm_bm.R and the Justfile.

For the sessionInfo() see here.

BTW:

grep 'Running under' output/sessioninfo.txt
Running under: Arch Linux

Benchmark against Python packages

A benchmark study of this package’s two implementations

against:

for varying sample size $N$ and number of covariates $k$.

See the generate_data function here.
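
As a rough idea of what a data generator for such a benchmark could look like, here is a hedged sketch. It is not the package's `generate_data`: the name `simulate_logit_data`, the standard normal covariates, and the uniform coefficient range are assumptions chosen for illustration; only the `N`, `k`, and `seed` arguments mirror the usage example above.

```python
import numpy as np


def simulate_logit_data(N, k, seed=0):
    """Hypothetical generator: draw (X, y, beta) from a logistic model."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(N, k))               # assumed: standard normal covariates
    beta = rng.uniform(-0.5, 0.5, size=k)     # assumed: small true coefficients
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))     # success probabilities
    y = rng.binomial(1, p).astype(float)      # Bernoulli outcomes in {0, 1}
    return X, y, beta


X_sim, y_sim, beta_sim = simulate_logit_data(N=1_000, k=5, seed=0)
print(X_sim.shape, y_sim.mean())
```

Fixing the seed makes the simulated data reproducible, so every implementation in a benchmark is timed on identical inputs.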

Benchmark Python implementations

Interestingly, as seen from the figure, the pure Python implementation is quite fast and comparable to the C++ version!
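
To check a comparison like this on your own machine, a small timing harness is enough. The sketch below uses only the standard library; `time_call` and `dummy_fit` are hypothetical names, and `dummy_fit` is a stand-in workload rather than a real model fit. It is not the benchmark script used for the figures.

```python
import statistics
import time


def time_call(fn, *args, repeats=5):
    """Median wall-clock time in seconds of fn(*args) over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def dummy_fit(n):
    """Stand-in workload; replace with a real fitting call when benchmarking."""
    return sum(i * i for i in range(n))


# Time the workload for a few problem sizes
results = {n: time_call(dummy_fit, n) for n in (1_000, 10_000, 100_000)}
for n, seconds in results.items():
    print(f"N={n:>7}: {seconds:.6f} s")
```

With the package installed, `dummy_fit` could be swapped for calls such as `fastlr(X, y)` and `fastlr(X, y, method="python")` from the usage section. Taking the median over several repeats reduces the influence of one-off slow runs.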

Benchmark R implementations on same setup as above

Benchmark results as tables

  • See here for the benchmark results as a table.

Development
