algorithms for feature extraction from spatio-temporal data
Source or feature extraction is the process of identifying spatial features of interest from data that varies over space and time. It can be either unsupervised or supervised, and is common in biological data analysis problems, like identifying neurons in calcium imaging data.
This package contains a collection of approaches for solving this problem. It defines a set of algorithms in the scikit-learn style, each of which can be fit to data and returns a model that can be used to transform new data. It is compatible with Python 2.7+ and 3.4+. It works well alongside thunder and supports parallelization via spark, but can also be used as a standalone package on local numpy arrays.
pip install thunder-extraction
# generate data
from extraction.utils import make_gaussian
data = make_gaussian()
# fit a model
from extraction import NMF
model = NMF().fit(data)
# extract sources by transforming data
sources = model.transform(data)
Analysis starts by importing and constructing an algorithm
from extraction import NMF
algorithm = NMF(k=10)
Algorithms can be fit to data in the form of a thunder images object or a t,x,y(,z) numpy array
model = algorithm.fit(data)
The model is a collection of identified features that can be used to extract temporal signals from new data
signals = model.transform(data)
All algorithms have the following methods
The fit method takes the data, which should be a collection of time-varying images: either a thunder images object or a numpy array with shape t,x,y(,z).
For many algorithms, fit takes the optional arguments chunk_size and padding, which allow the algorithm to be run on smaller chunks of the data, either in serial (if running locally) or in parallel (if running on a cluster).
A chunk is defined as a subset of the image in space, including all time points. The chunk_size is the size of each chunk in pixels, and padding is the amount by which to pad the chunks in each dimension. For example, given a data set of 100x100 pixel images with 500 time points, we could set chunk_size=(50,50), resulting in four chunks, each of which covers 50x50 pixels and all 500 time points.
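The chunking described above can be sketched with plain numpy. This is a minimal illustration, not the package's implementation: `iter_chunks` is a hypothetical helper, the data is assumed to be a t,x,y array with time on the first axis, and padding is assumed to be clipped at the image borders.

```python
import numpy as np


def iter_chunks(data, chunk_size, padding=(0, 0)):
    """Yield (slices, chunk) pairs covering the spatial axes of a t,x,y array.

    Hypothetical helper for illustration; not part of the package API.
    """
    t, nx, ny = data.shape
    cx, cy = chunk_size
    px, py = padding
    for x0 in range(0, nx, cx):
        for y0 in range(0, ny, cy):
            # Extend each chunk by the padding, clipped to the image bounds.
            xs = slice(max(x0 - px, 0), min(x0 + cx + px, nx))
            ys = slice(max(y0 - py, 0), min(y0 + cy + py, ny))
            # Every chunk keeps all time points.
            yield (xs, ys), data[:, xs, ys]


# 500 time points of 100x100 pixel images
data = np.zeros((500, 100, 100))

chunks = list(iter_chunks(data, chunk_size=(50, 50)))
print(len(chunks), chunks[0][1].shape)  # 4 (500, 50, 50)

# With padding, a corner chunk grows only on its interior sides.
padded = list(iter_chunks(data, chunk_size=(50, 50), padding=(5, 5)))
print(padded[0][1].shape)  # (500, 55, 55)
```

Because each chunk carries all time points, a chunk can be processed independently of the others, which is what makes running them in serial or in parallel interchangeable.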
The result of fitting an algorithm is a model. Every model has the following properties and methods.
The spatial regions identified during fitting.
The transform method applies the model to a new data set by averaging pixels within each of the regions. As with fitting, data can be either a thunder images object or a numpy array with shape t,x,y(,z). It returns a thunder series object, which can be converted to a numpy array by calling toarray().
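To make "averaging pixels within each of the regions" concrete, here is a small numpy-only sketch. It assumes each region is a boolean x,y mask, which may differ from how the package stores regions, and `average_in_regions` is a hypothetical name used only for this example.

```python
import numpy as np


def average_in_regions(data, masks):
    """Return one time series per region, with shape (n_regions, t).

    `data` has shape t,x,y and each mask is a boolean x,y array.
    Hypothetical helper; boolean masks are an assumed representation.
    """
    # data[:, mask] selects the masked pixels at every time point,
    # giving a (t, n_pixels) array that is then averaged over pixels.
    return np.array([data[:, mask].mean(axis=1) for mask in masks])


rng = np.random.default_rng(0)
data = rng.random((20, 8, 8))  # 20 time points of 8x8 images

mask_a = np.zeros((8, 8), dtype=bool)
mask_a[1:3, 1:3] = True
mask_b = np.zeros((8, 8), dtype=bool)
mask_b[5:8, 4:7] = True

signals = average_in_regions(data, [mask_a, mask_b])
print(signals.shape)  # (2, 20)
```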
Merge overlapping regions in the model by greedily comparing nearby regions and merging those whose similarity to one another exceeds the specified overlap. The greedy merging process is repeated max_iter times. Only the k_nearest neighbors of each region are considered, to speed up computation.
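One way such a greedy merge could work is sketched below. This is an illustration under stated assumptions rather than the package's algorithm: regions are represented as sets of (x, y) pixels, similarity is taken to be the Jaccard overlap, neighbors are ranked by centroid distance, and the function names and default values are hypothetical.

```python
import numpy as np


def overlap_fraction(a, b):
    """Jaccard overlap between two regions given as sets of (x, y) pixels."""
    return len(a & b) / len(a | b)


def merge_once(regions, overlap, k_nearest):
    """One greedy pass: absorb sufficiently similar nearby regions."""
    centers = np.array([np.mean(list(r), axis=0) for r in regions])
    used = set()
    merged = []
    for i, region in enumerate(regions):
        if i in used:
            continue
        used.add(i)
        # Rank the other regions by centroid distance and keep the nearest few.
        order = np.argsort(np.linalg.norm(centers - centers[i], axis=1)).tolist()
        neighbors = [j for j in order if j != i][:k_nearest]
        current = set(region)
        for j in neighbors:
            if j not in used and overlap_fraction(current, regions[j]) > overlap:
                current |= regions[j]
                used.add(j)
        merged.append(current)
    return merged


def merge(regions, overlap=0.5, max_iter=2, k_nearest=10):
    """Repeat the greedy pass max_iter times (defaults are arbitrary)."""
    for _ in range(max_iter):
        regions = merge_once(regions, overlap, k_nearest)
    return regions


a = {(x, y) for x in range(0, 4) for y in range(0, 4)}      # 16 pixels
b = {(x, y) for x in range(1, 5) for y in range(0, 4)}      # shares 12 pixels with a
c = {(x, y) for x in range(10, 13) for y in range(10, 13)}  # 9 pixels, far away

result = merge([a, b, c], overlap=0.5)
print(sorted(len(r) for r in result))  # [9, 20]
```

Here a and b have a Jaccard overlap of 12/20 = 0.6, which exceeds 0.5, so they merge into one 20-pixel region, while c is left alone.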
Here are all the algorithms currently available.
Local non-negative matrix factorization followed by thresholding to yield binary spatial regions. Applies factorization either to image blocks or to the entire image.
The algorithm takes the following parameters.
k: number of components to estimate per block
max_size: maximum size of each region
min_size: minimum size for each region
max_iter: maximum number of algorithm iterations
percentile: value for thresholding (higher means more thresholding)
overlap: value for determining whether to merge (higher means fewer merges)
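The general idea of factorization followed by thresholding can be sketched in plain numpy. This is not the package's NMF implementation: it uses a textbook multiplicative-update factorization on the whole image, omits block processing and the min_size, max_size and overlap steps, and the function names `nmf` and `binary_regions` are hypothetical.

```python
import numpy as np


def nmf(v, k, max_iter=20, seed=0):
    """Factor non-negative v (t x pixels) as w @ h with multiplicative updates."""
    rng = np.random.default_rng(seed)
    w = rng.random((v.shape[0], k)) + 1e-3
    h = rng.random((k, v.shape[1])) + 1e-3
    eps = 1e-9  # avoids division by zero
    for _ in range(max_iter):
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h


def binary_regions(data, k=2, max_iter=50, percentile=95):
    """Threshold each spatial component at its own percentile."""
    t, nx, ny = data.shape
    _, h = nmf(data.reshape(t, nx * ny), k, max_iter)
    masks = []
    for component in h:
        image = component.reshape(nx, ny)
        # A higher percentile keeps fewer pixels in each region.
        masks.append(image > np.percentile(image, percentile))
    return masks


# Synthetic data: low-level noise plus two square sources with independent activity.
rng = np.random.default_rng(1)
t, nx, ny = 100, 20, 20
data = 0.1 * rng.random((t, nx, ny))
data[:, 2:6, 2:6] += rng.random(t)[:, None, None]
data[:, 12:16, 12:16] += rng.random(t)[:, None, None]

masks = binary_regions(data, k=2, max_iter=50, percentile=95)
print([int(m.sum()) for m in masks])
```

With percentile=95 on a 20x20 image, each mask keeps at most 20 pixels; raising the percentile shrinks the regions further.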
The fit method takes the following options.
block_size: a size in megabytes like 150 or a size in pixels like (10,10); if None, the full image is used