[PRD]: Collection Lifecycle for TanStack DB #195

_@samwillis and I wrote this_

## Summary  
Adds automatic and configurable lifecycle management to TanStack DB collections to optimize resource usage and align with application routing behavior.

---

## Introduction

This PRD proposes the introduction of automatic lifecycle management for collections in TanStack DB. As applications grow, developers define many collections—some of which are used rarely or only in specific UI flows. Without lifecycle control, all collections persist indefinitely, increasing memory and network usage while forcing developers to manage resource cleanup manually. By adding a simple, configurable lifecycle system—lazy initialization, garbage collection after inactivity, and minimal imperative controls—we enable scalable, efficient usage of collections without burdening developers with complex lifecycle logic.

---

## Background

TanStack DB allows applications to declaratively define collections of data and interact with them through queries, subscriptions, and sync primitives. As applications scale, it is common for developers to define many collections—some used pervasively, others specific to isolated screens or components. Without lifecycle controls, these collections persist indefinitely once defined, consuming memory and maintaining sync connections even when not in use.

This leads to several problems:
- Collections may fetch data on app start even when not immediately needed.
- Collections used temporarily (e.g. during onboarding flows or modal views) may never be cleaned up, leading to unnecessary memory and network usage.
- Complex chains of dependent collections (e.g. A depends on B depends on C) are difficult to manually track or tear down.
- Defining collections in a shared file (e.g. `collections.ts`) creates startup pressure that contradicts the goal of lazy, demand-driven data loading.

TanStack Query solves a related problem for queries by introducing cache- and garbage-collection policies. Inspired by that design, this PRD proposes a lifecycle model for TanStack DB collections that allows automatic initialization on usage, configurable teardown after idle time, and minimal imperative control for advanced cases.

The goal is to improve performance and developer ergonomics, particularly in larger apps with dynamic navigation, staged data loading, or ephemeral UI flows.

---

## Problem

As applications grow, developers commonly define many collections in shared modules or route-specific components. Without lifecycle management, these collections:

1. **Start too early** — Collections fetch or initialize on app startup, even if the user never visits the associated route or component.
2. **Never stop** — Collections remain active and in memory indefinitely, even when no part of the app is using them.
3. **Waste resources** — Maintaining sync state, indexes, and memory for unused collections increases CPU usage, network chatter, and memory pressure.
4. **Are hard to manage manually** — In apps where collections depend on other collections, or where multiple transient routes activate different data sets, manually tracking when to start/stop collections is error-prone and doesn’t scale.

These problems result in slower startups, higher memory usage, and more complex application logic. They are especially acute in:
- Large single-page apps that define many collections up front.
- Navigation-based apps where only a small subset of collections are active at any one time.
- Apps that use TanStack DB’s sync primitives, incurring real backend/network cost.

Automatic lifecycle management would solve this by ensuring collections only start when needed, stop when unused, and can be controlled predictably when necessary.
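Reference counting is one way the "never stop" and dependency-chain problems could be handled without manual bookkeeping. If a collection acquires its upstream collections when it starts and releases them when it stops, teardown cascades through a chain where A depends on B depends on C. A collection that another live consumer still uses stays up. The sketch below is illustrative only: `CollectionNode`, `acquire()` and `release()` are hypothetical names, not TanStack DB APIs, and it tears down immediately instead of waiting for a `gcTime`.

```typescript
// Hypothetical model: each collection reference-counts its users and holds a
// reference to every upstream collection it depends on while it is running.
class CollectionNode {
  readonly name: string;
  private readonly deps: CollectionNode[];
  private refs = 0;
  running = false;

  constructor(name: string, deps: CollectionNode[] = []) {
    this.name = name;
    this.deps = deps;
  }

  // The first user starts this collection, which in turn acquires its dependencies.
  acquire(): void {
    if (this.refs++ === 0) {
      this.running = true;
      for (const dep of this.deps) dep.acquire();
    }
  }

  // The last user leaving stops this collection and releases its dependencies.
  release(): void {
    if (this.refs === 0) throw new Error(`${this.name}: release() without acquire()`);
    if (--this.refs === 0) {
      this.running = false;
      for (const dep of this.deps) dep.release();
    }
  }
}

// A depends on B depends on C; D also depends on C.
const c = new CollectionNode("C");
const b = new CollectionNode("B", [c]);
const a = new CollectionNode("A", [b]);
const d = new CollectionNode("D", [c]);

a.acquire(); // starts A, B and C
d.acquire(); // C is now shared
a.release(); // stops A and B; C keeps running for D
console.log(a.running, b.running, c.running); // false false true
d.release(); // nothing uses C any more, so it stops too
console.log(c.running); // false
```

Nothing in application code has to know that A depends on C; each collection only manages its direct dependencies.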

---

## Personas

**1. The Route-Based Developer**  
*“I define collections for each route or screen, but they all load even if the user never visits that part of the app.”*  
- Builds SPAs with a file like `collections.ts` that registers all collections  
- Wants navigation to trigger data loading lazily, not on startup  
- Struggles with initial app load being slow due to eager collection start

**2. The Performance-Sensitive Architect**  
*“We want to keep memory and sync usage minimal—no reason to keep unused collections active.”*  
- Works on large apps with complex state and sync layers  
- Wants collections to spin down when not actively used  
- Prioritizes efficient memory, battery, and network use, especially on mobile

**3. The Composition-Focused Engineer**  
*“My collection A depends on B which depends on C, and I can’t track who’s using what anymore.”*  
- Composes collections into reusable utilities and hooks  
- Doesn’t want to manually manage cascading dependencies  
- Needs a system that auto-cleans unused chains without tight coupling

**4. The Pragmatic Frontender**  
*“Sometimes I know when I’m done with a collection—I just want to clean it up.”*  
- Builds flows like onboarding or modals where collections are temporary  
- Wants a simple `.cleanup()` API for known teardown moments  
- Appreciates predictability but avoids overly complex lifecycle APIs

---

## Requirements and Phases

### Phase 1: Collection Lifecycle Management

This phase introduces automatic and configurable lifecycle behavior for all collections in TanStack DB. It ensures that collections initialize only when needed, remain active while in use, and are garbage-collected after becoming idle. It also provides minimal imperative controls for preload and cleanup.

**Requirements:**
- Collections are **lazy by default**: creating a collection does not start it.
- A collection is **started** when any of the following occur:
  - A call to `collection.query()` or `collection.subscribe()`
  - A call to `collection.preload()`
- A collection is **garbage-collected** once it has had no active usage (queries or subscribers) for a configurable `gcTime` (default: 5 minutes).
- Each collection exposes:
  - `.status`: `"idle" | "loading" | "ready" | "error" | "cleaned-up"`
- Collections automatically **restart** on next usage after being garbage-collected or cleaned up.
- Collections expose two lifecycle methods:
  - `collection.preload(): Promise<void>`: triggers an immediate load and resolves once the initial load completes; concurrent calls share a single promise.
  - `collection.cleanup(): Promise<void>`: immediately tears down the collection, even if a preload is in progress.
- `gcTime` is configured per collection.
- Preloading counts as usage for GC purposes: a preloaded collection stays alive for at least `gcTime`, even if nothing queries or subscribes to it.
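The requirements above can be modelled as a small state machine driven by a subscriber count and a GC timer. This is a minimal sketch under stated assumptions, not TanStack DB's implementation: `LifecycleCollection`, the `SyncFn` callback shape (start syncing, call `markReady()`, return a teardown function) and the injectable `schedule` timer are all hypothetical. `query()` is folded into `subscribe()`, and the `"error"` status is never set because error handling is omitted.

```typescript
// Hypothetical sketch of the Phase 1 lifecycle rules; all names are illustrative.
type CollectionStatus = "idle" | "loading" | "ready" | "error" | "cleaned-up";

// Starts syncing, calls markReady() once the initial load is done, and
// returns a function that tears the sync down.
type SyncFn = (markReady: () => void) => () => void;

// Runs fn after ms and returns a cancel function (injectable for testing).
type Schedule = (fn: () => void, ms: number) => () => void;

const realSchedule: Schedule = (fn, ms) => {
  const id = setTimeout(fn, ms);
  return () => clearTimeout(id);
};

class LifecycleCollection {
  status: CollectionStatus = "idle"; // lazy: constructing does not start syncing
  private sync: SyncFn;
  private gcTime: number;
  private schedule: Schedule;
  private subscribers = 0;
  private session = 0; // bumped on every start/cleanup so stale callbacks are ignored
  private teardown: (() => void) | null = null;
  private cancelGc: (() => void) | null = null;
  private readyPromise: Promise<void> | null = null;
  private resolveReady: (() => void) | null = null;

  constructor(sync: SyncFn, gcTime = 5 * 60 * 1000, schedule: Schedule = realSchedule) {
    this.sync = sync;
    this.gcTime = gcTime; // PRD default: 5 minutes
    this.schedule = schedule;
  }

  // Counts as usage: starts the collection and cancels any pending GC.
  subscribe(): () => void {
    this.subscribers++;
    this.clearGc();
    this.start();
    let active = true;
    return () => {
      if (!active) return; // unsubscribing twice is a no-op
      active = false;
      if (--this.subscribers === 0) this.scheduleGc();
    };
  }

  // Starts the collection; concurrent calls share one promise. With no
  // subscribers, the GC timer is (re)armed so the collection lives >= gcTime.
  preload(): Promise<void> {
    this.start();
    if (this.subscribers === 0) this.scheduleGc();
    return this.readyPromise!;
  }

  // Immediate teardown, even mid-load. The next subscribe()/preload() restarts it.
  cleanup(): Promise<void> {
    this.clearGc();
    this.session++; // a late markReady() from the old run is now ignored
    this.teardown?.();
    this.teardown = null;
    this.resolveReady?.(); // assumption: settle pending preload() calls rather than hang
    this.resolveReady = null;
    this.readyPromise = null;
    this.status = "cleaned-up";
    return Promise.resolve();
  }

  private start(): void {
    if (this.status === "loading" || this.status === "ready") return;
    const session = ++this.session;
    this.status = "loading";
    this.readyPromise = new Promise<void>((resolve) => {
      this.resolveReady = () => resolve();
    });
    this.teardown = this.sync(() => {
      if (session !== this.session || this.status !== "loading") return;
      this.status = "ready";
      this.resolveReady?.();
      this.resolveReady = null;
    });
  }

  private scheduleGc(): void {
    this.clearGc();
    this.cancelGc = this.schedule(() => {
      this.cancelGc = null;
      if (this.subscribers === 0) void this.cleanup();
    }, this.gcTime);
  }

  private clearGc(): void {
    this.cancelGc?.();
    this.cancelGc = null;
  }
}
```

Because the timer is injected, a test can replace `setTimeout` with a manual scheduler and step through `idle`, `loading`, `ready` and `cleaned-up` deterministically. Settling pending `preload()` promises on `cleanup()` is one possible choice; rejecting them would be equally defensible.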

**Acceptance Criteria:**
- Querying or subscribing to an unused collection causes it to start (`.status` transitions to `"loading"` then `"ready"`).
- When no usage remains, the collection is torn down after `gcTime`.
- After cleanup or GC, a new query causes reinitialization.
- Multiple concurrent `preload()` calls return the same in-progress promise.
- `cleanup()` removes the collection’s in-memory state and unsubscribes from sync.
- An in-flight `preload()` does not block `cleanup()`; the pending load continues in the background unless it can be cancelled.
- `.status` reflects the current lifecycle state at all times.
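The acceptance criteria imply a small set of legal `.status` transitions. One way to keep `.status` accurate at all times is to route every change through a guard like the hypothetical `transition()` below. The edges out of `"error"` are assumptions, since retry behaviour after a failed load is still an open question in the considerations list, and whether `cleanup()` on an idle collection should be a no-op is not specified.

```typescript
// Hypothetical guard for `.status` changes; the edges are read from the
// Phase 1 requirements, not taken from TanStack DB's source.
type CollectionStatus = "idle" | "loading" | "ready" | "error" | "cleaned-up";

const allowedTransitions: Record<CollectionStatus, readonly CollectionStatus[]> = {
  idle: ["loading"], // first query, subscribe or preload
  loading: ["ready", "error", "cleaned-up"], // load finishes, fails, or cleanup() interrupts it
  ready: ["cleaned-up"], // cleanup() or GC after gcTime with no usage
  error: ["loading", "cleaned-up"], // assumption: the next usage retries
  "cleaned-up": ["loading"], // the next usage restarts the collection
};

// Returns the new status, or throws if the change would break the lifecycle.
function transition(from: CollectionStatus, to: CollectionStatus): CollectionStatus {
  if (!allowedTransitions[from].includes(to)) {
    throw new Error(`invalid status transition: ${from} -> ${to}`);
  }
  return to;
}

// Acceptance-criteria path: start on first use, become ready, get GCed, restart.
let current: CollectionStatus = "idle";
for (const next of ["loading", "ready", "cleaned-up", "loading", "ready"] as const) {
  current = transition(current, next);
}
console.log(current); // prints: ready
```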

**Considerations:**
- Should `gcTime` be reset on *every* access, or only when the collection transitions to “no longer used”?
- If `preload()` fails, do we memoize the error or retry on the next call?
- Should `cleanup()` throw or warn if the collection is actively used?
- Do we expose GC-related events to devtools or keep this internal?
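The preload-failure question comes down to what happens to the shared promise after it rejects. A minimal sketch, assuming a hypothetical `sharedLoader` helper that wraps whatever performs the initial load: concurrent `preload()` calls reuse one in-flight promise, `memoizeErrors` picks between the two policies, and `reset()` is what `cleanup()` would call so the next usage starts a fresh load.

```typescript
// Hypothetical helper, not a TanStack DB API. `load` stands in for whatever
// performs a collection's initial load.
function sharedLoader(load: () => Promise<void>, memoizeErrors: boolean) {
  let current: Promise<void> | null = null;
  return {
    // Concurrent calls (and calls after success) reuse the same promise.
    preload(): Promise<void> {
      if (current) return current;
      const attempt = load();
      current = attempt;
      attempt.catch(() => {
        // Retry policy: forget the failed attempt so the next call reloads.
        // Memoize policy: keep it, so later calls get the same rejection.
        if (!memoizeErrors && current === attempt) current = null;
      });
      return attempt;
    },
    // What cleanup() would call: the next preload() starts a fresh load.
    reset(): void {
      current = null;
    },
  };
}
```

The identity check `current === attempt` matters: if `reset()` runs while a load is in flight, the stale attempt's later failure must not clear a newer promise.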

---

## User Research

While no formal interviews have been conducted, several recurring pain points have emerged through experience building and maintaining applications with TanStack DB:

1. **App startup cost due to static collection registration**  
   Developers often define all collections up front in a shared file (e.g. `collections.ts`). Without lazy initialization, every collection begins syncing or fetching immediately—even if the user never navigates to the parts of the app that use them.

2. **Memory and network overhead from inactive collections**  
   In real-world apps, many collections are only used temporarily (e.g. for a modal, onboarding flow, or admin page). Without lifecycle management, these collections remain resident indefinitely, using memory and sometimes maintaining sync connections.

3. **Complexity of manual lifecycle management in compositional setups**  
   When collections are composed (e.g. `A` depends on `B`, which depends on `C`), it becomes difficult to track which are still in use. Developers currently have no good way to coordinate or garbage-collect unused chains.

4. **Desire for minimal imperative control**  
   Developers building ephemeral flows often know when a collection is no longer needed and want to clean it up directly. While automation handles the general case, manual control via `cleanup()` is necessary in targeted flows.

These pain points suggest that collection lifecycle is a critical missing abstraction in real-world TanStack DB usage. Inspired by TanStack Query’s approach to query caching and GC, this PRD proposes a parallel design that aligns with actual developer needs.

