Paginated / Infinite Collections

A common request we are receiving is to lazily load data into a "query collection" using the infinite query pattern.

We need to consider how to support this in a way that is also usable by other sync engines and ties into our plans for pageable live queries.

This proposal is in addition to the "partitioned collections" proposal here: https://github.com/TanStack/db/issues/315

## Current state

Currently collections are not ordered by default. However, you can create an ordered collection by passing a `compare` function to `createCollection`.

```ts
const collection = createCollection({
  compare: (a, b) => a.createdAt.getTime() - b.createdAt.getTime(),
})
```

This will create a collection that is ordered by the `createdAt` field. All the `Map`-like methods (`keys`, `values`, `entries`, etc.) will return the items in the correct order, albeit with the optimistic *inserts* added at the end (we should consider whether we can sort these too).

This is used internally by the live queries when materialising a result set from a query with an `orderBy` clause.

When you run a live query over an ordered collection, the results are *not* in the order of the parent collection unless you use the `orderBy` clause on this new query to maintain that order.

## Infinite Queries

TanStack Query [Infinite Queries](https://tanstack.com/query/latest/docs/framework/react/guides/infinite-queries) are an abstraction over APIs that return paginated data. From the docs:

> Rendering lists that can additively "load more" data onto an existing set of data or "infinite scroll" is also a very common UI pattern. TanStack Query supports a useful version of useQuery called useInfiniteQuery for querying these types of lists.
> 
> When using useInfiniteQuery, you'll notice a few things are different:
>
>  - data is now an object containing infinite query data:
>    - data.pages array containing the fetched pages
>    - data.pageParams array containing the page params used to fetch the pages
>  - The fetchNextPage and fetchPreviousPage functions are now available (fetchNextPage is required)
>  - The initialPageParam option is now available (and required) to specify the initial page param
>  - The getNextPageParam and getPreviousPageParam options are available for both determining if there is more data to load and the information to fetch it. This information is supplied as an additional parameter in the query function
>  - A hasNextPage boolean is now available and is true if getNextPageParam returns a value other than null or undefined
>  - A hasPreviousPage boolean is now available and is true if getPreviousPageParam returns a value other than null or undefined
>  - The isFetchingNextPage and isFetchingPreviousPage booleans are now available to distinguish between a background refresh state and a loading more state

Using `useInfiniteQuery` looks like this:

```ts
const {
  data,
  error,
  fetchNextPage,
  hasNextPage,
  isFetching,
  isFetchingNextPage,
  status,
} = useInfiniteQuery({
  queryKey: ['projects'],
  queryFn: async ({ pageParam }) => {
    const res = await fetch('/api/projects?cursor=' + pageParam)
    return res.json()
  },
  initialPageParam: 0,
  getNextPageParam: (lastPage, pages) => lastPage.nextCursor,
})
```

## Optimising orderBy+limit using indexes

Currently when processing an orderBy+limit clause, we push everything through the pipeline, fully sort (albeit incrementally) and maintain that state in the orderBy operator, and push out the limited results.

With the new indexes, which are ordered, we can inject messages into the graph *in order* and stop once we have the limited results.

Take this example:

```ts
const comments = useLiveQuery((q) =>
  q
    .from({ comment: commentsCollection })
    .join({ user: usersCollection }, ({ comment, user }) =>
      eq(comment.userId, user.id)
    )
    .where(({ comment, user }) =>
      and(eq(comment.issueId, "123"), eq(user.isBlocked, false))
    )
    .orderBy(({ comment }) => comment.createdAt)
    .limit(10)
);
```

If there is an index on the `createdAt` field, we know that we need at least the first 10 records ordered by `createdAt`. We push them into the graph and count the output. If the output is `n` less than 10, we then push the next `n` records into the graph and count the output again. We continue this process until we have the limited results.
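The stepping loop described above can be sketched roughly as follows. This is a minimal illustration rather than the actual pipeline: `Row`, `Pipeline` and `takeUntilLimit` are hypothetical names, the "index" is just a pre-sorted array, and the pipeline is modelled as a plain filter function standing in for the join and `where` stages.

```typescript
// Hypothetical types: none of these names come from the TanStack DB codebase.
type Row = { id: number; createdAt: number; blocked: boolean }

// Stand-in for the query graph: takes a batch of rows and returns the rows
// that survive the join/where stages.
type Pipeline = (batch: Row[]) => Row[]

function takeUntilLimit(
  orderedIndex: Row[], // rows already sorted by the orderBy field
  pipeline: Pipeline,
  limit: number
): { results: Row[]; rowsPushed: number } {
  const results: Row[] = []
  let cursor = 0
  let needed = limit

  // Push only as many rows as are still missing, count the output, repeat.
  while (needed > 0 && cursor < orderedIndex.length) {
    const batch = orderedIndex.slice(cursor, cursor + needed)
    cursor += batch.length
    results.push(...pipeline(batch))
    needed = limit - results.length
  }
  return { results, rowsPushed: cursor }
}

// 100 rows in createdAt order, every third one filtered out by the pipeline.
const index: Row[] = Array.from({ length: 100 }, (_, i) => ({
  id: i,
  createdAt: i,
  blocked: i % 3 === 0,
}))
const { results, rowsPushed } = takeUntilLimit(
  index,
  (batch) => batch.filter((r) => !r.blocked),
  10
)
console.log(results.length, rowsPushed) // 10 results after pushing 15 of 100 rows
```

Because each batch is never larger than the number of rows still missing, the output cannot overshoot the limit, and most of the collection is never pushed through the graph.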

With the work going into optimising joins to use indexes, one side of the join is pushed through with all its rows (the driving side), and the other side is pulled based on the join (the lazy side). However, we continue to send all live rows on the lazy side; the *driving* side of the join just messages the lazy side to send the initial state of specific rows if they are needed. We then maintain a set of sent row keys internally to handle update->insert flips (when a row had not been sent yet) and to filter out deletes of rows that were never sent.
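The sent-key bookkeeping might look something like the sketch below. `Change` and `SentKeyFilter` are hypothetical names invented for illustration, and the real operator works on the graph's own message format rather than this simplified change type.

```typescript
// Simplified change message; an assumption, not the library's actual type.
type Change<T> =
  | { type: "insert"; key: string; value: T }
  | { type: "update"; key: string; value: T }
  | { type: "delete"; key: string }

// Hypothetical helper that filters the live change stream on the lazy side.
class SentKeyFilter<T> {
  private sent = new Set<string>()

  // Called when the driving side requests the initial state of a row.
  markSent(key: string): void {
    this.sent.add(key)
  }

  // Rewrites or drops a live change depending on whether the row was sent before.
  apply(change: Change<T>): Change<T> | undefined {
    const wasSent = this.sent.has(change.key)
    if (change.type === "delete") {
      if (!wasSent) return undefined // the graph never saw this row
      this.sent.delete(change.key)
      return change
    }
    this.sent.add(change.key)
    if (change.type === "update" && !wasSent) {
      // First time the graph sees this row: flip the update into an insert.
      return { type: "insert", key: change.key, value: change.value }
    }
    return change
  }
}
```

The same filter could be reused when stepping through an index for `orderBy` + `limit`, since rows beyond the current limit are also "not yet sent".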

We can reuse all this machinery for injecting the rows when stepping through until we reach the limit of the live query.

## Live queries with changeable offset and limit

Currently the `offset` and `limit` are baked into the query, and cannot be changed. However we need to enable this to make the query pageable:

```ts
const comments = useLiveQuery((q) =>
  q
    // ...
    .offset(0)
    .limit(20)
)

// Some time later
comments.utils.updateQuery({
  offset: 10,
  limit: 30,
})
```

This will then push additional rows into the graph until we reach the new limit, and output the inserts and deletes needed to change the materialized state.
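To make the expected output concrete, here is a small sketch of the inserts and deletes produced by moving the window from `offset: 0, limit: 20` to `offset: 10, limit: 30`. `PageWindow` and `diffWindow` are hypothetical names, and the sketch recomputes both windows from a fully sorted array purely for illustration; the proposal is to produce the same diff incrementally inside the graph.

```typescript
type PageWindow = { offset: number; limit: number }

// Computes which rows leave and enter the materialized result
// when the window over a sorted list changes.
function diffWindow<T>(sorted: T[], prev: PageWindow, next: PageWindow) {
  const before = sorted.slice(prev.offset, prev.offset + prev.limit)
  const after = sorted.slice(next.offset, next.offset + next.limit)
  const beforeSet = new Set(before)
  const afterSet = new Set(after)
  return {
    deletes: before.filter((row) => !afterSet.has(row)),
    inserts: after.filter((row) => !beforeSet.has(row)),
  }
}

const rows = Array.from({ length: 50 }, (_, i) => i)
const diff = diffWindow(
  rows,
  { offset: 0, limit: 20 },
  { offset: 10, limit: 30 }
)
// diff.deletes is rows 0..9, diff.inserts is rows 20..39
```

Rows 10 to 19 appear in both windows, so they produce no messages at all.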

## Lazy sync

If a collection is ordered on the same property as the `orderBy` clause on a live query, and we are using the infinite query pattern, we can lazily sync the collection.

When the live query asks for the next `n` rows and we do not have them, we can ask the sync implementation to get the next page of rows.

So in the case of a "Query Collection", it would call the `fetchNextPage` method on the query, sync the rows into the collection, and then push them into the live query graph.

As the sync is async, we would want to make the `utils.updateQuery` method return a promise that resolves when any sync driven by the change in limit is complete.
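A rough sketch of how a promise-returning `updateQuery` could wait on lazy sync is shown below. `LazySource`, `PagedLiveQuery` and `ensureLoaded` are hypothetical stand-ins, rows are plain numbers, and the page fetcher simulates a backend holding 35 rows in pages of 10.

```typescript
// Hypothetical stand-in for a lazily synced, ordered collection.
class LazySource {
  rows: number[] = []
  private fetchPage: (cursor: number) => Promise<number[]>

  constructor(fetchPage: (cursor: number) => Promise<number[]>) {
    this.fetchPage = fetchPage
  }

  // Loads pages until at least `count` rows exist or the source is exhausted.
  async ensureLoaded(count: number): Promise<void> {
    while (this.rows.length < count) {
      const page = await this.fetchPage(this.rows.length)
      if (page.length === 0) return
      this.rows.push(...page)
    }
  }
}

// Hypothetical stand-in for a live query with a changeable window.
class PagedLiveQuery {
  result: number[] = []
  private source: LazySource

  constructor(source: LazySource) {
    this.source = source
  }

  // Resolves once any sync triggered by the new window has completed.
  async updateQuery(opts: { offset: number; limit: number }): Promise<void> {
    await this.source.ensureLoaded(opts.offset + opts.limit)
    this.result = this.source.rows.slice(opts.offset, opts.offset + opts.limit)
  }
}

// Simulated backend: 35 rows in total, served 10 at a time.
const source = new LazySource(async (cursor) =>
  Array.from(
    { length: Math.min(10, Math.max(0, 35 - cursor)) },
    (_, i) => cursor + i
  )
)
const query = new PagedLiveQuery(source)
```

Awaiting `query.updateQuery({ offset: 0, limit: 20 })` would load two pages before resolving, which gives a UI a natural place to hang a "loading more" state.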

## Sync implementation

We need a way to tell the sync engine to get the next page of rows. We can do this by adding a `registerNextPageCallback` method to the `sync` implementation:

```ts
const sync = {
  sync: ({ begin, write, commit, markReady, registerNextPageCallback }) => {
    registerNextPageCallback(async () => {
      // Fetch the next page of rows
      const nextPage = await fetchNextPage()
      // Sync them into the collection
      begin()
      for (const row of nextPage) {
        write({
          type: `insert`,
          value: row,
        })
      }
      commit()
    })
  }
}
```

Here we have explicitly gone and fetched the next page of rows, and then synced them into the collection, but the implementation of how to handle this would vary between sync engines.

When a sync implementation registers a callback, the collection is marked as "paged" and will trigger the callback when a query asks for more data.
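On the collection side, the registration could work along these lines. This is only a sketch under assumptions: `PagedCollectionSketch`, `SyncParams`, `isPaged` and `requestMore` are hypothetical names, `markReady` is omitted for brevity, and only inserts are handled.

```typescript
// Simplified subset of the parameters passed to `sync` (an assumption).
type SyncParams<T> = {
  begin: () => void
  write: (msg: { type: "insert"; value: T }) => void
  commit: () => void
  registerNextPageCallback: (cb: () => Promise<void>) => void
}

class PagedCollectionSketch<T> {
  readonly rows: T[] = []
  private pending: T[] = []
  private nextPage?: () => Promise<void>

  constructor(sync: (params: SyncParams<T>) => void) {
    sync({
      begin: () => { this.pending = [] },
      write: (msg) => { this.pending.push(msg.value) },
      commit: () => {
        this.rows.push(...this.pending)
        this.pending = []
      },
      // Registering a callback is what marks the collection as paged.
      registerNextPageCallback: (cb) => { this.nextPage = cb },
    })
  }

  get isPaged(): boolean {
    return this.nextPage !== undefined
  }

  // Called by a live query that needs more rows than are currently synced.
  async requestMore(): Promise<void> {
    if (this.nextPage) await this.nextPage()
  }
}

// Usage with an in-memory list of pages standing in for a backend.
const fakePages = [[1, 2], [3, 4]]
const collection = new PagedCollectionSketch<number>(
  ({ begin, write, commit, registerNextPageCallback }) => {
    let pageIndex = 0
    registerNextPageCallback(async () => {
      const nextPage = fakePages[pageIndex++] ?? []
      begin()
      for (const row of nextPage) write({ type: "insert", value: row })
      commit()
    })
  }
)
```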

For the Query Collection, this would be tied into the infinite query pattern, with the page or cursor being passed as the `pageParam` to the query function.
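One way to picture the `pageParam` threading is a small wrapper that turns the infinite query options into a `fetchNextPage` function. `InfiniteOptions` and `createPageFetcher` are hypothetical names for this sketch; a real Query Collection would presumably delegate to TanStack Query's own infinite query machinery instead of reimplementing it.

```typescript
// Mirrors the shape of the infinite query options used above (an assumption).
type InfiniteOptions<TPage, TParam> = {
  queryFn: (ctx: { pageParam: TParam }) => Promise<TPage>
  initialPageParam: TParam
  getNextPageParam: (
    lastPage: TPage,
    pages: TPage[]
  ) => TParam | null | undefined
}

function createPageFetcher<TPage, TParam>(
  opts: InfiniteOptions<TPage, TParam>
) {
  const pages: TPage[] = []
  let nextParam: TParam | null | undefined = opts.initialPageParam

  return {
    // True while getNextPageParam keeps returning a usable param.
    hasNextPage: () => nextParam !== null && nextParam !== undefined,
    // Fetches one page with the current param, then advances the param.
    async fetchNextPage(): Promise<TPage | undefined> {
      if (nextParam === null || nextParam === undefined) return undefined
      const page = await opts.queryFn({ pageParam: nextParam })
      pages.push(page)
      nextParam = opts.getNextPageParam(page, pages)
      return page
    },
  }
}

// Simulated cursor API with two pages of two items each.
const fetcher = createPageFetcher({
  queryFn: async ({ pageParam }: { pageParam: number }) => ({
    items: [pageParam, pageParam + 1],
    nextCursor: pageParam + 2 < 4 ? pageParam + 2 : null,
  }),
  initialPageParam: 0,
  getNextPageParam: (lastPage) => lastPage.nextCursor,
})
```

The callback registered with `registerNextPageCallback` could then call `fetcher.fetchNextPage()` and write the returned items into the collection.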

## Query Collection

We want the query collection to expose exactly the same API as an infinite query for configuration. Internally it will use the `fetchNextPage` callback, but this isn't something that the user should have to know about.

```ts
const queryCollection = createCollection(queryCollectionOptions({
  queryKey: ['projects'],
  queryFn: async ({ pageParam }) => {
    const res = await fetch('/api/projects?cursor=' + pageParam)
    return res.json()
  },
  initialPageParam: 0,
  getNextPageParam: (lastPage, pages) => lastPage.nextCursor,
}))
```

You can then "just" query the collection and it will fetch the next page of rows when needed.

## Not fully considered

- How to remove older pages from the beginning of the collection as you can in TanStack Query.
- We likely want/need to proactively fetch the next page of rows when the user scrolls *close* to the end of the current page.
- How to indicate which prop the collection is ordered by...
