Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions .rubocop.yml
Original file line number Diff line number Diff line change
Expand Up @@ -32,3 +32,9 @@ Layout/LineLength:

Naming/VariableNumber:
EnforcedStyle: snake_case

# In Ruby 3.0 interpolated strings will no longer be frozen automatically, so
# to ensure consistent performance, we have to manually call `String#freeze` in
# some places.
Style/RedundantFreeze:
Enabled: false
13 changes: 12 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,14 +14,25 @@ These are the latest changes on the project's `master` branch that have not yet
Follow the same format as previous releases by categorizing your feature into "Added", "Changed", "Deprecated", "Removed", "Fixed", or "Security".
--->

### Changed
- **Breaking change:** The way records are retrieved from a given cursor has been changed to no longer use `CONCAT` but instead simply use a compound `WHERE` clause in case of a custom order and having both the custom field as well as the `id` field in the `ORDER BY` query. This is a breaking change since it now changes the internal order of how records with the same value of the `order_by` field are returned.
- Remove `ORDER BY` clause from `COUNT` queries

### Fixed
- Only trigger one SQL query to load the records from the database and use it to determine if there was a previous / is a next page
- Memoize the `Paginator#page` method which is invoked multiple times to prevent it from mapping over the `records` again and again and rebuilding all cursors

### Added
- Description about `order_by` on arbitrary SQL to README.md

## [0.1.3] - 2021-03-17

### Changed
- Make the gem publicly available via github.com/xing/rails_cursor_pagination and release it to Rubygems.org
- Reference changelog file in the gemspec instead of the general releases Github tab

### Removed
- Remove bulk from release: The previous gem releases contained files like the content of the `bin` folder or the Gemfile used for testing. Since this is not useful for gem users, adjust the gemspec file accordingly.
- Remove bulk from release: The previous gem releases contained files like the content of the `bin` folder or the Gemfile used for testing. Since this is not useful for gem users, adjust the gemspec file accordingly.

## [0.1.2] - 2021-02-04

Expand Down
53 changes: 45 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -163,9 +163,44 @@ Of course, this can both be combined with `first`, `last`, `before`, and `after`

**Important:**
If your app regularly orders by another column, you might want to add a database index for this.
Say that your order column is `author` then your index should be on `CONCAT(author, '-', id)`.
Say that your order column is `author` then you'll want to add a compound index on `(author, id)`.
If your table is called `posts` you can use a query like this in MySQL or Postgres:
```sql
CREATE INDEX index_posts_on_author_and_id ON posts (author, id);
```
Or you can just do it via an `ActiveRecord::Migration`:
```ruby
class AddAuthorAndIdIndexToPosts < ActiveRecord::Migration
def change
add_index :posts, %i[author id]
end
end
```

Please take a look at the _"How does it work?"_ to find out more why this is necessary.

#### Order by more complex logic

Sometimes you might not only want to oder by a column ascending or descending, but need more complex logic.
Imagine you would also store the post's `category` on the `posts` table (as a plain string for simplicity's sake).
And the category could be `pinned`, `announcement`, or `general`.
Then you might want to show all `pinned` posts first, followed by the `announcement` ones and lastly show the `general` posts.

In MySQL you could e.g. use a `FIELD(category, 'pinned', 'announcement', 'general')` query in the `ORDER BY` clause to achieve this.
However, you cannot add an index to such a statement.
And therefore, the performance of this is – especially when using cursor pagination where we not only have an `ORDER BY` clause but also need it twice in the `WHERE` clauses – is pretty dismal.

For this reason, the gem currently only supports ordering by natural columns of the relation.
You **cannot** pass a generic SQL query to the `order_by` parameter.

Implementing support for arbitrary SQL queries would also be fairly complex to handle in this gem.
We would have to ensure that SQL injection attacks aren't possible by passing malicious code to the `oder_by` parameter.
And we would need to return the data produced by the statement so that it can be encoded in the cursor.
This is, for now, out of scope of the functionality of this gem.

What is recommended if you _do_ need to order by more complex logic is to have a separate column that you only use for ordering.
You can use `ActiveRecord` hooks to automatically update this column whenever you change your data.
Or, for example in MySQL, you can also use a [generated column](https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html) that is automatically being updated by the database based on some stored logic.

### Configuration options

Expand Down Expand Up @@ -314,11 +349,11 @@ This will issue the following SQL query:
```sql
SELECT *
FROM "posts"
ORDER BY CONCAT(author, '-', "posts"."id") ASC
ORDER BY "posts"."author" ASC, "posts"."id" ASC
LIMIT 2
```

As you can see, this will now order by a concatenation of the requested column, a dash `-`, and the ID column.
As you can see, this will now order by the author first, and if two records have the same author it will order them by ID.
Ordering only the author is not enough since we cannot know if the custom column only has unique values.
And we need to guarantee the correct order of ambiguous records independent of the direction of ordering.
This unique order is the basis of being able to paginate forward and backward repeatedly and getting the correct records.
Expand All @@ -344,17 +379,19 @@ We get this SQL query:
```sql
SELECT *
FROM "posts"
WHERE CONCAT(author, '-', "posts"."id") > 'Jane-4'
ORDER BY CONCAT(author, '-', "posts"."id") ASC
WHERE (author > 'Jane' OR (author = 'Jane') AND ("posts"."id" > 4))
ORDER BY "posts"."author" ASC, "posts"."id" ASC
LIMIT 2
```

You can see how the cursor is being translated into the WHERE clause to uniquely identify the row and properly filter based on this.
You can see how the cursor is being used by the WHERE clause to uniquely identify the row and properly filter based on this.
We only want to get records that either have a name that is alphabetically _after_ `"Jane"` or another `"Jane"` record with an ID that is higher than `4`.
We will get the records #5 and #2 as response.

As you can see, when using a custom `order_by`, the concatenation is used for both filtering and ordering.
When using a custom `order_by`, this affects both filtering as well as ordering.
Therefore, it is recommended to add an index for columns that are frequently used for ordering.
In our test case we would want to add an index for `CONCAT(author, '-', id)`.
In our test case we would want to add a compound index for the `(author, id)` column combination.
Databases like MySQL and Postgres are able to then use the leftmost part of the index, in our case `author`, by its own _or_ can use it combined with the `id` index.

## Development

Expand Down
30 changes: 17 additions & 13 deletions lib/rails_cursor_pagination.rb
Original file line number Diff line number Diff line change
Expand Up @@ -96,11 +96,11 @@
#
# SELECT *
# FROM "posts"
# ORDER BY CONCAT(author, '-', "posts"."id") ASC
# ORDER BY "posts"."author" ASC, "posts"."id" ASC
# LIMIT 2
#
# As you can see, this will now order by a concatenation of the requested
# column, a dash `-`, and the ID column. Ordering only the author is not
# As you can see, this will now order by the author first, and if two records
# have the same author it will order them by ID. Ordering only the author is not
# enough since we cannot know if the custom column only has unique values.
# And we need to guarantee the correct order of ambiguous records independent
# of the direction of ordering. This unique order is the basis of being able
Expand Down Expand Up @@ -128,18 +128,22 @@
#
# SELECT *
# FROM "posts"
# WHERE CONCAT(author, '-', "posts"."id") > 'Jane-4'
# ORDER BY CONCAT(author, '-', "posts"."id") ASC
# WHERE (author > 'Jane' OR (author = 'Jane') AND ("posts"."id" > 4))
# ORDER BY "posts"."author" ASC, "posts"."id" ASC
# LIMIT 2
#
# You can see how the cursor is being translated into the WHERE clause to
# uniquely identify the row and properly filter based on this. We will get
# the records #5 and #2 as response.
#
# As you can see, when using a custom `order_by`, the concatenation is used
# for both filtering and ordering. Therefore, it is recommended to add an
# index for columns that are frequently used for ordering. In our test case
# we would want to add an index for `CONCAT(author, '-', id)`.
# You can see how the cursor is being used by the WHERE clause to uniquely
# identify the row and properly filter based on this. We only want to get
# records that either have a name that is alphabetically after `"Jane"` or
# another `"Jane"` record with an ID that is higher than `4`. We will get the
# records #5 and #2 as response.
#
# When using a custom `order_by`, this affects both filtering as well as
# ordering. Therefore, it is recommended to add an index for columns that are
# frequently used for ordering. In our test case we would want to add a compound
# index for the `(author, id)` column combination. Databases like MySQL and
# Postgres are able to then use the leftmost part of the index, in our case
# `author`, by its own _or_ can use it combined with the `id` index.
#
module RailsCursorPagination
class Error < StandardError; end
Expand Down
118 changes: 68 additions & 50 deletions lib/rails_cursor_pagination/paginator.rb
Original file line number Diff line number Diff line change
Expand Up @@ -34,8 +34,12 @@ class InvalidCursorError < ParameterError; end
# Cursor to paginate upto (excluding). Can be combined with `last`.
# @param order_by [Symbol, String, nil]
# Column to order by. If none is provided, will default to ID column.
# NOTE: this will cause an SQL `CONCAT` query. Therefore, you might want
# to add an index to your database: `CONCAT(<order_by_field>, '-', id)`
# NOTE: this will cause the query to filter on both the given column as
# well as the ID column. So you might want to add a compound index to your
# database similar to:
# ```sql
# CREATE INDEX <index_name> ON <table_name> (<order_by_field>, id)
# ```
# @param order [Symbol, nil]
# Ordering to apply, either `:asc` or `:desc`. Defaults to `:asc`.
#
Expand Down Expand Up @@ -143,19 +147,21 @@ def page_info
#
# @return [Array<Hash>] List of hashes, each with a `cursor` and `data`
def page
records.map do |item|
{
cursor: cursor_for_record(item),
data: item
}
memoize :page do
records.map do |item|
{
cursor: cursor_for_record(item),
data: item
}
end
end
end

# Get the total number of records in the given relation
#
# @return [Integer]
def total
memoize(:total) { @relation.size }
memoize(:total) { @relation.reorder('').size }
end

# Check if the pagination direction is forward
Expand All @@ -182,11 +188,12 @@ def previous_page?
# When paginating forward, we can only have a previous page if we were
# provided with a cursor and there were records discarded after applying
# this filter. These records would have to be on previous pages.
@cursor.present? && filtered_and_sorted_relation.size < total
@cursor.present? &&
filtered_and_sorted_relation.reorder('').size < total
else
# When paginating backwards, if we managed to load one more record than
# requested, this record will be available on the previous page.
@page_size < limited_relation_plus_one.size
records_plus_one.size > @page_size
end
end

Expand All @@ -197,32 +204,34 @@ def next_page?
if paginate_forward?
# When paginating forward, if we managed to load one more record than
# requested, this record will be available on the next page.
@page_size < limited_relation_plus_one.size
records_plus_one.size > @page_size
else
# When paginating backward, if applying our cursor reduced the number
# records returned, we know that the missing records will be on
# subsequent pages.
filtered_and_sorted_relation.size < total
filtered_and_sorted_relation.reorder('').size < total
end
end

# Load the correct records and return them in the right order
#
# @return [Array<ActiveRecord>]
def records
records = limited_relation_plus_one.first(@page_size)
records = records_plus_one.first(@page_size)

paginate_forward? ? records : records.reverse
end

# Apply limit to filtered and sorted relation that contains one item more
# than the user-requested page size. This is useful for determining if there
# is an additional page available without having to do a separate DB query.
# Then, fetch the records from the database to prevent multiple queries to
# load the records and count them.
#
# @return [ActiveRecord::Relation]
def limited_relation_plus_one
memoize :limited_relation_plus_one do
filtered_and_sorted_relation.limit(@page_size + 1)
def records_plus_one
memoize :records_plus_one do
filtered_and_sorted_relation.limit(@page_size + 1).load
end
end

Expand Down Expand Up @@ -374,38 +383,6 @@ def decoded_cursor_field
decoded_cursor.first
end

# The SQL identifier of the column we need to consider for both ordering and
# filtering.
#
# In case we have a custom field order, this is a concatenation
# of the custom order field and the ID column joined by a dash. This is to
# ensure uniqueness of records even if they might have duplicates in the
# custom order field. If we don't have a custom order, it just returns a
# reference to the table's ID column.
#
# This uses the fully qualified and escaped reference to the ID column to
# prevent ambiguity in case of a query that uses JOINs and therefore might
# have multiple ID columns.
#
# @return [String]
def sql_column
memoize :sql_column do
escaped_table_name = @relation.quoted_table_name
escaped_id_column = @relation.connection.quote_column_name(:id)

id_column = "#{escaped_table_name}.#{escaped_id_column}"

sql =
if custom_order_field?
"CONCAT(#{@order_field}, '-', #{id_column})"
else
id_column
end

Arel.sql(sql)
end
end

# Ensure that the relation has the ID column and any potential `order_by`
# column selected. These are required to generate the record's cursor and
# therefore it's crucial that they are part of the selected fields.
Expand All @@ -432,19 +409,60 @@ def relation_with_cursor_fields
#
# @return [ActiveRecord::Relation]
def sorted_relation
unless custom_order_field?
return relation_with_cursor_fields.reorder id: pagination_sorting.upcase
end

relation_with_cursor_fields
.reorder(sql_column => pagination_sorting.upcase)
.reorder(@order_field => pagination_sorting.upcase,
id: pagination_sorting.upcase)
end

# Return a properly escaped reference to the ID column prefixed with the
# table name. This prefixing is important in case of another model having
# been joined to the passed relation.
#
# @return [String (frozen)]
def id_column
escaped_table_name = @relation.quoted_table_name
escaped_id_column = @relation.connection.quote_column_name(:id)

"#{escaped_table_name}.#{escaped_id_column}".freeze
end

# Applies the filtering based on the provided cursor and order column to the
# sorted relation.
#
# In case a custom `order_by` field is provided, we have to filter based on
# this field and the ID column to ensure reproducible results.
#
# To better understand this, let's consider our example with the `posts`
# table. Say that we're paginating forward and add `order_by: :author` to
# the call, and if the cursor that is passed encodes `['Jane', 4]`. In this
# case we will have to select all posts that either have an author whose
# name is alphanumerically greater than 'Jane', or if the author is 'Jane'
# we have to ensure that the post's ID is greater than `4`.
#
# So our SQL WHERE clause needs to be something like:
# WHERE author > 'Jane' OR author = 'Jane' AND id > 4
#
# @return [ActiveRecord::Relation]
def filtered_and_sorted_relation
memoize :filtered_and_sorted_relation do
next sorted_relation if @cursor.blank?

sorted_relation.where "#{sql_column} #{filter_operator} ?", filter_value
unless custom_order_field?
next sorted_relation.where "#{id_column} #{filter_operator} ?",
decoded_cursor_id
end

sorted_relation
.where("#{@order_field} #{filter_operator} ?", decoded_cursor_field)
.or(
sorted_relation
.where("#{@order_field} = ?", decoded_cursor_field)
.where("#{id_column} #{filter_operator} ?", decoded_cursor_id)
)
end
end

Expand Down
Loading