diff --git a/.rubocop.yml b/.rubocop.yml index 4582853..0a33e84 100644 --- a/.rubocop.yml +++ b/.rubocop.yml @@ -32,3 +32,9 @@ Layout/LineLength: Naming/VariableNumber: EnforcedStyle: snake_case + +# In Ruby 3.0 interpolated strings will no longer be frozen automatically, so +# to ensure consistent performance, we have to manually call `String#freeze` in +# some places. +Style/RedundantFreeze: + Enabled: false diff --git a/CHANGELOG.md b/CHANGELOG.md index 8a8315b..d0de991 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -14,6 +14,17 @@ These are the latest changes on the project's `master` branch that have not yet Follow the same format as previous releases by categorizing your feature into "Added", "Changed", "Deprecated", "Removed", "Fixed", or "Security". ---> +### Changed +- **Breaking change:** The way records are retrieved from a given cursor has been changed to no longer use `CONCAT` but instead simply use a compound `WHERE` clause in case of a custom order and having both the custom field as well as the `id` field in the `ORDER BY` query. This is a breaking change since it now changes the internal order of how records with the same value of the `order_by` field are returned. +- Remove `ORDER BY` clause from `COUNT` queries + +### Fixed +- Only trigger one SQL query to load the records from the database and use it to determine if there was a previous / is a next page +- Memoize the `Paginator#page` method which is invoked multiple times to prevent it from mapping over the `records` again and again and rebuilding all cursors + +### Added +- Description about `order_by` on arbitrary SQL to README.md + ## [0.1.3] - 2021-03-17 ### Changed @@ -21,7 +32,7 @@ These are the latest changes on the project's `master` branch that have not yet - Reference changelog file in the gemspec instead of the general releases Github tab ### Removed -- Remove bulk from release: The previous gem releases contained files like the content of the `bin` folder or the Gemfile used for testing. Since this is not useful for gem users, adjust the gemspec file accordingly. +- Remove bulk from release: The previous gem releases contained files like the content of the `bin` folder or the Gemfile used for testing. Since this is not useful for gem users, adjust the gemspec file accordingly. ## [0.1.2] - 2021-02-04 diff --git a/README.md b/README.md index f8b8eed..c8fd8d9 100644 --- a/README.md +++ b/README.md @@ -163,9 +163,44 @@ Of course, this can both be combined with `first`, `last`, `before`, and `after` **Important:** If your app regularly orders by another column, you might want to add a database index for this. -Say that your order column is `author` then your index should be on `CONCAT(author, '-', id)`. +Say that your order column is `author` then you'll want to add a compound index on `(author, id)`. +If your table is called `posts` you can use a query like this in MySQL or Postgres: +```sql +CREATE INDEX index_posts_on_author_and_id ON posts (author, id); +``` +Or you can just do it via an `ActiveRecord::Migration`: +```ruby +class AddAuthorAndIdIndexToPosts < ActiveRecord::Migration + def change + add_index :posts, %i[author id] + end +end +``` Please take a look at the _"How does it work?"_ to find out more why this is necessary. + +#### Order by more complex logic + +Sometimes you might not only want to oder by a column ascending or descending, but need more complex logic. +Imagine you would also store the post's `category` on the `posts` table (as a plain string for simplicity's sake). +And the category could be `pinned`, `announcement`, or `general`. +Then you might want to show all `pinned` posts first, followed by the `announcement` ones and lastly show the `general` posts. + +In MySQL you could e.g. use a `FIELD(category, 'pinned', 'announcement', 'general')` query in the `ORDER BY` clause to achieve this. +However, you cannot add an index to such a statement. +And therefore, the performance of this is – especially when using cursor pagination where we not only have an `ORDER BY` clause but also need it twice in the `WHERE` clauses – is pretty dismal. + +For this reason, the gem currently only supports ordering by natural columns of the relation. +You **cannot** pass a generic SQL query to the `order_by` parameter. + +Implementing support for arbitrary SQL queries would also be fairly complex to handle in this gem. +We would have to ensure that SQL injection attacks aren't possible by passing malicious code to the `oder_by` parameter. +And we would need to return the data produced by the statement so that it can be encoded in the cursor. +This is, for now, out of scope of the functionality of this gem. + +What is recommended if you _do_ need to order by more complex logic is to have a separate column that you only use for ordering. +You can use `ActiveRecord` hooks to automatically update this column whenever you change your data. +Or, for example in MySQL, you can also use a [generated column](https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html) that is automatically being updated by the database based on some stored logic. ### Configuration options @@ -314,11 +349,11 @@ This will issue the following SQL query: ```sql SELECT * FROM "posts" -ORDER BY CONCAT(author, '-', "posts"."id") ASC +ORDER BY "posts"."author" ASC, "posts"."id" ASC LIMIT 2 ``` -As you can see, this will now order by a concatenation of the requested column, a dash `-`, and the ID column. +As you can see, this will now order by the author first, and if two records have the same author it will order them by ID. Ordering only the author is not enough since we cannot know if the custom column only has unique values. And we need to guarantee the correct order of ambiguous records independent of the direction of ordering. This unique order is the basis of being able to paginate forward and backward repeatedly and getting the correct records. @@ -344,17 +379,19 @@ We get this SQL query: ```sql SELECT * FROM "posts" -WHERE CONCAT(author, '-', "posts"."id") > 'Jane-4' -ORDER BY CONCAT(author, '-', "posts"."id") ASC +WHERE (author > 'Jane' OR (author = 'Jane') AND ("posts"."id" > 4)) +ORDER BY "posts"."author" ASC, "posts"."id" ASC LIMIT 2 ``` -You can see how the cursor is being translated into the WHERE clause to uniquely identify the row and properly filter based on this. +You can see how the cursor is being used by the WHERE clause to uniquely identify the row and properly filter based on this. +We only want to get records that either have a name that is alphabetically _after_ `"Jane"` or another `"Jane"` record with an ID that is higher than `4`. We will get the records #5 and #2 as response. -As you can see, when using a custom `order_by`, the concatenation is used for both filtering and ordering. +When using a custom `order_by`, this affects both filtering as well as ordering. Therefore, it is recommended to add an index for columns that are frequently used for ordering. -In our test case we would want to add an index for `CONCAT(author, '-', id)`. +In our test case we would want to add a compound index for the `(author, id)` column combination. +Databases like MySQL and Postgres are able to then use the leftmost part of the index, in our case `author`, by its own _or_ can use it combined with the `id` index. ## Development diff --git a/lib/rails_cursor_pagination.rb b/lib/rails_cursor_pagination.rb index 7f1287e..645dd16 100644 --- a/lib/rails_cursor_pagination.rb +++ b/lib/rails_cursor_pagination.rb @@ -96,11 +96,11 @@ # # SELECT * # FROM "posts" -# ORDER BY CONCAT(author, '-', "posts"."id") ASC +# ORDER BY "posts"."author" ASC, "posts"."id" ASC # LIMIT 2 # -# As you can see, this will now order by a concatenation of the requested -# column, a dash `-`, and the ID column. Ordering only the author is not +# As you can see, this will now order by the author first, and if two records +# have the same author it will order them by ID. Ordering only the author is not # enough since we cannot know if the custom column only has unique values. # And we need to guarantee the correct order of ambiguous records independent # of the direction of ordering. This unique order is the basis of being able @@ -128,18 +128,22 @@ # # SELECT * # FROM "posts" -# WHERE CONCAT(author, '-', "posts"."id") > 'Jane-4' -# ORDER BY CONCAT(author, '-', "posts"."id") ASC +# WHERE (author > 'Jane' OR (author = 'Jane') AND ("posts"."id" > 4)) +# ORDER BY "posts"."author" ASC, "posts"."id" ASC # LIMIT 2 # -# You can see how the cursor is being translated into the WHERE clause to -# uniquely identify the row and properly filter based on this. We will get -# the records #5 and #2 as response. -# -# As you can see, when using a custom `order_by`, the concatenation is used -# for both filtering and ordering. Therefore, it is recommended to add an -# index for columns that are frequently used for ordering. In our test case -# we would want to add an index for `CONCAT(author, '-', id)`. +# You can see how the cursor is being used by the WHERE clause to uniquely +# identify the row and properly filter based on this. We only want to get +# records that either have a name that is alphabetically after `"Jane"` or +# another `"Jane"` record with an ID that is higher than `4`. We will get the +# records #5 and #2 as response. +# +# When using a custom `order_by`, this affects both filtering as well as +# ordering. Therefore, it is recommended to add an index for columns that are +# frequently used for ordering. In our test case we would want to add a compound +# index for the `(author, id)` column combination. Databases like MySQL and +# Postgres are able to then use the leftmost part of the index, in our case +# `author`, by its own _or_ can use it combined with the `id` index. # module RailsCursorPagination class Error < StandardError; end diff --git a/lib/rails_cursor_pagination/paginator.rb b/lib/rails_cursor_pagination/paginator.rb index f6ed996..21876cb 100644 --- a/lib/rails_cursor_pagination/paginator.rb +++ b/lib/rails_cursor_pagination/paginator.rb @@ -34,8 +34,12 @@ class InvalidCursorError < ParameterError; end # Cursor to paginate upto (excluding). Can be combined with `last`. # @param order_by [Symbol, String, nil] # Column to order by. If none is provided, will default to ID column. - # NOTE: this will cause an SQL `CONCAT` query. Therefore, you might want - # to add an index to your database: `CONCAT(, '-', id)` + # NOTE: this will cause the query to filter on both the given column as + # well as the ID column. So you might want to add a compound index to your + # database similar to: + # ```sql + # CREATE INDEX ON (, id) + # ``` # @param order [Symbol, nil] # Ordering to apply, either `:asc` or `:desc`. Defaults to `:asc`. # @@ -143,11 +147,13 @@ def page_info # # @return [Array] List of hashes, each with a `cursor` and `data` def page - records.map do |item| - { - cursor: cursor_for_record(item), - data: item - } + memoize :page do + records.map do |item| + { + cursor: cursor_for_record(item), + data: item + } + end end end @@ -155,7 +161,7 @@ def page # # @return [Integer] def total - memoize(:total) { @relation.size } + memoize(:total) { @relation.reorder('').size } end # Check if the pagination direction is forward @@ -182,11 +188,12 @@ def previous_page? # When paginating forward, we can only have a previous page if we were # provided with a cursor and there were records discarded after applying # this filter. These records would have to be on previous pages. - @cursor.present? && filtered_and_sorted_relation.size < total + @cursor.present? && + filtered_and_sorted_relation.reorder('').size < total else # When paginating backwards, if we managed to load one more record than # requested, this record will be available on the previous page. - @page_size < limited_relation_plus_one.size + records_plus_one.size > @page_size end end @@ -197,12 +204,12 @@ def next_page? if paginate_forward? # When paginating forward, if we managed to load one more record than # requested, this record will be available on the next page. - @page_size < limited_relation_plus_one.size + records_plus_one.size > @page_size else # When paginating backward, if applying our cursor reduced the number # records returned, we know that the missing records will be on # subsequent pages. - filtered_and_sorted_relation.size < total + filtered_and_sorted_relation.reorder('').size < total end end @@ -210,7 +217,7 @@ def next_page? # # @return [Array] def records - records = limited_relation_plus_one.first(@page_size) + records = records_plus_one.first(@page_size) paginate_forward? ? records : records.reverse end @@ -218,11 +225,13 @@ def records # Apply limit to filtered and sorted relation that contains one item more # than the user-requested page size. This is useful for determining if there # is an additional page available without having to do a separate DB query. + # Then, fetch the records from the database to prevent multiple queries to + # load the records and count them. # # @return [ActiveRecord::Relation] - def limited_relation_plus_one - memoize :limited_relation_plus_one do - filtered_and_sorted_relation.limit(@page_size + 1) + def records_plus_one + memoize :records_plus_one do + filtered_and_sorted_relation.limit(@page_size + 1).load end end @@ -374,38 +383,6 @@ def decoded_cursor_field decoded_cursor.first end - # The SQL identifier of the column we need to consider for both ordering and - # filtering. - # - # In case we have a custom field order, this is a concatenation - # of the custom order field and the ID column joined by a dash. This is to - # ensure uniqueness of records even if they might have duplicates in the - # custom order field. If we don't have a custom order, it just returns a - # reference to the table's ID column. - # - # This uses the fully qualified and escaped reference to the ID column to - # prevent ambiguity in case of a query that uses JOINs and therefore might - # have multiple ID columns. - # - # @return [String] - def sql_column - memoize :sql_column do - escaped_table_name = @relation.quoted_table_name - escaped_id_column = @relation.connection.quote_column_name(:id) - - id_column = "#{escaped_table_name}.#{escaped_id_column}" - - sql = - if custom_order_field? - "CONCAT(#{@order_field}, '-', #{id_column})" - else - id_column - end - - Arel.sql(sql) - end - end - # Ensure that the relation has the ID column and any potential `order_by` # column selected. These are required to generate the record's cursor and # therefore it's crucial that they are part of the selected fields. @@ -432,19 +409,60 @@ def relation_with_cursor_fields # # @return [ActiveRecord::Relation] def sorted_relation + unless custom_order_field? + return relation_with_cursor_fields.reorder id: pagination_sorting.upcase + end + relation_with_cursor_fields - .reorder(sql_column => pagination_sorting.upcase) + .reorder(@order_field => pagination_sorting.upcase, + id: pagination_sorting.upcase) + end + + # Return a properly escaped reference to the ID column prefixed with the + # table name. This prefixing is important in case of another model having + # been joined to the passed relation. + # + # @return [String (frozen)] + def id_column + escaped_table_name = @relation.quoted_table_name + escaped_id_column = @relation.connection.quote_column_name(:id) + + "#{escaped_table_name}.#{escaped_id_column}".freeze end # Applies the filtering based on the provided cursor and order column to the # sorted relation. # + # In case a custom `order_by` field is provided, we have to filter based on + # this field and the ID column to ensure reproducible results. + # + # To better understand this, let's consider our example with the `posts` + # table. Say that we're paginating forward and add `order_by: :author` to + # the call, and if the cursor that is passed encodes `['Jane', 4]`. In this + # case we will have to select all posts that either have an author whose + # name is alphanumerically greater than 'Jane', or if the author is 'Jane' + # we have to ensure that the post's ID is greater than `4`. + # + # So our SQL WHERE clause needs to be something like: + # WHERE author > 'Jane' OR author = 'Jane' AND id > 4 + # # @return [ActiveRecord::Relation] def filtered_and_sorted_relation memoize :filtered_and_sorted_relation do next sorted_relation if @cursor.blank? - sorted_relation.where "#{sql_column} #{filter_operator} ?", filter_value + unless custom_order_field? + next sorted_relation.where "#{id_column} #{filter_operator} ?", + decoded_cursor_id + end + + sorted_relation + .where("#{@order_field} #{filter_operator} ?", decoded_cursor_field) + .or( + sorted_relation + .where("#{@order_field} = ?", decoded_cursor_field) + .where("#{id_column} #{filter_operator} ?", decoded_cursor_id) + ) end end diff --git a/spec/rails_cursor_pagination/paginator_spec.rb b/spec/rails_cursor_pagination/paginator_spec.rb index cba90e8..d5e5d21 100644 --- a/spec/rails_cursor_pagination/paginator_spec.rb +++ b/spec/rails_cursor_pagination/paginator_spec.rb @@ -162,25 +162,25 @@ ] end let(:posts_by_author) do - # Note that the ID is being used in a string sorting. Therefore, the order - # is '1' < '10' < '2' (instead of 1 < 2 < 10 as it would be for integers). + # Posts are first ordered by the author's name and then, in case of two + # posts having the same author, by ID [ # All posts by "Jane" - post_13, post_2, post_3, post_5, post_7, - post_10, + post_13, # All posts by "Jess" post_9, - post_1, + post_10, # All posts by "John" - post_11, - post_12, + post_1, post_4, post_6, - post_8 + post_8, + post_11, + post_12 ] end