SQL Seek Method or Keyset Pagination
Last modified: Jul 15, 2021
In this article, we are going to see what the SQL Seek Method or Keyset Pagination is and why you should consider it when navigating over large results sets.
The goal of pagination is to avoid fetching large volumes of data since the UI has a limited viewport that could be used to display data.
OFFSET Pagination
Before discussing Keyset Pagination, let’s see how the default OFFSET pagination works in SQL.
Although relational database systems have long been providing specific ways of restricting a query result set, since SQL:2008, there is a standard pagination syntax.
Therefore, a TOP-N query that limits the number of records of a given result set can use the FETCH FIRST N ROWS ONLY
directive, as illustrated by the following example:
SELECT id
FROM post
ORDER BY created_on DESC
FETCH FIRST 50 ROWS ONLY
And, a NEXT-N query that skips over the first M records and fetches the next N records looks like this:
SELECT id
FROM post
ORDER BY created_on DESC
OFFSET 150 ROWS
FETCH NEXT 50 ROWS ONLY
OFFSET Pagination Indexing
Since pagination requires an ORDER BY
clause in order to guarantee a consistent sort order, it’s common to index the sorting criteria.
In our case, we need to create the following index on the created_on
column:
CREATE INDEX idx_post_created_on ON post (created_on DESC)
When executing the TOP-N query, we can see that the idx_post_created_on
is being used, and only 50 records are being scanned:
SELECT id
FROM post
ORDER BY created_on DESC
FETCH FIRST 50 ROWS ONLY
Limit (cost=0.28..2.51 rows=50 width=16)
(actual time=0.013..0.022 rows=50 loops=1)
-> Index Scan using idx_post_created_on on post p
(cost=0.28..223.28 rows=5000 width=16)
(actual time=0.013..0.019 rows=50 loops=1)
Planning time: 0.113 ms
Execution time: 0.055 ms
For the second page, we can see that the idx_post_created_on
has to scan 100 records because it needs to skip over the first 50 rows contained on the first page in order to load the next 50 records that are needed to be returned by this query:
SELECT id
FROM post
ORDER BY created_on DESC
OFFSET 50 ROWS
FETCH NEXT 50 ROWS ONLY
Limit (cost=2.51..4.74 rows=50 width=16)
(actual time=0.032..0.044 rows=50 loops=1)
-> Index Scan using idx_post_created_on on post p
(cost=0.28..223.28 rows=5000 width=16)
(actual time=0.022..0.040 rows=100 loops=1)
Planning time: 0.198 ms
Execution time: 0.071 ms
The further away we go from the first page, the more records will need to be scanned by the idx_post_created_on
index in order to skip over the records indicated by the OFFSET
clause:
SELECT id
FROM post
ORDER BY created_on DESC
OFFSET 4950 ROWS
FETCH NEXT 50 ROWS ONLY
Limit (cost=221.05..223.28 rows=50 width=16)
(actual time=1.154..1.166 rows=50 loops=1)
-> Index Scan using idx_post_created_on on post p
(cost=0.28..223.28 rows=5000 width=16)
(actual time=0.079..1.033 rows=5000 loops=1)
Planning time: 1.629 ms
Execution time: 1.190 ms
Note that scanning the entire idx_post_created_on
index takes 20 times more than scanning a single page, which was the case for the initial TOP-N query.
To cope with this index scanning issue that’s inherent in the OFFSET pagination, we can use the Seek Method or Keyset Pagination technique.
The TOP-N Keyset Pagination query looks as follows:
SELECT id, created_on
FROM post
ORDER BY created_on DESC, id DESC
FETCH FIRST 50 ROWS ONLY
Notice that we need to include the id
in the ORDER BY clause since the created_on
column values are not unique. Hence, we will need to pass both the last processed created_on
and id
when loading the next page. Therefore, this time, the query projection needs to load the created_on
column as well.
The Next-N query will use the previously processed created_on
and id
column values to locate the next page of records that need to be loaded.
SELECT id, created_on
FROM post
WHERE
(created_on, id) < ('2019-10-02 21:00:00.0', 4951)
ORDER BY created_on DESC, id DESC
FETCH FIRST 50 ROWS ONLY
The (created_on, id) < ('2019-10-02 21:00:00.0', 4951)
row value expression is equivalent to:
created_on < '2019-10-02 21:00:00.0' OR
(
(created_on = '2019-10-02 21:00:00.0') AND
(id < 4951)
)
SQL Seek Method or Keyset Pagination Indexing
Because the Seek Method uses both the created_on
and the id
columns in the ORDER BY
clause, we can create the idx_post_created_on
index on both these two columns:
CREATE INDEX idx_post_created_on ON post (created_on DESC, id DESC)
Now, when executing the TOP-N Keyset Pagination query, we can see that it uses the idx_post_created_on
index, and just 50 records are scanned:
SELECT id, created_on
FROM post
ORDER BY created_on DESC, id DESC
FETCH FIRST 50 ROWS ONLY
Limit (cost=0.28..1.91 rows=50 width=16)
(actual time=0.104..0.110 rows=50 loops=1)
-> Index Only Scan using idx_post_created_on_id on post
(cost=0.28..163.28 rows=5000 width=16)
(actual time=0.102..0.107 rows=50 loops=1)
Heap Fetches: 0
Planning Time: 0.882 ms
Execution Time: 0.129 ms
The Next-N Keyset Pagination query also uses the idx_post_created_on
index and, unlike the OFFSET Pagination, only 50 rows are scanned this time:
SELECT id, created_on
FROM post
WHERE
(created_on, id) < ('2019-10-02 21:00:00.0', 4951)
ORDER BY created_on DESC, id DESC
FETCH FIRST 50 ROWS ONLY
Limit (cost=0.28..3.40 rows=50 width=32)
(actual time=0.029..0.063 rows=50 loops=1)
-> Index Scan using idx_post_created_on_id on post
(cost=0.28..308.58 rows=4950 width=32)
(actual time=0.027..0.057 rows=50 loops=1)
Index Cond: (
created_on <=
'2020-04-24 06:00:00'::timestamp without time zone
)
Filter: (
ROW(created_on, (id)::numeric) <
ROW('2020-04-24 06:00:00'::timestamp without time zone, '4951'::numeric)
)
Rows Removed by Filter: 2
Heap Fetches: 52
Planning Time: 0.806 ms
Execution Time: 0.158 ms
And, loading the last page will be fast as well since Keyset Pagination doesn’t need to scan the entire index in order to skip over the OFFSET records:
SELECT id, created_on
FROM post
WHERE
(created_on, id) < ('2019-10-03 02:00:00.0', 51)
ORDER BY created_on DESC, id DESC
FETCH FIRST 50 ROWS ONLY
Limit (cost=48.82..48.83 rows=1 width=16)
(actual time=0.168..0.175 rows=50 loops=1)
-> Sort (cost=48.82..48.83 rows=1 width=16)
(actual time=0.166..0.170 rows=50 loops=1)
Sort Key: created_on DESC, id DESC
Sort Method: quicksort Memory: 27kB
-> Bitmap Heap Scan on post
(cost=4.76..48.81 rows=1 width=16)
(actual time=0.071..0.085 rows=50 loops=1)
Recheck Cond: (created_on <= '2019-10-03 02:00:00'::timestamp without time zone)
Filter: (
(created_on < '2019-10-03 02:00:00'::timestamp without time zone) OR
(
(created_on = '2019-10-03 02:00:00'::timestamp without time zone) AND
(id < '51'::bigint)
)
)
Rows Removed by Filter: 2
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_post_created_on_id
(cost=0.00..4.75 rows=63 width=0)
(actual time=0.061..0.062 rows=52 loops=1)
Index Cond: (created_on <= '2019-10-03 02:00:00'::timestamp without time zone)
Planning Time: 0.676 ms
Execution Time: 0.279 ms
Cool, right?
Conclusion
The Keyset Pagination allows you to use an index to locate the first record of any page that needs to be navigated, and, for this reason, the SQL query can scan fewer records than when using the default OFFSET pagination.