SQL Seek Method or Keyset Pagination

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, we are going to see what the SQL Seek Method or Keyset Pagination is and why you should consider it when navigating over large results sets.

The goal of pagination is to avoid fetching large volumes of data since the UI has a limited viewport that could be used to display data.

OFFSET Pagination

Before discussing Keyset Pagination, let’s see how the default OFFSET pagination works in SQL.

Although relational database systems have long been providing specific ways of restricting a query result set, since SQL:2008, there is a standard pagination syntax.

Therefore, a TOP-N query that limits the number of records of a given result set can use the FETCH FIRST N ROWS ONLY directive, as illustrated by the following example:

SELECT id
FROM post
ORDER BY created_on DESC
FETCH FIRST 50 ROWS ONLY

And, a NEXT-N query that skips over the first M records and fetches the next N records looks like this:

SELECT id
FROM post
ORDER BY created_on DESC
OFFSET 150 ROWS
FETCH NEXT 50 ROWS ONLY

OFFSET Pagination Indexing

Since pagination requires an ORDER BY clause in order to guarantee a consistent sort order, it’s common to index the sorting criteria.

In our case, we need to create the following index on the created_on column:

CREATE INDEX idx_post_created_on ON post (created_on DESC)

When executing the TOP-N query, we can see that the idx_post_created_on is being used, and only 50 records are being scanned:

SELECT id
FROM post
ORDER BY created_on DESC
FETCH FIRST 50 ROWS ONLY

Limit  (cost=0.28..2.51 rows=50 width=16) 
       (actual time=0.013..0.022 rows=50 loops=1)
  ->  Index Scan using idx_post_created_on on post p  
         (cost=0.28..223.28 rows=5000 width=16) 
         (actual time=0.013..0.019 rows=50 loops=1)
		 
Planning time: 0.113 ms
Execution time: 0.055 ms

For the second page, we can see that the idx_post_created_on has to scan 100 records because it needs to skip over the first 50 rows contained on the first page in order to load the next 50 records that are needed to be returned by this query:

SELECT id
FROM post
ORDER BY created_on DESC
OFFSET 50 ROWS
FETCH NEXT 50 ROWS ONLY

Limit  (cost=2.51..4.74 rows=50 width=16) 
       (actual time=0.032..0.044 rows=50 loops=1)
  ->  Index Scan using idx_post_created_on on post p  
         (cost=0.28..223.28 rows=5000 width=16) 
         (actual time=0.022..0.040 rows=100 loops=1)
		 
Planning time: 0.198 ms
Execution time: 0.071 ms

The further away we go from the first page, the more records will need to be scanned by the idx_post_created_on index in order to skip over the records indicated by the OFFSET clause:

SELECT id
FROM post
ORDER BY created_on DESC
OFFSET 4950 ROWS
FETCH NEXT 50 ROWS ONLY

Limit  (cost=221.05..223.28 rows=50 width=16) 
       (actual time=1.154..1.166 rows=50 loops=1)
  ->  Index Scan using idx_post_created_on on post p  
         (cost=0.28..223.28 rows=5000 width=16) 
         (actual time=0.079..1.033 rows=5000 loops=1)
		 
Planning time: 1.629 ms
Execution time: 1.190 ms

Note that scanning the entire idx_post_created_on index takes 20 times more than scanning a single page, which was the case for the initial TOP-N query.

SQL Seek Method or Keyset Pagination

To cope with this index scanning issue that’s inherent in the OFFSET pagination, we can use the Seek Method or Keyset Pagination technique.

The TOP-N Keyset Pagination query looks as follows:

SELECT id, created_on
FROM post
ORDER BY created_on DESC, id DESC
FETCH FIRST 50 ROWS ONLY

Notice that we need to include the id in the ORDER BY clause since the created_on column values are not unique. Hence, we will need to pass both the last processed created_on and id when loading the next page. Therefore, this time, the query projection needs to load the created_on column as well.

The Next-N query will use the previously processed created_on and id column values to locate the next page of records that need to be loaded.

SELECT id, created_on
FROM post
WHERE
  (created_on, id) < ('2019-10-02 21:00:00.0', 4951)
ORDER BY created_on DESC, id DESC
FETCH FIRST 50 ROWS ONLY

The (created_on, id) < ('2019-10-02 21:00:00.0', 4951) row value expression is equivalent to:

created_on < '2019-10-02 21:00:00.0' OR 
(
    (created_on = '2019-10-02 21:00:00.0') AND 
    (id < 4951)
)

SQL Seek Method or Keyset Pagination Indexing

Because the Seek Method uses both the created_on and the id columns in the ORDER BY clause, we can create the idx_post_created_on index on both these two columns:

CREATE INDEX idx_post_created_on ON post (created_on DESC, id DESC)

Now, when executing the TOP-N Keyset Pagination query, we can see that it uses the idx_post_created_on index, and just 50 records are scanned:

SELECT id, created_on
FROM post
ORDER BY created_on DESC, id DESC
FETCH FIRST 50 ROWS ONLY

Limit  (cost=0.28..1.91 rows=50 width=16) 
       (actual time=0.104..0.110 rows=50 loops=1)
  ->  Index Only Scan using idx_post_created_on_id on post  
        (cost=0.28..163.28 rows=5000 width=16) 
        (actual time=0.102..0.107 rows=50 loops=1)
        Heap Fetches: 0
        
Planning Time: 0.882 ms
Execution Time: 0.129 ms

The Next-N Keyset Pagination query also uses the idx_post_created_on index and, unlike the OFFSET Pagination, only 50 rows are scanned this time:

SELECT id, created_on
FROM post
WHERE
  (created_on, id) < ('2019-10-02 21:00:00.0', 4951)
ORDER BY created_on DESC, id DESC
FETCH FIRST 50 ROWS ONLY

Limit  (cost=0.28..3.40 rows=50 width=32) 
       (actual time=0.029..0.063 rows=50 loops=1)
  ->  Index Scan using idx_post_created_on_id on post  
        (cost=0.28..308.58 rows=4950 width=32) 
        (actual time=0.027..0.057 rows=50 loops=1)
        Index Cond: (
          created_on <= 
          '2020-04-24 06:00:00'::timestamp without time zone
        )
        Filter: (
          ROW(created_on, (id)::numeric) < 
          ROW('2020-04-24 06:00:00'::timestamp without time zone, '4951'::numeric)
        )
        Rows Removed by Filter: 2
        Heap Fetches: 52

Planning Time: 0.806 ms
Execution Time: 0.158 ms

And, loading the last page will be fast as well since Keyset Pagination doesn’t need to scan the entire index in order to skip over the OFFSET records:

SELECT id, created_on
FROM post
WHERE
  (created_on, id) < ('2019-10-03 02:00:00.0', 51)
ORDER BY created_on DESC, id DESC
FETCH FIRST 50 ROWS ONLY

Limit  (cost=48.82..48.83 rows=1 width=16) 
       (actual time=0.168..0.175 rows=50 loops=1)
  ->  Sort  (cost=48.82..48.83 rows=1 width=16) 
            (actual time=0.166..0.170 rows=50 loops=1)
        Sort Key: created_on DESC, id DESC
        Sort Method: quicksort  Memory: 27kB
        ->  Bitmap Heap Scan on post  
              (cost=4.76..48.81 rows=1 width=16) 
              (actual time=0.071..0.085 rows=50 loops=1)
              Recheck Cond: (created_on <= '2019-10-03 02:00:00'::timestamp without time zone)
              Filter: (
                (created_on < '2019-10-03 02:00:00'::timestamp without time zone) OR 
                (
                  (created_on = '2019-10-03 02:00:00'::timestamp without time zone) AND 
                  (id < '51'::bigint)
                )
              )
              Rows Removed by Filter: 2
              Heap Blocks: exact=1
              ->  Bitmap Index Scan on idx_post_created_on_id  
                  (cost=0.00..4.75 rows=63 width=0) 
                  (actual time=0.061..0.062 rows=52 loops=1)
                    Index Cond: (created_on <= '2019-10-03 02:00:00'::timestamp without time zone)
                    
Planning Time: 0.676 ms
Execution Time: 0.279 ms

Cool, right?

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

The Keyset Pagination allows you to use an index to locate the first record of any page that needs to be navigated, and, for this reason, the SQL query can scan fewer records than when using the default OFFSET pagination.

Transactions and Concurrency Control eBook

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.