Improving Website Pagination Speed with MySQL
Nearly every website implements some sort of pagination of data. Pagination is the process of splitting large sets of information into multiple pages (i.e. search results, emails, etc). In most cases for pagination to work, the total number of items for the dataset must be calculated to know total number of pages. This is commonly done by performing two SQL queries, one to receive the data for the current page and another to count the total number of rows to later calculate the total number of pages.
Below are some methods to improve the performance of pagination, specifically in MySQL. There are also ways to implement pagination without knowing the total number of items in the dataset. That will be discussed later on in the document. As for the SQL queries, I ran them all on the same machine with the sample MySQL database. SQL caching was disabled on all queries.
MySQL SQL_CALC_FOUND_ROWS
Using two of the same queries to fetch the dataset and then the total number of rows for the dataset can be slow. Instead, you can use the MySQL option SQL_CALC_FOUND_ROWS
that makes the database calculate the total number of rows that can be returned for that query. After the initial query, a second query of SELECT FOUND_ROWS()
is required to get the actual number of found rows from the previous query. If you omit the option SQL_CALC_FOUND_ROWS
, the FOUND_ROWS
function will instead return the number of rows returned by the last query.
SELECT SQL_CALC_FOUND_ROWS * FROM `employees`
WHERE `first_name` LIKE '%George%'
LIMIT 10;
SELECT FOUND_ROWS();
As for the performance, this is when things can get a little bit tricky. Using SQL_CALC_FOUND_ROWS
will not always be more efficient. Depending on your table’s indexes, the columns used inside of the WHERE
condition, and based on how the result is being ordered, using SQL_CALC_FOUND_ROWS
could actually be much slower. In cases where you are comparing a non-indexed column, SQL_CALC_FOUND_ROWS
should be twice as fast compared to running a second query with COUNT(*)
. In cases where you are using a query based on an indexed column, using a second COUNT(*)
query is generally faster. More information about this can be found on Percona’s blog. The below example is searching a non-indexed column.
For the control query, we are searching for employees with the name George, and ordering results by their last name. Neither of those two columns are indexed. The total time took 0.57 (0.38 + 0.19) seconds on a table with around 300,000 rows.
MariaDB [employees]> SELECT * FROM `employees` WHERE `first_name` = 'George' ORDER BY `last_name` ASC LIMIT 1;
+--------+------------+------------+-----------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date |
+--------+------------+------------+-----------+--------+------------+
| 242182 | 1955-02-21 | George | Akaboshi | M | 1990-06-28 |
+--------+------------+------------+-----------+--------+------------+
1 row in set (0.38 sec)
MariaDB [employees]> SELECT COUNT(*) FROM `employees` WHERE `first_name` = 'George';
+----------+
| COUNT(*) |
+----------+
| 231 |
+----------+
1 row in set (0.19 sec)
As for the query using SQL_CALC_FOUND_ROWS
, the initial query takes the same amount of time as the previous example, but we get a nice performance boost when we select the number of found rows. The total time only takes 0.38 seconds.
MariaDB [employees]> SELECT SQL_CALC_FOUND_ROWS * FROM `employees` WHERE `first_name` = 'George' ORDER BY `last_name` ASC LIMIT 1;
+--------+------------+------------+-----------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date |
+--------+------------+------------+-----------+--------+------------+
| 242182 | 1955-02-21 | George | Akaboshi | M | 1990-06-28 |
+--------+------------+------------+-----------+--------+------------+
1 row in set (0.38 sec)
MariaDB [employees]> SELECT FOUND_ROWS();
+--------------+
| FOUND_ROWS() |
+--------------+
| 231 |
+--------------+
1 row in set (0.00 sec)
Using Information Schema
I’m generally against using the INFORMATION_SCHEMA
database directly in application code because the output can be inaccurate and changes to the database can break your application. With that aside, you can use the information schema’s TABLES
table to get the total number of rows inside any given table. If you are not using any where
conditions or unions
, you will be able to get the total number of rows available for the table. A few places you generally see pagination without any where
statements being used are sitemaps, post archives, and admin panels where it lists all user accounts, employees, and other information. This method will not work with InnoDB tables.
To get the number of rows for a table from the information schema, run the below query. The query assumes your database name is “my_database” and the table is “my_table”. You can dynamically get the currently selected database by using SELECT DATABASE()
, but you should already have a variable with your database name.
SELECT `TABLE_ROWS` FROM `INFORMATION_SCHEMA`.`TABLES`
WHERE `TABLE_SCHEMA` = "my_database"
AND `TABLE_NAME` = "my_table";
As for the performance, it’s instant to select data from the information schema, compared to 0.15 seconds to count the rows in the employees table.
MariaDB [employees]> SELECT COUNT(*) FROM `employees`;
+----------+
| count(*) |
+----------+
| 300024 |
+----------+
1 row in set (0.15 sec)
MariaDB [employees]> SELECT `TABLE_ROWS` FROM `INFORMATION_SCHEMA`.`TABLES` WHERE `TABLE_SCHEMA`='employees' AND `TABLE_NAME`='employees';
+------------+
| TABLE_ROWS |
+------------+
| 300024 |
+------------+
1 row in set (0.00 sec)
Using a Memory Cache
In addition to the above methods, caching the results you receive from your database to memory will improve performance along with reduce database load. You will want to setup a cache server such as Memcached. Every time you need to request data from the database, you will first check the cache server. When the cache server doesn’t have the information you want, you will request the data from the database instead. With the response from the database, you will now save it to the cache server for next time.
Cache servers also provide the benefit of storing actual results from the database. Instead of having to query the database for a specific row or group of rows, this data can be stored in a memory cache for quicker access. Keep in mind that cache servers are not to be used for permanent storage. Items from the cache may be deleted before they are set to expire. Also note that cache servers store data as a key to value pair, which cause lookups to use a key such as hash(["user", ID])
instead of a SQL query.
Changing Pagination Implementation
Not all pagination implementations need to know the total number of pages available. For instance, an ajax infinite scrolling page doesn’t need to know if there are more results, it simply performs a SQL query for more data until there isn’t any more data left. There is also the pagination setup where you only show buttons to the next and previous result set. In this case, you only need to know if there is at least one more item that isn’t being viewed, which can be done by increasing your LIMIT
by one. If the response data set has less items than was specified in the LIMIT
, you know there are no more pages.