- Published on
What is an index
Indexes are a way to quickly find data in your postgres database table. I'll assume anyone reading this guide has at least a basic idea of how they work.
We can use the postgres query planning (prefix EXPLAIN
to a SELECT or UPDATE query), which outputs an explanation of the costs involved with a query.
Run it with EXPLAIN (ANALYZE, BUFFERS)
for more info - it actually runs the query (caution: be aware of this if running an UPDATE query).
Postgres will keep the indexes up to date. Every time you add a new row, update a row or delete a row then the index will be kept in sync. This means it can slow down updates/inserts/deletes.
Indexes also take up additional space. If you have an index on multiple columns this additional space can increase even more.
How Postgres decides on what rows to return
If you do a query with a WHERE
condition, the Postgres planner will figure out how to filter those rows. You can see what kind it decided on by running a EXPLAIN
query.
There are a few main types:
Sequential Scan
- this does not involve an index
- it will look at every single row until row(s) are found
- If you have no
LIMIT
then this will look at every single row
Index scan
This uses an index, and for each match it will read row data from the table, and also checks MVCC visibility from the table
Bitmap index or heap scans
This will generate a bitmap of the matches (using index(es)). If multiple indexes are used, then the cross section of matches will be used. Once it knows what rows to return, it will then read the row data from the table.
Index only scan
This uses an index, but for each match it can return all the data directly from the index (no need to read data from the table itself). Sometimes uses the table to check for MVCC visibility
Order matters
B-tree indexes are sorted.
If you have sorted query (ORDER BY ...
), try and match it to an index.
Multi column indexes
Some apps can over use multi-column indexes. There are performance (and disk size) considerations to take into account. You can often get away with using just a single index to reduce the number of matches, then use a scan on those matches for any other where clause.
If you are using a multi column index in a b-tree index, then try and aim for the first condition to be an equality clause (such as some_column = ?
, as opposed to some_column > ?
), and all other conditions after that. The other conditions should be sorted so the most selective conditions are first (to filter out as many matches as possible, as quick as possible).
- Gist multi column indexes sort by num of distinct elements.
- Gin multi column indexes are expensive (and some like to avoid using them) but all indexed columns have equal weight
- Brin - again like Gin they all have an equal weight and the order does not matter
- Hash does not support multi column indexes