This tutorial explores the different types of indexes in databases in two parts:
- Indexes in General
- Data Structures
1. Indexes in General
Let us take our track table and if we run a query on it like:
WHERE album_id = 5;
If there are no indexes added to the table on this column, the database would have to scan row by row through the entire table to find all of the matching rows. If there were many rows in this table and we only expect to have a few rows returned, this can be quite an inefficient search. Imagine if there were one million rows in the table and this album_id did not even exist. This means that the database would have to search through all one million rows to identify that there were no matching values at all. If however, the database has an index on the album_id column, the database would be much more efficient to find the matching data.
The concept of the indexes is quite similar to the alphabetic index at the end of the book. The individual would only need to scan through the index fairly quickly to find the topics they are interested in, find what page to turn to, and go there. This would be a much faster approach than having to read through the entire book just to find the content that you needed. It does depend on the author of the book to create the topic list. Similarly, it is for the database developer to determine what indexes could be useful.
Once the index is created in the database, nothing else needs to occur as the database will automatically update the index when the table data is modified. The database will also make use of the index if it will be more useful than searching row by row. These indexes can be useful not only in SELECT statements with joins on the indexed columns but also in UPDATE and DELETE commands that have filtering criteria.
It is important to note that we do not want to index every single column in the database as creating the indexes can take a long time. There is also a cost to maintaining the index. This does depend on the database but it could occur with every insert, update or delete SQL statement that affects the index. The index has to be synchronized with the table so if there are indexes that are not frequently used, they should ideally be removed. By default, primary keys have indexes created for them as well as columns with the UNIQUE constraint.
2. Data Structures behind Indexes
Most databases will have implement indexes using one of the following data structures:
- Hash index – this type of index is based on an order list of hash values. Typically we will have a hash algorithm that is used to create a hash value from a key column. This value then points to an entry in a hash table which then points to an actual location of the data row. Hash indexes can only handle simple equality operators and in most cases will be the first index to use when we use the = operator.
- B-tree index – this type of index has the data organized in an upside-down tree. The index tree is stored separately from the data itself. The B-tree index self-balance meaning that it will take about the same amount of time to access any row in the index regardless of the size of the index. For example, with one million rows of data, it will take less than 20 searches to find data using a B-tree index compared to one million searches if it were done sequentially. This is the most common type of index in databases and generally is the most useful when we have limited repeating values. B-trees can handle equality and range queries quite well. If the query uses a <, <=, =, >= or > operator, a B-tree index if available could be used.
- Bitmap index – this type of index uses a bit array of zeros and ones. These are useful to represent a value or condition that may be true or false. For example, this could be useful for a check if an individual was signed up for a newsletter. Typically, since the values in the columns that have the bitmap index are one of two values, the operator most commonly used would be the equality operator =.
- Generalized Search Tree (GiST) – This is a more complex type of index unique to PostgreSQL that is not used often but focuses on certain functionality like searching for a value closest to another value or finding pattern matching. As such, there are special types of operators that could be useful for this purpose like <<, &< &>, >>, <<|, &<|, |&>, |>> @<, @>, ~= and &&. You may notice that we have not explored any of these operators are they are not used that frequently.
- Generalized Inverted Index (GIN) – This is a unique type of index for PostgreSQL that focuses on indexing array values and testing if a value exists. Operators that you would see with this would be <@, @>, = and &&.
Indexes are used to speed up queries and avoid having to search through data one row at a time.