Home > Article > System Tutorial > PostgreSQL's journey of exploration
Postgres has several index types, and every new version seems to add some new index types. Each index type is useful, but which type to use depends on (1) the type of data, sometimes (2) the underlying data in the table and (3) the type of lookup performed. In the following content, we will introduce the index types you can use in Postgres, and when you should use which index type. Before we get started, here's a list of index types we'll walk you through:
B-Tree
Generalized Inverted Index (GIN)
Generalized Inverted Seach Tree (GiST)
Space partitioned GiST (SP-GiST)
Block Range Index (BRIN)
Hash
Now let’s start with indexing.
If you have a computer science degree, then B-Tree indexing may be the first index you learn. B-tree indexes create a tree that always maintains its balance. When it looks for something based on the index, it walks the tree to find the key and returns the data you're looking for. Using an index is significantly faster than a sequential scan because it only needs to read a few pages (when you are returning only a few records) as opposed to sequentially scanning thousands of records.
If you run a standard CREATE INDEX statement, it will create a B-tree index for you. B-tree indexes are valuable on most data types, such as text, numbers, and timestamps. If you're just starting to use indexes in your database, and don't use too many of Postgres' advanced features on your database, using a standard B-Tree index is probably your best option.
GIN index for multi-valued columnsGeneralized Inverted Index, generally called GIN, is mostly suitable for data types when a single column contains multiple values.
According to Postgres documentation:
"GIN is designed to handle situations where the entry being indexed is a compound value, and the query processed by the index needs to search for values that occur in the compound entry. For example, this entry might be a document, and the query can search for the value contained in the document. Specify characters."
The most common data types included in this range are:
hStore
Array
Range
JSONB
One of the most satisfying things about GIN indexes is their ability to understand data stored in composite values. However, because a GIN index requires specific knowledge of the data structure for each individual type being added, not all data types are supported by the GIN index.
Generalized Inverted Seach Tree (GiST) index is mostly suitable when your data overlaps with other rows of data in the same column. The best use of GiST indexes is if you declare a geometry data type and you want to know whether two polygons contain some points. In one case a particular point may be contained in a box, while at the same time, other points only exist in a polygon. Common data types indexed using GiST are:
Geometry type
Text type requiring full-text search
There are many fixed limits on the size of GiST indexes, otherwise, GiST indexes may become extremely large. At the cost of this, GiST indexes are lossy (inexact).
According to official documentation:
"GiST indexes are lossy, which means that the index may produce false matches, so it is necessary to check the real table rows to eliminate false matches. (PostgreSQL will automatically perform this action when necessary)"
This does not mean that you will get a false result, it just means that Postgres will do a small extra work to filter these false results before returning the data to you.
Special note: GIN and GiST indexes can often be used on the same data type. Usually one has good performance but takes up a lot of disk space, and vice versa. When it comes to GIN vs. GiST, there is no one-size-fits-all solution that will work in every situation, but the above rules should apply to most common situations.
SP-GiST index for larger dataThe spatially partitioned GiST (SP-GiST) index adopts the spatially partitioned tree from Purdue Research. SP-GiST indexes are often used when your data has a natural clustering factor and is not a balanced tree. Phone numbers are a great example (at least US phone numbers are). They have the following format:
3-digit area code
3-digit prefix number (related to old telephone exchanges)
4-digit line number
This means that there is a natural clustering factor at the first three digits of the first set, followed by the second set of three digits, and then the numbers are evenly distributed. However, in some area codes of phone numbers, there is a higher saturation state than in others. The result can be a very unbalanced tree. Because there is a natural aggregation factor at the front and the data is not equally distributed, data like phone numbers might be a good case for SP-GiST.
Block range indexes (BRIN) focus on some situations like SP-GiST, they are best used when the data has some natural ordering, and the data volume is often large. If you have a billion records in chronological order, BRIN may come in handy. If you are querying a large set of data that is naturally grouped, such as several zip codes, BRIN can help you ensure that similar zip codes are stored in close locations on disk.
When you have a very large database sorted by date or zip code, the BRIN index allows you to skip or exclude some unnecessary data very quickly. Additionally, BRIN indexes are relatively small compared to the overall data size, so when you have a large data set, BRIN indexes can perform better.
Hash index, finally not afraid of crashHash indexes have been present in Postgres for many years, however, until Postgres 10 was released, there was a huge caveat about their use, it was not WAL-logged. This means that if your server crashes and you can't failover to a standby or restore from an archive using something like wal-g, then you will lose that index until you rebuild it. With the release of Postgres 10, they are now WAL-logged, so you may consider using them again, but the real question is, should you?
Hash indexes sometimes provide faster lookups than B-Tree indexes, and are also fast to create. The biggest problem is that they are restricted to only "equality" comparison operations, so you can only use them for exact match lookups. This makes hash indexes much less flexible than commonly used B-Tree indexes, and you shouldn't think of them as a replacement, but as an index for special cases.
Which one should you use?We have just introduced a lot, and it is normal if you are a little scared. If you know this before, CREATE INDEX will always create an index for you using a B-Tree, and the good news is that Postgres performs very well or very well for most databases. :) If you're considering using more Postgres features, here's a cheat sheet when you use other Postgres index types:
B-Tree - suitable for most data types and queries
GIN - for JSONB/hstore/arrays
GiST - suitable for full-text search and geometric data types
SP-GiST - suitable for large data sets that have natural aggregation factors but are unevenly distributed
BRIN - suitable for really large data sets with sequential order
Hash - good for equality operations, but usually a B-Tree index is still all you need.
If you have any questions or feedback about this article, feel free to join our slack channel.
The above is the detailed content of PostgreSQL's journey of exploration. For more information, please follow other related articles on the PHP Chinese website!