PostgreSQL's journey of exploration-LINUX-php.cn

Home

System Tutorial

LINUX

PostgreSQL's journey of exploration

王林

Jan 17, 2024 am 08:15 AM

linuxlinux tutorialRed Hatlinux systemlinux commandlinux certificationred hat linuxlinux video

PostgreSQLs journey of exploration

Postgres has several index types, and every new version seems to add some new index types. Each index type is useful, but which type to use depends on (1) the type of data, sometimes (2) the underlying data in the table and (3) the type of lookup performed. In the following content, we will introduce the index types you can use in Postgres, and when you should use which index type. Before we get started, here's a list of index types we'll walk you through:

B-Tree
Generalized Inverted Index (GIN)
Generalized Inverted Seach Tree (GiST)
Space partitioned GiST (SP-GiST)
Block Range Index (BRIN)
Hash
Now let’s start with indexing.

In Postgres, B-Tree indexes are the most common indexes you use

If you have a computer science degree, then B-Tree indexing may be the first index you learn. B-tree indexes create a tree that always maintains its balance. When it looks for something based on the index, it walks the tree to find the key and returns the data you're looking for. Using an index is significantly faster than a sequential scan because it only needs to read a few pages (when you are returning only a few records) as opposed to sequentially scanning thousands of records.

If you run a standard CREATE INDEX statement, it will create a B-tree index for you. B-tree indexes are valuable on most data types, such as text, numbers, and timestamps. If you're just starting to use indexes in your database, and don't use too many of Postgres' advanced features on your database, using a standard B-Tree index is probably your best option.

GIN index for multi-valued columns

Generalized Inverted Index, generally called GIN, is mostly suitable for data types when a single column contains multiple values.

According to Postgres documentation:

"GIN is designed to handle situations where the entry being indexed is a compound value, and the query processed by the index needs to search for values that occur in the compound entry. For example, this entry might be a document, and the query can search for the value contained in the document. Specify characters."

The most common data types included in this range are:

hStore
Array
Range
JSONB
One of the most satisfying things about GIN indexes is their ability to understand data stored in composite values. However, because a GIN index requires specific knowledge of the data structure for each individual type being added, not all data types are supported by the GIN index.

GiST index, for rows with overlapping values

Generalized Inverted Seach Tree (GiST) index is mostly suitable when your data overlaps with other rows of data in the same column. The best use of GiST indexes is if you declare a geometry data type and you want to know whether two polygons contain some points. In one case a particular point may be contained in a box, while at the same time, other points only exist in a polygon. Common data types indexed using GiST are:

Geometry type
Text type requiring full-text search
There are many fixed limits on the size of GiST indexes, otherwise, GiST indexes may become extremely large. At the cost of this, GiST indexes are lossy (inexact).

According to official documentation:

"GiST indexes are lossy, which means that the index may produce false matches, so it is necessary to check the real table rows to eliminate false matches. (PostgreSQL will automatically perform this action when necessary)"

This does not mean that you will get a false result, it just means that Postgres will do a small extra work to filter these false results before returning the data to you.

Special note: GIN and GiST indexes can often be used on the same data type. Usually one has good performance but takes up a lot of disk space, and vice versa. When it comes to GIN vs. GiST, there is no one-size-fits-all solution that will work in every situation, but the above rules should apply to most common situations.

SP-GiST index for larger data

The spatially partitioned GiST (SP-GiST) index adopts the spatially partitioned tree from Purdue Research. SP-GiST indexes are often used when your data has a natural clustering factor and is not a balanced tree. Phone numbers are a great example (at least US phone numbers are). They have the following format:

3-digit area code
3-digit prefix number (related to old telephone exchanges)
4-digit line number
This means that there is a natural clustering factor at the first three digits of the first set, followed by the second set of three digits, and then the numbers are evenly distributed. However, in some area codes of phone numbers, there is a higher saturation state than in others. The result can be a very unbalanced tree. Because there is a natural aggregation factor at the front and the data is not equally distributed, data like phone numbers might be a good case for SP-GiST.

BRIN index, for larger data

Block range indexes (BRIN) focus on some situations like SP-GiST, they are best used when the data has some natural ordering, and the data volume is often large. If you have a billion records in chronological order, BRIN may come in handy. If you are querying a large set of data that is naturally grouped, such as several zip codes, BRIN can help you ensure that similar zip codes are stored in close locations on disk.

When you have a very large database sorted by date or zip code, the BRIN index allows you to skip or exclude some unnecessary data very quickly. Additionally, BRIN indexes are relatively small compared to the overall data size, so when you have a large data set, BRIN indexes can perform better.

Hash index, finally not afraid of crash

Hash indexes have been present in Postgres for many years, however, until Postgres 10 was released, there was a huge caveat about their use, it was not WAL-logged. This means that if your server crashes and you can't failover to a standby or restore from an archive using something like wal-g, then you will lose that index until you rebuild it. With the release of Postgres 10, they are now WAL-logged, so you may consider using them again, but the real question is, should you?

Hash indexes sometimes provide faster lookups than B-Tree indexes, and are also fast to create. The biggest problem is that they are restricted to only "equality" comparison operations, so you can only use them for exact match lookups. This makes hash indexes much less flexible than commonly used B-Tree indexes, and you shouldn't think of them as a replacement, but as an index for special cases.

Which one should you use?

We have just introduced a lot, and it is normal if you are a little scared. If you know this before, CREATE INDEX will always create an index for you using a B-Tree, and the good news is that Postgres performs very well or very well for most databases. :) If you're considering using more Postgres features, here's a cheat sheet when you use other Postgres index types:

B-Tree - suitable for most data types and queries
GIN - for JSONB/hstore/arrays
GiST - suitable for full-text search and geometric data types
SP-GiST - suitable for large data sets that have natural aggregation factors but are unevenly distributed
BRIN - suitable for really large data sets with sequential order
Hash - good for equality operations, but usually a B-Tree index is still all you need.
If you have any questions or feedback about this article, feel free to join our slack channel.

The above is the detailed content of PostgreSQL's journey of exploration. For more information, please follow other related articles on the PHP Chinese website!

Statement

This article is reproduced at:Linux就该这么学. If there is any infringement, please contact admin@php.cn delete

How does performance differ between Linux and Windows for various tasks?May 14, 2025 am 12:03 AM

Linux performs well in servers and development environments, while Windows performs better in desktop and gaming. 1) Linux's file system performs well when dealing with large numbers of small files. 2) Linux performs excellently in high concurrency and high throughput network scenarios. 3) Linux memory management has more advantages in server environments. 4) Linux is efficient when executing command line and script tasks, while Windows performs better on graphical interfaces and multimedia applications.

How to Create GUI Applications In Linux Using PyGObjectMay 13, 2025 am 11:09 AM

Creating graphical user interface (GUI) applications is a fantastic way to bring your ideas to life and make your programs more user-friendly. PyGObject is a Python library that allows developers to create GUI applications on Linux desktops using the

How to Install LAMP Stack with PhpMyAdmin in Arch LinuxMay 13, 2025 am 11:01 AM

Arch Linux provides a flexible cutting-edge system environment and is a powerfully suited solution for developing web applications on small non-critical systems because is a completely open source and provides the latest up-to-date releases on kernel

How to Install LEMP (Nginx, PHP, MariaDB) on Arch LinuxMay 13, 2025 am 10:43 AM

Due to its Rolling Release model which embraces cutting-edge software Arch Linux was not designed and developed to run as a server to provide reliable network services because it requires extra time for maintenance, constant upgrades, and sensible fi

12 Must-Have Linux Console [Terminal] File ManagersMay 13, 2025 am 10:14 AM

Linux console file managers can be very helpful in day-to-day tasks, when managing files on a local machine, or when connected to a remote one. The visual console representation of the directory helps us quickly perform file/folder operations and sav

qBittorrent: A Powerful Open-Source BitTorrent ClientMay 13, 2025 am 10:12 AM

qBittorrent is a popular open-source BitTorrent client that allows users to download and share files over the internet. The latest version, qBittorrent 5.0, was released recently and comes packed with new features and improvements. This article will

Setup Nginx Virtual Hosts, phpMyAdmin, and SSL on Arch LinuxMay 13, 2025 am 10:03 AM

The previous Arch Linux LEMP article just covered basic stuff, from installing network services (Nginx, PHP, MySQL, and PhpMyAdmin) and configuring minimal security required for MySQL server and PhpMyadmin. This topic is strictly related to the forme

Zenity: Building GTK Dialogs in Shell ScriptsMay 13, 2025 am 09:38 AM

Zenity is a tool that allows you to create graphical dialog boxes in Linux using the command line. It uses GTK , a toolkit for creating graphical user interfaces (GUIs), making it easy to add visual elements to your scripts. Zenity can be extremely u

See all articles