
How to design primary key in MySQL

Guanhui (reposted), 2020-05-23 11:21:36

This article examines primary keys in MySQL. It starts from why primary keys are needed, builds up the related knowledge from there, and then extends to primary key generation schemes. After reading it, you won't be stuck knowing only CRUD when someone asks you about MySQL.

1. Why a primary key is needed

Data records must be unique (first normal form)

Data needs to be associated through joins

The database's underlying index, which is used to retrieve data, is built on the primary key

What follows is mostly philosophical rambling; feel free to skip straight to the next section.

“Information is what eliminates random uncertainty” (Shannon). People understand and transform the world by obtaining and identifying different information from nature and society, which lets them tell one thing from another. Data is a record of the attributes of objective things and a concrete manifestation of information: processed data becomes information, and information must be digitized into data before it can be stored and transmitted. A database stores data records. Since a record is (relatively) certain information, and that certainty shows up as uniqueness, we arrive at the first reason:

1. Data records need to be unique

The world is made up of objective things and the relationships between them, and data captures both in digital, modeled form. Beyond what a single record describes, much of the value of data lies in how records connect to each other. For those connections to be accurate, each record needs an identifier that other records can reference. In data storage, this is the second role of the primary key, and the second reason it exists:

2. Data needs to be associated

Data describes objective reality, but it is meaningless by itself. It only gains meaning once it is organized according to subjective needs and presented in a way that helps people understand things. So data needs to be retrieved and organized, which leads to the third role of the primary key:

3. The primary key backs the database's underlying index, which is used to retrieve data

2. Why the primary key should not be too long

The question here is length. So what does a shorter key gain over a longer one? The obvious answer is that it takes up less space. But that bit of disk space is insignificant compared with the total data volume, and the primary key column is rarely used much on its own. So the real reason must be speed, and it has little to do with the raw data. It naturally follows that the answer lies in the index, specifically in how the index is read. So why does a long primary key hurt index performance?

The figure above shows InnoDB's index structures. On the left is the clustered index, which locates data records by primary key. On the right is a secondary index, which indexes a column's values and maps each value to the primary key of its row. A query through a secondary index therefore takes two steps, as shown in the figure: first the primary key is found in the secondary index tree, then the row is looked up by that primary key in the clustered index. The leaf nodes of a secondary index store the primary key values themselves, not pointers to the rows. So the longer the primary key, the fewer entries fit in each secondary index page. With a limited buffer pool, that means more disk reads, and performance drops.
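To put rough numbers on this, the sketch below estimates how many leaf entries of a secondary index on a 4-byte INT column fit in one 16 KB page (InnoDB's default `innodb_page_size`) for different primary key types. The 6-byte per-record overhead and the 15/16 usable fraction are simplifying assumptions, and the CHAR(36) case assumes a single-byte character set. Real InnoDB pages also carry page headers, directory slots and variable-length field metadata, so treat the results as orders of magnitude, not exact counts.

```python
PAGE_SIZE = 16 * 1024     # InnoDB's default innodb_page_size
FILL_FACTOR = 15 / 16     # assumed usable fraction of a page
RECORD_OVERHEAD = 6       # assumed per-record header bytes (rough simplification)


def entries_per_page(index_col_bytes, pk_bytes,
                     page_size=PAGE_SIZE, fill=FILL_FACTOR,
                     overhead=RECORD_OVERHEAD):
    """Estimate how many secondary-index leaf entries fit in one page.

    Each leaf entry stores the indexed column value plus the full primary
    key value (not a pointer to the row), plus a fixed overhead.
    """
    usable = int(page_size * fill)
    return usable // (index_col_bytes + pk_bytes + overhead)


# Secondary index on a 4-byte INT column, with different primary key types.
for name, pk_bytes in [("INT", 4), ("BIGINT", 8), ("CHAR(36) UUID", 36)]:
    print(f"{name:>14}: ~{entries_per_page(4, pk_bytes)} entries per page")
```

Under these assumptions, an INT primary key allows about 1,097 entries per page, BIGINT about 853, and a 36-character UUID string only about 333, so the same secondary index spans more than three times as many pages with the UUID key.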

3. Why is it recommended to use auto-increment ID

InnoDB uses a clustered index: as shown in the figure above, the data records themselves are stored in the leaf nodes of the primary index (a B+ tree). Within each leaf node (the size of one memory or disk page), records must be kept in primary key order, so whenever a new record is inserted, MySQL places it in the appropriate node and position according to its primary key. When a page reaches its load factor (15/16 by default in InnoDB), a new page (node) is opened.

If the table uses an auto-increment primary key, each new record is appended after the last record in the current index node, and when a page fills up, a new page is opened automatically. The result is a compact index that is filled approximately sequentially. Since existing data never has to be moved on insert, this is very efficient and adds little index-maintenance overhead, as shown on the left side of the figure below. Otherwise, if the inserted primary key values are effectively random, each new record has to go somewhere in the middle of an existing index page, and MySQL must move data around to put the new record in the right position, as shown on the right side of the figure below. This causes real overhead: MySQL may need to flush the buffer frequently to maintain the index, disk I/O increases, and the index structure often needs to be reorganized.
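The difference can be reproduced with a toy model. The sketch below keeps a leaf level as a list of sorted, fixed-capacity pages and counts mid-page splits and shifted records for sequential versus shuffled keys. It is a deliberate simplification (no inner B+ tree nodes, no fill factor, and an arbitrary capacity of 16 records per page), not InnoDB's real page-split logic; `simulate_inserts` is a hypothetical helper.

```python
import bisect
import random


def simulate_inserts(keys, capacity=16):
    """Insert distinct keys into a toy leaf level and report the cost.

    Pages are sorted lists of at most `capacity` keys. This is a simplified
    model (no inner B+ tree nodes, no fill factor), not InnoDB's algorithm.
    """
    pages = [[]]   # leaf pages in key order
    bounds = []    # bounds[j] == pages[j + 1][0]; routes a key to its page
    splits = 0     # pages split in the middle
    moved = 0      # records shifted or relocated to make room
    for k in keys:
        idx = bisect.bisect_right(bounds, k)    # page whose range covers k
        page = pages[idx]
        pos = bisect.bisect_left(page, k)
        moved += len(page) - pos                # records after pos shift right
        page.insert(pos, k)
        if len(page) <= capacity:
            continue
        if idx == len(pages) - 1 and pos == len(page) - 1:
            # New maximum key: start a fresh page, nothing has to move.
            page.pop()
            pages.append([k])
            bounds.append(k)
        else:
            # A full page was hit in the middle: split it in half.
            mid = len(page) // 2
            pages[idx:idx + 1] = [page[:mid], page[mid:]]
            bounds.insert(idx, page[mid])
            splits += 1
            moved += len(page) - mid            # upper half moves to the new page
    return {"pages": len(pages), "splits": splits, "moved": moved}


sequential = simulate_inserts(range(1600))
shuffled_keys = list(range(1600))
random.Random(42).shuffle(shuffled_keys)   # fixed seed for reproducibility
shuffled = simulate_inserts(shuffled_keys)
print("sequential:", sequential)
print("shuffled:  ", shuffled)
```

With sequential keys every page ends up exactly full and the result is `{'pages': 100, 'splits': 0, 'moved': 0}`. The shuffled run reports nonzero splits and moved records, and needs more pages for the same 1,600 keys because split pages are left partly empty.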

4. Business Key VS Logical Key

A business key is an identifier with business meaning used as the key, such as using the order serial number as the primary key of the order table. A logical key has nothing to do with the business and is generated according to some rule, such as an auto-increment key.

Advantages of a business key

The key has business meaning and can be used directly as a search keyword in queries

No additional column or index space is needed

Some join operations can be avoided

Disadvantages of a business key

When the business changes, the primary key sometimes has to change as well

Operations become more cumbersome when the key spans multiple columns

Business keys are often longer and take up more space, which means more disk I/O

Data cannot be persisted before the key is determined: sometimes we want to insert a record first and fill in its business key later

It is harder to design a key-generation scheme that is both easy to use and performant

Advantages of a logical key

The key does not need to change when the business changes

Simple to operate and easy to manage

Logical keys are usually smaller and perform better

Uniqueness is easier to guarantee

Easier to optimize

Disadvantages of a logical key

The extra primary key column and its index require additional disk space

Additional I/O is required when inserting and updating data

More joins may be needed

Without a uniqueness constraint on the business columns, duplicate records are easy to introduce

Keys differ between the test and production environments, which makes troubleshooting harder

The key's value is unrelated to the data itself, which does not conform to the three normal forms

Cannot be used as a search keyword

Depends on each database system's specific implementation, which makes it harder to switch the underlying database
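One way to mitigate the duplicate-record risk of a logical key while keeping the business key searchable is to make the logical key the primary key and put a UNIQUE constraint on the business key. The sketch below shows this with Python's built-in `sqlite3` module, used only as a self-contained stand-in for MySQL (`INTEGER PRIMARY KEY AUTOINCREMENT` plays the role of MySQL's `AUTO_INCREMENT`). The `orders` table, its columns, the helper functions and the order numbers are all hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL so the example runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # logical key (AUTO_INCREMENT in MySQL)
    " order_no TEXT NOT NULL UNIQUE,"         # business key, still unique and searchable
    " amount_cents INTEGER NOT NULL)"
)


def add_order(order_no, amount_cents):
    """Insert an order and return its logical id, or None if order_no exists."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO orders (order_no, amount_cents) VALUES (?, ?)",
                (order_no, amount_cents),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


def find_order_id(order_no):
    """Look up the logical id by business key (served by the UNIQUE index)."""
    row = conn.execute(
        "SELECT id FROM orders WHERE order_no = ?", (order_no,)
    ).fetchone()
    return row[0] if row else None
```

Inserting `"SO-0001"` twice returns an id the first time and `None` the second time, so the duplicate is rejected by the database rather than silently stored under a new logical id.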

5. Primary key generation

Under normal circumstances, we use MySQL's auto-increment ID as a table's primary key: it is simple and, as discussed above, performs best. However, once databases and tables are sharded, auto-increment IDs no longer meet the need, because each shard would generate its own overlapping sequence. Let's look at how different databases generate IDs and at some distributed ID generation schemes; this helps when thinking about, or even building, our own distributed ID generation service.

Implementations in databases

Mysql auto-increment

MySQL maintains an auto-increment counter in memory. Each time the auto-increment counter is accessed, InnoDB takes a lock named AUTO-INC and holds it until the end of the statement (note that the lock is held only until the end of the statement, not the end of the transaction). The AUTO-INC lock is a special table-level lock taken by statements that insert into a table with an AUTO_INCREMENT column; how strictly it is applied depends on the innodb_autoinc_lock_mode setting.
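The statement-level locking described above (a shared counter whose lock is held while a statement takes all of its values, so each statement gets its own block) can be modeled in a few lines. This is a toy illustration with hypothetical names (`AutoIncCounter`, `allocate`), not InnoDB's code, and it ignores InnoDB's configurable lock modes and counter persistence.

```python
import threading


class AutoIncCounter:
    """Toy model of a table's auto-increment counter (not InnoDB code).

    The lock plays the role of the AUTO-INC lock: a multi-row "statement"
    holds it while it reserves all of its values, so each statement
    receives a consecutive block of ids.
    """

    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()

    def allocate(self, n_rows):
        """Reserve n_rows consecutive ids for one INSERT statement."""
        with self._lock:
            first = self._next
            self._next += n_rows
        return list(range(first, first + n_rows))
```

For example, on a fresh counter `allocate(3)` returns `[1, 2, 3]` and the next `allocate(2)` returns `[4, 5]`. Because a statement reserves its values in one locked step, concurrent statements never interleave ids inside each other's blocks.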

In a distributed setting, you can run a separate service and database dedicated to ID generation, still relying on MySQL's auto-increment capability to generate IDs centrally for other services. For performance, different businesses can use different tables.
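A minimal sketch of such a central ID service follows. It uses one single-row table per business and the `REPLACE INTO ... (stub)` trick often used with MySQL ticket tables: each call replaces the single row, so the table never grows, while the auto-increment value keeps rising. SQLite stands in for the dedicated MySQL instance; in SQLite, the `AUTOINCREMENT` keyword is what prevents the replaced row's id from being reused. `TicketService`, the `ticket_<business>` table names and the `stub` column are hypothetical.

```python
import sqlite3


class TicketService:
    """Central ID service sketch: one single-row ticket table per business."""

    def __init__(self, businesses):
        self.conn = sqlite3.connect(":memory:")
        self.tables = {}
        for biz in businesses:
            if not biz.isidentifier():
                raise ValueError(f"invalid business name: {biz!r}")
            table = f"ticket_{biz}"
            # AUTOINCREMENT stops SQLite from reusing the id of the replaced row.
            self.conn.execute(
                f"CREATE TABLE {table} ("
                " id INTEGER PRIMARY KEY AUTOINCREMENT,"
                " stub TEXT NOT NULL UNIQUE)"
            )
            self.tables[biz] = table
        self.conn.commit()

    def next_id(self, biz):
        """Return the next id for a business; the table keeps exactly one row."""
        table = self.tables[biz]  # KeyError for unknown businesses
        cur = self.conn.execute(f"REPLACE INTO {table} (stub) VALUES ('a')")
        self.conn.commit()
        return cur.lastrowid
```

Each business gets its own sequence: three calls for `"order"` return 1, 2 and 3, while the first call for `"user"` still returns 1. Every `next_id` call is a database round trip, so reserving ids in batches is the usual next optimization.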

Mongodb ObjectId

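To prevent primary key conflicts, MongoDB uses an ObjectId as the primary key id. An ObjectId is 12 bytes, usually written as a 24-character hexadecimal string, and its classic layout contains the following parts (newer MongoDB versions replace the machine and process fields with a 5-byte random value):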

Time: Timestamp. 4 bytes. Seconds.

Machine: machine identifier. 3 bytes. Usually a hash of the machine's host name, so different hosts produce different values while the same host always produces the same value, which avoids conflicts across machines.

PID: process ID. 2 bytes. Machine keeps ObjectIds generated on different machines from conflicting; PID keeps ObjectIds generated by different processes on the same machine from conflicting.

INC: auto-incrementing counter. 3 bytes. The first nine bytes ensure that ObjectIds generated in different seconds, on different machines, or by different processes do not conflict; the counter ensures that ObjectIds generated by the same process within the same second do not conflict. It allows 256^3 = 16,777,216 unique records per process per second.
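The classic layout can be assembled byte by byte with Python's `struct` module. In this sketch, taking the first 3 bytes of the MD5 hash of the host name as the machine field is an assumption (real drivers differ, and current drivers use the random-value layout instead); `LegacyObjectIdGenerator` and `parse_object_id` are hypothetical names.

```python
import hashlib
import os
import socket
import struct
import threading
import time


class LegacyObjectIdGenerator:
    """Builds 12-byte ids in the classic ObjectId layout (a sketch, not a driver).

    time (4 bytes) | machine (3 bytes) | pid (2 bytes) | counter (3 bytes)
    """

    def __init__(self, hostname=None, pid=None, counter_start=0):
        host = hostname if hostname is not None else socket.gethostname()
        # Assumption: first 3 bytes of the MD5 of the host name as the machine field.
        self.machine = hashlib.md5(host.encode("utf-8")).digest()[:3]
        self.pid = (pid if pid is not None else os.getpid()) & 0xFFFF  # keep 2 bytes
        self.counter = counter_start & 0xFFFFFF
        self._lock = threading.Lock()

    def next_id(self, timestamp=None):
        """Return the next id as a 24-character hex string."""
        seconds = int(time.time()) if timestamp is None else timestamp
        with self._lock:
            inc = self.counter
            self.counter = (self.counter + 1) & 0xFFFFFF  # wraps after 256**3 values
        raw = (
            struct.pack(">I", seconds)      # big-endian, so ids sort by time first
            + self.machine
            + struct.pack(">H", self.pid)
            + inc.to_bytes(3, "big")
        )
        return raw.hex()


def parse_object_id(oid_hex):
    """Split a classic-layout id back into its four fields."""
    raw = bytes.fromhex(oid_hex)
    return {
        "time": int.from_bytes(raw[0:4], "big"),
        "machine": raw[4:7].hex(),
        "pid": int.from_bytes(raw[7:9], "big"),
        "inc": int.from_bytes(raw[9:12], "big"),
    }
```

Starting the counter at its maximum value shows the 3-byte wrap-around: two ids generated in the same second get counters 16,777,215 and 0, and still differ from each other.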

Cassandra TimeUUID

Cassandra's TimeUUID is a version 1 UUID, generated from three parts: a timestamp, the node's MAC address, and a clock sequence.
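Python's standard `uuid.uuid1()` builds the same kind of time-based (version 1) UUID, so it can be used to inspect the three parts. The node and clock sequence below are made-up values passed explicitly for a reproducible demo; by default `uuid1()` uses the host's hardware address and a random clock sequence. `describe_time_uuid` is a hypothetical helper, and the constant is the number of 100-nanosecond intervals between the UUID epoch (1582-10-15) and the Unix epoch.

```python
import time
import uuid

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_TO_UNIX_100NS = 0x01B21DD213814000


def describe_time_uuid(u):
    """Split a version 1 (time-based) UUID into its time, MAC and sequence parts."""
    if u.version != 1:
        raise ValueError("not a time-based UUID")
    return {
        "unix_seconds": (u.time - UUID_TO_UNIX_100NS) / 10_000_000,
        "mac": f"{u.node:012x}",
        "clock_seq": u.clock_seq,
    }


# Hypothetical node (MAC) and clock sequence values.
u = uuid.uuid1(node=0x0242AC110002, clock_seq=0x1234)
print(u, describe_time_uuid(u))
```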

Distributed ID schemes

ZooKeeper auto-increment: implemented with ZooKeeper's auto-increment mechanism (for example, sequential znodes).

Redis auto-increment: implemented with Redis's atomic increment (INCR).

UUID: use a UUID string as the key.

Snowflake algorithm: similar in spirit to MongoDB's ObjectId, a 64-bit ID made of 1 sign bit, a 41-bit millisecond timestamp, 10 bits of data-center/machine ID, and a 12-bit sequence number within each millisecond.
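A minimal snowflake-style generator following the 1 + 41 + 10 + 12 bit layout might look like this. The custom epoch (2020-01-01 UTC), raising an error when the clock moves backwards, and busy-waiting when a millisecond's 4,096 sequence numbers run out are choices made for this sketch; production implementations handle clock skew and machine-ID assignment more carefully. `Snowflake` and `decompose` are hypothetical names, not a library API.

```python
import threading
import time


class Snowflake:
    """Snowflake-style 64-bit id generator (a sketch, not Twitter's code).

    Layout: 1 unused sign bit | 41-bit ms timestamp | 10-bit machine id | 12-bit sequence
    """

    EPOCH_MS = 1577836800000                 # custom epoch: 2020-01-01T00:00:00Z
    MACHINE_BITS = 10
    SEQUENCE_BITS = 12
    MAX_MACHINE = (1 << MACHINE_BITS) - 1    # 1023
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095

    def __init__(self, machine_id, clock=None):
        if not 0 <= machine_id <= self.MAX_MACHINE:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        # clock() must return the current time in milliseconds.
        self.clock = clock if clock is not None else (lambda: time.time_ns() // 1_000_000)
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self.clock()
            if now < self.last_ms:
                # Reusing an old timestamp could produce duplicates, so refuse.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & self.MAX_SEQUENCE
                if self.sequence == 0:
                    # All 4096 sequence numbers used in this ms: wait for the next ms.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - self.EPOCH_MS) << (self.MACHINE_BITS + self.SEQUENCE_BITS))
                | (self.machine_id << self.SEQUENCE_BITS)
                | self.sequence
            )


def decompose(snowflake_id):
    """Recover the timestamp, machine id and sequence from an id."""
    return {
        "ms": (snowflake_id >> 22) + Snowflake.EPOCH_MS,
        "machine": (snowflake_id >> 12) & Snowflake.MAX_MACHINE,
        "sequence": snowflake_id & Snowflake.MAX_SEQUENCE,
    }
```

With a frozen clock, `Snowflake(5, clock=lambda: 1577836800123)` yields consecutive ids whose fields decompose to `ms=1577836800123`, `machine=5` and sequence 0, 1, and so on. Because the timestamp occupies the high bits, ids from one generator increase over time, which keeps clustered-index inserts close to sequential.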

Open-source implementations

Baidu UidGenerator: based on the snowflake algorithm.

Meituan Leaf: provides two modes, one based on an optimized use of MySQL auto-increment (Leaf-segment) and one based on the snowflake algorithm (Leaf-snowflake).
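The core idea of a segment-style scheme, which is how Leaf's MySQL-based mode optimizes auto-increment generation, is that the service reserves a whole range of ids with one database write and then hands them out from memory, so the database is touched once per `step` ids instead of once per id. The sketch below illustrates only that idea; it is not Leaf's code and omits optimizations a production service would need, such as fetching the next range before the current one runs out. SQLite stands in for MySQL, and the `id_alloc` table, its columns, and the function and class names are hypothetical.

```python
import sqlite3
import threading


def create_demo_table(conn, biz_tag, step):
    """Create the (hypothetical) allocation table with one row per business."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS id_alloc ("
        " biz_tag TEXT PRIMARY KEY, max_id INTEGER NOT NULL, step INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO id_alloc VALUES (?, 0, ?)", (biz_tag, step))
    conn.commit()


class SegmentAllocator:
    """Hands out ids from a range reserved in the database (segment idea only)."""

    def __init__(self, conn, biz_tag):
        self.conn = conn
        self.biz_tag = biz_tag
        self.next = 1
        self.end = 0          # empty range: the first call fetches a segment
        self.db_fetches = 0
        self._lock = threading.Lock()

    def _fetch_segment(self):
        # One transaction: bump max_id by step, then read back the new range end.
        with self.conn:
            self.conn.execute(
                "UPDATE id_alloc SET max_id = max_id + step WHERE biz_tag = ?",
                (self.biz_tag,),
            )
            max_id, step = self.conn.execute(
                "SELECT max_id, step FROM id_alloc WHERE biz_tag = ?",
                (self.biz_tag,),
            ).fetchone()
        self.next, self.end = max_id - step + 1, max_id
        self.db_fetches += 1

    def next_id(self):
        with self._lock:
            if self.next > self.end:
                self._fetch_segment()
            value = self.next
            self.next += 1
            return value
```

Two allocators sharing the table, standing in for two service nodes, get disjoint ranges: after the first one has consumed 2,500 ids from three segments of 1,000 (three database writes), a second one starts at 3,001. The ids are unique but not strictly increasing across nodes, and any unused ids in a segment are lost if a node restarts.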


Statement: This article is reprinted from juejin.im. In case of infringement, please contact admin@php.cn to have it removed.
