Database primary key ID generation strategy

藏色散人

Aug 15, 2019 pm 02:21 PM


Foreword:

Generating unique IDs is a problem we often encounter when designing a system. Here are some common ID generation strategies.

● Sequence ID

● UUID

● GUID

● COMB

● Snowflake

First, auto-increment IDs. To support splitting data across databases, each one can auto-increment from a different starting point. However, this becomes extremely troublesome when the database needs to expand. For example, suppose the first design of a system has 10 tables, and the rows across those tables need distinct IDs. Each table can use a different starting point with a shared step of 10: the first table gets 1, 11, 21, 31, ...; the second 2, 12, 22, 32, ...; the third 3, 13, 23, 33, ...; and the tenth 10, 20, 30, .... The problem comes when those 10 tables are no longer enough and another one has to be added: how should its primary keys be allocated? In addition, if data from multiple databases ever has to be merged, IDs generated this simply will almost certainly collide. The scalability of this approach is clearly poor.

Compared with auto-increment IDs, UUIDs make it easier to generate unique primary keys (although with a very large amount of data a duplicate is theoretically possible). However, because UUIDs are unordered and usually stored as strings, they perform worse than auto-increment IDs: they take more storage space and queries are less efficient. Key point: the main disadvantage of UUIDs is low query efficiency!

Compared with UUID, COMB makes the generated IDs more ordered, which improves insert and query efficiency. It is analyzed briefly in Part 4 below.

Snowflake is Twitter's primary key generation strategy. It can be seen as an improvement on COMB that uses a 64-bit long integer instead of a 128-bit value. The ID is composed of a leading bit fixed at 0, a 41-bit timestamp prefix, a 10-bit node identifier, and a 12-bit sequence number that avoids collisions under concurrency.

Part 1: Sequence ID

The most common approach is a database sequence or auto-increment column. The value is maintained by the database and is unique within that database.

Advantages:

Simple, convenient code, acceptable performance.

Numeric IDs are naturally ordered, which helps with pagination and with results that need to be sorted.

Disadvantages:

Different databases have different syntax and implementations, which has to be handled during database migration or when supporting multiple database products.

With a single database, read-write separation, or one master with multiple slaves, only the one master can generate IDs, so there is a risk of a single point of failure.

It is difficult to scale out when performance no longer meets requirements.

Merging multiple systems or migrating data is quite painful.

Splitting tables and databases (sharding) becomes troublesome.

Optimization plan:

To address the single point of failure on the master, use multiple masters. Give each master a different starting number and the same step size, which can be the number of masters.

For example: Master1 generates 1, 4, 7, 10; Master2 generates 2, 5, 8, 11; and Master3 generates 3, 6, 9, 12. This produces IDs that are unique across the cluster and greatly reduces the ID generation load on any one database.
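The offset-and-step scheme above can be sketched in a few lines of JavaScript. This is only an in-memory illustration; `createSteppedSequence` is a hypothetical helper, not a database API. In MySQL the same effect is normally configured per server with the `auto_increment_offset` and `auto_increment_increment` system variables.

```javascript
// Illustrative sketch: each "master" starts at its own offset and
// advances by a shared step equal to the number of masters.
function createSteppedSequence(offset, step) {
  if (!Number.isInteger(offset) || !Number.isInteger(step) || offset < 1 || offset > step) {
    throw new RangeError('offset must be an integer between 1 and step');
  }
  let next = offset;
  return function nextId() {
    const id = next;
    next += step; // skip the IDs that belong to the other masters
    return id;
  };
}

// Collect the first `count` IDs from a generator.
function take(generator, count) {
  return Array.from({ length: count }, () => generator());
}

const master1 = take(createSteppedSequence(1, 3), 4);
const master2 = take(createSteppedSequence(2, 3), 4);
const master3 = take(createSteppedSequence(3, 3), 4);

console.log(master1, master2, master3);
```

Note that adding a fourth master later means changing the step everywhere, which is the expansion problem described in the foreword.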

Part 2: UUID

npm package: https://www.npmjs.com/package/uuid

A common approach. It can be generated by the database or by the program, and is generally globally unique.

UUID (Universally Unique IDentifier) is a 128-bit globally unique identifier, usually written as a string of 32 hexadecimal digits (36 characters including hyphens). It is designed to be unique across both time and space. It is also called GUID; in Python the corresponding module is named uuid.

Depending on the version, uniqueness comes from a MAC address, a timestamp, a namespace, and random or pseudo-random numbers.

UUID mainly has five algorithms, that is, five methods to implement it.

(1), uuid1()

Based on timestamp. Generated from the MAC address, the current timestamp, and a random number. Global uniqueness can be guaranteed, but exposing the MAC address also raises security concerns. Within a local area network, an IP address can be used instead of the MAC.

(2), uuid2()

Based on the Distributed Computing Environment (DCE); this function does not exist in Python. The algorithm is the same as uuid1, except that the first 4 bytes of the timestamp are replaced with the POSIX UID. This method is rarely used in practice.

(3), uuid3()

Name-based MD5 hash. It is computed from the MD5 hash of the namespace and the name. Different names in the same namespace produce different UUIDs, as do the same names in different namespaces, but the same name in the same namespace always produces the same UUID.

(4), uuid4()

Based on random numbers. It is derived from pseudo-random numbers, so there is a small probability of collision, and that probability can be calculated.
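As a rough illustration of how that probability can be calculated, the sketch below applies the standard birthday approximation p ≈ 1 - exp(-n(n-1) / 2N). It assumes 122 random bits per UUID (128 bits minus the 4 version bits and 2 variant bits) and a uniform random source; `collisionProbability` is a name made up for this example.

```javascript
// Random bits in a version-4 UUID: 128 minus 4 version bits and 2 variant bits.
const RANDOM_BITS = 122;

// Birthday approximation for the chance that at least two of `n`
// randomly generated UUIDs are identical.
function collisionProbability(n) {
  const space = 2 ** RANDOM_BITS; // number of possible values, N
  // Math.expm1 keeps precision when the exponent is extremely small.
  return -Math.expm1(-(n * (n - 1)) / (2 * space));
}

for (const n of [1e6, 1e9, 1e12]) {
  console.log(`${n} UUIDs -> collision probability about ${collisionProbability(n)}`);
}
```

The result stays negligible for realistic row counts, which is why a duplicate only becomes a practical concern with a weak random source.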

(5), uuid5()

Name-based SHA-1 hash. The algorithm is the same as uuid3, except that SHA-1 (Secure Hash Algorithm 1) is used instead of MD5.
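To make the name-based variants more concrete, here is a minimal sketch of the version 5 construction using only Node's built-in crypto module. The function name `uuidv5` and its argument order are choices made for this example, and real projects would normally use the uuid package's own implementation. `NAMESPACE_DNS` is the standard DNS namespace UUID.

```javascript
const { createHash } = require('crypto');

// Standard namespace UUID for DNS names.
const NAMESPACE_DNS = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';

// Sketch of a version-5 UUID: SHA-1 over (namespace bytes + name),
// truncated to 16 bytes, with the version and variant bits overwritten.
function uuidv5(name, namespace) {
  const namespaceBytes = Buffer.from(namespace.replace(/-/g, ''), 'hex');
  const hash = createHash('sha1').update(namespaceBytes).update(name, 'utf8').digest();
  const bytes = Buffer.from(hash.subarray(0, 16));
  bytes[6] = (bytes[6] & 0x0f) | 0x50; // set version to 5
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // set the RFC 4122 variant
  const hex = bytes.toString('hex');
  return [
    hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20),
  ].join('-');
}

const first = uuidv5('example.com', NAMESPACE_DNS);
const second = uuidv5('example.com', NAMESPACE_DNS);
const other = uuidv5('example.org', NAMESPACE_DNS);

console.log(first, first === second, first === other);
```

The same name in the same namespace always yields the same UUID, which is useful for deriving stable keys from natural identifiers but means these versions are not suitable when every call must return a fresh ID.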

Advantages:

Simple and convenient code.

Globally unique, so data migration, merging data between systems, and database changes can all be handled with ease.

Disadvantages:

There is no ordering, and IDs are not guaranteed to trend upward.

UUIDs are often stored as strings, so query efficiency is relatively low.

The storage space is relatively large. For a massive database, storage volume needs to be considered.

The amount of data transmitted is large.

Unreadable.

Optimization solution:

To address the unreadability of UUIDs, you can convert the UUID to an Int64.
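The article does not say how the conversion is done, so the following is only one possible sketch: it keeps the first 8 bytes of the UUID and interprets them as a signed 64-bit integer. The helper name `uuidToInt64` is hypothetical. Because half of the 128 bits are discarded, the result has a much weaker uniqueness guarantee than the original UUID.

```javascript
// One possible UUID-to-Int64 conversion (an assumption, not the
// article's method): keep the first 8 bytes as a signed 64-bit value.
function uuidToInt64(uuid) {
  const hex = uuid.replace(/-/g, '');
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) {
    throw new TypeError('expected a UUID string with 32 hexadecimal digits');
  }
  const high64 = BigInt('0x' + hex.slice(0, 16));
  // asIntN wraps the unsigned value into the signed 64-bit range.
  return BigInt.asIntN(64, high64);
}

console.log(uuidToInt64('6ba7b810-9dad-11d1-80b4-00c04fd430c8').toString());
```

A BigInt is returned because JavaScript numbers cannot represent every 64-bit integer exactly.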

Part 3: GUID

GUID is Microsoft's implementation of the UUID standard. UUID has various other implementations besides GUID. The advantages and disadvantages are the same as for UUID.

Part 4: COMB

The COMB (combine) type is a design idea specific to databases. It can be understood as an improved GUID that combines a GUID with the system time to give better indexing and retrieval performance.

Databases have no built-in COMB type. It was designed by Jimmy Nilsson in his article "The Cost of GUIDs as Primary Keys".

The basic design idea of the COMB data type is as follows. UniqueIdentifier values index poorly because they are irregular, and this hurts system performance. Can a combination keep the uniqueness of the UniqueIdentifier while adding order? The answer is to keep the first 10 bytes of the UniqueIdentifier and use the last 6 bytes to represent the time at which the GUID was generated (DateTime). Combining the time information with the UniqueIdentifier in this way adds ordering while retaining uniqueness, which improves index efficiency.
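The 10-byte plus 6-byte layout can be sketched as follows. This is an illustration under stated assumptions rather than Nilsson's original code: it uses random bytes for the first 10 bytes and a big-endian Unix millisecond timestamp for the last 6, whereas the original design was built around SQL Server's DateTime. `newComb` is a hypothetical name.

```javascript
const { randomBytes } = require('crypto');

// Sketch of a COMB-style identifier: 10 random bytes followed by a
// 6-byte big-endian timestamp (Unix milliseconds is an assumption here).
function newComb(nowMs = Date.now()) {
  const bytes = Buffer.alloc(16);
  randomBytes(10).copy(bytes, 0);   // bytes 0-9: random part
  bytes.writeUIntBE(nowMs, 10, 6);  // bytes 10-15: generation time
  const hex = bytes.toString('hex');
  return [
    hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20),
  ].join('-');
}

const earlier = newComb(1000);
const later = newComb(2000);

console.log(earlier, later);
```

The timestamp sits in the final group because SQL Server compares the last 6 bytes of a uniqueidentifier first when sorting; a database that compares bytes left to right would want the timestamp at the front instead.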

Advantages:

Solves the problem of UUID disorder: the COMB algorithm (combined GUID/timestamp) keeps 10 bytes of the GUID and uses the other 6 bytes to represent the time when the GUID was generated (DateTime).

Performance is better than UUID.

Part 5: Twitter’s snowflake algorithm

Snowflake is Twitter's open source distributed ID generation algorithm, and the result is a 64-bit long integer ID. The core idea is: the highest bit is a sign bit that is always 0, followed by 41 bits for the timestamp in milliseconds, 10 bits for the machine ID (5 bits for the data center and 5 bits for the machine), and 12 bits for the sequence number within a millisecond (meaning each node can generate 4096 IDs per millisecond). The Snowflake algorithm can be modified to suit your own project. For example, estimate the number of future data centers, the number of machines in each data center, and the likely concurrency within a single millisecond, then adjust the number of bits assigned to each part.
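The bit layout above can be sketched with BigInt arithmetic. This is a simplified illustration, not Twitter's implementation: the class name, the option names, the default `epoch` value, and the injectable `now` clock are all assumptions made for the example.

```javascript
// Sketch of a Snowflake-style generator:
// 1 sign bit (0) | 41 bits timestamp | 5 bits data center | 5 bits worker | 12 bits sequence
class Snowflake {
  constructor({ datacenterId, workerId, epoch = 1288834974657n, now = () => BigInt(Date.now()) }) {
    for (const [name, value] of [['datacenterId', datacenterId], ['workerId', workerId]]) {
      if (!Number.isInteger(value) || value < 0 || value > 31) {
        throw new RangeError(`${name} must be an integer from 0 to 31`);
      }
    }
    this.datacenterId = BigInt(datacenterId);
    this.workerId = BigInt(workerId);
    this.epoch = epoch;          // custom epoch in milliseconds (assumed default)
    this.now = now;              // clock source, injectable for testing
    this.lastTimestamp = -1n;
    this.sequence = 0n;
  }

  nextId() {
    let timestamp = this.now();
    if (timestamp < this.lastTimestamp) {
      throw new Error('Clock moved backwards; refusing to generate an ID');
    }
    if (timestamp === this.lastTimestamp) {
      // Same millisecond: bump the 12-bit sequence (0..4095).
      this.sequence = (this.sequence + 1n) & 0xfffn;
      if (this.sequence === 0n) {
        // Sequence exhausted: busy-wait until the next millisecond.
        while (timestamp <= this.lastTimestamp) timestamp = this.now();
      }
    } else {
      this.sequence = 0n;
    }
    this.lastTimestamp = timestamp;
    return ((timestamp - this.epoch) << 22n)
      | (this.datacenterId << 17n)
      | (this.workerId << 12n)
      | this.sequence;
  }
}

const generator = new Snowflake({ datacenterId: 1, workerId: 2 });
console.log(generator.nextId().toString(), generator.nextId().toString());
```

IDs are returned as BigInt because they exceed the range in which JavaScript numbers are exact; convert them to strings before sending them as JSON.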

Advantages:

Does not depend on the database, is flexible and convenient, and performs better than database-generated IDs.

IDs increase over time on a single machine.

Disadvantages:

IDs are incremental on a single machine, but in a distributed environment the clocks on different machines cannot be perfectly synchronized, so the IDs are not always globally increasing.

Part 6: Usage

This is really convenient to use:

npm install uuid --save

Then you can use it!

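  // Named exports work with uuid 7 and later; the deep import 'uuid/v1' was removed in uuid 8.
  const { v1: uuidv1 } = require('uuid');
  console.log('UUID string:', uuidv1());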

In this way, we can print out the uuid string. It's different every time.


Statement

This article is reproduced from learnku. If there is any infringement, please contact admin@php.cn to have it removed.
