Home >System Tutorial >LINUX >How to design a MySQL high-performance table

How to design a MySQL high-performance table

王林
王林forward
2024-01-07 23:54:071239browse

Good logical design and physical design are the cornerstones of high performance. The schema should be designed according to the query statements that the system will execute, which often requires weighing various factors.

How to design a MySQL high-performance table

1. Select the optimized data type

MySQL supports many data types. Choosing the correct data type is crucial to achieving high performance.

Smaller is usually better

Smaller data types are generally faster because they occupy less disk, memory, and CPU cache, and require fewer CPU cycles to process.

Just keep it simple

Operations on simple data types generally require fewer CPU cycles. For example, integer operations are cheaper than character operations because the character set and collation rules (collation) make character comparisons more complex than integer comparisons.

Try to avoid NULL

If the query contains NULL columns, it is more difficult for MySQL to optimize, because NULL columns make indexes, index statistics, and value comparisons more complex. Columns that can be NULL use more storage space and require special handling in MySQL. When NULLable columns are indexed, each index record requires an extra byte, which in MyISAM can even cause a fixed-size index (such as an index with only one integer column) to become a variable-size index.

Of course there are exceptions. For example, InnoDB uses a separate bit to store NULL values, so it has good space efficiency for sparse data.

1.Integer type

There are two types of numbers: whole numbers and real numbers. If you store integers, you can use these integer types: TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT. Use 8, 16, 24, 32, and 64-bit storage space respectively.

The integer type has the optional **UNSIGNED ** attribute, which means that negative values ​​are not allowed, which roughly doubles the upper limit of positive numbers. For example, TINYINT.UNSIGNED can store the range 0 - 255, while the storage range of TINYINT is -128 -127.

Signed and unsigned types use the same storage space and have the same performance, so you can choose the appropriate type according to the actual situation.

Your choice determines how MySQL saves data in memory and disk. However, integer calculations generally use 64-bit BIGINT integers, even in a 32-bit environment. (The exception is some aggregate functions, which use DECIMAL or DOUBLE for calculations).

MySQL can specify the width for integer types, such as INT(11), which is meaningless for most applications: it does not limit the legal range of values, but only specifies some interactive tools of MySQL (such as the MySQL command line client end) is used to display the number of characters. For storage and calculation purposes, INT(1) and INT(20) are the same.

2.Real number type

Real numbers are numbers with a decimal part. However, they are not just for storing decimal parts, DECIMAL can also be used to store integers larger than BIGINT.

FLOAT and DOUBLE types support approximate calculations using standard floating point operations.

DECIMAL type is used to store precise decimals.

Both floating point and DECIMAL types can specify precision. For DECIMAL columns, you can specify the maximum number of digits allowed before and after the decimal point. This affects the column's space consumption.

There are many ways to specify the precision required for floating-point columns, which will cause MySQL to choose a different data type, or to round off the value when storing. These precision definitions are non-standard, so we recommend specifying only the data type and not the precision.

Floating point types usually use less space than DECIMAL when storing values ​​in the same range. FLOAT uses 4 bytes of storage. DOUBLE occupies 8 bytes and has higher precision and larger range than FLOAT. As with integer types, all you can choose is the storage type; MySQL uses DOUBLE as the type for internal floating point calculations.

Because of the additional space and computational overhead required, you should try to use DECIMAL only when performing precise calculations on decimals. But when the data is relatively large, you can consider using BIGINT instead of DECIMAL. Just multiply the currency unit to be stored by the corresponding multiple according to the number of decimal places.

3. String type

VARCHAR

  • Used to store variable length strings, the length is supported to 65535
  • Need to use 1 or 2 extra bytes to record the length of the string
  • Suitable for: the maximum length of the string is much larger than the average length; updates are rare

CHAR

  • Fixed length, the length range is 1~255
  • Suitable for: storing very short strings, or all values ​​close to the same length; frequently changed

Generosity is unwise

The space overhead of using VARCHAR(5) and VARCHAR(200) to store 'hello' is the same. So are there any advantages to using shorter columns?

It turns out there are great advantages. Longer columns consume more memory because MySQL typically allocates fixed-size blocks of memory to hold internal values. This is especially bad when using in-memory temporary tables for sorting or operations. It's equally bad when sorting using disk temporary tables.

So the best strategy is to allocate only the space you really need.

4.BLOB and TEXT types

BLOB and TEXT are both string data types designed to store large amounts of data, and are stored in binary and character modes respectively.

Unlike other types, MySQL treats each BLOB and TEXT value as an independent object. Storage engines usually do special processing when storing. When the BLOB and TEXT values ​​are too large, InnoDB will use a dedicated "external" storage area for storage. At this time, each value requires 1 - 4 bytes to be stored in the row. The storage area stores the actual value.

The only difference between BLOB and TEXT is that the BLOB type stores binary data without collation rules or character sets, while the TEXT type has character sets and collation rules

5. Date and time types

Most of the time there are no alternatives to the type, so there is no question of what is the best choice. The only problem is what needs to be done when saving the date and time. MySQL provides two similar date types: DATE TIME and TIMESTAMP.

But currently we recommend the method of storing timestamps, so there will no longer be too much explanation on DATE TIME and TIMESTAMP here.

6.Other types

6.1 Select identifier

The smallest data type should be selected on the premise that it can meet the needs of the value range and reserve room for future growth.

  • Integer type

Integers are usually the best choice for identity columns because they are fast and can use AUTO_INCREMENT.

  • ENUM and SET types

The EMUM and SET types are generally a poor choice for identity columns, although they may be fine for some static "definition tables" that only contain fixed states or types. ENUM and SET columns are suitable for storing fixed information, such as ordered status, product type, and person's gender.

  • String type

If possible, you should avoid using string types as identity columns because they are space-consuming and generally slower than numeric types.

You also need to pay more attention to completely "random" strings, such as strings generated by MDS(), SHAl() or UUID(). The new values ​​generated by these functions are arbitrarily distributed over a large space, which can cause INSERT and some SELECT statements to be very slow. If UUID values ​​are stored, the "-" sign should be removed.

6.2 Special type data

Some types of data wells are not directly consistent with built-in types. Timestamps with low kilosecond precision are one example. Another example is a 1Pv4 address. People often use VARCHAR(15) columns to store IP addresses. However, they are actually 32-bit unsigned integers, not strings. The representation of the address divided into four segments using decimal points is just to make it easier for people to read. So IP addresses should be stored as unsigned integers. MySQL provides INET_ATON() and INET_NTOA() functions to convert between these two representation methods.

2. Table structure design

1. Paradigm and anti-paradigm

There are usually many ways to represent any given data, from fully normalized to fully denormalized, and a compromise between the two. In a normalized database, each fact appears exactly once. In contrast, in a denormalized database, information is redundant and may be stored in multiple places.

Advantages and Disadvantages of Paradigms

When considering performance improvement, it is often recommended to normalize the schema, especially in write-intensive scenarios.

  • Normalized update operations are usually faster than denormalized.
  • When data is well normalized, there is little or no duplicate data, so less data needs to be modified.
  • Normalized tables are usually smaller and fit better in memory, so operations will be performed faster.
  • Less redundant data means fewer DISTINCT or GROUP BY statements are needed to retrieve list data.

Advantages and disadvantages of anti-paradigm

Without the need for related tables, the worst case for most queries—even if the table does not use an index—is a full table scan. This can be much faster than associative when the data is larger than memory because random I/0 is avoided.

Individual tables can also use more efficient indexing strategies.

Mixing normalization and denormalization

In practical applications, it is often necessary to mix them, and partially normalized schemas, cache tables, and other techniques may be used.

Add redundant fields appropriately to the table, such as performance priority, but it will increase complexity. Table related queries can be avoided.

Simple and familiar database paradigm

<br> First normal form (1NF): Field values ​​are atomic and cannot be divided (all relational database systems satisfy the first normal form);<br> For example: name field, where the surname and the first name are a whole. If the surname and the first name are distinguished, two independent fields must be set up;

Second Normal Form (2NF): A table must have a primary key, that is, each row of data can be uniquely distinguished;
Note: The first normal form must be satisfied first;

Third Normal Form (3NF): A table cannot contain information about non-key fields in other related tables, that is, the data table cannot have redundant fields;
Note: The second normal form must be satisfied first;

2. Table fields are less precise

  • I/O efficient
  • Separate fields are easy to maintain
  • Single table 1G volume 500W evaluation
  • A single line shall not exceed 200Byte
  • No more than 50 INT fields in a single table
  • No more than 20 CHAR(10) fields in a single table
  • It is recommended that the number of fields in a single table be controlled within 20
  • Split TEXT/BLOB, the TEXT type processing performance is much lower than VARCHAR, forcing the generation of hard disk temporary tables wastes more space.

The above is the detailed content of How to design a MySQL high-performance table. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:linuxprobe.com. If there is any infringement, please contact admin@php.cn delete