传奇的软件工程师Frederick P. Brooks曾经说过:“给我看你的数据”。因此可见数据结构对于一个程序来说是多么的重要,如果你不了解程序中的数据结构,你根本就无法去理解整个程序
有哪些设计数据结构方面的小技巧?
怪我咯2017-04-24 09:13:24
Words written before: It is always after a disaster occurs that one remembers the importance of disaster recovery; it is always after one has suffered a loss that one remembers someone’s reminder
This article is mainly for mysql database
Design principles
1. Don’t do calculations in the database: CPU calculations must be moved to the business layer
2. Control the amount of data in a single table: the records in a single table are controlled at 10 million
3. Control the number of columns: the number of fields is controlled within 20
4. Balance paradigm and redundancy Yu: Sacrifice paradigm design and redundant data to improve efficiency
5. Reject 3B: Reject large SQL, large transactions, large batches
6. Use UTF8 for table character set
7. Use INNODB storage engine
Data table design
1. Use the most effective (smallest) data type as much as possible
tinyint(1Byte)
smallint(2Byte)
mediumint(3Byte)
int(4Byte)
bigint(8Byte)
bad case:int(1)/int(11)
2. Don’t store numbers as strings, convert characters into numbers, use int to store ip instead of char(15)
3. Use enum or set first, sex
enum ('F', 'M')
4, avoid Use NULL fields
NULL字段很难查询优化
NULL字段的索引需要额外空间
NULL字段的复合索引无效
bad case:`name` char(32) default null`age` int not null
good case:`age` int not null default 0
5. Use less text/blob. The performance of varchar will be much higher than that of text. If blob is really unavoidable, please split the table.
6. Do not store images in the database. 7. For MyISAM tables, if there are no variable-length columns (VARCHAR, TEXT or BLOB column), using a fixed-size record format. This is faster but unfortunately may waste some space. Even if you have used the CREATE option to make the VARCHAR column ROW_FORMAT=fixed, you can still be prompted to use fixed-length rows
8. Use sample character set, such as latin1. Try to use utf-8 as little as possible, because utf-8 takes up 3 times the space of latin1. You can use latin1 on fields that do not need to use utf-8, such as mail, url, etc. 9. Accuracy and space conversion. Floating point types generally use less space than DECIMAL types when storing data in the same numerical range. The FLOAT field uses 4 bytes to store data. The DOUBLE type requires 8 bytes and has higher accuracy and a larger numerical range. DECIMAL type data will be converted to DOUBLE type
10. The library name, table name, and field name must have a fixed naming length, within 12 characters. ;Library names, table names, and field names are not allowed to exceed 32 characters. The meaning of the name must be understood; MySQL reserved words are prohibited in library names, table names, and field names; temporary library and table names must be prefixed with tmp and suffixed with date; backup libraries and tables must be bak prefix, and suffixed with date
11. The physical length of the InnoDB table row record does not exceed 8KB. The default data page of InnoDB is 16KB. Based on the characteristics of B+Tree, a data page needs to store at least 2 records. Therefore, when the actual storage length of a large column exceeds 8KB (especially TEXT/BLOB columns), it will cause "page-overflow storage", similar to "row migration" in ORACLE. Therefore, if large columns must be used ( Especially TEXT/BLOB type) and if they are read and written frequently, it is best to split these columns into sub-tables and do not store them together with the main table. If they are not too frequent, you can consider keeping them in the main table. If Change the innodb
page
size option to 8KB, then the physical length of the row record is recommended not to exceed 4KB
Index class
1. Use indexes carefully and rationally
改善查询、减慢更新
索引一定不是越多越好(能不加就不加,要加的一定得加)
覆盖记录条数过多不适合建索引,例如“性别”
2. Character fields must build prefix indexes
3. Do not perform column operations in the index, bad case: select id where age +1 = 10;
4. It is recommended to use auto-increment columns for innodb primary keys
主键建立聚簇索引
主键不应该被修改
字符串不应该做主键
如果不指定主键,innodb会使用唯一且非空值索引代替
5. Do not use foreign keys, please ensure the constraints by the program.
6. Avoid creating indexes on prefixes that already have indexes. For example: If index (a, b) exists, remove index (a)
7. Control the length of a single index. Use key (name (8)) to create an index on the first few characters of the data. 8. Use indexes selectively. It is not very good to use indexes on columns that change rarely, such as gender columns.
9. Optimize table can compress and sort the index. Be careful not to run it frequently.
10. Analyze table can update data. 11. Index selectivity is not repeated. The index value is also called the ratio of the number of data rows in the cardinality table. Index selectivity = cardinality/data rows. count(distinct(username))/count(*) is the index selectivity. The benefit of high index selectivity is mysql When searching for matches, you can filter more rows. The unique index has the best selectivity, with a value of 1
12. Do not use duplicate or redundant indexes. For the index of the INNODB engine, the primary key index must be indexed every time the data is modified. The corresponding index value in the auxiliary index is modified, which may cause a large amount of data migration, paging, and fragmentation. 13. For string columns with a length of more than 20, it is best to create a prefix index rather than a full column index (for example: ALTER TABLE t1 ADD INDEX(user(20))), which can effectively improve index utilization, but its disadvantage is that the prefix index is not used when sorting this column. The length of the prefix index can be determined based on the statistics of the field. Generally, it is slightly larger than the average length. 14. Regularly use the pt-duplicate-key-checker tool to check and delete duplicate indexes. For example, if the index idx1(a, b) already covers index idx2(a), you can delete the idx2 index
sql statement design class
1. The sql statement is as simple as possible. One sql can only be operated on one CPU. Split large statements into small statements to reduce lock time. One large sql can block the entire library (make full use of QUERY CACHE and make full use of multi-core CPU)
3. Avoid using trig/func, triggers and functions are not used, replace them with client programs
4. No need for select *, which consumes cpu, io, memory, and bandwidth , this kind of program is not scalable5. OR is rewritten as IN()
or的效率是n级别
in的消息时log(n)级别
in的个数建议控制在200以内
select id from t where phone=’159′ or phone=’136′ =>select id from t where phone in (’159′, ’136′);
select id from t where phone = '159' or name = 'john';
=>
select id from t where phone='159' union select id from t where name='jonh';
7. Avoid negative %, such as not in/like
8. Use count(*) with caution9. Limit efficient paging
The larger the limit, the lower the efficiency
select id from t limit 10000, 10;
=>
select id from t where id > 10000 limit 10;
12. Use group by, grouping, and automatic sorting
13. Please use same type comparison14. Use load data to import data, load data is about 20 times faster than insert;
15. Updates to data should be broken up and updated in batches. Do not update too much data at one time
16. Use performance analysis tools
Sql explain / showprofile / mysqlsla
show profile;
mysqlsla;
mysqldumpslow;
explain;
show slow log;
show processlist;
show query_response_time(percona)
optimize When data is inserted, updated, and deleted, some data migration and paging are inevitable, and then some fragments will appear. Over time, fragments accumulate and affect performance. This requires the DBA to regularly optimize the database to reduce fragments, which is done through the optimize command. For example, operating on MyISAM table: optimize table table name
18. It is forbidden to run large queries in the database
19. Use precompiled statements and only pass parameters, which is more efficient than passing SQL statements; parse once and use multiple times; reduce the probability of SQL injection
20. It is forbidden to use order by rand ()
21. A single SQL statement is prohibited from updating multiple tables at the same time
22. Avoid mathematical operations in the database (MySQL is not good at mathematical operations and logical judgment)
23. SQL statements require all R&D, and all SQL keywords are Capital letters, only one space is allowed for each word
24. If you can use NOT IN, you can use NOTIN. There are too many pitfalls. . Will check out empty and NULL
Attention
1. Even if it is conditional filtering based on indexes, if the optimizer realizes that the total amount of data that needs to be scanned exceeds 30% (it seems to be 20% in ORACLE, MySQL is currently 30%, it may be adjusted in the future), it will change directly. The execution plan is a full table scan, and indexes are no longer used. 2. When joining multiple tables, select the table with the greatest filterability (not necessarily the smallest amount of data, but the one with the greatest filterability after adding the WHERE condition). for the driver table. In addition, if there is sorting after JOIN, the sorting field must belong to the driver table, so that the index on the driver table can be used to complete the sorting. 3. In most cases, the cost of sorting is usually higher, so if you see the execution plan There is Using filesort, give priority to creating a sorting index
4. Use pt-query-digest to regularly analyze slow query log, and combine it with Box Anemometer to build a slow query log analysis and optimization system
迷茫2017-04-24 09:13:24
I saw something interesting today:
Which algorithms/data structures should I “recognize” and know by name?