I am building a website with the following general requirements.
Users are divided into five levels, 1-5. The larger the number, the higher the privilege.
I have a lot of content. The higher a user's level, the more content that user can see.
For example, there is content: A, B, C, D, E,
Visible to user group 1: A
Visible to user group 2: A, B
…………
Visible to user group 5: A, B, C, D, E
What is the best way to build the database index for this?
Previously, a friend suggested adding a "group" column to the content (topic) table to store the visible user level (1-5), and then creating a composite index group_tid on (group, tid).
Then, to query the articles with tid < 100 (say the current user group is 3), the statement is:
SELECT * FROM topic
WHERE `group` >= 3 AND tid < 100
LIMIT 10;
In practice, this index first reads all rows with group >= 3 and only then filters them by tid.
If there are 1 million rows and 500,000 of them have group >= 3, executing this statement has to go through those 500,000 rows, which is extremely inefficient.
It seems that single-column indexes are only suitable for equality conditions such as group = *, not for < or >.
So I would like to ask the experts here: have you had similar requirements? How should I design the index or the table correctly?
Thank you very much!
Supplement 1:
Put another way, the question is how to apply < or > conditions to both columns of a MySQL composite index.
This is a logical problem. Once built, the current group_tid index looks like the following picture:
Even if I limit the range of group, the tid values that follow are still ordered within each group.
If I want the rows where group > 1 and tid < 6, I have to read all rows of groups 2 and 3 first and then filter them again.
It seems that the only solution is to re-plan the table structure. Do you have similar experience?
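The ordering issue described in Supplement 1 can be modeled in a few lines of Python. This is only a rough sketch, not how MySQL is implemented: it assumes the group_tid index behaves like a sorted list of (group, tid) pairs, and the sample data and the helper names range_scan and per_group_seek are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample data: groups 1-3, each with tid 1..10,
# kept sorted the way entries of a (group, tid) index would be.
index = sorted((g, t) for g in (1, 2, 3) for t in range(1, 11))


def range_scan(index, group_above, tid_limit):
    """Range condition on the leading column: group > group_above.

    Every entry of the matching groups is visited, and tid is
    only checked afterwards.
    """
    start = bisect_right(index, (group_above, float("inf")))
    visited = index[start:]
    matched = [(g, t) for g, t in visited if t < tid_limit]
    return len(visited), matched


def per_group_seek(index, groups, tid_limit):
    """Equality on the leading column, once per group.

    Inside one group the tid values are contiguous and sorted,
    so only the matching entries are visited.
    """
    visited = 0
    matched = []
    for g in groups:
        lo = bisect_left(index, (g, float("-inf")))
        hi = bisect_left(index, (g, tid_limit))
        visited += hi - lo
        matched.extend(index[lo:hi])
    return visited, matched


# group > 1 AND tid < 6
scan_visited, scan_rows = range_scan(index, 1, 6)
seek_visited, seek_rows = per_group_seek(index, [2, 3], 6)
print(scan_visited, seek_visited)
```

Both functions return the same rows, but the range scan walks every entry of groups 2 and 3, while the per-group seeks touch only the entries with tid < 6.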
Supplement 2:
I just received a helpful answer from an enthusiastic friend, saying that he had encountered this situation before.
The solution is to change the publishing mechanism and publish each qualifying post to every level.
For example, if the level of content A is 3, then three data rows must be created at the same time when posting:
group=1, tid=A
group=2, tid=A
group=3, tid=A
In this way, when reading the content, a plain WHERE group = * returns exactly the content that meets the conditions.
But this method stores a large amount of redundant data and may even produce duplicates. Is there any other solution?
学习ing 2017-06-28 09:24:55
Actually, your idea is already on the right track.
Split the table by group and create an index on tid in each table.
For group >= 3, build the SQL dynamically in the program as follows:
select * from group3 where tid < 100
union all
select * from group4 where tid < 100
union all
select * from group5 where tid < 100
This way the tid index is used in every branch, and the logic works.
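The dynamic SQL assembly mentioned above could look roughly like this in Python. It is a minimal sketch under assumptions: the function name build_union_query is hypothetical, the per-group tables are assumed to be named group1 to group5 as in the statement above, and the %s placeholder style is the one used by common Python MySQL drivers such as PyMySQL.

```python
def build_union_query(min_group, tid_limit, max_group=5):
    """Build one SELECT per group table and join them with UNION ALL.

    Returns the SQL text and the parameter list for a DB-API cursor.
    """
    if not (1 <= min_group <= max_group):
        raise ValueError("min_group must be between 1 and max_group")
    groups = range(min_group, max_group + 1)
    # Table names are built from validated integers only, never from
    # raw user input; the tid bound is passed as a bound parameter.
    parts = [f"select * from group{g} where tid < %s" for g in groups]
    sql = "\nunion all\n".join(parts)
    params = [tid_limit] * len(parts)
    return sql, params


sql, params = build_union_query(3, 100)
print(sql)
print(params)
```

The result would be passed to the driver as cursor.execute(sql, params), so only the table names are assembled as text.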
ringa_lee 2017-06-28 09:24:55
First of all, in InnoDB, whether an index is used has nothing to do with whether you use < or >, and using = does not guarantee that an index is used either. When MySQL estimates that a full table scan is cheaper than an index lookup, the optimizer abandons the index and chooses the full table scan.
As shown in the picture:
Back to your question: if the range retrieved through an index, such as tid < 100, is relatively small, the index can be used.
If the result sets of both conditions are large, you should consider adding other filtering conditions, such as only searching content from the past month based on creation time.
For pagination, you can also filter further by primary key ID.
仅有的幸福 2017-06-28 09:24:55
First of all, you need to understand the following points:
A query generally uses at most one index per table.
In a composite index, rows are filtered from left to right. If the first condition is a range (greater than or less than), the second condition no longer has an exact index range to seek within, so it has to be checked against every entry matched by the first condition.
A B-Tree index is structured like a tree, as shown in the figure below. A composite index is searched from left to right, and each lookup walks the branches of this structure from top to bottom.
Put simply, an index is a lookup table from values to rows. It lets you locate a given value of a given column quickly without scanning the whole table for the matching row, which is why it is faster.
Structure of B-Tree index:
Back to your question: to improve efficiency substantially, the first column of the composite index needs to cut down the amount of data left for later filtering as much as possible. If you query tid < 100, filtering on tid first significantly reduces the B-Tree branches that follow, so if you want to use a composite index, it should be (tid, group).
怪我咯 2017-06-28 09:24:55
The group condition filters very poorly (it has only five distinct values), so an index on group alone makes little sense.
In the scenario you describe, as long as the tid condition does not match too many rows (on the order of thousands), an index on tid is enough.
If you are still worried about the amount of data left after the tid condition, you can create a composite index on tid and group.
黄舟 2017-06-28 09:24:55
First of all, thank you all very much for your attention and answers to my question!
After solving the problem, I have some thoughts on boxsnake's suggestion, and I'll post them here.
Besides solving the read problem, the group_tid indexing method can also solve the paging problem.
For example, if there are 10 articles per page and the user level is 3, then I read from group 1, group 2, and group 3 separately, each with the range tid < 100, taking 10 articles from each. Even if one group has no qualifying rows, the combined results are still enough to fill the page.
But if I read through a tid_group index with the condition group <= 3, I don't know how many articles to fetch.
For example, if I take the 10 articles tid 90 to tid 99 and their groups are all 4, then none of them meets the condition.
And tid_group must restrict tid before restricting group, so it cannot be used.
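The paging argument in this post can be checked with a small Python model. This is a sketch under assumptions rather than the actual implementation: each group's tid values are held in a sorted list standing in for the index, the sample data and the function name page_for_level are hypothetical, and the second half mirrors the "fetch 10 rows by tid, then filter by group" concern exactly as stated above.

```python
from bisect import bisect_left
import heapq

# Hypothetical data: tid 1..89 spread over groups 1-3, tid 90..99 all in group 4.
tids_by_group = {
    1: list(range(1, 90, 3)),
    2: list(range(2, 90, 3)),
    3: list(range(3, 90, 3)),
    4: list(range(90, 100)),
}


def page_for_level(tids_by_group, user_level, before_tid, page_size=10):
    """Take up to page_size newest tids below before_tid from each
    visible group, then keep the page_size newest overall."""
    candidates = []
    for group in range(1, user_level + 1):
        tids = tids_by_group.get(group, [])  # sorted ascending
        end = bisect_left(tids, before_tid)  # first position with tid >= before_tid
        start = max(0, end - page_size)
        candidates.extend((tid, group) for tid in tids[start:end])
    return heapq.nlargest(page_size, candidates)


# Per-group reads: always enough rows to fill the page.
page = page_for_level(tids_by_group, user_level=3, before_tid=100)

# tid-first read: the 10 newest tids below 100, filtered by group afterwards.
all_rows = sorted((tid, g) for g, tids in tids_by_group.items() for tid in tids)
newest_ten = [row for row in all_rows if row[0] < 100][-10:]
visible = [row for row in newest_ten if row[1] <= 3]

print([tid for tid, _ in page])
print(visible)
```

With this data the per-group version fills the page with tid 89 down to 80, while the tid-first version fetches tid 90 to 99 and keeps nothing after the group filter.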