使用数据库MySQL(innodb)
相关表设计最大数据量:5000w
假定表news,拥有以下字段:
id int auto_increment,
title varchar(2000) not null default '' comment '标题',
score int not null default 0 comment '基础分值',
visited int not null default 0 comment '展示次数',
voted int not null default 0 comment '投票数',
//hot int not null default 0 comment '热度', 不再作为独立字段,而是在select的时候来计算
created int not null default 0 comment '创建Unix时间戳',
primary key(id),
key score(score),
key visted(visited),
key voted(voted),
key created(created)
其中,hot字段的值是结合visited、voted计算并随着时间衰减的,衰减频率最短为1小时(即间隔1小时就需要根据当前时间戳与created的差值做一次衰减)。数据主要根据hot字段的值来排序展示的,因此需要保证整表hot数据的更新。
另外,我还需要记录一个hot的历史最高值。(补充)hot值有最大限制,不能超过100。
难点:如果独立一个hot字段,数据量大了之后,衰减更新的效率将非常之低并影响系统整体性能;如果另外设一个完全不包含衰减的分数值字段score,需要展示的时候实时再实时计算hot的值又怕SQL执行效率太低。
目前后者的效率肯定更高,不知道各位大神有没有做过类似的需求,有没有更好的方案,求解!
确定在visit和vote操作时更新递增的score值,然后在SELECT列表的时候实时计算经过时间衰减的hot值,对于提取的数据先通过where条件缩小范围,结果进行一定时间的cache缓存。感谢大家!
巴扎黑2017-04-17 11:18:23
This design is a bit confusing. I am wondering why the attenuation operation is designed. If the time factor is T
, a similar formula can be designed
hot = f(visited) * f(voted) * T
Newer content will definitely have a larger T
, so even if their visited
and voted
are the same, its hot
code>The value will also be larger, if you don’t want the disturbance of the time factor to be so large
hot = f(visited) * f(voted) + f(T)
In this way, the hot value only needs to be calculated once and does not need to be updated in real time, because you have already designed the time factor into it
怪我咯2017-04-17 11:18:23
I will calculate the hot value in real time when displaying, so that it is not limited to one hour granularity and the sorting can change in more real time.
If the efficiency is too low, I cache the final result after taking all factors into account.
Also: I have a secret feeling that the person who wrote the question must have been deeply poisoned by the series of articles "Ranking Algorithm Based on User Voting".
黄舟2017-04-17 11:18:23
This is a very strange approach. This hot should not be put together with the original data. If you are more stupid, you can use a dedicated slave table to sort the sql every hour. If you are more thorough, you can write a separate module to update the hot value of each entry at any time. , stored in another place for reading.
大家讲道理2017-04-17 11:18:23
Sorry, I would like to ask you about the final implementation method of your ranking method. I have been troubled by similar problems recently. Thank you for your guidance. . . Thanks. .