
MongoDB + Redis task queue performance bottleneck

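Background: I have recently been refactoring an important internal task system at my company. The original system stores tasks in MongoDB, and clients fetch their tasks from MongoDB. The choice of MongoDB is partly historical and partly because MongoDB's array queries let us cut the number of tasks many times over. Suppose one md5 has to be processed for N different cases. With a MongoDB array, each md5 becomes a single task holding a list of N pending subtasks (the task counts as done only once all N subtasks have been processed), so the number of tasks in the system drops to 1/N of what it would otherwise be.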

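Details:

1. As the number of tasks in MongoDB grows, the array query becomes very slow; at 5K tasks it is already intolerable.
2. All N subtasks of an md5 must be completed before the task is deleted from MongoDB.
3. A task can be reset after it times out.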
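To make the model concrete, here is a minimal in-memory sketch of the task shape described above. It is not the real schema: the field names (`pending`, `claimed`), the helper names, and the timeout handling are all assumptions for illustration, and a plain dict stands in for the MongoDB collection.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One task document per md5 (field names are hypothetical)."""
    md5: str
    pending: set                                  # subtasks not yet finished
    claimed: dict = field(default_factory=dict)   # subtask -> time it was handed out


def claim(task: Task, subtask: str, now: float) -> None:
    """Hand a pending subtask to a client and remember when."""
    if subtask in task.pending and subtask not in task.claimed:
        task.claimed[subtask] = now


def reset_timed_out(task: Task, now: float, timeout: float) -> list:
    """Make subtasks claimable again if their client took too long."""
    expired = [s for s, t in task.claimed.items() if now - t > timeout]
    for s in expired:
        del task.claimed[s]
    return expired


def complete(tasks: dict, md5: str, subtask: str) -> bool:
    """Record one finished subtask; delete the task when none remain."""
    task = tasks[md5]
    task.pending.discard(subtask)
    task.claimed.pop(subtask, None)
    if not task.pending:
        del tasks[md5]   # all N subtasks done: the task leaves the store
        return True
    return False
```

The point of the sketch is only that every subtask completion touches the one shared task record, which is where the write pressure described later comes from.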

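Proposed improvement: Because of coupling in the existing code, MongoDB cannot be dropped entirely, so I decided to add a Redis cache. The N subtasks of each md5 are distributed across N Redis queues (the task is split into subtasks). A separate process syncs tasks from MongoDB into Redis, and clients no longer fetch tasks from MongoDB. The benefit is that the MongoDB array query is gone: the sync process fetches tasks from MongoDB by priority offset (which is indexed), so it is faster than the array query. Clients then take subtasks from the N Redis queues and write each result back to the original MongoDB task record (the subtask result is matched by md5).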
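The fan-out step can be sketched roughly as follows. This is an illustration under assumptions, not the actual sync process: the subtask kinds, the `priority` field (higher number assumed to mean more urgent), and the function names are made up, and `collections.deque` stands in for the Redis lists so the snippet runs without a server. Tracking which tasks have already been synced is left out.

```python
from collections import deque

SUBTASK_KINDS = ["scan", "unpack", "classify"]   # hypothetical, N = 3

# One FIFO per subtask kind. With Redis this would be one list per kind:
# LPUSH by the sync process, RPOP (or BRPOP) by the clients.
queues = {kind: deque() for kind in SUBTASK_KINDS}


def sync_batch(mongo_tasks: list, batch_size: int) -> int:
    """Fan the highest-priority tasks out into the per-kind queues."""
    batch = sorted(mongo_tasks, key=lambda t: t["priority"], reverse=True)[:batch_size]
    for task in batch:
        for kind in task["pending"]:
            queues[kind].appendleft(task["md5"])   # like LPUSH
    return len(batch)


def fetch(kind: str):
    """Client side: take the oldest md5 waiting for this kind, or None."""
    return queues[kind].pop() if queues[kind] else None   # like RPOP
```

A client that handles only one kind of subtask then calls `fetch("scan")` in a loop and never issues an array query against MongoDB.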

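Problem encountered: When a client reports a result back to MongoDB it performs an update, and if all N subtasks are complete the task is deleted from MongoDB. Testing showed that MongoDB performs very poorly under highly concurrent writes: the whole task system tops out at about 200 tasks/s (16 cores, 16 GB RAM, CentOS, kernel 2.6.32-358.6.3.el6.x86_64). The cause is roughly that, under frequent writes, MongoDB's performance drops sharply because of table locking.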

The question (think outside the box): can anyone suggest a good solution that stores task state (including subtask state) and is at least faster than MongoDB?

迷茫 · 2799 days ago · 781

Replies (3)

  • 迷茫 · 2017-04-22 08:58:16

    Some preliminary thoughts, for reference only:

    1. First, indexing: I take it you have an index on this field.
    2. One thing to confirm: as far as I know, lock granularity in the latest MongoDB version is still at the database level and has not yet reached the collection (table) level, though I don't know which version you are running. That hurts when write concurrency is high, but performance still shouldn't be as bad as you describe, which puzzles me. Have you considered splitting the tasks across several databases?
    3. Could you store subtask status separately from main-task status? Subtask status can live in Redis, and each main task is responsible only for its own status. Each main task is then updated 1/N as often, which greatly reduces the pressure on the main task collection in MongoDB.
    4. After a subtask completes or times out, could a single background thread sync the main-task status to MongoDB asynchronously and in order?
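Points 3 and 4 together might look something like the sketch below. Everything here is a stand-in chosen so the snippet runs on its own: a dict plus a lock plays the role of per-md5 counters in Redis (where an atomic decrement would do the same job), another dict plays the MongoDB main-task collection, and the function names are hypothetical. Timeouts are not modelled.

```python
import queue
import threading

remaining = {}            # md5 -> subtasks left; counters in Redis in practice
lock = threading.Lock()   # stands in for Redis's atomic decrement
finished = queue.Queue()  # md5s whose subtasks are all done
main_store = {}           # stand-in for the MongoDB main-task collection


def add_task(md5: str, n: int) -> None:
    """Register a main task and its N outstanding subtasks."""
    main_store[md5] = "pending"
    with lock:
        remaining[md5] = n


def report_subtask_done(md5: str) -> None:
    """Called by clients; never writes to the main store directly."""
    with lock:
        remaining[md5] -= 1
        done = remaining[md5] == 0
        if done:
            del remaining[md5]
    if done:
        finished.put(md5)


def writer() -> None:
    """Single background thread: applies main-task updates one at a time."""
    while True:
        md5 = finished.get()
        if md5 is None:        # sentinel: stop the thread
            break
        del main_store[md5]    # one write per task instead of N
```

With this split, clients only touch the fast counter store, and the main store sees a single sequential writer, so there is no write contention on it from the clients.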

  • 阿神 · 2017-04-22 08:58:16

    Personally, I suspect the array query and update performance problems described in the question come from schema design. Since the question doesn't show the actual design, here are a few points worth checking, for reference only:

    1. Indexes. As mentioned above, you have presumably indexed the array. Note, though, that an index on an array field is much larger than an index on an ordinary field (the larger the array, the more space the index takes). This can mean the index does not fit (entirely) in memory, in which case every query needs extra I/O and performance drops sharply.
    2. Size of the returned documents. If each query returns a lot of document data and the client is not on the same machine as MongoDB, network transfer time goes up (don't underestimate it), so return only the fields you actually need.
    3. Update-in-place. Because MongoDB is schemaless, it reserves some extra space for each document so that fields or data can be added later, which improves update performance. But if your documents grow frequently (new fields, longer arrays, and so on), writes suffer: MongoDB has to move the grown document elsewhere (in effect, from one location on disk to another with more free space), and performance drops considerably.
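A rough way to see point 1: an index on an array field holds one key per distinct array element in each document, not one key per document. The helper below just counts those keys for a list of in-memory documents; the field name `pending` and the figure of 20 subtasks per task are assumptions for illustration, not numbers from the question.

```python
def multikey_index_entries(docs: list, field: str) -> int:
    """Count index keys: one per distinct array element per document."""
    return sum(len(set(doc[field])) for doc in docs)


# 5,000 task documents, each with a hypothetical 20 pending subtasks.
docs = [{"pending": [f"kind-{i}" for i in range(20)]} for _ in range(5000)]
entries = multikey_index_entries(docs, "pending")
```

Under these assumptions, packing N subtasks into one document divides the document count by N but leaves the index with about as many keys as if every subtask were its own document.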

    MongoDB leans heavily on memory. If all your hot data fits in memory, its performance is excellent, and whether it does depends largely on your schema design.

    PS: MongoDB's much-touted "schemaless" advantage has misled many people. What it really means is that MongoDB has a dynamic schema, not that you don't need to design one.

  • 大家讲道理 · 2017-04-22 08:58:16

    You could consider RabbitMQ for the task queue. Also, MongoDB shouldn't be that slow, should it? Is an index missing? Or try a capped collection.
