Home  >  Q&A  >  body text

python - mongodb deduplication

1. The function needs to determine whether it is duplicated based on a field, such as ownerId, and discard whatever exists. So I used the distinct method, but as the amount of data gradually increased during this period, a question:

OperationFailure: distinct too big, 16mb cap

This is an error in the command line. It says that the result returned by distinct is too large, exceeding 16M. Can any experts contribute some methods?

Requirements: Check whether the ownerId field appears in the database every time. The database is constantly being updated, and the non-existent ownerId will be stored in the database after being processed for a while, so it is required to judge every time. We need to re-check the owner field in the database. Therefore, the speed requirements are relatively high.
Please take a look and give us your opinion.

高洛峰高洛峰2682 days ago734

reply all(1)I'll reply

  • 迷茫

    迷茫2017-05-17 10:04:13

    If I understand correctly, have you considered Unique Indexes?

    For reference.

    Love MongoDB! Have fun!

    reply
    0
  • Cancelreply