search

Home  >  Q&A  >  body text

How to improve python query speed?

I have recently been crawling for stock-related news. What I originally imagined was that when new news is released, the program will send the latest content to the mailbox via email.

So I want to save the news titles and content into the database. When the content is updated, compare the new content with the title list in the database to see if it already exists. If it already exists, then it will not be sent. If it does not, it will not be sent. , then send it to the mailbox.

But when the number increases, the list query speed will slow down. Is there any other method you can teach me?

过去多啦不再A梦过去多啦不再A梦2730 days ago768

reply all(2)I'll reply

  • 欧阳克

    欧阳克2017-06-12 09:21:34

    Deduplication of crawler tasks

    Save the captured link into a set and check whether the new link is in the set.

    reply
    0
  • 欧阳克

    欧阳克2017-06-12 09:21:34

    There are many ways to remove duplicates, such as the set or Bloom filter above, which can effectively use memory and improve efficiency

    reply
    0
  • Cancelreply