search

Home  >  Q&A  >  body text

MySQL 'IN' operator for handling large numbers

I'm observing strange behavior that I'm trying to understand.

MySQL version: 5.7.33 I have the following query:

select * from a_table where time>='2022-05-10' and guid in (102,512,11,35,623,6,21,673);

a_table has a primary key on time, guid, and an index on guid

The query I wrote above has very good performance, according to the explain plan it is using index condition;using location;using MRR

When I increase the number of values ​​in the in clause, the performance is affected significantly.

After some drills, I got a rough number. For values ​​less than ~14500, the interpretation scheme is the same as above. For quantities above this, only the explain plan uses where and takes forever to run my query.

In other words, for example, if I put 14,000 values ​​in the in clause, the explain plan would have the expected 14,000 rows. However, if I put 15,000 values ​​in the in clause, the explanation has 221200324 rows. I don't even have that many rows in my entire table.

I'm trying to understand this behavior and know if there is any way to fix this.

Thanks

P粉190443691P粉190443691390 days ago516

reply all(1)I'll reply

  • P粉041856955

    P粉0418569552023-12-21 00:01:22

    UnderstandLimit memory for range optimization.

    When IN() has a large number of values ​​in the predicate, it uses more memory during the query optimization step. This has been considered a problem in some cases, so recent versions of MySQL set a maximum memory limit (default is 8MB).

    If the optimizer finds that it requires more memory than the limit, and there are no other conditions in the query to optimize against, it will give up trying to optimize and resort to a table scan. I deduce that your table statistics actually show that the table has about 221 million rows (although the table statistics are an inaccurate estimate).

    I can't say I know the exact formula for how much memory a given list of values ​​requires, but based on the behavior you observed, we can guess that considering 14k items, each item averages about 600 bytes effectively, but more is invalid.

    You can set range_optimizer_max_mem_size = 0 to disable memory limits. This creates the risk of overcommitting memory, but it avoids the optimizer "giving up". We set this value on all MySQL instances in our previous job because we couldn't educate developers to avoid creating huge lists of values ​​in their queries.

    reply
    0
  • Cancelreply