search

Home  >  Q&A  >  body text

How to improve grouping speed

<p>I have a large table with over 3B rows and my query now takes 3 days to execute. </p> <p>This is my query: </p> <pre class="brush:php;toolbar:false;">insert into frequencywords (word, frequency, filename) select word, count(*) as frequency, max(filename) from allwordstemp group by word</pre> <p>Basically, my query is to group by word from allwordstemp table, and I want to know the <strong>filename</strong> when frequency = 1, that's why I added max(filename), Because filename needs to be included in the aggregate function, such as max. If frequency > 1, I don't need the value of filename either. Both tables have 2 indexes on word and filename. </p> <p>allwordstemp table (filename is an id file): </p> <pre class="brush:php;toolbar:false;">CREATE TABLE `allwordstemp` ( `word` varchar(45) DEFAULT NULL, `filename` int(11) DEFAULT NULL, KEY `idx_allwordstemp_word` (`word`), KEY `idx_allwordstemp_filename` (`filename`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;</pre> <p>frequencywords table:</p> <pre class="brush:php;toolbar:false;">CREATE TABLE `frequencywords` ( `word` varchar(45) DEFAULT NULL, `frequency` int(11) DEFAULT NULL, `filename` int(11) DEFAULT NULL, KEY `idx_frequencywords_word` (`word`), KEY `idx_frequencywords_frequency` (`frequency`), KEY `idx_frequencywords_filename` (`filename`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;</pre> <p>Explain selection:</p> <pre class="brush:php;toolbar:false;"> ---- ------------- -------------- -- ---------- ------- --------------- ------------------ ----- --------- ------ ------------ ---------- -------- ----- | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | ---- ------------- -------------- ------------ ------- --------------- ----------------------- ---------- --- ------------------------------------- | 1 | SIMPLE | allwordstemp | NULL | index | NULL | idx_allwordstemp_word | 138 | NULL | 3487864881 | 100.00 | Using index | ---- ------------- -------------- ------------ ------- --------------- ----------------------- ---------- --- --- ---------------- ---------- ------------- </pre> <p>How can I make my query faster? </p>
P粉155832941P粉155832941505 days ago554

reply all(1)I'll reply

  • P粉741678385

    P粉7416783852023-09-06 00:45:26

    Your filter appears to be based on frequency, not words or filenames. So I would first index all the words by frequency.

    Then, assuming frequency is an integer, I would add a WHERE clause like this:

    select word, max(filename)
    from allwordstemp
    where frequency = 1
    group by word

    The above will give you a list of all words that appear only once in various file names.

    Hope this helps, and all the best!

    reply
    0
  • Cancelreply