Home >Database >Mysql Tutorial >Should Higher Cardinality Columns Come First in Multi-Column Indexes with Range Queries?

Should Higher Cardinality Columns Come First in Multi-Column Indexes with Range Queries?

Patricia Arquette
Patricia ArquetteOriginal
2024-12-02 11:34:13746browse

Should Higher Cardinality Columns Come First in Multi-Column Indexes with Range Queries?

Indexing with Higher Cardinality Columns First When Involving a Range

Consider the following table:

CREATE TABLE `files` (
  `did` int(10) unsigned NOT NULL DEFAULT '0',
  `filename` varbinary(200) NOT NULL,
  `ext` varbinary(5) DEFAULT NULL,
  `fsize` double DEFAULT NULL,
  `filetime` datetime DEFAULT NULL,
  PRIMARY KEY (`did`,`filename`),
  KEY `fe` (`filetime`,`ext`),          -- This?
  KEY `ef` (`ext`,`filetime`)           -- or This?
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;

With a million rows and a high cardinality for filetime and a lower cardinality for ext, the question arises as to which index is more advantageous: fe or ef.

Analysis with Force Index and EXPLAIN

Using FORCE INDEX to test both indexes reveals a clear difference in performance:

-- Forcing the range on filetime first
mysql> EXPLAIN SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(fe)
    WHERE ext = 'gif' AND filetime >= '2015-01-01'
                      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;
-- Forcing the low-cardinality ext first
mysql> EXPLAIN SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(ef)
    WHERE ext = 'gif' AND filetime >= '2015-01-01'
                      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;

The EXPLAIN output indicates that ef is significantly faster, using less rows to retrieve the results.

Analysis with the Optimizer Trace

The Optimizer trace confirms the superiority of ef:

"potential_range_indices": [
    ...
    {
        "index": "fe",
        "usable": true,
        ...
    },
    {
        "index": "ef",
        "usable": true,
        ...
    }
],
"analyzing_range_alternatives": {
    "range_scan_alternatives": [
        {
            "index": "fe",
            "ranges": [
                "2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00"
            ],
            "cost": 20022,   -- Higher cost
        },
        {
            "index": "ef",
            "ranges": [
                "gif <= ext <= gif AND 2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00"
            ],
            "cost": 646.61,  -- Lower cost
        }
    ],
}

The trace reveals that ef can use both columns of the index, resulting in a more efficient search. Additionally, it highlights that the Optimizer will only examine the first 'range' column, making the cardinality of ext irrelevant.

Conclusions

Based on the analysis, it is clear that when dealing with a range query involving multiple indexed columns, the order of the columns should be:

  • Put columns involved in equality tests first, regardless of cardinality.
  • Other columns involved in the range should be placed after the equality columns.

This approach ensures that the index is used most effectively, resulting in optimal query performance.

The above is the detailed content of Should Higher Cardinality Columns Come First in Multi-Column Indexes with Range Queries?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn