一尘不染

MySQL性能优化:按日期时间字段排序

mysql

我有一个表,该表包含大约100.000个博客文章,并通过1:n关系链接到具有50个供稿的表。当我用select语句查询两个表时(按发布表的datetime字段排序),MySQL始终使用文件排序,导致查询时间非常慢(>
1秒)。这是postings表的架构(简化):

+---------------------+--------------+------+-----+---------+----------------+
| Field               | Type         | Null | Key | Default | Extra          |
+---------------------+--------------+------+-----+---------+----------------+
| id                  | int(11)      | NO   | PRI | NULL    | auto_increment |
| feed_id             | int(11)      | NO   | MUL | NULL    |                |
| crawl_date          | datetime     | NO   |     | NULL    |                |
| is_active           | tinyint(1)   | NO   | MUL | 0       |                |
| link                | varchar(255) | NO   | MUL | NULL    |                |
| author              | varchar(255) | NO   |     | NULL    |                |
| title               | varchar(255) | NO   |     | NULL    |                |
| excerpt             | text         | NO   |     | NULL    |                |
| long_excerpt        | text         | NO   |     | NULL    |                |
| user_offtopic_count | int(11)      | NO   | MUL | 0       |                |
+---------------------+--------------+------+-----+---------+----------------+

这是feed桌子:

+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| type        | int(11)      | NO   | MUL | 0       |                |
| title       | varchar(255) | NO   |     | NULL    |                |
| website     | varchar(255) | NO   |     | NULL    |                |
| url         | varchar(255) | NO   |     | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+

这是执行时间超过1秒的查询。请注意,该post_date字段具有索引,但MySQL并未使用它对发布表进行排序:

SELECT 
    `postings`.`id`, 
    UNIX_TIMESTAMP(postings.post_date) as post_date, 
    `postings`.`link`, 
    `postings`.`title`, 
    `postings`.`author`, 
    `postings`.`excerpt`, 
    `postings`.`long_excerpt`, 
    `feeds`.`title` AS feed_title, 
    `feeds`.`website` AS feed_website
FROM 
    (`postings`)
JOIN 
    `feeds` 
ON 
    `feeds`.`id` = `postings`.`feed_id`
WHERE 
    `feeds`.`type` = 1 AND 
    `postings`.`user_offtopic_count` < 10 AND 
    `postings`.`is_active` = 1
ORDER BY 
    `postings`.`post_date` desc
LIMIT 
    15

explain extended查询命令的结果表明MySQL正在使用文件排序:

+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
| id | select_type | table    | type   | possible_keys                         | key       | key_len | ref                      | rows  | Extra                       |
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
|  1 | SIMPLE      | postings | ref    | feed_id,is_active,user_offtopic_count | is_active | 1       | const                    | 30996 | Using where; Using filesort |
|  1 | SIMPLE      | feeds    | eq_ref | PRIMARY,type                          | PRIMARY   | 4       | feedian.postings.feed_id |     1 | Using where                 |
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+

当我删除该order by部分时,MySQL停止使用文件排序。如果您对如何优化此查询以使MySQL通过使用索引来排序和选择数据有任何想法,请告诉我。我已经尝试了一些方法,例如,根据一些博客文章的建议,在所有按位置/按字段排序的字段上创建了一个组合索引,但是这也不起作用。


阅读 2851

收藏
2020-05-17

共1个答案

一尘不染

postings (is_active, post_date)(按此顺序)上创建一个复合索引。

它将用于按进行过滤is_active和排序post_date

MySQL应该在中显示REF对此索引的访问方法EXPLAIN EXTENDED

请注意,您在上有一个RANGE过滤条件user_offtopic_count,这就是为什么在过滤和按其他字段排序时都不能在该字段上使用索引的原因。

根据您选择的程度user_offtopic_count(即,满足多少行user_offtopic_count < 10),创建索引user_offtopic_count并对post_dates进行排序可能会更有用。

为此,在上创建一个复合索引,postings (is_active, user_offtopic_count)并确保RANGE使用对该索引的访问方法。

哪个索引会更快取决于您的数据分布。创建两个索引,FORCE然后看看哪个更快:

CREATE INDEX ix_active_offtopic ON postings (is_active, user_offtopic_count);
CREATE INDEX ix_active_date ON postings (is_active, post_date);

SELECT 
    `postings`.`id`, 
    UNIX_TIMESTAMP(postings.post_date) as post_date, 
    `postings`.`link`, 
    `postings`.`title`, 
    `postings`.`author`, 
    `postings`.`excerpt`, 
    `postings`.`long_excerpt`, 
    `feeds`.`title` AS feed_title, 
    `feeds`.`website` AS feed_website
FROM 
    `postings` FORCE INDEX (ix_active_offtopic)
JOIN 
    `feeds` 
ON 
    `feeds`.`id` = `postings`.`feed_id`
WHERE 
    `feeds`.`type` = 1 AND 
    `postings`.`user_offtopic_count` < 10 AND 
    `postings`.`is_active` = 1
ORDER BY 
    `postings`.`post_date` desc
LIMIT 
    15

/* This should show RANGE access with few rows and keep the FILESORT */

SELECT 
    `postings`.`id`, 
    UNIX_TIMESTAMP(postings.post_date) as post_date, 
    `postings`.`link`, 
    `postings`.`title`, 
    `postings`.`author`, 
    `postings`.`excerpt`, 
    `postings`.`long_excerpt`, 
    `feeds`.`title` AS feed_title, 
    `feeds`.`website` AS feed_website
FROM 
    `postings` FORCE INDEX (ix_active_date)
JOIN 
    `feeds` 
ON 
    `feeds`.`id` = `postings`.`feed_id`
WHERE 
    `feeds`.`type` = 1 AND 
    `postings`.`user_offtopic_count` < 10 AND 
    `postings`.`is_active` = 1
ORDER BY 
    `postings`.`post_date` desc
LIMIT 
    15

/* This should show REF access with lots of rows and no FILESORT */
2020-05-17