900字范文,内容丰富有趣,生活中的好帮手!
900字范文 > mysql 亿级_mysql 亿级数据量 ( sum group by )的优化

mysql 亿级_mysql 亿级数据量 ( sum group by )的优化

时间:2022-07-04 16:49:01

相关推荐

mysql 亿级_mysql 亿级数据量 ( sum  group by )的优化

今天开发提出需求,让统计数据,一询问才得知表中的数据量已达亿级以上。具体的sql如下:

SELECT id_province_code,gender,age,COUNT(1),SUM(zy_days),SUM(zf),SUM(ybnje)FROM medicare WHERE zy_enter_date BETWEEN '-01-01 00:00:00' AND '-12-31 12:59:59' GROUP BY id_province_code,age,gender;

然后查看该sql的执行计划

mysql> explain SELECT id_province_code,gender,age,COUNT(1),SUM(zy_days),SUM(zf),SUM(ybnje) FROM medicare WHERE zy_enter_date BETWEEN '-01-01 00:00:00' AND '-12-31 12:59:59' GROUP BYid_province_code,age,gender;+----+-------------+--------------+------------+-------+-------------------------------------------+-------------------+---------+------+---------+----------+--------------------------------------------------------+

| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |

+----+-------------+--------------+------------+-------+-------------------------------------------+-------------------+---------+------+---------+----------+--------------------------------------------------------+

| 1 | SIMPLE | medicare | NULL | range | idx_zy_enter_date,idx_province_age_gender | idx_zy_enter_date | 4 | NULL | 4836248 | 100.00 | Using index condition; Using temporary; Using filesort |

执行计划中“Extra”中竟然出现了“Using temporary; Using filesort ”,看到这种情况我们就得进行优化了,虽然“type”列出现了“range”。出现这种情况是因为sql语句中使用了“group by”或者是“order by ”,然后进行了文件排序。

接着,我们就需要给group by后面的字段建立索引了,mysql索引原则是最左匹配前缀原则,我们给“id_province_code,age,gender”三字段添加一个复合索引(按照最左匹配原则):

alter table medicare add index idx_ipc_age_gener(id_province_code,age,gender);

但一查看表结构

KEY `idx_province_age_gender` (`id_province_code`,`gender`,`age`)

早期已经建立好了,但是,怎么还会出现“Using temporary; Using filesort”,查看官档发现,group by 默认是要排序的,所以即使我们添加了索引,还是会引起文件排序。这样,我们的解决方案是:强制关闭排序:order by null

最后我们根据官方文档的建议,进行了sql的最终优化:

mysql> mysql> explain SELECT id_province_code,gender,age,COUNT(1),SUM(zy_days),SUMM(ybnje) FROM medicare WHERE zy_enter_date BETWEEN '-01-01 00:00:00' AND '-12-31 12:59:59' GROUP BY id_province_code,age,gender order by null;+----+-------------+--------------+------------+-------+-------------------------------------------+-------------------+---------+------+---------+----------+----------------------------------------+

| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |

+----+-------------+--------------+------------+-------+-------------------------------------------+-------------------+---------+------+---------+----------+----------------------------------------+

| 1 | SIMPLE | medicare | NULL | range | idx_zy_enter_date,idx_province_age_gender | idx_zy_enter_date | 4 | NULL | 4836248 | 100.00 | Using index condition; Using temporary |

+----+-------------+--------------+------------+-------+-------------------------------------------+-------------------+---------+------+---------+----------+----------------------------------------+

查看执行计划,发现没有文件排序了,但是还是有“using tempory”,别着急,这是不重要的,只要执行sql的性能提升了就可以了

最终,运行最终优化后的sql发现比没有强制关闭排序的sql,整整快了将近4个小时(表中数据将近3亿,没关闭排序前的sql我运行了将近4个小时,还在跑,但优化后只跑了几十秒)看来优化的综合评估是很重要的。

本内容不代表本网观点和政治立场,如有侵犯你的权益请联系我们处理。
网友评论
网友评论仅供其表达个人看法,并不表明网站立场。