Using Generated Columns in MySQL 5.7 to Increase Query Performance

Translated by the Zhishutang (知数堂) "星耀" team: 星耀队-芬达, 星耀队-顺子, 星耀队-M哥
Source: https://www.percona.com/blog/2018/01/29/using-generated-columns-in-mysql-5-7-to-increase-query-performance/
Original author: Alexander Rubin

In this blog post, we'll look at ways you can use MySQL 5.7 generated columns (or virtual columns) to improve query performance.

About

About two years ago I published a blog post about Generated (Virtual) Columns in MySQL 5.7. Since then, it's been one of my favorite features in the MySQL 5.7 release. The reason is simple: with the help of virtual columns, we can create fine-grained indexes that can significantly increase query performance. I'm going to show you some tricks that can potentially fix slow reporting queries with GROUP BY and ORDER BY.

The problem

Recently I was working with a customer who was struggling with this query:

SELECT CONCAT(verb, ' - ', REPLACE(url, '.xml', '')) AS 'API Call',
       COUNT(*) AS 'No. of API Calls',
       AVG(ExecutionTime) AS 'Avg. Execution Time',
       COUNT(DISTINCT AccountId) AS 'No. Of Accounts',
       COUNT(DISTINCT ParentAccountId) AS 'No. Of Parents'
FROM ApiLog
WHERE ts BETWEEN '2017-10-01 00:00:00' AND '2017-12-31 23:59:59'
GROUP BY CONCAT(verb, ' - ', REPLACE(url, '.xml', ''))
HAVING COUNT(*) >= 1;

The query was running for more than an hour and used up all the space in the tmp directory (with sort files).

The table looked like this:

CREATE TABLE ApiLog (
  Id int(11) NOT NULL AUTO_INCREMENT,
  ts timestamp DEFAULT CURRENT_TIMESTAMP,
  ServerName varchar(50) NOT NULL DEFAULT '',
  ServerIP varchar(50) NOT NULL DEFAULT '',
  ClientIP varchar(50) NOT NULL DEFAULT '',
  ExecutionTime int(11) NOT NULL DEFAULT 0,
  URL varchar(3000) COLLATE utf8mb4_unicode_ci NOT NULL,
  Verb varchar(16) NOT NULL,
  AccountId int(11) NOT NULL,
  ParentAccountId int(11) NOT NULL,
  QueryString varchar(3000) NOT NULL,
  Request text NOT NULL,
  RequestHeaders varchar(2000) NOT NULL,
  Response text NOT NULL,
  ResponseHeaders varchar(2000) NOT NULL,
  ResponseCode varchar(4000) NOT NULL,
  ... // other fields removed for simplicity
  PRIMARY KEY (Id),
  KEY index_timestamp (ts),
  ... // other indexes removed for simplicity
) ENGINE=InnoDB;

We found out the query was not using an index on the timestamp field ("ts"):

mysql> EXPLAIN SELECT CONCAT(verb, ' - ', REPLACE(url, '.xml', '')) AS 'API Call',
    COUNT(*) AS 'No. of API Calls', AVG(ExecutionTime) AS 'Avg. Execution Time',
    COUNT(DISTINCT AccountId) AS 'No. Of Accounts',
    COUNT(DISTINCT ParentAccountId) AS 'No. Of Parents'
    FROM ApiLog
    WHERE ts BETWEEN '2017-10-01 00:00:00' AND '2017-12-31 23:59:59'
    GROUP BY CONCAT(verb, ' - ', REPLACE(url, '.xml', ''))
    HAVING COUNT(*) >= 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ApiLog
   partitions: NULL
         type: ALL
possible_keys: ts
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 22255292
     filtered: 50.00
        Extra: Using where; Using filesort
1 row in set, 1 warning (0.00 sec)

The reason for that is simple: the number of rows matching the filter condition was too large for an index scan to be efficient (or at least the optimizer thinks so):

mysql> SELECT COUNT(*) FROM ApiLog
    WHERE ts BETWEEN '2017-10-01 00:00:00' AND '2017-12-31 23:59:59';
+----------+
| count(*) |
+----------+
|  7948800 |
+----------+
1 row in set (2.68 sec)

Total number of rows: 21998514. The query needs to scan 36% of the total rows (7948800 / 21998514). (Translator's note: once the estimated number of matching rows exceeds roughly 20%-30% of the table, the optimizer usually falls back to a full table scan even when an index exists.)

In this case, we have a number of approaches:

1. Create a combined index on the timestamp column + the GROUP BY fields
2. Create a covered index (including all fields that are selected)
3. Create an index on just the GROUP BY fields
4. Create an index for a loose index scan

However, if we look closer at the "GROUP BY" part of the query, we quickly realize that none of those solutions will work. Here is our GROUP BY part:

GROUP BY CONCAT(verb, ' - ', REPLACE(url, '.xml', ''))

There are two problems here:

1. It groups on a calculated field, so MySQL can't just scan an index on verb + url. It needs to first concatenate the two fields and then group on the resulting string. That means the index won't be used.
2. The URL is declared as "varchar(3000) COLLATE utf8mb4_unicode_ci NOT NULL" and can't be indexed in full (even with the innodb_large_prefix=1 option, which is the default here, as we have utf8mb4 enabled). We can only create a partial (prefix) index, which won't be helpful for GROUP BY optimization.

Here, I'm trying to add a full index on the URL with innodb_large_prefix=1:

mysql> ALTER TABLE ApiLog ADD KEY verb_url (verb, url);
ERROR 1071 (42000): Specified key was too long; max key length is 3072 bytes

Well, changing the "GROUP BY CONCAT(verb, ' - ', REPLACE(url, '.xml', ''))" to "GROUP BY verb, url" could help (assuming that we somehow trim the field definition from varchar(3000) to something smaller, which may or may not be possible). However, it would change the results, as it would not remove the .xml extension from the URL field.
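As an aside (this example is mine, not from the original post), MySQL would happily accept a prefix index here, but it still could not serve the GROUP BY on the computed expression:

-- Hypothetical prefix index; the name and the length 255 are arbitrary.
-- MySQL accepts it (255 chars * 4 bytes + 64 bytes for verb < 3072 bytes),
-- but it only covers the first 255 characters of url and cannot be used
-- to group on CONCAT(verb, ' - ', REPLACE(url, '.xml', '')).
ALTER TABLE ApiLog ADD KEY verb_url_prefix (verb, url(255));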
The solution

The good news is that in MySQL 5.7 we have virtual columns, so we can create a virtual column on top of "CONCAT(verb, ' - ', REPLACE(url, '.xml', ''))". The best part: we do not have to GROUP BY the full string (potentially 3000+ bytes). We can use an MD5 hash (or a longer hash, e.g., SHA1/SHA2) for the purposes of the GROUP BY.

Here is the solution:

ALTER TABLE ApiLog ADD verb_url_hash varbinary(16)
  GENERATED ALWAYS AS (UNHEX(MD5(CONCAT(verb, ' - ', REPLACE(url, '.xml', ''))))) VIRTUAL;
ALTER TABLE ApiLog ADD KEY (verb_url_hash);

So what we did here is:

1. Declared a virtual column of type varbinary(16)
2. Created the virtual column on CONCAT(verb, ' - ', REPLACE(url, '.xml', '')), applying an MD5 hash on top, plus UNHEX to convert the 32 hex characters to 16 binary bytes
3. Created an index on top of the virtual column

Now we can change the query to GROUP BY the verb_url_hash column:

mysql> EXPLAIN SELECT CONCAT(verb, ' - ', REPLACE(url, '.xml', '')) AS 'API Call',
    COUNT(*) AS 'No. of API Calls', AVG(ExecutionTime) AS 'Avg. Execution Time',
    COUNT(DISTINCT AccountId) AS 'No. Of Accounts',
    COUNT(DISTINCT ParentAccountId) AS 'No. Of Parents'
    FROM ApiLog
    WHERE ts BETWEEN '2017-10-01 00:00:00' AND '2017-12-31 23:59:59'
    GROUP BY verb_url_hash
    HAVING COUNT(*) >= 1;
ERROR 1055 (42000): Expression #1 of SELECT list is not in
GROUP BY clause and contains nonaggregated column 'ApiLog.ApiLog.Verb'
which is not functionally dependent on columns in GROUP BY clause;
this is incompatible with sql_mode=only_full_group_by

MySQL 5.7 has strict mode enabled by default, which we can change for this query only. Now the explain plan looks much better:

mysql> SELECT @@sql_mode;
+-------------------------------------------------------------------------------------------------------------------------------------------+
| @@sql_mode                                                                                                                                 |
+-------------------------------------------------------------------------------------------------------------------------------------------+
| ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION |
+-------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SET sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';
Query OK, 0 rows affected (0.00 sec)

mysql> EXPLAIN SELECT CONCAT(verb, ' - ', REPLACE(url, '.xml', '')) AS 'API Call',
    COUNT(*) AS 'No. of API Calls', AVG(ExecutionTime) AS 'Avg. Execution Time',
    COUNT(DISTINCT AccountId) AS 'No. Of Accounts',
    COUNT(DISTINCT ParentAccountId) AS 'No. Of Parents'
    FROM ApiLog
    WHERE ts BETWEEN '2017-10-01 00:00:00' AND '2017-12-31 23:59:59'
    GROUP BY verb_url_hash
    HAVING COUNT(*) >= 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ApiLog
   partitions: NULL
         type: index
possible_keys: ts,verb_url_hash
          key: verb_url_hash
      key_len: 19
          ref: NULL
         rows: 22008891
     filtered: 50.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

MySQL will avoid any sorting, which is much faster. It will still have to eventually scan the whole table in the order of the index, but the response time is significantly better: about 38 seconds as opposed to more than an hour.
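One sanity check worth running at this point (this query is mine, not from the original post): since we group on a 16-byte MD5 hash instead of the full string, hash collisions would silently merge distinct API calls into one group. The two counts below should match:

-- Hypothetical collision check: each distinct verb/url expression
-- should map to exactly one distinct hash value.
SELECT COUNT(DISTINCT verb_url_hash) AS distinct_hashes,
       COUNT(DISTINCT CONCAT(verb, ' - ', REPLACE(url, '.xml', ''))) AS distinct_strings
FROM ApiLog;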
Covered index

Now we can attempt to build a covered index, which will be quite large:

mysql> ALTER TABLE ApiLog ADD KEY covered_index
    (verb_url_hash, ts, ExecutionTime, AccountId, ParentAccountId, verb, url);
Query OK, 0 rows affected (1 min 29.71 sec)
Records: 0  Duplicates: 0  Warnings: 0

We had to add "verb" and "url", so beforehand I had to remove the COLLATE utf8mb4_unicode_ci from the table definition (with utf8mb4, the url column alone, at up to 4 bytes per character, would again exceed the 3072-byte key limit). Now explain shows that we're using the index:

mysql> EXPLAIN SELECT CONCAT(verb, ' - ', REPLACE(url, '.xml', '')) AS 'API Call',
    COUNT(*) AS 'No. of API Calls', AVG(ExecutionTime) AS 'Avg. Execution Time',
    COUNT(DISTINCT AccountId) AS 'No. Of Accounts',
    COUNT(DISTINCT ParentAccountId) AS 'No. Of Parents'
    FROM ApiLog
    WHERE ts BETWEEN '2017-10-01 00:00:00' AND '2017-12-31 23:59:59'
    GROUP BY verb_url_hash
    HAVING COUNT(*) >= 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ApiLog
   partitions: NULL
         type: index
possible_keys: ts,verb_url_hash,covered_index
          key: covered_index
      key_len: 3057
          ref: NULL
         rows: 22382136
     filtered: 50.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

The response time dropped to ~12 seconds! However, the index is significantly larger compared to just verb_url_hash (16 bytes per record).

Conclusion

MySQL 5.7 generated columns provide a valuable way to improve query performance. If you have an interesting case, please share it in the comments.