ONLY_FULL_GROUP_BY是MySQL提供的一個(gè)sql_mode,通過(guò)這個(gè)sql_mode來(lái)提供SQL語(yǔ)句“分組求最值”合法性的檢查,在MySQL的sql_mode為非ONLY_FULL_GROUP_BY語(yǔ)義時(shí)。一條select語(yǔ)句,MySQL允許target list中輸出的表達式是除聚集函數或group by column以外的表達式,但這個(gè)表達式的值可能在經(jīng)過(guò)group by操作后變成了未知的,例如:
| 1 2 3 4 5 6 7 | create table emp(id int primary key, ename varchar(20), sal decimal(10, 2), deptno int); insert into emp select 1001,'emp_1001',100.00,10; insert into emp select 1002,'emp_1002',200.00,10; insert into emp select 1003,'emp_1003',300.00,20; insert into emp select 1004,'emp_1004',400.00,20; insert into emp select 1005,'emp_1005',500.00,30; insert into emp select 1006,'emp_1006',600.00,30; |
其中,empno是員工編號,ename是員工姓名,sal是工資,deptno是員工所在部門(mén)號。
業(yè)務(wù)的需求是,求出每個(gè)部門(mén)中工資最高的員工的相關(guān)信息。
在MySQL 5.6中,可能會(huì )看見(jiàn)這種寫(xiě)法:
| 1 2 3 4 5 6 7 8 9 | mysql> select deptno,ename,max(sal) from emp group by deptno; +--------+----------+----------+ | deptno | ename | max(sal) | +--------+----------+----------+ | 10 | emp_1001 | 200.00 | | 20 | emp_1003 | 400.00 | | 30 | emp_1005 | 600.00 | +--------+----------+----------+ 3 rows in set (0.00 sec) |
實(shí)在不明白,這里的ename在業(yè)務(wù)層有何意義,畢竟,他并不是工資最高的那位員工。但仔細觀(guān)察會(huì )發(fā)現,ename值也算有規律,總是分組后的第一條記錄。由此,在5.6中,我們可以通過(guò)下面這個(gè)SQL來(lái)實(shí)現這個(gè)需求,算是一種投機方式。
| 1 2 3 4 5 | SELECT deptno,ename,sal FROM ( SELECT * FROM emp ORDER BY sal DESC ) t GROUP BY deptno; |
得到結果如下:
| 1 2 3 4 5 6 7 8 | +--------+----------+--------+ | deptno | ename | sal | +--------+----------+--------+ | 10 | emp_1002 | 200.00 | | 20 | emp_1004 | 400.00 | | 30 | emp_1006 | 600.00 | +--------+----------+--------+ 3 rows in set (0.00 sec) |
而對于語(yǔ)義限制都比較嚴謹的多家數據庫,如SQLServer、Oracle、PostgreSql都不支持select target list中出現語(yǔ)義不明確的列,這樣的語(yǔ)句在這些數據庫中是會(huì )被報錯的,這也是SQL92的標準。所以從MySQL 5.7版本開(kāi)始修正了這個(gè)語(yǔ)義,就是我們所說(shuō)的ONLY_FULL_GROUP_BY語(yǔ)義。
例如,查看MySQL 5.7默認的sql_mode如下:
| 1 2 | mysql> select @@global.sql_mode; ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION |
所以,對于同樣的SQL,我們在MySQL 5.7再跑一次就會(huì )報錯了,剛才通過(guò)的查詢(xún)語(yǔ)句被MySQL拒絕掉了!如下錯誤:
ERROR 1055 (42000): Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 't.ename' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
在ONLY_FULL_GROUP_BY模式下,因為target list中的ename沒(méi)有出現在聚集函數中,并且也沒(méi)有出現在group by list中,所以MySQL給拒絕了。這也是MySQL 5.6升級到MySQL 5.7需要注意的地方。如果我們去掉ONLY_FULL_GROUP_BY模式,如下操作:
| 1 | mysql> set session sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'; |
現在,我們在去掉ONLY_FULL_GROUP_BY語(yǔ)義的MySQL 5.7上把剛才的查詢(xún)再次執行,得到結果如下:
| 1 2 3 4 5 6 7 8 | +--------+----------+--------+ | deptno | ename | sal | +--------+----------+--------+ | 10 | emp_1001 | 100.00 | | 20 | emp_1003 | 300.00 | | 30 | emp_1005 | 500.00 | +--------+----------+--------+ 3 rows in set (0.00 sec) |
結果竟然跟5.6不一樣。
實(shí)際上,在MySQL5.7中,對該SQL進(jìn)行了改寫(xiě),改寫(xiě)后的SQL可通過(guò)show warnings查看。
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | mysql> explain SELECT deptno,ename,sal FROM ( SELECT * FROM emp ORDER BY sal DESC ) t GROUP BY deptno; +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+---------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+---------------------------------+ | 1 | SIMPLE | emp | NULL | ALL | NULL | NULL | NULL | NULL | 6 | 100.00 | Using temporary; Using filesort | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+---------------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> show warnings; +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | /* select#1 */ select `employees`.`emp`.`deptno` AS `deptno`,`employees`.`emp`.`ename` AS `ename`,`employees`.`emp`.`sal` AS `sal` from `employees`.`emp` group by `employees`.`emp`.`deptno` | +-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.01 sec) |
從改寫(xiě)后的SQL來(lái)看,其消除了子查詢(xún),導致結果未能實(shí)現預期效果。但可通過(guò)修改optimizer_switch來(lái)加以規避(setoptimizer_switch=”derived_merge=off”),derived_merge是MySQL 5.7引入的,其會(huì )試圖將derived table(派生表,from后面的子查詢(xún))、視圖引用、共用表表達式與外層表進(jìn)行合并。
Tips:所以很多從5.6升級到5.7時(shí),為了語(yǔ)法兼容,大部分都會(huì )選擇調整sql_mode,使其保持跟5.6一致,為了盡量兼容程序。但是這個(gè)問(wèn)題也是需要注意的,不要以為這種投機的寫(xiě)法不會(huì )有人用。在stackoverflow中,該實(shí)現的點(diǎn)贊數就有116個(gè),由此可見(jiàn)其受眾之廣。
所以ONLY_FULL_GROUP_BY的語(yǔ)義就是確定select target list中的所有列的值都是明確語(yǔ)義,簡(jiǎn)單的說(shuō)來(lái),在ONLY_FULL_GROUP_BY模式下,target list中的值要么是來(lái)自于聚集函數的結果,要么是來(lái)自于group by list中的表達式的值。但是由于表達式的表現形式非常豐富,對于MySQL來(lái)說(shuō),很難精確的確定一些表達式的輸出結果是明確的,比如:
| 1 2 3 | mysql> select deptno from emp group by deptno+1; ERROR 1055 (42000): Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'employees.emp.deptno' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by |
在上面的查詢(xún)語(yǔ)句中,其實(shí)count的值也是能被唯一確定的,但是由于程序無(wú)法分析出這種復雜的關(guān)系,所以這條查詢(xún)也被拒絕掉了。
因此,如果查詢(xún)語(yǔ)句中的target list,having condition或者order by list里引用了的表達式不是聚集函數,但是和group by list中的表達式嚴格匹配,該語(yǔ)句也是合法的(deptno+1和deptno+1是嚴格匹配的,deptno+1和deptno+2在MySQL認為是不嚴格匹配的, deptno+1和1+deptno也是不嚴格匹配的)。
如下嚴格匹配模式下,查詢(xún)是沒(méi)有問(wèn)題的。
| 1 2 3 4 5 6 7 8 9 | mysql> select deptno+1 from emp group by deptno+1; +----------+ | deptno+1 | +----------+ | 11 | | 21 | | 31 | +----------+ 3 rows in set (0.00 sec) |
這條語(yǔ)句target list中的deptno+1和group by中的deptno+1是嚴格匹配的,所以MySQL認為target list中的deptno+1是語(yǔ)義明確的,因此該語(yǔ)句可以通過(guò)。
再看下面這條語(yǔ)句:
| 1 2 3 4 5 6 7 8 9 | mysql> select deptno+1 as a from emp group by a order by deptno+1 desc; +------+ | a | +------+ | 31 | | 21 | | 11 | +------+ 3 rows in set (0.00 sec) |
MySQL允許target list中對于非聚集函數的alias column被group by、having condition以及order by語(yǔ)句引用,從上面兩條語(yǔ)句可以看出,group by和order by中引用了alias column,并且其等價(jià)于基礎列語(yǔ)義。
總結一下:MySQL對于ONLY_FULL_GROUP_BY語(yǔ)義的判斷規則是,如果group by list中的表達式是basic column,那么target list中允許出現表達式是group by list中basic column或者alias column的組合結果,如果group by list中的表達式是復雜表達式(非basic column或者alias column),那么要求target list中的表達式必須能夠嚴格和group by list中的表達式進(jìn)行匹配,否者這條查詢(xún)會(huì )被認為不合法。
其實(shí)分組求最值是一個(gè)很普遍的需求。下面具體來(lái)看看,MySQL中有哪些實(shí)現方式。
方法一:
| 1 2 3 4 5 6 7 8 9 10 | SELECT e.deptno, ename, sal FROM emp e, ( SELECT deptno, max( sal ) maxsal FROM emp GROUP BY deptno ) t WHERE e.deptno = t.deptno AND e.sal = t.maxsal; |
方法二:
| 1 2 3 4 5 6 7 8 9 10 | SELECT a.deptno, a.ename, a.sal FROM emp a LEFT JOIN emp b ON a.deptno = b.deptno AND a.sal < b.sal WHERE b.sal IS NULL; |
性能呢?
方法一執行計劃:
| 1 2 3 4 5 6 7 8 9 | mysql> desc SELECT e.deptno,ename,sal FROM emp e, ( SELECT deptno, max( sal ) maxsal FROM emp GROUP BY deptno ) t WHERE e.deptno = t.deptno AND e.sal = t.maxsal; +----+-------------+------------+------------+-------+----------------+----------------+---------+-------------------+------+----------+----------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+------------+-------+----------------+----------------+---------+-------------------+------+----------+----------------------------------------+ | 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | Using where | | 1 | PRIMARY | e | NULL | ref | idx_deptno_sal | idx_deptno_sal | 11 | t.deptno,t.maxsal | 1 | 100.00 | Using join buffer (Batched Key Access) | | 2 | DERIVED | emp | NULL | range | idx_deptno_sal | idx_deptno_sal | 5 | NULL | 4 | 100.00 | Using index for group-by | +----+-------------+------------+------------+-------+----------------+----------------+---------+-------------------+------+----------+----------------------------------------+ 3 rows in set, 1 warning (0.00 sec) |
在有復合索引(deptno, sal)的情況下,結果瞬間就能出來(lái)應該。根據執行計劃,先將group by的結果放到臨時(shí)表中,然后再將該臨時(shí)表作為驅動(dòng)表,來(lái)和emp表進(jìn)行關(guān)聯(lián)查詢(xún)。驅動(dòng)表?。ㄖ挥?條記錄),關(guān)聯(lián)列又有索引,無(wú)怪乎,結果能秒出。
方法二執行計劃:
| 1 2 3 4 5 6 7 8 | mysql> explain SELECT a.deptno,a.ename,a.sal FROM emp a LEFT JOIN emp b ON a.deptno = b.deptno AND a.sal < b.sal WHERE b.sal IS NULL; +----+-------------+-------+------------+------+----------------+----------------+---------+--------------------+------+----------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+----------------+----------------+---------+--------------------+------+----------+--------------------------+ | 1 | SIMPLE | a | NULL | ALL | NULL | NULL | NULL | NULL | 6 | 100.00 | NULL | | 1 | SIMPLE | b | NULL | ref | idx_deptno_sal | idx_deptno_sal | 5 | employees.a.deptno | 2 | 16.67 | Using where; Using index | +----+-------------+-------+------------+------+----------------+----------------+---------+--------------------+------+----------+--------------------------+ 2 rows in set, 1 warning (0.00 sec) |
兩表關(guān)聯(lián)查詢(xún)。其犯了SQL優(yōu)化中的兩個(gè)大忌。驅動(dòng)表對于優(yōu)化器來(lái)說(shuō),沒(méi)辦法自由選擇,只能使用a表,所以a表如果越大就越慢。被驅動(dòng)表雖然也有索引,但從執行計劃上看,其只使用了復合索引(deptno,sal)中的deptno,如果deptno選擇性太低,則越慢。
所以,對于分組求最值的需求,建議使用方法一,其不僅符合SQL規范,查詢(xún)性能上也是最好的。
<參考>
https://mp.weixin.qq.com/s/u9Twv24IKxfHVyj62B4VtQ
聯(lián)系客服