本文首發於 2020-05-05 21:55:15
一、前言
從接觸MySQL開始斷斷續續的看過一些文章,對count()操作眾説紛紜,其中分歧點主要在於count(1)和count(*)哪個效率高,有説count(1)比count(*)快的(這種説法更普遍),有説二者一樣快的。個人理解這兩種行為可能適用於的是不同的版本,我只關心較新的MySQL版本是什麼行為,詳見下文。
二、含義
首先,先説明一下常見count()操作及含義:
count(*):計算包括NULL值在內的行數,SQL92定義的標準統計行數的語法。
count(1):計算包括NULL值在內的行數,其中的1是恆真表達式。
count(列名):計算指定列的行數,但不包含NULL值。
三、具體區別
MySQL手冊中相關描述如下:
For transactional storage engines such as InnoDB, storing an exact row count is problematic. Multiple transactions may be occurring at the same time, each of which may affect the count.
InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently,
SELECT COUNT(*)statements only count rows visible to the current transaction.Prior to
MySQL 5.7.18, InnoDB processesSELECT COUNT(*)statements by scanning the clustered index. As ofMySQL 5.7.18, InnoDB processesSELECT COUNT(*)statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, the clustered index is scanned.Processing
SELECT COUNT(*)statements takes some time if index records are not entirely in the buffer pool. For a faster count, create a counter table and let your application update it according to the inserts and deletes it does. However, this method may not scale well in situations where thousands of concurrent transactions are initiating updates to the same counter table. If an approximate row count is sufficient, useSHOW TABLE STATUS.InnoDB handles
SELECT COUNT(*)andSELECT COUNT(1)operations in the same way. There is no performance difference.For
MyISAMtables,COUNT(*)is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause. For example:mysql> SELECT COUNT(*) FROM student;This optimization only applies to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly.COUNT(1) is only subject to the same optimization if the first column is defined as NOT NULL.
官方這段描述要點如下:
- InnoDB是事務引擎,支持MVCC,併發事務可能同時“看到”不同的行數,所以,InnoDB不保留表中的行數,
SELECT COUNT(*)語句只計算當前事務可見的行數。- 在MySQL 5.7.18之前,InnoDB通過掃描聚集索引處理
SELECT COUNT(*)語句。從MySQL 5.7.18開始,InnoDB通過遍歷最小的可用二級索引來處理SELECT COUNT(*)語句,除非索引或優化器明確指示使用不同的索引。如果不存在二級索引,則掃描聚集索引。這樣的設計單從 IO 的角度就節省了很多開銷。- InnoDB以同樣的方式處理
SELECT COUNT(*)和SELECT COUNT(1)操作,沒有性能差異。 因此,建議使用符合SQL標準的count(*)。- 對於
MyISAM表,由於MyISAM引擎存儲了精確的行數,因此,如果SELECT COUNT(*)語句不包含WHERE子句,則會很快返回。這個很好理解,如果帶了where條件,就需要掃表了。- 如果索引記錄不完全在緩衝池中,則處理
SELECT(*)語句需要一些時間。為了更快的計數,您可以創建一個計數器表,並讓您的應用程序按插入和刪除操作更新它。然而,這種方法在同一計數器表中啓動成千上萬個併發事務的情況下,可能無法很好地擴展。如果一個近似的行數足夠,可以使用SHOW TABLE STATUS查詢行數。
到這裏我們明白了 count(*) 和 count(1) 本質上面其實是一樣的,那麼 count(column) 又是怎麼回事呢?
count(column)也是會遍歷整張表,但是不同的是它會拿到 column 的值以後判斷是否為空,然後再進行累加,那麼如果針對主鍵需要解析內容,如果是二級索引需要再次根據主鍵獲取內容,則要多一次 IO 操作,所以count(column)的性能肯定不如前兩者,如果按照效率比較的話:count(*)=count(1)>count(primary key)>count(非主鍵column)。
四、建議
基於以上描述,如果要查詢innodb存儲引擎的表的總行數,有如下建議:
-
若僅僅是想獲取大概的行數,建議使用
show table status或查詢information_schema.tables:mysql> use db6; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> show tables; +---------------+ | Tables_in_db6 | +---------------+ | t1 | -
row in set (0.01 sec)
mysql> select count(*) from t1; count(*) 2 -
row in set (0.00 sec)
mysql> show table status\G
1. row **Name: t1 Engine: InnoDB Version: 10Row_format: Dynamic
Rows: 2Avg_row_length: 8192
Data_length: 16384
Max_data_length: 0
Index_length: 0
Data_free: 0
Auto_increment: NULL
Create_time: 2020-04-21 12:00:44
Update_time: NULL
Check_time: NULL
Collation: utf8mb4_general_ciChecksum: NULLCreate_options:
Comment: -
row in set (0.00 sec)
mysql> select * from information_schema.tables where table_name = 't1'\G
1. row **
TABLE_CATALOG: def
TABLE_SCHEMA: db6
TABLE_NAME: t1
TABLE_TYPE: BASE TABLEENGINE: InnoDB VERSION: 10ROW_FORMAT: Dynamic
TABLE_ROWS: 2
AVG_ROW_LENGTH: 8192
DATA_LENGTH: 16384
MAX_DATA_LENGTH: 0
INDEX_LENGTH: 0
DATA_FREE: 0
AUTO_INCREMENT: NULL
CREATE_TIME: 2020-04-21 12:00:44
UPDATE_TIME: NULL
CHECK_TIME: NULL
TABLE_COLLATION: utf8mb4_general_ciCHECKSUM: NULLCREATE_OPTIONS:
TABLE_COMMENT: -
row in set (0.00 sec)
-
反之,如果必須要獲取準確的總行數,建議:
1) 創建一個計數器表,並讓您的應用程序按插入和刪除操作更新它。
2) 若業務插入和刪除相對較少,也可以考慮緩存到 redis。
篇幅有限,深入驗證、源碼分析將在下一篇文章中介紹。
歡迎關注我的微信公眾號【數據庫內核】:分享主流開源數據庫和存儲引擎相關技術。
| 標題 | 網址 |
|---|---|
| GitHub | https://dbkernel.github.io |
| 知乎 | https://www.zhihu.com/people/... |
| 思否(SegmentFault) | https://segmentfault.com/u/db... |
| 掘金 | https://juejin.im/user/5e9d3e... |
| CSDN | https://blog.csdn.net/dbkernel |
| 博客園(cnblogs) | https://www.cnblogs.com/dbkernel |