赵晨:新浪微博高级数据库管理员,曾经在EA工作,目前在新浪微博主要负责性能优化和架构设计,目前正痴迷于MySQL及Kernel源代码的研究中,欢迎大家交流。
个人微博为@fiona514。
前段时间业务小伙伴压测MySQL的并发insert和update,发现update并发的线程开到100的时候,DB的性能就很差了。DBA同学配合观察数据库状态, 发现频繁有死锁信息出现。根据之前的经验,第一反应是单行update出现锁的几率很高,但是死锁的几率应该不高。那么为什么这个业务就产生了如此之多的死锁进而引发新能问题呢?
于是决定深入业务,结合业务模式与环境和死锁日志一起进行进一步分析
注:本文重点为描述死锁产生的过程
表结构:
表中数据:
localhost.test>select * from test_int2 limit 2;
+---------------+-----+--------------+---------------+------------------+----------------+---------------------+----------+
| uid | mid | message_type | relation_type | a_type | a_id | time | weightid |
+---------------+-----+--------------+---------------+------------------+----------------+---------------------+----------+
| 1000 | 15 | 0 | 0 | 0 | 15 | 2016-09-19 18:52:28 | 15 |
| 1474958915589 | 1 | 0 | 0 | 0 | 1 | 2016-09-27 14:48:35 | 1 |
+---------------+-----+--------------+---------------+------------------+----------------+---------------------+----------+
2 rows in set (0.00 sec)
根据自己innodb_extract工具解析,以上两条记录分别对应物理记录heap no 172和heap no 3
update语句:
update test.test_int2 set mid=x,time=now(),weightid=x,a_id=x where uid=1000 and m_type=0 and a_type=0
其中x理想值以1为步长的单调递增的。
explain:
localhost.test>explain select * from test.test_int2 where uid=1000 and m_type=0 and a_type=0;
+----+-------------+--------------+------+----------------------+-------+---------+-------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+----------------------+-------+---------+-------------+------+-------------+
| 1 | SIMPLE | test_int2 | ref | umraa,ur,umaa,umaaat | umraa | 9 | const,const | 1 | Using where |
+----+-------------+--------------+------+----------------------+-------+---------+-------------+------+-------------+
1 row in set (0.00 sec)
环境
OS version:CentOS release 6.4 (Final) MySQL version:5.5.12 事务隔离级别:Repeatable Read (RR)
首先我们先来看一下死锁日志:
localhost.(none)>show engine innodb status\G
*************************** 1. row ***************************
Type: InnoDB
Name:
Status:
=====================================
160922 9:42:31 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 9 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 2982 1_second, 2981 sleeps, 279 10_second, 5989 background, 5989 flush
srv_master_thread log flush and writes: 3207
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 8616970, signal count 2970405
Mutex spin waits 20443484, rounds 263315965, OS waits 7491778
RW-shared spins 728980, rounds 15323442, OS waits 295481
RW-excl spins 2533489, rounds 36138397, OS waits 511336
Spin rounds per wait: 12.88 mutex, 21.02 RW-shared, 14.26 RW-excl
------------------------
LATEST DETECTED DEADLOCK
------------------------
160919 18:52:28
*** (1) TRANSACTION:
TRANSACTION 599522E1E, ACTIVE 0 sec, process no 22134, OS thread id 1282165056 starting index read
MySQL tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 376, 1 row lock(s)
MySQL thread id 288331, query id 6903755 10.73.14.102 lymz_test Searching rows for update
update test.test_int2 set mid=15,time='2016-09-19 18:52:28',weightid=15,a_id=15 where uid=1000 and m_type=0 and a_type=0
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 486 page no 6 n bits 336 index `umraa` of table `test`.`test_int2` trx id 599522E1E lock_mode X waiting
Record lock, heap no 134 PHYSICAL RECORD: n_fields 10; compact format; info bits 32
0: len 8; hex 00000000000003e8; asc ;;
1: len 1; hex 80; asc ;;
2: len 1; hex 80; asc ;;0
3: len 1; hex 80; asc ;;
4: len 8; hex 0000000000000004; asc ;;
5: len 6; hex 000599522e09; asc R. ;;
6: len 7; hex 3a000740261068; asc : @& h;;
7: len 8; hex 0000000000000004; asc ;;
8: len 4; hex 57dfc36c; asc W l;;
9: len 8; hex 8000000000000004; asc ;;
*** (2) TRANSACTION:
TRANSACTION 599522E12, ACTIVE 0 sec, process no 22134, OS thread id 1467468096 updating or deleting, thread declared inside InnoDB 499
MySQL tables in use 1, locked 1
4 lock struct(s), heap size 1248, 11 row lock(s), undo log entries 1
MySQL thread id 288341, query id 6903743 10.73.14.102 lymz_test Updating
update test.test_int2 set mid=2,time='2016-09-19 18:52:28',weightid=2,a_id=2 where uid=1000 and message_type=0 and a_type=0
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 486 page no 6 n bits 336 index `umraa` of table `test`.`test_int2` trx id 599522E12 lock_mode X
Record lock, heap no 2 PHYSICAL RECORD: n_fields 10; compact format; info bits 32
0: len 8; hex 00000000000003e8; asc ;;
1: len 1; hex 80; asc ;;
2: len 1; hex 80; asc ;;
3: len 1; hex 80; asc ;;
4: len 8; hex 00000000000003e9; asc ;;
5: len 6; hex 000599522e07; asc R. ;;
6: len 7; hex 39000700352216; asc 9 5" ;;
7: len 8; hex 00000000000489fd; asc ;;
8: len 4; hex 57dfc027; asc W ';;
9: len 8; hex 80000000000489fd; asc ;;
Record lock, heap no 134 PHYSICAL RECORD: n_fields 10; compact format; info bits 32
0: len 8; hex 00000000000003e8; asc ;;
1: len 1; hex 80; asc ;;
2: len 1; hex 80; asc ;;
3: len 1; hex 80; asc ;;
4: len 8; hex 0000000000000004; asc ;;
5: len 6; hex 000599522e09; asc R. ;;
6: len 7; hex 3a000740261068; asc : @& h;;
7: len 8; hex 0000000000000004; asc ;;
8: len 4; hex 57dfc36c; asc W l;;
9: len 8; hex 8000000000000004; asc ;;
.......中间省略
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 486 page no 6 n bits 336 index `umraa` of table `test`.`test_int2` trx id 599522E12 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 134 PHYSICAL RECORD: n_fields 10; compact format; info bits 32
0: len 8; hex 00000000000003e8; asc ;;
1: len 1; hex 80; asc ;;
2: len 1; hex 80; asc ;;
3: len 1; hex 80; asc ;;
4: len 8; hex 0000000000000004; asc ;;
5: len 6; hex 000599522e09; asc R. ;;
6: len 7; hex 3a000740261068; asc : @& h;;
7: len 8; hex 0000000000000004; asc ;;
8: len 4; hex 57dfc36c; asc W l;;
9: len 8; hex 8000000000000004; asc ;;
*** WE ROLL BACK TRANSACTION (1)
根据以上innodb死锁信息,第一反应是事务2和事务1锁竞争集中在同一条umraa
索引记录上(space id 486 page no 6 n bits 336)。事务1等待事务2持有的该索引记录的排它锁,同时事务2自己也在也在等待这个锁,于是产生死锁。
以上为根据死锁日志的DBA直觉分析,下面为具体的复现过程,但在这之前,需要先准备一下基础知识。
以下的分析需要了解的:
常见锁模式:
锁模式和锁类型
通过以上宏来分析之后加锁的模式和类型
记录锁对象的结构
struct lock_rec_t {
ulint space;
ulint page_no;
ulint n_bits;
};
通过主键我们能找到对应的page no,记录在page里以heap形式存在,heap no标识物理的行,对物理行加锁,实际是通过lock_rec_bitmap_reset
根据记录对应的heap no将锁对象的bitmap位设置为1 n_bits为bitmap的bit位数,大小为:(1+(page_dir_get_n_heap(page) + 64)/8))*8,在锁对象结构体后面会分配对应的bitmap内存。
补充说明 为简单,之后分析会省略意向锁及事务的显示/隐式锁。主键上的隐式锁会通过函数lock_rec_impl_to_expl
转化为显示锁
由于死锁日志所展示的信息不足以进行分析,所以我们通过gdb工具剖析,设置断点函数,用来打印出每一次到达断点函数后的函数栈。以下为设置的断点函数
lock_rec_lock
,对行记录加锁的入口函数,根据事务和页来进行
lock_rec_add_to_queue
如果不存在和当前申请锁模式冲突的锁,则加入trx_lock_list中
lock_rec_enqueue_waiting
如果存在冲突的锁,则加入到lock_wait_list中
lock_rec_create
创建lock_t
锁对象
lock_rec_bitmap_reset
,根据记录的heap no(记录在page中以堆的形式存在),也就是物理行的顺序标识将page上与之对应的bitmap设置为1,通过此函数我们可以得到lock的信息
lock_grant
,可以看到事务rollback之后,哪个事务的wait_list中的锁被grant
以下进入死锁产生过程分析
2个事务开始进行。
时间点 | trx 1 | trx 2 |
---|---|---|
1 | BEGIN | BEGIN |
时间点 | trx 1 | trx 2 |
---|---|---|
1 | BEGIN | BEGIN |
2 | update test.test_int2 set mid=2,time='2016-09-19 18:52:28',weightid=2,a_id=2 where uid=1000 and message_type=0 and a_type=0; |
事务1(线程19) trx=0x7fff4c00dad8
,之后简称trx1
1.lock_rec_lock (impl=0, mode=3, block=0x7fff8cc63a30, heap_no=172, index=0x7ffff00095b8, thr=0x7fff4c007da8) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:2099
c
2.lock_rec_create (type_mode=3, block=0x7fff8cc63a30, heap_no=172, index=0x7ffff00095b8, trx=0x7fff4c00dad8) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:1677
c
3.lock_rec_bitmap_reset (lock=0x7fff4c00e070) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:1280
c
4.lock_rec_lock (impl=0, mode=515, block=0x7fff8cc63a30, heap_no=3, index=0x7ffff00095b8, thr=0x7fff4c007da8) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:2099
c
5.lock_rec_add_to_queue (type_mode=547, block=0x7fff8cc63a30, heap_no=3, index=0x7ffff00095b8, trx=0x7fff4c00dad8) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:1853
c
6. lock_rec_create (type_mode=547, block=0x7fff8cc63a30, heap_no=3, index=0x7ffff00095b8, trx=0x7fff4c00dad8) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:1677
c
7.lock_rec_lock (impl=0, mode=1027, block=0x7fff8cc63a30, heap_no=172, index=0x7ffff00095b8, thr=0x7fff4c007da8) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:2099
首先事务1进入Update的记录查找阶段(row_search_for_mysql)阶段,也是一致性读阶段,依次查找符合条件的记录
index scan,根据where条件,通过主键的前缀(uid,message_type)开始查找第一条符合条件的记录,对找到行加锁(heap no 172),锁类型为LOCK_X|LOCK_ORDINARY
(mode=3,在上层函数lock_clust_rec_read_check_and_lock中mode=LOCK_X, gap_mode=LOCK_ORDINARY=0),表示对记录和记录之前的间隙加排它锁
判断记录所在page上没有任何记录锁,走fast lock流程,直接lock_rec_create
创建锁对象,lock->type_mode = (type_mode & ~LOCK_TYPE_MASK) | LOCK_REC = 35 = (LOCK_X | LOCK_ORDINARY | LOCK_REC) 我们标记为lock1。 锁住heap no 172的记录本身,同时锁住记录和记录之前的GAP,mode为LOCK_X
根据生成锁对象lock1
将对应heap no的Bitmap设置为1
对该记录的下一条记录(即heap no为3的记录)尝试加gap锁(mode=515,LOCK_X|LOCK_GAP),锁住heap no为3的记录之前的GAP。
此时尝试走fast lock
流程失败,尝试走slow lock
流程,因为满足以下两个条件: a. 检查没有与此申请的锁冲突的锁 b. 同一page上有锁对象,但是heap no不是同一个 所以调用lock_rec_add_queue
将锁加入lock_list队列,并创建新的锁对象 这块注意传的参数type_mode从515变成547,为什么??? 源代码是这样的:
lock_rec_lock_slow(
/*===============*/
ibool impl, /*!< in: if TRUE, no lock is set
if no wait is necessary: we
assume that the caller will
set an implicit lock */
ulint mode, /*!< in: lock mode: LOCK_X or
LOCK_S possibly ORed to either
LOCK_GAP or LOCK_REC_NOT_GAP */
const buf_block_t* block, /*!< in: buffer block containing
the record */
ulint heap_no,/*!< in: heap number of record */
dict_index_t* index, /*!< in: index of record */
que_thr_t* thr) /*!< in: query thread */
{
...
略
...
/* Set the requested lock on the record */
lock_rec_add_to_queue(LOCK_REC | mode, block,
heap_no, index, trx);
return(DB_SUCCESS_LOCKED_REC);
}
return(DB_SUCCESS);
我们注意到加传给函数的同时参数type_modeLOCK_REC|LOCK_GAP
,同时在记录上加了lock.
生成锁对象lock2
lock->type_mode = (type_mode & ~LOCK_TYPE_MASK) | LOCK_REC = 547 = (LOCK_X | LOCK_GAP | LOCK_REC) 锁住heap no 3记录本身,及记录之前的GAP,mode为LOCK_X
此时锁对象结构如下:
table scan,buffer->tempfile 因为更新的列是主键的一部分,所以不能直接更新,将满足条件的记录保存在temp文件中。从tempfile中准备进行全表扫描取出记录进行更新前的准备与内容的填充。 此处从tempfile基于table scan取出一行记录(调用ha_innobase::rnd_pos),并尝试对找到的聚簇索引umraa
加锁LOCK_X|LOCK_REC_NOT_GAP = 1027,mode=LOCK_X, gap_mode=1024
。
此时事务1拥有两个锁:[lock1:trx1,lock2:trx1]
分别为锁住: a. lock1为对物理行heap_no 172,聚簇索引为(1000,0,0,0,15),对应逻辑记录(1000,15,0,0,0,15,2016-09-19 18:52:28,15)及记录之前的gap锁 b. lock2为下一条逻辑记录的物理行heap_no 3,聚簇索引为(1474958915589,0,0,0,1)逻辑记录(1474958915589,0,0,0,2016-09-27 14:48:35,1)及之前的gap ,目的为防止更新前有记录插入
lock_list :
heap_no=172 [lock1:trx1
]
heap_no=3 [lock2:trx1
]
trx_wait_list:
[]
事务2 (线程20) trx=0x7fff5000bba8
,之后简称trx2
其gdb的输出结果如下:
[Switching to thread 20 (Thread 0x7ffff448d700 (LWP 30474))]
lock_rec_lock (impl=0, mode=3, block=0x7fff8cc63a30, heap_no=172, index=0x7ffff00095b8, thr=0x7fff50007d48) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:2099
continue
lock_rec_create (type_mode=259, block=0x7fff8cc63a30, heap_no=172, index=0x7ffff00095b8, trx=0x7fff5000bba8) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:1677
continue
search阶段:
事务2开始进行index scan,根据where条件,通过主键的前缀(uid,message_type)开始查找第一条符合条件的记录,对找到的行(heap no 172)加锁,尝试加LOCK_X | LOCK_ORDINARY
,与事务1持有的lock1
冲突,所以只能设置waiting lock flag,因为LOCK_X=3,LOCK_WAIT=256,LOCK_REC=32,则 lock3.type_mode = (LOCK_X | LOCK_WAIT | LOCK_REC) = 291
如下:事务2生成锁对象lock3
加入锁等待队列,最终属性为如下:
此时事务2拥有一个锁lock3,并加入到锁等待列表trx_wait_list
lock_list :
heap_no=172 [lock1:trx1
,lock3:trx2
]
heap_no=3 [lock2:trx1
]
trx_wait_list:
heap_no=172 [lock3:trx2
->lock1:trx1
]
首先,事务1此时search阶段完成,开始进入update阶段,入口函数lock_update_insert
lock_rec_enqueue_waiting (type_mode=2563, block=0x7fff8cc63a30, heap_no=172, index=0x7ffff00095b8, thr=0x7fff4c008200) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:1759
c
lock_rec_create (type_mode=2819, block=0x7fff8cc63a30, heap_no=172, index=0x7ffff00095b8, trx=0x7fff4c00dad8) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:1677
解读如下:
关键函数路径 row_upd_step -> row_upd -> row_upd_clust_step -> row_upd_clust_rec_by_insert -> row_ins_index_entry -> row_ins_index_entry_low -> btr_cur_optimistic_insert -> btr_cur_ins_lock_and_undo -> lock_rec_insert_check_and_lock -> lock_rec_enqueue_waiting
定位待插入记录的下一条
进入流程btr_cur_optimistic_insert->btr_cur_ins_lock_and_undo->lock_rec_insert_check_and_lock
。此时事务1要插入的聚簇索引为(1000,0,0,0,2),将使聚簇索引的顺序发生了变化,重新定位待插入记录的位置,cur位置为伪记录infimum
,==下一条记录为该page上用户记录的第一条(heap_no=172)==,
检查锁冲突 调用函数lock_rec_other_has_conflicting
检查以下两点:
记录上(heap_no=172)是否已经有其他事务持有一个非GAP锁的请求与插入意向锁冲突 (满足)
以上条件满足==并且冲突的锁已经在wait_list等待队列中==。 (满足)
首先尝试更新聚簇索引
生成新的锁对象 我们看到下一条记录(heap no 172)已经有事务2的非GAP锁lock3
在等待队列中了(type_mode=291=LOCK_X|LOCK_WAIT
),所以此插入意向锁请求不能被grant,之后创建锁对象lock4
,模式和类型为type_mode | LOCK_WAIT=2563+256=2819,并加入到wait_list锁等待队列中.
type_mode = 2819 (LOCK_X | LOCK_GAP | LOCK_INSERT_INTENTION | LOCK_REC |LOCK_WAIT)
heap_no = 172
index = 0x7ffff00095b8
trx = 0x7fff4c00dad8 (trx2)
lock = 0x7fff4c008a20(lock4)
page_no = 7
space = 148
n_bits = 348
n_bytes = 44
page = 0x7fff8d418000 "\215j{5"
发生死锁,事务2被选为牺牲者回滚
lock_deadlock_occurs (lock=0x7fff4c008a20, trx=0x7fff4c00dad8) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:3291
c
lock_deadlock_recursive (start=0x7fff4c00dad8, trx=0x7fff4c00dad8, wait_lock=0x7fff4c008a20, cost=0x7fff59427518, depth=0)
c
lock_deadlock_recursive (start=0x7fff4c00dad8, trx=0x7fff5000bd68, wait_lock=0x7fff5000c3c0, cost=0x7fff59427518, depth=1)
c
lock_cancel_waiting_and_release (lock=0x7fff5000c3c0) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:4142
c
lock_rec_dequeue_from_page (in_lock=0x7fff5000c3c0) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:2275
c
lock_grant (lock=0x7fff4c008a20) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:2173
c
lock_reset_lock_and_trx_wait (lock=0x7fff4c008a20) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:793
c
lock_reset_lock_and_trx_wait (lock=0x7fff5000c3c0) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:793
c
lock_deadlock_recursive (start=0x7fff4c00dad8, trx=0x7fff4c00dad8, wait_lock=0x7fff4c008a20, cost=0x7fff59427518, depth=0) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:3364
c
lock_update_insert (block=0x7fff8cc63a30, rec=0x7fff8d41a5e5 "") at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:3143
c
lock_release_off_kernel (trx=0x7fff5000bd68) at /usr/local/mysql-5.5.12-debug/storage/innobase/lock/lock0lock.c:4078
至此死锁产生,我们梳理下锁的队列和锁等待的队列
lock_list :
heap_no=172 [lock1:trx1
,lock3:trx2
]
heap_no=3 [lock2:trx1
]
trx_wait_list :
heap no 172 [lock3:trx2
->lock1:trx1
,lock4:trx1
->lock3:trx2
]
结合之前的事务图最终直观的加锁图结果如下:
死锁的发生与加锁顺序关联密切。而加锁顺序又跟使用的索引密切相关。由于之前判断where根据聚簇索引前缀查询,同时聚簇索引中的列也要将被更新,此时可能会产生Halloween
问题。(感兴趣的可以自行google)
另外,update阶段也十分复杂,本案例中update阶段首先更新聚簇索引记录,之后依次的更新包含了更新字段的辅助索引记录。那如果where后面根据辅助索引查呢?update阶段首先要更新使用的辅助索引,对辅助索引加LOCK_X | LOCK_ORDINARY,防止search阶段有对记录的增删改操作,再更新对应更改的聚簇索引,对聚簇索引记录加LOCK_X|LOCK_REC锁,之后还要依次更新包含了更新字段的其他二级索引加LOCK_X |LOCK_REC锁。
最后,死锁与隔离级别有关,以上验证都是基于rr隔离级别下,同时没有设置innodb_locks_unsafe_for_binlog参数,验证在rc隔离级别下该死锁不会发生。
总结,引发本次死锁问题的根本原因就在于更新主键之后,新纪录的位置发生了变化,从而导致了循环等待,这种情况只发生在第一个事务更新考前位置,而第二个事务更新靠后位置的情况下。
综上可以看出,对于多列主键的update在高并发下会产生严重的死锁问题,从而对性能产生致命的影响,故建议如下:
主键设计尽量简单,如果可以,尽量避免多列主键的存在
如果存在多列主键,那么尽可能不要更新主键
如果以上做到较困难,选择rc隔离级别或者同时设置innodb_locks_unsafe_for_binlog会得到缓解