14 Concurrency Control¶

文本统计：约 4111 个字 • 23 行代码

Info

本文参考并发控制 - HobbitQia的笔记本

14.1 Lock-Based Protocols¶

A lock is a mechanism to control concurrent access to a data item

exclusive(X) Data item can be both read as well as written. X-lock is requested using lock-X instruction.
shared(X) Data item can only be read. S-lock is requested using lock-S instruction.

要写一个数据，先申请获得 X 锁；要读一个数据，先申请获得 S 锁。访问结束后释放这个锁。

访问数据之前必须获得对应的锁，否则需要等待。

14.1.1 The Two-Phase Locking Protocol¶

事务的加锁和减锁分为两个阶段。

Phase 1: Growing Phase (增长阶段) - transaction may obtain locks - transaction may not release locks

Phase 2: Shrinking Phase(缩减阶段) - transaction may release locks - transaction may not obtain locks 一个事务一旦开始释放锁，就不能再加锁了。

事务两个阶段的分界线(lock point), 即获得了最后一个锁（完成获得这个动作）的时间点。这样每个事务都有一个 lock point, 按照这个时间排序即可得到串行化的执行顺序。

Two-Phase Locking Protocol assures serializability.

It can be proved that the transactions can be serialized in the order of their lock points (i.e. the point where a transaction acquired its final lock).

基本的两阶段封锁协议无法保证事务的可恢复性。

Extensions to basic two-phase locking（基本两阶段封锁） needed to ensure recoverability of freedom from cascading roll-back

Strict two-phase locking（严格两阶段封锁）: a transaction must hold all its exclusive locks till it commits/aborts. Ensures recoverability and avoids cascading roll-backs. S 锁可以用完就放，但 X 锁必须到提交的时候才能释放（这样别人就不能访问了，无法读脏数据）。代价是降低并发度。
Rigorous two-phase locking（强两阶段封锁）: a transaction must hold all locks till commit/abort. Transactions can be serialized in the order in which they commit.

Two-phase locking is not a necessary condition for serializability. 两阶段封锁协议是冲突可串行化的充分条件

Example

14.1.2 2PL - Proof¶

Proof by Contradiction¶

断言：如果有 \(T_i\rightarrow T_j\) 的有向边，那 \(T_i\) 的 lockpoint 一定小于 \(T_j\).

证明：如果有 \(T_i\rightarrow T_j\) 的有向边，那么\(T_i\rightarrow T_j\) 肯定有一个冲突的访问（对同一个数据）那 \(T_j\) 在获得锁的时候 \(T_i\) 已经放锁了，得证。

所以如果不满足事务可串行化，那么前驱图必然存在环，从而 lock point 的先后顺序就会出现问题

Proof by Induction¶

只需证明: Lock point 最小的事务，可以无障碍地交换到调度最前。剩下的就可以根据递归法，说明该调度可以根据 lock point 的顺序串行化。

证明：设事务 A 是现在 lock point 最早的那一个，假如有事务B拦住他了，那么说明事务 B 中有一个操作 \(OP_j\) 与事务 A 中的操作 \(OP_i\) 对同一个数据进行操作，并且 \(OP_j\) 比 \(OP_i\) 先。那么就可以说明事务 B 的 lock point 比事务 A 的 lock point 要早

14.1.3 Lock Conversions¶

通常是先读后修改。但我们不能先得 S 锁再释放后得 X 锁（违背了两阶段协议），也不能直接用 X 锁（降低并发度）。

Two-phase locking with lock conversions:

First Phase:
- can acquire a lock-S or lock-X on a data item
- can convert a lock-S to a lock-X (lock-upgrade)
Second Phase:
- can release a lock-S or lock-X
- can convert a lock-X to a lock-S (lock-downgrade)

This protocol assures serializability.

A transaction \(T_i\) issues the standard read/write instruction, without explicit locking calls.

比如说 read(D) 操作申请锁的步骤如下

 if Ti has a lock on D
     then
        read(D) 
     else begin 
        if necessary wait until no other  
            transaction has a lock-X on D
        grant Ti a  lock-S on D;
        read(D)
     end

write(D) 操作申请锁的步骤如下

 if Ti has a  lock-X on D 
        then 
          write(D)
       else begin
            if necessary wait until no other trans. has any lock on D,
            if Ti has a lock-S on D
                 then
                    upgrade lock on D to lock-X
            else
                 grant Ti a lock-X on D
                 write(D)
         end;

All locks are released after commit or abort

14.2 Implementation of Locking¶

A lock manager can be implemented as a separate process to which transactions send lock and unlock requests.

14.2.1 Lock Table¶

Lock table records granted locks and waiting requests.

每个记录的 id 可以放进哈希表。如这里记录 123, T1、T8 获得了 S 锁，但 T2 在等待获得 X 锁。

T1: lock-X(D) 通过 D 的 id 找到哈希表上的项，在对应项上增加。根据是否相容决定是获得锁还是等待。 unlock 类似，先找到对应的数据，拿掉对应的项。同时看后续的项是否可以获得锁。

如果一个事务 commit, 需要放掉所有的锁，我们需要去找。因此我们还需要一个事务的表，标明每个事务所用的锁。

14.2.2 Deadlock Handling¶

System is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another transaction in the set.

Two-phase locking does not ensure freedom from deadlocks. 两阶段封锁协议并不能避免死锁问题

Example

Deadlock prevention protocols ensure that the system will never enter into a deadlock state. Some prevention strategies:

(1) Require that each transaction locks all its data items before it begins execution (predeclaration). 执行前一次性获得所有锁。

(2) Impose partial ordering of all data items and require that a transaction can lock data items only in the order specified by the partial order (graph-based protocol).

对数据访问规定一种次序。e.g. T1: A-50, B+50. T2: B-10, A+10. 我们可以把第二个事务调换顺序，变为 A+10, B-10. 这样按照 partial order 能降低死锁概率。

(3) Timeout-Based Schemes:

a transaction waits for a lock only for a specified amount of time. After that, the wait times out and the transaction is rolled back. 等待一会，如果还是等不到就放弃。
thus deadlocks are not possible.
simple to implement; but starvation is possible. Also difficult to determine good value of the timeout interval. 时长不好规定。但可能有事务老是申请不到自己的锁。

(4) Wait-Die 方案（非抢占式）

较老的事务可以等待较新的事务释放数据项，而较新的事务如果需要等待较老的事务，则会被回滚。

如果一个较老的事务 \(T_{old}\) 需要访问一个被较新的事务 \(T_{young}\) 持有的数据项，\(T_{old}\) 可以选择等待 \(T_{young}\) 完成并释放该数据项。
相反，如果一个较新的事务 \(T_{young}\) 需要访问一个被较老的事务 \(T_{old}\) 持有的数据项，那么 \(T_{young}\) 将被立即回滚（“die”），而不是等待 \(T_{old}\) 释放资源。

潜在问题：在这种机制下，一个事务可能需要多次尝试才能成功获取所需的数据项，因为它可能会被反复回滚。

(5) Wound-Wait 方案（抢占式）

较老的事务可以直接“伤害”（强制回滚）较新的事务，而不是等待它释放资源；而较新的事务则可以等待较老的事务。

如果一个较老的事务 \(T_{old}\) 需要访问一个被较新的事务 \(T_{young}\) 持有的数据项， \(T_{old}\) 可以直接“伤害” \(T_{young}\) ，即强制 \(T_{young}\) 回滚，并重新开始执行。
如果一个较新的事务 \(T_{young}\) 需要访问一个被较老的事务 \(T_{old}\) 持有的数据项，\(T_{young}\) 则可以选择等待 \(T_{old}\) 释放资源。

与 wait-die 方案相比，这种方法可能减少回滚的次数，因为较老的事务可以直接抢占资源。

14.2.3 Deadlock Detection¶

定期检查数据库内是否有死锁，如果有就选择一个事务将其回滚。

wait-for graph¶

这里的箭头表示在等待锁。如 T17->T18 表示 T17 在等待 T18 的锁。如果形成了环，就说明出现了死锁。

通过刚刚的 Lock Table, 我们可以得到等待关系。（后面的 waited 等待前面的 granted）

Warning

注意其与 conflict serializable 检测前驱图的区别

When deadlock is detected :

(1) Some transaction will have to rolled back (made a victim) to break deadlock. Select that transaction as victim that will incur minimum cost.

(2) Rollback -- determine how far to roll back transaction

Total rollback: Abort the transaction and then restart it.
More effective to roll back transaction only as far as necessary to break deadlock. Starvation happens if same transaction is always chosen as victim. Include the number of rollbacks in the cost factor to avoid starvation

14.2.4 Graph-Based Protocols¶

假设我们知道数据是按偏序访问的，可以有更高级的协议。如果存在 \(d_i→d_j\) 的关系，则任何同时访问 \(d_i\) 和 \(d_j\) 的事务必须先访问 \(d_i\) 再访问 \(d_j\)。

The tree-protocol is a simple kind of graph protocol.

Only exclusive locks are allowed. 只有这种锁。
The first lock by \(T_i\) may be on any data item. Subsequently, a data Q can be locked by \(T_i\) only if the parent of Q is currently locked by \(T_i\). 第一个锁可以放任意地方，后面的锁只能在父节点锁住时才能往下锁。
Data items may be unlocked at any time.
A data item that has been locked and unlocked by \(T_i\) cannot subsequently be relocked by \(T_i\) 放了之后不能再加锁了。

Example

比如这里我们先锁 D, 随后锁 G, 放掉, 锁 H, 这时 D 已经没用了可以放掉。随后我们锁 J, H 也就没用了也可以放掉。最后放掉 J.

The tree protocol ensures conflict serializability as well as freedom from deadlock.

Advantages

Unlocking may occur earlier in the tree-locking protocol than in the two-phase locking protocol. shorter waiting times, and increase in concurrency 锁可以更早释放，不用等待第二阶段。用完就可以放，提高了并发度。
protocol is deadlock-free, no rollbacks are required

Disadvantages

Protocol does not guarantee recoverability or cascade freedom Need to introduce commit dependencies to ensure recoverability 早放锁，意味着可能会读脏数据，不可恢复。这就对 commit 顺序有要求。
Transactions may have to lock more data items than needed.
increased locking overhead, and additional waiting time 比如刚刚的图中，我们访问 G, J, 需要从 D 开始访问。会锁上更多数据。
potential decrease in concurrency

14.3 Multiple Granularity¶

可以锁在记录上(如 update table set ...;)，也可以锁在整个表上(如 select * from table;)。

Granularity of locking (level in tree where locking is done):

fine granularity（细粒度） (lower in tree): high concurrency, high locking overhead
coarse granularity（粗粒度） (higher in tree): low locking overhead, low concurrency

The levels, starting from the coarsest (top) level are

database
area
File(table)
record

14.3.1 Intention Lock Modes¶

记录和表上都可以加 S/X 锁。但是当事务涉及到多个粒度，如何判断是否冲突，如一个表的 S 锁和一个记录的 X 锁是冲突的。

我们引入了其他锁，意向锁(IS, IX, SIX)

如果一个事务要给一个记录加 S 锁，那也要在表上加 IS 锁。（意向共享锁）
如果一个事务要给一个记录加 X 锁，那也要在表上加 IX 锁。（意向排他锁）
SIX 锁是 S 和 IX 锁的结合。要读整个表，但可能对其中某些记录进行修改。（共享意向排他）

这样当我们想向一个表上 S 锁时，发现表上有 IX 锁，这样我们很快就发现了冲突，需要等待。 IS 和 IX 是不冲突的。在表上是不冲突的，可能在记录上冲突（即对一个记录又读又写，冲突发生在记录层面而非表）。

intention-shared (IS): indicates explicit locking at a lower level of the tree but only with shared locks. 在下面会加 S 锁。
intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or shared locks 在下面会加 X 锁。
shared and intention-exclusive (SIX): the subtree rooted by that node is locked explicitly in shared mode and explicit locking is being done at a lower level with exclusive-mode locks.

Example

锁是从上往下，放锁是从下往上。先对 DB 加 IX, 对 A1 加 IX, 对 Fa 加 SIX, 对某些记录加 X. 其他记录就不用再加 S 锁了(因为表是 SIX).

14.4 Insert and Delete Operations¶

数据库里除了 R/W 还有插入、删除等操作。

需要定义 R/W 和插入/删除是否冲突。

If two-phase locking is used :

A delete operation may be performed only if the transaction deleting the tuple has an exclusive lock on the tuple to be deleted. 只有当事务对要删除的元组拥有排他锁（exclusive lock）时，才能执行删除操作。
A transaction that inserts a new tuple into the database is given an X-mode lock on the tuple 插入之前是没有这个数据的，无法先加锁。当一个事务向数据库中插入新元组时，它会获得对该元组的X模式锁（X-mode lock）。

Insertions and deletions can lead to the phantom phenomenon. 因此只是加锁不能保证串行化。

Index Locking Protocol¶

每个关系必须至少有一个索引，并且事务只能通过索引访问元组。在执行查询（查找）操作时，事务需要对所有访问到的索引叶节点加S模式锁（共享锁），即使这些叶节点不包含满足查询条件的元组。对于插入、更新或删除操作，事务必须更新所有相关的索引，并对受影响的索引叶节点加X模式锁（排他锁）。在整个事务过程中，还需要严格遵守两阶段锁定协议，以确保并发控制的正确性。通过这种方式，索引锁定协议可以有效避免幻影现象的发生

14.5 Multi-version Schemes¶

多版本并发控制方案通过保留数据项的旧版本来提高并发性。每当写操作成功时，就会创建该数据项的新版本，并使用时间戳标记这些版本。当发出读取（Q）操作时，会根据事务的时间戳选择合适的Q版本，并返回所选版本的值。只读事务无需等待，因为每次读操作都会立即返回一个合适的版本。

例如，对于数据项Q，它可能有多个版本：、和，其中数字代表时间戳。如果一个事务在时间戳为2时进行读取操作，那么它将读取到版本的数据。这样可以确保不同事务之间不会相互干扰，从而提高系统的并发处理能力。

Differentiates between read-only transactions and update transactions

SET TRANSACTION READ ONLY;
SET TRANSACTION READ WRITE;

SET TRANSACTION READ ONLY: 声明当前事务为只读事务。这意味着在这个事务中的所有操作只能是读取数据，不允许进行任何修改（如插入、更新或删除）。

SET TRANSACTION READ WRITE: 声明当前事务为读写事务。这意味着在这个事务中可以执行读取和写入（包括插入、更新、删除）操作

更新事务（Update Transactions）

获取读写锁：更新事务需要获取数据项的读锁和写锁。
保持锁直到事务结束：所有获取的锁会一直持有到事务结束，遵循强的两阶段锁定协议。这意味着在事务执行过程中，不会释放任何已获得的锁。
每次成功写操作创建新版本：每当一个数据项被成功写入时，就会生成该数据项的一个新版本。
时间戳管理：每个数据项版本都有一个唯一的时间戳，该时间戳值来自一个在提交处理期间递增的计数器（ts-counter）。

只读事务（Read-only Transactions）

分配时间戳：只读事务在开始执行前，通过读取当前的ts-counter值来分配一个时间戳。
读取合适版本的数据：当只读事务T_i发出读取请求read(Q)时，返回的是时间戳小于或等于TS(\(T_i\))的最大版本的数据内容。

MVCC（多版本并发控制）在实现过程中会面临一些问题，主要包括存储开销和垃圾回收。

存储开销

额外的元组：为了支持多个版本的数据，系统需要创建额外的元组来存储不同版本的数据。这会增加存储空间的需求。
每个元组中的额外空间：除了数据本身，每个元组还需要额外的空间来存储版本信息，如时间戳等。这些额外的信息增加了每个元组的大小，进一步增加了存储开销。

垃圾回收

由于MVCC会产生大量的旧版本数据，如果不进行有效的管理，这些旧版本数据会占用大量的存储空间。因此，需要对过期的、不再需要的版本数据进行垃圾回收。

保留最年轻的版本Qk：在所有版本中，保留那些时间戳小于或等于系统中最老的只读事务的时间戳的最年轻版本Qk。这个规则确保了当前活跃的只读事务能够访问到它们需要的数据版本。
删除其他旧版本：除了保留下来的最年轻版本Qk之外，所有比Qk更旧的版本都可以被安全地删除。这样可以释放存储空间，减少存储开销。

Example

Q: \<Q1, 1>, \<Q2, 2>, \<Q3, 3>, \<Q4, 4>
D: \<D1, 1>, \<D3, 3>

如果系统中最老的只读事务的时间戳是3，那么根据上述规则：

最年轻的版本Qk为\<Q3, 3>，因为它的时间戳3小于或等于最老只读事务的时间戳3。
其他旧版本如\<Q1, 1>和\<Q2, 2>可以被标记为垃圾并进行回收。

评论区

对你有帮助的话请给我个赞和 star =>

欢迎跟我探讨！！！