Chapter 13 Transaction¶
Info
13.1 Transaction Concept¶
A transaction is a unit of program execution that accesses and possibly updates various data items. 一段原子性的操作
transaction to transfer $50 from account A to account B
ACID Properties
- Atomicity(原子性) 全有或全无,由数据库恢复功能保证
- Consistency(一致性) 保证数据库内的内容正确性,与实际业务相符。如转账是一个人余额减少一个人增加。 consistency 与开发人员有关系(事务设计是否合理)
- Isolation(隔离性) 事务并发执行,但是相互隔离,好像是串行执行一样。 由数据库的并发执行来实现
- Durability(持久性) 事务提交后被缓存,掉电不能失去 buffer 里的内容。
13.2 A Simple Transaction Model¶
这个模型中,把事务对数据库的修改简化为读写两种操作。 Transactions access data using two operations:
- read(X), which transfers the data item X from the database to a variable, also called X, in a work area in main memory belonging to the transaction that executed the read operation.
- write(X), which transfers the value in the variable X in the main memory work area of the transaction that executed the write operation to the data item X in database.
在这个简单的模型中,我们不考虑数据读到工作区域后发生了什么操作,只考虑读写。
Example of Fund Transfer
Atomicity requirement:如果执行结束之后出现了问题,数据库应该要撤销之前的操作
Durability requirement:如果事务结束了,我们就把更新同步
Consistency requirement
- Explicitly(显式) specified integrity constraints e.g. primary keys , foreign keys 数据库把这个定义放在内部,会自己维护
- Implicit (隐式) integrity constraints e.g. sum of balances of all accounts minus sum of loan amounts must equal value of cash-in-hand
Isolation requirement 如果在 step 3 6 之间,另一个事务可以访问这个被部分更新的数据库,A+B 就会小于正确答案。这是因为破坏了隔离性。
Isolation can be ensured trivially by running transactions serially(串行)
Transaction State¶
Active – the initial state; the transaction stays in this state while it is executing
Partially committed – after the final statement has been executed. 语句执行完了,准备提交。能否提交取决于具体的执行。
Failed -- after the discovery that normal execution can no longer proceed. 不能正常提交。或者是执行过程中发现问题。
Aborted – after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted:
- restart the transaction
- kill the transaction
Committed – after successful completion.
13.3 Concurrent Executions¶
- increased processor and disk utilization
- reduced average response time
事务是并发执行的,如果不加以控制可能会有以下问题
Anomalies in Concurrent Executions
- Lost Update(丢失修改)
一个人订票后,另一个人读到这里第一个人还没有修改的余量。导致丢失了一次修改。
- Dirty Read(读脏数据)
一个人订票后,另一个人读数据后,但是第一个人放弃了,但是第二个人仍然是用的脏数据。
- Unrepeatable Read (不可重复读)
Isolation 要求我们读到的数据应该是一样的。
- Phantom Problem(幽灵问题 )
unrepeatable 是针对已经存在的数据,但是数据的值不同. Phantom 是指数据数量会变多/减少。
13.3.1 Schedules¶
Schedule – a sequences of instructions that specify the chronological order in which instructions of concurrent transactions are executed.
- a schedule for a set of transactions must consist of all instructions of those transactions
- must preserve the order in which the instructions appear in each individual transaction. 单个事务的指令执行顺序要保证
A transaction that successfully completes its execution will have a commit instructions as the last statement
- by default transaction assumed to execute commit instruction as its last step
A transaction that fails to successfully complete its execution will have an abort instruction as the last statement
Schedule Example
- 串行调度,串行调度一定是满足隔离性的
- 非串行调度,但等价于上面的串行调度
这里 T2 的 readA 和 T1 的 readB 可以调换时间次序,就得到了刚刚的串行调度。
下面这样的调度就不等价,破坏了隔离性。
13.3.2 Serializability¶
A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule.
- conflict serializability (冲突可串行化)
- view serializability (视图可串行化)
Conflict Serializability¶
Instructions \(l_i\) and \(l_j\) of transactions \(T_i\) and \(T_j\) respectively, conflict if and only if there exists some item Q accessed by both \(l_i\) and \(l_j\), and at least one of these instructions wrote Q.
-
\(l_i = \text{read}(Q)\), \(l_j = \text{read}(Q)\). \(l_i\) and \(l_j\) don't conflict.
-
\(l_i = \text{read}(Q)\), \(l_j = \text{write}(Q)\). They conflict.
-
\(l_i = \text{write}(Q)\), \(l_j = \text{read}(Q)\). They conflict.
-
\(l_i = \text{write}(Q)\), \(l_j = \text{write}(Q)\). They conflict.
Intuitively, a conflict between \(l_i\) and \(l_j\) forces a (logical) temporal order between them.
- If \(l_i\) and \(l_j\) are consecutive in a schedule and they do not conflict, their results would remain the same even if they had been interchanged in the schedule.
If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent. 交换不冲突的指令,得到的是冲突等价的调度。
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule. 冲突等价于一个串行调度,那么这个调度是可串行的。
Testing for Serializability¶
Consider some schedule of a set of transactions \(T_1,T_2,…\),Tn Precedence graph(前驱图) — a directed graph where the vertices are the transactions (names).
如果 T1 要在 T2 前面(即找到一条 T1 的指令要求比 T2 中的一条指令先执行),那我们画一条从 T1->T2 的边。
如果找到环,说明是不可串行化的。否则可以利用拓扑排序。
只用于理论研究,数据库内不会这样实现。【太复杂了】
View Serializability¶
Let S and S' be two schedules with the same set of transactions. S and S' are view equivalent if the following three conditions are met, for each data item Q,
-
If in schedule S, transaction \(T_i\) reads the initial value of Q, then in schedule S' also transaction \(T_i\) must read the initial value of Q. 初始值由同一个事务读到
-
If in schedule S transaction \(T_i\) executes
read(Q)
, and that value was produced by transaction \(T_j\) (if any), then in schedule S' also transaction \(T_i\) must read the value of Q that was produced by the samewrite(Q)
operation of transaction \(T_j\). 中间结果是由同一个事务得到再由另一个事务读出 -
The transaction (if any) that performs the final
write(Q)
operation in schedule S must also perform the finalwrite(Q)
operation in schedule S'. 最终写回数据库也是由同一个事务写
A schedule S is view serializable if it is view equivalent to a serial schedule.
Every conflict serializable schedule is also view serializable.
Example
Below is a schedule which is view-serializable but not conflict serializable.
T27 – T28 – T29
Other Notions of Serializability¶
有的调度既不是冲突可串行化又不是视图可串行化,但它是可串行化的。
13.4 Recoverable Schedules¶
Recoverable schedule(可恢复调度) — if a transaction \(T_j\) reads a data item previously written by a transaction \(T_i\) , then the commit operation of \(T_i\) appears before the commit operation of \(T_j\).
如果 \(T_j\) 读了 \(T_i\) 写的结果,那么 \(T_i\) 要比 \(T_j\) 先提交
Example
The following schedule (Schedule 11) is not recoverable if T9 commits immediately after the read. 可能会读脏数据
如果 T8 后续回滚, 但 T9 已经基于脏数据做了后续操作,而且已经提交了,不可恢复。
如果一个事务读了另一个事务的脏数据,提交次序需要有约束,要在被读事务的后面提交。
Cascading Rollbacks¶
Cascading rollback – a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule where none of the transactions has yet committed (so the schedule is recoverable)
要有级联回滚的恢复。Can lead to the undoing of a significant amount of work.
我们更希望用非级联的恢复,否则开销太大。
13.5 Transaction Isolation Levels¶
A database must provide a mechanism that will ensure that all possible schedules are
- either conflict or view serializable, and 保证可串行的
- are recoverable and preferably cascadeless 保证可恢复的(最好是非级联)
数据库里提供一种协议,每个事务要遵从协议,遵从协议下产生的调度一定是可串行、可恢复的。这是完全的隔离,代价比较高。
In SQL set transaction isolation level serializable
我们可以设置数据库的隔离级别。
- Serializable — default 四种问题都要避免,代价最高。
- Repeatable read — only committed records to be read, repeated reads of same record must return same value. However, a transaction may not be serializable – it may find some records inserted by a transaction but not find others. 不管幽灵问题。
- Read committed — only committed records can be read, but successive reads of record may return different (but committed) values. 保证不读脏数据。
- Read uncommitted — even uncommitted records may be read. 最低的隔离级别,有些数据库只是做统计任务。
Lower degrees of consistency useful for gathering approximate information about the database
13.6 Concurrency Control Protocols¶
Lock-Based Protocols
- Lock on whole database vs lock on items 读之前要访问一个共享锁,写之前要访问一个排他锁,冲突了就要等待。通过锁就规定了一个执行的次序。
- How long to hold lock?
- Shared vs exclusive locks
Timestamp-Based Protocols
- Transaction timestamp assigned e.g. when a transaction begins 事务执行时分配一个时间戳。执行次序按照时间戳排序。
- Data items store two timestamps
- Read timestamp
- Write timestamp
- Timestamps are used to detect out of order accesses
Validation-Based Protocols
- Optimistic concurrency control
- Low rate of conflicts among transactions
- Each transaction must go through 3 phases: Read phase -> Validation phase -> Write phase 事务提交的时候先去验证是否有冲突,如果没有冲突就提交,如果冲突就考虑放弃某个。