跳转至

Chapter 13 Transaction

文本统计:约 2346 个字 • 3 行代码

13.1 Transaction Concept

A transaction is a unit of program execution that accesses and possibly updates various data items. 一段原子性的操作

transaction to transfer $50 from account A to account B

update  account  set  balance=balance-50  where account_number=A;
update  account  set  balance=balance+50  where account_number=B;
commit;

ACID Properties

  • Atomicity(原子性) 全有或全无,由数据库恢复功能保证
  • Consistency(一致性) 保证数据库内的内容正确性,与实际业务相符。如转账是一个人余额减少一个人增加。 consistency 与开发人员有关系(事务设计是否合理)
  • Isolation(隔离性) 事务并发执行,但是相互隔离,好像是串行执行一样。 由数据库的并发执行来实现
  • Durability(持久性) 事务提交后被缓存,掉电不能失去 buffer 里的内容。

13.2 A Simple Transaction Model

这个模型中,把事务对数据库的修改简化为读写两种操作。 Transactions access data using two operations:

  • read(X), which transfers the data item X from the database to a variable, also called X, in a work area in main memory belonging to the transaction that executed the read operation.
  • write(X), which transfers the value in the variable X in the main memory work area of the transaction that executed the write operation to the data item X in database.

在这个简单的模型中,我们不考虑数据读到工作区域后发生了什么操作,只考虑读写。

Example of Fund Transfer

Atomicity requirement:如果执行结束之后出现了问题,数据库应该要撤销之前的操作

Durability requirement:如果事务结束了,我们就把更新同步

Consistency requirement

  • Explicitly(显式) specified integrity constraints e.g. primary keys , foreign keys 数据库把这个定义放在内部,会自己维护
  • Implicit (隐式) integrity constraints e.g. sum of balances of all accounts minus sum of loan amounts must equal value of cash-in-hand

Isolation requirement 如果在 step 3 6 之间,另一个事务可以访问这个被部分更新的数据库,A+B 就会小于正确答案。这是因为破坏了隔离性。

Isolation can be ensured trivially by running transactions serially(串行)

Transaction State

Active – the initial state; the transaction stays in this state while it is executing

Partially committed – after the final statement has been executed. 语句执行完了,准备提交。能否提交取决于具体的执行。

Failed -- after the discovery that normal execution can no longer proceed. 不能正常提交。或者是执行过程中发现问题。

Aborted – after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted:

  • restart the transaction
  • kill the transaction

Committed – after successful completion.

13.3 Concurrent Executions

  • increased processor and disk utilization
  • reduced average response time

事务是并发执行的,如果不加以控制可能会有以下问题

Anomalies in Concurrent Executions

  • Lost Update(丢失修改)

一个人订票后,另一个人读到这里第一个人还没有修改的余量。导致丢失了一次修改。

  • Dirty Read(读脏数据)

一个人订票后,另一个人读数据后,但是第一个人放弃了,但是第二个人仍然是用的脏数据。

  • Unrepeatable Read (不可重复读)

Isolation 要求我们读到的数据应该是一样的。

  • Phantom Problem(幽灵问题 )

unrepeatable 是针对已经存在的数据,但是数据的值不同. Phantom 是指数据数量会变多/减少。

13.3.1 Schedules

Schedule – a sequences of instructions that specify the chronological order in which instructions of concurrent transactions are executed.

  • a schedule for a set of transactions must consist of all instructions of those transactions
  • must preserve the order in which the instructions appear in each individual transaction. 单个事务的指令执行顺序要保证

A transaction that successfully completes its execution will have a commit instructions as the last statement

  • by default transaction assumed to execute commit instruction as its last step

A transaction that fails to successfully complete its execution will have an abort instruction as the last statement

Schedule Example

  • 串行调度,串行调度一定是满足隔离性的

  • 非串行调度,但等价于上面的串行调度

这里 T2 的 readA 和 T1 的 readB 可以调换时间次序,就得到了刚刚的串行调度。

下面这样的调度就不等价,破坏了隔离性。

13.3.2 Serializability

A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule.

  • conflict serializability (冲突可串行化)
  • view serializability (视图可串行化)

Conflict Serializability

Instructions \(l_i\) and \(l_j\) of transactions \(T_i\) and \(T_j\) respectively, conflict if and only if there exists some item Q accessed by both \(l_i\) and \(l_j\), and at least one of these instructions wrote Q.

  1. \(l_i = \text{read}(Q)\), \(l_j = \text{read}(Q)\). \(l_i\) and \(l_j\) don't conflict.

  2. \(l_i = \text{read}(Q)\), \(l_j = \text{write}(Q)\). They conflict.

  3. \(l_i = \text{write}(Q)\), \(l_j = \text{read}(Q)\). They conflict.

  4. \(l_i = \text{write}(Q)\), \(l_j = \text{write}(Q)\). They conflict.

Intuitively, a conflict between \(l_i\) and \(l_j\) forces a (logical) temporal order between them.

  • If \(l_i\) and \(l_j\) are consecutive in a schedule and they do not conflict, their results would remain the same even if they had been interchanged in the schedule.

If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent. 交换不冲突的指令,得到的是冲突等价的调度。

We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule. 冲突等价于一个串行调度,那么这个调度是可串行的。

Testing for Serializability

Consider some schedule of a set of transactions \(T_1,T_2,…\),Tn Precedence graph(前驱图) — a directed graph where the vertices are the transactions (names).

如果 T1 要在 T2 前面(即找到一条 T1 的指令要求比 T2 中的一条指令先执行),那我们画一条从 T1->T2 的边。

如果找到环,说明是不可串行化的。否则可以利用拓扑排序。

Example

T1 的 readY 和 T2 的 writeY 冲突,所以要画一条边,如此。 最后有 10 种调度方式。

只用于理论研究,数据库内不会这样实现。【太复杂了】

View Serializability

Let S and S' be two schedules with the same set of transactions. S and S' are view equivalent if the following three conditions are met, for each data item Q,

  1. If in schedule S, transaction \(T_i\) reads the initial value of Q, then in schedule S' also transaction \(T_i\) must read the initial value of Q. 初始值由同一个事务读到

  2. If in schedule S transaction \(T_i\) executes read(Q), and that value was produced by transaction \(T_j\) (if any), then in schedule S' also transaction \(T_i\) must read the value of Q that was produced by the same write(Q) operation of transaction \(T_j\). 中间结果是由同一个事务得到再由另一个事务读出

  3. The transaction (if any) that performs the final write(Q) operation in schedule S must also perform the final write(Q) operation in schedule S'. 最终写回数据库也是由同一个事务写

A schedule S is view serializable if it is view equivalent to a serial schedule.

Every conflict serializable schedule is also view serializable.

Example

Below is a schedule which is view-serializable but not conflict serializable.

T27 – T28 – T29

Other Notions of Serializability

有的调度既不是冲突可串行化又不是视图可串行化,但它是可串行化的。

Example

加减操作是可结合的,这里需要了解事务里具体是什么操作。但我们的简单模型对此不加以区分。这也很难在静态层面去判断是否可串行

13.4 Recoverable Schedules

Recoverable schedule(可恢复调度) — if a transaction \(T_j\) reads a data item previously written by a transaction \(T_i\) , then the commit operation of \(T_i\) appears before the commit operation of \(T_j\).

如果 \(T_j\) 读了 \(T_i\) 写的结果,那么 \(T_i\) 要比 \(T_j\) 先提交

Example

The following schedule (Schedule 11) is not recoverable if T9 commits immediately after the read. 可能会读脏数据

如果 T8 后续回滚, 但 T9 已经基于脏数据做了后续操作,而且已经提交了,不可恢复。

如果一个事务读了另一个事务的脏数据,提交次序需要有约束,要在被读事务的后面提交。

Cascading Rollbacks

Cascading rollback – a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule where none of the transactions has yet committed (so the schedule is recoverable)

Example

If T10 fails, T11 and T12 must also be rolled back.

要有级联回滚的恢复。Can lead to the undoing of a significant amount of work.

我们更希望用非级联的恢复,否则开销太大。

13.5 Transaction Isolation Levels

A database must provide a mechanism that will ensure that all possible schedules are

  • either conflict or view serializable, and 保证可串行的
  • are recoverable and preferably cascadeless 保证可恢复的(最好是非级联)

数据库里提供一种协议,每个事务要遵从协议,遵从协议下产生的调度一定是可串行、可恢复的。这是完全的隔离,代价比较高。

In SQL set transaction isolation level serializable 我们可以设置数据库的隔离级别。

  • Serializable — default 四种问题都要避免,代价最高。
  • Repeatable read — only committed records to be read, repeated reads of same record must return same value. However, a transaction may not be serializable – it may find some records inserted by a transaction but not find others. 不管幽灵问题。
  • Read committed — only committed records can be read, but successive reads of record may return different (but committed) values. 保证不读脏数据。
  • Read uncommitted — even uncommitted records may be read. 最低的隔离级别,有些数据库只是做统计任务。

Lower degrees of consistency useful for gathering approximate information about the database

13.6 Concurrency Control Protocols

Lock-Based Protocols

  • Lock on whole database vs lock on items 读之前要访问一个共享锁,写之前要访问一个排他锁,冲突了就要等待。通过锁就规定了一个执行的次序。
  • How long to hold lock?
  • Shared vs exclusive locks

Timestamp-Based Protocols

  • Transaction timestamp assigned e.g. when a transaction begins 事务执行时分配一个时间戳。执行次序按照时间戳排序。
  • Data items store two timestamps
  • Read timestamp
  • Write timestamp
  • Timestamps are used to detect out of order accesses

Validation-Based Protocols

  • Optimistic concurrency control
  • Low rate of conflicts among transactions
  • Each transaction must go through 3 phases: Read phase -> Validation phase -> Write phase 事务提交的时候先去验证是否有冲突,如果没有冲突就提交,如果冲突就考虑放弃某个。

评论区

对你有帮助的话请给我个赞和 star => GitHub stars
欢迎跟我探讨!!!