Chapter 2 The Relation Model¶

文本统计：约 1998 个字

2.1 Structure of Relational Databases¶

Concept¶

Formally, given sets \(D_1, D_2, \ldots, D_n\), a relation \(r\) is a subset of \(D_1 \times D_2 \times \ldots \times D_n\). Thus, a relation is a set of \(n\)-tuples \((a_1, a_2, \ldots, a_n)\) where each \(a_i \in D_i\).

Example

name = {Wu, Mozart, Gold, Singh, ...} /set of all instructor names /
dept_name = {Music, Physics, Finance, ...} / set of all department names /
salary = {40000, 80000, 87000, 90000, ...} / set of all salary /

Then

\[ r = \{ (\text{Wu, Finance}, 90000), \\ (\text{Mozart, Music}, 40000), \\ (\text{Gold, Physics}, 87000), \\ (\text{Singh, Finance}, 80000) \} \]

is a relation over \(\text{name} \times \text{dept}\_\text{name} \times \text{salary}\)

\(A_1, A_2, \ldots, A_n\) are attributes 属性，\(R = (A_1, A_2, \ldots, A_n)\) is a relation schema 关系模式, e.g. instructor = (ID, name, dept_name, salary)

A relation instance \(r\) defined over schema \(R\) is denoted by \(r(R)\) and the current values a relation are specified by a table.

因为关系是一个集合，所以关系都是无序的。

Attributes¶

The set of allowed values for each attribute is called the domain (域).

Attribute values are (normally) required to be atomic (原子的); that is, indivisible.

The special value null (空值) is a member of every domain. The null value causes complications in the definition of many operations.

2.2 Database Schema¶

Database schema -- is the logical structure of the database.

Database instance -- is a snapshot of the data in the database at a given instant in time.

Example

schema: instructor (ID, name, dept_name, salary)

Instance:

2.3 Keys¶

Let \(K \subseteq R\)

\(K\) is a Super-key (超键) of \(R\) if values for \(K\) are sufficient to identify a unique tuple of each possible relation \(r(R)\)

Super-key example

\(\{ID\}\) and \(\{ID, \text{name}\}\) are both superkeys of instructor.

Super-key \(K\) is a candidate key (候选键) if \(K\) is minimal.

candidate key example

\(\{\text{ID}\}\) is a candidate key for Instructor.

One of the candidate keys is selected to be the primary key (主键).

Foreign key（外键）constraint from attribute(s) A of relation \(r_1\) to the primary key B of relation \(r_2\) states that on any database instance, the value of A for each tuple in \(r_1\) must also be the value of B for some tuple in \(r_2\).

Example

Referential integrity（参照完整性）constraint requires that the values appearing in specified attribute(s) A of any tuples in the referencing relation \(r_1\) also appear in specified attribute(s) B of at least one tuple in the referenced relation \(r_2\).

Example

A database consists of multiple relations, Information about an enterprise is broken up into parts，下图代表了学校数据库的模式图

2.4 Relational Algebra¶

Relational Query Languages 有两种不同的类型 Procedural vs.non-procedural, or declarative 过程性/非过程性（陈述性）

"Pure" languages:

Relational algebra(关系代数)
Tuple relational calculus（元组关系演算）
Domain relational calculus（域关系演算）

The above 3 pure languages are equivalent in computing power, We will concentrate on relational algebra

A procedural language consisting of a set of operations that take one or two relations as input and produce a new relation as their result.

Six basic operators

select: \(\sigma\)
project: \(\prod\)
union: \(\cup\)
set difference: \(-\)
Cartesian product: \(\times\)
rename: \(\rho\)

2.4.1 Select Operation¶

The select operation selects tuples that satisfy a given predicate.

notation: \(σₚ(r)\), \(p\) is called the selection predicate 选择谓词

Defined as:

\[ \sigma_p(r) = \{ t \mid t \in r \text{ and } p(t) \} \]

Where: \(p\) is a formula in propositional calculus consisting of terms connected by: \(∧\) (and), \(∨\) (or), \(¬\) (not)

Each term is one of:

\[ \begin{aligned} <\text{attribute}> op <\text{attribute}>\\ <\text{attribute}> op <\text{constant}> \end{aligned} \]

where \(op\) is one of: \(=, ≠, >, ≥, <, ≤\)

Example of Selection

\[ \sigma_{{\text{dept}\_\text{name}={\text{"Physics"}}}}(\text{instructor}) \]

\[ \sigma_{\text{salary} > 90000}(\text{instructor}) \]

\[ \sigma_{{\text{dept}\_\text{name}={\text{"Physics"} \land \text{salary} > 90000}}}(\text{instructor}) \]

2.4.2 Project Operation¶

The project operation is a unary operation that returns its argument relation, with certain attributes left out.

Notation: \(\Pi_{A_1, A_2, \ldots, A_k}(r)\), where \(A_1, A_2\) are attribute names and \(r\) is a relation name.

The result is defined as the relation of \(k\) columns obtained by erasing the columns that are not listed. Duplicate rows removed from result, since relations are sets.

Example

To eliminate the dept_name attribute of instructor

\[ \Pi_{ID, name, salary}(\text{instructor}) \]

2.4.3 Union Operation¶

The union operation allows us to combine two relations.

Notation: \(r \cup s\)

Defined as:

\[ r \cup s = \{ t \mid t \in r \text{ or } t \in s \} \]

For \(r \cup s\) to be valid:

\(r\), \(s\) must have the same arity (元数) (same number of attributes).属性个数要相同
The attribute domains must be compatible (example: 2nd column of \(r\) deals with the same type of values as does the 2nd column of \(s\)).

Example

To find all courses taught in the Fall 2009 semester, or in the Spring 2010 semester, or in both:

\[ \Pi_{\text{course_id}} \left( \sigma_{\text{semester}=\text{"Fall"} \land \text{year}=2009} (\text{section}) \right) \cup \Pi_{\text{course_id}} \left( \sigma_{\text{semester}=\text{"Spring"} \land \text{year}=2010} (\text{section}) \right) \]

2.4.4 Set difference¶

The set-difference operation allows us to find tuples that are in one relation but are not in another.

Notation: \(r - s\)

Definition:

\[ r - s = \{ t \mid t \in r \text{ and } t \notin s \} \]

Conditions for Set-Difference:

\(r\) and \(s\) must have the same arity.
Attribute domains of \(r\) and \(s\) must be compatible.

Set Difference Operation

To find all courses taught in the Fall 2009 semester, but not in the Spring 2010 semester:

\[ \Pi_{\text{course}\_\text{id}} \left( \sigma_{\text{semester}=\text{"Fall"} \land \text{year}=2009} (\text{section}) \right) - \Pi_{\text{course}\_\text{id}} \left( \sigma_{\text{semester}=\text{"Spring"} \land \text{year}=2010} (\text{section}) \right) \]

2.4.5 Cartesian-Product Operation¶

The Cartesian-product operation (denoted by \(\times\)) allows us to combine information from any two relations.

Notation: \(r \times s\)

Definition: \(r \times s = \{ t \, q \mid t \in r \text{ and } q \in s \}\)

Assumptions:

Assume that attributes of \(r(R)\) and \(s(S)\) are disjoint. (That is, \(R \cap S = \emptyset\)).
If attributes of \(r(R)\) and \(s(S)\) are not disjoint, then renaming must be used.

2.4.6 Rename Operation¶

Allows us to name, and therefore to refer to, the results of relational-algebra expressions.

Allows us to refer to a relation by more than one name.

Definition:

\[ \rho_X(E) \]

returns the expression \(E\) under the name \(X\).

If a relational-algebra expression \(E\) has arity \(n\), then

\[ \rho_{X(A_1, A_2, \ldots, A_n)}(E) \]

returns the result of expression \(E\) under the name \(X\), and with the attributes renamed to \(A_1, A_2, \ldots, A_n\). 对属性进行重命名

Basic Operation

Example 1: Find the names of all instructors in the Physics department, along with the course_id of all courses they have taught

Query 1

\[ \Pi_{\text{instructor.name}, \text{course}\_\text{id}} \left( \sigma_{\text{dept}\_\text{name}=\text{"Physics"}} \left( \sigma_{\text{instructor.ID}=\text{teaches.ID}} (\text{instructor} \times \text{teaches}) \right) \right) \]

Query 2

\[ \Pi_{\text{instructor.name},\text{course}\_\text{id}} \left( \sigma_{\text{instructor.ID}=\text{teaches.ID}} \left( \sigma_{\text{dept}\_\text{name}=\text{"Physics"}} (\text{instructor} \times \text{teaches}) \right) \right) \]

Example 2: Find the largest salary in the university

Step 1: Find instructor salaries that are less than some other instructor salary (i.e., not maximum) - Using a copy of instructor under a new name d

\[ \Pi_{\text{instructor.salary}} \left( \sigma_{\text{instructor.salary} < \text{d.salary}} (\text{instructor} \times \rho_d(\text{instructor})) \right) \]

Step 2: Find the largest salary

\[ \Pi_{\text{salary}} (\text{instructor}) - \Pi_{\text{instructor.salary}} \left( \sigma_{\text{instructor.salary} < \text{d.salary}} (\text{instructor} \times \rho_d(\text{instructor})) \right) \]

2.4.7 Additional Operations¶

We define additional operations that do not add any power to the relational algebra, but that simplify common queries.

(1) Set Intersection: \(r \cap s\)

(2) Natural Join: \(r \bowtie s\)

(3) Semi-Join: \(r \ltimes_{\theta} s\)

(4) Assignment: \(\leftarrow\)

(5) Outer Join:

Left Outer Join: \(r ⟕ s\)
Right Outer Join: \(r ⟖ s\)
Full Outer Join: \(r ⟗ s\)

(6) Division Operator: \(r \div s\)

The set-intersection operation allows us to find tuples that are in both input relations.

Set Intersection¶

Definition:

\[ r \cap s = \{ t \mid t \in r \text{ and } t \in s \} \]

Assumptions:

\(r\), \(s\) have the same arity.
Attributes of \(r\) and \(s\) are compatible.

Note: \(r \cap s = r - (r - s)\)

Set Intersection

Natural Join¶

Definition:

Let \(r\) and \(s\) be relations on schemas \(R\) and \(S\) respectively.

Then, \(r \bowtie s\) is a relation on schema \(R \cup S\) obtained as follows:

Consider each pair of tuples \(t_r\) from \(r\) and \(t_s\) from \(s\).

If \(t_r\) and \(t_s\) have the same value on each of the attributes in \(R \cap S\), add a tuple \(t\) to the result, where - \(t\) has the same value as \(t_r\) on \(r\) - \(t\) has the same value as \(t_s\) on \(s\)

Natural Join 有交换律与结合律，然后还引出一个更广的操作为 theta 操作，记为 \(\bowtie_\theta\)，定义为

\[ r\bowtie_\theta s = \sigma_\theta(r\times s) \]

Natural join

\(R = (A, B, C, D)\)

\(S = (E, B, D)\)

Result schema = \((A, B, C, D, E)\)

\(r \bowtie s = \Pi_{r.A, r.B, r.C, r.D, s.E} (\sigma_{r.B = s.B \land r.D = s.D} (r \times s))\)

Outer Join¶

An extension of the join operation that avoids loss of information.

Computes the join and then adds tuples form one relation that does not match tuples in the other relation to the result of the join.

Uses null values:

null signifies that the value is unknown or does not exist
All comparisons involving null are (roughly speaking) false by definition. We shall study precise meaning of comparisons with nulls later

Outer Join

JoinLeft Outer JoinRight Outer JoinFull Outer Join

Outer Join using Joins

Left Outer Join (r ⟕ s) can be written as:

\[ (r \bowtie s) \cup (r - \Pi_R(r \bowtie s)) \times \{(null, ..., null)\} \]

Right Outer Join (r ⟖ s) can be written as:

\[ (r \bowtie s) \cup \{(null, ..., null)\} \times (s - \Pi_S(r \bowtie s)) \]

Full Outer Join (r ⟗ s) can be written as:

\[ (r \bowtie s) \cup (r - \Pi_R(r \bowtie s)) \times \{(null, ..., null)\} \cup \{(null, ..., null)\} \times (s - \Pi_S(r \bowtie s)) \]

Null Values

Comparisons with null values return the special truth value: unknown.

Three-Valued Logic using the Truth Value unknown:

(1) OR: - (unknown or true) = true - (unknown or false) = unknown - (unknown or unknown) = unknown

(2) AND: - (true and unknown) = unknown - (false and unknown) = false - (unknown and unknown) = unknown

(3) NOT: - (not unknown) = unknown

SQL Predicate: In SQL "P is unknown" evaluates to true if predicate P evaluates to unknown.

Result of Select Predicate: The result of a select predicate is treated as false if it evaluates to unknown.

Semi-join Operation¶

Notation: \(r \ltimes_{\theta} s\)

Definition: Is a subset of \(r\), in which every tuple \(r_i\) matches at least one tuple \(s_j\) in \(s\) under the condition \(\theta\).

\[ r \ltimes_{\theta} s = \Pi_R (\sigma_{\theta}(r \bowtie s)) \]

Assignment Operation¶

The assignment operation (\(\leftarrow\)) provides a convenient way to express complex queries.

(1) Write query as a sequential program consisting of

a series of assignments
followed by an expression whose value is displayed as a result of the query

(2) Assignment must always be made to a temporary relation variable.

Division Operation¶

Definition: Given relations \(r(R)\) and \(s(S)\), such that \(S \subseteq R\), \(r \div s\) is the largest relation \(t(R-S)\) such that

\[ t \times s \subseteq r \]

Example

Let \(r(\text{ID}, \text{course}\_id) = \Pi_{\text{ID}, \text{course}\_\text{id}}(\text{takes})\) and \(s(\text{course}\_\text{id}) = \Pi_{\text{course}\_\text{id}}(\sigma_{\text{dept}\_\text{name}=\text{Biology}}(\text{course}))\)

then \(r \div s\) gives us students who have taken all courses in the Biology department. 返回的结果是参加生物部门开设的所有课程的学生

Can write \(r \div s\) as:

\[ \begin{aligned} &\text{temp}1 \leftarrow \Pi_{R-S}(r)\\ &\text{temp}2 \leftarrow \Pi_{R-S}((\text{temp}1 \times s) - \Pi_{R-S,S}(r))\\ &result = \text{temp}1 - \text{temp}2\\ \end{aligned} \]

Notes:

The result to the right of the \(\leftarrow\) is assigned to the relation variable on the left of the \(\leftarrow\).
May use variable in subsequent expressions.

2.4.8 Generalized Projection¶

Extends the projection operation by allowing arithmetic functions to be used in the projection list. 广义的投影可以添加算数运算

\[ \Pi_{F_1, F_2, \ldots, F_n}(E) \]

\(E\) is any relational-algebra expression.
Each of \(F_1, F_2, \ldots, F_n\) are arithmetic expressions involving constants and attributes in the schema of \(E\).

Example

Given relation instructor(ID, name, dept_name, salary) where salary is annual salary, get the same information but with monthly salary.

\[ \Pi_{\text{ID}, \text{name}, \text{dept}\_\text{name}, \text{salary}/12}(instructor) \]

2.4.9 Aggregate Functions and Operations¶

Aggregation function takes a collection of values and returns a single value as a result.

avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values

Aggregate Operation in Relational Algebra

\[ _{G_1, G_2, \ldots, G_n} \; \mathcal{G} \; _{F_1(A_1), F_2(A_2), \ldots, F_n(A_n)} \; (E) \]

\(E\) is any relational-algebra expression.

\(G_1, G_2, \ldots, G_n\) is a list of attributes on which to group (can be empty). 指定哪些元素合在一起作为一个组

Each \(F_i\) is an aggregate function. 聚合方程

Each \(A_i\) is an attribute name. 元素的名字

Note

Some books/articles use \(\gamma\) instead of \(\mathcal{G}\) (Calligraphic G).

Example

将 dept_name 一样的作为一组进行合并，合并计算的元素为 salary

Result of aggregation does not have a name, Can use rename operation to give it a name. For convenience, we permit renaming as part of aggregate operation

\[ _{\text{dept}\_\text{name}}\mathcal G_{avg(\text{salary}) \text{ as } \text{avg}\_\text{sal} }(\text{instructor}) \]

2.4.10 Multi-set Relational Algebra¶

Pure relational algebra removes all duplicates e.g. after projection

Multiset (多重集) relational algebra retains duplicates, to match SQL semantics

SQL duplicate retention was initially for efficiency, but is now a feature

Multi-set relational algebra defined as follows

(1) selection: has as many duplicates of a tuple as in the input, if the tuple satisfies the selection

(2) projection: one tuple per input tuple, even if it is a duplicate

(3) cross product: If there are \(m\) copies of \(t_1\) in \(r\), and \(n\) copies of \(t_2\) in \(s\), there are \(m \times n\) copies of \(t_1.t_2\) \(r \times s\)

(4) set operators

union: \(m + n\) copies
intersection: \(\min(m, n)\) copies
difference: \(\min(0, m - n)\) copies

评论区

对你有帮助的话请给我个赞和 star =>

欢迎跟我探讨！！！