Chapter 2 The Relation Model¶
2.1 Structure of Relational Databases¶
Concept¶
Formally, given sets \(D_1, D_2, \ldots, D_n\), a relation \(r\) is a subset of \(D_1 \times D_2 \times \ldots \times D_n\). Thus, a relation is a set of \(n\)-tuples \((a_1, a_2, \ldots, a_n)\) where each \(a_i \in D_i\).
Example
- name = {Wu, Mozart, Gold, Singh, ...} /set of all instructor names /
- dept_name = {Music, Physics, Finance, ...} / set of all department names /
- salary = {40000, 80000, 87000, 90000, ...} / set of all salary /
Then
is a relation over \(\text{name} \times \text{dept}\_\text{name} \times \text{salary}\)
\(A_1, A_2, \ldots, A_n\) are attributes 属性,\(R = (A_1, A_2, \ldots, A_n)\) is a relation schema 关系模式, e.g. instructor = (ID, name, dept_name, salary)
A relation instance \(r\) defined over schema \(R\) is denoted by \(r(R)\) and the current values a relation are specified by a table.
因为关系是一个集合,所以关系都是无序的。
Attributes¶
The set of allowed values for each attribute is called the domain (域).
Attribute values are (normally) required to be atomic (原子的); that is, indivisible.
The special value null (空值) is a member of every domain. The null value causes complications in the definition of many operations.
2.2 Database Schema¶
Database schema -- is the logical structure of the database.
Database instance -- is a snapshot of the data in the database at a given instant in time.
2.3 Keys¶
Let \(K \subseteq R\)
\(K\) is a Super-key (超键) of \(R\) if values for \(K\) are sufficient to identify a unique tuple of each possible relation \(r(R)\)
Super-key example
\(\{ID\}\) and \(\{ID, \text{name}\}\) are both superkeys of instructor
.
Super-key \(K\) is a candidate key (候选键) if \(K\) is minimal.
candidate key example
\(\{\text{ID}\}\) is a candidate key for Instructor
.
One of the candidate keys is selected to be the primary key (主键).
Foreign key(外键)constraint from attribute(s) A of relation \(r_1\) to the primary key B of relation \(r_2\) states that on any database instance, the value of A for each tuple in \(r_1\) must also be the value of B for some tuple in \(r_2\).
Referential integrity(参照完整性)constraint requires that the values appearing in specified attribute(s) A of any tuples in the referencing relation \(r_1\) also appear in specified attribute(s) B of at least one tuple in the referenced relation \(r_2\).
A database consists of multiple relations, Information about an enterprise is broken up into parts,下图代表了学校数据库的模式图
2.4 Relational Algebra¶
Relational Query Languages 有两种不同的类型 Procedural vs.non-procedural, or declarative 过程性/非过程性(陈述性)
"Pure" languages:
-
Relational algebra(关系代数)
-
Tuple relational calculus(元组关系演算)
-
Domain relational calculus(域关系演算)
The above 3 pure languages are equivalent in computing power, We will concentrate on relational algebra
A procedural language consisting of a set of operations that take one or two relations as input and produce a new relation as their result.
Six basic operators
-
select: \(\sigma\)
-
project: \(\prod\)
-
union: \(\cup\)
-
set difference: \(-\)
-
Cartesian product: \(\times\)
-
rename: \(\rho\)
2.4.1 Select Operation¶
The select operation selects tuples that satisfy a given predicate.
notation: \(σₚ(r)\), \(p\) is called the selection predicate 选择谓词
Defined as:
Where: \(p\) is a formula in propositional calculus consisting of terms connected by: \(∧\) (and), \(∨\) (or), \(¬\) (not)
Each term is one of:
where \(op\) is one of: \(=, ≠, >, ≥, <, ≤\)

Example of Selection
2.4.2 Project Operation¶
The project operation is a unary operation that returns its argument relation, with certain attributes left out.
Notation: \(\Pi_{A_1, A_2, \ldots, A_k}(r)\), where \(A_1, A_2\) are attribute names and \(r\) is a relation name.
The result is defined as the relation of \(k\) columns obtained by erasing the columns that are not listed. Duplicate rows removed from result, since relations are sets.
Example
To eliminate the dept_name
attribute of instructor
2.4.3 Union Operation¶
The union operation allows us to combine two relations.
Notation: \(r \cup s\)
Defined as:
For \(r \cup s\) to be valid: 1. \(r\), \(s\) must have the same arity (元数) (same number of attributes).属性个数要相同 2. The attribute domains must be compatible (example: 2nd column of \(r\) deals with the same type of values as does the 2nd column of \(s\)).

Example
To find all courses taught in the Fall 2009 semester, or in the Spring 2010 semester, or in both:
2.4.4 Set difference¶
The set-difference operation allows us to find tuples that are in one relation but are not in another.
Notation: \(r - s\)
Definition:
Conditions for Set-Difference:
-
\(r\) and \(s\) must have the same arity.
-
Attribute domains of \(r\) and \(s\) must be compatible.

Set Difference Operation
To find all courses taught in the Fall 2009 semester, but not in the Spring 2010 semester:
2.4.5 Cartesian-Product Operation¶
The Cartesian-product operation (denoted by \(\times\)) allows us to combine information from any two relations.
Notation: \(r \times s\)
Definition: \(r \times s = \{ t \, q \mid t \in r \text{ and } q \in s \}\)
Assumptions:
- Assume that attributes of \(r(R)\) and \(s(S)\) are disjoint. (That is, \(R \cap S = \emptyset\)).
- If attributes of \(r(R)\) and \(s(S)\) are not disjoint, then renaming must be used.

2.4.6 Rename Operation¶
Allows us to name, and therefore to refer to, the results of relational-algebra expressions.
Allows us to refer to a relation by more than one name.
Definition:
returns the expression \(E\) under the name \(X\).
If a relational-algebra expression \(E\) has arity \(n\), then
returns the result of expression \(E\) under the name \(X\), and with the attributes renamed to \(A_1, A_2, \ldots, A_n\).
Basic Operation
Example 1: Find the names of all instructors in the Physics department, along with the course_id
of all courses they have taught
Query 1
Query 2
Example 2: Find the largest salary in the university
Step 1: Find instructor salaries that are less than some other instructor salary (i.e., not maximum)
- Using a copy of instructor
under a new name d
Step 2: Find the largest salary
2.4.7 Additional Operations¶
We define additional operations that do not add any power to the relational algebra, but that simplify common queries.
(1) Set Intersection: \(r \cap s\)
(2) Natural Join: \(r \bowtie s\)
(3) Semi-Join: \(r \ltimes_{\theta} s\)
(4) Assignment: \(\leftarrow\)
(5) Outer Join:
-
Left Outer Join: \(r ⟕ s\)
-
Right Outer Join: \(r ⟖ s\)
-
Full Outer Join: \(r ⟗ s\)
(6) Division Operator: \(r \div s\)
The set-intersection operation allows us to find tuples that are in both input relations.
Set Intersection¶
Definition:
Assumptions:
-
\(r\), \(s\) have the same arity.
-
Attributes of \(r\) and \(s\) are compatible.
Note: \(r \cap s = r - (r - s)\)
Natural Join¶
Definition:
Let \(r\) and \(s\) be relations on schemas \(R\) and \(S\) respectively.
Then, \(r \bowtie s\) is a relation on schema \(R \cup S\) obtained as follows:
Consider each pair of tuples \(t_r\) from \(r\) and \(t_s\) from \(s\).
If \(t_r\) and \(t_s\) have the same value on each of the attributes in \(R \cap S\), add a tuple \(t\) to the result, where - \(t\) has the same value as \(t_r\) on \(r\) - \(t\) has the same value as \(t_s\) on \(s\)
Natural Join 有交换律与结合律,然后还引出一个更广的操作为 theta 操作,记为 \(\bowtie_\theta\),定义为
Natural join
\(R = (A, B, C, D)\)
\(S = (E, B, D)\)
Result schema = \((A, B, C, D, E)\)
\(r \bowtie s = \Pi_{r.A, r.B, r.C, r.D, s.E} (\sigma_{r.B = s.B \land r.D = s.D} (r \times s))\)
Outer Join¶
An extension of the join operation that avoids loss of information.
Computes the join and then adds tuples form one relation that does not match tuples in the other relation to the result of the join.
Uses null values:
- null signifies that the value is unknown or does not exist
- All comparisons involving null are (roughly speaking) false by definition. We shall study precise meaning of comparisons with nulls later
Outer Join using Joins
Left Outer Join (r ⟕ s) can be written as:
Right Outer Join (r ⟖ s) can be written as:
Full Outer Join (r ⟗ s) can be written as:
Null Values
Comparisons with null values return the special truth value: unknown
.
Three-Valued Logic using the Truth Value unknown
:
(1) OR:
- (unknown or true) = true
- (unknown or false) = unknown
- (unknown or unknown) = unknown
(2) AND:
- (true and unknown) = unknown
- (false and unknown) = false
- (unknown and unknown) = unknown
(3) NOT:
- (not unknown) = unknown
SQL Predicate: In SQL "P is unknown"
evaluates to true if predicate P
evaluates to unknown
.
Result of Select Predicate: The result of a select predicate is treated as false
if it evaluates to unknown
.
Semi-join Operation¶
Notation: \(r \ltimes_{\theta} s\)
Definition: Is a subset of \(r\), in which every tuple \(r_i\) matches at least one tuple \(s_j\) in \(s\) under the condition \(\theta\).

Assignment Operation¶
The assignment operation (\(\leftarrow\)) provides a convenient way to express complex queries.
(1) Write query as a sequential program consisting of
-
a series of assignments
-
followed by an expression whose value is displayed as a result of the query
(2) Assignment must always be made to a temporary relation variable.
Division Operation¶
Definition: Given relations \(r(R)\) and \(s(S)\), such that \(S \subseteq R\), \(r \div s\) is the largest relation \(t(R-S)\) such that
Example
Let \(r(\text{ID}, \text{course}\_id) = \Pi_{\text{ID}, \text{course}\_\text{id}}(\text{takes})\) and \(s(\text{course}\_\text{id}) = \Pi_{\text{course}\_\text{id}}(\sigma_{\text{dept}\_\text{name}=\text{Biology}}(\text{course}))\)
then \(r \div s\) gives us students who have taken all courses in the Biology department.
Can write \(r \div s\) as:
Notes:
-
The result to the right of the \(\leftarrow\) is assigned to the relation variable on the left of the \(\leftarrow\).
-
May use variable in subsequent expressions.

2.4.8 Generalized Projection¶
Extends the projection operation by allowing arithmetic functions to be used in the projection list.
- \(E\) is any relational-algebra expression.
- Each of \(F_1, F_2, \ldots, F_n\) are arithmetic expressions involving constants and attributes in the schema of \(E\).
Example
Given relation instructor(ID, name, dept_name, salary)
where salary is annual salary, get the same information but with monthly salary.
2.4.9 Aggregate Functions and Operations¶
Aggregation function takes a collection of values and returns a single value as a result.
- avg: average value
- min: minimum value
- max: maximum value
- sum: sum of values
- count: number of values
Aggregate Operation in Relational Algebra
\(E\) is any relational-algebra expression.
\(G_1, G_2, \ldots, G_n\) is a list of attributes on which to group (can be empty).
Each \(F_i\) is an aggregate function.
Each \(A_i\) is an attribute name.
Note
Some books/articles use \(\gamma\) instead of \(\mathcal{G}\) (Calligraphic G).
Result of aggregation does not have a name, Can use rename operation to give it a name. For convenience, we permit renaming as part of aggregate operation
2.4.10 Multi-set Relational Algebra¶
Pure relational algebra removes all duplicates e.g. after projection
Multiset (多重集) relational algebra retains duplicates, to match SQL semantics
- SQL duplicate retention was initially for efficiency, but is now a feature
Multi-set relational algebra defined as follows
(1) selection: has as many duplicates of a tuple as in the input, if the tuple satisfies the selection
(2) projection: one tuple per input tuple, even if it is a duplicate
(3) cross product: If there are \(m\) copies of \(t_1\) in \(r\), and \(n\) copies of \(t_2\) in \(s\), there are \(m \times n\) copies of \(t_1.t_2\) \(r \times s\)
(4) set operators
- union: \(m + n\) copies
- intersection: \(\min(m, n)\) copies
- difference: \(\min(0, m - n)\) copies