Chapter 3 Introduction to SQL
The SQL data-definition language (DDL) allows the specification of information about relations, including:
- The schema for each relation.
- The domain of values associated with each attribute.
- Integrity constraints.
As we will see later, it can also specify other information, such as:
- The set of indices to be maintained for each relation.
- Security and authorization information for each relation.
- The physical storage structure of each relation on disk.
3.1 Domain Types in SQL
(1) char(n). Fixed-length character string, with user-specified length n.
(2) varchar(n). Variable-length character string, with user-specified maximum length n.
(3) int. Integer (a finite subset of the integers that is machine-dependent).
(4) smallint. Small integer (a machine-dependent subset of the integer domain type).
(5) numeric(p,d). Fixed-point number with user-specified precision of p digits, of which d digits are to the right of the decimal point.
Note
numeric(3,1) allows 44.5 to be stored exactly, but neither 444.5 nor 0.32.
(6) real, double precision. Floating-point and double-precision floating-point numbers, with machine-dependent precision.
(7) float(n). Floating-point number, with user-specified precision of at least n digits.
3.2 Built-in Data Types in SQL
date: A calendar date, containing a (4-digit) year, month, and day.
- Example: date '2005-7-27'
time: Time of day, in hours, minutes, and seconds.
- Examples: time '09:00:30', time '09:00:30.75'
timestamp: Date plus time of day.
- Example: timestamp '2005-7-27 09:00:30.75'
interval: A period of time.
- Subtracting a date/time/timestamp value from another gives an interval value.
- Interval values can be added to date/time/timestamp values.
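3.3 Table Constructs
3.3.1 Create Table Construct
An SQL relation is defined using the create table command, whose general form is create table r (\(A_1\) \(D_1\), \(A_2\) \(D_2\), ..., \(A_n\) \(D_n\), (integrity-constraint\(_1\)), ..., (integrity-constraint\(_k\))), where: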
- \(r\) is the name of the relation
- each \(A_i\) is an attribute name in the schema of relation \(r\)
- \(D_i\) is the data type of values in the domain of attribute \(A_i\)
For example, to create a table of students:
create table student (
ID varchar(5),
name varchar(20) not null,
dept_name varchar(20),
tot_cred numeric(3,0) default 0,
primary key (ID),
foreign key (dept_name) references department );
To create a table recording which courses each student takes:
create table takes (
ID varchar(5),
course_id varchar(8),
sec_id varchar(8),
semester varchar(6),
year numeric(4,0),
grade varchar(2),
primary key (ID, course_id, sec_id, semester, year),
foreign key (ID) references student,
foreign key (course_id, sec_id, semester, year) references section );
Note
sec_id can be dropped from the primary key above to ensure that a student cannot be registered for two sections of the same course in the same semester.
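In database design, a foreign key constraint maintains the relationship between two tables. When a row in the referenced (parent) table is deleted or updated, the change may affect rows in the referencing (child) table. Referential actions specify how such cases are handled. Their meanings are as follows:
ON DELETE actions
- CASCADE: When a row in the parent table is deleted, all child rows that reference it are automatically deleted as well.
- SET NULL: When a row in the parent table is deleted, the foreign key columns of the referencing child rows are set to NULL (provided those columns allow NULL).
- RESTRICT: If any child row references the parent row, the parent row cannot be deleted. The delete is rejected with an error.
- SET DEFAULT: When a row in the parent table is deleted, the foreign key columns of the referencing child rows are set to their default values (provided defaults are defined).
ON UPDATE actions
- CASCADE: When the referenced key of a parent row is updated, all child rows that reference it are automatically updated to match, preserving consistency.
- SET NULL: When the referenced key of a parent row is updated, the foreign key columns of the referencing child rows are set to NULL (provided those columns allow NULL).
- RESTRICT: If any child row references the parent row, its referenced key cannot be updated. The update is rejected with an error.
- SET DEFAULT: When the referenced key of a parent row is updated, the foreign key columns of the referencing child rows are set to their default values (provided defaults are defined).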
Example
Suppose we have two tables, department and employee, where employee has a foreign key dept_name that references the primary key dept_name of department.
CREATE TABLE department (
dept_name VARCHAR(50) PRIMARY KEY,
dept_location VARCHAR(100)
);
CREATE TABLE employee (
emp_id INT PRIMARY KEY,
emp_name VARCHAR(50),
dept_name VARCHAR(50),
FOREIGN KEY (dept_name) REFERENCES department(dept_name)
ON DELETE CASCADE
ON UPDATE CASCADE
);
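In this example:
- ON DELETE CASCADE: If a department is deleted, all employee rows belonging to that department are automatically deleted as well.
- ON UPDATE CASCADE: If a department's name is changed, the dept_name field of every employee row in that department is automatically updated to the new name.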
3.3.2 Drop and Alter Table Constructs
(1) drop table student. Deletes the table and its contents.
(2) delete from student. Deletes all contents of the table, but retains the table itself.
(3) alter table
- alter table r add A D, where A is the name of the attribute to be added to relation r and D is the domain of A. All tuples in the relation are assigned null as the value for the new attribute.
- alter table r drop A, where A is the name of an attribute of relation r. Dropping attributes is not supported by many databases; it is costly and discouraged.
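As a small illustration of the alter table forms above, the sketch below adds and then removes an attribute on the student table. The attribute name birth_date is hypothetical and not part of the original schema, and the exact syntax for dropping a column varies between systems.

```sql
-- Add a hypothetical attribute; every existing tuple gets null for it.
alter table student add birth_date date;

-- Remove it again. Many systems spell this "drop column";
-- some do not support dropping attributes at all.
alter table student drop column birth_date;
```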
3.4 Basic Query Structure
The SQL data-manipulation language (DML) provides the ability to query information and to insert, delete, and update tuples.
A typical SQL query has the form select \(A_1, A_2, \ldots, A_n\) from \(R_1, R_2, \ldots, R_m\) where \(P\), in which:
- \(A_i\) represents an attribute
- \(R_i\) represents a relation
- \(P\) is a predicate.
The result of an SQL query is a relation.
3.4.1 The select Clause
The select clause lists the attributes desired in the result of a query. It corresponds to the projection operation of the relational algebra.
Note
SQL names are case insensitive (i.e., you may use upper- or lower-case letters).
SQL allows duplicates in relations as well as in query results.
(1) To force the elimination of duplicates, insert the keyword distinct after select.
Example: find the names of all departments with instructors, and remove duplicates.
(2) The keyword all specifies that duplicates not be removed.
(3) An asterisk in the select clause denotes "all attributes".
(4) The select clause can contain arithmetic expressions involving the operations +, -, *, and /, operating on constants or attributes of tuples.
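The four points above can be sketched as follows, assuming the university schema used throughout this chapter, in particular instructor(ID, name, dept_name, salary). These are illustrative queries rather than the original listings.

```sql
-- (1) Department names of instructors, with duplicates removed.
select distinct dept_name
from instructor;

-- (2) The same query with duplicates explicitly retained (the default).
select all dept_name
from instructor;

-- (3) All attributes of instructor.
select *
from instructor;

-- (4) An arithmetic expression in the select clause.
select ID, name, salary / 12
from instructor;
```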
3.4.2 The where Clause
The where clause specifies conditions that the result must satisfy. It corresponds to the selection predicate of the relational algebra.
Example
Find all instructors in the Comp. Sci. department with salary > 80000.
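One way to write this query, assuming the relation instructor(ID, name, dept_name, salary) and that the names are what we want to list:

```sql
select name
from instructor
where dept_name = 'Comp. Sci.' and salary > 80000;
```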
Comparison results can be combined using the logical connectives and, or, and not.
Comparisons can be applied to results of arithmetic expressions.
As a further predicate, SQL includes a between comparison operator.
Example
Find the names of all instructors with salary between $90,000 and $100,000 (that is, \(\ge\) $90,000 and \(\le\) $100,000)
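A sketch of this query, again assuming instructor(ID, name, dept_name, salary). The between operator includes both endpoints.

```sql
select name
from instructor
where salary between 90000 and 100000;
```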
SQL also supports tuple comparison:
select name, course_id
from instructor, teaches
where (instructor.ID, dept_name) = (teaches.ID, 'Biology');
3.4.3 The from Clause
The from clause lists the relations involved in the query. It corresponds to the Cartesian product operation of the relational algebra.
The Cartesian product is not very useful directly, but it is useful when combined with a where-clause condition (the selection operation in relational algebra).
Example
Find the course ID, semester, year and title of each course offered by the Comp. Sci. department
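A possible formulation, assuming the relations section(course_id, sec_id, semester, year, ...) and course(course_id, title, dept_name, credits). The where clause turns the Cartesian product into a meaningful join.

```sql
select section.course_id, semester, year, title
from section, course
where section.course_id = course.course_id
  and dept_name = 'Comp. Sci.';
```

The attribute course_id appears in both relations, so it must be qualified with a relation name; dept_name appears only in course and needs no qualifier here.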
We can also specify that relations be combined with a natural join. A natural join matches tuples with the same values for all common attributes, and retains only one copy of each common column.
Warning
Beware of unrelated attributes with the same name, which get equated incorrectly.
For example, list the names of instructors along with the titles of courses that they teach, given the schemas:
course(course_id, title, dept_name, credits)
teaches(ID, course_id, sec_id, semester, year)
instructor(ID, name, dept_name, salary)
Incorrect version (it forces course.dept_name = instructor.dept_name)
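The incorrect version can be reconstructed from the three schemas above. The second natural join matches on both course_id and dept_name, so courses taught by an instructor outside the course's own department are silently dropped.

```sql
-- Incorrect: dept_name is common to (instructor natural join teaches)
-- and course, so it is equated in addition to course_id.
select name, title
from instructor natural join teaches natural join course;
```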
Correct versions
select name, title
from instructor natural join teaches, course
where teaches.course_id = course.course_id;
select name, title
from (instructor natural join teaches) join course using (course_id);
select name, title
from instructor, teaches, course
where instructor.ID = teaches.ID and teaches.course_id = course.course_id;
3.5 Additional Basic Operations
3.5.1 The Rename Operation
SQL allows renaming relations and attributes using the as clause.
For example, one twelfth of the annual salary can be given the name of a monthly salary.
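A sketch of attribute renaming, assuming instructor(ID, name, dept_name, salary). The output name monthly_salary is chosen here for illustration.

```sql
select ID, name, salary / 12 as monthly_salary
from instructor;
```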
Relations can be renamed as well:
select distinct T.name
from instructor as T, instructor as S
where T.salary > S.salary and S.dept_name = 'Comp. Sci.';
The keyword as is optional and may be omitted.
3.5.2 String Operations
SQL includes a string-matching operator for comparisons on character strings. The operator like uses patterns that are described using two special characters:
- percent (%). The % character matches any substring.
- underscore (_). The _ character matches any single character.
Patterns are case sensitive in standard SQL, although some systems compare case-insensitively by default.
Pattern matching examples:
- 'Intro%' matches any string beginning with "Intro".
- '%Comp%' matches any string containing "Comp" as a substring.
- '_ _ _' matches any string of exactly three characters.
- '_ _ _ %' matches any string of at least three characters.
SQL supports a variety of string operations such as
- concatenation (using “||”)
- converting from upper to lower case (and vice versa)
- finding string length, extracting substrings, etc.
Example
Find the names of all instructors whose name includes the substring “dar”.
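A sketch of this query using like, assuming instructor(ID, name, dept_name, salary). The second query is an optional variant that folds case with the standard lower function, in case a case-insensitive match is wanted.

```sql
-- Names containing the substring "dar".
select name
from instructor
where name like '%dar%';

-- Variant: match regardless of case by lower-casing the name first.
select name
from instructor
where lower(name) like '%dar%';
```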
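A pitfall when matching Chinese characters
Chinese characters are stored as multiple bytes (two bytes in encodings such as GBK). If the system matches patterns byte by byte, a match may be reported for a string that does not actually contain the intended Chinese character.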
3.5.3 Ordering the Display of Tuples
The order by clause sorts the tuples of a result, for example alphabetically by name.
We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the default.
We can also sort on multiple attributes.
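Two illustrative queries, assuming instructor(ID, name, dept_name, salary): the first sorts names alphabetically, the second sorts on two attributes with different directions.

```sql
-- Instructor names in ascending (default) alphabetical order.
select distinct name
from instructor
order by name;

-- Sort by department, and within each department by salary, highest first.
select *
from instructor
order by dept_name asc, salary desc;
```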
3.5.4 The limit Clause
The limit clause can be used to constrain the number of rows returned by the select statement.
The limit clause takes one or two numeric arguments, which must both be nonnegative integer constants: limit row_count, or limit offset, row_count (equivalently, limit row_count offset offset).
Example
List the names of instructors whose salary is among the top 3.
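A minimal sketch, assuming instructor(ID, name, dept_name, salary). Note that limit is not part of standard SQL; it is available in systems such as MySQL, PostgreSQL, and SQLite, and other systems use different syntax.

```sql
-- Sort by salary from highest to lowest, then keep the first three rows.
select name
from instructor
order by salary desc
limit 3;
```

If several instructors share the third-highest salary, which of them is returned is not determined by this query.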
3.6 Set Operations
Set operations can be applied to the results of queries.
The set operations union, intersect, and except each automatically eliminate duplicates.
To retain all duplicates, use the corresponding multiset versions union all, intersect all, and except all.
Example
Suppose a tuple occurs m times in r and n times in s. Then it occurs:
- \(m + n\) times in r union all s
- \(\min(m,n)\) times in r intersect all s
- \(\max(0, m - n)\) times in r except all s
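As an illustration of the three operations, the sketch below combines two queries over section(course_id, sec_id, semester, year, ...). The choice of semesters is an assumption made for the example, and not every system supports intersect and except.

```sql
-- Courses offered in Fall 2009 or in Spring 2010 (or both).
(select course_id from section where semester = 'Fall' and year = 2009)
union
(select course_id from section where semester = 'Spring' and year = 2010);

-- Courses offered in both semesters.
(select course_id from section where semester = 'Fall' and year = 2009)
intersect
(select course_id from section where semester = 'Spring' and year = 2010);

-- Courses offered in Fall 2009 but not in Spring 2010.
(select course_id from section where semester = 'Fall' and year = 2009)
except
(select course_id from section where semester = 'Spring' and year = 2010);
```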
3.7 Null Values
null signifies an unknown value or that a value does not exist.
(1) The result of any arithmetic expression involving null is null.
(2) The predicate is null can be used to check for null values.
(3) Comparisons with null values return the special truth value unknown.
(4) Three-valued logic using the truth value unknown:
- OR: (unknown or true) = true, (unknown or false) = unknown, (unknown or unknown) = unknown
- AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown
- NOT: (not unknown) = unknown
- In SQL, "P is unknown" evaluates to true if predicate P evaluates to unknown
(5) The result of a where-clause predicate is treated as false if it evaluates to unknown.
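The rules above can be illustrated with two queries over instructor(ID, name, dept_name, salary), assuming some instructors have a null salary.

```sql
-- Rule (2): instructors whose salary is null.
select name
from instructor
where salary is null;

-- Rules (3)-(5): for a null salary both comparisons are unknown,
-- (unknown or unknown) is unknown, and the tuple is filtered out.
-- So this query does NOT return instructors with a null salary.
select name
from instructor
where salary > 80000 or salary <= 80000;
```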
3.8 Aggregate Functions
These functions operate on the multiset of values of a column of a relation, and return a single value:
- avg: average value
- min: minimum value
- max: maximum value
- sum: sum of values
- count: number of values
Example
Find the average salary of instructors in the Computer Science department
Find the total number of instructors who teach a course in the Spring 2010 semester
Find the number of tuples in the course relation
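The three examples can be sketched as follows, assuming instructor(ID, name, dept_name, salary), teaches(ID, course_id, sec_id, semester, year), and the course relation.

```sql
-- Average salary of instructors in the Computer Science department.
select avg(salary)
from instructor
where dept_name = 'Comp. Sci.';

-- Number of instructors who teach a course in Spring 2010.
-- distinct counts each instructor once, however many sections they teach.
select count(distinct ID)
from teaches
where semester = 'Spring' and year = 2010;

-- Number of tuples in the course relation.
select count(*)
from course;
```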
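3.8.1 Group By
Aggregate functions can also be applied to groups of tuples, for example to compute the average instructor salary in each department.
Attributes in the select clause outside of aggregate functions must appear in the group by list.
Warning
A query is erroneous if it selects an attribute such as ID that is neither an argument of an aggregate function nor listed in group by.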
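A sketch of both cases, assuming instructor(ID, name, dept_name, salary). The output name avg_salary is chosen for illustration, and the erroneous query is left commented out so that the block runs as written.

```sql
-- Average salary per department.
select dept_name, avg(salary) as avg_salary
from instructor
group by dept_name;

-- Erroneous: ID is neither aggregated nor in the group by list.
-- select dept_name, ID, avg(salary)
-- from instructor
-- group by dept_name;
```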
3.8.2 Having Clause
The having clause applies a selection to the results of aggregation.
Example
Find the names and average salaries of all departments whose average salary is greater than 42000.
Predicates in the having clause are applied after the formation of groups, whereas predicates in the where clause are applied before forming groups.
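One way to write the example, assuming instructor(ID, name, dept_name, salary):

```sql
select dept_name, avg(salary)
from instructor
group by dept_name
having avg(salary) > 42000;
```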
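Note
The basic (logical) processing order of an SQL query:
- FROM: determine the data sources.
- WHERE: apply filter conditions to select the qualifying rows.
- GROUP BY: group the result by the specified columns.
- HAVING: apply filter conditions to the groups.
- SELECT: choose the columns or expressions to output.
- ORDER BY: sort the result by the specified columns.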
3.8.3 Null Values and Aggregates
All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes.
What if the collection has only null values?
- count returns 0
- all other aggregates return null
3.9 Nested Sub-queries
SQL provides a mechanism for the nesting of subqueries.
A sub-query is a select-from-where expression that is nested within another query.
A common use of subqueries is to perform tests for:
- set membership
- set comparisons
- set cardinality
3.9.1 Set Membership
Purpose: test whether a value belongs to the set returned by another query.
Find courses offered in Fall 2009 and in Spring 2010.
select distinct course_id
from section
where semester = 'Fall' and year = 2009
and course_id in (select course_id
from section
where semester = 'Spring' and year = 2010);
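3.9.2 Set Comparisons
Purpose: compare a value with the values in a set returned by a subquery, for example to test whether it is greater than some or all of them.
Find names of instructors with salary greater than that of some (at least one) instructor in the Biology department.
One way to write this is: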
select distinct T.name
from instructor as T, instructor as S
where T.salary > S.salary and S.dept_name = 'Biology';
Alternatively, use a nested query with the > some clause:
select name
from instructor
where salary > some (select salary
from instructor
where dept_name = 'Biology');
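Definition of Some Clause
\(F \;\langle\text{comp}\rangle\; \mathsf{some}\; r \;\Leftrightarrow\; \exists\, t \in r \text{ such that } (F \;\langle\text{comp}\rangle\; t)\)
Where \(\langle\text{comp}\rangle\) can be: <, ≤, >, ≥, =, ≠
Correspondingly, there is also an all clause: \(F \;\langle\text{comp}\rangle\; \mathsf{all}\; r \;\Leftrightarrow\; \forall\, t \in r\; (F \;\langle\text{comp}\rangle\; t)\)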
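As an illustration of the all form, the following sketch finds instructors whose salary is greater than that of every instructor in the Biology department, assuming instructor(ID, name, dept_name, salary).

```sql
select name
from instructor
where salary > all (select salary
                    from instructor
                    where dept_name = 'Biology');
```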
3.9.3 Scalar Sub-query
A scalar sub-query is one that is used where a single value is expected.
select name
from instructor
where salary * 10 >
(select budget
from department
where department.dept_name = instructor.dept_name);
A runtime error occurs if the sub-query returns more than one result tuple.
3.9.4 Test for Empty Relations
The exists construct returns the value true if the argument sub-query is nonempty.
\(\text{exists } r \Leftrightarrow r \ne \emptyset\)
\(\text{not exists } r \Leftrightarrow r = \emptyset\)
Example
The earlier example (courses offered in both Fall 2009 and Spring 2010) can be rewritten using the exists keyword.
Example
Find all students who have taken all courses offered in the Biology department.
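Both examples can be sketched as follows, assuming the relations section, student(ID, name, ...), takes(ID, course_id, ...), and course(course_id, title, dept_name, credits). The correlation names S and T are chosen for illustration.

```sql
-- Courses offered in both Fall 2009 and Spring 2010, using exists.
select course_id
from section as S
where semester = 'Fall' and year = 2009
  and exists (select *
              from section as T
              where semester = 'Spring' and year = 2010
                and S.course_id = T.course_id);

-- Students who have taken all Biology courses: there must be no
-- Biology course left over after removing the courses the student took.
select distinct S.ID, S.name
from student as S
where not exists ((select course_id
                   from course
                   where dept_name = 'Biology')
                  except
                  (select T.course_id
                   from takes as T
                   where S.ID = T.ID));
```

The second query relies on the identity that X is a subset of Y exactly when X except Y is empty.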
3.9.5 Test for Absence of Duplicate Tuples
The unique construct tests whether the result of a sub-query contains no duplicate tuples. It evaluates to true on an empty set.
Example
Find all courses that were offered at most once in 2009.
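A sketch of this query, assuming course(course_id, ...) and section(course_id, sec_id, semester, year, ...). Many database systems do not implement the unique predicate, so an equivalent formulation using count is shown as a fallback.

```sql
-- Using unique: true when the sub-query result has no duplicates.
select T.course_id
from course as T
where unique (select R.course_id
              from section as R
              where T.course_id = R.course_id
                and R.year = 2009);

-- Equivalent formulation for systems without unique.
select T.course_id
from course as T
where 1 >= (select count(R.course_id)
            from section as R
            where T.course_id = R.course_id
              and R.year = 2009);
```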
3.9.6 *Sub-queries in the From Clause
SQL allows a sub-query expression to be used in the from clause.
Example
Find the average instructors’ salaries of those departments where the average salary is greater than $42,000.
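One way to write this with a sub-query in the from clause, assuming instructor(ID, name, dept_name, salary). The alias dept_avg is hypothetical; some systems require such an alias on a sub-query in from, while others allow it to be omitted.

```sql
select dept_name, avg_salary
from (select dept_name, avg(salary) as avg_salary
      from instructor
      group by dept_name) as dept_avg
where avg_salary > 42000;
```

Because the filter is applied to the sub-query's result in the outer where clause, no having clause is needed.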
The lateral clause permits a later part of the from clause (after the lateral keyword) to access correlation variables from the earlier part.
select name, salary, avg_salary
from instructor I1,
lateral (select avg(salary) as avg_salary
from instructor I2
where I2.dept_name= I1.dept_name);
3.9.7 *With Clause
The with clause provides a way of defining a temporary view whose definition is available only to the query in which the with clause occurs.
Example
Find all departments with the maximum budget
with max_budget (value) as
(select max(budget)
from department)
select dept_name
from department, max_budget
where department.budget = max_budget.value;
The same query can also be written equivalently without the with clause.
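One equivalent formulation, offered here as an assumption about what such a rewrite could look like, replaces the temporary view with a scalar sub-query over department(dept_name, building, budget):

```sql
select dept_name
from department
where budget = (select max(budget)
                from department);
```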
The with clause is very useful for writing complex queries.
Complex Queries using With Clause
Find all departments where the total salary is greater than the average of the total salary at all departments.
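A sketch of this query with two temporary views, assuming instructor(ID, name, dept_name, salary). The view names dept_total and dept_total_avg are chosen for illustration.

```sql
with dept_total (dept_name, value) as
       (select dept_name, sum(salary)
        from instructor
        group by dept_name),
     dept_total_avg (value) as
       (select avg(value)
        from dept_total)
select dept_name
from dept_total, dept_total_avg
where dept_total.value > dept_total_avg.value;
```

The second view is defined in terms of the first, which keeps each step of the computation separately readable.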
3.10 Modification of the Database
3.10.1 Deletion
Delete all instructors.
Delete all instructors from the Finance department.
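These two deletions can be sketched as follows, assuming instructor(ID, name, dept_name, salary):

```sql
-- Delete all instructors (the relation itself remains, but empty).
delete from instructor;

-- Delete all instructors from the Finance department.
delete from instructor
where dept_name = 'Finance';
```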
Delete all tuples in the instructor relation for those instructors associated with a department located in the Watson building.
delete from instructor
where dept_name in (select dept_name
from department
where building = 'Watson');
Delete all instructors whose salary is less than the average salary of instructors
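A sketch of this deletion with a scalar sub-query, assuming instructor(ID, name, dept_name, salary):

```sql
delete from instructor
where salary < (select avg(salary)
                from instructor);
```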
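Question
Problem: as we delete tuples from instructor, the average salary changes.
Solution used in SQL:
1. First, compute the average salary and find all tuples to delete.
2. Next, delete all tuples found above (without recomputing the average or retesting the tuples).
In other words, the changing average is simply ignored: the average is computed once on the original relation, and the tuples below it are removed.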
3.10.2 Insertion
Add a new tuple to course
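The simplest form lists the values in the order of the attributes in the schema, assuming course(course_id, title, dept_name, credits). The tuple values here are illustrative.

```sql
insert into course
values ('CS-437', 'Database Systems', 'Comp. Sci.', 4);
```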
The attribute names can also be listed explicitly:
insert into course (course_id, title, dept_name, credits)
values ('CS-437', 'Database Systems', 'Comp. Sci.', 4);
Add all instructors to the student relation with tot_cred set to 0.
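A sketch of this insertion, assuming student(ID, name, dept_name, tot_cred) as defined earlier and instructor(ID, name, dept_name, salary):

```sql
insert into student
  select ID, name, dept_name, 0
  from instructor;
```

The statement fails if an instructor's ID already exists in student, because ID is the primary key of student.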
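Warning
The select from where statement is evaluated fully before any of its results are inserted into the relation.
Otherwise, a statement such as insert into table1 select * from table1 could keep reading its own newly inserted tuples, producing duplicates and possibly never terminating.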
3.10.3 Update
Increase salaries of instructors whose salary is over $100,000 by 3%, and give all others a 5% raise.
update instructor
set salary = salary * 1.03
where salary > 100000;
update instructor
set salary = salary * 1.05
where salary <= 100000;
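The order of the two statements above matters: if the 5% raise ran first, an instructor just under $100,000 could cross the threshold and then receive the 3% raise as well. A single statement with a case expression avoids the ordering issue. This is a sketch assuming instructor(ID, name, dept_name, salary).

```sql
update instructor
set salary = case
               when salary <= 100000 then salary * 1.05
               else salary * 1.03
             end;
```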
Updates with Scalar Sub-queries
Example
Recompute and update the tot_cred value for all students.
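A sketch of this update, assuming student(ID, name, dept_name, tot_cred), takes(ID, course_id, sec_id, semester, year, grade), and course(course_id, title, dept_name, credits). It also assumes that credits count only for courses with a grade that is neither 'F' nor null. The alias syntax on update varies between systems.

```sql
update student S
set tot_cred = (select sum(credits)
                from takes natural join course
                where S.ID = takes.ID
                  and takes.grade <> 'F'
                  and takes.grade is not null);
```

For a student who has not passed any course, the sub-query returns null, so tot_cred becomes null; wrapping the sum as coalesce(sum(credits), 0) would store 0 instead.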