Chapter 1 Introduction¶
1.1 Database Systems¶
Applications built on files vs. built on databases
Database is a collection of interrelated data about a enterprise, which is managed by a DBMS(Database Management System).
The primary goal of a DBMS is to provide a way to store and retrieve database information that is both convenient and efficient.
Management of data involves both defining structures for storage of information and providing mechanisms for the manipulation of information.
The database system must ensure the safety of the information stored, despite system crashes or attempts at unauthorized access.
If data are to be shared among several users, the system must provide concurrency control mechanisms to avoid possible anomalous results.
1.2 Database Application¶
Database Applications(数据库应用,数据库应用系统)
(1) Enterprise Information
-
Sales: customers, products, purchases
-
Accounting: payments, receipts, assets
-
Human Resources: employees, salaries, payroll taxes.
(2) Manufacturing: production, inventory, orders, supply chain
(3) Banking: customers, accounts, loans, credit cards ,transactions
(4) Universities: instructors, students, courses, registration, grades
(5) Airlines: reservations, schedules
(6) Web-based services
-
On-line retailers: order tracking, customized recommendations
-
On-line advertisements
Databases can be very large: Big Data (大数据)
Volume (容量)Variety (种类)Velocity (速度)Value(价值)
Data-driven Artificial Intelligence: AI 2.0 (人工智能 2.0)
GAI: Generative Artificial Intelligence(生成式人工智能)LLM : Large Language Model (大语言模型)
1.3 Purpose of Database Systems¶
In the early days, database applications were built directly on top of file systems,which leads to:
- Data redundancy(数据冗余) and inconsistency(不一致)Multiple file formats, duplication of information in different files
- Data isolation(数据孤立,数据孤岛) — multiple files and formats
- Difficulty in accessing data (存取数据困难)Need to write a new program to carry out each new task
Integrity problems(完整性问题)
- Integrity constraints become “buried” in program code rather than being stated explicitly(显式的)
- Example: “account balance >=1”
- Hard to add new constraints or change existing ones
Atomicity problems(原子性问题)
- Failures may leave database in an inconsistent state with partial updates carried out
- Example: Transfer of funds from one account to another should either complete or not happen at all

Concurrent access anomalies(并发访问异常)
- Concurrent access needed for performance
- Uncontrolled concurrent accesses can lead to inconsistencies
- Example: Two people reading a balance (say 100) and updating it by saving money (say 50 each) at the same time

Security problems(安全性问题)
- Hard to provide user access to some, but not all, data
- Authentication(认证)
- Priviledge (权限)
- Audit(审计)
1.4 View of Data¶
Three-level abstraction of databases
优势:(1) Hide the complexities (2) Enhance the adaptation to changes——可以通过调整 mapping 来适应变化
Schema and Instance¶
Similar to types and variables in programming languages
Schema (模式)– the logical structure of the database
-
Example: The database consists of information about a set of customers and accounts and the relationship between them
-
Analogous to type information of a variable in a program
-
Physical schema(物理模式): database design at the physical level
-
Logical schema(逻辑模式): database design at the logical level
Instance(实例) – the actual content of the database at a particular point in time
- Analogous to the value of a variable
1.5 Data Models¶
A collection of tools for describing
- Data (数据)
- Data relationships(联系)
- Data semantics(语义)
- Data constraints(约束)
Relational model (关系模型) 【重点学习】
Entity-Relationship (实体-联系) data model
Object-based data models
-
Object-oriented (面向对象数据模型)
-
Object-relational (对象-关系模型模型)
Semistructured data model (XML) (半结构化数据模型)
Other older models:
-
Network model (网状模型)
-
Hierarchical model(层次模型)
1.6 Database Languages¶
-
Data Definition Language (DDL)
-
Data Manipulation Language (DML)
-
SQL Query Language
-
Application Program Interface (API)
Data Definition Language (DDL)¶
Specification notation for defining the database schema
DDL compiler generates a set of table templates stored in a data dictionary(数据字典)
Data dictionary contains metadata (元数据, i.e., data about data)
- Database schema
- Integrity constraints(完整性约束) Primary key (ID uniquely identifies instructors)(主健) Referential integrity (references constraint in SQL)(参照完整性)e.g. dept_name value in any instructor tuple must appear in department relation
- Authorization(权限)
Data Manipulation Language (DML)¶
Language for accessing and manipulating the data organized by the appropriate data model
- DML also known as query language
Two classes of languages
-
Procedural(过程式) – user specifies what data is required and how to get those data
-
Declarative (nonprocedural, 陈述式,非过程式) – user specifies what data is required without specifying how to get those data
SQL is the most widely used query language
Example
Find the name of the instructor with ID 22222
Application Program Interface (API)¶
Non-procedural query languages such as SQL are not as powerful as a universal Turing machine.
SQL does not support actions such as input from users, output to displays, or communication over the network.
Such computations and actions must be written in a host language, such as C/C++, Java or Python.
Application programs generally access databases through one of
- Language extensions to allow embedded SQL
- API(Application program interface) (e.g., ODBC/JDBC) which allow SQL queries to be sent to a database
1.7 Database Design¶
Entity Relationship Model (实体-联系模型)
- Models an enterprise as a collection of data entities and relationships
- Represented diagrammatically by an entity-relationship diagram.

Normalization Theory(规范化理论): Formalize what designs are bad, and test for them
1.8 Database Engine¶
A database system (database engine) is partitioned into modules that deal with each of the responsibilities of the overall system.
The functional components of a database system can be divided into
- The storage manager
- The query processor
- The transaction management component.
Storage Manager¶
A program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
The storage manager is responsible to the following tasks:
- Interaction with the OS file manager
- Efficient storing, retrieving and updating of data
The storage manager components include:
- File manager
- Buffer manager
- Authorization and integrity manager
- Transaction manager
The storage manager implements several data structures as part of the physical system implementation:
- Data files -- store the database itself
- Data dictionary -- stores metadata about the structure of the database, in particular the schema of the database.
- Indices -- can provide fast access to data items. A database index provides pointers to those data items that hold a particular value.
- Statistical data
Query Processor¶
The query processor components include:
- DDL interpreter -- interprets DDL statements and records the definitions in the data dictionary.
-
DML compiler -- translates DML statements in a query language into an evaluation plan consisting of low-level instructions that the query evaluation engine understands. (The DML compiler performs query optimization; that is, it picks the lowest cost evaluation plan from among the various alternatives.)
-
Query evaluation engine -- executes low-level instructions generated by the DML compiler.
Transaction Management¶
A transaction is a collection of operations that performs a single logical function in a database application.
Recover Manager ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.
Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.
1.9 Database Users and Administrators¶

Database Users¶
Users are differentiated by the way they expect to interact with the system
(1) Application programmers – interact with system through DML calls
(2) Naive users – invoke one of the permanent application programs that have been written previously e.g. people accessing database over the web, bank tellers, clerical staff
(3) Database Administrator - Coordinates all the activities of the database system; the database administrator has a good understanding of the enterprise’s information resources and needs.
Database Administrator (DBA)¶
Database administrator's duties include:
(1) Schema definition
(2) Storage structure and access method definition
(3) Schema and physical organization modification
(4) Granting user authority to access the database
(5) Routine maintenance
- Performance Tuning - Monitoring performance and responding to changes in requirements
- Periodically backing up the database onto remote servers
- Ensuring that enough free disk space is available for normal operations, and upgrading disk space as required