跳转至

Chapter 6 Entity-Relationship Model

文本统计:约 4452 个字 • 1 行代码

6.1 Database Design Process

**Initial phase (Requirement Specification) ** -- characterize fully the data needs of the prospective database users.

Second phase (Conceptual Design) -- choosing a data model

  • Applying the concepts of the chosen data model
  • Translating these requirements into a conceptual schema of the database.
  • A fully developed conceptual schema indicates the functional requirements of the enterprise. (Describe the kinds of operations (or transactions) that will be performed on the data.)

Final Phase (Database Design)-- Moving from an abstract data model to the implementation of the database

  • Logical Design – Deciding on the database schema.

  • Physical Design – Deciding on the physical layout of the database

Warning

In designing a database schema, we must ensure that we avoid two major pitfalls:

  • Redundancy: a bad design may result in repeat information. Redundant representation of information may lead to data inconsistency among the various copies of information

  • Incompleteness: a bad design may make certain aspects of the enterprise difficult or impossible to model.

Avoiding bad designs is not enough. There may be a large number of good designs from which we must choose.

Database design can be a challenging problem.

设计方法主要包括实体关系模型和规范化理论

Entity Relationship Model: models an enterprise as a collection of entities and relationships

  • Entity: a “thing” or “object” in the enterprise that is distinguishable from other objects, Described by a set of attributes

  • Relationship: an association among several entities

Represented diagrammatically by an entity-relationship diagram:

Normalization Theory: Formalize what designs are bad, and test for them

下面所展示的是 university 的一个 Entity Relationship Model,接下来几节我们都会讨论其中的一些细节

6.2 Database Modeling

A database can be modeled as:

  • a collection of entities
  • relationship among entities.

6.2.1 Entity

An entity is an object that exists and is distinguishable from other objects. Entities have attributes

  • entity like specific person, company, event and attributes like names

An entity set is a set of entities of the same type that share the same properties.

  • example: set of all persons, companies

Representing Entity sets in ER Diagram

Entity sets can be represented graphically as follows:

  • Rectangles represent entity sets.
  • Attributes listed inside entity rectangle
  • Underline indicates primary key attributes

6.2.2 Relationship

A relationship is an association among several entities

A relationship set is a mathematical relation among \(n \ge 2\) entities, each taken from entity sets $$ {(e_1, e_2, … e_n) | e_1 \in E_1, e_2 \in E_2, …, e_n \in E_n} $$ where \((e_1, e_2, … e_n)\) is a relationship

Representing Relationship Sets in ER Diagrams

An attribute can also be property of a relationship set.

Example

For instance, the advisor relationship set between entity sets instructor and student may have the attribute date which tracks when the student started being associated with the advisor 学生和导师的关系还可以附加一个属性:确认关系的时间

6.2.3 Roles

Entity sets of a relationship need not be distinct

Each occurrence of an entity set plays a “role” in the relationship

Example

The labels “course_id” and “prereq_id” are called roles.

6.2.4 Degree of a Relationship Set

Binary relationship(二元联系)

  • involve two entity sets (or degree two).
  • most relationship sets in a database system are binary.

There are occasions when it is more convenient to represent relationships as non-binary.

Warning

当然这些多元关系可以很好地转化为二元关系,这样增加了便携性与扩展性

6.2.5 Attributes

An entity is represented by a set of attributes, that is descriptive properties possessed by all members of an entity set.

instructor = (ID, name, street, city, salary )
course= (course_id, title, credits)

Domain – the set of permitted values for each attribute

Attribute types:

  • Simple (简单)and composite(复合) attributes.
  • Single-valued(单值) and multivalued(多值) attributes, example: multivalued attribute: phone_numbers
  • Derived(派生) attributes. Can be computed from other attributes, example: age, given date_of_birth

Example

右边这个表格中

  • name 和 address 就是复合属性
  • {phone_number} 就是一个多值属性,指的是一个 instructor 可以有多个电话号码
  • age() 就是一个派生属性,可以通过 date_of_birth 得到

6.3 Mapping Cardinality Constraints

Express the number of entities to which another entity can be associated via a relationship set.

Most useful in describing binary relationship sets.

For a binary relationship set the mapping cardinality must be one of the following types: One to One, One to Many, Many to One, Many to Many.

We express cardinality constraints by drawing either a directed line (\(\rightarrow\)), signifying “one,” or an undirected line (—), signifying “many,” between the relationship set and the entity set.

Four types of relationship between an instructor and a student

A student is associated with at most one instructor via the relationship advisor

A instructor is associated with at most one student via the relationship advisor

an instructor is associated with several (including 0) students via advisor

a student is associated with at most one instructor via advisor,

an instructor is associated with at most one student via advisor,

and a student is associated with several (including 0) instructors via advisor

An instructor is associated with several (possibly 0) students via advisor

A student is associated with several (possibly 0) instructors via advisor

Total participation (indicated by double line): every entity in the entity set participates in at least one relationship in the relationship set 每个 entity 必须要在这个关系模式里

Example

Partial participation: some entities may not participate in any relationship in the relationship set

6.3.1 Notation for Expressing More Complex Constraints

A line may have an associated minimum and maximum cardinality, shown in the form \(l..h\), where \(l\) is the minimum and \(h\) the maximum cardinality

  • A minimum value of 1 indicates total participation.
  • A maximum value of 1 indicates that the entity participates in at most one relationship
  • A maximum value of * indicates no limit.

Example

6.3.2 Cardinality Constraints on Ternary Relationship

We allow at most one arrow out of a ternary (or greater degree) relationship to indicate a cardinality constraint

To avoid confusion we outlaw(禁止) more than one arrow. 如果允许从关系指向实体集的多条箭头,可能会导致理解上的混乱。例如,如果同时有箭头指向 instructorstudent,可能无法明确表示具体是哪个实体集的基数受到限制

6.3.3 Primary Key

Primary keys provide a way to specify how entities and relations are distinguished. We will consider:

  • Entity sets
  • Relationship sets
  • Weak entity sets

Primary key for Entity Sets

By definition, individual entities are distinct.

From database perspective, the differences among them must be expressed in terms of their attributes.

The values of the attribute values of an entity must be such that they can uniquely identify the entity.

  • No two entities in an entity set are allowed to have exactly the same value for all attributes.

A key for an entity is a set of attributes that suffice to distinguish entities from each other

Primary key 是标识 entity 在 entity set 的唯一标识

Primary Key for Relationship Sets

To distinguish among the various relationships of a relationship set we use the individual primary keys of the entities in the relationship set.

  • Let R be a relationship set involving entity sets \(E_1, E_2, .. E_n\)
  • The primary key for R is consists of the union of the primary keys of entity sets \(E_1, E_2, .. E_n\)
  • If the relationship set R has attributes \(a_1, a_2, .., a_m\) associated with it, then the primary key of R also includes the attributes \(a_1, a_2, .., a_m\)
  • The choice of the primary key for a relationship set depends on the mapping cardinality of the relationship set.

Example

多对多(Many-to-Many)关系

  • 定义:在多对多关系中,一个实体可以与多个其他实体相关联,反之亦然。
  • 主键选择:在这种情况下,两个参与实体集的主键组合起来形成一个新的复合主键。这个复合主键是一个最小超键(minimal superkey),并被选作关系表的主键。

e.g. 假设有一个“课程”和“学生”之间的多对多关系,表示一个学生可以注册多门课程,一门课程也可以有多个学生注册。那么,关系表的主键将是“学生ID”和“课程ID”的组合。


一对多(One-to-Many)关系

  • 定义:在一个实体集中,一个实体可以与多个其他实体相关联,但另一个实体只能与一个实体关联。
  • 主键选择:在这种情况下,“多”侧实体集的主键作为关系表的主键。这个主键是一个最小超键,并被选作关系表的主键。

e.g. 假设有一个“教师”和“课程”之间的一对多关系,表示一个教师可以教授多门课程,但一门课程只能由一个教师教授。那么,关系表的主键将是“课程ID”。


多对一(Many-to-One)关系

  • 定义:在一个实体集中,多个实体可以与同一个其他实体相关联,但另一个实体只能与一个实体关联。
  • 主键选择:在这种情况下,“多”侧实体集的主键作为关系表的主键。这个主键是一个最小超键,并被选作关系表的主键。

e.g. 假设有一个“订单”和“客户”之间的多对一关系,表示一个客户可以下多个订单,但一个订单只能属于一个客户。那么,关系表的主键将是“订单ID”。


一对一(One-to-One)关系

  • 定义:在一个实体集中,一个实体只能与另一个实体唯一关联,反之亦然。
  • 主键选择:在这种情况下,任意一个参与实体集的主键都可以作为关系表的主键。因为任何一个主键都是一个最小超键。

e.g. 假设有一个“员工”和“工资记录”之间的一对一关系,表示一个员工只有一条工资记录,一条工资记录只对应一个员工。那么,可以选择“员工ID”或“工资记录ID”作为关系表的主键。

Weak Entity Sets

An entity set that does not have a primary key is referred to as a weak entity set.

The existence of a weak entity set depends on the existence of a identifying entity set(标识性实体集)

  • It must relate to the identifying entity set via a total, one-to-many relationship set from the identifying to the weak entity set

  • Identifying relationship(标识性联系) depicted using a double diamond

The discriminator (分辨符,or partial key) of a weak entity set is the set of attributes that distinguishes among all the entities of a weak entity set when the identifying entity they depend is known.

The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence dependent, plus the weak entity set’s discriminator.

Example

Primary key for section – (course_id, sec_id, semester, year)

Note

the primary key of the strong entity set is not explicitly stored with the weak entity set, since it is implicit in the identifying relationship. 强实体集的主键不会显式地存储在弱实体集中,因为这种关系是隐含在识别关系中的。

If course_id were explicitly stored, section could be made a strong entity, but then the relationship between section and course would be duplicated by an implicit relationship defined by the attribute course_id common to course and section

如果course_id被显式存储在section表中,那么section就可以成为一个强实体。但这样会导致sectioncourse之间的关系变得冗余,因为它们之间已经通过共同属性course_id存在一种隐含关系。

6.3.4 Redundant Attributes

Suppose we have entity sets:

  • student, with attributes: ID, name, tot_cred, dept_name
  • department, with attributes: dept_name, building, budget

We model the fact that each student has an associated department using a relationship set stud_dept

The attribute dept_name in student below replicates information present in the relationship and is therefore redundant

这个dept_name 已经在 relationship 中已经阐述了

6.4 Reduction to Relational Schemas

Entity sets and relationship sets can be expressed uniformly as relation schemas that represent the contents of the database.

A database which conforms to an E-R diagram can be represented by a collection of schemas.

6.4.1 Representing Entity Sets With Simple Attributes

A strong entity set reduces to a schema with the same attributes

course(course_id, title, credits)

A weak entity set becomes a table that includes a column for the primary key of the identifying strong entity set

Primary key of the table is the union of the discriminator of the weak entity set and the primary key of the identifying strong entity set

6.4.2 Representing Relationship Sets

A many-to-many relationship set is represented as a schema with attributes for the primary keys of the two participating entity sets, and any descriptive attributes of the relationship set.

6.4.3 Redundancy of Schemas

Many-to-one and one-to-many relationship sets that are total on the many-side can be represented by adding an extra attribute to the “many” side, containing the primary key of the “one” side

为了避免冗余,我们可以直接在instructor实体中添加一个新属性dept_name,该属性包含了department实体中的dept_name。这样,每个教师记录就直接关联到其所属的部门名称,而不需要单独的关系集inst_dept

对于一对一的关系集,任何一方都可以被选择作为“多”的一方:这意味着可以在两个实体集中对应的任意一个表中添加额外的属性。 例如,如果存在一个教师和其办公室的一对一关系,你可以在教师表中添加办公室ID,或者在办公室表中添加教师ID。

如果“多”的一侧是部分参与的,用“多”侧的一个额外属性替换一个模式可能会导致空值(null values):例如,在学生和课程注册的关系中,如果某些学生没有注册任何课程,那么在学生表中添加课程ID可能会导致这些学生的课程ID为空。

6.4.4 Composite and Multivalued Attributes

Composite attributes are flattened out by creating a separate attribute for each component attribute 复合属性直接展开就行了

Ignoring multivalued attributes, extended instructor schema is

instructor(ID, first_name, middle_initial,  last_name,      
         street_number, street_name,  apt_number, 
         city, state, zip_code,date_of_birth, age )

A multivalued attribute M of an entity E is represented by a separate schema EM 多值属性增加一个表

Schema EM has attributes corresponding to the primary key of E and an attribute corresponding to multivalued attribute M 这个表格包含了 E 的主键与那个多值属性

Example

Multivalued attribute phone_number of instructor is represented by a schema:

inst_phone= ( ID, phone_number) 

Each value of the multivalued attribute maps to a separate tuple of the relation on schema EM

For example, an instructor entity with primary key 22222 and phone numbers 456-7890 and 123-4567 maps to two tuples:

(22222, 456-7890) 
(22222, 123-4567) 

Special case: entity time_slot has only one attribute other than the primary-key attribute, and that attribute is multivalued 除去多值属性,只有一个主键,如果我们按照之前讲的方法,我们转化为

time_slot(time_slot_id)
time_slot_detail(time_slot_id, day, start_time, end_time)

在设计数据库时,可以进行以下优化:

  • 不创建与实体对应的表:由于time_slot实体只有一个非主键属性,并且这个属性是多值的,因此不需要为time_slot实体单独创建一个表。
  • 直接创建与多值属性对应的表:将time_slot_detail表作为最终的存储结构,其字段包括time_slot_iddaystart_timeend_time

Warning

外键问题:由于进行了上述优化,section实体中的time_slot属性(来自sec_time_slot关系)不能作为一个外键。这是因为优化后,time_slot实体不再存在,相应的外键关系也无法建立。因为这时候time_slot_id 并不能唯一确定一个time_slot

6.5 Design Issues

6.5.1 Common Mistakes in E-R Diagrams

这一小节主要展示了设计时的一些错误与小技巧

这个就是属性重复而导致的错误

错误地使用关系属性,一门课不可能只有一份作业,可以这样改进

  • 增加一个新的实体 assignment, marks_in:表示学生在某个作业中的成绩。sec_assign:表示作业与课程章节之间的关联。
  • 或者直接将原本的属性改为多值属性

6.5.2 Use of entity sets vs. attributes

对于 instructor 的 phone_number 这一信息,我们可以直接将他存储为属性,也可以存储为一个新的实体,这样的好处在于我们能存储更多有关phone的信息了

6.5.3 Use of entity sets vs. relationship sets

我们之前联系 section 和 student 使用的是 takes, 我们也可以将其实体化,新增一个实体 registration

Possible guideline is to designate a relationship set to describe an action that occurs between entities 描述实体间的动作时常常使用关系集,但是处理更复杂的关系时,还是将其实体化更方便

6.5.4 Placement of relationship attributes

e.g., attribute date as attribute of access or as attribute of account

6.5.5 Binary Vs. Non-Binary Relationships

Binary Relationship:

  • 涉及两个实体之间的关系。
  • 更简单,易于理解和实现。

Non-Binary Relationships

  • 涉及三个或更多实体之间的关系。
  • 能更清晰地展示多个实体共同参与一个单一的关系。

尽管任何非二元关系都可以通过多个二元关系来替代,但n元关系能更直观地表达多个实体间的复杂联系。

非二元关系可能更适合用二元关系表示的情况

有些表面上看起来是多实体参与的关系,实际上可以通过分解成多个二元关系来更好地表示。例如:

父母关系(Parents):

  • 原始情况:一个孩子与父亲和母亲之间的三元关系。
  • 改进方式:可以拆分为两个二元关系——孩子与父亲、孩子与母亲。
  • 优点:允许部分信息存在,如只知道孩子的母亲而不知道父亲。

然而,并不是所有关系都能或应该被简化为二元关系。有些关系本质上就是多实体参与的,例如:项目指导(proj_guide):可能涉及项目、指导教师、学生等多个实体,这些实体之间存在复杂的交互,无法简单地用二元关系来表示。

In general, any non-binary relationship can be represented using binary relationships by creating an artificial entity set. 一般情况下非二元关系都可以转化为二元关系

Converting Non-Binary Relationships to Binary Form

Also need to translate constraints

  • Translating all constraints may not be possible, There may be instances in the translated schema that cannot correspond to any instance of R

Exercise: add constraints to the relationships RA, RB and RC to ensure that a newly created entity corresponds to exactly one entity in each of entity sets A, B and C

We can avoid creating an identifying attribute by making E a weak entity set (described shortly) identified by the three relationship sets E 的唯一性就由它与其他实体之间的关系来保证,而不需要额外的标识属性。

6.6 Extended ER Features

6.6.1 Specialization/Generalization

特化(Specialization): 特化是一种自上而下的设计过程,用于在实体集中创建具有独特特征的子分组。

  • 定义子分组:在较大的实体集中识别出具有特定属性或行为的子集。例如,在“员工”实体集中,可以进一步细分为“经理”和“普通员工”。
  • 属性继承:较低级别的实体集会继承其关联的较高级别实体集的所有属性和关系参与。这意味着“经理”不仅具有“员工”的所有属性(如姓名、ID等),还可能有额外的属性(如管理的部门)。

概化(Generalization): 概化是一种自下而上的设计过程,将多个具有相同特征的实体集合并成一个更高级别的实体集。

  • 合并实体集:将具有共同属性的多个实体集组合成一个更通用的实体集。例如,可以将“全职员工”和“兼职员工”合并为一个更广泛的“员工”实体集。
  • 共享属性:合并后的实体集包含所有子实体集的公共属性。例如,“员工”实体集将包含“全职员工”和“兼职员工”的共同属性,如姓名、ID等。

Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.

评论区

对你有帮助的话请给我个赞和 star => GitHub stars
欢迎跟我探讨!!!