Chapter 6: Data-Base Modeling and Applications
Introduction
We use the term Data Base to mean the
collected data sets that are organized and
stored as an integral part of a firm’s computer-based information system
Data Sets are flexible data structures that
include groupings of data that are logically
related
The Database Approach to
Data Storage
A Database is a set of computer files that
minimizes data redundancy and is accessed by
one or more application programs for data
processing
The database approach to data storage applies
whenever a database is established to serve two
or more applications, organizational units, or
types of users
A Database Management System (DBMS) is a
computer program that enables users to create,
modify, and utilize database information
efficiently
Characteristics of the
Database Approach
Data Independence - the separation of the
data from the various application programs and
other accesses by users
Data Standardization - data elements within a
database have standard definitions, thus stored
data are compatible with every application
program that accesses the data
One-Time Data Entry and Storage - individual data values are entered into the
database only once; consequently, redundancy
is reduced and inconsistencies between data
elements are eliminated
Characteristics of the
Database Approach
Data Integration - data sets integrate the data,
which enables all affected data sets to be updated
simultaneously
Shared Data Ownership - all data within a
database are owned in common by the users. The
portion of the database that is of interest to each user
is known as the sub-schema
Centralized Data Management - the database
management system stands guard over the database
and presents the logical view to users and
application programs
Figure 6-1: Program-Data Independence (diagram elements: Application Program A, Application Program B, Database Management System, Database)
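The arrangement in Figure 6-1 can be sketched in code. This is a minimal illustration only, assuming Python's built-in sqlite3 module stands in for the DBMS; the table, view, and function names (employee, payroll_view, directory_view, program_a, program_b) and the sample row are hypothetical and do not come from the chapter. Each program reads its own logical view (its sub-schema), so a change to the stored table does not force a change to either program.

```python
import sqlite3

# sqlite3 plays the role of the DBMS; all names and data are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, salary REAL, phone TEXT);
    INSERT INTO employee VALUES (1, 'Ann Lee', 52000, '555-0101');
    -- Each view is one user's sub-schema: the part of the database that user needs.
    CREATE VIEW payroll_view AS SELECT emp_no, name, salary FROM employee;
    CREATE VIEW directory_view AS SELECT name, phone FROM employee;
""")

def program_a(db):
    """Hypothetical payroll application: reads only the payroll sub-schema."""
    return db.execute("SELECT name, salary FROM payroll_view").fetchall()

def program_b(db):
    """Hypothetical directory application: reads only the directory sub-schema."""
    return db.execute("SELECT name, phone FROM directory_view").fetchall()

# Changing the stored table (adding a column) leaves both programs untouched.
conn.execute("ALTER TABLE employee ADD COLUMN hire_date TEXT")
payroll_rows = program_a(conn)
directory_rows = program_b(conn)
```

Both programs keep working after the ALTER TABLE statement because neither refers to the physical table directly.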
Questions for Database
Design and Construction
What data management perspective should be adopted?
What is the proposed system’s initial objective?
What systems and users will use the data?
Which existing or future systems will the proposed system
interface with?
How much data will be stored initially? In the future?
How many data accesses (reads and updates) will occur on an
hourly, daily, and monthly basis?
How can the data be organized, both logically and physically,
to best serve the users of the system?
Iterative Phases in Database
Development: Planning & Analysis
Planning
Cost-benefit Analysis
Effective usage Analysis
Analysis
Enterprise Diagram
User Requirements
Data requirements
  • Firm’s operations and relationships
Development of logical design
  • Expected output requirements
  • Inputs
  • Processes
  • Appropriate Conceptual Model
  • Data Modeling through Entity-Relationship Diagrams
Specification of logical view(s)
Designation of Primary and Secondary keys
Development of Data Dictionary
Iterative Phases in Database
Development: Detailed Design
Technical Specifications
Report Layouts
Data Flows
Screen Layouts
DBMS Selection
  • Data Definition Language (DDL)
  • Data Manipulation Language (DML)
  • Query language [Structured Query Language (SQL) and/or Query by Example (QBE)]
  • Data-base Control System (DBCS)
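The difference between a DDL, a DML, and a query language can be seen in a few statements. The sketch below assumes Python's built-in sqlite3 module as the DBMS; the customer table and its columns are hypothetical. Note that many DBMS products fold querying into the DML, whereas the list above treats them separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of the data (hypothetical 'customer' table).
conn.execute(
    "CREATE TABLE customer ("
    "cust_no INTEGER PRIMARY KEY, name TEXT NOT NULL, credit_limit INTEGER)"
)

# DML: insert and update the stored data.
conn.execute("INSERT INTO customer VALUES (?, ?, ?)", (1000, "Adam Smith", 1000))
conn.execute("UPDATE customer SET credit_limit = ? WHERE cust_no = ?", (1500, 1000))

# Query language (SQL): retrieve data without naming an access path.
rows = conn.execute(
    "SELECT name, credit_limit FROM customer WHERE cust_no = 1000"
).fetchall()
```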
DBMS
Many DBMS packages allow users to:
Analyze Data
Prepare ad hoc or customized Reports
Create and Display Graphs
Create Customized Applications via
Programming Languages
Import and Export Data
Perform On-line Editing
Purge or Archive Obsolete Data
Backup data
Maintain Security Measures
Interface with Communication Networks
Iterative Phases in Database
Development: Post-Design Phases
Implementation
Testing
Unit Testing
System Testing
User Acceptance Test
Maintenance
DATA MODELING
So far we have not considered the data
required to support the data flows and
processing
Data Modeling is a technique for
organising and documenting a system’s
data.
Why is data modeling
considered crucial?
Data is a resource to be shared by as many
processes as possible, and thus must be
organized in a way that is flexible and
adaptable to unanticipated business
requirements
Data structures and properties are
reasonably permanent compared with the
processes that use the data.
Why is data modeling
considered crucial (cont)?
Data models are much smaller than
process and object models and can be
constructed more rapidly.
The process of constructing data models
helps analysts and users quickly reach
consensus on business terminology and
rules.
Conceptual Data Model
Model that captures overall structure of
organizational data
Independent of database management
systems
Conceptual Data Model
(cont)
Collect information from
Interviews
Questionnaires
JAD sessions
Develop Conceptual data model (logic)
Data Modeling (cont)
There are several formats for data
modeling (e.g., ERDs, object-oriented models).
A common one is called an entity-relationship
diagram (ERD) because it
depicts data in terms of the entities and
relationships described by the data.
INTRODUCTION TO E-R
MODELING
Entity Relationship Data Model
Detailed logical representation of
entities, associations, and data
elements for an organization or
business area
ERD
Graphical representation of ER model.
FUNDAMENTALS OF E-R
MODELING
Basic E-R Modeling notation uses three
main constructs
entities
associated attributes
relationships
ENTITIES
An ENTITY is something about which we
want to store data.
Synonyms include entity type and entity
class.
An entity is a class of persons, places,
objects, events, or concepts about which we
may need to capture and store data.
ENTITIES (cont)
Each entity is distinguishable from the
other entities.
Each entity represents a group of many
instances of that entity
An entity instance is a single occurrence
of an entity.
ATTRIBUTES
If an entity is something about which we
want to store data, then intuitively, we
need to identify what specific pieces of
data we want to store about each
instance of a given entity.
These pieces of data are ATTRIBUTES.
ATTRIBUTES (cont)
An attribute is a descriptive property or
characteristic of an entity.
Synonyms include element, property, and
field.
ATTRIBUTES (cont)
Some attributes can be logically grouped
into super-attributes called compound
attributes.
A compound attribute is one that actually
consists of more primitive attributes.
e.g., name = first name + family name
ATTRIBUTES (cont)
MULTIVALUED ATTRIBUTES
An attribute that may take on more than
one value for each entity instance
(sometimes termed a repeating group)
e.g., a student may be enrolled in more
than one major
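The terms entity, instance, compound attribute, and multivalued attribute can be pictured with a small data structure. This is only a sketch: the class and field names (Student, Name, student_no, majors) and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Name:
    """Compound attribute: built from more primitive attributes."""
    first_name: str
    family_name: str

@dataclass
class Student:
    """Entity; each Student object is one entity instance."""
    student_no: int                                   # hypothetical identifying attribute
    name: Name                                        # compound attribute
    majors: List[str] = field(default_factory=list)   # multivalued attribute

# One instance of the STUDENT entity (sample values are hypothetical).
s = Student(1, Name("Penny", "Pasta"), ["Latin", "Greek"])
```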
IDENTIFICATION
An entity typically has many instances.
Conceptually, there exists a need to uniquely
identify each instance based on the data value
of one or more attributes.
Thus, every entity must have an identifier or
key.
KEYS & IDENTIFIERS
A KEY is an attribute, or a group of
attributes, that assumes a unique value
for each entity instance.
An entity may have more than one key.
A candidate key is a “candidate to
become the primary identifier” of
instances of an entity.
KEYS & IDENTIFIERS (cont)
The candidate key selected as the
unique identifier is the primary key
Any candidate key that is not selected
to become the primary key is called an
alternate key.
A group of attributes that uniquely
identifies an instance of an entity is
called a concatenated key.
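A rough way to explore candidate keys is to test which minimal attribute combinations are unique across a set of sample instances. The sketch below does that; the function names and the sample STUDENT data are hypothetical. Uniqueness in sample data only suggests a candidate key; whether an attribute is truly a key is a business rule that must be confirmed with users.

```python
from itertools import combinations

def is_unique(rows, attrs):
    """True if the combination of attrs takes a distinct value in every row."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows):
    """Return the minimal attribute sets that are unique across the given rows."""
    attrs = sorted(rows[0])
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_unique(rows, combo):
                keys.append(combo)
    return keys

# Hypothetical sample instances of a STUDENT entity.
students = [
    {"student_no": 1, "email": "a@example.edu", "first": "Ann", "family": "Lee"},
    {"student_no": 2, "email": "b@example.edu", "first": "Ann", "family": "Wu"},
    {"student_no": 3, "email": "c@example.edu", "first": "Bo", "family": "Lee"},
]
keys = candidate_keys(students)
```

In this sample, email and student_no are single-attribute candidate keys, and (family, first) is a concatenated key. Choosing student_no as the primary key would leave the other two as alternate keys.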
RELATIONSHIPS
Conceptually, entities and attributes do
not exist in isolation.
The things they represent interact with
and impact one another.
Thus we introduce the concept of a
RELATIONSHIP.
RELATIONSHIPS (cont)
A relationship is an association that exists
between one or more entities.
The relationship may represent an event that
links the entities or merely a logical affinity
that exists between the entities.
• A STUDENT IS ENROLLED in one or more DEGREES.
• A DEGREE IS BEING STUDIED BY zero, one, or more
STUDENTS.
RELATIONSHIPS (cont)
The capitalized verb phrases (IS ENROLLED, IS BEING STUDIED BY) define the
relationships that exist between the two
entities.
Notice that all relationships are implicitly
bidirectional, meaning they can be
interpreted in both directions
DEGREE OF A RELATIONSHIP
Another measure of the complexity of a
data relationship is its degree.
The degree of a relationship is the
number of entities that participate in the
relationship.
DEGREE OF A RELATIONSHIP
(Cont)
Relationships may also exist between
different instances of the same entity.
We call this a recursive relationship or
UNARY RELATIONSHIP
The degree of the relationship = 1.
DEGREE OF A RELATIONSHIP
(Cont)
A relationship that exists between
instances of two different entities is a
BINARY RELATIONSHIP
The degree of the relationship = 2
DEGREE OF A RELATIONSHIP
(Cont)
Relationships can also exist between more
than two different entities.
These are sometimes called N-ary
relationships, e.g., ternary relationships
An N-ary relationship is illustrated with a
new entity construct called an associative
entity.
[Figure: Relationships of Different Degrees]
CARDINALITIES OF RELATIONSHIPS
Cardinality shows the complexity of
each relationship.
Cardinality defines the minimum and
maximum number of instances of one
entity that may be associated with
each instance of the related entity.
CARDINALITIES (Cont)
Since all relationships are bi-directional,
cardinality must be defined in both
directions for every relationship.
Thus we must consider whether each direction is
optional or mandatory, i.e., whether the minimum is 0 or 1
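Minimum and maximum cardinality can be checked against sample data by counting related instances in each direction. The sketch below is illustrative only: the function name and the sample students, degrees, and enrolments are hypothetical. Observed counts describe the sample; the modelled cardinality is still a business rule.

```python
from collections import Counter

def observed_cardinality(instances, pairs, side):
    """Return (min, max) related instances per instance on the given side.

    instances: all instances of the entity on that side
    pairs: (left, right) tuples recording which instances are related
    side: 0 to count per left instance, 1 to count per right instance
    """
    counts = Counter(p[side] for p in pairs)
    per_instance = [counts.get(i, 0) for i in instances]
    return min(per_instance), max(per_instance)

# Hypothetical sample data for STUDENT IS ENROLLED in DEGREE.
students = ["s1", "s2", "s3"]
degrees = ["BSc", "BA", "MBA"]
enrolled = [("s1", "BSc"), ("s2", "BSc"), ("s2", "BA"), ("s3", "BA")]

degrees_per_student = observed_cardinality(students, enrolled, 0)
students_per_degree = observed_cardinality(degrees, enrolled, 1)
```

Here every student has at least one degree (minimum 1, mandatory), while one degree has no students (minimum 0, optional), matching the STUDENT and DEGREE example given earlier.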
[Figure: Sample ER Diagram]
Associative Entities
Sometimes attributes may be associated
with a many-many relationship
An associative entity (gerund) is a
relationship that the data modeller
chooses to model as an entity type.
Associative Entities (cont)
You must turn the relationship into an
associative entity if the relationship
is involved in relationships with additional
entities
Note the M:M is replaced by two
mandatory 1:M relationships
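Replacing an M:M relationship with an associative entity can be sketched as three tables. The example assumes Python's built-in sqlite3 module; the table and column names (student, degree, enrolment, date_enrolled) and all sample rows are hypothetical. The enrolment table sits on the "many" side of two 1:M relationships and carries an attribute of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE student (student_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE degree  (degree_code TEXT PRIMARY KEY, title TEXT);
    -- Associative entity: one row per STUDENT/DEGREE pairing, plus its own attribute.
    CREATE TABLE enrolment (
        student_no    INTEGER NOT NULL REFERENCES student(student_no),
        degree_code   TEXT    NOT NULL REFERENCES degree(degree_code),
        date_enrolled TEXT,
        PRIMARY KEY (student_no, degree_code)
    );
    INSERT INTO student VALUES (1, 'Penny Pasta'), (2, 'Connie Curry');
    INSERT INTO degree  VALUES ('BA', 'Arts'), ('BSc', 'Science');
    INSERT INTO enrolment VALUES
        (1, 'BA', '2024-02-01'), (1, 'BSc', '2024-02-01'), (2, 'BA', '2024-07-15');
""")

# One student, many enrolments; one degree, many enrolments.
degrees_of_student_1 = [r[0] for r in conn.execute(
    "SELECT degree_code FROM enrolment WHERE student_no = 1 ORDER BY degree_code")]
students_in_ba = [r[0] for r in conn.execute(
    "SELECT student_no FROM enrolment WHERE degree_code = 'BA' ORDER BY student_no")]
```

The concatenated primary key (student_no, degree_code) prevents the same pairing from being stored twice.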
Relational Databases
In a relational database, data are perceived by
users to be structured in the form of simple flat
files or tables
Each table consists of records that are
comprised of a key and associated data
elements
To qualify as a relational database, a system
must do the following:
Present data to users as tables only
Support the relational algebra functions of
Restrict (Select), Project, and Join without
requiring any definitions of access paths to support
these operations
Relational Algebra Functions in
a Relational Database - Select
Select (Restrict): This function produces a
new table with only rows from a single
source table whose columns meet
prescribed conditions, e.g.,
Customer_Name=Adam Smith;
DOB=2/29/64; Legal Residence=California,
etc
Select
Cust No.  Cust. Name   Date of Birth  Credit Limit  Legal Res.
1000      Adam Smith   3-12-62        1000          CA
1010      Lord Keynes  2-29-64        2000          TX
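Select can be sketched as a row filter. The function name select and the dictionary keys are hypothetical; the two rows are the ones in the sample customer table above, and treating them as the source table is an assumption.

```python
def select(table, predicate):
    """Relational Select (Restrict): keep the rows for which predicate is true."""
    return [row for row in table if predicate(row)]

# Rows taken from the sample customer table; key names are illustrative.
customers = [
    {"cust_no": 1000, "name": "Adam Smith", "dob": "3-12-62",
     "credit_limit": 1000, "legal_res": "CA"},
    {"cust_no": 1010, "name": "Lord Keynes", "dob": "2-29-64",
     "credit_limit": 2000, "legal_res": "TX"},
]

# Condition: Legal Residence = California.
californians = select(customers, lambda r: r["legal_res"] == "CA")
```

The result has the same columns as the source table but only the rows that meet the condition.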
Relational Algebra Functions in
a Relational Database - Project
This function produces a new table with only
some columns from a single source table. e.g.,
Project Student table on Student_Name and
Student_Major
Student_Name            Student_Major
Estudiante Garcia       French
Madeleine Notallbright  International Relations
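Project can be sketched as a column filter. The function name project is hypothetical, and the Student_Status values in the source rows are invented for illustration, since the example shows only the projected result.

```python
def project(table, columns):
    """Relational Project: keep only the named columns, dropping duplicate rows."""
    result = []
    for row in table:
        reduced = {c: row[c] for c in columns}
        if reduced not in result:
            result.append(reduced)
    return result

# Names and majors come from the example; Student_Status values are hypothetical.
students = [
    {"Student_Name": "Estudiante Garcia", "Student_Major": "French",
     "Student_Status": "Senior"},
    {"Student_Name": "Madeleine Notallbright",
     "Student_Major": "International Relations", "Student_Status": "Junior"},
]

projected = project(students, ["Student_Name", "Student_Major"])
```

Duplicate rows are dropped because a relational table is a set of rows; two source rows that differ only in a discarded column collapse into one.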
Relational Algebra Functions in a
Relational Database - Select & Project
The combination of Select and Project
produces a new table with both fewer
columns and rows than the original
table. e.g., Project on Student_Name
and Student_Major where
Student_Major = Latin
Select & Project
Student_Name   Student_Major  Student_Status
Penny Pasta    Latin          Senior
Connie Curry   Greek          Freshman
Tony Lama      Tibetan        Junior

Student_Name   Student_Major
Penny Pasta    Latin
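The combined operation can be sketched as a row filter followed by a column filter. The function name select_project is hypothetical; the rows are those of the example, paired by their order of appearance, which is an assumption about the original layout.

```python
def select_project(table, predicate, columns):
    """Apply Select (row filter), then Project (column filter)."""
    return [{c: row[c] for c in columns} for row in table if predicate(row)]

# Rows from the example source table, paired in order of appearance.
students = [
    {"Student_Name": "Penny Pasta", "Student_Major": "Latin",
     "Student_Status": "Senior"},
    {"Student_Name": "Connie Curry", "Student_Major": "Greek",
     "Student_Status": "Freshman"},
    {"Student_Name": "Tony Lama", "Student_Major": "Tibetan",
     "Student_Status": "Junior"},
]

result = select_project(
    students,
    lambda r: r["Student_Major"] == "Latin",
    ["Student_Name", "Student_Major"],
)
```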
Relational Algebra Functions
in a Relational Database - Join
The Join function produces a new table
from two or more source tables that
have at least one common column
The new table is wider than either of the
two source tables because it contains
the columns from both source tables (the common column appears once)
Join
Customer_Name  Customer_Code
John Doe       1001
+
Customer_Code  Credit_Limit
1001           10,000
=
Customer_Name  Customer_Code  Credit_Limit
John Doe       1001           10,000
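Join can be sketched as matching rows from two tables on their common column. The function name natural_join is hypothetical and the nested loop is the simplest possible approach, not how a DBMS would necessarily perform it; the rows are those of the example.

```python
def natural_join(left, right, on):
    """Join two tables on a shared column; that column appears once in the result."""
    joined = []
    for left_row in left:
        for right_row in right:
            if left_row[on] == right_row[on]:
                # Merging the dictionaries keeps a single copy of the shared column.
                joined.append({**left_row, **right_row})
    return joined

# The two source tables from the example.
names = [{"Customer_Name": "John Doe", "Customer_Code": 1001}]
limits = [{"Customer_Code": 1001, "Credit_Limit": 10000}]

result = natural_join(names, limits, "Customer_Code")
```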
Query Languages for a
Relational Database
Structured Query Language (SQL)
    SELECT CLIENT_NO, CLIENT_NAME, PROJECT_NAME
    FROM PROJ.TABL
    WHERE CLIENT_NO = 531
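To try the query, one option is Python's built-in sqlite3 module. This is a sketch under stated assumptions: the column types and the two sample rows are hypothetical, since the contents of PROJ.TABL are not shown, and an in-memory database is attached under the name PROJ so that the qualified name PROJ.TABL resolves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Attach a second in-memory database under the schema name PROJ.
conn.execute("ATTACH DATABASE ':memory:' AS PROJ")
conn.execute(
    "CREATE TABLE PROJ.TABL (CLIENT_NO INTEGER, CLIENT_NAME TEXT, PROJECT_NAME TEXT)"
)

# Hypothetical rows; the original does not show the table's contents.
conn.executemany(
    "INSERT INTO PROJ.TABL VALUES (?, ?, ?)",
    [(531, "Acme Ltd", "Payroll Upgrade"), (532, "Globex", "Inventory Audit")],
)

# The query from the text: a Select (WHERE) combined with a Project (column list).
rows = conn.execute("""
    SELECT CLIENT_NO, CLIENT_NAME, PROJECT_NAME
    FROM PROJ.TABL
    WHERE CLIENT_NO = 531
""").fetchall()
```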
Thanks.