Advanced Data Base Managementand Information Assurance
1
2
Grading
´ Catalogies
´ Assignments
40%
´ One Midterms
20%
´ Research Paper
15%
´ Group Project
25%
´ Letter Grades
´ 94% A, 90% A-,
´ 86% B+, 83% B, 80% B-,
´ 76% C+, 73% C, 70 C-,
´ 66 % D+, 63% D, 60% D-,
´ < 60% F
´ No Rounding Off at the final grade
3
´ Midterm, November 13, 2019
´ Group Project Presentation, December 4, 2019
4
Assignment Schedule
´ Research Paper Topics and references – October 2, 2019
´ Assignment 1– October 11, 2019
´ Assignment 2 -- October 25, 2019
´ Assignment 3 – November 8, 2019
´ Assignment 4 – November 22, 2019
´ Research Paper – November 6, 2019
5
Research Paper
´ You are required to write a 10-page research paper in MLA Style.
´ The paper topics includes
´ Security challenges in DBMS
´ Legal and ethical issues in database management
´ Utilization and impacts with business intelligence
´ Research Paper due November 6, 2019 via Blackboard
6
Fundamental Concepts of
Database Management
Chapter 1
9
Introduction
´ Application of Database Technology
´ Key definitions
´ File Versus database approach to data management
´ Elements of a database system
´ Advantages of database systems and database
management
10
What is Data?
´ Information in raw or unorganized form (such as
alphabets, numbers, or symbols) that refer to, or
represent, conditions, ideas, or objects. Data is limitless
and present everywhere in the universe.
´ Computers: Symbols or signals that are input, stored, and
processed by a computer, for output as usable
information.
11
Difficulties in Managing Data
´ Data increases exponentially with time
´ Multiple sources of data
´ New sources of data
´ Data rot, or data degradation
´ Data security, quality, and integrity
´ Government Regulation
12
Data Governance
´ An approach to managing information across an entire
organization.
´ Master Data Management
´ Master Data
13
DIKW Pyramid
´ Data
´ Information
´ Knowledge
´ Wisdom
14
DIKW Pyramid
15
Principles of Modern Data Architecture
´ Data is commodity and shared asset
´ Ensure privacy and proper use
´ Ensure security and access control
´ Information through data stewardship
´ Ensure a common vocabulary
´ Provide the right interface for consumption
´ Eliminate data copies and movement
16
What is Database?
´ Collection of data central to some enterprise
´ Essential to operation of enterprise
´ Contains the only record of enterprise activity
´ An asset in its own right
´ Historical data can guide enterprise strategy
´ Of Interest to other enterprises
´ State of database mirrors state of enterprise
´ Database is persistent
17
Applications of Database Technology
´ Storage and retrieval of traditional numeric and alphanumeric data
in an inventory application
´ Multimedia applications (e.g., YouTube, Spotify)
´ Biometric applications (e.g., fingerprints, retina scans)
´ Wearable applications (e.g., FitBit, Apple Watch)
´ Geographical information systems (GIS) applications (e.g., Google
Maps)
´ Sensor applications (e.g., nuclear reactor)
´ Big Data applications (e.g., Walmart)
´ Internet of Things (IoT) applications (e.g., Telematics)
18
Key Definitions
´ A database can be defined as a collection of related
data items within a specific business process or problem
setting
´ Has a target group of users and applications
´ A database management system (DBMS) is the software
package used to define, create, use, and maintain a
database
´ CONSISTS of several software modules
´ The combination of a DBMS and a database is then
often called a database system
19
File versus Database Approach to
Data Management
´ File approach
´ Data files only contain the data
´ Data definitions are included in each application
´ Application can make use of one or more files
´ Duplicate or redundant information will be stored
´ Danger of inconsistent data
´ Strong coupling between applications and data
´ Hard to manage concurrency control
´ Hard to integrate applications aimed at providing crosscompany services
20
File versus Database Approach to
Data Management
Invoicing
CRM
GIS
CustomerNr
CustomerName
VATcode
CustomerNr
CustomerName
Turnover
CustomerNr
CustomerName
ZipCode
Duplicate data!
21
File versus Database Approach to
Data Management
´ Database approach
´ Data are stored and managed centrally by a DBMS
´ Applications now directly interface with the DBMS
´ DBMS stores and manages two types of data: raw data and
metadata
´ Metadata are no longer included in the applications
´ Superior to the file approach in terms of efficiency, consistency,
and maintenance
´ Loose coupling between applications and data
´ Facilities provided for data querying and retrieval
22
File versus Database Approach to
Data Management
Invoicing
CRM
GIS
DBMS
Raw data
Catalog
23
File versus Database Approach to
Data Management
File approach
Database approach (SQL)
Procedure FindCustomer;
SELECT *
begin
open file Customer.txt;
FROM Customer
Read(Customer)
While not EOF(Customer)
If Customer.name='Bart' then
display(Customer);
EndIf
Read(Customer);
EndWhile;
End;
WHERE
name = 'Bart'
24
Elements of a Database System
´Database model versus instances
´Data model
´The three-layer architecture
´Catalog
´Database users
´Database languages
25
Database Model versus Instances
´ Database model or database schema provides the description of
the database data at different levels of detail and specifies the
various data items, their characteristics and relationships,
constraints, storage details, etc.
´ Specified during database design and not expected to change
too frequently
´ Stored in the catalog
´ Database state represents the data in the database at a particular
moment
´ Also called the current set of instance
´ Typically changes on an ongoing basis
26
Database Model versus Instances
´ Database model
Student (number, name, address, email)
Course
(number, name)
Building (number, address)
27
Database Model versus Instances
´ Database state
28
Data Model
´ A database model comprises different data models,
each describing the data from different perspectives
´ A data model provides a clear and unambiguous
description of the data items, their relationships, and
various data constraints from a particular perspective
29
Data Model
´ A conceptual data model provides a high-level description of the
data items with their characteristics and relationships
´ Communication instrument between information architect and
business user
´ Should be implementation-independent, user-friendly, and close
to how the business user perceives the data
´ Usually represented using an enhanced-entity relationship (EER)
model, or an object-oriented model
´ Logical data model is a translation or mapping of the conceptual
data model toward a specific implementation environment
´ Can be a hierarchical, CODASYL, relational, object-oriented,
extended relational, XML, or NoSQL model
30
Data Model
´ Logical data model can be mapped to an internal data model
that represents the data’s physical storage details
´ Clearly describes which data atr stored where, in what format,
which indexes are provided to speed up retrieval, etc.
´ Highly DBMS specific
´ External data model contains various subsets of the data items in
the logical model, also called views, tailored toward the needs of
specific applications or groups of users
31
The Three-Layer Architecture
32
The Three-Layer Architecture
´ Conceptual\logical layer
´ Contains the conceptual and logical data models
´ External layer
´ External data model which includes views
´ Used to control data access and enforce security
´ Internal layer
´ Includes the internal data model
´ Changes in one layer should have no to minimal impact on the
others
´ Physical data independence
´ Logical data independence
33
The Three-Layer Architecture
34
Catalog
´ Heart of the DBMS
´ Contains the data definitions, or metadata, of your
database application
´ Stores the definitions of the views, logical and internal
data models, and synchronizes these three data models
to make sure their consistency is guaranteed
35
Database Users
´ Information architect designs the conceptual data model
´ Closely interacts with the business user
´ Database designer translates the conceptual data model into a
logical and internal data model
´ Database administrator (DBA) is responsible for the implementation
and monitoring of the database
´ Application developer develops database applications in a
programming language such as Java or Python
´ Business user will run these applications to perform specific
database operations
36
Database Languages
´ Data Definition Language (DDL) is used by the DBA to express
the database’s external, logical, and internal data models
´ Definitions are stored in the catalog
´ Data Manipulation Language (DML) is used to retrieve, insert,
delete, and modify data
´ DML statements can be embedded in a programming
language, or entered interactively through a front-end
querying tool
´ Structured Query Language (SQL) offers both DDL and DML
statements for relational database systems
37
Advantages of Database Systems and
Database Management
´ Data independence
´ Database modeling
´ Managing data redundancy
´ Specifying integrity rules
´ Concurrency control
´ Backup and recovery facilities
´ Data security
´ Performance utilities
38
Data Independence
´ Data independence implies that changes in data definitions have
minimal to no impact on the applications
´ Physical data independence implies that neither the applications, nor
the views or logical data model must be changed when changes are
made to the data storage specifications in the internal data model
´ DBMS should provide interfaces between logical and internal data
models
´ Logical data independence implies that software applications are
minimally affected by changes in the conceptual or logical data
model
´ Views in the external data model will act as a protective shield
´ DBMS must provide interfaces between conceptual/logical and
external layer
39
Database Modeling
´ A data model is an explicit representation of the data items
together with their characteristics and relationships
´ A conceptual data model should provide a formal and
perfect mapping of the data requirements of the business
process and is made in collaboration with the business user
´ Translated into logical and internal data model
´ It is important that a data model’s assumptions and
shortcomings are clearly documented
40
Managing Data Redundancy
´ Duplication of data can be desired in distributed
environments to improve data retrieval performance
´ DBMS is now responsible for the management of the
redundancy by providing synchronization facilities to
safeguard data consistency
´ Compared to the file approach, the DBMS guarantees
correctness of the data without user intervention
41
Specifying Integrity Rules
´ Syntactical rules specify how the data should be represented and
stored
´ e.g., customerID is an integer; date of birth should be stored as
month, day, and year
´ Semantical rules focus on the semantical correctness or meaning of the
data
´ e.g., customerID is unique; account balance should be > 0;
customer cannot be deleted if he/she has pending invoices
´ Integrity rules are specified as part of the conceptual/logical data
model and stored in the catalog
´ Directly enforced by the DBMS instead of applications
42
Concurrency Control
´ DBMS has built-in facilities to support concurrent or parallel
execution of database programs
´ Key concept is a database transaction
´ Sequence of read/write operations considered to be an
atomic unit in the sense that either all operations are
executed or none at all
´ Read/write operations can be executed at the same time by
the DBMS
´ DBMS should avoid inconsistencies!
43
Concurrency Control
´ Lost update problem
44
Concurrency Control
´ DBMS must support ACID (Atomicity, Consistency, Isolation,
Durability) properties
´ Atomicity requires that a transaction should either be executed
in its entirety or not at all
´ Consistency assures that a transaction brings the database from
one consistent state to another
´ Isolation ensures that the effect of concurrent transactions
should be the same as if they would have been executed in
isolation
´ Durability ensures that the database changes made by a
transaction declared successful can be made permanent under
all circumstances
45
Backup and Recovery Facilities
´ Backup and recovery facilities can be used to deal with the
effect of loss of data due to hardware or network errors, or
bugs in system or application software
´ Backup facilities can either perform a full or incremental
backup
´ Recovery facilities allow restoration of the data to a previous
state after loss or damage occurs
46
Data Security
´ Data security can be enforced by the DBMS
´ Some users have read access, while others have write access
to the data (role-based functionality)
´ e.g., vendor-managed inventory (VMI)
´ Data access can be managed via logins and passwords
assigned to users or user accounts
´ Each account has its own authorization rules that can be
stored in the catalog
47
Performance Utilities
´ There are three KPIs of a DBMS
´ Response time, denoting the time elapsed between issuing a
database request and the successful termination thereof
´ Throughput rate, representing the transactions a DBMS can
process per unit of time
´ Space utilization, referring to the space utilized by the DBMS to
store both raw data and metadata
´ DBMSs come with various types of utilities aimed at improving these
KPIs
´ e.g., utilities to distribute and optimize data storage, to tune
indexes for faster query execution, to tune queries to improve
application performance, or to optimize buffer management
48
Conclusions
´ Applications of database technology
´ Key definitions
´ File versus database approach to data management
´ Elements of a database system
´ Advantages of database systems and database
management
Architecture and
Classification of DBMSs
Chapter Two
1
Introduction
´Architecture of a DBMS
´Categorization of DBMSs
2
Architecture of a DBMS
3
Architecture of a DBMS
´Connection and security manager
´DDL compiler
´Query processor
´Storage manager
´DBMS utilities
´DBMS interfaces
4
Connection and Security Manager
´ Connection manager provides facilities to set up a
database connection (locally or through a network)
´Verifies logon credentials and returns a connection
handle
´Database connection can either run as a single
process or as a thread within a process
´ Security manager verifies whether a user has the right
privileges
´Read versus write access
5
DDL Compiler
´ Compiles the data definitions specified in DDL
´ Ideally three DDLs (internal/logical/external data model)
´ DDL compiler first parses the DDL definitions and checks
their syntactical correctness
´ DDL compiler then translates the data definitions to an
internal format and generates errors if required
´ Upon successful compilation, the DDL compiler registers
the data definitions in the catalog
6
Query Processor
´ Query processor assists in the execution of database
queries such as retrieval, insertion, update, or removal of
data
´ Key components:
´DML compiler
´Query parser
´Query rewriter
´Query optimizer
´Query executor
7
DML Compiler
´ DML compiler compiles the DML statements
´ Procedural DML
´ DML explicitly specifies how to navigate in the database
´ record-at-a-time DML
´ no query processor
´ Declarative DML
´ DML specifies what data should be retrieved or what changes
should be made
´ set-at-a-time DML
´ query processor
8
DML Compiler
import java.sql.*;
public class JDBCExample1 {
public static void main(String[] args) {
try {
System.out.println(“Loading JDBC driver…”);
Class.forName(“com.mysql.jdbc.Driver”);
System.out.println(“JDBC driver loaded!”);
} catch (ClassNotFoundException e) {
throw new RuntimeException(e);
}
String url =
“jdbc:mysql://localhost:3306/employeeschema”;
String username = “root”;
String password = “mypassword123”;
String query = “select E.Name, D.DName” +
“from employee E, department D” +
“where E.DNR=D.DNR;”;
Connection connection = null;
Statement stmt=null;
try {
System.out.println(“Connecting to database”);
connection = DriverManager.getConnection(url,
username, password);
System.out.println(“MySQL Database connected!”);
stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery(query);
while (rs.next()) {
System.out.print(rs.getString(1));
System.out.print(” “);
System.out.println(rs.getString(2));
}
stmt.close();
} catch (SQLException e) {
System.out.println(e.toString());
} finally {
System.out.println(“Closing the connection.”);
if (connection != null) {
try {
connection.close();
} catch (SQLException ignore) {}}}}
9
DML Compiler
´ Impedance mismatch problem
´mapping between OO (e.g., Java) and relational
(e.g., SQL) concepts
´ Impedance mismatch solutions
´host language and DBMS with comparable data
structures (e.g., Java and OODBMS)
´middleware to map the data structures from the
DBMS to the host language and vice versa
10
DML Compiler
Java
public class Employee {
private int EmployeeID;
private String Name;
private String Gender;
private int DNR;
public int getEmployeeID() {
return EmployeeID;
}
public void setEmployeeID( int id ) {
this.EmployeeID = id;
}
public String getName() {
return Name;
}
public void setName( String name ) {
this.Name = name;
}
…}
SQL
CREATE TABLE Employee (
‘EmployeeID’ INT NOT NULL,
‘Name’ VARCHAR(45) NULL,
‘Gender’ VARCHAR(45) NULL,
‘DNR’ INT NULL)
EmployeeID
Name
Gender
DNR
100
Bart Baesens
Male
2
110
Wilfried Lemahieu
Male
4
120
Seppe vanden Broucke
Male
6
…
11
DML Compiler
´DML compiler starts by extracting the DML
statements from the host language.
´DML compiler then collaborates with the query
parser, query rewriter, query optimizer, and
query executor for executing the DML
statements
´Errors are generated and reported if necessary
12
Query Parser and Query Rewriter
´Query parser parses the query into an internal
representation format
´Query parser checks the query for syntactical
and semantical correctness
´Query rewriter optimizes the query,
independently of the current database state
13
Query Optimizer
´ Query optimizer optimizes the query based upon the current
database state (based upon e.g., predefined indexes)
´ Query optimizer comes up with various query execution plans and
evaluates their cost in terms of estimated
´ number of I/O operations
´ CPU processing cost
´ execution time
´ Estimates based on catalog information combined with statistical
inference
´ Query optimizer is a key competitive asset of a DBMS
14
Query Executor
´Result of the query optimization is a final
execution plan
´Query executor takes care of the actual
execution by calling on the storage manager to
retrieve the data requested
15
Storage Manager
´Storage manager governs physical file access
and supervises the correct and efficient storage
of data
´Storage manager consists of
´transaction manager
´buffer manager
´lock manager
´recovery manager
16
Transaction Manager
´ Transaction manager supervises execution of database
transactions
´ a database transaction is a sequence of read/write
operations, considered to be an atomic unit
´ Transaction manager creates a schedule with interleaved
read/write operations
´ Transaction manager guarantees ACID properties
´ COMMIT a transaction upon successful execution and
ROLLBACK a transaction upon unsuccessful execution
17
Buffer Manager
´ Buffer manager manages buffer memory of the DBMS
´ Buffer manager intelligently caches data in the buffer
´ Example strategies:
´Data locality: data recently retrieved are likely to be
retrieved again
´20/80 law: 80% of the transactions read or write only
20% of the data
´ Buffer manager needs to adopt smart replacement
strategy in case buffer is full
´ Buffer manager needs to interact with lock manager
18
Lock Manager
´ Lock manager provides concurrency control which
ensures data integrity at all times
´ Two types of locks: read and write locks
´ Lock manager is responsible for assigning, releasing, and
recording locks in the catalog
´ Lock manager makes use of a locking protocol which
describes the locking rules, and a lock table with the
lock information
19
Recovery Manager
´ Recovery manager supervises the correct execution of
database transactions
´ Recovery manager keeps track of all database
operations in a logfile
´ Recovery manager will be called upon to undo actions
of aborted transactions or during crash recovery
20
DBMS Utilities
´ Loading utility
´ Reorganization utility
´ Performance-monitoring utilities
´ User management utilities
´ Backup and recovery utility
21
DBMS Interfaces
´ Web-based interface
´ Stand-alone query language interface
´ Command-line interface
´ Forms-based interface
´ Graphical user interface
´ Natural language interface
´ Admin interface
´ Network interface
22
DBMS Interfaces
Navigator
window
Query window
Results
window
Log window
23
Categorization of DBMSs
´ Categorization based on data model
´ Categorization based on degree of simultaneous access
´ Categorization based on architecture
´ Categorization based on usage
24
Categorization Based on Data Model
´ Hierarchical DBMSs
´ adopt a tree-like data model
´ DML is procedural and record-oriented
´ no query processor (logical and internal data model intertwined)
´ e.g., IMS (IBM)
´ Network DBMSs
´ use a network data model
´ CODASYL DBMSs
´ DML is procedural and record-oriented
´ no query processor (logical and internal data model intertwined)
´ e.g., CA-IDMS (Computer Associates)
25
Categorization Based on Data Model
´ Relational DBMSs
´ use the relational data model
´ currently the most popular in industry
´ SQL (declarative and set-oriented)
´ query processor
´ strict separation between the logical and internal data model
´ e.g., MySQL (open source, Oracle), Oracle DBMS (Oracle), DB2
(IBM), Microsoft SQL (Microsoft)
26
Categorization Based on Data Model
´ Object-Oriented DBMSs (OODBMS)
´Based upon the OO data model
´No impedance mismatch in combination with OO
host language
´e.g., db4o (open source, Versant), Caché
(Intersystems), GemStone/S (GemTalk Systems)
´only successful in niche markets due to their
complexity
27
Categorization Based on Data Model
´ Object-relational DBMSs (ORDBMSs)
´Also referred to as extended relational DBMSs
(ERDBMSs)
´Use a relational model extended with OO concepts
´DML is SQL (declarative and set-oriented)
´e.g., Oracle DBMS (Oracle), DB2 (IBM), Microsoft SQL
(Microsoft)
28
Categorization Based on Data Model
´ XML DBMSs
´Use the XML data model to store data
´Native XML DBMSs (e.g., BaseX, eXist) map the tree
structure of an XML document to a physical storage
structure
´XML-enabled DBMSs (e.g., Oracle, IBM DB2) are
existing DBMSs that are extended with facilities to
store XML data
29
Categorization Based on Data Model
´ NoSQL DBMSs
´Targeted at storing big and unstructured data
´Can be classified into key–value stores, columnoriented databases and graph databases
´Focus on scalability and the ability to cope with
irregular or highly volatile data structures
´e.g., Apache Hadoop, MongoDB, Neo4j
30
Categorization Based Upon Degree of
Simultaneous Access
´ Single-user versus multi-user systems
incoming
connections
dispatcher
server
instance 1
server
instance 2
server
instance 3
database
31
Categorization Based on Architecture
´ Centralized DBMS architecture
´ data is maintained on a centralized server
´ Client server DBMS architecture
´ active clients request services from passive servers
´ fat server versus fat client variant
´ n-tier DBMS architecture
´ client with GUI functionality, application server with applications,
database server with DBMS and database, and web server for web
based access
32
Categorization Based on Architecture
´ Cloud DBMS architecture
´DBMS and database are hosted by a third-party
cloud provider
´e.g., Apache Cassandra project and Google’s
BigTable
´ Federated DBMS
´Provides a uniform interface to multiple underlying
data sources
´Hides the underlying storage details to facilitate data
access
33
Categorization Based on Architecture
´ In-memory DBMS
´Stores all data in internal memory instead of slower
external storage (e.g., disk)
´Often used for real-time purposes
´e.g., HANA (SAP)
34
Categorization Based on Usage
´ On-Line Transaction Processing (OLTP)
´ Focus on managing operational or transactional data
´ Database server must be able to process lots of simple transactions per
unit of time
´ DBMS must have good support for processing a high volume of short,
simple queries
´ On-Line Analytical Processing (OLAP)
´ Focus on using operational data for tactical or strategical decisionmaking
´ Limited number of users formulates complex queries
´ DBMS should support efficient processing of complex queries which
often come in smaller volumes
35
Categorization Based on Usage
´ Big Data and analytics
´ NoSQL databases
´ Focus on more flexible, or even schema-less, database
structures
´ Store unstructured information such as emails, text
documents, Twitter tweets, Facebook posts, etc.
´ Multimedia
´ Multimedia DBMSs provide storage of multimedia data
such as text, images, audio, video, 3D games, etc.
´ Should also provide content-based query facilities
36
Categorization Based on Usage
´ Spatial applications
´Spatial DBMSs support storage and querying of spatial
data (both 2D and 3D)
´Geographical information systems (GIS)
´ Sensoring
´Sensor DBMSs manage sensor data such as biometric
data from wearables, or telematics data
37
Categorization Based on Usage
´ Mobile
´ Mobile DBMSs run on smartphones, tablets, or other mobile
devices
´ Should always be online, have a small footprint, and be able
to deal with limited processing power, storage, and battery
life
´ Open source
´ Code of open source DBMSs is publicly available and can be
extended by anyone
´ See www.sourceforge.net
´ e.g., MySQL (Oracle)
38
Conclusions
´Architecture of a DBMS
´Categorization of DBMSs
Organizational Aspects of
Data Management
Chapter Four
1
Introduction
´Data Management
´Roles in data management
2
Data Management
´Catalogs and the role of metadata
´Metadata modeling
´Data quality
´Data governance
3
Catalogs and the Role of Metadata
´Just as raw data, also metadata are data that
need to be properly modeled, stored, and
managed
´Concepts of data modeling should also be
applied to metadata
´In a DBMS approach, metadata are stored in a
catalog (aka data dictionary, data repository),
which constitutes the heart of the database
system
4
Catalogs and the Role of Metadata
´ The catalog provides an important source of information for
end-users, application developers, as well as the DBMS itself
´ Catalog should provide:
´ an extensible metamodel
´ import/export facilities
´ support for maintenance and re-use of metadata
´ monitoring of integrity rules
´ facilities for user access
´ statistics about the data and their usage for the DBA and
query optimizer
5
Metadata Modeling
´A metamodel is a data model for metadata
´A database design process can be used to
design a database storing metadata
´Design a conceptual model of the metadata:
EER model or UML model
6
Metadata Modeling
7
Data Quality
´Data quality (DQ) is often defined as “fitness for
use,” which implies the relative nature of the
concept
´Data quality determines the intrinsic value of the
data to the business
´Poor DQ impacts organizations in many ways
8
Data Quality
´DQ is a multidimensional concept in which each
dimension represents a single aspect or
construct, comprising both objective and
subjective perspectives
´A DQ framework categorizes the different
dimensions of data quality
´Example: Wang et al. (1996)
´Four categories: intrinsic, contextual,
representation, access
Data Quality
Category DQ dimensions Definitions
Accuracy
The extent to which data is certified, error-free, correct,
flawless and reliable
Intrinsic
9
Objectivity
The extent to which data is unbiased, unprejudiced,
based on facts and impartial
Reputation
The extent to which data is highly regarded in terms of
its sources or content
Data Quality
10
Category DQ dimensions Definitions
Completeness The extent to which data is not missing and covers the needs of the tasks
and is of sufficient breadth and depth of the task at hand
Contextual
Appropriate-
The extent to which the volume of data is appropriate for the task at hand
amount
Value-added The extent to which data is beneficial and provides advantages from its use
Relevance
The extent to which data is applicable and helpful for the task at hand
Timeliness
The extent to which data is sufficiently up-to-date for the task at hand
Actionable
The extent to which data is ready for use
11
Data Quality
Category DQ dimensions
Interpretable
Definitions
The extent to which data is in appropriate languages,
Representation
symbols and the definitions are clear
Easily-understandable
The extent to which data is easily comprehended
Consistency
The extent to which data is continuously presented in the
same format
Concisely-represented (CR) The extent to which data is compactly represented, wellpresented, well-organized, and well-formatted
Alignment
The extent to which data is reconcilable (compatible)
Data Quality
12
Category DQ dimensions
Accessibility
Definitions
The extent to which data is available, or easily
Access
and swiftly retrievable
Security
The extent to which access to data is restricted
appropriately to maintain its security
Traceability
The extent to which data is traceable to the
source
13
Data Quality
´ Accuracy refers to whether the data values stored for an object are
the correct values
´ Often correlated with other DQ dimensions
´ Completeness can be viewed from at least three perspectives:
´ Schema completeness: refers to the degree to which entity
types and attribute types are missing from the schema
´ Column completeness: refers to the degree to which there exist
missing values in a column of a table
´ Population completeness: refers to the degree to which the
necessary members of a population are present or not
14
Data Quality
´The consistency dimension can also be viewed
from several perspectives:
´Consistency of redundant or duplicated data
in one table or in multiple tables
´Consistency between two related data
elements
´Consistency of format for the same data
element used in different tables
15
Data Quality
´The accessibility dimension reflects the ease of
retrieving the data from the underlying data
sources
´Often involves a tradeoff with security
16
Data Quality
´ Common causes of bad data quality are:
´ Multiple data sources: multiple sources with the same data may
produce duplicates; a problem of consistency
´ Subjective judgment in data production: data production using human
judgment can result in biased information; a problem of objectivity
´ Limited computing resources: lack of sufficient computing resources
may limit the accessibility of relevant data; a problem of accessibility
´ Volume of data: Large volumes of stored data make it difficult to
access needed information in a reasonable time; a problem of
accessibility
´ Changing data needs: data requirements change on an ongoing basis;
a problem of relevance
´ Different processes updating the same data; a problem of consistency
17
Data Governace
´ To manage and safeguard data quality, a data governance
culture should be put in place assigning clear roles and
responsibilities
´ Different frameworks have been introduced for data quality
management and data quality improvement
´ Examples: Total Data Quality Management (TDQM), Total Quality
Management (TQM), Capability Maturity Model Integration
(CMMI), ISO 9000, Control Objectives for Information and
Related Technology (CobiT), Data Management Body of
Knowledge (DMBOK), Information Technology Infrastructure
Library (ITIL), and Six Sigma
18
Data Governance
• Define
• Improve
• Measure
Identify
pertinent
DQ
dimensions
Assess DQ
level along
DQ
dimensions
Present
improveme
nt actions
Investigate
DQ
problems
and
analyze
root causes
• Analyze
19
Roles in Data Management
´Information architect (aka information analyst)
´Responsible for designing the conceptual
data model
´Bridges the gap between the business
processes and the IT environment
´Closely collaborates with the database
designer, who may assist in choosing the type
of conceptual data model (e.g., EER or UML)
and the database modeling tool
20
Roles in Data Management
´Database designer
´Translates the conceptual data model into a
logical and internal data model
´Also assists the application developers in
defining the views of the external data model
´Defines company-wide uniform naming
conventions when creating the various data
models
21
Roles in Data Management
´ Data owner
´Has the authority to ultimately decide on the access
to, and usage of, the data
´Should be able to insert or update data
´ Data steward
´DQ experts in charge of ensuring the quality of both
the actual business data and the metadata
´Also in charge of taking DQ initiatives such as
corrective and/or preventive measures
22
Roles in Data Management
´ Database administrator (DBA)
´ Responsible for the implementation and monitoring of the
database
´ Closely collaborates with network and system managers
´ Also interacts with database designers
´ Data scientist
´ Responsible for analyzing data using state-of-the-art analytical
techniques to provide new insights into, e.g., customer behavior
´ Has a multidisciplinary profile combining ICT skills with
quantitative modeling, business understanding, communication,
and creativity
23
Conclusions
´ Data Management
´ Roles in data management
Relational Databases:
Structured Query Language
(SQL)
Chapter Seven
1
Introduction
´ Relational database management systems and SQL
´ SQL Data Definition Language
´ SQL Data Manipulation Language
´ SQL views
´ SQL indexes
´ SQL privileges
´ SQL for metadata management
2
Relational Database Management
Systems and SQL
´ Key Characteristics of SQL
´ Three-Level Database Architecture
3
Key Characteristics of SQL
´ First version, SQL-86 in 1986, most recent version in 2011
(SQL:2011)
´ Accepted by the American National Standards Institute
(ANSI) in 1986 and by the International Organization for
Standardization (ISO) in 1987
´ Each vendor provides its own implementation (also
called SQL dialect) of SQL
4
Key Characteristics of SQL
´Set-oriented and declarative
´Free-form language
´Case insensitive
´Can be used both interactively from a
command prompt or executed by a program
5
Key Characteristics of SQL
6
Key Characteristics of SQL
7
Three-Level Database Architecture
8
SQL Data Definition Language
´ Key DDL concepts
´ DDL example
´ Referential integrity constraints
´ DROP and ALTER command
9
Key DDL Concepts
´ SQL schema is a grouping of tables and other database
objects such as views, constraints, and indexes which
logically belong together
CREATE SCHEMA PURCHASE AUTHORIZATION BBAESENS
´ SQL table implements a relation from the relational
model
CREATE TABLE PRODUCT …
CREATE TABLE PURCHASE.PRODUCT …
10
Key DDL Concepts
Data type
Description
CHAR(n)
Holds a fixed-length string with size n
VARCHAR(n)
Holds a variable-length string with maximum size n
SMALLINT
Small integer (no decimal) between -32768 and 32767
INT
Integer (no decimal) between -2147483648 and 2147483647
FLOAT(n,d)
Small number with a floating decimal point. The total maximum number of digits is n with a maximum of d digits to
the right of the decimal point
DOUBLE(n,d)
Large number with a floating decimal point. The total maximum number of digits is n with a maximum of d digits to
the right of the decimal point
DATE
Date in format YYYY-MM-DD
DATETIME
Date and time in format YYYY-MM-DD HH:MI:SS
TIME
Time in format HH:MI:SS
BOOLEAN
True or false
BLOB
Binary large object (e.g., image, audio, video)
11
Key DDL Concepts
CREATE DOMAIN PRODTYPE_DOMAIN AS VARCHAR(10)
CHECK (VALUE IN (‘white’, ‘red’, ‘rose’,
‘sparkling’))
12
Key DDL Concepts
´ Column constraints
´ PRIMARY KEY constraint defines the primary key of the
table
´ FOREIGN KEY constraint defines a foreign key of a table
´ UNIQUE constraint defines an alternative key of a table
´ NOT NULL constraint prohibits null values for a column
´ DEFAULT constraint sets a default value for a column
´ CHECK constraint defines a constraint on the column
values
13
DDL Example
SUPNR
0..M
supname
SUPPLIES
1..1
ON_ORDER
PRODNR
prod- supsup prod
SUPPLIER
supaddress
deliv_period
purchase_price
supstatus
supcity
po-sup
sup-po
prodname
0..N
PRODUCT
1..N
0..N
PURCHASE
ORDER
PONR
podate
PO_LINE
0..N
prodtype
prod- popo
prod
quantity
available
quantity
14
DDL Example
SUPPLIER(SUPNR, SUPNAME, SUPADDRESS, SUPCITY, SUPSTATUS)
PRODUCT(PRODNR, PRODNAME, PRODTYPE,
AVAILABLE_QUANTITY)
SUPPLIES(SUPNR, PRODNR, PURCHASE_PRICE, DELIV_PERIOD)
PURCHASE_ORDER(PONR, PODATE, SUPNR)
PO_LINE(PONR, PRODNR, QUANTITY)
15
DDL Example
CREATE TABLE SUPPLIER
(SUPNR CHAR(4) NOT NULL PRIMARY KEY,
SUPNAME VARCHAR(40) NOT NULL,
SUPADDRESS VARCHAR(50),
SUPCITY VARCHAR(20),
SUPSTATUS SMALLINT)
CREATE TABLE PRODUCT
(PRODNR CHAR(6) NOT NULL PRIMARY KEY,
PRODNAME VARCHAR(60) NOT NULL,
CONSTRAINT UC1 UNIQUE(PRODNAME),
PRODTYPE VARCHAR(10),
CONSTRAINT CC1 CHECK(PRODTYPE IN (‘white’, ‘red’, ‘rose’,’sparkling’)),
AVAILABLE_QUANTITY INTEGER)
16
DDL Example
CREATE TABLE SUPPLIES
(SUPNR CHAR(4) NOT NULL,
PRODNR CHAR(6) NOT NULL,
PURCHASE_PRICE DOUBLE(8,2)
COMMENT ‘PURCHASE_PRICE IN EUR’,
DELIV_PERIOD TIME
COMMENT ‘DELIV_PERIOD IN DAYS’,
PRIMARY KEY (SUPNR, PRODNR),
FOREIGN KEY (SUPNR) REFERENCES SUPPLIER (SUPNR)
ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (PRODNR) REFERENCES PRODUCT (PRODNR)
ON DELETE CASCADE ON UPDATE CASCADE)
17
DDL Example
CREATE TABLE PURCHASE_ORDER
(PONR CHAR(7) NOT NULL PRIMARY KEY,
PODATE DATE,
SUPNR CHAR(4) NOT NULL,
FOREIGN KEY (SUPNR) REFERENCES SUPPLIER (SUPNR)
ON DELETE CASCADE ON UPDATE CASCADE)
CREATE TABLE PO_LINE
(PONR CHAR(7) NOT NULL,
PRODNR CHAR(6) NOT NULL,
QUANTITY INTEGER,
PRIMARY KEY (PONR, PRODNR),
FOREIGN KEY (PONR) REFERENCES PURCHASE_ORDER (PONR)
ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (PRODNR) REFERENCES PRODUCT (PRODNR)
ON DELETE CASCADE ON UPDATE CASCADE)
18
Referential Integrity Constraints
´ Foreign key has the same domain as the primary key it refers to and either
occurs as a value of it or NULL
´ What should happen to foreign keys in case a primary key is updated or
deleted?
´ Options:
´ ON UPDATE/DELETE CASCADE: update/removal should be cascaded to
all referring tuples
´ ON UPDATE/DELETE RESTRICT: update/removal is halted if referring tuples
exist
´ ON UPDATE/DELETE SET NULL: foreign keys in the referring tuples are set
to NULL
´ ON UPDATE/DELETE SET DEFAULT: foreign keys in the referring tuples are
set to their default value
19
Referential Integrity Constraints
Supplier
SUPNR
21
32
37
52
68
69
SUPNAME
Deliwines
Best Wines
Ad Fundum
Spirits & co.
The Wine Depot
Vinos del Mundo
SUPADDRESS
240, Avenue of the Americas
660, Market Street
82, Wacker Drive
928, Strip
132, Montgomery Street
4, Collins Avenue
PRODNR
0178
0185
0468
0795
SUPSTATUS
20
90
95
NULL
10
92
Purchase_Order
Supplies
SUPNR
37
37
37
37
SUPCITY
New York
San Francisco
Chicago
Las Vegas
San Francisco
Miami
PURCHASE_PRICE
16.99
32.99
14.00
20.99
DELIV_PERIOD
4
3
1
3
PONR
1511
1513
1523
1577
1594
PODATE
2015-03-24
2015-04-11
2015-04-19
2015-05-10
2015-05-13
SUPNR
37
37
37
37
37
20
DROP and ALTER Command
´ DROP command can be used to drop or remove
database objects
´Can also be combined with CASCADE and RESTRICT
´ Examples:
DROP SCHEMA PURCHASE CASCADE
DROP SCHEMA PURCHASE RESTRICT
DROP TABLE PRODUCT CASCADE
DROP TABLE PRODUCT RESTRICT
21
DROP and ALTER Command
´ ALTER statement can be used to modify table column
definitions
´ Examples:
ALTER TABLE PRODUCT ADD PRODIMAGE BLOB
ALTER TABLE SUPPLIER ALTER SUPSTATUS SET DEFAULT
’10’
22
DROP and ALTER Command
SUPPLIER
SUPNR SUPNAME
21
Deliwines
32
Best Wines
…
PRODNR
0119
0154
…
SUPADDRESS
240, Avenue of the Americas
660, Market Street
PRODUCT
PRODNAME
Chateau Miraval, Cotes de Provence Rose, 2015
Chateau Haut Brion, 2008
SUPCITY
SUPSTATUS
New York
20
San Francisco 90
PRODTYPE
rose
red
red
AVAILABLE_QUANTITY
126
111
5
SUPPLIES
SUPNR PRODNR PURCHASE_PRICE DELIV_PERIOD
21
0289
17.99
1
21
0327
56.00
6
…
PURCHASE_ORDER
PONR PODATE
1511 2015-03-24
1512 2015-04-10
…
SUPNR
37
94
PO_LINE
PONR PRODNR QUANTITY
1511 0212
2
1511 0345
4
…
23
SQL Data Manipulation Language (SQL
DML)
´ SQL SELECT statement
´ SQL INSERT statement
´ SQL DELETE statement
´ SQL UPDATE statement
24
SQL SELECT Statement
´ Overview
´ Simple queries
´ Queries with aggregate functions
´ Queries with GROUP BY/HAVING
´ Queries with ORDER BY
´ Join queries
´ Nested queries
´ Correlated queries
´ Queries with ALL/ANY
´ Queries with EXISTS
´ Queries with subqueries in FROM/WHERE
´ Queries with set operations
25
Overview
SELECT component
FROM component
[WHERE component]
[GROUP BY component]
[HAVING component]
[ORDER BY component]
26
Overview
´ The result of an SQL SELECT statement is a multiset, and
not a set!
´ In a multiset (aka bag), the elements are not ordered
but there can be duplicates
´ Examples: set {10, 5, 20} and multiset {10, 5, 10, 20, 5, 10}
´ SQL does not eliminate duplicates
´Duplicate elimination is expensive
´User may want to see duplicate tuples
´Duplicates may be considered by aggregate
functions
27
Simple Queries
´ SQL statements that retrieve data from only one table
Q1: SELECT SUPNR, SUPNAME,SUPADDRESS, SUPCITY,
SUPSTATUS FROM SUPPLIER
Q1: SELECT * FROM SUPPLIER
28
Simple Queries
SUPNR
SUPNAME
SUPADDRESS
SUPCITY
SUPSTATUS
21
Deliwines
240, Avenue of the
New York
20
Americas
32
Best Wines
660, Market Street
San Francisco
90
37
Ad Fundum
82, Wacker Drive
Chicago
95
52
Spirits & co.
928, Strip
Las Vegas
NULL
68
The Wine Depot
132, Montgomery Street
San Francisco
10
69
Vinos del Mundo
4, Collins Avenue
Miami
92
29
Simple Queries
´ Q2: SELECT SUPNR, SUPNAME FROM SUPPLIER
SUPNR
SUPNAME
21
Deliwines
32
Best Wines
37
Ad Fundum
52
Spirits & co.
68
The Wine Depot
69
Vinos del Mundo
30
Simple Queries
SUPNR
32
32
Q3: SELECT SUPNR
FROM PURCHASE_ORDER
37
37
37
37
37
68
69
94
31
Simple Queries
Q4: SELECT DISTINCT SUPNR FROM PURCHASE_ORDER
SUPNR
32
37
68
69
94
32
Simple Queries
Q5: SELECT SUPNR, PRODNR, DELIV_PERIOD/30 AS MONTH_DELIV_PERIOD FROM
SUPPLIES
SUPNR
PRODNR
MONTH_DELIV_PERIOD
21
0119
0.0333
21
0178
NULL
21
0289
0.0333
21
0327
0.2000
21
0347
0.0667
21
0384
0.0667
…
…
…
33
Simple Queries
Q6: SELECT SUPNR, SUPNAME FROM SUPPLIER
WHERE SUPCITY = ‘San Francisco’
SUPNR
SUPNAME
SUPSTATUS
32
Best Wines
90
68
The Wine Depot
10
34
Simple Queries
Q7: SELECT SUPNR, SUPNAME FROM SUPPLIER
WHERE SUPCITY = ‘San Francisco’ AND SUPSTATUS > 80
SUPNR
SUPNAME
SUPSTATUS
32
Best Wines
90
35
Simple Queries
Q8: SELECT SUPNR, SUPNAME, SUPSTATUS
FROM SUPPLIER WHERE SUPSTATUS BETWEEN 70 AND 80
SUPNR
SUPNAME
SUPSTATUS
94
The Wine Crate
75
36
Simple Queries
´ Q9:SELECT PRODNR, PRODNAME
FROM PRODUCT
WHERE PRODTYPE IN (‘WHITE’, ‘SPARKLING’)
PRODNR
PRODNAME
0178
Meerdael, Methode Traditionnelle Chardonnay, 2014
0199
Jacques Selosse, Brut Initial, 2012
0212
Billecart-Salmon, Brut Réserve, 2014
0300
Chateau des Rontets, Chardonnay, Birbettes
0494
Veuve-Cliquot, Brut, 2012
0632
Meneghetti, Chardonnay, 2010
…
….
37
Simple Queries
Q10: SELECT PRODNR, PRODNAME
FROM PRODUCT
WHERE PRODNAME LIKE ‘%CHARD%’
PRODNR
PRODNAME
0300
Chateau des Rontets, Chardonnay, Birbettes
0783
Clos D’Opleeuw, Chardonnay, 2012
0178
Meerdael, Methode Traditionnelle Chardonnay, 2014
0632
Meneghetti, Chardonnay, 2010
38
Simple Queries
Q11: SELECT SUPNR, SUPNAME
FROM SUPPLIER
WHERE SUPSTATUS IS NULL
SUPNR
SUPNAME
SUPSTATUS
52
Spirits & Co.
NULL
39
Queries with Aggregate Functions
´ Examples: COUNT, SUM, AVG, VARIANCE, MIN/MAX, and STDEV
SUPNR
PRODNR
PURCHASE_PRICE
DELIV_PERIOD
21
0178
NULL
NULL
37
0178
16.99
4
68
0178
17.99
5
69
0178
16.99
NULL
94
0178
18.00
6
…
…
40
Queries with Aggregate Functions
Q12: SELECT COUNT(*)
FROM SUPPLIES
WHERE PRODNR = ‘0178’
Q13: SELECT COUNT(PURCHASE_PRICE)
FROM SUPPLIES
WHERE PRODNR = ‘0178’
Q14: SELECT COUNT(DISTINCT PURCHASE_PRICE)
FROM SUPPLIES
WHERE PRODNR = ‘0178’
5
4
3
41
Queries with Aggregate Functions
Q15: SELECT PRODNR, SUM(QUANTITY) AS
SUM_ORDERS FROM PO_LINE
WHERE PRODNR = ‘0178’
PONR
PRODNR
QUANTITY
1512
0178
3
1538
0178
6
…
…
0178 9
42
Queries with Aggregate Functions
´ Q16: SELECT SUM(QUANTITY) AS
TOTAL_ORDERS FROM PO_LINE
173
Queries with Aggregate Functions
43
Q17: SELECT PRODNR, AVG(PURCHASE_PRICE) AS
WEIGHTED_AVG_PRICE FROM SUPPLIES
WHERE PRODNR = ‘0178’
SUPNR
PRODNR
PURCHASE_PRICE
DELIV_PERIOD
21
0178
NULL
NULL
37
0178
16.99
4
68
0178
17.99
5
69
0178
16.99
NULL
94
0178
18.00
6
…
…
0178, (16.99 + 17.99 + 16.99 + 18.00) / 4 = 17.4925
44
Queries with Aggregate Functions
Q18: SELECT PRODNR, AVG(DISTINCT
PURCHASE_PRICE)AS UNWEIGHTED_AVG_PRICE
FROM SUPPLIES WHERE PRODNR = ‘0178’
SUPNR
PRODNR
PURCHASE_PRICE
DELIV_PERIOD
21
0178
NULL
NULL
37
0178
16.99
4
68
0178
17.99
5
69
0178
16.99
NULL
94
0178
18.00
6
…
…
0178, (16.99 + 17.99 + 18.00) / 3 = 17.66
45
Queries with Aggregate Functions
´ Q19: SELECT PRODNR, VARIANCE(PURCHASE_PRICE)
AS PRICE_VARIANCE FROM SUPPLIES
WHERE PRODNR = ‘0178’
PRODNR
PRICE_VARIANCE
0178
0.25251875000000024
46
Queries with Aggregate Functions
Q20: SELECT PRODNR, MIN(PURCHASE_PRICE) AS LOWEST_PRICE,
MAX(PURCHASE_PRICE) AS HIGHEST_PRICE
FROM SUPPLIES
WHERE PRODNR = ‘0178’
PRODNR
LOWEST_PRICE
HIGHEST_PRICE
0178
16.99
18.00
47
Queries with GROUP BY/HAVING
Q21: SELECT PRODNR
FROM PO_LINE
GROUP BY PRODNR
HAVING COUNT(*) >= 3
PONR
PRODNR
QUANTITY
1511
0212
2
1512
0178
3
1513
0668
7
1514
0185
2
1514
0900
2
1523
0900
3
1538
0178
6
1538
0212
15
1560
0900
9
1577
0212
6
1577
0668
9
…
..
…
48
Queries with GROUP BY/HAVING
GROUP BY
PONR
PRODNR
QUANTITY
PONR
PRODNR
QUANTITY
1511
0212
2
1512
0178
3
1577
0212
6
1538
0178
6
1538
0212
15
PONR
PRODNR
QUANTITY
PONR
PRODNR
QUANTITY
1514
0900
2
1514
0185
2
1523
0900
3
1560
0900
9
PRODNR
0212
900
PONR
PRODNR
QUANTITY
1513
0668
7
1577
0668
9
Queries with GROUP BY/HAVING
49
Q22: SELECT PRODNR, SUM(QUANTITY) AS QUANTITY FROM PO_LINE
GROUP BY PRODNR
HAVING SUM(QUANTITY) > 15
GROUP BY
PONR
PRODNR
QUANTITY
1511
0212
2
1577
0212
6
1538
PONR
1514
0212
15
SUM
23
PRODNR
PONR
PRODNR
QUANTITY
1512
0178
3
1538
0178
6
SUM
9
PONR
PRODNR
QUANTITY
1514
0900
2
1523
0900
3
1560
0900
9
SUM
14
QUANTITY
0185
2
SUM
2
PONR
PRODNR
QUANTITY
1513
0668
7
1577
0668
9
SUM
16
PRODNR
QUANTITY
0212
23
0668
16
50
Queries with ORDER BY
´ Q23: SELECT PONR, PODATE, SUPNR
FROM PURCHASE_ORDER
ORDER BY PODATE ASC, SUPNR DESC
PONR
PODATE
SUPNR
1511
2015-03-24
37
1512
2015-04-10
94
1513
2015-04-11
37
1514
2015-04-12
32
…
51
Queries with ORDER BY
Q24: SELECT PRODNR, SUPNR, PURCHASE_PRICE
FROM SUPPLIES
WHERE PRODNR = ‘0178’
ORDER BY 3 DESC
PRODNR
SUPNR
PURCHASE_PRICE
0178
94
18.00
0178
68
17.99
0178
37
16.99
0178
69
16.99
0178
21
NULL
52
Join Queries
´ Inner joins
´ Outer joins
53
Inner Joins
SUPPLIER(SUPNR, SUPNAME, …, SUPSTATUS)
SUPPLIES(SUPNR, PRODNR, PURCHASE_PRICE, …)
SUPNR
SUPNAME
SUPADDRESS
SUPCITY
32
Best wines
90
68
The Wine Depot
10
84
Wine Trade Logistics
92
:
:
:
SUPNR
PRODNR
PURCHASE_PRICE
DELIV_PERIOD
32
0474
40.00
1
32
0154
21.00
4
84
0494
15.99
2
:
:
:
SUPSTATUS
54
Inner Joins
Q25: SELECT R.SUPNR, R.SUPNAME, R.SUPSTATUS,
S.SUPNR, S.PRODNR, S.PURCHASE_PRICE
FROM SUPPLIER R, SUPPLIES S
55
Inner Joins
R.SUPNR
R.SUPNAME
R.SUPSTATUS
S.SUPNR
S.PRODNR
S.PURCHASE_PRICE
21
Deliwines
20
21
0119
15.99
32
Best Wines
90
21
0119
15.99
37
Ad Fundum
95
21
0119
15.99
52
Spirits & co.
NULL
21
0119
15.99
32
Best Wines
90
32
0154
21.00
37
Ad Fundum
95
32
0154
21.00
52
Spirits & co.
NULL
32
0154
21.00
69
Vinos del Mundo
92
94
0899
15.00
84
Wine Trade
92
94
0899
15.00
75
94
0899
15.00
…
…
Logistics
94
The Wine Crate
56
Inner Joins
Q26: SELECT R.SUPNR, R.SUPNAME,
R.SUPSTATUS, S.PRODNR,
S.PURCHASE_PRICE
FROM SUPPLIER R, SUPPLIES S
WHERE R.SUPNR = S.SUPNR
57
Inner Joins
R.SUPNR
R.SUPNAME
R.SUPSTATUS
S.SUPNR
S.PRODNR
S.PURCHASE_PRICE
21
Deliwines
20
21
0119
15.99
21
Deliwines
20
21
0178
NULL
21
Deliwines
20
21
0289
17.99
21
Deliwines
20
21
0327
56.00
21
Deliwines
20
21
0347
16.00
21
Deliwines
20
21
0384
55.00
21
Deliwines
20
21
0386
58.99
21
Deliwines
20
21
0468
14.99
21
Deliwines
20
21
0668
6.00
32
Best Wines
90
32
0154
21.00
32
Best Wines
90
32
0474
40.00
32
Best Wines
90
32
0494
15.00
32
Best Wines
90
32
0657
44.99
32
Best Wines
90
32
0760
52.00
…
58
Inner Joins
Q27: SELECT R.SUPNR, R.SUPNAME, R.SUPSTATUS,
S.PRODNR, S.PURCHASE_PRICE
FROM SUPPLIER AS R INNER JOIN SUPPLIES AS S
ON (R.SUPNR = S.SUPNR)
59
Inner Joins
Q28: SELECT R.SUPNR, R.SUPNAME, PO.PONR, PO.PODATE,
P.PRODNR,P.PRODNAME, POL.QUANTITY
FROM SUPPLIER R, PURCHASE_ORDER PO, PO_LINE
POL, PRODUCT P
WHERE (R.SUPNR = PO.SUPNR)
AND (PO.PONR = POL.PONR)
AND (POL.PRODNR = P.PRODNR)
60
Inner Joins
R.SUPNR
R.SUPNAME
PO.PONR
PO.PODATE
P.PRODNR
37
Ad Fundum
1511
2015-03-24
0212
37
Ad Fundum
1511
2015-03-24
0345
37
Ad Fundum
1511
2015-03-24
0783
37
Ad Fundum
1511
2015-03-24
0856
94
The Wine Crate
1512
2015-04-10
0178
…
P.PRODNAME
Billecart-Salmon, Brut Réserve,
2014
Vascosassetti, Brunello di
Montalcino, 2004
Clos D’Opleeuw, Chardonnay,
2012
Domaine Chandon de Briailles,
Savigny-Les-Beaune, 2006
Meerdael, Methode
Traditionnelle Chardonnay, 2014
POL.QUANTITY
2
4
1
9
3
61
Inner Joins
Q29: SELECT R1.SUPNAME, R2.SUPNAME,
R1.SUPCITY
FROM SUPPLIER R1, SUPPLIER R2
WHERE R1.SUPCITY = R2.SUPCITY
AND (R1.SUPNR < R2.SUPNR)
62
Inner Joins
SUPNR
SUPNAME
SUPADDRESS
SUPCITY
SUPSTATUS
21
Deliwines
240, Avenue of the Americas
New York
20
32
Best Wines
660, Market Street
San Francisco
90
37
Ad Fundum
82, Wacker Drive
Chicago
95
52
Spirits & co.
928, Strip
Las Vegas
NULL
68
The Wine Depot
132, Montgomery Street
San Francisco
10
69
Vinos del Mundo
4, Collins Avenue
Miami
92
SUPNAME
SUPNAME
SUPCITY
Best Wines
The Wine Depot
San Francisco
63
Inner Joins
´ Q30: SELECT R.SUPNAME
FROM SUPPLIER R, SUPPLIES S
WHERE R.SUPNR = S.SUPNR
AND S.PRODNR = '0899'
Wine Crate
64
Inner Joins
Q31: SELECT DISTINCT R.SUPNAME
FROM SUPPLIER R, SUPPLIES S, PRODUCT P
WHERE S.SUPNR = R.SUPNR
AND S.PRODNR = P.PRODNR
AND P.PRODTYPE = 'ROSE'
SUPNAME
SUPNAME
DeliWines
DeliWines
DeliWines
The Wine Depot
DeliWines
The Wine Depot
65
Inner Joins
Q32: SELECT P.PRODNR, P.PRODNAME, SUM(POL.QUANTITY)
FROM PRODUCT P, PO_LINE POL
WHERE P.PRODNR = POL.PRODNR
GROUP BY P.PRODNR
PRODNR
PRODNAME
SUM(POL.QUANTITY)
0178
Meerdael, Methode Traditionnelle Chardonnay, 2014
9
0185
Chateau Petrus, 1975
2
0212
Billecart-Salmon, Brut Réserve, 2014
23
0295
Chateau Pape Clement, Pessac-Léognan, 2001
9
0306
Chateau Coupe Roses, Granaxa, 2011
11
…
66
Outer Joins
´ Outer join can be used when we want to keep all the
tuples of one or both tables in the result of the JOIN,
regardless of whether or not they have matching tuples
in the other table
Outer Joins
67
Q33: SELECT R.SUPNR, R.SUPNAME, R.SUPSTATUS,
S.PRODNR, S.PURCHASE_PRICE
FROM SUPPLIER AS R LEFT OUTER JOIN SUPPLIES AS S
ON (R.SUPNR = S. SUPNR)
SUPNR
SUPNAME
68
SUPNR
PRODNR
PURCHASE_PRICE
DELIV_PERIOD
The Wine Depot
21
0119
15.99
1
21
Deliwines
21
0289
17.99
1
94
The Wine Crate
68
0178
17.99
5
:
:
:
:
..
SUPADDRESS
SUPCITY
SUPSTATUS
68
Outer Joins
SUPNR
SUPNAME
SUPSTATUS
PRODNR
PURCHASE_PRICE
21
Deliwines
20
0119
15.99
21
Deliwines
20
0178
NULL
37
Ad Fundum
95
0795
20.99
52
Spirits & Co.
NULL
NULL
NULL
68
The Wine Depot
10
0178
17.99
…
…
69
Outer Joins
Q34: SELECT P.PRODNR, P.PRODNAME, SUM(POL.QUANTITY)
AS SUM FROM PO_LINE AS POL RIGHT OUTER JOIN PRODUCT AS P
ON (POL.PRODNR = P. PRODNR)
GROUP BY P.PRODNR
P.PRODNR
P.PRODNAME
SUM
0119
Chateau Miraval, Cotes de Provence Rose, 2015
NULL
0154
Chateau Haut Brion, 2008
NULL
0178
Meerdael, Methode Traditionnelle Chardonnay, 2014
9
0185
Chateau Petrus, 1975
2
0199
Jacques Selosse, Brut Initial, 2012
NULL
0212
Billecart-Salmon, Brut Réserve, 2014
23
…
70
Nested Queries
Outer block
Inner block
SELECT ...
FROM
...
WHERE ...
(SELECT
FROM
WHERE
...
...
...
)
71
Nested Queries
Q34: SELECT SUPNAME
FROM SUPPLIER
WHERE SUPNR =
(SELECT SUPNR
FROM PURCHASE_ORDER
WHERE PONR = '1560')
Best Wines
72
Nested Queries
Q35: SELECT PRODNR, PRODNAME
FROM PRODUCT
WHERE AVAILABLE_QUANTITY >
(SELECT AVAILABLE_QUANTITY
FROM PRODUCT
WHERE PRODNR = ‘0178’)
PRODNR
PRODNAME
0212
Billecart-Salmon, Brut Réserve, 2014
0347
Chateau Corbin-Despagne, Saint-Emilion, 2005
0474
Chateau De La Tour, Clos-Vougeot, Grand cru, 2008
0885
Chateau Margaux, Grand Cru Classé, 1956
0899
Trimbach, Riesling, 1989
73
Nested Queries
Q36: SELECT SUPNAME
FROM SUPPLIER
WHERE SUPNR IN
(SELECT SUPNR
FROM SUPPLIES
WHERE PRODNR =’0178′)
SUPNAME
Deliwines
Ad Fundum
The Wine Depot
Vinos del Mundo
The Wine Crate
74
Nested Queries
Q37: SELECT SUPNAME
FROM SUPPLIER
WHERE SUPNR IN
(SELECT SUPNR
FROM SUPPLIES
WHERE PRODNR IN
(SELECT PRODNR
FROM PRODUCT
WHERE PRODTYPE = ‘ROSE’))
SUPNAME
Deliwines
The Wine Depot
75
Nested Queries
Q38: SELECT PRODNAME
FROM PRODUCT
WHERE PRODNR IN
(SELECT PRODNR
FROM SUPPLIES
WHERE SUPNR = ’32’)
AND PRODNR IN
(SELECT PRODNR
FROM SUPPLIES
WHERE SUPNR = ’84’)
PRODNAME
Veuve-Cliquot, Brut, 2012
Conde de Hervías, Rioja, 2004
76
Correlated Queries
´ Whenever a condition in the WHERE clause of a nested
query references some column of a table declared in
the outer query, the two queries are said to be
correlated
´ The nested query is then evaluated once for each tuple
(or combination of tuples) in the outer query
77
Correlated Queries
Q39: SELECT P.PRODNR
FROM PRODUCT P
WHERE 1 <
(SELECT COUNT(*)
FROM PO_LINE POL
WHERE P.PRODNR = POL.PRODNR)
PONR
PRODNR
PRODNAME
0212
Billecart-Salmon, Brut Réserve, 2014
0289
Chateau Saint Estève de Neri, 2015
…
0154
Chateau Haut Brion, 2008
1538
0295
Chateau Pape Clement, Pessac-Léognan, 2001
…
PRODNR
QUANTITY
0212
2
0212
15
0212
6
…
1511
…
1577
…
78
Correlated Queries
Q40: SELECT R.SUPNR, R.SUPNAME, P.PRODNR, P.PRODNAME,
S1.PURCHASE_PRICE, S1.DELIV_PERIOD
FROM SUPPLIER R, SUPPLIES S1, PRODUCT P
WHERE R.SUPNR = S1.SUPNR
AND S1.PRODNR = P.PRODNR
AND S1.PURCHASE_PRICE <
(SELECT AVG(PURCHASE_PRICE)
FROM SUPPLIES S2
WHERE P.PRODNR = S2.PRODNR)
79
Correlated Queries
SUPPLIES S1
PRODUCT P
PRODNR
PRODNAME
PRODTYPE
0178
Meerdael, Methode
Traditionnelle Chardonnay, 2014
sparkling
0185
Chateau Petrus, 1975
red
..
PRODNR
SUPNR
PURCHASE_PRICE
DELIV_PERIOD
0178
37
16.99
4
0178
68
17.99
5
0178
69
16.99
-
0178
94
18.00
6
0178
21
-
-
0185
37
32.99
3
…
< AVG ?
SUPPLIER R
SUPPLIES S2
PRODNR
SUPNR
PURCHASE_PRICE
DELIV_PERIOD
SUPNR
SUPNAME
SUPADDRESS
0178
37
16.99
4
37
Ad Fundum
…
0178
68
17.99
5
68
The Wine Depot
…
0178
69
16.99
-
84
Wine Trade Logistics
…
0178
94
18.00
6
0178
21
-
-
0185
37
32.99
3
…
…
80
Correlated Queries
Q41: SELECT P1.PRODNR
FROM PRODUCT P1
WHERE 3 >
(SELECT COUNT(*)
FROM PRODUCT P2
WHERE P1.PRODNR < P2.PRODNR)
P1.PRODNR
Result of Inner Query block
< 3?
Output
0119
41
No
No
0154
40
No
No
0178
39
No
No
…
…
…
…
0899
3
No
No
0900
2
Yes
Yes
0915
1
Yes
Yes
0977
0
Yes
Yes
81
Queries with ALL/ANY
´ The comparison condition v > ALL V returns TRUE if the
value v is greater than all the values in the multiset V.
´If the nested query doesn’t return a value, it evaluates
the condition as TRUE.
´ The comparison condition v > ANY V returns TRUE if the
value v is greater than at least one value in the multiset
V.
´If the nested query doesn’t return a value, it evaluates
the whole condition as FALSE.
82
Queries with ALL/ANY
Q42: SELECT SUPNAME
FROM SUPPLIER
WHERE SUPNR IN
(SELECT SUPNR
FROM SUPPLIES
WHERE PRODNR = ‘0668’
AND PURCHASE_PRICE >= ALL
(SELECT PURCHASE_PRICE
FROM SUPPLIES
WHERE PRODNR = ‘0668’))
The Wine Depot
68
6.00, 6.99
83
Queries with ALL/ANY
SUPPLIES S1
PRODUCT P
PRODNR
PRODNAME
PRODTYPE
0178
Meerdael, Methode Traditionnelle
Chardonnay, 2014
sparkling
0668
Gallo Family Vineyards, Grenache, 2014
rose
…
SUPNR
PURCHASE_PRICE
DELIV_PERIOD
0668
68
6.99
3
0668
21
6.00
1
0760
32
52.00
3
0760
68
52.99
2
0783
69
7.00
3
…
…
…
…
≥ ALL ?
SUPPLIES S2
SUPPLIER R
SUPNR
SUPNAME
SUPADDRESS
32
Best wines
…
68
The Wine Depot
…
84
Wine Trade Logistics
…
..
PRODNR
PRODNR
SUPNR
PURCHASE_PRICE
DELIV_PERIOD
0668
68
6.99
3
0668
21
6.00
1
0760
32
52.00
3
0760
68
52.99
2
0783
69
7.00
3
…
…
…
…
84
Queries with ALL/ANY
Q43: SELECT R1.SUPNR, R1.SUPNAME, R1.SUPCITY, R1.SUPSTATUS
FROM SUPPLIER R1
WHERE R1.SUPSTATUS >= ALL
(SELECT R2.SUPSTATUS
FROM SUPPLIER R2
WHERE R1.SUPCITY = R2.SUPCITY)
SUPNR
SUPNAME
SUPCITY
SUPSTATUS
21
Deliwines
New York
20
32
Best Wines
San Francisco
90
37
Ad Fundum
Chicago
95
69
Vinos del Mundo
Miami
92
84
Wine Trade Logistics
Washington
92
94
The Wine Crate
Dallas
75
85
Queries with ALL/ANY
Q44: SELECT SUPNAME
FROM SUPPLIER
WHERE SUPNR IN
(SELECT SUPNR
FROM SUPPLIES
WHERE PRODNR = ‘0178’ AND
PURCHASE_PRICE > ANY
(SELECT PURCHASE_PRICE
FROM SUPPLIES
WHERE PRODNR = ‘0178’))
The Wine Depot, The Wine Crate
68,94
NULL, 16.99, 17.99, 16.99, 18.00
86
Queries with EXISTS
´ The EXISTS function checks whether the result of a
correlated nested query is empty or not
´ The result is a Boolean value: TRUE or FALSE
´ EXISTS returns TRUE if there is at least one tuple in the
result of the nested query, or otherwise returns FALSE
´ Vice versa, the NOT EXISTS function returns TRUE if there
are no tuples in the result of the nested query, or
otherwise returns FALSE
87
Queries with EXISTS
Q44: SELECT SUPNAME
FROM SUPPLIER R
WHERE EXISTS
(SELECT *
FROM SUPPLIES S
WHERE R.SUPNR = S.SUPNR
AND S.PRODNR = ‘0178’)
SUPNAME
Deliwines
Ad Fundum
The Wine Depot
Vinos del Mundo
The Wine Crate
88
Queries with EXISTS
Q45: SELECT SUPNAME, SUPADDRESS, SUPCITY
FROM SUPPLIER R
WHERE NOT EXISTS
(SELECT *
FROM PRODUCT P
WHERE NOT EXISTS
(SELECT *
FROM SUPPLIES S
WHERE R.SUPNR = S.SUPNR
AND P.PRODNR = S.PRODNR))
89
Queries with Subqueries in
SELECT/FROM
Q46: SELECT P.PRODNR, P.PRODNAME,
(SELECT SUM(QUANTITY) FROM PO_LINE POL
WHERE P.PRODNR = POL.PRODNR) AS TOTALORDERED
FROM PRODUCT P
PRODNR
PRODNAME
TOTALORDERED
0212
Billecart-Salmon, Brut Réserve, 2014
23
0795
Casa Silva, Los Lingues, Carmenere, 2012
3
0915
Champagne Boizel, Brut, Réserve, 2010
13
0523
Chateau Andron Blanquet, Saint Estephe, 1979
NULL
0977
Chateau Batailley, Grand Cru Classé, 1975
11
…
90
Queries with Subqueries in
SELECT/FROM
Q47: SELECT M.PRODNR, M.MINPRICE, M.MAXPRICE FROM
(SELECT PRODNR, MIN(PURCHASE_PRICE) AS MINPRICE,
MAX(PURCHASE_PRICE) AS MAXPRICE
FROM SUPPLIES GROUP BY PRODNR) AS M
WHERE M.MAXPRICE-M.MINPRICE > 1
PRODNR
MINPRICE
MAXPRICE
0178
16.99
18.00
0199
30.99
32.00
0300
19.00
21.00
0347
16.00
18.00
0468
14.00
15.99
91
Queries with Set Operations
´ A = {10, 5, 25, 30, 45}
´ B = {15, 20, 10, 30, 50}
´ A UNION B = {5, 10, 15, 20, 25, 30, 45, 50}
´ A INTERSECT B = {10, 30}
´ A EXCEPT B = {5, 25, 45}
92
Queries with Set Operations
Q48: SELECT SUPNR, SUPNAME
FROM SUPPLIER
WHERE SUPCITY = ‘New York’
UNION
SELECT R.SUPNR, R.SUPNAME
FROM SUPPLIER R, SUPPLIES S
WHERE R.SUPNR = S.SUPNR
AND S.PRODNR = ‘0915’
ORDER BY SUPNAME ASC
SUPNR
SUPNAME
21
Deliwines
84
Wine Trade Logistics
93
Queries with Set Operations
Q49: SELECT SUPNR, SUPNAME
FROM SUPPLIER
WHERE SUPCITY = ‘NEW YORK’
INTERSECT
SELECT R.SUPNR, R.SUPNAME
FROM SUPPLIER R, SUPPLIES S
WHERE R.SUPNR = S.SUPNR
AND S.PRODNR = ‘0915’
ORDER BY SUPNAME ASC
NULL
94
Queries with Set Operations
Q50: SELECT SUPNR
FROM SUPPLIER
EXCEPT
SELECT SUPNR
FROM SUPPLIES
52
95
SQL INSERT Statement
INSERT INTO PRODUCT VALUES
(‘980’, ‘Chateau Angelus, Grand Clu Classé, 1960’, ‘red’, 6)
INSERT INTO PRODUCT(PRODNR, PRODNAME, PRODTYPE, AVAILABLE_QUANTITY)
VALUES
(‘980’, ‘Chateau Angelus, Grand Clu Classé, 1960’, ‘red’, 6)
INSERT INTO PRODUCT(PRODNR, PRODNAME, PRODTYPE) VALUES
(‘980’, ‘Chateau Angelus, Grand Clu Classé, 1960’, ‘red’)
96
SQL INSERT Statement
INSERT INTO PRODUCT(PRODNR, PRODNAME, PRODTYPE, AVAILABLE_QUANTITY)
VALUES
(‘980’, ‘Chateau Angelus, Grand Clu Classé, 1960’, ‘red’, 6),
(‘1000’, ‘Domaine de la Vougeraie, Bâtard Montrachet’, Grand cru, 2010’,
‘white’, 2),
(‘1002’, ‘Leeuwin Estate Cabernet Sauvignon 2011’, ‘white’, 20)
INSERT INTO INACTIVE-SUPPLIERS(SUPNR)
SELECT SUPNR
FROM SUPPLIER
EXCEPT
SELECT SUPNR
FROM SUPPLIES
97
SQL DELETE Statement
DELETE FROM PRODUCT
WHERE PRODNR = ‘1000’
DELETE FROM SUPPLIER
WHERE SUPSTATUS IS NULL
DELETE FROM SUPPLIES
WHERE PRODNR IN (SELECT PRODNR
FROM PRODUCT
WHERE PRODNAME LIKE ‘%CHARD%’)
98
SQL DELETE Statement
DELETE FROM SUPPLIER R
WHERE NOT EXISTS
(SELECT PRODNR
FROM SUPPLIES S
WHERE R.SUPNR = S.SUPNR)
DELETE FROM SUPPLIES S1
WHERE S1.PURCHASE_PRICE >
(SELECT 2 * AVG(S2.PURCHASE_PRICE)
FROM SUPPLIES S2
WHERE S1.PRODNR = S2.PRODNR)
DELETE FROM PRODUCT
99
SQL UPDATE Statement
UPDATE PRODUCT
SET AVAILABLE_QUANTITY = 26
WHERE PRODNR = ‘0185’
UPDATE SUPPLIER
SET SUPSTATUS = DEFAULT
UPDATE SUPPLIES
SET DELIV_PERIOD = DELIV_PERIOD+7
WHERE SUPNR IN (SELECT SUPNR
FROM SUPPLIER
WHERE SUPNAME = ‘Deliwines’)
100
SQL UPDATE Statement
UPDATE SUPPLIES S1
SET (PURCHASE_PRICE, DELIV_PERIOD) =
(SELECT MIN(PURCHASE_PRICE), MIN(DELIV_PERIOD)
FROM SUPPLIES S2
WHERE S1.PRODNR = S2.PRODNR)
WHERE SUPNR = ’68’
ALTER TABLE SUPPLIER ADD SUPCATEGORY VARCHAR(10) DEFAULT ‘SILVER’
UPDATE SUPPLIER
SET SUPCATEGORY =
CASE WHEN SUPSTATUS >= 70 AND SUPSTATUS = 90 THEN ‘PLATINUM’
ELSE ‘SILVER’
END
101
SQL UPDATE Statement
SUPNR
SUPNAME
SUPADDRESS
SUPCITY
SUPSTATUS
SUPCATEGORY
21
Deliwines
20, Avenue of the Americas
New York
20
SILVER
32
Best Wines
660, Market Street
San Francisco
90
GOLD
37
Ad Fundum
82, Wacker Drive
Chicago
95
PLATINUM
52
Spirits & co.
928, Strip
Las Vegas
NULL
SILVER
68
The Wine Depot
132, Montgomery Street
San Francisco
10
SILVER
69
Vinos del Mundo
4, Collins Avenue
Miami
92
PLATINUM
84
Wine Trade Logistics
100, Rhode Island Avenue
Washington
92
PLATINUM
94
The Wine Crate
330, McKinney Avenue
Dallas
75
GOLD
102
SQL Views
´ SQL views are part of the external data model
´ A view is defined by means of an SQL query and its
content is generated upon invocation of the view by an
application or other query
´ A view is a virtual table without physical tuples
´ Views allow for logical data independence which
makes them a key component in the three-layer
database architecture
103
SQL Views
CREATE VIEW TOPSUPPLIERS
AS SELECT SUPNR, SUPNAME FROM SUPPLIER
WHERE SUPSTATUS > 50
CREATE VIEW TOPSUPPLIERS_SF
AS SELECT * FROM TOPSUPPLIERS
WHERE SUPCITY = ‘San Francisco’
104
SQL Views
CREATE VIEW ORDEROVERVIEW(PRODNR, PRODNAME,
TOTQUANTITY)
AS SELECT P.PRODNR, P.PRODNAME, SUM(POL.QUANTITY)
FROM PRODUCT AS P LEFT OUTER JOIN PO_LINE AS POL
ON (P.PRODNR = POL.PRODNR)
GROUP BY P.PRODNR
105
SQL Views
SELECT * FROM TOPSUPPLIERS_SF
SELECT * FROM ORDEROVERVIEW
WHERE PRODNAME LIKE ‘%CHARD%’
106
SQL Views
´ Query modification: RDBMS modifies queries that query
views into queries on the underlying base tables
´ View materialization: a physical table is created when
the view is first queried
107
SQL Views
´ Some views can be updated
´In this case, the view serves as a window through
which updates are propagated to the underlying
base table(s)
108
SQL Views
CREATE VIEW ORDEROVERVIEW(PRODNR, PRODNAME, TOTQUANTITY)
AS SELECT P.PRODNR, P.PRODNAME, SUM(POL.QUANTITY)
FROM PRODUCT AS P LEFT OUTER JOIN PO_LINE AS POL
ON (P.PRODNR = POL.PRODNR)
GROUP BY P.PRODNR
UPDATE VIEW ORDEROVERVIEW
SET TOTQUANTITY=10
WHERE PRODNR= ‘0154’
ERROR!
109
SQL Views
´ WITH CHECK option checks UPDATE and INSERT statements for conformity with
the view definition
CREATE VIEW TOPSUPPLIERS
AS SELECT SUPNR, SUPNAME FROM SUPPLIER
WHERE SUPSTATUS > 50
WITH CHECK OPTION
UPDATE TOPSUPPLIERS
SET STATUS =20
WHERE SUPNR=’32’
OK!
UPDATE TOPSUPPLIERS
SET STATUS =80
WHERE SUPNR=’32’
NOT OK!
110
SQL Privileges
´ A privilege corresponds to the right to use certain SQL statements such as
SELECT, INSERT, etc. on one or more database objects
Privilege
Explanation
SELECT
Gives retrieval privilege
INSERT
Gives insert privilege
UPDATE
Gives update privilege
DELETE
Gives delete privilege
ALTER
Gives privilege to change the table definition
REFERENCES
Gives the privilege to reference the table when specifying integrity constraints
ALL
Gives all privileges (DBMS specific)
111
SQL Privileges
GRANT SELECT, INSERT, UPDATE, DELETE ON SUPPLIER TO
BBAESENS
GRANT SELECT (PRODNR, PRODNAME) ON PRODUCT TO PUBLIC
REVOKE DELETE ON SUPPLIER FROM BBAESENS
GRANT SELECT, INSERT, UPDATE, DELETE ON PRODUCT TO
WLEMAHIEU WITH GRANT OPTION
GRANT REFERENCES ON SUPPLIER TO SVANDENBROUCKE
112
SQL Privileges
CREATE VIEW SUPPLIERS_NY
AS SELECT SUPNR, SUPNAME FROM SUPPLIERS
WHERE SUPCITY = ‘New York’
GRANT SELECT ON SUPPLIERS_NY TO WLEMAHIEU
113
SQL for Metadata Management
´ The catalog itself can also be implemented as a relational database
114
SQL for Metadata Management
Table(Tablename, …)
Key(Keyname, …)
Primary-Key(PK-Keyname, PK-Tablename, …)
PK-Keyname is a foreign key referring to Keyname in Key
PK-Tablename is a foreign key referring to Tablename in Table
Foreign-Key(FK-Keyname, FK-Tablename, FK-PK-Keyname, Update-rule, Delete-rule, …)
FK-Keyname is a foreign key referring to Keyname in Key
FK-Tablename is a foreign key referring to Tablename in Table
FK-PK-Keyname is a foreign key referring to PK-Keyname in Primary-Key
Column(Columnname, C-Tablename, Data type, Nulls, …)
C-Tablename is a foreign key referring to Tablename in Table
Key-Column(KC-Keyname, KC-Columnname, KC-Tablename, …)
KC-Keyname is a foreign key referring to Keyname in Key
KC-Columnname is a foreign key referring to Columnname in Column
KC-Tablename is a foreign key referring to C-Tablename in Column
115
SELECT *
FROM Column
WHERE TableName = ‘SUPPLIER’
SELECT PK.PK-Keyname, FK.FK-PK-Keyname, FK.FK-Tablename,
FK.Deleterule
FROM Primary-Key PK, Foreign-Key FK
WHERE PK.PK-Tablename = ‘SUPPLIER’
AND PK.PK-Keyname = FK.FK-PK-Keyname
116
Conclusions
´ Relational database management systems and SQL
´ SQL Data Definition Language
´ SQL Data Manipulation Language
´ SQL views
´ SQL indexes
´ SQL privileges
´ SQL for metadata management
Conceptual Data Modeling
Using the (E)ER model and
UML Class Diagram
Chapter 3
1
Overview
´Phases of database design
´Entity relationship (ER) model
´Enhanced entity relationship (EER) model
´UML class diagram
2
Phases of Database Design
3
Entity Relationship (ER) Model
´Entity types
´Attribute types
´Relationship types
´Weak entity types
´Ternary Relationship types
´Examples of the ER model
´Limitations of the ER model
4
Entity Types
´ Entity type represents a business concept with an
unambiguous meaning to a particular set of users
´Examples: supplier, student, product or employee
´ Entity is one particular occurrence or instance of an
entity type
´Examples: Deliwines, Best Wines, and Ad Fundum are
entities from the entity type supplier
5
Attribute Types
´ Attribute type represents a property of an entity type.
´ Example: name and address are attribute types of the entity
type supplier
´ Attribute is an instance of an attribute type
ENTITY TYPE
ATTRIBUTE TYPE
SUPPLIER
ENTITIES
SUPNR
21
32
37
52
68
69
…
SUPNAME
Deliwines
Best Wines
Ad Fundum
Spirits & co.
The Wine Depot
Vinos del Mundo
SUPADDRESS
240, Avenue of the Americas
660, Market Street
82, Wacker Drive
928, Strip
132, Montgomery Street
4, Collins Avenue
ATTRIBUTES
SUPCITY
New York
San Francisco
Chicago
Las Vegas
San Francisco
Miami
SUPSTATUS
20
90
95
NULL
10
92
6
Attribute Types
7
Attribute Types
´Domains
´Key attribute types
´Simple versus Composite attribute types
´single-valued versus multi-valued attribute types
´Derived attribute type
8
Domains
´ A domain specifies the set of values that may be
assigned to an attribute for each individual entity
´Example: gender – male and female
´ A domain can also contain null values
´Null value: value is not known, not applicable, or not
relevant
´ Domains are not displayed in an ER model
9
Key Attribute Types
´ A key attribute type is an attribute type whose values are distinct for
each individual entity
´ Examples: supplier number, product number, social security
number
´ A key attribute type can also be a combination of attribute types
´ Example: combination of flight number and departure date
10
Simple versus Composite Attribute
Types
´ A simple or atomic attribute type cannot be further divided into parts
´ Examples: supplier number, supplier status
´ A composite attribute type is an attribute type that can be decomposed
into other meaningful attribute types
´ Examples: address, name
11
Single-Valued versus Multi-Valued
Attribute Types
´ A single-valued attribute type has only one value for a particular entity
´ Examples: product number, product name
´ A multi-valued attribute type is an attribute type that can have multiple
values
´ Example: email address
12
Derived Attribute Type
´ A derived attribute type is an attribute type that can be derived from
another attribute type
´ Example: age
last name
first name
SUPNR
name
email
Street
Number
ZIP
address
SUPPLIER
City
Country
date of birth
age
status
13
Relationship Types
´Definition
´Degree and roles
´Cardinalities
´Relationship attribute types
14
Definition
´ A relationship represents an association between two or more entities
´ A relationship type then defines a set of relationships among instances of
one, two, or more entity types
supcity
SUPNR
supname
supstatus
SUPPLIER
supaddress
supprod
prodsup
SUPPLIES
PRODNR
prodtype
PRODUCT
prodname
available_quantity
15
Degree and Roles
´ The degree of a relationship type corresponds to the number of entity types
participating in the relationship type
´ Unary: degree 1; binary: degree 2; ternary: degree 3
´ The roles of a relationship type indicate the various directions that can be
used to interpret it
supcity
SUPNR
supname
supstatus
SUPPLIER
supaddress
supprod
prodsup
SUPPLIES
PRODNR
prodtype
PRODUCT
prodname
available_quantity
16
Degree and Roles
SUPERVISES
Super
vises
Supervised
by
SSN
EMPLOYEE
address
ename
BOOKING
Tourist
Hotel
TNR
HNR
Travel
Agency
ANR
17
Cardinalities
´ Every relationship type can be characterized in terms of its
cardinalities, which specify the minimum or maximum number of
relationship instances that an individual entity can participate in
´ Minimum cardinality can be 0 or 1
´ If 0: partial participation
´ If 1: total participation or existence dependency
´ Maximum cardinality can be 1 or N
´ Relationship types are often characterized by their maximum
cardinalities
´ Four options for binary relationship types: 1:1, 1:N, N:1 and M:N.
18
Cardinalities
ENROLLED FOR
STUDENT
0..N
1..M
COURSE
N:M
ASSIGNED TO
STUDENT
0..1
0..1
MASTER THESIS
1:1
MANAGED BY
EMPLOYEE
1..1
0..N
PROJECT
1:N
19
Relationship Attribute Types
´ Relationship type can also have attribute types
´ These attribute types can be migrated to one of the participating entity
types in case of a 1:1 or 1:N relationship type
SSN
ename
address
PNR
hours
0..M
EMPLOYEE
pro- empemp pro
0..N
PROJECT
WORKS ON
pname
pduration
20
Weak Entity Type
´ A strong entity type is an entity type that has a key attribute type
´ A weak entity type is an entity type that does not have a key attribute type
of its own
´ Related to owner entity type from which it borrows an attribute type to
make up a key attribute type
HNR
RNR
HNR
Beds
Room
0..N
1..1
Hotel
BELONGS TO
Hname
21
Weak Entity Type
´ Weak entity type is always existence-dependent on owner entity type (not
vice versa!)
supstatus
supcity
SUPNR
supname
SUPPLIER
supaddress
1..1
ON_ORDER
po-sup
sup-po
0..N
PURCHASE
ORDER
PONR
podate
22
Ternary Relationship Types
´ Assume that we have a situation in which suppliers can supply products for projects.
A supplier can supply a particular product for multiple projects. A product for a
particular project can be supplied by multiple suppliers. A project can have a
particular supplier supply multiple products. The model must also include the quantity
and due date for supplying a particular product to a particular project by a
particular supplier.
Due date
Quantity
SUPPLY
SUPPLIER
SUPNR
0..N
0..N
PROJECT
0..N
PNR
PRODUCT
PRODNR
23
Ternary Relationship Types
SUPNR
PNR
SUPPLIES
SUPPLIER
0..N
0..N
PROJECT
0..N
0..N
0..N
PRODUCT
0..N
CAN SUPPLY
USES
PRODNR
Note: loss of semantics!
24
Ternary Relationship Types
´
Say we have two projects: project 1 uses a pencil and a pen, and project 2 uses a pen. Supplier
Peters supplies the pencil for project 1 and the pen for project 2 whereas supplier Johnson
supplies the pen for project 1.
´
From the binary relationship types, it is not clear who supplies the pen for project 1!
25
Ternary Relationship Types
26
Ternary Relationship Types
27
Examples of the ER Model
28
Examples of the ER Model
29
Limitations of the ER Model
´ ER model presents a temporary snapshot and cannot model
temporal constraints
´ Examples: a project needs to be assigned to a department after
one month, an employee cannot return to a department of
which he previously was a manager, a purchase order must be
assigned to a supplier after two weeks, etc.
´ ER model cannot guarantee the consistency across multiple
relationship types
´ Examples: an employee should work in the department that
he/she manages, employees should work on projects assigned
to departments to which the employees belong, suppliers can
only be assigned to purchase orders for products they can
supply
30
Limitations of the ER Model
´ Domains are not included in the ER model
´Examples: hours should be positive; prodtype must be
red, white, or sparkling, supstatus is an integer
between 0 and 100
´ Functions are not included in the ER model
´Examples: calculate average number of projects an
employee works on; determine which supplier
charges the maximum price for a product
31
Enhanced Entity Relationship (EER)
Model
´ Specialization/Generalization
´ Categorization
´ Aggregation
´ Examples of the EER Model
´ Designing the EER Model
32
Specialization/Generalization
´ Specialization refers to the process of defining a set of subclasses of an
entity type
´ Example: ARTIST superclass with subclasses SINGER and ACTOR
´ The specialization process defines an “is a” relationship
´ The specialization can then establish additional specific attribute types for
each subclass
´ Example: singer can have a music style attribute type
´ The specialization can also establish additional specific relationship types
for each subclass
´ Examples: actor can act in movies, singer can be part of a band
´ A subclass inherits all attribute types and relationship types from its
superclass
33
Specialization/Generalization
´ Generalization, also called abstraction, is the reverse
process of specialization
´Specialization corresponds to a top-down process of
conceptual refinement
´Generalization corresponds to a bottom-up process
of conceptual synthesis
34
Specialization/Generalization
35
Specialization/Generalization
´ The disjointness constraint specifies to what subclasses an entity of
the superclass can belong
´ A disjoint specialization is a specialization in which an entity can
be a member of at most one of the subclasses
´ An overlap specialization is a specialization whereby the same
entity may be a member of more than one subclass
´ The completeness constraint indicates if all entities of the superclass
should belong to one of the subclasses
´ A total specialization is a specialization in which every entity in
the superclass must be a member of some subclass
´ A partial specialization allows an entity to only belong to the
superclass and to none of the subclasses
36
Specialization/Generalization
ANR
ARTIST
aname
p
o
music
style
SINGER
ACTOR
37
Specialization/Generalization
PNR
PERSON
pname
t
d
PROFESSOR
STUDENT
38
Specialization/Generalization
´ In a specialization hierarchy, every subclass can only have a single
superclass and inherits the attribute types and relationship types of all its
predecessor superclasses all the way up to the root of the hierarchy
39
Specialization/Generalization
´ In a specialization lattice, a subclass can have multiple superclasses
(multiple inheritance)
VEHICLE
p
o
MOTORCYCLE
BOAT
CAR
TRIKE
AMPHIBIAN
40
Categorization
´ A category is a subclass that has several possible superclasses
´ Each superclass represents a different entity type
´ The category represents a collection of entities that is a subset of the union
of the superclasses
41
Categorization
´ Inheritance in the case of categorization corresponds to an entity inheriting
only the attributes and relationships of that superclass it is a member of
(selective inheritance)
´ A categorization can be total or partial
´ Total: all entities of the superclasses belong to the subclass
´ Partial: not all entities of the superclasses belong to the subclass
Note: total categorization can also be represented as a specialization/generalization!
42
Aggregation
´ Entity types that are related by a particular relationship type can be
combined or aggregated into a higher-level aggregate entity type
´ Aggregation is especially useful when the aggregate entity type has its
own attribute types and/or relationship types
CNR
PNR
0..N
1..M
CONSULTANT
PROJECT
PARTICIPATION
1..M
date
1..1
CONTRACT
CONTNR
43
Examples of the EER Model
44
Designing the EER Model
1.
2.
3.
4.
5.
6.
7.
8.
9.
Identify the entity types
Identify the relationship types and assert their degree
Assert the cardinality ratios and participation constraints (total
versus partial participation)
Identify the attribute types and assert whether they are simple or
composite, single- or multi-valued, derived or not
Link each attribute type to an entity type or a relationship type
Denote the key attribute type(s) of each entity type
Identify the weak entity types and their partial keys
Apply abstractions such as generalization/specialization,
categorization and aggregation
Assert the characteristics of each abstraction such as disjoint or
overlapping, total or partial
45
UML Class Diagram
´
´
´
´
´
´
´
´
´
´
´
Origin
Recap of object orientation
Classes
Variables
Access modifiers
Associations
specialization/generalization
Aggregation
UML example
Advanced UML modeling concepts
UML versus EER
46
Origin
´ The Unified Modeling Language (UML) is a modeling language that
assists in the specification, visualization, construction, and
documentation of artifacts of a software system
´ UML was accepted as a standard by the Object Management
Group (OMG) in 1997 and approved as an ISO standard in 2005
´ The most recent version is UML 2.5, introduced in 2015
´ UML offers various diagrams such as use case diagrams, sequence
diagrams, package diagrams, deployment diagrams, etc.
´ From a database modeling perspective, the class diagram is the
most important
47
Recap of Object Orientation
´ A class is a blueprint definition for a set of objects
´Compare to entity type in ER
´ Conversely, an object is an instance of a class
´Compare to entity in ER
´ Object is characterized by both variables and methods
´Variables correspond to attribute types and variable
values to attributes in the ER
´No ER equivalent for methods
48
Recap of Object Orientation
´ Example class Student
´ Objects: Bart, Wilfried, Seppe
´ Example variables: student’s name, gender, and birthdate
´ Example methods: calcAge, isBirthday,
hasPassed(courseID)
´ Information hiding (aka encapsulation) states that the
variables of an object can only be accessed through either
getter or setter methods
´ getter method is used to retrieve the value of a variable
´ setter method assigns a value to a variable
49
Recap of Object Orientation
´ Inheritance
´A superclass can have one or more subclasses which
inherit both the variables and methods from the
superclass
´ Method overloading
´Various methods in the same class can have the
same name, but a different number or type of input
arguments
50
Classes
51
Variables
´ Variables with unique values (similar to key attribute
types in the ER model) are not directly supported in UML
´ UML provides a set of primitive types such as string,
integer, and Boolean, which can be used to define
variables in the class diagram
´ It is also possible to define your own data types or
domains
´ Composite/multi-valued/derived variables
52
Variables
53
Access Modifiers
´ Access modifiers can be used to specify who can have
access to a variable or method
´ Three types
´ private (denoted by “–”): variable or method can only be
accessed by the class itself
´ public (denoted by “+”): variable or method can be
accessed by any other class
´ protected (denoted by “#”): variable or method can be
accessed by both the class and its subclasses
´ It is recommended to declare all variables as private
(information hiding)
54
Access Modifiers
55
Associations
´ An association corresponds to a relationship type in ER
´ A particular occurrence of an association is referred to
as a link, which corresponds to a relationship in ER
´ Multiple associations can be defined between the same
classes
´ An association is characterized by its multiplicities
(cardinalities in the ER model)
56
Associations
UML class diagram
ER model cardinality
multiplicity
*
0..N
0..1
0..1
1..*
1..N
1
1..1
57
Associations
´Association class
´Unidirectional versus bidirectional association
´Qualified association
58
Association Class
´ In case an association has variables and/or methods on its own, it can be
modeled as an association class
59
Unidirectional versus Bidirectional
Association
´ Associations can be augmented with direction-reading arrows, which
specify the direction of querying or navigating through it
Unidirectional
Bidirectional
60
Qualified Association
´A qualified association is a special type of
association that uses a qualifier to further refine
the association
´The qualifier specifies one or more variables that
are used as an index key for navigating from the
qualified class to the target class
´reduces the multiplicity of the association
because of this extra key
61
Qualified Association
62
Qualified Association
´ Qualified associations can be used to represent weak entity types
63
Specialization/Generalization
ARTIST
{partial; overlap}
SINGER
ACTOR
64
Aggregation
´ Aggregation represents a composite to part relationship
whereby a composite class contains a part class
´ Two types in UML: shared and composite aggregation
´ Shared aggregation (aka aggregation)
´ Part object can simultaneously belong to multiple
composite objects
´ Maximum multiplicity at the composite side is
undetermined
´ Part object can also occur without belonging to a
composite object
´ Loose coupling
65
Aggregation
´ Composite aggregation (aka composition)
´The part object can only belong to one
composite
´Maximum multiplicity at the composite side is 1
´Minimum multiplicity can be either 1 or 0
´Tight coupling
66
Aggregation
67
UML Example
68
Advanced UML Modeling Concepts
´Changeability property
´Object constraint language (OCL)
´Dependency relationship
69
Changeability Property
´
The changeability property specifies the type of operations that are allowed on either variable
values or links
´
Three common choices
´ Default which allows any type of edit
´ AddOnly which only allows additional values or links to be added (so no deletions)
´ Frozen which allows no further changes once the value or link is established
70
Object Constraint Language (OCL)
´ The object constraint language can be used to specify
various types of constraints in a declarative way
´No control flow or procedural code is provided
´Can be used to specify invariants for classes, pre- and
post-conditions for methods, to navigate between
classes, or to define constraints on operations
´ See www.omg.org/spec/OCL
71
Object Constraint Language (OCL)
´A class invariant is a constraint that holds for all
objects of a class
´Example: SUPPLIER: SUPSTATUS>100
´Pre- and post-conditions on methods must be
true when a method either begins or ends
´Example: before the method withdrawal can
be executed, the balance must be positive;
after it has been executed, the balance must
still be positive
72
Object Constraint Language (OCL)
´ Constraint: manager of a department should be at least ten years employed
Context: Department
invariant: self.managed_by.yearsemployed>10
73
Object Constraint Language (OCL)
´ A department should have at least 20 employees
Context: Department
invariant: self.workers→size()>20
74
Object Constraint Language (OCL)
´ Constraint: a manager of a department must also work in the department
Context: Department
Invariant: self.managed_by.works_in=self
75
Dependency Relationship
´ Dependency defines a “using” relationship that states that a change in the
specification of a UML modeling concept may affect another modeling
concept that uses it
76
UML versus EER
UML class diagram
EER model
Class
Entity type
Object
Entity
Variable
Attribute type
Variable value
Attribute
Method
–
Association
Relationship type
Link
Relationship
77
UML versus EER
UML class diagram
EER model
Qualified association
Weak entity type
Specialization/generalization
Specialization/generalization
Aggregation
Aggregation (composite/shared)
OCL
–
Multiplicity
*
Cardinality
0..N
0..1
0..1
1..*
1..N
1
1..1
78
Conclusions
´Phases of database design
´Entity relationship (ER) model
´Enhanced entity relationship (EER) model
´UML class diagram