Advanced Data Base Management
and Information Assurance
1
2
Grading
´ Catalogies
´ Assignments
40%
´ One Midterms
20%
´ Research Paper
15%
´ Group Project
25%
´ Letter Grades
´ 94% A, 90% A-,
´ 86% B+, 83% B, 80% B-,
´ 76% C+, 73% C, 70 C-,
´ 66 % D+, 63% D, 60% D-,
´ < 60% F
´ No Rounding Off at the final grade
3
´ Midterm, November 13, 2019
´ Group Project Presentation, December 4, 2019
4
Assignment Schedule
´ Research Paper Topics and references – October 2, 2019
´ Assignment 1– October 11, 2019
´ Assignment 2 -- October 25, 2019
´ Assignment 3 – November 8, 2019
´ Assignment 4 – November 22, 2019
´ Research Paper – November 6, 2019
5
Research Paper
´ You are required to write a 10-page research paper in MLA Style.
´ The paper topics includes
´ Security challenges in DBMS
´ Legal and ethical issues in database management
´ Utilization and impacts with business intelligence
´ Research Paper due November 6, 2019 via Blackboard
6
Fundamental Concepts of
Database Management
Chapter 1
9
Introduction
´ Application of Database Technology
´ Key definitions
´ File Versus database approach to data management
´ Elements of a database system
´ Advantages of database systems and database
management
10
What is Data?
´ Information in raw or unorganized form (such as
alphabets, numbers, or symbols) that refer to, or
represent, conditions, ideas, or objects. Data is limitless
and present everywhere in the universe.
´ Computers: Symbols or signals that are input, stored, and
processed by a computer, for output as usable
information.
11
Difficulties in Managing Data
´ Data increases exponentially with time
´ Multiple sources of data
´ New sources of data
´ Data rot, or data degradation
´ Data security, quality, and integrity
´ Government Regulation
12
Data Governance
´ An approach to managing information across an entire
organization.
´ Master Data Management
´ Master Data
13
DIKW Pyramid
´ Data
´ Information
´ Knowledge
´ Wisdom
14
DIKW Pyramid
15
Principles of Modern Data Architecture
´ Data is commodity and shared asset
´ Ensure privacy and proper use
´ Ensure security and access control
´ Information through data stewardship
´ Ensure a common vocabulary
´ Provide the right interface for consumption
´ Eliminate data copies and movement
16
What is Database?
´ Collection of data central to some enterprise
´ Essential to operation of enterprise
´ Contains the only record of enterprise activity
´ An asset in its own right
´ Historical data can guide enterprise strategy
´ Of Interest to other enterprises
´ State of database mirrors state of enterprise
´ Database is persistent
17
Applications of Database Technology
´ Storage and retrieval of traditional numeric and alphanumeric data
in an inventory application
´ Multimedia applications (e.g., YouTube, Spotify)
´ Biometric applications (e.g., fingerprints, retina scans)
´ Wearable applications (e.g., FitBit, Apple Watch)
´ Geographical information systems (GIS) applications (e.g., Google
Maps)
´ Sensor applications (e.g., nuclear reactor)
´ Big Data applications (e.g., Walmart)
´ Internet of Things (IoT) applications (e.g., Telematics)
18
Key Definitions
´ A database can be defined as a collection of related
data items within a specific business process or problem
setting
´ Has a target group of users and applications
´ A database management system (DBMS) is the software
package used to define, create, use, and maintain a
database
´ CONSISTS of several software modules
´ The combination of a DBMS and a database is then
often called a database system
19
File versus Database Approach to
Data Management
´ File approach
´ Data files only contain the data
´ Data definitions are included in each application
´ Application can make use of one or more files
´ Duplicate or redundant information will be stored
´ Danger of inconsistent data
´ Strong coupling between applications and data
´ Hard to manage concurrency control
´ Hard to integrate applications aimed at providing crosscompany services
20
File versus Database Approach to
Data Management
Invoicing
CRM
GIS
CustomerNr
CustomerName
VATcode
CustomerNr
CustomerName
Turnover
CustomerNr
CustomerName
ZipCode
Duplicate data!
21
File versus Database Approach to
Data Management
´ Database approach
´ Data are stored and managed centrally by a DBMS
´ Applications now directly interface with the DBMS
´ DBMS stores and manages two types of data: raw data and
metadata
´ Metadata are no longer included in the applications
´ Superior to the file approach in terms of efficiency, consistency,
and maintenance
´ Loose coupling between applications and data
´ Facilities provided for data querying and retrieval
22
File versus Database Approach to
Data Management
Invoicing
CRM
GIS
DBMS
Raw data
Catalog
23
File versus Database Approach to
Data Management
File approach
Database approach (SQL)
Procedure FindCustomer;
SELECT *
begin
open file Customer.txt;
FROM Customer
Read(Customer)
While not EOF(Customer)
If Customer.name='Bart' then
display(Customer);
EndIf
Read(Customer);
EndWhile;
End;
WHERE
name = 'Bart'
24
Elements of a Database System
´Database model versus instances
´Data model
´The three-layer architecture
´Catalog
´Database users
´Database languages
25
Database Model versus Instances
´ Database model or database schema provides the description of
the database data at different levels of detail and specifies the
various data items, their characteristics and relationships,
constraints, storage details, etc.
´ Specified during database design and not expected to change
too frequently
´ Stored in the catalog
´ Database state represents the data in the database at a particular
moment
´ Also called the current set of instance
´ Typically changes on an ongoing basis
26
Database Model versus Instances
´ Database model
Student (number, name, address, email)
Course
(number, name)
Building (number, address)
27
Database Model versus Instances
´ Database state
28
Data Model
´ A database model comprises different data models,
each describing the data from different perspectives
´ A data model provides a clear and unambiguous
description of the data items, their relationships, and
various data constraints from a particular perspective
29
Data Model
´ A conceptual data model provides a high-level description of the
data items with their characteristics and relationships
´ Communication instrument between information architect and
business user
´ Should be implementation-independent, user-friendly, and close
to how the business user perceives the data
´ Usually represented using an enhanced-entity relationship (EER)
model, or an object-oriented model
´ Logical data model is a translation or mapping of the conceptual
data model toward a specific implementation environment
´ Can be a hierarchical, CODASYL, relational, object-oriented,
extended relational, XML, or NoSQL model
30
Data Model
´ Logical data model can be mapped to an internal data model
that represents the data’s physical storage details
´ Clearly describes which data atr stored where, in what format,
which indexes are provided to speed up retrieval, etc.
´ Highly DBMS specific
´ External data model contains various subsets of the data items in
the logical model, also called views, tailored toward the needs of
specific applications or groups of users
31
The Three-Layer Architecture
32
The Three-Layer Architecture
´ Conceptual\logical layer
´ Contains the conceptual and logical data models
´ External layer
´ External data model which includes views
´ Used to control data access and enforce security
´ Internal layer
´ Includes the internal data model
´ Changes in one layer should have no to minimal impact on the
others
´ Physical data independence
´ Logical data independence
33
The Three-Layer Architecture
34
Catalog
´ Heart of the DBMS
´ Contains the data definitions, or metadata, of your
database application
´ Stores the definitions of the views, logical and internal
data models, and synchronizes these three data models
to make sure their consistency is guaranteed
35
Database Users
´ Information architect designs the conceptual data model
´ Closely interacts with the business user
´ Database designer translates the conceptual data model into a
logical and internal data model
´ Database administrator (DBA) is responsible for the implementation
and monitoring of the database
´ Application developer develops database applications in a
programming language such as Java or Python
´ Business user will run these applications to perform specific
database operations
36
Database Languages
´ Data Definition Language (DDL) is used by the DBA to express
the database’s external, logical, and internal data models
´ Definitions are stored in the catalog
´ Data Manipulation Language (DML) is used to retrieve, insert,
delete, and modify data
´ DML statements can be embedded in a programming
language, or entered interactively through a front-end
querying tool
´ Structured Query Language (SQL) offers both DDL and DML
statements for relational database systems
37
Advantages of Database Systems and
Database Management
´ Data independence
´ Database modeling
´ Managing data redundancy
´ Specifying integrity rules
´ Concurrency control
´ Backup and recovery facilities
´ Data security
´ Performance utilities
38
Data Independence
´ Data independence implies that changes in data definitions have
minimal to no impact on the applications
´ Physical data independence implies that neither the applications, nor
the views or logical data model must be changed when changes are
made to the data storage specifications in the internal data model
´ DBMS should provide interfaces between logical and internal data
models
´ Logical data independence implies that software applications are
minimally affected by changes in the conceptual or logical data
model
´ Views in the external data model will act as a protective shield
´ DBMS must provide interfaces between conceptual/logical and
external layer
39
Database Modeling
´ A data model is an explicit representation of the data items
together with their characteristics and relationships
´ A conceptual data model should provide a formal and
perfect mapping of the data requirements of the business
process and is made in collaboration with the business user
´ Translated into logical and internal data model
´ It is important that a data model’s assumptions and
shortcomings are clearly documented
40
Managing Data Redundancy
´ Duplication of data can be desired in distributed
environments to improve data retrieval performance
´ DBMS is now responsible for the management of the
redundancy by providing synchronization facilities to
safeguard data consistency
´ Compared to the file approach, the DBMS guarantees
correctness of the data without user intervention
41
Specifying Integrity Rules
´ Syntactical rules specify how the data should be represented and
stored
´ e.g., customerID is an integer; date of birth should be stored as
month, day, and year
´ Semantical rules focus on the semantical correctness or meaning of the
data
´ e.g., customerID is unique; account balance should be > 0;
customer cannot be deleted if he/she has pending invoices
´ Integrity rules are specified as part of the conceptual/logical data
model and stored in the catalog
´ Directly enforced by the DBMS instead of applications
42
Concurrency Control
´ DBMS has built-in facilities to support concurrent or parallel
execution of database programs
´ Key concept is a database transaction
´ Sequence of read/write operations considered to be an
atomic unit in the sense that either all operations are
executed or none at all
´ Read/write operations can be executed at the same time by
the DBMS
´ DBMS should avoid inconsistencies!
43
Concurrency Control
´ Lost update problem
44
Concurrency Control
´ DBMS must support ACID (Atomicity, Consistency, Isolation,
Durability) properties
´ Atomicity requires that a transaction should either be executed
in its entirety or not at all
´ Consistency assures that a transaction brings the database from
one consistent state to another
´ Isolation ensures that the effect of concurrent transactions
should be the same as if they would have been executed in
isolation
´ Durability ensures that the database changes made by a
transaction declared successful can be made permanent under
all circumstances
45
Backup and Recovery Facilities
´ Backup and recovery facilities can be used to deal with the
effect of loss of data due to hardware or network errors, or
bugs in system or application software
´ Backup facilities can either perform a full or incremental
backup
´ Recovery facilities allow restoration of the data to a previous
state after loss or damage occurs
46
Data Security
´ Data security can be enforced by the DBMS
´ Some users have read access, while others have write access
to the data (role-based functionality)
´ e.g., vendor-managed inventory (VMI)
´ Data access can be managed via logins and passwords
assigned to users or user accounts
´ Each account has its own authorization rules that can be
stored in the catalog
47
Performance Utilities
´ There are three KPIs of a DBMS
´ Response time, denoting the time elapsed between issuing a
database request and the successful termination thereof
´ Throughput rate, representing the transactions a DBMS can
process per unit of time
´ Space utilization, referring to the space utilized by the DBMS to
store both raw data and metadata
´ DBMSs come with various types of utilities aimed at improving these
KPIs
´ e.g., utilities to distribute and optimize data storage, to tune
indexes for faster query execution, to tune queries to improve
application performance, or to optimize buffer management
48
Conclusions
´ Applications of database technology
´ Key definitions
´ File versus database approach to data management
´ Elements of a database system
´ Advantages of database systems and database
management
Architecture and
Classification of DBMSs
Chapter Two
1
Introduction
´Architecture of a DBMS
´Categorization of DBMSs
2
Architecture of a DBMS
3
Architecture of a DBMS
´Connection and security manager
´DDL compiler
´Query processor
´Storage manager
´DBMS utilities
´DBMS interfaces
4
Connection and Security Manager
´ Connection manager provides facilities to set up a
database connection (locally or through a network)
´Verifies logon credentials and returns a connection
handle
´Database connection can either run as a single
process or as a thread within a process
´ Security manager verifies whether a user has the right
privileges
´Read versus write access
5
DDL Compiler
´ Compiles the data definitions specified in DDL
´ Ideally three DDLs (internal/logical/external data model)
´ DDL compiler first parses the DDL definitions and checks
their syntactical correctness
´ DDL compiler then translates the data definitions to an
internal format and generates errors if required
´ Upon successful compilation, the DDL compiler registers
the data definitions in the catalog
6
Query Processor
´ Query processor assists in the execution of database
queries such as retrieval, insertion, update, or removal of
data
´ Key components:
´DML compiler
´Query parser
´Query rewriter
´Query optimizer
´Query executor
7
DML Compiler
´ DML compiler compiles the DML statements
´ Procedural DML
´ DML explicitly specifies how to navigate in the database
´ record-at-a-time DML
´ no query processor
´ Declarative DML
´ DML specifies what data should be retrieved or what changes
should be made
´ set-at-a-time DML
´ query processor
8
DML Compiler
import java.sql.*;
public class JDBCExample1 {
public static void main(String[] args) {
try {
System.out.println(“Loading JDBC driver…”);
Class.forName(“com.mysql.jdbc.Driver”);
System.out.println(“JDBC driver loaded!”);
} catch (ClassNotFoundException e) {
throw new RuntimeException(e);
}
String url =
“jdbc:mysql://localhost:3306/employeeschema”;
String username = “root”;
String password = “mypassword123”;
String query = “select E.Name, D.DName” +
“from employee E, department D” +
“where E.DNR=D.DNR;”;
Connection connection = null;
Statement stmt=null;
try {
System.out.println(“Connecting to database”);
connection = DriverManager.getConnection(url,
username, password);
System.out.println(“MySQL Database connected!”);
stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery(query);
while (rs.next()) {
System.out.print(rs.getString(1));
System.out.print(” “);
System.out.println(rs.getString(2));
}
stmt.close();
} catch (SQLException e) {
System.out.println(e.toString());
} finally {
System.out.println(“Closing the connection.”);
if (connection != null) {
try {
connection.close();
} catch (SQLException ignore) {}}}}
9
DML Compiler
´ Impedance mismatch problem
´mapping between OO (e.g., Java) and relational
(e.g., SQL) concepts
´ Impedance mismatch solutions
´host language and DBMS with comparable data
structures (e.g., Java and OODBMS)
´middleware to map the data structures from the
DBMS to the host language and vice versa
10
DML Compiler
Java
public class Employee {
private int EmployeeID;
private String Name;
private String Gender;
private int DNR;
public int getEmployeeID() {
return EmployeeID;
}
public void setEmployeeID( int id ) {
this.EmployeeID = id;
}
public String getName() {
return Name;
}
public void setName( String name ) {
this.Name = name;
}
…}
SQL
CREATE TABLE Employee (
‘EmployeeID’ INT NOT NULL,
‘Name’ VARCHAR(45) NULL,
‘Gender’ VARCHAR(45) NULL,
‘DNR’ INT NULL)
EmployeeID
Name
Gender
DNR
100
Bart Baesens
Male
2
110
Wilfried Lemahieu
Male
4
120
Seppe vanden Broucke
Male
6
…
11
DML Compiler
´DML compiler starts by extracting the DML
statements from the host language.
´DML compiler then collaborates with the query
parser, query rewriter, query optimizer, and
query executor for executing the DML
statements
´Errors are generated and reported if necessary
12
Query Parser and Query Rewriter
´Query parser parses the query into an internal
representation format
´Query parser checks the query for syntactical
and semantical correctness
´Query rewriter optimizes the query,
independently of the current database state
13
Query Optimizer
´ Query optimizer optimizes the query based upon the current
database state (based upon e.g., predefined indexes)
´ Query optimizer comes up with various query execution plans and
evaluates their cost in terms of estimated
´ number of I/O operations
´ CPU processing cost
´ execution time
´ Estimates based on catalog information combined with statistical
inference
´ Query optimizer is a key competitive asset of a DBMS
14
Query Executor
´Result of the query optimization is a final
execution plan
´Query executor takes care of the actual
execution by calling on the storage manager to
retrieve the data requested
15
Storage Manager
´Storage manager governs physical file access
and supervises the correct and efficient storage
of data
´Storage manager consists of
´transaction manager
´buffer manager
´lock manager
´recovery manager
16
Transaction Manager
´ Transaction manager supervises execution of database
transactions
´ a database transaction is a sequence of read/write
operations, considered to be an atomic unit
´ Transaction manager creates a schedule with interleaved
read/write operations
´ Transaction manager guarantees ACID properties
´ COMMIT a transaction upon successful execution and
ROLLBACK a transaction upon unsuccessful execution
17
Buffer Manager
´ Buffer manager manages buffer memory of the DBMS
´ Buffer manager intelligently caches data in the buffer
´ Example strategies:
´Data locality: data recently retrieved are likely to be
retrieved again
´20/80 law: 80% of the transactions read or write only
20% of the data
´ Buffer manager needs to adopt smart replacement
strategy in case buffer is full
´ Buffer manager needs to interact with lock manager
18
Lock Manager
´ Lock manager provides concurrency control which
ensures data integrity at all times
´ Two types of locks: read and write locks
´ Lock manager is responsible for assigning, releasing, and
recording locks in the catalog
´ Lock manager makes use of a locking protocol which
describes the locking rules, and a lock table with the
lock information
19
Recovery Manager
´ Recovery manager supervises the correct execution of
database transactions
´ Recovery manager keeps track of all database
operations in a logfile
´ Recovery manager will be called upon to undo actions
of aborted transactions or during crash recovery
20
DBMS Utilities
´ Loading utility
´ Reorganization utility
´ Performance-monitoring utilities
´ User management utilities
´ Backup and recovery utility
21
DBMS Interfaces
´ Web-based interface
´ Stand-alone query language interface
´ Command-line interface
´ Forms-based interface
´ Graphical user interface
´ Natural language interface
´ Admin interface
´ Network interface
22
DBMS Interfaces
Navigator
window
Query window
Results
window
Log window
23
Categorization of DBMSs
´ Categorization based on data model
´ Categorization based on degree of simultaneous access
´ Categorization based on architecture
´ Categorization based on usage
24
Categorization Based on Data Model
´ Hierarchical DBMSs
´ adopt a tree-like data model
´ DML is procedural and record-oriented
´ no query processor (logical and internal data model intertwined)
´ e.g., IMS (IBM)
´ Network DBMSs
´ use a network data model
´ CODASYL DBMSs
´ DML is procedural and record-oriented
´ no query processor (logical and internal data model intertwined)
´ e.g., CA-IDMS (Computer Associates)
25
Categorization Based on Data Model
´ Relational DBMSs
´ use the relational data model
´ currently the most popular in industry
´ SQL (declarative and set-oriented)
´ query processor
´ strict separation between the logical and internal data model
´ e.g., MySQL (open source, Oracle), Oracle DBMS (Oracle), DB2
(IBM), Microsoft SQL (Microsoft)
26
Categorization Based on Data Model
´ Object-Oriented DBMSs (OODBMS)
´Based upon the OO data model
´No impedance mismatch in combination with OO
host language
´e.g., db4o (open source, Versant), Caché
(Intersystems), GemStone/S (GemTalk Systems)
´only successful in niche markets due to their
complexity
27
Categorization Based on Data Model
´ Object-relational DBMSs (ORDBMSs)
´Also referred to as extended relational DBMSs
(ERDBMSs)
´Use a relational model extended with OO concepts
´DML is SQL (declarative and set-oriented)
´e.g., Oracle DBMS (Oracle), DB2 (IBM), Microsoft SQL
(Microsoft)
28
Categorization Based on Data Model
´ XML DBMSs
´Use the XML data model to store data
´Native XML DBMSs (e.g., BaseX, eXist) map the tree
structure of an XML document to a physical storage
structure
´XML-enabled DBMSs (e.g., Oracle, IBM DB2) are
existing DBMSs that are extended with facilities to
store XML data
29
Categorization Based on Data Model
´ NoSQL DBMSs
´Targeted at storing big and unstructured data
´Can be classified into key–value stores, columnoriented databases and graph databases
´Focus on scalability and the ability to cope with
irregular or highly volatile data structures
´e.g., Apache Hadoop, MongoDB, Neo4j
30
Categorization Based Upon Degree of
Simultaneous Access
´ Single-user versus multi-user systems
incoming
connections
dispatcher
server
instance 1
server
instance 2
server
instance 3
database
31
Categorization Based on Architecture
´ Centralized DBMS architecture
´ data is maintained on a centralized server
´ Client server DBMS architecture
´ active clients request services from passive servers
´ fat server versus fat client variant
´ n-tier DBMS architecture
´ client with GUI functionality, application server with applications,
database server with DBMS and database, and web server for web
based access
32
Categorization Based on Architecture
´ Cloud DBMS architecture
´DBMS and database are hosted by a third-party
cloud provider
´e.g., Apache Cassandra project and Google’s
BigTable
´ Federated DBMS
´Provides a uniform interface to multiple underlying
data sources
´Hides the underlying storage details to facilitate data
access
33
Categorization Based on Architecture
´ In-memory DBMS
´Stores all data in internal memory instead of slower
external storage (e.g., disk)
´Often used for real-time purposes
´e.g., HANA (SAP)
34
Categorization Based on Usage
´ On-Line Transaction Processing (OLTP)
´ Focus on managing operational or transactional data
´ Database server must be able to process lots of simple transactions per
unit of time
´ DBMS must have good support for processing a high volume of short,
simple queries
´ On-Line Analytical Processing (OLAP)
´ Focus on using operational data for tactical or strategical decisionmaking
´ Limited number of users formulates complex queries
´ DBMS should support efficient processing of complex queries which
often come in smaller volumes
35
Categorization Based on Usage
´ Big Data and analytics
´ NoSQL databases
´ Focus on more flexible, or even schema-less, database
structures
´ Store unstructured information such as emails, text
documents, Twitter tweets, Facebook posts, etc.
´ Multimedia
´ Multimedia DBMSs provide storage of multimedia data
such as text, images, audio, video, 3D games, etc.
´ Should also provide content-based query facilities
36
Categorization Based on Usage
´ Spatial applications
´Spatial DBMSs support storage and querying of spatial
data (both 2D and 3D)
´Geographical information systems (GIS)
´ Sensoring
´Sensor DBMSs manage sensor data such as biometric
data from wearables, or telematics data
37
Categorization Based on Usage
´ Mobile
´ Mobile DBMSs run on smartphones, tablets, or other mobile
devices
´ Should always be online, have a small footprint, and be able
to deal with limited processing power, storage, and battery
life
´ Open source
´ Code of open source DBMSs is publicly available and can be
extended by anyone
´ See www.sourceforge.net
´ e.g., MySQL (Oracle)
38
Conclusions
´Architecture of a DBMS
´Categorization of DBMSs
Organizational Aspects of
Data Management
Chapter Four
1
Introduction
´Data Management
´Roles in data management
2
Data Management
´Catalogs and the role of metadata
´Metadata modeling
´Data quality
´Data governance
3
Catalogs and the Role of Metadata
´Just as raw data, also metadata are data that
need to be properly modeled, stored, and
managed
´Concepts of data modeling should also be
applied to metadata
´In a DBMS approach, metadata are stored in a
catalog (aka data dictionary, data repository),
which constitutes the heart of the database
system
4
Catalogs and the Role of Metadata
´ The catalog provides an important source of information for
end-users, application developers, as well as the DBMS itself
´ Catalog should provide:
´ an extensible metamodel
´ import/export facilities
´ support for maintenance and re-use of metadata
´ monitoring of integrity rules
´ facilities for user access
´ statistics about the data and their usage for the DBA and
query optimizer
5
Metadata Modeling
´A metamodel is a data model for metadata
´A database design process can be used to
design a database storing metadata
´Design a conceptual model of the metadata:
EER model or UML model
6
Metadata Modeling
7
Data Quality
´Data quality (DQ) is often defined as “fitness for
use,” which implies the relative nature of the
concept
´Data quality determines the intrinsic value of the
data to the business
´Poor DQ impacts organizations in many ways
8
Data Quality
´DQ is a multidimensional concept in which each
dimension represents a single aspect or
construct, comprising both objective and
subjective perspectives
´A DQ framework categorizes the different
dimensions of data quality
´Example: Wang et al. (1996)
´Four categories: intrinsic, contextual,
representation, access
Data Quality
Category DQ dimensions Definitions
Accuracy
The extent to which data is certified, error-free, correct,
flawless and reliable
Intrinsic
9
Objectivity
The extent to which data is unbiased, unprejudiced,
based on facts and impartial
Reputation
The extent to which data is highly regarded in terms of
its sources or content
Data Quality
10
Category DQ dimensions Definitions
Completeness The extent to which data is not missing and covers the needs of the tasks
and is of sufficient breadth and depth of the task at hand
Contextual
Appropriate-
The extent to which the volume of data is appropriate for the task at hand
amount
Value-added The extent to which data is beneficial and provides advantages from its use
Relevance
The extent to which data is applicable and helpful for the task at hand
Timeliness
The extent to which data is sufficiently up-to-date for the task at hand
Actionable
The extent to which data is ready for use
11
Data Quality
Category DQ dimensions
Interpretable
Definitions
The extent to which data is in appropriate languages,
Representation
symbols and the definitions are clear
Easily-understandable
The extent to which data is easily comprehended
Consistency
The extent to which data is continuously presented in the
same format
Concisely-represented (CR) The extent to which data is compactly represented, wellpresented, well-organized, and well-formatted
Alignment
The extent to which data is reconcilable (compatible)
Data Quality
12
Category DQ dimensions
Accessibility
Definitions
The extent to which data is available, or easily
Access
and swiftly retrievable
Security
The extent to which access to data is restricted
appropriately to maintain its security
Traceability
The extent to which data is traceable to the
source
13
Data Quality
´ Accuracy refers to whether the data values stored for an object are
the correct values
´ Often correlated with other DQ dimensions
´ Completeness can be viewed from at least three perspectives:
´ Schema completeness: refers to the degree to which entity
types and attribute types are missing from the schema
´ Column completeness: refers to the degree to which there exist
missing values in a column of a table
´ Population completeness: refers to the degree to which the
necessary members of a population are present or not
14
Data Quality
´The consistency dimension can also be viewed
from several perspectives:
´Consistency of redundant or duplicated data
in one table or in multiple tables
´Consistency between two related data
elements
´Consistency of format for the same data
element used in different tables
15
Data Quality
´The accessibility dimension reflects the ease of
retrieving the data from the underlying data
sources
´Often involves a tradeoff with security
16
Data Quality
´ Common causes of bad data quality are:
´ Multiple data sources: multiple sources with the same data may
produce duplicates; a problem of consistency
´ Subjective judgment in data production: data production using human
judgment can result in biased information; a problem of objectivity
´ Limited computing resources: lack of sufficient computing resources
may limit the accessibility of relevant data; a problem of accessibility
´ Volume of data: Large volumes of stored data make it difficult to
access needed information in a reasonable time; a problem of
accessibility
´ Changing data needs: data requirements change on an ongoing basis;
a problem of relevance
´ Different processes updating the same data; a problem of consistency
17
Data Governace
´ To manage and safeguard data quality, a data governance
culture should be put in place assigning clear roles and
responsibilities
´ Different frameworks have been introduced for data quality
management and data quality improvement
´ Examples: Total Data Quality Management (TDQM), Total Quality
Management (TQM), Capability Maturity Model Integration
(CMMI), ISO 9000, Control Objectives for Information and
Related Technology (CobiT), Data Management Body of
Knowledge (DMBOK), Information Technology Infrastructure
Library (ITIL), and Six Sigma
18
Data Governance
• Define
• Improve
• Measure
Identify
pertinent
DQ
dimensions
Assess DQ
level along
DQ
dimensions
Present
improveme
nt actions
Investigate
DQ
problems
and
analyze
root causes
• Analyze
19
Roles in Data Management
´Information architect (aka information analyst)
´Responsible for designing the conceptual
data model
´Bridges the gap between the business
processes and the IT environment
´Closely collaborates with the database
designer, who may assist in choosing the type
of conceptual data model (e.g., EER or UML)
and the database modeling tool
20
Roles in Data Management
´Database designer
´Translates the conceptual data model into a
logical and internal data model
´Also assists the application developers in
defining the views of the external data model
´Defines company-wide uniform naming
conventions when creating the various data
models
21
Roles in Data Management
´ Data owner
´Has the authority to ultimately decide on the access
to, and usage of, the data
´Should be able to insert or update data
´ Data steward
´DQ experts in charge of ensuring the quality of both
the actual business data and the metadata
´Also in charge of taking DQ initiatives such as
corrective and/or preventive measures
22
Roles in Data Management
´ Database administrator (DBA)
´ Responsible for the implementation and monitoring of the
database
´ Closely collaborates with network and system managers
´ Also interacts with database designers
´ Data scientist
´ Responsible for analyzing data using state-of-the-art analytical
techniques to provide new insights into, e.g., customer behavior
´ Has a multidisciplinary profile combining ICT skills with
quantitative modeling, business understanding, communication,
and creativity
23
Conclusions
´ Data Management
´ Roles in data management
Relational Databases:
Structured Query Language
(SQL)
Chapter Seven
1
Introduction
´ Relational database management systems and SQL
´ SQL Data Definition Language
´ SQL Data Manipulation Language
´ SQL views
´ SQL indexes
´ SQL privileges
´ SQL for metadata management
2
Relational Database Management
Systems and SQL
´ Key Characteristics of SQL
´ Three-Level Database Architecture
3
Key Characteristics of SQL
´ First version, SQL-86 in 1986, most recent version in 2011
(SQL:2011)
´ Accepted by the American National Standards Institute
(ANSI) in 1986 and by the International Organization for
Standardization (ISO) in 1987
´ Each vendor provides its own implementation (also
called SQL dialect) of SQL
4
Key Characteristics of SQL
´Set-oriented and declarative
´Free-form language
´Case insensitive
´Can be used both interactively from a
command prompt or executed by a program
5
Key Characteristics of SQL
6
Key Characteristics of SQL
7
Three-Level Database Architecture
8
SQL Data Definition Language
´ Key DDL concepts
´ DDL example
´ Referential integrity constraints
´ DROP and ALTER command
9
Key DDL Concepts
´ SQL schema is a grouping of tables and other database
objects such as views, constraints, and indexes which
logically belong together
CREATE SCHEMA PURCHASE AUTHORIZATION BBAESENS
´ SQL table implements a relation from the relational
model
CREATE TABLE PRODUCT …
CREATE TABLE PURCHASE.PRODUCT …
10
Key DDL Concepts
Data type
Description
CHAR(n)
Holds a fixed-length string with size n
VARCHAR(n)
Holds a variable-length string with maximum size n
SMALLINT
Small integer (no decimal) between -32768 and 32767
INT
Integer (no decimal) between -2147483648 and 2147483647
FLOAT(n,d)
Small number with a floating decimal point. The total maximum number of digits is n with a maximum of d digits to
the right of the decimal point
DOUBLE(n,d)
Large number with a floating decimal point. The total maximum number of digits is n with a maximum of d digits to
the right of the decimal point
DATE
Date in format YYYY-MM-DD
DATETIME
Date and time in format YYYY-MM-DD HH:MI:SS
TIME
Time in format HH:MI:SS
BOOLEAN
True or false
BLOB
Binary large object (e.g., image, audio, video)
11
Key DDL Concepts
CREATE DOMAIN PRODTYPE_DOMAIN AS VARCHAR(10)
CHECK (VALUE IN (‘white’, ‘red’, ‘rose’,
‘sparkling’))
12
Key DDL Concepts
´ Column constraints
´ PRIMARY KEY constraint defines the primary key of the
table
´ FOREIGN KEY constraint defines a foreign key of a table
´ UNIQUE constraint defines an alternative key of a table
´ NOT NULL constraint prohibits null values for a column
´ DEFAULT constraint sets a default value for a column
´ CHECK constraint defines a constraint on the column
values
13
DDL Example
SUPNR
0..M
supname
SUPPLIES
1..1
ON_ORDER
PRODNR
prod- supsup prod
SUPPLIER
supaddress
deliv_period
purchase_price
supstatus
supcity
po-sup
sup-po
prodname
0..N
PRODUCT
1..N
0..N
PURCHASE
ORDER
PONR
podate
PO_LINE
0..N
prodtype
prod- popo
prod
quantity
available
quantity
14
DDL Example
SUPPLIER(SUPNR, SUPNAME, SUPADDRESS, SUPCITY, SUPSTATUS)
PRODUCT(PRODNR, PRODNAME, PRODTYPE,
AVAILABLE_QUANTITY)
SUPPLIES(SUPNR, PRODNR, PURCHASE_PRICE, DELIV_PERIOD)
PURCHASE_ORDER(PONR, PODATE, SUPNR)
PO_LINE(PONR, PRODNR, QUANTITY)
15
DDL Example
CREATE TABLE SUPPLIER
(SUPNR CHAR(4) NOT NULL PRIMARY KEY,
SUPNAME VARCHAR(40) NOT NULL,
SUPADDRESS VARCHAR(50),
SUPCITY VARCHAR(20),
SUPSTATUS SMALLINT)
CREATE TABLE PRODUCT
(PRODNR CHAR(6) NOT NULL PRIMARY KEY,
PRODNAME VARCHAR(60) NOT NULL,
CONSTRAINT UC1 UNIQUE(PRODNAME),
PRODTYPE VARCHAR(10),
CONSTRAINT CC1 CHECK(PRODTYPE IN (‘white’, ‘red’, ‘rose’,’sparkling’)),
AVAILABLE_QUANTITY INTEGER)
16
DDL Example
CREATE TABLE SUPPLIES
(SUPNR CHAR(4) NOT NULL,
PRODNR CHAR(6) NOT NULL,
PURCHASE_PRICE DOUBLE(8,2)
COMMENT ‘PURCHASE_PRICE IN EUR’,
DELIV_PERIOD TIME
COMMENT ‘DELIV_PERIOD IN DAYS’,
PRIMARY KEY (SUPNR, PRODNR),
FOREIGN KEY (SUPNR) REFERENCES SUPPLIER (SUPNR)
ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (PRODNR) REFERENCES PRODUCT (PRODNR)
ON DELETE CASCADE ON UPDATE CASCADE)
17
DDL Example
CREATE TABLE PURCHASE_ORDER
(PONR CHAR(7) NOT NULL PRIMARY KEY,
PODATE DATE,
SUPNR CHAR(4) NOT NULL,
FOREIGN KEY (SUPNR) REFERENCES SUPPLIER (SUPNR)
ON DELETE CASCADE ON UPDATE CASCADE)
CREATE TABLE PO_LINE
(PONR CHAR(7) NOT NULL,
PRODNR CHAR(6) NOT NULL,
QUANTITY INTEGER,
PRIMARY KEY (PONR, PRODNR),
FOREIGN KEY (PONR) REFERENCES PURCHASE_ORDER (PONR)
ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (PRODNR) REFERENCES PRODUCT (PRODNR)
ON DELETE CASCADE ON UPDATE CASCADE)
18
Referential Integrity Constraints
´ Foreign key has the same domain as the primary key it refers to and either
occurs as a value of it or NULL
´ What should happen to foreign keys in case a primary key is updated or
deleted?
´ Options:
´ ON UPDATE/DELETE CASCADE: update/removal should be cascaded to
all referring tuples
´ ON UPDATE/DELETE RESTRICT: update/removal is halted if referring tuples
exist
´ ON UPDATE/DELETE SET NULL: foreign keys in the referring tuples are set
to NULL
´ ON UPDATE/DELETE SET DEFAULT: foreign keys in the referring tuples are
set to their default value
19
Referential Integrity Constraints
Supplier
SUPNR
21
32
37
52
68
69
SUPNAME
Deliwines
Best Wines
Ad Fundum
Spirits & co.
The Wine Depot
Vinos del Mundo
SUPADDRESS
240, Avenue of the Americas
660, Market Street
82, Wacker Drive
928, Strip
132, Montgomery Street
4, Collins Avenue
PRODNR
0178
0185
0468
0795
SUPSTATUS
20
90
95
NULL
10
92
Purchase_Order
Supplies
SUPNR
37
37
37
37
SUPCITY
New York
San Francisco
Chicago
Las Vegas
San Francisco
Miami
PURCHASE_PRICE
16.99
32.99
14.00
20.99
DELIV_PERIOD
4
3
1
3
PONR
1511
1513
1523
1577
1594
PODATE
2015-03-24
2015-04-11
2015-04-19
2015-05-10
2015-05-13
SUPNR
37
37
37
37
37
20
DROP and ALTER Command
´ DROP command can be used to drop or remove
database objects
´Can also be combined with CASCADE and RESTRICT
´ Examples:
DROP SCHEMA PURCHASE CASCADE
DROP SCHEMA PURCHASE RESTRICT
DROP TABLE PRODUCT CASCADE
DROP TABLE PRODUCT RESTRICT
21
DROP and ALTER Command
´ ALTER statement can be used to modify table column
definitions
´ Examples:
ALTER TABLE PRODUCT ADD PRODIMAGE BLOB
ALTER TABLE SUPPLIER ALTER SUPSTATUS SET DEFAULT
’10’
22
DROP and ALTER Command
SUPPLIER
SUPNR SUPNAME
21
Deliwines
32
Best Wines
…
PRODNR
0119
0154
…
SUPADDRESS
240, Avenue of the Americas
660, Market Street
PRODUCT
PRODNAME
Chateau Miraval, Cotes de Provence Rose, 2015
Chateau Haut Brion, 2008
SUPCITY
SUPSTATUS
New York
20
San Francisco 90
PRODTYPE
rose
red
red
AVAILABLE_QUANTITY
126
111
5
SUPPLIES
SUPNR PRODNR PURCHASE_PRICE DELIV_PERIOD
21
0289
17.99
1
21
0327
56.00
6
…
PURCHASE_ORDER
PONR PODATE
1511 2015-03-24
1512 2015-04-10
…
SUPNR
37
94
PO_LINE
PONR PRODNR QUANTITY
1511 0212
2
1511 0345
4
…
23
SQL Data Manipulation Language (SQL
DML)
´ SQL SELECT statement
´ SQL INSERT statement
´ SQL DELETE statement
´ SQL UPDATE statement
24
SQL SELECT Statement
´ Overview
´ Simple queries
´ Queries with aggregate functions
´ Queries with GROUP BY/HAVING
´ Queries with ORDER BY
´ Join queries
´ Nested queries
´ Correlated queries
´ Queries with ALL/ANY
´ Queries with EXISTS
´ Queries with subqueries in FROM/WHERE
´ Queries with set operations
25
Overview
SELECT component
FROM component
[WHERE component]
[GROUP BY component]
[HAVING component]
[ORDER BY component]
26
Overview
´ The result of an SQL SELECT statement is a multiset, and
not a set!
´ In a multiset (aka bag), the elements are not ordered
but there can be duplicates
´ Examples: set {10, 5, 20} and multiset {10, 5, 10, 20, 5, 10}
´ SQL does not eliminate duplicates
´Duplicate elimination is expensive
´User may want to see duplicate tuples
´Duplicates may be considered by aggregate
functions
27
Simple Queries
´ SQL statements that retrieve data from only one table
Q1: SELECT SUPNR, SUPNAME,SUPADDRESS, SUPCITY,
SUPSTATUS FROM SUPPLIER
Q1: SELECT * FROM SUPPLIER
28
Simple Queries
SUPNR
SUPNAME
SUPADDRESS
SUPCITY
SUPSTATUS
21
Deliwines
240, Avenue of the
New York
20
Americas
32
Best Wines
660, Market Street
San Francisco
90
37
Ad Fundum
82, Wacker Drive
Chicago
95
52
Spirits & co.
928, Strip
Las Vegas
NULL
68
The Wine Depot
132, Montgomery Street
San Francisco
10
69
Vinos del Mundo
4, Collins Avenue
Miami
92
29
Simple Queries
´ Q2: SELECT SUPNR, SUPNAME FROM SUPPLIER
SUPNR
SUPNAME
21
Deliwines
32
Best Wines
37
Ad Fundum
52
Spirits & co.
68
The Wine Depot
69
Vinos del Mundo
30
Simple Queries
SUPNR
32
32
Q3: SELECT SUPNR
FROM PURCHASE_ORDER
37
37
37
37
37
68
69
94
31
Simple Queries
Q4: SELECT DISTINCT SUPNR FROM PURCHASE_ORDER
SUPNR
32
37
68
69
94
32
Simple Queries
Q5: SELECT SUPNR, PRODNR, DELIV_PERIOD/30 AS MONTH_DELIV_PERIOD FROM
SUPPLIES
SUPNR
PRODNR
MONTH_DELIV_PERIOD
21
0119
0.0333
21
0178
NULL
21
0289
0.0333
21
0327
0.2000
21
0347
0.0667
21
0384
0.0667
…
…
…
33
Simple Queries
Q6: SELECT SUPNR, SUPNAME FROM SUPPLIER
WHERE SUPCITY = ‘San Francisco’
SUPNR
SUPNAME
SUPSTATUS
32
Best Wines
90
68
The Wine Depot
10
34
Simple Queries
Q7: SELECT SUPNR, SUPNAME FROM SUPPLIER
WHERE SUPCITY = ‘San Francisco’ AND SUPSTATUS > 80
SUPNR
SUPNAME
SUPSTATUS
32
Best Wines
90
35
Simple Queries
Q8: SELECT SUPNR, SUPNAME, SUPSTATUS
FROM SUPPLIER WHERE SUPSTATUS BETWEEN 70 AND 80
SUPNR
SUPNAME
SUPSTATUS
94
The Wine Crate
75
36
Simple Queries
´ Q9:SELECT PRODNR, PRODNAME
FROM PRODUCT
WHERE PRODTYPE IN (‘WHITE’, ‘SPARKLING’)
PRODNR
PRODNAME
0178
Meerdael, Methode Traditionnelle Chardonnay, 2014
0199
Jacques Selosse, Brut Initial, 2012
0212
Billecart-Salmon, Brut Réserve, 2014
0300
Chateau des Rontets, Chardonnay, Birbettes
0494
Veuve-Cliquot, Brut, 2012
0632
Meneghetti, Chardonnay, 2010
…
….
37
Simple Queries
Q10: SELECT PRODNR, PRODNAME
FROM PRODUCT
WHERE PRODNAME LIKE ‘%CHARD%’
PRODNR
PRODNAME
0300
Chateau des Rontets, Chardonnay, Birbettes
0783
Clos D’Opleeuw, Chardonnay, 2012
0178
Meerdael, Methode Traditionnelle Chardonnay, 2014
0632
Meneghetti, Chardonnay, 2010
38
Simple Queries
Q11: SELECT SUPNR, SUPNAME
FROM SUPPLIER
WHERE SUPSTATUS IS NULL
SUPNR
SUPNAME
SUPSTATUS
52
Spirits & Co.
NULL
39
Queries with Aggregate Functions
´ Examples: COUNT, SUM, AVG, VARIANCE, MIN/MAX, and STDEV
SUPNR
PRODNR
PURCHASE_PRICE
DELIV_PERIOD
21
0178
NULL
NULL
37
0178
16.99
4
68
0178
17.99
5
69
0178
16.99
NULL
94
0178
18.00
6
…
…
40
Queries with Aggregate Functions
Q12: SELECT COUNT(*)
FROM SUPPLIES
WHERE PRODNR = ‘0178’
Q13: SELECT COUNT(PURCHASE_PRICE)
FROM SUPPLIES
WHERE PRODNR = ‘0178’
Q14: SELECT COUNT(DISTINCT PURCHASE_PRICE)
FROM SUPPLIES
WHERE PRODNR = ‘0178’
5
4
3
41
Queries with Aggregate Functions
Q15: SELECT PRODNR, SUM(QUANTITY) AS
SUM_ORDERS FROM PO_LINE
WHERE PRODNR = ‘0178’
PONR
PRODNR
QUANTITY
1512
0178
3
1538
0178
6
…
…
0178 9
42
Queries with Aggregate Functions
´ Q16: SELECT SUM(QUANTITY) AS
TOTAL_ORDERS FROM PO_LINE
173
Queries with Aggregate Functions
43
Q17: SELECT PRODNR, AVG(PURCHASE_PRICE) AS
WEIGHTED_AVG_PRICE FROM SUPPLIES
WHERE PRODNR = ‘0178’
SUPNR
PRODNR
PURCHASE_PRICE
DELIV_PERIOD
21
0178
NULL
NULL
37
0178
16.99
4
68
0178
17.99
5
69
0178
16.99
NULL
94
0178
18.00
6
…
…
0178, (16.99 + 17.99 + 16.99 + 18.00) / 4 = 17.4925
44
Queries with Aggregate Functions
Q18: SELECT PRODNR, AVG(DISTINCT
PURCHASE_PRICE)AS UNWEIGHTED_AVG_PRICE
FROM SUPPLIES WHERE PRODNR = ‘0178’
SUPNR
PRODNR
PURCHASE_PRICE
DELIV_PERIOD
21
0178
NULL
NULL
37
0178
16.99
4
68
0178
17.99
5
69
0178
16.99
NULL
94
0178
18.00
6
…
…
0178, (16.99 + 17.99 + 18.00) / 3 = 17.66
45
Queries with Aggregate Functions
´ Q19: SELECT PRODNR, VARIANCE(PURCHASE_PRICE)
AS PRICE_VARIANCE FROM SUPPLIES
WHERE PRODNR = ‘0178’
PRODNR
PRICE_VARIANCE
0178
0.25251875000000024
46
Queries with Aggregate Functions
Q20: SELECT PRODNR, MIN(PURCHASE_PRICE) AS LOWEST_PRICE,
MAX(PURCHASE_PRICE) AS HIGHEST_PRICE
FROM SUPPLIES
WHERE PRODNR = ‘0178’
PRODNR
LOWEST_PRICE
HIGHEST_PRICE
0178
16.99
18.00
47
Queries with GROUP BY/HAVING
Q21: SELECT PRODNR
FROM PO_LINE
GROUP BY PRODNR
HAVING COUNT(*) >= 3
PONR
PRODNR
QUANTITY
1511
0212
2
1512
0178
3
1513
0668
7
1514
0185
2
1514
0900
2
1523
0900
3
1538
0178
6
1538
0212
15
1560
0900
9
1577
0212
6
1577
0668
9
…
..
…
48
Queries with GROUP BY/HAVING
GROUP BY
PONR
PRODNR
QUANTITY
PONR
PRODNR
QUANTITY
1511
0212
2
1512
0178
3
1577
0212
6
1538
0178
6
1538
0212
15
PONR
PRODNR
QUANTITY
PONR
PRODNR
QUANTITY
1514
0900
2
1514
0185
2
1523
0900
3
1560
0900
9
PRODNR
0212
900
PONR
PRODNR
QUANTITY
1513
0668
7
1577
0668
9
Queries with GROUP BY/HAVING
49
Q22: SELECT PRODNR, SUM(QUANTITY) AS QUANTITY FROM PO_LINE
GROUP BY PRODNR
HAVING SUM(QUANTITY) > 15
GROUP BY
PONR
PRODNR
QUANTITY
1511
0212
2
1577
0212
6
1538
PONR
1514
0212
15
SUM
23
PRODNR
PONR
PRODNR
QUANTITY
1512
0178
3
1538
0178
6
SUM
9
PONR
PRODNR
QUANTITY
1514
0900
2
1523
0900
3
1560
0900
9
SUM
14
QUANTITY
0185
2
SUM
2
PONR
PRODNR
QUANTITY
1513
0668
7
1577
0668
9
SUM
16
PRODNR
QUANTITY
0212
23
0668
16
50
Queries with ORDER BY
´ Q23: SELECT PONR, PODATE, SUPNR
FROM PURCHASE_ORDER
ORDER BY PODATE ASC, SUPNR DESC
PONR
PODATE
SUPNR
1511
2015-03-24
37
1512
2015-04-10
94
1513
2015-04-11
37
1514
2015-04-12
32
…
51
Queries with ORDER BY
Q24: SELECT PRODNR, SUPNR, PURCHASE_PRICE
FROM SUPPLIES
WHERE PRODNR = ‘0178’
ORDER BY 3 DESC
PRODNR
SUPNR
PURCHASE_PRICE
0178
94
18.00
0178
68
17.99
0178
37
16.99
0178
69
16.99
0178
21
NULL
52
Join Queries
´ Inner joins
´ Outer joins
53
Inner Joins
SUPPLIER(SUPNR, SUPNAME, …, SUPSTATUS)
SUPPLIES(SUPNR, PRODNR, PURCHASE_PRICE, …)
SUPNR
SUPNAME
SUPADDRESS
SUPCITY
32
Best wines
90
68
The Wine Depot
10
84
Wine Trade Logistics
92
:
:
:
SUPNR
PRODNR
PURCHASE_PRICE
DELIV_PERIOD
32
0474
40.00
1
32
0154
21.00
4
84
0494
15.99
2
:
:
:
SUPSTATUS
54
Inner Joins
Q25: SELECT R.SUPNR, R.SUPNAME, R.SUPSTATUS,
S.SUPNR, S.PRODNR, S.PURCHASE_PRICE
FROM SUPPLIER R, SUPPLIES S
55
Inner Joins
R.SUPNR
R.SUPNAME
R.SUPSTATUS
S.SUPNR
S.PRODNR
S.PURCHASE_PRICE
21
Deliwines
20
21
0119
15.99
32
Best Wines
90
21
0119
15.99
37
Ad Fundum
95
21
0119
15.99
52
Spirits & co.
NULL
21
0119
15.99
32
Best Wines
90
32
0154
21.00
37
Ad Fundum
95
32
0154
21.00
52
Spirits & co.
NULL
32
0154
21.00
69
Vinos del Mundo
92
94
0899
15.00
84
Wine Trade
92
94
0899
15.00
75
94
0899
15.00
…
…
Logistics
94
The Wine Crate
56
Inner Joins
Q26: SELECT R.SUPNR, R.SUPNAME,
R.SUPSTATUS, S.PRODNR,
S.PURCHASE_PRICE
FROM SUPPLIER R, SUPPLIES S
WHERE R.SUPNR = S.SUPNR
57
Inner Joins
R.SUPNR
R.SUPNAME
R.SUPSTATUS
S.SUPNR
S.PRODNR
S.PURCHASE_PRICE
21
Deliwines
20
21
0119
15.99
21
Deliwines
20
21
0178
NULL
21
Deliwines
20
21
0289
17.99
21
Deliwines
20
21
0327
56.00
21
Deliwines
20
21
0347
16.00
21
Deliwines
20
21
0384
55.00
21
Deliwines
20
21
0386
58.99
21
Deliwines
20
21
0468
14.99
21
Deliwines
20
21
0668
6.00
32
Best Wines
90
32
0154
21.00
32
Best Wines
90
32
0474
40.00
32
Best Wines
90
32
0494
15.00
32
Best Wines
90
32
0657
44.99
32
Best Wines
90
32
0760
52.00
…
58
Inner Joins
Q27: SELECT R.SUPNR, R.SUPNAME, R.SUPSTATUS,
S.PRODNR, S.PURCHASE_PRICE
FROM SUPPLIER AS R INNER JOIN SUPPLIES AS S
ON (R.SUPNR = S.SUPNR)
59
Inner Joins
Q28: SELECT R.SUPNR, R.SUPNAME, PO.PONR, PO.PODATE,
P.PRODNR,P.PRODNAME, POL.QUANTITY
FROM SUPPLIER R, PURCHASE_ORDER PO, PO_LINE
POL, PRODUCT P
WHERE (R.SUPNR = PO.SUPNR)
AND (PO.PONR = POL.PONR)
AND (POL.PRODNR = P.PRODNR)
60
Inner Joins
R.SUPNR
R.SUPNAME
PO.PONR
PO.PODATE
P.PRODNR
37
Ad Fundum
1511
2015-03-24
0212
37
Ad Fundum
1511
2015-03-24
0345
37
Ad Fundum
1511
2015-03-24
0783
37
Ad Fundum
1511
2015-03-24
0856
94
The Wine Crate
1512
2015-04-10
0178
…
P.PRODNAME
Billecart-Salmon, Brut Réserve,
2014
Vascosassetti, Brunello di
Montalcino, 2004
Clos D’Opleeuw, Chardonnay,
2012
Domaine Chandon de Briailles,
Savigny-Les-Beaune, 2006
Meerdael, Methode
Traditionnelle Chardonnay, 2014
POL.QUANTITY
2
4
1
9
3
61
Inner Joins
Q29: SELECT R1.SUPNAME, R2.SUPNAME,
R1.SUPCITY
FROM SUPPLIER R1, SUPPLIER R2
WHERE R1.SUPCITY = R2.SUPCITY
AND (R1.SUPNR < R2.SUPNR)
62
Inner Joins
SUPNR
SUPNAME
SUPADDRESS
SUPCITY
SUPSTATUS
21
Deliwines
240, Avenue of the Americas
New York
20
32
Best Wines
660, Market Street
San Francisco
90
37
Ad Fundum
82, Wacker Drive
Chicago
95
52
Spirits & co.
928, Strip
Las Vegas
NULL
68
The Wine Depot
132, Montgomery Street
San Francisco
10
69
Vinos del Mundo
4, Collins Avenue
Miami
92
SUPNAME
SUPNAME
SUPCITY
Best Wines
The Wine Depot
San Francisco
63
Inner Joins
´ Q30: SELECT R.SUPNAME
FROM SUPPLIER R, SUPPLIES S
WHERE R.SUPNR = S.SUPNR
AND S.PRODNR = '0899'
Wine Crate
64
Inner Joins
Q31: SELECT DISTINCT R.SUPNAME
FROM SUPPLIER R, SUPPLIES S, PRODUCT P
WHERE S.SUPNR = R.SUPNR
AND S.PRODNR = P.PRODNR
AND P.PRODTYPE = 'ROSE'
SUPNAME
SUPNAME
DeliWines
DeliWines
DeliWines
The Wine Depot
DeliWines
The Wine Depot
65
Inner Joins
Q32: SELECT P.PRODNR, P.PRODNAME, SUM(POL.QUANTITY)
FROM PRODUCT P, PO_LINE POL
WHERE P.PRODNR = POL.PRODNR
GROUP BY P.PRODNR
PRODNR
PRODNAME
SUM(POL.QUANTITY)
0178
Meerdael, Methode Traditionnelle Chardonnay, 2014
9
0185
Chateau Petrus, 1975
2
0212
Billecart-Salmon, Brut Réserve, 2014
23
0295
Chateau Pape Clement, Pessac-Léognan, 2001
9
0306
Chateau Coupe Roses, Granaxa, 2011
11
…
66
Outer Joins
´ Outer join can be used when we want to keep all the
tuples of one or both tables in the result of the JOIN,
regardless of whether or not they have matching tuples
in the other table
Outer Joins
67
Q33: SELECT R.SUPNR, R.SUPNAME, R.SUPSTATUS,
S.PRODNR, S.PURCHASE_PRICE
FROM SUPPLIER AS R LEFT OUTER JOIN SUPPLIES AS S
ON (R.SUPNR = S. SUPNR)
SUPNR
SUPNAME
68
SUPNR
PRODNR
PURCHASE_PRICE
DELIV_PERIOD
The Wine Depot
21
0119
15.99
1
21
Deliwines
21
0289
17.99
1
94
The Wine Crate
68
0178
17.99
5
:
:
:
:
..
SUPADDRESS
SUPCITY
SUPSTATUS
68
Outer Joins
SUPNR
SUPNAME
SUPSTATUS
PRODNR
PURCHASE_PRICE
21
Deliwines
20
0119
15.99
21
Deliwines
20
0178
NULL
37
Ad Fundum
95
0795
20.99
52
Spirits & Co.
NULL
NULL
NULL
68
The Wine Depot
10
0178
17.99
…
…
69
Outer Joins
Q34: SELECT P.PRODNR, P.PRODNAME, SUM(POL.QUANTITY)
AS SUM FROM PO_LINE AS POL RIGHT OUTER JOIN PRODUCT AS P
ON (POL.PRODNR = P. PRODNR)
GROUP BY P.PRODNR
P.PRODNR
P.PRODNAME
SUM
0119
Chateau Miraval, Cotes de Provence Rose, 2015
NULL
0154
Chateau Haut Brion, 2008
NULL
0178
Meerdael, Methode Traditionnelle Chardonnay, 2014
9
0185
Chateau Petrus, 1975
2
0199
Jacques Selosse, Brut Initial, 2012
NULL
0212
Billecart-Salmon, Brut Réserve, 2014
23
…
70
Nested Queries
Outer block
Inner block
SELECT ...
FROM
...
WHERE ...
(SELECT
FROM
WHERE
...
...
...
)
71
Nested Queries
Q34: SELECT SUPNAME
FROM SUPPLIER
WHERE SUPNR =
(SELECT SUPNR
FROM PURCHASE_ORDER
WHERE PONR = '1560')
Best Wines
72
Nested Queries
Q35: SELECT PRODNR, PRODNAME
FROM PRODUCT
WHERE AVAILABLE_QUANTITY >
(SELECT AVAILABLE_QUANTITY
FROM PRODUCT
WHERE PRODNR = ‘0178’)
PRODNR
PRODNAME
0212
Billecart-Salmon, Brut Réserve, 2014
0347
Chateau Corbin-Despagne, Saint-Emilion, 2005
0474
Chateau De La Tour, Clos-Vougeot, Grand cru, 2008
0885
Chateau Margaux, Grand Cru Classé, 1956
0899
Trimbach, Riesling, 1989
73
Nested Queries
Q36: SELECT SUPNAME
FROM SUPPLIER
WHERE SUPNR IN
(SELECT SUPNR
FROM SUPPLIES
WHERE PRODNR =’0178′)
SUPNAME
Deliwines
Ad Fundum
The Wine Depot
Vinos del Mundo
The Wine Crate
74
Nested Queries
Q37: SELECT SUPNAME
FROM SUPPLIER
WHERE SUPNR IN
(SELECT SUPNR
FROM SUPPLIES
WHERE PRODNR IN
(SELECT PRODNR
FROM PRODUCT
WHERE PRODTYPE = ‘ROSE’))
SUPNAME
Deliwines
The Wine Depot
75
Nested Queries
Q38: SELECT PRODNAME
FROM PRODUCT
WHERE PRODNR IN
(SELECT PRODNR
FROM SUPPLIES
WHERE SUPNR = ’32’)
AND PRODNR IN
(SELECT PRODNR
FROM SUPPLIES
WHERE SUPNR = ’84’)
PRODNAME
Veuve-Cliquot, Brut, 2012
Conde de Hervías, Rioja, 2004
76
Correlated Queries
´ Whenever a condition in the WHERE clause of a nested
query references some column of a table declared in
the outer query, the two queries are said to be
correlated
´ The nested query is then evaluated once for each tuple
(or combination of tuples) in the outer query
77
Correlated Queries
Q39: SELECT P.PRODNR
FROM PRODUCT P
WHERE 1 <
(SELECT COUNT(*)
FROM PO_LINE POL
WHERE P.PRODNR = POL.PRODNR)
PONR
PRODNR
PRODNAME
0212
Billecart-Salmon, Brut Réserve, 2014
0289
Chateau Saint Estève de Neri, 2015
…
0154
Chateau Haut Brion, 2008
1538
0295
Chateau Pape Clement, Pessac-Léognan, 2001
…
PRODNR
QUANTITY
0212
2
0212
15
0212
6
…
1511
…
1577
…
78
Correlated Queries
Q40: SELECT R.SUPNR, R.SUPNAME, P.PRODNR, P.PRODNAME,
S1.PURCHASE_PRICE, S1.DELIV_PERIOD
FROM SUPPLIER R, SUPPLIES S1, PRODUCT P
WHERE R.SUPNR = S1.SUPNR
AND S1.PRODNR = P.PRODNR
AND S1.PURCHASE_PRICE <
(SELECT AVG(PURCHASE_PRICE)
FROM SUPPLIES S2
WHERE P.PRODNR = S2.PRODNR)
79
Correlated Queries
SUPPLIES S1
PRODUCT P
PRODNR
PRODNAME
PRODTYPE
0178
Meerdael, Methode
Traditionnelle Chardonnay, 2014
sparkling
0185
Chateau Petrus, 1975
red
..
PRODNR
SUPNR
PURCHASE_PRICE
DELIV_PERIOD
0178
37
16.99
4
0178
68
17.99
5
0178
69
16.99
-
0178
94
18.00
6
0178
21
-
-
0185
37
32.99
3
…
< AVG ?
SUPPLIER R
SUPPLIES S2
PRODNR
SUPNR
PURCHASE_PRICE
DELIV_PERIOD
SUPNR
SUPNAME
SUPADDRESS
0178
37
16.99
4
37
Ad Fundum
…
0178
68
17.99
5
68
The Wine Depot
…
0178
69
16.99
-
84
Wine Trade Logistics
…
0178
94
18.00
6
0178
21
-
-
0185
37
32.99
3
…
…
80
Correlated Queries
Q41: SELECT P1.PRODNR
FROM PRODUCT P1
WHERE 3 >
(SELECT COUNT(*)
FROM PRODUCT P2
WHERE P1.PRODNR < P2.PRODNR)
P1.PRODNR
Result of Inner Query block
< 3?
Output
0119
41
No
No
0154
40
No
No
0178
39
No
No
…
…
…
…
0899
3
No
No
0900
2
Yes
Yes
0915
1
Yes
Yes
0977
0
Yes
Yes
81
Queries with ALL/ANY
´ The comparison condition v > ALL V returns TRUE if the
value v is greater than all the values in the multiset V.
´If the nested query doesn’t return a value, it evaluates
the condition as TRUE.
´ The comparison condition v > ANY V returns TRUE if the
value v is greater than at least one value in the multiset
V.
´If the nested query doesn’t return a value, it evaluates
the whole condition as FALSE.
82
Queries with ALL/ANY
Q42: SELECT SUPNAME
FROM SUPPLIER
WHERE SUPNR IN
(SELECT SUPNR
FROM SUPPLIES
WHERE PRODNR = ‘0668’
AND PURCHASE_PRICE >= ALL
(SELECT PURCHASE_PRICE
FROM SUPPLIES
WHERE PRODNR = ‘0668’))
The Wine Depot
68
6.00, 6.99
83
Queries with ALL/ANY
SUPPLIES S1
PRODUCT P
PRODNR
PRODNAME
PRODTYPE
0178
Meerdael, Methode Traditionnelle
Chardonnay, 2014
sparkling
0668
Gallo Family Vineyards, Grenache, 2014
rose
…
SUPNR
PURCHASE_PRICE
DELIV_PERIOD
0668
68
6.99
3
0668
21
6.00
1
0760
32
52.00
3
0760
68
52.99
2
0783
69
7.00
3
…
…
…
…
≥ ALL ?
SUPPLIES S2
SUPPLIER R
SUPNR
SUPNAME
SUPADDRESS
32
Best wines
…
68
The Wine Depot
…
84
Wine Trade Logistics
…
..
PRODNR
PRODNR
SUPNR
PURCHASE_PRICE
DELIV_PERIOD
0668
68
6.99
3
0668
21
6.00
1
0760
32
52.00
3
0760
68
52.99
2
0783
69
7.00
3
…
…
…
…
84
Queries with ALL/ANY
Q43: SELECT R1.SUPNR, R1.SUPNAME, R1.SUPCITY, R1.SUPSTATUS
FROM SUPPLIER R1
WHERE R1.SUPSTATUS >= ALL
(SELECT R2.SUPSTATUS
FROM SUPPLIER R2
WHERE R1.SUPCITY = R2.SUPCITY)
SUPNR
SUPNAME
SUPCITY
SUPSTATUS
21
Deliwines
New York
20
32
Best Wines
San Francisco
90
37
Ad Fundum
Chicago
95
69
Vinos del Mundo
Miami
92
84
Wine Trade Logistics
Washington
92
94
The Wine Crate
Dallas
75
85
Queries with ALL/ANY
Q44: SELECT SUPNAME
FROM SUPPLIER
WHERE SUPNR IN
(SELECT SUPNR
FROM SUPPLIES
WHERE PRODNR = ‘0178’ AND
PURCHASE_PRICE > ANY
(SELECT PURCHASE_PRICE
FROM SUPPLIES
WHERE PRODNR = ‘0178’))
The Wine Depot, The Wine Crate
68,94
NULL, 16.99, 17.99, 16.99, 18.00
86
Queries with EXISTS
´ The EXISTS function checks whether the result of a
correlated nested query is empty or not
´ The result is a Boolean value: TRUE or FALSE
´ EXISTS returns TRUE if there is at least one tuple in the
result of the nested query, or otherwise returns FALSE
´ Vice versa, the NOT EXISTS function returns TRUE if there
are no tuples in the result of the nested query, or
otherwise returns FALSE
87
Queries with EXISTS
Q44: SELECT SUPNAME
FROM SUPPLIER R
WHERE EXISTS
(SELECT *
FROM SUPPLIES S
WHERE R.SUPNR = S.SUPNR
AND S.PRODNR = ‘0178’)
SUPNAME
Deliwines
Ad Fundum
The Wine Depot
Vinos del Mundo
The Wine Crate
88
Queries with EXISTS
Q45: SELECT SUPNAME, SUPADDRESS, SUPCITY
FROM SUPPLIER R
WHERE NOT EXISTS
(SELECT *
FROM PRODUCT P
WHERE NOT EXISTS
(SELECT *
FROM SUPPLIES S
WHERE R.SUPNR = S.SUPNR
AND P.PRODNR = S.PRODNR))
89
Queries with Subqueries in
SELECT/FROM
Q46: SELECT P.PRODNR, P.PRODNAME,
(SELECT SUM(QUANTITY) FROM PO_LINE POL
WHERE P.PRODNR = POL.PRODNR) AS TOTALORDERED
FROM PRODUCT P
PRODNR
PRODNAME
TOTALORDERED
0212
Billecart-Salmon, Brut Réserve, 2014
23
0795
Casa Silva, Los Lingues, Carmenere, 2012
3
0915
Champagne Boizel, Brut, Réserve, 2010
13
0523
Chateau Andron Blanquet, Saint Estephe, 1979
NULL
0977
Chateau Batailley, Grand Cru Classé, 1975
11
…
90
Queries with Subqueries in
SELECT/FROM
Q47: SELECT M.PRODNR, M.MINPRICE, M.MAXPRICE FROM
(SELECT PRODNR, MIN(PURCHASE_PRICE) AS MINPRICE,
MAX(PURCHASE_PRICE) AS MAXPRICE
FROM SUPPLIES GROUP BY PRODNR) AS M
WHERE M.MAXPRICE-M.MINPRICE > 1
PRODNR
MINPRICE
MAXPRICE
0178
16.99
18.00
0199
30.99
32.00
0300
19.00
21.00
0347
16.00
18.00
0468
14.00
15.99
91
Queries with Set Operations
´ A = {10, 5, 25, 30, 45}
´ B = {15, 20, 10, 30, 50}
´ A UNION B = {5, 10, 15, 20, 25, 30, 45, 50}
´ A INTERSECT B = {10, 30}
´ A EXCEPT B = {5, 25, 45}
92
Queries with Set Operations
Q48: SELECT SUPNR, SUPNAME
FROM SUPPLIER
WHERE SUPCITY = ‘New York’
UNION
SELECT R.SUPNR, R.SUPNAME
FROM SUPPLIER R, SUPPLIES S
WHERE R.SUPNR = S.SUPNR
AND S.PRODNR = ‘0915’
ORDER BY SUPNAME ASC
SUPNR
SUPNAME
21
Deliwines
84
Wine Trade Logistics
93
Queries with Set Operations
Q49: SELECT SUPNR, SUPNAME
FROM SUPPLIER
WHERE SUPCITY = ‘NEW YORK’
INTERSECT
SELECT R.SUPNR, R.SUPNAME
FROM SUPPLIER R, SUPPLIES S
WHERE R.SUPNR = S.SUPNR
AND S.PRODNR = ‘0915’
ORDER BY SUPNAME ASC
NULL
94
Queries with Set Operations
Q50: SELECT SUPNR
FROM SUPPLIER
EXCEPT
SELECT SUPNR
FROM SUPPLIES
52
95
SQL INSERT Statement
INSERT INTO PRODUCT VALUES
(‘980’, ‘Chateau Angelus, Grand Clu Classé, 1960’, ‘red’, 6)
INSERT INTO PRODUCT(PRODNR, PRODNAME, PRODTYPE, AVAILABLE_QUANTITY)
VALUES
(‘980’, ‘Chateau Angelus, Grand Clu Classé, 1960’, ‘red’, 6)
INSERT INTO PRODUCT(PRODNR, PRODNAME, PRODTYPE) VALUES
(‘980’, ‘Chateau Angelus, Grand Clu Classé, 1960’, ‘red’)
96
SQL INSERT Statement
INSERT INTO PRODUCT(PRODNR, PRODNAME, PRODTYPE, AVAILABLE_QUANTITY)
VALUES
(‘980’, ‘Chateau Angelus, Grand Clu Classé, 1960’, ‘red’, 6),
(‘1000’, ‘Domaine de la Vougeraie, Bâtard Montrachet’, Grand cru, 2010’,
‘white’, 2),
(‘1002’, ‘Leeuwin Estate Cabernet Sauvignon 2011’, ‘white’, 20)
INSERT INTO INACTIVE-SUPPLIERS(SUPNR)
SELECT SUPNR
FROM SUPPLIER
EXCEPT
SELECT SUPNR
FROM SUPPLIES
97
SQL DELETE Statement
DELETE FROM PRODUCT
WHERE PRODNR = ‘1000’
DELETE FROM SUPPLIER
WHERE SUPSTATUS IS NULL
DELETE FROM SUPPLIES
WHERE PRODNR IN (SELECT PRODNR
FROM PRODUCT
WHERE PRODNAME LIKE ‘%CHARD%’)
98
SQL DELETE Statement
DELETE FROM SUPPLIER R
WHERE NOT EXISTS
(SELECT PRODNR
FROM SUPPLIES S
WHERE R.SUPNR = S.SUPNR)
DELETE FROM SUPPLIES S1
WHERE S1.PURCHASE_PRICE >
(SELECT 2 * AVG(S2.PURCHASE_PRICE)
FROM SUPPLIES S2
WHERE S1.PRODNR = S2.PRODNR)
DELETE FROM PRODUCT
99
SQL UPDATE Statement
UPDATE PRODUCT
SET AVAILABLE_QUANTITY = 26
WHERE PRODNR = ‘0185’
UPDATE SUPPLIER
SET SUPSTATUS = DEFAULT
UPDATE SUPPLIES
SET DELIV_PERIOD = DELIV_PERIOD+7
WHERE SUPNR IN (SELECT SUPNR
FROM SUPPLIER
WHERE SUPNAME = ‘Deliwines’)
100
SQL UPDATE Statement
UPDATE SUPPLIES S1
SET (PURCHASE_PRICE, DELIV_PERIOD) =
(SELECT MIN(PURCHASE_PRICE), MIN(DELIV_PERIOD)
FROM SUPPLIES S2
WHERE S1.PRODNR = S2.PRODNR)
WHERE SUPNR = ’68’
ALTER TABLE SUPPLIER ADD SUPCATEGORY VARCHAR(10) DEFAULT ‘SILVER’
UPDATE SUPPLIER
SET SUPCATEGORY =
CASE WHEN SUPSTATUS >= 70 AND SUPSTATUS = 90 THEN ‘PLATINUM’
ELSE ‘SILVER’
END
101
SQL UPDATE Statement
SUPNR
SUPNAME
SUPADDRESS
SUPCITY
SUPSTATUS
SUPCATEGORY
21
Deliwines
20, Avenue of the Americas
New York
20
SILVER
32
Best Wines
660, Market Street
San Francisco
90
GOLD
37
Ad Fundum
82, Wacker Drive
Chicago
95
PLATINUM
52
Spirits & co.
928, Strip
Las Vegas
NULL
SILVER
68
The Wine Depot
132, Montgomery Street
San Francisco
10
SILVER
69
Vinos del Mundo
4, Collins Avenue
Miami
92
PLATINUM
84
Wine Trade Logistics
100, Rhode Island Avenue
Washington
92
PLATINUM
94
The Wine Crate
330, McKinney Avenue
Dallas
75
GOLD
102
SQL Views
´ SQL views are part of the external data model
´ A view is defined by means of an SQL query and its
content is generated upon invocation of the view by an
application or other query
´ A view is a virtual table without physical tuples
´ Views allow for logical data independence which
makes them a key component in the three-layer
database architecture
103
SQL Views
CREATE VIEW TOPSUPPLIERS
AS SELECT SUPNR, SUPNAME FROM SUPPLIER
WHERE SUPSTATUS > 50
CREATE VIEW TOPSUPPLIERS_SF
AS SELECT * FROM TOPSUPPLIERS
WHERE SUPCITY = ‘San Francisco’
104
SQL Views
CREATE VIEW ORDEROVERVIEW(PRODNR, PRODNAME,
TOTQUANTITY)
AS SELECT P.PRODNR, P.PRODNAME, SUM(POL.QUANTITY)
FROM PRODUCT AS P LEFT OUTER JOIN PO_LINE AS POL
ON (P.PRODNR = POL.PRODNR)
GROUP BY P.PRODNR
105
SQL Views
SELECT * FROM TOPSUPPLIERS_SF
SELECT * FROM ORDEROVERVIEW
WHERE PRODNAME LIKE ‘%CHARD%’
106
SQL Views
´ Query modification: RDBMS modifies queries that query
views into queries on the underlying base tables
´ View materialization: a physical table is created when
the view is first queried
107
SQL Views
´ Some views can be updated
´In this case, the view serves as a window through
which updates are propagated to the underlying
base table(s)
108
SQL Views
CREATE VIEW ORDEROVERVIEW(PRODNR, PRODNAME, TOTQUANTITY)
AS SELECT P.PRODNR, P.PRODNAME, SUM(POL.QUANTITY)
FROM PRODUCT AS P LEFT OUTER JOIN PO_LINE AS POL
ON (P.PRODNR = POL.PRODNR)
GROUP BY P.PRODNR
UPDATE VIEW ORDEROVERVIEW
SET TOTQUANTITY=10
WHERE PRODNR= ‘0154’
ERROR!
109
SQL Views
´ WITH CHECK option checks UPDATE and INSERT statements for conformity with
the view definition
CREATE VIEW TOPSUPPLIERS
AS SELECT SUPNR, SUPNAME FROM SUPPLIER
WHERE SUPSTATUS > 50
WITH CHECK OPTION
UPDATE TOPSUPPLIERS
SET STATUS =20
WHERE SUPNR=’32’
OK!
UPDATE TOPSUPPLIERS
SET STATUS =80
WHERE SUPNR=’32’
NOT OK!
110
SQL Privileges
´ A privilege corresponds to the right to use certain SQL statements such as
SELECT, INSERT, etc. on one or more database objects
Privilege
Explanation
SELECT
Gives retrieval privilege
INSERT
Gives insert privilege
UPDATE
Gives update privilege
DELETE
Gives delete privilege
ALTER
Gives privilege to change the table definition
REFERENCES
Gives the privilege to reference the table when specifying integrity constraints
ALL
Gives all privileges (DBMS specific)
111
SQL Privileges
GRANT SELECT, INSERT, UPDATE, DELETE ON SUPPLIER TO
BBAESENS
GRANT SELECT (PRODNR, PRODNAME) ON PRODUCT TO PUBLIC
REVOKE DELETE ON SUPPLIER FROM BBAESENS
GRANT SELECT, INSERT, UPDATE, DELETE ON PRODUCT TO
WLEMAHIEU WITH GRANT OPTION
GRANT REFERENCES ON SUPPLIER TO SVANDENBROUCKE
112
SQL Privileges
CREATE VIEW SUPPLIERS_NY
AS SELECT SUPNR, SUPNAME FROM SUPPLIERS
WHERE SUPCITY = ‘New York’
GRANT SELECT ON SUPPLIERS_NY TO WLEMAHIEU
113
SQL for Metadata Management
´ The catalog itself can also be implemented as a relational database
114
SQL for Metadata Management
Table(Tablename, …)
Key(Keyname, …)
Primary-Key(PK-Keyname, PK-Tablename, …)
PK-Keyname is a foreign key referring to Keyname in Key
PK-Tablename is a foreign key referring to Tablename in Table
Foreign-Key(FK-Keyname, FK-Tablename, FK-PK-Keyname, Update-rule, Delete-rule, …)
FK-Keyname is a foreign key referring to Keyname in Key
FK-Tablename is a foreign key referring to Tablename in Table
FK-PK-Keyname is a foreign key referring to PK-Keyname in Primary-Key
Column(Columnname, C-Tablename, Data type, Nulls, …)
C-Tablename is a foreign key referring to Tablename in Table
Key-Column(KC-Keyname, KC-Columnname, KC-Tablename, …)
KC-Keyname is a foreign key referring to Keyname in Key
KC-Columnname is a foreign key referring to Columnname in Column
KC-Tablename is a foreign key referring to C-Tablename in Column
115
SELECT *
FROM Column
WHERE TableName = ‘SUPPLIER’
SELECT PK.PK-Keyname, FK.FK-PK-Keyname, FK.FK-Tablename,
FK.Deleterule
FROM Primary-Key PK, Foreign-Key FK
WHERE PK.PK-Tablename = ‘SUPPLIER’
AND PK.PK-Keyname = FK.FK-PK-Keyname
116
Conclusions
´ Relational database management systems and SQL
´ SQL Data Definition Language
´ SQL Data Manipulation Language
´ SQL views
´ SQL indexes
´ SQL privileges
´ SQL for metadata management
The wisdom hierarchy:
representations of the
DIKW hierarchy
Jennifer Rowley
Bangor Business School, University of Wales, Bangor, UK
Received 30 January 2006
Revised 25 May 2006
Abstract.
This paper revisits the data–information–knowledge–wisdom (DIKW) hierarchy by examining the articulation
of the hierarchy in a number of widely read textbooks, and analysing their statements about the nature of data,
information, knowledge, and wisdom. The hierarchy referred to variously as the ‘Knowledge Hierarchy’, the
‘Information Hierarchy’ and the ‘Knowledge Pyramid’ is one of the fundamental, widely recognized and
‘taken-for-granted’ models in the information and knowledge literatures. It is often quoted, or used implicitly,
in definitions of data, information and knowledge in the information management, information systems and
knowledge management literatures, but there has been limited direct discussion of the hierarchy. After revisiting Ackoff’s original articulation of the hierarchy, definitions of data, information, knowledge and wisdom
as articulated in recent textbooks in information systems and knowledge management are reviewed and
assessed, in pursuit of a consensus on definitions and transformation processes. This process brings to the
surface the extent of agreement and dissent in relation to these definitions, and provides a basis for a discussion as to whether these articulations present an adequate distinction between data, information, and
knowledge. Typically information is defined in terms of data, knowledge in terms of information, and wisdom in terms of knowledge, but there is less consensus in the description of the processes that transform elements lower in the hierarchy into those above them, leading to a lack of definitional clarity. In addition, there
is limited reference to wisdom in these texts.
Keywords: DIKW hierarchy; wisdom hierarchy; wisdom; knowledge management; wisdom management
1.
Introduction
The data–information–knowledge–wisdom hierarchy (DIKW), referred to variously as the
‘Knowledge Hierarchy’, the ‘Information Hierarchy’ and the ‘Knowledge Pyramid’ is one of the
fundamental, widely recognized and ‘taken-for-granted’ models in the information and knowl-
Correspondence to: Jennifer Rowley, Bangor Business School, University of Wales, Bangor, Gwynedd, LL57
2DG, UK. E-mail: j.e.rowley@bangor.ac.uk
Journal of Information Science, 33 (2) 2007, pp. 163–180 © CILIP, DOI: 10.1177/0165551506070706
163
J. Rowley
edge literatures. It is often quoted, or used implicitly in definitions of data, information and
knowledge in textbooks in information management, information systems and knowledge management. The hierarchy is used to contextualize data, information, knowledge, and sometimes
wisdom, with respect to one another and to identify and describe the processes involved in the
transformation of an entity at a lower level in the hierarchy (e.g. data) to an entity at a higher level
in the hierarchy (e.g. information). The implicit assumption is that data can be used to create
information; information can be used to create knowledge, and knowledge can be used to create
wisdom. As Ackoff [1], whose paper is often cited when the DIKW hierarchy is quoted, explains,
each of the higher types in the hierarchy ‘includes the categories that fall below it’ (p.3).
The definitional role of the DIKW hierarchy positions it as a central model of information management, information systems and knowledge management. Yet, whilst there has over the years been
significant debate about related issues such as the nature and definition of information, both before
and since Ackoff’s paper, e.g. [2–11], and more recently considerable focus on the definition of
knowledge, e.g. [12–20], there has been:
• little direct discussion of the DIKW hierarchy itself, its meaning and contribution; and
• limited discussion of the nature of wisdom, and even less discussion of the organizational
processes that contribute to the cultivation of wisdom.
The objective of this paper then is to revisit the DIKW hierarchy, by examining the articulation of
the hierarchy in a number of widely read textbooks, and to analyse their statements about the nature
of data, information, knowledge, and wisdom. This paper is a theoretical paper designed to open
debate, promote reflection, and lift the discussion to wisdom from where it is languishing currently at
the level of knowledge. An improved appreciation of the relationships between knowledge and wisdom, as well as the ‘foundational concepts’ of data and information, may provide a context for achieving more convincing success in knowledge management, and more importantly organizational
achievement. This paper does not seek to make a broader theoretical contribution to the philosophical debates about the nature of information or knowledge advanced variously in the literatures of information philosophy and knowledge management. Rather its focus is on popular articulations of the
hierarchy to which students and professionals are exposed. Shenton [21] suggests that the research
subject’s and the professional’s notion of ‘information’ are a critical factor in the study of information
behaviour. The pragmatic approach adopted in this article therefore has equal, although different, relevance for information practice and research as do the more numerous philosophical debates.
The preoccupation with information and knowledge has led to the DIKW hierarchy being called
respectively the information hierarchy or the knowledge hierarchy. Here we refer to the DIKW hierarchy as the wisdom hierarchy, for two reasons:
Fig. 1.
The DIKW hierarchy.
Journal of Information Science, 33 (2) 2007, pp. 163–180 © CILIP, DOI: 10.1177/0165551506070706
164
J. Rowley
• Wisdom is identified as the pinnacle of the hierarchy.
• One of our objectives in revisiting the DIKW hierarchy is to further illuminate the notion of ‘wisdom’.
This article, then, starts with a section that outlines the wider literature that explores the nature
of information and knowledge, and then revisits and summarizes Ackoff’s original articulation of
the hierarchy. Next, definitions of data, information, knowledge and wisdom as articulated in recent
textbooks in information systems and knowledge management are reviewed and assessed, in pursuit of a consensus on definitions and transformation processes. This process brings to the surface
the extent of agreement and dissent in relation to these definitions, and provides a basis for a discussion as to whether these articulations present an adequate distinction between data, information
and knowledge. In addition, there is limited reference to wisdom in these texts. The article concludes with suggestions for further theoretical development.
2.
Theoretical context
The original formulation of the DIKW hierarchy will have been informed by theoretical discussion on
the nature of information and knowledge. Whilst the purpose of this paper is to present an analysis
based on ‘popular’ articulation and definitions of data, information, knowledge and wisdom, and the
relationships between these, it is useful to outline briefly the range of the theoretical debates that
underlie and inform these more popular representations. The theoretical and philosophical discussion
has two major branches: information philosophy, focusing on the nature of information; and knowledge management, which contributes to notions of knowledge. Whilst these fields are distinct they do
share some common foundations, and since some authors argue either that information and knowledge are the same thing, or that they are used interchangeably [22–24], it may be difficult to justify any
discussion of information that does not also explore knowledge and vice versa. Choo suggests that ‘the
knowing organisation represents an information-based view of organisations’ [25, p. 1].
The essential nature of information, since it is fundamental to our existence, has been considered
by many disciplines, including communications theory, library and information science, information
systems, cognitive science, and organization science [5]. This has generated multiple perspectives on
the nature of information. Floridi suggests that ‘Of our mundane and technical concepts information
is currently one of the most important, most widely used and least understood.’ [6, p. 459] He identifies six approaches to the definition of information, respectively: the communication theory approach,
the probabilistic approach, the modal approach, the systemic approach, the inferential approach, and
the semantic approach. Recently, there has been a renewed interest in this area arising out of the formulation of the new discipline of the philosophy of information [6–10]. Information philosophy
focuses on ‘the critical investigation of the conceptual nature and basic principles of information,
including its dynamic (especially computation and information flow)’ [7, p. 555]. Contributors to this
debate agree that the word information has been given different meaning by different writers, and that
consensus on the meaning of the word ‘information’ has not been achieved [26].
Debates about the nature of knowledge are equally longstanding, and have also gathered momentum in recent years with the blossoming of the discipline of knowledge management. Plato [27] first
defined knowledge as ‘justified true belief’ and this concept has been debated over the centuries by
Aristotle [28], Descartes [29], Kant [30], Polanyi [31] and others. Kakabadse et al. [19], drawing on these
debates, suggest that knowledge ‘can be conceived as information put to productive use’. Knowledge
management, like information philosophy, has been influenced by a variety of disciplines, including:
philosophy, cognitive science, social science, management science, information science, knowledge
engineering, artificial intelligence, and economics. Kakabadse et al. [19] propose five different knowledge management perspectives each of which takes a different stance on the nature of knowledge and
knowledge processes: philosophy-based, cognitive, network, community, and quantum.
To conclude and lead into our more specific exploration of the literature on the DIKW hierarchy,
both the information philosophy and knowledge management literatures are longstanding, and offer
multiple perspectives on the definition of information and knowledge. Some contributions also
Journal of Information Science, 33 (2) 2007, pp. 163–180 © CILIP, DOI: 10.1177/0165551506070706
165
J. Rowley
explore the nature of data and wisdom, but much of the discussion is focused on one of the elements
in the DIKW hierarchy rather than on all of the elements, and the relationship between them.
3.
The origins of the wisdom hierarchy
Many authors agree that the first appearance of the hierarchy was in T.S. Eliot’s poem The Rock in
1934 [32]. This poem contains the following lines:
Where is the wisdom that we have lost in knowledge?
Where is the knowledge that we have lost in information?
In more recent literature, authors often cite Ackoff’s 1989 paper as a source for the hierarchy.
Ackoff’s article, entitled From data to wisdom, proposed a hierarchy with the following levels: data,
information, knowledge, understanding and wisdom. Ackoff included understanding in his hierarchy, but more recent commentators have disputed that understanding is a separate level.
Ackoff defines data, information, knowledge, understanding, intelligence and wisdom and
explores the processes associated with the transformation between these elements. Most of these
definitions and processes are described from an information systems perspective, despite Ackoff’s
initial description of the types in the hierarchy as content of the human mind.
Wisdom is located at the top of a hierarchy of types […] Descending from wisdom there are understanding,
knowledge, information, and, at the bottom, data. Each of these includes the categories that fall below it – for
example, there can be no wisdom without understanding and no understanding without knowledge [1, p. 3].
Ackoff offers the following definitions of data, information, knowledge and wisdom, and their
associated transformation processes:
• Data are defined as symbols that represent properties of objects, events and their environment.
They are the products of observation. But are of no use until they are in a useable (i.e. relevant)
form. The difference between data and information is functional, not structural.
• Information is contained in descriptions, answers to questions that begin with such words as
who, what, when and how many. Information systems generate, store, retrieve and process data.
Information is inferred from data.
• Knowledge is know-how, and is what makes possible the transformation of information into
instructions. Knowledge can be obtained either by transmission from another who has it, by
instruction, or by extracting it from experience.
• Intelligence is the ability to increase efficiency.
• Wisdom is the ability to increase effectiveness. Wisdom adds value, which requires the mental
function that we call judgement. The ethical and aesthetic values that this implies are inherent
to the actor and are unique and personal.
Ackoff’s article is not the only early mention of the hierarchy. Cleveland [33] makes an early mention of the hierarchy which is to be found in the information science literature. At around the same
time as Ackoff, Zeleny [34] also discusses the DIKW hierarchy, and proposes an additional level,
enlightenment, at the top of the hierarchy. Zeleny’s model is compared with Ackoff’s in Table 1.
Enlightenment is not only answering or understanding why (which he defines as wisdom), but
going further and attaining the sense of truth, the sense of right and wrong, and having it socially
accepted, respected and sanctioned. Also Cooley [35] builds the DIKW hierarchy during his discussion of tacit knowledge and common sense.
More recently, Bellinger et al. [36] have elaborated further on Ackoff’s exposition, suggesting that
understanding is not a separate level, but rather that understanding supports the transition from
each stage to the next. They suggest that moving from data to information involves ‘understanding
relations’, moving from information to knowledge involves ‘understanding patterns’, and moving
Journal of Information Science, 33 (2) 2007, pp. 163–180 © CILIP, DOI: 10.1177/0165551506070706
166
J. Rowley
Table 1
Comparing Ackoff’s and Zeleny’s definitions of data, information, knowledge and wisdom
Zeleny [34]
Ackoff [1]
Data
Know nothing
Symbols
Information
Know what
Data that are processed to be useful; provides
answers to who, what, where and when questions
Knowledge
Know how
Application of data and information;
answers how questions
Understanding
Appreciation of why
Wisdom
Know why
Evaluated understanding
Enlightenment
Attaining the sense of truth, the sense
of right and wrong, and having it socially
accepted, respected and sanctioned
from knowledge to wisdom involves ‘understanding principles’. The label ‘DIKW hierarchy’, and
the omission of understanding as a separate level in re-iterations of the hierarchy in other sources
suggest that there is something of a consensus and that Bellinger et al. [36] are articulating a shared
view that understanding should not be considered as a separate level.
Amongst the recent information systems and knowledge management texts analysed below only
four actually draw the hierarchy. Chaffey and Wood [37] show the hierarchy in Figure 2, with the
additional axes of meaning and value. Pearlson and Saunders [38] suggest that human input goes
up in the higher levels of the hierarchy, whilst computer input goes down. Jashapara [39] shows a
hierarchy with the levels: data, information, knowledge, wisdom and truth. Choo [25] draws a rather
different diagram focusing on the transformation processes between signals, data, information and
knowledge.
Typically all of these formulations of the hierarchy share a common view that:
• the key elements are data, information, knowledge, and wisdom;
• these key elements are virtually always arranged in the same order, although some models offer
additional stages, such as understanding, or enlightenment;
High
High
Knowledge
Meaning
Information
Value
Data
Low
Fig. 2.
Low
Data, information and knowledge, according to Chaffey and Wood [37].
Journal of Information Science, 33 (2) 2007, pp. 163–180 © CILIP, DOI: 10.1177/0165551506070706
167
J. Rowley
Non-algorithmic
Non-programmable
Wisdom
Knowledge
(actionable Information)
Information
(data “in formation”)
Data
Algorithmic
Fig. 3.
Programmable
Data, information and knowledge, according to Awad and Ghaziri [20].
High
Belief
structuring
Cognitive
structuring
INFORMATION
KNOWLEDGE
Beliefs
Justification
Order/
Structure
Physical
structuring
SIGNALS
Low
Fig. 4.
Meaning
significance
DATA
Sensing
Selecting
Human Agency
High
Data, information and knowledge, according to Choo [25].
• the higher elements in the hierarchy can be explained in terms of the lower elements by identifying an appropriate transformation process; and
• the implicit challenge is to understand and explain how data is transformed into information,
information is transformed into knowledge, and knowledge is transformed into wisdom.
4.
Aims and methodology
This research does not seek to review all writing that presents definitional debates in relation to
data, information, knowledge and wisdom. Rather we examine the popular explicit or implicit articulations of the wisdom hierarchy in a number of recent textbooks in those disciplines at the core of
the knowledge revolution, information systems and knowledge management. This analysis is
intended to ascertain how some of the key authors of recent books that are read by students and others define the terms, and to investigate:
Journal of Information Science, 33 (2) 2007, pp. 163–180 © CILIP, DOI: 10.1177/0165551506070706
168
J. Rowley
• which items in the hierarchy are typically defined by writers of textbooks in information systems
and knowledge management;
• the extent of any consensus on the definition of data, information, knowledge and wisdom;
• the essential nature of these elements, as defined and described in these sources; and
• the transformation processes associated with moving between levels in the hierarchy.
The textbooks were chosen using the following criteria:
• Recently published, preferably published in 2003 or later. This criterion was important to ensure
that all the books were written at a similar time, and had the opportunity to be influenced by current theoretical debates, particularly in the area of knowledge management.
• Published by a major publisher, and therefore having the potential to be widely read, and influential.
• Where possible, books in their second or subsequent edition were chosen, to ensure the selection
of books that were established as authoritative. This criterion proved more difficult to apply to
knowledge management books, since some of the most useful texts in this area have only been
published in their first edition quite recently.
• Availability and convenience.
• Textbooks were preferred to readers or collections of articles because these books were less likely
to offer any definition, and when they did those definitions might not be consistent with one
another; this made for more difficult analysis and might have unbalanced the perspectives in
favour of these sources offering multiple perspectives.
The books in information systems and those in knowledge management were analysed separately
in order to investigate any differences in definitions or emphases rooted possibly in their different
disciplinary perspectives. Textbooks included in the analysis were [20, 25, 37–50].
The textbooks were analysed through the use of their index and scanned for key phrases that
characterized their definition of data, information, knowledge and wisdom. In some instances, these
texts offer succinct and straightforward definitions, perhaps with examples. In others a rather more
sophisticated exposition is offered which embraces concepts from semiotics, pragmatics and semantics, e.g. [47]. In some texts the definitions are clearly labelled as such, whereas in others the definitions which form the core basis of the comments in the next section have been extracted to capture
‘the flavour’ of the definition articulated by the author. Typically, definitions of knowledge tend to
be more elaborate and discursive than those of data and information.
5.
Findings
This section first notes the extent of definition of data, information, knowledge and wisdom in the
textbooks that were studied. It then proceeds to summarize and discuss, in turn, the definitions of
data, information, knowledge and wisdom offered in this literature.
5.1.
Which items are discussed?
Table 2 summarizes the extent of mention of the elements in the wisdom hierarchy. Most books offer
a description of data, information and knowledge that can be regarded as a definition. Two of the
knowledge management textbooks did not define data or information, but did offer a definition of
knowledge. Such books may take the definition of data and information as a given, even though they
define knowledge in relation to information. One of the information systems textbooks did not
define knowledge. In general then most books recognized the importance of defining all three concepts, whether their primary focus was on ‘information’ within information systems, or ‘knowledge’
as in knowledge management. However, concepts above knowledge in the wisdom hierarchy
Journal of Information Science, 33 (2) 2007, pp. 163–180 © CILIP, DOI: 10.1177/0165551506070706
169
J. Rowley
[20] E.M Awad and H.M. Ghaziri, Knowledge Management (Pearson Education International, Upper Saddle
River, NJ, 2004).
[25] C.W. Choo, The Knowing Organization: how Organizations use Information to Construct Meaning,
Create Knowledge, and make Decisions (OUP, Oxford, 2006).
[37] D. Chaffey and S. Wood, Business Information Management: Improving Performance using
Information Systems (FT Prentice Hall, Harlow, 2005).
[38] K.E. Pearlson and C.S Saunders, Managing and using Information Systems: a Strategic Approach
(Wiley, New York, 2004).
[39] A. Jashapara, Knowledge Management: an Integrated Approach (FT Prentice Hall, Harlow, 2005).
[40] L.M. Jessup and J.S. Valacich, Information Systems Today (Prentice Hall, Upper Saddle River, N J, 2003).
[41] P. Bocij, D. Chaffey, A. Greasley, and S. Hickie, Business Information Systems: Technology, Development
and Management for the e-Business 2nd edn (FT Prentice Hall, Harlow, 2003).
[42] T.R. Groff and T.P. Jones, Introduction to Knowledge Management: KM in Business (Butterworth
Heinemann, Amsterdam, 2003).
[43] K.C. Laudon and J.P. Laudon, Management Information Systems: Managing the Digital Firm 9th edn.
(Pearson Prentice Hall, Upper Saddle River, NJ, 2006).
[44] E. Turban, R.K. Rainer, and R.E. Potter, Introduction to Information Technology, 3rd edn (New York,
Wiley, 2005).
[45] D. Boddy, A. Boonstra, and G. Kennedy, Managing Information Systems: an Organizational Perspective,
2nd edn (FT Prentice Hall, Harlow, 2005).
[46] G. Curtis and D. Cobham Business Information Systems: Analysis, Design and Practice, 5th edn (FT
Prentice Hall, Harlow, 2005).
[47] P. Beynon-Davies, Information Systems: an Introduction to Informatics in Organizations (Palgrave,
Basingstoke, 2002).
[48] S. Newell, M. Robertson, H. Scarbrough, and J. Swan, Managing Knowledge Work (Palgrave
Macmillan, Basingstoke, 2002).
[49] S. Barnes, Knowledge Management Systems: Theory and Practice (Thomson Learning, London, 2002).
[50] C. Depres and D. Chauvel, Knowledge Horizons (Butterworth Heinemann, Boston, 2000).
Fig. 5.
Textbooks included in the analysis.
received very little attention. Wisdom was only defined by three books, and other higher levels were
mentioned by two authors.
5.2.
Defining data
Where definitions of data are offered these are typically clearly and succinctly stated, sometimes
with examples. In summary the definitions variously suggest that:
• Data has no meaning or value because it is without context and interpretation [27, 40–42].
• Data are discrete, objective facts or observations, which are unorganized and unprocessed, and
do not convey any specific meaning [20, 37, 38, 41].
• Data items are an elementary and recorded description of things, events, activities and transactions [43–45].
Choo [25] suggests that data are often elements of larger physical systems (such as books, or
instrument panels) which give clues about what data to notice and how they should be read.
Table 2
Extent of definition of data, information, knowledge and wisdom
Information systems textbooks (n = 8)
Knowledge management textbooks (n = 7)
Total (n = 15)
Data
Information
Knowledge
Wisdom
Other (e.g. Truth)
8
5
13
8
5
13
7
6
13
1
2
3
0
2
2
Journal of Information Science, 33 (2) 2007, pp. 163–180 © CILIP, DOI: 10.1177/0165551506070706
170
J. Rowley
Jashapara [39] and Choo [25] also introduce the concept of signals. Jashapara [39] suggests that
we acquire data from the external world through our senses and try to make sense of these signals
through our experience. Choo [25] develops this further and specifically identifies signals as the origin of data, and proposes the processes of sensing and selecting, together described as physical
structuring, as transforming signals into data.
Interestingly, these definitions are largely in terms of what data lacks; data lacks meaning or value,
is unorganized and unprocessed. They lay the foundations for defining information in terms of data.
5.3.
Defining information
Information systems books tend to focus on the relationship between data and information, often
defining information in terms of data. The concepts of format, structure, organization, meaning and
value feature in the various definitions:
• ‘Information is formatted data […(and)] can be defined as a representation of reality’ [40, p. 7].
• ‘Information is data which adds value to the understanding of a subject’ [37, p. 223 based on the
European Framework for Knowledge Management].
• ‘Information is data that have been shaped into a form that is meaningful and useful to human
beings’ [43, p. 13].
• ‘Information is data that have been organized so that they have meaning and value to the recipient’
[44, 45].
• ‘Information is data processed for a purpose’ [46, p. 3].
Bocij et al. [41] concur with the findings that there are a number of definitions of information in
common use, which they suggest are:
• data that have been processed so that they are meaningful;
• data that have been processed for a purpose; and
• data that have been interpreted and understood by the recipient.
Bocij et al. [41] and Curtis and Cobham [46] identify the processes associated with converting
data into information. They agree that these are: classification, rearranging/sorting, aggregating, performing calculations, and selection. They do not discuss whether these processes are performed by
information systems, or people, or both.
Pearlson and Saunders [38] suggest that such processing of data requires a decision about the type
of analysis, and this, in turn, requires an interpretation of the content of the data. To be relevant and
have a purpose, information must be considered within the context where it is received and used.
Boddy et al. [45] point out that the notion of meaning is subjective, and that what one person sees
as valuable information another may see as data with no particular significance. Beynon-Davies
[47], recognizing that the meaning of information is both critical and open to many interpretations,
embarks on an explanation based on semiotics or semiology. He argues that information can be seen
as embodied in signs, and discusses how the elements of semiotics, pragmatics, semantics, syntactics and empirics inform thinking about communication and information.
Five of the knowledge management textbooks also define information, and these definitions also
define information in relation to data. For example:
• ‘Information is data that have been given meaning by way of context’ [42, p. 2].
• ‘Information is an aggregation of data that makes decision making easier’ [20, p. 36].
• ‘Information is data that is endowed with meaning, relevance and purpose’ [39, p. 14].
Jashapara also agrees with Boddy et al. [45] that the human receiver determines whether a message is data or information:
Journal of Information Science, 33 (2) 2007, pp. 163–180 © CILIP, DOI: 10.1177/0165551506070706
171
J. Rowley
It is the receiver of the data that determines whether a message is data or information […] Meaning in data
often occurs through some form of association with experience or relationships with other data [39, p. 16].
Choo [25] calls this process, which assigns meaning and significance to the perceived facts and
messages, ‘cognitive structuring’.
To conclude, in both the information systems textbooks and the knowledge management literature, information is defined in terms of data, and is seen to be organized or structured data. This
processing lends the data relevance for a specific purpose or context, and thereby makes it meaningful, valuable, useful and relevant.
5.4.
Defining knowledge
Definitional statements on knowledge are often much more complex than those for data or information. Indeed a number of the knowledge management texts offer extended definitional discussions on the nature of knowledge, its various representations and manifestations, and philosophical
debates on the nature of knowledge. These debates make it more difficult to distil the essence of the
statements on the nature of knowledge than it is to capture and represent the definitional statements
that relate to data and information. Indeed, as some texts opine:
• ‘Knowledge is an intrinsically ambiguous and equivocal term’ [49, p. 3].
• ‘There is still no consensus on the nature of knowledge, except that it is based on perception that
can provide a rational justification for it’ [39, pp. 16–17].
Six of the information systems books offer definitional statements in relation to knowledge, frequently defining knowledge in terms of data and information. For example:
• ‘Knowledge is the combination of data and information, to which is added expert opinion, skills,
and experience, to result in a valuable asset which can be used to aid decision making’ [37, p. 223,
quoting the European Framework for Knowledge Management].
• ‘Knowledge is data and/or information that have been organized and processed to convey understanding, experience, accumulated learning, and expertise as they apply to a current problem or
activity’ [44, p. 38].
• ‘Knowledge builds on information that is extracted from data […] While data is a…