College of Computing and Informatics
Assignment 2
Deadline: Thursday 1/2/2023 @ 23:59
[Total Mark for this Assignment is 8]
Student Details:
Name: ###
ID: ###
CRN: ###
Instructions:
• You must submit two separate copies (one Word file and one PDF file) using the Assignment Template on
Blackboard via the allocated folder. These files must not be in compressed format.
• It is your responsibility to check and make sure that you have uploaded both the correct files.
• Zero mark will be given if you try to bypass the SafeAssign (e.g. misspell words, remove spaces between
words, hide characters, use different character sets, convert text into image or languages other than English
or any kind of manipulation).
• Email submission will not be accepted.
• You are advised to make your work clear and well-presented. This includes filling your information on the cover
page.
• You must use this template, failing which will result in zero mark.
• You MUST show all your work, and text must not be converted into an image, unless specified otherwise by
the question.
• Late submission will result in ZERO mark.
• The work should be your own, copying from students or other resources will result in ZERO mark.
• Use Times New Roman font for all your answers.
Question One
2 Marks
Learning Outcome(s): CLO3: Develop a comprehensive IT project plan for estimation, scheduling,
communication, resource management, procurement, risk, and quality.
As an IT project manager at a tech company in Saudi Arabia, you have been
tasked with developing a cost estimate for building a cybersecurity testing lab
for testing software products prior to launch. The time frame for building the
cybersecurity lab is 12 weeks (3 months). The lab should have 30 desktops with
Linux (Kali) as the operating system. These desktops should have appropriate
software, such as forensics software, Wireshark, etc. The lab should also
include network equipment, with a network switch and a server. An Internet
connection is not allowed because some exercises might harm the
company network. Based on the above information, create the cost estimate
for this lab. Your cost estimate should include any personnel costs
associated with managing this project. Explain the assumptions behind your
cost estimate as bullet points. Make sure to create
the WBS for the project.
Question Two
2 Marks
Learning Outcome(s): CLO2: State the best practices in the IT project management processes.
Several studies have found that IT projects are affected by some common
sources of risk. Broad categories of risk may be identified by different
organizations according to their own needs. These categories may
include market risk, financial risk, technology risk, people risk, and structure /
process risk. State two examples of technology risk.
Question Three
2 Marks
Learning Outcome(s): CLO3: Develop a comprehensive IT project plan for estimation, scheduling,
communication, resource management, procurement, risk, and quality.
Some organizations use a chart that lists project tasks and milestones
vertically and project members horizontally. Each task is mapped to everyone
involved in the project through symbols (R, A, C, I).
For example, in a project there are three tasks and three members.
The first two tasks, preparing a bill of materials and preparing an estimate, are the
responsibility of the project team, while the project manager is
accountable for these tasks. For the preparing-estimate task, the sponsor must
be informed. The third task, sending the procurement document, is the
responsibility of the project manager, while the project team is accountable for
this task; the sponsor must also be informed about this task.
1- What is the name of this chart, and what do (R, A, C, I) stand for?
2- Based on the above example, create the chart as a table.
Question Four
2 Marks
Learning Outcome(s): CLO4: Evaluate IT project team management, and IT project performance.
You are the project manager for an IT company and you want to buy new
computers for your staff. Explain your expectations using a statement of work
(SOW). The following picture will help you to state your answer.
College of Computing and Informatics
Assignment 2
Deadline: Sunday 29/01/2023 @ 23:59
[Total Mark for this Assignment is 8]
Student Details:
Name: ###
ID: ###
CRN: ###
Instructions:
• You must submit two separate copies (one Word file and one PDF file) using the Assignment Template on
Blackboard via the allocated folder. These files must not be in compressed format.
• It is your responsibility to check and make sure that you have uploaded both the correct files.
• Zero mark will be given if you try to bypass the SafeAssign (e.g. misspell words, remove spaces between
words, hide characters, use different character sets, convert text into image or languages other than English
or any kind of manipulation).
• Email submission will not be accepted.
• You are advised to make your work clear and well-presented. This includes filling your information on the cover
page.
• You must use this template, failing which will result in zero mark.
• You MUST show all your work, and text must not be converted into an image, unless specified otherwise by
the question.
• Late submission will result in ZERO mark.
• The work should be your own, copying from students or other resources will result in ZERO mark.
• Use Times New Roman font for all your answers.
Question One
2 Marks
Learning Outcome(s): Develop dynamic web pages using JavaScript.
Write a JavaScript program that obtains your name, your student ID, and two
numbers from the user, and outputs text that displays the
name, student ID, sum, product, difference, and quotient of the two numbers.
Notes:
1. You must copy and paste the “HTML and javascript” as your answer for
this question. DON’T take screenshots for your HTML and javascript. It
must be editable script.
2. Take a screenshot for your filled form and paste it as a part of your
answer. The name field and student id field should be matched with your
name, and email.
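As a rough illustration of the arithmetic this question describes, the sketch below separates the calculation from the page so that it can be run on its own. The function name `describeNumbers` and the sample values are assumptions made for illustration only; a complete answer would still need an HTML form or `prompt` calls to collect the input.

```javascript
// Hypothetical helper (not part of the assignment template):
// builds the output text from input that has already been collected.
function describeNumbers(name, studentId, first, second) {
  // Form fields and prompt() return strings, so convert before doing arithmetic.
  const a = Number(first);
  const b = Number(second);
  return [
    "Name: " + name,
    "Student ID: " + studentId,
    "Sum: " + (a + b),
    "Product: " + (a * b),
    "Difference: " + (a - b),
    "Quotient: " + (a / b),
  ].join("\n");
}

// Sample values are placeholders, not real student details.
console.log(describeNumbers("Sample Name", "000000", "8", "2"));
```

Converting with `Number` matters because `"8" + "2"` would otherwise concatenate the two strings. Dividing by zero yields `Infinity` in JavaScript rather than an error, so a fuller script may want to check the second number first.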
Question Two
2 Marks
Learning Outcome(s): Develop dynamic web pages using JavaScript.
Write a JavaScript program, including statements for variable declaration and assignment,
that calculates and prints the sum of the integers from 1 to 10. Use the while
statement to loop through the calculation and increment statements. The loop
should terminate when the value of variable x becomes 11.
Notes:
1. You must copy and paste the “HTML and javascript” as your answer for
this question. DON’T take screenshots for your HTML and javascript. It
must be editable script.
2. Take a screenshot for your filled form and paste it as a part of your
answer.
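The loop structure described in this question can be sketched as follows. This is a minimal illustration of the script portion only, assuming the output goes to the console instead of the page; the variable name `sum` is an assumption, while `x` comes from the question.

```javascript
// Sum the integers 1..10 with a while loop that stops once x reaches 11.
let sum = 0; // running total
let x = 1;   // loop counter

while (x < 11) {
  sum += x;  // calculation statement
  x++;       // increment statement
}

console.log("The sum of the integers from 1 to 10 is " + sum);
```

When the loop exits, `x` holds 11, which is the termination condition the question asks for.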
Question Three
2 Marks
Learning Outcome(s): Develop dynamic web pages using JavaScript.
Write JavaScript statements to sum the values contained in an array named
theArray. The 20-element integer array must first be declared, allocated and
initialized. The summation of the elements of the array must be done with for
and for…in statements.
Notes:
1. You must copy and paste the “HTML and javascript” as your answer for
this question. DON’T take screenshots for your HTML and javascript. It
must be editable script.
2. Take a screenshot for your filled form and paste it as a part of your
answer.
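The two summation styles named in this question can be compared side by side. The sketch below is one possible arrangement, assuming the array is filled with the values 1 to 20 (an arbitrary choice for illustration); the names `totalFor` and `totalForIn` are also assumptions.

```javascript
// Declare and allocate a 20-element array, then initialize it.
// The values 1..20 are an arbitrary choice for illustration.
const theArray = new Array(20);
for (let i = 0; i < theArray.length; i++) {
  theArray[i] = i + 1;
}

// Sum with a counter-controlled for statement.
let totalFor = 0;
for (let i = 0; i < theArray.length; i++) {
  totalFor += theArray[i];
}

// Sum with for...in, which iterates over the array's index keys (as strings).
let totalForIn = 0;
for (const index in theArray) {
  totalForIn += theArray[index];
}

console.log("for: " + totalFor + ", for...in: " + totalForIn);
```

Note that `for...in` visits index keys, not element values, so the element still has to be read with `theArray[index]`.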
Question Four
2 Marks
Learning Outcome(s): Develop dynamic web pages using JavaScript.
A. Create a web page in which the user is allowed to select the page’s
background color and whether the page uses serif or sans serif fonts.
Then change the body element’s style attribute accordingly.
Notes:
1. You must copy and paste the “HTML and DOM” as your answer for this
question. DON’T take screenshots for your HTML and DOM. It must be
editable.
2. Take a screenshot for your filled form and paste it as a part of your
answer.
College of Computing and Informatics
Assignment #2
Deadline: Day 29/01/2022 @ 23:59
[Total Mark for this Assignment is 8 Marks]
Student Details:
Name: ###
ID: ###
CRN: ###
Instructions:
• You must submit two separate copies (one Word file and one PDF file) using the Assignment Template on
Blackboard via the allocated folder. These files must not be in compressed format.
• It is your responsibility to check and make sure that you have uploaded both the correct files.
• Zero mark will be given if you try to bypass the SafeAssign (e.g. misspell words, remove spaces between
words, hide characters, use different character sets, convert text into image or languages other than English
or any kind of manipulation).
• Email submission will not be accepted.
• You are advised to make your work clear and well-presented. This includes filling your information on the cover
page.
• You must use this template, failing which will result in zero mark.
• You MUST show all your work, and text must not be converted into an image, unless specified otherwise by
the question.
• Late submission will result in ZERO mark.
• The work should be your own, copying from students or other resources will result in ZERO mark.
• Use Times New Roman font for all your answers.
Question One
2 Marks
Learning Outcome(s): Apply the concepts of transaction management, concurrency and recovery of a database.
What is a recoverable schedule? Why is recoverability of schedules
desirable? Are there any circumstances under which it would be
desirable to allow nonrecoverable schedules? Explain your answer.
Question Two
3 Marks
Learning Outcome(s): Apply the concepts of transaction management, concurrency and recovery of a database.
Describe two techniques that could be used to tune a database schema.
Support your answer with at least one example for each technique.
Question Three
3 Marks
Learning Outcome(s): Apply the concepts of transaction management, concurrency and recovery of a database.
The system log in the table below shows the sequence of
operations of four transactions T1, T2, T3 and T4.
a) What is the purpose of using a checkpoint?
b) In case of recovery using deferred update, which transactions will
be undone? Which transactions will be redone? Which transactions
will be ignored? In your answer, you need to discuss the status of
all transactions T1, T2, T3 and T4 and to explain why.
System logs
[start transaction, T1]
[write transaction, T1, D, 20]
[commit, T1]
[Checkpoint]
[start transaction, T2]
[write transaction, T2, C, 50]
[write transaction, T2, A, 10]
[commit, T2]
[start transaction, T3]
[write transaction, T3, B, 10]
[start transaction, T4]
[write transaction, T4, D, 10]
system crash
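To make the recovery rule concrete, the sketch below classifies the transactions in the log above under a deferred-update (NO-UNDO/REDO) scheme. It assumes the usual textbook convention: a transaction that committed after the last checkpoint is redone, a transaction that committed before the checkpoint needs no action, and a transaction with no commit record is ignored because its writes never reached the database. The array encoding of the log and the function name `classifyDeferredUpdate` are assumptions made for illustration.

```javascript
// Hypothetical encoding of the log above; "checkpoint" and "crash" are markers.
const log = [
  ["start", "T1"], ["write", "T1", "D", 20], ["commit", "T1"],
  ["checkpoint"],
  ["start", "T2"], ["write", "T2", "C", 50], ["write", "T2", "A", 10], ["commit", "T2"],
  ["start", "T3"], ["write", "T3", "B", 10],
  ["start", "T4"], ["write", "T4", "D", 10],
  ["crash"],
];

// Classify each transaction under deferred update (NO-UNDO/REDO).
function classifyDeferredUpdate(entries) {
  const lastCheckpoint = entries.map((e) => e[0]).lastIndexOf("checkpoint");
  // "undo" stays empty: deferred update never writes uncommitted data to disk.
  const result = { redo: [], ignore: [], undo: [] };

  // Collect transaction names in the order they started.
  const started = new Set();
  for (const [kind, txn] of entries) {
    if (kind === "start") started.add(txn);
  }

  for (const txn of started) {
    // Position of the commit record, or -1 if the transaction never committed.
    const commitAt = entries.findIndex((e) => e[0] === "commit" && e[1] === txn);
    if (commitAt > lastCheckpoint) {
      result.redo.push(txn);   // committed after the last checkpoint
    } else {
      result.ignore.push(txn); // committed before the checkpoint, or never committed
    }
  }
  return result;
}

console.log(classifyDeferredUpdate(log));
```

Under these assumptions the sketch places T2 in the redo list and T1, T3, and T4 in the ignore list; a written answer would still need to explain the reason for each transaction.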
FUNDAMENTALS OF
Database
Systems
SIXTH EDITION
Ramez Elmasri
Department of Computer Science and Engineering
The University of Texas at Arlington
Shamkant B. Navathe
College of Computing
Georgia Institute of Technology
Addison-Wesley
Boston Columbus Indianapolis New York San Francisco Upper Saddle River
Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto
Delhi Mexico City Sao Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo
Editor in Chief: Michael Hirsch
Acquisitions Editor: Matt Goldstein
Editorial Assistant: Chelsea Bell
Managing Editor: Jeffrey Holcomb
Senior Production Project Manager: Marilyn Lloyd
Media Producer: Katelyn Boller
Director of Marketing: Margaret Waples
Marketing Coordinator: Kathryn Ferranti
Senior Manufacturing Buyer: Alan Fischer
Senior Media Buyer: Ginny Michaud
Text Designer: Sandra Rigney and Gillian Hall
Cover Designer: Elena Sidorova
Cover Image: Lou Gibbs/Getty Images
Full Service Vendor: Gillian Hall, The Aardvark Group
Copyeditor: Rebecca Greenberg
Proofreader: Holly McLean-Aldis
Indexer: Jack Lewis
Printer/Binder: Courier, Westford
Cover Printer: Lehigh-Phoenix Color/Hagerstown
Credits and acknowledgments borrowed from other sources and reproduced with permission in this textbook appear on appropriate page within text.
The interior of this book was set in Minion and Akzidenz Grotesk.
Copyright © 2011, 2007, 2004, 2000, 1994, and 1989 Pearson Education, Inc., publishing as
Addison-Wesley. All rights reserved. Manufactured in the United States of America. This
publication is protected by Copyright, and permission should be obtained from the publisher
prior to any prohibited reproduction, storage in a retrieval system, or transmission in any
form or by any means, electronic, mechanical, photocopying, recording, or likewise. To
obtain permission(s) to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, 501 Boylston Street, Suite 900, Boston, Massachusetts 02116.
Many of the designations by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in this book, and the publisher was
aware of a trademark claim, the designations have been printed in initial caps or all caps.
Library of Congress Cataloging-in-Publication Data
Elmasri, Ramez.
Fundamentals of database systems / Ramez Elmasri, Shamkant B. Navathe.—6th ed.
p. cm.
Includes bibliographical references and index.
ISBN-13: 978-0-136-08620-8
1. Database management. I. Navathe, Sham. II. Title.
QA76.9.D3E57 2010
005.74—dc22
10 9 8 7 6 5 4 3 2 1—CW—14 13 12 11 10
ISBN 10: 0-136-08620-9
ISBN 13: 978-0-136-08620-8
To Katrina, Thomas, and Dora
(and also to Ficky)
R. E.
To my wife Aruna, mother Vijaya,
and to my entire family
for their love and support
S.B.N.
Preface
This book introduces the fundamental concepts necessary for designing, using, and implementing
database systems and database applications. Our presentation stresses the fundamentals of database modeling and design, the languages and models provided by
the database management systems, and database system implementation techniques. The book is meant to be used as a textbook for a one- or two-semester
course in database systems at the junior, senior, or graduate level, and as a reference
book. Our goal is to provide an in-depth and up-to-date presentation of the most
important aspects of database systems and applications, and related technologies.
We assume that readers are familiar with elementary programming and data-structuring concepts and that they have had some exposure to the basics of computer organization.
New to This Edition
The following key features have been added in the sixth edition:
■ A reorganization of the chapter ordering to allow instructors to start with
projects and laboratory exercises very early in the course
■ The material on SQL, the relational database standard, has been moved early
in the book to Chapters 4 and 5 to allow instructors to focus on this important topic at the beginning of a course
■ The material on object-relational and object-oriented databases has been
updated to conform to the latest SQL and ODMG standards, and consolidated into a single chapter (Chapter 11)
■ The presentation of XML has been expanded and updated, and moved earlier in the book to Chapter 12
■ The chapters on normalization theory have been reorganized so that the first
chapter (Chapter 15) focuses on intuitive normalization concepts, while the
second chapter (Chapter 16) focuses on the formal theories and normalization algorithms
■ The presentation of database security threats has been updated with a discussion on SQL injection attacks and prevention techniques in Chapter 24,
and an overview of label-based security with examples
■ Our presentation on spatial databases and multimedia databases has been
expanded and updated in Chapter 26
■ A new Chapter 27 on information retrieval techniques has been added,
which discusses models and techniques for retrieval, querying, browsing,
and indexing of information from Web documents; we present the typical
processing steps in an information retrieval system, the evaluation metrics,
and how information retrieval techniques are related to databases and to
Web search
The following are key features of the book:
■ A self-contained, flexible organization that can be tailored to individual
needs
■ A Companion Website (http://www.aw.com/elmasri) includes data to be
loaded into various types of relational databases for more realistic student
laboratory exercises
■ A simple relational algebra and calculus interpreter
■ A collection of supplements, including a robust set of materials for instructors and students, such as PowerPoint slides, figures from the text, and an
instructor’s guide with solutions
Organization of the Sixth Edition
There are significant organizational changes in the sixth edition, as well as improvements to the individual chapters. The book is now divided into eleven parts as
follows:
■ Part 1 (Chapters 1 and 2) includes the introductory chapters
■ The presentation on relational databases and SQL has been moved to Part 2
(Chapters 3 through 6) of the book; Chapter 3 presents the formal relational
model and relational database constraints; the material on SQL (Chapters 4
and 5) is now presented before our presentation on relational algebra and calculus in Chapter 6 to allow instructors to start SQL projects early in a course
if they wish (this reordering is also based on a study that suggests students
master SQL better when it is taught before the formal relational languages)
■ The presentation on entity-relationship modeling and database design is
now in Part 3 (Chapters 7 through 10), but it can still be covered before Part
2 if the focus of a course is on database design
■ Part 4 covers the updated material on object-relational and object-oriented
databases (Chapter 11) and XML (Chapter 12)
■ Part 5 includes the chapters on database programming techniques (Chapter
13) and Web database programming using PHP (Chapter 14, which was
moved earlier in the book)
■ Part 6 (Chapters 15 and 16) contains the normalization and design theory chapters
(we moved all the formal aspects of normalization algorithms to Chapter 16)
■ Part 7 (Chapters 17 and 18) contains the chapters on file organizations,
indexing, and hashing
■ Part 8 includes the chapters on query processing and optimization techniques (Chapter 19) and database tuning (Chapter 20)
■ Part 9 includes Chapter 21 on transaction processing concepts; Chapter 22
on concurrency control; and Chapter 23 on database recovery from failures
■ Part 10 on additional database topics includes Chapter 24 on database security and Chapter 25 on distributed databases
■ Part 11 on advanced database models and applications includes Chapter 26
on advanced data models (active, temporal, spatial, multimedia, and deductive databases); the new Chapter 27 on information retrieval and Web
search; and the chapters on data mining (Chapter 28) and data warehousing
(Chapter 29)
Contents of the Sixth Edition
Part 1 describes the basic introductory concepts necessary for a good understanding
of database models, systems, and languages. Chapters 1 and 2 introduce databases,
typical users, and DBMS concepts, terminology, and architecture.
Part 2 describes the relational data model, the SQL standard, and the formal relational languages. Chapter 3 describes the basic relational model, its integrity constraints, and update operations. Chapter 4 describes some of the basic parts of the
SQL standard for relational databases, including data definition, data modification
operations, and simple SQL queries. Chapter 5 presents more complex SQL queries,
as well as the SQL concepts of triggers, assertions, views, and schema modification.
Chapter 6 describes the operations of the relational algebra and introduces the relational calculus.
Part 3 covers several topics related to conceptual database modeling and database
design. In Chapter 7, the concepts of the Entity-Relationship (ER) model and ER
diagrams are presented and used to illustrate conceptual database design. Chapter 8
focuses on data abstraction and semantic data modeling concepts and shows how
the ER model can be extended to incorporate these ideas, leading to the enhanced-ER (EER) data model and EER diagrams. The concepts presented in Chapter 8
include subclasses, specialization, generalization, and union types (categories). The
notation for the class diagrams of UML is also introduced in Chapters 7 and 8.
Chapter 9 discusses relational database design using ER- and EER-to-relational
mapping. We end Part 3 with Chapter 10, which presents an overview of the different phases of the database design process in enterprises for medium-sized and large
database applications.
Part 4 covers the object-oriented, object-relational, and XML data models, and their
affiliated languages and standards. Chapter 11 first introduces the concepts for
object databases, and then shows how they have been incorporated into the SQL
standard in order to add object capabilities to relational database systems. It then
covers the ODMG object model standard, and its object definition and query languages. Chapter 12 covers the XML (eXtensible Markup Language) model and languages, and discusses how XML is related to database systems. It presents XML
concepts and languages, and compares the XML model to traditional database
models. We also show how data can be converted between the XML and relational
representations.
Part 5 is on database programming techniques. Chapter 13 covers SQL programming topics, such as embedded SQL, dynamic SQL, ODBC, SQLJ, JDBC, and
SQL/CLI. Chapter 14 introduces Web database programming, using the PHP scripting language in our examples.
Part 6 covers normalization theory. Chapters 15 and 16 cover the formalisms, theories, and algorithms developed for relational database design by normalization. This
material includes functional and other types of dependencies and normal forms of
relations. Step-by-step intuitive normalization is presented in Chapter 15, which
also defines multivalued and join dependencies. Relational design algorithms based
on normalization, along with the theoretical materials that the algorithms are based
on, are presented in Chapter 16.
Part 7 describes the physical file structures and access methods used in database systems. Chapter 17 describes primary methods of organizing files of records on disk,
including static and dynamic hashing. Chapter 18 describes indexing techniques for
files, including B-tree and B+-tree data structures and grid files.
Part 8 focuses on query processing and database performance tuning. Chapter 19
introduces the basics of query processing and optimization, and Chapter 20 discusses physical database design and tuning.
Part 9 discusses transaction processing, concurrency control, and recovery techniques, including discussions of how these concepts are realized in SQL. Chapter 21
introduces the techniques needed for transaction processing systems, and defines
the concepts of recoverability and serializability of schedules. Chapter 22 gives an
overview of the various types of concurrency control protocols, with a focus on
two-phase locking. We also discuss timestamp ordering and optimistic concurrency
control techniques, as well as multiple-granularity locking. Finally, Chapter 23
focuses on database recovery protocols, and gives an overview of the concepts and
techniques that are used in recovery.
Parts 10 and 11 cover a number of advanced topics. Chapter 24 gives an overview of
database security including the discretionary access control model with SQL commands to GRANT and REVOKE privileges, the mandatory access control model
with user categories and polyinstantiation, a discussion of data privacy and its relationship to security, and an overview of SQL injection attacks. Chapter 25 gives an
introduction to distributed databases and discusses the three-tier client/server
architecture. Chapter 26 introduces several enhanced database models for advanced
applications. These include active databases and triggers, as well as temporal, spatial, multimedia, and deductive databases. Chapter 27 is a new chapter on information retrieval techniques, and how they are related to database systems and to Web
search methods. Chapter 28 on data mining gives an overview of the process of data
mining and knowledge discovery, discusses algorithms for association rule mining,
classification, and clustering, and briefly covers other approaches and commercial
tools. Chapter 29 introduces data warehousing and OLAP concepts.
Appendix A gives a number of alternative diagrammatic notations for displaying a
conceptual ER or EER schema. These may be substituted for the notation we use, if
the instructor prefers. Appendix B gives some important physical parameters of
disks. Appendix C gives an overview of the QBE graphical query language. Appendixes D and E (available on the book’s Companion Website located at
http://www.aw.com/elmasri) cover legacy database systems, based on the hierarchical and network database models. They have been used for more than thirty
years as a basis for many commercial database applications and transaction-processing systems. We consider it important to expose database management students to these legacy approaches so they can gain a better insight of how database
technology has progressed.
Guidelines for Using This Book
There are many different ways to teach a database course. The chapters in Parts 1
through 7 can be used in an introductory course on database systems in the order
that they are given or in the preferred order of individual instructors. Selected chapters and sections may be left out, and the instructor can add other chapters from the
rest of the book, depending on the emphasis of the course. At the end of the opening section of many of the book’s chapters, we list sections that are candidates for
being left out whenever a less-detailed discussion of the topic is desired. We suggest
covering up to Chapter 15 in an introductory database course and including
selected parts of other chapters, depending on the background of the students and
the desired coverage. For an emphasis on system implementation techniques, chapters from Parts 7, 8, and 9 should replace some of the earlier chapters.
Chapters 7 and 8, which cover conceptual modeling using the ER and EER models,
are important for a good conceptual understanding of databases. However, they
may be partially covered, covered later in a course, or even left out if the emphasis is
on DBMS implementation. Chapters 17 and 18 on file organizations and indexing
may also be covered early, later, or even left out if the emphasis is on database models and languages. For students who have completed a course on file organization,
parts of these chapters can be assigned as reading material or some exercises can be
assigned as a review for these concepts.
If the emphasis of a course is on database design, then the instructor should cover
Chapters 7 and 8 early on, followed by the presentation of relational databases. A
total life-cycle database design and implementation project would cover conceptual
design (Chapters 7 and 8), relational databases (Chapters 3, 4, and 5), data model
mapping (Chapter 9), normalization (Chapter 15), and application programs
implementation with SQL (Chapter 13). Chapter 14 also should be covered if the
emphasis is on Web database programming and applications. Additional documentation on the specific programming languages and RDBMS used would be required.
The book is written so that it is possible to cover topics in various sequences. The
chapter dependency chart below shows the major dependencies among chapters. As
the diagram illustrates, it is possible to start with several different topics following
the first two introductory chapters. Although the chart may seem complex, it is
important to note that if the chapters are covered in order, the dependencies are not
lost. The chart can be consulted by instructors wishing to use an alternative order of
presentation.
For a one-semester course based on this book, selected chapters can be assigned as
reading material. The book also can be used for a two-semester course sequence.
The first course, Introduction to Database Design and Database Systems, at the sophomore, junior, or senior level, can cover most of Chapters 1 through 15. The second
course, Database Models and Implementation Techniques, at the senior or first-year
graduate level, can cover most of Chapters 16 through 29. The two-semester
sequence can also be designed in various other ways, depending on the preferences of the instructors.
[Figure: chapter dependency chart. Boxes: 1, 2 Introductory; 3 Relational Model; 7, 8 ER, EER Models; 6 Relational Algebra; 9 ER-, EER-to-Relational; 11, 12 ODB, ORDB, XML; 4, 5 SQL; 13, 14 DB, Web Programming; 21, 22, 23 Transactions, CC, Recovery; 26, 27 Advanced Models, IR; 10 DB Design, UML; 24, 25 Security, DDB; 28, 29 Data Mining, Warehousing; 15, 16 FD, MVD, Normalization; 19, 20 Query Processing, Optimization, DB Tuning; 17, 18 File Organization, Indexing.]
Supplemental Materials
Support material is available to all users of this book and additional material is
available to qualified instructors.
■ PowerPoint lecture notes and figures are available at the Computer Science
support Website at http://www.aw.com/cssupport.
■ A lab manual for the sixth edition is available through the Companion Website (http://www.aw.com/elmasri). The lab manual contains coverage of
popular data modeling tools, a relational algebra and calculus interpreter,
and examples from the book implemented using two widely available database management systems. Select end-of-chapter laboratory problems in the
book are correlated to the lab manual.
■ A solutions manual is available to qualified instructors. Visit Addison-Wesley’s instructor resource center (http://www.aw.com/irc), contact your
local Addison-Wesley sales representative, or e-mail computing@aw.com for
information about how to access the solutions.
Additional Support Material
Gradiance, an online homework and tutorial system that provides additional practice and tests comprehension of important concepts, is available to U.S. adopters of
this book. For more information, please e-mail computing@aw.com or contact your
local Pearson representative.
Acknowledgments
It is a great pleasure to acknowledge the assistance and contributions of many individuals to this effort. First, we would like to thank our editor, Matt Goldstein, for his
guidance, encouragement, and support. We would like to acknowledge the excellent
work of Gillian Hall for production management and Rebecca Greenberg for a
thorough copy editing of the book. We thank the following persons from Pearson
who have contributed to the sixth edition: Jeff Holcomb, Marilyn Lloyd, Margaret
Waples, and Chelsea Bell.
Sham Navathe would like to acknowledge the significant contribution of Saurav
Sahay to Chapter 27. Several current and former students also contributed to various chapters in this edition: Rafi Ahmed, Liora Sahar, Fariborz Farahmand, Nalini
Polavarapu, and Wanxia Xie (former students); and Bharath Rengarajan, Narsi
Srinivasan, Parimala R. Pranesh, Neha Deodhar, Balaji Palanisamy and Hariprasad
Kumar (current students). Discussions with his colleagues Ed Omiecinski and Leo
Mark at Georgia Tech and Venu Dasigi at SPSU, Atlanta have also contributed to the
revision of the material.
We would like to repeat our thanks to those who have reviewed and contributed to
previous editions of Fundamentals of Database Systems.
■ First edition. Alan Apt (editor), Don Batory, Scott Downing, Dennis
Heimbinger, Julia Hodges, Yannis Ioannidis, Jim Larson, Per-Ake Larson,
Dennis McLeod, Rahul Patel, Nicholas Roussopoulos, David Stemple,
Michael Stonebraker, Frank Tompa, and Kyu-Young Whang.
■ Second edition. Dan Joraanstad (editor), Rafi Ahmed, Antonio Albano,
David Beech, Jose Blakeley, Panos Chrysanthis, Suzanne Dietrich, Vic Ghorpadey, Goetz Graefe, Eric Hanson, Junguk L. Kim, Roger King, Vram
Kouramajian, Vijay Kumar, John Lowther, Sanjay Manchanda, Toshimi
Minoura, Inderpal Mumick, Ed Omiecinski, Girish Pathak, Raghu Ramakrishnan, Ed Robertson, Eugene Sheng, David Stotts, Marianne Winslett, and
Stan Zdonick.
■ Third edition. Maite Suarez-Rivas and Katherine Harutunian (editors);
Suzanne Dietrich, Ed Omiecinski, Rafi Ahmed, Francois Bancilhon, Jose
Blakeley, Rick Cattell, Ann Chervenak, David W. Embley, Henry A. Etlinger,
Leonidas Fegaras, Dan Forsyth, Farshad Fotouhi, Michael Franklin, Sreejith
Gopinath, Goetz Graefe, Richard Hull, Sushil Jajodia, Ramesh K. Karne,
Harish Kotbagi, Vijay Kumar, Tarcisio Lima, Ramon A. Mata-Toledo, Jack
McCaw, Dennis McLeod, Rokia Missaoui, Magdi Morsi, M. Narayanaswamy,
Carlos Ordonez, Joan Peckham, Betty Salzberg, Ming-Chien Shan, Junping
Sun, Rajshekhar Sunderraman, Aravindan Veerasamy, and Emilia E.
Villareal.
■ Fourth edition. Maite Suarez-Rivas, Katherine Harutunian, Daniel Rausch,
and Juliet Silveri (editors); Phil Bernhard, Zhengxin Chen, Jan Chomicki,
Hakan Ferhatosmanoglu, Len Fisk, William Hankley, Ali R. Hurson, Vijay
Kumar, Peretz Shoval, Jason T. L. Wang (reviewers); Ed Omiecinski (who
contributed to Chapter 27). Contributors from the University of Texas at
Arlington are Jack Fu, Hyoil Han, Babak Hojabri, Charley Li, Ande Swathi,
and Steven Wu; Contributors from Georgia Tech are Weimin Feng, Dan
Forsythe, Angshuman Guin, Abrar Ul-Haque, Bin Liu, Ying Liu, Wanxia Xie,
and Waigen Yee.
■ Fifth edition. Matt Goldstein and Katherine Harutunian (editors); Michelle
Brown, Gillian Hall, Patty Mahtani, Maite Suarez-Rivas, Bethany Tidd, and
Joyce Cosentino Wells (from Addison-Wesley); Hani Abu-Salem, Jamal R.
Alsabbagh, Ramzi Bualuan, Soon Chung, Sumali Conlon, Hasan Davulcu,
James Geller, Le Gruenwald, Latifur Khan, Herman Lam, Byung S. Lee,
Donald Sanderson, Jamil Saquer, Costas Tsatsoulis, and Jack C. Wileden
(reviewers); Raj Sunderraman (who contributed the laboratory projects);
Salman Azar (who contributed some new exercises); Gaurav Bhatia,
Fariborz Farahmand, Ying Liu, Ed Omiecinski, Nalini Polavarapu, Liora
Sahar, Saurav Sahay, and Wanxia Xie (from Georgia Tech).
Last, but not least, we gratefully acknowledge the support, encouragement, and
patience of our families.
R. E.
S.B.N.
Contents
part 1 Introduction to Databases
chapter 1 Databases and Database Users
1.1 Introduction
1.2 An Example
1.3 Characteristics of the Database Approach
1.4 Actors on the Scene
1.5 Workers behind the Scene
1.6 Advantages of Using the DBMS Approach
1.7 A Brief History of Database Applications
1.8 When Not to Use a DBMS
1.9 Summary
Review Questions
Exercises
Selected Bibliography
chapter 2 Database System Concepts and Architecture
2.1 Data Models, Schemas, and Instances
2.2 Three-Schema Architecture and Data Independence
2.3 Database Languages and Interfaces
2.4 The Database System Environment
2.5 Centralized and Client/Server Architectures for DBMSs
2.6 Classification of Database Management Systems
2.7 Summary
Review Questions
Exercises
Selected Bibliography
part 2 The Relational Data Model and SQL
chapter 3 The Relational Data Model and Relational Database Constraints
3.1 Relational Model Concepts
3.2 Relational Model Constraints and Relational Database Schemas
3.3 Update Operations, Transactions, and Dealing with Constraint Violations
3.4 Summary
Review Questions
Exercises
Selected Bibliography
chapter 4 Basic SQL
4.1 SQL Data Definition and Data Types
4.2 Specifying Constraints in SQL
4.3 Basic Retrieval Queries in SQL
4.4 INSERT, DELETE, and UPDATE Statements in SQL
4.5 Additional Features of SQL
4.6 Summary
Review Questions
Exercises
Selected Bibliography
chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification
5.1 More Complex SQL Retrieval Queries
5.2 Specifying Constraints as Assertions and Actions as Triggers
5.3 Views (Virtual Tables) in SQL
5.4 Schema Change Statements in SQL
5.5 Summary
Review Questions
Exercises
Selected Bibliography
chapter 6 The Relational Algebra and Relational Calculus
6.1 Unary Relational Operations: SELECT and PROJECT
6.2 Relational Algebra Operations from Set Theory
6.3 Binary Relational Operations: JOIN and DIVISION
6.4 Additional Relational Operations
6.5 Examples of Queries in Relational Algebra
6.6 The Tuple Relational Calculus
6.7 The Domain Relational Calculus
6.8 Summary
Review Questions
Exercises
Laboratory Exercises
Selected Bibliography
part 3 Conceptual Modeling and Database Design
chapter 7 Data Modeling Using the Entity-Relationship (ER) Model
7.1 Using High-Level Conceptual Data Models for Database Design
7.2 A Sample Database Application
7.3 Entity Types, Entity Sets, Attributes, and Keys
7.4 Relationship Types, Relationship Sets, Roles, and Structural Constraints
7.5 Weak Entity Types
7.6 Refining the ER Design for the COMPANY Database
7.7 ER Diagrams, Naming Conventions, and Design Issues
7.8 Example of Other Notation: UML Class Diagrams
7.9 Relationship Types of Degree Higher than Two
7.10 Summary
Review Questions
Exercises
Laboratory Exercises
Selected Bibliography
chapter 8 The Enhanced Entity-Relationship (EER) Model
8.1 Subclasses, Superclasses, and Inheritance
8.2 Specialization and Generalization
8.3 Constraints and Characteristics of Specialization and Generalization Hierarchies
8.4 Modeling of UNION Types Using Categories
8.5 A Sample UNIVERSITY EER Schema, Design Choices, and Formal Definitions
8.6 Example of Other Notation: Representing Specialization and Generalization in UML Class Diagrams
8.7 Data Abstraction, Knowledge Representation, and Ontology Concepts
8.8 Summary
Review Questions
Exercises
Laboratory Exercises
Selected Bibliography
chapter 9 Relational Database Design by ER- and EER-to-Relational Mapping
9.1 Relational Database Design Using ER-to-Relational Mapping
9.2 Mapping EER Model Constructs to Relations
9.3 Summary
Review Questions
Exercises
Laboratory Exercises
Selected Bibliography
chapter 10 Practical Database Design Methodology and Use of UML Diagrams
10.1 The Role of Information Systems in Organizations
10.2 The Database Design and Implementation Process
10.3 Use of UML Diagrams as an Aid to Database Design Specification
10.4 Rational Rose: A UML-Based Design Tool
10.5 Automated Database Design Tools
10.6 Summary
Review Questions
Selected Bibliography
part 4 Object, Object-Relational, and XML: Concepts, Models, Languages, and Standards
chapter 11 Object and Object-Relational Databases
11.1 Overview of Object Database Concepts
11.2 Object-Relational Features: Object Database Extensions to SQL
11.3 The ODMG Object Model and the Object Definition Language ODL
11.4 Object Database Conceptual Design
11.5 The Object Query Language OQL
11.6 Overview of the C++ Language Binding in the ODMG Standard
11.7 Summary
Review Questions
Exercises
Selected Bibliography
chapter 12 XML: Extensible Markup Language
12.1 Structured, Semistructured, and Unstructured Data
12.2 XML Hierarchical (Tree) Data Model
12.3 XML Documents, DTD, and XML Schema
12.4 Storing and Extracting XML Documents from Databases
12.5 XML Languages
12.6 Extracting XML Documents from Relational Databases
12.7 Summary
Review Questions
Exercises
Selected Bibliography
part 5 Database Programming Techniques
chapter 13 Introduction to SQL Programming Techniques
13.1 Database Programming: Techniques and Issues
13.2 Embedded SQL, Dynamic SQL, and SQLJ
13.3 Database Programming with Function Calls: SQL/CLI and JDBC
13.4 Database Stored Procedures and SQL/PSM
13.5 Comparing the Three Approaches
13.6 Summary
Review Questions
Exercises
Selected Bibliography
chapter 14 Web Database Programming Using PHP
14.1 A Simple PHP Example
14.2 Overview of Basic Features of PHP
14.3 Overview of PHP Database Programming
14.4 Summary
Review Questions
Exercises
Selected Bibliography
part 6 Database Design Theory and Normalization
chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases
15.1 Informal Design Guidelines for Relation Schemas
15.2 Functional Dependencies
15.3 Normal Forms Based on Primary Keys
15.4 General Definitions of Second and Third Normal Forms
15.5 Boyce-Codd Normal Form
15.6 Multivalued Dependency and Fourth Normal Form
15.7 Join Dependencies and Fifth Normal Form
15.8 Summary
Review Questions
Exercises
Laboratory Exercises
Selected Bibliography
chapter 16 Relational Database Design Algorithms and Further Dependencies
16.1 Further Topics in Functional Dependencies: Inference Rules, Equivalence, and Minimal Cover
16.2 Properties of Relational Decompositions
16.3 Algorithms for Relational Database Schema Design
16.4 About Nulls, Dangling Tuples, and Alternative Relational Designs
16.5 Further Discussion of Multivalued Dependencies and 4NF
16.6 Other Dependencies and Normal Forms
16.7 Summary
Review Questions
Exercises
Laboratory Exercises
Selected Bibliography
part 7 File Structures, Indexing, and Hashing
chapter 17 Disk Storage, Basic File Structures, and Hashing
17.1 Introduction
17.2 Secondary Storage Devices
17.3 Buffering of Blocks
17.4 Placing File Records on Disk
17.5 Operations on Files
17.6 Files of Unordered Records (Heap Files)
17.7 Files of Ordered Records (Sorted Files)
17.8 Hashing Techniques
17.9 Other Primary File Organizations
17.10 Parallelizing Disk Access Using RAID Technology
17.11 New Storage Systems
17.12 Summary
Review Questions
Exercises
Selected Bibliography
chapter 18 Indexing Structures for Files
18.1 Types of Single-Level Ordered Indexes
18.2 Multilevel Indexes
18.3 Dynamic Multilevel Indexes Using B-Trees and B+-Trees
18.4 Indexes on Multiple Keys
18.5 Other Types of Indexes
18.6 Some General Issues Concerning Indexing
18.7 Summary
Review Questions
Exercises
Selected Bibliography
part 8 Query Processing and Optimization, and Database Tuning
chapter 19 Algorithms for Query Processing and Optimization
19.1 Translating SQL Queries into Relational Algebra
19.2 Algorithms for External Sorting
19.3 Algorithms for SELECT and JOIN Operations
19.4 Algorithms for PROJECT and Set Operations
19.5 Implementing Aggregate Operations and OUTER JOINs
19.6 Combining Operations Using Pipelining
19.7 Using Heuristics in Query Optimization
19.8 Using Selectivity and Cost Estimates in Query Optimization
19.9 Overview of Query Optimization in Oracle
19.10 Semantic Query Optimization
19.11 Summary
Review Questions
Exercises
Selected Bibliography
chapter 20 Physical Database Design and Tuning
20.1 Physical Database Design in Relational Databases
20.2 An Overview of Database Tuning in Relational Systems
20.3 Summary
Review Questions
Selected Bibliography
part 9 Transaction Processing, Concurrency Control, and Recovery
chapter 21 Introduction to Transaction Processing Concepts and Theory
21.1 Introduction to Transaction Processing
21.2 Transaction and System Concepts
21.3 Desirable Properties of Transactions
21.4 Characterizing Schedules Based on Recoverability
21.5 Characterizing Schedules Based on Serializability
21.6 Transaction Support in SQL
21.7 Summary
Review Questions
Exercises
Selected Bibliography
chapter 22 Concurrency Control Techniques
22.1 Two-Phase Locking Techniques for Concurrency Control
22.2 Concurrency Control Based on Timestamp Ordering
22.3 Multiversion Concurrency Control Techniques
22.4 Validation (Optimistic) Concurrency Control Techniques
22.5 Granularity of Data Items and Multiple Granularity Locking
22.6 Using Locks for Concurrency Control in Indexes
22.7 Other Concurrency Control Issues
22.8 Summary
Review Questions
Exercises
Selected Bibliography
chapter 23 Database Recovery Techniques
23.1 Recovery Concepts
23.2 NO-UNDO/REDO Recovery Based on Deferred Update
23.3 Recovery Techniques Based on Immediate Update
23.4 Shadow Paging
23.5 The ARIES Recovery Algorithm
23.6 Recovery in Multidatabase Systems
23.7 Database Backup and Recovery from Catastrophic Failures
23.8 Summary
Review Questions
Exercises
Selected Bibliography
part 10 Additional Database Topics: Security and Distribution
chapter 24 Database Security
24.1 Introduction to Database Security Issues
24.2 Discretionary Access Control Based on Granting and Revoking Privileges
24.3 Mandatory Access Control and Role-Based Access Control for Multilevel Security
24.4 SQL Injection
24.5 Introduction to Statistical Database Security
24.6 Introduction to Flow Control
24.7 Encryption and Public Key Infrastructures
24.8 Privacy Issues and Preservation
24.9 Challenges of Database Security
24.10 Oracle Label-Based Security
24.11 Summary
Review Questions
Exercises
Selected Bibliography
chapter 25 Distributed Databases
25.1 Distributed Database Concepts
25.2 Types of Distributed Database Systems
25.3 Distributed Database Architectures
25.4 Data Fragmentation, Replication, and Allocation Techniques for Distributed Database Design
25.5 Query Processing and Optimization in Distributed Databases
25.6 Overview of Transaction Management in Distributed Databases
25.7 Overview of Concurrency Control and Recovery in Distributed Databases
25.8 Distributed Catalog Management
25.9 Current Trends in Distributed Databases
25.10 Distributed Databases in Oracle
25.11 Summary
Review Questions
Exercises
Selected Bibliography
part 11 Advanced Database Models, Systems, and Applications
chapter 26 Enhanced Data Models for Advanced Applications
26.1 Active Database Concepts and Triggers
26.2 Temporal Database Concepts
26.3 Spatial Database Concepts
26.4 Multimedia Database Concepts
26.5 Introduction to Deductive Databases
26.6 Summary
Review Questions
Exercises
Selected Bibliography
chapter 27 Introduction to Information Retrieval and Web Search
27.1 Information Retrieval (IR) Concepts
27.2 Retrieval Models
27.3 Types of Queries in IR Systems
27.4 Text Preprocessing
27.5 Inverted Indexing
27.6 Evaluation Measures of Search Relevance
27.7 Web Search and Analysis
27.8 Trends in Information Retrieval
27.9 Summary
Review Questions
Selected Bibliography
chapter 28 Data Mining Concepts
28.1 Overview of Data Mining Technology
28.2 Association Rules
28.3 Classification
28.4 Clustering
28.5 Approaches to Other Data Mining Problems
28.6 Applications of Data Mining
28.7 Commercial Data Mining Tools
28.8 Summary
Review Questions
Exercises
Selected Bibliography
chapter 29 Overview of Data Warehousing and OLAP
29.1 Introduction, Definitions, and Terminology
29.2 Characteristics of Data Warehouses
29.3 Data Modeling for Data Warehouses
29.4 Building a Data Warehouse
29.5 Typical Functionality of a Data Warehouse
29.6 Data Warehouse versus Views
29.7 Difficulties of Implementing Data Warehouses
29.8 Summary
Review Questions
Selected Bibliography
appendix A Alternative Diagrammatic Notations for ER Models
appendix B Parameters of Disks
appendix C Overview of the QBE Language
C.1 Basic Retrievals in QBE
C.2 Grouping, Aggregation, and Database Modification in QBE
appendix D Overview of the Hierarchical Data Model (located on the Companion Website at http://www.aw.com/elmasri)
appendix E Overview of the Network Data Model (located on the Companion Website at http://www.aw.com/elmasri)
Selected Bibliography
Index
part 1 Introduction to Databases
chapter 1 Databases and Database Users
Databases and database systems are an essential component of life in modern society: most of us
encounter several activities every day that involve some interaction with a database.
For example, if we go to the bank to deposit or withdraw funds, if we make a hotel
or airline reservation, if we access a computerized library catalog to search for a bibliographic item, or if we purchase something online—such as a book, toy, or computer—chances are that our activities will involve someone or some computer
program accessing a database. Even purchasing items at a supermarket often automatically updates the database that holds the inventory of grocery items.
These interactions are examples of what we may call traditional database applications, in which most of the information that is stored and accessed is either textual
or numeric. In the past few years, advances in technology have led to exciting new
applications of database systems. New media technology has made it possible to
store images, audio clips, and video streams digitally. These types of files are becoming an important component of multimedia databases. Geographic information
systems (GIS) can store and analyze maps, weather data, and satellite images. Data
warehouses and online analytical processing (OLAP) systems are used in many
companies to extract and analyze useful business information from very large databases to support decision making. Real-time and active database technology is
used to control industrial and manufacturing processes. And database search techniques are being applied to the World Wide Web to improve the search for information that is needed by users browsing the Internet.
To understand the fundamentals of database technology, however, we must start
from the basics of traditional database applications. In Section 1.1 we start by defining a database, and then we explain other basic terms. In Section 1.2, we provide a
simple UNIVERSITY database example to illustrate our discussion. Section 1.3
describes some of the main characteristics of database systems, and Sections 1.4 and
1.5 categorize the types of personnel whose jobs involve using and interacting with
database systems. Sections 1.6, 1.7, and 1.8 offer a more thorough discussion of the
various capabilities provided by database systems and discuss some typical database
applications. Section 1.9 summarizes the chapter.
The reader who desires a quick introduction to database systems can study Sections
1.1 through 1.5, then skip or browse through Sections 1.6 through 1.8 and go on to
Chapter 2.
1.1 Introduction
Databases and database technology have a major impact on the growing use of
computers. It is fair to say that databases play a critical role in almost all areas where
computers are used, including business, electronic commerce, engineering, medicine, genetics, law, education, and library science. The word database is so commonly used that we must begin by defining what a database is. Our initial definition
is quite general.
A database is a collection of related data.1 By data, we mean known facts that can be
recorded and that have implicit meaning. For example, consider the names, telephone numbers, and addresses of the people you know. You may have recorded this
data in an indexed address book or you may have stored it on a hard drive, using a
personal computer and software such as Microsoft Access or Excel. This collection
of related data with an implicit meaning is a database.
The preceding definition of database is quite general; for example, we may consider
the collection of words that make up this page of text to be related data and hence to
constitute a database. However, the common use of the term database is usually
more restricted. A database has the following implicit properties:
■ A database represents some aspect of the real world, sometimes called the
miniworld or the universe of discourse (UoD). Changes to the miniworld
are reflected in the database.
■ A database is a logically coherent collection of data with some inherent
meaning. A random assortment of data cannot correctly be referred to as a
database.
■ A database is designed, built, and populated with data for a specific purpose.
It has an intended group of users and some preconceived applications in
which these users are interested.
In other words, a database has some source from which data is derived, some degree
of interaction with events in the real world, and an audience that is actively interested in its contents. The end users of a database may perform business transactions
(for example, a customer buys a camera) or events may happen (for example, an
employee has a baby) that cause the information in the database to change. In order
for a database to be accurate and reliable at all times, it must be a true reflection of
the miniworld that it represents; therefore, changes must be reflected in the database
as soon as possible.
1 We will use the word data as both singular and plural, as is common in database literature; the context
will determine whether it is singular or plural. In standard English, data is used for plural and datum for
singular.
A database can be of any size and complexity. For example, the list of names and
addresses referred to earlier may consist of only a few hundred records, each with a
simple structure. On the other hand, the computerized catalog of a large library
may contain half a million entries organized under different categories—by primary author’s last name, by subject, by book title—with each category organized
alphabetically. A database of even greater size and complexity is maintained by the
Internal Revenue Service (IRS) to monitor tax forms filed by U.S. taxpayers. If we
assume that there are 100 million taxpayers and each taxpayer files an average of five
forms with approximately 400 characters of information per form, we would have a
database of $100 \times 10^6 \times 400 \times 5$ characters (bytes) of information, that is, $2 \times 10^{11}$ bytes. If the IRS keeps
the past three returns of each taxpayer in addition to the current return, we would
have a database of $8 \times 10^{11}$ bytes (800 gigabytes). This huge amount of information
must be organized and managed so that users can search for, retrieve, and update
the data as needed.
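The size estimate above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the names are ours, and it assumes, as the text does, that one character occupies one byte and that a gigabyte is $10^9$ bytes.

```python
# Back-of-the-envelope estimate of the IRS database size described above.
# Assumptions (taken from the text): one character occupies one byte, and
# a gigabyte is counted as 10**9 bytes.

TAXPAYERS = 100 * 10**6   # 100 million taxpayers
FORMS_PER_RETURN = 5      # average number of forms filed per taxpayer
CHARS_PER_FORM = 400      # approximate characters of information per form
RETURNS_KEPT = 4          # the current return plus the past three


def database_size_bytes(returns_kept: int) -> int:
    """Total bytes stored when `returns_kept` returns are kept per taxpayer."""
    return TAXPAYERS * FORMS_PER_RETURN * CHARS_PER_FORM * returns_kept


one_return = database_size_bytes(1)              # 2 * 10**11 bytes
all_returns = database_size_bytes(RETURNS_KEPT)  # 8 * 10**11 bytes

print(f"current returns only:    {one_return:.1e} bytes")
print(f"with past three returns: {all_returns:.1e} bytes "
      f"({all_returns // 10**9} gigabytes)")
```

Changing any one of the four constants shows how quickly the estimate scales, which is the point of the example in the text.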
An example of a large commercial database is Amazon.com. It contains data for
over 20 million books, CDs, videos, DVDs, games, electronics, apparel, and other
items. The database occupies over 2 terabytes (a terabyte is $10^{12}$ bytes worth of storage) and is stored on 200 different computers (called servers). About 15 million visitors access Amazon.com each day and use the database to make purchases. The
database is continually updated as new books and other items are added to the
inventory and stock quantities are updated as purchases are transacted. About 100
people are responsible for keeping the Amazon database up-to-date.
A database may be generated and maintained manually or it may be computerized.
For example, a library card catalog is a database that may be created and maintained
manually. A computerized database may be created and maintained either by a
group of application programs written specifically for that task or by a database
management system. We are only concerned with computerized databases in this
book.
A database management system (DBMS) is a collection of programs that enables
users to create and maintain a database. The DBMS is a general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing
databases among various users and applications. Defining a database involves specifying the data types, structures, and constraints of the data to be stored in the database. The database definition or descriptive information is also stored by the DBMS
in the form of a database catalog or dictionary; it is called meta-data. Constructing
the database is the process of storing the data on some storage medium that is controlled by the DBMS. Manipulating a database includes functions such as querying
the database to retrieve specific data, updating the database to reflect changes in the
miniworld, and generating reports from the data. Sharing a database allows multiple users and programs to access the database simultaneously.
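To make defining, constructing, and manipulating concrete, the following minimal sketch uses SQLite through Python's standard sqlite3 module. SQLite is our choice for illustration and is not discussed in the text; the contact table and its rows are hypothetical and only echo the address-book example given earlier. In SQLite the catalog is exposed as the sqlite_master table, which is where the stored definition (meta-data) can be inspected. Sharing among concurrent users is not shown.

```python
import sqlite3

# An in-memory SQLite database stands in for a general-purpose DBMS here.
conn = sqlite3.connect(":memory:")

# Defining: specify the data types, structures, and constraints.
conn.execute(
    """
    CREATE TABLE contact (
        name    TEXT NOT NULL,
        phone   TEXT,
        address TEXT
    )
    """
)

# Constructing: store the data on a medium controlled by the DBMS.
conn.executemany(
    "INSERT INTO contact (name, phone, address) VALUES (?, ?, ?)",
    [
        ("Ada", "555-0100", "12 Elm Street"),
        ("Grace", "555-0101", "34 Oak Avenue"),
    ],
)

# Manipulating: update the database to reflect a change in the miniworld
# (Ada gets a new phone number), then query it.
conn.execute("UPDATE contact SET phone = ? WHERE name = ?", ("555-0199", "Ada"))
phones = dict(conn.execute("SELECT name, phone FROM contact ORDER BY name"))

# The definition itself is kept in the catalog as meta-data.
catalog = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(phones)
print(catalog[0][0])
```

The application never touches the stored file format directly; every step goes through a request to the DBMS.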
An application program accesses the database by sending queries or requests for
data to the DBMS. A query2 typically causes some data to be retrieved; a
transaction may cause some data to be read and some data to be written into the
database.
Other important functions provided by the DBMS include protecting the database
and maintaining it over a long period of time. Protection includes system protection
against hardware or software malfunction (or crashes) and security protection
against unauthorized or malicious access. A typical large database may have a life
cycle of many years, so the DBMS must be able to maintain the database system by
allowing the system to evolve as requirements change over time.
It is not absolutely necessary to use general-purpose DBMS software to implement
a computerized database. We could write our own set of programs to create and
maintain the database, in effect creating our own special-purpose DBMS software. In
either case—whether we use a general-purpose DBMS or not—we usually have to
deploy a considerable amount of complex software. In fact, most DBMSs are very
complex software systems.
To complete our initial definitions, we will call the database and DBMS software
together a database system. Figure 1.1 illustrates some of the concepts we have discussed so far.
1.2 An Example
Let us consider a simple example that most readers may be familiar with: a
UNIVERSITY database for maintaining information concerning students, courses,
and grades in a university environment. Figure 1.2 shows the database structure and
a few sample data for such a database. The database is organized as five files, each of
which stores data records of the same type.3 The STUDENT file stores data on each
student, the COURSE file stores data on each course, the SECTION file stores data
on each section of a course, the GRADE_REPORT file stores the grades that students
receive in the various sections they have completed, and the PREREQUISITE file
stores the prerequisites of each course.
2 The term query, originally meaning a question or an inquiry, is loosely used for all types of interactions
with databases, including modifying the data.
3 We use the term file informally here. At a conceptual level, a file is a collection of records that may or
may not be ordered.
Figure 1.1
A simplified database system environment. [Diagram: Users/Programmers submit Application Programs/Queries to the Database System. Inside it, the DBMS Software (Software to Process Queries/Programs, then Software to Access Stored Data) reads the Stored Database Definition (Meta-Data) and the Stored Database.]
To define this database, we must specify the structure of the records of each file by
specifying the different types of data elements to be stored in each record. In Figure
1.2, each STUDENT record includes data to represent the student’s Name,
Student_number, Class (such as freshman or ‘1’, sophomore or ‘2’, and so forth), and
Major (such as mathematics or ‘MATH’ and computer science or ‘CS’); each
COURSE record includes data to represent the Course_name, Course_number,
Credit_hours, and Department (the department that offers the course); and so on. We
must also specify a data type for each data element within a record. For example, we
can specify that Name of STUDENT is a string of alphabetic characters,
Student_number of STUDENT is an integer, and Grade of GRADE_REPORT is a single
character from the set {‘A’, ‘B’, ‘C’, ‘D’, ‘F’, ‘I’}. We may also use a coding scheme to represent the values of a data item. For example, in Figure 1.2 we represent the Class of
a STUDENT as 1 for freshman, 2 for sophomore, 3 for junior, 4 for senior, and 5 for
graduate student.
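The record structures, data types, and coding scheme just described can be sketched in ordinary program code. The Python below is only an illustration under our own assumptions: the class and field names are hypothetical renderings of the STUDENT and GRADE_REPORT records, and a real DBMS would hold these definitions in its catalog instead of in application code.

```python
from dataclasses import dataclass

# Coding scheme from the text: Class is stored as a small integer.
CLASS_CODES = {1: "freshman", 2: "sophomore", 3: "junior", 4: "senior", 5: "graduate"}

# Grade is a single character drawn from this set.
VALID_GRADES = {"A", "B", "C", "D", "F", "I"}


@dataclass(frozen=True)
class Student:
    name: str            # string of alphabetic characters
    student_number: int  # integer
    class_code: int      # a key of CLASS_CODES ("Class" in Figure 1.2)
    major: str           # for example 'CS' or 'MATH'

    def __post_init__(self) -> None:
        if not self.name.isalpha():
            raise ValueError(f"Name must be alphabetic: {self.name!r}")
        if self.class_code not in CLASS_CODES:
            raise ValueError(f"Unknown Class code: {self.class_code}")


@dataclass(frozen=True)
class GradeReport:
    student_number: int
    section_identifier: int
    grade: str

    def __post_init__(self) -> None:
        if self.grade not in VALID_GRADES:
            raise ValueError(f"Invalid Grade: {self.grade!r}")


smith = Student("Smith", 17, 1, "CS")
print(smith.name, "is a", CLASS_CODES[smith.class_code])
```

Rejecting a value such as a grade of 'Z' at construction time is a small-scale version of the constraints that a database definition imposes on stored data.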
To construct the UNIVERSITY database, we store data to represent each student,
course, section, grade report, and prerequisite as a record in the appropriate file.
Notice that records in the various files may be related. For example, the record for
Smith in the STUDENT file is related to two records in the GRADE_REPORT file that
specify Smith’s grades in two sections. Similarly, each record in the PREREQUISITE
file relates two course records: one representing the course and the other representing the prerequisite. Most medium-size and large databases include many types of
records and have many relationships among the records.
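The relationships among records can be traced by matching the values they share. The sketch below holds a few records from Figure 1.2 as plain Python tuples; the function names are ours, and the linear scans are chosen for clarity, not as a description of how a DBMS locates related records.

```python
# Sample records copied from Figure 1.2, held as plain Python tuples.
students = [("Smith", 17, 1, "CS"), ("Brown", 8, 2, "CS")]
grade_reports = [
    (17, 112, "B"), (17, 119, "C"),
    (8, 85, "A"), (8, 92, "A"), (8, 102, "B"), (8, 135, "A"),
]
courses = {
    "CS1310": "Intro to Computer Science",
    "CS3320": "Data Structures",
    "MATH2410": "Discrete Mathematics",
    "CS3380": "Database",
}
prerequisites = [("CS3380", "CS3320"), ("CS3380", "MATH2410"), ("CS3320", "CS1310")]


def grades_of(name):
    """Return the GRADE_REPORT records related to the named student."""
    numbers = {number for (n, number, _class, _major) in students if n == name}
    return [report for report in grade_reports if report[0] in numbers]


def prerequisites_of(course_name):
    """Return the names of the courses that are prerequisites of a course."""
    return [
        courses[prereq]
        for (course_number, prereq) in prerequisites
        if courses[course_number] == course_name
    ]


print(grades_of("Smith"))            # the two records related to Smith
print(prerequisites_of("Database"))
```

The STUDENT and GRADE_REPORT records are tied together only by the shared Student_number value, and each PREREQUISITE record ties together two COURSE records by their course numbers.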
Figure 1.2
A database that stores student and course information.

STUDENT
Name   Student_number  Class  Major
Smith  17              1      CS
Brown  8               2      CS

COURSE
Course_name                Course_number  Credit_hours  Department
Intro to Computer Science  CS1310         4             CS
Data Structures            CS3320         4             CS
Discrete Mathematics       MATH2410       3             MATH
Database                   CS3380         3             CS

SECTION
Section_identifier  Course_number  Semester  Year  Instructor
85                  MATH2410       Fall      07    King
92                  CS1310         Fall      07    Anderson
102                 CS3320         Spring    08    Knuth
112                 MATH2410       Fall      08    Chang
119                 CS1310         Fall      08    Anderson
135                 CS3380         Fall      08    Stone

GRADE_REPORT
Student_number  Section_identifier  Grade
17              112                 B
17              119                 C
8               85                  A
8               92                  A
8               102                 B
8               135                 A

PREREQUISITE
Course_number  Prerequisite_number
CS3380         CS3320
CS3380         MATH2410
CS3320         CS1310
Database manipulation involves querying and updating. Examples of queries are as
follows:
■ Retrieve the transcript (a list of all courses and grades) of ‘Smith’
■ List the names of students who took the section of the ‘Database’ course
offered in fall 2008 and their grades in that section
■ List the prerequisites of the ‘Database’ course
Examples of updates include the following:
■ Change the class of ‘Smith’ to sophomore
■ Create a new section for the ‘Database’ course for this semester
■ Enter a grade of ‘A’ for ‘Smith’ in the ‘Database’ section of last semester
These informal queries and updates must be specified precisely in the query language of the DBMS before they can be processed.
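As an illustration of what "specified precisely" can look like, the sketch below states the three queries and the first update in SQL (the language covered from Chapter 4 onward) and runs them against the sample data of Figure 1.2. The use of SQLite through Python's sqlite3 module, the column types, and the variable names are our assumptions for this sketch and not part of the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE STUDENT (Name TEXT, Student_number INTEGER, Class INTEGER, Major TEXT);
    CREATE TABLE COURSE (Course_name TEXT, Course_number TEXT,
                         Credit_hours INTEGER, Department TEXT);
    CREATE TABLE SECTION (Section_identifier INTEGER, Course_number TEXT,
                          Semester TEXT, Year TEXT, Instructor TEXT);
    CREATE TABLE GRADE_REPORT (Student_number INTEGER, Section_identifier INTEGER,
                               Grade TEXT);
    CREATE TABLE PREREQUISITE (Course_number TEXT, Prerequisite_number TEXT);

    INSERT INTO STUDENT VALUES ('Smith', 17, 1, 'CS'), ('Brown', 8, 2, 'CS');
    INSERT INTO COURSE VALUES
        ('Intro to Computer Science', 'CS1310', 4, 'CS'),
        ('Data Structures', 'CS3320', 4, 'CS'),
        ('Discrete Mathematics', 'MATH2410', 3, 'MATH'),
        ('Database', 'CS3380', 3, 'CS');
    INSERT INTO SECTION VALUES
        (85, 'MATH2410', 'Fall', '07', 'King'),
        (92, 'CS1310', 'Fall', '07', 'Anderson'),
        (102, 'CS3320', 'Spring', '08', 'Knuth'),
        (112, 'MATH2410', 'Fall', '08', 'Chang'),
        (119, 'CS1310', 'Fall', '08', 'Anderson'),
        (135, 'CS3380', 'Fall', '08', 'Stone');
    INSERT INTO GRADE_REPORT VALUES
        (17, 112, 'B'), (17, 119, 'C'),
        (8, 85, 'A'), (8, 92, 'A'), (8, 102, 'B'), (8, 135, 'A');
    INSERT INTO PREREQUISITE VALUES
        ('CS3380', 'CS3320'), ('CS3380', 'MATH2410'), ('CS3320', 'CS1310');
    """
)

# Query 1: the transcript of 'Smith' (all courses and grades).
transcript = conn.execute(
    """
    SELECT C.Course_name, G.Grade
    FROM STUDENT AS S
    JOIN GRADE_REPORT AS G ON G.Student_number = S.Student_number
    JOIN SECTION AS SE ON SE.Section_identifier = G.Section_identifier
    JOIN COURSE AS C ON C.Course_number = SE.Course_number
    WHERE S.Name = 'Smith'
    ORDER BY SE.Section_identifier
    """
).fetchall()

# Query 2: students in the fall 2008 section of 'Database', with grades.
database_fall_2008 = conn.execute(
    """
    SELECT S.Name, G.Grade
    FROM STUDENT AS S
    JOIN GRADE_REPORT AS G ON G.Student_number = S.Student_number
    JOIN SECTION AS SE ON SE.Section_identifier = G.Section_identifier
    JOIN COURSE AS C ON C.Course_number = SE.Course_number
    WHERE C.Course_name = 'Database' AND SE.Semester = 'Fall' AND SE.Year = '08'
    ORDER BY S.Name
    """
).fetchall()

# Query 3: the prerequisites of the 'Database' course.
prereqs = [
    row[0]
    for row in conn.execute(
        """
        SELECT P.Prerequisite_number
        FROM PREREQUISITE AS P
        JOIN COURSE AS C ON C.Course_number = P.Course_number
        WHERE C.Course_name = 'Database'
        ORDER BY P.Prerequisite_number
        """
    )
]

# Update: change the class of 'Smith' to sophomore (coded as 2).
conn.execute("UPDATE STUDENT SET Class = 2 WHERE Name = 'Smith'")
smith_class = conn.execute(
    "SELECT Class FROM STUDENT WHERE Name = 'Smith'"
).fetchone()[0]

print(transcript)
print(database_fall_2008)
print(prereqs)
print(smith_class)
```

Each informal request becomes a statement that names exactly which files are involved and which values must match, which is what allows the DBMS to process it.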
At this stage, it is useful to describe the database as a part of a larger undertaking
known as an information system within any organization. The Information
Technology (IT) department within a company designs and maintains an information system consisting of various computers, storage systems, application software,
and databases. Design of a new application for an existing database or design of a
brand new database starts off with a phase called requirements specification and
analysis. These requirements are documented in detail and transformed into a
conceptual design that can be represented and manipulated using some computerized tools so that it can be easily maintained, modified, and transformed into a database implementation. (We will introduce a model called the Entity-Relationship
model in Chapter 7 that is used for this purpose.) The design is then translated to a
logical design that can be expressed in a data model implemented in a commercial
DBMS. (In this book we will emphasize a data model known as the Relational Data
Model from Chapter 3 onward. This is currently the most popular approach for
designing and implementing databases using relational DBMSs.) The final stage is
physical design, during which further specifications are provided for storing and
accessing the database. The database design is implemented, populated with actual
data, and continuously maintained to reflect the state of the miniworld.
1.3 Characteristics of the Database Approach
A number of characteristics distinguish the database approach from the much older
approach of programming with files. In traditional file processing, each user
defines and implements the files needed for a specific software application as part of
programming the application. For example, one user, the grade reporting office, may
keep files on students and their grades. Programs to print a student’s transcript and
to enter new grades are implemented as part of the application. A second user, the
accounting office, may keep track of students’ fees and their payments. Although
both users are interested in data about students, each user maintains separate files
(and programs to manipulate these files) because each requires some data not available
from the other user’s files. This redundancy in defining and storing data results
in wasted storage space and in redundant efforts to maintain common up-to-date
data.
In the database approach, a single repository maintains data that is defined once
and then accessed by various users. In file systems, each application is free to name
data elements independently. In contrast, in a database, the names or labels of data
are defined once, and used repeatedly by queries, transactions, and applications.
The main characteristics of the database approach versus the file-processing
approach are the following:
■ Self-describing nature of a database system
■ Insulation between programs and data, and data abstraction
■ Support of multiple views of the data
■ Sharing of data and multiuser transaction processing
We describe each of these characteristics in a separate section. We will discuss additional characteristics of database systems in Sections 1.6 through 1.8.
1.3.1 Self-Describing Nature of a Database System
A fundamental characteristic of the database approach is that the database system
contains not only the database itself but also a complete definition or description of
the database structure and constraints. This definition is stored in the DBMS catalog, which contains information such as the structure of each file, the type and storage format of each data item, and various constraints on the data. The information
stored in the catalog is called meta-data, and it describes the structure of the primary database (Figure 1.1).
The catalog is used by the DBMS software and also by database users who need
information about the database structure. A general-purpose DBMS software package is not written for a specific database application. Therefore, it must refer to the
catalog to know the structure of the files in a specific database, such as the type and
format of data it will access. The DBMS software must work equally well with any
number of database applications—for example, a university database, a banking
database, or a company database—as long as the database definition is stored in the
catalog.
In traditional file processing, data definition is typically part of the application programs themselves. Hence, these programs are constrained to work with only one
specific database, whose structure is declared in the application programs. For
example, an application program written in C++ may have struct or class declarations, and a COBOL program has data division statements to define its files.
Whereas file-processing software can access only specific databases, DBMS software
can access diverse databases by extracting the database definitions from the catalog
and using these definitions.
For the example shown in Figure 1.2, the DBMS catalog will store the definitions of
all the files shown. Figure 1.3 shows some sample entries in a database catalog.
These definitions are specified by the database designer prior to creating the actual
database and are stored in the catalog. Whenever a request is made to access, say, the
Name of a STUDENT record, the DBMS software refers to the catalog to determine
the structure of the STUDENT file and the position and size of the Name data item
within a STUDENT record. By contrast, in a typical file-processing application, the
file structure and, in the extreme case, the exact location of Name within a STUDENT
record are already coded within each program that accesses this data item.
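The idea that general-purpose software consults the catalog can be sketched with SQLite, which keeps its own meta-data in a table named sqlite_master and exposes column definitions through PRAGMA table_info. The helper name describe is hypothetical, and the CHAR(4) type chosen for Major is an assumption standing in for the enumerated Major_type of Figure 1.3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENT (
        Name           CHAR(30),
        Student_number CHAR(4),
        Class          INTEGER,
        Major          CHAR(4)   -- assumed stand-in for Major_type
    )
""")

def describe(conn, relation):
    """Return (column name, declared type) pairs read from the catalog.

    Nothing about STUDENT is coded here; the same function works for any
    relation whose definition is stored in the catalog.
    """
    known = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if relation not in known:
        raise ValueError(f"no relation named {relation!r} in the catalog")
    # PRAGMA arguments cannot be bound as parameters, so the name is only
    # interpolated after it has been checked against the catalog itself.
    return [(row[1], row[2])
            for row in conn.execute(f"PRAGMA table_info({relation})")]

student_columns = describe(conn, "STUDENT")
print(student_columns)
```

A file-processing program would instead carry these four definitions in its own source code, which is why it could work with only one file structure.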
1.3.2 Insulation between Programs and Data, and Data Abstraction
In traditional file processing, the structure of data files is embedded in the application programs, so any changes to the structure of a file may require changing all programs that access that file. By contrast, DBMS access programs do not require such
changes in most cases. The structure of data files is stored in the DBMS catalog separately from the access programs. We call this property program-data independence.
RELATIONS
Relation_name   No_of_columns
STUDENT         4
COURSE          4
SECTION         5
GRADE_REPORT    3
PREREQUISITE    2
COLUMNS
Column_name           Data_type        Belongs_to_relation
Name                  Character (30)   STUDENT
Student_number        Character (4)    STUDENT
Class                 Integer (1)      STUDENT
Major                 Major_type       STUDENT
Course_name           Character (10)   COURSE
Course_number         XXXXNNNN         COURSE
….                    ….               …..
….                    ….               …..
….                    ….               …..
Prerequisite_number   XXXXNNNN         PREREQUISITE
Note: Major_type is defined as an enumerated type with all known majors.
XXXXNNNN is used to define a type with four alpha characters followed by four digits.
Figure 1.3
An example of a database catalog for the database in Figure 1.2.
For example, a file access program may be written in such a way that it can access
only STUDENT records of the structure shown in Figure 1.4. If we want to add
another piece of data to each STUDENT record, say the Birth_date, such a program
will no longer work and must be changed. By contrast, in a DBMS environment, we
only need to change the description of STUDENT records in the catalog (Figure 1.3)
to reflect the inclusion of the new data item Birth_date; no programs are changed.
The next time a DBMS program refers to the catalog, the new structure of STUDENT
records will be accessed and used.
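A small sketch of this scenario, again using SQLite through Python's sqlite3 module as a stand-in DBMS: the function student_names plays the role of an existing program, and it keeps working after Birth_date is added to the catalog description of STUDENT. The Class and Major values are assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT "
             "(Name TEXT, Student_number INTEGER, Class INTEGER, Major TEXT)")
# Class and Major values are assumed for illustration.
conn.execute("INSERT INTO STUDENT VALUES ('Smith', 17, 1, 'CS')")

def student_names(conn):
    """An existing 'program': it names only the data item it needs."""
    return [row[0] for row in
            conn.execute("SELECT Name FROM STUDENT ORDER BY Name")]

before = student_names(conn)

# Change the description of STUDENT records; the program is untouched.
conn.execute("ALTER TABLE STUDENT ADD COLUMN Birth_date TEXT")

after = student_names(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(STUDENT)")]
print(before, after, columns)
```

The program refers to Name by its label rather than by its position in the record, so the extra data item does not disturb it.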
In some types of database systems, such as object-oriented and object-relational
systems (see Chapter 11), users can define operations on data as part of the database
definitions. An operation (also called a function or method) is specified in two parts.
The interface (or signature) of an operation includes the operation name and the
data types of its arguments (or parameters). The implementation (or method) of the
operation is specified separately and can be changed without affecting the interface.
User application programs can operate on the data by invoking these operations
through their names and arguments, regardless of how the operations are implemented. This may be termed program-operation independence.
The characteristic that allows program-data independence and program-operation
independence is called data abstraction. A DBMS provides users with a conceptual
representation of data that does not include many of the details of how the data is
stored or how the operations are implemented. Informally, a data model is a type of
data abstraction that is used to provide this conceptual representation. The data
model uses logical concepts, such as objects, their properties, and their interrelationships, that may be easier for most users to understand than computer storage
concepts. Hence, the data model hides storage and implementation details that are
not of interest to most database users.
For example, reconsider Figures 1.2 and 1.3. The internal implementation of a file
may be defined by its record length—the number of characters (bytes) in each
record—and each data item may be specified by its starting byte within a record and
its length in bytes. The STUDENT record would thus be represented as shown in
Figure 1.4. But a typical database user is not concerned with the location of each
data item within a record or its length; rather, the user is concerned that when a reference is made to Name of STUDENT, the correct value is returned. A conceptual representation of the STUDENT records is shown in Figure 1.2. Many other details of file
storage organization—such as the access paths specified on a file—can be hidden
from database users by the DBMS; we discuss storage details in Chapters 17 and 18.
Data Item Name   Starting Position in Record   Length in Characters (bytes)
Name             1                             30
Student_number   31                            4
Class            35                            1
Major            36                            4
Figure 1.4
Internal storage format for a STUDENT record, based on the database catalog in
Figure 1.3.
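To make the byte layout of Figure 1.4 concrete, the sketch below builds one 39-byte STUDENT record and extracts data items from it by starting position and length. The names LAYOUT and get_item are hypothetical, the padding with spaces is an assumption about the file format, and the Class and Major values are assumed for illustration.

```python
# Layout from Figure 1.4: item -> (starting position, length in bytes).
# Starting positions are 1-based, as in the figure.
LAYOUT = {
    "Name": (1, 30),
    "Student_number": (31, 4),
    "Class": (35, 1),
    "Major": (36, 4),
}

def get_item(record: bytes, item: str) -> str:
    """Return one data item, located via the layout rather than hard-coded offsets."""
    start, length = LAYOUT[item]
    raw = record[start - 1 : start - 1 + length]
    return raw.decode("ascii").rstrip()  # drop the assumed space padding

# One fixed-length record; fields are padded with spaces (an assumption).
record = b"Smith".ljust(30) + b"17".ljust(4) + b"1" + b"CS".ljust(4)

print(len(record), get_item(record, "Name"), get_item(record, "Major"))
```

Because get_item consults LAYOUT instead of embedding the numbers 1, 30, 31, and so on in each caller, the lookup table here plays the part that the catalog plays in a DBMS.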
In the database approach, the detailed structure and organization of each file are
stored in the catalog. Database users and application programs refer to the conceptual representation of the files, and the DBMS extracts the details of file storage
from the catalog when these are needed by the DBMS file access modules. Many
data models can be used to provide this data abstraction to database users. A major
part of this book is devoted to presenting various data models and the concepts they
use to abstract the representation of data.
In object-oriented and object-relational databases, the abstraction process includes
not only the data structure but also the operations on the data. These operations
provide an abstraction of miniworld activities commonly understood by the users.
For example, an operation CALCULATE_GPA can be applied to a STUDENT object to
calculate the grade point average. Such operations can be invoked by the user
queries or application programs without having to know the details of how the
operations are implemented. In that sense, an abstraction of the miniworld activity
is made available to the user as an abstract operation.
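The separation between the interface of an operation and its implementation can be sketched in Python. The class Student, the method name calculate_gpa, and the grade-point scale are all assumptions made for illustration; in particular, the average below is unweighted and ignores credit hours.

```python
from dataclasses import dataclass, field

# Assumed grade-point scale; the text does not define one.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

@dataclass
class Student:
    name: str
    grades: list = field(default_factory=list)

    def calculate_gpa(self) -> float:
        """Interface: no arguments, returns a float.

        The body is the implementation and could be replaced (for example,
        by a credit-hour-weighted average) without changing any caller.
        """
        if not self.grades:
            return 0.0
        return sum(GRADE_POINTS[g] for g in self.grades) / len(self.grades)

# Grades recorded for 'Brown' in the GRADE_REPORT file of Figure 1.2.
brown = Student("Brown", ["A", "A", "B", "A"])
gpa = brown.calculate_gpa()
print(gpa)
```

A caller only writes brown.calculate_gpa(); how the average is computed stays hidden behind the operation name, which is the sense of program-operation independence.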
1.3.3 Support of Multiple Views of the Data
A database typically has many users, each of whom may require a different perspective or view of the database. A view may be a subset of the database or it may contain virtual data that is derived from the database files but is not explicitly stored.
Some users may not need to be aware of whether the data they refer to is stored or
derived. A multiuser DBMS whose users have a variety of distinct applications must
provide facilities for defining multiple views. For example, one user of the database
of Figure 1.2 may be interested only in accessing and printing the transcript of each
student; the view for this user is shown in Figure 1.5(a). A second user, who is interested only in checking that students have taken all the prerequisites of each course
for which they register, may require the view shown in Figure 1.5(b).
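A view such as TRANSCRIPT can be sketched in SQL with CREATE VIEW, here run through Python's sqlite3 module. The base tables are a simplified subset of Figure 1.2, and the column names of the view follow Figure 1.5(a); the view itself stores no rows and is derived from the base files each time it is queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (Name TEXT, Student_number INTEGER);
CREATE TABLE SECTION (Section_identifier INTEGER, Course_number TEXT,
                      Semester TEXT, Year TEXT);
CREATE TABLE GRADE_REPORT (Student_number INTEGER,
                           Section_identifier INTEGER, Grade TEXT);

INSERT INTO STUDENT VALUES ('Smith', 17), ('Brown', 8);
INSERT INTO SECTION VALUES
    (85, 'MATH2410', 'Fall', '07'), (92, 'CS1310', 'Fall', '07'),
    (102, 'CS3320', 'Spring', '08'), (112, 'MATH2410', 'Fall', '08'),
    (119, 'CS1310', 'Fall', '08'), (135, 'CS3380', 'Fall', '08');
INSERT INTO GRADE_REPORT VALUES
    (17, 112, 'B'), (17, 119, 'C'), (8, 85, 'A'),
    (8, 92, 'A'), (8, 102, 'B'), (8, 135, 'A');

-- Virtual data: derived from three files, not stored separately.
CREATE VIEW TRANSCRIPT AS
    SELECT st.Name AS Student_name, s.Course_number, g.Grade,
           s.Semester, s.Year, s.Section_identifier AS Section_id
    FROM STUDENT st
    JOIN GRADE_REPORT g ON g.Student_number = st.Student_number
    JOIN SECTION s ON s.Section_identifier = g.Section_identifier;
""")

# The transcript user queries the view as if it were a stored file.
smith = conn.execute("""
    SELECT Course_number, Grade, Semester, Year, Section_id
    FROM TRANSCRIPT
    WHERE Student_name = 'Smith'
    ORDER BY Section_id
""").fetchall()

row_count = conn.execute("SELECT COUNT(*) FROM TRANSCRIPT").fetchone()[0]
kind = conn.execute(
    "SELECT type FROM sqlite_master WHERE name = 'TRANSCRIPT'").fetchone()[0]
print(smith, row_count, kind)
```

The user of TRANSCRIPT does not need to know that the rows come from a join of STUDENT, GRADE_REPORT, and SECTION.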
1.3.4 Sharing of Data and Multiuser Transaction Processing
A multiuser DBMS, as its name implies, must allow multiple users to access the database at the same time. This is essential if data for multiple applications is to be integrated and maintained in a single database. The DBMS must include concurrency
control software to ensure that several users trying to update the same data do so in
a controlled manner so that the result of the updates is correct. For example, when
several reservation agents try to assign a seat on an airline flight, the DBMS should
ensure that each seat can be accessed by only one agent at a time for assignment to a
passenger. These types of applications are generally called online transaction processing (OLTP) applications. A fundamental role of multiuser DBMS software is to
ensure that concurrent transactions operate correctly and efficiently.
TRANSCRIPT
(a)
Student_name   Student_transcript
               Course_number   Grade   Semester   Year   Section_id
Smith          CS1310          C       Fall       08     119
               MATH2410        B       Fall       08     112
Brown          MATH2410        A       Fall       07     85
               CS1310          A       Fall       07     92
               CS3320          B       Spring     08     102
               CS3380          A       Fall       08     135
COURSE_PREREQUISITES
(b)
Course_name       Course_number   Prerequisites
Database          CS3380          CS3320
                                  MATH2410
Data Structures   CS3320          CS1310
Figure 1.5
Two views derived from the database in Figure 1.2. (a) The TRANSCRIPT view.
(b) The COURSE_PREREQUISITES view.
The concept of a transaction has become central to many database applications. A
transaction is an executing program or process that includes one or more database
accesses, such as reading or updating of database records. Each transaction is supposed to execute a logically correct database access if executed in its entirety without
interference from other transactions. The DBMS must enforce several transaction
properties. The isolation property ensures that each transaction appears to execute
in isolation from other transactions, even though hundreds of transactions may be
executing concurrently. The atomicity property ensures that either all the database
operations in a transaction are executed or none are. We discuss transactions in
detail in Part 9.
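The atomicity property can be illustrated with a small seat-assignment sketch using Python's sqlite3 module. The SEAT table, the seat numbers, the passenger names, and the function assign_seats are all hypothetical. The sketch uses a single connection, so it shows only atomicity; it does not exercise isolation between concurrent transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SEAT (Seat_no TEXT PRIMARY KEY, Passenger TEXT)")
conn.execute("INSERT INTO SEAT VALUES ('12A', NULL), ('12B', NULL)")
conn.commit()

def assign_seats(conn, passenger_by_seat):
    """Assign several seats as one transaction: either all succeed or none do."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for seat, passenger in passenger_by_seat.items():
                cur = conn.execute(
                    "UPDATE SEAT SET Passenger = ? "
                    "WHERE Seat_no = ? AND Passenger IS NULL",
                    (passenger, seat))
                if cur.rowcount != 1:  # seat missing or already taken
                    raise ValueError(f"seat {seat} is unavailable")
        return True
    except ValueError:
        return False

first = assign_seats(conn, {"12A": "Lee"})
# 12B is free, but 12A is taken, so the whole second transaction is undone.
second = assign_seats(conn, {"12B": "Kim", "12A": "Park"})

seats = dict(conn.execute("SELECT Seat_no, Passenger FROM SEAT"))
print(first, second, seats)
```

After the failed second call, seat 12B is still unassigned even though its update had already been executed inside the transaction.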
The preceding characteristics are important in distinguishing a DBMS from traditional file-processing software. In Section 1.6 we discuss additional features that
characterize a DBMS. First, however, we categorize the different types of people who
work in a database system environment.
1.4 Actors on the Scene
For a small personal database, such as the list of addresses discussed in Section 1.1,
one person typically defines, constructs, and manipulates the database, and there is
no sharing. However, in large organizations, many people are involved in the design,
use, and maintenance of a large database with hundreds of users. In this section we
identify the people whose jobs involve the day-to-day use of a large database; we call
them the actors on the scene. In Section 1.5 we consider people who may be called
workers behind the scene—those who work to maintain the database system environment but who are not actively interested in the database contents as part of their
daily job.
1.4.1 Database Administrators
In any organization where many people use the same resources, there is a need for a
chief administrator to oversee and manage these resources. In a database environment, the primary resource is the database itself, and the secondary resource is the
DBMS and related software. Administering these resources is the responsibility of
the database administrator (DBA). The DBA is responsible for authorizing access
to the database, coordinating and monitoring its use, and acquiring software and
hardware resources as needed. The DBA is accountable for problems such as security breaches and poor system response time. In large organizations, the DBA is
assisted by a staff that carries out these functions.
1.4.2 Database Designers
Database designers are responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data. These
tasks are mostly undertaken before the database is actually implemented and populated with data. It is the responsibility of database designers to communicate with
all prospective database users in order to understand their requirements and to create a design that meets these requirements. In many cases, the designers are on the
staff of the DBA and may be assigned other staff responsibilities after the database
design is completed. Database designers typically interact with each potential group
of users and develop views of the database that meet the data and processing
requirements of these groups. Each view is then analyzed and integrated with the
views of other user groups. The final database design must be capable of supporting
the requirements of all user groups.
1.4.3 End Users
End users are the people whose jobs require access to the database for querying,
updating, and generating reports; the database primarily exists for their use. There
are several categories of end users:
■ Casual end users occasionally access the database, but they may need different information each time. They use a sophisticated database query language
to specify their requests and are typically middle- or high-level managers or
other occasional browsers.
■ Naive or parametric end users make up a sizable portion of database end
users. Their main job function revolves around constantly querying and
updating the database, using standard types of queries and updates—called
canned transactions—that have been carefully programmed and tested. The
tasks that such users perform are varied:
    Bank tellers check account balances and post withdrawals and deposits.
    Reservation agents for airlines, hotels, and car rental companies check availability for a given request and make reservations.
    Employees at receiving stations for shipping companies enter package identifications via bar codes and descriptive information through buttons to update a central database of received and in-transit packages.
■ Sophisticated end users include engineers, scientists, business analysts, and
others who thoroughly familiarize themselves with the facilities of the
DBMS in order to implement their own applications to meet their complex
requirements.
■ Standalone users maintain personal databases by using ready-made program packages that provide easy-to-use menu-based or graphics-based
interfaces. An example is the user of a tax package that stores a variety of personal financial data for tax purposes.
A typical DBMS provides multiple facilities to access a database. Naive end users
need to learn very little about the facilities provided by the DBMS; they simply have
to understand the user interfaces of the standard transactions designed and implemented for their use. Casual users learn only a few facilities that they may use
repeatedly. Sophisticated users try to learn most of the DBMS facilities in order to
achieve their complex requirements. Standalone users typically become very proficient in using a specific software package.
1.4.4 System Analysts and Application Programmers (Software Engineers)
System analysts determine the requirements of end users, especially naive and
parametric end users, and develop specifications for standard canned transactions
that meet these requirements. Application programmers implement these specifications as programs; then they test, debug, document, and maintain these canned
transactions. Such analysts and programmers—commonly referred to as software
developers or software engineers—should be familiar with the full range of
capabilities provided by the DBMS to accomplish their tasks.
1.5 Workers behind the Scene
In addition to those who design, use, and administer a database, others are associated with the design, development, and operation of the DBMS software and system
environment. These persons are typically not interested in the database content
itself. We call them the workers behind the scene, and they include the following categories:
■ DBMS system designers and implementers design and implement the
DBMS modules and interfaces as a software package. A DBMS is a very complex software system that consists of many components, or modules, including modules for implementing the catalog, query language processing,
interface processing, accessing and buffering data, controlling concurrency,
and handling data recovery and security. The DBMS must interface with
other system software such as the operating system and compilers for various programming languages.
■ Tool developers design and implement tools: the software packages that
facilitate database modeling and design, database system design, and
improved performance. Tools are optional packages that are often purchased
separately. They include packages for database design, performance monitoring, natural language or graphical interfaces, prototyping, simulation,
and test data generation. In many cases, independent software vendors
develop and market these tools.
■ Operators and maintenance personnel (system administration personnel)
are responsible for the actual running and maintenance of the hardware and
software environment for the database system.
Although these categories of workers behind the scene are instrumental in making
the database system available to end users, they typically do not use the database
contents for their own purposes.
1.6 Advantages of Using the DBMS Approach
In this section we discuss some of the advantages of using a DBMS and the capabilities that a good DBMS should possess. These capabilities are in addition to the four
main characteristics discussed in Section 1.3. The DBA must utilize these capabilities to accomplish a variety of objectives related to the design, administration, and
use of a large multiuser database.
1.6.1 Controlling Redundancy
In traditional software development utilizing file processing, every user group
maintains its own files for handling its data-processing applications. For example,
consider the UNIVERSITY database example of Section 1.2; here, two groups of users
might be the course registration personnel and the accounting office. In the traditional approach, each group independently keeps files on students. The accounting
office keeps data on registration and related billing information, whereas the registration office keeps track of student courses and grades. Other groups may further
duplicate some or all of the same data in their own files.
This redundancy in storing the same data multiple times leads to several problems.
First, there is the need to perform a single logical update—such as entering data on
a new student—multiple times: once for each file where student data is recorded.
This leads to duplication of effort. Second, storage space is wasted when the same data
is stored repeatedly, and this problem may be serious for large databases. Third, files
that represent the same data may become inconsistent. This may happen because an
update is applied to some of the files but not to others. Even if an update—such as
adding a new student—is applied to all the appropriate files, the data concerning
the student may still be inconsistent because the updates are applied independently
by each user group. For example, one user group may enter a student’s birth date
erroneously as ‘JAN-19-1988’, whereas the other user groups may enter the correct
value of ‘JAN-29-1988’.
In the database approach, the views of different user groups are integrated during
database design. Ideally, we should have a database design that stores each logical
data item—such as a student’s name or birth date—in only one place in the database.
This is known as data normalization, and it ensures consistency and saves storage
space (data normalization is described in Part 6 of the book). However, in practice, it
is sometimes necessary to use controlled redundancy to improve the performance
of queries. For example, we may store Student_name and Course_number redundantly
in a GRADE_REPORT file (Figure 1.6(a)) because whenever we retrieve a
GRADE_REPORT record, we want to retrieve the student name and course number
along with the grade, student number, and section identifier. By placing all the data
together, we do not have to search multiple files to collect this data. This is known as
denormalization. In such cases, the DBMS should have the capability to control this
redundancy in order to prohibit inconsistencies among the files. This may be done by
automatically checking that the Student_name–Student_number values in any
GRADE_REPORT record in Figure 1.6(a) match one of the Name–Student_number values of a STUDENT record (Figure 1.2). Similarly, the Section_identifier–Course_number
values in GRADE_REPORT can be checked against SECTION records. Such checks can
be specified to the DBMS during database design and automatically enforced by the
DBMS whenever the GRADE_REPORT file is updated. Figure 1.6(b) shows a
GRADE_REPORT record that is inconsistent with the STUDENT file in Figure 1.2; this
kind of error may be entered if the redundancy is not controlled. Can you tell which
part is inconsistent?
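One way such a check might be specified is sketched below with SQLite through Python's sqlite3 module: a composite foreign key ties each Student_number and Student_name pair in GRADE_REPORT to a STUDENT record, so the DBMS rejects a mismatched pair on insertion. The helper try_insert is hypothetical, and using a foreign key for this purpose is one possible technique rather than the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE STUDENT (
    Name           TEXT,
    Student_number INTEGER PRIMARY KEY,
    UNIQUE (Student_number, Name)
);
CREATE TABLE GRADE_REPORT (
    Student_number     INTEGER,
    Student_name       TEXT,     -- stored redundantly
    Section_identifier INTEGER,
    Course_number      TEXT,     -- stored redundantly
    Grade              TEXT,
    FOREIGN KEY (Student_number, Student_name)
        REFERENCES STUDENT (Student_number, Name)
);
INSERT INTO STUDENT VALUES ('Smith', 17), ('Brown', 8);
INSERT INTO GRADE_REPORT VALUES (17, 'Smith', 112, 'MATH2410', 'B');
""")

def try_insert(row):
    """Attempt to add one GRADE_REPORT record; report whether the DBMS allowed it."""
    try:
        with conn:
            conn.execute("INSERT INTO GRADE_REPORT VALUES (?, ?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

consistent = try_insert((8, "Brown", 85, "MATH2410", "A"))
# Number 17 belongs to 'Smith', so this number-name pair matches no STUDENT record.
mismatched = try_insert((17, "Brown", 112, "MATH2410", "B"))
print(consistent, mismatched)
```

A similar constraint against SECTION could control the redundant Course_number value.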
GRADE_REPORT
(a)
Student_number   Student_name   Section_identifier   Course_number   Grade
17               Smith          112                  MATH2410        B
17               Smith          119                  CS1310          C
8                Brown          85                   MATH2410        A
8                Brown          92                   CS1310          A
8                Brown          102                  CS3320          B
8                Brown          135                  CS3380          A
GRADE_REPORT
(b)
Student_number   Student_name   Section_identifier   Course_number   Grade
17               Brown          112                  MATH2410        B
Figure 1.6
Redundant storage of Student_name and Course_number in GRADE_REPORT.
(a) Consistent data. (b) Inconsistent record.
1.6.2 Restricting Unauthorized Access
When multiple users share a large database, it is likely that most users will not be
authorized to access all information in the database. For example, financial data is
often considered confidential, and only authorized persons are allowed to access
such data. In addition, some users may only be permitted to retrieve data, whereas
others are allowed to retrieve and update. Hence, the type of access operation
(retrieval or update) must also be controlled. Typically, users or user groups are
given account numbers protected by passwords, which they can use to gain access to
the database. A DBMS should provide a security and authorization subsystem,
which the DBA uses to create accounts and to specify account restrictions. Then, the
DBMS should enforce these restrictions automatically. Notice that we can apply
similar controls to the DBMS software. For example, only the DBA’s staff may be
allowed to use certain privileged software, such as the software for creating new
accounts. Similarly, parametric users may be allowed to access the database only
through the predefined canned transactions developed for their use.
1.6.3 Providing Persistent Storage for Program Objects
Databases can be used to provide persistent storage for program objects and data
structures. This is one of the main reasons for object-oriented database systems.
Programming languages typically have complex data structures, such as record
types in Pascal or class definitions in C++ or Java. The values of program variables
or objects are discarded once a program terminates, unless the programmer explicitly stores them in permanent files, which often involves converting these complex
structures into a format suitable for file storage. When the need arises to read
this data once more, the programmer must convert from the file format to the program variable or object structure. Object-oriented database systems are compatible
with programming languages such as C++ and Java, and the DBMS software automatically performs any necessary conversions. Hence, a complex object in C++ can
be stored permanently in an object-oriented DBMS. Such an object is said to be
persistent, since it survives the termination of program execution and can later be
directly retrieved by another C++ program.
The persistent storage of program objects and data structures is an important function of database systems. Traditional database systems often suffered from the so-called impedance mismatch problem, since the data structures provided by the
DBMS were incompatible with the programming language’s data structures.
Object-oriented database systems typically offer data structure compatibility with
one or more object-oriented programming languages.
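The manual conversion that a programmer faces without such support can be sketched in Python. The Student class and the choice of JSON as the file format are assumptions made for illustration; the two marked steps are the conversions that an object-oriented DBMS is described as performing automatically.

```python
import json
import os
import tempfile
from dataclasses import dataclass, asdict

@dataclass
class Student:
    name: str
    student_number: int

original = Student("Smith", 17)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "student.json")

    # Step 1: convert the program object into a format suitable for file storage.
    with open(path, "w") as f:
        json.dump(asdict(original), f)

    # (The program that created the object could terminate here.)

    # Step 2: convert from the file format back to the program's object structure.
    with open(path) as f:
        restored = Student(**json.load(f))

print(restored)
```

Every class that must outlive the program needs its own pair of conversions in this style, which is the burden that persistent objects are meant to remove.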
1.6.4 Providing Storage Structures and Search
Techniques for Efficient Query Processing
Database systems must provide capabilities for efficiently executing queries and
updates. Because the database is typically stored on disk, the DBMS must provide
specialized data structures and search techniques to speed up disk search for the
desired records. Auxiliary files called indexes are used for this purpose. Indexes are
typically based on tree data structures or hash data structures that are suitably modified for disk search. In order to process the database records needed by a particular
query, those records must be copied from disk to main memory. Therefore, the
DBMS often has a buffering or caching module that maintains parts of the database in main memory buffers. In general, the operating system is responsible for
disk-to-memory buffering. However, because data buffering is crucial to the DBMS
performance, most DBMSs do their own data buffering.
The query processing and optimization module of the DBMS is responsible for
choosing an efficient query execution plan for each query based on the existing storage structures. The choice of which indexes to create and maintain is part of physical
database design and tuning, which is one of the responsibilities of the DBA staff. We
discuss the query processing, optimization, and tuning in Part 8 of the book.
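The effect of an index on the chosen execution plan can be observed in SQLite, whose EXPLAIN QUERY PLAN statement reports how a query would be executed. The index name idx_grade_student and the helper plan are hypothetical; the exact wording of the plan text varies between SQLite versions, so the sketch only inspects whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GRADE_REPORT "
             "(Student_number INTEGER, Section_identifier INTEGER, Grade TEXT)")
conn.executemany("INSERT INTO GRADE_REPORT VALUES (?, ?, ?)", [
    (17, 112, "B"), (17, 119, "C"), (8, 85, "A"),
    (8, 92, "A"), (8, 102, "B"), (8, 135, "A"),
])

def plan(sql):
    """Return the optimizer's plan; the last column of each row is its description."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " ".join(row[-1] for row in rows)

query = "SELECT Grade FROM GRADE_REPORT WHERE Student_number = 8"

before = plan(query)   # no index exists yet, so the file must be scanned
conn.execute("CREATE INDEX idx_grade_student ON GRADE_REPORT (Student_number)")
after = plan(query)    # the optimizer can now search through the index

print(before)
print(after)
```

The query text is identical in both cases; only the available storage structures, and therefore the plan, have changed.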
1.6.5 Providing Backup and Recovery
A DBMS must provide facilities for recovering from hardware or software failures.
The backup and recovery subsystem of the DBMS is responsible for recovery. For
example, if the computer system fails in the middle of a complex update transaction, the recovery subsystem is responsible for making sure that the database is
restored to the state it was in before the transaction started executing. Alternatively,
the recovery subsystem could ensure that the transaction is resumed from the point
at which it was interrupted so that its full effect is recorded in the database. Disk
backup is also necessary in case of a catastrophic disk failure. We discuss recovery
and backup in Chapter 23.
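As a minimal sketch of the backup side, Python's sqlite3 module offers a Connection.backup method that copies a whole database to another connection. Both databases are kept in memory here only to keep the example self-contained; in practice the copy would be written to separate storage.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE STUDENT (Name TEXT, Student_number INTEGER)")
source.execute("INSERT INTO STUDENT VALUES ('Smith', 17), ('Brown', 8)")
source.commit()

# Take a full copy of the database.
backup = sqlite3.connect(":memory:")
source.backup(backup)

# Simulate a catastrophic loss of the original data.
source.execute("DELETE FROM STUDENT")
source.commit()

lost = source.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
restored = backup.execute(
    "SELECT Name FROM STUDENT ORDER BY Name").fetchall()
print(lost, restored)
```

The copy reflects the state at the time of the backup, so any updates made after that point would have to be recovered by other means.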
1.6.6 Providing Multiple User Interfaces
Because many types of users with varying levels of technical knowledge use a database, a DBMS should provide a variety of user interfaces. These include query languages for casual users, programming language interfaces for application
programmers, forms and command codes for parametric users, and menu-driven
interfaces and natural language interfaces for standalone users. Both forms-style
interfaces and menu-driven interfaces are commonly known as graphical user
interfaces (GUIs). Many specialized languages and environments exist for specifying GUIs. Capabilities for providing Web GUI interfaces to a database (or Web-enabling a database) are also quite common.
1.6.7 Representing Complex Relationships among Data
A database may include numerous varieties of data that are interrelated in many
ways. Consider the example shown in Figure 1.2. The record for ‘Brown’ in the
STUDENT file is related to four records in the GRADE_REPORT file. Similarly, each
section record is related to one course record and to a number of GRADE_REPORT
records—one for each student who completed that section. A DBMS must have the
capability to represent a variety of complex relationships among the data, to define
new relationships as they arise, and to retrieve and update related data easily and
efficiently.
1.6.8 Enforcing Integrity Constraints
Most database applications have certain integrity constraints that must hold for
the data. A DBMS should provide capabilities for defining and enforcing these constraints.
The simplest type of integrity constraint involves specifying a data type for
each data item. For example, in Figure 1.3, we specified that the value of the Class
data item within each STUDENT record must be a one-digit integer and that the
value of Name must be a string of no more than 30 alphabetic characters. To restrict
the value of Class between 1 and 5 would be an additional constraint that is not
shown in the current catalog. A more complex type of constraint that frequently
occurs involves specifying that a record in one file must be related to records in
other files. For example, in Figure 1.2, we can specify that every section record must
be related to a course record. This is known as a referential integrity constraint.
Another type of constraint specifies uniqueness on data item values, for example that every
course record must have a unique value for Course_number. This is known as a key or
uniqueness constraint. These constraints are derived from the meaning or
semantics of the data and of the miniworld it represents. It is the responsibility of
the database designers to identify integrity constraints during database design.
Some constraints can be specified to the DBMS and automatically enforced. Other
constraints may have to be checked by update programs or at the time of data entry.
For typical large applications, it is customary to call such constraints business rules.
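As a rough sketch of how such constraints might be declared so that the DBMS enforces them automatically, the following uses Python's built-in sqlite3 module. The table layouts, the helper try_insert, and the sample values are assumptions made for illustration. The Class range of 1 to 5 is the additional constraint mentioned above, and the Name check covers only the 30-character length, not the alphabetic restriction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE COURSE (
    Course_number TEXT PRIMARY KEY,          -- key (uniqueness) constraint
    Course_name   TEXT NOT NULL
);
CREATE TABLE STUDENT (
    Student_number INTEGER PRIMARY KEY,
    Name  TEXT    NOT NULL CHECK (length(Name) <= 30),
    Class INTEGER NOT NULL CHECK (Class BETWEEN 1 AND 5)
);
CREATE TABLE SECTION (
    Section_identifier INTEGER PRIMARY KEY,
    -- referential integrity: every section must relate to a course
    Course_number TEXT NOT NULL REFERENCES COURSE (Course_number)
);
""")


def try_insert(sql, params):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "course_ok": try_insert("INSERT INTO COURSE VALUES (?, ?)",
                            ("CS1310", "Intro to Computer Science")),
    "course_duplicate": try_insert("INSERT INTO COURSE VALUES (?, ?)",
                                   ("CS1310", "Another title")),
    "student_ok": try_insert("INSERT INTO STUDENT VALUES (?, ?, ?)",
                             (17, "Smith", 1)),
    "student_bad_class": try_insert("INSERT INTO STUDENT VALUES (?, ?, ?)",
                                    (8, "Brown", 6)),
    "section_ok": try_insert("INSERT INTO SECTION VALUES (?, ?)",
                             (85, "CS1310")),
    "section_no_course": try_insert("INSERT INTO SECTION VALUES (?, ?)",
                                    (92, "MATH9999")),
}
```

Constraints that cannot be expressed declaratively like this would still have to be checked by update programs or at data entry, as noted above.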
A data item may be entered erroneously and still satisfy the specified integrity constraints. For example, if a student receives a grade of ‘A’ but a grade of ‘C’ is entered
in the database, the DBMS cannot discover this error automatically because ‘C’ is a
valid value for the Grade data type. Such data entry errors can only be discovered
manually (when the student receives the grade and complains) and corrected later
by updating the database. However, a grade of ‘Z’ would be rejected automatically
by the DBMS because ‘Z’ is not a valid value for the Grade data type. When we discuss each data model in subsequent chapters, we will introduce rules that pertain to
that model implicitly. For example, in the Entity-Relationship model in Chapter 7, a
relationship must involve at least two entities. Such rules are inherent rules of the
data model and are automatically assumed to guarantee the validity of the model.
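The grade entry example above can be sketched as follows, again with Python's sqlite3 module. The set of valid grades, the column names, and the helpers record_grade and stored_grade are assumptions for illustration only. The point is that the DBMS can reject 'Z' but has no way of knowing that a valid 'C' should have been an 'A'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE GRADE_REPORT (
    Student_number     INTEGER NOT NULL,
    Section_identifier INTEGER NOT NULL,
    -- assumed set of valid grade values
    Grade TEXT NOT NULL CHECK (Grade IN ('A', 'B', 'C', 'D', 'F', 'I'))
)
""")


def record_grade(student, section, grade):
    """Insert a grade; return False if the DBMS rejects it as invalid."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO GRADE_REPORT VALUES (?, ?, ?)",
                         (student, section, grade))
        return True
    except sqlite3.IntegrityError:
        return False


def stored_grade(student, section):
    """Return the grade currently stored for a student in a section."""
    row = conn.execute(
        "SELECT Grade FROM GRADE_REPORT "
        "WHERE Student_number = ? AND Section_identifier = ?",
        (student, section),
    ).fetchone()
    return row[0] if row else None


# The student earned an 'A', but 'C' is typed in: valid value, so accepted.
wrong_but_valid = record_grade(8, 85, "C")
# 'Z' is outside the allowed values, so the DBMS rejects it.
invalid_value = record_grade(8, 92, "Z")

# The erroneous 'C' is found manually and corrected later by an update.
with conn:
    conn.execute(
        "UPDATE GRADE_REPORT SET Grade = 'A' "
        "WHERE Student_number = 8 AND Section_identifier = 85"
    )
corrected = stored_grade(8, 85)
```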
1.6.9 Permitting Inferencing and Actions Using Rules
Some database systems provide capabilities for defining deduction rules for
inferencing new information from the stored database facts. Such systems are called
deductive database systems. For example, there may be complex rules in the miniworld application for determining when a student is on probation. These can be
specified declaratively as rules, which, when compiled and maintained by the DBMS,
can determine all students on probation. In a traditional DBMS, explicit
procedural program code would have to be written to support such applications. But
if the miniworld rules change, it is generally more convenient to change the declared
deduction rules than to recode procedural programs. In today’s relational database
systems, it is possible to associate triggers with tables. A trigger…