SSCI 582 UCLA Programming Database Design Techniques Questions

SSCI 582: Spatial Databases
Reading & Research Assignment #4
READING & RESEARCH ASSIGNMENT #4 – DATABASE DESIGN
TECHNIQUES: NORMALIZATION
Due: Monday, March 6th , 9:00 AM PT
Deliver your documents (described below) into the D2L Section
Graded Numeric Score
Value: 5% of the course grade
Penalty for late delivery: This assignment will be penalized 2 points up to FOUR days late. No points
will be given for submissions more than FOUR days late.
This week, we continue to learn about database design. We will be learning about normalization, a
process intended to help you refine your E-R diagrams so that they do a better job at providing a big
picture of a project’s data requirements and operations. Normalization should be part of any
database design process; it is basically a process for evaluating and correcting table structures in
order to get rid of data redundancies.
Many databases in the real world end up with redundancies which lead to anomalies if improperly
modified over time. In a nutshell: you normalize a database design through an iterative process
where you identify entities, their attributes and relationships, then use the results to identify more
entries and attributes. The iterative process is a series of stages called “normal forms”. Each of the
readings this week discusses normalization using examples.
Objectives of Reading & Research Assignment 4
● Describe the relational database terminology
● Discuss the concepts of normalization applied to databases
● Apply the normalization process to an example geodatabase model
Normalization can appear complicated and unnecessary at first. With its strict rules, intimidating
classification of 1NF, 2NF, and 3NF, and arcane references to “atomicity” and “domains of relations”
it is hard to guess that at its core, normalization is actually about reducing a complex data structure to
its simplest, most stable form.
We normalize – or rearrange – the attributes and tables of a relational database in order to limit
redundancy and decrease size and complexity. Storing the same data multiple times is uneconomic:
since computer storage and processing time are both expensive, it is good practice to ensure that
neither is wasted. The more storage a database requires, the more time is taken to execute query and
update commands. Critically, if data are stored redundantly, human labor must be expended to ensure
that all occurrences are found when edits are needed, consuming even more time and money and
potentially introducing error. By breaking large, complex tables into a number of smaller, leaner ones,
with logical relationships between them, normalization economizes computer resources and prevents
the so-called editing anomalies from occurring.
Normalization also supports flexibility of the database to accommodate future data. Database
structures that are not normalized may need to be radically changed if new or different data types are
added, whereas a normalized database will present fewer hurdles in making new data “fit.” In other
USC Spatial Sciences Institute © 2023
1
SSCI 582: Spatial Databases
Reading & Research Assignment #4
words, if we make our database too “comfortable” at first, avoiding the normalization rules, we may
unknowingly restrict our ability to introduce new data or to construct meaningful queries of it.
As successive normalization constraints are applied to a relational table, the table is said to be in a
particular “normal form.” A higher normal form is dependent upon the previous normal form(s); we
can be assured that the database already adheres to first and second normal form if it is in third normal
form. While databases can theoretically be normalized to fifth normal form (5NF), this level of
rearrangement often causes more problems than benefits; 3NF is usually sufficient in real-world
practice.
Integral to normalization, a unique field, called a primary key (PK), is used to identify each row in each
table and also to interrelate the tables to one another where needed. Each table must have a PK that
identifies one and only one record in that table. Tables may have multiple alternate keys (AKs), but
only ONE primary key.
The following section details the three important forms as a full normalization process from First
Normal Form (1NF) to Second Normal Form (1NF) to Third Normal Form (3NF). You should
complete this week’s readings first then read all of the instructions below before you begin to work
on the normalization exercise for submission. Be sure to check out the references included in the end
of this document if you still have questions about the normalization process after all the readings. I
found them (e.g. Bruce Fulton’s videos) extremely helpful.
THE NORMALIZATION PROCESS: 1NF – 2NF – 3NF
First Normal Form (1NF):
Eliminate repeating groups within each table.
A succession of fields (columns) containing similar data – often these are obvious by having field
names with numeric suffixes – are classified as repeating groups, which should be moved into records
(rows) of a separate table. Repeating groups, by their nature, tend to leave many “empty” cells (columnrow intersections) in the latter fields of a group. In fact, the cells are not actually empty: they store a
NULL value, which still occupies storage space. Figure1 demonstrates an all-too typical situation.
Figure 1. An Example of Repeating Groups in a Relational Table. Note the many
“empty” cells in the latter fields.
Make sure each field (column) has exactly one value in every record (row).
USC Spatial Sciences Institute © 2023
2
SSCI 582: Spatial Databases
Reading & Research Assignment #4
Nor can multiple values be stored in a single cell. In the above example, putting comma-separated lists
into a field called (perhaps) ‘Pros’ rather than ‘Pro1’, ‘Pro2’, …, also violates the intent of 1NF. Figure2
demonstrates another common appearance of lists within cells.
Figure 2. Multiple values in one cell (second row), violating 1NF.
Make sure each record has a unique primary key.
There are no duplicate rows in a normalized table because the PK field contains a unique value for
each record, by definition. Each PK – preferentially an integer and often software-generated –
identifies one and only one record. PKs are also used to establish relationships between tables (see
Entity-Relationship Diagramming). Typical PK field names are ‘ID’, ‘CustomerID’, ‘ProductID’, etc.,
and in ArcGIS uniformly ‘ObjectID’. Textual primary keys can create problems, as we see in the next
example.
Second Normal Form (2NF):
After the database is in first normal form
Make sure any non-key field is dependent on the (entire) primary key.
A translation of this rule would be “Can we figure out any of the values in a row from just part of the
primary key?” If so, the record is not in 2NF – and the proposed PK is suspect as well. Figure 3
demonstrates both problems: there is not a numeric ‘ProductID’, and the ‘ModelFullName’ being
used as a PK instead contains an obvious combination of Manufacturer and Model data. 2NF
concerns can also arise with so-called composite PKs, comprising two or more numeric fields, as we see
in the next example.
Figure 3. Model Full Name is a composite primary key (Wikipedia, 2021).
USC Spatial Sciences Institute © 2023
3
SSCI 582: Spatial Databases
Reading & Research Assignment #4
Third Normal Form (3NF):
After the database is in second normal form
Make sure no non-key field is dependent on any other field, except the primary key .
This rule goes one step further to ask “Can we figure out, or calculate, any of the values in a record
from any of its non-key fields?” An unfortunately common example is shown in Figure 4: carrying a
‘TotalCost’ field in addition to ‘UnitCost’ and ‘Quantity’ fields. Clearly, the total is just the product of
the latter two fields. Thus,’TotalCost’ field is redundant and must be removed entirely to adhere to 3NF.
Note, too, that ‘Description’ is queryable from ‘ItemNo’, which is part of a composite PK, and so
violates 2NF as well.
Figure 4. Both Description and TotalCost are redundant and must be removed for 3NF.
A subtler example of 3NF violation occurs with ZIP codes. If a table has a ‘ZIPCode’ field, is it really
necessary to have a ‘City’ or “’State’ field in the same table? ZIP codes are unique numbers, i.e. are
already good PKs, that identify preferred City and State values, so should be kept in a separate table.
Important People
● Edgar F. Codd – invented the Relational model of database management while working at
IBM and in 1970 detailed the first, second and third normal forms in A Relational Model of
Data for Large Shared Data Banks. He first used “atomicity” to describe the idea of all-ornothing transactions in relational databases, but later backed away from the word, as he and
other leading information scientists asserted it was misleading and vague.
● Raymond F. Boyce – co-developed SQL while at IBM. In 1974 he and Codd continued
Codd’s work on the relational model, developing Boyce-Codd normal form (BCNF or
3.5NF), while “stronger” than 3NF is not always achievable.
● Christopher J. Date – worked with Codd at IBM in the early 1970s, also co-developed SQL.
He has continued to write prolifically about relational database models, publishing several
books and papers.
USC Spatial Sciences Institute © 2023
4
SSCI 582: Spatial Databases
Reading & Research Assignment #4
WHAT TO HAND IN
Deliver this week’s Reading & Research Assignment as per the instructions below into the Week 8
Reading & Research Assignment #4 (as a Word doc) submission links in the Week 8 folder.
Normalization is a bottom-up process of dissecting pre-existing table structures into (many!) simpler
tables. It is easiest to understand normalization from table extensions, i.e. including sample data, as
shown here, together with their intensions.
1. (3 pts) Using MS Excel, create three sets of tables showing the transformations you make of the
non-normalized table below through each of the normalization steps to 1 st, 2nd, and 3rd Normal
Forms.
0NF (“Flatfile”): Everything piled-in together!
LakeID LakeName Basin
Basin_mi2 PortName1
State1
FIPS1 PortName2 State2
FIPS2 PortName3 State3
FIPS3
1
Nipigon
Superior 146000
2
Superior
Superior 146000
Duluth
MN
27
Marquette MI
3
Huron
Huron
92000
St.Ignace
MI
28
Bay City
MI
26
Sarnia
ON
4
Michigan Michigan 89200
Chicago
IL
17
St.Ignace
MI
26
Chicago
WI
55
5
St. Clair
Erie
39640
Detroit
MI
26
6
Eire
Erie
39640
Toledo
OH
39
Cleveland OH
39
Buffalo
NY
36
7
Ontario
Ontario
29360
Toronto
ON
Rochester NY
36
The criteria for each stage of normalization is as follows:
1NF: No repeating groups, viz. (City, State, FIPS) was repeating
2NF: No partial key dependencies, viz. FIPSCode depended only on StateAbrv, part of
key
3NF: No non-key dependencies, viz. Basin area_mi2 depends on Basin, not on lake
● To start the normalization process, copy & paste the non-normalized table into Excel where
it is easy to dissect and move parts around. Follow the 1NF, 2NF, 3NF process to normalize
it.
● In each normalized table during the 1NF-2NF-3NF process, double-underline the numeric
primary keys (PKs) (note: in Word, right-click > Font > change the underline style),
underline the textual alternate keys (AKs) if exists, and italicized foreign keys (FKs) if any.
For this exercise, you generate the PKs for all tables except for LAKE.
● Once you complete the process, copy and paste the resulting tables in each normalization
step back into a Word document for submission.
USC Spatial Sciences Institute © 2023
5
SSCI 582: Spatial Databases
Reading & Research Assignment #4
● Teasing the table apart may expose some errors; feel free to comment on them!
For your convenience, I have also sketched the tables and their attributes (with their PKs and
FKs etc.) to be included in each form/process below.
1NF: No repeating groups, viz. (City, State, FIPS) was repeating
LAKE (LakeID, LakeName, Basin, Basin_mi2)
PORT (PortID, On_LakeID, PortName, StateAbrv, FIPSCode)
2NF: No partial key dependencies, viz. FIPSCode depended only on StateAbrv, part of key
LAKE (LakeID, LakeName, Basin, Basin_mi2)
PORT (PortID, On_LakeID, In_StateID, PortName)
STATE (StateID, StateAbrv, FIPSCode)
3NF: No non-key dependencies, viz. Basin area_mi2 depends on Basin, not on lake
BASIN (BasinID, Basin, Basin_mi2)
LAKE (LakeID, Basin_ID, LakeName)
PORT (PortID, On_LakeID, In_StateID, PortName)
STATE (StateID, StateAbrv, FIPSCode)
2. (2 pts) Using LucidChart*, create an E-R diagram with the ‘Crow’s Feet’, 3-column table structure
(i.e. PK/FK, attribute field, and data type, as what you did for Discussion 3) that emerged from
the normalization at 3NF – which means the tables are ready to use in an E-R diagram.
Your submission are the tables in each stage of the normalization process (1nf, 2nf, and 3nf)
from Excel (#1) and the associated E-R diagram from LucidChart (#2) into a single, suitably
headed Word document. Submit this document as RA4_[your last name] in the assignment
submission link in D2L.
* If LucidChart does not support dashed line options for containers, use a thin line (1 pt) line
instead.
REFERENCES AND ADDITIONAL RESOURCES
Codd, Edgar F. (1970) “A Relational Model of Data for Large Shared Data Banks” Communications of
the ACM. 13(6): 377-387. DOI: 10.1145/362384.362685
Darwen, Hugh (2012) “The Relational Model: Beginning of an Era” IEEE The Annals of the History of
Computing, 34(4): 30-37. DOI: 10.1109/MAHC.2012.50
USC Spatial Sciences Institute © 2023
6
SSCI 582: Spatial Databases
Reading & Research Assignment #4
Date, Christopher J. and Hugh Darwen (2006) Databases, Types and The Relational Model: the Third
Manifesto (3rd ed.), Reading, PA: Addison-Wesley, 572 pp. http://www.thethirdmanifesto.com
(Firefox etc.; in MS Edge, I found it here: http://www.dcs.warwick.ac.uk/~hugh/TTM/, June
2016)
Fulton, Bruce (2015) 1NF – Normalization Plain and Simple – Video:
https://www.youtube.com/watch?v=70nyAehrrSY (October, 2018). This is the first of a series
of short videos (i.e. 2-3 minutes each) that walk you through the forms, including what the forms
are and why we need them.
Kent, W. (1983) “A Simple Guide to Five Normal Forms in Relational Database Theory” Communications of the
ACM, 26(2): 120–125. DOI: 10.1145/358024.358054
Microsoft Support (2022) Description of the database normalization basics:
https://support.microsoft.com/en-us/help/283878/description-of-the-database-normalizationbasics.
Yeung, Albert K.W. and G. Brent Hall (2007) Spatial Database Systems: Design, Implementation and Project
Management. The GeoJournal Library, 87. Dordrecht: Springer, 553 pp. DOI: 10.1007/1-40205392-4_1.
ADDENDUM – GENERALIZATION IN E-R DIAGRAMS
The purpose of describing the term ‘generalization’ here briefly is to differentiate it from the
‘normalization’ we talked about this week. It is okay if you do not fully understand it.
Generalization is the design technique of packaging, i.e. lumping, allied tables into one generic table,
either conceptually (super-typing) or explicitly (sub-typing), with the goal of making the diagram easier
to manage and read. Conceptual packaging is accomplished by drawing a dashed line* around the
aggregated entities to denote a super-entity whose relationships only need to be shown once. The
implication is that relationships coming to the edge of package apply equally to all entities with it.
Explicit packaging is accomplished by adding an explicit “type” field that allows similar sub-entities
to be tracked within a host entity. This special field is marked “ST” in the key-indicator column;
conventionally its data-name is simply Type, and it must be of integer data-type, indicated as Code.
An explicit list of integer code: description pairs appears to the right of the field, or as a footnote. In
Esri’s current (v.10) geodatabase implementation, all the host entity attributes appear in all the subentities, although their values may come from different domains, per the Type field. The behavior of
the sub-entities wrt relationships and topology may also be different, again per Type.
USC Spatial Sciences Institute © 2023
7
READING & RESEARCH ASSIGNMENT #2
SSCI 582
Rui Chen
02/08/2023
price
ID
Name
Available
Title
Book ID
Books
Published
by
Publisher
Author
Address
Date
Experienced
date
Name
Borrowed by
Member
Address
Issue
Due date
Member
type
Member
date
An entity is a real-world item or idea that is significant to the library system and can be
separated from other objects and concepts. The four entities in this E-R diagram are Member,
Book, Author, and Publisher.
The Member entity represents the individual who may borrow books from the library.
The Member entity’s characteristics, which are used to identify and characterize library members,
include Member ID, Name, Address, Phone Number, and Email. The main key to differentiating
one member from another is their Member ID.
The Book entity represents the physical object that a member can check out. The ISBN,
Title, Author, Publisher, Publication Date, and Genre characteristics of the Book object are used
to describe the books in the library. The main factor that sets each book from others is its ISBN.
The Author entity represents the writer of one or more books. The properties of the
Author object, which are used to identify the authors of the books in the library, are Author ID,
Name, Nationality, and Date of Birth. The main identifier that sets each author apart from the
rest is their Author ID.
The Publisher entity represents an organization that publishes books. The Publisher entity
includes properties that are used to identify the publishers of the books in the library, including
Publisher ID, Name, and Address. The essential key that sets each publisher apart is their
Publisher ID.
Relationships represent the relationships between entities. Six relationships may be seen
in this E-R diagram: Member – Book, Book – Member, Author – Book, Publisher – Book, and
Author – Author.
The link between a Member and a Book show that a Member may borrow more than one
book. The link between a book and a member shows that only one member can borrow a book.
The link between a book and its author shows that a book has just one author. The link between
an author and a book indicates that an author may write several books. The link between a book
and its publisher shows that only one publisher publishes a book. The link between a publisher
and a book indicates that a publisher may publish several books.
The Cardinality Constraints show how frequently one entity occurs compared to another.
The Cardinality Constraints in this E-R diagram, which indicate how often one entity occurs
concerning another entity, are 1:N, 1:1, and N:M.
A visual depiction of the entities, characteristics, relationships, and cardinality
restrictions in a library system is represented by the E-R diagram for the Library System. This
graphic can better understand the structure and connections between the items and ideas in the
library system.
In summary
Entities:
Member: A person who is a member of the library and can borrow books.
Book: A physical item that can be borrowed by a member.
Author: A person who has written one or more books.
Publisher: An organization that publishes books.
Attributes:
Member: Member ID (primary key), Name, Address, Phone Number, Email.
Book: ISBN (primary key), Title, Author, Publisher, Publication Date, Genre.
Author: Author ID (primary key), Name, Nationality, Date of Birth.
Publisher: Publisher ID (primary key), Name, Address.
Relationships:
A Member can borrow multiple Books.
A Book can be borrowed by only one Member at a time.
A Book is written by only one Author.
An Author can write multiple Books.
A Book is published by only one Publisher.
A Publisher can publish multiple Books.
Cardinality Constraints:
Member – Book: 1:N (One Member can borrow multiple Books)
Book – Member: 1:1 (A Book can be borrowed by only one Member at a time)
Book – Author: 1:1 (A Book is written by only one Author)
Author – Book: N:M (An Author can write multiple Books)
Book – Publisher: 1:1 (A Book is published by only one Publisher)
Publisher – Book: N:M (A Publisher can publish multiple Books)

Save Time On Research and Writing
Hire a Pro to Write You a 100% Plagiarism-Free Paper.
Get My Paper
Still stressed from student homework?
Get quality assistance from academic writers!

Order your essay today and save 25% with the discount code LAVENDER