Disaster Recovery

INSTRUCTIONS: All responses must be prepared in Microsoft Word format and uploaded to the appropriate online assignment. Please include your name, course number, week number, and assignment name at the top of your submissions (for example, MargaretFoltz-ISSC366-Week5 Assignment).
 

Read chapters 13, 14, and 15. Answer each of the following questions in at least 300 words (900 words total).

1) Why is it important to test a disaster recovery plan?

2) In your opinion, is there such a thing as a ‘failed’ disaster recovery test?

3) Summarize the physical/logical layout of your network (tips: are there any single points of failure in the network, how critical are the circuits, etc.).

CHAPTER 15
TELECOMMUNICATIONS AND NETWORKING
Your Connection to the World

Everything is connected … no one thing can change by itself.
—Paul Hawken

INTRODUCTION

All of us are becoming more interconnected, as people and companies make increasing use of
telecommunications networks to connect to suppliers, customers, business associates, and friends.
Telecommunications has enabled our businesses and personal lives to be “just-in-time” no matter where
we might be in the world. We use telecommunications networks to talk, share data, run applications,
download files, organize schedules, exchange e-mail, and access the Internet. We use these networks to
connect us with our coworkers and to collaborate with customers in doing the business of the firm. This
chapter reviews the dangers that threaten our telecommunications networks and the attributes unique to
these networks that require special processes to restore them after a disaster.

Most organizations have two distinct telecommunications networks: one for voice communication
that uses the standard telephone system (also known as the public switched telephone network, or PSTN)
and another for all electronic communication that uses one or more computer networks that are typically
connected to the Internet. While the need for two distinct networks is slowly changing with the increasing
use of VoIP (Voice-over-Internet Protocol) for sending voice communications over the Internet, many
organizations will have to maintain both for the foreseeable future. Fortunately for our planning purposes,
both types of networks have very similar issues that affect disaster recovery.

PUBLIC SWITCHED TELEPHONE NETWORK

How important can a telephone line be? What’s the cost of a telephone call? What is the value of a
missed call? How much would you pay for 100 percent telecommunications reliability? Is such a thing
even possible?

Consider this scenario. You are the Plant Materials Manager sitting at home in your living room. As
you watch the evening news, you see a video of flames rising out of the roof of a key supplier’s main
factory. You don’t need trouble like this. What should you do? You try calling your salesperson and the
company offices but no one answers. First thing in the morning and all the next morning you try calling
them. The company is more than an hour drive away, you can’t get away from the office for that long,
and your calls still go unanswered.

Looking through your contact list, you select another supplier and place an order for a 2-month supply
of goods to replace the supplier that had the fire. The set-up costs are a killer but it is worth the money if
you can keep your factory running. At least this problem is contained. Trouble is, the fire you saw on TV
was isolated to the front offices and the factory is fine. The warehouse is bulging. At a time like this, the
supplier needs the cash, but employees can’t call out until the telephone switching room is replaced! The
company took care of the fire but forgot all about its customers.

All companies strike a fine balance between the cost of reliable telephone service and the cost of
downtime. In a time of tight budgets (which is always), you balance the cost of premium services against
the potential loss of telephone service. The more reliable that you want your telephone network to be, the
more you must spend. Telephone service is central to the conduct of business in most companies. A
failure isolates customers and suppliers from your company—and that can quickly become a very lonely
feeling.

Copyright © 2011. AMACOM. All rights reserved. May not be reproduced in any form without permission from
the publisher, except fair uses permitted under U.S. or applicable copyright law.

EBSCO : eBook Collection (EBSCOhost) – printed on 2/2/2018 11:52 AM via AMERICAN PUBLIC UNIV SYSTEM
AN: 349248 ; Wallace, Michael, Webber, Larry.; The Disaster Recovery Handbook : A Step-by-Step Plan to Ensure Business Continuity and
Protect Vital Operations, Facilities, and Assets
Account: s7348467.main.ehost

Some companies are even more dependent on telephone service than others are. If you work at a
factory, you use the telephone to conduct business, but your main concern is moving products down the
line. The loss of telephone service does not slow down the assembly line one bit. However, if your
facility is a call center for sales or service, then 100 percent reliable telephone service is crucial to your
ongoing business. In the factory example, if the conveyor on the main assembly line breaks, the workers
have nothing to do. In the call center, if the telephone service is interrupted, then they are likewise idled.
How your facility utilizes its telephone service has a great deal to do with how crucial it is for your
operations.

The North American telephone network is designed to carry traffic from about 10 percent of all
telephones in a given area. This approximates its load at its busiest times. In a wide-area disaster, this
capacity is quickly swamped, as people everywhere call to check on their loved ones. The cellular
network doesn’t fare any better because portions of it also use landlines and it has its own capacity limits.
In today’s fast-paced business environment, how long can your business afford to be without telephone
service?

The disasters that can befall your delicate communication lines are legion. Anywhere along its path,
a wire can be broken, switching equipment can lose power, or problems can even occur within your
building. The high reliability of our modern telecommunications networks is the envy of the world.
You must ensure that your company has reviewed the risks to its
telecommunications pipeline and has taken steps to reduce the likelihood of failure. Our job here is to
identify those risks and build a wish list of mitigation actions to make your company’s telephone
communications even more rock solid than ever before.

The next few sections of this chapter review the basics of how your telephone system works. If you
are already a telephone system expert, you can skip these sections. If you are charged with writing a plan
in an area in which you are not a full-fledged expert, then you should read on. A basic understanding of
the main components will make it easier for you to develop an appropriate plan.

PSTN Basics

The world of telephonic communications all begins with the telephone instrument on your desk. This
modern marvel connects you to the world at large. The telephone is connected by wire to the wall jack,
which in turn is connected to a wiring closet. A wiring closet may be one of those locked doors on your
office building floor that you never get to look into. The same function can be served by running wires to
a specific place on the wall of a factory (which is generally a wide-open space). In either place, you
would see brightly colored wires running in spaghetti-like fashion to a telephone wire “punch-down
block” or punch block.

If you look at a punch block, it seems to be a “rat’s nest” of colored wire routed in an orderly fashion,
yet going in all directions. Each office telephone has at least one pair of wires running from its wall jack
all the way back to the wiring punch block. The punch block also has wires running to the telephone
switch for your building. This is where the two are connected to each other. If an office no longer needs a
telephone line, the punch block is the place where it can be disconnected.

Private Branch Exchange

From the wiring closet, large bundles of wires run to the company’s telephone switching equipment. In a
larger company this would be a Private Branch Exchange (PBX). A PBX replaces the long-gone
company telephone operator who would connect internal calls with a plug patch panel. The patch panel
physically connected the wires from one telephone instrument into the wires of another. This is really a
basic operation, but until electronics matured, it was the only way to do it. This is now all done
electronically. In today’s offices, a PBX takes these incoming wires and, using the signal on them,
provides electronic switching of calls within the building. A simple way to think about a PBX is to
consider it a special-purpose computer with all the support needs of a minicomputer.

The PBX determines which calls are intended for external telephone numbers and connects to the
local telephone company central office using “trunk” lines. Trunk lines are used for inbound or outbound
calls. The number of trunk lines the PBX has connected to the central office determines the maximum
number of external (inbound or outbound) calls you are capable of supporting at a given moment. Modern
PBX systems offer a wide range of additional services such as:

Voice mail.
Telephone conferencing.
Call transferring.
Music on hold.
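
The earlier point that the trunk count caps your simultaneous external calls can be made concrete with the Erlang B formula, a standard telephone traffic-engineering model that this chapter does not itself use. The sketch below is illustrative only: the function name and the traffic figures are assumptions, and a real sizing exercise should use measured call volumes from your own PBX.

```python
def erlang_b(offered_load_erlangs: float, trunks: int) -> float:
    """Probability that a new call is blocked because every trunk is busy.

    Uses the standard iterative form of the Erlang B formula:
    B(0) = 1;  B(n) = A * B(n-1) / (n + A * B(n-1)).
    """
    blocking = 1.0
    for n in range(1, trunks + 1):
        blocking = (offered_load_erlangs * blocking) / (n + offered_load_erlangs * blocking)
    return blocking


# Hypothetical example: 10 erlangs of external traffic
# (for instance, 200 calls per hour averaging 3 minutes each).
for trunks in (10, 15, 20):
    print(f"{trunks} trunks -> {erlang_b(10.0, trunks):.1%} of calls blocked")
```

Running the loop with a few trunk counts shows how quickly blocking falls as trunks are added, which is the cost-versus-reliability trade-off described earlier in the chapter.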

Based on this, from a business continuity perspective, you have a large single point of failure device.
You must plan to recover from a catastrophic failure of the PBX (e.g., burned to a cinder in a fire). A
close examination of the PBX room and its ancillary equipment will show that it is essentially a computer
room, requiring the same electrical and climate stabilization actions as any computer room. Backup
copies of the configuration data must be made for each device and handled with the same care as the
backup data from your computer system. These data should be securely stored off-site and available when
needed. Up-to-date backed-up data are the key to a prompt recovery.
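
One way to keep configuration backups up to date is to check their age on a schedule. The following is a minimal sketch under stated assumptions: the device names, dates, and the 30-day limit are all hypothetical, and in practice the inventory would come from your backup records rather than a hard-coded table.

```python
from datetime import date, timedelta

# Hypothetical inventory: device name -> date of its last off-site configuration backup.
last_backup = {
    "PBX": date(2024, 5, 1),
    "voice mail server": date(2024, 1, 15),
    "IVR": None,  # never backed up
}


def stale_backups(inventory, today, max_age_days=30):
    """Return device names whose backup is missing or older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        name for name, backed_up in inventory.items()
        if backed_up is None or backed_up < cutoff
    )


print(stale_backups(last_backup, today=date(2024, 5, 20)))
```

Any device the check reports is one whose recovery would start from old or missing configuration data.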

Internal to a company, the telephone signals can be analog or digital. Digital PBX systems provide a
wide range of services beyond simply routing calls. Most PBX systems use digital signals to
communicate with the telephones. This allows for additional services, such as one-touch dialing,
preprogrammed telephone numbers, voice mail, etc. This is significant in the case where you want to use
a modem to dial out of the office. A fax machine or a direct modem connection requires an analog line. If
you have a digital PBX, then you will need separate analog lines.

Rather than run a multitude of analog telephone lines, most companies access the Internet via their
external data network. This works fine for most office dwellers. However, some devices still need these
analog lines and you should keep track of where they are. Examples of analog dial-out lines might be for
an alarm service, which dials out to notify the repair service of an out-of-tolerance condition, or for
validating a credit card in stores at the point of purchase.

OTHER VITAL EQUIPMENT LOCATED IN THE PBX ROOM

Other important equipment is
typically located in the same room as your PBX. After the critical devices are identified, make sure they
are protected and draft a plan to fully recover them. These devices may include:

Interactive Voice Response (IVR). Gives callers information based on what the caller enters using
the telephone keypad. You have heard the messages: “please select 1 to talk to sales, select 2 to talk
to …” These audio tracks should be backed up and the queuing logic documented.

Automatic Number Identification. This is the business version of what is sold to consumers as
“caller ID.”

Intelligent Port Selector. Connects the incoming call to the first available line. This is used when
you have multiple people answering inbound calls, as in a hotel chain’s reservation center.

Call Management System. Used to monitor the volume of telephone calls during peak periods, to
identify the number of telephone operators needed, and to track operator efficiency.

Call Accounting. Tracks calls made and assigns them to a billing account. This is also known as
“Station Message Detail Recording” (SMDR). This is used in various ways. Lawyers might use it to
bill their time to a specific client. Some companies use this to track employees’ long distance usage,
etc.

Call Monitoring. Tracks the level of call activity by showing the status of the trunk lines, the number
of calls in progress, the number of calls waiting in queues, the wait time, the number of abandoned
calls, and the status of the operators.

The Telephone Company’s Central Office

Soon after telephone service was first created, it became obvious that every person could not be wired to
every other person they might possibly want to call. This would result in an impossible maze of wires. To
call someone new, you would first need to run a wire from your telephone to their telephone. Imagine the
problems in asking a girl for her telephone number in those days!

To simplify this problem, all the telephone lines were run into a centralized building and then the
switchboard operators would physically patch your telephone line into the switchboard, making the
connection. These buildings would be located at various places around the city and were the central place
where connections were made. Eventually, automated switching made it easier, but buildings were still
needed to switch the calls. In more recent years with the advent of solid-state circuits, the floor space
required for these buildings has shrunk dramatically. You still see them around: small buildings without
any windows, usually neatly trimmed grass, and a small telephone company sign by the front door.

The central office provides a service similar to your PBX by switching your call to another local
telephone, to a different central office far away, or to a long-distance carrier’s point of presence (POP).
The long-distance company then routes the call through its switching center and back to a distant central
office and down to the far-away telephone.

Interexchange Carrier Point of Presence

With the breakup of the AT&T long-distance monopoly in 1982 came the creation of independent
long-distance telecommunications providers. These non-AT&T long-distance providers are known as
Interexchange Carriers (IXCs). Along with this choice came the opportunity to split your long-distance
service across different carriers in hopes that all of them would not be knocked out in a disaster and
therefore your communications traffic could flow out on the alternative pipeline. For that to be true, a lot
of careful planning is necessary.

First, you must ensure cable separation so that you have a different wire path from your facility to the
IXC’s point of presence. Most carriers own their network in high-traffic areas and rent capacity from
another carrier in low-traffic areas. This means your traffic separation may only be on
paper. If you are in a high-traffic area, the carrier may run a separate cable to your facility. If you are not,
then you will probably connect to the IXC in the nearest central office. Do not take for granted that
because you use two companies you are on two different wires. Ask them if they share lines and ask to
see the route of the cable.

Some telecommunications experts believe that route separation is much more important than using
multiple telecommunications companies. They feel it is easier to manage one supplier so long as the cable
issue is addressed.
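
Once each carrier has shown you the route of its cable, comparing the two routes is a simple set intersection. The sketch below assumes you have written each route down as a list of named segments; every segment name here is hypothetical.

```python
def shared_segments(route_a, route_b):
    """Return the cable segments (conduits, poles, offices) common to both routes."""
    return sorted(set(route_a) & set(route_b))


# Hypothetical routes, as each carrier might describe them when asked.
carrier_1 = ["building entrance A", "Main St conduit", "central office 12", "carrier 1 POP"]
carrier_2 = ["building entrance B", "Main St conduit", "central office 12", "carrier 2 POP"]

overlap = shared_segments(carrier_1, carrier_2)
if overlap:
    print("Routes are NOT fully separate; shared segments:", overlap)
else:
    print("No shared segments found in the documented routes.")
```

In this made-up case the two carriers enter the building separately but still share a conduit and a central office, so the separation exists only on paper.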

When evaluating carriers, the first thing to consider is whether they rent their network or own it.
Most are a mixture. The more of the network equipment that they own, the more control they can exercise
over it. When choosing an IXC, things to consider include:

What is their system availability time, and what will they guarantee?
What are the consequences to the IXC of downtime? You can ill afford it. A few extra free minutes of
service every month is poor recompense for missed customer calls.

What is the restoration priority for the sections of the network that you will be using?
What are their alternate routes for the places you normally communicate with? Don’t automatically
assume a new carrier will be a better alternative to your existing carrier. Is your service route a spur
(single-threaded) service? Is it a ring architecture, which at least gives you two paths in case one has
a problem?

How often does the carrier practice its disaster recovery procedures?
How easily can the IXC shift your inbound calls to another site?
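
When a carrier quotes an availability percentage, it helps to convert it into hours of permitted downtime before judging the guarantee. This is a small worked calculation; the percentages shown are examples, not figures from any particular carrier.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a service may be down while still meeting its availability figure."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.0, 99.9, 99.99):
    print(f"{pct}% available -> up to {downtime_hours_per_year(pct):.2f} hours down per year")
```

Even 99.9 percent availability still allows almost nine hours of outage a year, which is worth weighing against the cost of missed customer calls.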

OK, well, this all sounds pretty straightforward. After all, telephone service has been around for well
over 100 years and its technology is pretty well known. What could possibly go wrong?

COMPUTER NETWORKS

All but the smallest firms today have one or more networks within their organization. A network consists
of one or more servers that are connected to one or more workstations and allows users to share
information and resources. This connection can be by copper wire, by fiber-optic cable, or even by radio
frequency (RF, or wireless). Rarely are workstations connected directly into a server. Typically they plug
into the wall jack, which is connected to a wiring closet. In the closet will be one or more “hubs” that will
connect the local devices to a hub in the computer room. The wiring closet is often the same one that
contains the telephone wiring.

In the computer room (either a central location or possibly sprinkled about the facility) sit the servers.
A server is a computer that runs software to provide access to resources attached to the network, such as
printers, disk storage, and network applications. A server can be any type of computer that supports the
sharing of resources: it may be a standard desktop PC, or a dedicated device containing large amounts of
memory and multiple storage devices that can support hundreds of PCs at the same time.

The next section reviews the basics of how your computer network works. If you are already a
network expert, you can skip this section. If you are charged with writing a plan in an area in which you
are not a full-fledged expert, then you should read on. A basic understanding of the main components will
make it easier for you to develop an appropriate plan.

Computer Network Basics

The term local area network (LAN) is typically used to describe workstations and servers that are all
physically in the same location. A LAN can be implemented using two main architectures:

Peer-to-Peer. Each computer on the network communicates directly with every other computer. Each
computer is treated equally on the network. Communication between computers is coordinated using
a network hub.

Client/Server. Each end-user PC is connected to a central server, which coordinates communication
and supplies resources to the end users. Typically the server contains file storage, printer, backup,
and other resources that are shared by the end-user client computers.

Figure 15-1 shows a typical client/server LAN, with five PCs connected to a server. The server has
an attached network printer and backup tape drive that can be accessed from any end-user client
computer on the network.

A wide-area network (WAN) is a data communications network that consists of two or more local
area networks connected to allow communication between geographically dispersed locations. Most
companies have some sort of WAN connection, either connecting to other locations within the
organization or to the public Internet. There are several different types of WAN communication links
available. The most common ones are:

Dialup. This is the most basic type of connection, using a modem and a PSTN line. This is also the
slowest and most unreliable type of connection available, supporting up to 56 kilobits per second
(kbps).

FIGURE 15-1: Typical LAN configuration.

ISDN. An ISDN (integrated services digital network) connection uses a PSTN line with an enhanced
communication standard to speed up the connection to 64 or 128 kbps.

T-1. A dedicated connection that supports data rates up to 1.5 megabits per second (Mbps). A T-1 line
consists of 24 individual channels, each supporting 64 kbps of data.

T-3. A dedicated connection that supports data rates up to 43 Mbps. A T-3 line consists of 672
individual channels, each supporting 64 kbps of data.

Frame Relay. This is a packet-switched protocol for connecting devices on the WAN. It supports
data transfer rates up to T-3 speed.

Cable Modem. Using the same digital or fiber-optic network as cable TV, cable modem Internet is
capable of speeds up to 60 Mbps.

ATM. ATM (Asynchronous Transfer Mode) is a means of digital communications that is capable of
very high speeds and is suitable for transmitting images, voice, or video as well as data.

Wireless. A wireless LAN bridge can connect multiple LANs for distances up to 30 miles with a
direct line of sight.

VPN. A VPN (Virtual Private Network) uses encryption in the lower protocol layers to provide a
secure connection through the public Internet.
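
The channel counts in the list above explain the quoted line rates, and a quick calculation shows what those rates mean for moving recovery data. In this sketch the function names, the 1 GB file size, and the choice to ignore protocol overhead are assumptions made for illustration.

```python
CHANNEL_KBPS = 64  # each channel carries 64 kbps, per the list above


def payload_mbps(channels: int) -> float:
    """Aggregate channel capacity in Mbps (1 Mbps = 1,000 kbps)."""
    return channels * CHANNEL_KBPS / 1000


def transfer_minutes(file_megabytes: float, link_mbps: float) -> float:
    """Best-case minutes to move a file, ignoring protocol overhead."""
    megabits = file_megabytes * 8
    return megabits / link_mbps / 60


# A 1 GB file is treated as 1,000 megabytes here.
for name, channels in (("ISDN (2 channels)", 2), ("T-1", 24), ("T-3", 672)):
    rate = payload_mbps(channels)
    print(f"{name}: {rate:.3f} Mbps, 1 GB backup in about {transfer_minutes(1000, rate):.0f} min")
```

The arithmetic gives 24 × 64 kbps = 1.536 Mbps for a T-1 and 672 × 64 kbps = 43.008 Mbps for a T-3, matching the rounded figures in the list.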

Figure 15-2 shows a corporate wide area network, with two remote locations linked to the
headquarters, which is in turn connected to the public Internet. Just as your phone system has a room for
patching users into the telephone network, your computer network also has a central location where
individual pieces of equipment are connected into one or more servers. This might be the same room as
your telephone patch panel, or it could be part of a separate dedicated computer room.

It starts with each piece of equipment on the network being connected by wire to the wall jack, which
is in turn connected to a wiring closet. A wiring closet for your computer network is typically located in
or next to the computer room where your servers are located. The wiring closet consists of many different
cables running in spaghetti-like fashion to a network cable “punch-down block” or punch block.

The network cable punch block is similar to the telephone wiring punch block discussed earlier. It has
network cables routed in an orderly fashion, yet going in all directions. Each piece of equipment on the
network has at least one cable running from its wall jack all the way back to the cable punch block. The
punch block also has wires running to the computer servers. This is where the two are connected to each
other. If a piece of equipment needs to be removed from the network, the punch block is the place where
it can be disconnected.

FIGURE 15-2: Typical WAN configuration.

Internet Service Providers

Internet Service Providers (ISPs) connect your organization to the Internet. You are connected to your
ISP via one of the network connection options listed in the previous section. Most organizations will have
only one physical connection between their network and the ISP for Internet access. If Internet access is
critical to the operation of your business, you should consider a connection to a second ISP to give you a
second pathway to the Internet. There are, however, several things to evaluate and plan for when
considering a redundant ISP. (This is similar to considering multiple telecommunications companies.)

First, you must ensure cable separation so that you have a different wire path from your facility to the
ISP’s point of presence. Most ISPs own their network in high-traffic areas and rent capacity from
another ISP in low-traffic areas. This means your traffic separation may only be on paper. If you are in a
high-traffic area, the ISP may run a separate cable to your facility. If you are not, then you will probably
connect to the ISP’s nearest point of presence. Do not take for granted that because you use two ISPs you
are on two different wires. Ask them if they share lines and ask to see the route of the cable.

Some telecommunications experts believe that route separation is much more important than using
multiple ISPs. They feel it is easier to manage one ISP so long as the cable issue is addressed. When
choosing an ISP, things to consider include:

What is their system availability time, and what will they guarantee?
What are the consequences to the ISP of downtime? You can ill afford it. A few extra free minutes of
service every month is poor recompense for missed sales due to the Internet being unavailable.

What is the restoration priority for the sections of the network that you will be using?
What are the ISP’s alternate routes for the places you normally communicate with?
How often does the ISP practice its disaster recovery procedures?

RISK ASSESSMENT

When developing your disaster recovery plan for your telecommunication services, look for single points
of failure that will adversely impact your critical business processes. These critical processes should have
been identified in the Business Impact Analysis. External threats to telecommunications include cables
being cut, interference from electromagnetic sources, attacks by hackers and other intruders, and damage
from natural hazards, such as fire or water. Internal threats include many of these same dangers; cables
can be cut by remodelers and leaky water pipes can damage cables and equipment. In an ideal world, you
would have duplicate service running to each telephone or desktop. Externally, an issue with multiple
Internet connections is that there are now two points of potential entry for hackers; be sure to work
closely with your security team to adequately protect your WAN. Internally, it is usually not
cost-effective to run duplicate cables; an alternative is to install an extra jack every few cable drops. If a
problem occurs in an end-user’s cable, the extra jack would be available as a backup until the problem
cable is repaired.

It is also important to review your vulnerability to problems with the devices that connect your
telecommunications network together, such as routers, switches, and hubs. Your Business Impact
Analysis should determine the level of threat that these devices pose if they fail. Look at having
redundant devices installed that can take over if the primary device fails.
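
If you can write your network down as a list of devices and the links between them, a short program can point out candidate single points of failure. The sketch below is a simple brute-force check, not a method taken from this book: it removes one device at a time and tests whether the remaining devices can still reach each other. The topology and device names are hypothetical.

```python
from collections import deque

# Hypothetical topology: each device maps to the devices it is cabled to.
network = {
    "ISP": {"router"},
    "router": {"ISP", "core switch"},
    "core switch": {"router", "closet switch A", "closet switch B", "server"},
    "closet switch A": {"core switch", "closet switch B"},
    "closet switch B": {"core switch", "closet switch A"},
    "server": {"core switch"},
}


def is_connected(graph, removed):
    """True if every remaining device can still reach every other one."""
    remaining = set(graph) - {removed}
    if not remaining:
        return True
    start = next(iter(remaining))
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search that skips the removed device
        for neighbor in graph[queue.popleft()]:
            if neighbor in remaining and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == remaining


def single_points_of_failure(graph):
    """Devices whose loss would split the rest of the network."""
    return sorted(node for node in graph if not is_connected(graph, node))


print(single_points_of_failure(network))
```

In this example the router and the core switch are flagged, while either closet switch can fail without isolating the other. The check covers devices only; it does not test individual cables, and it does not know which services your Business Impact Analysis rated as critical.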

Wireless telecommunication connections can provide an effective backup strategy if the wired
network fails. Wireless networks have no wires to fail and can be quickly installed if needed after a
failure in the wired system. The major drawback to using a wireless network is that the transmissions can
be intercepted; make sure encryption and other security measures are implemented to avoid others
intercepting your network traffic.

You should also consider using network monitoring software that can detect problems on the network.
Network monitoring software can alert you immediately if a node on the network is having problems or
has failed, which will allow you to restore service more quickly and help prevent problems from
cascading. Most network monitoring software can be configured to look for system parameters that fall
out of the desired range and then generate a message to a support person to take action.
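
The range-checking behavior described above can be sketched in a few lines. This is an illustration of the idea only, not the interface of any particular monitoring product; the parameter names and limits are assumptions.

```python
# Hypothetical acceptable ranges: parameter -> (minimum, maximum).
LIMITS = {
    "latency_ms": (0, 150),
    "packet_loss_pct": (0, 1),
    "room_temp_c": (18, 27),
}


def out_of_range(node, readings, limits=LIMITS):
    """Return one alert message per reading that falls outside its desired range."""
    alerts = []
    for name, value in readings.items():
        low, high = limits[name]
        if not low <= value <= high:
            alerts.append(f"{node}: {name}={value} outside {low}-{high}")
    return alerts


# In practice these messages would go to a pager, e-mail, or ticketing system.
for message in out_of_range("closet switch A", {"latency_ms": 40, "packet_loss_pct": 3.5}):
    print(message)
```

Choosing the limits is the real work: set them too tight and the support person is flooded with alerts, too loose and a failing device goes unnoticed.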

Natural Hazards

Now that you know the basics of how your telecommunications systems work, what are the risks to these
systems? Chapter 3 discussed natural hazards in detail, so this section only addresses those natural
hazards that have a major impact on your telecommunications systems.

Ice Storms and Blizzards. Ice can coat cables strung from telephone poles and, if the weight is great
enough, potentially bring them down.

Thunderstorms and Lightning. Severe rain can weaken the ground around a pole and cause it to sag
when there are high winds at the same time pushing against it. Lightning can strike telephone poles
and send a major charge flying down the line, burning up wire and equipment along the way.

Tornadoes. A powerful destructive force that can snap lines and rip up telephone poles. High-risk
areas are prime candidates for buried lines.

. Can cover a wide swath of land and not only bring down aHurricanes and FloodsC
telecommunications network but also prevent crews from promptly addressing the problems.

Human-Created Hazards

People are just as big a danger to your telecommunications systems as nature can be. Human-created
dangers include:

Breaks in Buried Lines. Sometimes emergency excavation is necessary (such as repairing a broken
gas main). Sometimes the local government is cleaning trash out of the ditches alongside the road.
Sometimes a well-meaning person just digs without asking (including on your own property). In any
of these cases, there is the chance that your tiny little cable will be dug up and severed.

Accidents. Sometimes people miss the tight turn and break off a telephone pole. If this is your only
line to the central office or your ISP, your service is gone until the pole is replaced.

Central Office or ISP Failure. A problem in a central office or with your ISP can quickly shut down
your telecommunications unless you are wired to a second provider.

Denial of Service Attacks. Hackers can use various techniques to render your Internet service
unusable. A Distributed Denial of Service (DDoS) attack uses the computing power of thousands of
vulnerable, infected machines to overwhelm a target or a victim.

Telecommunications Equipment Room

The room that houses your telecommunications equipment may not be as safe as you think it is. Dangers
to your telecommunications room include:

Temperatures that are too hot, too cold, and that swing widely are all hard on your telephone
switching equipment. Extreme temperatures stress the printed circuits. Large swings in temperature
(hot to cold) cause expansion and shrinkage of circuit cards and again can weaken components over
time.

Humidity has an effect on temperature and the growth of mold on your equipment.

Loss of electrical power will definitely stop a PBX, routers, switches, and other telecommunications
equipment. Pay phones and direct lines out should still be operational since the telephone circuits
supply their own power.

Water pipes overhead in your telecommunications room could release water onto your equipment.
The same goes for your cable panels. Overhead pipes along external walls could potentially freeze
and leak when thawing. Leaks from rooftop air conditioning compressors are also a problem.

Security may be a factor. This room is not set up to accommodate tourists and they should not be
allowed in. Keep the door locked at all times.

Fire is a possible danger. This equipment generates heat and is in danger from fire or the sprinklers
used to extinguish a fire. Typically this room is unattended and a slow-starting fire may go
undetected. Gas fire suppression is expensive but may save your equipment.

Cabling

The discussions earlier about wiring closets had a purpose. Imagine the mess if an isolated fire in the
wiring closet melted all these wires. In terms of structural damage, you got off pretty easy. But in terms
of damage to the telecommunications system, that entire area of the facility will be without service for
some time. New cable would have to be run from the PBX or Internet server to the closet and from the
wall to the closet. The alternative is to splice an extension onto each cable and run it into a punch block.
Either way is time consuming and expensive. So you can see how the wiring closet can be a single point
of failure for your telecommunications systems. This makes it a good idea to keep everyone out of it and
to not store anything in the closet that might cause a problem.

INTERNAL CABLING For disaster avoidance, concerns begin with the wiring closet and patch panels.
Take a walk with your telecommunications specialist and identify the location of all telecommunication
patch panels in your facility. Things you are looking for:

1. In a closet

The door and any windows are kept locked.
There is fire suppression equipment (usually fused link sprinklers).
There is nothing else in the closet except telecommunication and/or data communications
equipment. Combustible materials stored in the closet threaten your equipment. This is not the
place to store holiday decorations, old files, office supplies, janitor supplies, etc.

Very few people have keys—telecommunication support staff, security, and no one else.
There should be sufficient light to work in the closet.
There will often be data network wiring and telephone switching equipment together in these
closets. They are a natural fit here. The issue is that where the patch panel is simply a physical
connection for wires, the network equipment is energized electronics, which will generate heat
and introduce a potential fire source.

2. Outside a closet

Be sure it is covered and the cover is locked. Same rules for the keys as for the wiring closets.
Enough keys for the right people and none for anyone else.

All cables and wires leading into the external panel are encased in heavy conduit to inhibit
tampering.

If this is located in a warehouse or factory, be sure it is strongly protected from environmental
influences (leaky ceilings, etc.) and from being crushed by a forklift or toppling stacked material.

EXTERNAL CABLING The first rule of outside cabling is “cabling and backhoes don’t mix!” A chain
is only as strong as its weakest link. Your servers and PBX are snugly locked up in their room, and the
ISP’s office and the telephone company’s central office are also secure. But the wire in between is
exposed to the ravages of weather, people, and machines.

Experienced telecommunication professionals have their own name for backhoes—cable
locators!

External cabling, which is the cable that runs from your wiring closet to the telecommunication
provider’s office, is your biggest concern. The wire runs from your building to an access point along the
road, usually on a pole. Then the wire runs through the countryside (usually along a road or railroad) to a
point of presence or central office. In the city, it might run through underground pipes to the point of
presence or central office. You have no control over where the wire is run and no capability to protect it!
In some areas, you will even have separate cable runs to be concerned about—one for the local telephone
company, one for the Interexchange Carrier (long-distance service), and yet another for your Internet
service.

A common term used when describing part of a telecommunications network is “the last mile.” The
last mile describes the wire from the telephone company’s central office or the ISP’s point of presence to
your structure (and is usually more than a mile). This is also known as the “local access” or the “local
loop.” This part of the network is the most vulnerable. It is often carried on telephone poles (susceptible
to ice storms and errant vehicles) or underground (susceptible to damage from digging).

ROUTE SEPARATION The best path to telecommunications reliability is redundancy. This can include
redundant equipment, redundant technicians, and redundant cabling. The more alternate paths over
which a signal can be routed, the more likely it is to get through. The principle of cable route
separation should be an integral part of your telecommunications network design. Essentially this
means that you have more than one cable between your building and your telecommunications providers.
This prevents a total communications outage from a single cable cut.

Few companies can afford to take this to the extreme, but you can look into some steps. Begin by
asking the telecommunications provider to show you the route that your cable takes from the wall of your
building to their office. This will also show you how exposed the cable is to auto accidents (are the poles
close to the road), to backhoes (is the cable buried along the road in some places), or any other number of
threats.

With this experience in mind, negotiate with the telecommunications provider a fresh cable run from
your building (exit from the opposite end from the other cable) to a different central office or point of
presence. This will keep you operational in case your usual point of presence is damaged or experiences
an equipment failure, or if the cable to the point of presence was broken. How likely is that to happen?

In May 1988, there was a fire in the Illinois Bell central office located in Hinsdale, IL. The
two-story building was completely gutted. This building was an important hub for Illinois
Bell, as well as for major long-distance carriers, such as MCI and Sprint. In addition to
cellular service and data networks, approximately 40,000 subscriber lines and 6 fiber-optic
lines lost service.

To reduce the chance of a total system failure similar to the Hinsdale disaster, the telephone
companies have gradually migrated their central office structure from a spoke-and-hub approach (with its
obvious single point of failure) to a ring or mesh approach, in which multiple central offices are
connected to each other. In this scenario, calls are routed around the damaged central office in a manner
that is transparent to the caller. This is an ongoing process and may not yet have been completed in rural
areas or small towns. If possible, you want route separation, with wires running from your facility
to two different central offices on separate routes.

Route separation is more important than having multiple vendors. Most cable routes follow railroad
rights of way and the major carriers’ lines commonly converge at bridges. Imagine the number of places
there are to cross a major river, highway, etc. There aren’t many to choose from. The lines from various
companies often come together here and cross under the same portion of the bridge. What could go
wrong? Vehicles crossing bridges might catch on fire, river barges can break free and strike bridge
pilings, and major bridges are tempting terrorist targets.

Many telecommunication providers define cable separation as a distance of 25 feet or more between
cables. Others ensure cables are at least 100 feet apart and have at least a 2-foot separation at cable
crossovers. Ask your Internet, local, and long-distance telephone carriers how they define cable route
separation and how faithful they are to that standard. When using multiple vendors, even if their cables
are separated, they share a common weakness if they join at the same point of presence or cross a river
under the same bridge. Upon close investigation, you may even see that where you are using two
companies, one is leasing part of the same wire from your other provider!
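
One way to look for the shared weaknesses described above is to list the waypoints each cable route
passes through and compare them pair by pair. The Python sketch below is illustrative only: the
route names and waypoint labels are made up, and in practice this information would have to come
from your providers' route maps and your own drive along the route.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def shared_waypoints(routes: Dict[str, List[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """For every pair of routes, return the waypoints they have in common.

    A non-empty result means one incident at that waypoint (a bridge fire,
    a flooded point of presence) could cut both routes at once.
    """
    overlaps = {}
    for (name_a, path_a), (name_b, path_b) in combinations(sorted(routes.items()), 2):
        common = set(path_a) & set(path_b)
        if common:
            overlaps[(name_a, name_b)] = common
    return overlaps


# Hypothetical routes from one building to three providers.
routes = {
    "local_carrier": ["north_wall", "pole_17", "river_bridge", "pop_downtown"],
    "long_distance": ["south_wall", "conduit_4", "river_bridge", "pop_east"],
    "internet": ["south_wall", "conduit_9", "rail_tunnel", "pop_west"],
}

for pair, points in shared_waypoints(routes).items():
    print(pair, "share", sorted(points))
```

In this made-up data, two carriers cross the same bridge and two leave the building through the
same wall, which is exactly the kind of overlap that having multiple vendors does not cure.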

What to do? Find out how many access points there are for your telecommunication service. Does it
enter the building at more than one place (again, trying to separate the cables)? Ask the
telecommunication provider to identify the route that the cable takes from their office to your facility.
Drive along this route and look for potential problems.

In terms of your long-distance provider, where is their point of presence located? Where is the point
of presence of their competitors? You can consider running a separate connection to a second point of
presence to provide for multiple circuits. Depending on the distance, this separate line may be quite
expensive to run and maintain.

MAP IT OUT Now that you know the cable route to the telecommunication provider, map it out, from
where it enters your wall to the provider’s point of presence. You might think, “Isn’t that the
telecommunication provider’s responsibility?” The answer is yes, but type up that excuse and paste it to
your wall in case the line is cut. Try it on your boss! If you drive to work using the same route as your
cable, you might see construction crews digging near where the cable is to repair a water main, snapped
poles, or sagging wire due to accidents or severe weather—any number of threats too close to the wire for
comfort.

So draw out a map of where your wire runs from the telephone pole outside to the telecommunication
provider’s office. Then make a detailed map of the run from the pole to the wall and on to the wiring
closet room. Indicate which lines terminate in equipment provided by the telecommunication provider
and which lines provide essential services.

DEVELOPING A PLAN

This section describes issues to consider as you develop a disaster recovery plan for your
telecommunication assets.

What Are We Protecting?

As you know, the first step in building a plan is to make an inventory of your telecommunications assets.
You will assemble at least three lists. The first lists every major item and who to call if it breaks. The
second shows where the main cable runs are located in the building. The third includes all the telephone
numbers used by the facility. In a crisis, you may need to reroute some of these to another location.

1. Begin with a list of all major devices in your telephone switching room and network room, such as
the PBX, IVR, Internet server, etc. Include on this list:

A description of each device.
The serial numbers of the main equipment and any major components.
The name, 24-hour telephone number, contract number, and contract restrictions for whomever you
have arranged to service that item.

The location of every item, including a simple floor map of all telecommunication rooms.
Be sure to back up your entire configuration data, either on magnetic media or, if the file is small,
print it off and store it safely away.

2. Now make a wiring inventory of all the cable runs within your facility. You do not need to show
individual runs to the offices. In a crisis, you can always shift someone to another office. This is best
accomplished with computer-aided drafting software on a digitized version of the floor plan.

Indicate the runs on maps of each building or floor of a multistory building. Knowing the location
of these cable runs is important to quickly assess damage.

Indicate where the telecommunication service enters the building and its route to the telephone
exchange and network rooms. Note any hazards along the cable path.

If you have pay phones, indicate them on the map.
If you have any independent direct lines that bypass your telephone switch, mark them on the map.

3. Make up a telephone number inventory with all the telephone numbers assigned to your
building/facility.

DID (Direct Inward Dialing) lines.
Dedicated telephone lines that bypass your PBX, such as fax machines.
Pay telephones.
Foreign exchange lines.
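
If you prefer to keep these inventories as structured records rather than in a word processor, a
sketch along the following lines may help. It models the device list and the telephone number list
as Python dataclasses and renders them as CSV so a copy can be printed and stored off-site. The
field names simply mirror the items listed above and are not a standard; the wiring inventory is
map-based and is not modeled here.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from typing import List


@dataclass
class Device:
    """One entry in the major-device list (field names are illustrative)."""
    description: str
    serial_number: str
    location: str
    service_contact: str
    service_phone_24h: str
    contract_number: str
    contract_restrictions: str = ""


@dataclass
class PhoneNumber:
    """One entry in the telephone number inventory."""
    number: str
    kind: str          # e.g. "DID", "dedicated", "pay", "foreign exchange"
    location: str
    bypasses_pbx: bool = False


def to_csv(records: List) -> str:
    """Render a list of same-typed dataclass records as CSV text."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(records[0])])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


def lines_that_survive_pbx_failure(numbers: List[PhoneNumber]) -> List[str]:
    """Numbers that bypass the PBX and may still work if the switch is down."""
    return [n.number for n in numbers if n.bypasses_pbx]
```

For example, to_csv([PhoneNumber("555-0100", "DID", "HQ")]) produces a header row followed by one
data row, and lines_that_survive_pbx_failure() gives a quick list of the direct lines and pay phones
to fall back on.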

Identify Critical Telephone Circuits

Telephones are used to communicate, and every part of the company uses them differently. In an
emergency, salespeople will need to contact customers, the warehouse will need to call suppliers, and the
people at the disaster site will want to call home. In a disaster, you won’t be able to please everyone right
away, so you need a restoration priority guideline to know which circuits to recover (or protect) and in
what order.

Start this analysis by reviewing the critical functions identified by your executives at the beginning of
the business continuity planning project. Next, meet with representatives from each department and ask
them to identify their critical telecommunications needs and at what point in the disaster
containment/recovery they would be needed. For example, early in the emergency, the Human Resources
team might need to notify employees to not report to work or to come in at certain times, etc. Each
department must prioritize its communications needs.

With your critical communications functions identified, you can determine which circuits support
these key functions. If you trace out the top three or four circuits, you may see some of the same
hardware, some of the same cables, and circuit paths common to them all. Working down your list, you
can see which of your hardware devices (or cable bundles) has the greatest benefit to being restored first.
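
The tracing exercise above can be sketched in a few lines of Python. Each critical circuit is given
a weight and a list of the components it depends on, and the components are then ranked by the total
weight of the circuits they carry. The circuit names, weights, and component labels are hypothetical;
the weights in particular are a planning judgment that comes out of your department interviews.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def restoration_order(circuits: Dict[str, Tuple[int, List[str]]]) -> List[Tuple[str, int]]:
    """Rank components by the total weight of the critical circuits they carry.

    `circuits` maps a circuit name to (weight, components), where a higher
    weight means a more critical business function.
    """
    benefit = defaultdict(int)
    for weight, components in circuits.values():
        for component in set(components):  # count each component once per circuit
            benefit[component] += weight
    # Highest benefit first; ties broken alphabetically for a stable list.
    return sorted(benefit.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical critical circuits and the hardware each one depends on.
circuits = {
    "hr_notification_line": (5, ["pbx", "trunk_a", "closet_1"]),
    "customer_service_800": (4, ["pbx", "trunk_b", "closet_2"]),
    "warehouse_to_suppliers": (3, ["pbx", "trunk_a", "closet_3"]),
}

for component, score in restoration_order(circuits):
    print(f"{component}: {score}")
```

With this sample data the PBX ranks first because every circuit depends on it, followed by the trunk
shared by two of the three circuits.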

You must also include in your plan how you will relocate operations to another site. This is a twofold
issue. In the first case, if a department must be moved off-site due to damage to its offices, you need to be
able to shift its inbound calls to the new location quickly. In the second case, if your telephone equipment
room is destroyed, you must quickly restore a minimal level of service. Some people might plan to build
a new telephone room onsite. Others will contract with a service company that will bring a telephone
equipment room to your site already set up in a large trailer. The company needs to plug into your wiring
(no trivial task). This could buy you time until your equipment room is rebuilt. If you elect to bring in the
configured trailer, this agreement must be made in advance, and be aware that these might not be
available in a wide-area disaster.

Review Network Security

Your network has some unique security issues to consider. The first is physical security. No one but the
network support staff should be allowed in the network closets. The network administrator must work
with the facility’s security manager to ensure that these doors are always locked. Depending on the
airflow in the closet, some vents may be added to the doors. These rooms are not for storing holiday
decorations, old documents, unwanted furniture, etc.

Sometimes network equipment is in freestanding cabinets instead of closets. These cabinets must be
locked just like the closets and the keys held by only a few people. In a large facility, this can add up to a
lot of keys. Your goal is to minimize the number of keys by using a submaster key for all closets and, if
possible, a submaster key to all cabinets. These keys must be tracked and, if lost, a determination made
whether to rekey the locks.

The next area to consider is logical security. The network software on the servers and in the network
devices will be, in some cases, password protected. These passwords should be protected like any other
and known only to the key network support staff. However, they should also be written down and locked
in the data processing manager’s office in case the network staff is unavailable. (In a wide-area
emergency, the network manager or key staff members may not be available or able to come in.)

Another issue to consider is having a policy that no one is permitted to plug anything into the
network. That is the exclusive job of the network support team. This is to stop people from plugging in
their home notebook PCs and bringing down your network. In addition, contract employees should never
be permitted to connect into your network. The same policy should cover your wireless nodes.

Because servers can support a large number of users and are used to host critical applications, loss of
a server can have a severe impact on the business. Some processes to ease the restoration of servers
include:

Store Backup Tapes and Software Off-site. Follow the procedures discussed in Chapter 17 for
handling and storing backups. Backups of data and application installation media should be stored
off-site in a secure, environmentally controlled facility.

Standardize Hardware, Software, and Peripherals. Standard configurations of these items will
make restoration much easier. The standard configurations for hardware, software, and peripherals
should be documented in your plan.

Document the Network. The physical and logical network diagram must be kept up to date. The
physical documentation should include a diagram of the physical facility, where the primary cables
are routed. The logical diagram shows the network nodes and how they interconnect. Both diagrams
are critical in restoring the network after a disaster.

Document Vendor Information. Maintaining up-to-date information on the vendors you use for
hardware, software, and peripherals will make it easier to restore your operations.

Work with Your Security Team. The more secure your systems are, the less likely you are to have a
data loss due to a security breach. You’ll also want to be sure you can restore the latest level of
security if you must configure new servers after an emergency.
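
Once the logical diagram described above is captured as a list of links, it can also be checked for
single points of failure. The Python sketch below uses a simple brute-force approach: it removes
each node in turn and tests whether the remaining nodes can still reach one another. The node names
are hypothetical, and the check assumes the diagram starts out fully connected.

```python
from typing import Dict, List, Set


def reachable(graph: Dict[str, Set[str]], start: str, removed: str) -> Set[str]:
    """Nodes reachable from `start` when `removed` is treated as failed."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in graph[node]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def single_points_of_failure(links) -> List[str]:
    """Return nodes whose failure would split the remaining network.

    Brute force: remove each node in turn and test whether the others
    can still all reach one another. Fine for a diagram-sized network.
    """
    graph: Dict[str, Set[str]] = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    critical = []
    for candidate in sorted(graph):
        others = [n for n in graph if n != candidate]
        if len(others) < 2:
            continue
        if reachable(graph, others[0], candidate) != set(others):
            critical.append(candidate)
    return critical


# Hypothetical logical diagram: two ISPs, one router, one core switch, two closets.
links = [
    ("isp_a", "router"), ("isp_b", "router"),
    ("router", "core_switch"),
    ("core_switch", "closet_1"), ("core_switch", "closet_2"),
]
print(single_points_of_failure(links))
```

In this sample layout, having two ISPs does not help if both terminate on the same router, which is
the kind of finding worth recording next to the diagram.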

Telecommunications Mitigation Plan

With these risks in mind, along with due consideration of the identified priority circuits and equipment,
you can assemble a mitigation plan to reduce the likelihood of a threat or its impact if it occurs. The key
to telecommunications mitigation is redundancy. Redundancy in equipment in case one machine must be
repaired. Redundancy in communications routes in case a cable is severed or interrupted. Redundancy in
communication methods, such as radios, cellular telephones, or satellite communications, will provide at
least basic communications support.

CABLE MITIGATION PLAN Your cable mitigation plan should consider the following.
Multiple paths for the “last mile.” Investigate the path from your telecommunications equipment
room all the way to the central office or point of presence. Ask the telecommunications provider to
make another connection from your PBX or Internet server out through the wall of your building at a
point distant from the other exit point and on a different route to a different point of presence. Be
sure that your in-house wiring staff understands what you are asking for before they start.

Multiple paths for Interexchange Carriers. Investigate the path from your telephone equipment room
to your service provider’s point of presence. Avoid the same route as used for your local telephone
service.

From the pole to your wall, if the cable is underground, ensure it is clearly marked with “do not dig”
indicators or other obstacles to keep digging equipment away (or at least delay them until the cable
can be marked).

TELECOMMUNICATIONS EQUIPMENT ROOM MITIGATION PLAN Think of your
telecommunications room like a computer room. Their environmental and security needs are almost
identical.

Uninterruptible Power Supply (UPS). When you lose electrical power, this is what will keep your
PBX or Internet server active until external power is restored or until your facility’s electrical
generators kick in. Conduct a power loss test to see which equipment is not connected to the UPS. If
a device is neither essential nor time consuming to restart, take it off the UPS. The fewer machines
on the UPS, the better. When conducting the power test, see how long the batteries can support the
load. Be sure the UPS is properly maintained by your service company.

When Hurricane Hugo struck the southeast United States in 1989, it surged hundreds of
miles inland with huge amounts of rain and wind. The storm downed so many trees that it
was days before electrical power was restored to some areas. Even the emergency batteries
at the central offices were eventually drained. So, do not depend solely on your UPS for
emergency power. Consider other power sources as well.

Fire. This room requires the same fire protection as a computer room. An early-warning fire alarm
system and gas fire suppression system are highly recommended.

Security Access. This room is normally unattended. It should be locked as no one has any business
strolling around in this room. The telecommunications system administrator can normally perform
his or her administration duties via terminal over the network.

Structural Investigation. Although you can rarely select the room, a close inspection can identify
problems. Water pipes running along the walls or along the ceiling are a potential source of
problems. They may freeze or leak. They should be watched carefully. Consider installing a plastic
shield attached to a drain placed under them to catch condensation or leaks. A roof-mounted air
conditioner may cause a roof leak. External walls may stress the air-handling equipment as outside
temperatures heat or cool the walls.

Temperature Variability. Your equipment must be in an area that maintains a specific operating
temperature and humidity range for maximum life. This is also essential to maintaining your service
agreement coverage. Proper air conditioning, heating, and humidity control equipment must be in
service and well cared for.

Alarms Such as Humidity, Temperature, Fire, and Electric. A bank of alarms will help you
monitor the condition of the room. These alarms must sound within the room, as well as at the
security guard station as the room is normally unattended. Early detection will reduce the likelihood
of significant damage. Include automatic paging equipment for notifying the after-hours support
team of problems. Consider installing automatic shutdown software for your equipment. This signals
your hardware to shut itself down gracefully in the event of a problem.

Data Backups. Like a computer, your switch and configurable devices need to back up their
configuration data whenever these change. Store copies of these files in a secure off-site location.

Housekeeping. Keeping the door locked will help to prevent others from using the switch room as a
storage facility. Nothing should be stored in this room that does not pertain to the equipment. This
will reduce the amount of combustibles available to a fire. Do not let this room become another
storage or janitor’s closet.
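
Returning to the UPS power test in the list above, a back-of-the-envelope estimate can show why
taking nonessential devices off the UPS matters. The Python sketch below divides usable battery
energy by the connected load. The efficiency and usable-capacity factors are illustrative guesses,
not vendor figures, and the device wattages are made up; the runtime you measure in an actual power
test is the number to trust.

```python
def estimated_runtime_minutes(battery_wh: float, load_watts: float,
                              inverter_efficiency: float = 0.9,
                              usable_fraction: float = 0.8) -> float:
    """Rough UPS runtime estimate in minutes.

    Assumes a constant load and fixed derating factors. Real batteries
    deliver less as they age, so treat this only as a planning figure.
    """
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    usable_wh = battery_wh * inverter_efficiency * usable_fraction
    return usable_wh / load_watts * 60


def load_after_shedding(devices: dict, essential: set) -> float:
    """Total watts if only the essential devices stay on the UPS."""
    return sum(watts for name, watts in devices.items() if name in essential)


# Hypothetical equipment list (watts) for a small equipment room.
devices = {"pbx": 400.0, "router": 100.0, "test_pc": 250.0, "label_printer": 50.0}
essential = {"pbx", "router"}

full = estimated_runtime_minutes(2000.0, sum(devices.values()))
shed = estimated_runtime_minutes(2000.0, load_after_shedding(devices, essential))
print(f"all devices: {full:.0f} min, essential only: {shed:.0f} min")
```

Comparing the two figures for your own equipment list gives a starting point for deciding what
belongs on the UPS and how soon generator power must be available.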

ALTERNATE COMMUNICATIONS METHODS You must have written procedures for quickly
routing Interexchange Carrier traffic to the local telephone service in the event that your IXCs have major
problems. How hard is this to do, and then to switch it back again? Is it easier (or cheaper) to do this by
splitting the load across two carriers and then shifting the entire load to the functional carrier? Other items
to consider include:

Do you have company-owned cell phones and cellular modems for communicating when the
telephone system is inoperable?

Develop written procedures for how to work with the telephone company to shift specific telephone
numbers or all incoming calls to a different company site.

Do you have any employees who are ham radio operators? It is nice to have alternative channels
when the primary ones are not available. In a wide-area emergency, radio communications will also
get congested. Radio conversations are not secure and anyone with the proper equipment can listen to
them. Radio communication is also slow and not suitable for large data volumes.

Other communications alternatives include satellite and microwave, both susceptible to problems if
the antenna has been shifted by a storm or earthquake.

ACTION STEPS FOR YOUR PLAN

Most companies contract emergency recovery of their telecommunications equipment through the same
company that services their equipment. A problem arises if there is more than one service company
involved. An alternative is to arrange with a company to come onsite if your telecommunications
equipment room is destroyed and to set up a trailer adjacent to your building with a ready-to-go telephone
switch and Internet connection. To support this, companies often run cables to the front of the building
for a quick connection.

Another consideration is that pay phones do not route through your PBX. They are direct lines to the
outside service. So, if your PBX is inoperative, use pay phones. Keep a list of where they are and their
telephone numbers.

Cell phones are also common. Cellular traffic can be somewhat limited in a wide-area emergency as
the local cell towers become saturated with calls. Use this as an alternative communications channel, not
as the primary backup.

A branch office is an ideal place to shift operations to until the disaster-struck site is recovered. The
key is how far away the backup facility is located from the affected site. It should be far enough away to
be unaffected by the same disaster. There is no set mileage distance but it should at least be on a different
power grid and telephone central office. If it is too far, then you must also provide lodging for the
relocated people.

One problem is how to relocate to these sites in a crisis. If this is during an area-wide disaster, mass
transit such as air travel may be disrupted and driving there yourself may be difficult. One side aspect of
the attack on the World Trade Center in September 2001 was that it shut down all air travel. Companies
scrambling to activate their out-of-state recovery sites had a difficult time shifting key personnel and
material to the site. No one had planned for a complete shutdown of the air transportation system.

Things to consider after a disaster:
Do not make any unnecessary calls—only for emergencies.
When calling, you may need to wait several minutes to get a dial tone. Do not hit the switch hook,
because every time you do, you are placed at the end of the line for the next available dial tone.

When you receive a dial tone, quickly dial your number. In a time of low telephone service
availability, the dial tone is offered for a much shorter time.

TESTING

Plans are written by people with the best of intentions but, unless they are tested, you will never know
what you don’t know. Testing a plan exposes gaps and omissions in the process steps. It points out
incorrect emergency telephone numbers and emergency equipment that no longer exists.

Test exercises are a great way to train the people on what to do. The more that they practice their
emergency steps, the faster they will be able to perform them as they become familiar actions instead of
something new. Where possible, include your emergency service providers. They will be a great source
of recovery information.

When testing your plan, include some of your contract service providers. They may be able to point
out some gaps in your planning or things that can be done in advance to make their assistance to you flow
much easier.

Also, over time, your communications flows will shift. As this occurs, update your testing
accordingly.

CONCLUSION

The modern business relies on telecommunications to perform its role in the marketplace. As with any
resource, we need to be familiar with its role within our operation and with how its absence will affect us.
Redundancy is your best defense against a disaster removing your ability to communicate with customers
and suppliers. A thorough understanding of the telecommunication requirements of your organization will
help you to design the most cost-effective plan to protect against its absence.

CHAPTER 13
TESTING YOUR PLANS

Test, Test, Test

Action is the foundational key to all success.
—Pablo Picasso

INTRODUCTION

Writing a recovery plan is only half of the challenge. The second half, the real challenge, is to
periodically test it. Everyone can relate to writing a recovery plan. “Testing” a plan sounds like you do
not trust it. Testing requires expensive technician time, the equipment and facility resources to conduct a
test, and the expertise to plan the exercise. Gathering all of this into one place can be difficult.

Expensive technician time was tough enough to secure for writing the plans. The most
knowledgeable people are usually the busiest. Getting them to sit down long enough to
test a plan is difficult, yet essential. Testing validates that a recovery plan will work. A tested
plan is much more likely to succeed than one that has never been proven. The many
benefits of testing include:

Demonstrating that a plan works.
Validating plan assumptions.
Identifying unknown contingencies.
Verifying resource availability.
Training team members for their recovery roles.
Determining the true length of recovery time, and ultimately the ability to achieve the desired
company RTO.

The Many Benefits of Plan Testing

Recovery plans are tested for many business reasons. An untested plan is merely process documentation.
Testing a plan ensures that the document provides the desired results. The benefits of testing include the
following.

TESTING REVEALS MISSING STEPS When people write a plan, they think about a process or IT
system, and then write the plan so that they will understand what is explained and the steps to take. In this
sense, the plan is a reflection of their experience. However, in a crisis, they may not be the person who
will execute the recovery. Further, some people cannot break down a process to include each of its
individual steps. In action, they will pick up on visual cues to take a specific action to fill a gap.

Therefore, the first purpose of a recovery plan test is to ensure that it includes all of the necessary
steps to achieve recovery. Missing steps are not unusual in the first draft of a recovery plan. Other
missing information may be IT security codes, the location of physical keys for certain offices or work
areas, or the location of vendor contact information.

TESTING REVEALS PLAN ERRORS Writing a plan sometimes introduces misleading, incorrect, or
unnecessary steps. Testing the plan will uncover all of these.

TESTING UNCOVERS CHANGES TO THE PROCESS SINCE THE RECOVERY PLAN WAS
WRITTEN A plan may have been sitting on the shelf for a period of time without review. Over time, IT
systems change server sizes, add disk storage, or are upgraded to new software versions. Business
processes move machinery and change the sequence of steps, and key support people leave the company.

TESTING A PLAN TRAINS THE TEAM After a plan has been debugged, exercising it teaches each
recovery participant his or her role during the emergency. It is one thing to read the words on a page and
another to actually carry out the steps.

Types of Recovery Plan Tests

Exercises can consist of talking through recovery actions or physically recovering something.
Discussion-based tests exercise teamwork in decision making, analysis, communication, and
collaboration. Operations-based tests involve physically recovering something, such as a data center,
telephone system, office or manufacturing cell. This type of test uses expensive resources and is more
complex to conduct.

Everyone has their own name for the various types of testing. Tests are categorized by their
complexity in setting them up and in the number of participants involved. These tests are listed here in a
progression from least complex to most difficult to run:

Standalone Testing is where the person who authored the plan reviews it with someone else with a
similar technical background. This may be the manager or the backup support person. This type of
testing is useful for catching omissions, such as skipping a process step. It also provides some insight
into the process for the backup support person.

Walk-Through Testing involves everyone mentioned in the plan and is conducted around a
conference room table. Everyone strictly follows what is in the plan as they talk through what they
are doing. This also identifies plan omissions, as there are now many perspectives examining the
same document.

Integrated System Testing occurs when all of the components of an IT system (database,
middleware, applications, operating systems, network connections, etc.) are recovered from scratch.
This type of test reveals many of the interfaces between IT systems required to recover a specific IT
function. For example, this would be to test the recovery of the Accounting department’s critical IT
system, Human Resources IT system, the telephone system, e-mail, etc.

Table-Top Exercises simulate a disaster but the response to it is conducted in a conference room. A
disaster scenario is provided and participants work through the problem. This is similar to
Walk-Through Testing, except that the team responds to an incident scenario. As the exercise
progresses, the Exercise Coordinator injects additional problems into the situation.

Simulation Exercises take a Table-Top exercise one step further by including the actual recovery site
and equipment. A simulation is the closest that a company can come to experiencing (and learning
from) a real disaster. Simulations provide many dimensions that most recovery plan tests never
explore. However, they are complex to plan and expensive to conduct.

Validating the Recovery Time Objective

Testing recovery plans ensures that they can achieve the required recovery time objective (RTO). Since
plans are tested in small groups, the actual recovery time is determined by tracking the amount of time
required to recover each IT system and business process. These plans fit into an overall recovery
sequence (developed by the Business Continuity Manager). Once in this framework, the time required to
complete each plan is added up (allowing for the many plans that execute in parallel) to determine if the
RTO can be achieved.
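
The roll-up described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method taken from the text: the plan names, durations, and dependencies are hypothetical, the dependencies contain no loops, and each plan is assumed to start as soon as all of the plans it waits for have finished, so unrelated plans run in parallel.

```python
from functools import lru_cache

# Hypothetical measured recovery times (hours) from individual plan tests.
DURATION_HOURS = {
    "network": 2.0,
    "database": 4.0,
    "order_entry": 3.0,
    "email": 1.5,
}

# Hypothetical recovery sequence: each plan lists the plans it must wait for.
DEPENDS_ON = {
    "network": [],
    "database": ["network"],
    "order_entry": ["database"],
    "email": ["network"],
}


def overall_recovery_hours(durations, depends_on):
    """Return the elapsed time to finish every plan, allowing parallel work."""

    @lru_cache(maxsize=None)
    def finish_time(plan):
        # A plan starts when its slowest prerequisite has finished.
        start = max((finish_time(p) for p in depends_on[plan]), default=0.0)
        return start + durations[plan]

    return max(finish_time(plan) for plan in durations)


def rto_achievable(durations, depends_on, rto_hours):
    """Compare the rolled-up recovery time with the approved RTO."""
    return overall_recovery_hours(durations, depends_on) <= rto_hours


total = overall_recovery_hours(DURATION_HOURS, DEPENDS_ON)
print(f"Estimated recovery time: {total} hours")
print("RTO of 12 hours achievable:", rto_achievable(DURATION_HOURS, DEPENDS_ON, 12))
```

With these made-up numbers, the longest chain (network, database, order_entry) takes 9.0 hours, which is less than the 10.5 hours a simple sum of all four plans would suggest. That difference is why the recovery sequence matters when judging whether the RTO can be met.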

Is it a “test” or an “exercise”? A “test” implies a pass or fail result. An “exercise” implies
using something and is less threatening to participants.

WRITING A TESTING STRATEGY

Testing distracts an organization from its mission of returning a profit to shareholders. Everyone is busy
meeting his or her own company objectives. Somehow, time must be found within each department’s
busy schedule to test its recovery plans. To maximize the benefit to the company while minimizing cost,
develop a testing strategy for your company. This strategy describes the type and frequency of test for
recovery plans. An executive-approved testing strategy provides the top-level incentive for management
compliance. The testing strategy is inserted into the administrative plan.

The testing calendar should reach out over several years. Keep in mind that different departments
have their own “busy season” and trying to test at that time will be difficult. For example, the Accounting
department will be occupied before and after the end of the company’s fiscal year. Payroll needs to
submit tax forms at the end of the calendar year, etc. By using an annual testing calendar, it is easier to
gain commitment from the various departments to look ahead and commit to tests on specific days.

Testing follows a logical progression. It begins with the individual plan. The next level is a grouping
of recovery plans to test together. This is followed by a simulation of some sort. Executives become
frustrated by the length of time required to properly test all of the plans in this sequence, but they are
more disturbed by the cost to test them faster.

Begin by Stating Your Goals

As with all things in the business continuity program, begin writing the testing strategy by referring back
to the Business Impact Analysis. If the recovery time objective is brief (measured in minutes or hours),
then the testing must be frequent and comprehensive. The longer the recovery time objective, the less
frequent and comprehensive the testing may be. Considered from a different angle, the less familiar that
the current recovery team is with a plan, the longer it will take them to complete it.

Another issue is the severity of an incident. While the overall plan may tolerate a long recovery time,
there may be specific processes whose availability is important to the company. This might be the Order
Entry IT system or a critical machine tool. Consider testing those few highly critical processes more
frequently than the overall plan.

The testing goal may be stated as, “Recovery plans are tested to demonstrate that the
company’s approved recovery time objective of (your RTO here) can be achieved and that
all participants understand their roles in achieving a prompt recovery.”

Progressive Testing

Testing follows a progression from simple to complex. Once a plan is written, it begins at the Standalone
Test level and progresses from there. Any process or IT system that is significantly changed must be
retested beginning at the Standalone level. The progression of testing is as follows:

Standalone Testing is the first action after a plan is written. It reveals the obvious problems.

Walk-Through Testing exercises a group of related plans at the same time, conducted as a group
discussion.

Integrated System Testing tests a group of related plans at the same time by actually recovering
them on spare equipment.

Table-Top Exercises test a group of related plans at the same time, based on an incident scenario.

Simulation Exercises combine many groups for an actual recovery in the recovery sites, based on an
incident scenario.
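
One way to keep track of where each plan sits in this progression is a small status record. The sketch below is an assumption-laden illustration, not part of the original text: the class and method names are invented, and it assumes a plan moves up exactly one level each time a test is completed and drops back to Standalone whenever the process or IT system changes significantly.

```python
from enum import IntEnum


class ExerciseLevel(IntEnum):
    """Test types in the order given in the text, simplest first."""
    STANDALONE = 1
    WALK_THROUGH = 2
    INTEGRATED_SYSTEM = 3
    TABLE_TOP = 4
    SIMULATION = 5


class PlanStatus:
    """Tracks the next test a single recovery plan is due for."""

    def __init__(self, plan_name):
        self.plan_name = plan_name
        self.next_level = ExerciseLevel.STANDALONE  # every new plan starts here

    def record_completed(self):
        """Move to the next, more complex level after a completed test."""
        if self.next_level < ExerciseLevel.SIMULATION:
            self.next_level = ExerciseLevel(self.next_level + 1)

    def record_significant_change(self):
        """A significantly changed process or IT system restarts at Standalone."""
        self.next_level = ExerciseLevel.STANDALONE


status = PlanStatus("order_entry")  # hypothetical plan name
status.record_completed()
status.record_completed()
print(status.next_level.name)  # INTEGRATED_SYSTEM
status.record_significant_change()
print(status.next_level.name)  # STANDALONE
```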

Creating a Three-Year Testing Roadmap

Some tests only involve two people, while others can include most of the IT department. All tests require
preparation time. This is necessary to coordinate schedules for people, exercise control rooms, and
equipment. Copies of plans must be printed and distributed and exercise scenarios created. At a
minimum, every plan should be tested annually. This can be accomplished by the manager and the
process owner performing a Standalone Test to see if anything significant has changed in the process.

Few companies halt operations for several days to conduct a complete disaster simulation. Instead,
they test “slices” of the recovery program. For example, the test might focus on a recovery of the
Operations department or Shipping department. On the IT side, this would be a group of related systems
that regularly exchange information, such as order entry, materials management, and billing.

Too much testing can reduce interest in the program. Practically speaking, testing is a preventative
measure (all cost and no immediate payback) and does not increase a company’s revenue. Depending on
the industry, testing may never progress beyond the table-top exercise stage. The Business Continuity
Manager works with the program sponsor to identify the adequate level of testing for the organization and
then spreads it throughout the year.

When developing a testing calendar, executives will vent their frustration. They will want something
that is written, tested, and then set aside as completed. They do not like to consider that completed plans
must continue to be exercised regularly. There are many plans and combinations of plans to test: business
processes, IT systems, work area recovery, pandemic, etc. A typical testing schedule includes:

Quarterly

Inspect Command Center sites for availability and to ensure their network and telecommunication
connections are live.

Data Backups

Verify that data backups (on each media type) are readable.

Ensure that every disk in the data center and key personal computers are included in the
backups.

Inspect safe and secure transportation of media to off-site storage.

Inspect how the off-site storage facility handles and secures the media.

All business process owners verify that their employee recall lists are current.

Issue updated versions of plans.
Annually (spread throughout the year)

Conduct an IT simulation at the recovery site.

Conduct a work area recovery simulation at the recovery site.

Conduct a pandemic table-top exercise.

Conduct an executive recovery plan exercise with all simulations.

Review business continuity plans of key vendors.

All managers submit a signed report that their recovery plans are up to date.

Practice a data backup recall from the secured storage area to the hot site.
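
A schedule like the one above is easier to enforce if overdue items can be listed on demand. The following is a minimal sketch under stated assumptions: the completion dates are hypothetical, only four of the tasks are shown, and "quarterly" and "annually" are approximated as 91 and 365 days.

```python
from datetime import date, timedelta

# Interval lengths are assumptions: "quarterly" is approximated as 91 days
# and "annually" as 365 days.
INTERVAL = {
    "quarterly": timedelta(days=91),
    "annually": timedelta(days=365),
}

# A few tasks from the schedule above, with hypothetical completion dates.
TASKS = [
    ("Inspect Command Center sites", "quarterly", date(2024, 1, 15)),
    ("Verify that data backups are readable", "quarterly", date(2024, 5, 1)),
    ("Conduct an IT simulation at the recovery site", "annually", date(2023, 3, 10)),
    ("Review business continuity plans of key vendors", "annually", date(2024, 2, 20)),
]


def overdue_tasks(tasks, today):
    """Return the names of tasks whose next due date is before `today`."""
    overdue = []
    for name, frequency, last_done in tasks:
        next_due = last_done + INTERVAL[frequency]
        if next_due < today:
            overdue.append(name)
    return overdue


for name in overdue_tasks(TASKS, today=date(2024, 6, 1)):
    print("Overdue:", name)
```

In practice the Business Continuity Manager would maintain the real completion dates. The point of the sketch is only that each task carries a frequency and a last-completed date.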

A partial plan exercise calendar might look like Figure 13-1.

TESTING TEAM

Testing a recovery plan is a team effort. The best results come from a clear explanation of the
responsibilities of team members and some training to show them what to do. This enables each person to
contribute expertise to the exercise while learning by doing. Training for individual team members is the
responsibility of the Business Continuity Manager.

The duties for each of the testing team members will vary according to the type of test, with a full
disaster simulation requiring the most time from everyone. Possible team duties include:

Business Continuity Manager

Develops a long-term testing calendar, updated annually.

Develops or updates a testing strategy.

FIGURE 13-1: Example of a three-year testing calendar.

Schedules tests.

Prepares test areas and participant materials.

Explains testing process to team prior to start of exercise.

Presents scenario.

Logs events during the exercise.

Keeps exercise focused for prompt completion.

Injects variations to scenarios during simulation testing.

Conducts after-action critiques of recovery plan and a separate discussion of the test process.

Provides a written test report to the program sponsor.
Sponsor

Reviews and approves recovery plan test calendar and testing strategy.

Approves initiation of all tests.

Provides financial support for tests.

Ensures internal support of test program.

Observes tests in progress.

Reviews written report of test results and team critique.
Exercise Recorder

Records actions and decisions.

Records all assumptions made during the test.

Drafts narrative of what happened during the test for the after-action review.

Exercise Participants

Prepare for the test by reviewing the recovery plans.

Participate in test by following the plans.

Offer ways to improve the recovery plans during the test.

Participate in the after-action critique of the recovery plans and the testing process.
Nonemployee Participants

Where practical, include people from other organizations who have a stake in your plans, such as
the fire and police departments and the power company.

News reporters to report on the exercise and to participate by exercising your corporate
communications plan.

Visitors, such as customer or supplier representatives.

EXERCISE SCENARIOS

A disaster scenario is a hypothetical incident that gives participants a problem to work through. The
scenario may describe any disruption to the normal flow of a business process. The scenario should be
focused on the type of problem that a particular group of people may face. For example, the problem and
its mid-execution “injection of events” should encompass all participants.

Every simulation starts with a scenario, a hypothetical situation for the participants to work through.
Scenarios that reflect potential threats also add an air of reality to the exercise. A good place to look for
topics is in the plan’s risk analysis section or program assumptions. Another place to look is in the recent
national or local news.

For example, who has never experienced a power outage or a loss of data connectivity? How about
severe weather like a hurricane or a blizzard? Or consider tornados and earthquakes. Human-created
situations, such as fire, loss of water pressure, or a person with a weapon in the building, are also
potential scenarios.

The planning expertise of the test coordinator is crucial. The coordinator must devise an exercise
schedule to include a detailed timeline of events, coordinate and place the resources involved (people,
equipment, facilities, supplies, and information), establish in everyone’s minds their role, and identify
interdependencies between individuals and groups.

In theory, a business faces a wide range of threats from people, nature, infrastructure, etc. In reality,
few of these will occur. Some are dependent on the season and changes in the political environment.
Whatever the crisis, the recovery steps for many threats are the same. A data center lost to a fire is the
same as a data center lost (or made unusable) because of a collapsed roof, a flood, etc. In each of these
cases, there will be many steps unique to that event. However, the initial actions in each case will be the
same. It is this similarity that enables disaster recovery planning. A disaster plan is most useful in the first
few hours, when information is limited but the benefit of containing the damage and restoring minimal
service to the company is greatest.

Include in the scenario the incident’s day of week and time of day. The weekend response will differ
from the work-time response. Consider declaring the scenario to include the company’s worst time of the
year (such as the day before Christmas for a retailer). Also, the severity of the damage can at first appear
to be small and then grow through “injects” provided by the exercise controller. Consider the example of
a small fire. When a large amount of water was sprayed on the fire, it ran down the floor and saturated the
carpet in the nearby retail show room. It also leaked through the floor into the data center below, soaking
the equipment.

Ask the program sponsor to approve the scenario used in an exercise. This will minimize
participant discussion during its presentation. It will also help to avoid scenarios that
executives feel are too sensitive.

Some potential testing scenarios might be:
Natural Disasters

Hurricane or heavy downpour of rain

Tornado or high winds

Earthquake

Flood

Pandemic

Fire

Severe snow or ice
Civil Crises

Labor strike (in company or secondary picketing)

Workplace violence

Serious supplier disruption

Terrorist target neighbor (judiciary, military, federal, or diplomatic buildings)

Sabotage/theft/arson

Limited or no property access
Location Threats

Nearby major highway, railway, or pipeline

Hazardous neighbor (stores or uses combustibles, chemicals, or explosives)

Offices above 12th floor (limit of fire ladders)

Major political event that may lead to civil unrest
Network/Information Security Issues

Computer virus

Hackers stealing data

Data communication failure
Data Operations Threats

Roof collapse (full or partial)

Broken water pipe in room above data center

Fire in data center

Critical IT equipment failure

Environmental support equipment failure

Telecommunications failure

Power failure

Service provider failure

Loss of water pressure which shuts down chilled water coolers

Select scenarios so that the problem exercises multiple plans. Choosing the right scenario can engage
the participant’s curiosity and imagination. It converts a dull exercise into a memorable and valuable
experience for its participants. A good scenario should:

Be realistic: no meteors crashing through the ceiling.
Be broad enough to encompass several teams to test their intergroup communications.
Have an achievable final solution.
Include time increments, such as every 10 minutes equals one hour, etc.

Prior to the exercise, draft the scenario as a story. It begins with an initial call from the alarm
monitoring service with vague information, just like a real incident. To add to the realism, some people
will use a bit of photo editing to illustrate the scene. Imposing flames over the top of a picture of your
facility may wake some people up!

As the exercise continues, the Exercise Coordinator provides additional information known as
“injects.” This predefined information clarifies (or confirms) previous information and also raises other
issues incidental to the problem. For example, if there was a fire in the warehouse, an inject later in the
exercise may say that the warehouse roof has collapsed injuring several employees or that the fire
marshal has declared the warehouse to be a crime scene and the data center is unreachable until the
investigation is completed in two days.

Injects, like the scenario, must make sense in the given situation and may also include good news,
such as workers missing from the warehouse fire are safe and have been found nearby. Unplanned injects
may be made during the exercise if a team is stumped. Rather than end their portion of the test, state an
assumption as fact. For example, the Exercise Coordinator could state that, “The data center fire was
concentrated in the print room and no servers were damaged.”
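
Predefined injects and compressed exercise time can be written down as a simple timeline before the exercise. The sketch below is illustrative only: the inject messages paraphrase the warehouse-fire example, the minute marks are invented, and it assumes the 10-minutes-equals-one-hour compression mentioned earlier.

```python
from dataclasses import dataclass

# Time compression assumed here: 10 exercise minutes = 1 scenario hour.
EXERCISE_MINUTES_PER_SCENARIO_HOUR = 10


@dataclass(frozen=True)
class Inject:
    exercise_minute: int  # when the coordinator reads it out
    message: str


# Injects paraphrased from the warehouse-fire example; the timings are invented.
INJECTS = [
    Inject(0, "Alarm monitoring service reports a fire in the warehouse."),
    Inject(20, "Warehouse roof has collapsed, injuring several employees."),
    Inject(35, "Fire marshal declares the warehouse a crime scene for two days."),
    Inject(50, "Workers missing from the warehouse have been found safe nearby."),
]


def scenario_hours(exercise_minute):
    """Convert elapsed exercise minutes into elapsed scenario hours."""
    return exercise_minute / EXERCISE_MINUTES_PER_SCENARIO_HOUR


def injects_due(injects, exercise_minute):
    """Return injects that should have been delivered by this point."""
    return [i for i in injects if i.exercise_minute <= exercise_minute]


for inject in injects_due(INJECTS, exercise_minute=35):
    print(f"T+{scenario_hours(inject.exercise_minute):.1f}h  {inject.message}")
```

Unplanned injects can still be added during the exercise. A prepared list like this simply keeps the planned ones in order and tied to the scenario clock.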

Try to insert some humor into a tragic situation. For example, state that a fire was started by a
lightning strike in the boss’s office, that the Board of Health condemned the food vending
machines, or that no one can enter the building.

TYPES OF EXERCISES

There are various types of recovery plan tests. They range from tests that are easy to set up and quick to
complete to full simulations requiring months of planning. Each plan starts with Standalone Testing.
Unfortunately, many companies never test their plans in a full simulation.

Standalone Testing

Standalone Testing is the first level of testing for all recovery plans. It is also required when a significant
change has been made to the IT system or business process.

Standalone Testing exercises individual IT components or business processes to estimate the time
required for recovery. It provides the first level of plan error checking. The scenario of a Standalone Test
is to recover an individual IT component or business process from nothing. (It assumes the process or IT
system has been destroyed or rendered totally unusable.) A recovery plan is written so that someone other
than the primary support person can understand and follow it. It also familiarizes at least one other person
with the plan’s contents.

Recovering business processes often requires that many plans work together. Standalone
Testing examines the individual building blocks of the overall effort. Later tests examine
the interactions and interfaces among the individual plans.

The result of a Standalone Test should be a recovery plan that is in the company standard format. This
ensures that anyone unfamiliar with this process can find the same type of information in the same place.
The plan should be approved as complete and accurate by the plan’s author and plan reviewer. The plan’s
author also provides a time estimate as to how long the recovery plan should take to complete.

PREPARATION Schedule a conference room away from an office’s distractions. If the document is
large, break it into one-hour meetings to keep everyone fresh.

MATERIALS TO PROVIDE A copy of the standard plan format and copies of the plan for each
participant.

TESTING TEAM Consists of the document author and a reviewer (may be the backup support person or
the process’s owner).

THE MEETING AGENDA The meeting agenda should be as follows:
Review ground rules.

This is a draft document and anyone can suggest changes.

Suggesting a change is not a personal attack.

All comments are focused on the document and not on the author.
Review document for:

Proper format.

Content.

Clarity.
Estimate time required to execute each step in this plan and the plan overall.
Set time for review of changes suggested by this test.

FOLLOW UP Continue Standalone plan reviews until the document conforms to the company standard
format and the participants believe that the document reflects the proper recovery process.

Integration Testing

Integration testing (or Integrated System Testing) exercises multiple plans in a logical group. This might
be an IT system with its interdependent components (a database server, an application, special network
connections, unique data collection devices, etc.).

The purpose of an Integration Test is to ensure that the data exchanges and communication
requirements among individual components have been addressed. These interdependent components
require each other to provide the desired business function. This type of test is normally used by IT
systems. For example, the Order Entry system may require access to multiple databases, files, and
applications. To test the recovery of the Order Entry system, all of those supporting components must be
recovered first.

The ideal place to execute this plan is at the IT recovery hot site that the company will use in a crisis.
If that is not available, then use equipment that is as close in performance and configuration to the hot site
as practical. This will help to identify differences between the hot site and the required data center
configuration.

Another advantage to using the hot site is to provide an actual recovery time for validating the
recovery time objective (RTO). This result is added to other test results to see what the company can
realistically expect for a recovery time, given the current technology.

Integration testing is usually conducted by the backup support person(s) for each recovery plan. The
Business Continuity Manager observes the test and records the actual time required to recover the IT
system or business process.

In most IT recoveries, the server administrator builds the basic infrastructure and then
provides it to the recovery team. For example, an operating system is loaded onto “blank”
servers and then turned over to the recovery team. The time to prepare these devices is
part of the RTO calculation.
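
As a rough illustration of that point, the sketch below adds the infrastructure preparation time to the timed recovery steps and compares the total with the RTO. The step names, the durations, and the 8-hour RTO are hypothetical values chosen for the example, not figures from an actual test.

```python
# Hypothetical timings (in minutes) recorded during an Integration Test.
infrastructure_prep = {
    "Load operating system on blank servers": 90,
    "Configure DNS and domain controller": 60,
}
recovery_steps = {
    "Restore Order Entry database": 120,
    "Install Order Entry application": 45,
    "Run application test scripts": 30,
}


def total_recovery_minutes(prep, steps):
    """Infrastructure preparation counts toward the recovery time."""
    return sum(prep.values()) + sum(steps.values())


def meets_rto(prep, steps, rto_minutes):
    """True when the measured recovery fits within the RTO."""
    return total_recovery_minutes(prep, steps) <= rto_minutes


rto_minutes = 8 * 60  # assumed RTO of 8 hours
total = total_recovery_minutes(infrastructure_prep, recovery_steps)
print(f"Measured recovery: {total} min; RTO: {rto_minutes} min")
print("RTO met" if meets_rto(infrastructure_prep, recovery_steps, rto_minutes) else "RTO missed")
```

Leaving the preparation time out of the sum would understate the recovery time by the full duration of the server build.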

PREPARATION Schedule time at the hot site or the use of equipment in the data center.

TESTING TEAM The testing team should consist of the following people:
Backup support person(s) for each device to be recovered.
Network support technician to isolate the test network from production and to load the DNS server.
System administrator to load the operating systems and establish the domain controller.
Applications support team to load and test their systems.
Optionally, a database administrator.
Reviewer (IT Manager or Business Continuity Manager).
Business process owner to validate a good recovery.
Business continuity program sponsor.

MATERIALS TO PROVIDE These include the following:
Approval from the business continuity program sponsor for the timing of the test and the required funding.
Nonproduction (spare) IT equipment, based on the list of required equipment as detailed in the
recovery plans.

Copies of the recovery plans to be tested.

THE TEST PROGRAM The program should include the following:
Review ground rules.

Write down all corrections as they are encountered.

Record the amount of time required to complete each step in the plan and the total for the plan.
This may isolate steps that take a long time as targets for improving the speed of the recovery.

Given the amount of time required to set up an Integration Test, if time permits, rerun it after the
plans have been corrected.

The support team (network, database, systems administrators) remains close at hand to address
problems after the applications recovery begins.

Prepare for the test.

Set up a network that is isolated from the world since some applications may have embedded IP
addressing.

Conduct the test.

Set up the infrastructure.

Set up required infrastructure, such as DNS and domain controllers.

Load a basic operating system on the recovered servers.

Provide adequate servers and disk storage space.

Using the recovery plans, follow each step.

Note all corrections.

Using these corrections, restart the test from the beginning.

Once the system is ready:

Applications support runs test scripts to ensure the system has been properly recovered.

A business process owner validates that it appears to function correctly.
Review the results.

Update any plans as required.

Determine a realistic recovery time for this process.
Conduct the after-action review.

Identify plan improvement needs.

Identify areas to research to reduce the recovery time.

Identify improvements in the testing process.

Business Continuity Manager writes a report of test results to the program sponsor.

Set time for review of changes.
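
The step timings recorded during the test can be turned into a short list of improvement targets. The following is a minimal sketch, assuming the recorder captured one duration per step; the step names, the durations, and the 25 percent threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StepTiming:
    step: str
    minutes: float


def slowest_steps(timings, share=0.25):
    """Return steps that each consume at least `share` of the total time, slowest first."""
    total = sum(t.minutes for t in timings)
    if total == 0:
        return []
    flagged = [t for t in timings if t.minutes / total >= share]
    return sorted(flagged, key=lambda t: t.minutes, reverse=True)


# Hypothetical timings from one Integration Test run.
timings = [
    StepTiming("Set up isolated test network", 40),
    StepTiming("Load operating system", 60),
    StepTiming("Restore database from backup", 180),
    StepTiming("Run application test scripts", 20),
]

for t in slowest_steps(timings):
    print(f"{t.step}: {t.minutes} min")
```

A relative threshold is only one way to pick targets; a team might just as reasonably review the three longest steps regardless of their share.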

FOLLOW UP Collect all plan corrections and reissue updated documents. If a plan required significant
changes, then it should be reviewed in a Standalone Test before using it in another Integration test.

Just as recovery plans are exercised, so is your ability to plan and conduct a test. After the
plans are updated, ask the participants to review the planning and testing processes for
ways to improve them.

Walk-Through Testing

The purpose of a Walk-Through Test is to test a logical grouping of recovery plans at one time. It is
similar to an Integration test except that no equipment is involved and the recovery is theoretical. A
Walk-Through recovery plan exercise familiarizes recovery team members with their roles. It is useful for
rehearsing for an Integration test, for testing when an Integration test is not practical, and for reviewing
business process recovery plans.

Integration testing is valuable since it involves an actual recovery. A Walk-Through also provides
many benefits, but without the expense of actually using equipment.

Walk-Through recovery plan testing is conducted in a conference room. Participants explain their
actions as they read through the recovery plan. The goals are to improve plan clarity, to identify gaps in
the plans, and to ensure that all interfaces among individual plans are addressed. These interfaces may be
the passing of data from one IT component to another or the passing of a document between workers.

A Walk-Through Test does not provide a real RTO for the collective plans. However, estimates may
be provided by the recovery team members.

PREPARATION Schedule time in a conference room.

TESTING TEAM The testing team should consist of the following people:
Backup support person(s) for each plan to be recovered.
Exercise Coordinator (IT Manager or Business Continuity Manager).
Business process owner.
Exercise recorder to capture action, decisions, and assumptions as they occur.

MATERIALS TO PROVIDE Include copies of the recovery plans to be tested.

THE TEST PROGRAM The program should include the following:
Conduct the test, following the recovery plans.

Set up the infrastructure.
Set up required infrastructure, such as DNS and domain controllers.
Load a basic operating system on the recovered servers.

Provide adequate servers and disk storage space.

Note all corrections.
Review the results.

Update plans as required.

Team members are now more familiar with their recovery roles.
Conduct the after-action review.

Identify plan improvements.

Identify areas to research to reduce the recovery time.

Identify improvements in the testing process.
Business Continuity Manager writes a report of the test results to the program sponsor.
Set time for review of changes.

FOLLOW UP Collect all plan corrections and reissue updated documents. If a plan requires significant
changes, then it should be reviewed in a Standalone Test before using it in another Walk-Through
exercise.

Simulations

Up to this point, all tests have been based on recovering a business process or IT system from scratch.
The reasoning is that if the plan has adequate information to recover from nothing, then it will have the
information necessary to recover from a partial failure. However, it is this partial failure that is more
common.

A simulation test brings all of the plans together. In a real crisis, rarely is the recovery isolated to a
single plan. IT systems recover the data center, work area recovery plans recover office processes, and the
supporting plans for Human Resources, Corporate Communications, Facilities, Security, and a range of
other departments are all in play. A simulation not only invokes these many plans but forces them to work
together toward the common goal.

A simulation test begins with a scenario (such as a partial roof collapse from a severe storm or a
person entering the building with a gun). In both of these examples, most of the facility is intact yet may
be temporarily disabled.

Simulation adds to plan exercises the elements of uncertainty, time pressure, and chaos. No situation
comes with complete and verified information, yet managers must react correctly to minimize damage to
the company. Chaos comes from inaccurate and incomplete information, yet decisions must still be made.
Unlike the smooth pace of a Walk-Through Test, a simulation adds an element of chaos in which events
surge forward whether anyone is ready for them or not.

Simulation tests can be simplistic, such as Table-Top exercises. They can also be complex (and
expensive), such as relocating the entire data center or work area to the recovery site and running the
business from there. Most simulations only address a portion of the company, usually a group of related
business processes. This keeps the recovery team to a manageable size and the recovery exercise focused
on a set of plans.

Make it fun! Send out pre-exercise announcements as if they were news elements related to
the scenario (clearly marked as exercise notices for training only). At the beginning of the
exercise, state the goals to instill a sense of purpose in the group. At the end of the exercise,
restate the goals and ask the group how well it measured up. After all, these people gave up
some hours of their lives, so show them how important it was to the company!

Simulations can also add the dimension of external agencies to the recovery. Firefighters, reporters,
police officers, and other emergency groups can be invited to keep the chaos lively while educating
participants about each agency’s role in a crisis. The Exercise Coordinator may also include employees at
other company sites via conference call.

The purpose of the exercise is to validate that the plans are workable and flexible enough to meet any
challenge. Participants will depend on the plans to identify actions to take during an incident. (However,
just as in a real crisis, they are free to deviate from them.) Notes will be collected, and the plans will be
updated as a result of the exercise. Participants will note areas for improvement such as corrections,
clarity, content, and additional information.

There is no “right” answer to these exercises. The goal is to debug the plans and seek ways to make
them more efficient without losing their flexibility (since we never know what sorts of things will arise).
“Rigging the test” to ensure success should not be done. Conducting the exercise at the recovery site will
minimize distractions from electronic interruptions.

About one week before the exercise, verify that participants or their alternates are available. This is
also a good time to rehearse the exercise with the testing team and to handle minor administrative tasks
such as making copies of the plans and preparing tent cards that identify participants and their roles.

People will react in different ways. If someone on the team will be declared injured or
killed during the exercise, ensure that they agree to this prior to the start of the exercise.

Table-Top Testing

A Table-Top Test is a simulated emergency without the equipment. It exercises decision making:
analysis, communication, and collaboration are all part of the plan. Table-Top exercises test an incident
management plan using a minimum of resources. The size of the incident is not important; what matters is
the fog within which early decisions must be made until the situation becomes clearer.

A Table-Top exercise tests a logical grouping of recovery plans with a realistic disaster scenario. One
or more conference rooms are used to control the recovery. The primary difference between a
Walk-Through Test and a Table-Top Test is that a Table-Top Test uses a scenario, mid-exercise problem
injections, and, often, external resources.

A Table-Top exercise is much less disruptive to a business than a full simulation. A Table-Top
exercise typically runs for a half day, whereas a full simulation can run for several days.

The goals are to train the team members, identify omissions in the plans, and raise awareness of the
many dimensions of recovery planning. Each participant uses the recovery plan for guidance but is free to
choose alternative actions to restore service promptly. The goals are to improve plan clarity, to identify
gaps in the plans, and to ensure that all interfaces between individual plans are addressed. These
interfaces may be the passing of data from one IT component to another or the passing of a document
between workers.

A Table-Top Test does not provide a real RTO for the collective plans. However, estimates may be
provided by the recovery team members.

The exercise coordinator keeps the group focused on the test. It works best if someone else
is designated as the exercise recorder. The recorder writes down the events, decisions, and
reactions during the exercise, freeing the exercise coordinator to work with the team. These
notes are valuable later when considering ways to improve the recovery plans and the
Table-Top exercise process.

PREPARATION Schedule a conference room.

TESTING TEAM The testing team should consist of the following people:
Backup support person(s) for each plan to be recovered.
Exercise Coordinator (IT Manager or Business Continuity Manager).
Exercise recorder.

Business process owners.
External resources, such as news reporters, firefighters or police officers.

MATERIALS TO PROVIDE Materials to provide include:
Copies of the recovery plans to be tested.
Scenario and incident “injects.”
A clock projected by a PC onto a whiteboard.

THE TEST PROGRAM The program should include the following:
Explain to participants the rules for the exercise:

Time is essential; decisions must be made with incomplete information.

Everyone must help someone if asked.

Everyone takes notes for the after-action critique.

No outside interruptions are permitted—cell phones off.

If an issue is bogging down the exercise, the Exercise Coordinator can announce a decision for the
issue or set it aside for future discussion.

Introduce each of the team members and explain their role in the recovery.
State the exercise goals (familiarize the team with the plan, gather RTO data, improve the plans, etc.).
Introduce the scenario to the team.

Clarify group questions about the situation.

Ensure everyone has copies of the appropriate plans.
Conduct the exercise.

Select several of the primary recovery team members to step out of the exercise; their backup
person must continue the recovery.

Inject additional information and complexity into the exercise every 10 minutes.

End the exercise at a predetermined time, or when the company is restored to full service.
Conduct the after-action review.

Identify plan improvements.

Identify areas to research to reduce the recovery time.

Identify improvements in the testing process.
Business Continuity Manager submits a written report of the test result to the program sponsor.

FOLLOW UP Collect all plan corrections and reissue updated documents.
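
The 10-minute inject cadence in the test program above can be prepared ahead of time as a simple release schedule. The following sketch assumes the Exercise Coordinator has a list of injects ready before the exercise. The start time, the inject texts, and the shortened 35-minute window are hypothetical; a real Table-Top exercise would typically run for about half a day.

```python
from datetime import datetime, timedelta


def inject_schedule(start, end, injects, interval_minutes=10):
    """Pair each inject with a release time, one every `interval_minutes`,
    dropping any that would fall at or after the planned end of the exercise."""
    schedule = []
    for i, text in enumerate(injects, start=1):
        release = start + timedelta(minutes=i * interval_minutes)
        if release >= end:
            break
        schedule.append((release, text))
    return schedule


# Hypothetical exercise window and injects.
start = datetime(2024, 1, 15, 9, 0)
end = start + timedelta(minutes=35)
injects = [
    "Local news reporter calls asking for a statement",
    "Backup generator reports low fuel",
    "Primary database administrator is unreachable",
    "Fire department restricts access to the second floor",
]

for release, text in inject_schedule(start, end, injects):
    print(release.strftime("%H:%M"), text)
```

Printing the schedule and handing it to the exercise recorder also gives the after-action review a reference for when each complication was introduced.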

Two types of plans are best tested as Table-Top exercises. The first is a crisis management plan, which is
easily tested in a conference room because the types of actions required can be discussed rather
than acted out. The second is a pandemic plan: a pandemic can range over 18 months, so a full
simulation is not practical. Both can be conducted in the Command Center for additional realism.

Disaster Simulation

The purpose of a Disaster Simulation is to test a logical grouping of recovery plans with a realistic
scenario. Essentially, a Disaster Simulation is a simulated emergency that includes the people and
equipment necessary to recover IT equipment or a wide range of business processes. Running a
simulation is expensive in time and equipment, so it should be approved far in advance. A simulation may
be disruptive to a company’s normal business and should be planned for the company’s slow time of the
year. It may run for several days.

The goals are to train the team members, identify omissions in the plans, and raise awareness of the
many dimensions of recovery planning. A key advantage of a simulation is that it provides the actual time
required to recover a process. It also adds in the pressure of chaos to the recovery.

The Exercise Coordinator keeps the group focused on the test. Appoint an exercise recorder for each
recovery team. That person writes down the events, decisions, and reactions during the exercise. These
notes are valuable later when considering ways to improve the recovery plans and the exercise process.

Always preannounce a simulation; there should be no surprise alerts. Before engaging outside
participants, the DR core team should perform the simulation exercise, as a dress rehearsal to “polish” the
sequence of events.

Real tests provide the most realistic results. Avoid the temptation of the IT team to make a
“special” set of backup media just for the test. The true recovery time comes from sifting
through the many backup tapes to find the files that you need.

A simulation begins with the initial incident alert by the night watchman or by an alarm that
automatically alerts a manager. Full-scale testing involves pulling the plug on some part of the operation
and letting the disaster recovery plan kick in. For obvious reasons, this is rarely done.

Simulation tests should be conducted at the recovery site at least once per year. Recovery plans are
used as guidelines, but participants are free to deviate from them. The goals are to improve plan clarity, to
identify gaps in the plans, and to ensure that all interfaces between individual plans are addressed. These
interfaces may be the passing of data from one IT component to another or the passing of a document
between workers.

PREPARATION Schedule a conference room. Ensure the participants understand the exercise is a
rehearsal and not a test. A rehearsal allows people to play out their actions; a test implies pass or fail. For
each recovery team:

Create a log sheet to document the communication among recovery teams (see Form 13-1 on the
CD).

Create an observation log (see Form 13-2 on the CD).

TESTING TEAM The testing team should consist of the following people:
Backup support person(s) for each plan to be recovered.
Exercise Coordinator (IT Manager or Business Continuity Manager).
Business process owners.
Exercise recorder.
External resources, such as news reporters, firefighters or police officers.

MATERIALS TO PROVIDE Materials to provide include:
Copies of the recovery plans to be tested.
Scenario and incident “injects.”
A clock projected by a PC onto a whiteboard.

FIGURE 13-2: RTO Hour-by-Hour Recovery Plan for a Data Center.

THE TEST PROGRAM The program should include the following:
Explain to participants the rules for the exercise:
Time is essential; decisions must be made with incomplete information.
Everyone must help someone if asked.

Everyone should take notes for the after-action critique.

No outside interruptions are permitted—cell phones off.

If an issue is bogging down the exercise, the exercise coordinator can announce a decision for the
issue or can set it aside for future discussion.

Introduce each of the team members and explain their role in the recovery.
Introduce the scenario to the team.

Clarify group questions about the situation.
Conduct the exercise.

Select several of the primary recovery team members to step out; their backup person must
continue the recovery.

Inject additional information and complexity into the exercise every 10 minutes.
End the exercise at a predetermined time, or when the company is restored to full service.
Conduct the after-action review.
Identify plan improvements.
Identify areas to research to reduce the recovery time.
Identify improvements in the testing process.

Collect RTO metrics.
The Business Continuity Manager gives a report of the test results to the program sponsor.
Set time for review of changes.

FOLLOW UP Collect all plan corrections and reissue updated documents. If a plan required significant
changes, then it should be reviewed in a Standalone Test before using it in another Integration test. In
addition, update the RTO Hour-by-Hour Recovery Plan.

SOMETIMES NATURE TESTS THE PLANS FOR YOU

There are numerous incidents that pop up from time to time that are not significant emergencies but that
provide an opportunity to test parts of a plan. For example, if there is a power outage at work, use the
plans to minimize the disruption. Do the same for a loss of data communications, a tornado warning, or a
snowstorm emergency. Relocating a business process or a significant portion of the data center is similar
to a disaster.

Another opportunity that can trigger a test plan is facility construction. An example of this is if the
electricity to a building must be turned off for work on the power main. Use the recovery plans to locate
and turn off all of the equipment, noting anything found that was not in the plan. When the work is over,
use the plans to turn back on all of the equipment. Then test each critical system to ensure it is
operational. Following the plans for restarting equipment may uncover equipment tucked away in offices
or closets that is not in the plan.

Relocation to a new facility is a great opportunity to completely test your disaster recovery plan.
Many of the activities necessary during relocation are the same as those required in a disaster: new
machines may need to be purchased, servers are down for some period of time, new communications
infrastructure needs to be built, data need to be restored, etc. In fact, if a relocation project is not done
properly, it may turn into a real disaster!

Whenever such a problem occurs:
Focus people on referring to their recovery plans. The value of a plan is to reduce chaos at the
beginning of a crisis. Plans are no good if no one uses them.

Begin recording what has occurred and people’s reactions to it. These notes are used to improve the
plans (and never to criticize anyone).

Conduct an after-action review the next day to gather everyone’s perspective.
A plan that is used for a real event has been tested just as surely as a scheduled exercise. Mark that
plan as tested for the quarter!

Whenever a significantly disruptive incident occurs, such as a power outage, loss of external network,
or a computer virus outbreak, begin taking notes during the event. These notes should be a narrative of
times and actions taken—who did what, when, and the result. See if anyone thought to break out the
appropriate recovery plans and follow them.

Within two working days after the incident, convene a group to conduct an after-action review. This
review is intended to capture everyone’s perspective of the incident to improve plans for future use.

Debriefing Participants Using an After-Action Review

Whenever an incident occurs (such as a power outage, fire in the computer room, etc.) that
is covered by a recovery plan (or should have been covered by a plan), conduct an
after-action review on the next work day after the recovery. This is an open discussion of
the event and how to improve future reaction.

Someone is appointed as the review coordinator (usually the Business Continuity
Manager). It is helpful if someone else records the discussions, so that the review
coordinator is free to focus on the discussion.

What happened – it is important to gain agreement on what occurred, as further
discussion is based on this. Each person will define the problem from his or her own
perspective. Sometimes agreement on a point takes a lot of discussion.

What should have happened – this is where positive things are listed, such as the
recovery plan was easy to find, etc.

What went well – not all is doom and gloom. Now that the crisis has ended, take credit
for the things that went well. Acknowledge those people who contributed to the recovery.

What did not go well – this is the substance of the review. Once you list what did not
work out, you can move to the last step. Take care never to personalize the discussions.
Focus on the action and not on a person. Otherwise, people become defensive and no one
will participate in the discussion.

What will be done differently in the future – list the solution to each item identified in
the previous step. Assign action items to specific people each with a due date.

An example after-action report for a power outage:

What happened – The power for the building went out and everyone stopped working. The
office people flooded out to the factory because there was light there through the windows.
People milled around outside of the data center to see if they could help. The emergency
lights failed in most of the offices and everyone was in the dark.

What should have happened – The emergency lights should have worked. Everyone
should have known where to meet for further instructions.

What went well – Nobody panicked. The UPS system kept the data center running until
power was restored.

What did not go well – No one knew what to do. Different people were shouting out
different directions, trying to help but really confusing everyone. We could not find the
system administrators in case the servers needed to be turned off.

What will be done differently in the future – We will identify assembly areas for
everyone. Supervisors will be responsible for finding out what has occurred and passing it
on to their people. The emergency lights will be checked monthly.
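
The five headings of the review, together with the rule that every action item gets an owner and a due date, map naturally onto a small record. The sketch below is one possible way to capture the power outage example; the class names, the owner roles, and the dates are hypothetical additions and are not part of the example report.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ActionItem:
    description: str
    owner: str  # hypothetical role name
    due: date   # hypothetical due date


@dataclass
class AfterActionReview:
    incident: str
    what_happened: list[str] = field(default_factory=list)
    what_should_have_happened: list[str] = field(default_factory=list)
    what_went_well: list[str] = field(default_factory=list)
    what_did_not_go_well: list[str] = field(default_factory=list)
    action_items: list[ActionItem] = field(default_factory=list)

    def overdue(self, today: date) -> list[ActionItem]:
        """Action items whose due date has already passed."""
        return [item for item in self.action_items if item.due < today]


review = AfterActionReview(
    incident="Building power outage",
    what_happened=["Power went out and work stopped",
                   "Emergency lights failed in most offices"],
    what_should_have_happened=["Emergency lights work",
                               "Everyone knows where to meet"],
    what_went_well=["Nobody panicked", "UPS kept the data center running"],
    what_did_not_go_well=["No one knew what to do",
                          "System administrators could not be found"],
    action_items=[
        ActionItem("Identify assembly areas for everyone",
                   "Facilities manager", date(2024, 2, 1)),
        ActionItem("Check emergency lights monthly",
                   "Maintenance supervisor", date(2024, 2, 15)),
    ],
)

for item in review.overdue(date(2024, 2, 10)):
    print(f"OVERDUE: {item.description} ({item.owner}, due {item.due})")
```

Keeping the action items as structured data makes it easy to bring unfinished items forward to the next review.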

DEMONSTRATING RTO CAPABILITY

During the Business Impact Analysis, an RTO was established by the company. It was selected based on
the impact to the company, not on what the company was capable of doing. Testing recovery plans and
recording the times is where the company proves it can meet the RTO. If it cannot, something must
change so that it can.

Some results are obvious without a test. A company that expects to recover from tape backup will need days. If the
organization requires recovery in a few hours, then stop the testing and rework the data storage strategy.
However, if a company has a reasonable strategy based on its RTO, then only testing can prove if this is
achievable or not.

Figure 13-2 shows the first page of a possible RTO Hour-by-Hour Recovery Plan for a data center. (A
similar chart should be built for work area recovery.) This chart collects the recovery times from plan
exercises. In the IT world, most recoveries must wait until the basic infrastructure is in place (network,
firewalls, DNS, domain controllers, etc.). The plan for recovering each infrastructure component is placed
in sequence at the top of the chart. Below that are the applications, databases, etc., that must be recovered
in the appropriate sequence. For example, a LAN recovery must be in place before the domain controller
can be recovered.

Use this basic plan as an outline for building your own recovery timeline. As you enter the times from
actual system recoveries, you can prove or disprove the company’s ability to meet its desired RTO.
Actual recovery times are always preferred to estimated values.

If the RTO is not achievable or if you wish to shorten it, use this plan to identify places to make
changes. Look for tasks that could run in parallel instead of sequentially. Look for the ones that take a
long time and seek ways to shorten them (use faster technology or redesign the process). Of course, the
best way to reduce the time required is to eliminate noncritical steps.

During a recovery, company executives can use the RTO Hour-by-Hour Recovery Plan to follow
along with the recovery’s progress. Based on where the team is in the recovery, they can look at the times
and see how much longer before (for example) the e-mail system should be available, or that the billing
system should be operational.
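
The timeline arithmetic behind an hour-by-hour recovery plan can be sketched in a few lines of Python. This is only an illustration, not a reproduction of Figure 13-2: the task names, durations, dependencies, and the 24-hour RTO are all hypothetical values invented for the example. Each task starts as soon as everything it depends on has finished, so tasks with no dependency between them are treated as running in parallel.

```python
from functools import lru_cache

# Hypothetical recovery tasks: name -> (duration in hours, prerequisite tasks).
# Durations should come from timed exercises rather than estimates where possible.
TASKS = {
    "network": (2.0, []),
    "firewalls": (1.0, ["network"]),
    "dns": (0.5, ["network"]),
    "domain_controller": (1.5, ["network", "dns"]),
    "database": (4.0, ["domain_controller"]),
    "email": (3.0, ["domain_controller", "firewalls"]),
    "billing": (2.0, ["database"]),
}


def finish_times(tasks):
    """Return the earliest finish hour of each task, counted from the start of recovery.

    Assumes the dependency graph has no cycles.
    """

    @lru_cache(maxsize=None)
    def finish(name):
        duration, prerequisites = tasks[name]
        # A task can start only when its slowest prerequisite is done.
        start = max((finish(p) for p in prerequisites), default=0.0)
        return start + duration

    return {name: finish(name) for name in tasks}


def meets_rto(tasks, rto_hours):
    """True when the last task finishes within the RTO."""
    return max(finish_times(tasks).values()) <= rto_hours


if __name__ == "__main__":
    times = finish_times(TASKS)
    for name, hour in sorted(times.items(), key=lambda item: item[1]):
        print(f"{name:18s} available at hour {hour:4.1f}")
    print("Meets 24-hour RTO:", meets_rto(TASKS, 24.0))
```

With these made-up numbers, the billing system is the last to come back, so the chain leading to it (network, DNS, domain controller, database, billing) is where shortening a task or removing a step would reduce the overall recovery time.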

CONCLUSION

No plan can be called complete until it has been tested. Beyond the initial testing, ongoing testing is
critical to ensure that the plan is kept up to date. As the organization grows and evolves, the plan must be
updated to incorporate the necessary changes. Periodic testing validates these changes and keeps
everyone aware of their responsibilities when a disaster strikes.

There are different types of tests from simple one-on-one Standalone Tests to full simulated disasters.
Tests should follow a progression from simple tests to complex. Trying to jump too quickly into
simulations will result in people sitting around while muddled plans are worked through. Participants will
conclude that the tests themselves are the disaster.

The people participating in the tests are a valuable source of information. After each exercise,
promptly gather their ideas in an after-action meeting. They should advise the Exercise Coordinator of
ways to improve the plans, communications among the testing teams, and everything that can speed a
recovery. In a separate meeting, ask them to critique the testing process. This will improve their
participation and cooperation in the future, as well as make your tests run smoother.

There are times when company activities or Mother Nature tests your plans for you. Immediately
focus everyone on using their plans. After the event has passed, pull everyone together for an after-action
meeting to collect their ideas. (This is also a great time to slip in a plug for the value of plans when
disaster strikes.)

The outcome of each test should be used to update an RTO Hour-by-Hour Recovery Plan. It is one
thing for a company to declare an RTO, but that chart shows whether the company is likely to meet it.

CHAPTER 14

ELECTRICAL SERVICE

Keeping the Juice Flowing

Nothing shocks me. I’m a scientist.
—Harrison Ford, as Indiana Jones

INTRODUCTION

This chapter is provided so you will have a basic understanding of electrical power support for your
critical equipment. Use the information as a background when talking to your facility electrical engineer
and your UPS supplier. While you’ll not want to work on high-voltage circuits or equipment yourself
(leave that to trained professionals), an overall knowledge of how your electrical systems work will help
you to write a better plan.

ELECTRICAL SERVICE

Imagine a business in which if you create too much product it is immediately lost forever. A business
where people only pay for what they use, but demand that all they want be instantly available at any time.
A product they use in varying amounts throughout the day. A product that requires an immense capital
investment but is sold in pennies per unit. Welcome to the world of electricity. Electrical service is so
reliable, so common, that people take it for granted that it will always be there whenever they want it.
Electricity is an essential part of our everyday existence. Few businesses could run for a single minute
without it.

Side-stepping the issue of the huge effort of the electric company to ensure uninterrupted service, let’s
consider the impact of electricity on our business. Without a reliable, clean source of electric power, all
business stops. We have all experienced an electrical blackout at some point. When we add together how
important electricity is and that we believe a blackout is likely to occur again, we meet all the criteria for
requiring a disaster recovery plan. Because we cannot do without it and there are economically feasible
disaster containment steps we can take, a mitigation plan must also be drafted.

In addition to recovering from an outage, our mitigation plan will reduce the likelihood of losing
power to critical machinery. There are many other problems with electrical power beyond whether we
have it or not. Therefore our mitigation plan must address ensuring a clean as well as a reliable electrical
supply.

In the case of electricity, we need a process that:
Monitors the line and filters out spikes.
Provides additional power in case of a brownout or partial outage.
Provides sufficient temporary power in case of a total outage.
Makes the transition from normal power supply to emergency power supply without loss of service to
critical devices.

Whatever power support plan you select, keep in mind that it must be tested periodically. With luck,
you will be able to schedule the tests so that a power failure will have minimal impact. With a touch of
bad luck, nature will schedule the power outages for you and, again, at that time you will know how well
your power support plan works.

RISK ASSESSMENT

What sorts of problems are we protecting against? In an ideal situation, North American electricity is
provided at 120 volts, 60 cycles per second, alternating current. If we viewed this on an oscilloscope, the
60 cycles would display a sine wave. There are many variations from “normal” that will play havoc with
your reliable power connection.

The most common electrical power problem is voltage sag, or what is more commonly known as a
“brownout.” Generally speaking, this is a reduced voltage on the power line; you can see a brownout
when the room lights dim. Brownouts can cause some computer systems to fail and can occasionally
damage hardware by forcing equipment power supplies to work harder just to function.

Sags can be caused by turning on power-hungry equipment. As they begin operation, these power
hogs draw the amount of electricity they need to run from the power grid. This sudden electrical load
causes a momentary dip in the line voltage until the electric company compensates for it. These power
drains might be anything from heavy machinery to the heater under the desk next door. Most sags are of
short duration.

Brownouts can also be caused by utility companies switching between power sources and, in some
situations, it can be an intentional voltage drop by the electrical company to cope with peak load
conditions. An example is the summer of 2001 power crisis in California with its rolling brownouts when
the electrical utility company could not meet peak demand. The demand for electrical power is growing,
but the supply of electricity is not.

Once a voltage sag ends, there is typically a corresponding “spike” of “overvoltage” that can further
damage equipment. Sharp or extended overvoltages can severely damage your electronic systems, which
are not designed to receive and handle large voltage variations.

Another common electrical power problem is a voltage surge, which is a short-term substantial
increase in voltage caused by a rapid drop in power requirements. A typical surge lasts for 3 nanoseconds
or more (anything less is known as a spike). Surges are caused by major power users being switched off.
For that brief moment, the power available for that item is still being supplied but is no longer needed and
must be absorbed by other devices on that line. Examples of large users that may be switched off are
factory equipment, air conditioners, and laser printers.

Surges frequently occur and usually go unnoticed. Some can be handled by the equipment’s power
supply, some must be absorbed by a surge protector, and the rare major surge will wipe out anything in
its path. A common example of a major power surge is a lightning strike that surges down power and
telephone lines into nearby equipment.

Noise is seen as jitters riding along on the 60-cycle sine wave. It is electrical impulses carried along
with the standard current. Noise is created by turning on electrical devices, such as a laser printer, an
electrical appliance in your home, or even fluorescent lights. Did you ever see “snow” on your television
screen when using an electrical appliance? That is an example of line noise sent back into your electrical
system. What you see on the screen is electrical noise riding on your local wiring that is too powerful for
your television to filter out.

Noise is one source of irritating PC problems, such as keyboard lockups, program freezes, data
corruption, and data transfer errors. It can damage your hard drives and increase audio distortion levels.
The worst part of the problem is that in many cases you aren’t even aware of what is happening when it
occurs.

Voltage spikes are an instantaneous increase in line voltage that is also known as a “transient.” A
spike may be caused by a direct lightning strike or from the return of power after a blackout. Think of a
spike as a short-duration surge that lasts for 2 nanoseconds or less. Spikes can be very destructive by
corrupting data and locking up computer systems. If the spike hitting the device is intense, there can be
significant hardware damage.

An electrical blackout is a total failure of electrical power. It is any voltage drop to below 80 volts
since, at that point, most electrical devices cease to function. Blackouts have a wide range of causes from
severe weather to auto accidents to electrical service equipment failures.
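
The categories described in this section can be summarized as a small classification sketch. The 120-volt nominal value, the 80-volt blackout threshold, and the 3-nanosecond boundary between a spike and a surge come from the text. The 10 percent tolerance band around nominal is an assumption added for the example, since the chapter gives no figure for where normal variation ends and a sag or overvoltage begins. Line noise is not modeled here because it rides on the waveform rather than changing the voltage level.

```python
NOMINAL_VOLTS = 120.0        # ideal North American line voltage (from the text)
BLACKOUT_VOLTS = 80.0        # below this, most devices cease to function (from the text)
SURGE_MIN_NANOSECONDS = 3.0  # shorter overvoltages are treated as spikes (from the text)
TOLERANCE = 0.10             # ASSUMPTION: +/-10% of nominal counts as normal


def classify_event(volts, duration_ns):
    """Label one measured line-voltage event using the chapter's categories."""
    low = NOMINAL_VOLTS * (1 - TOLERANCE)
    high = NOMINAL_VOLTS * (1 + TOLERANCE)
    if volts < BLACKOUT_VOLTS:
        return "blackout"
    if volts < low:
        return "sag (brownout)"
    if volts > high:
        # Overvoltage: the duration decides between a surge and a spike.
        return "surge" if duration_ns >= SURGE_MIN_NANOSECONDS else "spike"
    return "normal"


print(classify_event(60, 1e9))   # blackout
print(classify_event(100, 5e8))  # sag (brownout)
print(classify_event(200, 50))   # surge
print(classify_event(200, 1))    # spike
```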

A blackout immediately shuts down your equipment, and it is time consuming to restart most
machinery from a “hard stop.” Even though most blackouts are of a very short duration, from a business
perspective a momentary blackout can be just as serious as a 2-hour outage. In addition, some equipment
may not have been turned off for years—and for good reason. There may be some doubt as to whether it
will even start again!

Blackouts are very damaging to computer systems. Anything residing in memory, whether it is a
spreadsheet or a server’s cache, is immediately lost. Multiply this across the number of people working in
a single building, and you can see the lost time just for one occurrence. Compounding the data loss is the
damage and weakening of your equipment. A further issue is that, from a recovery perspective, there may
be network devices working out of sight in a closet deep within the building. If you don’t know these
exist or where to find them, just the process of restarting equipment can be very troublesome.

When recovering from blackouts, beware of the corresponding power surge that accompanies the
restoration of system power. So when a blackout strikes, turn off your equipment and do not restart it
until a few minutes after power is stabilized.

YOUR BUILDING’S POWER SYSTEM

Many years ago, your company’s delicate data processing equipment was concentrated in the facility’s
data processing center. Often one whole wall was made of glass so everyone could see this technical
marvel in action (hence the term “glass house” for computer rooms). This concentration allowed the
equipment to be supported by a few Uninterruptible Power Supply (UPS) units and power line
conditioning devices.

Now, the primary computing muscle for most companies is spread all over the facility in the form of
PCs and departmental servers. Instead of a carefully conditioned and electrically isolated power feed,
your equipment shares the same power circuits as soda pop machines, copiers, and factory
machinery—all of which add noise and surges to the power line. None of this is good for your computer
systems and network devices!

This variety of computing power creates a need to monitor electrical service to ensure maximum
network and computer capabilities. The emphasis is on network because while personal computers are
located comfortably on office desktops, network hubs, routers, and bridges can be found stuffed in any
closet, rafter, or crawl space, or under a raised floor. This makes the automatic monitoring of electrical
service across your facility an important network management function.

Filtering the electricity as it enters your building is a good practice to minimize external influences.
Sometimes, however, the problems are caused by equipment inside your building; this might be arc
welders, heavy machinery, etc.

BUILDING A POWER PROTECTION STRATEGY

Power protection for business continuity is a five-step process.

1. The first step is to isolate all your electronic equipment from power surges by use of small surge
protectors. Power surges sometimes occur internal to your building. Surge strips are inexpensive and
simple to install.

2. The next layer is line conditioning. A line conditioner smoothes out voltage variation by blocking
high voltages and boosting the line voltage during brownouts. This filtering should always be
applied to the power line before electricity is passed to your UPS.

3. An Uninterruptible Power Supply provides electrical power for a limited time in the event of a
power outage. The UPS battery system can also help to protect against brownouts by boosting low
voltages. A UPS is a critical device for ensuring that key components do not suddenly lose electrical
power.

4. One of the best solutions for companies that cannot tolerate even small power outages is an onsite
electric generator. These backup units instantly start and begin generating electricity to support your
facility. Imagine a hospital’s liability if all their life support equipment suddenly stopped from a lack
of power. This added security is not cheap to install or maintain. Keep in mind that electrical
generators of this sort are internal combustion engines and must conform to local air pollution and
building regulations.

5. The last step, not directly related to electricity, is the physical security of the electrical support
equipment. Few people require access to this equipment, and it must be safeguarded against
sabotage. This equipment is unique in that if someone disabled it, the entire facility could stop with
lost production quickly running into the thousands of dollars per minute. Additionally, the “hard
stop” on machinery and computer servers will result in lost or corrupt data files.

However you secure this equipment, keep in mind the cooling and service clearance requirements for
the UPS system. The UPS control panel must also be available to the disaster containment team in a
crisis.

Surge Protection

One of the most common electrical protection devices is a surge protector power strip. Computer stores
sell these by the bushel. For your dispersed equipment, a surge protector can provide some measure of
inexpensive protection. A typical surge protector contains circuitry that suppresses electrical surges and
spikes. All electronic devices should be attached to electrical power through a surge strip, including all
your PCs, network equipment, printers, and even the television used for demonstrations in the conference
room. Even if your facility’s power is filtered as it enters the building, a direct lightning strike can ride
the wires inside the building and still fry your equipment.

There are many brands of surge suppressors on the market. There are places to save money and places
to lose money. The old saying goes “for want of a nail, the battle was lost.” When protecting your
equipment from power problems, you may not want to skimp too much. Here are some things to look for
when buying a surge strip:

Joule Ratings. A joule rating is a measure of a surge protector’s ability to absorb power surges. A
joule is a unit of energy equal to the work done by a force of 1 Newton through a distance of 1 meter.
Generally, the higher the rating the better. A good surge suppressor will absorb between 200 and 400
joules. If greater protection is needed, look for a surge suppressor rated at least 600 joules.

Surge Amp Ratings. The amount of above-normal amps the surge protector can absorb. As with
joules, the higher the better.

UL 1449 Voltage Let-Through Ratings. Underwriters Laboratories™ tests to determine how much
of a surge is passed by the surge protector on to the equipment it is protecting. The best rating is 330
volts. Any voltage rating less than 330 adds no real benefit. Other ratings of lesser protection are 400
and 500. Be aware that UL 1449 safety testing does not test for endurance.

Response Time. The response time of the surge protector is important. If it blocks high voltages but
is slow to react, then it is of marginal usefulness. Adequate response time is 10 nanoseconds or less.
The lower the number, the better.

All-Wire Protection. A high-quality surge protector guards against surges on the ground wire, as
well as the current-carrying wires.

Telephone Line Support. To protect your modem from power surges riding on the telephone wire.

Clamping Voltage. The voltage at which the surge suppressor begins to work. The lower the rating,
the better. Look for a rating of 400 volts or less.

Some surge protectors provide basic line conditioning against noise on the line. This circuitry can
smooth out minor noise from the lines.
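
The buying criteria listed above lend themselves to a simple checklist. The sketch below compares a product's data-sheet figures against the thresholds given in the text (200 joules as a minimum, 330 volts as the best UL 1449 let-through rating, a response time of 10 nanoseconds or less, and a clamping voltage of 400 volts or less). The class name, field names, and the sample products are hypothetical, and real data sheets may state these figures in different terms.

```python
from dataclasses import dataclass


@dataclass
class SurgeStripSpec:
    """Figures read from a product data sheet (field names are illustrative)."""
    joules: float
    ul1449_let_through_volts: int
    response_time_ns: float
    clamping_volts: float
    all_wire_protection: bool


def shortfalls(spec, min_joules=200):
    """Return the reasons a strip falls short of the thresholds in the text."""
    problems = []
    if spec.joules < min_joules:
        problems.append(f"joule rating below {min_joules}")
    if spec.ul1449_let_through_volts > 330:
        problems.append("UL 1449 let-through rating is not the best (330 volts)")
    if spec.response_time_ns > 10:
        problems.append("response time slower than 10 nanoseconds")
    if spec.clamping_volts > 400:
        problems.append("clamping voltage above 400 volts")
    if not spec.all_wire_protection:
        problems.append("no protection on the ground wire")
    return problems


# Two made-up products for illustration.
good = SurgeStripSpec(600, 330, 1, 330, True)
weak = SurgeStripSpec(150, 500, 25, 500, False)

print(shortfalls(good, min_joules=600))  # an empty list: nothing to flag
for reason in shortfalls(weak):
    print("weak strip:", reason)
```

Passing `min_joules=600` applies the stricter figure the text suggests when greater protection is needed.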

An interesting thing about the ubiquitous surge protector strips is that in addition to protecting
equipment, they make handy extension cords. Over time, these surge strips may silently have absorbed
any number of electrical “attacks” that have eroded or destroyed their ability to protect your equipment.
Most people cannot see this because they still function quite nicely as extension cords.

Surge protectors often have lights to tell you when they are energized or not. High-quality surge
protectors will have an additional light to let you know if their surge-fighting days are over. This light
may say something like “protected,” “surge protection present,” etc. If your surge protector has such a
light and it is no longer lit when running, then you may have a false sense of security that it is functioning
as something other than an extension cord, because that is all it now is!

People traveling around the country using a notebook computer would be well advised to carry and
use high-quality surge protectors when plugging their equipment into the local power grid. When you
consider how big a typical surge suppressor is and how tiny notebook PCs have become, you can guess at
how little room there is for surge suppression circuits in the notebook PC chassis. This is especially
important for international travelers, as the power in some countries is a bit rougher than it is in the
United States.

There are a few more things to consider when using surge protectors. As great a tool as they are, they cannot
stop a nearby lightning strike from damaging your equipment. When a lightning storm approaches,
unplug both the power strip and your network cable from the wall. (This is good advice for any sensitive
electronic equipment that depends solely on a surge protector to defend against lightning.)

Also, never use a ground eliminator with a surge strip (a ground eliminator converts a three-prong
plug into a two-prong plug for use in an older building). Doing so will make it difficult if not impossible
for your surge protector to resist a major line surge.

Line Conditioning

Line conditioning ensures that your equipment always receives the same steady voltage. It also screens
out noise on the power waveform. Line conditioning involves passing your normal electrical service
through filtering circuitry before it is used. Many people don’t realize that the “old reliable” electricity
that magically comes out of the walls is susceptible to a wide range of influences. These influences take
the “pure” 60-cycle alternating current and introduce fluctuations in the voltage or current as it passes
down the line.

These fluctuations can have many sources but one that we especially want to avoid is lightning.
Lightning can cause a localized one-shot power surge to roar down the electrical line into your
equipment. When this happens, equipment power supplies and integrated circuits can quickly melt.

Line conditioning is also advised for analog telephone lines connected to PC modems. The same
lightning strike that induces an electrical charge in your electrical power lines can throw a jolt down your
telephone line. Unfortunately, PC modems have little protection against a power surge, and they are very
easily destroyed. Many surge suppressors now include a telephone line surge suppression jack to filter
these problems out.

A line conditioner should always be installed between a UPS and the electrical power source to
reduce the load on the UPS batteries. Some UPS units include a line filtering capability. Check your
model to see what it is capable of doing. A line conditioner reduces the number of times that the UPS
jumps on and off of battery power (which shortens the life of your batteries).

A line conditioner is an essential component when generating your own emergency power. Use it to
filter the electricity provided by the generator before it is passed on to delicate computer hardware. The
power delivered by a generator is not as clean as that normally delivered by the power company.

Uninterruptible Power Supplies

An Uninterruptible Power Supply provides several essential services and is best used in conjunction with
surge protection and line conditioning equipment. A UPS can help to smooth out noisy power sources and
provide continuous power during electrical sags. Its primary benefit is to provide temporary electrical
power during a blackout. Depending on the model, it may also provide some measure of line conditioning
protection.

Uninterruptible Power Supplies come in three basic types, based on their features.

1. The basic UPS is a “standby” UPS. A standby UPS provides battery backup against power outages
(blackouts and brownouts) and a modest amount of battery-powered voltage correction.

2. The “line interactive” UPS is a step above the basic unit. It provides voltage regulation as well as
battery backup by switching to battery power when line voltages move beyond preset limits. This
type of UPS converts a small trickle of electricity to charge its batteries at all times. When power
fails, the line interactive UPS detects the power loss and switches itself on. A line interactive UPS
has a subsecond switching time from line power to battery power.

3. The third type of UPS is an “online” UPS that sits directly between line power and your equipment.
The online UPS is always providing power to your electric circuits and has a zero transfer time
between the loss of line power and the start of battery power.

UPS BATTERIES. UPS systems provide power during a blackout by drawing on their battery electrical
supply system. Most of these batteries are sealed lead acid batteries. Unlike the batteries found in many
notebook PCs, these batteries do not have a “memory” and should be completely drained as few times as
possible. Depending on how often your UPS draws on its batteries, they should last up to 5 years.
Remember that brownouts and short-duration blackouts all wear on the batteries, so if your local power
fluctuates very much, the life of your batteries will be reduced. The speed at which your UPS batteries
age is also determined by their environment. Extreme heat or cold are not good for your batteries. Refer
to your manufacturer’s guide for the recommended operating temperature range.

As batteries age, their power-generating capability will decrease. Therefore, regular preventative
maintenance is important. Preventative maintenance should include changing the air filters to help keep
the UPS unit cool. At that time, all the batteries should be checked for damage, leaks, or weak cells. You
should also consider a service agreement that includes the replacement of damaged batteries.

UPS systems use “inverters” to convert the DC battery power to AC power. An inverter is electrical
circuitry to change the direct current to alternating current. High-quality UPS systems use a dual inverter
system for smoother power conversion.

UPS “SIZE.” The first question people ask about UPS units is, “How big does it need to be?” This all
depends on several things:

What must be supported? This translates directly into how much electricity must be supplied at a
given point in time.

How many minutes must the battery pack provide this level of support?
Is your area prone to power problems?
Will this UPS be managed remotely through manufacturer-provided software?

UPS units are rated according to the number of volt-amps they can deliver. Volt-amps are different
from watts and you cannot equate the volt-amps provided by a UPS with the watts used by an electronic
device. A typical power factor (watts per volt-amp) for a workstation is 0.6 or 0.7. So if your PC
records a drain of 250 watts, you need a UPS with a 417 volt-amp rating (250 ÷ 0.6, for a 0.6 power factor). Always
be careful to never overload a UPS beyond its rated capacity. Doing so will severely damage it.
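
The watts-to-volt-amps conversion is a single division, shown here as a short Python sketch. The function name is illustrative, and rounding up to the next whole volt-amp is a choice made for this example so that the result never understates the load.

```python
import math


def required_volt_amps(watts, power_factor):
    """Convert a load in watts to the UPS volt-amp rating needed to carry it."""
    if not 0 < power_factor <= 1:
        raise ValueError("power factor must be greater than 0 and at most 1")
    # round() removes floating-point noise before rounding up to a whole volt-amp.
    return math.ceil(round(watts / power_factor, 6))


print(required_volt_amps(250, 0.6))  # 417, matching the example in the text
print(required_volt_amps(250, 0.7))  # 358
```

The same 250-watt PC needs a smaller rating at a 0.7 power factor, which is why the power factor assumed for the equipment matters when sizing.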

Most UPS manufacturers have a software tool for estimating UPS sizes. Where possible, use their
programs to size your UPS. In the absence of that tool:

1. Begin with a list of all equipment for which you will need to provide electricity. This may include
personal computers, monitors, servers, critical printers, network hubs, and telecommunications
equipment—whatever will be supported by the UPS.

2. Determine the wattage ratings on all these devices by checking their nameplates. The numbers may
be expressed as watts. We need the numbers in volt-amps (VA) since that is a more accurate number
for UPS sizing. Multiply the watts by 1.4 (equivalent to a power factor of about 0.7) to get the volt-amp load.

3. If the power usage is provided in amps, then multiply that number by the line voltage (120 volts in
North America and 230 volts for Europe) to get a volt-amp rating.

4. Total the volt-amp requirements for all the supported equipment.

This is the amount of load you need to support. From here you check with the manufacturer for the
size of unit to support this load for the amount of time you select.
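
The four sizing steps above can be sketched as a short Python tally. The 1.4 multiplier and the 120-volt and 230-volt line voltages come from the steps themselves. The equipment list, its ratings, and the function names are hypothetical, and the result is only the load figure to take to the manufacturer, not a UPS model recommendation.

```python
WATTS_TO_VA = 1.4  # multiplier given in step 2
LINE_VOLTS = 120   # North America; use 230 for Europe (step 3)


def device_volt_amps(rating, unit, line_volts=LINE_VOLTS):
    """Convert one nameplate rating to volt-amps."""
    if unit == "VA":
        return rating
    if unit == "W":
        return rating * WATTS_TO_VA
    if unit == "A":
        return rating * line_volts
    raise ValueError(f"unknown unit: {unit}")


def total_load_va(equipment, line_volts=LINE_VOLTS):
    """Step 4: total the volt-amp requirement of every supported device."""
    return sum(device_volt_amps(rating, unit, line_volts) for _, rating, unit in equipment)


# Hypothetical inventory from step 1: (device, nameplate rating, unit)
EQUIPMENT = [
    ("file server", 500, "W"),
    ("monitor", 50, "W"),
    ("network switch", 1.5, "A"),
    ("telephone system", 300, "VA"),
]

print(round(total_load_va(EQUIPMENT)))  # 1250
```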

SWITCHING TO BATTERIES. A UPS uses power line filters to address minor power disturbances, but
its main weapon against a power loss or severe brownout is a near-instantaneous switch to battery power.
This is good for keeping your systems alive but hard on the batteries. If your UPS must often switch to
batteries because of poor power regulation in your area, then your battery life will suffer significantly. As
the batteries rapidly age in this environment, they will not provide protection for the length of time you
may be counting on from your UPS.

Recharging the batteries is another issue. Some UPS systems allow you to choose between a fast
recharge and a slow recharge. The frequency and duration of outages in your area should determine if you
must recharge your batteries as fast as possible or use a gentler slow recharge process. Fast
recharging puts a large drain on your restored power supply.

If you switch to generator power, you do not want the batteries to recharge from the generator as it
might take away too much of the power needed elsewhere. If you plan to recharge the batteries from the
generator, be sure that is included in the power load plan when sizing the generator, and that the batteries
are on a slow recharge cycle.

UPS LOCATIONS If you have concentrated your main data processing computers and servers into one
room, then selecting a location for your UPS will be easy. Electricians will run a separate electrical circuit
from the UPS to the equipment to be protected. Electrical codes require these outlets to be a different
color so you will know which circuit you are plugging into.

Some critical machinery and computers will be located away from the central computer room. For
these devices, consider smaller UPS units located adjacent to the equipment. These units will not have a
long battery life and will be used to keep the machine operational long enough to shut it down gracefully.
Be sure not to lose sight of these satellite units as they will need to be tested and their batteries
maintained over time. Remote monitoring software is ideal for this situation.

ADVANCED UPS FEATURES Modern UPS units offer much more than battery backup. They possess
microprocessor logic to support a wide range of services. They can provide alarms of error conditions
both on the unit and through your data network. This is a very useful feature since they are often stuck in
some dark back room where an audible alarm only serves to annoy the mice.

The network signaling of power conditions is a very useful feature. Depending on the capabilities of
your UPS and data systems, a UPS can start the orderly shutdown of equipment to protect it before the
UPS batteries are exhausted. This feature is very useful over weekends and holidays when no one is
around. In some cases, it can order a restart when power is restored. A more sophisticated UPS system
stores a log of the power supply status for later analysis. Do you know how noisy your power lines are?
Do you know the frequency and magnitude of sags and spikes that occur on your electrical power lines?

A UPS is a critical component of a data network. Remote monitoring software allows a network
control analyst to monitor the status of each remote UPS and display the current line voltage and the
voltage/current draw on the equipment. This helps to track which lines seem to have the most variation
and potentially trace the variation back to a root cause in your facility. If some electrically driven machine in your
facility is causing problems in your internal power grid, it needs to be identified and provided with better
electrical isolation.
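
As one illustration of what such a power log can answer, the sketch below counts sags and spikes in a list of logged line-voltage readings. The sample readings, the 120-volt nominal value, and the 10 percent tolerance band are assumptions chosen for the example, not figures from any UPS vendor; a real UPS exports its log in its own format.

```python
NOMINAL_VOLTS = 120.0   # assumed nominal line voltage
TOLERANCE = 0.10        # assumed band: within +/-10% counts as normal


def classify(voltage, nominal=NOMINAL_VOLTS, tolerance=TOLERANCE):
    """Label one reading as 'sag', 'spike', or 'normal'."""
    if voltage < nominal * (1 - tolerance):
        return "sag"
    if voltage > nominal * (1 + tolerance):
        return "spike"
    return "normal"


def summarize(readings):
    """Count events and report the worst sag and spike seen."""
    sags = [v for v in readings if classify(v) == "sag"]
    spikes = [v for v in readings if classify(v) == "spike"]
    return {
        "sags": len(sags),
        "spikes": len(spikes),
        "lowest": min(sags) if sags else None,
        "highest": max(spikes) if spikes else None,
    }


# Hypothetical logged readings in volts.
log = [119.5, 121.0, 104.2, 118.7, 135.9, 120.3, 99.8]
print(summarize(log))
```

Running the same summary per circuit is one way to see which lines vary the most.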

UPS TESTING It is great to have a UPS system set up and running, but it needs to be tested if there is to
be a credible plan. On a weekend during the slow time of your business cycle, plan for a UPS
load test. This will demonstrate your power support system capabilities before a blackout strikes.

To set up the test, shut down the programs on all your computers but leave the computers running. The
idea is to not lose any data but to still pull each system’s normal electrical load. Bring in your UPS
service technician to address issues during and after the test. Warn management you are going to do this.
When all is in place, have an electrician cut the power to the UPS in that part of your facility and see what
happens.

This test has several goals:

1. The first is to see what is not on the UPS that should be. Once the power is cut and the batteries are
humming, you will see which server, computer, network device, etc., has been overlooked. Now
look to see which low-value items are connected and wasting valuable emergency power.

2. Second, you need to know how well your UPS will support the load you have attached to it. If it is
overloaded, you must plug some equipment into other power sources or get a bigger UPS unit.

3. When you shut down the servers’ operating systems, bring along a stopwatch and write down how
long it takes. This will tell you the minimum amount of time the UPS must hold for you to perform
orderly system shutdowns. If your servers are far apart in different rooms and the same person is
expected to shut them all down, that may add travel time to the time you must allow on the UPS.
Plan your time for the worst case.

4. Observe exactly what information the UPS displays about the remaining minutes of power given the
current consumption rate. Compare this to the operator instructions you have provided to the
after-hours support team. Be sure to also train the facility electricians on how to read the UPS
display panel.

5. Exercise your power shedding plan while someone observes the impact on the UPS. How much
additional time do you get for each level shut off?
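
Goal 3 amounts to a simple worst-case sum when one person shuts the servers down in sequence. A minimal sketch follows, assuming stopwatch timings and walking times that are invented for illustration; the function and variable names are hypothetical.

```python
def minimum_ups_hold_minutes(shutdown_minutes, travel_minutes):
    """Worst-case time for one person to shut down every server in turn.

    shutdown_minutes: measured shutdown time for each server.
    travel_minutes: walking time between consecutive servers
                    (one fewer entry than the number of servers).
    """
    if len(travel_minutes) != max(len(shutdown_minutes) - 1, 0):
        raise ValueError("need one travel time between each pair of servers")
    return sum(shutdown_minutes) + sum(travel_minutes)


# Hypothetical stopwatch results from a UPS load test.
shutdown_times = [4.0, 6.5, 3.0]   # minutes per server
walk_times = [2.0, 5.0]            # minutes between server rooms

needed = minimum_ups_hold_minutes(shutdown_times, walk_times)
print(f"UPS must hold for at least {needed} minutes")
```

If the result exceeds what the UPS display reports under load, the options are the ones the goals already name: shed load earlier, add people, or get a bigger unit.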

POWER GENERATORS

If your facility absolutely must maintain its power supply in the face of any sort of electrical problem,
then you will need your own electrical generation system. This is a large leap in complexity above UPS
systems and takes extensive planning. There are some industries that quickly come to mind as requiring
this level of support. Hospitals need it to support electronic medical equipment, food storage sites need it
to prevent spoilage, and even Internet hosting providers need it to ensure maximum application
availability to their customers.

On the other hand, it is kind of nice to switch from having a problem to being in control of it. A
properly sized and installed electrical generation system can return some benefits in the form of keeping
your company running while other companies cope with a rolling blackout, by the potential of selling
electrical power back to the utility company, and by running your generator during peak electrical usage
times thereby avoiding the highest cost electrical power.

Sizing Your Generator

Once you decide that you need maximum power availability, begin by determining what you
need to support. If it is everything within a building or an isolated part of a building, you could contract
for an electrician to monitor the amount of electricity used in that building or part of the building and use
that as a starting point for sizing your equipment. If the generator is only supporting one portion of the
facility, you must have a way to isolate it from the rest of the structure.

Next, you need to know how long your generators must provide electricity. If you live in an area that
experiences widespread natural disasters, such as floods, hurricanes, earthquakes, or blizzards, then you
might want to allow for running this system for several days at a time. A good starting point is
your own experience of the frequency and length of outages in your area. This
will help to determine the size of your fuel storage system for running the generator.

Switching Time

Some industries, like hospitals, have a standard amount of time they can be without electrical service.
Their generator must switch on automatically. However, mechanical engines take some time to start and
run up to operating speed (ever start your car on a cold day?). Some of the fastest generators can
automatically sense the loss of electrical power and start providing stand-by power in less than 10
seconds.

The question here is how long a gap your company can tolerate. During this brief outage, UPS
systems can maintain power to critical devices. Some equipment, like refrigerators, can tolerate a brief
gap since they are already chilled down. Some equipment, like lights, can ever so briefly be out if
supported by a backup emergency lighting system. So when deciding how long your company can
function without electrical power, be very specific about what is needed and why.

The alternative is to run generators continuously alongside the power you draw from the grid. As you
can quickly discern, this is yet another step in complexity that distracts you from your core business.
Rather than take this step, most companies settle for quick-switching generators supplemented with UPS
support at critical points.

Generator Testing

More than any other power support system, the engines on your generators need regular care. Begin
by running them monthly to ensure they function on demand. Next, they need to be tested under load.
This can be arranged for a weekend when everything they are to support is turned on and the facility is
disconnected from the power grid for a few hours. Periodic testing under load is a critical component of
your power backup system’s credibility.

During your testing, monitor the actual fuel consumption to generate a given unit of power. Fuel
consumption also varies with air temperature (height of summer or the depths of winter). Rather than
relying only on the manufacturer’s claim, use these measurements to determine how long your onsite fuel
supply will last for delivering electricity.
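
The runtime estimate described here is a single division once the consumption has been measured. The sketch below assumes invented test figures in liters and liters per hour, and the function name is hypothetical; substitute the numbers from your own load tests.

```python
def fuel_runtime_hours(usable_fuel, burn_rate_per_hour):
    """Hours of generation available from the onsite fuel supply."""
    if burn_rate_per_hour <= 0:
        raise ValueError("burn rate must be positive")
    return usable_fuel / burn_rate_per_hour


# Hypothetical load-test measurements: fuel burned per hour under full load,
# recorded once in summer and once in winter.
measured_burn_rates = {"summer": 40.0, "winter": 50.0}  # liters per hour
tank_liters = 2000.0                                    # usable onsite fuel

# Plan around the season with the highest measured consumption.
worst_rate = max(measured_burn_rates.values())
runtime = fuel_runtime_hours(tank_liters, worst_rate)
print(f"Worst-case runtime: {runtime:.1f} hours")
```

Comparing this figure with the length of outages you expect shows whether the fuel storage is large enough.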

Testing also exercises the people supporting your generators. Drilling them on their duties helps them
respond more quickly in a crisis. Be sure to rotate personnel to provide sufficient trained
backup staff.

Working with Your Public Utility

Unlike a UPS or line conditioner, a generator has the potential to help pay for itself. During peak
electrical usage periods, such as the depths of winter or the oppressive heat of summer, running your
generators will reduce your draw on the community’s power grid. Some utilities base their year-long
electrical rates on the peak usage at any point over the year. By using your generators on these days, you
contribute to the overall containment of electrical rates. And even then the units don’t need to run all day,
just during the peak usage hours of the day. If your generation capability is sufficient to run your entire
company site, then the utility may call you and ask that you run your generators at those times to reduce
peak usage.

Another aspect of running your own generators is the selling of power back to the power utility. Check
with your local board of public utilities to learn how much you would be paid and what
conditions must be met. But if you are in an area of unreliable power, you might be able to address your
own problems and cover some of your costs at the same time.

Environmental and Regulatory Issues

As with all good things, there are some downsides. Running an internal combustion engine to make
electricity puts pollution into the air. Some jurisdictions limit the number of hours per month that a
generator can be run (except in a crisis). Before purchasing your generator, check for any requisite
permits for such things as fuel storage, air pollution, taxes, etc.

ACTION STEPS FOR YOUR PLAN

If you have a UPS, be sure that it is properly maintained. Most UPSs require regular preventative
maintenance, such as changing the air filters and checking the condition of the batteries. If you skip this
step, then you are destined to discover how important it is the next time your UPS is needed.

Most large UPS units come with a small display panel that indicates the UPS’s ability
to supply power. Master this panel and all controls before an emergency arises. Never open the front of
the UPS, as the unit is an electrical shock hazard. The unit should only be opened by trained electricians.

In the event of a power outage, the front panel display can tell you how long the UPS batteries will be
able to supply power to all the devices attached. This is a very important piece of information. Most
computer servers take a long time to recover if they suddenly lose power. They require time to shut down
“gracefully.” You need to know the typical amount of time required to shut down each critical server.

Most UPS units include an audible alarm for when they are on battery power. It is important to know
what these alarms are and what to do when you hear them. If the UPS units are in a place where a security
guard can hear them after hours, be sure the guard knows what to do.

EMERGENCY LIGHTING

In a large building, it can get very dark very quickly in a blackout. Even if flashlights are readily
available, you need to be able to find them. Also, a sudden blackout can be very disorienting to some
people. This only adds an element of panic to the moment. To address this, most legal jurisdictions
require the installation of emergency lights that come on whenever power to the building is lost. This
provides some light for the safe evacuation of offices and workplaces.

These lights depend on a battery to power the lights in a blackout. To be sure that the lights and the
battery are ready when they are needed, they must be checked monthly according to the manufacturer’s
testing steps.

SOMETHING FOR YOUR SUPPORT PLAN

Following are three notices for you to consider as additions to your power support plan.

The first is an insert for your immediate steps contingency plan to be kept at the help desk and posted
on the computer room walls. When power drops, employees should execute the steps on this notice to
contain the problem while the technical staff is called in.

The second is a wall notice on priorities—which equipment to turn off in what order so that your UPS
and generator system can be freed to support the most critical systems.

The third is a set of instructions for making up the power shedding tags described on the Power
Shedding Priorities page.

Power Outage Action Plan

1. Immediate Action.

a. Notify your facility’s Maintenance supervisor immediately.

b. Notify your Supervisor.

Primary: (name and number here)
Alternate: (name and number here)

c. Determine the scope of the problem.

Look outside the office. Is there power everywhere else in the building?

Send someone outside to see if the electricity is on outside of the building. (Do not go
yourself. You must sit by the phone to coordinate action until your supervisor arrives.) Are
there lights on in any other buildings? Are traffic signals working? Are street lights on?

d. Notify the facility’s Disaster Recovery Manager.

e. Begin a log sheet of all events to include when the lights went out, who was notified and
when, any communications with the power company, etc.

2. Physical Layout.

a. UPS Room

Send someone to look at the UPS. Note how long the display indicates the batteries are
projected to last.

Execute the power shedding plan.

Keep monitoring the UPS and continue shedding power-using devices. When the UPS time
falls below 20 minutes, begin shutting down all the servers.

Call all system administrators and the network manager.

Power Shedding Priorities

When electrical power fails or when the power company notifies you that a failure is imminent, the
drain on the UPS batteries is minimized by turning off equipment according to its power shedding
priority. After reliable power is restored, turn equipment back on according to its priority. Start the
most critical systems first.

When a power outage occurs or is anticipated, notify the Help Desk, Facility Security, your
supervisor, and the Data Processing Manager. Monitor the UPS systems to see how much time is
remaining on the batteries (instructions are posted on the UPS devices).

This approach uses Power Shedding Priorities A through D, with A being least critical and D
being most critical equipment to keep running. Priority is set according to:

Which systems directly support facility production.
Which systems will cause widespread problems if they stop working.
Which systems are difficult to restart if they stop suddenly.

1. As soon as you lose electrical power, shut off nonessential systems and equipment identified
with a green “A” power shedding label, such as CRTs, terminals, printers, card processing
equipment.

Notify other company sites on your network.

Update the Help Desk, Security, and the Data Processing Manager.

2. When the UPS units show 15 minutes of power remaining, shut off low-priority CPUs and
devices identified with a yellow “B” power shedding label. When you progress to this step,
notify:

The Help Desk and the Data Processing Manager.

3. When the UPS units show 5 minutes of power remaining, shut off all remaining equipment and
servers identified with a red “C” power shedding label. When you progress to this step, notify:

The Help Desk and the Data Processing Manager.

4. Let equipment identified with a tan “D” power shedding label “die” on its own as power
drops off. This is equipment that can tolerate a sudden power drop.

Communication is important!
Ensure that management and the appropriate support people know when you start the next step of
shutting down or restarting systems. Data Processing Management will call in the required system
support people for a proper restart.
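
The timing rules in this notice can be written as a small lookup, which may help when checking the wall notice against monitoring software. This is a sketch only: the function names are hypothetical, and the thresholds are the 15-minute and 5-minute figures from the steps above.

```python
# Thresholds from the notice: "B" at 15 minutes remaining, "C" at 5 minutes.
# "A" equipment is shut off as soon as power is lost; "D" is never shut off
# by hand and is left to run until the batteries are exhausted.
SHED_THRESHOLDS = {"B": 15, "C": 5}


def labels_to_shut_off(on_battery, minutes_remaining):
    """Return the power shedding labels that should be off by now."""
    if not on_battery:
        return []
    labels = ["A"]
    for label in ("B", "C"):
        if minutes_remaining <= SHED_THRESHOLDS[label]:
            labels.append(label)
    return labels


def restart_order():
    """Most critical first, as the notice directs after power is restored."""
    return ["D", "C", "B", "A"]


# Example: the UPS display shows 12 minutes of battery remaining.
print(labels_to_shut_off(True, 12))
```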

Power Shedding Tag Instructions

Labeling your equipment: Make up labels on colored paper and then laminate them.

A = Green B = Yellow C = Red D = Tan

Resources:
Liebert www.liebert.com
American Power Conversion www.apc.com
Tripp Lite www.tripplite.com
Underwriters Laboratories www.ul.com

CONCLUSION

Electricity is a powerful resource necessary to operate the modern business. As with any resource, you
need to be familiar with its role in your operation and how its absence will affect your company. In the
absence of clean power from the electric utility, the main sources of electrical power are a
battery-operated UPS and a generator. A thorough understanding of the electrical requirements of your
organization will help you to design the most cost-effective plan to protect against its absence.
