Research Paper Undergraduate 3,887 words

How to Prepare and Test a Continuity of Operations Plan

Last reviewed: April 14, 2016

Contingency Planning

Information security contingency plans are essential for firms operating in today's world, where cyber security is a top issue as a result of businesses' technological and digital dependence. This paper discusses the planning steps, possible recovery options, and recommended testing requirements needed to support a successful business contingency/continuity of operations environment. Included are recommendations for a proposed 24-month cycle business contingency testing plan, covering what should be tested and how the tests should be conducted. Critical corporate assets are ranked alongside the type of testing (i.e., plan reviews, tabletop exercises and backup recovery tests). Costs associated with the recommended testing process are also taken into consideration, including personnel, equipment and production costs.

Planning Steps

Step 1 is to examine the organization of the IS department. An IS department should be organized to guard against an attack, blackout or any other natural or man-made disaster that can impact the integrity of information related to a business's procedures and processes. The purpose of a contingency plan/continuity of operations environment is to ensure that the organizational hierarchy (including hardware, software, work teams, management and crews involved in supervision) is able to conduct business fluidly and without interruption while keeping data safe through secure networks and storage devices. This requires diligent oversight, supported by weekly assessments that follow a standardized routine and incorporate analysis of the latest developments in technology, threats, and safety issues related to cyber security. Advisory notices should be directed to the proper personnel within the IS department, so that individual staff members are alerted to any adjustments that require attention. The department should organize itself into teams or squads consisting of a threat recognition team, a problem solving team, an info/data gathering team, a specs squad, a systems design unit, and a maintenance/review squad.

Once the IS department is organized, it can proceed to Step 2: risk assessment and business impact assessment. The purpose of each is to analyze the impact that a disruption can have on the organization and how to mitigate it (Vacca, 2009). Stakeholders in the organization (including but not limited to directors, board members, employees, creditors, government advisors/agencies, owners, unions, and suppliers) must be called upon to assess the drivers that propel the firm forward and that are indispensable to the business's smooth operation. Drivers are the core components/strategies that offer real value to the organization, such as intellectual property or data operations. Once these are determined and rated, the organization can see how much time, energy, and available resources should be directed towards ensuring that each driver is supported and backed up should a disaster strike. As Bahan (2003) indicates, the top priority of managers overseeing the business impact assessment is to determine a top-down arrangement of the drivers that require immediate support and are, therefore, first in line to be restored to working order in the event of an infrastructure collapse.
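The top-down arrangement of drivers described above can be sketched in a few lines of Python. This is only an illustration, not a method taken from Bahan (2003): the driver names, the 1-to-5 impact ratings and the tolerable outage hours are all hypothetical values that a real business impact assessment would have to supply.

```python
from dataclasses import dataclass


@dataclass
class Driver:
    name: str
    impact: int              # hypothetical 1-5 rating of business impact if lost
    tolerable_hours: float   # hypothetical longest outage the firm can absorb


def restoration_order(drivers):
    """Sort drivers so the highest-impact, least outage-tolerant come first."""
    return sorted(drivers, key=lambda d: (-d.impact, d.tolerable_hours))


# Hypothetical drivers rated by stakeholders.
drivers = [
    Driver("customer database", impact=5, tolerable_hours=4),
    Driver("intellectual property archive", impact=5, tolerable_hours=24),
    Driver("internal wiki", impact=2, tolerable_hours=72),
    Driver("order processing", impact=4, tolerable_hours=8),
]

ranked = [d.name for d in restoration_order(drivers)]
for position, name in enumerate(ranked, start=1):
    print(position, name)
```

Sorting on impact first and tolerable outage second is one reasonable tie-breaking rule; a firm could just as well weight the two ratings differently.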

The risk assessment development can then proceed: it is accomplished by identifying risks to operational facilities based on precedent as well as potential threats that are currently at large (this is why a department team should be assigned to threat identification). Stemming the impact of potential disasters via risk management is a necessary step in any contingency/continuity of operations plan. The more potential disasters that can be averted ahead of time, the better (Haes, Grembergen, 2009).
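A common way to turn identified risks into a ranked register is to multiply a likelihood rating by an impact rating. The sketch below assumes that convention, which the sources cited here do not prescribe; the threat names and the 1-to-5 ratings are hypothetical.

```python
def risk_score(likelihood, impact):
    """Return likelihood x impact; both are hypothetical 1-5 ratings."""
    return likelihood * impact


# Hypothetical threats supplied by the threat identification team:
# name -> (likelihood, impact)
threats = {
    "ransomware": (4, 5),
    "regional power outage": (2, 4),
    "lost laptop": (3, 2),
}

# Highest-scoring risks first, so mitigation effort goes where it matters most.
register = sorted(
    ((name, risk_score(likelihood, impact))
     for name, (likelihood, impact) in threats.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, score in register:
    print(f"{name}: {score}")
```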

Recovery Options

A recovery option is only as effective as the organization's ability to maintain communication lines in the event of a disaster. Therefore, a contingency plan as well as a continuity of operations plan must include a communications strategy that will enable the business to stay connected with its stakeholders (i.e., suppliers, supply chain managers, directors, consumers, clients, etc.). Recovery options are available for a range of scenarios and a range of business types. Selecting the right option will depend on the type of business being conducted and the type of disaster being prepared for. Strategic continuity software can be purchased by any business from a number of distributors/producers who specialize in supporting organizations in recovery situations. Ponemon Institute and companies like Symantec are leaders in the industry of helping firms to identify their recovery needs (cyber security options include a data breach risk calculator, which helps in the risk management stage identified above and which can be used to help the firm develop its recovery plan). Other recovery options include the framework guide to IT infrastructure recovery through the use of security provisos, such as data loss prevention (DLP) software, which helps a business's IS department track data being utilized at any given moment (regardless of the state of activity).

The most important element of an appropriate recovery option is that it be simple and utilize the IS core for efficiency (Sawy, 2003). A recovery manager should be appointed and should be able to present the various options to key players in the firm. These would include cloud services, virtualization, mobile connectivity, social networking, electronic-based vaulting (if applicable), managed recovery, and recovery point objectives (LaChapelle, 2014).

Cloud-based recovery options allow firms to back up data systems by utilizing cloud technology, which stores data for smaller firms at affordable rates. Virtualization is another option that gives firms even more flexibility by allowing them to duplicate a total copy of a data center, which can then be accessed and utilized when needed. Virtual machines are available for server extension. Mobile connectivity can be an essential element of a recovery plan and should be considered as a potential additional option for helping workers to stay connected and in communication. Likewise, social networking facilitates this end. At the same time, some firms may not have the resources to manage their own recovery; therefore, outsourcing may be a recovery option to consider (this would be a managed recovery). Another option is to cut the number of backups that the firm needs by implementing an electronic-based vaulting system (such as remote libraries and software replication systems). Finally, recovery point objectives are an option, as they cover the total scenario: strategic points are identified and objectives are set (whether zero data loss is critical or whether recovery time objectives are critical) for maintaining business operations.
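A recovery point objective (RPO) caps how much recent data the firm can afford to lose, measured in time, while a recovery time objective (RTO) caps how long restoration may take. A minimal sketch of checking a backup arrangement against both is given below; the intervals and targets are hypothetical figures, and the check assumes that worst-case data loss equals the gap between backups.

```python
from datetime import timedelta


def meets_rpo(backup_interval, rpo):
    """Worst-case data loss is the time between backups, so it must fit the RPO."""
    return backup_interval <= rpo


def meets_rto(measured_restore_time, rto):
    """A restore measured during a recovery test must finish within the RTO."""
    return measured_restore_time <= rto


# Hypothetical targets set during the business impact assessment.
rpo = timedelta(hours=4)
rto = timedelta(hours=6)

nightly_ok = meets_rpo(timedelta(hours=24), rpo)   # nightly backups
hourly_ok = meets_rpo(timedelta(hours=1), rpo)     # hourly replication
restore_ok = meets_rto(timedelta(hours=3), rto)    # restore timed in a test

print(nightly_ok, hourly_ok, restore_ok)
```

With these figures a nightly backup fails the four-hour RPO while hourly replication satisfies it, which is the kind of gap a backup recovery test is meant to expose.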

Recommended Testing Requirements

In order for the firm's contingency plan and continuity of operations plan to be tested effectively, it is essential to have the complete staff trained on what is in store for the operation. Training the staff on how a contingency operation works is the highest imperative/requirement at this stage of the implementation program.

A contingency response team should be organized and trained to handle an emergency event that requires implementation of the continuity plan. The response team is responsible for restoring system functions and ensuring that data is back online within the requisite amount of time.

At the same time, test objectives must be identified and met. In order to guarantee that these objectives are met, a review process should be in place for verifying response times, achievement, and maintenance of data and support systems.

Prior to the implementation of the testing of the plan, a series of pre-tests can be conducted in order to test the effectiveness of the risk management portion of the cyber security contingency plan.

Penetration testing is one method of testing cyber security as a means of risk management -- the first stage of a contingency plan. In this method a hacker's attack is simulated so that the operating firm can observe whether there is any exposure or any holes in the system's security (Haes, Grembergen, 2009). Auditing and monitoring tools incorporated in the plan include CORE Security Technology, a security auditing tool that graphs security-related data so that users can see a visual representation of how effectively the security system in place is thwarting attacks and whether it is set up to protect against possible assaults from a number of areas. Where backups are stored depends on system preferences: backups may be locally controlled, or cloud backups may be used, in which case an alternate system is in control. Both are approved in the case of disaster to guarantee the integrity of the data (Krutz, Vines, 2010).

Another recommended area for testing is the use of cell phones, laptops and other mobile devices within a firm. The Government Accountability Office reported in 2012 that laptops and cellular devices are a risk for businesses and that one way to mitigate this risk is to enable encryption for these devices. Users are susceptible to hacking and can thereby be used by cyber attackers to gain access to data inside a firm's secure walls. Thus, testing should include these devices as well and should confirm that networks are properly secured via pass codes.

Once these pre-tests are conducted, the contingency test can proceed. An effective contingency plan test will involve developing:

Notification procedures

Coordination among the recovery team(s)

A plan for systems recovery on an alternate platform from the backup medium/media

Connectivity -- both internally and externally

The use of alternate equipment to achieve systems performance

The restoration of normal operations

And a process review with a report generated upon conclusion of the review (Information Technology Contingency Planning, 2012)

Recommendations for a Proposed 24-month Cycle Business Contingency Testing Plan

A 24-month cycle business contingency testing plan is an efficient way to maintain organization within the firm when it comes to being prepared for an emergency and being prepared to implement the continuity of operations plan. Drills should therefore be conducted on a regular basis to ensure that key players are comfortable with their functions and are aware of their role in the event of an emergency requiring implementation of the contingency plan. This can be a draining process, especially on valuable resources, so testing should be limited in its duration and scope. How an organization chooses to conduct its testing also depends on its financial ability to conduct such tests.

Also to be considered over the 24-month stretch is how much time should pass between tests. This decision should be informed by the "maximum tolerable downtime (MTD)" (Gilbert, 2015), the longest outage the business can absorb, and it will reflect the balance that an organization wants to maintain between effective planning and testing and efficient use of resources to prevent losses in the event of an emergency. For instance, if inadequate testing is conducted over this period, it could result in longer outages, which could mean diminished returns and revenue for the firm. Thus, a trade-off is in order: the resources and time spent on training must be balanced against the risk of losing both when the strategy actually has to be implemented in an emergency.

One way to balance this equation is to chart the cost of practicing and running the contingency plan at intervals over the course of a 24-month business cycle against the "cost of daily disruption of daily operations" that will result from a failure to fully and successfully implement the contingency plan when it comes time to put it into play (Gilbert, 2015). Such a chart makes the costs on both sides of the trade-off visible.
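The comparison can be tabulated with a short script before it is charted. The sketch below is not Gilbert's (2015) method; it assumes that total cost is simply testing spend plus expected outage days multiplied by a daily disruption cost, and every figure in it (the daily cost, the three testing levels, their spend and their expected outage days) is hypothetical.

```python
DAILY_DISRUPTION_COST = 20_000  # hypothetical cost of one day of halted operations


def total_cost(testing_spend, expected_outage_days,
               daily_cost=DAILY_DISRUPTION_COST):
    """Testing spend over the 24-month cycle plus the expected cost of an outage."""
    return testing_spend + expected_outage_days * daily_cost


# Hypothetical testing levels: name -> (spend over 24 months, expected outage days).
# The assumption is that more practice shortens the outage when disaster strikes.
options = {
    "minimal": (5_000, 10),
    "moderate": (40_000, 3),
    "intensive": (120_000, 2),
}

totals = {name: total_cost(spend, days) for name, (spend, days) in options.items()}
cheapest = min(totals, key=totals.get)

for name, cost in totals.items():
    print(f"{name:<10} {cost:>8,}")
print("lowest combined cost:", cheapest)
```

Under these made-up figures the moderate level gives the lowest combined cost, which illustrates the point of the trade-off: both too little and too much testing can be more expensive than a middle course.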

At the end of the 24-month cycle, a review of the contingency plan's effectiveness should be performed, taking into account the advances in technology reported by the various teams assigned the responsibility of monitoring developments in tech, specs and cyber-security threats. This information will be used to update the contingency plan accordingly, and this should be done at the conclusion of every 24-month cycle.

What should be tested over the course of the 24-month cycle are the various parts of the contingency plan, including personnel, tech support, programs, systems, back-ups, and alternative systems. These should be tested according to the formula stipulated above, with teams being given notice ahead of time that a test will be forthcoming; this allows them to prepare and coordinate with one another. Coordination should be scored by an objective observer (a third party can be hired to perform what would essentially be a contingency plan audit). The various parts of the contingency plan implementation would be tested, including connectivity, the speed of recovery, and the usage of alternative systems.

Types of Testing

Tabletop

Tabletop testing is an appropriate method of testing that can be utilized to measure the responses of team members involved in the contingency plan process. These members are notified of the emergency and the need for implementation of the plan. They are gathered at a specific location and given a timeframe in which they are expected to have systems back up and running. Their success is measured by the contingency scenario that they are able to devise at the tabletop gathering. This is a system of group work that tests the ability of the personnel to work together, to communicate, to coordinate, and to theorize about possible outcomes. It is a very easy and extremely low-cost method of testing that can be conducted without much strain on the daily operations of the business itself. In other words, the business does not have to shut down to allow for this type of test to be conducted, as the main components of the test are simply the personnel involved in the tabletop group. It is, however, an effective way to prepare for an emergency response and it can be measured and reviewed and verified to see whether the team managed to meet all points and requirements expected of the contingency plan response.

Critical core assets in tabletop testing are the knowledge that personnel have and the skills they possess and are able to draw upon in order to provide adequate responses. The discussion-based nature of the tabletop test allows participants to engage in what are essentially role-playing exercises. The discussion, moreover, should be guided by a role-player who acts as a facilitator, throwing different scenarios at the participants and allowing them to respond. Ideas can be bounced back and forth between players, and information can be reinforced in a kind of classroom setting that is conducive to learning exactly what is needed and expected of the response team. Tabletop testing can be conducted more frequently than any other type of testing because of its low-cost nature and ability to be exercised without draining many resources.

There is, however, a limited degree of pressure upon participants because of the relatively informal setting and lack of emergency and urgency associated with the testing, so the results must be valued accordingly. In an emergency type situation or in a real-life setting, results should be expected to vary, according to pressure, stress, and rushed communications. Therefore, while tabletop testing can be one effective method of testing the contingency plan, it is by no means the only type of test that should be conducted.

Functional Testing/System Testing

Functional/System Testing is a much more rigorous and focused type of testing. It involves one part or aspect of the contingency response team at a time, and the goal is to hone the skills needed to respond to an actual emergency in a real setting. This type of test is typically conducted whenever a new system is installed and personnel need to learn the procedure for restoring it and/or backing it up. Whenever a continuity plan is updated, functional/system testing is a way for teams to be better prepared for the event.

This type of testing can be conducted in order to break in and test new software and/or hardware and to practice risk management in pre-test trials where cyber security threats are identified and a hack is simulated by teams who want to test out a system's security walls.

When a simulation of an emergency event is conducted, it is a functional/systems test that is performed in order to see how teams respond to the event in a real-life setting, situation or scenario. The pressure in these tests is more strongly felt as the setting is not in a classroom but in the actual work environment and actual responses are expected of key players.

The cost of implementing a functional test can be much higher than a tabletop test and therefore these are conducted less frequently. A functional test may be conducted once every year or every two years because of the costs associated with the expense of resources, with shutting down temporarily and with monitoring and reviewing the overall process. This type of test requires a full-blown audit when conducted on a large scale and this process can be quite expensive as an auditing firm is typically hired to conduct the test over a certain period of time.

Planning Reviews

A plan review is another type of test that can be conducted. This type of test is like a tabletop test except that no scenario is used to actually test the responses of the participants in the room. A plan review is a gathering of the participants, who discuss the contingency plan together and re-orient themselves with it, while anyone who is new can become familiar with it. A plan review is a type of "test" that can be conducted most frequently, even on a monthly basis, as it is quick, informal and does not require a lot of response from participants. It is basically a reviewing process in which procedures are gone over once again, and the time frame can be that of a typical meeting of managers and workers at the firm -- an hour is sufficient for the review. In this amount of time it may not be possible to go over the entire plan, but parts can be discussed, and parts that were not discussed in one meeting can be examined in the next. In this manner, the entire plan is gone over in due time and everyone is acquainted with the procedures that are needed.
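The three test types can be laid out on a 24-month calendar with a small script. This is a sketch under stated assumptions: the monthly plan review and the annual functional test follow the frequencies suggested in this paper, while the quarterly tabletop interval is a hypothetical choice made only for illustration.

```python
def build_schedule(months=24, review_every=1, tabletop_every=3,
                   functional_every=12):
    """Map each month of the cycle to the tests that fall due in it."""
    schedule = {}
    for month in range(1, months + 1):
        activities = []
        if month % review_every == 0:
            activities.append("plan review")
        if month % tabletop_every == 0:       # quarterly interval is an assumption
            activities.append("tabletop exercise")
        if month % functional_every == 0:
            activities.append("functional test")
        schedule[month] = activities
    return schedule


schedule = build_schedule()
for month, activities in schedule.items():
    print(f"month {month:>2}: {', '.join(activities)}")
```

Changing the interval arguments lets a firm with a tighter budget stretch the functional test to once per cycle, for example with `build_schedule(functional_every=24)`.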

Cite This Paper
PaperDue. (2016). How to Prepare and Test a Continuity of Operations Plan. PaperDue. https://www.paperdue.com/essay/how-to-prepare-and-test-a-continuity-of-2158108
