A number of problems face the future of information technology, including the fact that networks are increasingly asked to expand in order to accommodate ever-growing volumes of data. Many experts believe that such increases will mean two things: first, that networks will become increasingly secure, and second, that because of this security, the data they contain will become more difficult to access. This study sought to identify the processes currently used to secure data on various networks, and to determine whether that security will, or will not, make data incrementally more difficult to obtain. To this end, the study drew on the most current literature available to establish whether storing data in the current manner poses a problem, whether there is a perception that the data will remain safe over the long term, and what can be done, and is being done, to ensure the viability of the data currently being collected. Following the review, a summary of the research and the most important findings is presented in the conclusion.
Review and Discussion
In the context of this study, the term security refers to information security, that is, the availability, confidentiality, and integrity of computer-based information (Robinson & Valeri, 2011). Information security is vitally important today given that virtually all electronic transactions are stored in one fashion or another for varying lengths of time (Datt, 2011). For example, every hour, Google processes more than one petabyte of information (a petabyte is a million gigabytes) and Facebook hosts billions of photographs (Datt, 2011). Likewise, more than one million consumer exchanges are processed by Walmart each hour (Datt, 2011). In this regard, Datt emphasizes that, "The data deluge creates challenges for the storage and management of information, and both challenges and unprecedented opportunities in the mining of such information for actionable intelligence" (2011, p. 46). Some indications of the explosion in data generation in recent years can be discerned from the following trends:
By 2010, the digital universe had reached unprecedented levels, growing by 62% to nearly 800,000 petabytes;
By 2011, the digital universe was expected to grow almost as fast to 1.2 million petabytes, or 1.2 zettabytes.
These trends indicate that by 2020, the digital universe will be 44 times as big as it was in 2009 containing approximately 35.2 zettabytes of data (Datt, 2011).
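As a quick sanity check on the figures cited above, the 2020 projection follows from treating the roughly 800,000-petabyte digital universe as the baseline. The sketch below is a back-of-the-envelope calculation only; the unit conversion (one zettabyte equals one million petabytes) is the sole assumption added to the cited numbers.

```python
# Back-of-the-envelope check of the growth figures cited from Datt (2011).
PB_PER_ZB = 1_000_000  # 1 zettabyte = 1,000,000 petabytes (assumed conversion)

digital_universe_2010_pb = 800_000  # ~800,000 PB, per the cited trend

# 1.2 million petabytes expressed in zettabytes
digital_universe_2011_zb = 1_200_000 / PB_PER_ZB
print(digital_universe_2011_zb)  # 1.2

# 44 times the ~800,000 PB baseline yields the cited 2020 projection
projected_2020_zb = 44 * digital_universe_2010_pb / PB_PER_ZB
print(projected_2020_zb)  # 35.2
```

The arithmetic confirms that the 44-fold multiple and the 35.2-zettabyte projection are mutually consistent with the 800,000-petabyte baseline.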
A recent study showed that the Internet Archive already contains multiple petabytes of data and that data collections are expanding at an ever-increasing rate (Rosenthal, 2010, p. 47). This situation leads many to wonder how safe and secure the data really is, and who should be concerned about it. Much of the concern, of course, is warranted, and it drives companies and government entities to make back-ups, and sometimes even back-ups of back-ups; this insecurity means that still more data is being saved, adding to the general accumulation. Rosenthal writes that most companies providing very large data storage solutions tout the claim that very little data will be lost or compromised if users store data with their firms. With so little long-term experience in data storage, however, most of these firms have no real way of knowing whether the data will ever be compromised.
Another recent study concluded that openness and the integrity of personal data are particularly critical elements for the success of a range of future e-science endeavors (Axelsson & Schroeder, 2009, p. 213). If openness and integrity of personal data are to play a key role in such success stories, then a study such as the one being proposed would be of special value in considering how open data will be in the future. This type of scenario invites some interesting speculation: Will the data currently being stored survive for centuries? What factors could cause the data to be compromised? Will the security measures being taken to safeguard the data ultimately render it inaccessible? In sum, secure data storage is important for a number of reasons, including the following:
Properly storing data is a way to safeguard the research investment.
Data may need to be accessed in the future to explain or augment subsequent research.
Other researchers might wish to evaluate or use the results of research.
Stored data can establish precedence in the event that similar research is published (Westra, 2014, para. 2).
Irrespective of the form in which data is stored, data access controls are required to ensure that the distribution of the stored data is scalable, reliable, and sufficiently secure. However, the majority of large-scale storage systems are non-relational key/value databases, including Yahoo's PNUTShell, eBay's Odyssey, Google's BigTable, and Amazon's Dynamo and SimpleDB (Zhang & Wang, 2010). According to Zhang and Wang, "In these systems, scalability, consistency, availability and partition tolerance properties are commonly desired. In addition to all the properties, a high-quality distributed data systems must take security into account, especially in cloud computing. But enhancing system security usually weakens significantly its scalability and openness" (2010, p. 292).
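To make the trade-off Zhang and Wang describe concrete, the following toy sketch layers a per-key access-control check on top of a plain key/value interface. It is not modeled on PNUTShell, Odyssey, BigTable, Dynamo, or SimpleDB; the class, method names, and ACL design are all illustrative assumptions. The point is that every read must now consult an access-control list before touching the data, which is precisely the kind of added work that can cost a distributed system scalability and openness.

```python
# Illustrative only: a toy key/value store with per-key access control.
# Names and design are assumptions, not drawn from the systems cited above.

class SecureKVStore:
    def __init__(self):
        self._data = {}  # key -> value
        self._acl = {}   # key -> set of principals permitted to read the key

    def put(self, principal, key, value, readers=None):
        """Store a value; the writer and any listed readers may access it."""
        self._data[key] = value
        self._acl[key] = {principal} | set(readers or [])

    def get(self, principal, key):
        """Every read first consults the ACL -- the security overhead."""
        if principal not in self._acl.get(key, set()):
            raise PermissionError(f"{principal} may not read {key!r}")
        return self._data[key]

store = SecureKVStore()
store.put("alice", "session:42", "token-xyz", readers=["bob"])
print(store.get("bob", "session:42"))  # token-xyz
```

In a distributed deployment the ACL lookup would itself have to be replicated and kept consistent across nodes, which illustrates why hardening such a system tends to work against the scalability properties it was built for.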
Generally speaking, there are two types of file storage formats currently in use: an access format and a preservation format (Park & Oh, 2012). Access formats are appropriate for situations in which users view documents or otherwise manipulate them in a real-time fashion while preservation formats are suitable for storing documents in electronic archives for long periods of time (Park & Oh, 2012). Preservation formats provide "the ability to capture the material into the archive and render and disseminate the information now and in the future" (Park & Oh, 2012, p. 45). Both access and preservation formats play a role in how stored documents are processed over time, with the former being needed to ensure ready accessibility to the data while the latter is necessary to ensure its long-term integrity and security (Park & Oh, 2012).
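One common way archives verify the long-term integrity that preservation formats are meant to support is fixity checking: storing a cryptographic checksum alongside each archived object and periodically recomputing it to detect silent corruption. The sketch below is minimal and generic; the function name and parameters are illustrative assumptions, not taken from the cited sources.

```python
import hashlib

def fixity(path, algorithm="sha256", chunk_size=1 << 20):
    """Compute a checksum for an archived file, reading in chunks so that
    even very large objects can be verified without loading them whole.
    Re-running this later and comparing against the stored digest detects
    bit rot or tampering."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A repository would typically record the digest in the object's metadata at ingest and compare it on a schedule, flagging any object whose recomputed value differs.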
The information that is used to support digital preservation methods is termed preservation metadata (Foundations and standards, 2008). In this capacity, preservation metadata is defined as "information that supports the process of ensuring the availability, identity, understandability, authenticity, viability, and renderability of digital materials" (Foundations and standards, 2008, p. 15). The core standard used for the majority of digital data preservation efforts at present is the Open Archival Information System (OAIS) reference model, which became an ISO standard in 2003 (Foundations and standards, 2008). The OAIS reference model accomplishes three basic steps in digital data storage:
1. It defines a common vocabulary for preservation-related concepts that anyone working in the field should know and understand;
2. It defines an information model for objects and metadata that an OAIS should support; and,
3. It defines a functional model for the activities that an OAIS should perform (Foundations and standards, 2008, p. 16).
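The six properties named in the definition of preservation metadata above can be pictured as fields on a per-object record. The sketch below is illustrative only: the field names are assumptions chosen to map onto those properties, not the OAIS information model or any actual metadata schema.

```python
# Illustrative record mapping the six preservation-metadata properties
# (availability, identity, understandability, authenticity, viability,
# renderability) onto fields; names are assumptions, not an OAIS schema.
from dataclasses import dataclass, field

@dataclass
class PreservationRecord:
    object_id: str                 # identity: stable identifier
    storage_locations: list = field(default_factory=list)  # availability
    description: str = ""          # understandability: what the object is
    checksum_sha256: str = ""      # authenticity: fixity evidence
    file_format: str = ""          # viability/renderability: format needed
    format_registry_uri: str = ""  # pointer to format documentation

record = PreservationRecord(
    object_id="obj-0001",
    storage_locations=["primary-archive", "offsite-mirror"],
    description="Scanned committee minutes, 1998",
    checksum_sha256="e3b0c44298fc1c149afbf4c8996fb924...",
    file_format="PDF/A-1b",
)
print(record.object_id)  # obj-0001
```

Keeping such a record alongside each object is what lets a repository later answer whether the object is still intact, still identifiable, and still renderable.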
The flexibility of the OAIS reference model has made it possible for almost all digital storage repositories to conform with these requirements (Foundations and standards, 2008). In this regard, the editors of Library Technical Reports note that, "OAIS is a high-level reference model that allows a good bit of interpretation in its actual implementation; so much, in fact, that nearly all preservation repositories claim OAIS conformance" (Foundations and standards, 2008, p. 16).
There has also been a growing body of research devoted to the most appropriate file formats for long-term preservation with respect to other resource types. For example, Folk and Barkstrom (2003) report that a number of different file format attributes can affect the long-term preservation of engineering and scientific data, including: (a) ease of archival storage, (b) ease of archival access, (c) usability, (d) data scholarship enablement, (e) support for data integrity, and (f) maintainability and durability of file formats. In addition, other researchers have recommended converting word processing files in digital storage centers into preservation formats more suitable for long-term storage (Park & Oh, 2012). Likewise, recommendations have been advanced concerning the most appropriate file formats for three-dimensional objects with respect to their long-term reliability (Park & Oh, 2012).
Beyond the foregoing initiatives, other researchers have proposed a number of different criteria that should be applied under different long-term storage circumstances. For instance, Sullivan (2006) developed a set of properties that are desirable for long-term preservation storage formats to explicate the purpose of PDF/A within an archival and records management context. In this regard, Sullivan (2006) suggests that relevant criteria include (a) device independence, (b) self-containment, (c) self-description, (d) transparency, (e) accessibility, (f) disclosure, and (g) adoption. The technical characteristic criteria developed by Sullivan (2006) comprised factors such as (a) open specification, (b) compatibility, and (c) standardization, as well as market characteristics including (a) guarantee duration, (b) support duration, (c) market penetration, and (d) the number of independent producers. Likewise, Rog and van Wijk (2008) posited a quantifiable evaluation approach to determine file format composite scores using seven primary criteria: (a)…
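The excerpt trails off before listing Rog and van Wijk's seven criteria, but the mechanics of a quantifiable composite score can be sketched generically as a weighted average of per-criterion scores. The criteria names, scores, and weights below are purely illustrative assumptions, not values from Rog and van Wijk (2008).

```python
# Generic weighted-sum sketch of a file-format composite score, in the
# spirit of Rog and van Wijk (2008); all names and numbers are illustrative.

def composite_score(scores, weights):
    """scores: criterion -> value in [0, 1]; weights: criterion -> weight.
    Returns the weighted average over the weighted criteria."""
    total_weight = sum(weights.values())
    return sum(scores[c] * weights[c] for c in weights) / total_weight

# Hypothetical evaluation of an archival format against three sample criteria
pdf_a_scores = {"openness": 1.0, "adoption": 0.9, "self_containment": 1.0}
weights = {"openness": 3, "adoption": 2, "self_containment": 1}

print(round(composite_score(pdf_a_scores, weights), 2))  # 0.97
```

Scoring candidate formats on a common weighted scale in this way is what makes the comparison "quantifiable": formats can be ranked by a single number while the weights record which criteria the archive values most.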