This paper examines the challenges and strategies associated with long-term digital data storage in an era of exponential information growth. Drawing on current literature, it surveys the scale of modern data generation, reviews access and preservation file formats, and discusses the Open Archival Information System (OAIS) reference model as a core preservation standard. The paper also evaluates criteria for selecting long-term file formats, outlines best practices for protecting the integrity of stored data, and considers cloud-based and local storage solutions. The review concludes that while the volume of digital data poses significant preservation challenges, a growing body of methodologies and standards offers promising pathways for ensuring data openness, security, and long-term accessibility.
The future of information technology faces a number of problems, among them the steady pressure on networks to expand to accommodate ever more data. Many experts believe that such growth will have two consequences: first, networks will become increasingly secure; and second, because of this heightened security, the data held on those networks will become more difficult to access. This study sought to identify the processes currently used to secure data on networks and to assess whether that security will make data progressively more difficult to obtain.
To this end, the study drew on the most current literature available to determine whether there is a genuine problem with the way data is currently being stored, or merely a perception that data will remain safe indefinitely. It also examined what steps are being taken to ensure the long-term viability of data currently being collected. A summary of the research and its most important findings follows the review, in the conclusion.
In the context of this study, the term security refers to information security — that is, the level of availability, confidentiality, and integrity of computer-based information (Robinson & Valeri, 2011). Information security is vitally important today given that virtually all electronic transactions are stored in one fashion or another for varying lengths of time (Datt, 2011). For example, every hour Google processes more than one petabyte of information (a petabyte is one million gigabytes), and Facebook hosts billions of photographs (Datt, 2011). Likewise, more than one million consumer transactions are processed by Walmart each hour (Datt, 2011). As Datt emphasizes, "The data deluge creates challenges for the storage and management of information, and both challenges and unprecedented opportunities in the mining of such information for actionable intelligence" (2011, p. 46).
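To give a sense of scale, the figures cited above can be turned into a rough annual estimate. The sketch below assumes decimal (SI) units and treats the cited one petabyte per hour as a constant rate sustained around the clock; that is a simplification for illustration, not a figure reported by Datt (2011), and the function names are hypothetical.

```python
# Decimal (SI) units: 1 PB = 10**15 bytes and 1 GB = 10**9 bytes,
# so one petabyte is one million gigabytes, as stated in the text.
GB_PER_PB = 10**6


def petabytes_to_gigabytes(petabytes: float) -> float:
    """Convert petabytes to gigabytes using decimal units."""
    return petabytes * GB_PER_PB


def yearly_volume_pb(pb_per_hour: float, hours_per_day: int = 24,
                     days_per_year: int = 365) -> float:
    """Extrapolate an hourly volume to one year (assumes a constant rate)."""
    return pb_per_hour * hours_per_day * days_per_year


if __name__ == "__main__":
    print(petabytes_to_gigabytes(1))  # gigabytes in one petabyte
    print(yearly_volume_pb(1))        # petabytes per year at 1 PB per hour
```

Under those assumptions, one petabyte per hour amounts to 8,760 petabytes, or about 8.76 exabytes, per year.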
Some indications of the explosion in data generation in recent years can be discerned from the following trends:
A recent study showed that the Internet Archive already contains multiple petabytes of data (Rosenthal, 2010, p. 47) and that data collections are constantly expanding at an ever-increasing rate. This situation leads many to wonder how safe and secure that data really is — and who should be concerned about it. Much of the concern is warranted, and it also drives companies and government entities to make backups, and sometimes even backups of backups. Such insecurity means that even more data is being saved, adding to the general accumulation. Rosenthal notes that most companies offering large-scale data storage solutions tout the fact that very little data will be lost or compromised if users store data with their firms. However, with so little experience in long-term data storage, most of these firms have no reliable way of knowing whether data will eventually be compromised.
Another study concluded that openness and the integrity of personal data are particularly critical elements for the success of a range of future e-science endeavors (Axelsson & Schroeder, 2009, p. 213). If openness and integrity are to play key roles, then questions about how accessible data will be in the future become especially important. This leads to interesting speculation: Will data currently being stored survive for centuries? What factors could cause data to be compromised? Could the very security measures taken to safeguard data ultimately render it inaccessible? In sum, secure data storage is important for a number of reasons, including the following:
Regardless of the form in which data is stored, access controls are required to keep the distribution of stored data scalable, reliable, and sufficiently secure. Most large-scale storage systems, however, are non-relational key/value databases, including Yahoo's PNUTShell, eBay's Odyssey, Google's BigTable, Amazon's Dynamo, and SimpleDB (Zhang & Wang, 2010), and in such systems security must be balanced against other design goals. According to Zhang and Wang, "In these systems, scalability, consistency, availability and partition tolerance properties are commonly desired. In addition to all the properties, a high-quality distributed data system must take security into account, especially in cloud computing. But enhancing system security usually weakens significantly its scalability and openness" (2010, p. 292).
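The idea of attaching access controls to a key/value store can be illustrated with a toy example. The sketch below is a single-process, in-memory store with a per-key list of permitted readers; the class and method names are hypothetical, and it does not reflect the design of any of the systems named above.

```python
from typing import Dict, Iterable, Set


class AccessControlledStore:
    """Toy in-memory key/value store with a per-key reader list."""

    def __init__(self) -> None:
        self._values: Dict[str, bytes] = {}
        self._readers: Dict[str, Set[str]] = {}

    def put(self, key: str, value: bytes, readers: Iterable[str]) -> None:
        """Store a value together with the users allowed to read it."""
        self._values[key] = value
        self._readers[key] = set(readers)

    def get(self, key: str, user: str) -> bytes:
        """Return the value only if the user is on the key's reader list."""
        if key not in self._values:
            raise KeyError(key)
        if user not in self._readers[key]:
            raise PermissionError(f"{user!r} may not read {key!r}")
        return self._values[key]


if __name__ == "__main__":
    store = AccessControlledStore()
    store.put("report", b"archived bytes", readers={"alice"})
    print(store.get("report", "alice"))
```

Even in this toy form, every read performs a permission check in addition to the lookup. In a distributed store that check would itself have to be kept consistent across nodes, which is one plausible reading of the trade-off between security and scalability that Zhang and Wang describe.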
Generally speaking, there are two types of file storage formats currently in use: an access format and a preservation format (Park & Oh, 2012). Access formats are appropriate for situations in which users view documents or otherwise manipulate them in real time, while preservation formats are suitable for storing documents in electronic archives for long periods of time (Park & Oh, 2012). Preservation formats provide "the ability to capture the material into the archive and render and disseminate the information now and in the future" (Park & Oh, 2012, p. 45). Both access and preservation formats play a role in how stored documents are processed over time: the former ensures ready accessibility to the data, while the latter is necessary to ensure its long-term integrity and security (Park & Oh, 2012).
The information used to support digital preservation methods is termed preservation metadata (Foundations and Standards, 2008). In this capacity, preservation metadata is defined as "information that supports the process of ensuring the availability, identity, understandability, authenticity, viability, and renderability of digital materials" (Foundations and Standards, 2008, p. 15). The core standard used for the majority of digital data preservation efforts is the Open Archival Information System (OAIS) reference model, which became an ISO standard in 2003 (Foundations and Standards, 2008). The OAIS reference model accomplishes three basic steps in digital data storage:
The flexibility of the OAIS reference model has made it possible for nearly all digital storage repositories to conform to these requirements (Foundations and Standards, 2008). As the editors of Library Technology Reports note, "OAIS is a high-level reference model that allows a good bit of interpretation in its actual implementation; so much, in fact, that nearly all preservation repositories claim OAIS conformance" (Foundations and Standards, 2008, p. 16).
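The definition of preservation metadata quoted earlier stresses identity and authenticity. One common way to support both is to record a checksum when an object is ingested and to recompute it later. The sketch below is a minimal illustration of that idea using SHA-256 from Python's standard library; the record fields and function names are hypothetical and are not taken from OAIS or any other metadata standard.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PreservationRecord:
    """Illustrative metadata kept alongside an archived object."""
    identifier: str    # supports identity
    format_name: str   # supports renderability (which format to open it with)
    sha256: str        # supports authenticity (fixity value)
    recorded_at: str   # when the checksum was taken (UTC, ISO 8601)


def sha256_of(data: bytes) -> str:
    """Return the hexadecimal SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def make_record(identifier: str, format_name: str,
                data: bytes) -> PreservationRecord:
    """Create a metadata record for an object at ingest time."""
    timestamp = datetime.now(timezone.utc).isoformat()
    return PreservationRecord(identifier, format_name,
                              sha256_of(data), timestamp)


def verify(record: PreservationRecord, data: bytes) -> bool:
    """Check whether the stored bytes still match the recorded checksum."""
    return sha256_of(data) == record.sha256


if __name__ == "__main__":
    original = b"example archived document"
    record = make_record("doc-0001", "PDF/A", original)
    print(verify(record, original))         # unchanged bytes
    print(verify(record, original + b"x"))  # altered bytes
```

A periodic job that calls `verify` on every stored object would detect silent corruption, although it cannot by itself repair it; repair requires an intact copy held elsewhere.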
A growing body of research has also been devoted to identifying the most appropriate file formats for long-term preservation across different resource types. For example, Folk and Barkstrom (2003) report that a number of file format attributes can affect the long-term preservation of engineering and scientific data, including: (a) ease of archival storage, (b) ease of archival access, (c) usability, (d) data scholarship enablement, (e) support for data integrity, and (f) maintainability and durability of file formats. Other researchers have recommended converting word processing files in digital storage centers into preservation formats more suitable for long-term storage (Park & Oh, 2012). Recommendations have also been advanced concerning the most appropriate file formats for three-dimensional objects with respect to long-term reliability (Park & Oh, 2012).
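One way to make attributes such as those listed by Folk and Barkstrom (2003) operational is a simple weighted scoring sheet. The sketch below is an assumption-laden illustration rather than a method proposed in the cited literature: the criterion keys paraphrase the six attributes, while the rating scale, the weights, and the candidate names are all hypothetical.

```python
from typing import Dict, List, Optional

# Keys paraphrasing the six attributes reported by Folk and Barkstrom (2003).
CRITERIA = (
    "ease_of_storage",
    "ease_of_access",
    "usability",
    "data_scholarship",
    "data_integrity",
    "maintainability",
)


def score_format(ratings: Dict[str, float],
                 weights: Optional[Dict[str, float]] = None) -> float:
    """Return the weighted mean rating of one format across all criteria."""
    if weights is None:
        weights = {name: 1.0 for name in CRITERIA}  # equal weights by default
    missing = [n for n in CRITERIA if n not in ratings or n not in weights]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    total_weight = sum(weights[name] for name in CRITERIA)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(ratings[name] * weights[name] for name in CRITERIA)
    return weighted_sum / total_weight


def rank_formats(candidates: Dict[str, Dict[str, float]],
                 weights: Optional[Dict[str, float]] = None) -> List[str]:
    """Return candidate names ordered from highest to lowest score."""
    return sorted(candidates,
                  key=lambda name: score_format(candidates[name], weights),
                  reverse=True)


if __name__ == "__main__":
    # Made-up ratings on a 1-5 scale for two hypothetical formats.
    candidates = {
        "format_a": {name: 4.0 for name in CRITERIA},
        "format_b": {name: 2.0 for name in CRITERIA},
    }
    print(rank_formats(candidates))
```

The value of such a sheet lies less in the final number than in forcing an archive to state which attributes matter most for its collection, for example by weighting data integrity more heavily than ease of access.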
"Sullivan and Rog criteria for format selection"
"Best practices for securing and backing up data"
"Terabyte drives, XSEDE, and cloud alternatives"
The research showed that information security refers to the availability, confidentiality, and integrity of computer-based information. The research also confirmed that access formats and preservation formats are the two types of storage formats currently in use — the former being appropriate for short-term uses of stored data, and the latter for storing digital data in repositories over long periods of time. The research was consistent in emphasizing that today's information explosion will have significant implications for the future storage and accessibility of digital data, and that a number of constraints complicate ensuring the integrity of data stored over the long term.