Archiving steps
The process of digital archiving involves several key steps:
Crawling/Scraping: This is the initial step where automated tools, known as crawlers or scrapers, systematically browse the internet to collect data. These tools capture web pages, multimedia files, and other digital content.
Storage: Once collected, the data is stored in a structured and secure manner. This often involves the use of large-scale storage solutions that can handle the volume and complexity of the data.
Preservation: Preservation ensures that the stored data remains accessible and usable over time. This includes regular backups, format migration, and preservation metadata to manage and track the lifecycle of the digital content.
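As a rough illustration of how the three steps above fit together, the sketch below downloads one page, stores the raw bytes, and writes a small preservation metadata record next to them. It uses only the Python standard library. The function names, the folder layout, and the metadata fields are assumptions made for this example, not part of any particular archiving tool; real crawlers also handle linked resources, rate limits and robots rules, which are left out here.

```python
import hashlib
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Crawling step: download the raw bytes of a single URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def store_snapshot(url: str, content: bytes, archive_root: Path) -> Path:
    """Storage and preservation steps: save the bytes plus a metadata record."""
    digest = hashlib.sha256(content).hexdigest()

    # Hypothetical layout: one folder per distinct content hash.
    # Capturing identical content again overwrites the metadata record.
    snapshot_dir = archive_root / digest[:2] / digest
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    (snapshot_dir / "content.bin").write_bytes(content)

    # Minimal preservation metadata: where it came from, when it was
    # captured, and a checksum that later integrity checks can rely on.
    metadata = {
        "source_url": url,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "sha256": digest,
        "size_bytes": len(content),
    }
    (snapshot_dir / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )
    return snapshot_dir
```

A capture would then look like store_snapshot(url, fetch(url), Path("archive")), where "archive" is a placeholder folder name. Recording the checksum at capture time is what makes it possible to detect silent corruption later.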
Those practicing digital archiving and preservation may face several significant challenges. Below we list some key considerations which need to be accounted for when archiving digital content:
Volume/Size: The sheer volume of digital content being produced daily presents a massive challenge in terms of storage capacity and management. Archivists have to decide what should be saved and stored, set maintenance schedules, handle storage costs, and adopt coherent naming systems.
Dynamic content: Websites are increasingly dynamic, featuring interactive elements and real-time data, which complicates the archiving process. Content that relies on user interaction (e.g. clicking on elements), such as 3D tours and navigation menus, can be tricky to archive. Likewise, platforms that frequently change their designs can be hard for crawlers to capture. For instance, X (formerly Twitter) has introduced restrictions for unpaid content as well as pop-ups requiring log-ins to view tweets, making it difficult for web archiving tools to document data.
Legality and ethics: When a file or media item is archived, a digital reproduction is typically produced (essentially creating a copy of an original). This raises many questions: who owns the replication? Who has the power to modify or delete it? Further, archivists may face legal issues surrounding copyright, intellectual property and data privacy. These can pose significant hurdles to web archiving efforts.
Accessibility: Ensuring that archived content remains accessible to users, including those with disabilities, requires careful planning and implementation of accessibility standards.
Privacy: Consider what data should be collected and retained, and ensure that the archive upholds privacy practices and data minimization (for personally identifiable information), where applicable.
Environmental impact: Storing vast amounts of information digitally takes an environmental toll through the massive amounts of electricity used to power data centers and cloud storage, the emissions associated with hardware (computers and hard drives), wifi, etc. Careful consideration of what data to store (and for how long), where and how to store it, lower-impact media files, and low-energy storage options (e.g. using deep archival storage) can mitigate some of these concerns.
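To make the data minimization point above more concrete, here is a minimal sketch of one possible approach: keep only the fields the archive has decided it needs, and mask email addresses in the text that is retained. The field names, the allowlist and the replacement marker are all hypothetical, and the email pattern is deliberately simple, so this should be read as an illustration rather than a complete privacy solution.

```python
import re

# Deliberately simple pattern; it will not catch every valid address.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical allowlist: only these fields are kept in the archive.
ALLOWED_FIELDS = frozenset({"title", "published", "body"})


def minimize_record(record: dict, allowed_fields: frozenset = ALLOWED_FIELDS) -> dict:
    """Keep allowlisted fields only and mask email addresses in text values."""
    minimized = {}
    for key, value in record.items():
        if key not in allowed_fields:
            continue  # drop anything the archive has not decided it needs
        if isinstance(value, str):
            value = EMAIL_PATTERN.sub("[email removed]", value)
        minimized[key] = value
    return minimized
```

An allowlist is used instead of a blocklist so that unexpected new fields are dropped by default, which also keeps the stored volume down.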
Decide what information to archive. Make a list of the different types of information and collections you want to store.
Collect the files, folders and images you want to store, keeping in mind all the places they may be stored (e.g. emails, Google Drive, Microsoft Office, external hard drives, USBs, cloud storage, cameras, photo libraries).
Organize! Make sure your files and collections are named, sorted and categorized in a way that makes sense to navigate and will be accessible to others (if public). Adding metadata to your files (who, what, where, when) will help with identifying the files in the future. Create a document that keeps track of what you stored, where it is and when it was last updated.
Back up this data (make copies, store them on different storage devices). Consider whether the places you are using for storage are self-managed or owned by a third party, and whether there are fees (e.g. for cloud storage).
Maintenance: check your storage levels, refresh your storage and remember to update your tech periodically to prevent your storage devices from becoming obsolete.
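The tracking document and the backup check described in the steps above can be partly automated. The sketch below, using only the Python standard library, walks a collection folder, records each file's path, size, last-modified date and SHA-256 checksum, writes that list to a CSV file, and compares a backup copy against the original. The function names and the CSV columns are assumptions chosen for this example.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["path", "size_bytes", "last_modified", "sha256"]


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(collection: Path) -> list[dict]:
    """List every file in the collection with its size, date and checksum."""
    rows = []
    for path in sorted(p for p in collection.rglob("*") if p.is_file()):
        stat = path.stat()
        modified = datetime.fromtimestamp(stat.st_mtime, timezone.utc)
        rows.append({
            "path": path.relative_to(collection).as_posix(),
            "size_bytes": stat.st_size,
            "last_modified": modified.isoformat(),
            "sha256": sha256_of(path),
        })
    return rows


def write_manifest(rows: list[dict], manifest_file: Path) -> None:
    """Save the manifest as CSV; keep it outside the collection folder."""
    with manifest_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def find_backup_problems(original: Path, backup: Path) -> list[str]:
    """Return relative paths that are missing or differ in the backup copy."""
    backup_hashes = {row["path"]: row["sha256"] for row in build_manifest(backup)}
    return [
        row["path"]
        for row in build_manifest(original)
        if backup_hashes.get(row["path"]) != row["sha256"]
    ]
```

Running find_backup_problems as part of periodic maintenance gives an early warning when a backup has gone missing or no longer matches the original. Descriptive metadata (who, what, where, when) would still need to be added by hand or from another source.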
Digital archives need ongoing maintenance to remain accessible online. They have to be resilient to hardware and software changes, storage volume, and the addition of new content. Some tips for digital archiving (inspired by and ):