An archival project is generally designed from the beginning to make information more accessible and to keep it available long term. However, every digital project requires resources to maintain and needs to be updated from time to time. There are some things you can do to minimize the effort required to keep your archival project functional. Start by choosing file types for storing and organizing your content that help you avoid ending up with a pile of unreadable files five years from now:
Choose a “lossless” file type for all media. Some file types (often called “lossy”) use formulas to reconstruct parts of a file rather than storing all the data. This leads to much smaller file sizes, but it also means that the discarded information is gone for good. You can always save a smaller version or a different format from a large, high-data file, but you can’t go in the other direction. There are lots of charts and guides available online that can help you find a lossless file type for each type of content. Typical examples include using .wav instead of .mp3 for audio files, and .tif instead of .jpg for images.
Wherever possible, look for non-proprietary, open file types. These are formats that can be read by lots of programs, so you don’t have to worry that your files will become unreadable if a particular program is no longer around. This might mean using .xml instead of .doc for texts, or .csv instead of .xls for spreadsheets. As with lossless file types, there are lots of resources online for finding open formats.
When possible, choose a file type that includes metadata or associated instructions for the content within the file itself. The Library of Congress maintains a useful guide to preferred file types for different kinds of content that takes into account other factors as well.
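To make the point about lossless file types concrete, here is a minimal sketch in Python. It is only an illustration: the sample values are made up, zlib compression stands in for a lossless format, and rounding stands in for what a lossy format does.

```python
import zlib

# Made-up sample values standing in for a short audio recording.
samples = [1203, 1207, 1198, 1211, 1189, 1215]
original = ",".join(str(s) for s in samples).encode("ascii")

# Lossless stand-in: compress, then decompress, and every byte comes back.
restored = zlib.decompress(zlib.compress(original))
lossless_ok = restored == original

# Lossy stand-in: round each sample to the nearest 10 to save space.
# The rounded values no longer contain the detail that was thrown away.
lossy = [round(s / 10) * 10 for s in samples]
lossy_ok = lossy == samples

print(lossless_ok)  # True
print(lossy_ok)     # False
```

The rounded list is perfectly usable as a smaller access copy, but nothing you do to it later will bring back the original values, which is why the high-data file is the one to keep.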
Another critical step is keeping a separate backup of all parts of your project: the digital content, its organizational structure and metadata, and the structure and design of any websites or interfaces you build to share the content. Even if one of your tools or platforms includes backups as part of the service, it is wise to create a separate backup in a different format and a different place.
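If you are comfortable running a small script, a backup copy can also be checked automatically. The following is a minimal sketch in Python, not a full backup system: it copies a folder to a second location and compares a checksum for each file. The function names and folder paths are hypothetical, and it covers only the “different place” part of the advice, not the “different format” part.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, destination: Path) -> list:
    """Copy `source` into `destination`; return files whose copies differ."""
    shutil.copytree(source, destination, dirs_exist_ok=True)
    mismatched = []
    for original in sorted(source.rglob("*")):
        if original.is_file():
            relative = original.relative_to(source)
            if sha256_of(original) != sha256_of(destination / relative):
                mismatched.append(str(relative))
    return mismatched
```

Calling something like backup_and_verify(Path("my_archive"), Path("/Volumes/Backup/my_archive")) (both paths are placeholders) returns an empty list when every copied file matches its original.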
Here are some common issues that arise in archival projects, and some ideas for helping improve your workflow:
A memory-intensive computer program (like many image or video processors) needs to run for a long time and you can’t do anything else while it runs: try setting up batch processing or an automation and running it overnight or while you are away, or set up a separate computer for this process. Many programs have built-in batch-processing or time-delay functions that you can look up in their help documentation, but this kind of automation can also be done at the computer level with third-party applications. Search for the type of automation you want and include either your computer type (PC, Mac, etc.) or the programs you want to automate. Searching for “macro” and the program type may also turn up useful results. Automator for Macs and AutoHotkey-based programs for PCs are good external programs to consider.
Typically these programs are best at very simple, repetitive tasks, like opening a file, saving it in a particular format, and closing it. This can save you a lot of time if you need to change the size and format of thousands of files!
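The same open, save, close loop can be written as a short script. Here is a minimal sketch in Python, offered as one possible approach rather than a recommendation of a specific tool. The function names are hypothetical, and the example conversion (tab-separated text to .csv) was chosen only because it needs nothing beyond Python itself; converting images or audio would require an additional library in place of tsv_to_csv.

```python
import csv
from pathlib import Path


def tsv_to_csv(source: Path, target: Path) -> None:
    """Open one tab-separated file and save it as comma-separated."""
    with source.open(newline="", encoding="utf-8") as src, \
            target.open("w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(csv.reader(src, delimiter="\t"))


def batch_convert(in_dir: Path, out_dir: Path, in_ext: str, out_ext: str, convert) -> int:
    """Run `convert` on every `in_ext` file in `in_dir`; return how many ran."""
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for source in sorted(in_dir.glob(f"*{in_ext}")):
        # Write to a separate folder so the original files stay untouched.
        target = out_dir / (source.stem + out_ext)
        convert(source, target)
        count += 1
    return count
```

A call such as batch_convert(Path("exports"), Path("converted"), ".tsv", ".csv", tsv_to_csv) (the folder names are placeholders) processes the whole folder in one run, which is the kind of job that can be left going overnight.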
Text content isn’t being entered consistently because of typos or competing conventions (like “Oregon” vs. “OR” in a “State” category): create controlled-entry fields. Almost every database or spreadsheet program has a way to restrict entries to a fixed list of options. This is an especially good practice if different people will be entering material. Many programs also let you attach an internal description to a field alongside its title; this can be a good place to document how to decide what content is correct in marginal or nonspecific cases.
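For content that has already been entered, or that comes from a tool without controlled-entry fields, the same idea can be applied after the fact. The sketch below, in Python, checks a “State” column against a fixed list and flags anything it does not recognize. The allowed list, the variant spellings, and the sample rows are all invented for illustration.

```python
import csv
import io

# Hypothetical controlled list and known variants for a "State" field.
ALLOWED_STATES = {"OR", "WA", "CA"}
VARIANTS = {"oregon": "OR", "washington": "WA", "california": "CA"}


def normalize_state(value: str):
    """Return the controlled form of `value`, or None if unrecognized."""
    cleaned = value.strip()
    if cleaned.upper() in ALLOWED_STATES:
        return cleaned.upper()
    return VARIANTS.get(cleaned.lower())


def check_rows(rows, field="State"):
    """Normalize `field` in each row; collect (row number, value) rejects."""
    clean, rejected = [], []
    for number, row in enumerate(rows, start=1):
        fixed = normalize_state(row[field])
        if fixed is None:
            rejected.append((number, row[field]))
        else:
            clean.append({**row, field: fixed})
    return clean, rejected


# Invented sample data: two acceptable variants and one typo.
sample = io.StringIO("Title,State\nLetter,Oregon\nPhoto,or\nMap,Oregn\n")
clean, rejected = check_rows(csv.DictReader(sample))
print(clean)
print(rejected)  # [(3, 'Oregn')]
```

Unrecognized values are reported rather than guessed at, so a person can decide how to handle the marginal cases.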