How to start archiving games

  1. What to archive
    1. Which games
    2. Which files
      1. Digital copies
      2. Physical copies
      3. Browser games
      4. Mobile games
      5. General tips
  2. Best storage media
    1. Floppy disk
    2. CD, DVD and Blu-ray
    3. Magnetic Tape
    4. HDD and SSD
    5. Cloud
    6. General tips
  3. Fighting file corruption
  4. Best file system
    1. RAID
    2. File systems
    3. Deduplication
    4. Compression
    5. Checksums
    6. Conclusion
  5. The procedure
    1. Archiving the data
    2. Storing the devices
    3. Checking data integrity
    4. Replacing faulty devices
  6. The community

What to archive

Which games

First, you should make peace with the fact that you don’t have the resources to archive every game. That is fine, because honestly there is a lot of shovelware that will not be missed. If you don’t already have your own set of priorities, I would recommend archiving:

  1. The games you personally care about the most, even if the game is really bad.
  2. The really obscure games that nobody seems to know about.
  3. Whatever other games you like.

Space is limited. If a game is popular and takes up a lot of storage space, consider assuming that somebody else will preserve it (unless it falls into the first category above).

If the game is an online game built on the client-server model, the client is still worth preserving. The server side can be reverse engineered, although it takes a lot of effort. If you care about the game a lot, you can help further by gathering as much information as you can, mainly by finding software that logs the packets sent between the server and your game client and then doing as much as you can in the game. In the worst case, a game could theoretically be recreated from video footage.

Archiving a game also does not necessarily mean just the installation files. Try to preserve manuals, artwork and accompanying software (like editors) too. Even mods can be worth saving, if you find them good enough, especially those that fix bugs or add widescreen support etc.

Usually you want to archive the latest version of the game, but sometimes it is better to grab an older version. The same applies to remakes; sometimes the original is better (for example by still having a LAN option while the remake relies on servers). When you know the game well, you will know if that is the case.

Which files

Digital copies

You get an installer

If you have downloaded installation files from gog.com, itch.io or a similar store, then it is simple. Just archive the installer executable and any related data files.

The service installs it for you

I will use Steam here as an example, but the following applies to any similar service. In the case of game launchers/stores that install the games for you, your only option is to find out everything the installation process does and replicate it. The installation process does one or more of these things: it puts the game files into some directory, adds or changes Windows registry keys, and installs dependencies (like DirectX or various runtimes).

So, to figure it out you have to:

  1. Open Steam on a computer that never had the game installed (uninstalling often leaves files or registry keys lying around, which we could then overlook by mistake).
  2. Use some software that can either create a list of all your files (so you can compare the state before and after installation) or straight up detect file changes; a minimal way to do the former is sketched after this list. This is for finding all the game files that are added during installation, so you can archive them. Don’t forget that it is not just the file that is important, but also knowing in which directory it should be placed.
  3. Do the same for the Windows registry. You need to know which keys and values are added or changed, and where.
  4. Now you can let Steam install the game. You will probably want to run the game once too, in case there is any first time setup.
  5. Review the changes after installation, archive the files and write down the registry changes, preferably in the .reg file format, so that they can be applied easily using standard Windows command-line tools.
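
As an illustration of step 2, here is a minimal Python sketch of the “list all files before and after” approach. The scanned folder and output file names are placeholders; dedicated install-monitoring tools do the same job more conveniently and also cover the registry.

  # Snapshot a directory tree before and after installation, then print what changed.
  import os, json, sys

  def snapshot(root):
      """Collect path -> (size, modification time) for every file under root."""
      state = {}
      for dirpath, _dirs, filenames in os.walk(root):
          for name in filenames:
              path = os.path.join(dirpath, name)
              try:
                  st = os.stat(path)
                  state[path] = (st.st_size, int(st.st_mtime))
              except OSError:
                  pass  # files can vanish or be locked while scanning
      return state

  if __name__ == "__main__":
      # Usage: python snapshot.py take <folder> before.json   (run before installing)
      #        python snapshot.py take <folder> after.json    (run after installing and starting the game)
      #        python snapshot.py diff before.json after.json
      if sys.argv[1] == "take":
          json.dump(snapshot(sys.argv[2]), open(sys.argv[3], "w"))
      else:
          before = json.load(open(sys.argv[2]))
          after = json.load(open(sys.argv[3]))
          for path in sorted(set(after) - set(before)):
              print("added:  ", path)
          for path in sorted(p for p in after if p in before and after[p] != before[p]):
              print("changed:", path)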

You can also make a list of installed software to see which dependencies, and in which versions, are added during the game’s installation, but that just lets you note down whatever you did not already have installed, and it is usually not hard to figure out which dependency you are missing when the game refuses to run.
Some games might have DRM software in them and refuse to start when Steam is not running, so prefer obtaining the installation files instead. Don’t worry too much about not being able to get them though, because the protection can be bypassed. There are lists of games that do/don’t have Steam DRM, but they are incomplete and possibly outdated, so you can only be sure by trying it yourself on a computer without Steam.

Physical copies

Having a physical copy means having the game’s installation files (usually anyway, some discs just contain a Steam installer instead of the game...). Normally you would just copy the CD/floppy/cartridge contents to your computer and be done. Unfortunately, it is not that simple, thanks to copy protection.
Apart from the earliest attempts at copy protection, which made the game refuse to continue unless you typed in some code from the manual, the whole point of copy protection is to make the game unable to run without the original physical copy being present in the computer.

Floppy disk

I don’t know much about these. Despite their age, floppies already used a lot of different tricks to prevent people from making an identical copy of the original disk. Special equipment might sometimes be needed to properly read all these quirks and make sure you end up with a good image. There also appear to be a lot of different image formats to pick from, so finding the best one is not going to be easy. I would look for an open format that can handle the common copy protection and is at least somewhat popular on the platform you are interested in. But it is probably easier to just copy the game data and disable the code that checks for the original floppy disk.

Compact disc

CD/DVD copy protection is quite similar to that of floppy disks. Again, it is about introducing features into the original disc that won’t be copied over by the standard disc read and write process, and the game will check for them before starting.

To create a faithful image of these discs, you need any drive that can read the disc and special software, like Daemon Tools or Alcohol 120%. It is also important to use the right file format for the image. You need a raw format, like MDF, which comes with an additional MDS file containing the metadata. If you just use ISO, the game won’t work without cracking the copy protection, because it will recognize that the image does not match the original CD. However, cracking the copy protection of games that came on CD/DVD is significantly harder than doing it for an old game that was still released on floppy disks.

The CD may also contain software that will give you trouble if it detects that you have disc burning software installed. Some went as far as installing a rootkit during the disc’s autorun setup. Just something to be aware of, especially if you have the game installed on the computer you are using to create an image of its CD.
There are many more types of DRM, like region locking or online registration, but they don’t interfere with copying the disc, just with playing the game.

Special care needs to be taken when doing this with console games. Consoles have extra protections, including encryption of the disc, so finding a drive that can read the disc is much harder and you need the decryption key too. Alternatively, the console itself can be modded to allow copying the game files to an external drive or over the network. You will have to research the console you have games for and get the right tools to overcome the protection. The newer the console, the harder it will be, so you might want to search online for an existing image of your game instead.

Cartridges

I have no experience with them, but it appears to be much the same story, with the addition of special hardware being needed to read the data. Again, there is copy protection, so the cartridge image must include it, or the game code has to be modified not to check for it.

Browser games

Flash

Most of the early browser games were built on Adobe Flash. Usually they can be downloaded from a website directly by viewing the page source and opening the .swf file link. Nowadays it is a little more complicated, because the websites that are still active use various other methods to run the Flash game instead of relying on you having the Flash Player extension (although the better websites may have a download button). You can try disabling the website’s player so that the link gives you the file instead of loading the player, or look for a browser extension that can download Flash files, but first you should try finding the game in one of the Flash databases that are already archiving them, like Flashpoint.

Unity

Another popular option is Unity. Older versions of the Unity Web Player use NPAPI, which is no longer supported in popular web browsers, so you will have to get an old version of your browser or find a different browser that still supports it. Of course, you will also need to find an old version of the Unity Web Player extension. Once you can run the game, you should be able to right-click the page and save the whole web page from your browser, which will give you an .html file and a folder with accompanying files. Look for a .unity3d file in that folder. Sometimes it has a different extension, like .txt, so just rename it to .unity3d.

Since version 5.4, Unity uses WebGL. That means no more .unity3d files, but instead, from what I understand, JavaScript and WebAssembly files. It should be possible to just save the whole web page, which should download all the necessary .js and .wasm files (WebAssembly can also be in a text format). Sometimes the downloaded page may not work properly, because it does not correctly link to the downloaded script files. You can fix that manually, but it might be better to use a tool for downloading websites, like the wget command on Linux.
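
For example, a minimal way to drive wget from a script looks like this. The URL is a placeholder; -p downloads the page requisites such as scripts and images, -k rewrites links to point at the local copies, and -np keeps wget from wandering above the given path.

  # Mirror a browser game page with wget so the saved copy keeps its links working locally.
  import subprocess

  url = "https://example.com/games/some-webgl-game/"  # hypothetical page with the game
  subprocess.run(["wget", "-r", "-p", "-k", "-np", url], check=True)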

HTML5

Lately there have been more “HTML5” games, which are really more like JavaScript games, because the HTML is pretty much always combined with CSS and JavaScript, and the JavaScript does most of the logic. Again, what was said about Unity 5.4+ applies here as well. You should be able to get the files by just saving the whole web page from your browser, which will download the page’s .html file and a folder with the accompanying .css and .js files.

Notes

Modern web browsers tend to limit which files a script opened from disk can access, so the downloaded game might not work unless you disable this and possibly other security features in your browser. When all else fails, you can try serving the game from a local web server and connecting to that from your browser.
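
If you have Python installed, running such a local server is one command: “python -m http.server” inside the game’s folder. The same thing as a small script (the folder path is a placeholder):

  # Serve the downloaded game folder over HTTP, so the browser treats it like a normal
  # website instead of files opened from disk; then open http://127.0.0.1:8000/ in a browser.
  import http.server, os, socketserver

  os.chdir("path/to/downloaded-game")  # placeholder folder with the saved game
  with socketserver.TCPServer(("127.0.0.1", 8000), http.server.SimpleHTTPRequestHandler) as httpd:
      httpd.serve_forever()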

Besides the ability to execute in a web browser, browser games are the same as other games. That means they may download more files into the web browser’s cache when you run them, they may be just a client for a server and they may even contain DRM. On the other hand, they don’t touch the Windows registry or install any dependencies. Still, test them thoroughly to be sure that they do work offline (at least those not relying on a server) and that you did archive all the necessary files.

Sometimes saving the web page won’t give you all the files you need. You might need to look through the page source and figure out how the game is loaded. In the worst case, the game runs on the server and just generates the HTML/CSS that is sent to your browser. That is basically like game streaming, so the only legal thing you can do is record the gameplay.

I have no experience with web development, so there are probably other technologies and details I missed, but I hope the information here is enough to get you started.

Mobile games

Pre-Android

Before the Android OS came around, games were mostly written in Java, specifically for the J2ME platform. Sometimes Adobe Flash was used instead. Then there are completely different platforms like BREW and more. So again, research the device you are interested in.
In the case of J2ME, you need to extract the game’s .jar file from your phone. There might also be a .jad file, but that is only a text description of the application; the .jar file has all the necessary game files. To do the extraction, it seems you just need the right software and a way to connect to your phone (like Bluetooth).

You might encounter DRM here too. The linked J2ME guide mentions encryption of the .jar file, and I would not be surprised if other security features were in use. I would recommend first looking up online whether the files of the game you want were already extracted by someone else (J2ME, BREW).

Android

Here, games get installed by copying the .apk file to the right place; sometimes there is also an .obb file that contains additional resources for the game, because of the size limits put on the .apk file. So, you need to extract these two files from your phone or download them directly from the store. There are many tools online to accomplish both. Of course, expect DRM features to make this harder for you, and you will most probably need to gain root access to your phone. The game might also download more data after it is first run, so make sure to archive that too.
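
As a rough sketch, with USB debugging enabled the .apk (and any .obb data) can often be pulled with adb like this. The package name is a placeholder (list installed packages with “adb shell pm list packages”), and DRM or device restrictions may still get in the way.

  # Pull an installed game's .apk and expansion data from an Android phone over adb.
  import subprocess

  package = "com.example.somegame"  # hypothetical package name

  # Ask the package manager where the APK lives; each output line looks like
  # "package:/data/app/.../base.apk".
  out = subprocess.run(["adb", "shell", "pm", "path", package],
                       capture_output=True, text=True, check=True).stdout
  for line in out.splitlines():
      subprocess.run(["adb", "pull", line.replace("package:", "").strip(), "."], check=True)

  # Expansion files, if present, usually sit under /sdcard/Android/obb/<package>/.
  subprocess.run(["adb", "pull", f"/sdcard/Android/obb/{package}/", "."], check=False)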

General tips

Being able to run the game is also quite important. Installation files for common dependencies, like DirectX and various runtimes, are probably nothing to worry about and will be easy to find in the future too, not to mention they are often part of the game’s installation.
But a game might need a crack, a keygen or some other piece of software to make it work. Console emulators tend not to work without the original firmware. Browser games may need a certain version of a player extension or even of a web browser. The more obscure the game and its supporting software, the more you should consider preserving them, because while most of this can be reverse engineered, nobody might care to do it for your favorite obscure game.

If the game was released for Linux or Mac in addition to Windows, you should still archive the Windows version. Almost no games are open source, so only the developer can easily update a game to work on future operating systems. Once the game is no longer supported, it is only a matter of time before someone has to make an unofficial patch or utility just to make it run. Because most PC games are exclusive to Windows, such a patch is far more likely to be made for the Windows copy, and while a Linux copy might break, Wine will always be available on new Linux systems, so it will be easier to use the Windows copy instead.
Console versions are a good choice too; they tend to be pretty simple to run once a good emulator exists.

To save space, it is a good idea to compress your games’ data. If you have an installer for the game, that is usually already well compressed, but if you just copied the game’s files, you can benefit a lot from putting them into an archive. Personally, I would recommend 7-Zip. It is open source and compresses very well. To get the most out of it, you will want to play around with its parameters. Just be aware that you might need a lot of RAM to compress large files.
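
As an example, a 7-Zip invocation for packing a copied game folder might look like the sketch below. The archive and folder names are placeholders; -mx=9 is the maximum compression level and a larger dictionary (-md) improves the ratio at the cost of RAM.

  # Pack a copied game folder into a 7z archive with strong compression.
  import subprocess

  subprocess.run([
      "7z", "a",         # add files to an archive
      "-t7z", "-mx=9",   # 7z format, maximum compression level
      "-md=192m",        # dictionary size; lower it if you run out of memory
      "SomeGame.7z",     # output archive (placeholder name)
      "SomeGame/",       # folder with the copied game files (placeholder)
  ], check=True)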

When you make an image or a copy of the game, you should try to install it; otherwise you might end up archiving a broken image or an incomplete copy. Also, the game might update itself or download more data during its first run. You will need to archive that data too, otherwise the game won’t work after the server is gone.

Apart from DRM code, if you got the game from an unofficial source, it might contain even worse kinds of malware. Fortunately, in the case of old and obscure games, the potential malware will usually also be very old and therefore easy to detect. Malware usually sits in the executable files, sometimes in the libraries. The game resources (textures, video, sound…) should be clean. Text files are usually clean too, but they might carry something if they are source files or contain console commands.

If you don’t have an antivirus, you can check the potentially risky files using Virus Total. Just calculate a checksum of each file and search their database; if it is not there, you can upload the file. You can also try ClamAV, an open source virus scanner. It is not great, but it should be enough for checking old games, which, if they are infected at all, are probably infected by old, easily detectable malware.
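
Calculating such a checksum takes only a few lines; the file name below is a placeholder, and the resulting SHA-256 hash can be pasted into the Virus Total search box.

  # Compute the SHA-256 of a suspicious file without uploading it anywhere.
  import hashlib

  h = hashlib.sha256()
  with open("setup_somegame.exe", "rb") as f:  # hypothetical installer
      for chunk in iter(lambda: f.read(1024 * 1024), b""):
          h.update(chunk)
  print(h.hexdigest())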

Best storage media

First of all, forget about keeping the original cartridges and CDs. You would need a lot of storage space and you will need to replace them eventually anyway, either because they start to degrade or there is no commercially available device that can read them anymore.

Floppy Disk

Low capacity and hard to get. Skip.

CD, DVD and Blu-ray

While some CDs and DVDs can last more than half a century, their low capacity makes them more expensive than Blu-ray for storing the same amount of data. There is also an M-Disc version of these that is supposedly able to last several centuries. It is questionable, though, whether it is important for the media to last that long, because it won’t be of any help if there are no easily obtainable Blu-ray readers in the future. So, unless you really want that longevity, there are other options that are either cheaper or less bothersome to deal with.

To work with these, you need to buy the right optical drive, one that can both read and write and that supports the disc type you want to use. DVD has two competing standards, DVD-R and DVD+R, and Blu-ray has multiple types with different capacities. Depending on what you get, you might be able to write to the disc only once, so if the game gets any new updates, you will have to put them on a different disc. It is recommended to store the discs vertically in jewel cases, keep the surface clean, avoid light (don’t worry about magnets) and keep the right temperature and humidity.

Magnetic Tape

By far the cheapest option when comparing the tape cartridges themselves with the other media. But there is the extra investment of getting a tape drive, which can be quite expensive unless you find an older model. Tape has some drawbacks too, so I would go with it only if you want to get really serious about game preservation, like storing more than a few TB of data.

It writes data sequentially, so random access is very slow and updating or adding files later is a pain, because you have to write the changes at the end of the tape. That is a problem for recently released games that are still getting updates, less so for old games, though even those might receive, for example, an important fan patch in the future. To avoid getting lost in this, you need some software and a classic hard drive or SSD to keep track of where all your files are stored on the tapes.

Under proper storage conditions, it is said tapes can last up to 30 years. That involves the right temperature, humidity, darkness and protection from dust and strong magnetic fields. Tapes nowadays use the LTO standard, which is open. That is good, but new versions keep being released with limited backwards compatibility, so even though the tapes can last two or three decades, you might want to upgrade sooner, so that you don’t risk losing access to a reader for your old tapes.

HDD and SSD

For an HDD, the average lifespan seems to be less than 10 years and it can fail at any moment. However, that is true for drives in constant use. When used for backups, they won’t be used nearly as much, so they last much longer.
The price is similar to optical media, but without the worry that you will scratch the surface. Compared to tape, the susceptibility to magnetic fields is much smaller. The main reason for failure seems to be mechanical, so just watch out for impacts and vibrations (mainly during operation; a drive can survive much more when unplugged). It might also be a good idea to spin them up once every few years to mitigate seizing of the moving parts. Sometimes the circuit board fails, but that seems to be rare and the data are usually recoverable. They are easy to use: there is no need for a special device to read them and you can update the data on them easily.

An SSD is in some ways more reliable than an HDD because it lacks moving parts, and SSDs seem to fail less often than HDDs. However, they seem to develop errors more often, and a worn drive, when stored without power, will retain data only for a year or two, depending on temperature. A brand new SSD should retain data much longer, but compared to the other good options for archiving, it seems to have the shortest lifespan. Add the slightly higher price and the HDD becomes the better choice for archiving, unless you keep dropping your drives on the ground.

USB flash drives and microSD cards work very similarly to SSDs, so expect about the same reliability or worse.

Lastly, you can get internal or external drives. External drives come sealed in a casing and tend to be accessible only through a USB connection instead of the usual interface like SATA, so they have a built-in adapter that lets the drive communicate over USB. So far, I have not seen these adapters fail, but it is possible.

Cloud

The good thing about cloud storage is that you won’t accidentally lose your data by dropping it on the floor. The bad thing is that you don’t own it, so you may lose access to it anyway. Storing a lot of data in the cloud can also get expensive. Finally, it might not be legal in your country to upload your game installation files, so you would have to put them into encrypted archives. If you decide to use it, just make sure you have the uploaded data backed up on some physical media too.

General tips

What the best storage medium is for you might change in the future as new technologies are invented. Keep an eye on what everyone else is using, so you are not stuck with obsolete storage and no easy way to get your data out of it.

You want to make multiple copies of your data, so that you can recover in case one of your copies is lost or corrupted. There are various rules for this, the best known being the 3-2-1 rule, but they can be interpreted in many ways, so think of them as recommendations. They share the same important ideas though: keep more than one copy of your data, keep the copies on different kinds of storage and keep at least one copy in a different physical location.

There is one thing that can affect all your copies at once even when they are stored quite far from each other, and that is an EMP. It can happen naturally, from ejections of matter from the Sun, or artificially, from nuclear weapons.

Fighting file corruption

Because nothing lasts forever, there is a possibility that, as a storage device ages, the data stored on it will get corrupted (also known as “bit rot”). That is, some bits become unstable and flip. For some files, like images and audio, that just means a bit of extra noise. For other files, like compiled libraries, it means they won’t work anymore.

Based on some anecdotal evidence here and there, it looks like file corruption is not as big a deal as some make it out to be. One person didn’t find any errors after 5 years; another found just a dozen errors after 7 years, and a big part of those could have happened during copying. The fact is, storage media have ways to correct errors. CDs have error correction codes, and magnetic tapes have them too. The same applies to hard drives, which can also detect bad sectors and replace them with spare ones. The real problem is errors that happen during writing and reading, as can be seen in the papers referenced here. When archiving, these can be overcome by verifying the file with a checksum after writing it. Still, it is a good idea to check your files every year or two, just to be sure. You need to regularly check up on the physical state of your storage anyway.

There is one more thing to consider and that is the RAM of your computer. It too can encounter errors, and there is so-called ECC RAM, which can correct them. When archiving on something rewritable, like a hard drive, this does not matter that much: even if the computer writes a corrupted copy of a file because of a RAM error during copying, you can just check that the written file is the same as the original and overwrite it if it is not. However, ECC RAM does become more desirable when you are using write-once media.

Best file system

RAID

First, let’s talk about RAID. It spreads data over multiple drives with extra copies or parity information, depending on the scheme used. When a drive in the RAID fails, its contents can be restored using the remaining drives. To make a RAID setup, you can get a hardware controller that will manage your drives, or use software that makes your computer do it.

In my opinion, RAID is not a good fit for archiving. It requires having your redundant copies connected together, so at best you can have some of them in a different location but still connected over the internet, which protects against real-life accidents but still leaves them vulnerable to software issues: if something wipes your local drives in the RAID, it could wipe the ones connected over the internet as well. You also can’t easily use it with storage media other than HDDs and SSDs. And while you could fit more data on your drives by using parity data instead of making multiple copies, you need special software to restore the data from parity. If you still want to try it, software RAID seems like the better option these days.

File systems

When you use an HDD or SSD for storage, you need to choose a file system to format it with. There are a lot of them, so instead of listing them all, it is better to write down what the file system should do and then talk about those that fit the requirements. The requirements partially depend on which file system most games are developed for, so that they fit within its limits. Windows has by far the most games released for it, which means our baseline is Microsoft’s NTFS file system. Now, to the requirements: the file system should handle at least what NTFS does (file sizes, path lengths), be usable from common operating systems, protect against corruption, and ideally offer extras like checksums, compression and deduplication, which are discussed below.

NTFS

Although it is a proprietary file system, most operating systems can write to it or at least read it. It has a journal, which helps protect against corruption when a power loss or system crash happens during writing, but the journal covers metadata only, so the file being written will still be corrupted even though the file system will be fine after the operating system checks the disk. It can do some compression and a bunch of things that are not needed for archiving. In Windows, certain symbols can’t be used in paths, but NTFS by itself does not have such a limitation, so a Linux system can even write paths that Windows can’t handle. Overall, NTFS is good enough, but there are better options.

ext4

Right now it is the most common file system on Linux, and Windows can understand it as well. It has some useful features, like extents (which allow more efficient use of space), a better journal with checksums and the HTree directory index (more robust than the commonly used B-tree). It does not do compression or file checksums on its own, but it is an improvement over NTFS.

F2FS

Made specifically for SSDs and their ilk. Among other things, it is supposed to reduce the number of writes, which is not that important for archiving, where we only do a small number of writes. It does have a journal in a sense: it uses a log structure instead of trees. That also means that when you modify a file, a new copy is created instead of writing into the original data (also known as “copy on write”), which is safer with regard to corruption. It does not use space as efficiently as ext4, but it can compress on its own. No checksums though. You could use it, but the support on Windows seems lacking, so personally I would prefer ext4 or NTFS.

JFS, XFS and UFS2

These are similar to ext4 in their features. The main difference for archiving is that they have less support in current operating systems than ext4.

VxFS, Reiser4, NOVA and HAMMER2

These bring some extra features over the already mentioned file systems, but are even more niche. I would recommend picking one of the more popular options.
NOVA is designed for non-volatile memory and, compared to F2FS, has some extra protection using metadata checksums and parity for file data (similar to RAID 4).
Reiser4 also has some extra protection compared to ext4 and similar file systems.
VxFS and HAMMER2 support features like compression or deduplication.

ReFS

The successor to NTFS. It can do most of what NTFS can and adds resiliency features on top of that: compression, deduplication, copy on write and, most importantly, checksums for data, so that when a file’s data block gets corrupted, it will notice on the next read of that file and show you an error. In combination with the Storage Spaces software, it is also able to detect and fix corrupted files automatically in the background, but that requires redundant copies, so it basically only applies when you have multiple disks connected to your computer in a RAID-like setup. The largest disadvantage of ReFS is that it has little support. On Linux it only works through a third-party closed-source driver, and while it works on Windows, not every Windows edition can format a drive with ReFS, though the support will probably get better in the future. If you use only Windows and have access to ReFS, give it a try, because the support for the similar Btrfs and ZFS might not be perfect on Windows.

Btrfs

Does pretty much what ReFS can, but some features are not quite finished yet, like RAID. Compression can be enabled in the file system, but deduplication is done by a utility program after the files are copied rather than during copying. It is a bit different from other file systems thanks to subvolumes, but for archiving that does not make a difference. It is part of the Linux kernel, so it is readily available on Linux, and on Windows through the Windows Subsystem for Linux or the WinBtrfs driver.

OpenZFS

There is also a proprietary version of ZFS, but it is not much different from the open source one and is harder to get. Compared to Btrfs, it has more features (like native support for encryption and deduplication), but naturally it takes longer to learn how to use it. Also, because we don’t want to keep our storage always connected to the computer, it is necessary to export the ZFS pool, and the later import operation can take a while. It can be used on Windows through the Windows Subsystem for Linux or OpenZFSOnWindows (which seems to have issues). On Linux it is fully supported, but not part of the kernel because of licensing issues.

Deduplication

Deduplication saves space by finding blocks of data that are identical and replacing them with a single block and some pointers. It can be combined with compression; I am not sure exactly how the two interact in the individual file systems that support them, but it seems that having both can be better than just one. It makes data recovery harder, because when a shared block gets corrupted it affects all the files that point to it, but that is why we duplicate the data over multiple storage media, so that we don’t have to recover data from a failing device in the first place.

The amount of space saved depends primarily on what kind of data is stored. Game installation files are mostly unique; they may just share engine files and other dependencies like the DirectX installer. The situation is made worse when the game installer is compressed, because that makes finding duplicate data less likely. The savings also depend on some settings, like the block size, and the process requires a significant amount of RAM.

I tried deduplication with both Btrfs and OpenZFS on some GOG installers and disk images. It saved no space. You can give it a try yourself, but on this kind of data I see it as pointless.

Compression

Some file systems can compress files while writing them. Again, I tried writing the same files to a Btrfs and an OpenZFS formatted disk with the strongest compression enabled. It saved very little, maybe 1 MB for each GB of data at best; the installation files are obviously already compressed. It can still be convenient to have compression enabled, so that you don’t have to compress the few compressible files yourself, but it is not a must-have file system feature for archiving video games.

Checksums

Checksums are important for detecting file corruption. You can either calculate them yourself and write them into a text file next to the file that served as input, or you can let the file system handle it.

File systems that support checksums compute them for individual data blocks instead of the whole file. For example, Btrfs by default uses the CRC32 hashing algorithm on 16 kB blocks, so a bigger file can have hundreds of these. That means you will spend around 250 MB on block checksums for each 1 TB of data. If you instead calculated a checksum for each whole file, assuming you have around 1000 files in 1 TB of data, you would spend only a few MB at most. It is not a huge difference, but it is enough to fit a few old games.
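
The rough arithmetic behind those numbers, assuming the 16 kB block size mentioned above, 4 bytes per CRC32 value, and whole-file checksums stored as small sidecar text files that each occupy roughly one file system block:

  # Back-of-the-envelope comparison of checksum overhead per 1 TB of data.
  per_block = (1e12 / (16 * 1024)) * 4 / 1e6   # ~244 MB of 4-byte block checksums per 1 TB
  per_file = 1000 * 4096 / 1e6                 # ~4 MB for 1000 sidecar files of ~4 kB each
  print(round(per_block), "MB vs", per_file, "MB")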

However, the CRC32 hashing algorithm can only output 32 bits’ worth of different values. The blocks, and especially whole files, are much bigger than that, so there are bound to be collisions, and that is a problem. The worst-case scenario is that a file gets corrupted in such a way that the computed hash stays the same, so we won’t know the file is corrupted (unless we compare it with a different copy of the file). Fortunately, we are talking about only a few bits flipping here, and these hashing algorithms are specifically made to produce very different outputs for very similar inputs. Here someone tried to find collisions; for CRC32 there were only two, and the colliding words were completely different.

In my opinion, it is fine to go the convenient way, sacrifice those few hundred MB and leave checksumming to the file system. If you want to be paranoid, you can decrease the odds of a hash collision further by using a bigger hash, like SHA-256. If you want to be really paranoid, then in addition to the block checksums done by the file system, calculate more checksums yourself for the whole file using different hashing algorithms. The probability that a file’s block gets corrupted while its checksum stays the same and all the checksums of the whole file stay the same is tiny.

Lastly, we have to talk about corruption happening during copying. Even when you use a file system that checks the integrity of data blocks, the file can still get corrupted in the computer’s memory during copying and arrive corrupted on the archival drive. The file system receiving a file has no way of comparing it with the original, so it assumes the data it gets is correct. Therefore, even when you use ReFS/Btrfs/OpenZFS on your archival drive, after you copy a file there you have to compare it with the original to make sure it didn’t change in transit.

Conclusion

You have two choices: use an ordinary file system like NTFS or ext4 and calculate and store the file checksums yourself, or use a checksumming file system like ReFS, Btrfs or OpenZFS and let it handle data integrity for you.

Personally, I will go with Btrfs. I won’t be enabling compression or doing deduplication, but I want the automatic checksums for convenience and in the end I found it to be pretty easy to use for simple archiving.

The procedure

Archiving the data

Unlike regular backups, archiving is a one-time action. We write the data to our storage devices and will not modify it for a long time, if ever. So, no special software is really needed for this. Just put the stuff you want to archive into a single folder on your computer and copy the folder’s contents onto your archival hard drive or magnetic tape, or burn it to an optical disc.

After that, you need to check that the data copied correctly. That means doing a binary comparison of the files on your computer with the files on the device. There are many programs that can do that, and your operating system probably already has a command-line tool for it; a minimal script for it is sketched below.
In the very unlikely case that you find a difference between the files, try overwriting the corrupted file and checking it again. If it keeps getting copied wrong, it is no longer an unlucky error and there is probably something wrong with the device or your computer.
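
A possible sketch of such a comparison in Python; the paths are placeholders, and command-line tools like “fc /b” on Windows or “cmp” and “diff -r” on Linux do the same job.

  # Compare every file in the source folder byte-for-byte with its copy on the archival drive.
  import filecmp, os

  src = "D:/to-archive"  # original data on your computer (placeholder)
  dst = "E:/archive"     # copy on the archival drive (placeholder)

  for dirpath, _dirs, files in os.walk(src):
      for name in files:
          original = os.path.join(dirpath, name)
          copy = os.path.join(dst, os.path.relpath(original, src))
          if not os.path.exists(copy) or not filecmp.cmp(original, copy, shallow=False):
              print("MISMATCH:", original)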

If you have a drive with a checksumming file system, you are done.
Otherwise, you need to calculate one or more hashes for each copied file and save them in a text file. It is probably better to save the hashes of each file into a separate text file, named the same way as the file they belong to. For example, if you are archiving the GOG installer of Unreal named “setup_unreal_gold_2.0.0.6.exe”, you could keep its hashes in a “setup_unreal_gold_2.0.0.6.exe.sha” file next to it. Doing this manually would take a long time; there is most probably a program that can do it, or you can write a simple script for it, like the one sketched below.
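
A possible sketch of such a script, following the “.sha next to the file” naming scheme described above (the drive path is a placeholder):

  # Write a "<file name>.sha" text file with a SHA-256 hash next to every archived file.
  import hashlib, os

  root = "E:/archive"  # archival drive (placeholder)

  for dirpath, _dirs, files in os.walk(root):
      for name in files:
          if name.endswith(".sha"):
              continue  # don't hash the checksum files themselves
          path = os.path.join(dirpath, name)
          h = hashlib.sha256()
          with open(path, "rb") as f:
              for chunk in iter(lambda: f.read(1024 * 1024), b""):
                  h.update(chunk)
          with open(path + ".sha", "w") as out:
              out.write(h.hexdigest() + "\n")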

Now you just need to repeat this for the other devices you have, so that you have multiple copies of the data. As an extra precaution, you should not buy the same storage device multiple times at once, but buy devices of different ages, so that it is less likely they will all fail shortly after each other.

Storing the devices

For external hard drives and SSDs, which are in a protective casing, this is pretty simple. Just put them in a place where (ordered by priority): they are safe from impacts and vibrations, the temperature is stable, the humidity is low and they are not exposed to direct sunlight.

In practice, just take your external drive, put it in a small bag and leave it on a stable shelf that is not in direct sunlight or in your bathroom. If you have a naked drive, you need to be more careful. Put it in an anti-static bag and throw in some desiccant to absorb moisture. Putting it in some padded container won’t hurt either.

Storage of optical discs is a little more complicated. They should be stored vertically in jewel cases, so they are protected from scratches. Don’t expose them to light, keep the temperature between -10 and 23 °C and keep the humidity between 20 and 50 %. Of course, don’t touch the disc surface when handling them. More details here.

Magnetic tapes are much more delicate. In essence, you should maintain a stable temperature. The higher the storage temperature, the less humid the air has to be; at 20 °C the humidity should be below 40 %. The tapes should also be stored vertically and protected from dust. Apparently, they should also be unspooled and rewound every few years. This website has a detailed explanation of proper magnetic tape handling.

Checking data integrity

No device lasts forever, so we have to check on their state regularly. For drives, I would recommend doing this every year or two; for tapes and especially optical discs the period can be longer. When the time comes, you take your device and check the integrity of all the files stored on it by calculating each file’s hash and comparing it with the one saved when the data was first copied.

If you have a drive with a checksumming file system, just run its corruption checking procedure; for example, with Btrfs you simply run the scrub command.
Otherwise you have to calculate the file checksums yourself and compare them with the original ones you saved in text files. Again, find a program that can do it for you, or write a simple script like the one sketched below.
If you want to be extra cautious, you should not check all your devices at once, but check each of them a few weeks apart from the others. That prevents losing all your copies at once, for example to a software bug introduced by a recent update that silently corrupts your data.
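
A matching verification sketch, assuming the “.sha” sidecar files from the archiving step (the drive path is again a placeholder):

  # Recompute each file's SHA-256 and compare it with the hash saved when the data was archived.
  import hashlib, os

  root = "E:/archive"  # archival drive (placeholder)

  for dirpath, _dirs, files in os.walk(root):
      for name in files:
          if name.endswith(".sha"):
              continue
          path = os.path.join(dirpath, name)
          if not os.path.exists(path + ".sha"):
              print("missing checksum for:", path)
              continue
          h = hashlib.sha256()
          with open(path, "rb") as f:
              for chunk in iter(lambda: f.read(1024 * 1024), b""):
                  h.update(chunk)
          if h.hexdigest() != open(path + ".sha").read().strip():
              print("CORRUPTED:", path)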

When a corrupted file is found, you can fix it by copying it again from a good copy. If there is a lot of corruption, or there are problems with reading the device, it is time to replace it.

Replacing faulty devices

Most people will replace their storage device as soon as it starts showing signs of failure, so that they avoid data loss. Because we have multiple copies of our data, we don’t have to worry about sudden failure and can keep using the device for as long as its files can still be read correctly.

For hard drives, people rely on SMART reports to get an idea of the drive’s health. If you see the mechanical attributes get worse (like the spin retry count or seek error rate), you should get a new drive, but you can keep this one around until it fails. However, if the attributes that imply problems with the magnetic storage get worse (like reallocated sectors or uncorrectable errors), or you discover more than a few corrupted files, I think the drive can no longer be relied on to keep data and should not be used at all.

For optical discs, you should look at their read error rate and whether any files are corrupted. You could also replace them if they get scratched a lot, but discs have some tolerance for scratches, so it is hard to tell how much damage is too much before the disc simply fails to read.

For magnetic tape, it depends on the physical state, for example when the tape becomes sticky. Newly appearing file corruption can also serve as an indicator. Read up on what can go wrong with magnetic tapes using the previously provided links.

Anyway, to replace a faulty device, just buy a new one and fill it with data from one of the copies that are still good.

The community

There are too many video games for a single person to preserve; this is a community effort. A place is needed where people can share lists of the video games and related files they have archived, so that people don’t overly focus on popular games, which get archived many times over. I don’t know of any such place, but there are some potential starting points here and there. There are of course various organizations, which are worth supporting, but I would not rely on them fully, because an individual can get away with things that organizations may shy away from for fear of getting sued.

Finally, keeping video games from getting lost is great, but it would be of little use if nobody could play them anymore. Unfortunately, uploading copies of most video games is illegal in most countries. However, occasionally helping out a friend or two is really not worth a publisher’s effort to go after you.

Good luck out there, may your storage be vast and its data stable.


