Immediate archive implementation

We recently announced our immediate archival strategy, which will ensure the content on ResearchEquals stays available in case of a sudden, disastrous event (e.g., an accident or the business shutting down). In this post, we describe how we implemented this archive strategy, which runs at zero cost using open-source software.

The immediate archive is continuously updated and made available using GitHub and GitHub Actions. We created new API routes on ResearchEquals where you can find the most recent modules and collections. These API routes are the basis of scripts that download all the relevant metadata and files. We run these scripts through GitHub Actions, so the archive updates automatically every night. It also means that, if you are so inclined, you can create your own local archive at any time using the same scripts (one for modules, one for collections). You can also create your own local copy of the entire archive repository.
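To give a rough idea of what such a download script might look like, here is a minimal Python sketch. It is not the script we actually run: the endpoint URL and the `suffix` field name are placeholders assumed for illustration, and the real routes and response shape may differ.

```python
import json
from pathlib import Path
from typing import Callable
from urllib.request import urlopen

# Hypothetical endpoint: a placeholder, not the actual ResearchEquals API route.
MODULES_API_URL = "https://example.org/api/modules"


def fetch_json(url: str) -> list[dict]:
    """Download and decode a JSON list from the given URL."""
    with urlopen(url) as response:
        return json.load(response)


def archive_modules(
    out_dir: Path,
    url: str = MODULES_API_URL,
    fetch: Callable[[str], list[dict]] = fetch_json,
) -> list[Path]:
    """Write one metadata file per module and return the written paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for module in fetch(url):
        # "suffix" is an assumed field name for the module identifier.
        target = out_dir / f"{module['suffix']}.json"
        target.write_text(json.dumps(module, indent=2, sort_keys=True))
        written.append(target)
    return written
```

Passing the `fetch` function as a parameter keeps the sketch easy to try out locally without network access, for example by supplying a function that returns a hard-coded list.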

Downloading all these files is only the first step: we are also making all this content immediately available for consumption. If you visit https://archive.researchequals.com right now, you'll see our immediate archive in action. The immediate archive will stay available regardless of whether we can afford to pay any bills, although you will not be able to create or add new content on the archive. We host it on GitHub Pages and generate all the pages with Eleventy. As you may notice, the page design needs improvement, which is forthcoming. We wanted the archive to be available as soon as possible because, after all, you never know when disaster may strike.

A big benefit of our immediate archive is that we're maintaining the URL structure. This means that when you are viewing a module on ResearchEquals (e.g., researchequals.com/modules/xxxx-xxxx), you can view it on the archive using the same path (e.g., archive.researchequals.com/modules/xxxx-xxxx). As a result, if the main website is ever down for whatever reason, the archive is also your immediate backup. In case of a sudden disaster, any of the ResearchEquals maintainers can switch the main domain to the archive at any time, without breaking any DOI links.
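Because only the hostname changes, translating a link from the main site to the archive is mechanical. The following Python sketch illustrates the idea; the function name is made up for this example, and it assumes the link points at the bare researchequals.com host.

```python
from urllib.parse import urlsplit, urlunsplit

MAIN_HOST = "researchequals.com"
ARCHIVE_HOST = "archive.researchequals.com"


def to_archive_url(url: str) -> str:
    """Return the archive URL with the same scheme, path, and query."""
    parts = urlsplit(url)
    if parts.netloc.lower() != MAIN_HOST:
        raise ValueError(f"not a {MAIN_HOST} URL: {url}")
    # Swap only the host; everything else in the URL stays untouched.
    return urlunsplit(parts._replace(netloc=ARCHIVE_HOST))


print(to_archive_url("https://researchequals.com/modules/xxxx-xxxx"))
```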

Because we're using GitHub to host our archive, we are limited by its permitted file sizes. Since the launch of ResearchEquals, we have had a soft limit of 100MB per published file to keep costs from rising rapidly. GitHub has a hard limit of 100MB per file, so we are going to stick to that limit. We have had one request for increased file sizes so far, so if we have to increase the limit, we will also have to revisit our archival process.
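If you build your own local archive and want to push it to GitHub, a quick check for oversized files can save a failed push. This is a small Python sketch, not part of our scripts; reading 100MB as 100 * 1024 * 1024 bytes is an assumption made for the example.

```python
from pathlib import Path

# Assumption: 100MB is read as 100 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 100 * 1024 * 1024


def oversized_files(root: Path, limit: int = MAX_FILE_BYTES) -> list[Path]:
    """Return all files under root that are larger than limit, sorted by path."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.stat().st_size > limit
    )
```

Calling `oversized_files(Path("archive"))` on a local copy (the directory name here is just an example) returns the files that would need attention before committing.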

All in all, we are pleased that the biggest part of our immediate archive strategy was implemented within several months of its creation. The implementation is by no means complete, and we will keep improving it, especially the design. Nonetheless, this first step is important to ensure what gets published today stays published, regardless of what may occur tomorrow.


Liberate Science GmbH September 18, 2023