With the recent announcement that Hypergraph will evolve into a web publishing platform for research modules (with a new name forthcoming), Liberate Science also signed up to become a member of CrossRef.
This means we'll start issuing ("minting") Digital Object Identifiers (DOIs) for research modules in the near future. A clear and long-term strategy for this is valuable, and I want to share that journey openly.
In this post, I explain what our DOIs will look like.
What is a DOI?
A DOI is like an entry in a phonebook, resolving an ID (name) to information about that ID - most often a URL. Each DOI is constructed with a prefix for the issuer (10.53962 in our case) and a suffix for the specific content.
The suffix has to be unique and is strongly recommended to be opaque (e.g., not include the date of issuing). This DOI Primer provides more in-depth information on DOIs (your favorite TV show probably has one too!).
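To make the prefix/suffix structure concrete, here is a minimal Python sketch that splits a DOI at its first slash. The helper name `split_doi` is hypothetical and only illustrates the structure described above; it is not a full DOI validator.

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI (bare, or written as a doi.org URL) into (prefix, suffix).

    Hypothetical helper for illustration only: it checks just the basic
    shape "10.<issuer>/<suffix>" and nothing more.
    """
    # Strip the resolver part if the DOI was given as a URL.
    for resolver in ("https://doi.org/", "http://doi.org/"):
        if doi.startswith(resolver):
            doi = doi[len(resolver):]
            break

    # The prefix identifies the issuer; everything after the first slash
    # is the suffix for the specific content.
    prefix, separator, suffix = doi.partition("/")
    if not (prefix.startswith("10.") and separator and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


print(split_doi("https://doi.org/10.53962/xxxx-xxxx"))
# ('10.53962', 'xxxx-xxxx')
```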
DOI suffix specifications
We will generate DOIs inspired by Martin Fenner's "Cool DOIs" idea.
In short, this means our DOIs will look like this: https://doi.org/10.53962/xxxx-xxxx.
Algorithm
Specifically, we will generate DOI suffixes using the following algorithm:
1. Draw a random number X with 17,179,869,184 < X < 34,359,738,367 (that is, 2^34 < X < 2^35 - 1, so X always has 35 binary digits)
2. Convert it into binary
3. Split the binary string from Step 2 into groups of five characters
4. Decode each five-character group from Step 3 into a number (0 to 31)
5. Match each number to the corresponding character in the Crockford version of Base32 encoding
6. Add a final character as checksum: take the random number from Step 1 modulo 32 and match the result to a character with the same mechanism as in Step 5
7. Mint the DOI
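The steps above can be sketched in Python roughly as follows. This is a literal reading of the list, not our actual implementation: the function names, the use of the `secrets` module, and the lower-case alphabet are assumptions made for illustration, and minting itself (registering with CrossRef) is left out.

```python
import secrets

# Crockford Base32 alphabet (it omits i, l, o and u). Lower case is an
# assumption, chosen to match the look of the suffixes in this post.
CROCKFORD = "0123456789abcdefghjkmnpqrstvwxyz"

LOWER = 17_179_869_184  # 2**34, exclusive lower bound from Step 1
UPPER = 34_359_738_367  # 2**35 - 1, exclusive upper bound from Step 1


def draw_number() -> int:
    """Step 1: draw X uniformly at random with LOWER < X < UPPER."""
    return LOWER + 1 + secrets.randbelow(UPPER - LOWER - 1)


def encode_suffix(x: int) -> str:
    """Steps 2 to 6: turn a number into a suffix with a check character."""
    bits = format(x, "b")                                      # Step 2
    groups = [bits[i:i + 5] for i in range(0, len(bits), 5)]   # Step 3
    values = [int(group, 2) for group in groups]               # Step 4
    chars = [CROCKFORD[value] for value in values]             # Step 5
    chars.append(CROCKFORD[x % 32])                            # Step 6
    return "".join(chars)


def format_doi(suffix: str, prefix: str = "10.53962") -> str:
    """Group the suffix into blocks of four characters and add the prefix."""
    blocks = [suffix[i:i + 4] for i in range(0, len(suffix), 4)]
    return f"{prefix}/{'-'.join(blocks)}"


if __name__ == "__main__":
    print(format_doi(encode_suffix(draw_number())))
```

Because every X in the allowed range has exactly 35 binary digits, this always produces seven characters plus the check character. One thing this literal reading makes visible: for a 35-digit X, X modulo 32 is exactly its last five binary digits, so the check character equals the seventh character. A real implementation may compute the checksum differently.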
As an example, we'll walk through the same procedure with a small number. To keep the example short, we'll aim for only four characters (xxxx) instead of eight (xxxx-xxxx):
- Drawn value: 15,123
- Binary: 11101100010011
- Split into groups of five characters: [11101, 10001, 0011] (note the last one is not five characters, but can be left-padded with a zero, as 0011 and 00011 are both 3 in binary)
- Decode the binary numbers: 29, 17, 3
- Match to Crockford: xh3
- Checksum: 15,123 % 32 = 19, which is a k in Crockford
- Suffix xh3k resulting in our DOI 10.53962/xh3k
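The worked example can be reproduced with a short trace that prints every intermediate value. As before, this is only an illustrative sketch; the `trace` function and its dictionary keys are hypothetical names.

```python
# Crockford Base32 alphabet, lower case to match the example above.
CROCKFORD = "0123456789abcdefghjkmnpqrstvwxyz"


def trace(x: int) -> dict:
    """Return each intermediate result of the procedure for the number x."""
    bits = format(x, "b")
    groups = [bits[i:i + 5] for i in range(0, len(bits), 5)]
    # int("0011", 2) and int("00011", 2) are both 3, so a short last
    # group needs no explicit padding here.
    values = [int(group, 2) for group in groups]
    body = "".join(CROCKFORD[value] for value in values)
    checksum = CROCKFORD[x % 32]
    return {
        "binary": bits,
        "groups": groups,
        "values": values,
        "body": body,
        "checksum": checksum,
        "suffix": body + checksum,
    }


for name, value in trace(15_123).items():
    print(f"{name}: {value}")
# binary: 11101100010011
# groups: ['11101', '10001', '0011']
# values: [29, 17, 3]
# body: xh3
# checksum: k
# suffix: xh3k
```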
As mentioned before, our DOIs will have eight characters (xxxx-xxxx). This allows for roughly 17 billion unique suffixes, which we do not expect to exhaust in our lifetime.
If we do end up exhausting these 17 billion options, we can always extend the schema with another four characters: xxxx-xxxx-xxxx. By that point we would have paid CrossRef billions of dollars, because each DOI issued for a journal article would cost us around $1 (on top of our yearly membership fees).
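The "roughly 17 billion" figure follows directly from the bounds of the random draw, and can be checked with a few lines of Python. The cost constant simply reuses the "around $1" estimate from above. The twelve-character count is an assumption: it supposes the extended schema would use eleven data characters and one check character, built the same way as the eight-character one.

```python
LOWER = 17_179_869_184  # 2**34, exclusive lower bound of the draw
UPPER = 34_359_738_367  # 2**35 - 1, exclusive upper bound of the draw

# Number of integers strictly between the two bounds.
available = UPPER - LOWER - 1
print(f"{available:,}")  # 17,179,869,182

# Rough cost of minting every possible suffix at around $1 per DOI,
# ignoring yearly membership fees.
COST_PER_DOI_USD = 1
print(f"${available * COST_PER_DOI_USD:,}")  # $17,179,869,182


def count_suffixes(data_chars: int) -> int:
    """Count the possible draws for a schema with `data_chars` data characters.

    Assumes the same construction as above: 5 binary digits per character,
    leading digit always 1, and both bounds exclusive.
    """
    return 2 ** (5 * data_chars - 1) - 2


print(f"{count_suffixes(7):,}")   # eight-character schema (7 data + 1 check)
print(f"{count_suffixes(11):,}")  # hypothetical twelve-character schema
```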
Open questions
DOIs are most frequently issued for papers, preprints, books, and some other types. It remains unclear at the moment what type a research module would fall under (if any), as research modules have specific needs regarding provenance (e.g., parent modules). How we will end up implementing our automated registration is something we will also have to figure out.
I will keep you posted on how this develops through future blog posts.
Finally: many thanks to the authors of the open source projects cirneco and base32-url, whose code helped me dig into how to implement this! Find our own implementation here.