coBib's Automatic File-Download and more…


It has been a while since I last wrote about coBib, and several important features have been added since then. In this post, I will highlight the most important new features and walk you through how to configure and use them.

Automatic File-Download

Since version 3.2.0, coBib attempts to automatically download the PDF of a newly added entry. Out of the box, this works for entries added via arXiv IDs. Support for arbitrary DOI entries must be configured by providing a dictionary of URL mappings. Other parsers do not provide this functionality yet. But for now, let us dive into a simple example…

Entries added via arXiv IDs

Consider adding a new entry via an arXiv ID using the following command:

cobib add --arxiv 2009.10095

This will add the new entry Egger2020 to your database. Furthermore, it will automatically download the PDF version of this article and save it to ~/.local/share/cobib/Egger2020.pdf. You can follow the download progress in the command's output:

Downloading: [================                        ]   41.8%   1 MB / 2 MB
...
Downloading: [========================================]  100.0%   2 MB / 2 MB
Successfully downloaded ~/.local/share/cobib/Egger2020.pdf

If you want to change the default download location, you can do so by overriding the config.utils.file_downloader.default_location setting in your config file (read this blog post for more information on coBib's Python configuration system).
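As an illustration, such an override in your config file might look like the following sketch. It assumes the from cobib.config import config line that coBib's Python configuration files begin with; the target directory is only a hypothetical example.

```python
from cobib.config import config

# Hypothetical download directory; replace it with the path you prefer.
config.utils.file_downloader.default_location = "~/Documents/papers"
```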

Entries added via DOIs

Now, let us turn to downloading DOI entries. This case is more difficult because a DOI can resolve to the landing page of any journal. Thus, there is no single recipe for determining the link to the article's PDF file.

To circumvent this problem, you need to manually specify a dictionary of URL mappings in the config.utils.file_downloader.url_map setting. The key-value pairs of this dictionary should be formatted similarly to this example:

config.utils.file_downloader.url_map[
    r"(.+)://quantum-journal.org/papers/([^/]+)"
] = r"\1://quantum-journal.org/papers/\2/pdf/"

Let us walk through this snippet to understand what it does:

The key r"(.+)://quantum-journal.org/papers/([^/]+)" is a raw Python string containing a regex pattern. This pattern is made up of the following parts:
- (.+)://: this matches any protocol (e.g. http:// or https://) and captures the string preceding ://.
- quantum-journal.org/papers/: this is the main body of the URL and will be matched as plain text (strictly speaking, each unescaped . matches any character, which is harmless here).
- ([^/]+): this is another capture group, matching one or more characters other than /.
The value of this entry, r"\1://quantum-journal.org/papers/\2/pdf/", is a substitution pattern. The placeholders \1 and \2 will be replaced by the first and second captured strings, respectively.

Here is a concrete example. Suppose you add a new entry from a DOI via:

cobib add --doi 10.22331/q-2021-06-17-479

The URL of the journal's landing page then resolves to https://quantum-journal.org/papers/q-2021-06-17-479/. This URL matches the key we discussed above:

https://quantum-journal.org/papers/q-2021-06-17-479/
vvvvv                              vvvvvvvvvvvvvvvv
(.+) ://quantum-journal.org/papers/([^/]+)         /

Thus, after inserting the captured groups into the value pattern, we obtain the URL:

\1   ://quantum-journal.org/papers/\2              /pdf/
^^^^^                              ^^^^^^^^^^^^^^^^
https://quantum-journal.org/papers/q-2021-06-17-479/pdf/

This is the correct URL pointing to the PDF version of this article.

As you can see, this recipe allows you to map from any journal's landing page to the URL pointing to the article's PDF. However, figuring out the correct regex pattern can be difficult at times. I am planning to collect a list of working patterns on coBib's wiki in the future and encourage everyone to add their patterns for the benefit of all users.
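To make the mechanics of the URL map concrete, here is a minimal Python sketch that applies such a dictionary to a resolved landing-page URL. It is not coBib's actual implementation: the function name map_to_pdf_url is hypothetical, and it assumes that the first matching pattern wins.

```python
import re
from typing import Optional

# The mapping from the example above: regex pattern -> substitution pattern.
URL_MAP: dict[str, str] = {
    r"(.+)://quantum-journal.org/papers/([^/]+)": r"\1://quantum-journal.org/papers/\2/pdf/",
}


def map_to_pdf_url(url: str, url_map: dict[str, str]) -> Optional[str]:
    """Return the PDF URL for a landing-page URL, or None if no pattern matches."""
    for pattern, template in url_map.items():
        match = re.match(pattern, url)
        if match is not None:
            # Match.expand() fills \1, \2, ... with the captured groups and
            # returns only the expanded template.
            return match.expand(template)
    return None


landing_page = "https://quantum-journal.org/papers/q-2021-06-17-479/"
print(map_to_pdf_url(landing_page, URL_MAP))
# → https://quantum-journal.org/papers/q-2021-06-17-479/pdf/
```

Because Match.expand returns only the expanded template, the trailing / of the landing page is not carried over, which reproduces the URL derived above; re.sub, by contrast, would keep any unmatched trailing text and yield .../pdf//.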

Improvements to the `modify` Command

Two important changes have been made to how the filtering of the list command and the modifications of the modify command are interpreted:

Filter values are now interpreted as regex patterns (which we already encountered above): this means you can perform more advanced filtering, similar to how you can search your database for regex patterns. Here is a simple example:
```
cobib list ++label "\D+_\d+"
```
This will list all entries whose label matches the regex pattern \D+_\d+. In words, this pattern matches one or more non-digit characters, followed by an underscore and one or more digits. This is a common label format used by many journals, e.g. Rossmannek_2021.
Modifications get evaluated as f-strings: these kinds of strings are a powerful feature of the Python language and allow you to embed variables (and expressions) whose values are substituted into the string. Within coBib's scope, this means you can use any field name of your entry as a variable, which will be replaced by its value in the current entry. Here is another example:
```
cobib modify "pages:{pages.replace('--', '-')}" -- ...
```
In this example, the pages field is modified such that every occurrence of the string -- is replaced by a single dash, -. This shows how having the fields available as proper variables allows you to perform advanced modifications that would otherwise be a long, manual editing task.

Finally, combining these two features allows you to perform even more powerful modifications of your database. My favorite example is the following:

cobib modify "label:{label.replace('_', '')}" -- ++label "\D+_\d+"

This combines our previous two examples into one, renaming all entries which follow the naming pattern <string>_<number> (typically referring to <author>_<year>, e.g. Rossmannek_2021) to follow the pattern <string><number> (e.g. Rossmannek2021).

I consider this a killer feature of coBib, one that makes it stand out from other bibliography-management tools.
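The following Python sketch emulates the combined command on a toy in-memory database. It rests on two assumptions about the semantics rather than on coBib's code: that label filters behave like re.search (matching anywhere in the label), and that a modification is evaluated like an f-string via eval. The helper names and example entries are hypothetical.

```python
import re

# Toy database: label -> fields (hypothetical example entries).
database: dict[str, dict[str, str]] = {
    "Rossmannek_2021": {"pages": "1--10"},
    "Einstein1905": {"pages": "891--921"},
}


def matching_labels(db: dict[str, dict[str, str]], pattern: str) -> list[str]:
    """Emulate `++label PATTERN`, assuming search semantics (match anywhere)."""
    return [label for label in db if re.search(pattern, label)]


def evaluate_modification(template: str, fields: dict[str, str]) -> str:
    """Evaluate `template` as an f-string with the entry fields as variables.

    This relies on eval(), so it must only ever be fed trusted input.
    """
    return eval("f" + repr(template), {"__builtins__": {}}, dict(fields))


# The pages example on its own:
print(evaluate_modification("{pages.replace('--', '-')}", {"pages": "891--921"}))
# → 891-921

# Emulate: cobib modify "label:{label.replace('_', '')}" -- ++label "\D+_\d+"
for label in matching_labels(database, r"\D+_\d+"):
    fields = {"label": label, **database[label]}
    new_label = evaluate_modification("{label.replace('_', '')}", fields)
    database[new_label] = database.pop(label)

print(sorted(database))
# → ['Einstein1905', 'Rossmannek2021']
```

Only Rossmannek_2021 matches the filter, so it is the only entry renamed; Einstein1905 has no underscore and is left untouched.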

Journal Abbreviations

In version 3.2.0, I also added support for journal abbreviations. With these, the export command can automatically convert the journal fields of all entries to their abbreviated versions. This helps you comply with the citation-style requirements of journals.

In order to use this feature, you must configure a list of abbreviations in the config.utils.journal_abbreviations setting. As an example, consider this snippet:

config.utils.journal_abbreviations += [("Annalen der Physik", "Ann. Phys.")]

Here, we add a new journal abbreviation for “Annalen der Physik”, whose short-hand name is “Ann. Phys.”. Note that we include the punctuation as part of the abbreviation; it can easily be removed during exporting, if necessary.

Now, let us look at this example in action! Since we are only interested in the journal field, I will keep the entry to a bare minimum:

---
einstein:
  ENTRYTYPE: article
  journal: Annalen der Physik
...

We can export it in three different ways to a BibTeX file:

cobib export -b einstein.bib -s -- einstein

@article{einstein,
 journal = {Annalen der Physik},
}

This is the normal export command, which leaves the journal name unchanged.

cobib export --abbreviate -b einstein.bib -s -- einstein

@article{einstein,
 journal = {Ann. Phys.},
}

By specifying the --abbreviate argument, the journal name gets replaced with our configured abbreviation.

cobib export --abbreviate --dotless -b einstein.bib -s -- einstein
```
@article{einstein,
 journal = {Ann Phys},
}
```
Finally, by adding the --dotless argument, we can even remove the punctuation from the abbreviated name.

coBib always attempts to store the full journal name when adding new entries and automatically expands an abbreviated name when it encounters one during the add command. It warns you about entries whose journal names it fails to expand or abbreviate, so you can gradually grow your list of journal abbreviations.
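The abbreviate, dotless and expand behaviour can be sketched in a few lines of plain Python. This is a simplified illustration, not coBib's implementation: the function names are hypothetical, and it assumes the dotless variant simply drops every . from the configured abbreviation.

```python
import logging

# Configured (full name, abbreviation) pairs, mirroring the config snippet above.
ABBREVIATIONS: list[tuple[str, str]] = [("Annalen der Physik", "Ann. Phys.")]


def abbreviate(journal: str, dotless: bool = False) -> str:
    """Return the abbreviation of `journal`, or the name unchanged if unknown."""
    for full, abbrev in ABBREVIATIONS:
        if journal in (full, abbrev):
            return abbrev.replace(".", "") if dotless else abbrev
    logging.warning("No abbreviation configured for journal %r", journal)
    return journal


def elongate(journal: str) -> str:
    """Return the full name for an abbreviated (dotted or dotless) journal name."""
    for full, abbrev in ABBREVIATIONS:
        if journal in (abbrev, abbrev.replace(".", "")):
            return full
    return journal


print(abbreviate("Annalen der Physik"))                # → Ann. Phys.
print(abbreviate("Annalen der Physik", dotless=True))  # → Ann Phys
print(elongate("Ann. Phys."))                          # → Annalen der Physik
```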

Database Format Linter

A linter is a static code-analysis tool that can check for stylistic and formatting errors (among other things).

Last but not least, I want to showcase the new _lint_database shell utility, added in version 3.1.0. It detects possible shortcomings in the formatting of your database file. This helps you keep up to date with the latest improvements to coBib's database format, including newly supported features such as proper numbers for numeric fields and lists for fields like file, url and tags.

Here is an example database file with several shortcomings:

---
einstein:
  ENTRYTYPE: article
  ID: einstein
  author: Albert Einstein
  doi: 10.1002/andp.19053221004
  journal: Annalen der Physik
  month: June
  number: "10"
  pages: 891--921
  title: Zur Elektrodynamik bewegter K{\"o}rper
  url: http://dx.doi.org/10.1002/andp.19053221004, https://onlinelibrary.wiley.com/doi/epdf/10.1002/andp.19053221004
  volume: "322"
  year: "1905"
...

You can find the points for improvement by running cobib _lint_database in the terminal. For this example, you will obtain the following output:

literature.yaml:8 Converting field 'month' of entry 'einstein' from 'June' to 'jun'.
literature.yaml:9 Converting field 'number' of entry 'einstein' to integer: 10.
literature.yaml:12 Converted the field 'url' of entry 'einstein' to a list. You can consider storing it as such directly.
literature.yaml:13 Converting field 'volume' of entry 'einstein' to integer: 322.
literature.yaml:14 Converting field 'year' of entry 'einstein' to integer: 1905.
literature.yaml:4 The field 'ID' of entry 'einstein' is no longer required. It will be inferred from the entry label.

As you can see, this command produces lint messages in the format: <database path>:<line number> <message>. You can go through these messages one by one and change your database file as suggested. For example:

literature.yaml:8 Converting field 'month' of entry 'einstein' from 'June' to 'jun'.: As of version 3.1.0, coBib always stores the month field as a three-letter code, which ensures that the month macros of most common BibTeX citation styles work correctly.
```
  month: jun
```
literature.yaml:9 Converting field 'number' of entry 'einstein' to integer: 10.: Numeric fields are now supported as well, which means that you can properly express numbers as such (and not as strings):
```
  number: 10
  volume: 322
  year: 1905
```

literature.yaml:12 Converted the field 'url' of entry 'einstein' to a list. You can consider storing it as such directly.: coBib can also handle YAML lists properly, which greatly improves the readability of the file, url and tags fields:

  url:
    - http://dx.doi.org/10.1002/andp.19053221004
    - https://onlinelibrary.wiley.com/doi/epdf/10.1002/andp.19053221004

literature.yaml:4 The field 'ID' of entry 'einstein' is no longer required. It will be inferred from the entry label.: Finally, coBib also no longer requires the redundant ID field and correctly infers this information from the label. Thus, you can safely remove these lines from your database.

In the end, the above database file could look like this:

---
einstein:
  ENTRYTYPE: article
  author: Albert Einstein
  doi: 10.1002/andp.19053221004
  journal: Annalen der Physik
  month: jun
  number: 10
  pages: 891--921
  title: Zur Elektrodynamik bewegter K{\"o}rper
  url:
    - http://dx.doi.org/10.1002/andp.19053221004
    - https://onlinelibrary.wiley.com/doi/epdf/10.1002/andp.19053221004
  volume: 322
  year: 1905
...
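The conversions these lint messages describe are simple enough to sketch in plain Python, operating on an entry loaded as a dictionary. This only illustrates the suggested changes and is not coBib's linter; the helper name lint_entry and its message texts are hypothetical, and the comma split for list fields is a naive assumption.

```python
from typing import Any

# English month names; the BibTeX month macros are their first three letters.
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
NUMERIC_FIELDS = ("number", "volume", "year")
LIST_FIELDS = ("file", "url", "tags")


def lint_entry(label: str, entry: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Return a cleaned-up copy of `entry` plus one message per applied fix."""
    fixed = dict(entry)
    messages: list[str] = []

    month = fixed.get("month")
    if isinstance(month, str) and month.lower() in MONTHS:
        code = month[:3].lower()
        if code != month:
            fixed["month"] = code
            messages.append(f"{label}: month {month!r} -> {code!r}")

    for field in NUMERIC_FIELDS:
        value = fixed.get(field)
        if isinstance(value, str) and value.isdecimal():
            fixed[field] = int(value)
            messages.append(f"{label}: {field} {value!r} -> {fixed[field]}")

    for field in LIST_FIELDS:
        value = fixed.get(field)
        if isinstance(value, str):
            # Naive split: assumes the individual items contain no commas.
            fixed[field] = [item.strip() for item in value.split(",")]
            messages.append(f"{label}: {field} converted to a list")

    if fixed.pop("ID", None) is not None:
        messages.append(f"{label}: removed redundant ID field")

    return fixed, messages


einstein = {
    "ENTRYTYPE": "article", "ID": "einstein", "month": "June",
    "number": "10", "pages": "891--921", "volume": "322", "year": "1905",
    "url": "http://dx.doi.org/10.1002/andp.19053221004, "
           "https://onlinelibrary.wiley.com/doi/epdf/10.1002/andp.19053221004",
}
fixed, messages = lint_entry("einstein", einstein)
for message in messages:
    print(message)
```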

I am planning to add a utility to automatically resolve lint warnings in the future, so stay tuned for that!

Online documentation

I would like to leave you with a final word on coBib's new online documentation, which is now hosted at https://cobib.gitlab.io/cobib/cobib.html. I hope it serves you as a useful resource in addition to coBib's manual page.

Max Rossmannek
