Getting the data

I used the following workflow to access all papers of S. Shelah listed on ‘MathSciNet’

First I retrieve the MR-numbers. I use the mrnumbers command from mrtools.py. The option --url should point to a search result from ‘MathSciNet’. The option --crawl specifies, that I want the results from all pages and finally I specify where I want to dump the list of MR-numbers.

$ mrtools.py mrnumbers --crawl --dump files/mrnumbers.yaml --url \
> "http://tinyurl.com/shelahmsn"

Next I will access the bib-files associated to these numbers. For this purpose I am using the bib command.

$ mrtools.py bib --load files/mrnumbers.yaml --dump files/msn.bib

Now that I have the bibliographic data from ‘MathSciNet’, I am going to create the merge keys. This can be done using pybibtools.py with the make-key command. The option -k tells the script which fields should be used for the key creation. There are to special values for -k, namely normauthor and normtitle. They call listb.normalizetex.norm_author() or listb.normalizetex.norm_title() resp. and add these fields to the bibliography.

$ pybibtools.py make-key -k normauthor -k year -k normtitle \
> -o files/norm_msn.bib files/msn.bib

Now let’s do the same for ‘listb’.

$ pybibtools.py make-key -k normauthor -k year -k normtitle \
> -o files/norm_listb.bib files/listb.bib

This raises a RuntimeError if the generated keys are not unique. If so a note is taken in the respective ‘info’ file.

Finally, we are able to merge the datasets.

$ pybibtools.py merge --left -o files/merged.bib \
> files/norm_listb.bib files/norm_msn.bib

Note that the merge command is not commutative.

Here is some data of the first trial run. The last column indicates that 842 entries could be matched.

# entries with URL with MR-no.
listb 1144 0 0
msn 1019 889 1019
merged 1144 752 842