Endnote Gem
HOW from HAO
I am working on an Endnote parser for MX as part of our interactions with the Biodiversity Heritage Library (BHL). We want to collect new terms parsed from Journal of Hymenoptera Research (JHR) articles that have been OCRed on BHL.
But then we ran into the first issue: JHR literature citations are not available in a nice format we can import (we need them by article, but BHL has them in Endnote by volume). Ideally, in the end, the parser will be published (in part) as my first gem on GitHub (results of project #1). The justification for creating an Endnote parser primarily has to do with Google. We at the Hymenoptera Anatomy Project are adding lots of references, and adding a reference in MX is form-heavy (unavoidable)... that is, if you type it all in. Google Scholar exports references in Endnote format. Thus the proposed workflow is something like this:
- I have a citation I want to enter
- I Google it
- Cut the Endnote file
- Paste and verify in MX
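The parsing step in that workflow could look roughly like the sketch below. It assumes the pasted text is in the Endnote tagged format (one "%X value" field per line), handles only a handful of common tags, and uses a hypothetical method name (parse_endnote) and hash keys that are not taken from MX. The sample record is an invented placeholder, not a real citation.

```ruby
# Map Endnote tags to the (hypothetical) field names used in this sketch.
ENDNOTE_TAGS = {
  '0' => :type,
  'A' => :authors,
  'T' => :title,
  'J' => :journal,
  'V' => :volume,
  'N' => :number,
  'P' => :pages,
  'D' => :year
}.freeze

# Parse a single Endnote tagged record into a Hash.
# Assumption: every field sits on its own line as "%X value".
def parse_endnote(text)
  record = { authors: [] }
  text.each_line do |line|
    match = line.strip.match(/\A%(\S)\s+(.*)\z/)
    next unless match # not a tagged line

    key = ENDNOTE_TAGS[match[1]]
    next unless key # a tag this sketch does not know about

    if key == :authors
      record[:authors] << match[2] # %A may repeat, one line per author
    else
      record[key] = match[2]
    end
  end
  record
end

# Placeholder record, invented for illustration only.
sample = <<~ENW
  %0 Journal Article
  %T Example title (placeholder)
  %A Author, First
  %A Author, Second
  %J Example Journal
  %V 1
  %P 1-10
  %D 2000
ENW

ref = parse_endnote(sample)
puts ref[:title]
puts ref[:authors].join('; ')
```

The resulting hash could then be used to pre-fill the form, so that the "verify" step is a matter of checking fields rather than typing them.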
It will be most useful if we can then export all references in Endnote format as well (or perhaps in some other library-friendly formats?). That way we can return the nicely formatted references to those who need them (including JHR and maybe BHL).
What I would really like to see is BHL OCR returned to me based on pages. I know you can already get it by asking BHL to email you the pages, and rumor has it that a wrapper is being written to hack just this, but it would be lovely to access it directly without the hack.
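Going the other way, an export could be sketched along these lines. The method name (to_endnote) and the hash keys are hypothetical stand-ins for however a reference is actually stored, and only a few Endnote tags are written out.

```ruby
# Serialize a reference Hash into an Endnote tagged record.
# Assumption: the record uses the hypothetical keys :type, :authors,
# :title, :journal, :volume, :pages and :year; missing keys are skipped.
def to_endnote(record)
  lines = []
  lines << "%0 #{record.fetch(:type, 'Journal Article')}"
  Array(record[:authors]).each { |author| lines << "%A #{author}" }
  { title: 'T', journal: 'J', volume: 'V', pages: 'P', year: 'D' }.each do |key, tag|
    lines << "%#{tag} #{record[key]}" if record[key]
  end
  lines.join("\n") + "\n"
end

# Placeholder reference, invented for illustration only.
puts to_endnote({ authors: ['Author, First'], title: 'Example title', year: '2000' })
```

Several records could be written to one file by separating them with a blank line.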


4 Comments:
Irene,
I'm working on much the same problem, e.g. http://iphylo.blogspot.com/2009/11/biodiversity-heritage-library-viewer.html. If one goal is to be able to go from Endnote citations to BHL content then I'm working on exactly that problem at the moment. I'm assembling bibliographic lists (in RIS format) and then looking at mapping these to BHL, aiming to get links to the first page in BHL.
If you want an Endnote file of Journal of Hymenoptera Research articles I can make one for articles published after 1999 (not mapped to BHL yet). You can harvest earlier citations using Google Scholar by searching for articles from that journal. You can automatically extract the bibliographic data using Zotero. It will be a bit ropey, but you won't have to start from scratch.
Rod,
Thanks very much for your comment and help. This morning MJY showed me your export in RIS format, and it is exactly what we need to import the JHR references.
I'll try Zotero.
Using Google Scholar I was imagining we can get the citations formatted by searching for article titles...then using a 'cut and paste' tool to import the reference into our database. Maybe a bit faster than typing it all out in a form.
My primary goal right now is getting OCRed literature from BHL to parse for the HAO. Part of that would be the citation for the article that contains the term. We created a gem that scrapes BHL and returns the OCR text (http://github.com/mjy/rubyBHL), but it returns the entire volume. That is not really useful (or at least not as useful as OCR returned by article or by page).
Irene,
So, if you want a citation for a term, am I right in thinking that you'd like a service that takes a BHL PageID (i.e., the page that contains the term) and returns the corresponding article?
Rod,
I'm thinking of a service that returns exact pages from an article. This could manifest itself in a couple of ways. If we had a term + page number (or range of pages), the service would return exactly that page or those pages. So entire articles could be returned based on page numbers, or a single page where the term is described. This service is already almost there... but you have to retrieve the pages via email.
Another useful service would be to create links to BHL based on the actual page numbers in the article... when creating links on BHL, the page numbers don't correspond (i.e., http://www.biodiversitylibrary.org/item/21356#8 is actually page 214).
We are still just exploring the possibilities for our project and definitely open to ideas.
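As a rough sketch of that idea: if the "#8" in the link is the position of the page within the item, and if the gap between that position and the printed page number stays constant across the whole item (a big assumption, since plates or unnumbered pages would break it), the link for a printed page is simple arithmetic. The method name and the constant offset are hypothetical and are derived only from the one example above.

```ruby
BHL_ITEM_URL = 'http://www.biodiversitylibrary.org/item/21356'

# Offset between printed page number and position in the item,
# taken from the single known data point: printed page 214 is "#8".
# Assumption: this offset is the same for every page of the item.
PAGE_OFFSET = 214 - 8

# Build a link to the given printed page number.
def bhl_link_for_printed_page(page, offset: PAGE_OFFSET, item_url: BHL_ITEM_URL)
  index = page - offset
  raise ArgumentError, "page #{page} falls before the start of this item" if index.negative?

  "#{item_url}##{index}"
end

puts bhl_link_for_printed_page(214)
puts bhl_link_for_printed_page(220)
```

A real service would need a per-item lookup of page positions rather than one offset, but this shows the kind of mapping we have in mind.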