Posted on - 04/19/2012
CrossRef Labs is glad to announce "pdf-extract", an open source set of tools and libraries for extracting citation references from PDFs. Aside from extracting citation references, does "pdf-extract" have other features, such as converting PDF files to Microsoft Office formats?
First Public Release of “PDF-extract” – from CrossRef Labs
PDF-extract is limited to extracting citation references from PDFs of scholarly journal articles, and it works only on full-text PDFs. If the PDF consists of scanned images, PDF-extract will probably not work. In other words, PDF-extract is not designed to convert PDF files to other formats such as MS Office formats.
PDF-extract is also available as a web form at http://extracto.labs.crossref.org, where you can try out and test its functionality. Unfortunately, it is still experimental, which makes it very slow.
Nevertheless, the guys at CrossRef Labs are working on PDF-extract so that it can extract more semantic sections, such as methods, tables, captions, and so on.