First Public Release of “PDF-extract” – from Cross Ref Labs

Asked By 20 points N/A Posted on -

Cross Ref Labs has announced "pdf-extract", an open source set of tools and libraries for extracting citation references from PDFs. Aside from extracting citation references, does "pdf-extract" have other features, such as converting PDF files to Microsoft Office formats?

Answered By 0 points N/A #134492

PDF-extract is limited to extracting references from PDFs of scholarly journal articles. It works only on full-text PDFs; if the PDF contains scanned images, PDF-extract will probably not work. In other words, PDF-extract is not designed to convert PDF files to other formats such as MS Office formats.

PDF-extract is also available as a web form, where you can try out and test its functionality. Unfortunately, the web version is still experimental, which makes it very slow.

Nevertheless, the team at Cross Ref Labs is working on extending PDF-extract so that it can extract more semantic sections, such as methods, tables, captions, and so on.

