A web-based tool that converts scientific papers from PDF to HTML

The folks at the Allen Institute for Artificial Intelligence (AI2) just released an intriguing tool, "Paper to HTML", which lets you upload a scientific article as a PDF and turn it into an HTML web page.

The goal, as they wrote in their email, is to improve accessibility. Screen readers and other assistive technologies generally find HTML much easier to parse than PDF:

This week, a team of researchers and engineers led by Lucy Lu Wang released a prototype of their tool that converts scientific PDFs to HTML, making them readable by screen readers and much easier to view on mobile devices. After learning that less than 3% of scientific articles meet minimum accessibility criteria, AI2 is looking for new and better ways to make scientific publications accessible to the widest possible audience.

I tried it on a science article I was reading recently and, damn, the tool did a terrific job. Above is a screenshot of the HTML it generated.

It will also make it easier for me to copy text out of PDF files. Right now, most of the time when I copy and paste from a PDF into, say, Google Docs or Word, the text comes out broken up by hard line breaks at the end of every line. But this tool renders all the text as simple HTML, which copies and pastes as one full bolus of text. I'm in!

Esther L. Gunn