Tabula System

From GM-RKB

A Tabula System is a free and open-source PDF table extraction system that exports tables from text-based PDFs to CSV through a simple web interface.



References

2018

  • https://github.com/tabulapdf/tabula
    • QUOTE: If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful this is — you can’t easily copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data in CSV format, through a simple web interface.

      Caveat: Tabula only works on text-based PDFs, not scanned documents. If you can click-and-drag to select text in your table in a PDF viewer (even if the output is disorganized trash), then your PDF is text-based and Tabula should work.
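
The caveat in the quote above suggests a manual check: try to select text in a PDF viewer. A rough programmatic version of that check might look like the sketch below. It is not part of Tabula. The function names `looks_text_based` and `extract_page_texts` and the `min_chars` threshold are illustrative assumptions, and the second function assumes the third-party `pypdf` package is installed.

```python
from typing import Iterable, List


def looks_text_based(page_texts: Iterable[str], min_chars: int = 20) -> bool:
    """Heuristic: True if any page yields at least `min_chars` visible characters.

    `min_chars` is an arbitrary illustrative threshold, not a Tabula setting.
    """
    for text in page_texts:
        # Drop all whitespace so blank or near-blank pages do not count.
        visible = "".join((text or "").split())
        if len(visible) >= min_chars:
            return True
    return False


def extract_page_texts(path: str) -> List[str]:
    """Return the extractable text of each page (assumes `pypdf` is installed)."""
    from pypdf import PdfReader  # third-party; imported lazily on purpose

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]
```

A call such as `looks_text_based(extract_page_texts("report.pdf"))` (with a hypothetical file name) returning `False` would suggest a scanned document, which would likely need OCR before any table extraction. As in the quote, the extracted text may be disorganized; the check only asks whether a text layer exists at all.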