How to read tables from PDF: tabula-py | Big Data Science
Sometimes the raw data for analysis is stored in PDF documents. To extract tables from this format straight into a dataframe, try tabula-py. It is a simple Python wrapper for tabula-java that reads tables from a PDF and converts them to pandas DataFrames or to CSV / TSV / JSON files. Because it wraps tabula-java, it needs a Java runtime installed on the machine.

First install it with pip:

pip install tabula-py

Then import it in your Python script:

import tabula as tb

And read the tables from a page:

file = 'DataFile.pdf'
tables = tb.read_pdf(file, pages='12')  # in current versions: a list of DataFrames, one per detected table
df = tables[0]  # first table found on page 12

Note that read_pdf already returns pandas DataFrames, so there is no need to wrap the result in pd.DataFrame.

Examples: https://medium.com/codestorm/how-to-read-and-scrape-data-from-pdf-file-using-python-2f2a2fe73ae7
Documentation: https://tabula-py.readthedocs.io/en/latest/
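As a slightly fuller illustration, here is a minimal sketch of how the steps above might be wrapped into small helpers. The function names (read_pdf_tables, largest_table, pdf_to_csv) are hypothetical and not part of tabula-py; only tabula.read_pdf and tabula.convert_into are library calls. It assumes tabula-py and a Java runtime are installed, and that the biggest table on the page is the one you want, which may not hold for your PDF.

```python
def read_pdf_tables(path, pages="all"):
    """Return a list of pandas DataFrames, one per table tabula detects."""
    # Imported lazily so the helpers below can be used without tabula/Java.
    import tabula
    return tabula.read_pdf(path, pages=pages, multiple_tables=True)


def largest_table(tables):
    """Pick the table with the most rows; return None if nothing was found.

    Assumption for this sketch: the largest table is the interesting one.
    """
    if not tables:
        return None
    return max(tables, key=len)  # len(DataFrame) is its number of rows


def pdf_to_csv(path, out_path, pages="all"):
    """Write the detected tables straight to a CSV file, skipping pandas."""
    import tabula
    tabula.convert_into(path, out_path, output_format="csv", pages=pages)
```

With these in place, something like largest_table(read_pdf_tables('DataFile.pdf', pages='12')) would give the biggest table on page 12, or None when tabula finds no table there, which avoids an IndexError on pages without tables.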