PDF Extractor 2.0

Pricing

$30.00/month + usage

Developed by

cat

Maintained by Community

💫 Extract PDF Document Contents including Metadata, Images, Pages, Tables, Attachments, etc.

Last modified

a day ago

Automation

Developer tools

Welcome to PDF Extractor

🍂 About PDF Format

Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.[2][3] Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images, and other information needed to display it. PDF has its roots in "The Camelot Project", initiated by Adobe co-founder John Warnock in 1991.[4] PDF was standardized as ISO 32000 in 2008.[5] The latest edition, ISO 32000-2:2020, was published in December 2020.

🍂 About This Actor

💫 Extract contents from PDF documents

Features:

⭐ Extract PDF pages as text or images (SVG, PNG, JPEG).
⭐ Extract PDF metadata.
⭐ Extract the PDF table of contents.
⭐ Extract PDF tables.
⭐ Extract from encrypted (password-protected) PDFs.
⭐ Extract embedded images.
⭐ Extract attachments.
⭐ Process multiple PDF URLs in one run.

🍂 Tutorial

Input Parameters

Name	Type	Description
`url`	Array `[String]`	List of PDF document URLs
`content`	String	Output format for pages (`text`, `svg`, `png`, `jpg`)
`images`	Boolean `(true/false)`	Extract embedded images
`attachments`	Boolean `(true/false)`	Extract embedded files
`tables`	Boolean `(true/false)`	Extract tables

Note: All extracted resources other than text are saved to the default key-value store.
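As an illustration of the parameters above, here is a minimal Python sketch that assembles an input object. The helper name `build_input` and its default values are assumptions made for this example, not part of the Actor; only the field names and the allowed `content` values come from the table.

```python
import json

# Page formats listed in the parameter table above.
ALLOWED_CONTENT = {"text", "svg", "png", "jpg"}


def build_input(urls, content="text", images=False, attachments=False, tables=False):
    """Build an input object using the field names from the table above.

    Hypothetical helper: the defaults chosen here are assumptions for
    illustration, not the Actor's documented defaults.
    """
    if isinstance(urls, str):
        urls = [urls]  # `url` is documented as an array of strings
    urls = list(urls)
    if not urls:
        raise ValueError("at least one PDF URL is required")
    if content not in ALLOWED_CONTENT:
        raise ValueError(f"content must be one of {sorted(ALLOWED_CONTENT)}")
    return {
        "url": urls,
        "content": content,
        "images": bool(images),
        "attachments": bool(attachments),
        "tables": bool(tables),
    }


# Example: request plain-text pages plus tables for one document.
run_input = build_input(
    "https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf",
    tables=True,
)
print(json.dumps(run_input, indent=2))
```

The resulting dictionary can be pasted into the Actor's JSON input editor, or passed as the run input when starting the Actor through an API client.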

Dataset Output Format:

[	
	# URL-1: Metadata
	{ "metadata": { "headers": { ... }, "url": "...", "mime": "..." } },
	# URL-1: Page Contents
	{ "index": 0, "content": "...page-0 contents...", "images": [...], "tables": [...] },
	{ "index": 1, "content": "...page-1 contents...", "images": [...], "tables": [...] },
	...
	# URL-2: Metadata
	{ "metadata": { "headers": { ... }, "url": "...", "mime": "..." } },
	# URL-2: Page Contents
	{ "index": 0, "content": "...page-0 contents...", "images": [...], "tables": [...] },
	{ "index": 1, "content": "...page-1 contents...", "images": [...], "tables": [...] },	
	...
]
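Because the dataset is a flat list in which each document's metadata item is followed by its page items, it can help to regroup the items per PDF. The sketch below assumes exactly that ordering; `group_by_document` is a hypothetical helper, and the sample items are made up to mirror the shape shown above, not real Actor output.

```python
def group_by_document(items):
    """Group a flat item list into one record per PDF.

    Assumes, as in the layout above, that each document's metadata item
    comes first and is followed by that document's page items.
    """
    documents = []
    for item in items:
        if "metadata" in item:
            # A metadata item marks the start of a new document.
            documents.append({"metadata": item["metadata"], "pages": []})
        elif "index" in item:
            if not documents:
                raise ValueError("page item found before any metadata item")
            documents[-1]["pages"].append(item)
    return documents


# Made-up items that mirror the documented shape (placeholder URLs).
sample_items = [
    {"metadata": {"headers": {}, "url": "https://example.com/a.pdf", "mime": "application/pdf"}},
    {"index": 0, "content": "a, page 0", "images": [], "tables": []},
    {"index": 1, "content": "a, page 1", "images": [], "tables": []},
    {"metadata": {"headers": {}, "url": "https://example.com/b.pdf", "mime": "application/pdf"}},
    {"index": 0, "content": "b, page 0", "images": [], "tables": []},
]

documents = group_by_document(sample_items)
for doc in documents:
    print(doc["metadata"]["url"], len(doc["pages"]))
```

Grouping by position rather than by a shared key is a deliberate choice here: the page items in the documented format carry no reference back to their source URL, so order is the only link between a page and its document.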

🍂 Output Samples

PDF Sample #1

URL : https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf

{

}

PDF Sample #2

URL : https://apify.com/img/web-scraping/beginners-guide-to-web-scraping.pdf

{

}

✏️ Support

⚡️ Feel free to reach out to the developer for any issues or suggestions for improvement.
