PDF Extractor 2.0

Pricing

$30.00/month + usage

Developed by

cat

Maintained by Community

💫 Extract PDF document contents, including metadata, images, pages, tables, attachments, and more.

2

Monthly users

5

Runs succeeded

>99%

Last modified

4 months ago

Welcome to PDF Extractor

🍂 About PDF Format

Portable Document Format (PDF) is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images, and other information needed to display it. PDF has its roots in "The Camelot Project", initiated by Adobe co-founder John Warnock in 1991. PDF was standardized as ISO 32000 in 2008; the latest edition, ISO 32000-2:2020, was published in December 2020.

🍂 About This Actor

💫 Extract contents from PDF documents

Features:

  • ⭐ Extract PDF pages as text or images (SVG, PNG, JPEG).
  • ⭐ Extract PDF metadata.
  • ⭐ Extract the PDF table of contents.
  • ⭐ Extract PDF tables.
  • ⭐ Extract encrypted (password-protected) PDFs.
  • ⭐ Extract embedded images.
  • ⭐ Extract attachments.
  • ⭐ Process multiple PDF URLs in a single run.

🍂 Tutorial

Input Parameters

| Name | Type | Description |
| --- | --- | --- |
| url | Array [String] | List of PDF document URLs |
| content | String | Output format for pages (text, svg, png, jpg) |
| images | Boolean (true/false) | Extract embedded images |
| attachments | Boolean (true/false) | Extract embedded files |
| tables | Boolean (true/false) | Extract tables |

Note: All extracted resources other than text are saved to the default key-value store.
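As a rough illustration of how the input parameters fit together, the sketch below assembles a run input as a plain Python dictionary. The field names come from the parameter table; the helper name `build_input`, the default values, and the validation rules are assumptions made for this example, not documented behaviour of the Actor.

```python
from typing import Any

# Page output formats listed in the parameter table.
ALLOWED_CONTENT = {"text", "svg", "png", "jpg"}


def build_input(
    urls: list[str],
    content: str = "text",      # assumed default, not stated by the Actor
    images: bool = False,       # assumed default
    attachments: bool = False,  # assumed default
    tables: bool = False,       # assumed default
) -> dict[str, Any]:
    """Build a run-input dict using the parameter names from the table."""
    if not urls:
        raise ValueError("at least one PDF URL is required")
    if content not in ALLOWED_CONTENT:
        raise ValueError(f"content must be one of {sorted(ALLOWED_CONTENT)}")
    return {
        "url": list(urls),
        "content": content,
        "images": images,
        "attachments": attachments,
        "tables": tables,
    }


# Example: extract page text and tables from a single document.
run_input = build_input(
    ["https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf"],
    content="text",
    tables=True,
)
print(run_input)
```

The resulting dictionary is the kind of object that would be supplied as the Actor's input, whether pasted as JSON into the Apify Console or passed through an API client.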

Dataset output format:

[
  # URL-1: Metadata
  { "metadata": { "headers": { ... }, "url": "...", "mime": "..." } },
  # URL-1: Page Contents
  { "index": 0, "content": "...page-0 contents...", "images": [...], "tables": [...] },
  { "index": 1, "content": "...page-1 contents...", "images": [...], "tables": [...] },
  ...
  # URL-2: Metadata
  { "metadata": { "headers": { ... }, "url": "...", "mime": "..." } },
  # URL-2: Page Contents
  { "index": 0, "content": "...page-0 contents...", "images": [...], "tables": [...] },
  { "index": 1, "content": "...page-1 contents...", "images": [...], "tables": [...] },
  ...
]
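Because the dataset is a flat list in which each metadata item is followed by that document's page items, it can be convenient to regroup the items per document before further processing. The following is a minimal sketch under that reading of the format (the `#` lines in the listing are taken to be annotations, not dataset items). The function name `group_by_document` and the sample items with `example.com` URLs are placeholders invented for this example.

```python
from typing import Any


def group_by_document(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Group a flat item list into one record per PDF.

    Assumes each item carrying a "metadata" key starts a new document and
    that the page items that follow belong to that document.
    """
    documents: list[dict[str, Any]] = []
    for item in items:
        if "metadata" in item:
            documents.append({"metadata": item["metadata"], "pages": []})
        elif documents:
            documents[-1]["pages"].append(item)
        else:
            raise ValueError("page item appeared before any metadata item")
    return documents


# Placeholder items shaped like the format above (not real Actor output).
sample = [
    {"metadata": {"headers": {}, "url": "https://example.com/a.pdf", "mime": "application/pdf"}},
    {"index": 0, "content": "first page of a", "images": [], "tables": []},
    {"index": 1, "content": "second page of a", "images": [], "tables": []},
    {"metadata": {"headers": {}, "url": "https://example.com/b.pdf", "mime": "application/pdf"}},
    {"index": 0, "content": "first page of b", "images": [], "tables": []},
]

docs = group_by_document(sample)
for doc in docs:
    print(doc["metadata"]["url"], len(doc["pages"]))
```

Grouping by position rather than by a shared key is a design choice forced by the format as shown: page items carry an `index` but no reference back to their source URL.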

🍂 Output Samples

PDF Sample #1

URL : https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf

{

}

PDF Sample #2

URL : https://apify.com/img/web-scraping/beginners-guide-to-web-scraping.pdf

{

}

✏️ Support

⚡️ Feel free to reach out to the developer for any issues or suggestions for improvement.

Pricing

Pricing model

Rental 

To use this Actor, you have to pay a monthly rental fee to the developer. The rent is subtracted from your prepaid usage every month after the free trial period. You also pay for the Apify platform usage.

Free trial

7 days

Price

$30.00