PDF Text Extractor

Developed by

Jiří Moravčík

PDF Text Extractor allows you to extract text from PDF files. It also supports chunking of the text to prepare the data for usage with large language models.
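To make the idea of chunking concrete, here is a minimal sketch of fixed-size chunking with overlap, a common way to prepare long text for a large language model. It is an illustration only: the function name `chunk_text` and its `chunk_size` and `overlap` parameters are hypothetical, and the Actor's own chunking logic and input options may differ.

```python
def chunk_text(text, chunk_size=1000, overlap=100):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    chunk boundary still appears whole in at least one chunk.
    Hypothetical helper, not part of the Actor or the Apify client.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

Character-based windows are the simplest option. Splitting on sentence or token boundaries usually gives cleaner chunks but needs a tokenizer, which this sketch leaves out.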

5.0 (1)

Pricing

Pay per usage

Total users

723

Monthly users

Runs succeeded

>99%

Issues response

1.4 days

Last modified

2 months ago

Integrations

Automation

You can access PDF Text Extractor programmatically from your own applications through the Apify API. To use the Apify API, you need an Apify account and your API token, which you can find under Integrations settings in Apify Console. The example below uses Python.

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "urls": ["https://arxiv.org/pdf/2307.12856"] }

# Run the Actor and wait for it to finish
run = client.actor("jirimoravcik/pdf-text-extractor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start
```
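In a real application you would usually keep the token out of the source code and pass more than one PDF at a time. The sketch below wraps the example above under a few assumptions: the token is read from an environment variable named `APIFY_TOKEN`, and the helper names `load_token`, `build_run_input`, and `extract` are hypothetical. It uses the same client calls and the same `urls` input field as the example above.

```python
import os

ACTOR_ID = "jirimoravcik/pdf-text-extractor"


def load_token(env_var="APIFY_TOKEN"):
    """Read the API token from the environment (variable name is an assumption)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your Apify API token")
    return token


def build_run_input(urls):
    """Build the Actor input, dropping blank entries and duplicate URLs."""
    cleaned = (url.strip() for url in urls)
    unique = list(dict.fromkeys(url for url in cleaned if url))  # keeps first-seen order
    if not unique:
        raise ValueError("At least one PDF URL is required")
    return {"urls": unique}


def extract(urls):
    """Run the Actor and return all dataset items as a list."""
    # Imported here so the helpers above work without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(load_token())
    run = client.actor(ACTOR_ID).call(run_input=build_run_input(urls))
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    for item in extract(["https://arxiv.org/pdf/2307.12856"]):
        print(item)
```

Separating input building from the API call lets you validate the URL list before a run is started.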

PDF Text Extractor API in Python

The Apify API client for Python is the official library for using the PDF Text Extractor API from Python. It provides convenience functions and automatic retries on errors.

Install the apify-client

$ pip install apify-client
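The automatic retries mentioned above are handled inside the client, so you normally do not write them yourself. To show what such a mechanism generally does, here is a minimal sketch of retrying with exponential backoff. It is not the client's actual implementation: the function `call_with_retries` and its parameters are hypothetical and chosen only for illustration.

```python
import time


def call_with_retries(fn, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fn() and retry on failure, doubling the wait after each attempt.

    Hypothetical helper that illustrates exponential backoff in general.
    `sleep` is injectable so the waiting can be skipped or recorded.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted, surface the last error
            sleep(base_delay * (2 ** attempt))  # 0.5 s, 1 s, 2 s, ...
            attempt += 1


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Fails twice, then succeeds, to simulate a temporary network error.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("temporary failure")
        return "ok"

    print(call_with_retries(flaky))
```

A production version would typically retry only on errors known to be temporary, such as timeouts or rate limiting, and add random jitter to the delays.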

Other API clients include:

PDF Text Extractor API in JavaScript

PDF Text Extractor API through CLI

PDF Text Extractor OpenAPI definition

PDF Text Extractor API

PDF Scraper

onidivo/pdf-scraper

Scrape and extract text from PDF links.

Onidivo Technologies

375

PDF Extractor 2.0

jupri/pdf-extractor-2-0

💫 Extract PDF Document Contents including Metadata, Images, Pages, Tables, Attachments, etc.

cat

PDF Text Extractor

sami_apify/PDF-Text-Extractor

This actor downloads PDFs from provided URLs, extracts text content from them, and saves the extracted data into an Apify dataset. It’s ideal for scraping and processing PDFs available online.

sami

HTML to PDF converter

apify/html-to-pdf-converter

Convert HTML string to A4 PDF.

Apify

4.3

HTML to PDF Converter

jancurn/url-to-pdf

Loads a web page in headless Chrome using Puppeteer and prints it to PDF. The input is a JSON object and output is a PDF file.

Jan Čurn

472

Website To PDF Converter

louisdeconinck/website-to-pdf-converter

Convert websites to high-quality PDF documents with customizable options. This powerful actor allows you to transform website pages with both static HTML and dynamic content into professional-grade PDFs, offering a wide range of customization features such as page format, orientation, margins, …

Louis Deconinck

5.0

Markdown Converter

jindrich.bar/markdown-converter

A simple Actor for converting pdf / doc / docx files to Markdown.

Jindřich Bär

HTML string to PDF

mhamas/html-string-to-pdf

Convert HTML string to A4 PDF.

Matej Hamas

Google Slides Replacer

kamil.stus/google-slides-replacer

Automate the creation of Google Slides presentations from a template, with support for dynamic text replacement.

Kamil Štus

HTML to PDF Converter Pro 🔄

powerful_bachelor/html-to-pdf-converter-pro

🔄 Convert web pages to high-quality PDFs with special canvas element handling! Perfect for 📄 documentation, 🖨️ printing, and 🔒 archiving. Features include batch processing and flexible page settings. Transform your web content into professional PDFs! 🚀