List of strings (URLs): Useful if you want to provide the URLs manually.
Field (column) of a specified dataset: For cases when the PDF/image URL was scraped by another Actor and stored in a dataset.
Key-value store: For cases when the PDF/image was scraped by another Actor and stored in a key-value store.

Pricing

The Actor uses the pay-per-event pricing model. There is a flat fee for starting the Actor, plus a fee for every successfully processed document page.

Documents that fail to be processed are not counted.

For current prices, see the Actor's pricing info.
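
As a rough illustration of how the two charges combine, the sketch below estimates the cost of a run. The `Pricing` type, the `estimateCost` function, and any prices passed to it are hypothetical and only for illustration. The real values are listed in the Actor pricing info.

```typescript
// Hypothetical pricing structure; field names and values are assumptions,
// not taken from the Actor's actual pricing.
interface Pricing {
    startFee: number; // flat fee charged once per run
    perPage: number;  // fee per successfully processed page
}

// Estimate the cost of one run. Only successfully processed pages are
// passed in, since failed documents are not counted.
function estimateCost(pricing: Pricing, successfulPages: number): number {
    if (!Number.isInteger(successfulPages) || successfulPages < 0) {
        throw new RangeError("successfulPages must be a non-negative integer");
    }
    return pricing.startFee + pricing.perPage * successfulPages;
}
```

Under this reading, a run in which every document fails would still incur the start fee, because that fee is charged for starting the Actor rather than for processed pages.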

Output

Each processed URL is stored in the default dataset in the following shape:

{
    text: string, // The full text extracted from the document.
    language: string, // Language of the document.
    url: string, // Original URL.
    raw: object, // Raw data from the underlying OCR service. May change in the future.
}

If the processing ended with an error, the shape is different:

{
    url: string, // Original URL.
    error: string, // Error message.
}
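
Because the dataset can mix both shapes, a consumer usually has to tell them apart. The following is a minimal sketch of one way to do that in TypeScript, assuming the presence of the `error` field is what distinguishes a failed item. The names `OcrSuccess`, `OcrFailure`, `isFailure`, and `partitionItems` are hypothetical and not part of the Actor.

```typescript
// Shapes mirror the two output formats described above. `raw` is typed as
// unknown because its structure may change in the future.
interface OcrSuccess {
    text: string;
    language: string;
    url: string;
    raw: unknown;
}

interface OcrFailure {
    url: string;
    error: string;
}

type OcrItem = OcrSuccess | OcrFailure;

// Type guard (assumption): only failed items carry an `error` field.
function isFailure(item: OcrItem): item is OcrFailure {
    return "error" in item;
}

// Split dataset items into successfully processed and failed ones.
function partitionItems(items: OcrItem[]): { ok: OcrSuccess[]; failed: OcrFailure[] } {
    const ok: OcrSuccess[] = [];
    const failed: OcrFailure[] = [];
    for (const item of items) {
        if (isFailure(item)) {
            failed.push(item);
        } else {
            ok.push(item);
        }
    }
    return { ok, failed };
}
```

The `failed` list keeps the original URLs, so it could be used to retry those documents in a later run.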

Planned features

The Actor is a work in progress. Stay tuned for new features.

Integrability improvements

Skip some of the keys in the key-value store
Skip some of the URLs in the dataset
