
Document OCR
The Actor takes a list of document URLs (PDFs or images) and a language, runs the documents through OCR, and stores the results in a dataset.
The Actor uses the OCR Space service to perform the OCR.
The list of document URLs can be provided as one of:
- List of strings (URLs): useful if you want to provide the URLs manually.
- Field (column) of a specified dataset: for cases when the PDF/image URL was scraped by another Actor and stored in a dataset.
- Key-value store: for cases when the PDF/image was scraped by another Actor and stored in a key-value store.
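One way to picture the three input options is as a tagged union. The following is only an illustrative sketch: the type `UrlSource`, its field names (`kind`, `urls`, `datasetId`, `field`, `storeId`), and the helper `describeSource` are hypothetical and are not the Actor's real input schema, which is defined on the Actor's input page.

```typescript
// Hypothetical model of the three ways to supply documents.
// None of these names come from the Actor's actual input schema.
type UrlSource =
  | { kind: "urls"; urls: string[] }
  | { kind: "dataset"; datasetId: string; field: string }
  | { kind: "keyValueStore"; storeId: string };

// Returns a human-readable summary of where the documents come from.
function describeSource(source: UrlSource): string {
  switch (source.kind) {
    case "urls":
      return `${source.urls.length} URL(s) provided manually`;
    case "dataset":
      return `URLs read from field "${source.field}" of dataset ${source.datasetId}`;
    case "keyValueStore":
      return `documents read from key-value store ${source.storeId}`;
  }
}
```

For example, `describeSource({ kind: "dataset", datasetId: "abc", field: "pdfUrl" })` would describe a run that reads URLs from the `pdfUrl` column of a dataset produced by another Actor.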
Pricing
The Actor uses the pay-per-event pricing model. There is a flat fee for starting the Actor, plus a fee for every successfully processed document page.
Documents that fail to be processed are not charged.
For actual prices, see the Actor pricing info.
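The cost of a run under this model can be estimated as the start fee plus the per-page fee times the number of successfully processed pages. The sketch below assumes exactly that formula. The interface `PricingAssumption`, the function `estimateRunCost`, and the prices in the usage note are placeholders, not the Actor's real prices.

```typescript
// Hypothetical pricing parameters; real values are listed in the Actor pricing info.
interface PricingAssumption {
  startFeeUsd: number;   // flat fee charged once per run
  perPageFeeUsd: number; // fee charged per successfully processed page
}

// Estimates the cost of one run. Failed documents are not charged,
// so only successfully processed pages are passed in.
function estimateRunCost(pricing: PricingAssumption, successfulPages: number): number {
  if (!Number.isInteger(successfulPages) || successfulPages < 0) {
    throw new RangeError("successfulPages must be a non-negative integer");
  }
  return pricing.startFeeUsd + pricing.perPageFeeUsd * successfulPages;
}
```

With placeholder prices of 0.05 USD per start and 0.01 USD per page, a run that successfully processes 10 pages would come to roughly 0.15 USD, regardless of how many other documents failed.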
Output
Each processed URL is stored in the default dataset in the following shape:
{
  text: string, // The full text extracted from the document.
  language: string, // Language of the document.
  url: string, // Original URL.
  raw: object, // Raw data from the underlying OCR service. May change in the future.
}
If processing ended with an error, the shape is different:
{
  url: string, // Original URL.
  error: string, // Error message.
}
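Because successful and failed items share one dataset, a consumer has to tell them apart. A minimal sketch, assuming the presence of the `error` field is what distinguishes a failed item: the interface names `OcrSuccess` and `OcrFailure` and the helpers `isFailure` and `partitionItems` are illustrative and not part of the Actor.

```typescript
// Shapes taken from the two output formats described above.
interface OcrSuccess {
  text: string;
  language: string;
  url: string;
  raw: object; // Raw OCR service data; may change in the future.
}

interface OcrFailure {
  url: string;
  error: string;
}

type OcrItem = OcrSuccess | OcrFailure;

// Type guard: an item is treated as failed when it carries an `error` field.
function isFailure(item: OcrItem): item is OcrFailure {
  return "error" in item;
}

// Splits dataset items into successfully processed and failed documents.
function partitionItems(items: OcrItem[]): { succeeded: OcrSuccess[]; failed: OcrFailure[] } {
  const succeeded: OcrSuccess[] = [];
  const failed: OcrFailure[] = [];
  for (const item of items) {
    if (isFailure(item)) {
      failed.push(item);
    } else {
      succeeded.push(item);
    }
  }
  return { succeeded, failed };
}
```

The `failed` list can then be used to retry the affected URLs, while `succeeded` holds the extracted text.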
Planned features
The Actor is a work in progress. Stay tuned for new features.
Integrability improvements
- Skip selected keys in the key-value store
- Skip selected URLs in the dataset