
Pay-as-you-go API / JSON scraper
Scrape any API / JSON URL directly into the dataset on a pay-as-you-go basis, and return the results in CSV, XML, HTML, or Excel formats. Transform and filter the output. You can follow pagination recursively from the payload without visiting the HTML page.
Download and format JSON endpoint data
Download any JSON URL directly into the dataset, and return the results in CSV, XML, HTML, or Excel formats. Transform and filter the output. This actor is the pay-as-you-go version of API / JSON scraper.
Features
- Optimized, fast and lightweight
- Small memory requirement
- Works only with JSON payloads
- Easy recursion
- Filter and map complex JSON structures
- Comes enabled with helper libraries: lodash, moment
- Full access to your account resources through the Apify variable
- The run fails if all requests failed
Handling errors
Unlike cheerio-scraper, this scraper lets you handle errors before the handlePageFunction fails. Using the handleError input, you can enqueue extra requests before the request fails, which lets you recover or try a different URL.
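```javascript
{
  handleError: async ({ addRequest, request, response, error }) => {
    // Do not retry the original request on errors containing "Unexpected"
    // (for example JSON parse errors) or on 404 responses
    request.noRetry = error.message.includes('Unexpected') || response?.statusCode === 404;

    // Enqueue a different URL to try instead
    addRequest({
      url: `${request.url}?retry=true`,
    });
  }
}
```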
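The recovery idea above can be sketched with a cap on fallback attempts, so a permanently broken URL does not enqueue requests forever. This is only an illustration under stated assumptions: `MAX_FALLBACKS`, `planFallback`, and the `fallbackAttempts` key are hypothetical names, not actor options, and it assumes `response` may be missing when no HTTP response was received.

```javascript
// Hypothetical cap on fallback attempts; not an actor option.
const MAX_FALLBACKS = 3;

// Pure helper: returns the next request to enqueue, or null to give up.
function planFallback(request, response) {
  const attempts = (request.userData && request.userData.fallbackAttempts) || 0;
  // Assumption: response can be undefined when no HTTP response arrived
  const notFound = Boolean(response) && response.statusCode === 404;
  if (notFound || attempts >= MAX_FALLBACKS) {
    return null;
  }
  const url = new URL(request.url);
  // Vary the URL so each attempt is a distinct request
  url.searchParams.set('retry', String(attempts + 1));
  return {
    url: url.toString(),
    userData: { ...request.userData, fallbackAttempts: attempts + 1 },
  };
}

// Input fragment using the handleError parameters shown in this section
const input = {
  handleError: async ({ addRequest, request, response }) => {
    const next = planFallback(request, response);
    if (next) addRequest(next);
  },
};
```

Keeping the decision in a plain function makes it easy to check outside the actor before pasting the handler into the input.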
Filter Map function
This function can filter, map, and enqueue requests at the same time. Note that the userData from the current request is passed on to the next request added with addRequest.
```javascript
const startUrls = [{
  url: "https://example.com",
  userData: {
    firstValue: 0,
  }
}];

// assuming the INPUT url above
await Apify.call('pocesar/pay-as-you-go-api-json-scraper', {
  startUrls,
  filterMap: async ({ request, addRequest, data }) => {

    if (request.userData.isPost) {
      // userData is inherited from the previous request,
      // so request.userData.firstValue === 0 here

      // return the data only after the POST request
      return data;
    } else {
      // add the same request, but as a POST
      addRequest({
        url: `${request.url}/?method=post`,
        method: 'POST',
        payload: {
          username: 'username',
          password: 'password',
        },
        headers: {
          'Content-Type': 'application/json',
        },
        userData: {
          isPost: true
        }
      });
      // omitting the return, or returning a falsy value, ignores the output
    }
  },
})
```
Examples
Flatten an object
```javascript
{
  filterMap: async ({ flattenObjectKeys, data }) => {
    return flattenObjectKeys(data);
  }
}
/**
 * an object like
 * {
 *   "deep": {
 *      "nested": ["state", "state1"]
 *   }
 * }
 *
 * becomes
 * {
 *   "deep.nested.0": "state",
 *   "deep.nested.1": "state1"
 * }
 */
```
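To make the transformation concrete, here is a minimal standalone sketch that produces the same dotted keys as the example above. It is not the actor's implementation of flattenObjectKeys; `flattenKeys` is a hypothetical name, and the handling of empty objects and arrays (they are dropped) is an assumption.

```javascript
// Recursively flattens nested objects and arrays into dot-separated keys.
// Hypothetical stand-in for illustration, not the actor's flattenObjectKeys.
function flattenKeys(value, prefix = '', out = {}) {
  if (value !== null && typeof value === 'object') {
    // Object.entries yields "0", "1", ... as keys for arrays
    for (const [key, child] of Object.entries(value)) {
      flattenKeys(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    // Primitive leaf: store it under the accumulated path
    out[prefix] = value;
  }
  return out;
}

const flat = flattenKeys({ deep: { nested: ['state', 'state1'] } });
console.log(flat);
```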
Send a POST request to a JSON API
```json
{
  "startUrls": [
    {
      "url": "https://ow0o5i3qo7-dsn.algolia.net/1/indexes/prod_PUBLIC_STORE/query?x-algolia-agent=Algolia%20for%20JavaScript%20(4.13.0)%3B%20Browser%20(lite)&x-algolia-api-key=0ecccd09f50396a4dbbe5dbfb17f4525&x-algolia-application-id=OW0O5I3QO7",
      "method": "POST",
      "payload": "{\"query\":\"instagram\",\"page\":0,\"hitsPerPage\":24,\"restrictSearchableAttributes\":[],\"attributesToHighlight\":[],\"attributesToRetrieve\":[\"title\",\"name\",\"username\",\"userFullName\",\"stats\",\"description\",\"pictureUrl\",\"userPictureUrl\",\"notice\",\"currentPricingInfo\"]}",
      "headers": {
        "content-type": "application/x-www-form-urlencoded"
      }
    }
  ]
}
```
Follow pagination from payload
```javascript
{
  filterMap: async ({ addRequest, request, data }) => {
    // pages are zero-based here (the start payload uses "page": 0),
    // so the last page is nbPages - 1
    if (data.nbPages > 1 && data.page + 1 < data.nbPages) {
      // get the current payload from the input; parse it only if it is a string
      const payload = typeof request.payload === 'string'
        ? JSON.parse(request.payload)
        : request.payload;

      // change the page number
      request.payload = { ...payload, page: data.page + 1 };
      // add the request for parsing the next page
      addRequest(request);
    }

    return data;
  }
}
```
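The paging decision can also be kept in a small pure function, which makes it testable without running the actor. The sketch below assumes zero-based pages (as in the start payload with `"page": 0`) and re-serializes the payload to a string to match the startUrls example; `nextPageRequest` is a hypothetical helper name.

```javascript
// Returns a copy of the request pointing at the next page, or null when done.
// Assumes zero-based page numbers, so the last page is nbPages - 1.
function nextPageRequest(request, data) {
  const hasNext = data.nbPages > 1 && data.page + 1 < data.nbPages;
  if (!hasNext) {
    return null;
  }
  // Accept either a JSON string or an already parsed object
  const payload = typeof request.payload === 'string'
    ? JSON.parse(request.payload)
    : request.payload;
  // Copy instead of mutating, and keep the payload as a JSON string
  return {
    ...request,
    payload: JSON.stringify({ ...payload, page: data.page + 1 }),
  };
}

// Input fragment using the filterMap parameters shown in this section
const input = {
  filterMap: async ({ addRequest, request, data }) => {
    const next = nextPageRequest(request, data);
    if (next) addRequest(next);
    return data;
  },
};
```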
Omit output if condition is met
```javascript
{
  filterMap: async ({ data }) => {
    // returning nothing skips the output for this response
    if (data.hits.length < 10) {
      return;
    }

    return data;
  }
}
```
Unwind an array of results so that each item becomes a separate dataset item
```javascript
{
  filterMap: async ({ data }) => {
    return data.hits; // just return an array from here
  }
}
```
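Unwinding combines naturally with mapping, for example to keep only a few fields of each hit. The following is a plain JavaScript sketch; `unwindHits` and the field names are hypothetical, and it assumes each hit is a plain object.

```javascript
// Returns one trimmed object per hit, keeping only the listed fields.
// A missing or non-array "hits" value yields an empty array.
function unwindHits(data, fields) {
  const hits = Array.isArray(data.hits) ? data.hits : [];
  return hits.map((hit) =>
    Object.fromEntries(
      fields
        .filter((field) => field in hit)
        .map((field) => [field, hit[field]])
    )
  );
}

// Input fragment: each returned array element becomes its own dataset item
const input = {
  filterMap: async ({ data }) => unwindHits(data, ['title', 'name']),
};
```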
Pricing
Pricing model
Pay per result
This Actor is paid per result. You are not charged for Apify platform usage; you pay only a fixed price for every 1,000 dataset items the Actor outputs.
Price per 1,000 items
$25.00
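For budgeting, the price above can be turned into a rough estimate. This sketch assumes the cost scales proportionally with the number of dataset items, which the source does not state explicitly; `estimateCostUsd` is a hypothetical helper.

```javascript
// Rough cost estimate in USD, assuming proportional billing per item.
function estimateCostUsd(items, pricePer1000 = 25) {
  // Work in whole cents to avoid floating point noise
  const cents = Math.round((items * pricePer1000 * 100) / 1000);
  return cents / 100;
}

console.log(estimateCostUsd(2400));
```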