RAG Web Browser
Web browser for retrieval-augmented generation (RAG) workflows. Retrieves and returns website content from the top Google Search results.
Search term(s)
query
string, optional
Use regular search words or enter Google Search URLs. You can also apply advanced Google search techniques, such as "AI site:twitter.com" or "javascript OR python".
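The advanced search techniques mentioned above can be composed programmatically before they are passed as the query value. The following is a minimal sketch, not part of the Actor itself: the helper name build_query is hypothetical, and it relies only on the standard Google "OR" and "site:" operators.

```python
def build_query(terms, site=None):
    """Join alternative terms with OR and optionally restrict results to one site.

    Hypothetical helper for illustration; the Actor only needs the final string.
    """
    query = " OR ".join(terms)
    if site:
        # Parenthesize multi-term alternatives so site: applies to all of them.
        if len(terms) > 1:
            query = f"({query})"
        query = f"{query} site:{site}"
    return query


# The resulting string goes into the "query" input field.
actor_input = {"query": build_query(["AI"], site="twitter.com")}
print(actor_input["query"])
print(build_query(["javascript", "python"]))
```

A Google Search URL can be supplied in the same field instead of search words.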
Number of top search results to return from Google. Only organic results are returned and counted
maxResults
integer, optional
The number of top organic search results to return and scrape text from
Output formats
outputFormats
array, optional
Select the desired output formats for the retrieved content
Default value of this property is ["text"]
Request timeout in seconds
requestTimeoutSecs
integer, optional
The maximum time (in seconds) allowed for the request. If the request exceeds this time, it is marked as failed and only the results that have already finished are returned
Default value of this property is 60
Search Proxy Group
proxyGroupSearch
enum, optional
Select the proxy group for loading search results
Value options:
"GOOGLE_SERP": string
"SHADER": string
Default value of this property is "GOOGLE_SERP"
Maximum number of retries for Google search request on network / server errors
maxRequestRetriesSearch
integer, optional
The maximum number of times the Google search crawler will retry the request on network, proxy or server errors. If the (n+1)-th request still fails, the crawler will mark this request as failed.
Default value of this property is 1
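The "(n+1)-th request" wording means that a retry limit of n allows n+1 attempts in total: the first try plus n retries. The sketch below illustrates that counting only. It assumes a hypothetical fetch callable and simulated errors, and it does not reproduce the crawler's actual implementation.

```python
def fetch_with_retries(fetch, max_request_retries=1):
    """Call fetch() once, then retry up to max_request_retries times on errors."""
    last_error = None
    total_attempts = max_request_retries + 1  # first try plus n retries
    for attempt in range(1, total_attempts + 1):
        try:
            return {"status": "ok", "result": fetch(), "attempts": attempt}
        except (ConnectionError, TimeoutError) as err:
            last_error = err
    # The (n+1)-th attempt also failed, so the request is marked as failed.
    return {"status": "failed", "error": str(last_error), "attempts": total_attempts}


def make_flaky(failures):
    """Hypothetical fetcher that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def fetch():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("simulated network error")
        return "search results"

    return fetch


print(fetch_with_retries(make_flaky(1)))  # succeeds on the second attempt
print(fetch_with_retries(make_flaky(2)))  # both attempts fail
```

With the default of 1, a request that fails twice in a row is reported as failed.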
Crawler: Proxy configuration
proxyConfiguration
object, optional
Enables loading websites from IP addresses in specific geographies and helps circumvent blocking.
Default value of this property is {"useApifyProxy":true}
Initial concurrency
initialConcurrency
integer, optional
Initial number of Playwright browsers running in parallel. The system scales this value based on CPU and memory usage.
Default value of this property is 3
Minimal concurrency
minConcurrency
integer, optional
Minimum number of Playwright browsers running in parallel. Useful for defining a base level of parallelism.
Default value of this property is 10
Maximal concurrency
maxConcurrency
integer, optional
Maximum number of browsers or clients running in parallel to avoid overloading target websites.
Default value of this property is 10
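The three concurrency settings can be read as a starting point plus a lower and an upper bound for scaling. The sketch below shows one way such bounds could be applied. The actual scaling logic is not described here, so the function next_concurrency, the overloaded flag, the step size, and the numeric values are all assumptions made for illustration.

```python
def next_concurrency(current, overloaded, min_concurrency, max_concurrency, step=1):
    """Scale the browser count down under load and up otherwise, within bounds.

    Illustrative only: `overloaded` stands in for a CPU/memory usage check.
    """
    desired = current - step if overloaded else current + step
    return max(min_concurrency, min(max_concurrency, desired))


# Illustrative values, not the documented defaults.
current = 3  # initialConcurrency
for overloaded in [False, False, True, True, True, True]:
    current = next_concurrency(current, overloaded, min_concurrency=2, max_concurrency=10)
    print(current)
```

Whatever the load signal is, the result never drops below the minimum or rises above the maximum.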
Maximum number of retries for Playwright content crawler
maxRequestRetries
integer, optional
Maximum number of retry attempts on network, proxy, or server errors. If the (n+1)-th request fails, it will be marked as failed.
Default value of this property is 1
Request timeout for content crawling
requestTimeoutContentCrawlSecs
integer, optional
Timeout (in seconds) for making requests for each search result, including fetching and processing its content.
The value must be smaller than the 'Request timeout in seconds' setting.
Default value of this property is 30
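Because the content-crawl timeout must be smaller than the overall request timeout, it can help to check the two values before starting a run. The following is a minimal sketch of a client-side check. The helper with_validated_timeouts is hypothetical, and the defaults are the ones listed for requestTimeoutSecs and requestTimeoutContentCrawlSecs.

```python
# Defaults as documented for the two timeout properties.
DEFAULTS = {"requestTimeoutSecs": 60, "requestTimeoutContentCrawlSecs": 30}


def with_validated_timeouts(overrides=None):
    """Merge overrides into the defaults and enforce the ordering constraint."""
    merged = {**DEFAULTS, **(overrides or {})}
    crawl = merged["requestTimeoutContentCrawlSecs"]
    total = merged["requestTimeoutSecs"]
    if crawl >= total:
        raise ValueError(
            f"requestTimeoutContentCrawlSecs ({crawl}) must be smaller than "
            f"requestTimeoutSecs ({total})"
        )
    return merged


print(with_validated_timeouts({"requestTimeoutSecs": 90}))

try:
    with_validated_timeouts({"requestTimeoutContentCrawlSecs": 60})
except ValueError as err:
    print("rejected:", err)
```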
Wait for dynamic content (seconds)
dynamicContentWaitSecs
integer, optional
Maximum time (in seconds) to wait for dynamic content to load. The crawler processes the page once this time elapses or when the network becomes idle.
Default value of this property is 10
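The "whichever comes first" behavior described for dynamicContentWaitSecs can be modeled as a wait with a deadline. The sketch below uses plain asyncio rather than a real browser: the network_idle event is a stand-in for the browser's network-idle signal, and the function name and short timings are assumptions chosen so the example runs quickly.

```python
import asyncio


async def wait_for_dynamic_content(network_idle, dynamic_content_wait_secs):
    """Return as soon as the network is idle or the wait time elapses."""
    try:
        await asyncio.wait_for(network_idle.wait(), timeout=dynamic_content_wait_secs)
        return "network idle"
    except asyncio.TimeoutError:
        return "timeout elapsed"


async def demo():
    # Case 1: the page goes idle well before the deadline.
    idle = asyncio.Event()
    asyncio.get_running_loop().call_later(0.05, idle.set)
    first = await wait_for_dynamic_content(idle, 1.0)

    # Case 2: the page never goes idle, so the deadline ends the wait.
    never_idle = asyncio.Event()
    second = await wait_for_dynamic_content(never_idle, 0.05)
    return first, second


print(asyncio.run(demo()))
```

In both cases the page is processed afterwards; the setting only caps how long the crawler waits.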