OpenSearch Integration
Transfer data from Apify Actors to Amazon OpenSearch Service. This Actor is a good starting point for building question-answering systems, search functionality, or Retrieval-Augmented Generation (RAG) use cases.
OpenSearch URL
openSearchUrl
string (Required)
The URL of the Amazon OpenSearch Service instance to connect to
AWS Access Key ID
awsAccessKeyId
string (Required)
The AWS access key ID for the Amazon OpenSearch Service
AWS Secret Access Key
awsSecretAccessKey
string (Required)
The AWS secret access key for the Amazon OpenSearch Service
OpenSearch Index Name
openSearchIndexName
string (Required)
The name of the index in the Amazon OpenSearch Service where the data will be stored
Auto-create index
autoCreateIndex
boolean (Optional)
When set to true, the integration will automatically create the index if it does not exist in the Amazon OpenSearch Service instance
Default value of this property is true
AWS Region
awsRegion
string (Optional)
The AWS region where the Amazon OpenSearch Service instance is located
Default value of this property is "us-east-1"
AWS Service Name
awsServiceName
Enum (Optional)
The AWS service name for the Amazon OpenSearch Service
Value options:
"aoss": string
"es": string
Default value of this property is "aoss"
Use SSL
useSsl
boolean (Optional)
When set to true, the integration will use SSL to connect to the Amazon OpenSearch Service instance
Default value of this property is true
Verify SSL certificates
verifyCerts
boolean (Optional)
When set to true, the integration will verify SSL certificates when connecting to the Amazon OpenSearch Service instance
Default value of this property is true
Use AWS4 authentication
useAWS4Auth
boolean (Optional)
When enabled, the integration will use AWS4 authentication to connect to the Amazon OpenSearch Service instance.
Note: If you are connecting to an OpenSearch Service instance that is not hosted on AWS, set this to false. In this case, AWS credentials are not required and will be ignored. You can provide dummy values for awsAccessKeyId and awsSecretAccessKey.
Default value of this property is true
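The connection settings above can be collected into a single Actor input object. The following is a minimal sketch, not the Actor's own code: build_connection_input is a hypothetical helper, the URL and index name in the usage line are placeholders, and only the field names, value options, and defaults come from the descriptions above.

```python
from typing import Any


def build_connection_input(
    open_search_url: str,
    index_name: str,
    aws_access_key_id: str = "dummy",
    aws_secret_access_key: str = "dummy",
    use_aws4_auth: bool = True,
    aws_region: str = "us-east-1",
    aws_service_name: str = "aoss",
) -> dict[str, Any]:
    """Hypothetical helper: build the connection part of the Actor input."""
    if aws_service_name not in ("aoss", "es"):
        raise ValueError('awsServiceName must be "aoss" or "es"')
    # With AWS4 authentication enabled, real credentials are needed.
    # With it disabled (non-AWS instance), dummy values are acceptable.
    if use_aws4_auth and "dummy" in (aws_access_key_id, aws_secret_access_key):
        raise ValueError("AWS credentials are required when useAWS4Auth is true")
    return {
        "openSearchUrl": open_search_url,
        "openSearchIndexName": index_name,
        "awsAccessKeyId": aws_access_key_id,
        "awsSecretAccessKey": aws_secret_access_key,
        "awsRegion": aws_region,
        "awsServiceName": aws_service_name,
        "autoCreateIndex": True,
        "useSsl": True,
        "verifyCerts": True,
        "useAWS4Auth": use_aws4_auth,
    }


# Placeholder values for an instance that is not hosted on AWS.
connection = build_connection_input(
    "https://localhost:9200", "my-index", use_aws4_auth=False
)
```

Keeping the credential check next to useAWS4Auth mirrors the note above: the keys only matter when AWS4 authentication is actually used.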
Embeddings provider (as defined in the langchain API)
embeddingsProvider
Enum (Required)
Choose the embeddings provider to use for generating embeddings
Value options:
"OpenAI": string
"Cohere": string
Default value of this property is "OpenAI"
Configuration for embeddings provider
embeddingsConfig
object (Optional)
Configure the parameters for the LangChain embedding class. Key points to consider:
- Typically, you only need to specify the model name. For example, for OpenAI, set the model name as {"model": "text-embedding-3-small"}.
- It's crucial to ensure that the vector size of your embeddings matches the size of embeddings in the database.
- Always specify the model in the following format: {"model": "your-embedding-model-name"}
- Here are some examples of embedding model names:
- For more details about other parameters, refer to the LangChain documentation.
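The two checks described above (the {"model": ...} format and the matching vector size) can be sketched in a few lines. This is an illustration under assumptions, not the Actor's validation logic: model_name and check_vector_size are hypothetical helpers, and the index dimension is something you would look up in your own OpenSearch index mapping.

```python
def model_name(embeddings_config: dict) -> str:
    """Return the model name, enforcing the {"model": "..."} format."""
    model = embeddings_config.get("model")
    if not isinstance(model, str) or not model:
        raise ValueError(
            'embeddingsConfig must look like {"model": "your-embedding-model-name"}'
        )
    return model


def check_vector_size(embedding: list, index_dimension: int) -> None:
    """Raise if an embedding does not fit the dimension of the index."""
    if len(embedding) != index_dimension:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, "
            f"but the index expects {index_dimension}"
        )


name = model_name({"model": "text-embedding-3-small"})
# A stand-in vector; a real one would come from the embeddings provider.
check_vector_size([0.0] * 8, index_dimension=8)
```

Running a check like this on one sample embedding before a large upload is cheaper than discovering a dimension mismatch partway through indexing.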
Embeddings API KEY (whenever applicable, depends on provider)
embeddingsApiKey
string (Required)
Value of the API key for the embeddings provider (if required).
For example, for OpenAI it is OPENAI_API_KEY; for Cohere it is COHERE_API_KEY.
Dataset fields to select from the dataset results and store in the database
datasetFields
array (Required)
This array specifies the dataset fields to be selected and stored in the vector store. Only the fields listed here will be included in the vector store.
For instance, when using the Website Content Crawler, you might choose to include fields such as text, url, and metadata.title in the vector store.
Default value of this property is ["text"]
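Field names such as metadata.title use dot notation to reach into nested objects. The sketch below shows one way such a selection could work on a single dataset item. It is an assumption about the behavior, not the Actor's implementation; get_nested and select_fields are hypothetical helpers and the sample item is made up.

```python
from typing import Any, Optional


def get_nested(item: dict, dotted_path: str) -> Optional[Any]:
    """Follow a dot-separated path through nested dicts; None if missing."""
    value: Any = item
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def select_fields(item: dict, dataset_fields: list) -> dict:
    """Keep only the listed fields, skipping any that are absent."""
    selected = {}
    for field in dataset_fields:
        value = get_nested(item, field)
        if value is not None:
            selected[field] = value
    return selected


# Made-up item shaped like a crawler result.
item = {
    "text": "Apify is a web scraping platform.",
    "url": "https://apify.com",
    "metadata": {"title": "Apify", "languageCode": "en"},
    "crawl": {"depth": 0},
}
selected = select_fields(item, ["text", "url", "metadata.title"])
```

Fields that are not listed (here crawl and metadata.languageCode) are left out, which matches the rule that only listed fields reach the vector store.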
Dataset fields to select from the dataset and store as metadata in the database
metadataDatasetFields
object (Optional)
A mapping of dataset fields that should be selected from the dataset and stored as metadata in the vector store.
For example, when using the Website Content Crawler, you might want to store url in metadata. In this case, use the metadataDatasetFields parameter as follows: {"url": "url"}
Custom object to be stored as metadata in the vector store database
metadataObject
object (Optional)
This object allows you to store custom metadata for every item in the vector store.
For example, if you want to store the domain as metadata, use the metadataObject like this: {"domain": "apify.com"}.
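To make the difference between the two metadata settings concrete, the sketch below combines them for one dataset item. Several things here are assumptions rather than documented behavior: build_metadata is a hypothetical helper, each metadataDatasetFields key is treated as the metadata name and each value as the dataset field path, and metadataObject values are applied last so they win on a name clash.

```python
def build_metadata(
    item: dict, metadata_dataset_fields: dict, metadata_object: dict
) -> dict:
    """Merge per-item metadata (from dataset fields) with fixed metadata."""
    metadata = {}
    for name, path in metadata_dataset_fields.items():
        value = item
        # Walk the dot-separated path; give up with None if a key is missing.
        for key in path.split("."):
            value = value.get(key) if isinstance(value, dict) else None
        if value is not None:
            metadata[name] = value
    # Assumption: the custom object is applied last and overrides on clashes.
    metadata.update(metadata_object)
    return metadata


item = {"url": "https://apify.com/store", "text": "Apify Store"}
metadata = build_metadata(item, {"url": "url"}, {"domain": "apify.com"})
```

The url value changes per item, while domain is the same constant on every item, which is the practical distinction between metadataDatasetFields and metadataObject.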
Enable incremental updates for objects based on deltas
enableDeltaUpdates
boolean (Optional)
When set to true, this setting enables incremental updates for objects in the database by comparing the changes (deltas) between the crawled dataset items and the existing objects, uniquely identified by the deltaUpdatesPrimaryDatasetFields field.
The integration will only add new objects and update those that have changed, reducing unnecessary updates. The datasetFields, metadataDatasetFields, and metadataObject fields are used to determine the changes.
Default value of this property is true
Dataset fields to uniquely identify dataset items (only relevant when `enableDeltaUpdates` is enabled)
deltaUpdatesPrimaryDatasetFields
array (Optional)
This array contains fields that are used to uniquely identify dataset items, which helps to handle content changes across different runs.
For instance, in a web content crawling scenario, the url field could serve as a unique identifier for each item.
Default value of this property is ["url"]
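One common way to implement this kind of delta comparison is to derive a stable item ID from the primary fields and a checksum from the item content, then compare against what is already stored. The sketch below illustrates that idea only; it is not the Actor's actual algorithm. stable_hash and plan_delta are hypothetical names, and the items are assumed to be already reduced to the fields that count toward changes.

```python
import hashlib
import json


def stable_hash(value) -> str:
    """Deterministic SHA-256 of any JSON-serializable value."""
    payload = json.dumps(value, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def plan_delta(items, stored_checksums, primary_fields=("url",)):
    """Split items into new, changed, and unchanged by ID and checksum.

    stored_checksums maps item ID -> checksum saved by a previous run.
    """
    to_add, to_update, unchanged = [], [], []
    for item in items:
        item_id = stable_hash([item.get(f) for f in primary_fields])
        checksum = stable_hash(item)
        if item_id not in stored_checksums:
            to_add.append(item_id)
        elif stored_checksums[item_id] != checksum:
            to_update.append(item_id)
        else:
            unchanged.append(item_id)
    return to_add, to_update, unchanged


items = [
    {"url": "https://example.com/a", "text": "same as before"},
    {"url": "https://example.com/b", "text": "edited text"},
    {"url": "https://example.com/c", "text": "brand new page"},
]
id_a = stable_hash(["https://example.com/a"])
id_b = stable_hash(["https://example.com/b"])
id_c = stable_hash(["https://example.com/c"])
stored = {id_a: stable_hash(items[0]), id_b: "checksum-from-older-content"}
to_add, to_update, unchanged = plan_delta(items, stored)
```

Because the ID depends only on the primary fields, an edited page keeps its ID and is updated in place instead of being inserted a second time.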
Delete expired objects from the database
deleteExpiredObjects
boolean (Optional)
When set to true, the integration deletes objects from the database that have not been crawled for a specified period.
Default value of this property is true
Delete expired objects from the database after a specified number of days
expiredObjectDeletionPeriodDays
integer (Optional)
This setting allows the integration to manage the deletion of objects from the database that have not been crawled for a specified period. It is typically used in subsequent runs after the initial crawl.
When the value is greater than 0, the integration checks if objects have been seen within the last X days (determined by the expiration period). If the objects are expired, they are deleted from the database. The specific value for expiredObjectDeletionPeriodDays depends on your use case and how frequently you crawl data.
For example, if you crawl data daily, you can set expiredObjectDeletionPeriodDays to 7 days. If you crawl data weekly, you can set expiredObjectDeletionPeriodDays to 30 days.
Default value of this property is 30
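The expiration rule described above amounts to comparing each object's last-seen time with a cutoff. The sketch below shows that check in isolation. It is a simplified illustration, not the Actor's code: find_expired is a hypothetical helper, and the last-seen timestamps are assumed to be stored per object by earlier runs.

```python
from datetime import datetime, timedelta, timezone


def find_expired(last_seen: dict, period_days: int, now=None) -> list:
    """Return IDs of objects not seen within the last period_days days."""
    if period_days <= 0:
        # The check only applies when the period is greater than 0.
        return []
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=period_days)
    return [obj_id for obj_id, seen in last_seen.items() if seen < cutoff]


now = datetime(2024, 10, 31, tzinfo=timezone.utc)
last_seen = {
    "page-a": now - timedelta(days=40),  # not crawled for 40 days
    "page-b": now - timedelta(days=5),   # crawled recently
}
expired = find_expired(last_seen, period_days=30, now=now)
```

With a weekly crawl and a 30-day period, an object has to be missing from several consecutive runs before it is treated as expired, which guards against deleting content after a single failed crawl.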
Enable text chunking
performChunking
boolean (Optional)
When set to true, the text will be divided into smaller chunks based on the settings provided below. Proper chunking helps optimize retrieval and ensures accurate and efficient responses.
Default value of this property is true
Actor Metrics
3 monthly users
2 stars
Created in Oct 2024
Modified a month ago