Pricing

$0.07 / 1,000 results

Bluesky Jetstream Scraper

Developed by

june

Bluesky Jetstream Scraper collects posts from Bluesky's Jetstream API. Filter by hashtags, usernames, or languages to gather targeted data. The output can include media attachments, user profiles, and reply context. Suited to social research, trend analysis, and content monitoring on the platform.

1.0 (1)

Runs succeeded

>99%

Last modified

2 months ago

Social media

Developer tools

Other

🌊 Bluesky Jetstream Scraper

The Bluesky Jetstream Scraper is an Apify Actor that collects real-time data from the Bluesky social network through Jetstream, a streaming service built on the AT Protocol firehose. You can filter posts by hashtag, author, or language and choose which fields appear in the output.

🔄 Jetstream vs. Crawling: This scraper uses Bluesky's Jetstream (firehose) API, which delivers a continuous stream of real-time events from Bluesky's servers. Traditional crawling makes many API requests to gather posts, which runs into rate limits and uses more resources. Jetstream instead delivers content as it is created over a single streaming connection, which makes it well suited to large-scale data collection, trend analysis, and real-time monitoring.

⚠️ Real-Time vs. Historical: Jetstream is designed for collecting current, real-time data. It captures the content stream as it happens and cannot retrieve posts from the past, so it is not suitable for historical data collection or for analyzing posts from specific past time periods. For those needs, use a different method such as the Bluesky Query API (with appropriate rate limiting).

📣 Platform Notice: Bluesky and its API infrastructure are still evolving. API specifications, data formats, and endpoints may change over time. We strive to keep this scraper up to date, but occasional updates may be needed to maintain compatibility as the Bluesky ecosystem develops.

📋 Input Schema Parameters

This section describes in detail how each input parameter affects the behavior of the scraper and the resulting output.

🔍 Filtering Parameters

`hashtags`

Type: Array of strings
Description: A list of hashtags to filter posts by (without the # symbol)
Behavior: The scraper will only collect posts that contain at least one of the specified hashtags. When multiple hashtags are provided, posts matching ANY of these hashtags will be included (OR logic).
Example: If you set ["apify", "scraping"], the output will include all posts containing either #apify OR #scraping.
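The OR logic described above can be sketched as follows. This is a minimal illustration, not the scraper's actual code: it assumes hashtags are matched case-insensitively against the post text, and the function name `matches_hashtags` is hypothetical. A real implementation might read tags from the post's rich-text facets instead.

```python
import re

# Matches "#tag" tokens in free text; \w covers letters, digits and underscore.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def matches_hashtags(text: str, wanted: list[str]) -> bool:
    """Return True if the text contains at least one wanted hashtag (OR logic)."""
    if not wanted:
        return True  # no hashtag filter configured: every post passes
    found = {tag.lower() for tag in HASHTAG_PATTERN.findall(text)}
    return any(tag.lower() in found for tag in wanted)
```

With `["apify", "scraping"]` as the filter, a post containing `#Apify` passes, and a post with no hashtags does not.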

`usernames`

Type: Array of strings
Description: A list of Bluesky usernames to filter posts by (will be resolved to DIDs for efficient filtering)
Behavior: The scraper will only collect posts authored by the specified users. When multiple usernames are provided, posts from ANY of these users will be included (OR logic).
Example: If you set ["user1.bsky.social", "user2.bsky.social"], the output will include all posts from either user1 OR user2.
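As a rough sketch of how username filtering via DIDs could work: each handle is first resolved to a DID, and incoming events are then compared against the resolved set. The `com.atproto.identity.resolveHandle` endpoint is part of the AT Protocol; the helper names and the choice of the public AppView host are assumptions for illustration, and the scraper's own resolution step may differ.

```python
from urllib.parse import urlencode

# Assumption: the public Bluesky AppView host is used for handle resolution.
RESOLVE_ENDPOINT = "https://public.api.bsky.app/xrpc/com.atproto.identity.resolveHandle"


def resolve_handle_url(handle: str) -> str:
    """Build the URL that resolves a handle to a DID (response: {"did": "..."})."""
    return f"{RESOLVE_ENDPOINT}?{urlencode({'handle': handle})}"


def authored_by(event: dict, wanted_dids: set[str]) -> bool:
    """Return True if the event's author DID is in the wanted set (OR logic)."""
    if not wanted_dids:
        return True  # no username filter configured
    return event.get("did") in wanted_dids
```

Comparing DIDs rather than handles avoids a lookup per event, because Jetstream events carry the author's DID directly.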

`languages`

Type: Array of strings
Description: Languages to filter posts by (multiple selection allowed)
Behavior: The scraper will only collect posts in the specified languages. When multiple languages are provided, posts in ANY of these languages will be included (OR logic). If a post doesn't have a language field, the scraper can auto-detect its language (if detectLanguage is enabled).
Example: If you set ["en", "pt"] (English and Portuguese), the output will include all posts in either English OR Portuguese.

`wantedCollections`

Type: Array of strings
Description: Specific Bluesky collections to filter from Jetstream (defaults to feed posts)
Behavior: Controls what types of content are collected from the Bluesky firehose. Options include:
- app.bsky.feed.post: Regular posts
- app.bsky.feed.like: Like interactions
- app.bsky.feed.repost: Repost interactions
- app.bsky.graph.follow: Follow relationships
- app.bsky.graph.block: Block relationships
- app.bsky.actor.profile: Profile updates
Example: If you set ["app.bsky.feed.post", "app.bsky.feed.repost"], the output will include both original posts AND reposts.

📊 Content Inclusion Parameters

`includeMedia`

Type: Boolean
Description: Whether to include URLs for media attachments
Behavior: When set to true, the output includes media URLs from posts. When set to false, media URLs are excluded: the mediaUrl and mediaThumbnailUrl fields will be empty, hasMedia will be false, and mediaCount will be 0.
Example: If set to false with a language filter of ["pt"], the output will include Portuguese-language posts but without any media URLs or media-related fields populated.

`includeImages`

Type: Boolean
Description: Whether to include URLs for images in the output
Behavior: When set to true, the output includes image URLs from posts. When set to false, image URLs are excluded: the imageUrl field will be empty and hasImages will be false.
Example: If set to false, posts with images will still be included in the output, but image URLs won't be extracted or included in the result fields.
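The effect of includeMedia and includeImages on an output item might look like the sketch below. The field names come from the descriptions above; the function name is hypothetical, and representing "empty" as an empty string is an assumption.

```python
def apply_inclusion_flags(item: dict, include_media: bool, include_images: bool) -> dict:
    """Return a copy of the item with media/image fields blanked when disabled."""
    result = dict(item)  # leave the caller's dict untouched
    if not include_media:
        # Assumption: "empty" means an empty string.
        result.update(
            {"mediaUrl": "", "mediaThumbnailUrl": "", "hasMedia": False, "mediaCount": 0}
        )
    if not include_images:
        result.update({"imageUrl": "", "hasImages": False})
    return result
```

Note that the post itself stays in the output either way; only the media-related fields change.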

`includeReplies`

Type: Boolean
Description: Whether to include reply information in collected posts
Behavior: When set to true, the output will include information about which posts are replies, and to which posts they are replying. When set to false, this information will be excluded.
Example: If set to true, posts that are replies will have isReply set to true, along with replyToRoot and replyToParent fields containing the URIs of the root and parent posts.
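A small sketch of how the reply fields could be derived. In the `app.bsky.feed.post` record, a reply carries a `reply` object with `root` and `parent` references, each holding a `uri`. The function name is hypothetical, and returning no reply fields at all when includeReplies is false is an assumption about how "excluded" is implemented.

```python
def reply_fields(record: dict, include_replies: bool) -> dict:
    """Derive isReply / replyToRoot / replyToParent from a post record."""
    if not include_replies:
        return {}  # assumption: reply fields are omitted entirely
    reply = record.get("reply")
    if not reply:
        return {"isReply": False, "replyToRoot": None, "replyToParent": None}
    return {
        "isReply": True,
        "replyToRoot": reply.get("root", {}).get("uri"),
        "replyToParent": reply.get("parent", {}).get("uri"),
    }
```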

🗣️ Language Settings

`detectLanguage`

Type: Boolean
Description: Whether to automatically detect the language of posts that don't specify one
Behavior: When set to true, the scraper will use language detection to determine the language of posts that don't include language metadata. This is particularly useful when filtering by language. When set to false, posts without language metadata will not match any language filter.
Example: If filtering for Japanese posts and this is set to true, posts without explicit language metadata might still be included if they contain Japanese text.
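The interaction between the languages filter and detectLanguage can be sketched like this. It assumes posts carry their declared languages in a `langs` list (as `app.bsky.feed.post` records do), that regional tags such as `pt-BR` are compared by their primary subtag, and that `detector` is a hypothetical pluggable function returning a language code or None. Which detection library the scraper uses is not stated here.

```python
from typing import Callable, Optional


def matches_languages(
    record: dict,
    wanted: list[str],
    detect_language: bool = False,
    detector: Optional[Callable[[str], Optional[str]]] = None,
) -> bool:
    """Return True if the post is in any wanted language (OR logic)."""
    if not wanted:
        return True  # no language filter configured
    wanted_set = {code.lower() for code in wanted}
    langs = record.get("langs") or []
    if langs:
        # Assumption: compare only the primary subtag ("pt-BR" -> "pt").
        return any(lang.split("-")[0].lower() in wanted_set for lang in langs)
    if detect_language and detector is not None:
        guess = detector(record.get("text", ""))
        return guess is not None and guess.lower() in wanted_set
    return False  # no metadata and no detection: the post cannot match
```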

👤 User Profile Settings

`enrichUserProfiles`

Type: Boolean
Description: Whether to fetch additional user profile information for post authors
Behavior: When set to true, the output will include extended information about post authors, such as their description, follower/following counts, post counts, and avatar URLs. When set to false, only basic author information (DID, handle, name) will be included.
Example: If set to true, each post in the output will include additional fields like authorDescription, authorFollowersCount, etc.
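One plausible way to obtain the extended author fields is the `app.bsky.actor.getProfile` endpoint, whose response includes `description` and `followersCount`. The sketch below is an assumption about the approach rather than the scraper's confirmed implementation; it maps only the two output fields named above, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the public Bluesky AppView host is used for profile lookups.
PROFILE_ENDPOINT = "https://public.api.bsky.app/xrpc/app.bsky.actor.getProfile"


def profile_url(actor: str) -> str:
    """Build the profile lookup URL for a handle or DID."""
    return f"{PROFILE_ENDPOINT}?{urlencode({'actor': actor})}"


def author_fields(profile: dict) -> dict:
    """Map a getProfile response to the extended author output fields."""
    return {
        "authorDescription": profile.get("description", ""),
        "authorFollowersCount": profile.get("followersCount", 0),
    }
```

Because each lookup is a separate HTTP request, enabling enrichment for a high-volume stream would likely benefit from caching profiles per author.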

⏱️ Data Collection Parameters

`maxPosts`

Type: Integer
Description: Maximum number of posts to collect (0 for unlimited)
Behavior: Controls how many posts are collected before the scraper stops. Setting it to 0 means the scraper continues until the time limit is reached.
Example: If set to 100, the scraper will stop after collecting 100 posts that match the filter criteria.

`timeLimit`

Type: Integer
Description: Maximum time to run the scraper in minutes
Behavior: Controls how long the scraper will run before stopping, regardless of how many posts have been collected.
Example: If set to 30, the scraper will stop after 30 minutes, even if it hasn't reached the maxPosts limit.
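How maxPosts and timeLimit combine into a single stop condition can be sketched as below. The function name and the injectable `now` argument are illustrative assumptions; the logic follows the descriptions above, with 0 meaning no post limit.

```python
import time
from typing import Optional


def should_stop(
    collected: int,
    started_at: float,
    max_posts: int,
    time_limit_minutes: int,
    now: Optional[float] = None,
) -> bool:
    """Return True once either the post limit or the time limit is reached."""
    current = time.monotonic() if now is None else now
    if max_posts > 0 and collected >= max_posts:
        return True  # post limit reached (0 disables this check)
    return (current - started_at) >= time_limit_minutes * 60
```

Whichever limit is hit first ends the run, so a narrow filter with a large maxPosts will typically stop on timeLimit.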

🔌 Connection Settings

`region`

Type: String enum ("us-east" or "us-west")
Description: Region for the Jetstream server
Behavior: Controls which regional Bluesky Jetstream server the scraper connects to. This can affect latency and potentially the volume of data received.
Example: If you're collecting data from the US West Coast, selecting us-west might provide lower latency.

`instance`

Type: Integer (1 or 2)
Description: Instance number for the Jetstream server
Behavior: Selects which specific Jetstream instance to connect to within the selected region.
Example: If experiencing connection issues with instance 1, switching to instance 2 might help.
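For orientation, the region, instance, and wantedCollections settings could map onto a Jetstream WebSocket URL as in this sketch. The hostname pattern follows Bluesky's public Jetstream instances and `wantedCollections` is a Jetstream query parameter; whether the scraper builds its URL exactly this way is an assumption, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def build_jetstream_url(
    region: str = "us-east",
    instance: int = 1,
    wanted_collections: tuple[str, ...] = ("app.bsky.feed.post",),
) -> str:
    """Build a Jetstream subscribe URL from region, instance and collections."""
    if region not in ("us-east", "us-west"):
        raise ValueError(f"unsupported region: {region}")
    if instance not in (1, 2):
        raise ValueError(f"unsupported instance: {instance}")
    base = f"wss://jetstream{instance}.{region}.bsky.network/subscribe"
    # Each collection is sent as a repeated wantedCollections parameter.
    query = urlencode([("wantedCollections", name) for name in wanted_collections])
    return f"{base}?{query}" if query else base
```

Filtering by collection on the server side keeps likes, follows, and other event types off the wire when only posts are needed.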

`autoReconnect`

Type: Boolean
Description: Whether to automatically reconnect if the connection is lost
Behavior: When set to true, the scraper will attempt to reconnect to Jetstream if the connection drops. When set to false, the scraper will terminate on connection loss.
Example: For long-running data collection jobs, setting this to true helps ensure continuous data collection despite temporary network issues.

`maxRetries`

Type: Integer
Description: Maximum number of reconnection attempts
Behavior: Controls how many times the scraper will try to reconnect before giving up.
Example: If set to 5, the scraper will make up to 5 reconnection attempts before terminating.
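A minimal sketch of how autoReconnect and maxRetries might interact. The exponential backoff between attempts is an assumption (the delay strategy is not documented here), and `connect` stands in for whatever function opens and consumes the Jetstream connection.

```python
import time
from typing import Callable


def run_with_reconnect(
    connect: Callable[[], None],
    auto_reconnect: bool = True,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run connect(), retrying on ConnectionError; return the retries used."""
    attempts = 0
    while True:
        try:
            connect()
            return attempts
        except ConnectionError:
            if not auto_reconnect or attempts >= max_retries:
                raise  # give up: reconnect disabled or retry budget spent
            attempts += 1
            # Assumption: exponential backoff (1s, 2s, 4s, ...) between attempts.
            sleep(base_delay * 2 ** (attempts - 1))
```

Spacing out attempts this way also addresses the etiquette concern of not hammering the server with rapid reconnects.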

⚙️ Advanced Settings

`saveCheckpoints`

Type: Boolean
Description: Whether to periodically save collected data to prevent loss on errors
Behavior: When set to true, the scraper will periodically save collected data to disk, allowing recovery from a checkpoint if the process is interrupted.
Example: If set to true and the scraper crashes after collecting 400 posts, you might be able to recover 350 of them from the last checkpoint.
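Periodic checkpointing could be sketched as follows. The interval of 50 items, the JSON file format, and the class name are all assumptions for illustration; the scraper's actual checkpoint interval and storage location are not documented here.

```python
import json
from pathlib import Path


class CheckpointWriter:
    """Collect items in memory and persist them every `every` items."""

    def __init__(self, path: str, every: int = 50) -> None:
        self.path = Path(path)
        self.every = every
        self.items: list[dict] = []

    def add(self, item: dict) -> None:
        self.items.append(item)
        if len(self.items) % self.every == 0:
            # Overwrite the checkpoint with everything collected so far.
            self.path.write_text(json.dumps(self.items))

    @staticmethod
    def load(path: str) -> list[dict]:
        """Return the items from the last checkpoint, or [] if none exists."""
        file = Path(path)
        return json.loads(file.read_text()) if file.exists() else []
```

Items collected after the most recent checkpoint are lost on a crash, which is why a recovered run can hold fewer posts than were collected.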

`proxy`

Type: Object
Description: Proxy configuration for the scraper
Behavior: Controls whether and how the scraper uses Apify proxies for connections.
Example: Setting useApifyProxy to true allows the scraper to use Apify's proxy infrastructure, which can help avoid rate limiting.

`debugMode`

Type: Boolean
Description: Whether to enable detailed logging for troubleshooting
Behavior: When set to true, the scraper will output more detailed logs about its operation, which can help diagnose issues.
Example: If you're not seeing the expected output, setting this to true can provide insights into what's happening.

`verboseDebug`

Type: Boolean
Description: Whether to enable extremely detailed logging for message format diagnostics
Behavior: When set to true, the scraper will output extremely detailed logs, including raw message contents. This generates large log files.
Example: Useful only for advanced debugging when developing or modifying the scraper.

🎨 Customizing Output Format

The scraper allows you to customize the data fields included in the output through several parameters:

Field Selection Controls

These parameters control which data fields are included in the output:

- includeMedia: Controls whether media URLs and related fields are included
- includeImages: Controls whether image URLs and related fields are included
- includeReplies: Controls whether reply information fields are included
- enrichUserProfiles: Controls whether extended author profile fields are included

Output Format Options

On the Apify platform, you can download your dataset in several formats:

- JSON: The default format with complete data structure
- CSV: Tabular format suitable for spreadsheet applications
- Excel: Direct Excel file download
- RSS: For feed readers
- HTML: For web viewing

To change the download format:

1. Navigate to the "Storage" tab in your Apify account
2. Select the dataset from your actor run
3. Click the "Download" dropdown menu
4. Choose your preferred format

For customized data processing, you can also use the Apify API to retrieve the data programmatically in your preferred format.
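As an illustration of programmatic retrieval, the Apify API exposes dataset items at `/v2/datasets/{datasetId}/items` with a `format` query parameter. The sketch below only builds the download URL; the helper name is hypothetical, and the dataset ID and token are placeholders you would take from your own run.

```python
from typing import Optional
from urllib.parse import urlencode

# Export formats accepted by the dataset items endpoint.
SUPPORTED_FORMATS = {"json", "jsonl", "csv", "xlsx", "xml", "html", "rss"}


def dataset_items_url(dataset_id: str, fmt: str = "json", token: Optional[str] = None) -> str:
    """Build the Apify API URL for downloading a dataset in a given format."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    params = {"format": fmt}
    if token:
        params["token"] = token  # needed for datasets that are not public
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"
```

The resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen(dataset_items_url("<DATASET_ID>", "csv"))`.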

🔄 Combining Filters

When multiple filter types are used together (hashtags, usernames, languages), the scraper applies AND logic between different filter types:

- If you set both hashtags and languages, posts must match BOTH criteria (contain one of the hashtags AND be in one of the languages).
- If you set both usernames and languages, posts must be authored by one of the specified users AND be in one of the specified languages.
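The combination rule (OR within a filter type, AND across filter types) can be written compactly. This sketch uses a simplified post shape with hypothetical keys `hashtags`, `handle`, and `language`; the scraper's internal representation will differ.

```python
from typing import Iterable


def passes_filters(
    post: dict,
    hashtags: Iterable[str] = (),
    usernames: Iterable[str] = (),
    languages: Iterable[str] = (),
) -> bool:
    """OR within each filter type, AND across the filter types that are set."""
    hashtags, usernames, languages = set(hashtags), set(usernames), set(languages)
    checks = []
    if hashtags:
        checks.append(bool(set(post.get("hashtags", [])) & hashtags))
    if usernames:
        checks.append(post.get("handle") in usernames)
    if languages:
        checks.append(post.get("language") in languages)
    # all([]) is True, so a run with no filters accepts every post.
    return all(checks)
```

Only the filter types that are actually configured contribute a check, which is why leaving every filter empty collects the unfiltered stream.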

⚪ Default Behavior (No Filters)

When no filter options (hashtags, usernames, languages) are selected:

- The scraper collects all posts from the Bluesky Jetstream without filtering, so every post is accepted
- The only limits are the maxPosts and timeLimit parameters
- You get a diverse, unfiltered stream of Bluesky content
- Other inclusion settings like includeMedia and includeImages are still applied
- Collection types are limited to what is specified in wantedCollections (defaults to feed posts)

This approach is useful for general data collection when you want to analyze the overall Bluesky content without focusing on specific topics, users, or languages.

📝 Example Scenarios

🤝 Bluesky Firehose Scraping Etiquette

When using the Bluesky Jetstream (firehose), it's important to follow these ethical guidelines and best practices:

📜 Official Guidelines

Respect the Terms of Service: Always adhere to Bluesky's official Terms of Service and API Usage Guidelines.
Attribution: When publishing research or analysis based on Bluesky data, properly attribute the source.
Privacy Awareness: Though the data is publicly available, be mindful that users may not expect their content to be analyzed at scale.

🔧 Technical Best Practices

Rate Limiting: The scraper already implements rate limiting, but be cautious about running multiple instances simultaneously.
Efficient Filtering: Use the filtering options to collect only the data you need rather than scraping everything.
Connection Management: Use the autoReconnect and maxRetries settings responsibly to avoid creating excessive connection attempts.
Data Storage: Handle collected data securely and in compliance with relevant privacy regulations like GDPR.

🔍 Responsible Usage

Research Purpose: Clearly define your research or business purpose before collecting data.
Minimize Collection: Only collect the data fields necessary for your analysis.
Respect Boundaries: Avoid excessive scraping that might impact the platform's performance.
Consider Opt-Out: When presenting results, consider providing ways for users to opt-out of having their content included.

⚖️ Legal Considerations

Data Protection: Comply with applicable data protection laws in your jurisdiction.
User Privacy: Even though posts are public, respect user privacy by anonymizing data when possible.
Terms Changes: Regularly check for updates to Bluesky's terms as the platform is evolving.

Following these guidelines ensures ethical use of the Bluesky firehose while maintaining a positive relationship with the platform and its community.
