
Fast Instagram Post Scraper 🚀
Pricing
Pay per usage

Fast Instagram Post Scraper 🚀
Instagram Post Scraper. Post data: hashtags, comment_count, like_count, usertags, images, videos, shortcode and etc. Scrape Instagram Posts with Ease and Speed
5.0 (1)
Pricing
Pay per usage
3
Monthly users
43
Runs succeeded
98%
Response time
2.3 days
Last modified
2 days ago
FEATURE REQUEST / View count
Closed
Hey,
Is it possible to add view_count to the post item?
Best
Labsed
And also the video duration.
Instagram Scraper (instagram-scraper)
view_count has been added, but it looks like it's basically null, and I need to make sure that the video duration is available via the video_dash_manifest info.
Labsed
Thanks for considering the feedback. Attaching my crawler in case it can help you with the manifest manipulation, or any other means.
1import { 2 createHttpRouter, 3 HttpCrawler, 4 HttpCrawlerOptions, 5 HttpCrawlingContext, 6 ProxyConfiguration, 7} from "crawlee"; 8import jp from "jsonpath"; 9import * as cheerio from "cheerio"; 10import Video from "../../entities/video"; 11import Channel from "../../entities/channel"; 12import ChannelStats from "../../entities/channel-stats"; 13import VideoStats from "../../entities/video-stats"; 14import { CrawlerInterface, Platform } from "../../types"; 15 16export class InstagramCrawler 17 extends HttpCrawler<any> 18 implements CrawlerInterface 19{ 20 constructor(options?: HttpCrawlerOptions) { 21 const router = createHttpRouter(); 22 23 router.addDefaultHandler(async ({ request, json }: HttpCrawlingContext) => { 24 const nodes: Record<string, any>[] = jp.query( 25 json, 26 "$.data.user.edge_owner_to_timeline_media.edges[?(@.node.__typename=='GraphVideo')].node", 27 ); 28 29 for (const node of nodes) { 30 const video = Video.create({ 31 channel: request.userData.channel, 32 videoId: node.shortcode, 33 title: node.edge_media_to_caption?.edges[0].node.text ?? "", 34 duration: extractDurationFromManifest( 35 node.dash_info.video_dash_manifest, 36 ), 37 publishedAt: new Date(node.taken_at_timestamp * 1000), 38 }); 39 40 await Video.upsert(video, ["channel", "videoId"]); 41 42 await VideoStats.save({ 43 video, 44 views: node.video_view_count, 45 comments: node.edge_media_to_comment.count, 46 reactions: node.edge_media_preview_like.count, 47 topReactions: [ 48 { name: "Like", count: node.edge_media_preview_like.count }, 49 ], 50 }); 51 } 52 }); 53 54 router.addHandler( 55 "stats", 56 async ({ json, request }: HttpCrawlingContext) => { 57 const data: Record<string, any> = jp.value( 58 json, 59 "$..data.xdt_shortcode_media", 60 ); 61 62 await Video.update( 63 { id: request.userData.video.id }, 64 { 65 title: data.edge_media_to_caption?.edges[0].node.text ?? "", 66 duration: parseInt(data.video_duration), 67 publishedAt: new Date(data.taken_at_timestamp * 1000), 68 }, 69 ); 70 71 await VideoStats.save({ 72 video: request.userData.video, 73 views: data.video_view_count, 74 comments: data.edge_media_to_parent_comment.count, 75 reactions: data.edge_media_preview_like.count, 76 topReactions: [ 77 { name: "Like", count: data.edge_media_preview_like.count }, 78 ], 79 }); 80 }, 81 ); 82 83 router.addHandler("cadd", async ({ log, json }: HttpCrawlingContext) => { 84 const data: Record<string, any> = jp.value(json, "$..data.user"); 85 86 log.info("Got channel data", data); 87 88 const channel = Channel.create({ 89 platform: Platform.INSTAGRAM, 90 channelId: data.id, 91 username: data.username, 92 name: data.full_name, 93 }); 94 95 await Channel.upsert(channel, ["platform", "channelId"]); 96 97 log.info("Channel saved", channel); 98 }); 99 100 router.addHandler( 101 "cstats", 102 async ({ request, json }: HttpCrawlingContext) => { 103 await ChannelStats.save({ 104 channel: request.userData.channel, 105 followers: json.data.user.edge_followed_by.count, 106 }); 107 }, 108 ); 109 110 let proxyConfiguration: ProxyConfiguration | undefined; 111 112 if (process.env.IG_PROXY_URL) { 113 proxyConfiguration = new ProxyConfiguration({ 114 proxyUrls: [process.env.IG_PROXY_URL], 115 }); 116 } 117 118 super({ 119 ...options, 120 proxyConfiguration, 121 requestHandler: router, 122 preNavigationHooks: [ 123 async (_: HttpCrawlingContext, gotOptions: any) => { 124 gotOptions.headers = { 125 "X-IG-App-ID": "936619743392459", 126 }; 127 }, 128 ], 129 }); 130 } 131 132 addChannelAddRequest(url: string) { 133 return this.addRequests([{ url, label: "cadd" }]); 134 } 135 136 addChannelStatsRequests(channels: Channel[]) { 137 return this.addRequests( 138 channels.map((channel) => ({ 139 url: `https://www.instagram.com/api/v1/users/web_profile_info/?username=${channel.username}`, 140 label: "cstats", 141 userData: { channel }, 142 })), 143 ); 144 } 145 146 addChannelVideoRequests(channels: Channel[]) { 147 return this.addRequests( 148 channels.map((channel) => ({ 149 url: `https://www.instagram.com/api/v1/users/web_profile_info/?username=${channel.username}`, 150 userData: { channel }, 151 })), 152 ); 153 } 154 155 addVideoStatsRequests(videos: Video[]) { 156 return this.addRequests( 157 videos.map((video) => ({ 158 url: "https://www.instagram.com/graphql/query", 159 method: "POST", 160 headers: { "Content-Type": "application/x-www-form-urlencoded" }, 161 payload: new URLSearchParams({ 162 doc_id: "8845758582119845", 163 variables: JSON.stringify({ shortcode: video.videoId }), 164 }).toString(), 165 useExtendedUniqueKey: true, 166 label: "stats", 167 userData: { video }, 168 })), 169 ); 170 } 171} 172 173function extractDurationFromManifest(manifestXml: string): number { 174 const match = cheerio 175 .load(manifestXml, { xmlMode: true })("MPD") 176 .attr("mediaPresentationDuration") 177 ?.match(/^PT([\d.]+)S$/); 178 179 return match?.[1] ? parseInt(match[1]) : 0; 180}
Instagram Scraper (instagram-scraper)
Thanks a lot, I found the video_duration parameter, I grabbed the post collection and the parameter was not as comprehensive as the single page information
Pricing
Pricing model
Pay per usageThis Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage.