Fast Instagram Post Scraper 🚀 avatar
Fast Instagram Post Scraper 🚀

Pricing

Pay per usage

Go to Store
Fast Instagram Post Scraper 🚀

Fast Instagram Post Scraper 🚀

Developed by

Instagram Scraper

Maintained by Community

Instagram Post Scraper. Post data: hashtags, comment_count, like_count, usertags, images, videos, shortcode and etc. Scrape Instagram Posts with Ease and Speed

5.0 (1)

Pricing

Pay per usage

3

Monthly users

43

Runs succeeded

98%

Response time

2.3 days

Last modified

2 days ago

LB

FEATURE REQUEST / View count

Closed

Labsed opened this issue
2 months ago

Hey,

Is it possible to add view_count to the post item?

Best

LB

Labsed

2 months ago

And also the video duration.

IS

view_count has been added, but it looks like it's basically null, and I need to make sure that the video duration is available via the video_dash_manifest info.

LB

Labsed

2 months ago

Thanks for considering the feedback. Attaching my crawler in case it can help you with the manifest manipulation, or any other means.

1import {
2  createHttpRouter,
3  HttpCrawler,
4  HttpCrawlerOptions,
5  HttpCrawlingContext,
6  ProxyConfiguration,
7} from "crawlee";
8import jp from "jsonpath";
9import * as cheerio from "cheerio";
10import Video from "../../entities/video";
11import Channel from "../../entities/channel";
12import ChannelStats from "../../entities/channel-stats";
13import VideoStats from "../../entities/video-stats";
14import { CrawlerInterface, Platform } from "../../types";
15
16export class InstagramCrawler
17  extends HttpCrawler<any>
18  implements CrawlerInterface
19{
20  constructor(options?: HttpCrawlerOptions) {
21    const router = createHttpRouter();
22
23    router.addDefaultHandler(async ({ request, json }: HttpCrawlingContext) => {
24      const nodes: Record<string, any>[] = jp.query(
25        json,
26        "$.data.user.edge_owner_to_timeline_media.edges[?(@.node.__typename=='GraphVideo')].node",
27      );
28
29      for (const node of nodes) {
30        const video = Video.create({
31          channel: request.userData.channel,
32          videoId: node.shortcode,
33          title: node.edge_media_to_caption?.edges[0].node.text ?? "",
34          duration: extractDurationFromManifest(
35            node.dash_info.video_dash_manifest,
36          ),
37          publishedAt: new Date(node.taken_at_timestamp * 1000),
38        });
39
40        await Video.upsert(video, ["channel", "videoId"]);
41
42        await VideoStats.save({
43          video,
44          views: node.video_view_count,
45          comments: node.edge_media_to_comment.count,
46          reactions: node.edge_media_preview_like.count,
47          topReactions: [
48            { name: "Like", count: node.edge_media_preview_like.count },
49          ],
50        });
51      }
52    });
53
54    router.addHandler(
55      "stats",
56      async ({ json, request }: HttpCrawlingContext) => {
57        const data: Record<string, any> = jp.value(
58          json,
59          "$..data.xdt_shortcode_media",
60        );
61
62        await Video.update(
63          { id: request.userData.video.id },
64          {
65            title: data.edge_media_to_caption?.edges[0].node.text ?? "",
66            duration: parseInt(data.video_duration),
67            publishedAt: new Date(data.taken_at_timestamp * 1000),
68          },
69        );
70
71        await VideoStats.save({
72          video: request.userData.video,
73          views: data.video_view_count,
74          comments: data.edge_media_to_parent_comment.count,
75          reactions: data.edge_media_preview_like.count,
76          topReactions: [
77            { name: "Like", count: data.edge_media_preview_like.count },
78          ],
79        });
80      },
81    );
82
83    router.addHandler("cadd", async ({ log, json }: HttpCrawlingContext) => {
84      const data: Record<string, any> = jp.value(json, "$..data.user");
85
86      log.info("Got channel data", data);
87
88      const channel = Channel.create({
89        platform: Platform.INSTAGRAM,
90        channelId: data.id,
91        username: data.username,
92        name: data.full_name,
93      });
94
95      await Channel.upsert(channel, ["platform", "channelId"]);
96
97      log.info("Channel saved", channel);
98    });
99
100    router.addHandler(
101      "cstats",
102      async ({ request, json }: HttpCrawlingContext) => {
103        await ChannelStats.save({
104          channel: request.userData.channel,
105          followers: json.data.user.edge_followed_by.count,
106        });
107      },
108    );
109
110    let proxyConfiguration: ProxyConfiguration | undefined;
111
112    if (process.env.IG_PROXY_URL) {
113      proxyConfiguration = new ProxyConfiguration({
114        proxyUrls: [process.env.IG_PROXY_URL],
115      });
116    }
117
118    super({
119      ...options,
120      proxyConfiguration,
121      requestHandler: router,
122      preNavigationHooks: [
123        async (_: HttpCrawlingContext, gotOptions: any) => {
124          gotOptions.headers = {
125            "X-IG-App-ID": "936619743392459",
126          };
127        },
128      ],
129    });
130  }
131
132  addChannelAddRequest(url: string) {
133    return this.addRequests([{ url, label: "cadd" }]);
134  }
135
136  addChannelStatsRequests(channels: Channel[]) {
137    return this.addRequests(
138      channels.map((channel) => ({
139        url: `https://www.instagram.com/api/v1/users/web_profile_info/?username=${channel.username}`,
140        label: "cstats",
141        userData: { channel },
142      })),
143    );
144  }
145
146  addChannelVideoRequests(channels: Channel[]) {
147    return this.addRequests(
148      channels.map((channel) => ({
149        url: `https://www.instagram.com/api/v1/users/web_profile_info/?username=${channel.username}`,
150        userData: { channel },
151      })),
152    );
153  }
154
155  addVideoStatsRequests(videos: Video[]) {
156    return this.addRequests(
157      videos.map((video) => ({
158        url: "https://www.instagram.com/graphql/query",
159        method: "POST",
160        headers: { "Content-Type": "application/x-www-form-urlencoded" },
161        payload: new URLSearchParams({
162          doc_id: "8845758582119845",
163          variables: JSON.stringify({ shortcode: video.videoId }),
164        }).toString(),
165        useExtendedUniqueKey: true,
166        label: "stats",
167        userData: { video },
168      })),
169    );
170  }
171}
172
173function extractDurationFromManifest(manifestXml: string): number {
174  const match = cheerio
175    .load(manifestXml, { xmlMode: true })("MPD")
176    .attr("mediaPresentationDuration")
177    ?.match(/^PT([\d.]+)S$/);
178
179  return match?.[1] ? parseInt(match[1]) : 0;
180}
IS

Thanks a lot, I found the video_duration parameter, I grabbed the post collection and the parameter was not as comprehensive as the single page information

Pricing

Pricing model

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage.