const Apify = require('apify');

const { log } = Apify.utils;

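// This actor reads start URLs from its input, crawls each page with
// CheerioCrawler, and stores the page title plus all <head> meta tags
// in the default dataset.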
Apify.main(async () => {
    // getInput() resolves to null when the actor has no input at all.
    const input = (await Apify.getInput()) || {};
    const { urls = [], proxy = { useApifyProxy: false } } = input;

    // A single "url" field is accepted in addition to the "urls" array.
    if (input.url) urls.push(input.url);

    const requests = [];
    for (const url of urls) {
        // The URL constructor throws on an invalid URL instead of returning
        // a falsy value, so the check must go through try/catch.
        try {
            new URL(url);
        } catch (err) {
            throw new Error(`All URLs must be valid URLs! Received: ${url}`);
        }
        requests.push({ url });
    }

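    // The name passed to openRequestList() persists the list's state, so a
    // resurrected or migrated run resumes instead of starting over.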
    const requestList = await Apify.openRequestList('start-urls', requests);
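    // createProxyConfiguration() resolves to undefined when useApifyProxy is
    // false, in which case the crawler makes direct connections.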
    const proxyConfiguration = await Apify.createProxyConfiguration({ ...proxy });

    const crawler = new Apify.CheerioCrawler({
        requestList,
        proxyConfiguration,
        maxConcurrency: 50,
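        // The autoscaled pool raises concurrency toward this cap only while
        // CPU and memory allow it.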
        handlePageFunction: async ({ $, request }) => {
            const meta = {};

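            // Collect every <meta> tag in <head>, keyed by its name,
            // property, or http-equiv attribute (Open Graph tags, for
            // example, use "property").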
            for (const tag of $('head meta')) {
                const name = $(tag).attr('name') || $(tag).attr('property') || $(tag).attr('http-equiv');
                const content = $(tag).attr('content');
                if (name) meta[name] = content ? content.trim() : null;
            }

            const result = {
                url: request.url,
                title: ($('head title').text() || '').trim(),
                meta,
            };

            await Apify.pushData(result);
        },
    });

    log.info('Starting the crawl...');
    await crawler.run();
    log.info('Scraping finished! Metadata for each site is available in "Results".');
});