Nordstrom Product Scraper
Pricing: $30.00/month + usage
Nordstrom Product Spider scrapes detailed product information from Nordstrom.com, including name, description, price, colors, and sizes, and returns it as JSON. It is well suited to e-commerce analysis, competitor research, and cataloging. The actor accepts multiple URLs, uses proxies, and delivers reliable, structured data ready for integration.
Apify Template for Scrapy Spiders
This repository serves as a template for deploying Scrapy spiders to Apify. It is automatically updated by a GitHub Actions workflow in the central repository (`getdataforme/central_repo`) when changes are pushed to spider files in `src/spiders/` or `src/custom/`. Below is an overview of the automated tasks performed to keep this repository in sync.
Automated Tasks
The following tasks are executed by the GitHub Actions workflow when a spider file (e.g., `src/spiders/example/example_parser_spider.py`) is modified in the central repository:
- Repository Creation:
  - Creates a new Apify repository (e.g., `example_apify`) from this template (`apify_template`) using the GitHub API, if it doesn't already exist.
  - Grants push permissions to the `scraping` team in the `getdataforme` organization.
- Spider File Sync:
  - Copies the modified spider file (e.g., `example_parser_spider.py`) from the central repository to `src/spiders/` in this repository.
  - Copies the associated `requirements.txt` (if present) from the spider's directory (e.g., `src/spiders/example/`) to the root of this repository.
- Input Schema Generation:
  - Runs `generate_input_schema.py` to create `.actor/input_schema.json`.
  - Parses the spider's `__init__` method (e.g., `def __init__(self, location: str, item_limit: int = 100, county: str = "Japan", *args, **kwargs)`) to generate a JSON schema (see the sketch after this task list).
  - Supports types: `string`, `integer`, `boolean`, `number` (for Python `str`, `int`, `bool`, `float`).
  - Uses `prefill` for strings and `default` for non-strings, with appropriate `editor` values (`textfield`, `number`, `checkbox`).
  - Marks parameters without defaults (e.g., `location`) as `required`.
- Main Script Update:
  - Runs `update_main.py` to update `src/main.py`.
  - Updates the `actor_input` section to fetch input values matching the spider's `__init__` parameters (e.g., `location`, `item_limit`, `county`).
  - Updates the `process.crawl` call to pass these parameters to the spider (e.g., `process.crawl(Spider, location=location, item_limit=item_limit, county=county)`).
  - Preserves existing settings, comments, and proxy configurations.
- Actor Configuration Update:
  - Updates `.actor/actor.json` to set the `name` field based on the repository name, removing the `_apify` suffix (e.g., `example_apify` → `example`).
  - Uses `jq` to modify the JSON file while preserving other fields (e.g., `title`, `description`, `input`).
- Commit and Push:
  - Commits changes to `src/spiders/$spider_file`, `requirements.txt`, `.actor/input_schema.json`, `src/main.py`, and `.actor/actor.json`.
  - Pushes the changes to the `main` branch of this repository.
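The actual schema-generation logic lives in `generate_input_schema.py` in the central repository. Purely as an illustration of the mapping described in the Input Schema Generation step, a sketch that derives a comparable schema from a spider class with Python's `inspect` module might look like this (the function name and structure are illustrative, not the script's real implementation):

```python
import inspect


# Maps Python annotations to the schema types and editors described above.
TYPE_MAP = {
    str: ("string", "textfield"),
    int: ("integer", "number"),
    float: ("number", "number"),
    bool: ("boolean", "checkbox"),
}


def build_input_schema(spider_cls) -> dict:
    """Derive an Apify-style input schema from a spider's __init__ signature."""
    properties, required = {}, []
    for name, param in inspect.signature(spider_cls.__init__).parameters.items():
        if name == "self" or param.kind in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD,
        ):
            continue  # skip self, *args, **kwargs
        json_type, editor = TYPE_MAP.get(param.annotation, ("string", "textfield"))
        prop = {"title": name, "type": json_type, "editor": editor}
        if param.default is inspect.Parameter.empty:
            required.append(name)            # no default -> required
        elif json_type == "string":
            prop["prefill"] = param.default  # strings use prefill
        else:
            prop["default"] = param.default  # non-strings use default
        properties[name] = prop
    return {
        "title": "Input schema",
        "type": "object",
        "schemaVersion": 1,
        "properties": properties,
        "required": required,
    }
```

With the example signature above, `location` ends up in `required`, `item_limit` gets `"default": 100`, and `county` gets `"prefill": "Japan"`.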
Repository Structure
- `src/spiders/`: Contains the Scrapy spider file (e.g., `example_parser_spider.py`).
- `src/main.py`: Main script to run the spider with Apify Actor integration (see the sketch after this list).
- `.actor/input_schema.json`: JSON schema defining the spider's input parameters.
- `.actor/actor.json`: Actor configuration with the repository name and metadata.
- `requirements.txt`: Python dependencies for the spider.
- `Dockerfile`: Docker configuration for running the Apify Actor.
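The workflow rewrites two regions of `src/main.py`: the block that reads `actor_input` and the `process.crawl(...)` call. The real file also carries settings, comments, and proxy configuration that the workflow preserves, so the minimal sketch below only shows the shape of those two regions for the example spider. The spider import path, class name, and the reactor/event-loop setup are assumptions loosely modeled on a common Apify-plus-Scrapy pattern, not the template's exact contents:

```python
import asyncio

import nest_asyncio
from apify import Actor
from scrapy.crawler import CrawlerProcess

# Hypothetical import path and class name for the synced spider.
from src.spiders.example_parser_spider import ExampleParserSpider


async def main() -> None:
    async with Actor:
        # actor_input section: fetch values matching the spider's __init__ parameters.
        actor_input = await Actor.get_input() or {}
        location = actor_input.get("location")
        item_limit = actor_input.get("item_limit", 100)
        county = actor_input.get("county", "Japan")

        # Assumption: run Scrapy on Twisted's asyncio reactor so it can share
        # the Actor's event loop (the template's own settings are preserved as-is).
        process = CrawlerProcess(
            settings={
                "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
            },
            install_root_handler=False,
        )

        # process.crawl call: pass the input parameters straight to the spider.
        process.crawl(
            ExampleParserSpider,
            location=location,
            item_limit=item_limit,
            county=county,
        )
        process.start()


if __name__ == "__main__":
    nest_asyncio.apply()  # allow Scrapy's blocking start() inside the running loop
    asyncio.run(main())
```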
Prerequisites
- The central repository (`getdataforme/central_repo`) must contain:
  - `generate_input_schema.py` and `update_main.py` in the root.
  - Spider files in `src/spiders/` or `src/custom/` with a valid `__init__` method.
- The GitHub Actions workflow requires a `GITHUB_TOKEN` with repository creation and write permissions (a sketch of these API calls follows this list).
- `jq` and `python3` are installed in the workflow environment.
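The repository-creation step in the Automated Tasks section relies on this token. The workflow itself may use shell or the GitHub CLI; as a rough Python sketch of the same two API calls (the helper function, visibility flag, and error handling are illustrative assumptions):

```python
import os

import requests

API = "https://api.github.com"
ORG = "getdataforme"
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}


def create_apify_repo(spider_name: str) -> None:
    """Create <spider_name>_apify from apify_template if it does not exist yet."""
    repo = f"{spider_name}_apify"

    # Skip creation if the repository already exists.
    if requests.get(f"{API}/repos/{ORG}/{repo}", headers=HEADERS).status_code == 200:
        return

    # Generate a new repository from the template (visibility is an assumption).
    requests.post(
        f"{API}/repos/{ORG}/apify_template/generate",
        headers=HEADERS,
        json={"owner": ORG, "name": repo, "private": True},
    ).raise_for_status()

    # Grant the scraping team push permission on the new repository.
    requests.put(
        f"{API}/orgs/{ORG}/teams/scraping/repos/{ORG}/{repo}",
        headers=HEADERS,
        json={"permission": "push"},
    ).raise_for_status()
```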
Testing
To verify the automation:
- Push a change to a spider file in `src/spiders/` or `src/custom/` in the central repository.
- Check the generated Apify repository (e.g., `getdataforme/example_apify`) for:
  - Updated `src/spiders/$spider_file`.
  - Correct `input_schema.json` with parameters matching the spider's `__init__`.
  - Updated `src/main.py` with correct `actor_input` and `process.crawl` lines.
  - Updated `.actor/actor.json` with the correct `name` field.
Notes
Warning: This Apify actor repository is automatically generated and updated by the GitHub Actions workflow in `getdataforme/central_repo`. Do not edit this repository directly. To modify the spider, update the corresponding file in `src/spiders/` or `src/custom/` in the central repository, and the workflow will sync changes to this repository, including:

- Copying the spider file to `src/spiders/`.
- Generating `.actor/input_schema.json` based on the spider's `__init__` parameters.
- Updating `src/main.py` with correct input handling and spider execution.
- Setting the `name` field in `.actor/actor.json` (e.g., `example` for `example_apify`).

Verification: After the workflow completes, verify the actor by checking:

- `src/spiders/$spider_file` matches the central repository.
- `.actor/input_schema.json` includes all `__init__` parameters with correct types and defaults.
- `src/main.py` has updated `actor_input` and `process.crawl` lines.
- `.actor/actor.json` has the correct `name`.
- Optionally, deploy the actor to Apify and test with sample inputs to ensure functionality.
- The workflow supports multiple spider types (`scrapy`, `hrequest`, `playwright`) based on the file path (`src/spiders/`, `src/custom/*/hrequest/`, `src/custom/*/playwright/`).
- Commits with `[apify]` in the message update only Apify repositories; `[internal]` updates only internal repositories; otherwise, both are updated.
- Ensure the spider's `__init__` uses only supported types (`str`, `int`, `bool`, `float`) to avoid schema generation errors (see the example spider below).
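For reference, a spider constructor that stays within the supported types, and therefore maps cleanly onto an input schema, could look like the hypothetical example below; `location` has no default, so it would be marked as `required`:

```python
import scrapy


class ExampleParserSpider(scrapy.Spider):
    """Hypothetical spider whose constructor maps cleanly onto an input schema."""

    name = "example_parser"

    def __init__(self, location: str, item_limit: int = 100,
                 county: str = "Japan", *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.location = location      # required string (no default)
        self.item_limit = item_limit  # integer, exposed with "default": 100
        self.county = county          # string, exposed with "prefill": "Japan"

    def start_requests(self):
        # Placeholder request; a real spider builds its URLs from the parameters above.
        yield scrapy.Request("https://example.com", callback=self.parse)

    def parse(self, response):
        yield {"location": self.location, "item_limit": self.item_limit}
```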
For issues, check the GitHub Actions logs in the central repository or contact the `scraping` team.