n8n Basics: Scrape Any Website Data — Step by Step
Web scraping used to require Python scripts, proxies, and hours of debugging. Today, you can automate most of the process visually, and n8n makes it surprisingly straightforward. In this guide, you will learn how to scrape a website, extract the data you need, and store or process it automatically. You will use n8n's built-in nodes plus, at most, a few lines of optional JavaScript.
What Is Web Scraping and Why Automate It?
Web scraping is the process of automatically extracting information from websites. Common use cases include:
- Monitoring competitor prices
- Collecting job listings or news headlines
- Tracking product availability
- Aggregating data for research or reporting
Doing this manually is time-consuming and error-prone. By automating it with n8n, you can schedule scraping runs, handle pagination, and send results directly to Google Sheets, Airtable, Slack, or any other tool in your stack.
What You Need Before You Start
To follow this tutorial, you will need:
- A running instance of n8n (self-hosted or cloud)
- The URL of the website you want to scrape
- Basic familiarity with the n8n canvas (nodes and connections)
No coding experience is required. n8n handles the heavy lifting through its built-in nodes, and the optional cleanup in Step 5 needs only a few copy-and-paste lines of JavaScript.
Step 1: Create a New Workflow in n8n
Open your n8n dashboard and click New Workflow. Give it a descriptive name like Website Scraper – Product Prices. This keeps your workspace organized, especially as you build more automations.
Step 2: Add a Trigger Node
Every n8n workflow needs a trigger. For scheduled scraping, use the Schedule Trigger node. You can set it to run every hour, daily, or at any custom interval. If you want to run the scraper manually for testing, use the Manual Trigger node instead — you can always swap it out later.
Step 3: Fetch the Web Page with the HTTP Request Node
Add an HTTP Request node and connect it to your trigger. Configure it as follows:
- Method: GET
- URL: Paste the target website URL
- Response Format: Text (to receive raw HTML)
When you execute this node, n8n fetches the full HTML of the page. You can inspect the output in the node's result panel to confirm everything arrived correctly.
Step 4: Extract Data with the HTML Extract Node
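This is the core of the scraper. Add the HTML Extract node and connect it to the HTTP Request node. In newer n8n versions, this is the HTML node with the Extract HTML Content operation. Point it at the field that holds the page's HTML, then define CSS selectors that identify the elements you want to extract.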
For example, to extract product names and prices from an e-commerce page, you might configure:
- Key: productName, CSS Selector: .product-title, Return Value: Text
- Key: price, CSS Selector: .product-price, Return Value: Text
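Not sure which selector to use? Right-click the element in your browser, choose Inspect, and look at its class or ID attributes. Turn these into a selector by prefixing a class with a dot (.product-title) or an ID with a hash (#price). In Chrome, the Elements panel's Copy > Copy selector option does this for you. If the page lists several products, also enable Return Array for each key. Otherwise, all matches are returned as a single string.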
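With Return Array enabled on both keys, the HTML Extract node should give you a single item holding two parallel arrays: all the product names and all the prices. It does not produce one item per product. A Code node can pair the arrays up. The sketch below is a minimal version that assumes the productName and price keys from the example above. The function name zipProducts is our own and is not part of n8n.

```javascript
// Hypothetical helper (not an n8n built-in) for a Code node in
// "Run Once for All Items" mode. Assumes each input item looks like
// { json: { productName: [...], price: [...] } }, i.e. the HTML Extract
// node ran with Return Array enabled on both keys.
function zipProducts(items) {
  const products = [];
  for (const item of items) {
    const names = item.json.productName ?? [];
    const prices = item.json.price ?? [];
    // Pair values by position. If the lengths differ, some product is missing
    // an element and the pairing may be off, so we stop at the shorter array
    // instead of emitting undefined values.
    const count = Math.min(names.length, prices.length);
    for (let i = 0; i < count; i++) {
      products.push({ json: { productName: names[i], price: prices[i] } });
    }
  }
  return products;
}

// In the Code node itself, the last line would be:
// return zipProducts($input.all());
```

If the two arrays come back with different lengths, revisit your selectors before trusting the pairing.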
Step 5: Transform and Clean the Data
Raw scraped data often needs cleanup — removing currency symbols, trimming whitespace, or reformatting dates. Use the Set node or the Code node in n8n to apply simple transformations. For example, you can strip the dollar sign from a price string and convert it to a number for calculations.
Example Transformation (Code Node)
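In the Code node, a short JavaScript snippet can handle most cleanup tasks. Keep in mind that each item's fields live under item.json:
- Remove extra spaces: item.json.price.trim()
- Parse a number: parseFloat(item.json.price.replace(/[^0-9.]/g, '')), which strips the currency symbol and any thousands separators (a plain replace('$', '') would turn "$1,299.99" into 1)
- Format a date: new Date(item.json.date).toISOString()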
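Put together, these three snippets could form a Code node in Run Once for All Items mode, roughly like the sketch below. It assumes items with productName, price, and an optional date field. The date field is hypothetical here, since the example selectors do not extract one. It also assumes prices use a dot as the decimal separator. cleanItems is an illustrative name.

```javascript
// Sketch of the Step 5 cleanup for an n8n Code node ("Run Once for All Items").
// Assumes items shaped like { json: { productName, price, date } }; the
// date field is hypothetical and only cleaned when present.
function cleanItems(items) {
  return items.map((item) => {
    const src = item.json;
    const cleaned = { ...src };

    if (typeof src.productName === "string") {
      cleaned.productName = src.productName.trim();
    }

    if (typeof src.price === "string") {
      // Keep only digits and the decimal point, so "$1,299.99" becomes 1299.99.
      // Assumes a dot as the decimal separator (US-style prices).
      const numeric = parseFloat(src.price.replace(/[^0-9.]/g, ""));
      cleaned.price = Number.isNaN(numeric) ? null : numeric;
    }

    if (src.date !== undefined) {
      const parsed = new Date(src.date);
      // toISOString() throws a RangeError on an invalid date, so check first.
      cleaned.date = Number.isNaN(parsed.getTime()) ? null : parsed.toISOString();
    }

    return { json: cleaned };
  });
}

// In the Code node: return cleanItems($input.all());
```

Unparseable prices and dates become null rather than failing the run, so you can filter or flag them in a later step.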
Step 6: Send Data to Your Destination
Once your data is clean, connect it to any output node. Popular choices within n8n include:
- Google Sheets — append rows automatically
- Airtable — create or update records
- Slack or Email — send instant alerts when data changes
- HTTP Request: POST the data to any external system's webhook or API (the Webhook node receives data; it does not send it)
With n8n, you simply add the relevant node, authenticate your account, and map the fields from your scraper to the destination columns. No custom API integration needed.
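For the Slack or Email option, you usually want an alert only when something actually changed. One approach is to read the previous run's prices, for example from the Google Sheet you append to, compare them with the freshly cleaned data in a Code node, and pass only the differences on to Slack. The sketch below assumes both lists are keyed by productName and that prices are already numbers. findPriceChanges and the node names in the comment are placeholders.

```javascript
// Hypothetical change detector for an n8n Code node.
// previous/current: arrays of { json: { productName, price } } with numeric prices.
function findPriceChanges(previous, current) {
  // Index the last known price of each product by name.
  const lastPrice = new Map();
  for (const item of previous) {
    lastPrice.set(item.json.productName, item.json.price);
  }

  const changes = [];
  for (const item of current) {
    const { productName, price } = item.json;
    if (!lastPrice.has(productName)) {
      changes.push({ json: { productName, oldPrice: null, newPrice: price, status: "new" } });
    } else if (lastPrice.get(productName) !== price) {
      changes.push({
        json: { productName, oldPrice: lastPrice.get(productName), newPrice: price, status: "changed" },
      });
    }
  }
  return changes; // an empty array means there is nothing to alert on
}

// In the Code node (replace the node names with your own):
// return findPriceChanges($("Read Sheet").all(), $("Clean Data").all());
```

Because the function returns an empty list when nothing changed, the Slack node after it typically does not run at all on quiet days.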
Step 7: Test, Debug, and Activate
Before activating your workflow, run it manually and check each node's output. n8n shows the data flowing through every step, making it easy to spot issues. Once everything looks correct, toggle the workflow to Active — and it will run automatically on your chosen schedule.
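Checking outputs by hand works while you build. Once the workflow is active, however, a scrape can "succeed" and still return nothing useful, for example after the site changes its markup and your selectors stop matching. A small Code node after the cleanup step can turn that into a visible failure. An error thrown in a Code node fails the execution, which also fires an error workflow if you have set one up. The sketch below assumes the productName and numeric price fields from Step 5. validateScrape is a hypothetical name.

```javascript
// Hypothetical sanity check for an n8n Code node placed after Step 5.
// Throws (failing the execution) when the scrape looks broken.
function validateScrape(items, minItems = 1) {
  if (items.length < minItems) {
    throw new Error(
      `Expected at least ${minItems} products, got ${items.length}. Have the CSS selectors changed?`
    );
  }
  // A valid item needs a non-empty name and a real number for the price
  // (the cleanup step sets unparseable prices to null).
  const bad = items.filter(
    (item) =>
      !item.json.productName ||
      typeof item.json.price !== "number" ||
      Number.isNaN(item.json.price)
  );
  if (bad.length > 0) {
    throw new Error(`${bad.length} of ${items.length} items are missing a name or a numeric price.`);
  }
  return items; // pass valid data through unchanged
}

// In the Code node: return validateScrape($input.all());
```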
Pro Tips for Better Scraping Results
- Handle pagination: Use a loop (the Loop Over Items node) combined with incrementing page numbers in the URL to scrape multiple pages.
- Respect robots.txt: Always check a website's robots.txt file and terms of service before automating requests.
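- Add error handling: Build a separate workflow that starts with the Error Trigger node (for example, one that sends a Slack message), then select it as the Error Workflow in this workflow's settings so you are notified whenever a run fails.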
- Throttle requests: Add a Wait node between iterations to avoid overloading servers.
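You can prepare the pagination and robots.txt tips in a single Code node before the HTTP Request node. The sketch below assumes the site paginates with a query parameter such as ?page=2. It also assumes you have fetched the site's robots.txt as text with a separate HTTP Request node. buildPageUrls, isPathDisallowed, and the node name in the comment are illustrative. The robots.txt parser is deliberately simplified, as its comments note.

```javascript
// Illustrative crawl-planning helpers for an n8n Code node.
// Assumption: the site paginates with a query parameter such as ?page=2.

// Build one item per page URL, preserving any existing query parameters.
function buildPageUrls(baseUrl, pageParam, firstPage, lastPage) {
  const items = [];
  for (let page = firstPage; page <= lastPage; page++) {
    const url = new URL(baseUrl);
    url.searchParams.set(pageParam, String(page));
    items.push({ json: { url: url.toString(), page } });
  }
  return items;
}

// Deliberately simplified robots.txt check. It only honors the
// "User-agent: *" group and treats each Disallow value as a plain path
// prefix: Allow lines, * and $ wildcards, and bot-specific groups are
// ignored, so use it as a first filter rather than a full parser.
function isPathDisallowed(robotsTxt, path) {
  let inStarGroup = false;
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue; // blank or malformed line
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group of rules.
      inStarGroup = (lastWasAgent && inStarGroup) || value === "*";
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      // An empty Disallow value means "allow everything", so skip it.
      if (field === "disallow" && inStarGroup && value !== "" && path.startsWith(value)) {
        return true;
      }
    }
  }
  return false;
}

// Example Code node body (node name is a placeholder):
// const robots = $("Fetch robots.txt").first().json.data;
// return buildPageUrls("https://example.com/products", "page", 1, 5)
//   .filter((item) => !isPathDisallowed(robots, new URL(item.json.url).pathname));
```

Each URL becomes its own item, so the HTTP Request node can read it with an expression such as {{ $json.url }}. Routing these items through Loop Over Items with a Wait node inside the loop provides the throttling described above.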
When to Use an External Scraping API
Some websites render their content with JavaScript in the browser (React, Vue, etc.). The HTTP Request node only fetches the HTML the server sends and does not run any scripts, so that content never reaches your workflow. In those cases, consider integrating a headless-browser or scraping service such as Browserless, ScrapingBee, or Apify via its API. You can still control the entire flow from within n8n using HTTP Request nodes pointed at the service endpoint.
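A quick way to tell whether a page falls into this category is to check the raw HTML from the HTTP Request node for a class you expect to extract. If that class is missing, the content is most likely added in the browser. The heuristic below assumes the HTML sits in item.json.data, which is the default output field for a Text response in recent versions of the HTTP Request node; check your own node's output to confirm. It uses product-title from Step 4 as the marker. looksClientRendered is a hypothetical name.

```javascript
// Heuristic check for an n8n Code node placed after the HTTP Request node.
// Not a guarantee: a class name can appear in the HTML for other reasons.
function looksClientRendered(html, expectedClass) {
  // If the class never appears in the server's HTML, the content is
  // probably injected later by JavaScript running in the browser.
  const hasClass = html.includes(expectedClass);
  // Many single-page apps also ship an empty mount point such as
  // <div id="root"></div> or <div id="app"></div>.
  const emptyRoot = /<div id="(root|app)">\s*<\/div>/i.test(html);
  return !hasClass || emptyRoot;
}

// In the Code node (field name "data" is an assumption, see above):
// return $input.all().map((item) => ({
//   json: { clientRendered: looksClientRendered(item.json.data, "product-title") },
// }));
```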
Final Thoughts
Web scraping no longer requires a developer. With n8n, you can often build a fully automated data collection pipeline in under 30 minutes. Whether you are monitoring prices, tracking news, or feeding data into your AI workflows, the HTTP Request and HTML Extract nodes together give you a powerful, flexible scraping engine for any page that serves its content as static HTML.
Start small — pick one website, one data point — and expand your workflow from there. The n8n canvas grows with your needs, and every automation you build compounds your productivity over time.
This post was created with tools we use and recommend: n8n for workflow automation, Turbotic as an AI-native automation alternative, ElevenLabs for AI voiceover, Placid for visual content creation, and Hostinger for reliable VPS hosting. Some links are affiliate links.