Every content team wants better SEO, but running a proper audit on a blog post takes 30–60 minutes of manual work: checking keyword density, analyzing meta elements, assessing readability, spotting technical issues, and identifying backlink opportunities. Multiply that by 10 posts a week and it becomes a full-time job. This n8n workflow fixes that — send a URL, get a complete GPT-4 SEO analysis in seconds, all with built-in ethical scraping compliance.
In this guide you’ll build the workflow from scratch, understand each node, and learn how to hook the output into Slack, Google Sheets, or any dashboard you already use.
💡 Prefer to skip the build? Grab the ready-made template → and be running in under 10 minutes.
What You’ll Build
- POST a blog URL to an n8n webhook from any app or script
- n8n validates the URL and checks the site's robots.txt for scraping permission
- The blog's HTML is fetched, converted to clean markdown, and fed to GPT-4o
- GPT-4o returns a structured JSON report with scores across four SEO dimensions
- The report comes back in the HTTP response — ready for dashboards, Sheets, or Slack
How It Works — The Big Picture
┌────────────────────────────────────────────────────────────┐
│                    AI BLOG SEO ANALYZER                    │
│                                                            │
│  [POST /webhook] → [Extract URL] → [Validate URL]          │
│                                           ↓                │
│                                  [Check robots.txt]        │
│                                           ↓                │
│                               [Parse robots.txt Rules]     │
│                                           ↓                │
│                                  [Scraping Allowed?]       │
│                    ↓ YES                     ↓ NO          │
│                [Scrape Blog]           [Return 403 Error]  │
│                      ↓                                     │
│          [Convert HTML → Markdown]                         │
│                      ↓                                     │
│           [SEO Analysis (GPT-4o)]                          │
│                      ↓                                     │
│               [Format Report] → [Return JSON Response]     │
└────────────────────────────────────────────────────────────┘
What You’ll Need
- n8n — self-hosted (free) or n8n Cloud
- OpenAI API key — GPT-4o access required (~$0.01–$0.05 per audit depending on post length)
- A webhook client — Postman, curl, or any HTTP tool
- Build time: ~45 minutes from scratch
- With the template: under 10 minutes (add API key + activate)
Step 1 — Webhook Trigger
Node: Webhook Trigger n8n-nodes-base.webhook
This is the entry point. It listens for POST requests and passes the payload to the rest of the workflow.
Configure it:
- Set HTTP Method to POST
- Set Response Mode to Using Respond to Webhook Node
- Copy the generated webhook URL — you’ll POST to this from your client
- Enable Allow all origins under Options if testing from a browser tool
Once activated, clients call it like this:
curl -X POST https://your-n8n.com/webhook/YOUR_WEBHOOK_ID \
-H "Content-Type: application/json" \
-d '{ "blogUrl": "https://techcrunch.com/2026/03/15/ai-startup-funding" }'
💡 The payload is flexible: use blogUrl, message, or url as the key — whichever you send, it’ll find the URL. Easy to connect from Telegram bots, Slack slash commands, or form submissions.
Step 2 — Extract Blog URL
Node: Extract Blog URL n8n-nodes-base.set
Normalizes the incoming payload so downstream nodes always find body.url regardless of which key the caller used.
Configure it (Manual mode):
- Add one assignment: Name = body, Type = Object
- Value = ={{ { url: $json.body.blogUrl || $json.body.message || $json.body.url } }}
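The fallback logic of that expression, sketched as plain JavaScript (assuming the request body has already been parsed):

```javascript
// Sketch of the key-fallback the Set node expression performs.
// Whichever key the caller used, the output is normalized to body.url.
function normalizeBody(body) {
  return { url: body.blogUrl || body.message || body.url };
}

console.log(normalizeBody({ blogUrl: "https://example.com/post" }).url);
// → https://example.com/post (same result for message or url keys)
```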
Step 3 — Validate URL Input
Node: Validate URL Input n8n-nodes-base.code
Validates the URL format, ensures a value was provided, and sets default CSS selectors for content extraction. If the URL is invalid, the workflow throws an error here before wasting an API call.
// Output after validation:
{
"url": "https://techcrunch.com/2026/03/15/ai-startup-funding",
"userPrompt": "Provide a comprehensive SEO analysis with actionable recommendations.",
"selectors": {
"title": "title, h1",
"content": "p, .content, article",
"links": "a[href]",
"images": "img[src]"
},
"timestamp": "2026-04-10T09:15:00.000Z"
}
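A minimal sketch of what that Code node might contain. The defaults mirror the output shown above; the function name and exact error messages are illustrative, not the template's actual code:

```javascript
// Validate the URL before any API credits are spent (illustrative sketch).
function validateUrl(rawUrl) {
  if (!rawUrl) throw new Error("No URL provided in the request body");
  let parsed;
  try {
    parsed = new URL(rawUrl); // throws on malformed input
  } catch {
    throw new Error(`Invalid URL format: ${rawUrl}`);
  }
  if (!["http:", "https:"].includes(parsed.protocol)) {
    throw new Error("Only http/https URLs are supported");
  }
  return {
    url: parsed.href,
    userPrompt: "Provide a comprehensive SEO analysis with actionable recommendations.",
    selectors: {
      title: "title, h1",
      content: "p, .content, article",
      links: "a[href]",
      images: "img[src]",
    },
    timestamp: new Date().toISOString(),
  };
}

console.log(validateUrl("https://example.com/post").url);
// → https://example.com/post
```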
Step 4 — Check robots.txt (Ethical Scraping)
Node: Check robots.txt n8n-nodes-base.httpRequest
Fetches https://domain.com/robots.txt before touching any content. This is the ethical compliance gate.
Configure it:
- Method: GET
- URL: ={{ $json.url.split('/').slice(0, 3).join('/') }}/robots.txt
- Set timeout to 10,000 ms and max redirects to 3
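The split/slice expression simply recovers the origin from the full blog URL:

```javascript
// Splitting on "/" gives ["https:", "", "techcrunch.com", "2026", ...];
// the first three parts joined back together form the origin.
const blogUrl = "https://techcrunch.com/2026/03/15/ai-startup-funding";
const origin = blogUrl.split("/").slice(0, 3).join("/");
const robotsUrl = `${origin}/robots.txt`;

console.log(robotsUrl); // → https://techcrunch.com/robots.txt
```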
Step 5 — Parse Robots.txt Rules
Node: Parse Robots.txt Rules n8n-nodes-base.code
Reads the robots.txt response and checks whether the target URL path is disallowed. If scraping is blocked, it sets scrapingAllowed: false.
// If scraping is permitted, output looks like:
{
"url": "https://techcrunch.com/2026/03/15/ai-startup-funding",
"robotsInfo": "robots.txt found and analyzed",
"scrapingAllowed": true,
"timestamp": "2026-04-10T09:15:00.123Z"
}
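The check this node performs can be sketched roughly as follows. A production parser would also honor per-agent groups and Allow directives; this simplified version only collects Disallow rules:

```javascript
// Simplified robots.txt check (illustrative; real parsing is richer).
function isPathAllowed(robotsTxt, urlPath) {
  const disallowed = robotsTxt
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.toLowerCase().startsWith("disallow:"))
    .map((line) => line.slice("disallow:".length).trim())
    .filter((rule) => rule.length > 0);
  // Blocked if any Disallow rule is a prefix of the target path
  return !disallowed.some((rule) => urlPath.startsWith(rule));
}

const robots = "User-agent: *\nDisallow: /wp-admin/\nDisallow: /search/";
console.log(isPathAllowed(robots, "/2026/03/15/ai-startup-funding")); // → true
console.log(isPathAllowed(robots, "/wp-admin/options.php"));          // → false
```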
💡 Most sites disallow paths like /wp-admin/ and /search/ but allow /blog/ and /articles/. Regular blog posts are almost always permitted.
Step 6 — Scraping Allowed? (IF Branch)
Node: Scraping Allowed? n8n-nodes-base.if
Routes the workflow: scrapingAllowed = true proceeds to scrape; false returns a 403 error immediately.
Configure it:
- Add condition: Left Value = ={{ $json.scrapingAllowed }}
- Operator: Boolean → Is True
- Connect Output 0 (TRUE) → Scrape Blog Content
- Connect Output 1 (FALSE) → Return Scraping Blocked Error
Step 7 — Scrape Blog Content
Node: Scrape Blog Content n8n-nodes-base.httpRequest
Fetches the full HTML of the blog post. n8n’s HTTP Request node handles redirects, compressed responses, and most edge cases automatically.
Configure it:
- Method: GET
- URL:
={{ $json.url }} - Set timeout to 30,000 ms and max redirects to 5
{
"data": "<!DOCTYPE html><html>...</html>",
"headers": { "content-type": "text/html; charset=utf-8" },
"statusCode": 200
}
Step 8 — Convert HTML to Markdown
Node: Convert HTML to Markdown n8n-nodes-base.markdown
Strips HTML tags and converts content to clean markdown — 40–60% fewer tokens than raw HTML, saving significant GPT-4 costs.
Configure it:
- HTML: ={{ $json.data }}
- Enable Code Block Style: Fence
- Enable Use Link Reference Definitions
💡 For very long posts, cap the token cost by truncating the markdown in a small Code node before the GPT-4o call: return [{ json: { data: $json.data.substring(0, 24000) } }]
Step 9 — SEO Analysis with GPT-4o
Node: SEO Analysis (GPT-4) @n8n/n8n-nodes-langchain.openAi
The core of the workflow. Sends the markdown to GPT-4o with a structured prompt covering four SEO dimensions, returns a JSON report.
Configure it:
- Model: GPT-4o
- Temperature: 0.1 (precise, repeatable analysis)
- JSON Output: Enable
- Add your OpenAI credential
- User Message:
={{ $json.data }}
{
"overallScore": 73,
"executiveSummary": {
"strengths": [
"Strong primary keyword placement in H1 and first paragraph",
"Good internal linking structure with 8 contextual links"
],
"opportunities": [
"Meta description missing — critical for CTR",
"No FAQ schema markup for People Also Ask eligibility"
],
"priorityActions": [
"Write a 155-character meta description with primary keyword",
"Add FAQ schema for top 5 questions in the article"
]
},
"keywordStrategy": {
"primaryKeywords": ["AI startup funding", "venture capital 2026"],
"longTailOpportunities": ["how much AI startup funding in 2026"]
},
"implementationRoadmap": {
"quickWins": ["Add meta description", "Fix broken image alt tags"],
"shortTerm": ["Create FAQ section", "Build 3 internal cluster posts"],
"longTerm": ["Guest post campaign targeting DA 50+ sites"]
}
}
Step 10 — Format Analysis Report
Node: Format Analysis Report n8n-nodes-base.code
Parses the OpenAI response, extracts the JSON, and wraps it with metadata (URL, timestamp) before returning to the caller.
{
"success": true,
"url": "https://techcrunch.com/2026/03/15/ai-startup-funding",
"analyzedAt": "2026-04-10T09:15:44.321Z",
"overallScore": 73,
"report": { ... }
}
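That formatting step might look something like this. The `message.content` shape is an assumption about how the OpenAI node returns its result; adjust the path to match what your node actually outputs:

```javascript
// Illustrative sketch: unwrap the model output, parse the JSON if needed,
// and attach metadata for the caller.
function formatReport(openAiOutput, url) {
  // The model response may arrive as a JSON string or an already-parsed object
  const raw = openAiOutput.message?.content ?? openAiOutput;
  const report = typeof raw === "string" ? JSON.parse(raw) : raw;
  return {
    success: true,
    url,
    analyzedAt: new Date().toISOString(),
    overallScore: report.overallScore,
    report,
  };
}

const out = formatReport(
  { message: { content: '{"overallScore": 73}' } },
  "https://example.com/post"
);
console.log(out.overallScore); // → 73
```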
Connecting the Output to Other Tools
Once running, chain the output of Format Analysis Report into:
- Google Sheets: Append a row per URL with score, quick wins, and keyword gaps — build a running SEO audit log
- Slack: Post scores and priority actions to a #seo-reports channel every time a post is analyzed
- Notion: Create a database record per analysis with scores as structured properties
- Airtable: Track keyword opportunities across your entire content library in one view
The SEO Report Structure
| Field | Type | Example | Description |
|---|---|---|---|
| overallScore | Integer | 73 | Aggregate SEO score 0–100 |
| contentOptimization.score | Integer | 78 | Content quality and keyword integration |
| keywordStrategy.primaryKeywords | Array | ["AI funding"] | Top keywords GPT-4 detected in content |
| keywordStrategy.longTailOpportunities | Array | ["best AI startups 2026"] | Missing keyword angles to target |
| technicalSEO.score | Integer | 65 | Technical health score |
| technicalSEO.issues | Array | ["No canonical tag"] | Technical problems found |
| backlinkPotential.score | Integer | 81 | How link-worthy the content is |
| implementationRoadmap.quickWins | Array | ["Add meta description"] | High-impact, low-effort fixes |
Scaling This Workflow
The webhook trigger is perfect for on-demand audits. For batch use, replace it with a Schedule Trigger + Google Sheets source to run overnight audits across your entire blog library. Or wire it to an RSS feed node to auto-audit every new post you publish.
For high-volume use (100+ URLs/day), add a Wait node between the HTTP scrape and GPT-4 call. OpenAI’s Tier 1 rate limit on GPT-4o handles roughly 20–30 blog audits per minute — more than enough for most teams.
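For batch clients outside n8n, the same pacing can live in the calling script. A hypothetical Node.js client, assuming your webhook URL and a 3-second spacing (tune both to your own rate limits):

```javascript
// Hypothetical batch client: POST each URL to the webhook, spaced out
// so the downstream GPT-4o calls stay under OpenAI rate limits.
const WEBHOOK_URL = "https://your-n8n.com/webhook/YOUR_WEBHOOK_ID"; // placeholder
const DELAY_MS = 3000; // ~20 audits/minute, an assumption to tune

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function auditAll(urls) {
  const results = [];
  for (const blogUrl of urls) {
    const res = await fetch(WEBHOOK_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ blogUrl }),
    });
    results.push(await res.json());
    await sleep(DELAY_MS); // space out calls between audits
  }
  return results;
}
```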
Skip the Build — Get the Ready-Made Template
Includes the complete workflow JSON, a step-by-step Setup Guide, and a Credentials Guide showing exactly where to find your OpenAI API key. Import, configure, and start auditing in under 10 minutes.