How to Auto-Categorize Blog Posts with GPT-4, GitHub, and Google Sheets in n8n

If you run a blog with dozens — or hundreds — of markdown posts sitting in a GitHub repo, keeping categories and tags consistent is a pain. You either do it by hand (tedious) or you skip it (bad for SEO). This n8n workflow reads every post in your repo, ignores the ones you already analyzed, feeds the rest to GPT-4, and drops the suggested categories and tags into a Google Sheet for your review. It works for Astro, Next.js, Hugo, Jekyll — any static site that stores posts as markdown with frontmatter.

By the end of this guide you will have a form-triggered workflow that processes only new posts on each run, uses a two-sheet diff to avoid re-work, and returns a structured JSON response (old categories, new categories, old tags, new tags) for clean before/after comparison.

What this workflow does

The workflow has one job: look at the posts in your GitHub repo, figure out which ones haven’t been categorized yet, and ask GPT-4 to suggest categories and tags based on the content of each post. The proposals land in a Google Sheet so you can review and apply them later — the workflow never pushes commits to your repo, which keeps it safe to run.

  • Trigger: a simple n8n form so you can fire it on demand
  • Source: the content/ folder of your GitHub repo (configurable)
  • Deduplication: posts already logged in Google Sheets are skipped
  • AI engine: GPT-4.1-mini with a structured JSON output parser
  • Sink: Google Sheets with columns FileName, Categories, Proposed Categories, Tags, Proposed Tags

Architecture at a glance

The workflow runs in three phases. First, it pulls two lists in parallel: the full file list from GitHub and the list of previously analyzed filenames from Google Sheets. Second, a Code node computes the difference — posts in GitHub that are not yet in the sheet. Third, it loops through the difference, fetches each file’s content, passes it to the AI agent, and appends the AI’s structured response back to Google Sheets.
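The heart of phase two is a plain set difference. Inside n8n the two lists arrive wrapped in merged items, but the core logic of the Code node boils down to something like this (input shapes here are illustrative plain string arrays, not the exact node output):

```javascript
// Core of the "Check new repo files for AI processing" Code node:
// keep only the GitHub paths that are not yet logged in the sheet.
function diffNewPosts(repoFiles, analyzedFiles) {
  const done = new Set(analyzedFiles);
  return repoFiles.filter((path) => !done.has(path));
}
```

Using a Set for the lookup keeps the diff linear, so it stays fast even on repos with hundreds of posts.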

Nodes in the workflow

  • On form submission: manual trigger with a “Start process” form
  • Get row(s) in sheet: reads the list of already-categorized filenames
  • List posts/articles/pages: lists every file in the GitHub content folder
  • Aggregate (×2): collapses both path lists into arrays
  • Merge: brings the two arrays into a single item
  • Check new repo files for AI processing: Code node that computes the diff
  • Switch: routes to “Finish” if the diff is empty, otherwise to the loop
  • Get post/article file: fetches the markdown source for each new file
  • Loop Over Posts/Pages: batches one post at a time to the AI
  • AI Agent + OpenAI Chat Model: analyzes content, proposes categories and tags
  • Structured Output Parser: forces clean JSON output
  • Append rows with posts / article analysis: saves proposals to Google Sheets

Step 1 — Prepare your Google Sheet

Create a new Google Sheet with exactly five columns in row 1, in this order: FileName, Categories, Proposed Categories, Tags, Proposed Tags. The workflow reads FileName to decide what to skip, and writes to all five on each run. Grab the Sheet ID from the URL — it’s the long token between /d/ and /edit.
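If you would rather not eyeball the URL, extracting the ID is a one-line regex. This is a convenience sketch, not part of the workflow:

```javascript
// Pull the Sheet ID out of a Google Sheets URL: the token between /d/ and /edit.
function sheetIdFromUrl(url) {
  const match = url.match(/\/d\/([a-zA-Z0-9_-]+)/);
  return match ? match[1] : null;
}
```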

Tip: If you want to skip the setup and import a working template immediately, the pre-built n8n JSON + step-by-step setup PDF is available for $14.99 on EasyWorkflows.

Step 2 — Import the workflow into n8n

Open your n8n instance, click the “+” menu, and choose Import from File (or paste JSON). After import you’ll see the full workflow with sticky notes already in place. Nothing is active yet — the trigger is a form, so there are no webhooks firing in the background.

Step 3 — Connect credentials

You need three credentials. Click each highlighted node and attach the matching one; nodes for the same service can share a credential, but don’t reuse credentials beyond the nodes that actually need them.

  1. GitHub — a personal access token with repo read scope. Used by List posts/articles/pages and Get post/article file.
  2. OpenAI — a standard API key. Used by the OpenAI Chat Model node.
  3. Google Sheets OAuth2 — run through n8n’s OAuth flow. Used by both Get row(s) in sheet and Append rows with posts / article analysis.

Full step-by-step instructions for getting each token — including screenshots of the GitHub scope page and the Google Cloud OAuth consent screen — are in the Credentials Guide PDF included with the product.

Step 4 — Configure the three placeholders

After import, open these three nodes and replace the placeholder values with your own:

List posts/articles/pages
  owner      = YOUR_GITHUB_USERNAME
  repository = YOUR_REPO_NAME
  filePath   = src/content/blog/  (adjust to your folder)

Get post/article file
  owner      = YOUR_GITHUB_USERNAME
  repository = YOUR_REPO_NAME

Get row(s) in sheet  +  Append rows with posts / article analysis
  documentId = YOUR_GOOGLE_SHEET_ID

That’s all the config. The workflow does not hardcode category names or tag taxonomies — the AI agent reads whatever categories already exist in your posts and proposes additions that match your style.

Step 5 — Tune the AI system prompt (optional)

The AI Agent node ships with a default system message that tells the model to read the frontmatter, infer categories and tags from the content, and return a structured JSON object. If you already have a strict controlled vocabulary (e.g. “categories must be drawn from this list of 12”), add that list to the system message. GPT-4.1-mini handles vocabulary constraints well, and the output parser will still enforce valid JSON either way.
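The exact shipped message is in the template, but a vocabulary-constrained version might read like this (the category list is a placeholder, so substitute your own):

```text
You are a content taxonomist. Read the markdown post below, including its
YAML frontmatter. Propose categories and tags that fit the content.
Categories must be drawn from: business, tutorial, news, opinion.
Do not invent new categories. Keep all existing tags and add at most three.
Return only the JSON object described by the output schema.
```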

The structured output follows this shape (example values shown):

{
  "old_categories": ["business"],
  "categories":     ["business", "history-of-computing"],
  "old_tags":       ["DigitalGirls", "WomenInIT"],
  "tags":           ["DigitalGirls", "WomenInIT", "GraceHopper", "Debugging"]
}
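The Structured Output Parser already rejects malformed JSON, but if you bolt extra steps onto the workflow it can be worth a belt-and-braces shape check in a Code node before anything touches the sheet. A sketch of that check:

```javascript
// Verify a parsed AI response matches the shape above: all four keys
// present, each an array of strings.
function isValidProposal(obj) {
  const keys = ["old_categories", "categories", "old_tags", "tags"];
  return keys.every(
    (k) => Array.isArray(obj[k]) && obj[k].every((v) => typeof v === "string")
  );
}
```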

Step 6 — Run it

Click Execute Workflow, open the form URL, submit it, and watch the executions panel. On the first run the sheet is empty so the diff returns every file in your repo. On every run after that, only new posts since the last successful run are processed — the sheet acts as the state store.

If you want this to run nightly, swap the On form submission trigger for a Schedule trigger. Everything downstream stays the same.

Why this approach beats the alternatives

  • Vs. cron job + script: no server to maintain, no cron log to babysit, and the AI step is visual.
  • Vs. a GitHub Action: Actions bill per-minute and don’t give you a diffable proposal review step — this workflow writes to a sheet so you can vet the AI’s suggestions before applying them.
  • Vs. doing it by hand: obvious.

Common issues

The loop finishes immediately with zero items

The diff node returned an empty list, which means every file in your repo is already logged in the sheet. Delete the test rows from the sheet and re-run.

GPT-4 returns categories outside my taxonomy

Add your controlled vocabulary to the system message on the AI Agent node. Example: “Categories must be one of: business, tutorial, news, opinion. Do not invent new categories.”

Rate limit errors on the GitHub node

A personal access token gets 5,000 req/hour — plenty for most repos. If you’re hitting the limit, enable Continue on Fail on Get post/article file and add a small Wait node inside the loop.

Skip the setup — get the ready-to-import template

n8n JSON + Setup Guide PDF + Credentials Guide PDF. Import, connect, done.

Get the Template – $14.99

What to do next

Once the proposals are in Google Sheets, you have two good options. Option one: review manually and copy the winning categories and tags back into your markdown frontmatter. Option two: extend this workflow with a second step that opens a pull request on your repo with the frontmatter updates applied automatically. The second is a ~6-node addition using the GitHub node’s create file and create PR operations — a clean follow-up if you want zero-touch automation.
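If you go the pull-request route, the core transformation is rewriting two frontmatter fields before committing the file back. A minimal sketch, assuming the posts use inline-array frontmatter (categories: [a, b]); block-style “- item” lists would need a real YAML parser instead:

```javascript
// Replace (or insert) the categories/tags lines in a post's YAML
// frontmatter with the approved values from the sheet.
function updateFrontmatter(markdown, categories, tags) {
  const fmt = (arr) => `[${arr.map((v) => JSON.stringify(v)).join(", ")}]`;
  return markdown.replace(/^---\n([\s\S]*?)\n---/, (_block, body) => {
    const kept = body
      .split("\n")
      .filter((line) => !/^(categories|tags):/.test(line));
    kept.push(`categories: ${fmt(categories)}`, `tags: ${fmt(tags)}`);
    return `---\n${kept.join("\n")}\n---`;
  });
}
```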

Either way, you now have a GPT-4-powered categorization pipeline that costs pennies per post and keeps your content taxonomy clean as your blog grows.