Every month, the same grind: open an email, download the PDF, squint at the invoice number, manually type the amount into a spreadsheet, and pray you didn’t transpose a digit. If your business processes more than a handful of invoices per week, that manual routine is silently eating hours you could spend on work that actually matters. What if every PDF invoice that landed in your inbox was automatically read, parsed, and filed into a clean database — without you touching it once?
That’s exactly what you’ll build in this guide. Using n8n, OpenAI’s GPT-4o mini, and Airtable, you’ll create a workflow that watches your Gmail inbox for invoice emails, extracts text from attached PDFs, sends that text to AI for structured parsing, and logs every field — vendor name, invoice number, amounts, line items, due dates — into an Airtable base. The whole thing runs on autopilot every 30 minutes.
Prefer to skip the setup? Grab the ready-made template → and be up and running in under 10 minutes.
What You’ll Build
- A schedule trigger checks your Gmail inbox every 30 minutes for unread emails with PDF attachments matching invoice-related subjects.
- The workflow downloads the PDF, extracts raw text, and sends it to OpenAI with a carefully tuned prompt that returns structured JSON — vendor name, invoice number, dates, amounts, line items, and payment status.
- A Code node validates and formats the AI output, then pushes a clean record into your Airtable base with 13 fields covering every detail you’d normally type by hand.
- The source email is marked as read so it won’t be processed twice.
How It Works — The Big Picture
The workflow runs as a single linear pipeline with one conditional branch. Emails without PDF attachments are silently skipped, while PDFs flow through extraction, AI parsing, and database logging.
┌──────────────────────────────────────────────────────────────────────────┐ │ INVOICE DATA EXTRACTOR — AI POWERED │ │ │ │ [Schedule] → [Gmail: Fetch] → [IF: Has PDF?] │ │ │ Yes │ No │ │ ↓ ↓ │ │ [Extract PDF] [Skip] │ │ ↓ │ │ [OpenAI: Parse] │ │ ↓ │ │ [Code: Format] │ │ ↓ │ │ [Airtable: Save] │ │ ↓ │ │ [Gmail: Mark Read] │ └──────────────────────────────────────────────────────────────────────────┘
What You’ll Need
- n8n — self-hosted or cloud (version 1.0 or higher)
- Gmail account — with OAuth2 credentials configured in n8n
- OpenAI API key — with access to the
gpt-4o-minimodel (orgpt-4ofor higher accuracy on complex invoices) - Airtable account — free tier works fine; you’ll need one base with a table matching the schema below
Estimated build time: 30–45 minutes from scratch, or under 10 minutes with the template.
Building the Workflow
1 Check Inbox Every 30 Minutes (Schedule Trigger)
This node kicks off the workflow on a regular interval. Open n8n, create a new workflow, and add a Schedule Trigger node. Set it to run every 30 minutes. You can adjust this later — every 15 minutes if you process high volumes, or once per hour if invoices trickle in slowly.
Tip: During testing, switch this to “manual” execution so you can trigger runs on demand without waiting for the timer.
2 Fetch Unread Invoice Emails (Gmail)
Add a Gmail node and set the operation to Get Many. This node searches your inbox for unread emails that have PDF attachments and match invoice-related subject lines. Configure it as follows:
- Limit:
10(processes up to 10 emails per run) - Search query:
has:attachment filename:pdf is:unread subject:(invoice OR receipt OR bill) - Download Attachments: Enabled, with prefix
attachment_
The search query is the key here — it filters for only unread emails containing a PDF file whose subject mentions “invoice,” “receipt,” or “bill.” You can customize these keywords to match the vocabulary your vendors use.
{
"id": "18e3a5b7c9d2f100",
"subject": "Invoice #INV-2026-0847 from Acme Supplies",
"from": { "emailAddress": "billing@acmesupplies.com" },
"attachment_0": {
"fileName": "INV-2026-0847.pdf",
"mimeType": "application/pdf"
}
}
Tip: If you receive invoices from specific vendors, tighten the query with from:billing@vendor.com to avoid picking up random PDFs like flight confirmations.
3 Has PDF Attachment? (IF Node)
Add an IF node to verify that the attachment is actually a PDF. Check that attachment_0.mimeType equals application/pdf. Emails that pass go to the extraction step. Emails that fail (wrong attachment type, or the attachment was an image) are routed to a No Operation node and silently skipped.
This guard prevents errors downstream. Some emails match the search query but attach Excel files or images instead of PDFs — the IF node filters those out cleanly.
4 Extract Text from PDF (Extract From File)
Add an Extract From File node. Set the operation to PDF and the binary property name to attachment_0 (matching the prefix you set in the Gmail node). This node converts the PDF binary data into raw text that the AI can read.
After this node runs, the output looks like this:
{
"data": "INVOICE\nAcme Supplies Inc.\n742 Evergreen Terrace\nSpringfield, IL 62704\n\nBill To: Carter & Associates LLC\nInvoice #: INV-2026-0847\nDate: March 28, 2026\nDue Date: April 27, 2026\n\nDescription Qty Price Amount\nConsulting Services 40 $150.00 $6,000.00\nSoftware License 1 $499.00 $499.00\n\nSubtotal: $6,499.00\nTax (8.25%): $536.17\nTotal: $7,035.17\n\nPayment Terms: Net 30"
}
Tip: The Extract From File node works well with text-based PDFs. If your invoices are scanned images (no selectable text), you’ll need to add an OCR step before this node — tools like Google Cloud Vision or AWS Textract can handle that.
5 Parse Invoice with AI (OpenAI)
This is where the magic happens. Add an OpenAI node configured for the Chat resource. Set the model to gpt-4o-mini (fast, cheap, and surprisingly accurate for structured extraction) and the temperature to 0.1 (low randomness = consistent output).
The prompt instructs the AI to return a strict JSON object with 12 fields covering every piece of data you’d want from an invoice. Here’s what the prompt asks for:
| Field | Type | Example |
|---|---|---|
vendor_name |
String | Acme Supplies Inc. |
vendor_address |
String | 742 Evergreen Terrace, Springfield, IL 62704 |
invoice_number |
String | INV-2026-0847 |
invoice_date |
Date | 2026-03-28 |
due_date |
Date | 2026-04-27 |
subtotal |
Number | 6499.00 |
tax_amount |
Number | 536.17 |
total_amount |
Number | 7035.17 |
currency |
String | USD |
payment_status |
String | unpaid |
line_items |
Array | [{description, qty, unit_price, amount}] |
notes |
String | Payment Terms: Net 30 |
Tip: For invoices with complex layouts or non-English text, switch the model to gpt-4o. It costs a bit more per call but handles edge cases much better. At typical invoice lengths (under 1,000 tokens), even gpt-4o costs less than a penny per invoice.
6 Format Extracted Data (Code Node)
The AI returns a JSON string inside its message content. This Code node does three things: extracts the JSON from the response (handling cases where the AI wraps it in markdown code blocks), validates each field with fallback defaults, and formats the line items array into a human-readable string for Airtable’s long text field.
The node also carries forward metadata from the original email — the subject line and sender address — so you always know where each invoice came from.
{
"vendor_name": "Acme Supplies Inc.",
"invoice_number": "INV-2026-0847",
"invoice_date": "2026-03-28",
"due_date": "2026-04-27",
"subtotal": 6499.00,
"tax_amount": 536.17,
"total_amount": 7035.17,
"currency": "USD",
"payment_status": "unpaid",
"line_items_text": "1. Consulting Services — Qty: 40 × $150 = $6000\n2. Software License — Qty: 1 × $499 = $499",
"notes": "Payment Terms: Net 30",
"source_email_from": "billing@acmesupplies.com",
"processed_at": "2026-04-08T14:30:00.000Z"
}
7 Save to Airtable
Add an Airtable node set to the Create operation. Point it at your base and table, then map each field from the Code node output to the corresponding Airtable column. The mapping uses n8n expressions like ={{ $json.vendor_name }} to pull values dynamically.
Every processed invoice becomes a new row in your Airtable base — fully structured, searchable, and ready for reporting.
Make sure your Airtable column names match exactly — including capitalization and spaces. “Invoice Number” is not the same as “invoice_number” in Airtable.
8 Mark Email as Read (Gmail)
The final node marks the original email as read so it won’t be picked up again on the next run. It references the email ID from the “Fetch Unread Invoice Emails” node using the expression ={{ $('Fetch Unread Invoice Emails').item.json.id }}.
Tip: Want extra safety? Add a Gmail “Add Label” step before this one to move processed emails into a “Processed Invoices” label. That gives you an audit trail even if something goes wrong with Airtable.
The Data Structure
Before activating the workflow, create an Airtable base called “Invoice Tracker” with a single table named “Invoices” using this schema:
| Column | Type | Example | Description |
|---|---|---|---|
Vendor Name |
Single line text | Acme Supplies Inc. | Company that issued the invoice |
Invoice Number |
Single line text | INV-2026-0847 | Unique invoice identifier |
Invoice Date |
Date | 2026-03-28 | Date the invoice was issued |
Due Date |
Date | 2026-04-27 | Payment deadline |
Subtotal |
Currency | $6,499.00 | Amount before tax |
Tax |
Currency | $536.17 | Tax amount |
Total Amount |
Currency | $7,035.17 | Final amount due |
Currency |
Single line text | USD | Currency code |
Payment Status |
Single select | unpaid | Options: paid, unpaid, unknown |
Line Items |
Long text | 1. Consulting — 40 × $150 | Formatted breakdown of charges |
Notes |
Long text | Net 30 | Payment terms, PO numbers, etc. |
Source Email |
billing@acmesupplies.com | Sender of the original email | |
Processed At |
Date | 2026-04-08 | Timestamp when the workflow ran |
Column names must match the mapping in the Airtable node exactly. The easiest way: duplicate the table from the template’s included Airtable CSV, or create columns one by one following this schema.
Full System Flow
┌──────────────────────────────────────────────────────────────────────────────┐ │ │ │ GMAIL INBOX AIRTABLE │ │ ┌──────────────┐ ┌──────────────────┐ │ │ │ Unread email │ │ Invoice Tracker │ │ │ │ + PDF attach │ │ ┌──────────────┐ │ │ │ └──────┬───────┘ │ │ Vendor: Acme │ │ │ │ ↓ │ │ #INV-0847 │ │ │ │ [Schedule: 30 min] │ │ $7,035.17 │ │ │ │ ↓ │ └──────────────┘ │ │ │ [Gmail: Fetch unread] └────────▲─────────┘ │ │ ↓ │ │ │ [IF: Has PDF?]──No──→ [Skip] │ │ │ │ Yes │ │ │ ↓ │ │ │ [Extract PDF Text] │ │ │ ↓ │ │ │ [OpenAI: Parse JSON] │ │ │ ↓ │ │ │ [Code: Format Data]──────────────────────────→┘ │ │ ↓ │ │ [Gmail: Mark Read] │ │ │ └──────────────────────────────────────────────────────────────────────────────┘
Testing Your Workflow
- Send yourself a test email with a sample PDF invoice attached. Use a subject line like “Invoice #TEST-001 from Test Vendor” so it matches the search query.
- In n8n, click Test Workflow (the play button) to trigger a manual run.
- Watch the execution — each node should show a green checkmark. Click any node to inspect its output data.
- Check your Airtable base. You should see a new row with all 13 fields populated from the test invoice.
- Verify the original email is now marked as read in Gmail.
| Problem | Likely Cause | Fix |
|---|---|---|
| Gmail node returns 0 items | No unread emails match the search query | Send a test email with “invoice” in the subject and a PDF attached, then wait a moment for Gmail to index it |
| Extract From File returns empty text | The PDF is a scanned image, not text-based | Add an OCR step (Google Cloud Vision or AWS Textract) before the extraction node |
| OpenAI returns malformed JSON | Invoice text was too garbled or incomplete | Check the extracted text — if it’s unreadable, the source PDF may need OCR. Also try switching to gpt-4o for better handling |
| Airtable node fails with “field not found” | Column name mismatch between n8n mapping and Airtable | Compare column names character by character — watch for extra spaces, capitalization differences, or missing columns |
| Same invoice processed twice | Mark as Read step failed or email wasn’t marked before the next scheduled run | Check the Gmail node credentials have write permissions. Consider adding a label-based filter as a secondary guard |
Frequently Asked Questions
How much does each invoice cost to process with OpenAI?
Very little. A typical invoice is 300–800 tokens of input text. With GPT-4o mini at $0.15 per million input tokens, processing one invoice costs roughly $0.0001 — less than a hundredth of a penny. Even at 1,000 invoices per month, your OpenAI bill would be under $0.50.
Does this work with scanned PDF invoices (images, not text)?
The Extract From File node handles text-based PDFs natively. For scanned invoices where the text is embedded in an image, you’ll need to add an OCR step before extraction. Google Cloud Vision, AWS Textract, or the n8n HTTP Request node calling an OCR API all work. The rest of the workflow stays the same.
Can I use Google Sheets instead of Airtable?
Yes. Replace the Airtable node with a Google Sheets “Append Row” node and map the same fields to spreadsheet columns. The rest of the workflow doesn’t need any changes. Airtable gives you richer field types (currency, single select, date pickers), but Sheets is perfectly functional.
What if I use Outlook instead of Gmail?
Swap the two Gmail nodes for their Microsoft Outlook equivalents — “Microsoft Outlook Trigger” for fetching emails and “Microsoft Outlook” for marking as read. The search query syntax differs slightly, but the concept is identical. Update the credential type in n8n accordingly.
Can the workflow handle invoices in languages other than English?
GPT-4o mini handles most major languages well. The prompt asks for specific JSON field names in English, but the AI will extract data from French, Spanish, German, and many other languages and map it into the English schema. For best results with non-Latin scripts, consider upgrading to GPT-4o.
What happens if two invoices arrive in the same email?
The current workflow processes the first PDF attachment per email. If a sender regularly attaches multiple PDFs to one email, you can modify the workflow to loop through all attachments by adding a “Split In Batches” node after the IF check and adjusting the binary property name dynamically.
Get the Invoice Data Extractor Template
Stop building from scratch — get the complete workflow JSON, Airtable schema, and step-by-step setup guides. Import it, add your credentials, and start processing invoices in under 10 minutes.
Instant download · Works on n8n Cloud and self-hosted
What’s Next?
- Add Slack notifications — send a message to your #accounting channel whenever a high-value invoice (over $5,000) is processed.
- Auto-flag overdue invoices — add a second scheduled workflow that checks Airtable daily and flags rows where the due date has passed and payment status is still “unpaid.”
- Build a payment approval flow — route invoices above a threshold to a manager via email for approval before marking them as “approved” in Airtable.
- Connect to accounting software — push the structured data into QuickBooks, Xero, or FreshBooks via their APIs to eliminate double entry entirely.