HomeAI AutomationHow to Build an AI Lead…
AI Automation

How to Build an AI Lead Processing Pipeline with Apify, Gemini & Google Sheets in n8n

How to Build an AI Lead Processing Pipeline with Apify, Gemini & Google Sheets in n8n

Your sales team is drowning in raw lead data. You’ve scraped thousands of contacts from LinkedIn, Apollo, or custom sources — but half have missing emails, the phone formats are inconsistent, and nobody wants to manually clean 2,000 rows in a spreadsheet before Monday’s call list is ready. There has to be a better way.

There is. This guide walks you through building a fully automated AI lead processing pipeline in n8n that scrapes leads with Apify, validates and deduplicates them using Google Gemini, writes clean records to Google Sheets, and sends your team a Telegram summary — all without touching a single CSV file.

Prefer to skip the build? Grab the ready-made template → and have it running in under 10 minutes.

What You’ll Build

  1. A parent workflow (or manual trigger) fires this sub-workflow, passing along scraping parameters.
  2. n8n sends a POST request to kick off your Apify actor (e.g., Apollo scraper, LinkedIn scraper, or any custom actor).
  3. After a 30-second wait, the workflow polls Apify for results and feeds them into a batching loop.
  4. Google Gemini AI processes up to 1,000 leads per batch — validating required fields, deduplicating by email, generating unique Lead IDs, and formatting phone/location data.
  5. Validated leads are appended to a Google Sheet. Your team receives a Telegram message per batch showing exactly how many leads were added, flagged, or rejected.

How It Works — The Big Picture

+————————————————————————-+
| AI LEAD PROCESSING PIPELINE |
| |
| [Execute Trigger] -> [POST Apify] -> [Loop/Batch] |
| | |
| +——+——+ |
| v (batch out) | (done) |
| [AI Agent] [Wait 30s] -> [GET Apify] |
| | ^ |
| | +– (feeds back to Loop) |
| v |
| [Telegram Batch Report] |
| |
| AI Agent Tools: [Append to Sheets] [Check for Duplicates] |
+————————————————————————-+

The workflow uses n8n’s SplitInBatches node as a loop controller. Output 0 sends the current batch to the AI agent; output 1 routes through a 30-second wait to poll Apify for fresh data, which feeds back into the loop.

What You’ll Need

  • n8n instance (self-hosted or cloud) — n8n.io
  • Apify account — apify.com (free tier: 5 compute units/month; paid from $49/mo)
  • Google account with Gemini API access — aistudio.google.com
  • Google Sheets spreadsheet (lead database)
  • Telegram Bot (via @BotFather — free)
  • Build time: ~45 minutes from scratch | Template time: ~8 minutes

Step-by-Step Build

Step 1 — When Executed by Another Workflow (Trigger)

This workflow runs as a sub-workflow called by a parent orchestrator. Using executeWorkflowTrigger with passthrough mode means any data the parent sends flows directly into this pipeline.

  1. Add an Execute Workflow Trigger node.
  2. Set Input Source to passthrough.
  3. In your parent workflow, use an Execute Workflow node pointing to this workflow’s ID.
Tip: Test standalone by temporarily replacing the trigger with a Manual Trigger node during development.

Step 2 — Trigger Apify Scrape (HTTP POST)

This HTTP Request node fires your Apify actor to start scraping.

  1. Set Method to POST.
  2. Set URL: https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs
  3. Add Authorization header: Bearer YOUR_APIFY_API_TOKEN
  4. Set Body to JSON with your actor’s input schema.
// Example: Apollo.io scraper input
{
  "searchUrl": "https://app.apollo.io/#/people?sortByField=...",
  "maxResults": 1000
}
Tip: Find your Actor ID in Apify Store → your actor → API tab. Token at apify.com/account/integrations.

Step 3 — Loop Over Items (SplitInBatches)

The SplitInBatches node processes data in chunks of 1,000 and routes through two paths.

  1. Set Batch Size to 1000.
  2. Set Options → Reset to false.
  • Output 0 (current batch): routes to the AI Agent.
  • Output 1 (done): routes to Wait → Apify GET → back to loop.

Step 4 — Wait 30s for Apify

Pauses 30 seconds for the Apify actor to complete before polling results.

  1. Set Amount to 30, Unit to Seconds.
Tip: For 5,000+ result jobs, increase to 60–120s or use Apify’s webhook callback to n8n — more reliable for production.

Step 5 — Fetch Apify Results (HTTP GET)

Retrieves all scraped leads from the last Apify actor run.

  1. Set Method to GET.
  2. Set URL: https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs/last/dataset/items
  3. Same Authorization header as Step 2.
// Example Apify lead record
{
  "name": "Sarah Thompson",
  "email": "sarah.thompson@acmecorp.com",
  "phone": "+1 512 867 5309",
  "company": "Acme Corp",
  "title": "VP of Marketing",
  "location": "Austin, TX"
}

Step 6 — AI Lead Processing Agent (Google Gemini)

A Langchain AI Agent powered by Google Gemini validates and deduplicates leads using two Google Sheets tool calls.

Per batch, the agent:

  1. Validates required fields: Name, Email, Company Name.
  2. Deduplicates by checking existing emails in the sheet.
  3. Generates Lead IDs: AP-DDMMYY-xxxx (e.g. AP-110426-0001).
  4. Formats phones (prefers mobile, wraps in quotes) and location (City, Country).
  5. Writes valid leads to Google Sheets one row at a time.
  6. Returns one consolidated Telegram summary for the entire batch.
Tip: Use gemini-1.5-flash for high-volume batches — faster and cheaper than Pro with no quality loss for data validation tasks.

Step 7 — Append Lead to Google Sheets (AI Tool)

AI-callable tool that appends a validated lead row. The agent calls this once per valid lead.

  1. Set Operation to Append, connect your Google Sheets OAuth2 credential.
  2. Set Document ID (from sheet URL) and Sheet Name.
  3. Map columns via $fromAI('FieldName') expressions.

Step 8 — Check for Duplicate Leads (AI Tool)

AI-callable tool that reads existing emails to detect duplicates before writing.

  1. Set Operation to Read Rows, same document and sheet as Append tool.
  2. Connect as an AI Tool to the agent.
Tip: For sheets with 10,000+ rows, use a Lookup operation on the Email column instead of reading all rows.

Step 9 — Send Batch Report via Telegram

Sends the AI agent’s structured summary to your Telegram chat.

  1. Set Chat ID to your chat ID (from @userinfobot).
  2. Set Text to ={{ $json.output }}.
Sample Telegram output:
✅ Batch Complete
Total Processed: 47 | Added: 41 | Flagged: 4 | Skipped: 2

Contacts Added:
- James Carter -- Acme Corp -- james.carter@acmecorp.com
- Emily Rodriguez -- TechVentures -- emily.r@techventures.io
- Michael Chen -- GrowthLabs -- m.chen@growthlabs.com
(+38 more)

⚠️ Flagged: Lead 12 missing LinkedIn; Lead 28 missing Job Title
❌ Skipped: Lead 5 missing Email; Lead 33 missing Company Name

Google Sheets Schema

Column Type Example Notes
Lead ID Text AP-110426-0001 Auto-generated — required
Name Text James Carter Required field
Email Text james.carter@acmecorp.com Required — deduplication key
Phone Number Text “(512) 867-5309” Quoted to preserve format
Company Name Text Acme Corp Required field
Job Title Text VP of Marketing Optional — flagged if missing
Website / LinkedIn URL linkedin.com/in/jamescarter Optional
Address Text Austin, TX City + State/Country only
Company Summary Text B2B SaaS, marketing automation Optional — AI-generated
Relevant Partner Text HubSpot Optional

Customization Ideas

  • Multiple Apify actors: Chain this workflow for Apollo, LinkedIn, or Instagram scrapers all feeding one clean sheet.
  • Apify webhook: Replace the Wait node with an Apify webhook that POSTs results to n8n instantly — more reliable at scale.
  • CRM push: Add a HubSpot or Salesforce node after Google Sheets to sync validated leads directly to your CRM.
  • Lead scoring: Extend the Gemini agent prompt to assign a 1–10 score based on title, industry, and company size.
  • Slack alerts: Replace Telegram with a Slack node to post batch reports to your #leads channel.

Skip the Build — Get the Template

Full workflow JSON, Setup Guide PDF, and Credentials Guide PDF — ready to import and activate.

Get the Template — $14.99 →

Troubleshooting

  • Agent returns empty output: Check Gemini API key validity and confirm model name (gemini-1.5-flash or gemini-1.5-pro).
  • Apify GET returns empty array: Actor still running — increase Wait to 60–120s or switch to Apify webhooks.
  • Google Sheets “Not found”: Verify Document ID and Sheet GID in both tool nodes match the actual URL.
  • Telegram not sending: Confirm bot token is valid and Chat ID is numeric (check with @userinfobot).
  • Memory key error: Static session key is intentional for sub-workflow isolation — change to dynamic if needed.