Zarif Automates

How to Automate Report Generation with AI

ZarifZarif
|

I used to spend Sunday nights pulling numbers from five dashboards, copy-pasting into a Google Doc, and writing the same paragraph for the third time that month. Now an automation does it in 90 seconds and emails me before I wake up. Here is the exact system, what it costs, and why most "AI report" tutorials online are missing the part that actually matters.

Definition

Automated report generation is the practice of pulling data from source systems on a schedule, using a language model to write narrative analysis, and rendering the result as a formatted document delivered to stakeholders.

TL;DR

  • The full pipeline takes about 6 hours to build and runs for under $10 per month per report.
  • The hard part is not the AI — it is making the data layer reproducible and the output format stable.
  • GPT-5-mini ($0.25/$2.00 per 1M tokens) handles weekly metric reports; reserve GPT-5 or Claude Sonnet 4.6 for executive-summary-grade analysis.
  • Render to Google Docs first, then convert to PDF — it gives you the cleanest typography for the lowest effort.
  • Always include a "what changed since last week" section. That is where the real insight lives.

Why most AI report tutorials miss the point

Search "AI report generation" and you get 50 articles that show ChatGPT writing a report from a CSV. That is not automation. That is a demo. Real automation has four properties most tutorials skip:

  1. The data pulls itself on a schedule with no human paste step
  2. The report's structure is the same every week so readers can scan it
  3. The narrative cites specific numbers and changes, not vague summaries
  4. The output lands in the inbox or Slack channel where the audience already is

Skip any of those and you built a toy. Get all four and you have a system that pays for itself in the first week.

The architecture

The pipeline is six stages:

  1. Trigger — cron schedule, usually weekly Monday at 6 AM
  2. Data layer — SQL queries, API pulls, or Google Sheets reads
  3. Aggregation — compute deltas versus the prior period
  4. Narration — LLM writes the analysis sections from a structured prompt
  5. Render — output to Google Doc, PDF, or markdown
  6. Delivery — email, Slack, or both

The middle step, aggregation, is the one almost everyone skips. Without explicit deltas, the LLM hallucinates trends.

Step 1: Decide what report you are automating first

Do not boil the ocean. Pick one report. The candidates that work:

  • Weekly revenue report — Stripe data, MRR, new versus churn, top 10 customers
  • Marketing performance — GA4, ad platform spend, CPL, CAC trend
  • Sales pipeline — HubSpot or Pipedrive, weighted pipeline, conversion rates
  • Product analytics — feature adoption, DAU/MAU, retention cohorts
  • Ops weekly — ticket volume, response time, top categories

I tell clients to start with whichever report they currently spend the most time on manually. That is your highest-ROI build.

Step 2: Lock the data layer first

Before you touch an LLM, your data needs to be deterministic. Three rules:

  1. Every metric has one canonical source. No "the dashboard says X but the export says Y."
  2. Every metric has a documented SQL query or API call you can re-run.
  3. The output is a structured JSON blob, not free text.

I keep a metrics.sql file in a private repo with named queries. The automation runs them against Postgres and dumps results as JSON. That JSON is what the LLM sees, never the raw database.

Example structure:

{
  "period": "2026-04-28 to 2026-05-04",
  "mrr": {"current": 47200, "prior": 44800, "delta_pct": 5.4},
  "new_customers": 12,
  "churned_customers": 3,
  "top_accounts": [{"name": "Acme", "mrr": 2400}]
}

Clean inputs, clean outputs. Garbage in, hallucinated trends out.

Step 3: Write a structured prompt with an output schema

Free-form prompts produce free-form reports. You want the same structure every week. Use the OpenAI Responses API with response_format (Structured Outputs, strict JSON Schema) — or Anthropic's tool-use with a structured tool — then render the JSON to your final document.

response = client.responses.create(
    model="gpt-5-mini",
    input=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": json.dumps(metrics_blob)}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "weekly_report",
            "strict": True,
            "schema": REPORT_SCHEMA
        }
    }
)

My prompt template:

You are the analytics director writing a weekly report.
Input: a JSON object with metrics for the current and prior period.
Write four sections:
1. Headline — a single sentence stating the most important change.
2. Wins — 2 to 3 bullet points with specific numbers.
3. Concerns — 2 to 3 bullet points with specific numbers.
4. Recommended actions — 2 to 3 bullets, each tied to a metric.

Rules: cite the actual numbers from the input. Never invent metrics.
If a number is flat, say "flat at X" and do not call it a trend.

The "flat is flat" instruction is crucial. Without it, models invent trends out of statistical noise.

Tip

Run the same prompt twice on the same input and compare. If the two outputs disagree on what is "the headline," your prompt is too vague. Tighten the rules until you get consistent outputs.

Step 4: Compute deltas and anomalies before the LLM sees them

Do not ask the LLM to do math. It is bad at it and there is no upside. Pre-compute everything in code:

  • Period-over-period percent change for every metric
  • Z-score versus the trailing 8-week average to flag anomalies
  • Rank changes in any "top N" lists

Pass those as additional fields in the input JSON. The LLM's only job is to narrate, not to calculate. This is the single biggest reliability lever in the whole pipeline.

Step 5: Render to Google Docs (and then PDF)

You have three rendering options:

  • Markdown to PDF with markdown-pdf or Pandoc — fastest, ugliest
  • Google Docs API — best typography, easiest collaboration
  • HTML to PDF with Puppeteer — most control, most fragile

I use Google Docs because finance and exec teams already live there. The Docs API has a batchUpdate endpoint that takes formatting instructions in JSON. Create from a template doc, replace placeholder tokens like {{HEADLINE}}, and you have a branded report in seconds.

For PDF delivery, the Drive API has a export?mimeType=application/pdf query that converts on the fly.

Step 6: Add the "what changed" section

This is the secret sauce. Most reports tell you what the numbers are. Good reports tell you what changed and why it matters. Build a section that compares this week to last week explicitly:

  • Metrics that flipped from green to red or vice versa
  • Customers that entered or left the top 10
  • Anomalies above 2 standard deviations from the trailing average

This section is what makes executives actually read the report.

Step 7: Deliver where people already are

A report sitting in Drive that nobody opens is a waste of your $10. Push deliveries:

  • Email — Gmail or SendGrid, with the PDF attached and the headline section in the body
  • Slack — post the headline to a channel with a link to the full doc
  • Notion — create a new page weekly under a "Reports" parent

I run all three in parallel because different audiences want different surfaces. Cost is negligible.

Step 8: Build a kill switch and a sanity check

Before sending, run a sanity check:

  1. Did every metric in the JSON appear somewhere in the report?
  2. Are all numbers in the report present in the source JSON?
  3. Did the model hallucinate a metric name not in the input?

If any check fails, route to a human review queue instead of sending. I use a simple regex match for numbers and a name-list check for metric names. Catches about 95 percent of bad outputs.

Warning

Never put an automated report into the executive team's inbox without two weeks of dry runs. The first time it sends to your CEO with hallucinated numbers, you lose all credibility for the system. Send it to yourself first.

What this costs

For a single weekly report of medium complexity:

  • OpenAI API (GPT-5-mini, ~5,000 input + 1,500 output tokens per run): about $0.005 per report, ~$0.02/month at weekly cadence
  • n8n self-hosted on a $5 VPS or Cloud Starter at $24/month for 2,500 executions; Make.com is now credit-based as of November 2025
  • Google Workspace (already paying for it): $0 incremental
  • Postgres on Supabase free tier: $0

Total: under $25 per month for one report. Compare to the 4 hours per week a human spent doing it manually — that is a $200 per week savings at $50 per hour. Two-week payback.

What goes wrong in production

Three common failures, in order of frequency.

Schema drift. Your data source changes a column name and the entire pipeline silently breaks. Mitigation: add a schema validation step that fails loud.

Model regressions. OpenAI rotates the default gpt-5-mini alias and the report tone shifts. Anthropic does the same with claude-sonnet-4-6. Mitigation: pin to a specific dated snapshot (e.g. gpt-5-mini-2025-08-07) and update on a schedule, not automatically.

Empty data periods. Holiday weeks have anomalously low data and the model writes "catastrophic decline." Mitigation: add an is_holiday flag to the input JSON and instruct the prompt to handle it.

FAQ

What is the best AI model for report generation?

GPT-5-mini ($0.25/$2.00 per 1M tokens, May 2026) is the right default for routine weekly reports. Use GPT-5 ($1.25/$10.00) or Claude Sonnet 4.6 ($3/$15) for executive-grade summaries where tone and nuance matter. The model choice matters less than the quality of your input data and prompt structure.

Can I automate reports without coding?

Yes, with n8n, Make.com, or Zapier you can wire together a database query, an OpenAI call, and a Google Docs update with no code. The trade-off is debugging — when something breaks, code is easier to step through than visual nodes.

How do I prevent the AI from making up numbers?

Pre-compute every number in code before the LLM sees the data, force the model to only use values from the input JSON, and run a post-generation sanity check that flags any number in the output not present in the input. That three-layer approach catches almost everything.

How long does it take to build the first report?

Plan for 6 to 10 hours end to end if you have basic SQL and API skills. Half of that is on the data layer, not the AI part. Subsequent reports are much faster because you reuse the rendering and delivery components.

Should I use ChatGPT or Claude for report writing?

Both work. GPT-5 and GPT-5-mini have first-class Structured Outputs (strict JSON Schema), making them slightly better for hard-formatted reports. Claude Sonnet 4.6 has a reputation for cleaner narrative tone and supports prompt caching at 10 percent of base input price for repeat context. For most weekly reports the difference is invisible. Pick the one whose API you already have wired up.

The best report is the one that gets read. Automate the pull, automate the math, automate the writing, and spend your Sunday nights doing literally anything else.

Zarif

Zarif

Zarif is an AI automation educator helping thousands of professionals and businesses leverage AI tools and workflows to save time, cut costs, and scale operations.