FlowCast: human-in-the-loop demand forecasting for an energy retailer
- Client: A regional Japanese energy retailer
- Scale: ~1,800 end-customer accounts · 150M+ slot-level demand records
- Buyer: Taka Takizawa, CEO
- Industry: Energy / utilities (electricity retail)
- Geography: Japan
- Status: In production
Executive summary
Client. A regional Japanese electricity retailer serving roughly 1,800 end-customer accounts across residential and industrial voltage tiers. CEO Taka Takizawa was the buyer; the system is run by the operations team.
Solution. FlowCast: a human-in-the-loop forecasting system that replaces the daily Excel workflow the operations team had used to predict next-day electricity demand at 30-minute resolution. A Python worker runs Similar Day Search overnight on five years of consolidated actuals and weather data. The operator reviews the forecast on a web dashboard, adjusts where needed, then submits it to the retailer's existing JEPX-facing demand management system through its CSV submission API.
Outcome. The daily cycle moved from a multi-hour gather-and-build workflow to a focused review-and-submit. The system runs the full production book, syncs actuals daily and weekly, ingests weather actuals and forecasts continuously, and pages Slack when a cron job fails, MAPE spikes, or a sync window is missed. Every forecast is reproducible from a (strategy_version, snapshot_date, worker_image) lineage stamp.
Why it landed.
- Pain was concrete and daily. Every weekday, before market close, the operator had to produce 48 half-hour slots per client group from a fragile spreadsheet, with separate browser tabs for weather lookup and submission.
- The hard part wasn't the forecasting math. It was the plumbing around it: three actuals sources (legacy CSV, kakuhou confirmed, nippou rolling), two weather locations matching the client's service area, partial-failure-tolerant sync jobs, and a write-back path to the existing CSV submission API.
- Human-in-the-loop framing matched the buyer's risk tolerance. Bad market-close submissions are expensive. The algorithm proposes, the operator approves.
- The system fit the operator's existing workflow instead of replacing it. Single dashboard, one login, inline edits on the predicted slots. No second vendor portal.
The brief
The operations team ran daily next-day demand forecasting out of a single Excel workbook (BG実績日量ver3.xlsm). Every weekday afternoon, before market close, the cycle was:
- Pull yesterday's actuals from the existing demand management system
- Look up the day's weather actuals from WeatherNews in a second tab
- Find similar historical days by hand
- Build next-day predictions for all 48 half-hour slots, per client group
- Re-enter the result into the submission UI
Every step compounded errors. Weather lookups happened in a separate browser tab. Client-group breakouts depended on someone re-keying numbers. Forecast quality was usable, but the cycle time and the failure surface were the actual cost.
The brief was small and specific: take the spreadsheet workflow, give it a real system, keep the operator in the loop, and make sure nothing wrong gets submitted.
What we built
FlowCast is a three-tier system on top of a Postgres system of record:
Review UI (Vercel) → Edge Functions (Supabase) → Prediction Worker (Fly.io Tokyo)
↓
Supabase Postgres (Tokyo)
↑
external APIs: existing demand management system + WeatherNews
The operator's day:
- Overnight, a worker pulls confirmed actuals from yesterday and ingests today's weather actuals plus tomorrow's forecast
- The worker runs Similar Day Search across five years of consolidated actuals weighted by weather and seasonality, producing 48 slot predictions per client group
- The operator opens the dashboard mid-morning, reviews the forecast against the similar days the engine leaned on, adjusts slots that look off, and submits the reviewed forecast to the existing demand management system through its CSV submission API
Behind the scenes, the system carries the load:
- Three actuals sources (legacy CSV 2020-2025, confirmed kakuhou 2026+, rolling 2-month nippou) reconciled by a consolidated view so the operator never sees source-switching logic
- Yearly partitioned tables keep queries against five years of 30-minute data fast
- Cron-triggered sync jobs (daily, weekly, continuous) with partial-failure guards so a single bad day doesn't poison the pipeline
- Bearer-token authentication on every edge function (constant-time compare, fails closed) so no endpoint accepts unauthenticated requests
- Lineage stamps on every forecast run so any prediction can be reproduced from its inputs
- Slack watchdogs that page when cron jobs fail, MAPE spikes, or sync windows are missed
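As an illustration of the "constant-time compare, fails closed" guard in the list above, here is a minimal sketch. The production edge functions are TypeScript on Deno; this version is written in Python only to match the worker's language, and the function name `is_authorized` and its parameters are hypothetical, not taken from the FlowCast codebase.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str],
                  expected_token: Optional[str]) -> bool:
    """Return True only when the header carries exactly the expected bearer token.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    # Fail closed: a missing server-side secret or a missing header is a rejection,
    # never an implicit "allow".
    if not expected_token or not authorization_header:
        return False

    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not presented:
        return False

    # hmac.compare_digest avoids leaking, through timing, where the two values
    # first differ.
    return hmac.compare_digest(presented.encode("utf-8"),
                               expected_token.encode("utf-8"))
```

The design point is the first branch: if the secret fails to load from the environment, every request is rejected rather than every request being let through.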
How it runs in production
The Similar Day Search algorithm itself is intentionally simple: find historical days that resemble the target day by weather, day-of-week, season, and client events. Weight them, average them, then apply per-group adjustments.
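A minimal sketch of that idea follows. It is not FlowCast's production code: the distance terms, the weekday penalty of 3.0, the 30-day season scale, and `k=5` are all assumptions chosen for illustration, and client events and per-group adjustments are left out.

```python
from dataclasses import dataclass
from datetime import date

SLOTS_PER_DAY = 48  # 30-minute slots


@dataclass(frozen=True)
class HistoricalDay:
    day: date
    mean_temp_c: float
    load_kwh: list  # 48 slot values for one client group


def _season_gap(a: date, b: date) -> int:
    """Circular distance in days between two day-of-year positions."""
    gap = abs(a.timetuple().tm_yday - b.timetuple().tm_yday)
    return min(gap, 365 - gap)


def _distance(target_day: date, target_temp_c: float, cand: HistoricalDay) -> float:
    temp_term = abs(target_temp_c - cand.mean_temp_c)           # degrees C
    weekday_term = 0.0 if target_day.weekday() == cand.day.weekday() else 3.0
    season_term = _season_gap(target_day, cand.day) / 30.0      # roughly months
    return temp_term + weekday_term + season_term               # assumed weighting


def similar_day_forecast(target_day, target_temp_c, history, k=5):
    """Return (48 slot forecast, the similar days it was built from)."""
    ranked = sorted(history, key=lambda d: _distance(target_day, target_temp_c, d))[:k]
    if not ranked:
        raise ValueError("no historical days to search")
    # Closer days get larger weights; weights are normalised in the average.
    weights = [1.0 / (1.0 + _distance(target_day, target_temp_c, d)) for d in ranked]
    total = sum(weights)
    forecast = [
        sum(w * d.load_kwh[slot] for w, d in zip(weights, ranked)) / total
        for slot in range(SLOTS_PER_DAY)
    ]
    return forecast, ranked
```

Returning the selected days alongside the forecast is what lets a review UI show the operator the basis for each prediction.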
The work that matters is everything around the algorithm:
- Reconciling three actuals sources into one trustworthy read path
- Two-location weather ingestion to match the client's service geography
- A write path to the existing demand management system that matches its high/low CSV split format exactly
- A review UI that surfaces the similar days the engine used so the operator can sanity-check the basis for each slot
The operator never sees the reconciliation, the partitioning, or the failure-recovery logic. They see a dashboard, a forecast, an evidence trail, and a submit button.
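To make the "one trustworthy read path" concrete, the sketch below picks one reading per slot from the three actuals sources. In production this lives in a consolidated database view; the Python form, the function name `consolidate`, and above all the precedence order (confirmed kakuhou over rolling nippou over legacy CSV) are assumptions for illustration, since the real reconciliation rule is not described here.

```python
from datetime import date

# Assumed precedence, highest first. The actual rule is not stated in this
# case study; confirmed data outranking rolling data is a guess.
SOURCE_PRIORITY = ("kakuhou", "nippou", "legacy_csv")


def consolidate(readings):
    """Collapse multi-source readings to one value per (day, slot).

    readings: iterable of (source, day, slot, kwh) tuples.
    Returns {(day, slot): (kwh, source)} so the winning source stays visible.
    """
    rank = {name: i for i, name in enumerate(SOURCE_PRIORITY)}
    best = {}
    for source, day, slot, kwh in readings:
        if source not in rank:
            raise ValueError(f"unknown actuals source: {source!r}")
        key = (day, slot)
        current = best.get(key)
        # Keep the reading from the highest-priority source seen so far.
        if current is None or rank[source] < rank[current[1]]:
            best[key] = (kwh, source)
    return best
```

Keeping the winning source next to each value means a downstream check can still tell confirmed figures from provisional ones, even though the operator never has to.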
Outcome
After production cutover, the system is:
- Driving daily forecasts for the client's full ~1,800-account book
- Submitting reviewed forecasts to the existing demand management system through its CSV API
- Self-healing on the common failure modes (partial sync failures, transient API errors, weather data gaps)
- Producing forecast history that's reproducible from its lineage tags rather than from screenshots and spreadsheet copies
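The self-healing behaviour on partial sync failures can be sketched as a per-day loop that retries transient errors and records, rather than propagates, a day that keeps failing. This is an illustrative pattern under assumed names (`sync_range`, `sync_one_day`, `max_attempts`), not the production job.

```python
from datetime import date, timedelta


def sync_range(start: date, end: date, sync_one_day, max_attempts: int = 3):
    """Sync each day independently; one bad day never aborts the rest.

    sync_one_day: callable taking a date and raising on failure (hypothetical).
    Returns (succeeded_days, failed) where failed is a list of (day, error text).
    """
    succeeded, failed = [], []
    day = start
    while day <= end:
        for attempt in range(1, max_attempts + 1):
            try:
                sync_one_day(day)
                succeeded.append(day)
                break
            except Exception as exc:  # broad on purpose: isolate any per-day fault
                if attempt == max_attempts:
                    # Record the failure for alerting and move on to the next day.
                    failed.append((day, repr(exc)))
        day += timedelta(days=1)
    return succeeded, failed
```

A caller would typically send a non-empty `failed` list to the alerting channel and let the next scheduled run pick those days up again.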
The headline result is cycle-time compression. The operator's time now goes to focused review, not data gathering. The second-order result is that the forecast is auditable. A prediction made three weeks ago can be re-derived from its inputs without guesswork.
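One way to picture the lineage stamp behind that auditability is shown below. The three field names come from the stamp described in this case study; the `LineageStamp` class and the hashed `run_key` helper are hypothetical additions for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LineageStamp:
    strategy_version: str   # which forecasting logic ran
    snapshot_date: date     # which cut of the actuals and weather data it saw
    worker_image: str       # which worker build executed it

    def run_key(self) -> str:
        """Deterministic short key: identical inputs always give the same key."""
        payload = json.dumps(
            {
                "strategy_version": self.strategy_version,
                "snapshot_date": self.snapshot_date.isoformat(),
                "worker_image": self.worker_image,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```

Two runs with the same key should produce the same forecast; if they do not, something outside the stamp has changed and is worth investigating.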
What we deliberately did not do
We did not promise FlowCast would beat the spreadsheet's accuracy out of the gate. We promised it would replace the workflow, leave the operator in control, and stop being a source of fragility. That's what shipped.
A second engagement would focus on:
- Temperature-band clustering for sharper similar-day weighting
- Per-client-group strategy selection (residential and industrial baselines behave differently enough that one model underfits both)
- Tighter integration of scheduled client events (planned outages, industrial-customer event calendars)
These were scoped out of the initial build deliberately. The bar for production cutover was a stable system on a simple algorithm. Tuning surface area gets added after the foundation is trusted.
Stack
| Layer | Technology |
|---|---|
| Review UI | Vite + React 19 + Tailwind 4 + Tremor (Vercel) |
| API gateway | Supabase Edge Functions (Deno / TypeScript), bearer-token authentication |
| Prediction worker | Python 3.12 on Fly.io, Tokyo region |
| Database | Supabase Postgres Pro (Tokyo) |
| External APIs | Existing demand management system (CSV high/low submission), WeatherNews (multi-location actuals + forecast) |
| Observability | pg_cron + Slack webhook watchdogs |
| Lineage | strategy_version + worker_image + snapshot_date stamps on every forecast_run |
What this means if you're considering something similar
If you have a daily or weekly forecasting, submission, or approval workflow running out of a spreadsheet today, FlowCast is the pattern.
The work is rarely the algorithm. It's the surrounding plumbing: actuals sync, external API integration, operator review UX, lineage for auditability, and alerts for when things break. Most of these are tractable in a 4-6 week build. The operator stays in the loop the whole time, which makes the cutover a workflow upgrade rather than a vendor switch.
If that fits your situation, the diagnostic is the right next step.