Section I · Method

How it works.

The case is answered on the dashboard. This page shows the machinery behind it: the data pipeline, the engine choice, and the actual SQL.

Architecture · Pipeline · SQL

01 · Architecture

Three lanes, one loop.

Static files on a CDN, an in-browser SQL engine, and a Parquet file that is downloaded once and queried on the user's device. It is the same loop Power BI runs, without the desktop tool, the .pbix file, or the workspace setup.

Build time (offline · runs once)

15 Excel files (~94 MB raw) → Python ETL (pandas + openpyxl) → sales.parquet (5.8 MB · ZSTD)

CDN (Netlify · static assets)

Static HTML/CSS/JS (Next.js build) · duckdb.wasm (~3 MB · lazy) · sales.parquet (5.8 MB · cached)

Runtime (browser · your device)

React UI (date picker → event) → DuckDB-WASM (SQL window funcs) → Recharts viz (KPIs + chart)

The same loop Power BI runs — load, aggregate, render — but compiled into a few megabytes of static files. No server, no scheduler, no license. Every visitor gets the same engine running on their device.

02 · Pipeline

Excel → Parquet, in six steps.

The data pipeline runs once, offline. It produces a single 5.8 MB parquet that downstream pages query live.

Read

15 .xlsx files via pandas + openpyxl. Header at row 3 (rows 1–2 are title/blank).

pd.read_excel(path, sheet_name='Günlük Satis Raporu', header=2)

Rename

Turkish source columns → canonical English. Drop the always-null Metrics column.

df.rename(columns=COLUMN_MAP)
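
The mapping itself is not shown on this page. A minimal sketch of what it could look like, assuming COLUMN_MAP is a plain dict built from the schema table in section 05 and that the dropped column is literally named Metrics. The helper name rename_columns is hypothetical.

```python
import pandas as pd

# Assumed contents of COLUMN_MAP, copied from the schema table (section 05).
COLUMN_MAP = {
    "Satıcı": "vendor",
    "Satıcı Ürün Kodu": "product_code",
    "Alıcı Ürün Kodu": "buyer_product_code",
    "Satıcı Ürün Adı": "product_name",
    "Barkod": "barcode",
    "Mağaza": "store",
    "Satıcı Teslim Noktası Kodu": "delivery_point",
    "Gün_": "day",
    "Satış Miktarı": "qty",
    "Kg/Lt": "kg_lt",
    "Net Satis Tutari (TL)": "net_sales",
}


def rename_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the always-null Metrics column, then rename TR -> EN."""
    # errors="ignore" keeps this safe if a file lacks the Metrics column.
    df = df.drop(columns=["Metrics"], errors="ignore")
    return df.rename(columns=COLUMN_MAP)


# One-row stand-in for a raw frame (illustrative values only).
raw = pd.DataFrame({name: [None] for name in [*COLUMN_MAP, "Metrics"]})
renamed = rename_columns(raw)
```

With this mapping the renamed frame has the 11 canonical columns quoted in the Concatenate step.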

Coerce

Numeric columns are cast to Int32/Int64/Float32/Float64 with errors='coerce', so unparseable values become missing (NaN) instead of raising. The day column becomes date32.

df['day'] = pd.to_datetime(df['day']).dt.date
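
The one-liner above covers the date column only. The numeric half of this step might look like the sketch below, which assumes pandas nullable dtypes and a per-column dtype table taken from section 05. NUMERIC_DTYPES and coerce_numeric are hypothetical names, not the pipeline's own.

```python
import pandas as pd

# Assumed target dtypes for a few columns, based on the schema table (section 05).
# Capitalised names are pandas nullable dtypes, so bad cells can stay missing.
NUMERIC_DTYPES = {
    "product_code": "Int32",
    "barcode": "Int64",
    "qty": "Float32",
    "net_sales": "Float64",
}


def coerce_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Parse each listed column as a number; unparseable cells become missing."""
    out = df.copy()
    for col, dtype in NUMERIC_DTYPES.items():
        out[col] = pd.to_numeric(out[col], errors="coerce").astype(dtype)
    return out


# Illustrative input: one clean row and one row with junk values.
sample = pd.DataFrame({
    "product_code": ["1001", "n/a"],
    "barcode": ["8690000000001", "8690000000002"],
    "qty": ["3", ""],
    "net_sales": ["12.50", "-"],
})
coerced = coerce_numeric(sample)
```

Coercing rather than raising keeps a single bad cell from aborting the whole load. The Validate step then reports how many cells ended up missing.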

Concatenate

All 15 frames into one DataFrame. 1,784,613 rows × 11 columns.

df = pd.concat(frames, ignore_index=True)

Validate

Checks cover null rates, duplicate (store, product, day) keys, p99.9 outliers, negatives, and zero-sales rows. Anomalies are reported, not hidden.

validation.json → 6 anomalies surfaced
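
The checks listed in this step can be sketched in a few lines of pandas. This is an illustration, not the pipeline's actual validator: the validate function, the report keys, and the choice of product_code as the product part of the key are all assumptions.

```python
import pandas as pd

# Assumed key columns; the step names them as (store, product, day).
KEY = ["store", "product_code", "day"]


def validate(df: pd.DataFrame) -> dict:
    """Collect simple data-quality counts without modifying the frame."""
    sales = df["net_sales"]
    return {
        "null_rate": df.isna().mean().round(4).to_dict(),
        # keep=False counts every row that shares its key with another row.
        "duplicate_key_rows": int(df.duplicated(subset=KEY, keep=False).sum()),
        "negative_sales_rows": int((sales < 0).sum()),
        "zero_sales_rows": int((sales == 0).sum()),
        "above_p999_rows": int((sales > sales.quantile(0.999)).sum()),
    }


# Toy frame (illustrative values only): one duplicated key, one negative, one zero.
toy = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "product_code": [1, 1, 2, 3],
    "day": ["2022-08-16"] * 4,
    "net_sales": [10.0, 10.0, -5.0, 0.0],
})
report = validate(toy)
```

The function only counts; it never drops or repairs rows, so the report describes the data exactly as it will be written.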

Write

Parquet with ZSTD level 9, dictionary-encoded strings (vendor/store/product_name), 64k row groups for mobile streaming.

pq.write_table(t, path, compression='zstd', compression_level=9)

03 · Engine choice

Why DuckDB-WASM.

Three real alternatives. One fit.

Pre-aggregate to JSON

Tiny payload (~100 KB)
Zero engine bundle

Every new question = new aggregation + redeploy
Schema locked at build time
No flexibility for the date filter the case requires

Insufficient

Backend API + SQL DB

Server-side SQL flexibility
Hide raw data

Needs hosting & uptime
Network round-trip per query
Breaks offline
Adds auth, CORS, rate-limit complexity

Overkill for a case study

DuckDB-WASM in browser

Real SQL flexibility, full window functions
Offline-first after first load (SW cache)
Zero backend, zero scheduling, zero ops
URL-shareable, embed-anywhere

~3 MB engine bundle (paid once, cached forever)

Right tool for this question

04 · The query

One SQL statement. Four windows.

This is the exact query that runs every time you move the date picker on the dashboard. No paraphrase, no simplification.

queries.ts · kpiQuery()

WITH sel AS (
  SELECT CAST('2022-08-16' AS DATE) AS sel_date    -- ❶ user picks the date
)
SELECT
  -- ❷ YTD: Jan 1 of selected year → selected date
  SUM(CASE WHEN day BETWEEN date_trunc('year',  (SELECT sel_date FROM sel))
                       AND (SELECT sel_date FROM sel)
           THEN net_sales END) AS ytd,

  -- ❸ MTD: 1st of selected month → selected date
  SUM(CASE WHEN day BETWEEN date_trunc('month', (SELECT sel_date FROM sel))
                       AND (SELECT sel_date FROM sel)
           THEN net_sales END) AS mtd,

  -- ❹ Previous-year mirror windows (offset by INTERVAL 1 YEAR)
  SUM(CASE WHEN day BETWEEN date_trunc('year',  (SELECT sel_date FROM sel) - INTERVAL 1 YEAR)
                       AND (SELECT sel_date FROM sel) - INTERVAL 1 YEAR
           THEN net_sales END) AS ytd_py,

  SUM(CASE WHEN day BETWEEN date_trunc('month', (SELECT sel_date FROM sel) - INTERVAL 1 YEAR)
                       AND (SELECT sel_date FROM sel) - INTERVAL 1 YEAR
           THEN net_sales END) AS mtd_py
FROM sales;

Annotations

❶

Parameter bind

The user-selected date enters through a single-row CTE, and every window references it. Nothing is anchored to max(day).

❷

YTD window

date_trunc('year', sel) gives Jan 1 of the selected year. Range: [Jan 1 → selected date].

❸

MTD window

date_trunc('month', sel) gives the 1st of the selected month. Range: [month-start → selected date].

❹

Previous-year window

sel − INTERVAL 1 YEAR shifts the entire window back one year, preserving the calendar month and day rather than the ordinal day-of-year. PY% is then computed on the client as (curr − prev) / prev.
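
To make the four ranges and the PY% step concrete, here is a plain-Python mirror of the window arithmetic. It is a sketch for checking the logic by hand, not code from the dashboard: kpi_windows, minus_one_year, and py_pct are hypothetical names, and the Feb 29 fallback to Feb 28 is an assumption about how the engine clamps the shifted date.

```python
from datetime import date
from typing import Optional


def minus_one_year(d: date) -> date:
    """Same month and day one year earlier; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:  # Feb 29 has no counterpart in a non-leap year
        return d.replace(year=d.year - 1, day=28)


def kpi_windows(sel: date) -> dict:
    """Inclusive (start, end) date ranges for the four aggregates."""
    py = minus_one_year(sel)
    return {
        "ytd": (sel.replace(month=1, day=1), sel),
        "mtd": (sel.replace(day=1), sel),
        "ytd_py": (py.replace(month=1, day=1), py),
        "mtd_py": (py.replace(day=1), py),
    }


def py_pct(curr: Optional[float], prev: Optional[float]) -> Optional[float]:
    """(curr - prev) / prev, or None when the ratio is undefined."""
    if curr is None or prev is None or prev == 0:
        return None
    return (curr - prev) / prev


windows = kpi_windows(date(2022, 8, 16))
```

The None handling matters because SUM over a window with no matching rows returns NULL in SQL, so a selected date whose prior-year window is empty yields no PY% rather than a division error.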

05 · Schema

Source columns mapped to canonical names.

Turkish source columns are renamed, and every type is declared explicitly.

Source (TR)	Canonical (EN)	Type	Note
Satıcı	vendor	string	Single vendor (COMPANY X) in this dataset
Satıcı Ürün Kodu	product_code	int32	—
Alıcı Ürün Kodu	buyer_product_code	int32	Usually == product_code; preserved separately
Satıcı Ürün Adı	product_name	string	Dict-encoded in parquet
Barkod	barcode	int64	—
Mağaza	store	string	Dict-encoded · 2,097 distinct
Satıcı Teslim Noktası Kodu	delivery_point	int32	—
Gün_	day	date32	Datetime in source; truncated to day
Metrics	—	—	Always null · dropped
Satış Miktarı	qty	float32	—
Kg/Lt	kg_lt	float32	—
Net Satis Tutari (TL)	net_sales	float64	Primary fact column

Method is in place.
Now let's ask what the data actually shows →