Section I · Method

How it
works.

The case is answered on the dashboard. This page shows the machinery behind it: the data pipeline, the engine choice, and the actual SQL.

Architecture · Pipeline · SQL
01 · Architecture

Three lanes, one loop.

Static files on a CDN, an in-browser SQL engine, and a parquet file that never leaves the user's device. The same loop Power BI runs — without the desktop tool, the .pbix file, or the workspace setup.

01
Build time
Offline · runs once
  1. 15 Excel files
    ~94 MB raw
  2. Python ETL
    pandas + openpyxl
  3. sales.parquet
    5.8 MB · ZSTD
02
CDN
Netlify · static assets
  1. Static HTML/CSS/JS
    Next.js build
  2. duckdb.wasm
    ~3 MB · lazy
  3. sales.parquet
    5.8 MB · cached
03
Runtime
Browser · your device
  1. React UI
    date picker → event
  2. DuckDB-WASM
    SQL window funcs
  3. Recharts viz
    KPIs + chart

The same loop Power BI runs — load, aggregate, render — but compiled into a few megabytes of static files. No server, no scheduler, no license. Every visitor gets the same engine running on their device.

02 · Pipeline

Excel → Parquet, in six steps.

The data pipeline runs once, offline. It produces a single 5.8 MB parquet that downstream pages query live.

01

Read

15 .xlsx files via pandas + openpyxl. Header at row 3 (rows 1–2 are title/blank).

pd.read_excel(path, sheet_name='Günlük Satis Raporu', header=2)
02

Rename

Turkish source columns → canonical English. Drop the always-null Metrics column.

df.rename(columns=COLUMN_MAP)
03

Coerce

Numeric columns to Int32/Int64/Float32/Float64 with errors=coerce → NaN. Day to date32.

df['day'] = pd.to_datetime(df['day']).dt.date
04

Concatenate

All 15 frames into one DataFrame. 1,784,613 rows × 11 columns.

df = pd.concat(frames, ignore_index=True)
05

Validate

Null rates, duplicate (store, product, day) keys, p99.9 outliers, negatives, zero-sales rows. Honest report.

validation.json → 6 anomalies surfaced
06

Write

Parquet with ZSTD level 9, dictionary-encoded strings (vendor/store/product_name), 64k row groups for mobile streaming.

pq.write_table(t, path, compression='zstd', compression_level=9)
03 · Engine choice

Why DuckDB-WASM.

Three real alternatives. One fit.

A

Pre-aggregate to JSON

  • Tiny payload (~100 KB)
  • Zero engine bundle
  • Every new question = new aggregation + redeploy
  • Schema locked at build time
  • No flexibility for the date filter case requirement
Insufficient
B

Backend API + SQL DB

  • Server-side SQL flexibility
  • Hide raw data
  • Needs hosting & uptime
  • Network round-trip per query
  • Breaks offline
  • Adds auth, CORS, rate-limit complexity
Overkill for a case study
C

DuckDB-WASM in browser

  • Real SQL flexibility, full window functions
  • Offline-first after first load (SW cache)
  • Zero backend, zero scheduling, zero ops
  • URL-shareable, embed-anywhere
  • ~3 MB engine bundle (paid once, cached forever)
Right tool for this question
04 · The query

One SQL statement. Four windows.

This is the exact query that runs every time you move the date picker on the dashboard. No paraphrase, no simplification.

queries.ts · kpiQuery()
WITH sel AS (
  SELECT CAST('2022-08-16' AS DATE) AS sel_date    -- ❶ user picks the date
)
SELECT
  -- ❷ YTD: Jan 1 of selected year → selected date
  SUM(CASE WHEN day BETWEEN date_trunc('year',  (SELECT sel_date FROM sel))
                       AND (SELECT sel_date FROM sel)
           THEN net_sales END) AS ytd,

  -- ❸ MTD: 1st of selected month → selected date
  SUM(CASE WHEN day BETWEEN date_trunc('month', (SELECT sel_date FROM sel))
                       AND (SELECT sel_date FROM sel)
           THEN net_sales END) AS mtd,

  -- ❹ Previous-year mirror windows (offset by INTERVAL 1 YEAR)
  SUM(CASE WHEN day BETWEEN date_trunc('year',  (SELECT sel_date FROM sel) - INTERVAL 1 YEAR)
                       AND (SELECT sel_date FROM sel) - INTERVAL 1 YEAR
           THEN net_sales END) AS ytd_py,

  SUM(CASE WHEN day BETWEEN date_trunc('month', (SELECT sel_date FROM sel) - INTERVAL 1 YEAR)
                       AND (SELECT sel_date FROM sel) - INTERVAL 1 YEAR
           THEN net_sales END) AS mtd_py
FROM sales;
Annotations
Parameter bind

The user-selected date enters as a CTE. Everything else references it — there's no max(day) trick.

YTD window

date_trunc('year', sel) gives Jan 1 of the selected year. Range: [Jan 1 → selected date].

MTD window

date_trunc('month', sel) gives the 1st of the selected month. Range: [month-start → selected date].

Previous-year window

sel − INTERVAL 1 YEAR shifts the entire window back one year, preserving day-of-year. PY% is then (curr − prev) / prev on the client.

05 · Schema

Source columns mapped to canonical names.

Turkish source columns are renamed; types are explicit; nothing is implicit.

Source (TR)Canonical (EN)Type
Satıcıvendorstring
Satıcı Ürün Koduproduct_codeint32
Alıcı Ürün Kodubuyer_product_codeint32
Satıcı Ürün Adıproduct_namestring
Barkodbarcodeint64
Mağazastorestring
Satıcı Teslim Noktası Kodudelivery_pointint32
Gün_daydate32
Metrics
Satış Miktarıqtyfloat32
Kg/Ltkg_ltfloat32
Net Satis Tutari (TL)net_salesfloat64