
A data warehouse is a centralized repository that stores large volumes of structured, historical data from multiple sources. Unlike a standard database built for daily transactions, a data warehouse is optimized specifically for analysis and business intelligence. By separating analytical workloads from operational ones, companies can run complex queries across massive datasets to identify long-term trends without slowing down their primary applications.
The design decisions you make early – schema type, ETL strategy, layer architecture – will determine whether your warehouse stays performant and maintainable at scale or becomes a source of technical debt within two years.
Data Warehouse Architecture: The Three Layers
| Layer | Name | Purpose | Tools |
|---|---|---|---|
| Source Layer | Raw / Staging | Raw data ingested from source systems as-is | Fivetran, Airbyte, Stitch |
| Storage Layer | Data Warehouse Core | Cleaned, modeled, organized data | Snowflake, BigQuery, Redshift |
| Presentation Layer | Data Marts / Reports | Subject-specific views for end users | dbt, Looker, Tableau |
A well-designed warehouse keeps these layers clearly separated. The raw layer preserves original data (critical for debugging and reprocessing). The core layer applies business logic. The presentation layer delivers curated views to business users.
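To make the separation concrete, here is a minimal sketch of the three layers using SQLite as a stand-in warehouse. All table and column names (`raw_orders`, `core_orders`, `mart_daily_revenue`) are illustrative, not a prescribed naming scheme:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Source layer: land data exactly as received, plus load metadata.
cur.execute("""
CREATE TABLE raw_orders (
    payload   TEXT,                           -- original record, unmodified
    loaded_at TEXT DEFAULT (datetime('now'))  -- when it arrived
)""")

# Storage layer: cleaned, typed, modeled data.
cur.execute("""
CREATE TABLE core_orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER,
    amount      REAL,
    order_date  TEXT
)""")

# Presentation layer: a curated, subject-specific view for end users.
cur.execute("""
CREATE VIEW mart_daily_revenue AS
SELECT order_date, SUM(amount) AS revenue
FROM core_orders
GROUP BY order_date
""")
```

In a real stack each layer would live in its own schema or database, but the principle is the same: raw stays untouched, core applies logic, presentation exposes curated views.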
Schema Design: Star vs Snowflake vs Data Vault
This is where most warehouse design conversations begin – and where many teams get stuck.
| Schema | Structure | Pros | Cons | Best For |
|---|---|---|---|---|
| Star Schema | Fact table + denormalized dimensions | Simple queries; fast aggregations; easy for BI tools | Data redundancy in dimensions | Most analytics workloads |
| Snowflake Schema | Fact table + normalized dimensions | Less redundancy; consistent hierarchies | More joins; harder for non-technical users | Complex hierarchies, strict normalization |
| Data Vault | Hubs, Links, Satellites | Highly auditable; handles schema changes well | Complex; steep learning curve | Enterprise DWH; regulatory compliance |
The practical recommendation: Start with a star schema. The query simplicity and BI tool compatibility outweigh the normalization benefits of snowflake for most teams. Move to Data Vault only if audit requirements or extremely complex historical tracking demand it.
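A star schema in miniature, again using SQLite for illustration (the table and column names here are hypothetical examples, not a required convention): one fact table holding foreign keys and measures, surrounded by denormalized dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    name TEXT, segment TEXT, region TEXT      -- denormalized attributes
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name TEXT, category TEXT, sku TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,             -- e.g. 20240115
    full_date TEXT, month INTEGER, quarter INTEGER, year INTEGER
);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    quantity INTEGER,                         -- numeric measures only
    revenue  REAL
);
""")
```

Every analytical query follows the same shape: join the fact to one or more dimensions, filter on dimension attributes, aggregate the measures. That uniformity is what makes star schemas so friendly to BI tools.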
The Fact Table: Heart of the Star Schema
The fact table stores measurable business events – sales transactions, website clicks, inventory movements. Each row represents one event.
What goes in a fact table:
- Foreign keys to dimension tables
- Numeric measures (revenue, quantity, duration)
- Date keys (joining to date dimension)
What doesn’t belong:
- Descriptive attributes (those go in dimensions)
- Text fields (bad for aggregation)
- Calculated fields that can be derived at query time
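The last point deserves a quick illustration: rather than storing a precomputed `revenue` column, store the raw measures and derive revenue at query time. (Column names here are illustrative.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
CREATE TABLE fact_sales (
    product_key INTEGER,   -- foreign key to a product dimension
    date_key    INTEGER,   -- foreign key to a date dimension
    quantity    INTEGER,   -- numeric measure
    unit_price  REAL       -- numeric measure
)""")
cur.executemany("INSERT INTO fact_sales VALUES (?,?,?,?)",
                [(1, 20240101, 3, 10.0),
                 (1, 20240102, 2, 10.0)])

# Revenue is derived at query time, never stored.
row = cur.execute(
    "SELECT SUM(quantity * unit_price) AS revenue FROM fact_sales"
).fetchone()
print(row[0])  # 50.0
```

Storing only the base measures keeps the fact table lean and avoids the risk of a stored derived column drifting out of sync with its inputs.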
Dimension Tables: The Context Around Facts
Dimension tables provide the “who, what, where, when” context for your facts.
| Dimension | What It Describes | Example Columns |
|---|---|---|
| Date dimension | Calendar hierarchy | Date, day, month, quarter, year, fiscal period |
| Customer dimension | Customer attributes | Name, segment, region, join date |
| Product dimension | Product attributes | Name, category, SKU, price |
| Geography dimension | Location hierarchy | City, state, country, region |
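The date dimension is the one you always pre-build rather than populate from source data. A minimal generator in plain Python might look like this (fiscal-period logic omitted for brevity; the column set mirrors the table above):

```python
from datetime import date, timedelta

def build_date_dim(start: date, end: date) -> list[dict]:
    """Generate one row per calendar day, inclusive of both endpoints."""
    rows, d = [], start
    while d <= end:
        rows.append({
            "date_key":  int(d.strftime("%Y%m%d")),  # e.g. 20240115
            "full_date": d.isoformat(),
            "day":       d.day,
            "month":     d.month,
            "quarter":   (d.month - 1) // 3 + 1,
            "year":      d.year,
        })
        d += timedelta(days=1)
    return rows

dim = build_date_dim(date(2024, 1, 1), date(2024, 12, 31))
```

In practice you would span the dimension a decade or more in each direction so date joins never fall off the edge of the table.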
Critical concept: Slowly Changing Dimensions (SCD)
What happens when a customer moves to a different state or a product changes its category? How you handle this determines whether your historical data is accurate:
| SCD Type | How It Works | When to Use |
|---|---|---|
| Type 1 | Overwrite old value | History doesn’t matter |
| Type 2 | Add new row with effective dates | Need full historical accuracy |
| Type 3 | Add new column for current/previous | Only current and one prior state needed |
Type 2 is the most common – it preserves history by adding a new dimension row with start/end dates whenever an attribute changes.
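The Type 2 mechanics can be sketched in a few lines of Python. This is a simplified in-memory model, assuming a hypothetical `scd2_upsert` helper and a dimension held as a list of dicts; a real warehouse would do the same close-and-append with a `MERGE` statement or a dbt snapshot:

```python
from datetime import date

def scd2_upsert(dim_rows: list, customer_id: int, attrs: dict, as_of: date):
    """Close the current row and append a new version if attrs changed.
    Each row: customer_id, attrs, valid_from, valid_to (None = current)."""
    current = next((r for r in dim_rows
                    if r["customer_id"] == customer_id
                    and r["valid_to"] is None), None)
    if current and current["attrs"] == attrs:
        return                       # no change, nothing to do
    if current:
        current["valid_to"] = as_of  # close out the old version
    dim_rows.append({"customer_id": customer_id, "attrs": attrs,
                     "valid_from": as_of, "valid_to": None})

dim = []
scd2_upsert(dim, 42, {"state": "CA"}, date(2023, 1, 1))
scd2_upsert(dim, 42, {"state": "TX"}, date(2024, 6, 1))  # customer moved
```

After the move, the dimension holds two rows for customer 42: the closed-out California row and a current Texas row, so facts from 2023 still join to the state the customer lived in at the time.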
Modern Data Warehouse Stack (2024-2025)
| Component | Leading Tools |
|---|---|
| Data ingestion (EL) | Fivetran, Airbyte, Stitch |
| Storage and compute | Snowflake, BigQuery, Databricks, Redshift |
| Transformation layer | dbt (industry standard for transformations) |
| Orchestration | Airflow, Prefect, Dagster |
| BI / visualization | Tableau, Looker, Power BI, Metabase |
| Data catalog | dbt docs, Atlan, Alation |
The modern shift is from ETL (transform before loading) to ELT (load raw, then transform in the warehouse). Cloud warehouses have the compute power to transform at scale – moving the transformation logic into dbt models rather than pre-warehouse pipelines.
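ELT in miniature: land the raw JSON untouched, then transform it with SQL inside the warehouse. Here SQLite stands in for the warehouse and the inline SQL stands in for what would normally be a dbt model; the table names are illustrative.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# "Load" step: raw payloads land as-is, no transformation in flight.
cur.execute("CREATE TABLE raw_events (payload TEXT)")
events = [{"user": "a", "amount": 5}, {"user": "b", "amount": 7}]
cur.executemany("INSERT INTO raw_events VALUES (?)",
                [(json.dumps(e),) for e in events])

# "Transform" step: a typed, modeled table derived inside the warehouse.
cur.execute("""
CREATE TABLE stg_events AS
SELECT json_extract(payload, '$.user')   AS user_id,
       json_extract(payload, '$.amount') AS amount
FROM raw_events
""")
```

Because the raw table is preserved, changing the transformation logic just means rebuilding `stg_events` from `raw_events` rather than re-extracting from the source system.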
Common Data Warehouse Design Mistakes
| Mistake | What Happens | Fix |
|---|---|---|
| Skipping the raw/staging layer | No source data to reprocess when logic changes | Always land raw data first |
| One giant fact table | Unmanageable; mixed granularity | One fact table per business process |
| No date dimension | Date-based queries become painful | Pre-build a date dimension spanning years |
| Business logic in the BI layer | Reports become inconsistent across tools | Define metrics in the warehouse / dbt |
| Ignoring grain definition | Queries return wrong aggregations | Define exactly what one row represents |
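The grain mistake in particular is cheap to guard against: once you have declared what one row represents, a uniqueness check over the grain columns catches violations before they corrupt aggregations. A sketch, with illustrative table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Declared grain: one row = one order line (order_id + line_no).
cur.execute("""
CREATE TABLE fact_order_lines (
    order_id INTEGER, line_no INTEGER, amount REAL
)""")
cur.executemany("INSERT INTO fact_order_lines VALUES (?,?,?)",
                [(1, 1, 5.0),
                 (1, 2, 3.0),
                 (1, 2, 3.0)])  # duplicate grain: would double-count revenue

# Grain test: any group larger than one row violates the declaration.
dupes = cur.execute("""
SELECT order_id, line_no, COUNT(*) AS n
FROM fact_order_lines
GROUP BY order_id, line_no
HAVING n > 1
""").fetchall()
```

This is exactly the kind of check that belongs in an automated test suite (dbt's uniqueness tests serve the same purpose) so grain violations fail the pipeline instead of surfacing as wrong numbers in a dashboard.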
The Bottom Line
Data warehouse design is fundamentally about separating concerns: raw data from transformed data, facts from dimensions, source logic from business logic. Start with a star schema, build a robust staging layer, adopt dbt for transformations, and define your grain before writing a single table. The teams that get this right early avoid the painful rewrites that slow every analytics team down at scale.



