TLDR: Star schemas have denormalized dimension tables, snowflake schemas have normalized dimension tables. Probably favor star schemas unless you really need a snowflake schema.

TLDR Version

Use star schemas if:

  • Need fast query performance
  • Storage is not a concern
  • Want simpler maintenance

Use snowflake schemas if:

  • Data consistency in your warehouse is critical
  • Need to optimize for storage
ConcernStar Schemas (Denormalized)Snowflake Schemas (Normalized)
StorageMore redundant, higher storage needsEfficient storage
IntegrityPotential inconsistenciesOverall better integrity from normalization
Query PerformanceFaster (fewer joins)Slower (more joins needed)

Longer Version

Star Schemas

Star schemas are the most common data model used in OLAP systems. This includes examples like Snowflake, Redshift, Clickhouse, etc.

A star schema is composed of two different types of tables: fact tables and dimension tables.

Fact Tables

In a fact table, each row represents an event that happened at a particular time.

  • Some columns in the table are attributes
    • Cost of buying an item, the price of the item sold, etc
  • Other columns are foreign keys to dimension tables

Dimension Tables

Dimension tables are used to represent the who, what, where, when, how, and why of the event. If we had a "sales" fact table, it might have these dimensions:

Dimension TypeExample DimensionExample Dimension Columns
whocustomercustomer_name, location, phone
whatproductproduct_name, category, brand
wherestorelocation, region, market
whentimedate, month, year, season
howshippingdelivery_method, carrier
whypromotionpromotion_name, discount_type

Snowflake Schemas

The only difference between a snowflake schema and a star schema is that the snowflake schema's dimension tables are normalized.

Consider a simple sales database using a snowflake schema with a sales fact table, product dimension, and a category dimension (pictured below).

By normalizing the product dimension and referencing categories by id, any updates for a product's category (moving "Electronics" to "Consumer Electronics") only needs to happen in one place: the category dimension.

diagram of a snowflake schema

This results in:

Stronger data integrity

  • Updates to category information happen in one place
  • Can't accidentally have "Electronics" and "electronics" as different categories

Better storage efficiency

  • Category details stored once, not repeated for every product

Single source of truth for data

  • Each piece of information exists in exactly one place
  • Easier to enforce business rules