Database Design Principles and Normalization (1NF, 2NF, 3NF)

A database that works today can become a nightmare tomorrow if its tables are poorly structured. Duplicate data creeps in, updates require touching dozens of rows, a single typo corrupts records across the system, and queries that should be simple turn into convoluted workarounds. These problems do not come from bad SQL. They come from bad design.

Database normalization is the systematic process of organizing tables and columns to reduce redundancy, eliminate inconsistencies, and make your data reliable by design. It is built on a set of progressive rules called "normal forms," each one building on the previous to tighten data integrity further. Understanding normalization is essential for every developer who creates or modifies database schemas, because the decisions you make at the design stage affect every query, every update, and every bug report for the lifetime of the application.

This guide explains the first three normal forms with concrete before-and-after examples, discusses when breaking the rules (denormalization) is the right call, and walks through designing a complete e-commerce schema from scratch so you can see normalization applied to a real project.

Why Normalization Matters

Consider this single table that stores order data for a small online shop:

CREATE TABLE shop_data (
    order_id       INT,
    order_date     DATE,
    customer_name  VARCHAR(100),
    customer_email VARCHAR(100),
    customer_city  VARCHAR(50),
    product_name   VARCHAR(100),
    product_price  DECIMAL(10,2),
    quantity       INT
);

INSERT INTO shop_data VALUES
(1, '2024-06-01', 'Alice Martin', 'alice@example.com', 'New York', 'Wireless Mouse', 25.99, 2),
(1, '2024-06-01', 'Alice Martin', 'alice@example.com', 'New York', 'USB-C Cable',     9.99, 1),
(2, '2024-06-03', 'Bob Jones',    'bob@example.com',   'Chicago',  'Wireless Mouse', 25.99, 1),
(3, '2024-06-05', 'Alice Martin', 'alice@example.com', 'New York', 'Desk Lamp',      34.50, 1);

This table "works." You can query it and get results. But it has serious problems:

Redundancy: Alice's name, email, and city are stored three times. The Wireless Mouse name and price are stored twice. Every repeated value wastes storage and creates opportunities for inconsistency.
Update anomalies: If Alice moves to Boston, you must update every row where she appears. Miss one, and your database says Alice lives in both New York and Boston simultaneously.
```
-- Must update ALL rows for Alice. Miss one and data is inconsistent.
UPDATE shop_data SET customer_city = 'Boston'
WHERE customer_email = 'alice@example.com';
```
Insert anomalies: You cannot add a new product to the system until someone orders it, because the product data is tied to order rows.
Delete anomalies: If you delete Bob's only order, you lose the fact that Bob exists as a customer entirely.

Normalization solves all of these problems by splitting the data into properly structured, related tables.

First Normal Form (1NF)

The first normal form establishes the most basic structural rules for a relational table.

The Rules

A table is in 1NF when:

Every column contains atomic (indivisible) values. No lists, no sets, no comma-separated strings.
Every row is unique (the table has a primary key).
Each column contains values of a single type.

Violation: Non-Atomic Values

Here is a table that violates 1NF by cramming multiple values into single columns:

CREATE TABLE orders_bad (
    order_id    INT,
    customer    VARCHAR(200),
    products    VARCHAR(500),
    quantities  VARCHAR(100)
);

INSERT INTO orders_bad VALUES
(1, 'Alice Martin', 'Wireless Mouse, USB-C Cable', '2, 1'),
(2, 'Bob Jones',    'Wireless Mouse',              '1'),
(3, 'Alice Martin', 'Desk Lamp, Pen Pack, Notebook', '1, 3, 2');

The products and quantities columns contain comma-separated lists. This creates immediate problems:

-- How many Wireless Mouse units were sold total? Impossible with simple SQL.
-- Which orders contain "USB-C Cable"? Requires string parsing.
SELECT * FROM orders_bad WHERE products LIKE '%USB-C Cable%';
-- Fragile: matches "USB-C Cable Adapter" too

You cannot reliably query, sort, filter, or aggregate individual products because they are buried inside strings.

Fix: Make Every Value Atomic

Split each value into its own row:

CREATE TABLE order_items_1nf (
    order_id      INT,
    customer_name VARCHAR(100),
    product_name  VARCHAR(100),
    quantity      INT,
    PRIMARY KEY (order_id, product_name)
);

INSERT INTO order_items_1nf VALUES
(1, 'Alice Martin', 'Wireless Mouse', 2),
(1, 'Alice Martin', 'USB-C Cable',    1),
(2, 'Bob Jones',    'Wireless Mouse', 1),
(3, 'Alice Martin', 'Desk Lamp',      1),
(3, 'Alice Martin', 'Pen Pack',       3),
(3, 'Alice Martin', 'Notebook',       2);

Now every cell contains exactly one value:

order_id	customer_name	product_name	quantity
1	Alice Martin	Wireless Mouse	2
1	Alice Martin	USB-C Cable	1
2	Bob Jones	Wireless Mouse	1
3	Alice Martin	Desk Lamp	1
3	Alice Martin	Pen Pack	3
3	Alice Martin	Notebook	2

Queries become straightforward:

-- Total Wireless Mouse units sold
SELECT SUM(quantity) FROM order_items_1nf WHERE product_name = 'Wireless Mouse';
-- Result: 3

info

1NF is the absolute minimum for a properly functioning relational table. If your table violates 1NF, most SQL operations become unreliable or impossible. No amount of clever querying can compensate for a structural flaw at this level.

The table is now in 1NF, but it still has redundancy. Alice's name appears three times. We need the next normal form.

Second Normal Form (2NF)

The second normal form eliminates partial dependencies: situations where a non-key column depends on only part of a composite primary key instead of the whole key.

The Rules

A table is in 2NF when:

It is already in 1NF.
Every non-key column depends on the entire primary key, not just part of it.

2NF only applies to tables with composite primary keys (keys made of two or more columns). If a table has a single-column primary key, it is automatically in 2NF once it satisfies 1NF.

Violation: Partial Dependency

Look at our order_items_1nf table. The composite primary key is (order_id, product_name). Let's add a product price column:

CREATE TABLE order_items_partial (
    order_id      INT,
    product_name  VARCHAR(100),
    product_price DECIMAL(10,2),
    customer_name VARCHAR(100),
    quantity      INT,
    PRIMARY KEY (order_id, product_name)
);

INSERT INTO order_items_partial VALUES
(1, 'Wireless Mouse', 25.99, 'Alice Martin', 2),
(1, 'USB-C Cable',     9.99, 'Alice Martin', 1),
(2, 'Wireless Mouse', 25.99, 'Bob Jones',    1),
(3, 'Desk Lamp',      34.50, 'Alice Martin', 1);

Analyze the dependencies:

quantity depends on the full key (order_id, product_name). A specific order for a specific product has a specific quantity. This is correct.
product_price depends only on product_name. The price of a Wireless Mouse is 25.99 regardless of which order it appears in. This is a partial dependency.
customer_name depends only on order_id. The customer is the same for all items in order 1, regardless of the product. This is also a partial dependency.

Primary Key: (order_id, product_name)

customer_name  → depends on order_id only         ← PARTIAL dependency
product_price  → depends on product_name only     ← PARTIAL dependency
quantity       → depends on (order_id, product_name) ← FULL dependency ✓

The Problem with Partial Dependencies

The Wireless Mouse price (25.99) is stored in every row where that product appears. If the price changes, you must update every row:

-- Must find and update every occurrence
UPDATE order_items_partial SET product_price = 27.99
WHERE product_name = 'Wireless Mouse';
-- Miss a row? Now the same product has two different prices.

Fix: Separate Tables for Each Entity

Move partially dependent columns into their own tables:

-- Customers (customer_name depends on customer_id)
CREATE TABLE customers (
    customer_id   INT PRIMARY KEY,
    customer_name VARCHAR(100) NOT NULL
);

-- Products (product_price depends on product_name/product_id)
CREATE TABLE products (
    product_id   INT PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL,
    price        DECIMAL(10,2) NOT NULL
);

-- Orders (links to customer)
CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customers(customer_id),
    order_date  DATE NOT NULL
);

-- Order items (full key dependency only)
CREATE TABLE order_items (
    order_id   INT REFERENCES orders(order_id),
    product_id INT REFERENCES products(product_id),
    quantity   INT NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (order_id, product_id)
);

Now each fact is stored exactly once:

The Wireless Mouse price lives in one row in products.
Alice's name lives in one row in customers.
The quantity of a specific product in a specific order lives in order_items.

tip

Notice that order_items stores unit_price in addition to the product's price in the products table. This is intentional: the unit_price records the price at the time of the order, which may differ from the current product price. This is a common and correct pattern, not a normalization violation, because the unit price genuinely depends on the full key (this specific product in this specific order).

Third Normal Form (3NF)

The third normal form eliminates transitive dependencies: situations where a non-key column depends on another non-key column rather than directly on the primary key.

The Rules

A table is in 3NF when:

It is already in 2NF.
Every non-key column depends directly on the primary key, not on another non-key column.

The classic shorthand: every non-key column must provide a fact about the key, the whole key, and nothing but the key.

Violation: Transitive Dependency

Consider this employees table:

CREATE TABLE employees_3nf_bad (
    employee_id     INT PRIMARY KEY,
    employee_name   VARCHAR(100),
    department_id   INT,
    department_name VARCHAR(100),
    department_head VARCHAR(100)
);

INSERT INTO employees_3nf_bad VALUES
(101, 'Alice', 1, 'Engineering', 'Sarah'),
(102, 'Bob',   1, 'Engineering', 'Sarah'),
(103, 'Carol', 2, 'Marketing',   'Tom'),
(104, 'Dave',  2, 'Marketing',   'Tom'),
(105, 'Eve',   1, 'Engineering', 'Sarah');

Analyze the dependencies:

employee_name depends on employee_id. Correct.
department_id depends on employee_id. Correct.
department_name depends on department_id, which is not the primary key. This is a transitive dependency: employee_id → department_id → department_name.
department_head also depends on department_id, not on employee_id. Another transitive dependency.

employee_id → department_id → department_name    ← TRANSITIVE
employee_id → department_id → department_head    ← TRANSITIVE

The Problem

"Engineering" and "Sarah" are repeated for every Engineering employee. If the department head changes from Sarah to Mike, you must update every Engineering employee row:

-- Three rows to update. Miss one and Engineering has two heads.
UPDATE employees_3nf_bad SET department_head = 'Mike'
WHERE department_id = 1;

Fix: Extract the Transitive Dependency

Move department information into its own table:

CREATE TABLE departments (
    department_id   INT PRIMARY KEY,
    department_name VARCHAR(100) NOT NULL,
    department_head VARCHAR(100)
);

CREATE TABLE employees (
    employee_id   INT PRIMARY KEY,
    employee_name VARCHAR(100) NOT NULL,
    department_id INT REFERENCES departments(department_id)
);

INSERT INTO departments VALUES (1, 'Engineering', 'Sarah'), (2, 'Marketing', 'Tom');

INSERT INTO employees VALUES
(101, 'Alice', 1),
(102, 'Bob',   1),
(103, 'Carol', 2),
(104, 'Dave',  2),
(105, 'Eve',   1);

Now the department head is stored once. Changing it requires a single update:

UPDATE departments SET department_head = 'Mike' WHERE department_id = 1;
-- One row. Done. Every employee query reflects the change automatically via JOIN.

TO verify:

SELECT e.employee_name, d.department_name, d.department_head
FROM employees e
JOIN departments d ON e.department_id = d.department_id;

employee_name	department_name	department_head
Alice	Engineering	Mike
Bob	Engineering	Mike
Carol	Marketing	Tom
Dave	Marketing	Tom
Eve	Engineering	Mike

info

The progression of normal forms:

Normal Form	Eliminates	Rule
1NF	Non-atomic values, missing keys	Every cell holds one value; every row is unique
2NF	Partial dependencies	Non-key columns depend on the whole key
3NF	Transitive dependencies	Non-key columns depend only on the key

Most production databases aim for 3NF. Higher normal forms (BCNF, 4NF, 5NF) exist but are rarely needed in typical application development.

How to Spot Normalization Problems

When reviewing a table design, ask these diagnostic questions:

For 1NF:

Does any column contain comma-separated values, JSON arrays, or lists? If yes, break them into separate rows.

For 2NF:

Does the table have a composite primary key? If so, does every non-key column depend on ALL parts of the key, or just some? If partial, extract the partially dependent columns.

For 3NF:

Can any non-key column be determined by looking at another non-key column instead of the primary key? If yes, extract the transitive chain into its own table.

General red flag: repeated groups of values. If the same combination of values (like "Engineering" + "Sarah") appears in multiple rows, those values probably belong in their own table.

Denormalization: When and Why

Normalization reduces redundancy and improves data integrity, but it comes with a cost: more tables mean more JOIN operations, and joins can slow down read-heavy queries on very large datasets. Denormalization is the deliberate decision to reintroduce some redundancy for performance reasons.

When Denormalization Makes Sense

Scenario	Why Denormalize
Read-heavy reporting dashboards	Pre-joining data avoids expensive runtime JOINs
High-traffic product listings	Storing `category_name` alongside `category_id` eliminates a join on every page load
Historical records (invoices, receipts)	Snapshot data at the time of the event, not the current state
Data warehouses / analytics	Star and snowflake schemas intentionally denormalize for query speed
Caching layers	Materialized views store pre-computed, denormalized results

Example: Denormalized Order History

A fully normalized invoice requires joining orders, order_items, products, and customers. For a frequently accessed "order history" page, you might create a denormalized read table:

CREATE TABLE order_history (
    order_id       INT,
    order_date     DATE,
    customer_name  VARCHAR(100),
    customer_email VARCHAR(100),
    product_name   VARCHAR(100),
    quantity       INT,
    unit_price     DECIMAL(10,2),
    line_total     DECIMAL(10,2)
);

This table duplicates data that exists in normalized tables, but it serves a single page with zero joins and maximum speed.

The Trade-Off

Aspect	Normalized	Denormalized
Data integrity	High (single source of truth)	Lower (duplicates can diverge)
Write speed	Faster (update one place)	Slower (update multiple places)
Read speed	Slower (joins required)	Faster (pre-joined data)
Storage	Less	More
Complexity of writes	Simple	Complex (must keep copies in sync)

warning

Denormalize only after you have a proven performance problem. Start with a properly normalized schema. Measure query performance. If specific queries are too slow and cannot be improved with indexes or query optimization, then selectively denormalize those specific access patterns.

Premature denormalization is a common mistake that trades long-term maintainability for performance gains you may never actually need.

Safe Denormalization Strategies

Instead of denormalizing your core tables, use these approaches that keep the source of truth normalized:

Materialized Views (PostgreSQL, Oracle, SQL Server):

CREATE MATERIALIZED VIEW order_summary AS
SELECT
    o.order_id, o.order_date,
    c.customer_name, c.email,
    p.product_name, oi.quantity, oi.unit_price
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id;

-- Refresh periodically
REFRESH MATERIALIZED VIEW order_summary;

The normalized tables remain the source of truth. The materialized view is a read-optimized copy that can be refreshed on a schedule.

Computed/Cache Columns:

-- Store a computed total on the orders table for quick access
ALTER TABLE orders ADD COLUMN item_count INT;
ALTER TABLE orders ADD COLUMN computed_total DECIMAL(10,2);

-- Update via trigger or application logic when order_items change

The core normalized structure remains intact, and the cached columns are explicitly understood as derived data.

Designing the ShopSmart Schema from Scratch

Let's apply everything we have learned by designing a complete e-commerce database for a fictional company called ShopSmart. We will start from raw requirements and build a fully normalized schema step by step.

Step 1: Gather Requirements

ShopSmart needs to track:

Customers: name, email, phone, shipping address
Products: name, description, price, organized by category
Categories: hierarchical (e.g., Electronics > Computers > Laptops)
Orders: placed by a customer, with a date and status
Order Items: which products are in each order, quantity, price at time of purchase

Step 2: Identify Entities

From the requirements, the core entities are:

Customers
Categories
Products
Orders
Order Items (junction between Orders and Products)

Each entity becomes its own table. This is 3NF thinking from the start: each table represents one "thing," and every column describes that thing directly.

Step 3: Design Each Table

::: Different SQL dialects use different syntax for auto-incrementing primary keys.

For example:

MySQL → INT PRIMARY KEY AUTO_INCREMENT
PostgreSQL → SERIAL PRIMARY KEY or GENERATED ALWAYS AS IDENTITY
SQL Server → INT PRIMARY KEY IDENTITY(1,1)
SQLite → INTEGER PRIMARY KEY AUTOINCREMENT

Always adjust the syntax to match the database system you are using.

The example below is for SQLite. :::

Customers:

CREATE TABLE customers (
    customer_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    first_name    VARCHAR(50)  NOT NULL,
    last_name     VARCHAR(50)  NOT NULL,
    email         VARCHAR(254) NOT NULL UNIQUE,
    phone         VARCHAR(20),
    street        VARCHAR(100),
    city          VARCHAR(50),
    state         VARCHAR(50),
    postal_code   VARCHAR(20),
    country       CHAR(2)      NOT NULL DEFAULT 'US',
    created_at    TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP
);

Every column depends on customer_id and nothing else. No transitive dependencies. The address fields (street, city, state, postal_code, country) all describe where this customer lives. If a customer could have multiple addresses, we would extract those into a separate addresses table, but for simplicity, one address per customer suffices here.

Categories (self-referencing for hierarchy):

CREATE TABLE categories (
    category_id   INTEGER PRIMARY KEY,
    category_name VARCHAR(100) NOT NULL,
    parent_id     INTEGER REFERENCES categories(category_id),
    CONSTRAINT categories_name_parent_uq UNIQUE (category_name, parent_id)
);

The self-referencing parent_id creates a tree structure. A top-level category like "Electronics" has parent_id = NULL. A subcategory like "Computers" has parent_id pointing to "Electronics."

INSERT INTO categories (category_name, parent_id) VALUES ('Electronics', NULL);  -- id=1
INSERT INTO categories (category_name, parent_id) VALUES ('Stationery', NULL);   -- id=2
INSERT INTO categories (category_name, parent_id) VALUES ('Computers', 1);       -- id=3
INSERT INTO categories (category_name, parent_id) VALUES ('Accessories', 1);     -- id=4
INSERT INTO categories (category_name, parent_id) VALUES ('Laptops', 3);         -- id=5

Electronics (1)
├── Computers (3)
│   └── Laptops (5)
└── Accessories (4)
Stationery (2)

Products:

CREATE TABLE products (
    product_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    product_name  VARCHAR(200) NOT NULL,
    description   TEXT,
    price         DECIMAL(10,2) NOT NULL,
    category_id   INTEGER NOT NULL REFERENCES categories(category_id),
    is_active     BOOLEAN NOT NULL DEFAULT TRUE,
    created_at    TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT products_price_ck CHECK (price >= 0),
    CONSTRAINT products_name_uq UNIQUE (product_name)
);

Every column depends directly on product_id. The category_id links to the categories table via foreign key rather than storing the category name here (which would be a transitive dependency).

Orders:

CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id   INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date    TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    status        VARCHAR(20) NOT NULL DEFAULT 'pending',
    shipping_street  VARCHAR(100),
    shipping_city    VARCHAR(50),
    shipping_state   VARCHAR(50),
    shipping_postal  VARCHAR(20),
    shipping_country CHAR(2) DEFAULT 'US',
    notes         TEXT,
    CONSTRAINT orders_status_ck
        CHECK (status IN ('pending','confirmed','processing','shipped','delivered','cancelled'))
);

tip

Notice that the shipping address is stored directly on the order, even though the customer already has an address. This is intentional denormalization for a valid reason: the shipping address is a historical snapshot. If the customer moves later, existing orders should still show where they were actually shipped, not the customer's new address.

Order Items (junction table):

CREATE TABLE order_items (
    order_item_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    order_id       INTEGER NOT NULL REFERENCES orders(order_id) ON DELETE CASCADE,
    product_id     INTEGER NOT NULL REFERENCES products(product_id),
    quantity       INTEGER NOT NULL,
    unit_price     DECIMAL(10,2) NOT NULL,
    CONSTRAINT order_items_qty_ck CHECK (quantity > 0),
    CONSTRAINT order_items_price_ck CHECK (unit_price >= 0),
    CONSTRAINT order_items_order_product_uq UNIQUE (order_id, product_id)
);

The unit_price captures the price at the time of purchase, which may differ from the product's current price. The unique constraint on (order_id, product_id) prevents the same product from appearing twice in one order.

Step 4: Verify Normalization

Let's check each table against our normal forms:

1NF check: Every column is atomic. No comma-separated values. Every table has a primary key.

2NF check: No composite primary keys are used as the main key (we use surrogate IDENTITY keys). The unique constraint (order_id, product_id) on order_items is not the primary key, so partial dependency is not a concern. All non-key columns depend on the full primary key.

3NF check: Let's trace each table:

customers: Every column (name, email, address) depends directly on customer_id. No column determines another non-key column.
categories: category_name and parent_id depend directly on category_id.
products: product_name, price, description depend on product_id. category_id is a foreign key, not a transitive dependency. The category's name is not stored here.
orders: All columns depend on order_id. The shipping address is a snapshot, not a transitive dependency through customer_id.
order_items: quantity and unit_price depend on order_item_id. The product name is not stored here (it is accessed via JOIN).

All tables pass all three normal forms.

Step 5: Add Sample Data

INSERT INTO customers (first_name, last_name, email, city, country)
VALUES
    ('Alice', 'Martin', 'alice@example.com', 'New York', 'US'),
    ('Bob',   'Jones',  'bob@example.com',   'Chicago',  'US'),
    ('Carol', 'Smith',  'carol@example.com', 'London',   'GB');

INSERT INTO products (product_name, description, price, category_id) VALUES
    ('Wireless Mouse', 'Ergonomic wireless mouse', 25.99, 4),
    ('USB-C Cable',    '2m braided cable',          9.99, 4),
    ('Laptop Pro 15',  '15-inch professional laptop', 1299.99, 5),
    ('Notebook A5',    'Lined notebook, 200 pages',   4.50, 2);

INSERT INTO orders (customer_id, status, shipping_city, shipping_country) VALUES
    (1, 'confirmed', 'New York', 'US'),
    (2, 'pending',   'Chicago',  'US');

INSERT INTO order_items (order_id, product_id, quantity, unit_price) VALUES
    (1, 1, 2, 25.99),
    (1, 2, 1,  9.99),
    (2, 3, 1, 1299.99);

Step 6: Query the Normalized Schema

Full invoice query (four-table join):

SELECT
    o.order_id,
    o.order_date,
    c.first_name || ' ' || c.last_name AS customer,
    p.product_name,
    oi.quantity,
    oi.unit_price,
    (oi.quantity * oi.unit_price) AS line_total
FROM orders o
JOIN customers c    ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id    = oi.order_id
JOIN products p     ON oi.product_id = p.product_id
ORDER BY o.order_id, p.product_name;

order_id	order_date	customer	product_name	quantity	unit_price	line_total
1	2024-06-15 10:00:00	Alice Martin	USB-C Cable	1	9.99	9.99
1	2024-06-15 10:00:00	Alice Martin	Wireless Mouse	2	25.99	51.98
2	2024-06-15 10:00:00	Bob Jones	Laptop Pro 15	1	1299.99	1299.99

Products by category (self-join for hierarchy):

SELECT
    parent.category_name AS parent_category,
    child.category_name  AS subcategory,
    p.product_name,
    p.price
FROM products p
JOIN categories child  ON p.category_id = child.category_id
LEFT JOIN categories parent ON child.parent_id = parent.category_id
ORDER BY parent.category_name, child.category_name;

parent_category	subcategory	product_name	price
Computers	Laptops	Laptop Pro 15	1299.99
Electronics	Accessories	Wireless Mouse	25.99
Electronics	Accessories	USB-C Cable	9.99
NULL	Stationery	Notebook A5	4.50

The Complete Schema Diagram

┌──────────────┐       ┌──────────────┐
│  categories  │       │  customers   │
├──────────────┤       ├──────────────┤
│ category_id  │◄─┐    │ customer_id  │◄──────┐
│ category_name│  │    │ first_name   │       │
│ parent_id    │──┘    │ last_name    │       │
└──────┬───────┘       │ email        │       │
       │               │ city, ...    │       │
       │               └──────────────┘       │
       │                                      │
┌──────┴───────┐       ┌──────────────┐       │
│  products    │       │   orders     │       │
├──────────────┤       ├──────────────┤       │
│ product_id   │◄──┐   │ order_id     │◄──┐   │
│ product_name │   │   │ customer_id  │───┘   │
│ price        │   │   │ order_date   │       │
│ category_id  │───┘   │ status       │       │
│ is_active    │       │ shipping_*   │       │
└──────────────┘       └──────┬───────┘       │
                              │               │
                       ┌──────┴───────┐       │
                       │ order_items  │       │
                       ├──────────────┤       │
                       │ order_item_id│       │
                       │ order_id     │───────┘
                       │ product_id   │
                       │ quantity     │
                       │ unit_price   │
                       └──────────────┘

Common Normalization Mistakes

Mistake 1: Storing Derived Data That Can Be Calculated

-- Bad: storing line_total when it equals quantity * unit_price
CREATE TABLE order_items (
    order_id   INTEGER,
    product_id INTEGER,
    quantity   INTEGER,
    unit_price DECIMAL(10,2),
    line_total DECIMAL(10,2)    -- redundant!
);

If quantity or unit_price is updated but line_total is not, the data becomes inconsistent. Compute it in your query instead:

SELECT quantity * unit_price AS line_total FROM order_items;

info

Storing computed totals on the order header (like orders.total_amount) is a common exception, especially for performance on frequently accessed summaries. If you do this, update it via triggers or application logic whenever order items change.

Mistake 2: Repeating Groups as Numbered Columns

-- Bad: what if a customer has 4 phone numbers?
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        VARCHAR(100),
    phone1      VARCHAR(20),
    phone2      VARCHAR(20),
    phone3      VARCHAR(20)
);

This violates the spirit of 1NF. Create a separate table:

CREATE TABLE customer_phones (
    phone_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    phone_type  VARCHAR(20),
    phone_number VARCHAR(20) NOT NULL
);

Mistake 3: Over-Normalizing

Normalization can be taken too far. Splitting a customers table into customer_names, customer_emails, and customer_cities tables creates unnecessary joins with no benefit, because all of those attributes depend directly on customer_id and do not cause update anomalies.

Over-normalized (unnecessary):

CREATE TABLE customer_names (customer_id INTEGER PK, first_name VARCHAR, last_name VARCHAR);
CREATE TABLE customer_emails (customer_id INTEGER PK, email VARCHAR);
CREATE TABLE customer_cities (customer_id INTEGER PK, city VARCHAR);

Properly normalized (correct):

CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    first_name  VARCHAR(50),
    last_name   VARCHAR(50),
    email       VARCHAR(254),
    city        VARCHAR(50)
);

All columns depend on customer_id with no partial or transitive dependencies. Splitting them gains nothing.

Quick Reference

Normal Form	Problem It Solves	How to Fix
1NF	Multi-valued columns, missing keys	One value per cell; add a primary key
2NF	Partial dependency on composite key	Extract partially dependent columns into new tables
3NF	Transitive dependency (non-key depends on non-key)	Extract transitively dependent columns into new tables

When to denormalize:

Proven read performance bottleneck that indexes cannot solve
Historical snapshots (prices at time of purchase, shipping addresses)
Reporting tables or materialized views separate from the transactional schema
Data warehouses designed for analytics, not transactions

Summary

Database normalization is the process of structuring tables to eliminate redundancy, prevent update anomalies, and ensure every fact is stored in exactly one place. The three normal forms build progressively:

1NF requires atomic values and unique rows. No comma-separated lists, no multi-valued columns.
2NF eliminates partial dependencies by ensuring every non-key column depends on the entire primary key, not just part of it.
3NF eliminates transitive dependencies by ensuring every non-key column depends directly on the primary key, not through another non-key column.

Most production databases should target 3NF for transactional tables. Denormalization should be a conscious, measured decision made only when you have a proven performance need, and preferably implemented through materialized views or cache tables rather than corrupting your core normalized schema.

The ShopSmart walkthrough demonstrated how normalization works in practice: identify entities, give each its own table, connect them with foreign keys, and verify that every column describes its table's primary key directly. The result is a schema where each fact lives in one place, updates are simple, and the database protects its own integrity.

Why Normalization Matters​

First Normal Form (1NF)​

The Rules​

Violation: Non-Atomic Values​

Fix: Make Every Value Atomic​

Second Normal Form (2NF)​

The Rules​

Violation: Partial Dependency​

The Problem with Partial Dependencies​

Fix: Separate Tables for Each Entity​

Third Normal Form (3NF)​

The Rules​

Violation: Transitive Dependency​

The Problem​

Fix: Extract the Transitive Dependency​

How to Spot Normalization Problems​

Denormalization: When and Why​

When Denormalization Makes Sense​

Example: Denormalized Order History​

The Trade-Off​

Safe Denormalization Strategies​

Designing the ShopSmart Schema from Scratch​

Step 1: Gather Requirements​

Step 2: Identify Entities​

Step 3: Design Each Table​

Step 4: Verify Normalization​

Step 5: Add Sample Data​

Step 6: Query the Normalized Schema​

The Complete Schema Diagram​

Common Normalization Mistakes​

Mistake 1: Storing Derived Data That Can Be Calculated​

Mistake 2: Repeating Groups as Numbered Columns​

Mistake 3: Over-Normalizing​

Quick Reference​

Summary​

Table of Contents

Why Normalization Matters

First Normal Form (1NF)

The Rules

Violation: Non-Atomic Values

Fix: Make Every Value Atomic

Second Normal Form (2NF)

The Rules

Violation: Partial Dependency

The Problem with Partial Dependencies

Fix: Separate Tables for Each Entity

Third Normal Form (3NF)

The Rules

Violation: Transitive Dependency

The Problem

Fix: Extract the Transitive Dependency

How to Spot Normalization Problems

Denormalization: When and Why

When Denormalization Makes Sense

Example: Denormalized Order History

The Trade-Off

Safe Denormalization Strategies

Designing the ShopSmart Schema from Scratch

Step 1: Gather Requirements

Step 2: Identify Entities

Step 3: Design Each Table

Step 4: Verify Normalization

Step 5: Add Sample Data

Step 6: Query the Normalized Schema

The Complete Schema Diagram

Common Normalization Mistakes

Mistake 1: Storing Derived Data That Can Be Calculated

Mistake 2: Repeating Groups as Numbered Columns

Mistake 3: Over-Normalizing

Quick Reference

Summary