Skip to main content

Database Design Principles and Normalization (1NF, 2NF, 3NF)

A database that works today can become a nightmare tomorrow if its tables are poorly structured. Duplicate data creeps in, updates require touching dozens of rows, a single typo corrupts records across the system, and queries that should be simple turn into convoluted workarounds. These problems do not come from bad SQL. They come from bad design.

Database normalization is the systematic process of organizing tables and columns to reduce redundancy, eliminate inconsistencies, and make your data reliable by design. It is built on a set of progressive rules called "normal forms," each one building on the previous to tighten data integrity further. Understanding normalization is essential for every developer who creates or modifies database schemas, because the decisions you make at the design stage affect every query, every update, and every bug report for the lifetime of the application.

This guide explains the first three normal forms with concrete before-and-after examples, discusses when breaking the rules (denormalization) is the right call, and walks through designing a complete e-commerce schema from scratch so you can see normalization applied to a real project.

Why Normalization Matters

Consider this single table that stores order data for a small online shop:

CREATE TABLE shop_data (
order_id INT,
order_date DATE,
customer_name VARCHAR(100),
customer_email VARCHAR(100),
customer_city VARCHAR(50),
product_name VARCHAR(100),
product_price DECIMAL(10,2),
quantity INT
);

INSERT INTO shop_data VALUES
(1, '2024-06-01', 'Alice Martin', 'alice@example.com', 'New York', 'Wireless Mouse', 25.99, 2),
(1, '2024-06-01', 'Alice Martin', 'alice@example.com', 'New York', 'USB-C Cable', 9.99, 1),
(2, '2024-06-03', 'Bob Jones', 'bob@example.com', 'Chicago', 'Wireless Mouse', 25.99, 1),
(3, '2024-06-05', 'Alice Martin', 'alice@example.com', 'New York', 'Desk Lamp', 34.50, 1);

This table "works." You can query it and get results. But it has serious problems:

  • Redundancy: Alice's name, email, and city are stored three times. The Wireless Mouse name and price are stored twice. Every repeated value wastes storage and creates opportunities for inconsistency.

  • Update anomalies: If Alice moves to Boston, you must update every row where she appears. Miss one, and your database says Alice lives in both New York and Boston simultaneously.

    -- Must update ALL rows for Alice. Miss one and data is inconsistent.
    UPDATE shop_data SET customer_city = 'Boston'
    WHERE customer_email = 'alice@example.com';
  • Insert anomalies: You cannot add a new product to the system until someone orders it, because the product data is tied to order rows.

  • Delete anomalies: If you delete Bob's only order, you lose the fact that Bob exists as a customer entirely.

Normalization solves all of these problems by splitting the data into properly structured, related tables.

First Normal Form (1NF)

The first normal form establishes the most basic structural rules for a relational table.

The Rules

A table is in 1NF when:

  1. Every column contains atomic (indivisible) values. No lists, no sets, no comma-separated strings.
  2. Every row is unique (the table has a primary key).
  3. Each column contains values of a single type.

Violation: Non-Atomic Values

Here is a table that violates 1NF by cramming multiple values into single columns:

CREATE TABLE orders_bad (
order_id INT,
customer VARCHAR(200),
products VARCHAR(500),
quantities VARCHAR(100)
);

INSERT INTO orders_bad VALUES
(1, 'Alice Martin', 'Wireless Mouse, USB-C Cable', '2, 1'),
(2, 'Bob Jones', 'Wireless Mouse', '1'),
(3, 'Alice Martin', 'Desk Lamp, Pen Pack, Notebook', '1, 3, 2');

The products and quantities columns contain comma-separated lists. This creates immediate problems:

-- How many Wireless Mouse units were sold total? Impossible with simple SQL.
-- Which orders contain "USB-C Cable"? Requires string parsing.
SELECT * FROM orders_bad WHERE products LIKE '%USB-C Cable%';
-- Fragile: matches "USB-C Cable Adapter" too

You cannot reliably query, sort, filter, or aggregate individual products because they are buried inside strings.

Fix: Make Every Value Atomic

Split each value into its own row:

CREATE TABLE order_items_1nf (
order_id INT,
customer_name VARCHAR(100),
product_name VARCHAR(100),
quantity INT,
PRIMARY KEY (order_id, product_name)
);

INSERT INTO order_items_1nf VALUES
(1, 'Alice Martin', 'Wireless Mouse', 2),
(1, 'Alice Martin', 'USB-C Cable', 1),
(2, 'Bob Jones', 'Wireless Mouse', 1),
(3, 'Alice Martin', 'Desk Lamp', 1),
(3, 'Alice Martin', 'Pen Pack', 3),
(3, 'Alice Martin', 'Notebook', 2);

Now every cell contains exactly one value:

order_idcustomer_nameproduct_namequantity
1Alice MartinWireless Mouse2
1Alice MartinUSB-C Cable1
2Bob JonesWireless Mouse1
3Alice MartinDesk Lamp1
3Alice MartinPen Pack3
3Alice MartinNotebook2

Queries become straightforward:

-- Total Wireless Mouse units sold
SELECT SUM(quantity) FROM order_items_1nf WHERE product_name = 'Wireless Mouse';
-- Result: 3
info

1NF is the absolute minimum for a properly functioning relational table. If your table violates 1NF, most SQL operations become unreliable or impossible. No amount of clever querying can compensate for a structural flaw at this level.

The table is now in 1NF, but it still has redundancy. Alice's name appears three times. We need the next normal form.

Second Normal Form (2NF)

The second normal form eliminates partial dependencies: situations where a non-key column depends on only part of a composite primary key instead of the whole key.

The Rules

A table is in 2NF when:

  1. It is already in 1NF.
  2. Every non-key column depends on the entire primary key, not just part of it.

2NF only applies to tables with composite primary keys (keys made of two or more columns). If a table has a single-column primary key, it is automatically in 2NF once it satisfies 1NF.

Violation: Partial Dependency

Look at our order_items_1nf table. The composite primary key is (order_id, product_name). Let's add a product price column:

CREATE TABLE order_items_partial (
order_id INT,
product_name VARCHAR(100),
product_price DECIMAL(10,2),
customer_name VARCHAR(100),
quantity INT,
PRIMARY KEY (order_id, product_name)
);

INSERT INTO order_items_partial VALUES
(1, 'Wireless Mouse', 25.99, 'Alice Martin', 2),
(1, 'USB-C Cable', 9.99, 'Alice Martin', 1),
(2, 'Wireless Mouse', 25.99, 'Bob Jones', 1),
(3, 'Desk Lamp', 34.50, 'Alice Martin', 1);

Analyze the dependencies:

  • quantity depends on the full key (order_id, product_name). A specific order for a specific product has a specific quantity. This is correct.
  • product_price depends only on product_name. The price of a Wireless Mouse is 25.99 regardless of which order it appears in. This is a partial dependency.
  • customer_name depends only on order_id. The customer is the same for all items in order 1, regardless of the product. This is also a partial dependency.
Primary Key: (order_id, product_name)

customer_name → depends on order_id only ← PARTIAL dependency
product_price → depends on product_name only ← PARTIAL dependency
quantity → depends on (order_id, product_name) ← FULL dependency ✓

The Problem with Partial Dependencies

The Wireless Mouse price (25.99) is stored in every row where that product appears. If the price changes, you must update every row:

-- Must find and update every occurrence
UPDATE order_items_partial SET product_price = 27.99
WHERE product_name = 'Wireless Mouse';
-- Miss a row? Now the same product has two different prices.

Fix: Separate Tables for Each Entity

Move partially dependent columns into their own tables:

-- Customers (customer_name depends on customer_id)
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100) NOT NULL
);

-- Products (product_price depends on product_name/product_id)
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100) NOT NULL,
price DECIMAL(10,2) NOT NULL
);

-- Orders (links to customer)
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT NOT NULL REFERENCES customers(customer_id),
order_date DATE NOT NULL
);

-- Order items (full key dependency only)
CREATE TABLE order_items (
order_id INT REFERENCES orders(order_id),
product_id INT REFERENCES products(product_id),
quantity INT NOT NULL,
unit_price DECIMAL(10,2) NOT NULL,
PRIMARY KEY (order_id, product_id)
);

Now each fact is stored exactly once:

  • The Wireless Mouse price lives in one row in products.
  • Alice's name lives in one row in customers.
  • The quantity of a specific product in a specific order lives in order_items.
tip

Notice that order_items stores unit_price in addition to the product's price in the products table. This is intentional: the unit_price records the price at the time of the order, which may differ from the current product price. This is a common and correct pattern, not a normalization violation, because the unit price genuinely depends on the full key (this specific product in this specific order).

Third Normal Form (3NF)

The third normal form eliminates transitive dependencies: situations where a non-key column depends on another non-key column rather than directly on the primary key.

The Rules

A table is in 3NF when:

  1. It is already in 2NF.
  2. Every non-key column depends directly on the primary key, not on another non-key column.

The classic shorthand: every non-key column must provide a fact about the key, the whole key, and nothing but the key.

Violation: Transitive Dependency

Consider this employees table:

CREATE TABLE employees_3nf_bad (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
department_id INT,
department_name VARCHAR(100),
department_head VARCHAR(100)
);

INSERT INTO employees_3nf_bad VALUES
(101, 'Alice', 1, 'Engineering', 'Sarah'),
(102, 'Bob', 1, 'Engineering', 'Sarah'),
(103, 'Carol', 2, 'Marketing', 'Tom'),
(104, 'Dave', 2, 'Marketing', 'Tom'),
(105, 'Eve', 1, 'Engineering', 'Sarah');

Analyze the dependencies:

  • employee_name depends on employee_id. Correct.
  • department_id depends on employee_id. Correct.
  • department_name depends on department_id, which is not the primary key. This is a transitive dependency: employee_id → department_id → department_name.
  • department_head also depends on department_id, not on employee_id. Another transitive dependency.
employee_id → department_id → department_name    ← TRANSITIVE
employee_id → department_id → department_head ← TRANSITIVE

The Problem

"Engineering" and "Sarah" are repeated for every Engineering employee. If the department head changes from Sarah to Mike, you must update every Engineering employee row:

-- Three rows to update. Miss one and Engineering has two heads.
UPDATE employees_3nf_bad SET department_head = 'Mike'
WHERE department_id = 1;

Fix: Extract the Transitive Dependency

Move department information into its own table:

CREATE TABLE departments (
department_id INT PRIMARY KEY,
department_name VARCHAR(100) NOT NULL,
department_head VARCHAR(100)
);

CREATE TABLE employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100) NOT NULL,
department_id INT REFERENCES departments(department_id)
);

INSERT INTO departments VALUES (1, 'Engineering', 'Sarah'), (2, 'Marketing', 'Tom');

INSERT INTO employees VALUES
(101, 'Alice', 1),
(102, 'Bob', 1),
(103, 'Carol', 2),
(104, 'Dave', 2),
(105, 'Eve', 1);

Now the department head is stored once. Changing it requires a single update:

UPDATE departments SET department_head = 'Mike' WHERE department_id = 1;
-- One row. Done. Every employee query reflects the change automatically via JOIN.

TO verify:

SELECT e.employee_name, d.department_name, d.department_head
FROM employees e
JOIN departments d ON e.department_id = d.department_id;
employee_namedepartment_namedepartment_head
AliceEngineeringMike
BobEngineeringMike
CarolMarketingTom
DaveMarketingTom
EveEngineeringMike
info

The progression of normal forms:

Normal FormEliminatesRule
1NFNon-atomic values, missing keysEvery cell holds one value; every row is unique
2NFPartial dependenciesNon-key columns depend on the whole key
3NFTransitive dependenciesNon-key columns depend only on the key

Most production databases aim for 3NF. Higher normal forms (BCNF, 4NF, 5NF) exist but are rarely needed in typical application development.

How to Spot Normalization Problems

When reviewing a table design, ask these diagnostic questions:

For 1NF:

  • Does any column contain comma-separated values, JSON arrays, or lists? If yes, break them into separate rows.

For 2NF:

  • Does the table have a composite primary key? If so, does every non-key column depend on ALL parts of the key, or just some? If partial, extract the partially dependent columns.

For 3NF:

  • Can any non-key column be determined by looking at another non-key column instead of the primary key? If yes, extract the transitive chain into its own table.

General red flag: repeated groups of values. If the same combination of values (like "Engineering" + "Sarah") appears in multiple rows, those values probably belong in their own table.

Denormalization: When and Why

Normalization reduces redundancy and improves data integrity, but it comes with a cost: more tables mean more JOIN operations, and joins can slow down read-heavy queries on very large datasets. Denormalization is the deliberate decision to reintroduce some redundancy for performance reasons.

When Denormalization Makes Sense

ScenarioWhy Denormalize
Read-heavy reporting dashboardsPre-joining data avoids expensive runtime JOINs
High-traffic product listingsStoring category_name alongside category_id eliminates a join on every page load
Historical records (invoices, receipts)Snapshot data at the time of the event, not the current state
Data warehouses / analyticsStar and snowflake schemas intentionally denormalize for query speed
Caching layersMaterialized views store pre-computed, denormalized results

Example: Denormalized Order History

A fully normalized invoice requires joining orders, order_items, products, and customers. For a frequently accessed "order history" page, you might create a denormalized read table:

CREATE TABLE order_history (
order_id INT,
order_date DATE,
customer_name VARCHAR(100),
customer_email VARCHAR(100),
product_name VARCHAR(100),
quantity INT,
unit_price DECIMAL(10,2),
line_total DECIMAL(10,2)
);

This table duplicates data that exists in normalized tables, but it serves a single page with zero joins and maximum speed.

The Trade-Off

AspectNormalizedDenormalized
Data integrityHigh (single source of truth)Lower (duplicates can diverge)
Write speedFaster (update one place)Slower (update multiple places)
Read speedSlower (joins required)Faster (pre-joined data)
StorageLessMore
Complexity of writesSimpleComplex (must keep copies in sync)
warning

Denormalize only after you have a proven performance problem. Start with a properly normalized schema. Measure query performance. If specific queries are too slow and cannot be improved with indexes or query optimization, then selectively denormalize those specific access patterns.

Premature denormalization is a common mistake that trades long-term maintainability for performance gains you may never actually need.

Safe Denormalization Strategies

Instead of denormalizing your core tables, use these approaches that keep the source of truth normalized:

Materialized Views (PostgreSQL, Oracle, SQL Server):

CREATE MATERIALIZED VIEW order_summary AS
SELECT
o.order_id, o.order_date,
c.customer_name, c.email,
p.product_name, oi.quantity, oi.unit_price
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id;

-- Refresh periodically
REFRESH MATERIALIZED VIEW order_summary;

The normalized tables remain the source of truth. The materialized view is a read-optimized copy that can be refreshed on a schedule.

Computed/Cache Columns:

-- Store a computed total on the orders table for quick access
ALTER TABLE orders ADD COLUMN item_count INT;
ALTER TABLE orders ADD COLUMN computed_total DECIMAL(10,2);

-- Update via trigger or application logic when order_items change

The core normalized structure remains intact, and the cached columns are explicitly understood as derived data.

Designing the ShopSmart Schema from Scratch

Let's apply everything we have learned by designing a complete e-commerce database for a fictional company called ShopSmart. We will start from raw requirements and build a fully normalized schema step by step.

Step 1: Gather Requirements

ShopSmart needs to track:

  • Customers: name, email, phone, shipping address
  • Products: name, description, price, organized by category
  • Categories: hierarchical (e.g., Electronics > Computers > Laptops)
  • Orders: placed by a customer, with a date and status
  • Order Items: which products are in each order, quantity, price at time of purchase

Step 2: Identify Entities

From the requirements, the core entities are:

  1. Customers
  2. Categories
  3. Products
  4. Orders
  5. Order Items (junction between Orders and Products)

Each entity becomes its own table. This is 3NF thinking from the start: each table represents one "thing," and every column describes that thing directly.

Step 3: Design Each Table

::: Different SQL dialects use different syntax for auto-incrementing primary keys.

For example:

  • MySQL → INT PRIMARY KEY AUTO_INCREMENT
  • PostgreSQL → SERIAL PRIMARY KEY or GENERATED ALWAYS AS IDENTITY
  • SQL Server → INT PRIMARY KEY IDENTITY(1,1)
  • SQLite → INTEGER PRIMARY KEY AUTOINCREMENT

Always adjust the syntax to match the database system you are using.

The example below is for SQLite. :::

Customers:

CREATE TABLE customers (
customer_id INTEGER PRIMARY KEY AUTOINCREMENT,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(254) NOT NULL UNIQUE,
phone VARCHAR(20),
street VARCHAR(100),
city VARCHAR(50),
state VARCHAR(50),
postal_code VARCHAR(20),
country CHAR(2) NOT NULL DEFAULT 'US',
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

Every column depends on customer_id and nothing else. No transitive dependencies. The address fields (street, city, state, postal_code, country) all describe where this customer lives. If a customer could have multiple addresses, we would extract those into a separate addresses table, but for simplicity, one address per customer suffices here.

Categories (self-referencing for hierarchy):

CREATE TABLE categories (
category_id INTEGER PRIMARY KEY,
category_name VARCHAR(100) NOT NULL,
parent_id INTEGER REFERENCES categories(category_id),
CONSTRAINT categories_name_parent_uq UNIQUE (category_name, parent_id)
);

The self-referencing parent_id creates a tree structure. A top-level category like "Electronics" has parent_id = NULL. A subcategory like "Computers" has parent_id pointing to "Electronics."

INSERT INTO categories (category_name, parent_id) VALUES ('Electronics', NULL);  -- id=1
INSERT INTO categories (category_name, parent_id) VALUES ('Stationery', NULL); -- id=2
INSERT INTO categories (category_name, parent_id) VALUES ('Computers', 1); -- id=3
INSERT INTO categories (category_name, parent_id) VALUES ('Accessories', 1); -- id=4
INSERT INTO categories (category_name, parent_id) VALUES ('Laptops', 3); -- id=5
Electronics (1)
├── Computers (3)
│ └── Laptops (5)
└── Accessories (4)
Stationery (2)

Products:

CREATE TABLE products (
product_id INTEGER PRIMARY KEY AUTOINCREMENT,
product_name VARCHAR(200) NOT NULL,
description TEXT,
price DECIMAL(10,2) NOT NULL,
category_id INTEGER NOT NULL REFERENCES categories(category_id),
is_active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT products_price_ck CHECK (price >= 0),
CONSTRAINT products_name_uq UNIQUE (product_name)
);

Every column depends directly on product_id. The category_id links to the categories table via foreign key rather than storing the category name here (which would be a transitive dependency).

Orders:

CREATE TABLE orders (
order_id INTEGER PRIMARY KEY AUTOINCREMENT,
customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
order_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
status VARCHAR(20) NOT NULL DEFAULT 'pending',
shipping_street VARCHAR(100),
shipping_city VARCHAR(50),
shipping_state VARCHAR(50),
shipping_postal VARCHAR(20),
shipping_country CHAR(2) DEFAULT 'US',
notes TEXT,
CONSTRAINT orders_status_ck
CHECK (status IN ('pending','confirmed','processing','shipped','delivered','cancelled'))
);
tip

Notice that the shipping address is stored directly on the order, even though the customer already has an address. This is intentional denormalization for a valid reason: the shipping address is a historical snapshot. If the customer moves later, existing orders should still show where they were actually shipped, not the customer's new address.

Order Items (junction table):

CREATE TABLE order_items (
order_item_id INTEGER PRIMARY KEY AUTOINCREMENT,
order_id INTEGER NOT NULL REFERENCES orders(order_id) ON DELETE CASCADE,
product_id INTEGER NOT NULL REFERENCES products(product_id),
quantity INTEGER NOT NULL,
unit_price DECIMAL(10,2) NOT NULL,
CONSTRAINT order_items_qty_ck CHECK (quantity > 0),
CONSTRAINT order_items_price_ck CHECK (unit_price >= 0),
CONSTRAINT order_items_order_product_uq UNIQUE (order_id, product_id)
);

The unit_price captures the price at the time of purchase, which may differ from the product's current price. The unique constraint on (order_id, product_id) prevents the same product from appearing twice in one order.

Step 4: Verify Normalization

Let's check each table against our normal forms:

1NF check: Every column is atomic. No comma-separated values. Every table has a primary key.

2NF check: No composite primary keys are used as the main key (we use surrogate IDENTITY keys). The unique constraint (order_id, product_id) on order_items is not the primary key, so partial dependency is not a concern. All non-key columns depend on the full primary key.

3NF check: Let's trace each table:

  • customers: Every column (name, email, address) depends directly on customer_id. No column determines another non-key column.
  • categories: category_name and parent_id depend directly on category_id.
  • products: product_name, price, description depend on product_id. category_id is a foreign key, not a transitive dependency. The category's name is not stored here.
  • orders: All columns depend on order_id. The shipping address is a snapshot, not a transitive dependency through customer_id.
  • order_items: quantity and unit_price depend on order_item_id. The product name is not stored here (it is accessed via JOIN).

All tables pass all three normal forms.

Step 5: Add Sample Data

INSERT INTO customers (first_name, last_name, email, city, country)
VALUES
('Alice', 'Martin', 'alice@example.com', 'New York', 'US'),
('Bob', 'Jones', 'bob@example.com', 'Chicago', 'US'),
('Carol', 'Smith', 'carol@example.com', 'London', 'GB');

INSERT INTO products (product_name, description, price, category_id) VALUES
('Wireless Mouse', 'Ergonomic wireless mouse', 25.99, 4),
('USB-C Cable', '2m braided cable', 9.99, 4),
('Laptop Pro 15', '15-inch professional laptop', 1299.99, 5),
('Notebook A5', 'Lined notebook, 200 pages', 4.50, 2);

INSERT INTO orders (customer_id, status, shipping_city, shipping_country) VALUES
(1, 'confirmed', 'New York', 'US'),
(2, 'pending', 'Chicago', 'US');

INSERT INTO order_items (order_id, product_id, quantity, unit_price) VALUES
(1, 1, 2, 25.99),
(1, 2, 1, 9.99),
(2, 3, 1, 1299.99);

Step 6: Query the Normalized Schema

Full invoice query (four-table join):

SELECT
o.order_id,
o.order_date,
c.first_name || ' ' || c.last_name AS customer,
p.product_name,
oi.quantity,
oi.unit_price,
(oi.quantity * oi.unit_price) AS line_total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
ORDER BY o.order_id, p.product_name;
order_idorder_datecustomerproduct_namequantityunit_priceline_total
12024-06-15 10:00:00Alice MartinUSB-C Cable19.999.99
12024-06-15 10:00:00Alice MartinWireless Mouse225.9951.98
22024-06-15 10:00:00Bob JonesLaptop Pro 1511299.991299.99

Products by category (self-join for hierarchy):

SELECT
parent.category_name AS parent_category,
child.category_name AS subcategory,
p.product_name,
p.price
FROM products p
JOIN categories child ON p.category_id = child.category_id
LEFT JOIN categories parent ON child.parent_id = parent.category_id
ORDER BY parent.category_name, child.category_name;
parent_categorysubcategoryproduct_nameprice
ComputersLaptopsLaptop Pro 151299.99
ElectronicsAccessoriesWireless Mouse25.99
ElectronicsAccessoriesUSB-C Cable9.99
NULLStationeryNotebook A54.50

The Complete Schema Diagram

┌──────────────┐       ┌──────────────┐
│ categories │ │ customers │
├──────────────┤ ├──────────────┤
│ category_id │◄─┐ │ customer_id │◄──────┐
│ category_name│ │ │ first_name │ │
│ parent_id │──┘ │ last_name │ │
└──────┬───────┘ │ email │ │
│ │ city, ... │ │
│ └──────────────┘ │
│ │
┌──────┴───────┐ ┌──────────────┐ │
│ products │ │ orders │ │
├──────────────┤ ├──────────────┤ │
│ product_id │◄──┐ │ order_id │◄──┐ │
│ product_name │ │ │ customer_id │───┘ │
│ price │ │ │ order_date │ │
│ category_id │───┘ │ status │ │
│ is_active │ │ shipping_* │ │
└──────────────┘ └──────┬───────┘ │
│ │
┌──────┴───────┐ │
│ order_items │ │
├──────────────┤ │
│ order_item_id│ │
│ order_id │───────┘
│ product_id │
│ quantity │
│ unit_price │
└──────────────┘

Common Normalization Mistakes

Mistake 1: Storing Derived Data That Can Be Calculated

-- Bad: storing line_total when it equals quantity * unit_price
CREATE TABLE order_items (
order_id INTEGER,
product_id INTEGER,
quantity INTEGER,
unit_price DECIMAL(10,2),
line_total DECIMAL(10,2) -- redundant!
);

If quantity or unit_price is updated but line_total is not, the data becomes inconsistent. Compute it in your query instead:

SELECT quantity * unit_price AS line_total FROM order_items;
info

Storing computed totals on the order header (like orders.total_amount) is a common exception, especially for performance on frequently accessed summaries. If you do this, update it via triggers or application logic whenever order items change.

Mistake 2: Repeating Groups as Numbered Columns

-- Bad: what if a customer has 4 phone numbers?
CREATE TABLE customers (
customer_id INTEGER PRIMARY KEY,
name VARCHAR(100),
phone1 VARCHAR(20),
phone2 VARCHAR(20),
phone3 VARCHAR(20)
);

This violates the spirit of 1NF. Create a separate table:

CREATE TABLE customer_phones (
phone_id INTEGER PRIMARY KEY,
customer_id INTEGER REFERENCES customers(customer_id),
phone_type VARCHAR(20),
phone_number VARCHAR(20) NOT NULL
);

Mistake 3: Over-Normalizing

Normalization can be taken too far. Splitting a customers table into customer_names, customer_emails, and customer_cities tables creates unnecessary joins with no benefit, because all of those attributes depend directly on customer_id and do not cause update anomalies.

Over-normalized (unnecessary):

CREATE TABLE customer_names (customer_id INTEGER PK, first_name VARCHAR, last_name VARCHAR);
CREATE TABLE customer_emails (customer_id INTEGER PK, email VARCHAR);
CREATE TABLE customer_cities (customer_id INTEGER PK, city VARCHAR);

Properly normalized (correct):

CREATE TABLE customers (
customer_id INTEGER PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(254),
city VARCHAR(50)
);

All columns depend on customer_id with no partial or transitive dependencies. Splitting them gains nothing.

Quick Reference

Normal FormProblem It SolvesHow to Fix
1NFMulti-valued columns, missing keysOne value per cell; add a primary key
2NFPartial dependency on composite keyExtract partially dependent columns into new tables
3NFTransitive dependency (non-key depends on non-key)Extract transitively dependent columns into new tables

When to denormalize:

  • Proven read performance bottleneck that indexes cannot solve
  • Historical snapshots (prices at time of purchase, shipping addresses)
  • Reporting tables or materialized views separate from the transactional schema
  • Data warehouses designed for analytics, not transactions

Summary

Database normalization is the process of structuring tables to eliminate redundancy, prevent update anomalies, and ensure every fact is stored in exactly one place. The three normal forms build progressively:

  • 1NF requires atomic values and unique rows. No comma-separated lists, no multi-valued columns.
  • 2NF eliminates partial dependencies by ensuring every non-key column depends on the entire primary key, not just part of it.
  • 3NF eliminates transitive dependencies by ensuring every non-key column depends directly on the primary key, not through another non-key column.

Most production databases should target 3NF for transactional tables. Denormalization should be a conscious, measured decision made only when you have a proven performance need, and preferably implemented through materialized views or cache tables rather than corrupting your core normalized schema.

The ShopSmart walkthrough demonstrated how normalization works in practice: identify entities, give each its own table, connect them with foreign keys, and verify that every column describes its table's primary key directly. The result is a schema where each fact lives in one place, updates are simple, and the database protects its own integrity.