Understanding Normalization in Database Design

This article is intended to the audiences who are aware of designing the database but are not sure about the Normalization process.

What is Normalization?
When we want to store the some piece of information, we store it in database. Now storing a data in database mean, it should be stored properly so that while retrieving it, it should be easy. So we say store it in Normalized way. In short, Normalization can be defined as the process of organizing the data in database efficiently. The result of normalization is a logical database design and is called as Normal Form.

Why Normalization?
Goals of Normalization process are:

  1. It helps you to eliminate the redundant data from same table.
  2. It ensures the data dependencies between the tables are proper.
  3. A Normalized database design makes it easy to change in modification is required.

Advantages of Normalization

  1. Data redundancy is removed
  2. Faster update as redundant columns from a tables are removed.
  3. Easy understanding of structure
  4. Improvement in Index as be achieved
  5. Long term maintainability of database get easier

Disadvantages of Normalization

  1. Query to some extent get complicated
  2. Performance may degrade due to multiple joins

Suppose I have some data with me, say

UnNormalized Table



As you can see above, table is not properly managed

Normalization process is mainly divided in to stages which we call as Normal Form. Let’s talk about each Normal Form one by one with an example. Basically a database can be normalized into various normal forms such as-

First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce Codd Normal Form (BCNF)
and Fourth Normal Form (4NF) and so on.
But today I would like to talk about only up to Boyce Codd Normal Form because Fourth Normal Form and others is rarely used in the database design.

First Normal Form (1NF)
It says,
  • Eliminate all repeating groups in individual table
  • One cell should contain only one data
  • Create a separate table for each set of related data
  • Identify each set of related data with a primary key

 

As you can see above,
  • Our UnNormalized table does not have repeating group. So this is not applicable over here.
  • We have eliminated the comma separated values and shifted to another table Skill. Each employee in Employee table is related to Skill table through EmployeeId column (Note: It is not a foreign key relationship. We are just replacing the comma separated values in each cell in to rows and creating separate table for it)
  • Each cell is containing single value
  • Skills of table is identified with primary key

Second Normal Form (2NF)
It says,

  • Tables should be in First Normal Form (1NF)
  • Eliminate partial primary key dependencies
  • Non key column must depend on the entire composite primary key and create separate tables for sets of values that apply to multiple records
  • Relate these tables with a foreign key

 

As you can see above,
  • We have eliminated the partial primary key dependency of EmployeeId and created a new table which will contain the relation of EmployeeId and SkillId. This both columns will be the respective foreign key for Employee and Skill table.

Third Normal Form (3NF)

  • Table should be in Second Normal Form
  • Eliminate fields that are not dependent on key i.e. - Eliminate Transitive Dependencies. Create a separate table for it.

Let’s suppose I want to add new columns in Employee Table Manager and Project



Now, adding this 2 column violates the third Normal Form because the non key column Project is dependent on another not key column i.e. - Manager.
So we will normalize the table by separating nondependent column to another table. This way we can achieve third Normal Form.



Boyce Codd Normal Form (BCNF)
It says,

  • BCNF is based on the concept of a determinant.
  • A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent.
  • A relation is in BCNF is, and only if, every determinant is a candidate key.
  • The same as 3NF except in 3NF we only worry about non-key attributes

Note: If there is only one candidate key then 3NF and BCNF are the same.

Consider another column in below table say ProjectTechnology



As shown above, each manager will be handling unique project, so we can say particular project is determined by particular manger where Manager and Project depicts a candidate key.

If we delete the entry of Manager Ronnie from the table we lose not only information of Project BCF but also the fact that project was developed in Asp.Net technology. We cannot make the entry of the fact that BCF project was developed using Asp.Net.

So let’s break this into separate tables



Fourth Normal Form (4NF)
It says,

  • Database design should follow the 1NF, 2NF, 3NF and BCNF if possible
  • There must not be more than one multivalued dependencies other than a candidate key.

I hope you got it what actually is the process of Normalization.

Conclusion

Thus concluding it, Normalization is a process which is a must to design any database. Hope you like this article. Please share your comments whether it’s good or bad. Your comments are valuable to me to get better.