Database Normalization

What is Normalization?

In this article, we'll introduce the concept of normalization and have a brief look at the most common normal forms. Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships among them. Normalization is a bottom-up technique for database design. The inventor of the relational model Edgar Codd proposed the theory of normalization with the introduction of FirstNormal Form and he continued to extend theory with Second and Third Normal Form. Later he joined with Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form.

The theory of Normalization is still being developed further. For example there are discussions even on 6th Normal Form. But in most practical applications normalization is best achieved in the 3rd Normal Form.

The evolution of Normalization theories is illustrated below:

Normalization.jpg

Let's learn Normalization with a practical example.

Assume a book library maintains a database of Books Issued. Without any normalization, all the information could be stored in one table as shown in the following

SALUTATION CUSTOMER NAME CITY BOOK ISSUED
MR. RAJ BANGALORE LET US C,ORACLE DATABSE,
MISS. PRIYA CHENNAI PROGRAMMING WITH JAVA,C++ PROGRAMMING
MR. RAJ DELHI DBA FUNDAMENTALS, ORACLE PROGRAMMING

Here you see the Book Issued column has multiple values.

Now let's move in to the 1st Normal Form.

First Normal Form (1NF)

The First Normal Form (1NF) sets the very basic rules for an organized database:

  • Each table cell should contain a single value.
  • Each record needs to be unique. OR
  • It contains atomic values only.
SALUTATION CUSTOMER NAME CITY Book Issued
MR. Raj BANGALORE LET US C
MR. Raj BANGALORE ORACLE DATABSE
Miss Priya CHENNAI PROGRAMMING WITH JAVA
Miss Priya CHENNAI C++ PROGRAMMING
MR. Raj DELHI DBA FUNDAMENTALS
MR. Raj DELHI ORACLE PROGRAMMING

Note that now there is only one book in the Book Issued column for each record.

Second Normal Form (2NF)

  • Be in 1NF
  • Single Column Primary Key
  • Rule 3. Remove subsets of data that apply to multiple rows of a table and place them in separate tables
  • Rule 4. Create relationships among the new tables OR
  • If it is in 1NF and every non-key attribute is dependent on the primary key
MEMBERSHIP ID SALUTATION CUSTOMER NAME CITY
1 MR. RAJ BANGALORE
2 MISS. PRIYA CHENNAI
3. MR. RAJ DELHI

TABLE 1

BOOK ID MEMBERSHIP ID BOOK ISSUED
1 1 LET US C
2 1 ORACLE DATABASE
3 2 PROGRAMMING WITH JAVA
4 2 C++ PROGRAMMING
5 3 ORACLE PROGRAMMING
6 3 DBA FUNDAMENTALS

TABLE 2

We have divided our 1NF table into two tables, Table 1 and Table2. Table 1 contains member information. Table 2 contains the information for each book. Introduced a new field called Membership_id that is the primary key field for table 1. Records are uniquely identified in Table 1 using membership id. Another new column is book_id that is the primary key field for table 2. Records can be uniquely identified in Table 2 using book id.

Third Normal Form (3NF)

  • Be in 2NF
  • Has no transitive functional dependencies
  • Remove columns that are not fully dependent upon the primary key. OR
  • If it is in 2NF and every non-key attribute is non-transitively dependent on the primary key

What has to be determined is "is field A dependent upon field B or vice versa?" This means: "Given a value for A, do we then have only one possible value for B, and vice versa?" If the answer is yes, then A and B should be put into a new relation, with A becoming the primary key. A should be left in the original relation and marked as a foreign key.

What is a transitive functional dependency?

A transitive functional dependency is that changing a non-key column might cause any of the other non-key columns to change. Consider table 1. Changing the non-key column Full Name may change Salutation.

MEMBERSHIP ID SALUTATION ID CUSTOMER NAME CITY
1 1 RAJ BANGALORE
2 2 PRIYA CHENNAI
1 1 RAJ DELHI

TABLE 1

BOOK ID MEMBERSHIP ID BOOK ISSUED
1 1 LET US C
2 1 ORACLE DATABASE
3 2 PROGRAMMING WITH JAVA
4 2 C++ PROGRAMMING
5 3 ORACLE PROGRAMMING
6 3 DBA FUNDAMENTALS

TABLE 2

ID SALUTATION NAME
1 MR.
2 MISS
3 MRS.
4 DR.

TABLE 3

Now, there are no transitive functional dependencies.

Here, we have again divided our tables and created a new table that stores Salutations Details.

With that, now our database table is in 3NF.

Summary of Normalization Rules

That is the complete process. Having started off with an unnormalised table we finished with four normalized tables in 3NF. You will notice that duplication has been removed (apart from the keys needed to establish the links among those tables).

The process may look complicated. However, if you follow the rules completely, and do not miss any steps, then you should arrive at the correct solution. If you omit a rule there is a high probability that you will end up with too few tables or incorrect keys.

The following normal forms were discussed in this section

  1. First normal form: A table is in the first normal form if it contains no repeating columns.
  2. Second normal form: A table is in the second normal form if it is in the first normal form and contains only columns that are dependent on the entire (primary) key.
  3. Third normal form: A table is in the third normal form if it is in the second normal form and all the non-key columns are dependent only on the primary key. If the value of a non-key column is dependent on the value of another non-key column then we have a situation known as transitive dependency. This can be resolved by removing the columns dependent on non-key items to another table.

Now our database is in a level that cannot be further decomposed to attain higher forms of normalization. Proceeding to further levels of normalization is normally needed in complex databases. However we will be discussing the next levels of normalizations in brief in the following.

Boyce-Codd Normal Form (BCNF)

The Boyce-Codd Normal Form, also referred to as the "third and half (3.5) normal form

  • Satisfies all the requirements of the third normal form.
  • Every determinant must be a candidate key.

4th Normal Form

  • Satisfies all the requirements of the third normal form.
  • A relation is in 4NF if it has no multi-valued dependencies.

5th Normal Form

  • Satisfies all the requirements of the fourth normal form.
  • It cannot be decomposed into any number of smaller tables without loss of data.

6th Normal Form

  • Satisfies all the requirements of the Fifth normal form.
  • Hopefully we will have a clearly defined standard for the 6th Normal Form in the near future.

Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.


Similar Articles