Chapter 4 - Normalizing a Data Model

Posted by Addison Wesley Free Book | C# Language September 01, 2009
Data normalization is probably one of the most talked-about aspects of database modeling. Before building your data model, you must answer a few questions about normalization. These questions include whether or not to use the formal normalization forms, which of these forms to use, and when to denormalize.

Second Normal Form (2NF)

Second normal form (2NF) specifies that, in addition to meeting 1NF, all non-key attributes have a functional dependency on the entire primary key. A functional dependency is a one-way relationship between the primary key attribute (or attributes) and all other non-key attributes in the same entity. Referring again to Table 4.3, if ArtistName is the primary key, then all other attributes in the entity must be identified by ArtistName. So we can say, "ArtistName determines ReleaseDate" for each instance in the entity. Notice that the relationship does not necessarily hold in the reverse direction; any genre may appear multiple times throughout this entity. Nonetheless, for any given artist, there is one genre. But what if an artist crosses over to another genre?

To answer that question, let's compare 1NF to 2NF. In 1NF, we have no repeating groups, and all attributes have a single value. However, in 1NF, if we have a composite primary key, it is possible that there are attributes that rely on only one of the primary key attributes, and that can lead to strange data manipulation anomalies. Take a look at Table 4.5, in

Table 4.5 Artists: 1NF Is Met, but with Problems

PK-Genre Signed Date Agent Agent Primary Phone Agent Secondry Phone
The Awkward Stage Rock 9/01/2005 John Doe (777)555-1234 NULL
Girth Metal 10/31/1997 Sally Sixpack (777)555-6789 (777)555-0000
Wasabi Peanuts Adult
1/01/2005   John Doe Null
The Bobby
Jenkins Band
R&B 3/15/1985   Johnny
The Bobby
Jenkins Band
Soul 3/15/1985   Johnny
Juices of Brazil Latin Jazz 6/01/2001   Jan
Juices of Brazil World Beat 6/01/2001   Jan

which we have solved the multiple genre problem. But we have added new attributes, and that presents a new problem.

In this case, we have two attributes in the primary key: Artist Name and Genre. If the studio decides to sell the Juices of Brazil albums in multiple genres to increase the band's exposure, we end up with multiple instances of the group in the entity, because one of the primary key attributes has a different value. Also, we've started storing the name of each band's agent. The problem here is that the Agent attribute is an attribute of the artist but not of the genre. So the Agent attribute is only partially dependent on the entity's primary key. If we need to update the Agent attribute for a band that has multiple entries, we must update multiple records or else risk having two different agent names listed for the same band. This practice is inefficient and risky from a data integrity standpoint. It is this type of problem that 2NF eliminates.

Tables 4.6 and 4.7 show one possible solution to our problem. In this case, we can break the entity into two different entities. The original entity still contains only information about our artists; the new entity contains information about agents and the bands they represent. This technique removes the partial dependency of the Agent attribute from the original entity, and it lets us store more information that is specific to the agent.

Table 4.6 Artists: 2NF Version of This Entity

PK-Artist Name PK-Genre SignedDate
The Awkward Stage Rock 9/01/2005
Girth Metal 10/31/1997
Wasabi Peanuts Adult Contemporary Rock 1/01/2005
The Bobby Jenkins Band R&B 3/15/1985
The Bobby Jenkins Band Latin Jazz 6/01/2001
  World Beat 6/01/2001

Table 4.7 Agents: An Additional Entity to Solve the Problem

PK-Agent Name Artist Name Agent
John Doe The Awkward Stage 555-1234 Null
Sally Sixpack Girth  (777)555-6789 (777)555-0000
Johnny Jenkins The Bobby Jenkins Band (444)555-1111 Null
Jane Doe Juices of Brazil 555-4321 555-9999

Total Pages : 7 12345