FREE BOOK

Chapter 4 - Normalizing a Data Model

Posted by Addison Wesley Free Book | C# Language September 01, 2009
Data normalization is probably one of the most talked-about aspects of database modeling. Before building your data model, you must answer a few questions about normalization. These questions include whether or not to use the formal normalization forms, which of these forms to use, and when to denormalize.

Fourth Normal Form (4NF) and Fifth Normal Form (5NF)

You've seen that 3NF generally solves most logical problems within databases.However, there are more-complicated relationships that often benefit from 4NF and 5NF. Consider Table 4.8, which describes an alternative, expanded version of the Agents entity.

Table 4.8 Agents: More Agent Information

PK-Agent Name PK-Agency PK-Artist Name Agent PrimaryPhone Agent SecondryPhone
John Doe AAA Talent The Awkward
Stage
(777)555-1234 Null
Sally Sixpack A Star Is Born
Agency
Girth (777)555-6789 (777)555-0000
John Doe AAA Talent Wasabi Peanuts (777)555-1234 Null
Johnny Jetkins Johnny
Jetkins Talent
The Bobby
Jetkins Band
(444)555-1111 Null
John Doe BBB Talent Juices of Brazil (777)555-4321 (777)555-9999

Specifically, this entity stores information that creates redundancy, because there is a multivalued dependency within the primary key. A multivalued dependency is a relationship in which a primary key attribute, because of its relationship to another primary key attribute, creates multiple tuples within an entity. In this case, John Doe represents multiple artists. The primary key requires that the Agent Name, Agency, and Artist Name uniquely define an agent; if you don't know which agency an agent works for and if an agent quits or moves to another agency, updating this table will require multiple updates to the primary key attributes. There's a secondary problem as well: we have no way of knowing whether the phone numbers are tied to the agent or tied to the agency. As with 2NF and 3NF, the solution here is to break Agency out into its own entity. 4NF specifies that there be no multivalued dependencies in an entity.

Consider Tables 4.9 and 4.10, which show a 4NF of these entities.

TABLE 4.9 Agent-Only Information

PK-Agent Name Agent PrimaryPhone Agent SecondryPhone Artist Name
John Doe (777)555-1234 Null The Awkward
Stage
Sally Sixpack (777)555-6789 (777)555-0000 Girth
John Doe (777)555-1234 Null Wasabi Peanuts
Johnny Jetkins (444)555-1111 Null The Bobby Jetkins Band
Jane Doe (777)555-4321 (777)555-9999 Juices of Brazil

Now we have a pair of entities that have relevant, unique attributes that rely on their primary keys. We've also eliminated the confusion about the phone numbers.

Often, databases that are being normalized with the target of 3NF end up in 4NF, because this multivalued dependency problem is inherently obvious when you properly identify primary keys. However, the 3NF version of these entities would have worked, although it isn't necessarily the most efficient form.

Now that we have a number of 3NF and 4NF entities, we must relate these entities to one another. The final normal form that we discuss is fifth normal form (5NF). 5NF specifically deals with relationships among three or more entities, often referred to as tertiary relationships. In 5NF, the entities that have specified relationships must be able to stand alone as individual entities without dependence on the other relationships. However, because the entities relate to one another, 5NF usually requires a physical entity that acts as a resolution entity to relate the other entities to one another. This additional entity has three or more foreign keys (based on the number of entities in the relationship) that specify how the entities relate to one another. This is how many-to-many relationships (as defined in Chapter 2) are actually implemented. Thus, if a many-to-many relationship is properly implemented, the database is in 5NF.

Frequently, you can avoid the complexity of 5NF by properly implementing foreign keys in the entities that relate to one another, so 4NF plus these keys generally avoids the physical implementation of a 5NF data model. However, because this alternative is not always realistic, 5NF is defined to help formalize this scenario.

Total Pages : 7 34567

comments