11 Important Database Designing Rules

Introduction

Before you start reading this article, let me confirm that I am not a guru in database design. I think it has helped me a lot when it comes to DB designing. The 11 points listed below are points I have learned via projects, my own experiences, and my reading. Any criticism is welcome.

The reason why I am writing a full-blown article is when developers sit to design a database; they tend to follow the three normal forms like a silver bullet. They tend to think normalization is the only way of designing. Due to this mindset, they sometimes hit roadblocks as the project moves ahead.

Article Overview

  • Rule 1 - What is the Nature of the application(OLTP or OLAP)?
  • Rule 2 - Break your data into logical pieces, make life simpler
  • Rule 3 - Do not get overdosed with rule 2
  • Rule 4 - Treat duplicate non-uniform data as your most significant enemy
  • Rule 5 - Watch for data separated by separators.
  • Rule 6 - Watch for partial dependencies.
  • Rule 7 - Choose derived columns preciously.
  • Rule 8 - Do not be hard on avoiding redundancy, if performance is the Key
  • Rule 9 - Multidimensional data is a different beast altogether
  • Rule 10 - Centralize name value table design
  • Rule 11 - For unlimited hierarchical data, self-reference PK and FK

a1.jpg

Courtesy - Image from Motion pictures

Said and done normalization rules are essential guidelines but taking them as a mark in stone is calling for trouble. Below are my own 11 rules, which I remember on the top head while doing DB design.

Rule 1. What is the Nature of the application(OLTP or OLAP)?

When you start your database design, the first thing to analyze is the Nature of the application you are designing for. Is it Transactional or Analytical? You will find many developers, by default applying normalization rules without thinking about the Nature of the application and then later getting into performance and customization issues. As said, there are two kinds of applications transaction based and analytical-based; let's understand what these types are.

  1. Transactional - In this application, your end user is more interested in CRUD, i.e., Creating, reading, updating, and deleting records. The official name for such type of database is called as OLTP.
  2. Analytical - In these kinds of applications, your end user is more interested in Analysis, reporting, forecasting, etc. These kinds of databases have less number of inserts and updates. The main intention here is to fetch and analyze data as fast as possible. The official name for such kinds of databases is OLAP.

a2.jpg

So, in other words, if you think inserts, updates, and deletes are more prominent, then go for normalized table design, or else create a flat denormalized database structure.

Below is a simple diagram showing how the names and address on the left-hand side is a simple normalized table and how we have created a flat table structure by applying a denormalized structure.

a3.jpg

Rule 2. Break your data into logical pieces, and make life simpler

This rule is the 1st rule from 1st normal formal. One of the signs of violation of this rule is if your queries are using too many string parsing functions like substring, char index, etc., this rule probably needs to be applied.

For instance, you can see the below table, which has student names; if you ever want to query a student named who has "Koirala" and not "Harisingh," you can imagine what kind of query you can end up with.

So the better approach would be to break this field into other logical pieces so we can write clean and optimal queries.

a4.jpg

Rule 3. Do not get overdosed with rule 2

Developers are cute creatures. If you tell them this is the way, they keep doing it; well, they overdo it leading to unwanted consequences. This also applies to rule 2, which we just talked about above. When you think about the decomposing, pause and ask yourself if it is needed; as said, the decomposition should be logical.

For instance, you can see the phone number field; you will rarely operate on ISD codes of phone numbers separately(Until your application demands it). So it would be wise to leave it as it can lead to more complications.

a5.jpg

Rule 4. Treat duplicate non-uniform data as your most significant enemy

Focus and refactor duplicate data. My worry about duplicate data is not that it takes hard disk space but the confusion it creates.

For instance, in the below diagram, you can see "5th Standard" and "Fifth standard" means the same. Now you can say due to wrong data entry or poor validation. Now, if you ever want to derive a report, they would show them as different entities, which is very confusing from an end-user point of view.

a6.jpg

One of the solutions would be to move the data into a different master table altogether and refer them via foreign keys. You can see in the below figure how we have created a new master table called "Standards" and linked the same using a simple foreign key.

a7.jpg

Rule 5. Watch for data separated by separators

The second rule of 1st normal form says to avoid repeating groups. One of the examples of repeating groups is explained in the below diagram. If you see the syllabus field closely, we have too much data stuffed in one field. These kinds of fields are termed "Repeating groups." If we have to manipulate this data, the query would be complex, and I doubt the performance of the queries.

a8.jpg

These kinds of columns with data stuffed with separators need special attention, and a better approach would be to move that field to a different table and link the same with keys for better management.

aa9.jpg

So now let's apply the second rule of 1st normal form "Avoid repeating groups." In the above figure, I created a separate syllabus table and then made a many-to-many relationship with the subject table.

With this approach, the syllabus field in the main table is no more repeated and has data separators.

Rule 6. Watch for partial dependencies

aa10.jpg

Watch for fields which are depending partially on primary keys. For instance, in the above table, we can see the primary Key is created on roll number and standard. Now watch the syllabus field closely. The syllabus field is associated with a standard and not with a student directly (roll number).

A syllabus is associated with the standard in which the student is studying and not directly with the student. So if tomorrow we want to update the syllabus, we have to update it for each student, which is painstaking and not logical. Moving these fields out and associating them with the standard table makes more sense.

You can see how we have moved the syllabus field and attached the same to the standards table.

This rule is nothing but the second normal form "All keys should depend on the full primary key and not partially."

Rule 7. Choose derived columns preciously

a11.jpg

If you are working on OLTP applications, getting rid of deriving columns would be a good thought until there is some pressing reason for performance. In the case of OLAP, where we do a lot of summations and calculations, these fields are necessary to gain performance.

In the above figure, you can see how the average field is dependent on marks and subject. So for such kinds of areas derived from other fields, think about whether they are essential. This is also one form of redundancy.

This rule is also called the 3rd normal form "No columns should depend on other non-primary key columns." My thought does not to apply this rule blindly to see the situation; it's not that redundant data is always wrong. If the redundant data is calculative data, see the situation and decide if you want to implement the third normal form.

Rule 8. Do not be hard on avoiding redundancy, if performance is the Key

a12.jpg

Do not make it a strict rule that you will always avoid redundancy. If there is a pressing need for performance, think about denormalization. In normalization, you need to make joins with many tables, and in denormalization, the joins reduce and thus increase performance.

Rule 9. Multidimensional data is a different beast altogether

OLAP projects mainly deal with multidimensional data. For instance, you can see the figure below; you would like to get sales per country, customer, and date. Simply put, you are looking at sales figures with three intersections of dimension data.

a13.jpg

For such situations, a dimension and fact design is a better approach. Simply put, you can create a central sales fact table with the sales amount field, and he connects all dimension tables using a foreign key relationship.

a14.jpg

a15.jpg

Rule 10. Centralize name value table design

Many times I have come across name-value tables. Name and value tables mean it has a key and some data associated with the Key. For instance, in the figure below, we have a currency table and a country table. If you watch the data closely, they only have Key and value.

a16.jpg

Creating one central table and differentiating the data by using a type field makes more sense for such kind of table.

Rule 11. For unlimited hierarchical data self-reference PK and FK

Many times we come across data with an infinite parent-child hierarchy. For instance, consider a Multi-level marketing scenario where one salesperson can have multiple salespeople below them. Using a self-referencing primary and foreign key for such systems will help achieve the same.

a17.jpg

This article is not meant to say that do not follow normal forms but do not follow them blindly, look at your project's Nature and the type of data you are dealing with.

a18.jpg

Conclusion

This article taught us about 11 crucial database design rules in detail. Happy reading.


Similar Articles