Normalization is something that we have brought up a few times in prior tutorials, but it is time to dive into the details about this important process when it comes to database design. Normalization helps database designers to focus on evaluating and correcting issues with table structures to help minimize the amount of data redundancy in our tables. Doing so will also help reduce the chance of potential data anomalies. The normalization process works through a series of stages or steps called normal forms.
The first three normal forms are defined as first normal form (1NF), second normal form (2NF) and third normal form (3NF). From a table structure perspective, second normal form is better than first normal form, and third normal form is better than second normal form. Although there are additional normal forms, third normal form is as high as we would need to go with normalization. We just brought up the fact that third normal form is better than second, so you may be wondering why we wouldn’t simply go to even higher normal forms. It’s important to note that the higher the normal form, the more join operations we must perform as with each normal form, we generally would add more tables to reduce redundancy. As important as reducing data redundancy is, it is only important to a point as we have to ensure that the database design and system offer fast performance. There may be situations where we may even need to denormalize our database (going down in normal form) to help improve performance while adding data redundancy.
Normalization is typically performed when we are first designing the database and takes place between the conceptual and logical data model creation. Once we have the basic conceptual model created, we can then go through each stage of normalization and determine if the structure can be improved. It’s important to note that normalization is not done on its own but rather, we need to ensure that we identify the business rules of the organization as well as the data constraints first. We will also determine what the functional dependencies are, the entities, the relationships between the entities as well as eliminate any multi-valued attribute. Then we can use normalization to validate and refine the data model for the database. The main purpose of going through the stages of normalization is to eliminate data redundancy where needed and eliminate any data anomalies.
Let us go back to our movie rating database. If this had been set up in a spreadsheet, it may look something like this:
We can see some issues here as the list of actors for the same movie seem to be different since it is entered in by the user. The genres also have the same issue with Titanic being listed in Drama by one user, Romance by another user and Drama and Romance by another. Throughout the upcoming tutorials, we will step through the normalization process to define the final normalized database structure.