Online College Courses for Credit



Author: Sophia Tutorial

Identify situations that require denormalization to generate information efficiently.

See More
Fast, Free College Credit

Developing Effective Teams

Let's Ride
*No strings attached. This college course is 100% free and is worth 1 semester credit.

47 Sophia partners guarantee credit transfer.

299 Institutions have accepted or given pre-approval for credit transfer.

* The American Council on Education's College Credit Recommendation Service (ACE Credit®) has evaluated and recommended college credit for 33 of Sophia’s online courses. Many different colleges and universities consider ACE CREDIT recommendations in determining the applicability to their course and degree programs.


what's covered
This tutorial explores the need to denormalized a database.

Although we have mostly focused on normalization to the third normal form, there are instances where we may want to denormalize a database as well. Although an optimal transactional database should be at least in 3NF, the reduction of data redundancy is only one of the main goals. As we discussed and have seen in the last few tutorials, the further we normalize, the more tables are created. As such, to generate useful information from the database, we have to join the tables together. The more joins we have, the more input/output operations and processing that’s required. Although most databases can handle this process quite effectively, this can become more of a challenge for databases that are much larger in size.

There are some data anomalies that may not make sense to split off. For example, in the US, the zip code can define the city and state which through normalization rules we may have split up in a separate table to use as a reference table. However, would that make sense to incorporate to avoid the redundancy of the city and state? In some situations, it may make sense, but it may not in others. Even looking at our database in PostgreSQL, we will see that the city/state/zip combination exists in the invoice, customer, and employee tables. Keeping it in a single table would introduce some redundant data in the data model. This would be an example where we may want to avoid those extra join conditions.

You can also have situations where the pre-aggregated or derived data from tables to also help avoid those extra join conditions. The invoice table with the total is a great example of this. The total could be calculated for the invoice by adding the total of the quantity and unit_price of each row from the invoice. However, if we needed to use the invoice total in different situations, we would have to calculate it each time which may not make sense. We could also have a temporary denormalized table with data stored in a format where we may have repeating groups of columns. This type of query may be impossible to generate the data required purely through SQL and may need other programming languages to store that information.

With data warehouses for business intelligence, we may also have denormalized data since the data has gone through a transactional database. For analytical purposes, we are worried more so about performance rather than data redundancy or data anomalies. If we do not need to worry about the constant data insert, update or deletion, denormalization will not be an issue.

Denormalization takes a database in third normal form to add redundancy for the sake of performance.