Although we have mostly focused on normalizing to third normal form, there are situations where we may want to denormalize a database instead. An optimal transactional database should be in at least 3NF, but reducing data redundancy is only one of its goals. As we discussed and have seen over the last few tutorials, the further we normalize, the more tables we create. To generate useful information from the database, we then have to join those tables back together, and the more joins a query requires, the more input/output operations and processing it demands. Most databases handle joins quite effectively, but this can become more of a challenge as databases grow much larger.
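To make that join cost concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL, with hypothetical table and column names modeled loosely on the tutorial database. Even a simple question already requires joining three tables in the normalized design:

```python
import sqlite3

# Toy schema normalized to 3NF (hypothetical names):
# customer -> invoice -> invoice_line
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoice  (invoice_id INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customer(customer_id));
CREATE TABLE invoice_line (invoice_id INTEGER REFERENCES invoice(invoice_id),
                           quantity INTEGER, unit_price REAL);
INSERT INTO customer VALUES (1, 'Alice');
INSERT INTO invoice  VALUES (10, 1);
INSERT INTO invoice_line VALUES (10, 2, 1.50), (10, 1, 2.25);
""")

# Answering "how much has each customer spent?" already needs two joins.
rows = conn.execute("""
    SELECT c.name, SUM(il.quantity * il.unit_price) AS spent
    FROM customer c
    JOIN invoice i       ON i.customer_id = c.customer_id
    JOIN invoice_line il ON il.invoice_id = i.invoice_id
    GROUP BY c.name
""").fetchall()
print(rows)  # [('Alice', 5.25)]
```

Each additional join is more work for the query planner and more data movement at execution time, which is exactly the cost denormalization tries to trade away.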
There are some redundancies that may not make sense to split off. For example, in the US a zip code determines the city and state, so strict normalization rules would have us move city and state into a separate reference table keyed by zip code. But would that be worth it just to avoid repeating the city and state? In some situations it may make sense, but in others it may not. Looking at our database in PostgreSQL, we will see that the city/state/zip combination exists in the invoice, customer, and employee tables. Keeping those columns in each table introduces some redundant data into the model, but this is an example where we may want to accept that redundancy to avoid the extra join conditions.
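The trade-off can be sketched side by side. This example (again using sqlite3, with hypothetical table names) contrasts a fully normalized design, where a zip-code reference table must be joined in just to display an address, with the denormalized design that repeats city and state on each row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized design: zip code determines city and state, so they live
-- in a reference table and every address lookup joins to it.
CREATE TABLE zip_ref (zip TEXT PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE customer_norm (customer_id INTEGER PRIMARY KEY,
                            name TEXT, zip TEXT REFERENCES zip_ref(zip));

-- Denormalized design: city/state repeated on the customer row itself.
CREATE TABLE customer_denorm (customer_id INTEGER PRIMARY KEY,
                              name TEXT, city TEXT, state TEXT, zip TEXT);

INSERT INTO zip_ref VALUES ('94103', 'San Francisco', 'CA');
INSERT INTO customer_norm VALUES (1, 'Alice', '94103');
INSERT INTO customer_denorm VALUES (1, 'Alice', 'San Francisco', 'CA', '94103');
""")

# Normalized: one extra join just to show an address.
norm = conn.execute("""
    SELECT c.name, z.city, z.state
    FROM customer_norm c JOIN zip_ref z ON z.zip = c.zip
""").fetchall()

# Denormalized: a plain single-table read, at the cost of repeated data.
denorm = conn.execute(
    "SELECT name, city, state FROM customer_denorm").fetchall()
print(norm == denorm)  # same result, different storage trade-off
```

Both designs return the same rows; the question is whether the repeated city/state values cost more than the join they eliminate, and the answer depends on how the tables are queried and updated.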
You can also have situations where storing pre-aggregated or derived data helps avoid those extra join conditions. The total on the invoice table is a great example of this. The total could be derived by multiplying the quantity by the unit_price of each line on the invoice and summing the results. However, if we needed the invoice total in many different situations, recalculating it each time may not make sense. We could also build a temporary denormalized table that stores data with repeating groups of columns; generating that kind of result purely through SQL may be difficult or impossible, and we may need another programming language to assemble and store that information.
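The derived-total idea can be sketched as follows (sqlite3 again, hypothetical table names): the total is calculated once when the invoice is written, so later reads are a simple column lookup instead of a repeated aggregation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY,
                      total REAL);  -- derived value, stored denormalized
CREATE TABLE invoice_line (invoice_id INTEGER REFERENCES invoice(invoice_id),
                           quantity INTEGER, unit_price REAL);
INSERT INTO invoice_line VALUES (10, 2, 1.50), (10, 1, 2.25);

-- Derive the total once at write time instead of on every read.
INSERT INTO invoice
    SELECT invoice_id, SUM(quantity * unit_price)
    FROM invoice_line GROUP BY invoice_id;
""")

# Without the stored total, every report repeats this aggregation:
computed = conn.execute("""
    SELECT SUM(quantity * unit_price)
    FROM invoice_line WHERE invoice_id = 10
""").fetchone()[0]

# With it, the total is a plain column read, no aggregate needed:
stored = conn.execute(
    "SELECT total FROM invoice WHERE invoice_id = 10").fetchone()[0]
print(computed, stored)  # 5.25 5.25
```

The cost of this convenience is that the stored total must be kept in sync whenever invoice lines change, which is the classic maintenance burden of denormalized derived data.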
With data warehouses for business intelligence, we often work with denormalized data as well, since the data has already passed through a transactional database. For analytical purposes, we care more about query performance than about data redundancy or data anomalies. If we do not need to worry about constant inserts, updates, and deletions, denormalization is much less of an issue.