There are many different types of databases but part of the evolution of databases has come from the use of flat files. As you have seen already with relational databases, they provide the database a means of organizing various topics into tables that relate to one another through the primary key and foreign keys. Flat files on the other hand lack that relationship but instead store data into a single table of data. Although this can work for certain cases there are a lot of problems that can occur as the criteria and relationships of the data become more complex.
Manual file systems are where the concept around databases has come from. Those old systems were manually handled through paper and pencil systems. These types of paper systems were organized to help manage the use of the data through a system of file folders and filing cabinets. Typically, there wasn’t a significant amount of data collection required and reporting would be quite limited. This type of system worked well for a data repository. However, as companies started to become larger and had more needs to report on data, managing the data in these manual file systems was too difficult.
The computerized file systems came next to help shift the approach of needing to work with manual file systems. Generating reports from those manual file systems would increase errors and be quite difficult to run on a regular basis. The early computer files in these computerized file systems were similar to the idea of manual physical files but stored on a computer similar to a simple Excel worksheet. If a business user needed information, there would be a central individual(s) that handled all of the data processing requests. These early computerized flat-file systems became just as much of a burden to manage due to related and overlapping data with no means of controlling or managing the data across each of the files.
The shortcomings of the file system are one of the big reasons for where we are with modern database systems. Those old systems took longer to develop, still had challenges to get quick answers and make sense of the data, and had very complex administration of the data. They also lacked security as we have with modern databases.
Let us consider the database that we have been working with for example and look at the invoice table and customer table. In our database, we store invoices in the invoice table and relate them to the customer through the customer_id. If we had to use a flat-file database, we could only have a single table to store the customer and invoice in. As such, all of the information would be stored in a single record. This means that the customer’s information like the name, address, phone, email, etc is stored in the same row as the invoice information. If the customer makes more than one purchase, all of that general information would have to be re-entered by the customer and repeated for each order that they place.
Can you see what can potentially go wrong with that? Imagine if you had to do that every time you made a purchase online from an organization. You could enter a different address or a nickname. How would the company identify which orders belong to you? Or what happens if you moved and needed to change your address? This is where we start to create a lot of data redundancy as you would have the same data stored not only in the same flat file but perhaps in other pockets of flat files for other purposes.
If we have data redundancy due to repeated data, we would have poor data security as we would have multiple copies of the same data making an increase of chance that we would have unauthorized access to the data. For flat files, the operating system would have to handle the security rather than the database itself. There would also be that data inconsistency where you can have different and conflicting versions of the data appearing in various locations like the address scenario that we looked at above. This same data inconsistency would increase the data-entry issues that may go undetected. We could also have data integrity issues as we may have products or tracks being entered in that do not exist at the company. This would create added errors with the data due to the inconsistency.
Since we would have repeated information, the data would become a lot larger to manage and maintain. We would also lack the flexibility to manage the data in a flat file as you would only be able to enter in data that fit the data structure. In our example, we would only have one customer and one invoice inserted together in the same flat file table.
Source: Authored by Vincent Tran