The Problem That Relational Databases Solve

As I said last time, “data independence” is a clean separation between applications and data. What problem does it solve, and how does it solve it?

In my previous post, I talked about the people who take the relational model for granted. Where did it first come from, and why?

(Most of this essay is taken or paraphrased from perhaps the best expositor of the relational concept, C. J. Date. I am grateful to Prof. Michael Stonebraker for his comments on an earlier draft. As always, any errors are mine alone.)

Data Independence

For a large enterprise, there is a very large body of crucial information. This information is the "crown jewels" of the company's information technology, and it lasts for the whole lifetime of the enterprise. But applications come and go, like migrating birds. The next application to come along might want to access the data in a different way, for important reasons. The structure of the database must adapt well to these new and changing demands.

With the older styles of data organization (roughly speaking, the "network" or "CODASYL" style), a new application sometimes could not be written efficiently. Often, for all practical purposes, it was impossible to write the application with acceptable performance. You can find the details in many books, but to give just one analogy: suppose you have a program with nested loops. In many cases (though not, say, when traversing a 2D array), it's pretty obvious which loop ought to be on the outside. Now imagine being forced to do it the other way, even if that made the program very much slower. And that's just one example.
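The nested-loop analogy can be made concrete with a toy sketch in Python. Everything here is hypothetical: the supplier/part data, the function names, and the counting of examined records illustrate the analogy and are not a model of any real CODASYL system. The stored structure only leads from a supplier to the parts it supplies, so a question asked from the part side has to put the supplier loop on the outside.

```python
# Toy "network-style" store (hypothetical data): each supplier record owns
# the set of parts it supplies. This is the only stored access path.
owner_sets = {
    "S1": ["P1", "P2", "P3"],
    "S2": ["P1"],
    "S3": ["P2", "P3"],
}


def suppliers_of_part_navigational(part):
    """Find suppliers of `part` by following the stored path.

    The structure forces the supplier loop to be the outer one: every
    supplier is visited and every part it owns is scanned.
    Returns (suppliers, number_of_member_records_examined).
    """
    examined = 0
    found = []
    for supplier, parts in owner_sets.items():  # forced outer loop
        for p in parts:                         # inner loop over member set
            examined += 1
            if p == part:
                found.append(supplier)
    return found, examined


def invert(sets):
    """Build the part -> suppliers path the application would rather have."""
    inverted = {}
    for owner, members in sets.items():
        for member in members:
            inverted.setdefault(member, []).append(owner)
    return inverted


def suppliers_of_part_part_first(part, part_to_suppliers):
    """Same question with the part loop outside: only one member set is scanned."""
    examined = 0
    found = []
    for supplier in part_to_suppliers.get(part, []):
        examined += 1
        found.append(supplier)
    return found, examined


print(suppliers_of_part_navigational("P1"))                    # (['S1', 'S2'], 6)
print(suppliers_of_part_part_first("P1", invert(owner_sets)))  # (['S1', 'S2'], 2)
```

The sketch ignores the one-time cost of building the inverted map; it only shows how much the choice of outer loop can matter. With three suppliers the gap is small, but here the navigational count grows with the total number of supplier/part pairs, while the part-first count grows only with the number of suppliers of the requested part.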

To solve this, we want a way of organizing data that does two things. First, it should give every application a view of the database that doesn't change over time, so that the application keeps working. Second, it should let us change the physical organization of the data without changing any of the software that uses the database system. Such changes may be needed to make the new applications fast without hurting the old ones, or at least without hurting them enough to matter. This is called "data independence."

The Relational Model

A novel and effective solution to data independence, the "relational model," was created by E. F. Codd in 1970. By representing data as relations, in normalized form, you can solve both of the above problems. I won't go over all that here; for the full story, I recommend "An Introduction to Database Systems" by C. J. Date.
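As a small illustration of physical data independence (the second of the two problems above), here is a sketch using Python's built-in sqlite3 module. The table, column, and index names are made up, loosely in the spirit of the suppliers-and-parts examples in Date's book; SQLite is just a convenient relational engine, not something this discussion depends on. The application's query text never changes, while adding an index changes only the physical organization.

```python
import sqlite3

# Hypothetical suppliers-and-parts data, stored as a single relation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplies (supplier TEXT NOT NULL, part TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO supplies (supplier, part) VALUES (?, ?)",
    [("S1", "P1"), ("S1", "P2"), ("S1", "P3"),
     ("S2", "P1"), ("S3", "P2"), ("S3", "P3")],
)

# The "application": it states *what* it wants, never *how* to find it.
QUERY = "SELECT supplier FROM supplies WHERE part = ? ORDER BY supplier"


def suppliers_of(part):
    return [row[0] for row in conn.execute(QUERY, (part,))]


def plan(part):
    # EXPLAIN QUERY PLAN is SQLite-specific; the last column is a
    # human-readable description whose exact wording varies by version.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (part,))]


before = suppliers_of("P1")
plan_before = plan("P1")

# A physical change, made without touching the application's query.
conn.execute("CREATE INDEX supplies_by_part ON supplies (part)")

after = suppliers_of("P1")
plan_after = plan("P1")

print(before, after)  # ['S1', 'S2'] ['S1', 'S2']
print(plan_before)
print(plan_after)
```

Both calls return the same answer. Comparing `plan_before` and `plan_after` should show SQLite choosing the new index once it exists, though the exact wording of the plan depends on the SQLite version.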

(By the way, notice that the title of the book isn't "… to Relational Database Systems," even though that's what the book is about. Why bother with the qualifying adjective, when "everybody knows" that all database systems, other than ancient ones, are relational?)

The relational model, as an abstract concept, is an excellent and brilliant solution to the data independence problem. Later we'll see that data independence is not the only problem people need to solve when they store data. But in the next post, I'll look into how well actual relational database systems implement the concept.

(Postscript: I am only talking here about the way data is modeled. I'll talk about transaction issues later.)