ACID in Theory and Practice

The new so-called NoSQL data stores have been criticized, often by the traditional database community, because they sacrifice “ACID transactions”. Is this fair? How much does it matter? I’ll briefly go over what ACID transactions are and what they’re for, and then look at how they’re used, or not.

ACID

A “transaction” works this like: a thread (locus of control) does the following steps:

  • A “begin transaction” operation
  • Arbitrary computation, which can include:
    • examining data
    • modifying data
  • Either a “commit transaction” operation, or an “abort transaction” operation.

If the “commit transaction” operation completes (i.e. returns to its caller in the thread), the transaction is said to have committed. If the thread does an “abort transaction”, or if the thread halts (the thread gets an unhanded exception, the thread is killed, the process is killed, the hardware crashes), the transaction is said to have “aborted”.

(In some systems, the “begin transaction” is implicit when the previous transaction completes; it doesn’t matter.)

Ideally, a transaction has four properties, usually described with the helpful mnemonic “ACID”:

Atomic: If a transaction modifies the data and the transaction commits, all of the changes are performed; if the transaction aborts, none of them happens.

Consistent: There is some predicate on the data, called “consistency”. If the data is in a consistent state when the transaction has first started (before it performs any side-effects), then it is consistent after the transaction finishes. (This is trivially true if the transaction aborts.)

Isolated: Although many threads of control might be examining or modifying the data concurrently (interleaved in time), everything behaves as if they were sequential, i.e one at a time in some order.

Durable: If the transaction commits, any modifications it has made are “durable”, which means that they take effect even if there is a halt.

“Consistency” is hard to define. What is really means it that the data in the database is an accurate representation of the real world (for example, account X has $A and account B has $B), and that the transactions that moved the database state from a before-state to an after-state are consistent with real world operations (e.g. money has been withdrawn from account X and deposited in account Y).

Unfortunately, there isn’t any real way to check and enforce this. So what happens depends a lot on the application. Often what people mean by “consistency” is that certain invariants are met. Some database systems provide support for adding checks for these invariants, called “integrity constraint”. Meeting these constraints is necessary but not, in general, sufficient for consistency, but that’s often what people mean by “consistency”. Mostly people don’t pay much attention to “C”, anyway.

If a data storage system is both Atomic and Durable together, then modifications made by a committed transaction are all performed on the database, even in the face of a halting failure. This plus Isolation presents the application with an abstraction that’s very clean and easy to deal with.

Most important, ACID is entirely independent of the application. The concerns of the application are entirely separated from the concerns of failure and interleaving. This separation of concerns makes things much simpler, and reducing complexity is of great value.

Isolation in Theory and Practice

Writing an application is easy with isolation, because the programmer can ignore concurrency. But do people really use database systems this way? When we look around, we find the concept of an “isolation level”, in which an application can decide how much isolation it wants. Don’t they all want total isolation? Yes, but there’s a big problem: total isolation hurts performance severely in so many cases that it’s rarely used! If you don’t believe me, consider the following.

Thomas Kyte has written widely about Oracle DB, especially about how its transactions work. His book, “Expert Oracle Database Administration”, was recommended to me by a skilled Oracle database administrator; Kyle is highly respected. Although Oracle DB can do ACID transactions, the book strongly recommends against using them. Oracle DB has more than one “isolation level”. The strongest, READ REPEATABLE, provides ACID transactions. (Almost. If you care about the “phantom read” issue, you don’t need me to tell you about this stuff.) Instead, he recommends that you use the READ COMMITTED isolation level. He says that it is “the most commonly used isolation level” and that “it is rare to see a different isolation level used.”

When using READ COMMITTED, you are not guaranteed to get “repeatable reads”. That is, during the course of a transaction, you might read a value, and later read the same value and get a different result back, because of writes by concurrent transactions! Remember that a read is often not a direct request to read just one column of one row; it’s often part of a more general SQL query. You might not even know that there is some data that two SQL queries both read. This is not what I’d call the “I” in “ACID”. The concurrency, rather than being cleanly separated from the application, is now exposed to the application. The application writer has to know that reads are not repeatable and take that into account, which makes his or her life harder.

This isn’t specific to Oracle. In fact, it’s so pervasive in relational databases that it’s even part of the SQL standard. Isolation levels are so important that they aren’t just an implementation-specific hack. The official SQL standard defines several reduced levels of isolation.

Here’s another story about not using ACID. I and the rest of the ObjectStore team at Object Design once had the great opportunity to talk with some of the most renowned database experts in the world, at IBM’s Alamaden Research Center. These are the people who designed one of the earliest relational database systems (System/R), and continued to do groundbreaking work, which can you can read in many excellent papers they have published, many of which I had read. The group included Don Haderly, C. Mohan, Bruce Lindsay, and others, If these people don’t know about transactions, nobody does.

When they heard us say that ObjectStore provided real ACID transactions, they were surprised, and explained to us that nobody really uses those. They said you mustn’t do that, or your database system will be too slow.

They said, what our relational database applications use is “cursor-stability isolation”. Here’s how that one works. In a relational database, you typically perform a query, and get back a sequence of rows (a.k.a. tuples). The application iterates over the tuples, with a “cursor” to keep track of where in the sequence it’s up to. With “cursor-stable” isolation, when the cursor moves to a row, that row is locked. When it moves the cursor to the next row, the old row is unlocked and the next row is locked. At the end, the last row is unlocked and the transaction ends.

I was very surprised. While the application is working with one row, all the other rows could change out from under it. If you were trying to sum up some column (attribute) of each row, you might not get a consistent snapshot of the database.

For example, suppose each row represented a bank account, and an application A wants to transfer $100 from account A to account B. Concurrently, application B wants to sum up the total amount in all accounts. B should get the same answer no matter how it is interleaved with A. B should not see an inconsistent state where the debit has been done and the credit has not. I asked how an application e deal with such confusing behavior. This is a lot like the Oracle situation: reads are not repeatable.

I was very surprised: how can the application writers be expected to deal with this lack of isolation? The answer went something like this:

Summing up a column was really done in one SQL transaction using a SUM aggregate, and in that case the problem does not arise, because within a single SQL query, you do get isolated behavior. (This is true in Oracle as well.) Many common simple cases can be handled using SQL aggregation operators.

Yes, it’s true that if you have more than one query in your transaction, the application programmer does have to be aware of possible effects of interleaving. However, in real life (they said), most transactions are simple enough that it’s not so hard to reason about the effects of reduced isolation, and sometimes you can just ignore them.

To me, this was not a very satisfying answer. It’s like saying, well, it works in simple cases and when you’re lucky.

In ObjectStore, there were data structures much more complicated than tables. Indeed, ObjectStore could store anything that you can express in your programming languages (C++ or Java). We didn’t see any way to something analogous. We got away with using ACID because the sweet spot for ObjectStore wasn’t applications doing fine-grained interleaving.

Who Casts the First Stone?

The ACID transaction abstraction provides an excellent separation of concerns. It’s true that the NoSQL stores, with their “eventual consistency” properties, or their “return many possibly-different values” API’s, force the application to live with weaker guarantees than ACID. But so do the real relational database systems. Academic papers or commercial white papers that criticize the NoSQL data stores for not providing ACID should be fair: in the real world, nobody who cares about fine-grained concurrency is providing ACID guarantees.

Addendum of Nov 8, 2010

One of ITA’s very knowlegable Oracle experts pointed out some things that some issues that I should have discussed.

I should have mentioned that using Oracle’s “read committed” isolation, you do get repeatable reads within a single SQL query. When writing software that uses relational databases, it’s good to do as much as you can within a single query, rather than doing many queries as part of an imperative flow of control. All other things being equal, declarative code is better than imperative code. It is much easier for a person to reason about, which makes code clearer and easier to understand. Also, it makes code easier for a computer to understand. Writing an optimizer for imperative code is harder than writing one for declarative code.

Our expert tells me that sometimes programmers, who are generally trained in, and experienced with, imperative coding, will sometimes write programs that do one query after another, when it could have been done in a single query. To be sure, to do it in a single query can require you to learn morea bout SQL. But if you’re using a relational database that uses SQL, you really ought to learn that stuff. If you are using a tool, you should learn to use it.

Of course, not all situations allow you to take a transaction and make it only need one SQL statement. But if you can do that, you get transaction guarantees that are much closer to ACID. (It’s still not precisely ACID due to the so-called “phantom” scenario, but I will cut Oracle slack for that since it’s hard to solve in their architecture.)

However, I’ll add that one of the criticisms of the “NoSQL” data stores that the relational experts make is that they can only do one operation in a query. While that is true, and it is a disadvantage, it’s also true that if you use Oracle, your transaction has better properties if it only performs one operation (query) per transaction. That’s not an apples-to-apples comparison (traditional RDBMS’s are capable of doing multi-query transactions, but it’s something to think about.

3559