ACID in Theory and Practice

news and informationbusiness,health,entertainment,technology automotive,business,crime,health,life,politics,science,technology,travel

The new so-called NoSQL data stores have been criticized, often by the traditional database community, because they sacrifice “ACID transactions”. Is this fair? How much does it matter? I’ll briefly go over what ACID transactions are and what they’re for, and then look at how they’re used, or not.

ACID

A “transaction” works this like: a thread (locus of control) does the following steps:

  • A “begin transaction” operation
  • Arbitrary computation, which can include:
    • examining data
    • modifying data
  • Either a “commit transaction” operation, or an “abort transaction” operation.

If the “commit transaction” operation completes (i.e. returns to its caller in the thread), the transaction is said to have committed. If the thread does an “abort transaction”, or if the thread halts (the thread gets an unhanded exception, the thread is killed, the process is killed, the hardware crashes), the transaction is said to have “aborted”.

(In some systems, the “begin transaction” is implicit when the previous transaction completes; it doesn’t matter.)

Ideally, a transaction has four properties, usually described with the helpful mnemonic “ACID”:

Atomic: If a transaction modifies the data and the transaction commits, all of the changes are performed; if the transaction aborts, none of them happens.

Consistent: There is some predicate on the data, called “consistency”. If the data is in a consistent state when the transaction has first started (before it performs any side-effects), then it is consistent after the transaction finishes. (This is trivially true if the transaction aborts.)

Isolated: Although many threads of control might be examining or modifying the data concurrently (interleaved in time), everything behaves as if they were sequential, i.e one at a time in some order.

Durable: If the transaction commits, any modifications it has made are “durable”, which means that they take effect even if there is a halt.

“Consistency” is hard to define. What is really means it that the data in the database is an accurate representation of the real world (for example, account X has $A and account B has $B), and that the transactions that moved the database state from a before-state to an after-state are consistent with real world operations (e.g. money has been withdrawn from account X and deposited in account Y).

Unfortunately, there isn’t any real way to check and enforce this. So what happens depends a lot on the application. Often what people mean by “consistency” is that certain invariants are met. Some database systems provide support for adding checks for these invariants, called “integrity constraint”. Meeting these constraints is necessary but not, in general, sufficient for consistency, but that’s often what people mean by “consistency”. Mostly people don’t pay much attention to “C”, anyway.

If a data storage system is both Atomic and Durable together, then modifications made by a committed transaction are all performed on the database, even in the face of a halting failure. This plus Isolation presents the application with an abstraction that’s very clean and easy to deal with.

Most important, ACID is entirely independent of the application. The concerns of the application are entirely separated from the concerns of failure and interleaving. This separation of concerns makes things much simpler, and reducing complexity is of great value.

Isolation in Theory and Practice

Writing an application is easy with isolation, because the programmer can ignore concurrency. But do people really use database systems this way? When we look around, we find the concept of an “isolation level”, in which an application can decide how much isolation it wants. Don’t they all want total isolation? Yes, but there’s a big problem: total isolation hurts performance severely in so many cases that it’s rarely used! If you don’t believe me, consider the following.

Thomas Kyte has written widely about Oracle DB, especially about how its transactions work. His book, “Expert Oracle Database Administration”, was recommended to me by a skilled Oracle database administrator; Kyle is highly respected. Although Oracle DB can do ACID transactions, the book strongly recommends against using them. Oracle DB has more than one “isolation level”. The strongest, READ REPEATABLE, provides ACID transactions. (Almost. If you care about the “phantom read” issue, you don’t need me to tell you about this stuff.) Instead, he recommends that you use the READ COMMITTED isolation level. He says that it is “the most commonly used isolation level” and that “it is rare to see a different isolation level used.”

When using READ COMMITTED, you are not guaranteed to get “repeatable reads”. That is, during the course of a transaction, you might read a value, and later read the same value and get a different result back, because of writes by concurrent transactions! Remember that a read is often not a direct request to read just one column of one row; it’s often part of a more general SQL query. You might not even know that there is some data that two SQL queries both read. This is not what I’d call the “I” in “ACID”. The concurrency, rather than being cleanly separated from the application, is now exposed to the application. The application writer has to know that reads are not repeatable and take that into account, which makes his or her life harder.

This isn’t specific to Oracle. In fact, it’s so pervasive in relational databases that it’s even part of the SQL standard. Isolation levels are so important that they aren’t just an implementation-specific hack. The official SQL standard defines several reduced levels of isolation.

Here’s another story about not using ACID. I and the rest of the ObjectStore team at Object Design once had the great opportunity to talk with some of the most renowned database experts in the world, at IBM’s Alamaden Research Center. These are the people who designed one of the earliest relational database systems (System/R), and continued to do groundbreaking work, which can you can read in many excellent papers they have published, many of which I had read. The group included Don Haderly, C. Mohan, Bruce Lindsay, and others, If these people don’t know about transactions, nobody does.

When they heard us say that ObjectStore provided real ACID transactions, they were surprised, and explained to us that nobody really uses those. They said you mustn’t do that, or your database system will be too slow.

They said, what our relational database applications use is “cursor-stability isolation”. Here’s how that one works. In a relational database, you typically perform a query, and get back a sequence of rows (a.k.a. tuples). The application iterates over the tuples, with a “cursor” to keep track of where in the sequence it’s up to. With “cursor-stable” isolation, when the cursor moves to a row, that row is locked. When it moves the cursor to the next row, the old row is unlocked and the next row is locked. At the end, the last row is unlocked and the transaction ends.

I was very surprised. While the application is working with one row, all the other rows could change out from under it. If you were trying to sum up some column (attribute) of each row, you might not get a consistent snapshot of the database.

For example, suppose each row represented a bank account, and an application A wants to transfer $100 from account A to account B. Concurrently, application B wants to sum up the total amount in all accounts. B should get the same answer no matter how it is interleaved with A. B should not see an inconsistent state where the debit has been done and the credit has not. I asked how an application e deal with such confusing behavior. This is a lot like the Oracle situation: reads are not repeatable.

I was very surprised: how can the application writers be expected to deal with this lack of isolation? The answer went something like this:

Summing up a column was really done in one SQL transaction using a SUM aggregate, and in that case the problem does not arise, because within a single SQL query, you do get isolated behavior. (This is true in Oracle as well.) Many common simple cases can be handled using SQL aggregation operators.

Yes, it’s true that if you have more than one query in your transaction, the application programmer does have to be aware of possible effects of interleaving. However, in real life (they said), most transactions are simple enough that it’s not so hard to reason about the effects of reduced isolation, and sometimes you can just ignore them.

To me, this was not a very satisfying answer. It’s like saying, well, it works in simple cases and when you’re lucky.

In ObjectStore, there were data structures much more complicated than tables. Indeed, ObjectStore could store anything that you can express in your programming languages (C++ or Java). We didn’t see any way to something analogous. We got away with using ACID because the sweet spot for ObjectStore wasn’t applications doing fine-grained interleaving.

Who Casts the First Stone?

The ACID transaction abstraction provides an excellent separation of concerns. It’s true that the NoSQL stores, with their “eventual consistency” properties, or their “return many possibly-different values” API’s, force the application to live with weaker guarantees than ACID. But so do the real relational database systems. Academic papers or commercial white papers that criticize the NoSQL data stores for not providing ACID should be fair: in the real world, nobody who cares about fine-grained concurrency is providing ACID guarantees.

Addendum of Nov 8, 2010

One of ITA’s very knowlegable Oracle experts pointed out some things that some issues that I should have discussed.

I should have mentioned that using Oracle’s “read committed” isolation, you do get repeatable reads within a single SQL query. When writing software that uses relational databases, it’s good to do as much as you can within a single query, rather than doing many queries as part of an imperative flow of control. All other things being equal, declarative code is better than imperative code. It is much easier for a person to reason about, which makes code clearer and easier to understand. Also, it makes code easier for a computer to understand. Writing an optimizer for imperative code is harder than writing one for declarative code.

Our expert tells me that sometimes programmers, who are generally trained in, and experienced with, imperative coding, will sometimes write programs that do one query after another, when it could have been done in a single query. To be sure, to do it in a single query can require you to learn morea bout SQL. But if you’re using a relational database that uses SQL, you really ought to learn that stuff. If you are using a tool, you should learn to use it.

Of course, not all situations allow you to take a transaction and make it only need one SQL statement. But if you can do that, you get transaction guarantees that are much closer to ACID. (It’s still not precisely ACID due to the so-called “phantom” scenario, but I will cut Oracle slack for that since it’s hard to solve in their architecture.)

However, I’ll add that one of the criticisms of the “NoSQL” data stores that the relational experts make is that they can only do one operation in a query. While that is true, and it is a disadvantage, it’s also true that if you use Oracle, your transaction has better properties if it only performs one operation (query) per transaction. That’s not an apples-to-apples comparison (traditional RDBMS’s are capable of doing multi-query transactions, but it’s something to think about.

16 Responses to “ACID in Theory and Practice”

  1. NoSQL Daily – Sun Nov 7 › PHP App Engine Says:

    [...] Dan Weinreb’s blog » Blog Archive » ACID in Theory and Practice [...]

  2. James Knight Says:

    Postgres is implementing Predicate Locking and with it, a Serializable Snapshot Isolation mode, probably for version 9.1. This will allow it to provide an *actual* ANSI “Serializable” isolation mode and still allow for fine-grained concurrency, and having writes never block reads. It’s pretty cool work. (Neither Postgres nor Oracle currently provide a serializable mode at all: both actually provide snapshot isolation if you request serializable).

    http://wiki.postgresql.org/wiki/Serializable

  3. John Hugg Says:

    Hi Dan. Interestesting read. A counterpoint to the nobody really uses full isolation levels is VoltDB. Due to it’s unique single-threaded architecture, it can ONLY perform transactions with full serializability guarantees, and it does it with remarkable performance. Our head of field engineering, Tim Callaghan, used to be a die-hard Oracle user and he keeps saying how refereshing it is to actually not have to think about concurrency.

  4. Benjamin Black Says:

    It is a triumph of vendor marketing (Hi, John Hugg!) that reasonably informed engineers think ACID == SQL/relational and that NoSQL == EC. Both are false. As a couple of examples, have a look at Google Percolator http://research.google.com/pubs/pub36726.html and Scalaris http://code.google.com/p/scalaris/ .

  5. Dan Weinreb Says:

    James, I see that they are using the term “predicate locking”, but they don’t mean it in the way that it’s meant in “Transaction Processing” by Gray and Reuter, which I think of as the bible of TP. In their book, “predicate locking” meant locking on any predicate, in the sense that we use that word in Lisp: any boolean function at all. It’s very hard, if not impossible, to figure out which locks conflict with which others when they can do anything they damn well please. What Postgres is talking about is great, but it isn’t what Gray would term “predicate locking”. I need to look over the Wiki page you referred me to in more detail. Thanks very much for the reference!

  6. Dan Weinreb Says:

    John, yes, I see how that works in VoltDB. That’s very refreshing, indeed!

  7. J Chris A Says:

    “If you were trying to sum up some column (attribute) of each row, you might not get a consistent snapshot of the database.”

    This level of transactionality (views are consistent) is something that CouchDB can offer. So it’s very isolated, but it doesn’t support transactions that span multiple objects. There are different flavors of ACID.

  8. Dan Weinreb Says:

    I have added an addendum, above, explaining more about Oracle’s concurrency control, and how you can get better transactional behavior by using only one query per transaction.

  9. Clay Lenhart Says:

    Just for completeness, SQL Server has had Snapshot isolation level since 2005. It works great (assuming your tempdb database is fast enough). With this isolation level, the database will remain frozen from when you call BEGIN TRANSACTION. It’s nonlocking, so others can modify the data-but not the schema. ;)

    Schema modifications can be useful in some data warehouse scenarios (switching partitions come to mind), but for most people you won’t run into this issue, unless upgrading a database.

  10. Chris D Says:

    Good piece, but I think that in practical terms, what people actually want from “ACID” is atomic transactions. For most applications, isolation isn’t such a big deal: I’ve been bitten by isolation level exactly once in a decade of database engineering (and the app was doing the multiple-queries-as-imperative-programming thing you described).

    But, when you need a transaction, you need a transaction. The idea of manually maintaining before-and-after state and reverting to the previous state on a failure (and meanwhile other clients can read and overwrite the stuff you’ve already written) makes most of us look for some way to restructure the application.

  11. PA Says:

    I guess you must be familiar with the notion of read consistency:

    Data Concurrency and Consistency
    http://download.oracle.com/docs/cd/B19306_01/server.102/b14220/consist.htm

    Furthermore, if one needs a specific point in time to run queries against, the notion of flashback queries (aka select as of) might come handy:

    Using Oracle Flashback Technology
    http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28424/adfns_flashback.htm#i1008579

    In short, not sure what your point is :)

    “If you are using a tool, you should learn to use it.”
    – Dan Weinreb

    Indeed :) )

  12. Rob Kinyon Says:

    Your point is well-taken. I do have issue with how you’ve defined Consistency. Consistency has nothing to do with “real-world” rules and everything to do with “in-the-database” rules (some of which may model real-world rules).

    Let’s say I have a table of artists and a table of cds, each of which has an artist_id. There is a FOREIGN_KEY relationship between cds.artist_id and artists.artist_id. So, we populate both tables a bit. Then, we do the following two transactions.

    A: BEGIN WORK
    A: SELECT artist_id FROM artists LIMIT 1;
    B: BEGIN WORK
    B: TRUNCATE TABLE cds;
    B: TRUNCATE TABLE artists;
    A: INSERT INTO cds (artist_id, name) VALUES (@artist_id, “Name”)
    B: COMMIT WORK
    A: COMMIT WORK

    When A commits, there is no artist for the new row in cds to refer to because B has already truncated both tables. So, the commit should fail. This is a consistency check.

  13. Dan Weinreb Says:

    @Chris: I guess it depends on the application, as so many infrastructure issues do. In the project I’m working on, the lack of ACID was such a big problem that we built an ACID layer on top of Oracle’s own transaction facility! There is a lot of concurrency in this application, and the implications of reduced isolation are very, very hard to model and understand. We really did have problems. We have lots of application-level developers; they are very smart guys, but most are not very familiar and comfortable with issues such as isolation, and even if they were, writing an application and being sure that it’s robust in the face of reduced isolation can be hard. I think you and I agree here.

  14. Kragen Javier Sitaker Says:

    Reposting my comment from Facebook, in reply to the summary, “In practice, you probably can’t take advantage of ACID if you have more than one query per transaction on a relational database.”

    My comment:

    Well, he’s saying with Oracle and System R, and only for reasons of performance. I think Postgres’s implementation of the SERIALIZABLE isolation level has reasonable performance, and in particular I’m having a hard time imagining what application couldn’t get adequate performance in 2010 with Postgres at SERIALIZABLE on SSD but could get adequate performance in 2000 with Oracle on spinning rust on any isolation level. It probably exists but I’m skeptical that it’s the usual case.

    Further elaboration not on Facebook:

    In my admittedly limited experience, the primary difficulty with SERIALIZABLE is not that it’s inefficient, but that you have to be careful to observe transaction discipline with it, because the database may abort your transaction when you try to commit it, because of a conflict (some other transaction wrote something you read). At this point, your application must be prepared to retry.

    In Postgres, unlike in Microsoft SQL Server, this does not cause any trouble with schema upgrade.

  15. Dan Weinreb Says:

    @J Chris A: Actually, it turns out I was wrong; see the addendum. If you computed the sum using a single SQL statement, which anyone would, you do get proper isolation and cannot see an inconsistent view. Cassandra, and I think CouchDB, have transactions that are truly ACID but can only consist of one operation, or at least operations that do not span multiple key-value pairs. There’s nothing wrong with having that tradeoff; I do not mean to be saying anything bad. Engineering tradeoffs are something all of us do, all the time.

    @Clay Lenhart: You’re saying that SQL Server’s Snapshot isolation gives you total isolation, without ever causing other transactions to wait or to need to abort? I have never heard of a way to achieve that. One can use MVCC to make everything consistent. The cost is having to keep around old versions, which in certain cases of longish transactions can be problematic. But you still have write locks and write-write conflicts.

  16. Dan Weinreb Says:

    @PA: My point is what I said it was: Oracle users use READ COMMITTED, which is not truly isolated, and application programs have to take this into account, whereas with real ACID they do not.

    @Rob: Most database books are pretty bad about defining “consistency”. I was using the definition I like the best, which is from the 1985 ACM Comp. Surv. paper by Hector Garcia-Molina et. al. What you are talking about is an “integrity check”. There are some tests that the DBMS can to do look for certain inconsistencies. There does not seem to be a consensus about what “consistent” means, as far as I can tell.

    @Kragen: Well, applications in practice always need to be prepared to retry. Even with Oracle’s READ COMMITTED, write-write conflicts are handled the usual way and ordinary deadlocks can happen, causing aborts.

Leave a Reply