NoSQL Storage Systems Never Violate ACID. Never? Well, Hardly Ever!
Everybody agrees that the new “NoSQL” storage systems “aren’t ACID”, or “don’t have transactions”. This is true <i>in a sense</i>, but without knowing the sense, it doesn’t tell you much.
In one sense, they <i>do</i> have transactions, but each transaction is limited to one operation. One operation could mean reading, writing, incrementing, or doubling the value associated with a particular key. For example, look at an “insert” operation in a key/value store. Each operation acts on only one data object. Are these single-operation transactions ACID? Let’s check each criterion:
A means “atomic”: either all the operations happen, or none of them happens. Well, there’s only one operation. The key-value store <i>does</i> guarantee that either the insert happens, or it doesn’t. So the transaction is atomic.
C means “consistent”. In relational database systems, people use this to mean that various interesting consistency guarantees are maintained. But here, we don’t have to worry about such things as referential integrity, since there are no references to have integrity; that is, there are no foreign keys. So it’s consistent.
I means “isolated”: concurrency is never seen by the application. The system behaves as if each operation happened at a particular, distinct moment in time. The key-value stores all make this guarantee.
D means “durable”: before the application is told that the transaction has completed successfully (i.e. committed), all of its side effects are in stable storage, so that a failure (such as a crash of a process or of a whole node) won’t lose the results of those side effects. Here, a transaction is only one operation, but that doesn’t change anything: the system does provide “durability”. (Some systems might cheat by not actually forcing data to stable storage, but we’re not talking about those.)
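To make the four checks concrete, here is a minimal Python sketch of a single-node key/value store in which every call is one single-key transaction. The class name `SingleOpStore`, its methods, and the one-JSON-record-per-line log are all made up for illustration; this is not how any particular NoSQL product is implemented, and it ignores replication entirely.

```python
import json
import os
import tempfile
import threading


class SingleOpStore:
    """Toy key/value store: every public method is one single-key transaction."""

    def __init__(self, log_path):
        self._data = {}
        self._lock = threading.Lock()  # one operation at a time (the "I")
        self._log = open(log_path, "a", encoding="utf-8")

    def _commit(self, key, value):
        # Force the record to stable storage before acknowledging (the "D").
        self._log.write(json.dumps({"key": key, "value": value}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self._data[key] = value

    def insert(self, key, value):
        with self._lock:
            if key in self._data:
                return False  # the insert did not happen at all (the "A")
            self._commit(key, value)
            return True

    def increment(self, key, delta=1):
        with self._lock:
            new_value = self._data.get(key, 0) + delta
            self._commit(key, new_value)
            return new_value

    def read(self, key):
        with self._lock:
            return self._data.get(key)

    def close(self):
        self._log.close()


def demo():
    """Run eight concurrent increments; none of them may be lost."""
    path = os.path.join(tempfile.mkdtemp(), "store.log")
    store = SingleOpStore(path)
    threads = [
        threading.Thread(target=store.increment, args=("hits",))
        for _ in range(8)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    result = store.read("hits")
    store.close()
    return result
```

Because the lock makes each operation behave as if it happened at its own distinct moment, `demo()` ends with a count of 8 however the threads are scheduled.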
So it appears to be ACID! OK, something has <i>got</i> to be wrong here, right?
Right. Where I tried to pull the wool over your eyes is the definition of “C”. “C” doesn’t just mean conforming to the database’s integrity constraints. It means that the system returns the correct answer! That is, the response to any operation is consistent with some state that the database could be in. There’s more than one such state when there are concurrent operations going on, which might be ordered in more than one way, depending on how the concurrency system works. So it’s clearer to think of “C” as meaning “correct”. (In the famous Gilbert and Lynch paper that “proves the CAP theorem”, that’s what they mean by “C”.)
The “NoSQL” storage systems are guaranteed to return the correct answer <i>only</i> if there are no partitions in the network. But if there are (or were, e.g. at write time) partitions, they can return things like “two replicas say the value is X, but another replica says that the answer is Y”, and the application has to try to make sense of and cope with that. That is <i>not</i> “C”. This is usually called “eventual consistency”: if the partitions were to eventually heal, and the system deferred accepting new operations until all the in-progress operations finished, and something went over the whole database to fix up any inconsistencies that happened during writes, then the system would become fully consistent, and would behave correctly until the next partition.
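The “two replicas say X, another says Y” situation can be sketched in a few lines of Python. Everything here is an assumption for illustration: replicas are plain dicts, `None` stands for a replica that cannot be reached, and the `resolve` callback stands in for whatever policy the application or system uses to pick a winner. Real systems use more elaborate mechanisms than this.

```python
from collections import Counter


def read_all(replicas, key):
    """Ask every reachable replica for `key`.

    Returns the distinct answers, most common first. More than one
    answer means the replicas disagree and the caller must cope.
    """
    answers = [r[key] for r in replicas if r is not None and key in r]
    return [value for value, _ in Counter(answers).most_common()]


def repair(replicas, key, resolve):
    """Fix-up pass for one key: pick a winner, write it to every reachable replica."""
    candidates = read_all(replicas, key)
    if not candidates:
        return None
    winner = resolve(candidates)
    for r in replicas:
        if r is not None:
            r[key] = winner
    return winner


# Three replicas that diverged while a partition was in effect.
replicas = [{"k": "X"}, {"k": "X"}, {"k": "Y"}]
before = read_all(replicas, "k")                    # ["X", "Y"]: not "C"
repair(replicas, "k", resolve=lambda values: values[0])
after = read_all(replicas, "k")                     # ["X"]: consistent again
```

Until `repair` runs, the application is the one that has to decide what `["X", "Y"]` means. Picking the majority value, as the example resolver does, is just one possible policy.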
So, as long as you accept that a transaction can perform only one operation, the “NoSQL” systems are ACID in this sense: the only thing that gets in the way of being ACID is a network partition, when the system is called upon to perform operations while the partition is still there.
“Partition” is a somewhat slippery concept that I will examine in an upcoming separate essay. But the basic idea is that there are at least two nodes that cannot send messages to each other. It’s important to know that if a node in your system is down, that’s considered a partition: it’s as if this node were disconnected from the network.
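Taking only that basic idea (some nodes cannot exchange messages, and a down node counts as disconnected), a rough Python sketch of “is the system partitioned?” looks like this. The function names and the representation of the network as a list of working two-way links are assumptions for illustration, and the sketch deliberately ignores the slippery parts, such as one-way or intermittent failures.

```python
def partition_groups(nodes, links, down=()):
    """Group nodes by who can still reach whom over working links.

    `links` is a list of (a, b) pairs meaning a and b can exchange
    messages. A node listed in `down` loses all of its links, so it
    ends up in a group by itself.
    """
    down = set(down)
    neighbors = {n: set() for n in nodes}
    for a, b in links:
        if a not in down and b not in down:
            neighbors[a].add(b)
            neighbors[b].add(a)

    groups, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        group, frontier = set(), [start]
        while frontier:  # simple graph traversal from `start`
            n = frontier.pop()
            if n in group:
                continue
            group.add(n)
            frontier.extend(neighbors[n] - group)
        seen |= group
        groups.append(group)
    return groups


def is_partitioned(nodes, links, down=()):
    """True if at least two nodes cannot send messages to each other."""
    return len(partition_groups(nodes, links, down)) > 1


nodes = ["a", "b", "c"]
links = [("a", "b"), ("b", "c")]
healthy = is_partitioned(nodes, links)               # False
one_down = is_partitioned(nodes, links, down=["c"])  # True: c is cut off
```

Note that no cable was cut in the second case: node `c` simply stopped, and that alone is enough to count as a partition.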
This also shows that the name “NoSQL” doesn’t explain everything that’s important about these systems. But you can’t pack a whole lot into a short, punchy name, so I’m not really complaining. (I do the same thing with the names of my blog essays; <i>mea culpa</i>.) You just have to keep in mind that the lack of SQL is not the only important thing.
September 7th, 2010 at 11:13 am
“(Some systems might cheat by not actually forcing data to stable storage, but we’re not talking about those.)”
I haven’t been watching this NoSQL stuff extremely closely, but wasn’t part of the idea that these in-memory key-value stores could be extremely fast exactly because they didn’t force data to disk alla time? Only often enough to be stable enough (“D” enough) while still being very fast. Is that actually just a side-show in the NoSQL meme-complex, rather than a central part?
September 7th, 2010 at 9:59 pm
@David: Well, if they do claim to be in-memory, and they don’t store to the disk at all or until sometime later (“D enough”), they ought to make that clear. If they do (e.g. memcached), then fine; it’s just not in the scope of this discussion. One should compare apples with apples, or else know that the other person is using a cherry instead.
September 8th, 2010 at 10:55 am
In my mind, there have always been two interesting properties of data stores:
* Atomicity / correctness of single operations
* Atomicity / correctness of multiple operations
The former is the domain of the original BDB and MySQL as well as the first round of NoSQL databases that have become popular. Sadly, most of these systems then try to tackle the latter, and I think that’s a mistake. Having multiple, consistent, atomic operations is a very different problem domain for which it makes sense to sacrifice a great deal of resources and expand complexity, but that’s not the case for all problems.
I’d like to see the current round of NoSQL databases not fall into the same trap that BDB and MySQL did….
September 9th, 2010 at 6:02 pm
I’m a little confused here. Every major SQL database in its default configuration is not ACID compliant. Most of them are not even close. As far as I know, many major SQL databases cannot guarantee ACID under any configuration. Since SQL databases are far more pervasive than NoSQL databases, that seems worth at least a mention, no?
The truth is, ACID is not really necessary in the vast majority of systems. We know that because most of those systems are based on non-ACID compliant SQL databases and the world hasn’t ended yet. The fact that many of those systems’ designers thought they were using ACID compliant databases is neither here nor there.
September 10th, 2010 at 7:57 pm
@Turbulence: You are absolutely right. In the mid-90s, when ObjectStore worked with IBM, we got to talk to the high experts at Almaden, who know more than anyone else about the subject, and did the heavy theory stuff for DB/2. They told us that nobody uses ACID, since it would be much too slow. They told us that what was widely used was “cursor locking”, in which you have a cursor that moves over the rows of a table, and only the row you’re looking at gets locked! As for Oracle, the documentation is very clear that you must not use the ACID isolation level (in fact, they explain that if you do, you can get this weird error message sometimes). Oracle’s default isolation mode is so peculiar that we had to build our OWN ACID layer on top of what Oracle gives us! I think the point you made is not brought up nearly enough in these conversations.
September 10th, 2010 at 7:59 pm
@Aaron: Are you sure that the InnoDB storage engine for MySQL does not provide atomic transactions with many operations? Or maybe you meant the word “original” to distribute over the “and”, so to speak, although I was told that the very early, original MySQL with the MyISAM storage engine didn’t even do as much as you’re saying. But I have never used MySQL and can hardly call myself an expert; I could easily be all wrong about this.
September 23rd, 2010 at 4:00 pm
It seems to me that memory-based NoSQL systems that support transactions would be limited by many things, chiefly among them, memory. At the beginning of the post, it was mentioned that NoSQL systems in fact support atomicity of single operations (it’s either there or it’s not). Would it make sense to have NoSQL systems support short-lived small transactions (>1 operation, but not long-lived transactions)?
What are some of the use cases for long lived transactions with either SQL or NoSQL?