Archive for the ‘Software Engineering’ Category

What Does the Proof of the “CAP theorem” Mean?

Monday, July 12th, 2010
news and informationbusiness,health,entertainment,technology automotive,business,crime,health,life,politics,science,technology,travel

Several years back, Eric Brewer of U.C. Berkeley presented the “CAP conjecture”, which he explained in these slides from his keynote speech at the PODC conference in 2004. The conjecture says that a system cannot be consistent, available, and partition-tolerant; that is, it can have two of these properties, but not all three. This idea has been very influential.

Seth Gilbert and Nancy Lynch, of MIT, in 2002, wrote a now-famous paper called “Brewer’s Conjecture and the Feasibility of Consistent Available Partition-Tolerant Web Services”. It is widely said that this paper proves the conjecture, which is now considered a theorem. Gilbert and Lynch clearly proved something, but what does the proof mean by “consistency”, “availability”, and “partition-tolerance”?

Many people refer to the proof, but not all of them have actually read the paper, thinking that it’s all obvious. I wasn’t so sure, and wanted to get to the bottom of it. There’s something about my personality that drives me to look at things all the way down to the details before I feel I understand. (This is not always a good thing: I sometimes lose track of what I originally intended to do, as I “dive down a rat-hole”, wasting time.) For at least a year, I have wanted to really figure this out.

A week ago, I came across a blog entry called “Availability and Partition Tolerance” by Jeff Darcy. You can’t imagine how happy I was to find someone who agreed that there is confusion about the terms, and that they need to be clarified. Reading Jeff’s post inspired me to finally read Gilbert and Lynch’s paper carefully and write these comments.

I had an extensive email conversation with Jeff, without whose help I could not have written this. I am very grateful for his generous assistance. I also thank Seth Gilbert for helping to clarify his paper for me. I am solely responsible for all mistakes.

I will now explain it all for you. First I’ll lay out the basic concepts and terminology. Then I’ll discuss what “C”, “A”, and “P” mean, and the “CAP theorem”. Next I’ll discuss “weak consistency”, and summarize the meaning of the proof for practical purposes.

Basic Concepts

The paper has terminology and axioms that must be laid out before the proof can be presented.

A distributed system is built of “nodes” (computers), which can (attempt to) send messages to each other over a network. But the network is not entirely reliable. There is no bound on how long a message might take to arrive. This implies that a message might “get lost”, which is effectively the same as taking an extremely long time to arrive. If a node sends a message (and does not see an acknowledgment), it has no way to know whether the message was received and processed or not, because either the request or the response might have been lost.

There are “objects”, which are abstract resources that reside on nodes. Objects can perform “operations” on other objects. Operations are synchronous: some thread issues a request and expects a response. Operations do not request other operations, so they do not do any messaging themselves.

There can be replicas of an object on more than one node, but for the most part that doesn’t affect the following discussion. An operation could “read X and return the value”, “write X”, “add X to the beginning of a queue”, etc. I’ll just say “read” for an operation that has no side-effects and returns some part of the state of the object, and “write” to mean an operation that performs side-effects.

A “client” is a thread running on some node, which can “request” an object (on any node) to perform an operation. The request is sent in a message, and the sender expects a response message, which might returns a value, and which confirms that the operation was performed. In general, more than one thread could be performing operations on one object. That is, there can be concurrent requests.

The paper says: “In this note we will not consider stopping failures, though in some cases a stopping failure can be modeled as a node existing in its own unique component of a partition.” Of course in any real distributed system, nodes can crash. But for purposes of this paper, a crash is considered to be a network failure, because from the point of view of another node, there’s no way to distinguish between the two. A crashed node behaves exactly like a node that’s off the network.

You might say that if a node goes off the network and comes back, that’s not the same as a crash because the node loses its volatile state. However, this paper does not concern itself with a distinction between volatile and durable memory.  There’s no problem with that; issues of what is “in RAM” versus “on disk” are orthogonal to what this paper is about.

Consistent

The paper says that consistency “is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time.” They explain this more explicitly by saying that consistency is equivalent to requiring all operations (in the whole distributed system) to be “linearizable”.

“Linearizability” is a formal criterion presented in the paper “Linearizability: A Correctness Condition for Concurrent Objects”, by Maurice Herlihy and Jeannette Wing. It means (basically) that operations behave as if there were no concurrency.

The linearizability concept is based a model in which there is a set of threads, each of which can send an operation to an object, and later receive a response. Despite the fact that the operations from the different threads can overlap in time in various ways, the responses are as if each operation took place instantaneously, in some order. The order must be consistent with each thread’s own order, so that a read operation in a thread always sees the results of that thread’s own writes.

Linearizability per se does not include failure atomicity, which is the “A” (“atomic”) in “ACID”. But Gilbert and Lynch assume no node failures. So operations are atomic: they always run to completion, even if their response messages get lost.

So by “consistent” (“C”), the paper means that every object is linearizable. (That’s not what the “C” in “ACID” means, by the way, but that’s not important.) Very loosely, “consistent” means that if you get a response, it has the right answer, despite concurrency.

This is not what the “C” in “ACID transaction” means. It’s what the “I” means, namely “isolation” from concurrent operations. This is probably a source of confusion sometimes.

Furthermore, the paper says nothing about transactions, which have would have a beginning, a sequence of operations, and an end, which may commit or abort. “ACID” is talking about the entire transaction. The “linearizability” criterion only talks about individual operations on objects. (So the whole “ACID versus BASE” business, while cute, can be misleading.)

Available

“Available” is defined as “every request received by a non-failing node in the system must result in a response.” The phrase “non-failing node” seemed to imply that some nodes might be failing and others not. But since the paper postulates that nodes never fail, I believe the phrase is redundant, and can be ignored. After the definition, the paper says “That is, any algorithm used by the service must eventually terminate.”

The problem here is that “eventually” could mean a trillion years. This definition of “available” is only useful if it includes some kind of real-time limit: the response must arrive within a period of time, which I’ll call the maximum latency.

Next, it’s very important to notice that “A” says nothing about the content of the response. It could be anything, as far as “A” is concerned; it need not be “successful” or “correct”. (If think otherwise, see section 3.2.3.)

So “available” (“A”) means: If a client sends a request to a node, it always gets back some response within L time, but there is no guarantee about contents of the response.

Partition Tolerant

There is no definition, per se, of the term “partition-tolerant”, not even in section 2.3, “Partition Tolerance”.

First, what is a “partition”? They first define it to mean that there is a way to assort all the nodes into separate sets, which they call “components”, and all messages sent from a node in one component to another nodes in a separate component are lost. But then they go on to say “And any pattern of message loss can be modeled as a temporary partition separating the communicating nodes at the exact instance the message is lost.” or their formal purposes, “partition” simply means that a message can be lost. (The whole “component” business can be disregarded.)  That’s probably not what you had in mind!

In real life, some messages are lost and some aren’t, and it’s not exactly clear when a “partition” situation starts, is happening, or ends. I realize that for practical purposes, we usually know what a partition means, but if we’re going to do formal proofs and understand what was proved, one must be completely clear about these terms.

Even in a local-area network, packets can be dropped. Protocols like TCP re-transmit packets until the destination acknowledges that they have arrived. If that happens, it’s clearly not a network failure from the point of view of the application. “Losing messages” must have something to do with nodes entirely unable to communicate for a “long” time compared to the latency requirements of the system.

Furthermore, remember that node failure is treated as a network failure.

So “partition-tolerant” (“P”) means that any guarantee of consistency or availability is still guaranteed even if there is a partition. In other words, if a system is not partition-tolerant, that means that if the network can lose messages or any nodes can fail, then any guarantee of atomicity or consistency is voided.

CAP

The CAP theorem says that a distributed system as described above cannot have properties C, A, and P all at the same time. You can only have two of them. There are three cases:

AP: You are guaranteed get back responses promptly (even with network partitions), but you aren’t guaranteed anything about the value/contents of the response. (See section 3.2.3.) A system like this is entirely useless, since any answer can be wrong.

CP: You are guaranteed that any response you get (even with network partitions) has a consistent (linearizable) result. But you might not get any responses whatsoever. (See section 3.2.1.) This guarantee is also completely useless, since the entire system might always behave as if it were totally down.

CA: If the network never fails (and nodes never crash, as they postulated earlier), then, unsurprisingly, life is good. But if messages could be dropped, all guarantees are off. So a CA guarantee is only useful in a totally reliable system.

At first, this seems to mean that practical, large distributed systems (which aren’t entirely reliable) can’t make any useful guarantees! What’s going on here?

Weak Consistency

Large-scale distributed systems that must be highly available can provide some kind of “weaker” consistency guarantee than linearizability. Most such systems provide what they call “eventual consistency” and may return “stale data”.

For some applications, that’s OK. Google search is an obvious case: the search is already specified/known to be using “stale” data (data since the last time Google looked at the web page), so as long as partitions are fixed quickly relative to the speed of Google’s updating everything, (and even if sometimes not, for that matter), nobody is going to complain.

Just saying that results “might be stale” and will be “eventually consistent” is unfortunately vague. How stale can it be, and how long is “eventually”? If there’s no limit, then there’s no useful guarantee.

For a staleness-type weak consistency guarantee, you’d like to be able to say something like: “operations (that read) will always return a result that was consistent with all the other operations (that write) no longer ago than time X”. And this implies that “write” operations are never lost, i.e. always happen within a fixed time bound.

t-Connected Consistency

Gilbert and Lynch discuss “weakened consistency” in section 4. It’s also about stale data, but with “formal requirements on the quality of stale data returned”. They call it “t-Connected Consistency”.

It makes two assumptions. (a) Every node has a clock that can be used to do timeouts. The clocks don’t have to be synchronous with each other. (b) There’s some time period after which you can assume that an unanswered message must be lost. (c) Every node processes a received message within a given, known time.

The real definition of “t-Connected Consistency” is too formal for me to explain here (see section 4.4). It (basically) guarantees (1) when there is no partition, the system is fully consistent; (2) if a partition happens, requests can see stale data; and (3) and after the partition is fixed, there’s a time limit on how long it takes for consistency to return.

Are the assumptions OK in practice? Every real computer can do timeouts, so (a) is no problem. You can always ignore any responses to messages after the time period, so (b) is OK. It’s not obvious that every system will obey (c), but some will.

I have two reservations. First, if the network is so big that it’s never entirely working at any one time, what would guarantee (3) mean? Second, in the algorithm in section 4.4, in the second step (“write at node A”), it retries as long as necessary to get a response. But that could exceed L, violating the availability guarantee.

So it’s not clear how attractive t-Connected Consistency really is. It can be hard it is to come up with formal proofs of more complicated, weakened consistency guarantees. Most working software engineers don’t think much about formal proofs, but don’t underrate them. Sometimes they can help you identifying bugs that would otherwise be hard to track down, before they happen.

Jeff Darcy wrote a blog posting about “eventual consistency” about a half year ago, which I recommend. And there are other kinds of weak consistency guarantees, such as the one provided by Amazon’s Dynamo key-store, which worth examining.

Reliable Networks

Can’t you just make the network reliable, so that messages are never lost? (“Never” meaning that the probability of losing a message is as low as other failure mode that you’re not protecting against.)

Lots and lots of experience has shown that in a network with lots of routers and such, no matter how much redundancy you add, you will experience lost messages, and you will see partitions that last for a significant amount of time. I don’t have a citation to prove this, but, ask around and that’s what experienced operators of distributed systems will always tell you.

How many routers is “lots”? How reliable is it if you have no routers (layer 3 switches), only hubs (layer 2 switches)? What if you don’t even have hubs? I don’t have answers to all this. But if you’re going to build a distributed system that depends on a reliable network, you had better ask experienced people about these questions. If it involves thousands of nodes and/or is geographically distributed, you can be sure that the network will have failures.

And again, as far as the proof of the CAP theorem is concerned, node failure is treated as a network failure. Having a perfect network does you no good if machines can crash, so you’d also need each node to be highly-available in and of itself. That would cost a lot more than using “commercial off-the-shelf” computers.

The Bottom Line

My conclusion is that the proof of the CAP theorem means, in practice: if you want to build a distributed system that is (1) large enough that nodes can fail and the network can’t be guaranteed to never lose messages, and (2) you want to get a useful response to every request within a specified maximum latency, then the best you can guarantee about the meaning of the response is that it is guaranteed to have some kind of “weak consistency”, which you had better carefully define in such a way that it’s useful.

P.S.

After writing this but just before posting it, Alex Feinberg added a comment to my previous blog post with a link to this excellent post by Henry Robinson, which discusses many of the same issues and links to even more posts. If you want to read more about all this, take a look.

How to be a Programmer

Saturday, July 10th, 2010
news and informationbusiness,health,entertainment,technology automotive,business,crime,health,life,politics,science,technology,travel

If you are, or want to be, a programmer, you should read “How to be a Programmer: A Short, Comprehensive, and Personal Summary” by Robert L. Read. You can read it free, as PDF or HTML, or buy it as a paperback book for around $12.

It’s 41 pages long (plus appendixes including the GPL), so it’s not all that short. But it’s concise, simple, direct, and packed with extremely useful information that you won’t find anywhere else.

It includes some excellent advice about programming itself, but much more than just how to write code. For example:

  • How to Deal with Intermittent Bugs
  • How to Work with Poor Code
  • How to Deal with Difficult People
  • How to Stay Motivated
  • How to Disagree Honestly and Get Away with It
  • How to Know When to Apply Fancy Computer Science
  • How to Handle Boring Tasks
  • How to Gather Support for a Project
  • How To Tell People Things They Don’t Want to Hear

Based on my own experience, everything he says hits the nail right on the head. I do not have a single major disagreement with anything he says. (I have a few quibbles, and a very few points are dated, as it was written in 2003.)

Here are a few pieces of advice, so you can see what kind of thing he talks about:

  • When debugging, you may see things nearby that need improvement, but don’t fix those at the same time. (I make this mistake often; it’s just so tempting.)
  • If your system has redundant components so that if degrades gracefully, it’s especially important to have monitors that let you know that this is going on.
  • The way you interview a programmer will affect how much he or she wants to work for your company.
  • At the end of doing a job interview, sell your company. But you are talking to a programmer, so don’t color the truth. Start off with the bad stuff, then finish strong with the good stuff.

Everyone has missed deadlines because they neglected to include all the factors that contribute to the time it takes to complete a task. Here’s a checklist compiled from various sections of the essay, that you might want to use:

  • Documentation
  • Unit testing
  • Integration testing
  • Demos
  • Planning itself
  • Internal team meetings
  • Communicating with other groups
  • Dealing with outsiders
  • Vacation time
  • Sick time
  • Mandatory company-wide training seminar
  • Scheduled periodic activities
  • Maintenance of existing products
  • Maintenance of the development environment
  • Some failure
  • What part will each which individual work on
  • Problems that cannot presently even be conceptualized

Make the time in your schedule to read this!

Coders At Work, by Peter Seibel

Sunday, November 15th, 2009
news and informationbusiness,health,entertainment,technology automotive,business,crime,health,life,politics,science,technology,travel

In his new book, Coders at Work, Peter Seibel interviews some of the best software developers in the world, asking how they work, what practices they follow, how they learned, and what advice they can offer. Because Peter is, himself, an experienced senior software developer, he knows most relevant questions to ask, the ones that have to do with how real programmers do their work. He engages in a real back-and-forth conversation rather than just presenting a questionairre. You feel like you’re sitting there with them, as he asks all the same questions you’d want to ask if you were there yourself.

How do you you find the best programmers? He ran his own little contest: he got a lot of nominations, and people voted. I am confident that this worked, because a lot of the people he interviewed are people I know to be among the best. I know Guy L. Steele Jr and L Peter Deutsch, and consider them two of the very best in the world. Most of the others I have heard of.

Peter is also the author of Practical Common Lisp, the best book to read if you want to write real programs in Common Lisp. His understanding of the language and its deep concepts are second to none. He has also done advanced software development at several companies, including BEA Systems and Kenamea. Having that kind of experience lets him ask probing and relevant questions that reveal what’s really interesting about how the interviewees work and think. There’s no other book like it.

Programming with Concurrency

Monday, July 27th, 2009
news and informationbusiness,health,entertainment,technology automotive,business,crime,health,life,politics,science,technology,travel

New high-speed computers will have more and more cores as the years go by, and the ramp-up has started and is going very quickly.  To take advantage of those processors, some programs will need to use interesting (complicated and novel) concurrency.

But the history of concurrent software is littered with approaches that just turned out to be too hard to use, and the software was slow to develop and very hard to debug.  Now that we’re all in the same boat, how do we solve the software problem?

Many language designers think that the answer lies in pure (side-effect free) programming.  The best known, and quite practical, languages that are pure are Haskell and Erlang.

But many new languages are arriving based on the idea that you should use mostly side-effect-free code, and then when side-effects are needed, use transactions.  This is at least a trend if not a movement or revolution.

When Guy Steele came back from the JAOO Conference, I asked him for a quick report, and he sent me this (very slightly copy edited, used with Guy’s permission):

I was stunned by the end of the first day of JAOO 2008 when I realized that Anders Hejlsberg had given a plenary talk on C#, I had given a talk on Fortress, Bill Venners had given a talk on Scala, and Erik Meijer had given a talk on functional programming, and we had all delivered approximately the same message to this object-oriented crowd: the multicores are coming—no, they’re here—and the only plausible way to deal with them in the long run is to rein in the side effects inherent to the OO point of view and move as much as possible to a functional programming style with mostly-immutable data structures and implicit parallelism.

I am very excited by the new Clojure language, which is a dialect of Lisp based on exactly these same principles.  Rich Hickey apparently wasn’t at JAOO, but would have found friends there!

Normally I don’t try to learn a language unless I’m about to actually program in it.  But it’s worth learning a language when you pick up fundamental new ideas that might be helpful (or just interesting).  Haskell is like that (thanks, Alan Bawden, for letting me know).

If you might have to write highly-concurrent programs in the future, I recommend that you keep your eyes on all this.

Why Did M.I.T. Switch from Scheme to Python?

Sunday, May 10th, 2009
news and informationbusiness,health,entertainment,technology automotive,business,crime,health,life,politics,science,technology,travel

I’ve been seeing mail and blog postings, particularly from people in the Lisp community, why MIT has switched from using Scheme to Python in the freshman core curriculum for the department of Electrical Engineering and Computer Science.

At the International Lisp Conference, Prof. Gerry Sussman gave a short impromptu talk explaining the new freshman curriculum.  Just to get a second opinion, I later called Prof. Jacob White, one of the designers of the curriculum and lecturers for the core courses.  (Digression: Jacob and I have been close friends since I was six years old!)  He confirmed Gerry’s description.

Asking why they changed languages is, in some sense, the wrong question.

The freshman software engineering course, since 1985, has been based on the book Structure and Interpretation of Computer Programs (known as SICP), which uses Scheme.  The course is now nearly thirty years old.  Engineering has changed quite a lot in thirty years.  Since 1995, Gerry and his co-author Prof. Hal Abelson have advocated changing the freshman curriculum radically, not basing it on SICP.

In 1980, computer engineering was based on starting with clearly-defined things (primitives or small programs) and using them to build larger things that ended up being clearly-defined.  Composition of these fragments was the name of the game.

However, nowadays, a real engineer is given a big software library, with a 300-page manual that’s full of errors.  He’s also given a robot, whose exact behavior is extremely hard to characterize (what happens when a wheel slips?). The engineer must learn to perform scientific experiments to find out how the software and hardware actually work, at least enough to accomplish the job at hand.  Gerry pointed out that we may not like it this way (“because we’re old fogies”), but that’s the way it is, and M.I.T. has to take that into account.

The new approach also has the big advantage that it combines computer science with electrical engineering, whereas the old one taught them as entirely separate disciplines.  This way, students see how they interrelate.  Also, as Jacob points out, some of the same macro-principles apply to both software and hardware, and the students see this illustrated.  There is extensive lab work, making robots and mobile applications.

It just so happens that the robotics substrate software that comes with the system they’re using is programmed in Python.  Similarly, the mobile software environment is based on Python.  (Or, at least, the original plan was to use such a substrate, although it may have changed for various business reasons.)

Changing programming languages was absolutely not a goal of the curriculum change.  It was merely the result of the consequences of various decisions.  We can always discuss how it came to be that the robots and mobile devices are using Python instead of some other language, but that’s not the question being addressed here.  M.I.T. has nothing against Scheme. (And, of course, M.I.T. does teach classic software engineering, later in the curriculum.)

(Here’s another take on this topic.)

Reblog this post [with Zemanta]