Programming with Concurrency

New high-speed computers will have more and more cores as the years go by, and the ramp-up has started and is going very quickly.  To take advantage of those processors, some programs will need to use interesting (complicated and novel) concurrency.

But the history of concurrent software is littered with approaches that just turned out to be too hard to use, and the software was slow to develop and very hard to debug.  Now that we’re all in the same boat, how do we solve the software problem?

Many language designers think that the answer lies in pure (side-effect free) programming.  The best known, and quite practical, languages that are pure are Haskell and Erlang.

But many new languages are arriving based on the idea that you should use mostly side-effect-free code, and then when side-effects are needed, use transactions.  This is at least a trend if not a movement or revolution.

When Guy Steele came back from the JAOO Conference, I asked him for a quick report, and he sent me this (very slightly copy edited, used with Guy’s permission):

I was stunned by the end of the first day of JAOO 2008 when I realized that Anders Hejlsberg had given a plenary talk on C#, I had given a talk on Fortress, Bill Venners had given a talk on Scala, and Erik Meijer had given a talk on functional programming, and we had all delivered approximately the same message to this object-oriented crowd: the multicores are coming—no, they’re here—and the only plausible way to deal with them in the long run is to rein in the side effects inherent to the OO point of view and move as much as possible to a functional programming style with mostly-immutable data structures and implicit parallelism.

I am very excited by the new Clojure language, which is a dialect of Lisp based on exactly these same principles.  Rich Hickey apparently wasn’t at JAOO, but would have found friends there!

Normally I don’t try to learn a language unless I’m about to actually program in it.  But it’s worth learning a language when you pick up fundamental new ideas that might be helpful (or just interesting).  Haskell is like that (thanks, Alan Bawden, for letting me know).

If you might have to write highly-concurrent programs in the future, I recommend that you keep your eyes on all this.

16 Responses to “Programming with Concurrency”

  1. Linus Nordberg Says:

    | [...] the only plausible way to deal with them in the long run is to rein in the side effects inherent to the OO point of view and move as much data structures and implicit parallelism.

    What does that mean? I suppose that the first part means restraining side effects that you get from an object oriented way of modeling software but the second part? Move as much data structures as what? Move parallelism?

  2. Tim Daly Jr. Says:

    “…with them in the long run is to rein in the side effects inherent to the OO point of view and move as much data structures and implicit parallelism.”

    I don’t understand that part. Move as much data structures and implicit parallelism what?

  3. Michael Campbell Says:

    What do you mean by this?

    > … to rein in the side effects inherent to the OO point of view and move as much data structures and implicit parallelism.

    Move them… where? Something missing in translation/paraphrasing there, I think?

  4. Just A Coder Says:

    I think the last part of the final sentence is supposed to be:

    “and move as much *to* data structures and implicit parallelism [as possible].”

    (where I’ve added the “to” and the “[as possible]“). Only my guess, but that’s the way it made sense to me. I’d be happy to hear from Dan (and, if possible, from Guy) what was really intended.

  5. Dave Moon Says:

    It’s not that simple. The side-effect-free data structures are not really side-effect-free. To put it another way, they are not truly read-only, they are write-once. So they still need internal synchronization for that one write that initializes them. I am talking about Clojure but I think it applies to all the other languages too. You certainly gain something from this type of programming, but you don’t gain freedom from the hardware costs of synchronization. The main things you gain, in my not very informed opinion, are 1) freedom from amateur implementations of synchronization and 2) careful identification of where the visible side-effects are. Both of those make it easier to write more correct code, but maybe not faster code.

    Stepping back a bit, all this makes sense mainly if you believe in very fine-grained concurrency and automatic detection of parallelism. It is not clear that most programs, other than the many kinds of signal processing, have all that much parallelism in them just waiting to be discovered by the compiler or runtime.

    The other camp is coarse-grained concurrency with explicit boundaries between parallel threads. Here the approach should be to continue to use side-effects freely but to limit the visibility of side-effectable objects to one thread, except for special communication objects. Since synchronization is now rare its cost doesn’t matter so much. I am not aware of any languages that enforce limitation of visibility of side-effectable objects to the thread that created them, other than languages that effectively run each thread in its own address space, but there must be some.

    I agree that it is unlikely that the hardware designers are going to find a way to make single processing cores very much faster than they are today, so we software guys do seem to be stuck with having to use concurrency if we want to run faster.

  6. Leon P Smith Says:

    Erlang is not pure; it doesn’t have arbitrary mutable references, but message passing is a side effect. Using them, you can (inefficiently) emulate mutable references and introduce race conditions.

    Curiously though, race conditions seem to be more manageable in Erlang than shared state models of concurrency. The choice of not directly supporting a large class of traditional side effects in favor of message passing is a very interesting choice. Erlang’s syntax is a little overly quirky, but semantically I think it’s among the most profound contributions in the last 20 years.

  7. Rich Hickey Says:

    Hmm, I find myself disagreeing with Dave Moon, so I’ve probably misunderstood.

    For Clojure’s data structures, normal one-time initialization is of newly created objects in a single thread, so there is no other thread with which to synchronize. Thus I’m confused by “still need internal synchronization”. Even in the specialized parallel construction scenarios (e.g. parallel map on vectors), each sub-component of the data structure is allocated and written to by only a single thread and never written once published, so while there is workflow synchronization of course, there is none for the data itself. When you share an ‘immutable’ data structure with another thread, the memory will need to be made visible to the processor running that thread, but it being ‘read-only’ means that can be done so in the least cost way, and subsequent reads can be free of all but the most minimal memory coherence protocols.

    Perhaps we are dealing with different notions of synchronization, but often when talking about synchronization required for shared mutable memory we are talking about higher-level, higher cost things like CAS or locks. Neither of these is needed at any point (internally or externally) for the create-and-publish approach used in Clojure’s data structures. The lack of need for any higher-level synchronization for subsequent multithreaded reading is a big difference vs correctly reading any potentially mutable data structure.

    The fact that these persistent data structures end up being trees of immutable nodes yields many benefits, one being pervasive structural sharing. This means that making a copy for single-threaded or partitioned parallel multithreaded “mutation” is O(1). ‘Mutation’ creates newly allocated nodes, but can share freely with the source in the sparse case. When done, making the result immutable and persistent is also O(1). Even on modest (quad-core) hardware, a parallel implementation of map, when worst-case fully replacing, trounces a single threaded mutation in place using a mutable data structure, the latter having no sharing story other than “good luck with that”.

    Contrast this with any scheme for “single thread w/’private’ mutable memory”, given any sharing (and if there is no sharing there is no interesting problem, they are separate programs, or, functional spawn). You must serialize access, via ownership handoff (see Kilim), message queue (Erlang), or, at least, lock-while-copy. And a critical point in the last case is that if you ever share it, however infrequently, your in-owner-thread code will have to use synchronization constructs. The communication objects in all cases other than handoff are usually immutable anyway, so why not use them all the time if not prohibitive otherwise?

    Most important though, is that programming with “values” is vastly simpler than programming with memory. It may not currently be faster, but as the core counts climb it is likely to become so. I’m keeping my money on allocate -> initialize -> publish-immutably.

    As I finish this I’m realizing you probably were only talking about the costs of making memory visible to more than one core. Oh well, I’ve written the rant already :)

    I agree that thread/core/memory compartmentalization is going to become interesting and look forward to having the constructs to manage it. One language that formalizes the notion of ‘place’ is IBM’s X10 [2], although IIRC, an SMP is only a single place, the target being clustered HPC.

  8. rascunho » Blog Archive » links for 2009-07-27 Says:

    [...] Dan Weinreb’s blog » Blog Archive » Programming with Concurrency If you might have to write highly-concurrent programs in the future, I recommend that you keep your eyes on all this. (tags: danweinreb.org 2009 mes6 dia27 programming concurrency blog_post clojure trends *****) [...]

  9. Karl Krukow Says:

    I just wanted to let you know that Rich Hickey is coming to JAOO this year ;-) He is giving two talks: “an introduction to Clojure” and “The Clojure Concurrency Model”. I am particularly looking forward to the latter. Also, I will take the liberty and point out that you can get a 15% discount for JAOO registration:

    http://blog.higher-order.net/2009/07/13/jaoo-discount/

    Hope to see you there
    /Karl

  10. Dan Weinreb Says:

    @Karl, I would love to go to JAOO but I don’t think I can.

  11. Dan Weinreb Says:

    Rich, I don’t understand what Dave Moon means by “still need internal synchronization”, either. Perhaps he is talking about the level of abstraction below that seen by the programmer; but if so, I think that is beside the point. At the level of abstraction of the Clojure programmer, creation of a write-once data structure does not require any concurrency control to be done by the programmer.

    The “hardware cost of synchronizing” isn’t the problem being solved here. The problem being solved is bugs in programs that result from incorrect use of concurrency control mechanisms by programmers. That is, the issue is correctness rather than performance.

    When writing programs that have many threads operating in a shared space of memory/objects/whatever, experience has shown that unless the interactions between the threads can be kept very, very simple, it’s easy to introduce bugs that are hard to understand and hard to debug, particularly because the bugs are hard to reproduce. Various attempts to remedy this, such as the use of “monitors” or the Java “synchronized” construct, have not succeeded (at least in my opinion).

    To the extent that a program can be written without the need for concurrency control at all, these problems obviously cannot occur. If there are only “create” and “read” operations with no “update” operations (and the only “delete” operations being done by a GC), no concurrency control need be coded.

    When you do need concurrency control, the transaction concept is much easier to understand than locks or monitors. It has very simple and clear invariants. This is why database management systems have been using transactions for such a long time, and why so many modern file systems use transactions now.

    Now, it’s a separate issue where fine-grained concurrency control is applicable. The general idea is that if the way we’re going to get more processing power is by having many concurrent processors (e.g. cores), there needs to be some way to keep all those cores busy. In the case of the subsystem that I currently spend my time working on, it’s easy. We have a transaction processing system in which there are generally many concurrent requests. We simply run separate threads (either sharing threads within processes, or putting each thread in its own process; it doesn’t matter for purposes of this discussion), each thread handling a separate request. So we don’t need fine-grained concurrency control, for the most part. However, there are other applications in which finer-grained concurrency control is a good, or even clearly the best, model.

  12. Sundar Narasimhan Says:

    Interesting thread Dan — Knuth for one agrees with Dave Moon above.
    ….
    Donald: I don’t want to duck your question entirely. I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multithreading idea turns out to be a flop

    Read the full article at
    http://www.informit.com/articles/article.aspx?p=1193856&ns=15105

  13. Dan Weinreb Says:

    I am very unaccustomed to disagreeing with Dave Moon. I had the privilege of working closely with him for at least twenty years, and although there have been a few times when he was actually wrong about something, they were very few and far between. To get some more insight on these issues, I turned to another friend who knows a great deal about this,and whom I hold in as high esteem as I do Dave Moon, namely, Guy Steele. He sent me this reply and gave me permission to publish it here:

    “I think you’re all right, because you’re talking about slightly different things. Moon is right that that when creating and initializing an immutable object, great care is required between the initialization and the publishing of the pointer to that object to other threads. This was a big problem with the first version of the Java memory model, for example; Bill Joy and I had overlooked some of the subtleties associated with object initialization. Moon is also right that this is best left to “professionals”, and I think everyone agrees that the high-level functional programming abstraction allows the fiddly details of keeping the hardware happy with memory-fence instructions or whatever to be buried where the application programmer never has to see them—but there is a (necessary) performance cost.”

  14. Dan Weinreb Says:

    I’d like to clarify the earlier comment from Sundar Narasimhan: the part starting with “Donald:” is a quotation from Donald Knuth. (I know that this was supposed to be clear from the context and typography, but I was confused so I thought other readers of the blog might be.) (Sundar is my boss at ITA, and is quite awesome, so I take this opinions extremely seriously!)

    The rest of Knuth’s answer is very interesting, too. He goes on to say:

    “…turns out to be a flop, worse than the “Itanium” approach that was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write.

    Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX.[1]

    How many programmers do you know who are enthusiastic about these promised machines of the future? I hear almost nothing but grief from software people, although the hardware folks in our department assure me that I’m wrong.

    I know that important applications for parallelism exist—rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and special-purpose techniques, which will need to be changed substantially every few years.”

    I’m not so sure that these programs will have to be substantially rewritten every few years. The concept of threads does not seem to be quite that ephemeral.

    I have just started reading (on my Kindle!) the book “The Art of Multiprocessor Programming”, by Maurice Herlihy of Brown U. and Nir Shavit of Tel-Aviv U. and Sun Microsystems. There’s an extensive review at Amazon saying what all the chapters are and what’s covered. I’m guessing that what I hope to learn from this book will be valuable for many years. We’ll see, I guess.

  15. Dan Weinreb Says:

    In the Knuth interview that Sundar referenced about, Knuth refers to the “future demise of Moore’s Law”. I wonder if he is interpreting Moore’s law to say that the speed of instruction execution of CPU’s keeps going up at some rate. Actually it’s about the number of transistors on an integrated circuit.

    Here’s an interesting article that came out today in NetworkWorld quoting Intel CTO Justin Rattner saying that Moore’s Law has lots more time to run.

    http://www.networkworld.com/news/2009/092209-intel-cto-moores-law.html?source=NWWNLE_nlt_daily_am_2009-09-23

    Make your own judgment about how to balance the fact that this guy knows a lot about the topic but might have a vested interest in public perception. But the demise of silicon has been predicted for a LONG time. If you’re at least as old as me, you’ll remember when Josephson Junctions were going to take over from silicon; that was quite a long time ago. And Rattner is hardly the only person making this claim. Time will tell.

  16. Vadim Says:

    Re: Knuth’s alleged ignorance of True Meaning of Moore’s Law…

    As early as 2001 (ISBN-10: 157586326X), he said (right here, in
    Cambridge): “Moore’s Law certainly can’t go on indefinitely.”

    Re: “Sundar is my boss at ITA, and is quite awesome”.

    That was quite funny.