Archive for the ‘Java’ Category

VoltDB versus NoSQL

Sunday, July 11th, 2010

Mike Stonebraker is the co-founder and CTO of VoltDB, which makes a novel on-line transaction processing (OLTP) relational database management system (RDBMS). He recently gave a talk entitled “VoltDB Decapitates Six SQL Urban Myths”. You can read the slides here. Much of the talk is a reply to the claims of the community building data stores often referred to as NoSQL data stores.

Todd Hoff of HighScalability has written an excellent commentary on the talk. If you want to understand what’s going on with VoltDB, you can’t do better than to read this (including the commentary, with some replies from VoltDB). I have a bit to add.

Benchmarking

Dr. Stonebraker’s talk includes benchmark results, in which VoltDB ran much faster than MySQL and a well-known NoSQL data store.

Over many years, I have found that what nearly everybody wants is a predictive “single number” that says how much faster one DBMS is than another. But applications differ hugely in their workloads, and measured speed depends tremendously on using the DBMS in the best way, including layout, clustering, indexing, partitioning, and all kinds of options, such as whether transactions are immediately made durable or not. Saying that one DBMS is “N times faster” than another DBMS is very misleading. But everyone wants the magic number, and is too quick to assume that the result of one benchmark predicts speed in all situations.

One must take into account that the VoltDB engineers wrote these micro-benchmarks, and ran them on a very specific workload, knowing what they were trying to prove. I do appreciate that they made a good-faith attempt to be fair, based on John Hugg’s comments above. And I can vouch that John is a very smart guy, and I believe all that he says in his comments above. Nevertheless, they did not bring in experts in the other systems who could tune them optimally. Different benchmarks might be less flattering.

The old argument about assembly language versus high-level languages would be analogous if RDBMS optimizers worked as well as C/Java/etc compilers. SQL is supposed to be declarative: you just ask for what you want, and the RDBMS figures out the best way to get it. But my experience, and what my friends tell me, is that the optimizers in some popular RDBMS’s (especially Oracle) frequently make bad choices, and picking the wrong query strategy can slow things down by huge factors. So the developers are forced to override the optimizer with “hints”. It’s been over 30 years, and still the optimizers fail. Maybe it’s time to declare the experiment a failure. (This may not be an issue for VoltDB, as the SQL might always be very simple or something.)

Stored Procedures

He’s right that performance can be hurt by too many round trips to the DBMS. But Oracle users have known for a long time that you have to use stored procedures to get high performance; this is nothing new. When you do this with Oracle, you end up with lots of PL/SQL code. Most of your developers can’t understand it, and it’s a proprietary language so you’re “locked in” to Oracle (it’s very hard to switch to a different DBMS).

It’s one thing to provide stored procedures as a way to improve performance. But VoltDB requires you to use stored procedures, and each interaction with VoltDB is a transaction. Any application that mixes database access with other operations that must be done on the client side cannot use VoltDB. The application has to be written in the VoltDB manner, from the beginning. This is like “lock-in” in some ways.

More about the VoltDB presentation

Todd says: “In contrast, the VoltDB model seems from a different age: a small number of highly tended nodes in a centralized location.” I don’t think this is right. For disaster recovery (e.g. blackouts), you need a replica far away; this has always been an integral part of VoltDB’s justification for not logging to disk. And then you have to worry about network partitions over a WAN. WANs are not yet supported in VoltDB.

I find Todd’s point about Amazon’s Dynamo very compelling: why would Amazon do so much work if partitions are so rare? At Amazon scale, partitions must be frequent enough to justify all this work. Not all VoltDB customers will be operating at that scale, but John Hugg has said that it’s designed for “Internet scale”. Dr. Stonebraker is right that there’s no substitute for actual measurement of how likely a partition is.

Putting the burden on application programmers

Serious production databases are usually managed by database experts/administrators, who decide where to replicate what, whether and how to partition tables (across nodes), and so on.

But with VoltDB, the application developers have to understand a lot about this. For example, they need to know whether a procedure is single-partitioned, so they can assert that in the code. So they have to know about sharding, where replicas are, and so on. It makes the application brittle insofar as changes by the database administrators could break those assertions.

For example, a VoltDB engineer explained to a customer: “The goal of VoltDB is to optimize single-partition transactions and part of the responsibility for that falls on the application developer. You must write the queries to operate properly within a single partition and then declare the procedure to be single-partitioned. [...] Today, VoltDB does not verify that the SQL queries within a single-partitioned procedure are actually single-partitioned.” Another VoltDB engineer said: “The vast majority (almost 100%) of your stored procedure invocations must be single-partition for VoltDB to be useful to you.”
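
To make the brittleness concrete, here is a toy sketch in Java. It is not VoltDB's API: ToyCluster, partitionFor, and runDeclaredSinglePartition are hypothetical names invented for illustration, and hashing the key modulo the partition count is an assumed placement rule. The only thing it tries to show is that when a procedure is declared single-partitioned and nothing verifies the declaration, a wrong declaration can produce a wrong answer without any error. What a real system does in that situation may well differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names come from VoltDB.
class ToyCluster {
    private final List<Map<String, Integer>> partitions =
            new ArrayList<Map<String, Integer>>();

    ToyCluster(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<String, Integer>());
        }
    }

    // Assumed placement rule: hash the partitioning key.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    void put(String key, int balance) {
        partitions.get(partitionFor(key)).put(key, balance);
    }

    // A procedure that the developer has declared single-partitioned: it runs
    // only against the partition chosen by routingKey. Nothing checks that
    // the keys it reads actually live on that partition.
    int runDeclaredSinglePartition(String routingKey, String... keysRead) {
        Map<String, Integer> local = partitions.get(partitionFor(routingKey));
        int sum = 0;
        for (String key : keysRead) {
            Integer balance = local.get(key);
            if (balance != null) {
                sum += balance;
            }
            // Rows stored on other partitions are silently skipped.
        }
        return sum;
    }

    public static void main(String[] args) {
        ToyCluster cluster = new ToyCluster(2);
        cluster.put("a", 10);
        cluster.put("b", 20);
        cluster.put("c", 30);

        // "a" and "c" hash to the same partition, so the declaration holds.
        System.out.println(cluster.runDeclaredSinglePartition("a", "a", "c")); // 40

        // "b" lives on the other partition, so the declaration is false and
        // the result quietly omits it.
        System.out.println(cluster.runDeclaredSinglePartition("a", "a", "b")); // 10
    }
}
```

If an administrator later repartitions the data (here, by changing the partition count), a declaration that used to be true can become false without any change to the application code.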

Different “NoSQL” systems put such burdens on application programmers to greater or lesser degrees, as well. RDBMS’s have traditionally boasted that they hide these issues from application programmers. VoltDB uses SQL, but what it provides is very different from the original concept of the relational model.

What is a “SQL” database system?

You can see more of this in their “VoltDB do’s and don’ts list”. Perhaps the most important point is the first “Don’t”: “Don’t use ad hoc SQL queries as part of a production application.” Dr. Stonebraker’s talk is very much a defense of using SQL for OLTP, rather than the “NoSQL” models such as key-value stores. But what does the restriction against “ad hoc” queries mean?

The original fundamental claim of relational DBMS’s (as opposed to the previous generation, the CODASYL-type DBMS’s) is that you don’t have to know the access pattern; you just say what you want in SQL, and the DBMS figures out how to do it. Applications keep working even if there are changes in the storage layout, indexing, and whatever else the DBMS uses. But, as a VoltDB engineer said, “Part of VoltDB’s underlying premise is that workloads are known in advance.”

Even though VoltDB uses SQL, maybe it isn’t as far from the “NoSQL” storage engines as one might think!

What Programming Language Do People Speak Well Of?

Friday, September 4th, 2009

I usually don’t write blog entries that are merely pointers to someone else’s blog entries, but I’m making an exception this time. A blogger named Lukas Biewald, on the Dolores Labs blog, wrote an entry called The Programming Language With The Happiest Users.

He measured Twitter “tweets” that mention certain programming languages, and ascertained which were positive. I’m particularly interested because Lisp came in second place.

Interpreting this as “the programming language with the happiest users” depends on several tacit assumptions that seem dubious at best.  We don’t know that the people writing these comments are actually users.  The number of tweets sent about a language is not uncorrelated with the language; I bet there are fewer COBOL programmers using Twitter than Perl programmers.  Not everybody is equally inclined to tweet about how much they like or dislike their language. He knows this and mentions some of these problems at the end of the post, so I’m not saying this to criticize him.

Yes, the title of the blog post is sort of misleading, but written to get the attention of readers.  I cannot criticize him for that either, since I do the same thing.  Sometimes it backfires; a lot of people seem to have seen my post named “Why Did M.I.T. Switch from Scheme to Python” without getting my points, which were (1) they didn’t make a high-level decision to switch languages, but rather this fell out as an end consequence of decisions that had nothing to do with languages, and (2) this is only for the freshman core courses, not the whole curriculum.

It’s hard to draw any firm, useful conclusions from this research, but I still find it interesting and entertaining.

SavaJe: What Happened?

Monday, March 30th, 2009

SavaJe was a company that built its own Java-based operating system for mobile phones: “Java on bare metal”, almost a Lisp machine that way but on conventional hardware.  They did use C/C++ for low levels. This always sounded very interesting. Here’s what I was able to gather from some Web research.  I don’t have any direct knowledge; please send comments correcting any errors.

They released their SavaJe XE operating system in 2001.  It came with a secure browser, an email client, calendar, to-do list, contact list, MP3 music player, picture viewer, notepad editor, and games.  It supported device drivers for color displays, an external keyboard, Ethernet, wireless networking, and dialup networking. (That seems like a lot for a first release!)

They got lots of publicity and interest. James Gosling himself demoed it during his keynote at JavaOne 2006. It then ran on the “Jasper S20 mobile phone, made by Group Sense Limited PDA”.  At this point, over 700M phones had Java on them in one way or another, because of Java’s portability, familiarity to developers, and the built-in security (evidently the mobile phone vendors and carriers liked this a lot).  SavaJe supported a huge number of major libraries, including advanced 3D graphics, XML parsing, Mobile Media, etc, etc.

But in about October 2006, SavaJe evidently ran out of money, after having raised a total of $71M (!) in funding.  In April 2007, Sun Microsystems bought SavaJe.

Sun now uses the SavaJe technology in their JavaFX Mobile product, which came out early this year.  JavaFX Mobile claims to let you write “Rich Internet Applications” (RIA’s),  applications that can run on desktops, laptops, and every possible phone/handheld.  Sony, LG, and Sprint are on board, though not Apple.  It works with Google’s Android.  It’s apparently aimed at interactive applications with rich user interfaces, including animation, video, and so on.  It involves “JavaFX Script”, a declarative language that runs in the browser (if I understand correctly).

“With JavaFX 1.0 you’ll get a runtime, JavaFX Script, plug-ins to NetBeans 6.5 and Eclipse, and Adobe Systems’ Creative Suite version 3 and 4. The Adobe plug-ins let graphics artist create an asset and then wraps it in meta data so it can show up in the IDE with necessary attributes.” wrote The Register in Feb 2009.  Java is now on 2.1 billion mobile phones.

The primary competition is Microsoft’s Silverlight and Adobe’s Flex.  Has anyone tried JavaFX and seen how it compares?


Complaints I’m Seeing About Java

Saturday, December 8th, 2007

At Object Design, I developed code in Java for over ten years, and I worked with Java more at BEA. I’d be happy to use it again. It has many strengths, as well as extensive libraries and some great tools. It’s a lot cleaner than many other popular languages.

Yet many people complain about it. I’ve been looking around the web seeing what the predominant complaints are. After filtering out the ones that no longer apply to the latest Java release, and the ones I don’t understand, and the ones that aren’t important enough to mention, I’ve come up with a list of current complaints that are interesting and have some validity. With each one, I’ve added some commentary. My comments are not deep; some are downright superficial. And they certainly reflect my own point of view, with which people can quite validly disagree.

  • Base types (int, float, etc.) are not objects. So they can’t be passed to many useful classes that take Object arguments, and they’re otherwise treated specially and differently, leading to non-uniformity. The situation has been greatly alleviated by auto-boxing and auto-unboxing, although that’s something of a kludge.
  • You can’t return more than one value from a method. If you want to, you have to return a little array (unless one value is an int and the other is a Person!) or an object of some special little class made just for this purpose. When I was helping Bill Joy and Guy L. Steele Jr. by reviewing drafts of the original Java Language Specification, I was originally upset that there was no way to do this. So I set out to find a small example program that obviously demanded such a feature, to convince them that multiple value returns must be added. I was unable to come up with one, and I could see that Java’s philosophy was to leave out things that are rarely used and not crucial, so in the end I didn’t say anything.
  • Java is call-by-reference (in the same sense as Lisp) for object parameters. There is no implicit copying, the way there is in C++ when you don’t use a “*” in the type. Some people like such implicit copying in some circumstances. One example I was given was passing an Iterator, so that the callee would not mess up the state of the caller’s Iterator. Personally, I did not find that example convincing: sometimes you really do want the iterator to be advanced by the method you call, and sometimes you don’t. Explicit copying seems to me far superior to complicating such a basic thing as parameter passing.
  • Speaking of a called method messing up the state of an argument passed by a caller: for collections you can use Collections.unmodifiableList to make a read-only view of a List object to prevent the collection from being messed up. But that breaks the Liskov Substitution Principle: if the method’s parameter is declared to be a List, the method might be written to modify the List. An unmodifiable list is-not-a list. Should there be a Java interface for the read-only methods? Probably Josh Bloch has a well-thought-out discussion of this somewhere.
  • There is no multiple inheritance of implementation. That is, a class can only inherit from one other class. There are no “mixins”. In my own experience, multiple inheritance isn’t used all that often, but it’s not all that rare either. Multiple inheritance of interfaces is truly critical, but Java has that.
  • Java needs the Factory pattern because it doesn’t have polymorphic constructors as part of the language. In my opinion, many of the famous Design Patterns are conventions for extending the programming language, and so this is just a special case of that principle.
  • Getter and setter methods are not built into the language. I interpret this as being another place where the designers of Java wanted to keep things simple, at the cost of not adding syntactic sugar. C#, the Microsoft answer to Java, does have these: classes can have “properties”. But if Java had “properties”, the reflection API would have to be more complicated to represent them, and so on. Whether these should be part of the language is one of those things where reasonable people can easily differ. For example, Flavors had this and CLOS does not have this.
  • Local variables of an enclosing method that are referred to by an inner class must be declared “final”. This is a real shortcoming. It means that Java can’t do the most basic kind of “lexical scoping”. This would be even more egregious if Java programmers used more higher-order functions, but, as we’ll discuss, they usually don’t.
  • Assignment is denoted with an “=”, which is confusing because it looks like an equality predicate. Something else should have been used, like Algol’s “:=”. I have no strong opinion about this, per se, since I don’t even like this kind of lexical syntax especially, but obviously it was a goal to look like C++ at this level.
  • Java is case-sensitive. This is another issue about which people feel strongly either way, but no argument will persuade either side to change its mind.
  • There is no operator overloading, in the sense of C++. Now, not everybody thinks that operator overloading is a good thing! From my own point of view, it’s unfortunate that there are special “infix operators” that are so different from methods, in cases where the infix operators have function/method-like semantics (“+” does, “||” does not). Operator overloading would let them be treated more like methods. But it can be confusing, especially for beginners.
  • Integer overflow is entirely silent. This is bad.
  • When working with streams, you often have to create all these nested objects, with classes like BufferedInputStream. The underlying reasons for all this seem sensible but the result does make it hard to do easy things, and verbose too.
  • Checked exceptions stir up strong feelings in many people, some reasons being better than others. This is such a complex topic, and I am so interested in exceptions, that I propose to take it up in a later blog entry.
  • Many important library classes cannot be subclassed, including String and StringBuffer, because they are final. I am less convinced than some other people that it’s really so important to subclass these, but I’m not sure.
  • Arrays are not objects of any class except Object. You can’t subclass them and they hardly have any methods. There’s a class called Arrays (java.util.Arrays) full of static methods to do things with arrays. This leads to some non-uniformity, although in practice I don’t think this is a huge problem.
  • It’s very awkward to “use functions as objects”. For example, there isn’t an easy way to apply some function to every member of a collection. Imagine trying to translate this Lisp program into Java: “(defun compose (f g) (lambda (x) (funcall f (funcall g x))))”, i.e. take two functions, each of one argument, and return a new function that is the composition of those two functions. You have to use anonymous inner classes with all the types properly genericized, at the very least, which is prohibitively verbose.
  • Objections from Lisp and Scheme people: it’s all too verbose, there are no multimethods, expressions and statements should be the same thing, and there are no tail calls. There are no macros in the Lisp/Scheme sense; they could be added, as shown by Jonathan Bachrach and Keith Playford’s Java Syntactic Extender, but this hasn’t caught on, perhaps because the IDE’s would have to know about it. And, of course, Lisp/Scheme people don’t care for the lexical syntax.
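
A few of the complaints in this list are easier to judge with code in front of you. The following is a minimal sketch in the Java of that era (anonymous inner classes, no lambdas). Function1 and ComplaintsSketch are hypothetical names made up for this example, not standard library types. It shows one possible translation of the Lisp “compose” (including the “final” requirement on captured variables), silent integer overflow, and a read-only List view that rejects a modification at run time even though its static type is plain List.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical one-argument function type, defined here for illustration.
interface Function1<A, R> {
    R apply(A argument);
}

class ComplaintsSketch {

    // One Java rendering of (defun compose (f g) ...). The parameters are
    // declared final because the anonymous inner class refers to them.
    static <A, B, C> Function1<A, C> compose(final Function1<B, C> f,
                                             final Function1<A, B> g) {
        return new Function1<A, C>() {
            public C apply(A x) {
                return f.apply(g.apply(x));
            }
        };
    }

    // Integer overflow wraps around silently; no exception is thrown.
    static int addOne(int value) {
        return value + 1;
    }

    // Returns true if a value typed as an ordinary List refused an add().
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("extra");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Function1<Integer, Integer> addTen = new Function1<Integer, Integer>() {
            public Integer apply(Integer x) { return x + 10; }
        };
        Function1<Integer, Integer> twice = new Function1<Integer, Integer>() {
            public Integer apply(Integer x) { return x * 2; }
        };
        // compose(f, g) applies g first, then f: (3 * 2) + 10 = 16
        System.out.println(compose(addTen, twice).apply(3));

        // true: MAX_VALUE + 1 wraps to MIN_VALUE
        System.out.println(addOne(Integer.MAX_VALUE) == Integer.MIN_VALUE);

        List<String> names = new ArrayList<String>();
        names.add("a");
        // true: the unmodifiable view throws UnsupportedOperationException
        System.out.println(rejectsAdd(Collections.unmodifiableList(names)));
    }
}
```

The one-line Lisp definition turns into an interface declaration plus a generic method wrapping an anonymous class, which gives a fair sense of the verbosity being complained about.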

There are probably other complaints. People love to complain about computer languages, which are all far from perfect. In the future, I’ll write a similar posting about Common Lisp.